文章目录
一、Hive SQL
1.1、窗口函数
1.1.1、统计出每个用户的累积访问次数
- 一、Hive SQL
- 1.1、窗口函数
- 1.1.1、统计出每个用户的累积访问次数
- 想了解更多,请关注我的公众号【宝哥大数据】
有如下的用户访问数据(t1) 要求使用SQL统计出每个用户的累积访问次数,如下表所示:
解答 1、创建表
create table action (
userId string,
visitDate string,
visitCount int
) row format delimited fields terminated by "\t";
2、插入数据
insert into table action values('u01','2021/1/21','5');
insert into table action values('u02','2021/1/23','6');
insert into table action values('u03','2021/1/22','8');
insert into table action values('u04','2021/1/20','3');
insert into table action values('u01','2021/1/23','6');
insert into table action values('u01','2021/2/21','8');
insert into table action values('U02','2021/1/23','6');
insert into table action values('U01','2021/2/22','4');
3、转换时间格式
select from_unixtime(unix_timestamp(visitDate,'yyyy/mm/dd'),'yyyy-mm') from action;
4、统计每个用户每个月的访问次数
select
lower(userId) userId, -- 转换大小写
from_unixtime(unix_timestamp(visitDate,'yyyy/mm/dd'),'yyyy-mm') visitMonth,
sum(visitCount) visit_month_count
from action
group by lower(userId),from_unixtime(unix_timestamp(visitDate,'yyyy/mm/dd'),'yyyy-mm')
5、统计累计访问次数
select
t1.userId userId,
t1.visitMonth visitMonth,
t1.visit_month_count visit_month_count,
sum(t1.visit_month_count) over (partition by userId order by visitMonth rows between UNBOUNDED PRECEDING AND CURRENT ROW) sum_month
from
(
select
lower(userId) userId,
from_unixtime(unix_timestamp(visitDate,'yyyy/mm/dd'),'yyyy-mm') visitMonth,
sum(visitCount) visit_month_count
from action
group by lower(userId),from_unixtime(unix_timestamp(visitDate,'yyyy/mm/dd'),'yyyy-mm')
) t1
字段解释
unbounded:无界限
preceding:从分区第一行头开始,则为 unbounded。 N为:相对当前行向前的偏移量
following :与preceding相反,到该分区结束,则为 unbounded。N为:相对当前行向后的偏移量
current row:顾名思义,当前行,偏移量为0
想了解更多,请关注我的公众号【宝哥大数据】