您当前的位置: 首页 > 

梁云亮

暂无认证

  • 1浏览

    0关注

    1211博文

    0收益

  • 0浏览

    0点赞

    0打赏

    0留言

私信
关注
热门博文

谷粒影音:业务处理

梁云亮 发布时间:2020-02-23 02:15:49 ,浏览量:1

统计视频观看数Top10
  • 思路:使用order by按照 views 字段做一个全局排序,设置只显示前10条即可 实现SQL:
select videoid,uploader,age,category,length,views,rate,ratings,comments
from guliyingyin_video_orc
order by views desc
limit 10;
![在这里插入图片描述](https://img-blog.csdnimg.cn/20200222225830179.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2xpYW5naGVjYWk1MjE3MTMxNA==,size_16,color_FFFFFF,t_70)
  • 结果: 在这里插入图片描述
统计视频类别热度Top10

思路:炸开数组”视频类别“字段,然后按照类别分组,最后按照热度(视频个数)排序。

因为当前表结构为:一个视频对应一个或多个类别。所以如果要 group by 类别,需要先将类别进行列转行(展开),然后再进行count即可

  • 分析过程:
  1. 炸开类别,记为临时表t1

    select videoId,catName from guliyingyin_video_orc lateral view explode(category) tb_category as catName; — t1

  2. 按照类别 group by 聚合,然后count组内的videoId个数,记为临时表t2

    select catName,count(*) hot from t1 group by catName; —t2

  3. 最后按照热度排序,显示前10条。

    select catName, hot from t2 order by hot desc limit 10;

  • 完整的SQL:
    SELECT
    	catName,
    	hot 
    FROM
    	(
    	SELECT
    		catName,
    		count( * ) hot 
    	FROM
    		( SELECT videoId, catName FROM guliyingyin_video_orc lateral VIEW explode ( category ) tb_category AS catName ) t1 
    	GROUP BY
    		catName 
    	) t2 
    ORDER BY
    	hot DESC 
    	LIMIT 10;
    

将完整的SQL语句保存到guliyingying.sql文件中,并上传到Linux。然后执行命令: 在这里插入图片描述 结果: 在这里插入图片描述

统计视频观看数Top20所属类别
  • 思路:
  1. 先找到观看数最高的20个视频,记为临时表t1

    select videoid,views,category from guliyingyin_video_orc order by views desc limit 20; ----t1

  2. 把这20条信息中的category分裂出来(列转行),记为临时表t2

    select videoid,catName from t1 lateral view explode(category) tb_category as catName; — t2

  3. 去重

    select distinct catName from t2;

  • 完整SQL
SELECT DISTINCT
	catName 
FROM
	(
	SELECT
		videoid,
		catName 
	FROM
	( SELECT videoid, views, category FROM guliyingyin_video_orc ORDER BY views DESC LIMIT 20 ) t1 lateral VIEW explode ( category ) tb2_category AS catName 
	) t2;

将完整的SQL语句保存到guliyingying.sql文件中,并上传到Linux。然后执行命令: 在这里插入图片描述 结果: 在这里插入图片描述

统计视频观看数Top50所关联视频的类别的Rank

思路分析: 在这里插入图片描述 思路:

  1. 查询出观看数最多的前50个视频的所有信息(包含每个视频对应的关联视频),记为临时表t1

    select videoId, views, category, relatedId from guliyingyin_video_orc order by views desc limit 50; ----t1

  2. 炸裂关联视频id:将找到的50条视频信息的相关视频的relatedId列转行,记为临时表t2

    select distinct videoId_name from t1 lateral view explode(relatedId) tb_relatedId as videoId_name; ----t2

  3. 将关联视频的id和guliyingyin_video_orc表进行inner join操作,得到每个关联视频id的详细数据,记为临时表t4

    select * from t2 inner join guliyingyin_video_orc t3 on t2.videoId_name=t3.videoId; ---- t4

  4. 炸裂关联视频的类别

    select * from t4 lateral view explode(category) tb_category as catName; ----t5

  5. 统计类别个数

    select catName, count(*) hot from t5 group by catName; ----t6

  6. 统计类别的热度排名

    select * from t6 order by hot desc;

  • 完整SQL:
SELECT
	* 
FROM
	(
	SELECT
		catName,
		count( * ) hot 
	FROM
		(
		SELECT
			* 
		FROM
			(
			SELECT
				* 
			FROM
				(
				SELECT DISTINCT
					videoId_name 
				FROM
					( SELECT videoId, views, category, relatedId FROM guliyingyin_video_orc ORDER BY views DESC LIMIT 50 ) t1 lateral VIEW explode ( relatedId ) tb_relatedId AS videoId_name 
				) t2
				INNER JOIN guliyingyin_video_orc t3 ON t2.videoId_name = t3.videoId 
			) t4 lateral VIEW explode ( category ) tb_category AS catName 
		) t5 
	GROUP BY
		catName 
	) t6 
ORDER BY
	hot DESC;

将完整的SQL语句保存到guliyingying.sql文件中,并上传到Linux。然后执行命令: 在这里插入图片描述 结果: 在这里插入图片描述

统计每个类别中的视频热度Top10,以Music为例

思路:要想统计Music类别中的视频热度Top10,需要先找到Music类别,那么就需要将category炸开。

  1. 创建一个临时表用于存放categoryId炸开的数据

    create table guliyingyin_category_orc(
    videoId string, 
    uploader string, 
    age int, 
    categoryId string, 
    length int, 
    views int, 
    rate float, 
    ratings int, 
    comments int,
    relatedId array
    )
    row format delimited 
    fields terminated by "\t"
    collection items terminated by "&"
    stored as orc;
    

    在这里插入图片描述

  2. 向category展开的表中插入数据。

    insert overwrite table guliyingyin_category_orc
    select  videoid,uploader,age,categoryId,length,views,rate,ratings,comments,relatedId
    from guliyingyin_video_orc
    lateral view explode(category) tb_category as categoryId;
    

    在这里插入图片描述

  3. 统计对应类别(Music)中的视频热度。

    select  videoId,views
    from guliyingyin_category_orc
    where categoryId = "Music"
    order by views desc
    limit 10;
    

    在这里插入图片描述 结果: 在这里插入图片描述

统计每个类别中视频流量Top10,以Music为例
select  videoId,ratings
from guliyingyin_category_orc
where categoryId = "Music"
order by ratings desc
limit 10;

在这里插入图片描述 结果:

在这里插入图片描述

统计上传视频最多的用户Top10以及他们上传的观看次数在前20的视频
  • 分析
  1. 上传视频最多的用户Top10,记为表t1

    select uploader
    from guliyingyin_user_orc
    order by videos desc
    limit 10; -----t1
    
  2. 观看次数在前20的视频

    select t1.uploader,t2.videoId,views
    from t1 inner join guliyingyin_video_orc t2
    on t1.uploader= t2.uploader 
    order by views desc
    limit 20;
    
  • 完整SQL语句
SELECT
	t1.uploader,
	t2.videoId,
	views 
FROM
	( SELECT uploader FROM guliyingyin_user_orc ORDER BY videos DESC LIMIT 10 ) t1
	INNER JOIN guliyingyin_video_orc t2 ON t1.uploader = t2.uploader 
ORDER BY
	views DESC 
	LIMIT 20;
  • 结果 在这里插入图片描述
统计每个类别视频观看数Top10
  • 思路:
  1. 先得到categoryId展开的中间表

  2. 子查询按照categoryId进行分区,然后分区内排序,并生成递增数字,该递增数字这一列起名为rank列

    select videoid,categoryid,views,
    row_number() over(partition by categoryid order by views desc) rank234
    from guliyingyin_category_orc;
    
  3. 通过子查询产生的临时表,查询rank值小于等于10的数据行即可

    select * 
    from t1
    where rank             
关注
打赏
1665409997
查看更多评论
0.0494s