- 思路:使用order by按照 views 字段做一个全局排序,设置只显示前10条即可 实现SQL:
select videoid,uploader,age,category,length,views,rate,ratings,comments
from guliyingyin_video_orc
order by views desc
limit 10;

- 结果:
思路:炸开数组”视频类别“字段,然后按照类别分组,最后按照热度(视频个数)排序。
因为当前表结构为:一个视频对应一个或多个类别。所以如果要 group by 类别,需要先将类别进行列转行(展开),然后再进行count即可
- 分析过程:
- 炸开类别,记为临时表t1
select videoId,catName from guliyingyin_video_orc lateral view explode(category) tb_category as catName; — t1
- 按照类别 group by 聚合,然后count组内的videoId个数,记为临时表t2
select catName,count(*) hot from t1 group by catName; —t2
- 最后按照热度排序,显示前10条。
select catName, hot from t2 order by hot desc limit 10;
- 完整的SQL:
SELECT catName, hot FROM ( SELECT catName, count( * ) hot FROM ( SELECT videoId, catName FROM guliyingyin_video_orc lateral VIEW explode ( category ) tb_category AS catName ) t1 GROUP BY catName ) t2 ORDER BY hot DESC LIMIT 10;
将完整的SQL语句保存到guliyingying.sql文件中,并上传到Linux。然后执行命令: 结果:
- 思路:
- 先找到观看数最高的20个视频,记为临时表t1
select videoid,views,category from guliyingyin_video_orc order by views desc limit 20; ----t1
- 把这20条信息中的category分裂出来(列转行),记为临时表t2
select videoid,catName from t1 lateral view explode(category) tb_category as catName; — t2
- 去重
select distinct catName from t2;
- 完整SQL
SELECT DISTINCT
catName
FROM
(
SELECT
videoid,
catName
FROM
( SELECT videoid, views, category FROM guliyingyin_video_orc ORDER BY views DESC LIMIT 20 ) t1 lateral VIEW explode ( category ) tb2_category AS catName
) t2;
将完整的SQL语句保存到guliyingying.sql文件中,并上传到Linux。然后执行命令: 结果:
思路分析: 思路:
- 查询出观看数最多的前50个视频的所有信息(包含每个视频对应的关联视频),记为临时表t1
select videoId, views, category, relatedId from guliyingyin_video_orc order by views desc limit 50; ----t1
- 炸裂关联视频id:将找到的50条视频信息的相关视频的relatedId列转行,记为临时表t2
select distinct videoId_name from t1 lateral view explode(relatedId) tb_relatedId as videoId_name; ----t2
- 将关联视频的id和guliyingyin_video_orc表进行inner join操作,得到每个关联视频id的详细数据,记为临时表t4
select * from t2 inner join guliyingyin_video_orc t3 on t2.videoId_name=t3.videoId; ---- t4
- 炸裂关联视频的类别
select * from t4 lateral view explode(category) tb_category as catName; ----t5
- 统计类别个数
select catName, count(*) hot from t5 group by catName; ----t6
- 统计类别的热度排名
select * from t6 order by hot desc;
- 完整SQL:
SELECT
*
FROM
(
SELECT
catName,
count( * ) hot
FROM
(
SELECT
*
FROM
(
SELECT
*
FROM
(
SELECT DISTINCT
videoId_name
FROM
( SELECT videoId, views, category, relatedId FROM guliyingyin_video_orc ORDER BY views DESC LIMIT 50 ) t1 lateral VIEW explode ( relatedId ) tb_relatedId AS videoId_name
) t2
INNER JOIN guliyingyin_video_orc t3 ON t2.videoId_name = t3.videoId
) t4 lateral VIEW explode ( category ) tb_category AS catName
) t5
GROUP BY
catName
) t6
ORDER BY
hot DESC;
将完整的SQL语句保存到guliyingying.sql文件中,并上传到Linux。然后执行命令: 结果:
思路:要想统计Music类别中的视频热度Top10,需要先找到Music类别,那么就需要将category炸开。
-
创建一个临时表用于存放categoryId炸开的数据
create table guliyingyin_category_orc( videoId string, uploader string, age int, categoryId string, length int, views int, rate float, ratings int, comments int, relatedId array ) row format delimited fields terminated by "\t" collection items terminated by "&" stored as orc;
-
向category展开的表中插入数据。
insert overwrite table guliyingyin_category_orc select videoid,uploader,age,categoryId,length,views,rate,ratings,comments,relatedId from guliyingyin_video_orc lateral view explode(category) tb_category as categoryId;
-
统计对应类别(Music)中的视频热度。
select videoId,views from guliyingyin_category_orc where categoryId = "Music" order by views desc limit 10;
结果:
select videoId,ratings
from guliyingyin_category_orc
where categoryId = "Music"
order by ratings desc
limit 10;
结果:
- 分析
-
上传视频最多的用户Top10,记为表t1
select uploader from guliyingyin_user_orc order by videos desc limit 10; -----t1
-
观看次数在前20的视频
select t1.uploader,t2.videoId,views from t1 inner join guliyingyin_video_orc t2 on t1.uploader= t2.uploader order by views desc limit 20;
- 完整SQL语句
SELECT
t1.uploader,
t2.videoId,
views
FROM
( SELECT uploader FROM guliyingyin_user_orc ORDER BY videos DESC LIMIT 10 ) t1
INNER JOIN guliyingyin_video_orc t2 ON t1.uploader = t2.uploader
ORDER BY
views DESC
LIMIT 20;
- 结果
- 思路:
-
先得到categoryId展开的中间表
-
子查询按照categoryId进行分区,然后分区内排序,并生成递增数字,该递增数字这一列起名为rank列
select videoid,categoryid,views, row_number() over(partition by categoryid order by views desc) rank234 from guliyingyin_category_orc;
-
通过子查询产生的临时表,查询rank值小于等于10的数据行即可
select * from t1 where rank
关注打赏
最近更新
- 深拷贝和浅拷贝的区别(重点)
- 【Vue】走进Vue框架世界
- 【云服务器】项目部署—搭建网站—vue电商后台管理系统
- 【React介绍】 一文带你深入React
- 【React】React组件实例的三大属性之state,props,refs(你学废了吗)
- 【脚手架VueCLI】从零开始,创建一个VUE项目
- 【React】深入理解React组件生命周期----图文详解(含代码)
- 【React】DOM的Diffing算法是什么?以及DOM中key的作用----经典面试题
- 【React】1_使用React脚手架创建项目步骤--------详解(含项目结构说明)
- 【React】2_如何使用react脚手架写一个简单的页面?