目录
1. select完整语法
- 1. select完整语法
- 1.1 准备数据
- 2. with
- 3. from
- 4. sample
- 5. array join
- 6. join
- 6.1 数据准备
- 6.2 连接精度all配合连接类型(inner、left、right、full)
- 6.3 连接精度any配合连接类型(inner、left、right)
- 6.4 连接精度asof配合连接类型(inner、left)
- 6.5 连接类型cross join
- 6.6 多表join、join的注意事项
- 7. where和prewhere
- 8. group by
- 9. having
- 10. order by
- 11. limit by
- 12. limit
- 13. select
- 14. union all
- 15. 执行计划
[WITH expr |(subquery)]
SELECT [DISTINCT] expr
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE expr]
[[LEFT] ARRAY JOIN]
[GLOBAL] [ALL|ANY|ASOF] [INNER | CROSS | [LEFT|RIGHT|FULL [OUTER]] ] JOIN
(subquery)|table ON|USING columns_list
[PREWHERE expr]
[WHERE expr]
[GROUP BY expr] [WITH ROLLUP|CUBE|TOTALS]
[HAVING expr]
[ORDER BY expr]
[LIMIT [n[,m]]
[UNION ALL]
[INTO OUTFILE filename]
[FORMAT format]
[LIMIT [offset] n BY columns]
语法的解析大致是按此顺序进行的
1.1 准备数据clickhouse1 :)
clickhouse1 :) create table select_table_test(
:-] id UInt32,
:-] name String,
:-] age UInt8,
:-] city String,
:-] score Float64
:-] ) engine = MergeTree()
:-] order by (id, intHash32(id))
:-] sample by intHash32(id);
clickhouse1 :)
clickhouse1 :) insert into select_table_test(id, name, age, city, score) values(1, 'Zhangsan', 30, 'Beijing', 70);
clickhouse1 :) insert into select_table_test(id, name, age, city, score) values(2, 'Lisi', 40, 'Shanghai', 80);
clickhouse1 :) insert into select_table_test(id, name, age, city, score) values(3, 'Wangwu', 50, 'Guangzhou', 90);
clickhouse1 :)
clickhouse1 :) select * from select_table_test;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│ 1 │ Zhangsan │ 30 │ Beijing │ 70 │
└────┴──────────┴─────┴─────────┴───────┘
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│ 2 │ Lisi │ 40 │ Shanghai │ 80 │
└────┴──────┴─────┴──────────┴───────┘
┌─id─┬─name───┬─age─┬─city──────┬─score─┐
│ 3 │ Wangwu │ 50 │ Guangzhou │ 90 │
└────┴────────┴─────┴───────────┴───────┘
2. with
clickhouse1 :)
clickhouse1 :) with 40 as fit_age
:-] select * from select_table_test where age = fit_age;
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│ 2 │ Lisi │ 40 │ Shanghai │ 80 │
└────┴──────┴─────┴──────────┴───────┘
clickhouse1 :)
clickhouse1 :) with sum(score) as sum_score
:-] select city, toString(sum_score), toUInt16(sum_score) from select_table_test group by city;
┌─city──────┬─toString(sum_score)─┬─toUInt16(sum_score)─┐
│ Shanghai │ 80 │ 80 │
│ Beijing │ 70 │ 70 │
│ Guangzhou │ 90 │ 90 │
└───────────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
clickhouse1 :) with (
:-] select age from select_table_test where city = 'Beijing'
:-] ) as fit_age
:-] select * from select_table_test where age = fit_age;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│ 1 │ Zhangsan │ 30 │ Beijing │ 70 │
└────┴──────────┴─────┴─────────┴───────┘
clickhouse1 :)
- 也可在from子句中的select查询使用with
clickhouse1 :)
clickhouse1 :) select number from numbers(3);
┌─number─┐
│ 0 │
│ 1 │
│ 2 │
└────────┘
clickhouse1 :)
clickhouse1 :) select 10 from system.one;
┌─10─┐
│ 10 │
└────┘
clickhouse1 :)
clickhouse1 :) select 10;
┌─10─┐
│ 10 │
└────┘
clickhouse1 :)
select 10
会从虚拟表system.one查询- from后的final关键字,表示先强制partition合并,再得到结果
- 在数据相同,采样规则相同,采样结果是一样的
- create table的sample by int_type必须整数类型
- sample 0 ~ 1
clickhouse1 :)
clickhouse1 :) select name, _sample_factor from select_table_test sample 0.4;
┌─name─────┬─_sample_factor─┐
│ Zhangsan │ 2.5 │
└──────────┴────────────────┘
┌─name───┬─_sample_factor─┐
│ Wangwu │ 2.5 │
└────────┴────────────────┘
clickhouse1 :)
clickhouse1 :) select count(*) * any(_sample_factor) from select_table_test sample 4 / 10;
┌─multiply(count(), any(_sample_factor))─┐
│ 5 │
└────────────────────────────────────────┘
clickhouse1 :)
- 采样因子等于0时,表示不进行采用,返回所有数据
- sample 0 ~ 1 offset 0 ~ 1
clickhouse1 :)
clickhouse1 :) select * from select_table_test sample 0.3 offset 0.1;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│ 1 │ Zhangsan │ 30 │ Beijing │ 70 │
└────┴──────────┴─────┴─────────┴───────┘
┌─id─┬─name───┬─age─┬─city──────┬─score─┐
│ 3 │ Wangwu │ 50 │ Guangzhou │ 90 │
└────┴────────┴─────┴───────────┴───────┘
clickhouse1 :)
- 假如共有10条数据,offset为0.8,sample为0.3;则会从数据的8/10(第9条)开始进行采样,采走3(10 * 0.3)条数据,但实际剩余只有2条数据,最终采样结果为2条数据
- sample rows
clickhouse1 :)
clickhouse1 :) select * from select_table_test sample 2;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│ 1 │ Zhangsan │ 30 │ Beijing │ 70 │
└────┴──────────┴─────┴─────────┴───────┘
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│ 2 │ Lisi │ 40 │ Shanghai │ 80 │
└────┴──────┴─────┴──────────┴───────┘
┌─id─┬─name───┬─age─┬─city──────┬─score─┐
│ 3 │ Wangwu │ 50 │ Guangzhou │ 90 │
└────┴────────┴─────┴───────────┴───────┘
clickhouse1 :)
- 表示采样后有rows行数据
- 采样的rows参数必须大于1,当rows参数设置的比index_granularity索引粒度值小的多时,也会返回近似index_granularity条数据
- 表A array join 表A的数组Array或嵌套类型Nested字段,将一行数据打平至多行
- 在一条select语句中,只能由一个array join(from子句除外)
- 可以直接
array join nested_parent_column as n
- 准备数据
clickhouse1 :)
clickhouse1 :) create table array_join_table(
:-] fruit String,
:-] values Array(UInt8),
:-] city String,
:-] values2 Array(UInt8)
:-] ) engine = TinyLog;
clickhouse1 :)
clickhouse1 :) insert into array_join_table(fruit, values, city, values2) values('banana', [1,2,3], 'Hainan', [18,19]);
clickhouse1 :) insert into array_join_table(fruit, values, city, values2) values('orange', [66,88], 'Hunan', [6,7,8,9]);
clickhouse1 :) insert into array_join_table(fruit, values, city, values2) values('apple',[], 'Shangxi', [11,12,13]);
clickhouse1 :)
clickhouse1 :) select * from array_join_table;
┌─fruit──┬─values──┬─city────┬─values2────┐
│ banana │ [1,2,3] │ Hainan │ [18,19] │
│ orange │ [66,88] │ Hunan │ [6,7,8,9] │
│ apple │ [] │ Shangxi │ [11,12,13] │
└────────┴─────────┴─────────┴────────────┘
clickhouse1 :)
- inner array join(默认)
clickhouse1 :)
clickhouse1 :) select fruit, values from array_join_table array join values;
┌─fruit──┬─values─┐
│ banana │ 1 │
│ banana │ 2 │
│ banana │ 3 │
│ orange │ 66 │
│ orange │ 88 │
└────────┴────────┘
clickhouse1 :)
clickhouse1 :) select fruit, values, v from array_join_table array join values as v;
┌─fruit──┬─values──┬──v─┐
│ banana │ [1,2,3] │ 1 │
│ banana │ [1,2,3] │ 2 │
│ banana │ [1,2,3] │ 3 │
│ orange │ [66,88] │ 66 │
│ orange │ [66,88] │ 88 │
└────────┴─────────┴────┘
clickhouse1 :)
- 排除掉了空数组
- left array join
clickhouse1 :)
clickhouse1 :) select fruit, values, v from array_join_table left array join values as v;
┌─fruit──┬─values──┬──v─┐
│ banana │ [1,2,3] │ 1 │
│ banana │ [1,2,3] │ 2 │
│ banana │ [1,2,3] │ 3 │
│ orange │ [66,88] │ 66 │
│ orange │ [66,88] │ 88 │
│ apple │ [] │ 0 │
└────────┴─────────┴────┘
clickhouse1 :)
clickhouse1 :) select fruit, values, v, arrayMap(x -> x * 2, values) as map_values, m_v from array_join_table left array join values as v, map_values as m_v;
┌─fruit──┬─values──┬──v─┬─map_values─┬─m_v─┐
│ banana │ [1,2,3] │ 1 │ [2,4,6] │ 2 │
│ banana │ [1,2,3] │ 2 │ [2,4,6] │ 4 │
│ banana │ [1,2,3] │ 3 │ [2,4,6] │ 6 │
│ orange │ [66,88] │ 66 │ [132,176] │ 132 │
│ orange │ [66,88] │ 88 │ [132,176] │ 176 │
│ apple │ [] │ 0 │ [] │ 0 │
└────────┴─────────┴────┴────────────┴─────┘
clickhouse1 :)
- 数组为空的出现在结果中
- 与多个数组字段进行array join时,在一行数据中多个数组的元素个数要相同 ,也是将一行数据打平至多行
join_table_test1
clickhouse1 :)
clickhouse1 :) create table join_table_test1(
:-] id UInt32,
:-] name String,
:-] datetime Datetime
:-] ) engine = Memory();
clickhouse1 :)
clickhouse1 :) insert into join_table_test1(id, name, datetime) values(1, 'clickhouse', '2021-08-10 11:00:00');
clickhouse1 :) insert into join_table_test1(id, name, datetime) values(2, 'spark', '2021-08-10 12:01:00');
clickhouse1 :) insert into join_table_test1(id, name, datetime) values(2, 'flink', '2021-08-10 12:02:00');
clickhouse1 :)
clickhouse1 :) select * from join_table_test1;
┌─id─┬─name───────┬────────────datetime─┐
│ 1 │ clickhouse │ 2021-08-10 11:00:00 │
└────┴────────────┴─────────────────────┘
┌─id─┬─name──┬────────────datetime─┐
│ 2 │ spark │ 2021-08-10 12:01:00 │
└────┴───────┴─────────────────────┘
┌─id─┬─name──┬────────────datetime─┐
│ 2 │ flink │ 2021-08-10 12:02:00 │
└────┴───────┴─────────────────────┘
clickhouse1 :)
join_table_test2
clickhouse1 :)
clickhouse1 :) create table join_table_test2(
:-] id UInt32,
:-] rate UInt8,
:-] datetime Datetime
:-] ) engine = Memory();
clickhouse1 :)
clickhouse1 :) insert into join_table_test2(id, rate, datetime) values(2, 100, '2021-08-10 11:58:00');
clickhouse1 :) insert into join_table_test2(id, rate, datetime) values(2, 90, '2021-08-10 11:59:00');
clickhouse1 :) insert into join_table_test2(id, rate, datetime) values(3, 80, '2021-08-10 13:00:00');
clickhouse1 :)
clickhouse1 :) select * from join_table_test2;
┌─id─┬─rate─┬────────────datetime─┐
│ 2 │ 100 │ 2021-08-10 11:58:00 │
└────┴──────┴─────────────────────┘
┌─id─┬─rate─┬────────────datetime─┐
│ 2 │ 90 │ 2021-08-10 11:59:00 │
└────┴──────┴─────────────────────┘
┌─id─┬─rate─┬────────────datetime─┐
│ 3 │ 80 │ 2021-08-10 13:00:00 │
└────┴──────┴─────────────────────┘
clickhouse1 :)
join_table_test3
clickhouse1 :)
clickhouse1 :) create table join_table_test3(
:-] id UInt32,
:-] star UInt32
:-] ) engine = Memory();
clickhouse1 :)
clickhouse1 :) insert into join_table_test3(id, star) values(1, 1000);
clickhouse1 :) insert into join_table_test3(id, star) values(2, 900);
clickhouse1 :)
clickhouse1 :) select * from join_table_test3;
┌─id─┬─star─┐
│ 1 │ 1000 │
└────┴──────┘
┌─id─┬─star─┐
│ 2 │ 900 │
└────┴──────┘
clickhouse1 :)
6.2 连接精度all配合连接类型(inner、left、right、full)
- 默认连接精度,可通过join_default_strictness配置参数修改
- 基准表一行数据与非基准表多行数据匹配,返回非基准表多行数据
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a all inner join join_table_test2 b on a.id = b.id;
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
│ 2 │ flink │ 90 │
└────┴───────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a all left join join_table_test2 b on a.id = b.id;
┌─id─┬─name───────┬─rate─┐
│ 1 │ clickhouse │ 0 │
└────┴────────────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
│ 2 │ flink │ 90 │
└────┴───────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a all right join join_table_test2 b on a.id = b.id;
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
│ 2 │ flink │ 90 │
└────┴───────┴──────┘
┌─id─┬─name─┬─rate─┐
│ 0 │ │ 80 │
└────┴──────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a all full join join_table_test2 b on a.id = b.id;
┌─id─┬─name───────┬─rate─┐
│ 1 │ clickhouse │ 0 │
└────┴────────────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
│ 2 │ flink │ 90 │
└────┴───────┴──────┘
┌─id─┬─name─┬─rate─┐
│ 0 │ │ 80 │
└────┴──────┴──────┘
clickhouse1 :)
6.3 连接精度any配合连接类型(inner、left、right)
- 基准表一行数据与非基准表多行数据匹配,返回非基准表第一行数据
- 注意inner join时,只返回了一行数据
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a any inner join join_table_test2 b on a.id = b.id;
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
└────┴───────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a any left join join_table_test2 b on a.id = b.id;
┌─id─┬─name───────┬─rate─┐
│ 1 │ clickhouse │ 0 │
└────┴────────────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
└────┴───────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a any right join join_table_test2 b on a.id = b.id;
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
└────┴───────┴──────┘
┌─id─┬─name─┬─rate─┐
│ 0 │ │ 80 │
└────┴──────┴──────┘
clickhouse1 :)
6.4 连接精度asof配合连接类型(inner、left)
- join key一般都是等于(join key1 = join key2)匹配,asof允许定义一个或多个等于(join key1 = join key2)匹配 + 一个不等于(join key1 >、>=、 b.datetime;
┌─id─┬─name───────┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 1 │ clickhouse │ 0 │ 2021-08-10 11:00:00 │ 1970-01-01 08:00:00 │
└────┴────────────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ spark │ 90 │ 2021-08-10 12:01:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ flink │ 90 │ 2021-08-10 12:02:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate, a.datetime, b.datetime from join_table_test1 a asof inner join join_table_test2 b using(id, datetime);
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ spark │ 90 │ 2021-08-10 12:01:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ flink │ 90 │ 2021-08-10 12:02:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate, a.datetime, b.datetime from join_table_test1 a asof left join join_table_test2 b using(id, datetime);
┌─id─┬─name───────┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 1 │ clickhouse │ 0 │ 2021-08-10 11:00:00 │ 1970-01-01 08:00:00 │
└────┴────────────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ spark │ 90 │ 2021-08-10 12:01:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ flink │ 90 │ 2021-08-10 12:02:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
6.5 连接类型cross join
6.6 多表join、join的注意事项clickhouse1 :) clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a cross join join_table_test2 b; ┌─id─┬─name───────┬─rate─┐ │ 1 │ clickhouse │ 100 │ │ 1 │ clickhouse │ 90 │ │ 1 │ clickhouse │ 80 │ └────┴────────────┴──────┘ ┌─id─┬─name──┬─rate─┐ │ 2 │ spark │ 100 │ │ 2 │ spark │ 90 │ │ 2 │ spark │ 80 │ └────┴───────┴──────┘ ┌─id─┬─name──┬─rate─┐ │ 2 │ flink │ 100 │ │ 2 │ flink │ 90 │ │ 2 │ flink │ 80 │ └────┴───────┴──────┘ clickhouse1 :)
- 多表join
对于3个表,先将前2个表进行join,再将它们的结果集与第3个表进行join; 其它情形以此类推
-
性能问题
- join时请遵循左大右小,因为会将右表加载至内存,与左表关联
- 可以考虑使用join表引擎或字典表提示join性能
-
空值策略
- 默认是使用数据类型的默认值填充,可以将join_use_nulls参数设置为1,用Null值填充
- 会根据where条件字段,判定select是否启用了索引
- prewhere只能用于mergeTree系列,先对条件字段进行过滤,再对符合的行,进行select其它字段
- 当使用where的时候,clickhouse会根据where的条件字段,判断是否自动启用prewhere;但下列几种不会自动启用prewhere
- 常量表达式
where 1 = 1
where a = 3
, a字段为primary key或使用了alias默认值select a ...... where a = 3
, select的字段与where字段相同...... from tb array join nest_field where nest_field.col1 = 10
, select查询包含了array join、global in、global not in、 indexHint(函数)
- 常量表达式
- 准备数据
clickhouse1 :) create table group_table_test( :-] province String, :-] city String, :-] season String, :-] sales Float64 :-] ) engine = Memory(); clickhouse1 :) clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '长沙', '春', 1100); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '长沙', '夏', 1200); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '长沙', '秋', 1300); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '长沙', '冬', 1400); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '株洲', '春', 2100); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '株洲', '夏', 2200); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '株洲', '秋', 2300); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '株洲', '冬', 2400); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '湘潭', '春', 3100); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '湘潭', '夏', 3200); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '湘潭', '秋', 3300); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '湘潭', '冬', 3400); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '广州', '春', 4100); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '广州', '夏', 4200); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '广州', '秋', 4300); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '广州', '冬', 4400); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '东莞', '春', 5100); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '东莞', '夏', 5200); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '东莞', '秋', 5300); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '东莞', '冬', 5400); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '深圳', '春', 6100); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '深圳', '夏', 6200); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '深圳', '秋', 6300); clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '深圳', '冬', 6400); clickhouse1 :)
- with rollup
group by province, city with rollup
: 从右向左进行上钻,形成组合[province, city]、[province]、[ ],再对sum(sales)进行计算
clickhouse1 :) clickhouse1 :) select province, city, sum(sales) from group_table_test group by province, city with rollup order by province; ┌─province─┬─city─┬─sum(sales)─┐ │ │ │ 90000 │ │ 广东 │ │ 63000 │ │ 广东 │ 深圳 │ 25000 │ │ 广东 │ 广州 │ 17000 │ │ 广东 │ 东莞 │ 21000 │ │ 湖南 │ │ 27000 │ │ 湖南 │ 株洲 │ 9000 │ │ 湖南 │ 湘潭 │ 13000 │ │ 湖南 │ 长沙 │ 5000 │ └──────────┴──────┴────────────┘ clickhouse1 :)
- with cube
group by province, city with cube
: 对所有维度进行组合,形成组合[province, city]、[province]、[city]、[ ],再对sum(sales)进行计算
clickhouse1 :) clickhouse1 :) select province, city, sum(sales) from group_table_test group by province, city with cube order by province, city; ┌─province─┬─city─┬─sum(sales)─┐ │ │ │ 90000 │ │ │ 东莞 │ 21000 │ │ │ 广州 │ 17000 │ │ │ 株洲 │ 9000 │ │ │ 深圳 │ 25000 │ │ │ 湘潭 │ 13000 │ │ │ 长沙 │ 5000 │ │ 广东 │ │ 63000 │ │ 广东 │ 东莞 │ 21000 │ │ 广东 │ 广州 │ 17000 │ │ 广东 │ 深圳 │ 25000 │ │ 湖南 │ │ 27000 │ │ 湖南 │ 株洲 │ 9000 │ │ 湖南 │ 湘潭 │ 13000 │ │ 湖南 │ 长沙 │ 5000 │ └──────────┴──────┴────────────┘ clickhouse1 :)
- with totals
- 对所有的数据进行sum(sales)计算
9. havingclickhouse1 :) clickhouse1 :) select province, city, sum(sales) from group_table_test group by province, city with totals order by province, city; ┌─province─┬─city─┬─sum(sales)─┐ │ 广东 │ 东莞 │ 21000 │ │ 广东 │ 广州 │ 17000 │ │ 广东 │ 深圳 │ 25000 │ │ 湖南 │ 株洲 │ 9000 │ │ 湖南 │ 湘潭 │ 13000 │ │ 湖南 │ 长沙 │ 5000 │ └──────────┴──────┴────────────┘ Totals: ┌─province─┬─city─┬─sum(sales)─┐ │ │ │ 90000 │ └──────────┴──────┴────────────┘ clickhouse1 :)
对group by聚合之后的结果,再次进行过滤
10. order byclickhouse1 :) clickhouse1 :) select province, city, sum(sales) total_sales from group_table_test group by province, city having total_sales > 10000 order by province, city; ┌─province─┬─city─┬─total_sales─┐ │ 广东 │ 东莞 │ 21000 │ │ 广东 │ 广州 │ 17000 │ │ 广东 │ 深圳 │ 25000 │ │ 湖南 │ 湘潭 │ 13000 │ └──────────┴──────┴─────────────┘ clickhouse1 :)
- 不同partition的数据不能进行排序
- nulls first
clickhouse1 :) clickhouse1 :) with arrayJoin([30, null, 0/0, 50]) as v :-] select v order by v nulls first; ┌────v─┐ │ ᴺᵁᴸᴸ │ │ nan │ │ 30 │ │ 50 │ └──────┘ clickhouse1 :)
- nulls last
- 默认操作
11. limit byclickhouse1 :) clickhouse1 :) with arrayJoin([30, null, 0/0, 50]) as v :-] select v order by v nulls last; ┌────v─┐ │ 30 │ │ 50 │ │ nan │ │ ᴺᵁᴸᴸ │ └──────┘ clickhouse1 :)
clickhouse1 :) clickhouse1 :) select province, city, season, sales from group_table_test limit 1 offset 1 by province, city; ┌─province─┬─city─┬─season─┬─sales─┐ │ 湖南 │ 长沙 │ 夏 │ 1200 │ └──────────┴──────┴────────┴───────┘ ┌─province─┬─city─┬─season─┬─sales─┐ │ 湖南 │ 株洲 │ 夏 │ 2200 │ └──────────┴──────┴────────┴───────┘ ┌─province─┬─city─┬─season─┬─sales─┐ │ 湖南 │ 湘潭 │ 夏 │ 3200 │ └──────────┴──────┴────────┴───────┘ ┌─province─┬─city─┬─season─┬─sales─┐ │ 广东 │ 广州 │ 夏 │ 4200 │ └──────────┴──────┴────────┴───────┘ ┌─province─┬─city─┬─season─┬─sales─┐ │ 广东 │ 东莞 │ 夏 │ 5200 │ └──────────┴──────┴────────┴───────┘ ┌─province─┬─city─┬─season─┬─sales─┐ │ 广东 │ 深圳 │ 夏 │ 6200 │ └──────────┴──────┴────────┴───────┘ clickhouse1 :)
- 执行于order by之后和limit之前
- 按province、city进行分组,在每组中,跳过1条数据,取1条数据
- 简写形式:
limit 1, 1 by province, city
clickhouse1 :) clickhouse1 :) select * from select_table_test limit 1 offset 1; ┌─id─┬─name─┬─age─┬─city─────┬─score─┐ │ 2 │ Lisi │ 40 │ Shanghai │ 80 │ └────┴──────┴─────┴──────────┴───────┘ clickhouse1 :) clickhouse1 :) select * from select_table_test limit 1, 1; ┌─id─┬─name─┬─age─┬─city─────┬─score─┐ │ 2 │ Lisi │ 40 │ Shanghai │ 80 │ └────┴──────┴─────┴──────────┴───────┘ clickhouse1 :)
- 如结果数据位于多个partition, 且未使用order by, 每次limit返回的数据可能不同
正则匹配查询字段
clickhouse1 :) clickhouse1 :) select columns('^i'), columns('a') from select_table_test; ┌─id─┬─name─────┬─age─┐ │ 1 │ Zhangsan │ 30 │ │ 2 │ Lisi │ 40 │ │ 3 │ Wangwu │ 50 │ └────┴──────────┴─────┘ clickhouse1 :)
- 返回以i开头的字段,和包含a的字段
clickhouse1 :) clickhouse1 :) select * from select_table_test where id = 1 :-] union all :-] select * from select_table_test where id = 2; ┌─id─┬─name─────┬─age─┬─city────┬─score─┐ │ 1 │ Zhangsan │ 30 │ Beijing │ 70 │ └────┴──────────┴─────┴─────────┴───────┘ ┌─id─┬─name─┬─age─┬─city─────┬─score─┐ │ 2 │ Lisi │ 40 │ Shanghai │ 80 │ └────┴──────┴─────┴──────────┴───────┘ clickhouse1 :)
- union all后的字段名称以左表的为准
clickhouse1 :) clickhouse1 :) explain :-] with 30 as fit_age :-] select age, count() num from select_table_test :-] where age = fit_age :-] group by age :-] having num = 1 :-] order by num :-] limit 1; ┌─explain─────────────────────────────────────────────────────────────────────────────────────┐ │ Expression (Projection) │ │ Limit (preliminary LIMIT) │ │ MergingSorted (Merge sorted streams for ORDER BY) │ │ MergeSorting (Merge sorted blocks for ORDER BY) │ │ PartialSorting (Sort each block for ORDER BY) │ │ Expression (Before ORDER BY) │ │ Filter (HAVING) │ │ Aggregating │ │ Expression (Before GROUP BY) │ │ Filter (WHERE) │ │ SettingQuotaAndLimits (Set limits and quota after reading from storage) │ │ ReadFromMergeTree │ └─────────────────────────────────────────────────────────────────────────────────────────────┘ clickhouse1 :)
也可以通过查看执行日志来查看具体的信息:
[root@clickhouse1 ~]# [root@clickhouse1 ~]# clickhouse-client -u default --password default123 --send_logs_level=trace Complete [clickhouse1] 2021.08.09 18:20:59.418954 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} MergeTreeSelectProcessor: Reading 1 ranges from part all_1_3_1, approx. 3 rows starting from 0 [clickhouse1] 2021.08.09 18:20:59.420643 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} executeQuery: Read 3 rows, 135.00 B in 0.0028947 sec., 1036 rows/sec., 45.54 KiB/sec. [clickhouse1] 2021.08.09 18:20:59.420674 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} MemoryTracker: Peak memory usage (for query): 0.00 B. [root@clickhouse1 ~]#