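Elasticsearch supports many kinds of aggregation queries, which give us powerful tools for data statistics and analysis. But as Java developers, how do we implement these aggregations in the Java client?
We know that spring-data-elasticsearch provides an es Java client integrated with Spring, but neither the elastic nor the spring-data official documentation explains in detail how to implement aggregation queries in the Java client.
So in this installment, our goal is to cover these aggregation operations in a single article.
To explain these aggregations clearly, we follow the structure of the official es documentation and go through the three types of aggregation one by one. We won't demonstrate every subtype, though; after seeing a few commonly used ones, you should be able to work out the others yourself. If you get stuck, leave a comment and we can figure it out together.
This demonstration is based on the following environment:
spring-data-elasticsearch 3.2.12.RELEASE
For setting up the base environment, see this article:
Building a springboot + spring data elasticsearch3.x environment from scratch
Before we start, here is our index structure, to make the later examples easier to follow.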
# Order index; one order contains multiple products
PUT order_test
{
  "mappings": {
    "properties": {
      // Order status: 0 unpaid, 1 not shipped, 2 in transit, 3 awaiting receipt, 4 received
      "status": {
        "type": "integer"
      },
      // Order number
      "no": {
        "type": "keyword"
      },
      // Order creation time
      "create_time": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      // Order amount
      "amount": {
        "type": "double"
      },
      // Creator
      "creator": {
        "type": "keyword"
      },
      // Product information
      "product": {
        "type": "nested",
        "properties": {
          // Product ID
          "id": {
            "type": "keyword"
          },
          // Product name
          "name": {
            "type": "keyword"
          },
          // Product price
          "price": {
            "type": "double"
          },
          // Product quantity
          "quantity": {
            "type": "integer"
          }
        }
      }
    }
  }
}
Test data, so you can follow along:
POST order_test/_bulk
{"index":{}}
{"status":0,"no":"DD202205280001","create_time":"2022-05-01 12:00:00","amount":100.0,"creator":"张三","product":[{"id":"1","name":"苹果","price":20.0,"quantity":5}]}
{"index":{}}
{"status":0,"no":"DD202205280002","create_time":"2022-05-01 12:00:00","amount":100.0,"creator":"李四","product":[{"id":"2","name":"香蕉","price":20.0,"quantity":5}]}
{"index":{}}
{"status":1,"no":"DD202205280003","create_time":"2022-05-02 12:00:00","amount":100.0,"creator":"张三","product":[{"id":"2","name":"香蕉","price":20.0,"quantity":5}]}
{"index":{}}
{"status":2,"no":"DD202205280004","create_time":"2022-05-01 12:00:00","amount":150.0,"creator":"王二","product":[{"id":"1","name":"苹果","price":30.0,"quantity":5}]}
{"index":{}}
{"status":2,"no":"DD202205280005","create_time":"2022-05-03 12:00:00","amount":100.0,"creator":"55555","product":[{"id":"2","name":"香蕉","price":20.0,"quantity":5}]}
{"index":{}}
{"status":3,"no":"DD202205280006","create_time":"2022-05-04 12:00:00","amount":150.0,"creator":"李四","product":[{"id":"3","name":"榴莲","price":150.0,"quantity":1}]}
{"index":{}}
{"status":4,"no":"DD202205280007","create_time":"2022-05-04 12:00:00","amount":100.0,"creator":"张三","product":[{"id":"2","name":"香蕉","price":20.0,"quantity":5}]}
{"index":{}}
{"status":3,"no":"DD202205280008","create_time":"2022-05-01 12:00:00","amount":200.0,"creator":"王二","product":[{"id":"1","name":"苹果","price":40.0,"quantity":5}]}
{"index":{}}
{"status":4,"no":"DD202205280009","create_time":"2022-05-03 12:00:00","amount":100.0,"creator":"55555","product":[{"id":"2","name":"香蕉","price":20.0,"quantity":5}]}
2. Bucket aggregations
2.1 Terms aggregation
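The terms aggregation is the most commonly used bucket aggregation. It groups documents by the value of a given field, similar to group by in mysql.
- Example
Count the number of orders in each status
- DSL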
GET order_test/_search
{
"size": 0,
"aggs": {
"status_bucket": {
"terms": {
"field": "status"
}
}
}
}
- Result
- Java implementation
public void termsAgg(){
    String aggName = "status_bucket";
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
    // PageRequest does not accept a page size of 0, so we request a single hit
    queryBuilder.withPageable(PageRequest.of(0,1));
    TermsAggregationBuilder termsAgg = AggregationBuilders.terms(aggName).field("status");
    queryBuilder.addAggregation(termsAgg);
    Aggregations aggregations = restTemplate.query(queryBuilder.build(), SearchResponse::getAggregations);
    Terms terms = aggregations.get(aggName);
    List<? extends Terms.Bucket> buckets = terms.getBuckets();
    // status -> number of orders
    Map<String, Long> statusRes = new HashMap<>();
    buckets.forEach(bucket -> {
        statusRes.put(bucket.getKeyAsString(), bucket.getDocCount());
    });
    System.out.println("--- aggregation result ---");
    System.out.println(statusRes);
}
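As a cross-check of what this query should return for the test data, here is a minimal plain-Java sketch that mirrors the grouping the terms aggregation performs. It does not use the Elasticsearch client; the class name `TermsAggSketch` and its members are hypothetical, and the status values are copied from the bulk request above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java stand-in for the terms aggregation on "status" (not the Elasticsearch client).
final class TermsAggSketch {

    // "status" values of the nine test orders, in bulk-request order.
    static final List<Integer> STATUSES = Arrays.asList(0, 0, 1, 2, 2, 3, 4, 3, 4);

    // Groups by status and counts the documents per group, like doc_count per terms bucket.
    static Map<Integer, Long> countByStatus(List<Integer> statuses) {
        return statuses.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countByStatus(STATUSES)); // prints {0=2, 1=1, 2=2, 3=2, 4=2}
    }
}
```

The sketch sorts buckets by key for readability. Elasticsearch itself orders terms buckets by doc_count descending by default, which does not matter for the Java method above because it copies the buckets into a HashMap.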
2.2 Date histogram aggregation
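The date histogram aggregation groups documents by date and is commonly used for trend statistics over time.
- Example
Count the number of orders placed each day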
- DSL
GET order_test/_search
{
"size": 0,
"aggs": {
"date": {
"date_histogram": {
"field": "create_time",
"calendar_interval": "day",
"format": "yyyy-MM-dd"
}
}
}
}
- Result
- java
public void dateHistogramAgg(){
    String aggName = "date";
    DateHistogramAggregationBuilder dateHistogramAggregation = AggregationBuilders.dateHistogram(aggName).field("create_time")
            .calendarInterval(DateHistogramInterval.days(1)).format("yyyy-MM-dd");
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
    queryBuilder.withPageable(PageRequest.of(0,1)).addAggregation(dateHistogramAggregation);
    Aggregations aggregations = restTemplate.query(queryBuilder.build(), SearchResponse::getAggregations);
    ParsedDateHistogram histogram = aggregations.get(aggName);
    List<? extends Histogram.Bucket> buckets = histogram.getBuckets();
    // day (yyyy-MM-dd) -> number of orders
    Map<String, Long> resultMap = new HashMap<>();
    buckets.forEach(bucket -> {
        resultMap.put(bucket.getKeyAsString(), bucket.getDocCount());
    });
    System.out.println("--- aggregation result ---");
    System.out.println(resultMap);
}
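Extension:
You'll notice that here we receive the result as a ParsedDateHistogram, unlike the Terms used above. So how do we know which type to use when? In practice, a breakpoint tells us. Set a breakpoint right after restTemplate.query returns aggregations, and you'll see that the elements in aggregations are already labelled with their type, ParsedDateHistogram, so you only need to use the type shown there.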
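To see which per-day counts the date histogram should produce for the test data, the following plain-Java sketch groups the create_time values by calendar day. It is only an illustration under the assumption that every timestamp is interpreted without time-zone shifts; the class name `DateHistogramSketch` is hypothetical and no Elasticsearch classes are involved.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java stand-in for a date_histogram with a one-day calendar interval.
final class DateHistogramSketch {

    // Same pattern as the "format" of create_time in the index mapping.
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // "create_time" values of the nine test orders, in bulk-request order.
    static final List<String> CREATE_TIMES = Arrays.asList(
            "2022-05-01 12:00:00", "2022-05-01 12:00:00", "2022-05-02 12:00:00",
            "2022-05-01 12:00:00", "2022-05-03 12:00:00", "2022-05-04 12:00:00",
            "2022-05-04 12:00:00", "2022-05-01 12:00:00", "2022-05-03 12:00:00");

    // Truncates each timestamp to its day and counts the orders per day.
    static Map<LocalDate, Long> ordersPerDay(List<String> createTimes) {
        return createTimes.stream()
                .map(text -> LocalDateTime.parse(text, FORMAT).toLocalDate())
                .collect(Collectors.groupingBy(day -> day, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // prints {2022-05-01=4, 2022-05-02=1, 2022-05-03=2, 2022-05-04=2}
        System.out.println(ordersPerDay(CREATE_TIMES));
    }
}
```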
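2.3 Range aggregation
The range aggregation groups documents by the numeric ranges we specify.
- Example
Count the orders whose amount falls into each of the ranges 0~100, 100~200 and 200+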
- DSL
GET order_test/_search
{
"size": 0,
"aggs": {
"date_range": {
"range": {
"field": "amount",
"ranges": [
{
"to": "100"
},
{
"from": "100",
"to": "200"
},
{
"from": "200"
}
]
}
}
}
}
- Result
- java
public void rangeAgg(){
    String aggName = "range";
    RangeAggregationBuilder agg = AggregationBuilders.range(aggName).field("amount").addUnboundedTo(100).addRange(100, 200).addUnboundedFrom(200);
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
    queryBuilder.withPageable(PageRequest.of(0,1)).addAggregation(agg);
    Aggregations aggregations = restTemplate.query(queryBuilder.build(), SearchResponse::getAggregations);
    ParsedRange range = aggregations.get(aggName);
    List<? extends Range.Bucket> buckets = range.getBuckets();
    // range key -> number of orders
    Map<String, Long> resultMap = new HashMap<>();
    buckets.forEach(bucket -> {
        resultMap.put(bucket.getKeyAsString(), bucket.getDocCount());
    });
    System.out.println("--- aggregation result ---");
    System.out.println(resultMap);
}
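One detail worth checking against the test data is where the boundary values land: in a range aggregation each range includes its from value and excludes its to value, so an amount of exactly 100 belongs to 100~200, not to 0~100. The plain-Java sketch below reproduces that rule. The class name `RangeAggSketch` is hypothetical, and the bucket keys only imitate the key format Elasticsearch generates by default, which is an assumption here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the range aggregation on "amount" (not the Elasticsearch client).
final class RangeAggSketch {

    // "amount" values of the nine test orders, in bulk-request order.
    static final List<Double> AMOUNTS =
            Arrays.asList(100.0, 100.0, 100.0, 150.0, 100.0, 150.0, 100.0, 200.0, 100.0);

    // Counts amounts per range; "from" is inclusive and "to" is exclusive.
    static Map<String, Long> countByRange(List<Double> amounts) {
        Map<String, Long> buckets = new LinkedHashMap<>();
        // Pre-create all buckets so that empty ranges still show up with a count of 0.
        buckets.put("*-100.0", 0L);
        buckets.put("100.0-200.0", 0L);
        buckets.put("200.0-*", 0L);
        for (double amount : amounts) {
            String key = amount < 100 ? "*-100.0" : amount < 200 ? "100.0-200.0" : "200.0-*";
            buckets.merge(key, 1L, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(countByRange(AMOUNTS)); // prints {*-100.0=0, 100.0-200.0=8, 200.0-*=1}
    }
}
```

With this data the first bucket is empty: the six orders of exactly 100 and the two orders of 150 all fall into the middle range, and the single order of 200 goes into the last one.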
2.4 Nested aggregation
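The nested aggregation is dedicated to aggregating over JSON sub-objects. In the example above, product is a JSON array mapped as nested, so whenever we want to aggregate by a product attribute we need the nested aggregation. Aggregating directly on product.name does not give the expected result, because es indexes each element of a nested array as a separate hidden document.
- Example
Count the number of orders for each product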
- DSL
GET order_test/_search
{
"size": 0,
"aggs": {
"product_nested": {
"nested": {
"path": "product"
},
"aggs": {
"name_bucket": {
"terms": {
"field": "product.name"
}
}
}
}
}
}
- Result
- java: here we need to set up a sub-aggregation, which can be defined with the subAggregation method
public void nestedAgg(){
    String aggName = "product_nested";
    String termsAggName = "name_bucket";
    NestedAggregationBuilder aggregationBuilder = AggregationBuilders.nested(aggName, "product").subAggregation(AggregationBuilders.terms(termsAggName).field("product.name"));
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder()
            .withPageable(PageRequest.of(0,1))
            .addAggregation(aggregationBuilder);
    Aggregations aggregations = restTemplate.query(queryBuilder.build(), SearchResponse::getAggregations);
    ParsedNested nestedRes = aggregations.get(aggName);
    // The terms sub-aggregation is read from the nested aggregation's own results
    Terms terms = nestedRes.getAggregations().get(termsAggName);
    List<? extends Terms.Bucket> buckets = terms.getBuckets();
    // product name -> doc count
    Map<String, Long> resMap = new HashMap<>();
    buckets.forEach(bucket -> {
        resMap.put(bucket.getKeyAsString(), bucket.getDocCount());
    });
    System.out.println("--- aggregation result ---");
    System.out.println(resMap);
}
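The two-level structure above (step into product, then group by name) can be mimicked in plain Java by flattening the product lists before grouping. This is only a sketch with a hypothetical class name `NestedAggSketch`, using the product names from the test data. Note that the doc_count of a terms bucket inside a nested aggregation counts nested product documents; it matches the number of orders here only because every test order contains exactly one product.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java stand-in for nested(path = "product") followed by terms on product.name.
final class NestedAggSketch {

    // Product names per order: the outer list is the orders, the inner list is each order's products.
    static final List<List<String>> PRODUCT_NAMES_PER_ORDER = Arrays.asList(
            Collections.singletonList("苹果"), Collections.singletonList("香蕉"),
            Collections.singletonList("香蕉"), Collections.singletonList("苹果"),
            Collections.singletonList("香蕉"), Collections.singletonList("榴莲"),
            Collections.singletonList("香蕉"), Collections.singletonList("苹果"),
            Collections.singletonList("香蕉"));

    // Flattens all products (the "nested" step), then counts them by name (the "terms" step).
    static Map<String, Long> countByProductName(List<List<String>> productNamesPerOrder) {
        return productNamesPerOrder.stream()
                .flatMap(List::stream)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // Counts for the test data: 苹果 3, 香蕉 5, 榴莲 1 (map iteration order is not defined).
        System.out.println(countByProductName(PRODUCT_NAMES_PER_ORDER));
    }
}
```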
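3. Metrics aggregations
3.1 Sum aggregation
The sum aggregation is one of the most commonly used aggregations. It is often combined with bucket aggregations to compute the total for each group.
- Example
Compute the total sales amount for May 1
- DSL: here we add a query clause to restrict the aggregation to orders placed on May 1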
GET order_test/_search
{
"query": {
"range": {
"create_time": {
"format": "yyyy-MM-dd",
"from": "2022-05-01",
"to": "2022-05-01"
}
}
},
"size": 0,
"aggs": {
"sum_amount": {
"sum": {
"field": "amount"
}
}
}
}
- Result
- java
public void sumAgg(){
    String aggName = "sumAmount";
    SumAggregationBuilder agg = AggregationBuilders.sum(aggName).field("amount");
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder()
            .withPageable(PageRequest.of(0,1))
            // Restrict the aggregation to orders created on 2022-05-01
            .withQuery(QueryBuilders.rangeQuery("create_time").format("yyyy-MM-dd").from("2022-05-01").to("2022-05-01"))
            .addAggregation(agg);
    Aggregations aggregations = restTemplate.query(queryBuilder.build(), SearchResponse::getAggregations);
    ParsedSum metric = aggregations.get(aggName);
    double value = metric.getValue();
    System.out.println("--- aggregation result ---");
    System.out.println(value);
}
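For reference, the filter-then-sum logic can be checked against the test data with a small plain-Java sketch. It assumes the range query matches every order created at any time on 2022-05-01, which is the behaviour the example relies on; the class `SumAggSketch` and its nested `Order` type are hypothetical helpers, not part of any library.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for "range query on create_time + sum aggregation on amount".
final class SumAggSketch {

    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Only the two fields needed for this example.
    static final class Order {
        final String createTime;
        final double amount;

        Order(String createTime, double amount) {
            this.createTime = createTime;
            this.amount = amount;
        }
    }

    // create_time and amount of the nine test orders, in bulk-request order.
    static final List<Order> ORDERS = Arrays.asList(
            new Order("2022-05-01 12:00:00", 100.0), new Order("2022-05-01 12:00:00", 100.0),
            new Order("2022-05-02 12:00:00", 100.0), new Order("2022-05-01 12:00:00", 150.0),
            new Order("2022-05-03 12:00:00", 100.0), new Order("2022-05-04 12:00:00", 150.0),
            new Order("2022-05-04 12:00:00", 100.0), new Order("2022-05-01 12:00:00", 200.0),
            new Order("2022-05-03 12:00:00", 100.0));

    // Keeps the orders created on the given day (yyyy-MM-dd) and sums their amounts.
    static double sumAmountOn(String day, List<Order> orders) {
        LocalDate target = LocalDate.parse(day);
        return orders.stream()
                .filter(order -> LocalDateTime.parse(order.createTime, FORMAT).toLocalDate().equals(target))
                .mapToDouble(order -> order.amount)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumAmountOn("2022-05-01", ORDERS)); // prints 550.0
    }
}
```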
3.2 Script aggregation
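Script aggregations let us customize the aggregated value with a script; the default scripting language in es is painless. Note that scripts have a significant performance cost, so we generally avoid them where possible. es also provides a dedicated scripted metric aggregation, but since it is not very commonly used, we explain the more common approach of using a script inside an ordinary aggregation.
- Example
Compute the average unit price of all goods
- DSL: note that we cannot use product.price directly here. Because product is an array that may contain several kinds of goods, we should divide the total order amount by the total quantity of goods across all orders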
GET order_test/_search
{
"size": 0,
"aggs": {
"total_amount":{
"sum": {
"field": "amount"
}
},
"total_quantity":{
"sum": {
"script": {
"source": """
int total = 0;
for(int i=0; i
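The calculation described in the DSL note (total order amount divided by the total quantity of goods) can be verified for the test data in plain Java. This sketch is independent of the Painless script and does not attempt to reproduce it; the class name `AveragePriceSketch` is hypothetical and the figures are copied from the bulk request.

```java
import java.util.Arrays;
import java.util.Locale;

// Plain-Java check of "average unit price = total amount / total quantity".
final class AveragePriceSketch {

    // "amount" of the nine test orders, in bulk-request order.
    static final double[] AMOUNTS = {100.0, 100.0, 100.0, 150.0, 100.0, 150.0, 100.0, 200.0, 100.0};

    // One inner array per order, holding the quantity of each product in that order.
    static final int[][] QUANTITIES = {{5}, {5}, {5}, {5}, {5}, {1}, {5}, {5}, {5}};

    static double totalAmount() {
        return Arrays.stream(AMOUNTS).sum();
    }

    // Walks every product of every order, since an order may hold several products.
    static int totalQuantity() {
        int total = 0;
        for (int[] order : QUANTITIES) {
            for (int quantity : order) {
                total += quantity;
            }
        }
        return total;
    }

    static double averageUnitPrice() {
        return totalAmount() / totalQuantity();
    }

    public static void main(String[] args) {
        // 1100.0 / 41, prints 26.83
        System.out.println(String.format(Locale.ROOT, "%.2f", averageUnitPrice()));
    }
}
```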