from : 从第几个角标开始
size : 查询多少条数据
GET /_search?size=10
GET /_search?size=10&from=0
GET /_search?size=10&from=20
分页
GET /test_index/test_type/_search
响应结果
{
"took": 27,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 9,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "test_type",
"_id": "8",
"_score": 1,
"_source": {
"test_field": "test client 2"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "10",
"_score": 1,
"_source": {
"test_field1": "test1",
"test_field2": "updated test2"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "12",
"_score": 1,
"_source": {
"test_field": "test12"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "4",
"_score": 1,
"_source": {
"test_field1": "test field111111"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "6",
"_score": 1,
"_source": {
"test_field": "test test"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "2",
"_score": 1,
"_source": {
"test_field": "replaced test2"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "7",
"_score": 1,
"_source": {
"test_field": "test client 2"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "1",
"_score": 1,
"_source": {
"test_field1": "test field1",
"test_field2": "bulk test1"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "11",
"_score": 1,
"_source": {
"num": 1,
"tags": []
}
}
]
}
}
我们假设将这9条数据分成3页,每一页是3条数据,来实验一下这个分页搜索的效果
GET /test_index/test_type/_search?from=0&size=3
响应结果
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 9,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "test_type",
"_id": "8",
"_score": 1,
"_source": {
"test_field": "test client 2"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "10",
"_score": 1,
"_source": {
"test_field1": "test1",
"test_field2": "updated test2"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "12",
"_score": 1,
"_source": {
"test_field": "test12"
}
}
]
}
}
第一页:id=8,10,12
GET /test_index/test_type/_search?from=3&size=3
响应结果
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 9,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "test_type",
"_id": "4",
"_score": 1,
"_source": {
"test_field1": "test field111111"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "6",
"_score": 1,
"_source": {
"test_field": "test test"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "2",
"_score": 1,
"_source": {
"test_field": "replaced test2"
}
}
]
}
}
第二页:id=4,6,2
GET /test_index/test_type/_search?from=6&size=3
响应结果
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 9,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "test_type",
"_id": "7",
"_score": 1,
"_source": {
"test_field": "test client 2"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "1",
"_score": 1,
"_source": {
"test_field1": "test field1",
"test_field2": "bulk test1"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "11",
"_score": 1,
"_source": {
"num": 1,
"tags": []
}
}
]
}
}
第三页:id=7,1,11
2、什么是deep paging(深度分页)问题?为什么会产生这个问题,它的底层原理是什么?deep paging性能问题,以及原理深度图解揭秘,很高级的知识点
什么是deep paging简单来说,就是搜索的特别深,比如总共有60000条数据,每个shard上分了20000条数据,每页是10条数据,那么就要分2000页。
这个时候,你想要搜索第1000页,就要思考一下,是第几条到第几条,实际上是要拿到10001~10010的数据
那么是每个shard都返回10001~10010的数据吗?并不是!
你的请求可能会达到一个不包含这个index的shard所在的node上,那么个这个node就是一个coordinate node (协调节点),那么这个coordinate node 就会将搜索请求转发到index的3个shard所在的node上。
那么这种情况下,要搜索60000条数据中的第1000页,实际上每个shard都要将内部的20000条数据中的第10001~10010条数据拿出来,这里并不是这10条,而是10010条数据,所以3个shard共返回30030条数据给coordinate node,coordinate node会对这些数据进行排序,根据相关度分数_socre排序,然后取出需要的第1000页的十条数据。
搜索过深的时候,需要在coordinate node 上保存大量的数据,还要进行大量数据的排序,排序之后,再取出对应的那一页数据返回。这个过程既耗费网络带宽,耗费内存,还耗费CPU,所以deep paging 的性能问题,我们应该尽量避免出现deep paging操作