28分页搜索以及deep paging性能问题深度图解

Dongguo丶发布时间：2021-11-07 00:17:55 ，浏览量：3

1、讲解如何使用es进行分页搜索的语法

from ：从第几个角标开始

size ：查询多少条数据

GET /_search?size=10
GET /_search?size=10&from=0
GET /_search?size=10&from=20

分页

GET /test_index/test_type/_search

响应结果

{
  "took": 27,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "10",
        "_score": 1,
        "_source": {
          "test_field1": "test1",
          "test_field2": "updated test2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "12",
        "_score": 1,
        "_source": {
          "test_field": "test12"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "4",
        "_score": 1,
        "_source": {
          "test_field1": "test field111111"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 1,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "test_field": "replaced test2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "test_field1": "test field1",
          "test_field2": "bulk test1"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "11",
        "_score": 1,
        "_source": {
          "num": 1,
          "tags": []
        }
      }
    ]
  }
}

我们假设将这9条数据分成3页，每一页是3条数据，来实验一下这个分页搜索的效果

GET /test_index/test_type/_search?from=0&size=3

响应结果

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "10",
        "_score": 1,
        "_source": {
          "test_field1": "test1",
          "test_field2": "updated test2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "12",
        "_score": 1,
        "_source": {
          "test_field": "test12"
        }
      }
    ]
  }
}

第一页：id=8,10,12

GET /test_index/test_type/_search?from=3&size=3

响应结果

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "4",
        "_score": 1,
        "_source": {
          "test_field1": "test field111111"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 1,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "test_field": "replaced test2"
        }
      }
    ]
  }
}

第二页：id=4,6,2

GET /test_index/test_type/_search?from=6&size=3

响应结果

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "test_field1": "test field1",
          "test_field2": "bulk test1"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "11",
        "_score": 1,
        "_source": {
          "num": 1,
          "tags": []
        }
      }
    ]
  }
}

第三页：id=7,1,11

2、什么是deep paging（深度分页）问题？为什么会产生这个问题，它的底层原理是什么？

deep paging性能问题，以及原理深度图解揭秘，很高级的知识点

什么是deep paging

简单来说，就是搜索的特别深，比如总共有60000条数据，每个shard上分了20000条数据，每页是10条数据，那么就要分2000页。

这个时候，你想要搜索第1000页，就要思考一下，是第几条到第几条，实际上是要拿到10001~10010的数据

那么是每个shard都返回10001~10010的数据吗？并不是！

你的请求可能会达到一个不包含这个index的shard所在的node上，那么个这个node就是一个coordinate node (协调节点)，那么这个coordinate node 就会将搜索请求转发到index的3个shard所在的node上。

那么这种情况下，要搜索60000条数据中的第1000页，实际上每个shard都要将内部的20000条数据中的第10001~10010条数据拿出来，这里并不是这10条，而是10010条数据，所以3个shard共返回30030条数据给coordinate node，coordinate node会对这些数据进行排序，根据相关度分数_socre排序，然后取出需要的第1000页的十条数据。

搜索过深的时候，需要在coordinate node 上保存大量的数据，还要进行大量数据的排序，排序之后，再取出对应的那一页数据返回。这个过程既耗费网络带宽，耗费内存，还耗费CPU，所以deep paging 的性能问题，我们应该尽量避免出现deep paging操作

关注

打赏

1638062488

查看更多评论

28分页搜索以及deep paging性能问题深度图解

最近更新

热门博客

[ 申请 ]友情链接：