您当前的位置: 首页 >  搜索

Dongguo丶

暂无认证

  • 1浏览

    0关注

    472博文

    0收益

  • 0浏览

    0点赞

    0打赏

    0留言

私信
关注
热门博文

28分页搜索以及deep paging性能问题深度图解

Dongguo丶 发布时间:2021-11-07 00:17:55 ,浏览量:1

1、讲解如何使用es进行分页搜索的语法

from : 从第几个角标开始

size : 查询多少条数据

GET /_search?size=10
GET /_search?size=10&from=0
GET /_search?size=10&from=20

分页

GET /test_index/test_type/_search

响应结果

{
  "took": 27,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "10",
        "_score": 1,
        "_source": {
          "test_field1": "test1",
          "test_field2": "updated test2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "12",
        "_score": 1,
        "_source": {
          "test_field": "test12"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "4",
        "_score": 1,
        "_source": {
          "test_field1": "test field111111"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 1,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "test_field": "replaced test2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "test_field1": "test field1",
          "test_field2": "bulk test1"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "11",
        "_score": 1,
        "_source": {
          "num": 1,
          "tags": []
        }
      }
    ]
  }
}

我们假设将这9条数据分成3页,每一页是3条数据,来实验一下这个分页搜索的效果

GET /test_index/test_type/_search?from=0&size=3

响应结果

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "10",
        "_score": 1,
        "_source": {
          "test_field1": "test1",
          "test_field2": "updated test2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "12",
        "_score": 1,
        "_source": {
          "test_field": "test12"
        }
      }
    ]
  }
}

第一页:id=8,10,12

GET /test_index/test_type/_search?from=3&size=3

响应结果

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "4",
        "_score": 1,
        "_source": {
          "test_field1": "test field111111"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 1,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "test_field": "replaced test2"
        }
      }
    ]
  }
}

第二页:id=4,6,2

GET /test_index/test_type/_search?from=6&size=3

响应结果

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "test_field1": "test field1",
          "test_field2": "bulk test1"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "11",
        "_score": 1,
        "_source": {
          "num": 1,
          "tags": []
        }
      }
    ]
  }
}

第三页:id=7,1,11

2、什么是deep paging(深度分页)问题?为什么会产生这个问题,它的底层原理是什么?

deep paging性能问题,以及原理深度图解揭秘,很高级的知识点

什么是deep paging

简单来说,就是搜索的特别深,比如总共有60000条数据,每个shard上分了20000条数据,每页是10条数据,那么就要分2000页。

这个时候,你想要搜索第1000页,就要思考一下,是第几条到第几条,实际上是要拿到10001~10010的数据

那么是每个shard都返回10001~10010的数据吗?并不是!

你的请求可能会达到一个不包含这个index的shard所在的node上,那么个这个node就是一个coordinate node (协调节点),那么这个coordinate node 就会将搜索请求转发到index的3个shard所在的node上。

那么这种情况下,要搜索60000条数据中的第1000页,实际上每个shard都要将内部的20000条数据中的第10001~10010条数据拿出来,这里并不是这10条,而是10010条数据,所以3个shard共返回30030条数据给coordinate node,coordinate node会对这些数据进行排序,根据相关度分数_socre排序,然后取出需要的第1000页的十条数据。

image-20211107001414750

搜索过深的时候,需要在coordinate node 上保存大量的数据,还要进行大量数据的排序,排序之后,再取出对应的那一页数据返回。这个过程既耗费网络带宽,耗费内存,还耗费CPU,所以deep paging 的性能问题,我们应该尽量避免出现deep paging操作

关注
打赏
1638062488
查看更多评论
立即登录/注册

微信扫码登录

0.0414s