34 Query string analysis and the leftover question from the mapping introduction example

Dongguo丶 Published: 2021-11-07 21:40:40

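The previous chapters may look scattered, but several of them are connected, and later chapters will gradually tie the earlier material together, reviewing and extending it along the way.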

1. Query string analysis

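A query string must be analyzed with the same analyzer that was used when the index was built. Query strings are also treated differently for exact value fields and full text fields: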

date: exact value
_all: full text

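Suppose we have a document with a field whose value is: hello you and me, and an inverted index is built from it. We want to search the index that holds this document, and the search text is hello me. Search text passed this way is the query string. By default, es analyzes and normalizes the query string with the same analyzer that was used to build the inverted index for the corresponding field. Only then can the search match correctly.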

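For example, if dogs was converted to dog when the inverted index was built, but the search still looks for dogs, nothing would be found. So at search time dogs must also become dog before it can match.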
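The dogs/dog case can be sketched with a toy inverted index. This is only an illustration, not how es is implemented: `STEMS`, `analyze`, `build_index`, and `search` are hypothetical names, and the tiny lookup table stands in for a real stemmer.

```python
import re

# Hypothetical normalization table standing in for a real stemmer.
STEMS = {"dogs": "dog"}

def analyze(text):
    """Toy analyzer: lowercase, split on non-alphanumerics, apply STEMS."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [STEMS.get(t, t) for t in tokens]

def build_index(docs):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query, analyzer):
    """Return ids of documents matching any term the analyzer produces."""
    hits = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return hits

docs = {1: "I like dogs", 2: "hello you and me"}
index = build_index(docs)

# Query analyzed the same way as the index: dogs -> dog, so it matches.
same = search(index, "dogs", analyze)
# Query left untouched: the raw term "dogs" is not in the index.
raw = search(index, "dogs", lambda q: [q])
print(same, raw)
```

The index only ever contains dog, so the raw term dogs finds nothing, while the analyzed query finds document 1.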

Key point: depending on its type, a field may be searched as full text or matched as an exact value.

2. The leftover question from the mapping introduction example

The different search results seen in "30 Explaining what mapping really is with an example":

GET /website/article/_search?q=2017
GET /website/article/_search?q=post_date:2017

Full-text query

GET /website/article/_search?q=2017

This searches the _all field: all fields of a document are concatenated into one long string, which is then analyzed. For example:

2017-01-02 my second article this is my second article in this website 11400

        doc1   doc2   doc3
2017    *      *      *
01      *
02             *
03                    *
…

For _all, the query string is analyzed with the same analyzer used to build the inverted index, so searching for 2017 naturally finds all 3 documents.
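A rough sketch of the _all behaviour, under stated assumptions: doc1 and doc2 follow the values shown in this chapter, doc3 is assumed to follow the same pattern, and `all_field` and `tokenize` are hypothetical helpers that only approximate what es does.

```python
import re

docs = {
    "doc1": {"post_date": "2017-01-01", "title": "my first article",
             "content": "this is my first article in this website",
             "author_id": 11400},
    "doc2": {"post_date": "2017-01-02", "title": "my second article",
             "content": "this is my second article in this website",
             "author_id": 11400},
    # Assumed: doc3 is not shown in this chapter; it follows the same pattern.
    "doc3": {"post_date": "2017-01-03", "title": "my third article",
             "content": "this is my third article in this website",
             "author_id": 11400},
}

def all_field(doc):
    """Concatenate every field value into one long string, like _all."""
    return " ".join(str(value) for value in doc.values())

def tokenize(text):
    """Toy tokenizer: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Build the inverted index over the concatenated string of each document.
index = {}
for doc_id, doc in docs.items():
    for term in tokenize(all_field(doc)):
        index.setdefault(term, set()).add(doc_id)

print(all_field(docs["doc2"]))
print(sorted(index["2017"]))
```

Because the date is split into separate terms inside the concatenated string, 2017 ends up as a term shared by all three documents.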

Exact match

GET /_search?q=post_date:2017-01-01

A date field is indexed as an exact value:

              doc1   doc2   doc3
2017-01-01    *
2017-01-02           *
2017-01-03                  *

For post_date:2017-01-01, the query string is analyzed the same way the inverted index was built, so the search term is 2017-01-01 and it finds exactly one document, doc1.
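For contrast, here is a minimal sketch of exact value indexing, where the whole date is kept as one term. The names `exact_index` and `search_exact` are hypothetical, doc3's date is assumed, and real es parses date queries itself, so its behaviour can differ from this naive model.

```python
# Dates of the three documents; doc3's value is assumed to follow the pattern.
post_dates = {
    "doc1": "2017-01-01",
    "doc2": "2017-01-02",
    "doc3": "2017-01-03",
}

# Exact value: the whole value is a single term, with no tokenization.
exact_index = {}
for doc_id, value in post_dates.items():
    exact_index.setdefault(value, set()).add(doc_id)

def search_exact(index, value):
    """Match the query against whole indexed values only."""
    return index.get(value, set())

print(search_exact(exact_index, "2017-01-01"))
print(search_exact(exact_index, "2017"))
```

In this naive model the full date matches doc1 only, and a bare 2017 matches nothing because no indexed term equals it.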

GET /_search?q=post_date:2017 also returns one document; this is an optimization made in es 5.2 and later.

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 26,
    "successful": 26,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "post_date": "2017-01-01",
          "title": "my first article",
          "content": "this is my first article in this website",
          "author_id": 11400
        }
      }
    ]
  }
}

3. Testing how an analyzer tokenizes text

GET /_analyze
{
  "analyzer": "standard",
  "text": "Text to analyze"
}

Response

{
  "tokens": [
    {
      "token": "text",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "to",
      "start_offset": 5,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "analyze",
      "start_offset": 8,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}

With the standard analyzer, Text to analyze is split into 3 terms: text, to, and analyze.
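To see where the offsets and positions in such a response come from, here is a small approximation in Python. It is not the real standard analyzer, which follows Unicode text segmentation rules; `toy_standard_analyze` is a hypothetical helper that splits on word characters and lowercases each token.

```python
import re

def toy_standard_analyze(text):
    """Approximate the token list: lowercase terms with offsets and positions."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),  # index of the first character
            "end_offset": match.end(),      # index just past the last character
            "position": position,           # ordinal of the term in the text
        })
    return tokens

tokens = toy_standard_analyze("Text to analyze")
for t in tokens:
    print(t)
```

The offsets point back into the original text, while position counts terms, which is what phrase matching relies on.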

4. What mapping is, revisited for a thorough understanding

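(1) When you insert data directly into es, es automatically creates the index, along with the type and its mapping.
(2) The mapping automatically defines the data type of each field.
(3) Depending on the data type (for example text versus date), a field may be exact value or full text.
(4) An exact value is put into the inverted index whole, as a single term, with no tokenization. Full text goes through several processing steps, tokenization and normalization (tense conversion, synonym conversion, case conversion), before it is put into the inverted index.
(5) Whether a field is exact value or full text also determines how a search against it behaves, and that behavior stays consistent with how the inverted index was built: an exact value search matches against the whole value, while a full text query string is tokenized and normalized before the inverted index is searched.
(6) You can rely on es dynamic mapping to create the mapping automatically, including the data types. Alternatively, you can create the mapping for an index and type by hand in advance and configure each field yourself, including its data type, indexing behavior, analyzer, and so on.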

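In short, a mapping is the metadata of a type in an index. Each type has its own mapping, which determines the data types, how the inverted index is built, and how searches behave.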
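The idea behind dynamic mapping can be sketched as type inference over a document's values. This is a simplified illustration under assumptions, not es's actual rules: `guess_type` and `infer_mapping` are hypothetical helpers, only one date pattern is recognized, and a real dynamic mapping for strings may carry extra detail such as a keyword sub-field.

```python
import re

# Assumed date pattern for this sketch: yyyy-MM-dd only.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def guess_type(value):
    """Guess a field type from a JSON value, in the spirit of dynamic mapping."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        # A date-looking string is treated as an exact value date,
        # any other string as full text.
        return "date" if DATE_RE.match(value) else "text"
    return "object"

def infer_mapping(doc):
    """Build a mapping-like dict: field name -> inferred type."""
    return {field: {"type": guess_type(value)} for field, value in doc.items()}

doc = {
    "post_date": "2017-01-01",
    "title": "my first article",
    "content": "this is my first article in this website",
    "author_id": 11400,
}
mapping = infer_mapping(doc)
for field, spec in mapping.items():
    print(field, spec)
```

Here post_date comes out as an exact value type while title and content come out as full text, which is why the same query string behaves differently against each of them.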
