
Installing the Chinese analysis plugin elasticsearch-analysis-ik for Elasticsearch

彭世瑜 · Published 2019-01-02 11:46:54

github: https://github.com/medcl/elasticsearch-analysis-ik

Installation

1. First, check your Elasticsearch version: http://localhost:9200/

Find the matching plugin release: https://github.com/medcl/elasticsearch-analysis-ik/releases
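The plugin version must match the Elasticsearch version exactly. As a sketch of the version check, assuming the usual JSON shape of the `GET /` response (the values below are illustrative, not taken from a live cluster), the version number can be pulled out like this:

```shell
# Illustrative copy of the JSON that GET http://localhost:9200/ returns
cat > /tmp/es_info.json <<'EOF'
{
  "name" : "node-1",
  "version" : {
    "number" : "6.3.0",
    "lucene_version" : "7.3.1"
  },
  "tagline" : "You Know, for Search"
}
EOF

# Extract version.number — the IK release you install must match it exactly
ES_VERSION=$(grep -o '"number" : "[^"]*"' /tmp/es_info.json | sed 's/.*: "//; s/"//')
echo "Elasticsearch version: $ES_VERSION"
```

On a live node you would pipe `curl -s http://localhost:9200/` into the same extraction instead of the saved file.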

2. Install the plugin

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.3.0/elasticsearch-analysis-ik-6.3.0.zip

3. Restart Elasticsearch

4. Test the analyzer

curl -X PUT 'localhost:9200/website'

curl -XGET "http://localhost:9200/website/_analyze" -H 'Content-Type: application/json' -d'
{
   "text":"中华人民共和国国歌","tokenizer": "ik_max_word"
}'

The response:

{
    "tokens": [
        {
            "token": "中华人民共和国",
            "start_offset": 0,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "中华人民",
            "start_offset": 0,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "中华",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "华人",
            "start_offset": 1,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "人民共和国",
            "start_offset": 2,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "人民",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "共和国",
            "start_offset": 4,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 6
        },
        {
            "token": "共和",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 7
        },
        {
            "token": "国",
            "start_offset": 6,
            "end_offset": 7,
            "type": "CN_CHAR",
            "position": 8
        },
        {
            "token": "国歌",
            "start_offset": 7,
            "end_offset": 9,
            "type": "CN_WORD",
            "position": 9
        }
    ]
}
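To pull just the token strings out of an `_analyze` response like the one above, the JSON can be piped through python3. A trimmed two-token copy of the response is saved to a file here so the snippet is self-contained; on a live cluster you would pipe the curl output in directly.

```shell
# Trimmed copy of the _analyze response above (two of the ten tokens)
cat > /tmp/analyze.json <<'EOF'
{"tokens": [
  {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0},
  {"token": "国歌", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 9}
]}
EOF

# Print one token per line
python3 -c 'import json, sys
for t in json.load(sys.stdin)["tokens"]:
    print(t["token"])' < /tmp/analyze.json
```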

If the plugin install command fails, you can install manually instead:

Unzip the downloaded release package and copy it into the Elasticsearch directory under plugins/ik, then restart the service.
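As a sketch of the manual install, with `ES_HOME` standing in for your actual Elasticsearch directory (a scratch directory is used here so the steps can be dry-run safely, and the zip filename is the 6.3.0 release from above):

```shell
# ES_HOME is an assumption — point it at your real Elasticsearch install.
# A temporary directory is used here so the commands are safe to dry-run.
ES_HOME=$(mktemp -d)

# Create the plugin directory and unpack the release into it
mkdir -p "$ES_HOME/plugins/ik"
# Uncomment with the real archive path:
# unzip elasticsearch-analysis-ik-6.3.0.zip -d "$ES_HOME/plugins/ik"

ls -d "$ES_HOME/plugins/ik"   # plugin directory is in place; restart Elasticsearch next
```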

ik_max_word: splits the text at the finest granularity.
ik_smart: splits the text at the coarsest granularity.
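To see the difference, rerun the `_analyze` request from step 4 with `ik_smart` instead. Per the plugin's documented behavior it should return far fewer, coarser tokens than the ten produced by `ik_max_word` above; this needs a running cluster, so no output is shown here.

```shell
# Same request as step 4, but with the coarse-grained ik_smart tokenizer
curl -XGET "http://localhost:9200/website/_analyze" -H 'Content-Type: application/json' -d'
{
   "text": "中华人民共和国国歌", "tokenizer": "ik_smart"
}'
```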

Reference: Installing and Using the IK Analyzer with Elasticsearch 5.x
