slop的含义是什么?
query string,搜索文本中的term,要经过几次移动才能与一个document匹配,这个移动的次数,就是slop
实际举例,一个query string经过几次移动之后可以匹配到一个document,然后设置slop
hello world, java is very good, spark is also very good.
搜索java spark,使用match phrase,时搜不到的
如果我们指定了slop,那么就允许java spark进行移动,来尝试与doc进行匹配
java is very good spark is
java spark java --> spark java --> spark java --> spark
这里的slop,就是3,因为java spark这个短语,spark移动了3次,就可以跟一个doc匹配上了
slop的含义,不仅仅是说一个query string terms移动几次,跟一个doc匹配上。一个query string terms,最多可以移动几次去尝试跟一个doc匹配上
slop,设置的是3,那么就ok
GET /forum/article/_search
{
"query": {
"match_phrase": {
"title": {
"query": "java spark",
"slop": 3
}
}
}
}
就可以把刚才那个doc匹配上,那个doc会作为结果返回
但是如果slop设置的是2,那么java spark,spark最多只能移动2次,此时跟doc是匹配不上的,那个doc是不会作为结果返回的
做实验,验证slop的含义
GET /forum/article/_search
{
"query": {
"match_phrase": {
"content": {
"query": "spark data",
"slop": 3
}
}
}
}
响应结果
{
"took": 13,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.21824157,
"hits": [
{
"_index": "forum",
"_type": "article",
"_id": "5",
"_score": 0.21824157,
"_source": {
"articleID": "DHJK-B-1395-#Ky5",
"userID": 3,
"hidden": false,
"postDate": "2021-11-11",
"tag": [
"elasticsearch"
],
"tag_cnt": 1,
"view_cnt": 10,
"title": "this is spark blog",
"content": "spark is best big data solution based on scala ,an programming language similar to java spark",
"sub_title": "haha, hello world",
"author_first_name": "Tonny",
"author_last_name": "Peter Smith",
"new_author_last_name": "Peter Smith",
"new_author_first_name": "Tonny"
}
}
]
}
}
spark is best big data solution based on scala ,an programming language similar to java spark
spark is best big data
spark data spark --> data spark --> data spark --> data
GET /forum/article/_search
{
"query": {
"match_phrase": {
"content": {
"query": "spark data",
"slop": 2
}
}
}
}
响应结果
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
如果是搜索data spark呢
spark is best big data
data spark –> data/spark spark data spark --> data spark --> data
data spark首先要交换为止,需要移动两次
1 data移动到索引1的位置
2 spark移动到索引0的位置
GET /forum/article/_search
{
"query": {
"match_phrase": {
"content": {
"query": "data spark",
"slop": 5
}
}
}
}
响应结果
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.154366,
"hits": [
{
"_index": "forum",
"_type": "article",
"_id": "5",
"_score": 0.154366,
"_source": {
"articleID": "DHJK-B-1395-#Ky5",
"userID": 3,
"hidden": false,
"postDate": "2021-11-11",
"tag": [
"elasticsearch"
],
"tag_cnt": 1,
"view_cnt": 10,
"title": "this is spark blog",
"content": "spark is best big data solution based on scala ,an programming language similar to java spark",
"sub_title": "haha, hello world",
"author_first_name": "Tonny",
"author_last_name": "Peter Smith",
"new_author_last_name": "Peter Smith",
"new_author_first_name": "Tonny"
}
}
]
}
}
slop搜索下,关键词离的越近,relevance score就会越高,做实验说明。。。
GET /forum/article/_search
{
"query": {
"match_phrase": {
"content": {
"query": "java best",
"slop": 13
}
}
}
}
响应结果
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.65380025,
"hits": [
{
"_index": "forum",
"_type": "article",
"_id": "2",
"_score": 0.65380025,
"_source": {
"articleID": "KDKE-B-9947-#kL5",
"userID": 1,
"hidden": false,
"postDate": "2017-01-02",
"tag": [
"java"
],
"tag_cnt": 1,
"view_cnt": 50,
"title": "this is java blog",
"content": "i think java is the best programming language",
"sub_title": "learned a lot of course",
"author_first_name": "Smith",
"author_last_name": "Williams",
"new_author_last_name": "Williams",
"new_author_first_name": "Smith"
}
},
{
"_index": "forum",
"_type": "article",
"_id": "5",
"_score": 0.07111243,
"_source": {
"articleID": "DHJK-B-1395-#Ky5",
"userID": 3,
"hidden": false,
"postDate": "2021-11-11",
"tag": [
"elasticsearch"
],
"tag_cnt": 1,
"view_cnt": 10,
"title": "this is spark blog",
"content": "spark is best big data solution based on scala ,an programming language similar to java spark",
"sub_title": "haha, hello world",
"author_first_name": "Tonny",
"author_last_name": "Peter Smith",
"new_author_last_name": "Peter Smith",
"new_author_first_name": "Tonny"
}
}
]
}
}
doc5的java best离得过远,relevance score就非常低。
spark is best big data solution based on scala ,an programming language similar to java spark
搜索java best
bestbigdatasolutionbasedonscalaanprogramminglanguagesimilartojavajavabest1java/best2bestjava3bestjava4bestjava5bestjava6bestjava7bestjava8bestjava9bestjava10bestjava11bestjava12bestjava13bestjava其实,加了slop的phrase match,就是proximity match,近似匹配
1、java spark,短语,doc,phrase match 2、java spark,可以有一定的距离,但是靠的越近,越先搜索出来,proximity match