Elasticsearch updateByQuery: updating selected fields with a script

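Requirement: query data in ES by some condition and update some of the fields of the matching documents.

Breaking the update requirement down:

1. Directly modify or add a field, meaning a first-level (top-level) field. (ES documents are JSON and usually have several levels of fields; in this article I call them first-level fields, second-level fields, and so on. If that is still unclear, go straight to the cases.) This is the simplest requirement and should not cause errors.
2. Modify a second-level field. The catch is that the first-level field may not exist, in which case the script throws a null pointer exception.
3. Handle the nested type.
4. Delete a field.
5. Append a value to a field that already has a value, without overwriting the existing one.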

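When I ran into these problems I first checked the official ES documentation, but it only has simple examples that do not cover the requirements above. I then searched the existing articles online, and they were all rather messy. Hence this article: I want to learn these points by working through the requirements one by one.

Apologies up front: I wrote this article over a weekend and the examples have not been tested yet. I will correct them a little later.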

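Prerequisites

You already know how to use ES.
You have some understanding of the ES updateByQuery API and can call it from code.
You need to understand scripts.
You need to understand Painless syntax. This is the most important part.

P.S. If you don't, that's fine too; this article should be enough to pick it all up.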

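Goals of this article

Meet the requirements listed at the start of the article.
Get familiar with the updateByQuery API.
Be able to write the DSL statements and run them successfully in Kibana.
Be able to meet the requirements above from Java code.
Get familiar with Painless syntax, which is really the key point.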

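Cases

The cases line up with the requirements above.

Case 1 (Requirement 1: directly modify or add a first-level field)

We use an index of people as the example (I can't share the production index, so this example is borrowed from the web).

The corresponding mapping (the table structure)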

# You can run the following index-creation statement directly in Kibana.
PUT twitter
{
  "mappings": {
    "properties": {
      "DOB": {
        "type": "date"
      },
      "address": {
        "type": "keyword"
      },
      "city": {
        "type": "keyword"
      },
      "country": {
        "type": "keyword"
      },
      "uid": {
        "type": "long"
      },
      "user": {
        "type": "keyword"
      },
      "hobby": {
        "type": "keyword"
      },
      "message": {
        "type": "text"
      },
      "location": {
        "properties": {
          "lat": {
            "type": "keyword"
          },
          "lon": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Load the test data

POST _bulk
{ "index" : { "_index" : "twitter", "_id": 1} }
{"user":"张三","message":"今儿天气不错啊，出去转转去","uid":2,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}, "DOB":"1980-12-01"}
{ "index" : { "_index" : "twitter", "_id": 2 }}
{"user":"老刘","message":"出发，下一站云南！","uid":3,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}, "DOB":"1981-12-01"}

Modify with the updateByQuery API in Kibana

Suppose we want to change one field for the documents that match a condition: set 张三's date of birth to 1996-12-01.

The command is as follows:

POST twitter/_update_by_query
{
  "query": {
    "term": {
      "user": "张三"
    }
  },
  "script": {
    "source": "ctx._source.DOB = params.DOB",
    "params": {
      "DOB": "1996-12-01"
    }
  }
}

Adding a new first-level field uses the same command. For example, the documents above have no hobby field, and we want to add a hobby for the person named 张三.

POST twitter/_update_by_query
{
  "query": {
    "term": {
      "user": "张三"
    }
  },
  "script": {
    "source": "ctx._source.hobby = params.hobby",
    "params": {
      "hobby": "和女朋友爬山"
    }
  }
}

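Case 1 in Java

This assumes you already know how to call ES from Java; other background is not covered here.

See the official ES documentation: Update API | Java REST Client [7.15] | Elastic

The low-level method that performs the update:

    import java.io.IOException;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.index.query.QueryBuilder;
    import org.elasticsearch.index.reindex.UpdateByQueryRequest;
    import org.elasticsearch.script.Script;

    public void updateByQuery(QueryBuilder queryBuilder, String index, Script script) throws IOException {
        UpdateByQueryRequest request = new UpdateByQueryRequest(index);
        // Keep going when a document hits a version conflict instead of aborting.
        request.setConflicts("proceed");
        request.setQuery(queryBuilder);
        // Process at most 1000 matching documents.
        request.setMaxDocs(1000);
        request.setScript(script);
        // getClient(esEntity) is my own helper that returns the high-level REST client;
        // esEntity is assumed to be available in the enclosing class.
        getClient(esEntity).updateByQuery(request, RequestOptions.DEFAULT);
    }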

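Next, call this method.

It needs a Script object (the script to execute). Here is the calling method:

public void updateHobby(String user, ESEntity esEntity) throws IOException {
        final BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery().must(QueryBuilders.termQuery("user", user));
        // This map holds the parameters that are passed into the script.
        Map<String, Object> params = new HashMap<>(16);
        params.put("hobby", "和女朋友爬山");
        final Script script = new Script(
                ScriptType.INLINE, "painless",
                "ctx._source.hobby = params.hobby",
                params);
        // "your-index-name" is a placeholder; use the real index name, e.g. twitter.
        updateByQuery(queryBuilder, "your-index-name", script);
    }

That completes the first case, so let's connect the points. When calling from Java we create a new Script object that holds the script source. Parameters are passed to the script through a Map, so we can flexibly pass in several parameters.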
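
To make the point about passing several parameters concrete, here is a minimal sketch in plain Java. It only builds the parameter map and the script source, so it runs without an ES client. The class name ScriptParamsSketch and the extra city parameter are my own illustrative assumptions and are not part of the code above. In a real call the two values would be handed to new Script(ScriptType.INLINE, "painless", SOURCE, params).

```java
import java.util.HashMap;
import java.util.Map;

public class ScriptParamsSketch {
    // Painless source that reads two named parameters (hypothetical example).
    static final String SOURCE =
            "ctx._source.hobby = params.hobby; ctx._source.city = params.city";

    // Builds the parameter map; every key must match a params.<name> used in SOURCE.
    static Map<String, Object> buildParams(String hobby, String city) {
        Map<String, Object> params = new HashMap<>();
        params.put("hobby", hobby);
        params.put("city", city);
        return params;
    }

    public static void main(String[] args) {
        Map<String, Object> params = buildParams("hiking", "Beijing");
        System.out.println(SOURCE);
        System.out.println(params.get("hobby") + " / " + params.get("city"));
    }
}
```

Keeping the values in params instead of concatenating them into the source string means the script text stays the same from call to call, which is the pattern the cases in this article follow.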

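Case 2 (Requirement 2: modify or add a second-level field)

The catch here is that the first-level field may not exist, in which case the script throws a null pointer exception. So the script needs an extra check.

We reuse the data structure from Case 1 and this time change a second-level field inside location. Here location is the first-level field, and lat and lon are second-level fields.

If the first-level field location is guaranteed to exist, the script below works. If location does not exist, it throws a null pointer exception.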

POST twitter/_update_by_query
{
  "query": {
    "term": {
      "user": "张三"
    }
  },
  "script": {
    "source": "ctx._source.location.lon = params.location.lon",
    "params": {
      "location": {
          "lat": "39.970788",
          "lon":  "119.970718"
      }
    }
  }
}

So how do we add a second-level field when the first-level field is missing?

It is actually simple: assign the whole object from the parameters.

POST twitter/_update_by_query
{
  "query": {
    "term": {
      "user": "张三"
    }
  },
  "script": {
    "source": "ctx._source.location = params.location",
    "params": {
      "location": {
          "lat": "39.970788",
          "lon":  "119.970718"
      }
    }
  }
}

What if we cannot be sure whether the first-level field exists, that is, it may or may not be there?

POST twitter/_update_by_query
{
  "query": {
    "term": {
      "user": "张三"
    }
  },
  "script": {
    "source": """
      if (ctx._source.containsKey('location')) {
        ctx._source.location.lon = params.location.lon;
      } else {
        ctx._source.location = params.location;
      }
    """,
    "params": {
      "location": {
        "lat": "39.970788",
        "lon": "119.970718"
      }
    }
  }
}

The corresponding Java code

public void updateLocation(String user, ESEntity esEntity, String lon, String lat) throws IOException {
        final BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery().must(QueryBuilders.termQuery("user", user));
        // This map holds the parameters that are passed into the script.
        Map<String, Object> params = new HashMap<>(16);
        // A plain map is enough for the nested parameter object.
        Map<String, Object> location = new HashMap<>();
        if (StringUtils.isNotBlank(lon)) {
            location.put("lon", lon);
        }
        if (StringUtils.isNotBlank(lat)) {
            location.put("lat", lat);
        }
        params.put("location", location);
        final Script script = new Script(
                ScriptType.INLINE, "painless",
                "if(ctx._source.containsKey('location')){ \n" +
                        "        ctx._source.location.lon = params.location.lon;\n" +
                        "    }else{\n" +
                        "        ctx._source.location = params.location;\n" +
                        "    }",
                params);
        // "your-index-name" is a placeholder; use the real index name, e.g. twitter.
        updateByQuery(queryBuilder, "your-index-name", script);
    }
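
If you want to check the branching logic without a cluster, the script can be modelled in plain Java, with a Map standing in for ctx._source. This is only a sketch of the same decision, not ES code; the class name GuardedLocationUpdate and the method updateLon are names I made up for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class GuardedLocationUpdate {
    // Plain-Java model of the Painless script: source stands in for ctx._source,
    // paramLocation stands in for params.location.
    @SuppressWarnings("unchecked")
    static void updateLon(Map<String, Object> source, Map<String, Object> paramLocation) {
        if (source.containsKey("location")) {
            // First-level field exists: change only the second-level field.
            Map<String, Object> location = (Map<String, Object>) source.get("location");
            location.put("lon", paramLocation.get("lon"));
        } else {
            // First-level field is missing: store a copy of the whole object.
            source.put("location", new HashMap<>(paramLocation));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> param = new HashMap<>();
        param.put("lat", "39.970788");
        param.put("lon", "119.970718");

        Map<String, Object> existing = new HashMap<>();
        existing.put("lat", "39.970718");
        existing.put("lon", "116.325747");
        Map<String, Object> withLocation = new HashMap<>();
        withLocation.put("location", existing);
        updateLon(withLocation, param);    // lat is kept, lon is replaced

        Map<String, Object> withoutLocation = new HashMap<>();
        updateLon(withoutLocation, param); // the whole object is created

        System.out.println(withLocation);
        System.out.println(withoutLocation);
    }
}
```

One thing the model makes visible: containsKey only checks that the key is present. A document where location is present but explicitly null would still hit a null pointer exception in the first branch, so a null check may be the safer guard in that situation.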

Case 3 (Requirement 3: the nested type)

POST twitter/_update_by_query
{
  "query": {
    "term": {
      "user": "张三"
    }
  },
  "script": {
    "source": """
      if (ctx._source.containsKey('location')) {
        // Assumes the existing location is a single object; a list of objects would need a loop.
        ctx._source.location.lon = params.location.lon;
      } else {
        // Note the difference, which is only here: the square brackets make it a list.
        ctx._source.location = [params.location];
      }
    """,
    "params": {
      "location": {
        "lat": "39.970788",
        "lon": "119.970718"
      }
    }
  }
}

The corresponding Java code:

public void updateNestedLocation(String user, ESEntity esEntity, String lon, String lat) throws IOException {
        final BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery().must(QueryBuilders.termQuery("user", user));
        // This map holds the parameters that are passed into the script.
        Map<String, Object> params = new HashMap<>(16);
        // A plain map is enough for the nested parameter object.
        Map<String, Object> location = new HashMap<>();
        if (StringUtils.isNotBlank(lon)) {
            location.put("lon", lon);
        }
        if (StringUtils.isNotBlank(lat)) {
            location.put("lat", lat);
        }
        params.put("location", location);
        // The only difference from Case 2 is the square brackets in the else branch.
        final Script script = new Script(
                ScriptType.INLINE, "painless",
                "if(ctx._source.containsKey('location')){ \n" +
                        "        ctx._source.location.lon = params.location.lon;\n" +
                        "    }else{\n" +
                        "        ctx._source.location = [params.location];\n" +
                        "    }",
                params);
        // "your-index-name" is a placeholder; use the real index name, e.g. twitter.
        updateByQuery(queryBuilder, "your-index-name", script);
    }
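
The script above writes a list in the else branch but treats an existing location as a single object. As a sketch of how both shapes could be handled, here is a plain-Java model that goes one step beyond the script in this case: when the existing value is a list it updates lon on every element. A Map again stands in for ctx._source; the class name NestedLocationUpdate is made up for the illustration, and the assumption that every list element is itself an object is mine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NestedLocationUpdate {
    // source stands in for ctx._source, paramLocation for params.location.
    @SuppressWarnings("unchecked")
    static void updateLon(Map<String, Object> source, Map<String, Object> paramLocation) {
        Object current = source.get("location");
        if (current instanceof List) {
            // Existing value is a list of objects: update lon on each element.
            for (Object item : (List<Object>) current) {
                ((Map<String, Object>) item).put("lon", paramLocation.get("lon"));
            }
        } else if (current instanceof Map) {
            // Existing value is a single object: update it directly.
            ((Map<String, Object>) current).put("lon", paramLocation.get("lon"));
        } else {
            // Field is missing (or null): create a one-element list.
            List<Object> list = new ArrayList<>();
            list.add(new HashMap<>(paramLocation));
            source.put("location", list);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> param = new HashMap<>();
        param.put("lat", "39.970788");
        param.put("lon", "119.970718");

        Map<String, Object> doc = new HashMap<>();
        updateLon(doc, param); // creates location as a one-element list
        updateLon(doc, param); // second call takes the list branch
        System.out.println(doc);
    }
}
```

The same three-way check can be written in Painless with instanceof; whether updating every element is the right behaviour depends on the data, so treat this as one possible choice.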

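Case 4 (Requirement 4: delete a field)

This requires some familiarity with Painless syntax.

Reference article: painless语法入门_忽如一夜听春雨的博客-CSDN博客_painless语法 (an introduction to Painless syntax on CSDN)

ctx._source.field: add, contains, remove, indexOf, length (operate on the content of a document field)
ctx.op: the operation that should be applied to the document: index or delete (operates on the document itself, so it can delete the whole document)
ctx._index: access to document metadata fields
_score: only available in scoring contexts such as script_score
doc['field'], doc['field'].value: add, contains, remove, indexOf, length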
POST twitter/_update_by_query
{
  "query": {
    "term": {
      "user": "张三"
    }
  },
  "script": {
    "source": "ctx._source.remove(params.removeField)",
    "params": {
      "removeField": "hobby"
    }
  }
}

The Java code is omitted here; if you followed the earlier cases, it should be easy to write.
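
For reference, here is a small plain-Java sketch of the pieces involved. It builds the params map for the remove script and models the script's effect on a Map that stands in for ctx._source, so it runs without a cluster. RemoveFieldSketch and its method names are hypothetical; in a real call SOURCE and the params map would go into a Script object exactly as in Case 1.

```java
import java.util.HashMap;
import java.util.Map;

public class RemoveFieldSketch {
    // Same Painless source as the DSL example above.
    static final String SOURCE = "ctx._source.remove(params.removeField)";

    // Builds the params map naming the field to delete.
    static Map<String, Object> buildParams(String fieldName) {
        Map<String, Object> params = new HashMap<>();
        params.put("removeField", fieldName);
        return params;
    }

    // Model of the script: ctx._source behaves like a Map, so remove(key)
    // returns the old value, or null when the field was not there.
    static Object applyTo(Map<String, Object> source, Map<String, Object> params) {
        return source.remove(params.get("removeField"));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("user", "张三");
        doc.put("hobby", "hiking");

        applyTo(doc, buildParams("hobby"));
        System.out.println(doc.containsKey("hobby")); // false: the field is gone
        applyTo(doc, buildParams("hobby"));           // removing it again is a no-op
    }
}
```

Because removing a missing key from a map is a no-op, this script does not need the existence check that Case 2 required.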

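Case 5 (Requirement 5: append a value to a field that already has a value, without overwriting the existing one)

Note that add only works when hobby already holds a list. If it holds a single string (as it does after Case 1), the script has to turn the value into a list first.

POST twitter/_update_by_query
{
  "query": {
    "term": {
      "user": "张三"
    }
  },
  "script": {
    "source": "ctx._source.hobby.add(params.hobby)",
    "params": {
      "hobby": "做运动"
    }
  }
}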

The Java code is omitted here; if you followed the earlier cases, it should be easy to write.
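
The one-line script assumes hobby is already a list. As a sketch of a more defensive version, the plain-Java model below covers the three situations a document can be in: no hobby yet, a single value, or an existing list. A Map stands in for ctx._source; AppendHobbySketch and appendHobby are names I made up, and the same branching could be written in Painless with a null check and instanceof.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AppendHobbySketch {
    // source stands in for ctx._source, hobby for params.hobby.
    @SuppressWarnings("unchecked")
    static void appendHobby(Map<String, Object> source, Object hobby) {
        Object current = source.get("hobby");
        if (current == null) {
            // No value yet: start a new list.
            List<Object> list = new ArrayList<>();
            list.add(hobby);
            source.put("hobby", list);
        } else if (current instanceof List) {
            // Already a list: append, keeping the existing entries.
            ((List<Object>) current).add(hobby);
        } else {
            // Single value: wrap it in a list so the old value is not overwritten.
            List<Object> list = new ArrayList<>();
            list.add(current);
            list.add(hobby);
            source.put("hobby", list);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("hobby", "hiking");     // single string, as written by Case 1
        appendHobby(doc, "sports");     // becomes a two-element list
        appendHobby(doc, "reading");    // appended to the existing list
        System.out.println(doc.get("hobby"));
    }
}
```

This version does not check for duplicates; if the same hobby must not appear twice, a contains check before add would be needed.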
