A Beginner's Tutorial for Understanding and Implementing a CRUD App Using Elasticsearch and C#: Part 1

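Introduction

Background

What exactly is Elasticsearch?

Getting started

1) Dependencies

2) API

3) Configuration

4) Run!

5) IDE for querying

Writing Elastic commands

Mapping

Inserting rows

Updating

Deleting

Querying

Finding exact values

Combining Boolean filters

Finding multiple exact values

Range queries

Aggregations

Counting names

Getting the minimum enrollment fee

Getting the average age

Getting a multi-value aggregation

Nested aggregations

Conclusion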

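In this part, we demonstrate how to set up Elasticsearch and write basic statements. You will learn the structure, commands, and tools of the Elasticsearch API, and get it up and running with standard settings.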

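Introduction

In this two-part article, we discuss the basics of the Elasticsearch API.

In Part 1, we walk through its structure, commands, and tools, and get it up and running with standard settings. We will also create a simple Windows Forms application to demonstrate CRUD operations, showing some of Elasticsearch's great features along the way.

In Part 2, we explore the integration between Elasticsearch and .NET.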

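Background

If you are interested in this article, I assume you already have some experience with Lucene, or have at least heard of it. It is clearly a hot topic in the industry lately.

Since Lucene is the core technology behind Elasticsearch, it is important to understand how it works before using Elastic.

I have worked with Lucene for a year, and my personal experience with Elastic started a few months ago, when our company decided to migrate the core of its BI from "pure" Lucene to Elastic. My main source of information has been the official website.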

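What exactly is Elasticsearch?

As mentioned before, Elastic runs on top of Lucene storage. Among other things, it saves us the tedious work of building standard features such as indexing, querying, aggregating, and distributing the physical index files across servers.

This is great news for those who have used several frameworks, or even written their own set of classes, to deal with "pure" Lucene. It is a really productive API.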

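Getting started

So, let's begin the "fun" part (I am sure I am not the only developer who dislikes environment setup!). First, you need to install a few things and adjust some settings:

1) Dependencies

Elasticsearch requires a recent version of Java. You should install the latest version from the official Java website.

2) API

You can download Elasticsearch from the Elasticsearch website. Note that the settings and commands in this article (for example bootstrap.mlockall, the "string" field type, and the "filtered" query) follow the Elasticsearch 1.x syntax, so pick a 1.x release if you want to run them unchanged.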

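3) Configuration

Before starting the Elastic server, you must change the following settings in the [Installation Path]\config\elasticsearch.yml file:

Uncomment it and choose a name without spaces:

cluster.name: new_name

Uncomment it and choose a name without spaces:

node.name: new_node_name

Uncomment it and set the value to "true":

bootstrap.mlockall: true

Uncomment it and set the value to "127.0.0.1":

network.host: 127.0.0.1

These two settings are not present by default; you can paste them directly at the end of the file:

script.inline: on

script.engine.groovy.inline.aggs: on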

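4) Run!

Starting the server is straightforward:

Open a command prompt as administrator.
Go to the folder where you installed Elastic.
Go to the bin folder.
Type elasticsearch.bat and press Enter.

You can check that it is alive by opening the following URL in your browser:

http://localhost:9200/_cluster/health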
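If you prefer to check the node from a script instead of the browser, the same health call can be sketched in a few lines of Python. This is only a minimal sketch using the standard library: the function names cluster_status and fetch_cluster_status are my own, and it assumes the node listens on localhost:9200 as configured above.

```python
import json
import urllib.request


def cluster_status(health_json):
    """Return the "status" field (green, yellow or red) of a _cluster/health response body."""
    return json.loads(health_json)["status"]


def fetch_cluster_status(base_url="http://localhost:9200"):
    """Call the health endpoint of a running node and return its status.

    base_url is an assumption: it matches the network.host value used in this article.
    """
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=5) as response:
        return cluster_status(response.read().decode("utf-8"))
```

With the server running, calling fetch_cluster_status() returns the status string; if the node is down, urllib raises an error instead, which is itself a quick way to notice the server is not alive.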

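5) IDE for querying

To test your brand-new storage, you definitely need a good IDE. Fortunately, Elastic provides one.

I have tried other IDEs, but Marvel and Sense are by far the best. Perform this step once the Elastic server is up and running. The installation command is: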

 [Installation Path]\bin> plugin -i elasticsearch/marvel/latest

You can then access these tools through your browser:

Marvel (health monitor):

http://localhost:9200/_plugin/marvel/kibana/index.html#/dashboard/file/marvel.overview.json

Sense (IDE for querying):

http://localhost:9200/_plugin/marvel/sense/index.html

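Writing Elastic commands

Well, if you survived the installation session, the fun part starts now!

As you may have realized at this stage, Elastic is a RESTful API, so its commands are entirely based on JSON. That is good news, since JSON is widely used in the industry today.

Roughly speaking, what I show here is similar to what we would create in a relational database. With that in mind, let's start with the equivalents of the "create database" and "create table" statements.

Mapping

Mapping is how we tell Elastic how to create our "tables". Through the mapping, you define the structure of your documents, the types of their fields, and so on.

We will use a hypothetical (and not very creative!) entity, "Customer". The command you have to write in the Sense IDE is: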

PUT crud_sample
{
  "mappings": {
    "Customer_Info" : {
      "properties": { 
        "_id":{
         "type": "long"
        },
        "name":{
          "type": "string",
          "index" : "not_analyzed"
        },
        "age":{
          "type": "integer"
        },
        "birthday":{
          "type": "date",
          "format": "basic_date"
        },
        "hasChildren":{
          "type": "boolean"
        },
        "enrollmentFee":{
          "type": "double"
        }
      }
    }
  }
}

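You can test whether the mapping is correct with the following command:

GET /crud_sample/_mapping

As a result, you should get back the mapping defined above.

Suppose we forgot to create a field. No problem, you can add it as follows: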

PUT /crud_sample/_mapping/Customer_Info
{
  "properties" : {
    "opinion" : {
      "type" : "string",
      "index" : "not_analyzed"
    }
  }
}

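You can check the result with the previous mapping command.

Inserting rows

Inserting a new row (indexing is the correct term) is very simple: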

PUT /crud_sample/Customer_Info/1
{
  "age" : 32,
  "birthday": "19830120",
  "enrollmentFee": 175.25,
  "hasChildren": false,
  "name": "PH",
  "opinion": "It's Ok, I guess..."
}

You can check it with the following command:

GET /crud_sample/Customer_Info/_search

However, inserting row by row can be a bit painful. Fortunately, we can use a bulk load, as follows (each action and each document must sit on its own single line):

POST /crud_sample/Customer_Info/_bulk
{"index": { "_id": 1 }}
{"age" : 32, "birthday": "19830120", "enrollmentFee": 175.25, "hasChildren": false, "name": "PH", "opinion": "It's cool, I guess..." }
{"index": { "_id": 2 }}
{"age" : 32, "birthday": "19830215", "enrollmentFee": 175.25, "hasChildren": true, "name": "Marcel", "opinion": "It's very nice!" }
{"index": { "_id": 3 }}
{"age" : 62, "birthday": "19530215", "enrollmentFee": 205.25, "hasChildren": false, "name": "Mayra", "opinion": "I'm too old for that!" }
{"index": { "_id": 4 }}
{"age" : 32, "birthday": "19830101", "enrollmentFee": 100.10, "hasChildren": false, "name": "Juan", "opinion": "¿Qué tal estás?" }
{"index": { "_id": 5 }}
{"age" : 30, "birthday": "19850101", "enrollmentFee": 100.10, "hasChildren": true, "name": "Cezar", "opinion": "Just came for the food..." }
{"index": { "_id": 6 }}
{"age" : 42, "birthday": "19730101", "enrollmentFee": 50.00, "hasChildren": true, "name": "Vanda", "opinion": "Where am I again?" }
{"index": { "_id": 7 }}
{"age" : 42, "birthday": "19730101", "enrollmentFee": 65.00, "hasChildren": false, "name": "Nice", "opinion": "What were u saying again?" }
{"index": { "_id": 8 }}
{"age" : 22, "birthday": "19930101", "enrollmentFee": 150.10, "hasChildren": false, "name": "Telks", "opinion": "Can we go out now?" }
{"index": { "_id": 9 }}
{"age" : 32, "birthday": "19830120", "enrollmentFee": 175.25, "hasChildren": false, "name": "Rafael", "opinion": "Should be fine..." }
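Writing a bulk body by hand gets tedious quickly, so here is a rough sketch of how such a payload could be generated. It only builds the text; sending it is left out. The helper name build_bulk_body is hypothetical, and the sketch assumes the documents are kept in a dictionary keyed by id.

```python
import json


def build_bulk_body(docs):
    """Build a newline-delimited _bulk payload from {id: document} pairs.

    Every action line and every document line is serialized on a single line,
    and the payload ends with a newline, as the bulk endpoint expects.
    """
    lines = []
    for doc_id, doc in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"


sample = {
    1: {"age": 32, "birthday": "19830120", "name": "PH"},
    4: {"age": 32, "birthday": "19830101", "name": "Juan", "opinion": "¿Qué tal estás?"},
}
print(build_bulk_body(sample))
```

Using ensure_ascii=False keeps text such as "¿Qué tal estás?" readable in the payload; either form is valid JSON.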

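Now, if you run the search statement, you will see 9 hits, which is pretty much everything we have so far. If you want to inspect one specific customer, however, just add its ID to the end of the URL: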

GET crud_sample/Customer_Info/3

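Updating

Elastic is smart enough to tell whether you are adding a new document or updating an existing one, based on the "id" in the statement. For example, suppose you need to change the "opinion" of customer number 3: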

POST /crud_sample/Customer_Info/3/_update
{
  "doc": {
    "opinion": "I'm really too old for it."
  }
}
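To make the difference between indexing and updating concrete, here is a small in-memory sketch of the two behaviours. It does not talk to Elastic at all: a plain dictionary plays the role of the index, and the function names are my own. Indexing with an existing id replaces the whole document, while an update with a "doc" body merges the given fields into what is already stored. The merge below is shallow, which is a simplification that is enough for flat documents like ours.

```python
def index_document(store, doc_id, document):
    """Mimic PUT /index/type/id: the stored document is replaced as a whole."""
    store[doc_id] = dict(document)


def update_document(store, doc_id, partial):
    """Mimic POST .../id/_update with a "doc" body: fields are merged into the stored document."""
    store[doc_id] = {**store[doc_id], **partial}


store = {}
index_document(store, 3, {"name": "Mayra", "age": 62, "opinion": "I'm too old for that!"})
update_document(store, 3, {"opinion": "I'm really too old for it."})
print(store[3])  # name and age are untouched, only the opinion changed
```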

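Deleting

As with any database, you must use these commands carefully. The main options here are:

Delete the entire storage:

DELETE /crud_sample

Delete a specific customer:

DELETE /crud_sample/Customer_Info/1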

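Querying

Elastic is extremely resourceful when it comes to querying. I will cover the basics needed for our CRUD application. Before running the query samples below, you may want to add a few extra rows/documents.

Finding exact values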

GET /crud_sample/Customer_Info/_search
{
    "query" : {
        "filtered" : { 
            "query" : {
                "match_all" : {} 
            },
            "filter" : {
                "term" : { 
                    "opinion" : "It's cool, I guess..."
                }
            }
        }
    }
}

As a response, you should get the documents whose "opinion" matches the query.

Combining Boolean filters

GET /crud_sample/Customer_Info/_search
{
   "query" : {
      "filtered" : { 
         "filter" : {
            "bool" : {
              "must" : {
                 "term" : {"hasChildren" : false} 
              },
              "must_not": [ 
                { "term": { "name": "PH"  }},
                { "term": { "name": "Felix"  }}
              ],
              "should" : [
                 { "term" : {"age" : 30}}, 
                 { "term" : {"age" : 31}}, 
                 { "term" : {"age" : 32}} 
              ]
           }
         }
      }
   }
}

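Notice that we combine three clauses here:

"must": the clause must appear in the matching documents.
"must_not": the clause must not appear in the matching documents.
"should": the clause should appear in the matching documents, but it is not required.

Finding multiple exact values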

GET /crud_sample/Customer_Info/_search
{
    "query" : {
        "filtered" : {
            "filter" : {
                "terms" : { 
                    "age" : [22, 62]
                }
            }
        }
    }
}

As shown above, Elastic lets you supply multiple values for the same field.
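If it helps to think about it in plain code, a "terms" filter behaves like a membership test against a list of values. The sketch below runs that test in memory over the ages from our bulk load; the terms_filter function is only an illustration of the idea, not something Elastic exposes.

```python
# Ages taken from the nine customers inserted with the bulk load.
customers = {"PH": 32, "Marcel": 32, "Mayra": 62, "Juan": 32, "Cezar": 30,
             "Vanda": 42, "Nice": 42, "Telks": 22, "Rafael": 32}


def terms_filter(ages_by_name, wanted):
    """Keep the names whose age equals any of the wanted values."""
    wanted = set(wanted)
    return [name for name, age in ages_by_name.items() if age in wanted]


print(terms_filter(customers, [22, 62]))  # ['Mayra', 'Telks']
```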

Range queries

In this example, we fetch all documents whose enrollment fee is at least 10 and less than 100:

GET /crud_sample/Customer_Info/_search
{
    "query" : {
        "filtered" : {
            "filter" : {
                "range" : {
                    "enrollmentFee" : {
                        "gte" : 10,
                        "lt"  : 100
                    }
                }
            }
        }
    }
}
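The four range operators are easy to mix up, so here is a minimal in-memory sketch of what they mean: "gte" and "lte" include the boundary, "gt" and "lt" exclude it. The range_filter helper is hypothetical and simply replays the filter above over the enrollment fees from our bulk load.

```python
# Enrollment fees taken from the nine customers inserted with the bulk load.
fees = {"PH": 175.25, "Marcel": 175.25, "Mayra": 205.25, "Juan": 100.10, "Cezar": 100.10,
        "Vanda": 50.00, "Nice": 65.00, "Telks": 150.10, "Rafael": 175.25}


def range_filter(values, gte=None, gt=None, lte=None, lt=None):
    """Keep the names whose value satisfies every bound that was supplied."""
    def in_range(value):
        return ((gte is None or value >= gte) and (gt is None or value > gt)
                and (lte is None or value <= lte) and (lt is None or value < lt))
    return [name for name, value in values.items() if in_range(value)]


print(range_filter(fees, gte=10, lt=100))  # ['Vanda', 'Nice']
```

Juan and Cezar, at 100.10, fall just outside because the upper bound is exclusive.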

Now, a date range on the birthday field:

GET /crud_sample/Customer_Info/_search
{
    "query" : {
        "filtered" : {
            "filter" : {
                "range" : {
                    "birthday" : {
                        "gt" : "19820101",
                        "lt" : "19840101"
                    }
                }
            }
        }
    }
}

And combining date and numeric fields:

GET /crud_sample/Customer_Info/_search
{
  "query" : {
    "filtered" : {
        "filter" : {
          "bool": { 
            "must": [
                {"range": {"enrollmentFee": { "gte": 100, "lte": 200 }}},
                {"range": {"birthday": { "gte": "19850101" }}}
            ]
          }
        }
     }
  }
}
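As a sanity check for the combined filter, the same two conditions can be replayed in memory over the bulk-loaded customers. This is just a sketch: it assumes the "basic_date" format from our mapping corresponds to yyyyMMdd, and the helper names are my own.

```python
from datetime import date, datetime

# (name, birthday in basic_date format, enrollment fee) from the bulk load.
customers = [
    ("PH", "19830120", 175.25), ("Marcel", "19830215", 175.25), ("Mayra", "19530215", 205.25),
    ("Juan", "19830101", 100.10), ("Cezar", "19850101", 100.10), ("Vanda", "19730101", 50.00),
    ("Nice", "19730101", 65.00), ("Telks", "19930101", 150.10), ("Rafael", "19830120", 175.25),
]


def parse_basic_date(text):
    """Parse a yyyyMMdd string such as "19850101" into a date."""
    return datetime.strptime(text, "%Y%m%d").date()


def matches(birthday, fee):
    """Both "must" clauses have to hold: fee within [100, 200] and birthday on or after 1985-01-01."""
    return 100 <= fee <= 200 and parse_basic_date(birthday) >= date(1985, 1, 1)


print([name for name, birthday, fee in customers if matches(birthday, fee)])  # ['Cezar', 'Telks']
```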

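Aggregations

For big data analysis, aggregations are the icing on the cake. After all the steps a BI project requires, this is where you start making sense of a large amount of data.

Aggregations let you calculate and summarize data about the current query on the fly.

They can be used for all sorts of tasks, such as dynamic counts, averages, minimum and maximum values, percentiles, and so on.

As a rough comparison, a count aggregation for our Customer entity in SQL Server would be:

Select Count(id) From customer

Let's work through some samples against our index:

Counting names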

GET /crud_sample/Customer_Info/_search?search_type=count
{
  "aggregations": {
    "my_agg": {
      "terms": {
        "field": "name",
         "size": 1000
      }
    }
  }
}

As a response, you should get one bucket per name together with its document count.
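Conceptually, a "terms" aggregation is a group-and-count over the field values. A rough in-memory equivalent, using the names from our bulk load, could look like the sketch below; terms_aggregation is an illustrative name, and the size argument plays the same role as "size" in the request.

```python
from collections import Counter

names = ["PH", "Marcel", "Mayra", "Juan", "Cezar", "Vanda", "Nice", "Telks", "Rafael"]


def terms_aggregation(values, size=1000):
    """Return up to `size` (value, count) buckets, most frequent first."""
    return Counter(values).most_common(size)


for name, doc_count in terms_aggregation(names):
    print(name, doc_count)
```

Since every customer has a different name, each bucket has a count of 1; insert a second "PH" and that bucket moves to the top with a count of 2.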

Getting the minimum enrollment fee

GET /crud_sample/Customer_Info/_search?search_type=count
{
    "aggs" : {
        "min_price" : { "min" : { "field" : "enrollmentFee" } }
    }
}

The syntax is very similar; essentially only the aggregation keyword changes. In this case, we use "min".

Getting the average age

GET /crud_sample/Customer_Info/_search?search_type=count
{
    "aggs" : {
        "avg_grade" : { "avg" : { "field" : "age" } }
    }
}

Now, calculating the average...

Getting a multi-value aggregation

GET /crud_sample/Customer_Info/_search?search_type=count
{
    "aggs" : {
        "grades_stats" : { "extended_stats" : { "field" : "enrollmentFee" } }
    }
}

This is a useful resource: running it returns several aggregations at once.
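To get a feeling for what comes back, the sketch below computes the same kind of metrics in memory over the enrollment fees from our bulk load. The dictionary keys mirror the field names of the response, but the function itself is only an illustration, and it assumes the variance is the population variance.

```python
import math

# Enrollment fees of the nine customers inserted with the bulk load.
fees = [175.25, 175.25, 205.25, 100.10, 100.10, 50.00, 65.00, 150.10, 175.25]


def extended_stats(values):
    """Compute the metrics reported by an extended_stats aggregation."""
    count = len(values)
    total = sum(values)
    avg = total / count
    sum_of_squares = sum(v * v for v in values)
    # Assumption: population variance; clamped at zero to absorb rounding noise.
    variance = max(sum_of_squares / count - avg * avg, 0.0)
    return {
        "count": count, "min": min(values), "max": max(values),
        "avg": avg, "sum": total, "sum_of_squares": sum_of_squares,
        "variance": variance, "std_deviation": math.sqrt(variance),
    }


stats = extended_stats(fees)
print(stats["count"], stats["min"], stats["max"])  # 9 50.0 205.25
```

The "min" and "avg" aggregations shown earlier return just one of these values each.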

Nested aggregations

GET /crud_sample/Customer_Info/_search?search_type=count
{
   "aggs": {
      "colors": {
         "terms": {
            "field": "hasChildren"
         },
         "aggs": { 
            "avg_age": { 
               "avg": {
                  "field": "age" 
               }
            }
         }
      }
   }
}

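Finally, the nested aggregation. This one is interesting: the statement above groups the aggregation by the 'hasChildren' field, and inside each of its values (true or false) you will find the average age.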
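In plain code, this nesting is a "group by, then aggregate inside each group". The sketch below replays it in memory over the customers from our bulk load; the function name is my own and nothing here calls Elastic.

```python
from collections import defaultdict

# (hasChildren, age) for customers 1 to 9 from the bulk load.
customers = [(False, 32), (True, 32), (False, 62), (False, 32), (True, 30),
             (True, 42), (False, 42), (False, 22), (False, 32)]


def avg_age_by_has_children(rows):
    """Bucket the rows by hasChildren, then compute the average age inside each bucket."""
    buckets = defaultdict(list)
    for has_children, age in rows:
        buckets[has_children].append(age)
    return {key: sum(ages) / len(ages) for key, ages in buckets.items()}


result = avg_age_by_has_children(customers)
print(result[False])  # 37.0
print(round(result[True], 2))  # 34.67
```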

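Conclusion

The purpose of this article was to demonstrate how to set up Elasticsearch and write basic statements. All of these concepts will be applied in our CRUD application.

Our main goal is to combine all of this with .NET, which will be done in the next (and final) article.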
