Introduction to Apache Druid and Cluster Deployment of Version 0.22.1

Bulut0907, published 2022-06-20 09:18:02

Contents
  • 1. Introduction to Apache Druid
  • 2. Deployment plan
  • 3. Installation requirements
  • 4. Download and extract (on bigdata001)
  • 5. Modify the configuration (on bigdata001)
    • 5.1 Druid common configuration
    • 5.2 Modify the Master service configuration
    • 5.3 Modify the Data service configuration
    • 5.4 Modify the Query service configuration
  • 6. Distribute the Druid directory (on bigdata001)
  • 7. Startup (on all servers)
    • 7.1 Settings for running multiple services on one server
    • 7.2 Start the Master services
    • 7.3 Start the Data services
    • 7.4 Start the Query services
  • 8. Stop the services
  • 9. View the Druid console

1. Introduction to Apache Druid

Apache Druid is well suited to event-oriented data.

Druid's main strengths:

  1. Columnar storage
  2. Real-time or batch ingestion: data ingested in real time is immediately available for querying
  3. Self-healing and self-balancing: to scale the cluster, just add or remove services and the cluster rebalances itself in the background; configuration changes do not require downtime
  4. A cloud-native, fault-tolerant architecture that does not lose data: data is stored on HDFS, decoupling compute from storage
  5. Indexes for fast filtering: Druid builds bitmap indexes compressed with CONCISE or Roaring to support fast filtering and searches across multiple columns
  6. Time-based partitioning: Druid partitions data by time first, and can additionally partition on other fields
  7. Approximate algorithms: Druid ships algorithms for approximate count-distinct, approximate ranking, and approximate histograms and quantiles; exact count-distinct and exact ranking are also available (see the query sketch after this list)
  8. Automatic roll-up at ingestion time: Druid can optionally summarize (roll up) data as it is ingested
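Once a cluster is running, a minimal sketch of what the approximate functions look like in practice. Druid SQL is submitted over HTTP; the host, port, datasource name (events) and column (user_id) below are placeholders, not part of this deployment:

# Hedged example: APPROX_COUNT_DISTINCT is a built-in Druid SQL function.
# Replace <broker-or-router-host>, the datasource and the column with your own.
curl -s -X POST http://<broker-or-router-host>:8888/druid/v2/sql \
  -H 'Content-Type: application/json' \
  -d '{"query":"SELECT FLOOR(__time TO DAY) AS day, APPROX_COUNT_DISTINCT(user_id) AS uv FROM events GROUP BY 1"}'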

Scenarios where Druid is a good fit:

  1. Insert rates are high, but data is rarely updated
  2. The data has a time component; Druid includes optimizations and design choices specifically around time
  3. In multi-table scenarios, each query hits a single large distributed table, possibly together with several smaller lookup tables

Scenarios where Druid is not a good fit:

  1. Low-latency updates of existing records by primary key: Druid only supports batch updates, not streaming updates
  2. Joining one big fact table with another big fact table
2. Deployment plan

The minimal installation requires 1 Master server, 2 Data servers (for fault tolerance), and 1 Query server.

Therefore we deploy the Master (Coordinator and Overlord), Data (Historical and MiddleManager), and Query (Broker and Router) services on all three servers: bigdata001, bigdata002, and bigdata003.

What each service does:

  • The Coordinator and Overlord processes manage the cluster's metadata and coordination needs
  • The Historical and MiddleManager processes handle the actual data in the cluster
  • The Broker service receives query requests and forwards them to the rest of the cluster
3. Installation requirements
  • Install Java 8 on all three servers; see the earlier posts on installing Java 8 alongside OpenJDK 11 on CentOS 7 and on Windows. A quick verification sketch follows.
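A hedged check that Java 8 is the active JVM on every node (assumes passwordless SSH as root and the bigdata001-003 hostnames used throughout this post):

# Print the active Java version on each server
for host in bigdata001 bigdata002 bigdata003; do
  echo "== $host =="
  ssh root@$host 'java -version 2>&1 | head -n 1'
done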
4. Download and extract (on bigdata001)
[root@bigdata001 opt]# wget --no-check-certificate https://dlcdn.apache.org/druid/0.22.1/apache-druid-0.22.1-bin.tar.gz
[root@bigdata001 opt]#
[root@bigdata001 opt]# tar -zxvf apache-druid-0.22.1-bin.tar.gz
[root@bigdata001 opt]#
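Optionally verify the download before extracting; the matching .sha512 file is published alongside the tarball on the Apache mirrors, so the printed digest can be compared against it:

# Hedged sketch: integrity check of the downloaded tarball
sha512sum apache-druid-0.22.1-bin.tar.gz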
5. Modify the configuration (on bigdata001)

The configuration files live under apache-druid-0.22.1/conf/druid/cluster.

Exception 1: if an error like the following is reported,

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000400000000, 16106127360, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 16106127360 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /opt/apache-druid-0.22.1/hs_err_pid9178.log

then reduce the JVM memory in the jvm.config of the affected service (cluster/master/druid_service/jvm.config, where druid_service stands for that service's directory), for example:

-Xms1g
-Xmx1g
-XX:MaxDirectMemorySize=1536m
5.1 Druid common configuration
  1. Edit cluster/_common/common.runtime.properties

Comment out the Derby metadata store, the local-file deep storage, and the local-file indexing service logs:

#druid.metadata.storage.type=derby
#druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/var/druid/metadata.db;create=true
#druid.metadata.storage.connector.host=localhost
#druid.metadata.storage.connector.port=1527

#druid.storage.type=local
#druid.storage.storageDirectory=var/druid/segments

#druid.indexer.logs.type=file
#druid.indexer.logs.directory=var/druid/indexing-logs

Switch to a MySQL metadata store, HDFS deep storage, HDFS indexing service logs, and a remote ZooKeeper service:

druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches", "mysql-metadata-storage"]

druid.host=bigdata001

druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://bigdata005:3306/druid
druid.metadata.storage.connector.user=root
druid.metadata.storage.connector.password=Root_123

druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments

druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs

druid.zk.service.host=bigdata001:2181,bigdata002:2181,bigdata003:2181

If you need to load data from Hadoop, also add the following configuration:

# An HDFS directory used for temporarily storing data when loading from Hadoop
druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing
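Since deep storage and the indexing-service logs now point at HDFS, the corresponding paths can be created up front. A hedged sketch, assuming the hdfs client is on the PATH (Druid can usually create these itself when it has write permission):

# Pre-create the HDFS paths referenced in the properties above
hdfs dfs -mkdir -p /druid/segments
hdfs dfs -mkdir -p /druid/indexing-logs
hdfs dfs -mkdir -p /tmp/druid-indexing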
  2. Copy the Hadoop configuration files into Druid's configuration directory
[root@bigdata001 _common]# pwd
/opt/apache-druid-0.22.1/conf/druid/cluster/_common
[root@bigdata001 _common]# 
[root@bigdata001 _common]# cp /opt/hadoop-3.3.1/etc/hadoop/core-site.xml .
[root@bigdata001 _common]# cp /opt/hadoop-3.3.1/etc/hadoop/hdfs-site.xml .
[root@bigdata001 _common]# cp /opt/hadoop-3.3.1/etc/hadoop/yarn-site.xml .
[root@bigdata001 _common]# cp /opt/hadoop-3.3.1/etc/hadoop/mapred-site.xml .
[root@bigdata001 _common]# 
  3. Add the mysql-connector-java-8.0.25.jar package to the apache-druid-0.22.1/extensions/mysql-metadata-storage directory, and create a druid database in MySQL (a hedged sketch follows)
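A hedged sketch of those two steps, reusing the connection settings from common.runtime.properties above; the driver's local path and the utf8mb4 character set are assumptions, adjust them to your environment:

# Install the JDBC driver and create the metadata database
cp mysql-connector-java-8.0.25.jar /opt/apache-druid-0.22.1/extensions/mysql-metadata-storage/
mysql -h bigdata005 -uroot -pRoot_123 -e "CREATE DATABASE druid DEFAULT CHARACTER SET utf8mb4;"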
5.2 Modify the Master service configuration

Edit cluster/master/coordinator-overlord/runtime.properties

Change the following:

druid.plaintextPort=9081
5.3 Modify the Data service configuration
  1. Edit cluster/data/historical/runtime.properties

Change the following:

druid.processing.buffer.sizeBytes=256MiB
druid.processing.numMergeBuffers=1
druid.processing.numThreads=3
  2. Edit cluster/data/middleManager/runtime.properties, changing the following:
druid.worker.capacity=1

druid.indexer.fork.property.druid.processing.numMergeBuffers=1
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100MiB
druid.indexer.fork.property.druid.processing.numThreads=1
5.4 Modify the Query service configuration

Edit cluster/query/broker/runtime.properties

Change the following:

druid.processing.buffer.sizeBytes=100MiB
6. Distribute the Druid directory (on bigdata001)

Copy the apache-druid-0.22.1 directory from bigdata001 to bigdata002 and bigdata003:

[root@bigdata001 opt]# scp -r apache-druid-0.22.1 root@bigdata002:/opt
[root@bigdata001 opt]#

On bigdata002 and bigdata003, change druid.host in cluster/_common/common.runtime.properties to the local hostname; a hedged sketch follows.
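A hedged sketch of that last change, assuming passwordless SSH as root and the install path used throughout this post:

# Point druid.host at each node's own hostname
for host in bigdata002 bigdata003; do
  ssh root@$host "sed -i 's/^druid.host=.*/druid.host=$host/' \
    /opt/apache-druid-0.22.1/conf/druid/cluster/_common/common.runtime.properties"
done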

7. Startup (on all servers)

7.1 Settings for running multiple services on one server

If the start scripts are run as-is on one server, the following error is reported:

Cannot lock svdir, maybe another 'supervise' is running: /opt/apache-druid-0.22.1/var/sv

This happens because the sv directory is already locked by another service, so each start script needs its own working directory. Add -d "/opt/apache-druid-0.22.1/var_xxxxxx" to each of the three start scripts:


[root@bigdata001 bin]# 
[root@bigdata001 bin]# cat start-cluster-master-no-zk-server
#!/bin/bash -eu

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

PWD="$(pwd)"
WHEREAMI="$(dirname "$0")"
WHEREAMI="$(cd "$WHEREAMI" && pwd)"

exec "$WHEREAMI/supervise" -c "$WHEREAMI/../conf/supervise/cluster/master-no-zk.conf" -d "/opt/apache-druid-0.22.1/var_master"
[root@bigdata001 bin]# 
[root@bigdata001 bin]# cat start-cluster-data-server 
#!/bin/bash -eu

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

PWD="$(pwd)"
WHEREAMI="$(dirname "$0")"
WHEREAMI="$(cd "$WHEREAMI" && pwd)"

exec "$WHEREAMI/supervise" -c "$WHEREAMI/../conf/supervise/cluster/data.conf" -d "/opt/apache-druid-0.22.1/var_data"
[root@bigdata001 bin]# 
[root@bigdata001 bin]# cat start-cluster-query-server 
#!/bin/bash -eu

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

PWD="$(pwd)"
WHEREAMI="$(dirname "$0")"
WHEREAMI="$(cd "$WHEREAMI" && pwd)"

exec "$WHEREAMI/supervise" -c "$WHEREAMI/../conf/supervise/cluster/query.conf" -d "/opt/apache-druid-0.22.1/var_query"
[root@bigdata001 bin]#

Create Druid's temporary directory:

[root@bigdata002 apache-druid-0.22.1]# 
[root@bigdata002 apache-druid-0.22.1]# mkdir var/tmp
[root@bigdata002 apache-druid-0.22.1]#
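The same temporary directory is needed on every node; a hedged sketch (mkdir -p is harmless if the directory already exists):

# Create var/tmp on all three servers
for host in bigdata001 bigdata002 bigdata003; do
  ssh root@$host 'mkdir -p /opt/apache-druid-0.22.1/var/tmp'
done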
7.2 Start the Master services
[root@bigdata001 ~]# nohup sh /opt/apache-druid-0.22.1/bin/start-cluster-master-no-zk-server > /opt/apache-druid-0.22.1/bin/start-cluster-master-no-zk-server.log 2>&1 &
[1] 12085
[root@bigdata001 ~]# 

The Master service log files are located under var_master/sv.
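To confirm a clean start you can follow the per-service log files that supervise writes there (a hedged sketch; the file names follow the command names defined in conf/supervise/cluster/master-no-zk.conf):

ls /opt/apache-druid-0.22.1/var_master/sv/            # list the per-service log files
tail -f /opt/apache-druid-0.22.1/var_master/sv/*.log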

7.3 Start the Data services
[root@bigdata001 ~]# nohup sh /opt/apache-druid-0.22.1/bin/start-cluster-data-server > /opt/apache-druid-0.22.1/bin/start-cluster-data-server.log 2>&1 &
[2] 17919
[root@bigdata001 ~]# 

The Data service log files are located under var_data/sv.

7.4 Start the Query services
[root@bigdata001 ~]# nohup sh /opt/apache-druid-0.22.1/bin/start-cluster-query-server > /opt/apache-druid-0.22.1/bin/start-cluster-query-server.log 2>&1 &
[3] 20487
[root@bigdata001 ~]# 
[root@bigdata001 ~]# exit
logout
Connection closing...Socket close.

Connection closed by foreign host.

Disconnected from remote host(bigdata001) at 22:40:24.

  • The Query service log files are located under var_query/sv
  • Because the services were started with nohup, you must leave the session with exit. If the client disconnects without exiting cleanly, then on the next connection you will find the log below, and all Druid services on that node will have stopped automatically
[root@bigdata001 ~]# cat /opt/apache-druid-0.22.1/bin/start-cluster-query-server.log 
nohup: ignoring input
[Mon Mar 28 19:39:45 2022] Running command[broker], logging to[/opt/apache-druid-0.22.1/var_query/sv/broker.log]: bin/run-druid broker conf/druid/cluster/query
[Mon Mar 28 19:39:45 2022] Running command[router], logging to[/opt/apache-druid-0.22.1/var_query/sv/router.log]: bin/run-druid router conf/druid/cluster/query
[Mon Mar 28 22:25:47 2022] Sending signal[15] to command[broker] (timeout 360s).
[Mon Mar 28 22:25:47 2022] Sending signal[15] to command[router] (timeout 360s).
[Mon Mar 28 22:25:47 2022] Command[router] exited (pid = 31414, exited = 143)
[Mon Mar 28 22:25:47 2022] Command[broker] exited (pid = 31413, exited = 143)
[Mon Mar 28 22:25:47 2022] Exiting.
[root@bigdata001 ~]#
8. Stop the services
[root@bigdata002 var_master]# ps -ef | grep druid

Look up the process IDs and kill them; a hedged sketch follows.
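A hedged sketch: terminating a supervise parent makes it send SIGTERM to the services it manages, as the shutdown log in section 7.4 shows.

# Stop all Druid services on one node
ps -ef | grep supervise | grep -v grep      # locate the supervise PIDs
pkill -TERM -f 'bin/supervise'              # or: kill <PID> for each process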

9. View the Druid console

The Druid console is served by the Druid Router process and is accessed at http://bigdata001:9081.
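You can also confirm from the command line that the process behind that address is responding; /status is a standard HTTP endpoint exposed by every Druid service:

curl -s http://bigdata001:9081/status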

It looks as follows:

[Screenshot: Druid console]
