- 1. Introduction to Apache Druid
- 2. Deployment plan
- 3. Installation requirements
- 4. Download and extract (on bigdata001)
- 5. Modify the configuration (on bigdata001)
- 5.1 Druid common configuration
- 5.2 Modify the Master service configuration
- 5.3 Modify the Data service configuration
- 5.4 Modify the Query service configuration
- 6. Distribute the Druid directory (on bigdata001)
- 7. Start the services (on all servers)
- 7.1 Settings for running multiple services on one server
- 7.2 Start the master service
- 7.3 Start the data service
- 7.4 Start the query service
- 8. Stop the services
- 9. View the Druid console
1. Introduction to Apache Druid
Apache Druid is well suited to event-oriented data.
Druid's strengths:
- Columnar storage
- Real-time or batch ingestion: data ingested in real time is immediately available for queries
- Self-healing and self-balancing: to scale the cluster, simply add or remove servers and it rebalances itself automatically in the background; configuration changes require no downtime
- Cloud-native, fault-tolerant architecture that does not lose data: data is stored on HDFS, decoupling compute from storage
- Indexes for fast filtering: Druid builds bitmap indexes compressed with CONCISE or Roaring to support fast filtering and searching across multiple columns
- Time-based partitioning: Druid partitions data by time first, and can additionally partition on other fields
- Approximate algorithms: Druid ships algorithms for approximate count-distinct, approximate ranking, and approximate histograms and quantiles; exact count-distinct and exact ranking are also available
- Automatic roll-up at ingestion: Druid can optionally summarize (roll up) data as it is ingested
Scenarios where Druid fits:
- High ingest rates but relatively few updates
- Data with a time attribute; Druid is optimized and designed around time
- In multi-table scenarios, each query hits one large distributed table, possibly together with several smaller lookup tables
Scenarios where Druid does not fit:
- Low-latency updates of existing records by primary key; Druid supports batch updates but not streaming updates
- Joining one big fact table with another big fact table
2. Deployment plan
The minimal setup is 1 Master server, 2 Data servers (for fault tolerance), and 1 Query server.
So here we deploy the Master (Coordinator and Overlord), Data (Historical and MiddleManager), and Query (Broker and Router) services on all three servers: bigdata001, bigdata002, and bigdata003.
What each service does:
- The Coordinator and Overlord processes manage the cluster's metadata and coordination needs
- The Historical and MiddleManager processes handle the actual data in the cluster
- The Broker service receives queries and forwards them to the rest of the cluster
- The Router process serves the Druid web console and routes requests
3. Installation requirements
- Install Java 8 on all three servers; see the earlier posts on installing Java 8 alongside OpenJDK 11 on CentOS 7, and on Windows
4. Download and extract (on bigdata001)
[root@bigdata001 opt]# wget --no-check-certificate https://dlcdn.apache.org/druid/0.22.1/apache-druid-0.22.1-bin.tar.gz
[root@bigdata001 opt]#
[root@bigdata001 opt]# tar -zxvf apache-druid-0.22.1-bin.tar.gz
[root@bigdata001 opt]#
5. Modify the configuration (on bigdata001)
The configuration files are under the apache-druid-0.22.1/conf/druid/cluster directory
Exception 1: if the following error is reported
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000400000000, 16106127360, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 16106127360 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /opt/apache-druid-0.22.1/hs_err_pid9178.log
then reduce the JVM memory by editing the affected service's cluster/master/druid_service/jvm.config as follows:
-Xms1g
-Xmx1g
-XX:MaxDirectMemorySize=1536m
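On a cluster this edit is usually scripted rather than done by hand. A minimal sketch of the substitution, demonstrated on a scratch copy in /tmp (on a real node you would point the path at conf/druid/cluster/&lt;role&gt;/&lt;service&gt;/jvm.config; the sample values below are illustrative, not the shipped defaults):

```shell
# Scratch copy standing in for a real jvm.config
cat > /tmp/jvm.config <<'EOF'
-server
-Xms16g
-Xmx16g
-XX:MaxDirectMemorySize=25g
EOF

# Lower the heap and direct-memory limits to fit a small machine
sed -i -e 's/^-Xms.*/-Xms1g/' \
       -e 's/^-Xmx.*/-Xmx1g/' \
       -e 's/^-XX:MaxDirectMemorySize=.*/-XX:MaxDirectMemorySize=1536m/' /tmp/jvm.config
cat /tmp/jvm.config
```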
5.1 Druid common configuration
- Edit cluster/_common/common.runtime.properties
Comment out the Derby metadata store, local-file deep storage, and local-file indexing service logs:
#druid.metadata.storage.type=derby
#druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/var/druid/metadata.db;create=true
#druid.metadata.storage.connector.host=localhost
#druid.metadata.storage.connector.port=1527
#druid.storage.type=local
#druid.storage.storageDirectory=var/druid/segments
#druid.indexer.logs.type=file
#druid.indexer.logs.directory=var/druid/indexing-logs
Replace them with the MySQL metadata store, HDFS deep storage, HDFS indexing service logs, and the remote ZooKeeper ensemble:
druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches", "mysql-metadata-storage"]
druid.host=bigdata001
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://bigdata005:3306/druid
druid.metadata.storage.connector.user=root
druid.metadata.storage.connector.password=Root_123
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs
druid.zk.service.host=bigdata001:2181,bigdata002:2181,bigdata003:2181
If you also need to load data from Hadoop, add the following configuration
# An HDFS directory used to temporarily store data when loading from Hadoop
druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing
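Before starting Druid it is worth confirming that the ZooKeeper ensemble is reachable. A hedged sketch (assumes `nc` is installed; `ruok` is a standard ZooKeeper four-letter command, though recent ZooKeeper versions require it to be whitelisted via `4lw.commands.whitelist`):

```shell
# Ask each ZooKeeper node whether it is serving; warn when unreachable
for zk in bigdata001 bigdata002 bigdata003; do
  if echo ruok | nc -w 2 "$zk" 2181 2>/dev/null | grep -q imok; then
    echo "$zk: ok"
  else
    echo "$zk: not reachable"
  fi
done
```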
- Copy the Hadoop configuration files into Druid's configuration directory
[root@bigdata001 _common]# pwd
/opt/apache-druid-0.22.1/conf/druid/cluster/_common
[root@bigdata001 _common]#
[root@bigdata001 _common]# cp /opt/hadoop-3.3.1/etc/hadoop/core-site.xml .
[root@bigdata001 _common]# cp /opt/hadoop-3.3.1/etc/hadoop/hdfs-site.xml .
[root@bigdata001 _common]# cp /opt/hadoop-3.3.1/etc/hadoop/yarn-site.xml .
[root@bigdata001 _common]# cp /opt/hadoop-3.3.1/etc/hadoop/mapred-site.xml .
[root@bigdata001 _common]#
- Place the mysql-connector-java-8.0.25.jar package into the apache-druid-0.22.1/extensions/mysql-metadata-storage directory, and create the druid database in MySQL
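The database can be created like this (the utf8mb4 default character set follows the Druid MySQL metadata storage extension docs; run it as the root user configured above):

```sql
-- On bigdata005, the MySQL host referenced in common.runtime.properties
CREATE DATABASE druid DEFAULT CHARACTER SET utf8mb4;
```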
5.2 Modify the Master service configuration
- Edit cluster/master/coordinator-overlord/runtime.properties
Change the following:
druid.plaintextPort=9081
5.3 Modify the Data service configuration
- Edit cluster/data/historical/runtime.properties
Change the following:
druid.processing.buffer.sizeBytes=256MiB
druid.processing.numMergeBuffers=1
druid.processing.numThreads=3
- Edit cluster/data/middleManager/runtime.properties
Change the following:
druid.worker.capacity=1
druid.indexer.fork.property.druid.processing.numMergeBuffers=1
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100MiB
druid.indexer.fork.property.druid.processing.numThreads=1
5.4 Modify the Query service configuration
Edit cluster/query/broker/runtime.properties
Change the following:
druid.processing.buffer.sizeBytes=100MiB
6. Distribute the Druid directory (on bigdata001)
Copy the apache-druid-0.22.1 directory from bigdata001 to bigdata002 and bigdata003
[root@bigdata001 opt]# scp -r apache-druid-0.22.1 root@bigdata002:/opt
[root@bigdata001 opt]#
Then edit druid.host in cluster/_common/common.runtime.properties on bigdata002 and bigdata003 to match each server's hostname
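The per-host druid.host edit is easy to get wrong by hand. A small sketch of the substitution, demonstrated on a scratch copy in /tmp (on the real servers you would run the sed against conf/druid/cluster/_common/common.runtime.properties, for example over ssh):

```shell
# Scratch copy standing in for common.runtime.properties on each remote host
printf 'druid.host=bigdata001\n' > /tmp/common.runtime.properties

for host in bigdata002 bigdata003; do
  # On a real cluster: ssh root@$host "sed -i ..." against the real file
  sed "s/^druid.host=.*/druid.host=${host}/" /tmp/common.runtime.properties
done
```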
7. Start the services (on all servers)
7.1 Settings for running multiple services on one server
Running the startup scripts as shipped reports the following error:
Cannot lock svdir, maybe another 'supervise' is running: /opt/apache-druid-0.22.1/var/sv
Because the var/sv directory is already locked by another 'supervise' instance, each service needs its own working directory. Add -d "/opt/apache-druid-0.22.1/var_xxxxxx" to each of the three startup scripts:
[root@bigdata001 bin]#
[root@bigdata001 bin]# cat start-cluster-master-no-zk-server
#!/bin/bash -eu
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
PWD="$(pwd)"
WHEREAMI="$(dirname "$0")"
WHEREAMI="$(cd "$WHEREAMI" && pwd)"
exec "$WHEREAMI/supervise" -c "$WHEREAMI/../conf/supervise/cluster/master-no-zk.conf" -d "/opt/apache-druid-0.22.1/var_master"
[root@bigdata001 bin]#
[root@bigdata001 bin]# cat start-cluster-data-server
#!/bin/bash -eu
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
PWD="$(pwd)"
WHEREAMI="$(dirname "$0")"
WHEREAMI="$(cd "$WHEREAMI" && pwd)"
exec "$WHEREAMI/supervise" -c "$WHEREAMI/../conf/supervise/cluster/data.conf" -d "/opt/apache-druid-0.22.1/var_data"
[root@bigdata001 bin]#
[root@bigdata001 bin]# cat start-cluster-query-server
#!/bin/bash -eu
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
PWD="$(pwd)"
WHEREAMI="$(dirname "$0")"
WHEREAMI="$(cd "$WHEREAMI" && pwd)"
exec "$WHEREAMI/supervise" -c "$WHEREAMI/../conf/supervise/cluster/query.conf" -d "/opt/apache-druid-0.22.1/var_query"
[root@bigdata001 bin]#
Create Druid's temporary directory
[root@bigdata002 apache-druid-0.22.1]#
[root@bigdata002 apache-druid-0.22.1]# mkdir var/tmp
[root@bigdata002 apache-druid-0.22.1]#
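Creating the temp directory together with the per-service supervise directories can be done in one go on each node; a sketch (the var_* names follow the -d flags added to the start scripts above, and DRUID_HOME is an assumed variable for the install path):

```shell
# Create Druid's temp dir plus the per-service supervise dirs used by the -d flags
DRUID_HOME=${DRUID_HOME:-/opt/apache-druid-0.22.1}
mkdir -p "$DRUID_HOME/var/tmp" \
         "$DRUID_HOME/var_master" \
         "$DRUID_HOME/var_data" \
         "$DRUID_HOME/var_query"
```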
7.2 Start the master service
[root@bigdata001 ~]# nohup sh /opt/apache-druid-0.22.1/bin/start-cluster-master-no-zk-server > /opt/apache-druid-0.22.1/bin/start-cluster-master-no-zk-server.log 2>&1 &
[1] 12085
[root@bigdata001 ~]#
The master service logs are written under the var_master/sv directory
7.3 Start the data service
[root@bigdata001 ~]# nohup sh /opt/apache-druid-0.22.1/bin/start-cluster-data-server > /opt/apache-druid-0.22.1/bin/start-cluster-data-server.log 2>&1 &
[2] 17919
[root@bigdata001 ~]#
The data service logs are written under the var_data/sv directory
7.4 Start the query service
[root@bigdata001 ~]# nohup sh /opt/apache-druid-0.22.1/bin/start-cluster-query-server > /opt/apache-druid-0.22.1/bin/start-cluster-query-server.log 2>&1 &
[3] 20487
[root@bigdata001 ~]#
[root@bigdata001 ~]# exit
logout
- The query service logs are written under the var_query/sv directory
- Because the services are started with nohup, disconnect the client by running exit. If the client disconnects without exit, then on the next connection the error below is reported, and all Druid services on that node will have stopped automatically:
[root@bigdata001 ~]# cat /opt/apache-druid-0.22.1/bin/start-cluster-query-server.log
nohup: ignoring input
[Mon Mar 28 19:39:45 2022] Running command[broker], logging to[/opt/apache-druid-0.22.1/var_query/sv/broker.log]: bin/run-druid broker conf/druid/cluster/query
[Mon Mar 28 19:39:45 2022] Running command[router], logging to[/opt/apache-druid-0.22.1/var_query/sv/router.log]: bin/run-druid router conf/druid/cluster/query
[Mon Mar 28 22:25:47 2022] Sending signal[15] to command[broker] (timeout 360s).
[Mon Mar 28 22:25:47 2022] Sending signal[15] to command[router] (timeout 360s).
[Mon Mar 28 22:25:47 2022] Command[router] exited (pid = 31414, exited = 143)
[Mon Mar 28 22:25:47 2022] Command[broker] exited (pid = 31413, exited = 143)
[Mon Mar 28 22:25:47 2022] Exiting.
[root@bigdata001 ~]#
8. Stop the services
[root@bigdata002 var_master]# ps -ef | grep druid
Find the process IDs and kill them
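Hunting down PIDs by hand can be scripted. A hedged sketch using pkill -f, which matches against the full command line (the pattern assumes the /opt install path used in this post):

```shell
# Stop every Druid-related process on this node; '[a]pache' keeps the pattern
# from matching this script's own command line, and '|| true' keeps the script
# going when nothing matches
pkill -f '[a]pache-druid-0.22.1' || true

# Verify nothing is left
ps -ef | grep '[a]pache-druid-0.22.1' || echo "no druid processes running"
```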
9. View the Druid console
The Druid console is served by the Druid Router process; access it at http://bigdata001:9081
It looks like the following: