Introduction to Apache Druid and Cluster Deployment of Version 0.22.1

Bulut0907, published 2022-06-20 09:18:02

Contents
  • 1. Introduction to Apache Druid
  • 2. Deployment plan
  • 3. Installation requirements
  • 4. Download and extract (on bigdata001)
  • 5. Modify the configuration (on bigdata001)
    • 5.1 Druid common configuration
    • 5.2 Modify the Master service configuration
    • 5.3 Modify the Data service configuration
    • 5.4 Modify the Query service configuration
  • 6. Distribute the Druid directory (on bigdata001)
  • 7. Startup (on all servers)
    • 7.1 Settings for running multiple services on one server
    • 7.2 Start the Master services
    • 7.3 Start the Data services
    • 7.4 Start the Query services
  • 8. Stop the services
  • 9. View the Druid console

1. Introduction to Apache Druid

Apache Druid is well suited to event-oriented data.

Druid's main strengths:

  1. Columnar storage
  2. Real-time or batch ingestion: data ingested in real time is immediately available for querying
  3. Self-healing and self-balancing: to scale the cluster, just add or remove services and the cluster rebalances itself in the background; configuration changes do not require downtime
  4. A cloud-native, fault-tolerant architecture that does not lose data: data is stored on HDFS, decoupling compute from storage
  5. Indexes for fast filtering: Druid builds bitmap indexes compressed with CONCISE or Roaring to support fast filtering and searches across multiple columns
  6. Time-based partitioning: Druid partitions data by time first, and can additionally partition on other fields
  7. Approximate algorithms: Druid ships algorithms for approximate count-distinct, approximate ranking, and approximate histograms and quantiles; exact count-distinct and exact ranking are also available (see the query sketch after this list)
  8. Automatic roll-up at ingestion time: Druid can optionally summarize (roll up) data as it is ingested
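Once a cluster is running, a minimal sketch of what the approximate functions look like in practice. Druid SQL is submitted over HTTP; the host, port, datasource name (events) and column (user_id) below are placeholders, not part of this deployment:

# Hedged example: APPROX_COUNT_DISTINCT is a built-in Druid SQL function.
# Replace <broker-or-router-host>, the datasource and the column with your own.
curl -s -X POST http://<broker-or-router-host>:8888/druid/v2/sql \
  -H 'Content-Type: application/json' \
  -d '{"query":"SELECT FLOOR(__time TO DAY) AS day, APPROX_COUNT_DISTINCT(user_id) AS uv FROM events GROUP BY 1"}'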

Scenarios where Druid is a good fit:

  1. Insert rates are high, but data is rarely updated
  2. The data has a time component; Druid includes optimizations and design choices specifically around time
  3. In multi-table scenarios, each query hits a single large distributed table, possibly together with several smaller lookup tables

Scenarios where Druid is not a good fit:

  1. Low-latency updates of existing records by primary key: Druid only supports batch updates, not streaming updates
  2. Joining one big fact table with another big fact table
2. Deployment plan

The minimal installation requires 1 Master server, 2 Data servers (for fault tolerance), and 1 Query server.

Therefore we deploy the Master (Coordinator and Overlord), Data (Historical and MiddleManager), and Query (Broker and Router) services on all three servers: bigdata001, bigdata002, and bigdata003.

What each service does:

  • The Coordinator and Overlord processes manage the cluster's metadata and coordination needs
  • The Historical and MiddleManager processes handle the actual data in the cluster
  • The Broker service receives query requests and forwards them to the rest of the cluster
3. Installation requirements
  • Install Java 8 on all three servers; see the earlier posts on installing Java 8 alongside OpenJDK 11 on CentOS 7 and on Windows. A quick verification sketch follows.
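A hedged check that Java 8 is the active JVM on every node (assumes passwordless SSH as root and the bigdata001-003 hostnames used throughout this post):

# Print the active Java version on each server
for host in bigdata001 bigdata002 bigdata003; do
  echo "== $host =="
  ssh root@$host 'java -version 2>&1 | head -n 1'
done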
4. Download and extract (on bigdata001)
[root@bigdata001 opt]# wget --no-check-certificate https://dlcdn.apache.org/druid/0.22.1/apache-druid-0.22.1-bin.tar.gz
[root@bigdata001 opt]#
[root@bigdata001 opt]# tar -zxvf apache-druid-0.22.1-bin.tar.gz
[root@bigdata001 opt]#
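Optionally verify the download before extracting; the matching .sha512 file is published alongside the tarball on the Apache mirrors, so the printed digest can be compared against it:

# Hedged sketch: integrity check of the downloaded tarball
sha512sum apache-druid-0.22.1-bin.tar.gz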
5. Modify the configuration (on bigdata001)

The configuration files live under apache-druid-0.22.1/conf/druid/cluster.

Exception 1: if an error like the following is reported,

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000400000000, 16106127360, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 16106127360 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /opt/apache-druid-0.22.1/hs_err_pid9178.log

then reduce the JVM memory in the jvm.config of the affected service (cluster/master/druid_service/jvm.config, where druid_service stands for that service's directory), for example:

-Xms1g
-Xmx1g
-XX:MaxDirectMemorySize=1536m
5.1 Druid common configuration
  1. Edit cluster/_common/common.runtime.properties

Comment out the Derby metadata store, the local-file deep storage, and the local-file indexing service logs:

#druid.metadata.storage.type=derby
#druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/var/druid/metadata.db;create=true
#druid.metadata.storage.connector.host=localhost
#druid.metadata.storage.connector.port=1527

#druid.storage.type=local
#druid.storage.storageDirectory=var/druid/segments

#druid.indexer.logs.type=file
#druid.indexer.logs.directory=var/druid/indexing-logs

Switch to a MySQL metadata store, HDFS deep storage, HDFS indexing service logs, and a remote ZooKeeper service:

druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches", "mysql-metadata-storage"]

druid.host=bigdata001

druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://bigdata005:3306/druid
druid.metadata.storage.connector.user=root
druid.metadata.storage.connector.password=Root_123

druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments

druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs

druid.zk.service.host=bigdata001:2181,bigdata002:2181,bigdata003:2181

If you need to load data from Hadoop, also add the following configuration:

# An HDFS directory used for temporarily storing data when loading from Hadoop
druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing
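Since deep storage and the indexing-service logs now point at HDFS, the corresponding paths can be created up front. A hedged sketch, assuming the hdfs client is on the PATH (Druid can usually create these itself when it has write permission):

# Pre-create the HDFS paths referenced in the properties above
hdfs dfs -mkdir -p /druid/segments
hdfs dfs -mkdir -p /druid/indexing-logs
hdfs dfs -mkdir -p /tmp/druid-indexing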
  2. Copy the Hadoop configuration files into Druid's configuration directory
[root@bigdata001 _common]# pwd
/opt/apache-druid-0.22.1/conf/druid/cluster/_common
[root@bigdata001 _common]# 
[root@bigdata001 _common]# cp /opt/hadoop-3.3.1/etc/hadoop/core-site.xml .
[root@bigdata001 _common]# cp /opt/hadoop-3.3.1/etc/hadoop/hdfs-site.xml .
[root@bigdata001 _common]# cp /opt/hadoop-3.3.1/etc/hadoop/yarn-site.xml .
[root@bigdata001 _common]# cp /opt/hadoop-3.3.1/etc/hadoop/mapred-site.xml .
[root@bigdata001 _common]# 
  3. Add the mysql-connector-java-8.0.25.jar package to the apache-druid-0.22.1/extensions/mysql-metadata-storage directory, and create a druid database in MySQL (a hedged sketch follows)
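A hedged sketch of those two steps, reusing the connection settings from common.runtime.properties above; the driver's local path and the utf8mb4 character set are assumptions, adjust them to your environment:

# Install the JDBC driver and create the metadata database
cp mysql-connector-java-8.0.25.jar /opt/apache-druid-0.22.1/extensions/mysql-metadata-storage/
mysql -h bigdata005 -uroot -pRoot_123 -e "CREATE DATABASE druid DEFAULT CHARACTER SET utf8mb4;"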
5.2 Modify the Master service configuration

Edit cluster/master/coordinator-overlord/runtime.properties

Change the following:

druid.plaintextPort=9081
5.3 Modify the Data service configuration
  1. Edit cluster/data/historical/runtime.properties

Change the following:

druid.processing.buffer.sizeBytes=256MiB
druid.processing.numMergeBuffers=1
druid.processing.numThreads=3
  2. Edit cluster/data/middleManager/runtime.properties, changing the following:
druid.worker.capacity=1

druid.indexer.fork.property.druid.processing.numMergeBuffers=1
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100MiB
druid.indexer.fork.property.druid.processing.numThreads=1
5.4 Modify the Query service configuration

Edit cluster/query/broker/runtime.properties

Change the following:

druid.processing.buffer.sizeBytes=100MiB
6. Distribute the Druid directory (on bigdata001)

Copy the apache-druid-0.22.1 directory from bigdata001 to bigdata002 and bigdata003:

[root@bigdata001 opt]# scp -r apache-druid-0.22.1 root@bigdata002:/opt
[root@bigdata001 opt]#

On bigdata002 and bigdata003, change druid.host in cluster/_common/common.runtime.properties to the local hostname; a hedged sketch follows.
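A hedged sketch of that last change, assuming passwordless SSH as root and the install path used throughout this post:

# Point druid.host at each node's own hostname
for host in bigdata002 bigdata003; do
  ssh root@$host "sed -i 's/^druid.host=.*/druid.host=$host/' \
    /opt/apache-druid-0.22.1/conf/druid/cluster/_common/common.runtime.properties"
done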

7. Startup (on all servers)

7.1 Settings for running multiple services on one server

If the start scripts are run as-is on one server, the following error is reported:

Cannot lock svdir, maybe another 'supervise' is running: /opt/apache-druid-0.22.1/var/sv

This happens because the sv directory is already locked by another service, so each start script needs its own working directory. Add -d "/opt/apache-druid-0.22.1/var_xxxxxx" to each of the three start scripts:


[root@bigdata001 bin]# 
[root@bigdata001 bin]# cat start-cluster-master-no-zk-server
#!/bin/bash -eu

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

PWD="$(pwd)"
WHEREAMI="$(dirname "$0")"
WHEREAMI="$(cd "$WHEREAMI" && pwd)"

exec "$WHEREAMI/supervise" -c "$WHEREAMI/../conf/supervise/cluster/master-no-zk.conf" -d "/opt/apache-druid-0.22.1/var_master"
[root@bigdata001 bin]# 
[root@bigdata001 bin]# cat start-cluster-data-server 
#!/bin/bash -eu

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

PWD="$(pwd)"
WHEREAMI="$(dirname "$0")"
WHEREAMI="$(cd "$WHEREAMI" && pwd)"

exec "$WHEREAMI/supervise" -c "$WHEREAMI/../conf/supervise/cluster/data.conf" -d "/opt/apache-druid-0.22.1/var_data"
[root@bigdata001 bin]# 
[root@bigdata001 bin]# cat start-cluster-query-server 
#!/bin/bash -eu

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

PWD="$(pwd)"
WHEREAMI="$(dirname "$0")"
WHEREAMI="$(cd "$WHEREAMI" && pwd)"

exec "$WHEREAMI/supervise" -c "$WHEREAMI/../conf/supervise/cluster/query.conf" -d "/opt/apache-druid-0.22.1/var_query"
[root@bigdata001 bin]#

Create Druid's temporary directory:

[root@bigdata002 apache-druid-0.22.1]# 
[root@bigdata002 apache-druid-0.22.1]# mkdir var/tmp
[root@bigdata002 apache-druid-0.22.1]#
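The same temporary directory is needed on every node; a hedged sketch (mkdir -p is harmless if the directory already exists):

# Create var/tmp on all three servers
for host in bigdata001 bigdata002 bigdata003; do
  ssh root@$host 'mkdir -p /opt/apache-druid-0.22.1/var/tmp'
done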
7.2 Start the Master services
[root@bigdata001 ~]# nohup sh /opt/apache-druid-0.22.1/bin/start-cluster-master-no-zk-server > /opt/apache-druid-0.22.1/bin/start-cluster-master-no-zk-server.log 2>&1 &
[1] 12085
[root@bigdata001 ~]# 

The Master service log files are located under var_master/sv.
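To confirm a clean start you can follow the per-service log files that supervise writes there (a hedged sketch; the file names follow the command names defined in conf/supervise/cluster/master-no-zk.conf):

ls /opt/apache-druid-0.22.1/var_master/sv/            # list the per-service log files
tail -f /opt/apache-druid-0.22.1/var_master/sv/*.log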

7.3 Start the Data services
[root@bigdata001 ~]# nohup sh /opt/apache-druid-0.22.1/bin/start-cluster-data-server > /opt/apache-druid-0.22.1/bin/start-cluster-data-server.log 2>&1 &
[2] 17919
[root@bigdata001 ~]# 

The Data service log files are located under var_data/sv.

7.4 Start the Query services
[root@bigdata001 ~]# nohup sh /opt/apache-druid-0.22.1/bin/start-cluster-query-server > /opt/apache-druid-0.22.1/bin/start-cluster-query-server.log 2>&1 &
[3] 20487
[root@bigdata001 ~]# 
[root@bigdata001 ~]# exit
logout
Connection closing...Socket close.

Connection closed by foreign host.

Disconnected from remote host(bigdata001) at 22:40:24.

  • The Query service log files are located under var_query/sv
  • Because the services were started with nohup, you must leave the session with exit. If the client disconnects without exiting cleanly, then on the next connection you will find the log below, and all Druid services on that node will have stopped automatically
[root@bigdata001 ~]# cat /opt/apache-druid-0.22.1/bin/start-cluster-query-server.log 
nohup: ignoring input
[Mon Mar 28 19:39:45 2022] Running command[broker], logging to[/opt/apache-druid-0.22.1/var_query/sv/broker.log]: bin/run-druid broker conf/druid/cluster/query
[Mon Mar 28 19:39:45 2022] Running command[router], logging to[/opt/apache-druid-0.22.1/var_query/sv/router.log]: bin/run-druid router conf/druid/cluster/query
[Mon Mar 28 22:25:47 2022] Sending signal[15] to command[broker] (timeout 360s).
[Mon Mar 28 22:25:47 2022] Sending signal[15] to command[router] (timeout 360s).
[Mon Mar 28 22:25:47 2022] Command[router] exited (pid = 31414, exited = 143)
[Mon Mar 28 22:25:47 2022] Command[broker] exited (pid = 31413, exited = 143)
[Mon Mar 28 22:25:47 2022] Exiting.
[root@bigdata001 ~]#
8. Stop the services
[root@bigdata002 var_master]# ps -ef | grep druid

Look up the process IDs and kill them; a hedged sketch follows.
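A hedged sketch: terminating a supervise parent makes it send SIGTERM to the services it manages, as the shutdown log in section 7.4 shows.

# Stop all Druid services on one node
ps -ef | grep supervise | grep -v grep      # locate the supervise PIDs
pkill -TERM -f 'bin/supervise'              # or: kill <PID> for each process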

9. View the Druid console

The Druid console is served by the Druid Router process and is accessed at http://bigdata001:9081.
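You can also confirm from the command line that the process behind that address is responding; /status is a standard HTTP endpoint exposed by every Druid service:

curl -s http://bigdata001:9081/status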

It looks as follows:

[Screenshot: Druid console]
