- 1. TiFlash介绍
- 2. TiFlash的使用
- 2.1 按表构建TiFlash副本
- 2.2 查看表同步进度
- 2.3 使用TiDB读取TiFlash
- 2.3.1 智能选择
- 2.3.2 储存Engine隔离
- 2.3.3 手工提示
- 2.4 broadcast join
- 2.5 TiFlash不支持的功能和与TiDB不兼容的情况
同步的架构图如下:
- TiFlash主要包含存储引擎组件、tiflash proxy组件、pd buddy组件,其中tiflash proxy主要处理Multi-Raft协议通信的相关工作,pd buddy负责与PD协同工作,将TiKV数据按表同步到TiFlash
- TiFlash默认不同步数据,且同步粒度为表粒度
- 以region为单位进行数据复制和分列,跟随TiKV中的Leader副本同时进行分裂与合并,只要TiKV中数据不丢失,就可以随时恢复TiFlash的副本
- 一致性保证:TiFlash每次收到读取请求,TiFlash中的Region副本会向TiKV的Leader副本发起数据校对,保证读取请求时间戳之前的所有数据都被同步到TiFlash,才返回数据
- 在一个select查询中,可能同时使用TiKV和TiFlash,TiDB默认会根据select查询进行判断
默认不会同步数据到TiFlash,开启同步命令如下,副本数为0表示删除TiFlash副本
mysql>
mysql> use test_db;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql>
mysql> alter table test_tb set tiflash replica 2;
Query OK, 0 rows affected (0.55 sec)
mysql>
- 对于已经开启TiFlash同步的表test_tb,通过
create table test_tb2 like test_tb;
创建的表test_tb2,test_tb2也会开启TiFlash同步 - 不支持给系统表开启TiFlash同步
mysql>
mysql> select * from information_schema.tiflash_replica where table_schema = 'test_db' and table_name = 'test_tb2';
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| test_db | test_tb2 | 59 | 2 | | 1 | 1 |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
1 row in set (0.00 sec)
mysql>
- AVAILABLE表示该表的TiFlash副本是否可用。TiFlash副本状态为可用之后就不再改变,如果通过DDL命令修改TiFlash副本数则会重新计算AVAILABLE状态
- PROGRESS同步进度,在0.0~1.0之间,1代表至少1个副本已经完成同步
非只读SQL(如insert into … select …)只能从TiKV中读取数据
2.3.1 智能选择TiDB会自动根据select语句,判断是读取TiKV还是TiFlash,如果一个select语句有多个子查询,可能会同时读取TiKV和TiFlash
可以通过desc和explain analyze命令查看读取的TiKV还是TiFlash
mysql>
mysql> desc select count(*) from test_tb;
+----------------------------+---------+-------------------+---------------+---------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------+---------+-------------------+---------------+---------------------------------+
| StreamAgg_25 | 1.00 | root | | funcs:count(Column#7)->Column#4 |
| └─TableReader_26 | 1.00 | root | | data:StreamAgg_9 |
| └─StreamAgg_9 | 1.00 | batchCop[tiflash] | | funcs:count(1)->Column#7 |
| └─TableFullScan_24 | 2.00 | batchCop[tiflash] | table:test_tb | keep order:false, stats:pseudo |
+----------------------------+---------+-------------------+---------------+---------------------------------+
4 rows in set (0.01 sec)
mysql>
mysql> explain analyze select count(*) from test_tb;
+----------------------------+---------+---------+-------------------+---------------+---------------------------------------------------------------------------------------------+---------------------------------+-----------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+----------------------------+---------+---------+-------------------+---------------+---------------------------------------------------------------------------------------------+---------------------------------+-----------+------+
| StreamAgg_25 | 1.00 | 1 | root | | time:81.7ms, loops:2 | funcs:count(Column#7)->Column#4 | 380 Bytes | N/A |
| └─TableReader_26 | 1.00 | 1 | root | | time:81.7ms, loops:2, cop_task: {num: 1, max: 0s, proc_keys: 0, copr_cache_hit_ratio: 0.00} | data:StreamAgg_9 | 79 Bytes | N/A |
| └─StreamAgg_9 | 1.00 | 1 | batchCop[tiflash] | | tiflash_task:{time:45.4ms, loops:1, threads:1} | funcs:count(1)->Column#7 | N/A | N/A |
| └─TableFullScan_24 | 2.00 | 2 | batchCop[tiflash] | table:test_tb | tiflash_task:{time:43.4ms, loops:1, threads:1} | keep order:false, stats:pseudo | N/A | N/A |
+----------------------------+---------+---------+-------------------+---------------+---------------------------------------------------------------------------------------------+---------------------------------+-----------+------+
4 rows in set (0.10 sec)
mysql>
- cop[tiflash]表示该任务会发送至TiFlash进行处理
方式一:修改配置文件方式,永久有效
添加以下内容到tidb_servers的各个节点(默认的配置如下):
config:
isolation-read.engines:
- tikv
- tidb
- tiflash
- tidb储存引擎为TiDB系统内部使用的内存表,必须指定,普通用户不能使用
修改步骤如下,需要重启服务,但不影响客户端的使用:
[root@tidb1 ~]#
[root@tidb1 ~]# tiup cluster edit-config tidb-cluster
Starting component `cluster`: /root/.tiup/components/cluster/v1.6.1/tiup-cluster edit-config tidb-cluster
......省略部分......
tidb_servers:
- host: tidb1
ssh_port: 22
port: 4000
status_port: 10080
deploy_dir: /opt/tidb-v5.2.2-install/tidb-deploy/tidb-4000
log_dir: /opt/tidb-v5.2.2-install/tidb-deploy/tidb-4000/log
config:
isolation-read.engines:
- tikv
- tidb
arch: amd64
os: linux
- host: tidb2
ssh_port: 22
port: 4000
status_port: 10080
deploy_dir: /opt/tidb-v5.2.2-install/tidb-deploy/tidb-4000
log_dir: /opt/tidb-v5.2.2-install/tidb-deploy/tidb-4000/log
config:
isolation-read.engines:
- tikv
- tidb
arch: amd64
os: linux
- host: tidb3
ssh_port: 22
port: 4000
status_port: 10080
deploy_dir: /opt/tidb-v5.2.2-install/tidb-deploy/tidb-4000
log_dir: /opt/tidb-v5.2.2-install/tidb-deploy/tidb-4000/log
config:
isolation-read.engines:
- tikv
- tidb
arch: amd64
os: linux
- host: tidb4
ssh_port: 22
port: 4000
status_port: 10080
deploy_dir: /opt/tidb-v5.2.2-install/tidb-deploy/tidb-4000
log_dir: /opt/tidb-v5.2.2-install/tidb-deploy/tidb-4000/log
config:
isolation-read.engines:
- tikv
- tidb
arch: amd64
os: linux
......省略部分......
Please check change highlight above, do you want to apply the change? [y/N]:(default=N) y
Applying changes...
Applied successfully, please use `tiup cluster reload tidb-cluster [-N ] [-R ]` to reload config.
[root@tidb1 ~]#
[root@tidb1 ~]# tiup cluster reload tidb-cluster;
Starting component `cluster`: /root/.tiup/components/cluster/v1.6.1/tiup-cluster reload tidb-cluster
Will reload the cluster tidb-cluster with restart policy is true, nodes: , roles: .
Do you want to continue? [y/N]:(default=N) y
+ [ Serial ] - SSHKeySet: privateKey=/root/.tiup/storage/cluster/clusters/tidb-cluster/ssh/id_rsa, publicKey=/root/.tiup/storage/cluster/clusters/tidb-cluster/ssh/id_rsa.pub
......省略部分......
Reloaded cluster `tidb-cluster` successfully
[root@tidb1 ~]#
进行查询验证:
mysql>
mysql> desc select count(*) from test_tb;
ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
Connection id: 3
Current database: test_db
+----------------------------+---------+-----------+---------------+---------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------+---------+-----------+---------------+---------------------------------+
| StreamAgg_16 | 1.00 | root | | funcs:count(Column#6)->Column#4 |
| └─TableReader_17 | 1.00 | root | | data:StreamAgg_8 |
| └─StreamAgg_8 | 1.00 | cop[tikv] | | funcs:count(1)->Column#6 |
| └─TableFullScan_15 | 2.00 | cop[tikv] | table:test_tb | keep order:false |
+----------------------------+---------+-----------+---------------+---------------------------------+
4 rows in set (0.00 sec)
mysql>
方式二:会话级别,只对当前会话有效
mysql>
mysql> set SESSION tidb_isolation_read_engines = "tidb,tikv,tiflash";
Query OK, 0 rows affected (0.00 sec)
mysql>
2.3.3 手工提示
强制某些表使用指定的储存引擎
mysql>
mysql> desc select /*+ read_from_storage(tikv[b,c],tiflash[d]) */ count(*) from test_tb a join test_tb b on a.id = b.id join test_tb c on b.id = c.id join test_tb d on c.idd = d.id;
+------------------------------------+---------+--------------+---------------+----------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------------+---------+--------------+---------------+----------------------------------------------------------------+
| StreamAgg_20 | 1.00 | root | | funcs:count(1)->Column#13 |
| └─HashJoin_53 | 2.00 | root | | inner join, equal:[eq(test_db.test_tb.id, test_db.test_tb.id)] |
| ├─TableReader_50(Build) | 2.00 | root | | data:Selection_49 |
| │ └─Selection_49 | 2.00 | cop[tiflash] | | not(isnull(test_db.test_tb.id)) |
| │ └─TableFullScan_48 | 2.00 | cop[tiflash] | table:d | keep order:false |
| └─HashJoin_31(Probe) | 2.00 | root | | inner join, equal:[eq(test_db.test_tb.id, test_db.test_tb.id)] |
| ├─TableReader_47(Build) | 2.00 | root | | data:Selection_46 |
| │ └─Selection_46 | 2.00 | cop[tikv] | | not(isnull(test_db.test_tb.id)) |
| │ └─TableFullScan_45 | 2.00 | cop[tikv] | table:c | keep order:false |
| └─HashJoin_34(Probe) | 2.00 | root | | inner join, equal:[eq(test_db.test_tb.id, test_db.test_tb.id)] |
| ├─TableReader_44(Build) | 2.00 | root | | data:Selection_43 |
| │ └─Selection_43 | 2.00 | cop[tikv] | | not(isnull(test_db.test_tb.id)) |
| │ └─TableFullScan_42 | 2.00 | cop[tikv] | table:b | keep order:false |
| └─TableReader_41(Probe) | 2.00 | root | | data:Selection_40 |
| └─Selection_40 | 2.00 | cop[tiflash] | | not(isnull(test_db.test_tb.id)) |
| └─TableFullScan_39 | 2.00 | cop[tiflash] | table:a | keep order:false |
+------------------------------------+---------+--------------+---------------+----------------------------------------------------------------+
16 rows in set (0.00 sec)
mysql>
- 如果Hint指定的表在指定的引擎上不存在副本,则Hint会被忽略,并产生warning
- 如果Hint中指定的引擎不在engine隔离列表中,Hint同样会被忽略,并产生warning
目前支持的物理算法有:broadcast hash join、shuffled hash join、shuffled hash aggregation、union all、topN、limit
tidb根据查询来判断,是否可以进行broadcast join。如果满足条件,则将一个小表的数据broadcast到其它服务器,减少大表的数据shuffle
TiFlash提供了两个global/session变量决定是否选择broadcast hash join,分别为:
- tidb_broadcast_join_threshold_size,单位为bytes。如果表大小(字节数)小于该值,则选择Broadcast Hash Join算法。否则选择Shuffled Hash Join算法
- tidb_broadcast_join_threshold_count,单位为行数。如果join的对象为子查询,优化器无法估计子查询结果集大小,则通过结果集行数判断。如果子查询的行数估计值小于该变量,则选择Broadcast Hash Join算法。否则选择Shuffled Hash Join算法
查看变量的值如下:
mysql>
mysql> show global variables like '%tidb_broadcast_join_threshold_size%';
+------------------------------------+-----------+
| Variable_name | Value |
+------------------------------------+-----------+
| tidb_broadcast_join_threshold_size | 104857600 |
+------------------------------------+-----------+
1 row in set (0.02 sec)
mysql>
mysql> show global variables like '%tidb_broadcast_join_threshold_count%';
+-------------------------------------+-------+
| Variable_name | Value |
+-------------------------------------+-------+
| tidb_broadcast_join_threshold_count | 10240 |
+-------------------------------------+-------+
1 row in set (0.02 sec)
mysql>
也可以通过explain来进行验证,如果结果中出现ExchangeSender和ExchangeReceiver算子,表明发生了broadcast hash join,如官网的例子所示:
mysql>
mysql> explain select count(*) from customer c join nation n on c.c_nationkey=n.n_nationkey;
+------------------------------------------+------------+-------------------+---------------+----------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------------------+------------+-------------------+---------------+----------------------------------------------------------------------------+
| HashAgg_23 | 1.00 | root | | funcs:count(Column#16)->Column#15 |
| └─TableReader_25 | 1.00 | root | | data:ExchangeSender_24 |
| └─ExchangeSender_24 | 1.00 | batchCop[tiflash] | | ExchangeType: PassThrough |
| └─HashAgg_12 | 1.00 | batchCop[tiflash] | | funcs:count(1)->Column#16 |
| └─HashJoin_17 | 3000000.00 | batchCop[tiflash] | | inner join, equal:[eq(tpch.nation.n_nationkey, tpch.customer.c_nationkey)] |
| ├─ExchangeReceiver_21(Build) | 25.00 | batchCop[tiflash] | | |
| │ └─ExchangeSender_20 | 25.00 | batchCop[tiflash] | | ExchangeType: Broadcast |
| │ └─TableFullScan_18 | 25.00 | batchCop[tiflash] | table:n | keep order:false |
| └─TableFullScan_22(Probe) | 3000000.00 | batchCop[tiflash] | table:c | keep order:false |
+------------------------------------------+------------+-------------------+---------------+----------------------------------------------------------------------------+
9 rows in set (0.00 sec)
mysql>
2.5 TiFlash不支持的功能和与TiDB不兼容的情况
- 在TiFlash储存层的数值计算,不支持数据值溢出检查,如
9223372036854775807 + 9223372036854775807
会得到结果:-2 - 当进行Decimal类型数据相除,TiFlash按照编译时推断出来的数据类型进行计算,而TiDB采用精度高于编译时推断出来的数据类型进行计算。会导致在TiDB + TiKV上的执行结果会和TiDB + TiFlash上的执行结果不一样,示例如下:
mysql>
mysql> create table t (a decimal(3,0), b decimal(10, 0));
Query OK, 0 rows affected (0.07 sec)
mysql>
mysql> insert into t values (43, 1044774912);
Query OK, 1 row affected (0.03 sec)
mysql>
mysql> alter table t set tiflash replica 1;
Query OK, 0 rows affected (0.07 sec)
mysql>
mysql> set session tidb_isolation_read_engines='tikv';
Query OK, 0 rows affected (0.00 sec)
mysql>
mysql> select a/b, a/b + 0.0000000000001 from t where a/b;
+--------+-----------------------+
| a/b | a/b + 0.0000000000001 |
+--------+-----------------------+
| 0.0000 | 0.0000000410001 |
+--------+-----------------------+
1 row in set (0.00 sec)
mysql>
mysql> set session tidb_isolation_read_engines='tiflash';
Query OK, 0 rows affected (0.00 sec)
mysql>
mysql> select a/b, a/b + 0.0000000000001 from t where a/b;
Empty set (0.01 sec)
mysql>
- a/b在编译期推导出来的数据类型为Decimal(7,4),a/b返回的结果应该为 0.0000。在TiDB + TiFlash中原表中的数据被where a/b过滤掉了。但在TiDB + TiKV中,a/b运行期的精度比Decimal(7,4)高,所以原表中的数据没有被where a/b过滤掉