您当前的位置: 首页 >  sql

衣舞晨风

暂无认证

  • 2浏览

    0关注

    1156博文

    0收益

  • 0浏览

    0点赞

    0打赏

    0留言

私信
关注
热门博文

Databus for MySQL

衣舞晨风 发布时间:2018-12-31 16:32:44 ,浏览量:2

Introduction

A frequently asked question on the Databus open source mailing list is about the possibility of capturing changes in MySQL through Databus. Below, we describe a proof-of-concept implementation of a Databus fetcher that taps into MySQL internal replication stream, the MySQL binary log (the binlog, for short).

At a high-level, the Databus MySQL adapter connects as a replication slave to the MySQL database. It converts the replication events to Databus internal event format and stores those events in the relay’s memory buffer. For parsing the incoming binlog replication stream, the fetcher uses an open-source Java library called “OpenReplicator”. The library is available here.

Quick Start
  • You need a MySQL instance running with binlog replication enabled. Please refer to How to Set Up Replication guide on how to enable the binlog replication for the MySQL instance. That MySQL instance will act as a master for the Databus relay. If your mysql version has binlog_checksum feature, it has to be disabled for the change-capture to work.
  • Compile the example

快速开始的步骤,参见原文:Databus for MySQL

Notes on the implementation

The current implementation although fully functional should be considered a proof-of-concept. Its main goal is to demonstrate how to write a MySQL fetcher for Databus. The basic functionality of the fetcher has been tested but has not been deployed in our production environment. At LinkedIn, we run a slightly different implementation which relies on a custom-patched MySQL.

A quick question arises in the implementation of a MySQL adapter - how do we define the logical clock for the event sequence ?. The event sequence number (System Change Number or SCN for short) is used as a timeline for ordering events from various transactions occurring at the database. Please refer to Section 3.2 in our paper titled “All Aboard the Databus”. It is also used in the consumers’ checkpoints to determine where they are in the change stream.

There are several possible approaches for defining the logical clock.

The MySQL binlog offset

This is the simplest approach and it is the one used by this implementation. Each replication slave keeps track of its position in the replication stream through a pair of values: the current binlog file and the offset within that file. The binlog files share common prefix and have an increasing index number for their suffix. Therefore, the position can be uniquely identified by the tuple (file number, file offset). Both the file index and offset increase monotonically and thus the SCNs will be exactly in the commit order.

In the current implementation, SCN is represented as a long (64 bits). Of the 64 bits, the high-order 32 bits are used to represent the binlog file number, and the low-order 32 bits are used to represent the binlog file offset. Therefore, an event starting in binary log file mysql-bin.000001 at binlog offset 4 is represented as (1 {auto_increment}} column in a dedicated table. As in the application case, this will create ordering that may differ from the commit order. Further, depending on the write load, there may be contention in updating the sequence.

generation id. Relays can use the shared map to convert the timestamp and GTID of a binlog event to a 64-bit SCN consisting of a generation id and sequence number.

If using a MySQL 5.6 and GTIDs is not possible, yet another approach is to have relays generate independent SCNs. Relays can map (binlog file number, binlog file offset) pairs from different MySQL replicas to the same SCN timeline. Even though different MySQL replicas have different binlog offsets, they still follow the same timeline determined with the master. The state of a relay can be described by (binlog file number FN, binlog file offset FO, sequence number SCN). This specifies relay the last consumed binlog event has binlog coordinates (FN, FO) that has the given SCN. Every time a relay reads a new transaction from the binlog, it updates (FN, FO) and increases SCN by 1. Since all replicas consume the same update timeline, the changes to SCN are deterministic.

Let’s say that two MySQL replicas M1 and M2 are at the same point in applying the master changes. Let’s say that M1 has its binlog stream at (10, 1000) and M2 has its binlog stream at (50, 4). If a relay R1 is to capture changes from M1, we can bootstrap it with the state (FN=10, FO=1000, SCN=100). Similarly, if a relay R2 is to capture changes from M2, we can bootstrap it with the state (FN=50, FO=4, SCN=100). 100 transactions later, R1 state will change to something like (FN=10, FO=11000, SCN=200) and R2 state will change to (FN=50, FO=10004, SCN=200).

Note that with this approach, divergence between replicas (say, a replica missed a transaction), will lead to divergence of SCNs across different relays consuming from those replicas.

Comparison of SCN generation approaches SCN generation approachEase of implementationCommit orderNeeds MySQL modificationSupport for MySQL internal replicationSupport for MySQL mastership transfersSusceptible to master/slave divergenceBinlog offsetEasyYesNoNoNoYesApplication-generatedMediumNoNoYesYesNoMaster-generated sequenceComplexYesYesYes (with modification to ship SCN)YesNoRelay-generated using GTIDsComplexYesNoYesYesNoRelay-generated independentComplexYesNoYesYesYes

原文地址:https://github.com/linkedin/databus/wiki/Databus-for-MySQL

个人微信公众号: 这里写图片描述

作者:jiankunking 出处:http://blog.csdn.net/jiankunking

关注
打赏
1647422595
查看更多评论
立即登录/注册

微信扫码登录

0.0452s