Replicates data changes from the MySQL binlog to HBase or Kafka. In the case of HBase, it preserves previous data versions; HBase storage is intended for auditing and analysis of historical data. In addition, special daily-changes tables can be maintained in HBase, which allow fast and cheap imports from HBase to Hive. Replication to Kafka is intended for easy real-time access to the stream of data changes.
This README provides basic documentation on how to get started. For more details, refer to the official documentation at mysql-time-machine.
The replicator assumes a preinstalled environment in which it can run. This environment consists of:
- MySQL Instance
- Zookeeper Instance
- Graphite Instance
- Target Store Instance (Kafka, HBase, or none in case of STDOUT)
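For orientation, an environment like this can be described in a docker-compose file roughly along the following lines. This is only an illustrative sketch; the images, service names, and password are assumptions, not the actual file shipped in the mysql-time-machine/docker repository.

```yaml
# Illustrative sketch of the test environment; NOT the actual
# docker-compose.yml from mysql-time-machine/docker.
version: '2'
services:
  mysql:
    image: mysql:5.6                     # assumed version
    environment:
      MYSQL_ROOT_PASSWORD: mysqlPass     # matches the password used below
  zookeeper:
    image: zookeeper
  kafka:
    image: wurstmeister/kafka            # assumed image
    depends_on:
      - zookeeper
  graphite:
    image: graphiteapp/graphite-statsd
    ports:
      - "80:80"                          # dashboard at http://localhost/
  replicator:
    image: mysqltimemachine/replicator   # assumed image name
    depends_on:
      - mysql
      - kafka
```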
The easiest way to test-drive the replicator is to use Docker to create this environment locally. In addition to Docker, you will need docker-compose installed.
git clone https://github.com/mysql-time-machine/docker.git
cd docker/docker-compose/replicator_kafka
Start all containers (mysql, kafka, graphite, replicator, zookeeper):
./run_all
Now, in another terminal, you can connect to the replicator container:
./attach_to_replicator
cd /replicator
This folder contains the replicator jar, the replicator configuration file, the log configuration, and some utility scripts. Now we can insert some random data into MySQL:
./random_mysql_ops
...
('TwIPn','4216871','313785','NIrnXGEpqJI gGDstvhs'),
('AwqgI','4831311','930233','IHwkTOuEnOqGdEWNzJtq'),
('WIJCB','1516599','487420','rPnOHfZlIvEEvFFEIGiW'),
...
This data has been inserted into the pre-created database 'test', in the pre-created table 'sometable'. The provided MySQL instance is configured to use row-based replication (RBR) and has binary logging enabled.
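The relevant server settings look roughly like the my.cnf fragment below. The option names are standard MySQL settings; the specific values here are illustrative, not copied from the provided instance.

```
# Minimal my.cnf fragment enabling binary logging with
# row-based replication. server-id just has to be unique
# among servers in the replication topology.
[mysqld]
server-id     = 1
log-bin       = mysql-bin
binlog_format = ROW
```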
mysql --host=mysql --user=root --password=mysqlPass
mysql> use test;
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| sometable      |
+----------------+
1 row in set (0.00 sec)
Now we can replicate the binlog content to Kafka:
./run_kafka
And read the data from Kafka:
./read_kafka
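If you want to inspect the topic directly, Kafka's stock console consumer can also be used from inside the kafka container. The topic name below is hypothetical; check the replicator configuration file in /replicator for the actual topic.

```shell
# Read the replicated change events from the beginning of the topic.
# "replicator_test" is a hypothetical topic name -- see the replicator
# configuration for the real one.
kafka-console-consumer.sh \
  --bootstrap-server kafka:9092 \
  --topic replicator_test \
  --from-beginning
```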
In this example we have written rows to MySQL, then replicated the binlogs to Kafka, and then read from Kafka, sequentially. However, these processes can run in parallel, as they would in a real-life setup.
While the replication is running, you can observe replication statistics on the Graphite dashboard: http://localhost/dashboard/
Bosko Devetak [email protected]
Carlos Tasada [ctasada]
Dmitrii Tcyganov [dtcyganov]
Evgeny Dmitriev [dmitrieveu]
Greg Franklin [gregf1]
Islam Hassan [ishassan]
Mikhail Dutikov [mikhaildutikov]
Muhammad Abbady [muhammad-abbady]
Philippe Bruhat (BooK) [book]
Pavel Salimov [chcat]
Pedro Silva [pedros]
Raynald Chung [raynald]
Rares Mirica [mrares]
Replicator was originally developed for Booking.com. With approval from Booking.com, the code and specification were generalized and published as open source on GitHub, for which the author would like to express his gratitude.
Copyright (C) 2015, 2016, 2017, 2018 by Author and Contributors
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.