A low-latency, multi-tenant Change Data Capture (CDC) pipeline to continuously replicate data from OLTP (MySQL) to OLAP (NoSQL) systems with no impact on the source.
This project demonstrates how to build a dataflow pipeline that moves data from operational databases (MySQL, Oracle) to analytics databases (Hadoop, MongoDB, MarkLogic) in real time, using Change Data Capture (CDC) and Kafka, with tools like Apache NiFi, Kafka Streams, or Spark to process and ingest the data into Hadoop.
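At a high level the dataflow looks like this (a sketch of the intended architecture; the processing layer is whichever of NiFi, Kafka Streams, or Spark you pick):

MySQL (binlog) --> Maxwell (CDC) --> Kafka --> NiFi / Kafka Streams / Spark --> Hadoop / MongoDB / MarkLogic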
Delivery semantics: at-least-once vs. exactly-once. Maxwell's producer is at-least-once: after a crash or restart the same binlog events may be re-published, so exactly-once behavior has to be approximated downstream with idempotent or deduplicating consumers.
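One minimal downstream coping strategy, assuming the line-delimited JSON events shown later in this README (the file names here are hypothetical), is to suppress exact duplicates before applying changes to the sink:

# drop re-delivered duplicate change events, preserving order
awk '!seen[$0]++' changes.jsonl > changes.dedup.jsonl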
Install the source MySQL database and configure it with row-based replication, per the instructions.
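A minimal sketch of the required server settings and privileges, following the Maxwell quickstart (the server_id value and the password are placeholders):

# my.cnf: Maxwell tails the binlog, so row-based replication must be on
[mysqld]
server_id=1
log-bin=master
binlog_format=row

Maxwell also needs a user with replication privileges, plus a maxwell schema of its own for bookkeeping:

mysql> GRANT ALL ON maxwell.* TO 'maxwell'@'%' IDENTIFIED BY 'XXXXXX';
mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'maxwell'@'%';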
cd cdc/maxwell
# upstream zendesk release, for reference:
# curl -L https://github.com/zendesk/maxwell/releases/download/v1.1.2/maxwell-1.1.2.tar.gz | tar --strip-components=1 -zx -C .
curl -L https://github.com/xmlking/maxwell/releases/download/1.1.2.1/maxwell-1.1.2.1-kafka-connect.tar.gz | tar --strip-components=1 -zx -C .
Run with the stdout producer (for testing only):
bin/maxwell --user='maxwell' --password='XXXXXX' --host='127.0.0.1' --producer=stdout
Run with the Kafka producer:
# assumes a local broker; point kafka.bootstrap.servers at your cluster
bin/maxwell --user='maxwell' --password='XXXXXX' --host='127.0.0.1' --producer=kafka --kafka.bootstrap.servers=localhost:9092
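Equivalently, these settings can live in a config.properties file in the Maxwell directory, which bin/maxwell reads when run without flags (a minimal sketch; the broker address is an assumption for a local setup, and maxwell is the default topic name):

user=maxwell
password=XXXXXX
host=127.0.0.1
producer=kafka
kafka.bootstrap.servers=localhost:9092
kafka_topic=maxwell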
If all goes well, you'll see Maxwell replaying your inserts:
mysql -u root -p
mysql> CREATE TABLE test.shop
(
id BIGINT(20) NOT NULL AUTO_INCREMENT,
version BIGINT(20) NOT NULL,
name VARCHAR(255) NOT NULL,
owner VARCHAR(255) NOT NULL,
phone_number VARCHAR(255) NOT NULL,
primary key (id, name)
);
mysql> INSERT INTO test.shop (version, name, owner, phone_number) values (0, 'aaa', 'bbb', '3331114444');
Query OK, 1 row affected (0.02 sec)
(maxwell output; the first JSON document is the Kafka message key, the second is the change event)
{"database":"test","table":"shop","pk.id":4,"pk.name":"aaa"}
{"database":"test","table":"shop","type":"insert","ts":1458510224,"xid":33531,"commit":true,"data":{"owner":"bbb","name":"aaa","phone_number":"3331114444","id":4,"version":0}}
You can also use testApp to generate load.