Addax

Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.
Alternatives To Addax

  • TiDB (34,158 stars, Apache-2.0, Go): an open-source, cloud-native, distributed, MySQL-compatible database for elastic scale and real-time analytics.
  • Airbyte (10,808 stars, other license, Python): data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes.
  • Apache Doris (8,407 stars, Apache-2.0, Java): an easy-to-use, high-performance and unified analytics database.
  • Linq2db (2,633 stars, MIT, C#): LINQ to database provider.
  • Awesome Business Intelligence (1,747 stars, MIT): actively curated list of awesome BI tools. PRs welcome!
  • Addax (910 stars, Apache-2.0, Java): a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases.
  • Pgsync (851 stars, LGPL-3.0, Python): Postgres to Elasticsearch/OpenSearch sync.
  • Data Engineering Wiki (625 stars, CC0-1.0, CSS): the best place to learn data engineering, built and maintained by the data engineering community.
  • DataCleaner (493 stars, LGPL-3.0, Java): the premier open-source data quality solution.
  • Etlalchemy (414 stars, MIT, Python): Extract, Transform, Load: any SQL database in 4 lines of code.

Addax Logo

Addax is a versatile open-source ETL tool

Documentation: a detailed description of how to install and deploy Addax, and how to use each collection plugin

English | 简体中文

The project, originally derived from Alibaba's DataX, has been streamlined and adapted, as described below.

Supported Data Sources

Addax supports more than 20 SQL and NoSQL data sources. It can also be extended to support more.

Cassandra Clickhouse IBM DB2 dBase
Doris Elasticsearch Excel Greenplum
Apache HBase Hive InfluxDB Kafka
Kudu MinIO MongoDB MySQL
Oracle Phoenix PostgreSQL Presto
Redis Amazon S3 SQLite SQLServer
Starrocks Sybase TDengine Trino

Getting Started

Use docker image

docker pull wgzhao/addax:latest
docker run -ti --rm --name addax wgzhao/addax:latest /opt/addax/bin/addax.sh /opt/addax/job/job.json

If you only need the common reader and writer plugins, you can pull the image whose name ends with -lite; it is much smaller.

docker pull wgzhao/addax:4.0.12-lite
docker run -ti --rm --name addax wgzhao/addax:4.0.12-lite /opt/addax/bin/addax.sh /opt/addax/job/job.json
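
To run one of your own job files instead of the bundled sample, you can mount it into the container with Docker's standard -v option (a minimal sketch; the local file name my_job.json is only an illustration):

docker run -ti --rm --name addax -v "$(pwd)/my_job.json:/opt/addax/job/my_job.json" wgzhao/addax:latest /opt/addax/bin/addax.sh /opt/addax/job/my_job.json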

Use install script

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/wgzhao/Addax/master/install.sh)"

This script installs Addax to its preferred prefix (/usr/local for macOS on Intel, /opt/addax for macOS on Apple Silicon, and /opt/addax for Linux).
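
After a scripted install on Linux, for example, you can run the bundled sample job directly from the install prefix (a sketch that assumes the installed tree matches the packaged layout described below):

/opt/addax/bin/addax.sh /opt/addax/job/job.json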

Compile and Package

git clone https://github.com/wgzhao/addax.git addax
cd addax
mvn clean package
mvn package assembly:single

After successful compilation and packaging, an addax-<version> folder is created under the project's target/datax directory, where <version> is the version number.
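
The packaged folder is self-contained, so the bundled smoke-test job can be run straight from it, for example (replace <version> with the actual version number produced by the build):

cd target/datax/addax-<version>
bin/addax.sh job/job.json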

Begin your first task

The job subdirectory contains many sample jobs; job.json can be used as a smoke test and executed as follows:

bin/addax.sh job/job.json

The output of the above command is roughly as follows.

$ bin/addax.sh job/job.json
  ___      _     _
 / _ \    | |   | |
/ /_\ \ __| | __| | __ ___  __
|  _  |/ _` |/ _` |/ _` \ \/ /
| | | | (_| | (_| | (_| |>  <
\_| |_/\__,_|\__,_|\__,_/_/\_\

:: Addax version ::    (v4.0.13-SNAPSHOT)

2023-05-14 11:43:38.040 [        main] INFO  VMInfo               - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-05-14 11:43:38.062 [        main] INFO  Engine               -
{
	"setting":{
		"speed":{
			"byte":-1,
			"channel":1,
			"record":-1
		}
	},
	"content":{
		"reader":{
			"name":"streamreader",
			"parameter":{
				"sliceRecordCount":10,
				"column":[
					{
						"value":"addax",
						"type":"string"
					},
					{
						"value":19890604,
						"type":"long"
					},
					{
						"value":"1989-06-04 11:22:33 123456",
						"type":"date",
						"dateFormat":"yyyy-MM-dd HH:mm:ss SSSSSS"
					},
					{
						"value":true,
						"type":"bool"
					},
					{
						"value":"test",
						"type":"bytes"
					}
				]
			}
		},
		"writer":{
			"name":"streamwriter",
			"parameter":{
				"print":true,
				"encoding":"UTF-8"
			}
		}
	}
}

2023-05-14 11:43:38.092 [        main] INFO  JobContainer         - The jobContainer begins to process the job.
2023-05-14 11:43:38.107 [       job-0] INFO  JobContainer         - The Reader.Job [streamreader] perform prepare work .
2023-05-14 11:43:38.107 [       job-0] INFO  JobContainer         - The Writer.Job [streamwriter] perform prepare work .
2023-05-14 11:43:38.108 [       job-0] INFO  JobContainer         - Job set Channel-Number to 1 channel(s).
2023-05-14 11:43:38.108 [       job-0] INFO  JobContainer         - The Reader.Job [streamreader] is divided into [1] task(s).
2023-05-14 11:43:38.108 [       job-0] INFO  JobContainer         - The Writer.Job [streamwriter] is divided into [1] task(s).
2023-05-14 11:43:38.130 [       job-0] INFO  JobContainer         - The Scheduler launches [1] taskGroup(s).
2023-05-14 11:43:38.138 [ taskGroup-0] INFO  TaskGroupContainer   - The taskGroupId=[0] started [1] channels for [1] tasks.
2023-05-14 11:43:38.141 [ taskGroup-0] INFO  Channel              - The Channel set byte_speed_limit to -1, No bps activated.
2023-05-14 11:43:38.141 [ taskGroup-0] INFO  Channel              - The Channel set record_speed_limit to -1, No tps activated.
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
2023-05-14 11:43:41.157 [       job-0] INFO  AbstractScheduler    - The scheduler has completed all tasks.
2023-05-14 11:43:41.158 [       job-0] INFO  JobContainer         - The Writer.Job [streamwriter] perform post work.
2023-05-14 11:43:41.159 [       job-0] INFO  JobContainer         - The Reader.Job [streamreader] perform post work.
2023-05-14 11:43:41.162 [       job-0] INFO  StandAloneJobContainerCommunicator - Total 10 records, 260 bytes | Speed 86B/s, 3 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-05-14 11:43:41.596 [       job-0] INFO  JobContainer         -
Job start  at             : 2023-05-14 11:43:38
Job end    at             : 2023-05-14 11:43:41
Job took secs             :                  3ss
Average   bps             :               86B/s
Average   rps             :              3rec/s
Number of rec             :                  10
Failed record             :                   0

Here and here provide all kinds of job configuration examples.

Runtime Requirements

  • JDK 1.8+
  • Python 2.7+ / Python 3.7+ (Windows)
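
A quick way to confirm that a suitable JDK is available on the PATH before running Addax (a trivial check, not specific to Addax):

java -version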

Documentation

Code Style

We recommend you use IntelliJ as your IDE. The code style template for the project can be found in the codestyle repository along with our general programming and Java guidelines. In addition to those you should also adhere to the following:

  • Alphabetize sections in the documentation source files (both in table of contents files and other regular documentation files). In general, alphabetize methods/variables/sections if such ordering already exists in the surrounding code.
  • When appropriate, use the Java 8 stream API. However, note that the stream implementation does not perform well so avoid using it in inner loops or otherwise performance sensitive sections.
  • Categorize errors when throwing exceptions. For example, AddaxException takes an error code and error message as arguments, AddaxException(REQUIRE_VALUE, "lack of required item"). This categorization lets you generate reports, so you can monitor the frequency of various failures.
  • Ensure that all files have the appropriate license header; you can generate the license header by running mvn license:format (see the example after this list).
  • Consider using String formatting (printf style formatting using the Java Formatter class): format("Session property %s is invalid: %s", name, value) (note that format() should always be statically imported). Sometimes, if you only need to append something, consider using the + operator.
  • Avoid using the ternary operator except for trivial expressions.
  • Use an assertion from Airlift's Assertions class if there is one that covers your case rather than writing the assertion by hand. Over time, we may move over to more fluent assertions like AssertJ.
  • When writing a Git commit message, follow these guidelines.
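
For example, a typical check before opening a pull request might look like this (a sketch, assuming the standard Maven setup from the build section above):

mvn license:format   # regenerate license headers as described above
mvn clean package    # verify that the project still builds and tests pass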

License

This software is free to use under the Apache License 2.0.

Special Thanks

Special thanks to JetBrains for its support of this project.
