Datax Bin

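DataX is a widely used offline data synchronization tool/platform for heterogeneous data sources. It provides efficient data synchronization between heterogeneous sources such as MySQL, Oracle, SQL Server, PostgreSQL, HDFS, Hive, ADS, HBase, and others.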

System Requirements

Quick Start

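  • Tool deployment

    • Method 1: download the DataX-bin package directly: DataX-bin

      After downloading, extract it to a local directory and enter the bin directory to run a synchronization job: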

      $ cd  {YOUR_DATAX_HOME}/bin
      $ python datax.py {YOUR_JOB.json}
      
    • Method 2: download the DataX-src source code and build it yourself: DataX-src source code

      (1) Download the DataX source code:

      $ git clone git@github.com:Arvin-Mark/DataX-src.git
      

      (2) Package with Maven:

      $ cd  {DataX_source_code_home}
      $ mvn -U clean package assembly:assembly -Dmaven.test.skip=true
      

      When packaging succeeds, the log shows:

      [INFO] BUILD SUCCESS
      [INFO] -----------------------------------------------------------------
      [INFO] Total time: 08:12 min
      [INFO] Finished at: 2015-12-13T16:26:48+08:00
      [INFO] Final Memory: 133M/960M
      [INFO] -----------------------------------------------------------------
      

      After a successful build, the DataX package is located at {DataX_source_code_home}/target/datax/datax-bin/ and has the following structure:

      $ cd  {DataX_source_code_home}
      $ ls ./target/datax/datax-bin/
      bin		conf		job		lib		log		log_perf	plugin		script		tmp
      
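  • Configuration example: read data from a stream and print it to the console

    • Step 1: create the job configuration file (JSON format)

      You can view a configuration template with the command: python datax.py -r {YOUR_READER} -w {YOUR_WRITER}

      For example: python datax.py -r streamreader -w streamwriter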

      $ cd  {YOUR_DATAX_HOME}/bin
      $  python datax.py -r streamreader -w streamwriter
      DataX (UNKNOWN_DATAX_VERSION), From Alibaba !
      Copyright (C) 2010-2015, Alibaba Group. All Rights Reserved.
      Please refer to the streamreader document:
          * [streamreader.md](https://github.com/Arvin-Mark/DataX-src/blob/master/streamreader/doc/streamreader.md)
      
      Please refer to the streamwriter document:
          * [streamwriter.md](https://github.com/Arvin-Mark/DataX-src/blob/master/streamwriter/doc/streamwriter.md)
      
      Please save the following configuration as a json file and  use
           python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
      to run the job.
      
      {
          "job": {
              "content": [
                  {
                      "reader": {
                          "name": "streamreader",
                          "parameter": {
                              "column": [],
                              "sliceRecordCount": ""
                          }
                      },
                      "writer": {
                          "name": "streamwriter",
                          "parameter": {
                              "encoding": "",
                              "print": true
                          }
                      }
                  }
              ],
              "setting": {
                  "speed": {
                      "channel": ""
                  }
              }
          }
      }
      

      Based on the template, configure the JSON as follows:

      #stream2stream.json
      {
        "job": {
          "content": [
            {
              "reader": {
                "name": "streamreader",
                "parameter": {
                  "sliceRecordCount": 10,
                  "column": [
                    {
                      "type": "long",
                      "value": "10"
                    },
                    {
                      "type": "string",
                      "value": "hello,你好,世界-DataX"
                    }
                  ]
                }
              },
              "writer": {
                "name": "streamwriter",
                "parameter": {
                  "encoding": "UTF-8",
                  "print": true
                }
              }
            }
          ],
          "setting": {
            "speed": {
              "channel": 5
            }
          }
        }
      }
      
    • Step 2: start DataX

      $ cd {YOUR_DATAX_DIR_BIN}
      $ python datax.py ./stream2stream.json
      

      When synchronization finishes, DataX prints a summary like the following (the labels are in Chinese):

      ...
      2015-12-17 11:20:25.263 [job-0] INFO  JobContainer -
      任务启动时刻                    : 2015-12-17 11:20:15
      任务结束时刻                    : 2015-12-17 11:20:25
      任务总计耗时                    :                 10s
      任务平均流量                    :              205B/s
      记录写入速度                    :              5rec/s
      读出记录总数                    :                  50
      读写失败总数                    :                   0
      
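The summary above is reproduced verbatim from DataX. As a reading aid, the sketch below maps each Chinese label to an English key and cross-checks the figures. The English key names and the `parse_summary` helper are assumptions made for this illustration and are not part of DataX.

```python
# English names for the summary labels (key names chosen for this sketch only).
LABELS = {
    "任务启动时刻": "job_start_time",
    "任务结束时刻": "job_end_time",
    "任务总计耗时": "total_elapsed",
    "任务平均流量": "average_throughput",
    "记录写入速度": "record_write_speed",
    "读出记录总数": "total_records_read",
    "读写失败总数": "total_read_write_failures",
}

# The summary lines from the log shown above.
SAMPLE_LOG = """\
2015-12-17 11:20:25.263 [job-0] INFO  JobContainer -
任务启动时刻                    : 2015-12-17 11:20:15
任务结束时刻                    : 2015-12-17 11:20:25
任务总计耗时                    :                 10s
任务平均流量                    :              205B/s
记录写入速度                    :              5rec/s
读出记录总数                    :                  50
读写失败总数                    :                   0
"""


def parse_summary(text):
    """Return {english_key: raw_value} for every known label found in text."""
    result = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps in the value stay intact.
        label, sep, value = line.partition(":")
        key = LABELS.get(label.strip())
        if sep and key:
            result[key] = value.strip()
    return result


summary = parse_summary(SAMPLE_LOG)
for key, value in summary.items():
    print(f"{key}: {value}")

# Cross-check: records read divided by elapsed seconds.
records = int(summary["total_records_read"])
seconds = int(summary["total_elapsed"].rstrip("s"))
print(records / seconds)  # 5.0, in line with the reported 5rec/s
```

The 50 records read also equal the job's `channel` value (5) multiplied by its `sliceRecordCount` (10), which suggests that each channel emits `sliceRecordCount` records in this example.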

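The two Quick Start steps can also be scripted. The following is a minimal sketch, assuming Python 3 and that `python` is on the PATH. The helper names (`build_stream_job`, `write_job`, `datax_command`) and the `/opt/datax` install location are hypothetical. The sketch only builds the job file and the command line; it does not invoke DataX.

```python
import json
import os
import tempfile


def build_stream_job(slice_record_count, channel):
    """Build a streamreader -> streamwriter job shaped like stream2stream.json."""
    return {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "streamreader",
                        "parameter": {
                            "sliceRecordCount": slice_record_count,
                            "column": [
                                {"type": "long", "value": "10"},
                                {"type": "string", "value": "hello,你好,世界-DataX"},
                            ],
                        },
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {"encoding": "UTF-8", "print": True},
                    },
                }
            ],
            "setting": {"speed": {"channel": channel}},
        }
    }


def write_job(job, path):
    """Write the job as UTF-8 JSON, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(job, f, ensure_ascii=False, indent=2)


def datax_command(datax_home, job_path):
    """Argument list equivalent to: python datax.py {YOUR_JOB.json}"""
    return ["python", os.path.join(datax_home, "bin", "datax.py"), job_path]


job = build_stream_job(slice_record_count=10, channel=5)
job_path = os.path.join(tempfile.mkdtemp(), "stream2stream.json")
write_job(job, job_path)

cmd = datax_command("/opt/datax", job_path)  # "/opt/datax" is a placeholder
print(" ".join(cmd))
# To actually start the job, pass cmd to subprocess.run(cmd, check=True).
```

Building the job as a dictionary and serializing it with `json.dump` avoids hand-editing mistakes such as trailing commas or mismatched braces.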
Support Data Channels

DataX currently supports the following data sources:

Reader


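A Reader extracts data in bulk from a data storage system and converts it into the DataX standard data exchange protocol. Any DataX Reader can be paired seamlessly with any DataX Writer, so that data can flow between arbitrary heterogeneous sources.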


RDBMS (relational databases)

Data warehouse storage

  • ODPSReader: extracts ODPS data in bulk using the ODPS Tunnel SDK.

NoSQL data storage

Unstructured data storage

Writer


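A Writer translates data from the DataX standard data exchange protocol into the format of a specific storage type and writes it to the destination data store. Any DataX Writer can be paired seamlessly with any DataX Reader, so that data can flow between arbitrary heterogeneous sources.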
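The decoupling of Readers and Writers through a common record format can be illustrated with a toy pipeline. This is a conceptual sketch in Python, not DataX's plugin API (DataX itself is a Java project). All names here (`Record`, `ConstantReader`, `ListReader`, `CollectingWriter`, `PrintWriter`, `run_job`) are hypothetical.

```python
from typing import Iterable, Iterator, Protocol, Tuple

# Stand-in for the common exchange format: one tuple of field values per record.
Record = Tuple[object, ...]


class Reader(Protocol):
    def read(self) -> Iterator[Record]: ...


class Writer(Protocol):
    def write(self, records: Iterable[Record]) -> int: ...


class ConstantReader:
    """Emits the same record `count` times, loosely in the spirit of streamreader."""

    def __init__(self, record: Record, count: int):
        self.record = record
        self.count = count

    def read(self) -> Iterator[Record]:
        for _ in range(self.count):
            yield self.record


class ListReader:
    """Reads rows from an in-memory list and converts each one to a Record."""

    def __init__(self, rows):
        self.rows = rows

    def read(self) -> Iterator[Record]:
        for row in self.rows:
            yield tuple(row)


class CollectingWriter:
    """Stores every record it receives in memory."""

    def __init__(self):
        self.rows = []

    def write(self, records: Iterable[Record]) -> int:
        written = 0
        for record in records:
            self.rows.append(record)
            written += 1
        return written


class PrintWriter:
    """Prints each record as tab-separated fields."""

    def write(self, records: Iterable[Record]) -> int:
        written = 0
        for record in records:
            print("\t".join(str(field) for field in record))
            written += 1
        return written


def run_job(reader: Reader, writer: Writer) -> int:
    """Pipe any reader into any writer; returns the number of records written."""
    return writer.write(reader.read())


# Any reader can be combined with any writer because both agree on Record.
print(run_job(ConstantReader((10, "hello"), 3), PrintWriter()))
print(run_job(ListReader([[1, "a"], [2, "b"]]), CollectingWriter()))
```

Because every reader yields the same record type and every writer consumes it, adding a new source or destination requires only one new class rather than one adapter per source and destination pair.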


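RDBMS (relational databases)

Data warehouse storage

  • ODPSWriter: writes data to ODPS using the ODPS Tunnel SDK.
  • ADSWriter: imports data into ADS using ODPS as an intermediate relay.

NoSQL data storage

Unstructured data storage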

Support Data Channels List

Contact us

Google Groups: DataX-user
