Awesome Open Source
Advertising
📦 10
All Projects
Application Programming Interfaces
📦 124
Applications
📦 192
Artificial Intelligence
📦 78
Blockchain
📦 73
Build Tools
📦 113
Cloud Computing
📦 80
Code Quality
📦 28
Collaboration
📦 32
Command Line Interface
📦 49
Community
📦 83
Companies
📦 60
Compilers
📦 63
Computer Science
📦 80
Configuration Management
📦 42
Content Management
📦 175
Control Flow
📦 213
Data Formats
📦 78
Data Processing
📦 276
Data Storage
📦 135
Economics
📦 64
Frameworks
📦 215
Games
📦 129
Graphics
📦 110
Hardware
📦 152
Integrated Development Environments
📦 49
Learning Resources
📦 166
Legal
📦 29
Libraries
📦 129
Lists Of Projects
📦 22
Machine Learning
📦 347
Mapping
📦 64
Marketing
📦 15
Mathematics
📦 55
Media
📦 239
Messaging
📦 98
Networking
📦 315
Operating Systems
📦 89
Operations
📦 121
Package Managers
📦 55
Programming Languages
📦 245
Runtime Environments
📦 100
Science
📦 42
Security
📦 396
Social Media
📦 27
Software Architecture
📦 72
Software Development
📦 72
Software Performance
📦 58
Software Quality
📦 133
Text Editors
📦 49
Text Processing
📦 136
User Interface
📦 330
User Interface Components
📦 514
Version Control
📦 30
Virtualization
📦 71
Web Browsers
📦 42
Web Servers
📦 26
Web User Interface
📦 210
The Top 89 ETL Open Source Projects
Benthos
⭐
2,786
Declarative streaming ETL for mundane tasks, written in Go
Dagster
⭐
2,520
A data orchestrator for machine learning, analytics, and ETL.
Linq2db
⭐
1,944
Linq to database provider.
Mara Pipelines
⭐
1,602
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Riko
⭐
1,550
A Python stream processing engine modeled after Yahoo! Pipes
Kiba
⭐
1,517
Data processing & ETL framework for Ruby
Aws Data Wrangler
⭐
1,358
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Transporter
⭐
1,160
Sync data between persistence engines, like ETL only not stodgy
Awesome Business Intelligence
⭐
1,073
Actively curated list of awesome BI tools. PRs welcome!
Dataspherestudio
⭐
1,067
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Ethereum Etl
⭐
897
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Panther
⭐
822
Detect threats with log data and improve cloud security posture
Monstache
⭐
715
A Go daemon that syncs MongoDB to Elasticsearch in real time
React Csv
⭐
699
React components to build CSV files on the fly, based on an array or object literal of data
Getting Started
⭐
665
This repository is a getting started guide to Singer.
Pyspark Example Project
⭐
591
Example project implementing best practices for PySpark ETL jobs and applications.
Go Streams
⭐
578
A lightweight stream processing library for Go
Baby Names Analysis
⭐
556
Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.
Ananas Desktop
⭐
544
A hackable data integration & analysis tool that enables non-technical users to edit data processing jobs and visualise data on demand.
Koop
⭐
498
🔮 Transform, query, and download geospatial data on the web.
Bigslice
⭐
471
A serverless cluster computing system for the Go programming language
Smartcode
⭐
454
SmartCode = IDataSource -> IBuildTask -> IOutput => Build Everything!!!
Etlalchemy
⭐
447
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
Pglogical
⭐
432
Logical Replication extension for PostgreSQL 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
Airbyte
⭐
427
Airbyte is an open-source data integration platform that helps you consolidate your data in your warehouses, lakes and databases.
Datacleaner
⭐
383
The premier open source Data Quality solution
Abc
⭐
362
Power of appbase.io via CLI, with nifty imports from your favorite data sources
Choetl
⭐
355
ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml formatted files)
Wedatasphere
⭐
342
WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Aistore
⭐
341
AIStore: scalable storage for AI applications
Metorikku
⭐
335
A simplified, lightweight ETL Framework based on Apache Spark
Webkettle
⭐
314
A professional-edition browser/server (B/S) tool, built on the web version of Kettle, for distributed scheduling, management, and ETL development
Dataform
⭐
308
Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Smooks
⭐
285
An extensible Java framework for building XML and non-XML (CSV, EDI, Java, etc...) streaming applications
Datavec
⭐
270
ETL Library for Machine Learning - data pipelines, data munging and wrangling
Data Making Guidelines
⭐
247
📘 Making Data, the DataMade Way
Example Airflow Dags
⭐
238
Example DAGs using hooks and operators from Airflow Plugins
Aws Etl Orchestrator
⭐
235
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Storagetapper
⭐
225
StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Elastic
⭐
223
R client for the Elasticsearch HTTP API
Eland
⭐
222
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Bulk Writer
⭐
206
Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.
Etlbox
⭐
194
A lightweight ETL (extract, transform, load) library and data integration toolbox for .NET.
Extract
⭐
189
A cross-platform command line tool for parallelised content extraction and analysis.
Cql
⭐
187
Categorical Query Language IDE
Mongo Es
⭐
185
A MongoDB to Elasticsearch connector
Metl
⭐
182
Metl is a simple, web-based integration platform that allows for several different styles of data integration including messaging, file based Extract/Transform/Load (ETL), and remote procedure invocation via Web Services. Read more at www.jumpmind.com/products/metl/overview
Grafter
⭐
173
Linked Data & RDF Manufacturing Tools in Clojure
Bender
⭐
169
Bender - Serverless ETL Framework
Open Semantic Etl
⭐
159
Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) and data enrichment (annotation) pipelines, plus an ingestor to a Solr or Elasticsearch index and a linked data graph database
Etl_unicorn
⭐
154
Data visualization, data mining, data processing ETL
Bitcoin Etl
⭐
154
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Mara Example Project 2
⭐
153
An example mini data warehouse for python project stats, template for new projects
Metl
⭐
151
mito ETL tool
Aws Serverless Data Lake Framework
⭐
151
Enterprise-grade, production-hardened, serverless data lake on AWS
Hydrograph
⭐
143
A visual ETL development and debugging tool for big data
Eel Sdk
⭐
140
Big Data Toolkit for the JVM
Etl.net
⭐
125
Mass processing data with a complete ETL for .net developers
Omniparser
⭐
124
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
Openkettlewebui
⭐
121
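A Kettle-based web scheduling and control platform for data processing. It supports file repositories and database repositories, controls Kettle data transformations through the web platform, and can be integrated into existing systems as middleware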
Transformalize
⭐
119
Configurable Extract, Transform, and Load
Sentinel Crawler
⭐
119
Xenomorph Crawler, a concise, declarative and observable distributed crawler (Node / Go / Java / Rust) for Web, RDB, OS; can also act as a monitor (with Prometheus) or ETL for infrastructure 💫 Multi-language executor, distributed crawler
Reddit Detective
⭐
119
Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more
Kettle Web
⭐
118
Calls Kettle from Java code, built on Spring Boot
Marklogic Data Hub
⭐
111
The MarkLogic Data Hub: documentation ==>
Butterfree
⭐
105
A tool for building feature stores.
Kafka Connect
⭐
102
equivalent to kafka-connect 🔧 for nodejs ✨🐢🚀✨
Aws Ecs Airflow
⭐
101
Run Airflow in AWS ECS(Elastic Container Service) using Fargate tasks
Csv2db
⭐
96
The CSV to database command line loader
Open Data Etl Utility Kit
⭐
92
Use Pentaho's open source data integration tool (Kettle) to create Extract-Transform-Load (ETL) processes to update a Socrata open data portal. Documentation is available at http://open-data-etl-utility-kit.readthedocs.io/en/stable
Etl
⭐
84
LinkedPipes ETL is an RDF based, lightweight ETL tool
Udacity Data Engineering
⭐
82
Udacity Data Engineering Nano Degree (DEND)
Od
⭐
76
Czech open data
Luigi Warehouse
⭐
72
A luigi powered analytics / warehouse stack
Locopy
⭐
69
locopy: Loading/Unloading to Redshift and Snowflake using Python.
Etl_with_python
⭐
65
ETL with Python - Taught at DWH course 2017 (TAU)
Stetl
⭐
63
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Discreetly
⭐
59
ETLy is an add-on dashboard service on top of Apache Airflow.
Sayn
⭐
58
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Kiba Plus
⭐
47
Kiba enhancement for Ruby ETL.
Bentools Etl
⭐
41
PHP ETL (Extract / Transform / Load) library with SOLID principles + almost no dependency.
Ether_sql
⭐
37
A python library to push ethereum blockchain data into an sql database.
Pyetl
⭐
29
python ETL framework
Yunmai Data Extract
⭐
21
Extract your data from the Yunmai weighing scales cloud API so you can use it elsewhere
Aws Auto Terminate Idle Emr
⭐
21
AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and AWS Lambda, with a Python script built on Boto3, to terminate AWS EMR clusters that have been idle for a specified period of time.
Phila Airflow
⭐
16
Bandar Log
⭐
14
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Dswarm Backoffice Web
⭐
11
The backoffice web application of d:swarm (https://github.com/dswarm/dswarm-documentation/wiki)
Tuna
⭐
11
🐟 A streaming ETL for fish