Awesome Open Source
Search results for etl
964 search results found
Tidb
⭐
35,604
TiDB is an open-source, cloud-native, distributed, MySQL-compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at: https://tidbcloud.com/free-trial
Airflow
⭐
34,299
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Airbyte
⭐
12,918
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Doris
⭐
11,243
Apache Doris is an easy-to-use, high performance and unified analytics database.
Dagster
⭐
9,467
An orchestration platform for the development, production, and observation of data assets.
Benthos
⭐
7,407
Fancy stream processing made operationally mundane
Pentaho Kettle
⭐
7,194
Pentaho Data Integration ( ETL ) a.k.a Kettle
Mage Ai
⭐
6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Steampipe
⭐
6,061
Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
Cloudquery
⭐
5,380
The open source high performance data integration platform built for developers.
Kestra
⭐
5,257
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
Orchest
⭐
3,876
Build data pipelines, the easy way 🛠️
Rudder Server
⭐
3,841
Privacy and Security focused Segment-alternative, in Golang and React
Aws Sdk Pandas
⭐
3,779
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Awesome Etl
⭐
3,041
A curated list of awesome ETL frameworks, libraries, and software.
Ethereum Etl
⭐
2,760
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Hawk
⭐
2,638
Visual crawler & ETL IDE written in C#/WPF
Quadratic
⭐
2,485
Quadratic | Data Science Spreadsheet with Python & SQL
Incubator Devlake
⭐
2,322
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
Mara Pipelines
⭐
2,053
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Awesome Business Intelligence
⭐
1,862
Actively curated list of awesome BI tools. PRs welcome!
Deepie
⭐
1,737
DeepIE: Deep Learning for Information Extraction
Kiba
⭐
1,684
Data processing & ETL framework for Ruby
Go Streams
⭐
1,656
A lightweight stream processing library for Go
Awesome Node Based Uis
⭐
1,648
A curated list with resources about node-based UIs
Riko
⭐
1,573
A Python stream processing engine modeled after Yahoo! Pipes
Vdp
⭐
1,556
💧 Instill VDP (Versatile Data Pipeline) is an open-source tool to seamlessly integrate AI to process unstructured data in the modern data stack
Transporter
⭐
1,447
Sync data between persistence engines, like ETL only not stodgy
Dozer
⭐
1,367
Dozer is a real-time data platform for building, deploying and maintaining data products.
Aws Glue Samples
⭐
1,334
AWS Glue code samples
Peerdb
⭐
1,315
A fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage
Hamilton
⭐
1,272
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.
Monstache
⭐
1,208
A Go daemon that syncs MongoDB to Elasticsearch in real time. You know, for search.
React Csv
⭐
1,130
React components to build CSV files on the fly based on an array or object literal of data
Getting Started
⭐
1,098
This repository is a getting started guide to Singer.
Etl With Airflow
⭐
1,053
ETL best practices with airflow, with examples
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Addax
⭐
1,034
Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.
Pgsync
⭐
1,003
Postgres to Elasticsearch/OpenSearch sync
Butano
⭐
946
Modern C++ high level GBA engine
Data Engineering Wiki
⭐
934
The best place to learn data engineering. Built and maintained by the data engineering community.
Sqlmesh
⭐
931
SQLMesh is a data transformation framework that brings the benefits of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python.
Hamilton
⭐
877
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Pglogical
⭐
839
Logical Replication extension for PostgreSQL 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
Tis
⭐
833
Supports agile DataOps based on Flink, DataX, Flink-CDC, and Chunjun, with a web UI
Zingg
⭐
828
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Dataform
⭐
757
Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Open Semantic Search
⭐
741
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
Net Libraries That Make Your Life Easier
⭐
730
Open Source .NET libraries that make your life easier.
Optimus
⭐
707
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Onepanel
⭐
704
The open source, end-to-end computer vision platform. Label, build, train, tune, deploy and automate in a unified platform that runs on any cloud and on-premises.
Neumai
⭐
693
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
Choetl
⭐
693
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Koop
⭐
618
Transform, query, and download geospatial data on the web.
Omniparser
⭐
597
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Eland
⭐
588
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Aws Glue Libs
⭐
568
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
Smartcode
⭐
566
SmartCode = IDataSource -> IBuildTask -> IOutput => Build Everything!!!
Ananas Desktop
⭐
563
A hackable data integration & analysis tool to enable non technical users to edit data processing jobs and visualise data on demand.
Etl.net
⭐
559
Mass data processing with a complete ETL for .NET developers
Datacleaner
⭐
557
The premier open source Data Quality solution
Baby Names Analysis
⭐
555
Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.
Etl2pcapng
⭐
544
Utility that converts an .etl file containing a Windows network packet capture into .pcapng format.
Metorikku
⭐
536
A simplified, lightweight ETL Framework based on Apache Spark
Bigslice
⭐
525
A serverless cluster computing system for the Go programming language
Redun
⭐
464
Yet another redundant workflow engine
Abc
⭐
455
Power of appbase.io via CLI, with nifty imports from your favorite data sources
Automate Dv
⭐
435
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
Spark Excel
⭐
421
A Spark plugin for reading and writing Excel files
Pudl
⭐
417
The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
Etlalchemy
⭐
414
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
Neosync
⭐
413
A developer-first way to create high-fidelity synthetic data or anonymize sensitive data and sync it across all environments for testing, fine-tuning or model training.
Exchangis
⭐
401
Exchangis is a lightweight, highly extensible data exchange platform that supports data transmission between structured and unstructured heterogeneous data sources
Etlpy
⭐
393
A smart stream-like crawler & ETL Python library
Versatile Data Kit
⭐
389
One framework to develop, deploy and operate data workflows with Python and SQL.
Aws Serverless Data Lake Framework
⭐
379
Enterprise-grade, production-hardened, serverless data lake on AWS
Zdh_web
⭐
379
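Big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, covering data collection, scheduling, permissions, and approvals.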
Ethereum Etl Airflow
⭐
378
Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethe
Smooks
⭐
377
Extensible data integration Java framework for building XML and non-XML fragment-based applications
Etl
⭐
367
Extract, Transform, and Load data with Ruby
Hnpickup
⭐
358
This is an educational example of a data mining web application: when is a good time to post on HN
Big_data_architect_skills
⭐
353
Skills a big data architect should master
Webkettle
⭐
350
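A professional-edition tool with a browser/server (B/S) architecture, built on the web version of Kettle, for distributed scheduling, management, and ETL development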
Bitcoin Etl
⭐
350
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Etl
⭐
327
PHP - ETL (Extract Transform Load) data processing library
Data Engineering Projects
⭐
322
Personal Data Engineering Projects
Awesome Bigquery Views
⭐
322
Useful SQL queries for Blockchain ETL datasets in BigQuery.
Cascading
⭐
321
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing workflows on various cluster computing platforms. Please see https://github.com/cwensel/cascading for access to all WIP branches.
Conduit
⭐
321
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
Replicadb
⭐
304
ReplicaDB is an open source tool for database replication, designed for efficiently transferring bulk data between relational and non-relational databases
Astro Sdk
⭐
303
Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Nango Sync
⭐
294
Sync external APIs to your DB, fast.
Recap
⭐
292
Work with your web service, database, and streaming schemas in a single format.
Flow
⭐
290
Flow PHP - strongly typed data processing framework
Kafka Connect File Pulse
⭐
289
🔗 A multipurpose Kafka Connect connector that makes it easy to parse, transform and stream any file, in any format, into Apache Kafka
Cql
⭐
285
Categorical Query Language IDE
Awesome Integration
⭐
282
A curated list of awesome system integration software and resources.
Beginner_de_project
⭐
276
Beginner data engineering project - batch edition
Pygrametl
⭐
275
Official repository for pygrametl - ETL programming in Python
1-100 of 964 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.