Awesome Open Source
Search results for etl data engineering
Filters: data-engineering, etl
88 search results found
Airflow
⭐
34,299
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Airbyte
⭐
12,918
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Dagster
⭐
9,467
An orchestration platform for the development, production, and observation of data assets.
Benthos
⭐
7,407
Fancy stream processing made operationally mundane
Mage Ai
⭐
6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Cloudquery
⭐
5,380
The open source high performance data integration platform built for developers.
Kestra
⭐
5,257
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
Aws Sdk Pandas
⭐
3,779
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Quadratic
⭐
2,485
Quadratic | Data Science Spreadsheet with Python & SQL
Incubator Devlake
⭐
2,322
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
Hamilton
⭐
1,272
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. It runs and scales everywhere Python does.
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Data Engineering Wiki
⭐
934
The best place to learn data engineering. Built and maintained by the data engineering community.
Sqlmesh
⭐
931
SQLMesh is a data transformation framework that brings the benefits of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python.
Hamilton
⭐
877
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Zingg
⭐
828
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Dataform
⭐
757
Dataform is a framework for managing SQL-based data operations in BigQuery, Snowflake, and Redshift
Neumai
⭐
693
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads data pipeline for building a data lake, a data warehouse, and an analytics platform.
Redun
⭐
464
Yet another redundant workflow engine
Automate Dv
⭐
435
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
Versatile Data Kit
⭐
389
One framework to develop, deploy and operate data workflows with Python and SQL.
Aws Serverless Data Lake Framework
⭐
379
Enterprise-grade, production-hardened, serverless data lake on AWS
Ethereum Etl Airflow
⭐
378
Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethe
Bitcoin Etl
⭐
350
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Etl
⭐
327
PHP - ETL (Extract Transform Load) data processing library
Data Engineering Projects
⭐
322
Personal Data Engineering Projects
Awesome Bigquery Views
⭐
322
Useful SQL queries for Blockchain ETL datasets in BigQuery.
Cascading
⭐
321
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing workflows on various cluster computing platforms. Please see https://github.com/cwensel/cascading for access to all WIP branches.
Conduit
⭐
321
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
Recap
⭐
292
Work with your web service, database, and streaming schemas in a single format.
Butterfree
⭐
269
A tool for building feature stores.
Cuelake
⭐
266
Use SQL to build ELT pipelines on a data lakehouse.
Orbital
⭐
255
Orbital automates integration between data sources (APIs, databases, queues and functions): BFFs, API composition and ETL pipelines that adapt as your specs change.
Substation
⭐
242
Substation is a cloud-native, event-driven data pipeline toolkit built for security teams.
Setl
⭐
173
A simple Spark-powered ETL framework that just works 🍺
Dataplane
⭐
171
Dataplane is an Airflow inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
Morph Kgc
⭐
143
Powerful RDF Knowledge Graph Generation with RML Mappings
Datachecks
⭐
117
Open Source Data Quality Monitoring.
Sayn
⭐
117
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Pangeo Forge Recipes
⭐
108
Python library for building Pangeo Forge recipes.
Patterns Devkit
⭐
101
Data pipelines from re-usable components
Polygon Etl
⭐
93
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Bulker
⭐
92
Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)
Flowman
⭐
85
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Airbyte_serverless
⭐
83
Airbyte made simple (no UI, no database, no cluster)
Dataengineeringpilipinas
⭐
80
Data Engineering Pilipinas is a community for data engineers, data analysts, data scientists, developers, AI / ML engineers, and users of closed and open source data tools and methods / techniques in the Philippines. Data Engineering Pilipinas is a PyData group.
Gallia Core
⭐
79
A schema-aware Scala library for data transformation
Data Engineering Nanodegree
⭐
76
Projects done in the Data Engineering Nanodegree by Udacity.com
Prism
⭐
70
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Beneath
⭐
64
Beneath is a serverless real-time data platform ⚡️
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Rony
⭐
56
Data Engineering made simple - An opinionated Data Engineering framework
Drivers
⭐
53
Low-code Python library enabling access to APIs, tools, data sources in seconds.
Sqlpipe
⭐
52
SQLpipe makes it easy to move the result of one query from one database to another.
Uptasticsearch
⭐
48
An Elasticsearch client tailored to data science workflows.
Yaetos
⭐
32
Write data and AI pipelines in SQL, Spark, or Pandas and deploy them to the cloud, simplified
Hive Metastore Client
⭐
32
A client for connecting to a Hive metastore and running DDL statements on it.
Arthur Redshift Etl
⭐
25
ELT Code for your Data Warehouse
Nodestream
⭐
23
A Fast, Declarative, and Extensible ETL Framework for Graph Databases.
Aws Glue Docker
⭐
22
🐋 Docker image for AWS Glue Spark/Python
De 100 Days
⭐
22
data engineering 100 days 🤖 🧲 🦾 | #DE
Etl_manager
⭐
21
A Python package to create a database on the platform using our MoJ data warehousing framework
Spark Movies Etl
⭐
21
Spark data pipeline that ingests and transforms movie ratings data.
Airflowetl
⭐
16
Blog post on ETL pipelines with Airflow
Papers4dataachitect
⭐
15
A collection of data engineering papers on topics such as OLTP, OLAP, ETL, and distributed storage.
Airflowdatapipeline
⭐
15
Example of an ETL Pipeline using Airflow
Ghcn D
⭐
14
Data Pipeline from the Global Historical Climatology Network DataSet
Sheetwork
⭐
14
A handy package to load Google Sheets to your database right from the CLI and with easy configuration via YAML files.
Thepipelinetool
⭐
13
A pipeline orchestration tool
Zksync Era Etl
⭐
12
Best zkSync-era ETL ever 😜
Data Brewery
⭐
12
Data Brewery is an ETL (Extract-Transform-Load) program that connects to many data sources (cloud services, databases, ...) and manages data warehouse workflows.
Airflowjob
⭐
11
Airflow POC demo: 1) environment setup, 2) Airflow DAG, 3) Spark/ML pipeline | #DE
Social Media Analysis
⭐
11
A scalable, flexibly deployable solution that analyses social media content.
Apache Airflow Providers Transfers
⭐
10
Awesome Dataops
⭐
10
An awesome list of DataOps products, open-source projects, and resources
Greatex
⭐
10
A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in Airflow.
Data Pipeline With Dbt Using Airflow On Gcp
⭐
10
This project demonstrates how to build and automate an ETL pipeline using DAGs in Airflow and load the transformed data into BigQuery. It uses several tools, including Astro, dbt, GCP, Airflow, and Metabase.
Data Engineering
⭐
9
This is an all-in-one repository for data engineers, ideal for beginners and interview preparation. It uses Python as the main programming language and incorporates MySQL, MongoDB, and Docker.
Pyspark Template
⭐
8
A Python PySpark Project with Poetry
Hedera Etl
⭐
8
ETL scripts for Hedera Hashgraph
Data Engineering Onboarding Starter
⭐
8
This repository contains a 10-step program to enter the world of data engineering
Spooq
⭐
8
Datacrafter
⭐
6
NoSQL extract, transform, load (ETL) toolkit with Python
Spark Databricks
⭐
6
🔥 Master Apache Spark & Databricks! Dive into a world of big data with exclusive insights from Udemy courses, personal notes, and practical guides. Whether you're starting out or scaling new heights in data engineering, this is your ultimate resource hub! 🌟🚀
Data Engineer Portfolio
⭐
6
This is a repository to demonstrate my details, skills, projects and to keep track of my progression in Data Analytics and Data Science topics.
Data.engineers.lunch
⭐
6
Resources from weekly Zoom lunches revolving around Data Engineering. Hosted by Anant Corporation.
Unblind
⭐
6
Project for the Datatón Anticorrupción 2022 (anti-corruption datathon), by Dataket 🔥
Udacity Data Engineering Nanodegree
⭐
5
This is a repository to hold the files and notebooks produced throughout my Udacity Data Engineering Nanodegree program.
Etl Adapter Parquet
⭐
5
PHP ETL Adapter: Parquet
Spark Structured Streaming Kafka
⭐
5
Spark Structured Streaming + Kafka + Delta pipeline.
Build Etl Using Ssis
⭐
5
Starter project for building an ETL pipeline using SSIS in Visual Studio 2019
Stock Market Real Time Data Pipeline With Apache Kafka And Cassandra
⭐
5
An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.
Copyright 2018-2024 Awesome Open Source. All rights reserved.