Awesome Open Source
Search results for python etl
261 search results found
Airflow
⭐
40,068
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Airbyte
⭐
12,918
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Dagster
⭐
9,467
An orchestration platform for the development, production, and observation of data assets.
Mage Ai
⭐
6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Aws Sdk Pandas
⭐
4,015
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Orchest
⭐
3,876
Build data pipelines, the easy way 🛠️
Quadratic
⭐
2,485
Quadratic | Data Science Spreadsheet with Python & SQL
Mara Pipelines
⭐
2,083
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Hamilton
⭐
1,864
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Riko
⭐
1,573
A Python stream processing engine modeled after Yahoo! Pipes
Vdp
⭐
1,556
💧 Instill VDP (Versatile Data Pipeline) is an open-source tool to seamlessly integrate AI to process unstructured data in the modern data stack
Aws Glue Samples
⭐
1,334
AWS Glue code samples
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Pgsync
⭐
1,003
Postgres to Elasticsearch/OpenSearch sync
Sqlmesh
⭐
931
SQLMesh is a data transformation framework that brings the benefits of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python.
Hamilton
⭐
877
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Neumai
⭐
693
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Eland
⭐
588
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Aws Glue Libs
⭐
568
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
Baby Names Analysis
⭐
555
Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.
Redun
⭐
464
Yet another redundant workflow engine
Pudl
⭐
417
The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
Etlalchemy
⭐
414
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
Versatile Data Kit
⭐
389
One framework to develop, deploy and operate data workflows with Python and SQL.
Bitcoin Etl
⭐
350
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Astro Sdk
⭐
303
Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Recap
⭐
292
Work with your web service, database, and streaming schemas in a single format.
Beginner_de_project
⭐
276
Beginner data engineering project - batch edition
Butterfree
⭐
269
A tool for building feature stores.
Synch
⭐
268
Sync data from other databases to ClickHouse (cluster)
Naas
⭐
266
Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)
Paperetl
⭐
235
📄 ⚙️ ETL processes for medical and scientific papers
Bigquery Etl
⭐
216
Bigquery ETL
Example Airflow Dags
⭐
204
Example DAGs using hooks and operators from Airflow Plugins
Amundsendatabuilder
⭐
196
Data ingestion library for Amundsen to build graph and search index
Dbt Coves
⭐
193
CLI tool for dbt users to simplify creation of staging models (yml and sql) files
Trex
⭐
182
Intelligently transform unstructured to structured data
Airflow_for_beginners
⭐
166
Dbt Databricks
⭐
165
A dbt adapter for Databricks.
Reddit Detective
⭐
160
Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more
Aliyun Log Python Sdk
⭐
159
Use Python to manage, produce and consume data with Aliyun Log Service.
Meilisync
⭐
154
Real-time data sync from MySQL/PostgreSQL/MongoDB to Meilisearch
Metl
⭐
154
mito ETL tool
Morph Kgc
⭐
151
Powerful RDF Knowledge Graph Generation with RML Mappings
Csv2db
⭐
133
The CSV to database command line loader
Od
⭐
131
Czech open data
Easy_sql
⭐
126
A library developed to ease the data ETL development process.
Datagristle
⭐
122
Tough and flexible tools for data analysis, transformation, validation and movement.
Sayn
⭐
117
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Pangeo Forge Recipes
⭐
108
Python library for building Pangeo Forge recipes.
Patterns Devkit
⭐
101
Data pipelines from re-usable components
Locopy
⭐
99
locopy: Loading/Unloading to Redshift and Snowflake using Python.
Carry
⭐
93
Python ETL (Extract-Transform-Load) tool / data migration tool
Target Postgres
⭐
93
A Singer.io Target for Postgres
Airbyte_serverless
⭐
83
Airbyte made simple (no UI, no database, no cluster)
Sycamore
⭐
82
🍁 Sycamore is an LLM-powered semantic data preparation system for building search applications.
Etlhelper
⭐
81
ETL Helper is a Python ETL library to simplify data transfer into and out of databases.
Datacater
⭐
80
The developer-friendly ETL platform for transforming data in real-time. Based on Apache Kafka® and Kubernetes®.
Openrefine Client
⭐
80
The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.
Dataengineeringpilipinas
⭐
80
Data Engineering Pilipinas is a community for data engineers, data analysts, data scientists, developers, AI / ML engineers, and users of closed and open source data tools and methods / techniques in the Philippines. Data Engineering Pilipinas is a PyData group.
Luigi Warehouse
⭐
73
A luigi powered analytics / warehouse stack
Prism
⭐
70
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Sqlbucket
⭐
67
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
Data Wrangling With Python
⭐
66
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Beneath
⭐
64
Beneath is a serverless real-time data platform ⚡️
Django Calaccess Raw Data
⭐
63
A Django app to download, extract and load campaign finance and lobbying activity data from the California Secretary of State's CAL-ACCESS database
Dbt Sqlite
⭐
59
A SQLite adapter plugin for dbt (data build tool)
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-life work, using PySpark and Spark SQL for development. The end of the course also covers a few case studies.
Rony
⭐
56
Data Engineering made simple - An opinionated Data Engineering framework
Onetl
⭐
55
One ETL tool to rule them all
Drivers
⭐
53
Low-code Python library enabling access to APIs, tools, data sources in seconds.
Pyetl
⭐
51
Python ETL framework
Unihan Etl
⭐
48
Export UNIHAN's database to csv, json or yaml
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Bigmetadata
⭐
43
Odoo Etl
⭐
43
Odoo data manipulation, like a small ELT (Extract, Load, Transform) for Odoo databases.
Architect_big_data_solutions_with_spark
⭐
42
code, labs and lectures for the course
Projeto_etl_rfb_ibge_anp
⭐
38
Python and PostgreSQL - Extract, Transform, Load (ETL) - public data from the Receita Federal do Brasil (RFB), the Instituto Brasileiro de Geografia e Estatística (IBGE), and the Agência Nacional do Petróleo, Gás Natural e Biocombustíveis (ANP)
Parade
⭐
37
A simple, out-of-the-box toolkit to handle data work
Tablite
⭐
36
Multiprocessing-enabled out-of-memory data analysis library for tabular data.
Ether_sql
⭐
35
A Python library to push Ethereum blockchain data into an SQL database.
Fhirpack
⭐
34
FHIR Python Analysis Client and Kit (FHIRPACK) is a general purpose FHIR client that simplifies the access, analysis and representation of FHIR and EHR data using PANDAS, an ETL philosophy and a functional syntax. It was initially developed at the IKIM and HDDBS in Germany. Read more at https://zenodo.org/record/8006589
Pandas To Postgres
⭐
33
Copy Pandas DataFrames and HDF5 files to PostgreSQL database
Hive Metastore Client
⭐
32
A client for connecting and running DDLs on hive metastore.
Dagster Polars
⭐
32
Polars integration for Dagster
Yaetos
⭐
32
Write data & AI pipelines in (SQL, Spark, Pandas) and deploy to the cloud, simplified
Blast
⭐
31
Blast is a data orchestration tool that can run SQL and Python against Google BigQuery and Snowflake. It supports templating with Jinja, data quality tests, query validation, environment management and more.
Stellar Etl Airflow
⭐
31
Airflow DAGs for the Stellar ETL project
Dbd
⭐
29
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Topbi
⭐
29
Business intelligence for software development, in Python.
News_scrapy_redis
⭐
28
Cookiecutter R Project
⭐
28
Basic cookiecutter template for R projects
Datayoga
⭐
27
Streaming data pipeline platform
Aws Auto Terminate Idle Emr
⭐
26
AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and AWS Lambda, with a Python script that uses Boto3, to terminate AWS EMR clusters that have been idle for a specified period of time.
Alto
⭐
26
Alto is a versatile data integration tool that allows you to easily run Singer plugins, build and cache PEX files encapsulating those plugins, and create a data reservoir whereby you can extract once and replay to as many destinations as you want.
Covalent Slurm Plugin
⭐
26
Executor plugin interfacing Covalent with Slurm
Sample_etl_structure
⭐
26
Fhir Pipe
⭐
25
Populate FHIR-compliant objects using SQL databases and processing rules
Arthur Redshift Etl
⭐
25
ELT Code for your Data Warehouse
Related Searches
Python Jupyter Notebook (17,496)
Python Dataset (14,792)
Python Docker (14,113)
Python Machine Learning (14,099)
Python Command Line (12,663)
Python Database (10,521)
Python Artificial Intelligence (8,580)
Python Amazon Web Services (7,946)
Python Paper (6,550)
Python Pandas (6,193)
1-100 of 261 search results
Copyright 2018-2025 Awesome Open Source. All rights reserved.