Awesome Open Source
Search results for python etl
330 search results found
Airflow
⭐
34,299
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Airbyte
⭐
12,918
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Dagster
⭐
9,467
An orchestration platform for the development, production, and observation of data assets.
Mage Ai
⭐
6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Orchest
⭐
3,876
Build data pipelines, the easy way 🛠️
Aws Sdk Pandas
⭐
3,779
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Ethereum Etl
⭐
2,760
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Quadratic
⭐
2,485
Quadratic | Data Science Spreadsheet with Python & SQL
Mara Pipelines
⭐
2,053
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Deepie
⭐
1,737
DeepIE: Deep Learning for Information Extraction
Riko
⭐
1,573
A Python stream processing engine modeled after Yahoo! Pipes
Vdp
⭐
1,556
💧 Instill VDP (Versatile Data Pipeline) is an open-source tool to seamlessly integrate AI to process unstructured data in the modern data stack
Aws Glue Samples
⭐
1,334
AWS Glue code samples
Hamilton
⭐
1,272
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.
Getting Started
⭐
1,098
This repository is a getting started guide to Singer.
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Pgsync
⭐
1,003
Postgres to Elasticsearch/OpenSearch sync
Sqlmesh
⭐
931
SQLMesh is a data transformation framework that brings the benefits of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python.
Hamilton
⭐
877
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Open Semantic Search
⭐
741
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
Neumai
⭐
693
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Eland
⭐
588
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Aws Glue Libs
⭐
568
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
Baby Names Analysis
⭐
555
Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.
Redun
⭐
464
Yet another redundant workflow engine
Pudl
⭐
417
The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
Etlalchemy
⭐
414
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
Etlpy
⭐
393
A smart stream-like crawler and ETL library for Python
Versatile Data Kit
⭐
389
One framework to develop, deploy and operate data workflows with Python and SQL.
Aws Serverless Data Lake Framework
⭐
379
Enterprise-grade, production-hardened, serverless data lake on AWS
Ethereum Etl Airflow
⭐
378
Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethe
Bitcoin Etl
⭐
350
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Astro Sdk
⭐
303
Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Recap
⭐
292
Work with your web service, database, and streaming schemas in a single format.
Beginner_de_project
⭐
276
Beginner data engineering project - batch edition
Pygrametl
⭐
275
Official repository for pygrametl - ETL programming in Python
Usaspending Api
⭐
273
Server application to serve U.S. federal spending data via a RESTful API
Butterfree
⭐
269
A tool for building feature stores.
Synch
⭐
268
Sync data from other databases to ClickHouse (cluster)
Naas
⭐
266
Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)
Paperetl
⭐
235
📄 ⚙️ ETL processes for medical and scientific papers
Bigquery Etl
⭐
216
Bigquery ETL
Example Airflow Dags
⭐
204
Example DAGs using hooks and operators from Airflow Plugins
Amundsendatabuilder
⭐
196
Data ingestion library for Amundsen to build graph and search index
Dbt Coves
⭐
193
CLI tool for dbt users to simplify creation of staging models (yml and sql) files
Aws Etl Orchestrator
⭐
185
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Trex
⭐
182
Intelligently transform unstructured to structured data
Airflow_for_beginners
⭐
166
Dbt Databricks
⭐
165
A dbt adapter for Databricks.
Reddit Detective
⭐
160
Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more
Aliyun Log Python Sdk
⭐
159
Use python to manage, produce and consume data with Aliyun Log Service.
Metl
⭐
154
mito ETL tool
Meilisync
⭐
154
Realtime sync data from MySQL/PostgreSQL/MongoDB to Meilisearch
Morph Kgc
⭐
151
Powerful RDF Knowledge Graph Generation with RML Mappings
Csv2db
⭐
133
The CSV to database command line loader
Od
⭐
131
Czech open data
Easy_sql
⭐
126
A library developed to ease the data ETL development process.
Datagristle
⭐
122
Tough and flexible tools for data analysis, transformation, validation and movement.
Diffsync
⭐
121
A utility library for comparing and synchronizing different datasets.
Datachecks
⭐
117
Open Source Data Quality Monitoring.
Sayn
⭐
117
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Pangeo Forge Recipes
⭐
108
Python library for building Pangeo Forge recipes.
Patterns Devkit
⭐
101
Data pipelines from re-usable components
Locopy
⭐
99
locopy: Loading/Unloading to Redshift and Snowflake using Python.
Carry
⭐
93
Python ETL(Extract-Transform-Load) tool / Data migration tool
Polygon Etl
⭐
93
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Target Postgres
⭐
93
A Singer.io Target for Postgres
Airbyte_serverless
⭐
83
Airbyte made simple (no UI, no database, no cluster)
Sycamore
⭐
82
🍁 Sycamore is an LLM-powered semantic data preparation system for building search applications.
Stetl
⭐
81
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Etlhelper
⭐
81
ETL Helper is a Python ETL library to simplify data transfer into and out of databases.
Dataengineeringpilipinas
⭐
80
Data Engineering Pilipinas is a community for data engineers, data analysts, data scientists, developers, AI / ML engineers, and users of closed and open source data tools and methods / techniques in the Philippines. Data Engineering Pilipinas is a PyData group.
Datacater
⭐
80
The developer-friendly ETL platform for transforming data in real-time. Based on Apache Kafka® and Kubernetes®.
Luigi Warehouse
⭐
73
A luigi powered analytics / warehouse stack
Etl Cms
⭐
72
Workproducts to ETL CMS datasets into OMOP Common Data Model
Prism
⭐
70
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Discreetly
⭐
70
ETLy is an add-on dashboard service on top of Apache Airflow.
Sqlbucket
⭐
67
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
Data Wrangling With Python
⭐
66
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Beneath
⭐
64
Beneath is a serverless real-time data platform ⚡️
Django Calaccess Raw Data
⭐
63
A Django app to download, extract and load campaign finance and lobbying activity data from the California Secretary of State's CAL-ACCESS database
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Dbt Sqlite
⭐
59
A SQLite adapter plugin for dbt (data build tool)
Openrefine Client
⭐
57
The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.
Etl_with_python
⭐
57
ETL with Python - Taught at DWH course 2017 (TAU)
Etl Parser
⭐
57
Event Trace Log file parser in pure Python
Rony
⭐
56
Data Engineering made simple - An opinionated Data Engineering framework
Onetl
⭐
55
One ETL tool to rule them all
Drivers
⭐
53
Low-code Python library enabling access to APIs, tools, data sources in seconds.
Pyetl
⭐
51
Python ETL framework
Uptasticsearch
⭐
48
An Elasticsearch client tailored to data science workflows.
Unihan Etl
⭐
48
Export UNIHAN's database to csv, json or yaml
Amaxa
⭐
47
A multi-object ETL tool for Salesforce.
Legis Graph
⭐
45
ETL scripts for loading US Congressional data from govtrack.us into Neo4j
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Odoo Etl
⭐
43
Odoo data manipulation, like a small ELT (Extract, Load, Transform) tool for Odoo databases.
Bigmetadata
⭐
43
Architect_big_data_solutions_with_spark
⭐
42
code, labs and lectures for the course
Functions
⭐
42
Serverless ETL using cloud functions https://fivetran.com/docs/functions
1-100 of 330 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.