Awesome Open Source
Search results for python etl
473 search results found
Airbyte
⭐
10,786
Data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes.
Dagster
⭐
7,563
An orchestration platform for the development, production, and observation of data assets.
Mage Ai
⭐
4,756
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Orchest
⭐
3,867
Build data pipelines, the easy way 🛠️
Aws Sdk Pandas
⭐
3,468
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Ethereum Etl
⭐
2,548
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Quadratic
⭐
2,010
Quadratic | Data Science Spreadsheet with Python & SQL
Mara Pipelines
⭐
1,993
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Deepie
⭐
1,737
DeepIE: Deep Learning for Information Extraction
Riko
⭐
1,573
A Python stream processing engine modeled after Yahoo! Pipes
Aws Glue Samples
⭐
1,282
AWS Glue code samples
Getting Started
⭐
1,098
This repository is a getting started guide to Singer.
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Hamilton
⭐
894
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Pgsync
⭐
851
Postgres to Elasticsearch/OpenSearch sync
Open Semantic Search
⭐
741
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
Hamilton
⭐
640
A scalable general purpose micro-framework for defining dataflows. You can use it to build dataframes, numpy matrices, python objects, ML models, etc. Embed Hamilton anywhere python runs, e.g. spark, airflow, jupyter, fastapi, python scripts, etc. Comes with lineage out of the box.
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Baby Names Analysis
⭐
555
Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.
Eland
⭐
522
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Aws Glue Libs
⭐
514
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
Sqlmesh
⭐
499
SQLMesh is a DataOps framework that brings the benefits of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python.
Covalent
⭐
456
Pythonic tool for running machine-learning/high performance/quantum-computing workflows in heterogeneous environments.
Etlalchemy
⭐
414
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
Etlpy
⭐
393
a smart stream-like crawler & etl python library
Pudl
⭐
368
The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
Ethereum Etl Airflow
⭐
349
Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethe
Aws Serverless Data Lake Framework
⭐
338
Enterprise-grade, production-hardened, serverless data lake on AWS
Versatile Data Kit
⭐
338
Build, run and manage your data pipelines with Python or SQL on any cloud
Bitcoin Etl
⭐
305
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Beginner_de_project
⭐
276
Beginner data engineering project - batch edition
Pygrametl
⭐
270
Official repository for pygrametl - ETL programming in Python
Synch
⭐
268
Sync data from other databases to ClickHouse (cluster)
Astro Sdk
⭐
260
Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Butterfree
⭐
249
A tool for building feature stores.
Usaspending Api
⭐
249
Server application to serve U.S. federal spending data via a RESTful API
Recap
⭐
239
Recap tracks and transforms schemas across your whole application.
Naas
⭐
231
⚙️ Schedule notebooks, run them like APIs, securely expose your assets: Jupyter as a viable ⚡️ Production environment
Example Airflow Dags
⭐
204
Example DAGs using hooks and operators from Airflow Plugins
Amundsendatabuilder
⭐
196
Data ingestion library for Amundsen to build graph and search index
Bigquery Etl
⭐
186
Bigquery ETL
Aws Etl Orchestrator
⭐
185
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Airflow_for_beginners
⭐
166
Reddit Detective
⭐
160
Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more
Aliyun Log Python Sdk
⭐
156
Use python to manage, produce and consume data with Aliyun Log Service.
Metl
⭐
154
mito ETL tool
Dbt Coves
⭐
149
CLI tool for dbt users to simplify creation of staging models (yml and sql) files
Dbt Databricks
⭐
132
A dbt adapter for Databricks.
Od
⭐
127
Czech open data
Csv2db
⭐
124
The CSV to database command line loader
Paperetl
⭐
123
📄 ⚙️ ETL processes for medical and scientific papers
Datagristle
⭐
122
Tough and flexible tools for data analysis, transformation, validation and movement.
Easy_sql
⭐
115
A library developed to ease the data ETL development process.
Sayn
⭐
113
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Sync Addons
⭐
112
Odoo Integration Addons
Morph Kgc
⭐
104
Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Diffsync
⭐
103
A utility library for comparing and synchronizing different datasets.
Patterns Devkit
⭐
101
Data pipelines from re-usable components
Locopy
⭐
98
locopy: Loading/Unloading to Redshift and Snowflake using Python.
Pangeo Forge Recipes
⭐
98
Python library for building Pangeo Forge recipes.
Carry
⭐
93
Python ETL(Extract-Transform-Load) tool / Data migration tool
Target Postgres
⭐
93
A Singer.io Target for Postgres
Polygon Etl
⭐
81
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Datacater
⭐
78
The developer-friendly ETL platform for transforming data in real-time. Based on Apache Kafka® and Kubernetes®.
Etlhelper
⭐
78
ETL Helper is a Python ETL library to simplify data transfer into and out of databases.
Luigi Warehouse
⭐
73
A luigi powered analytics / warehouse stack
Etl Cms
⭐
72
Workproducts to ETL CMS datasets into OMOP Common Data Model
Stetl
⭐
71
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Sqlbucket
⭐
67
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
Data Wrangling With Python
⭐
66
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Beneath
⭐
64
Beneath is a serverless real-time data platform ⚡️
Django Calaccess Raw Data
⭐
63
A Django app to download, extract and load campaign finance and lobbying activity data from the California Secretary of State's CAL-ACCESS database
Discreetly
⭐
62
ETLy is an add-on dashboard service on top of Apache Airflow.
Dbt Sqlite
⭐
59
A SQLite adapter plugin for dbt (data build tool)
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Etl Parser
⭐
57
Event Trace Log file parser in pure Python
Openrefine Client
⭐
57
The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.
Etl_with_python
⭐
57
ETL with Python - Taught at DWH course 2017 (TAU)
Rony
⭐
56
Data Engineering made simple - An opinionated Data Engineering framework
Pyetl
⭐
51
python ETL framework
Uptasticsearch
⭐
47
An Elasticsearch client tailored to data science workflows.
Amaxa
⭐
47
A multi-object ETL tool for Salesforce.
Drivers
⭐
46
🏎 The python library enabling access to tools and data sources in minutes, with Naas low-code formulas.
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Legis Graph
⭐
45
ETL scripts for loading US Congressional data from govtrack.us into Neo4j
Meilisync
⭐
45
Realtime sync data from MySQL/PostgreSQL/MongoDB to meilisearch
Odoo Etl
⭐
43
Odoo data manipulation, like a small ELT (Extract, Load, Transform) for Odoo databases.
Bigmetadata
⭐
43
Unihan Etl
⭐
42
Export UNIHAN's database to csv, json or yaml
Architect_big_data_solutions_with_spark
⭐
42
code, labs and lectures for the course
Functions
⭐
42
Serverless ETL using cloud functions https://fivetran.com/docs/functions
Parade
⭐
37
A simple and out-of-box toolkit to handle data work
Ether_sql
⭐
35
A python library to push ethereum blockchain data into an sql database.
Tablite
⭐
34
multiprocessing enabled out-of-memory data analysis library for tabular data.
Pandas To Postgres
⭐
33
Copy Pandas DataFrames and HDF5 files to PostgreSQL database
Knackpy
⭐
33
A Python client for interacting with Knack applications
Main
⭐
33
FHIRPACK (FHIR Python Analysis Client and Kit) is a general purpose FHIR client that simplifies the access, analysis and representation of FHIR and EHR data using PANDAS, an ETL philosophy and a functional syntax. FHIRPACK was designed and developed at the IKIM MML (https://mml.ikim.nrw/) and HDDBS (https://dbs.ifi.uni-heidelberg.de/).
Wikirepo
⭐
33
Python based Wikidata framework for easy dataframe extraction
Hive Metastore Client
⭐
32
A client for connecting and running DDLs on hive metastore.
Blast
⭐
31
Blast is a data orchestration tool that can run SQL and Python against Google BigQuery and Snowflake. It supports templating with Jinja, data quality tests, query validation, environment management and more.
1-100 of 473 search results
Copyright 2018-2023 Awesome Open Source. All rights reserved.