Awesome Open Source
Search results for python etl pipeline
49 search results found
Orchest
⭐
3,876
Build data pipelines, the easy way 🛠️
Hamilton
⭐
1,538
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Hamilton
⭐
877
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Watchmen Matryoshka Doll
⭐
124
Watchmen Platform is a low-code data platform for data pipelines, metadata management, analysis, and quality management.
Patterns Devkit
⭐
101
Data pipelines from re-usable components
Prism
⭐
70
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Onetl
⭐
55
One ETL tool to rule them all
Dataligo
⭐
47
A library to accelerate ML and ETL pipelines by connecting all data sources
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, plus SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Data Science Regular Bootcamp
⭐
39
Regular practice in Data Science, Machine Learning, and Deep Learning, solving ML project problems and analytical issues to keep building my knowledge. The goal is to help learners with learning resources in the Data Science field.
Conductor Python
⭐
38
Conductor OSS SDK for Python programming language
Uber Expenses Tracking
⭐
35
The goal of this project is to track Uber Rides and Uber Eats expenses through data engineering processes, using technologies such as Apache Airflow, AWS Redshift, and Power BI.
Stellar Etl Airflow
⭐
31
Airflow DAGs for the Stellar ETL project
Bitcoinmonitor
⭐
31
Near real time ETL to populate a dashboard.
Tweetsolaping
⭐
24
Implementing an end-to-end tweets ETL/analysis pipeline.
Udacity Data Eng Proj2
⭐
23
A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract data from S3, apply a series of transformations and load into S3 and Redshift.
Aws Youtube Analytics
⭐
20
It aims to securely manage, streamline, and analyze structured and semi-structured YouTube video data based on video categories and trending metrics.
Jira Database Etl
⭐
20
🚹 💾 Script to import issues from a JIRA instance into a database.
Data Refinery
⭐
19
Data transformation
Airflowetl
⭐
16
Blog post on ETL pipelines with Airflow
Automated_etl_google_cloud Social_dashboard
⭐
15
A dashboard is worth a thousand words => https://datastudio.google.com/reporting/755f3183-d
Nyc_taxi_pipeline
⭐
12
Design/Implement stream/batch architecture on NYC taxi data | #DE
Spotify Etl
⭐
12
Spotify ETL Pipeline
Airflowjob
⭐
11
Airflow POC demo: 1) environment setup 2) Airflow DAG 3) Spark/ML pipeline | #DE
Serverless Python Workflow With Aws Lambda
⭐
11
A tutorial to set up and deploy a simple serverless Python workflow with REST API endpoints in AWS Lambda.
Gfw Data Api
⭐
10
GFW Data API
Dlt With Debug
⭐
8
A lightweight helper utility that lets developers do interactive pipeline development with a unified source code for both DLT runs and non-DLT interactive notebook runs.
Spooq
⭐
8
Pyspark Template
⭐
8
A Python PySpark Project with Poetry
Spotify_etl
⭐
8
Using an ETL pipeline to investigate the change in hip-hop/rap genre over time
Disaster Response Pipeline
⭐
7
ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event
Etl Pipeline Runner
⭐
7
A package to run ETL Pipelines for your datascience projects
Wiredflow
⭐
7
Lightweight library for creating services using just Python
Etl Pipeline Using Airflow
⭐
7
ETL pipeline to extract data from AWS S3 and transform it and load it to AWS Redshift
Thomasnet Scraper
⭐
6
Scraping USA Hardware Suppliers Data
Valves
⭐
6
general functions for your data .pipe()-lines.
Pyemits
⭐
5
Sugar candy for data scientists. Easy manipulation for time-series data analytics work.
Stock_streaming_pipeline_project
⭐
5
Built a real-time streaming pipeline to extract stock data using Apache NiFi, Debezium, Kafka, and Spark Streaming. Loaded the transformed data into a Glue database and created real-time dashboards using Power BI and Tableau with Athena. The pipeline is orchestrated using Airflow.
Gcp Airflow Foundations
⭐
5
Opinionated framework based on Airflow 2.0 for building pipelines to ingest data into a BigQuery data warehouse
Rabbit In A Blender
⭐
5
An ETL pipeline to transform your EMR data to OMOP
Otokuna
⭐
5
A system and web app to discover good deals on rental properties, built and automated on a serverless architecture.
Trusted Data Pipeline
⭐
5
Building 3D Trusted Data Pipelines With Dagster, dbt, and DuckDB
Cryptodatapy
⭐
5
CryptoDataPy is a Python library that makes it easy to build high-quality data pipelines for the analysis of cryptoassets
Dados Censup
⭐
5
Automated ingestion of the data published by INEP on the Brazilian higher education census.
Celo Etl
⭐
5
Python scripts for ETL (extract, transform and load) jobs for Celo blockchain blocks, transactions and more coming.
Stock Market Real Time Data Pipeline With Apache Kafka And Cassandra
⭐
5
An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.
Copyright 2018-2024 Awesome Open Source. All rights reserved.