Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Gather Deployment | 347 | 9 months ago | mit | Jupyter Notebook | ||||||
Gathers Python deployment, infrastructure and practices. | ||||||||||
Movalytics Data Warehouse | 116 | 4 years ago | Python | |||||||
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow | ||||||||||
Pyjaws | 36 | 6 months ago | 3 | mit | Python | |||||
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows | ||||||||||
Python_mozetl | 26 | 4 months ago | 23 | mit | Python | |||||
ETL jobs for Firefox Telemetry | ||||||||||
Jobanalytics_and_search | 22 | 2 years ago | 8 | mit | Python | |||||
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters. | ||||||||||
Airflow | 8 | 7 months ago | PHP | |||||||
This set of code and instructions is intended to instantiate a compiled environment from a set of Docker images (Airflow webserver, Airflow scheduler, PostgreSQL, PySpark), with a data pipeline that consumes data from a weather API, processes it with PySpark and stores it in PostgreSQL | ||||||||||
Aws Etl | 7 | 2 years ago | Smarty | |||||||
This is an ETL application on AWS using general open sales and customer data, which you can find here: https://github.com/camposvinicius/data/blob/main/AdventureWorks.zip. It is a zipped file containing several .csv files to which we will apply transformations. | ||||||||||
Reddit Data Engineering | 7 | 2 years ago | | mit | Python | |||||
An end-to-end data engineering pipeline to create a dashboard for the latest content on the r/Stocks subreddit | ||||||||||
Airflow Pyspark Emr | 7 | 2 years ago | 7 | Python | ||||||
This project demonstrates how to process data stored in a data lake, transforming it into an OLAP-optimized structure using PySpark. The PySpark job runs on AWS EMR, and the data pipeline is orchestrated by Apache Airflow, including infrastructure creation and EMR cluster termination. | ||||||||||
Spark Mesos Airflow Tutorial | 6 | 4 years ago | 2 | Python | ||||||