Airflow Pyspark Emr

This project demonstrates how to process data stored in a data lake and transform it into an OLAP-optimized structure using PySpark. The PySpark job runs on AWS EMR, and the data pipeline is orchestrated by Apache Airflow, including infrastructure creation and EMR cluster termination.
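As a rough illustration of the pipeline described above, the sketch below builds the kind of step definition that EMR accepts for running a PySpark script through `spark-submit`, and lists the task order an orchestrator would follow. It is not taken from the project itself: the function names, the step name, and the `s3://example-bucket/...` paths are hypothetical placeholders, and the project's actual DAG may be organized differently.

```python
from typing import Any, Dict, List, Optional


def build_spark_step(
    name: str,
    script_uri: str,
    script_args: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Return one EMR step definition that runs a PySpark script via spark-submit.

    The dictionary follows the step structure used by the EMR API
    (Name / ActionOnFailure / HadoopJarStep). All values passed in
    are illustrative assumptions.
    """
    return {
        "Name": name,
        # Leave the cluster running on failure so the orchestrator,
        # not EMR, decides when to terminate it.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri]
            + list(script_args or []),
        },
    }


def pipeline_order() -> List[str]:
    """Hypothetical task order: create infrastructure, run the job, tear down."""
    return [
        "create_cluster",
        "add_spark_step",
        "wait_for_step",
        "terminate_cluster",
    ]


if __name__ == "__main__":
    # Placeholder S3 locations, not paths from the original project.
    step = build_spark_step(
        "transform_data_lake",
        "s3://example-bucket/scripts/etl.py",
        ["--output", "s3://example-bucket/olap/"],
    )
    print(step["HadoopJarStep"]["Args"])
    print(" -> ".join(pipeline_order()))
```

In an Airflow DAG, a list of such step dictionaries is typically handed to an EMR operator (for example the `steps` argument of `EmrAddStepsOperator` in the Amazon provider package), with separate tasks creating the cluster beforehand and terminating it afterwards.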
Alternatives To Airflow Pyspark Emr
| Project Name | Stars | Most Recent Commit | Open Issues | License | Language | Description |
|---|---|---|---|---|---|---|
| Gather Deployment | 347 | 9 months ago | | MIT | Jupyter Notebook | Gathers Python deployment, infrastructure and practices. |
| Movalytics Data Warehouse | 116 | 4 years ago | | | Python | Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow. |
| Pyjaws | 36 | 6 months ago | 3 | MIT | Python | PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows. |
| Python_mozetl | 26 | 4 months ago | 23 | MIT | Python | ETL jobs for Firefox Telemetry. |
| Jobanalytics_and_search | 22 | 2 years ago | 8 | MIT | Python | JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters. |
| Airflow | 8 | 7 months ago | | | PHP | This set of code and instructions is meant to instantiate a compiled environment from a set of Docker images (Airflow webserver, Airflow scheduler, PostgreSQL, PySpark), with a data pipeline that consumes data from a weather API, processes it with PySpark, and stores it in PostgreSQL. |
| Aws Etl | 7 | 2 years ago | | | Smarty | An ETL application on AWS that uses general open sales and customer data, available at https://github.com/camposvinicius/data/blob/main/AdventureWorks.zip, a zipped file containing several .csv files to which transformations are applied. |
| Reddit Data Engineering | 7 | 2 years ago | | MIT | Python | An end-to-end data engineering pipeline to create a dashboard for the latest content on the r/Stocks subreddit. |
| Airflow Pyspark Emr | 7 | 2 years ago | 7 | | Python | This project demonstrates how to process data stored in a data lake and transform it into an OLAP-optimized structure using PySpark. The PySpark job runs on AWS EMR, and the data pipeline is orchestrated by Apache Airflow, including infrastructure creation and EMR cluster termination. |
| Spark Mesos Airflow Tutorial | 6 | 4 years ago | 2 | | Python | |