Awesome Open Source
Search results for data engineering pyspark
20 search results found
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Butterfree
⭐
269
A tool for building feature stores.
Big Data Mapreduce Course
⭐
135
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Movalytics Data Warehouse
⭐
117
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Apachespark
⭐
59
This repository teaches Databricks concepts through examples. It covers the key topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Towardsdataengineering
⭐
52
This repo contains commands that data engineers use in day-to-day work.
Soda Spark
⭐
49
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames.
Learn Data Munging
⭐
37
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
Pyjaws
⭐
36
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
Spark Studyclub
⭐
31
Apache Spark study group organized by the Data Engineering Latam community
Sparkdataset
⭐
28
Instant search for and access to many datasets in PySpark.
Jobanalytics_and_search
⭐
22
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Spark Movies Etl
⭐
21
Spark data pipeline that ingests and transforms movie ratings data.
Apache Spark Docker
⭐
21
Dockerizing an Apache Spark Standalone Cluster
Pyspark On Aws Emr
⭐
13
The goal of this project is to offer a ready-to-use AWS EMR template based on Spot Fleet and On-Demand Instances, so you can focus on writing PySpark code.
Marshmallow Pyspark
⭐
12
Marshmallow serializer integration with PySpark
Sparkitecture
⭐
9
A collection of “cookbook-style” scripts for simplifying data engineering and machine learning in Apache Spark.
Cis_households
⭐
9
Data engineering pipeline for the household COVID-19 Infection Survey (CIS)
Data Engineering
⭐
9
Common data manipulations in different languages and frameworks.
Airflow
⭐
8
This set of code and instructions is intended to instantiate a compiled environment from a set of Docker images (Airflow webserver, Airflow scheduler, PostgreSQL, PySpark), along with a data pipeline that consumes data from a weather API, processes it with PySpark, and stores it in PostgreSQL.
Analysis
⭐
8
Repo of approaches to practical data science problems, including notebook demos and working scripts | #DS | #analysis
Pyspark Template
⭐
8
A Python PySpark Project with Poetry
Reddit Data Engineering
⭐
7
An end-to-end data engineering pipeline to create a dashboard for the latest content on the r/Stocks subreddit
Spark Structured Streaming Kafka
⭐
5
Spark Structured Streaming + Kafka + Delta pipeline.
Related Searches
Spark Pyspark (773)
Python Pyspark (689)