Awesome Open Source
Search results for spark data engineering
70 search results found
Data Engineering Zoomcamp
⭐
19,461
Free Data Engineering course!
Cookbook
⭐
12,557
The Data Engineering Cookbook
Dagster
⭐
9,467
An orchestration platform for the development, production, and observation of data assets.
Mage Ai
⭐
6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Risingwave
⭐
5,799
The distributed streaming database. Engineered to offer the simplest and most cost-efficient way for stream processing and management.
Awesome Opensource Data Engineering
⭐
1,331
An Awesome List of Open-Source Data Engineering Projects
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Around Dataengineering
⭐
926
A Data Engineering & Machine Learning Knowledge Hub
Zingg
⭐
828
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Blaze
⭐
784
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Data Engineering Interview Questions
⭐
554
More than 2,000 data engineer interview questions.
Data Engineering Projects
⭐
322
Personal Data Engineering Projects
Every Single Day I Tldr
⭐
311
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Butterfree
⭐
269
A tool for building feature stores.
Geni
⭐
268
A Clojure dataframe library that runs on Spark
Setl
⭐
173
A simple Spark-powered ETL framework that just works 🍺
Spark Alchemy
⭐
169
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Lakehouse Engine
⭐
154
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
Scalable Data Science Platform
⭐
153
Content for architecting a data science platform for products using Luigi, Spark & Flask.
Big Data Mapreduce Course
⭐
135
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Movalytics Data Warehouse
⭐
116
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
De Zoomcamp Ui
⭐
107
🎨 UI for the Free Data Engineering Zoomcamp 2023 Course provided by DataTalksClub
Streamify
⭐
97
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!
Flowman
⭐
85
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Gallia Core
⭐
79
A schema-aware Scala library for data transformation
Data Engineering Nanodegree
⭐
76
Projects done in the Data Engineering Nanodegree by Udacity.com
Waimak
⭐
73
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Cowait
⭐
54
Containerized distributed programming framework for Python
Soda Spark
⭐
49
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Learn Data Munging
⭐
37
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
Sageworks
⭐
36
SageWorks: An easy to use Python API for creating and deploying SageMaker Models
Pyjaws
⭐
36
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
Us Stock Prediction Using Ml And Spark
⭐
35
Predict stock price based on financial news feeds
Distributedwekaspark
⭐
32
Weka on Spark
Yaetos
⭐
32
Write data & AI pipelines in (SQL, Spark, Pandas) and deploy to the cloud, simplified
Spark Ai
⭐
31
Toolbox for building Generative AI applications on top of Apache Spark.
Spark Studyclub
⭐
31
Apache Spark study group organized by the Data Engineering Latam community
Debussy_concert
⭐
29
Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and pipelines.
Sparkdataset
⭐
28
Instant search for and access to many datasets in Pyspark.
Data Engineering Nanodegree
⭐
27
Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift, Data Lake with Spark and Data Pipeline with Airflow.
Aws Glue Docker
⭐
22
🐋 Docker image for AWS Glue Spark/Python
Jobanalytics_and_search
⭐
22
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
De 100 Days
⭐
22
data engineering 100 days 🤖 🧲 🦾 | #DE
Spark Movies Etl
⭐
21
Spark data pipeline that ingests and transforms movie ratings data.
Spark Distcp
⭐
18
A re-implementation of Hadoop DistCP in Apache Spark
Big Data Engineering
⭐
15
Ghcn D
⭐
14
Data Pipeline from the Global Historical Climatology Network DataSet
Pyspark On Aws Emr
⭐
13
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.
Bootcamp_data Engineering
⭐
13
Bootcamp to learn basics in Data Engineering
Akka Lift Ml
⭐
12
akka http service for serving spark machine learning models
Marshmallow Pyspark
⭐
12
Marshmallow serializer integration with pyspark
Data Paths
⭐
11
Airflowjob
⭐
11
Airflow POC demo : 1) env set up 2) airflow DAG 3) Spark/ML pipeline | #DE
Huemul Bigdatagovernance
⭐
10
Huemul BigDataGovernance is a framework that runs on Spark, Hive, and HDFS. It supports implementing a corporate single-source-of-data strategy based on data governance best practices. Tables can be implemented with primary key and foreign key control when inserting and updating data through the library, along with validation of nulls, text lengths, minimum/maximum values for numbers and dates, unique values, and default values. It also allows fields to be classified by applicability of der
Fake Data Pipeline
⭐
10
Data Generators -> Kafka -> Spark Streaming -> PostgreSQL -> Grafana
Sparkitecture
⭐
9
A collection of “cookbook-style” scripts for simplifying data engineering and machine learning in Apache Spark.
Spooq
⭐
8
Pyspark Template
⭐
8
A Python PySpark Project with Poetry
Data Engineering Onboarding Starter
⭐
8
This repository contains a 10 step program to enter the world of Data Engineering
Analysis
⭐
8
Repo of approaches to practical data science problems, including notebook demos and working scripts | #DS | #analysis
Itversity Boxes
⭐
8
Repository for all ITVersity Vagrant Boxes.
Airflow
⭐
8
This code and set of instructions is intended to instantiate a compiled environment from a set of Docker images (Airflow webserver, Airflow scheduler, PostgreSQL, PySpark), with a data pipeline that consumes data from a weather API, processes it with PySpark, and stores it in PostgreSQL.
Data Engineering Interviews
⭐
7
Data engineering interviews Q&A for data community by data community
Dataengineering Youtube Project
⭐
6
Data Engineering Youtube Project
Sparklyclean
⭐
6
Optimal distributed data deduplication and supervised learning pipeline using Apache Spark
Data Engineer Portfolio
⭐
6
This is a repository to demonstrate my details, skills, projects and to keep track of my progression in Data Analytics and Data Science topics.
Data.engineers.lunch
⭐
6
Resources from weekly Zoom lunches revolving around Data Engineering. Hosted by Anant Corporation.
Awesome Data Pipeline
⭐
6
Awesome list for data pipelines
Dataengineering
⭐
6
The Data Engineering subteam of Cornell Data Science
Spark Databricks
⭐
6
🔥 Master Apache Spark & Databricks! Dive into a world of big data with exclusive insights from Udemy courses, personal notes, and practical guides. Whether you're starting out or scaling new heights in data engineering, this is your ultimate resource hub! 🌟🚀
Spark Structured Streaming Kafka
⭐
5
Spark Structured Streaming + Kafka + Delta pipeline.
Data Readings
⭐
5
Reading List in Data Systems
Udacity Data Engineering Nanodegree
⭐
5
This is a repository to hold the files and notebooks produced throughout my Udacity Data Engineering Nanodegree program.
Docker_spark_history_ui
⭐
5
A dockerised version of the Spark history server that gives access to metrics in the Spark UI from a log generated by AWS Glue
Related Searches
Scala Spark (3,279)
Python Spark (2,053)
Java Spark (1,587)
Apache Spark (1,207)
Spark Hadoop (1,188)
Jupyter Notebook Spark (1,151)
Spark Kafka (985)
Spark Streaming (817)
Spark Pyspark (812)
Docker Spark (693)
Copyright 2018-2024 Awesome Open Source. All rights reserved.