PySpark Example Project Alternatives

Name: AlexIoannides/pyspark-example-project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: 1,034
License: No license specified
Open Issues: 11
Most Recent Commit: over 3 years ago
Programming Language: Python
Dependent Repos: 0
Dependent Packages: 0
Total Releases: 0

Categories

Programming Languages > Python

Data Processing > Data Science

Data Processing > Spark

Data Processing > ETL

Data Processing > Data Engineering

Data Processing > PySpark

Alternatives To AlexIoannides/pyspark-example-project

Project Name	Stars	Repos Using This	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language
AlexIoannides/pyspark-example-project	1,034	0	0	over 3 years ago	0		11		Python
Example project implementing best practices for PySpark ETL jobs and applications.
quintoandar/butterfree	269	0	1	over 2 years ago	35	November 14, 2023	6	apache-2.0	Python
A tool for building feature stores.
martandsingh/ApacheSpark	59	0	0	over 3 years ago	0		0		Python
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
vim89/datapipelines-essentials-python	45	0	0	about 3 years ago	0		1	apache-2.0	Python
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, plus SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
basin-etl/basin	29	0	0	over 3 years ago	0		42	other	TypeScript
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
mozilla/python_mozetl	26	0	0	over 2 years ago	0		23	mit	Python
ETL jobs for Firefox Telemetry
guidok91/spark-movies-etl	21	0	0	almost 3 years ago	0		2		Python
Spark data pipeline that ingests and transforms movie ratings data.
ksbg/sparklanes	16	1	0	over 6 years ago	5	January 31, 2019	2	mit	Python
A lightweight data processing framework for Apache Spark
datayoga-io/lineage	14	0	2	over 4 years ago	11	January 26, 2022	0	apache-2.0	TypeScript
Generate beautiful documentation for your data pipelines in markdown format
telia-oss/birgitta	12	0	0	over 3 years ago	34	September 10, 2020	20	mit	Python
Birgitta is a Python ETL test and schema framework, providing automated tests for pyspark notebooks/recipes.

Popular ETL Projects

pingcap/tidb ⭐ 35,604

TiDB is an open-source, cloud-native, distributed, MySQL-compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at: https://tidbcloud.com/free-trial

apache/airflow ⭐ 33,219

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

airbytehq/airbyte ⭐ 12,918

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

apache/doris ⭐ 10,666

Apache Doris is an easy-to-use, high performance and unified analytics database.

dagster-io/dagster ⭐ 9,467

An orchestration platform for the development, production, and observation of data assets.

Popular PySpark Projects

kailashahirwar/cheatsheets-ai ⭐ 13,281

Essential Cheat Sheets for deep learning and machine learning researchers https://medium.com/@kailashahirwar/essential-cheat-sheets-for-machine-learning-and-deep-learning-researchers-efb6a8ebd2e5

microsoft/SynapseML ⭐ 5,228

Simple and Distributed Machine Learning

JohnSnowLabs/spark-nlp ⭐ 3,578

State of the Art Natural Language Processing

apache/linkis ⭐ 3,407

Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.

ibis-project/ibis ⭐ 3,404

The flexibility of Python with the scale and performance of modern SQL.
