Project Name	Stars	Repos Using This	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language	Description
Spark	37,661	2,394	939	3 months ago	46	May 09, 2021	186	apache-2.0	Scala	Apache Spark - A unified analytics engine for large-scale data processing
Synapseml	4,960		6	14 days ago	12	November 27, 2023	335	mit	Scala	Simple and Distributed Machine Learning
Hudi	4,901		58	3 months ago	21	November 11, 2023	886	apache-2.0	Java	Upserts, Deletes And Incremental Processing on Big Data.
Bigdl	4,728		10	3 months ago	16	April 19, 2021	958	apache-2.0	Jupyter Notebook	Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using bigdl-llm
Sparkinternals	4,665			2 years ago			27			Notes talking about the design and implementation of Apache Spark
Spark Nlp	3,578		30	3 months ago	134	December 08, 2023	43	apache-2.0	Scala	State of the Art Natural Language Processing
Coolplayspark	3,447			2 years ago			35		Scala	Coolplay Spark: Spark source code analysis, Spark libraries, and more
Koalas	3,291	1	16	7 months ago	47	October 19, 2021	112	apache-2.0	Python	Koalas: pandas API on Apache Spark
Spark Notebook	3,147			a year ago			207	apache-2.0	JavaScript	Interactive and Reactive Data Science using Scala and Spark.
Deequ	3,044		6	3 months ago	37	November 09, 2023	141	apache-2.0	Scala	Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
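Deequ's description speaks of "unit tests for data". The idea can be illustrated with a small sketch: compute a metric over a dataset (completeness, uniqueness, share of non-negative values) and assert a condition on it, much as a unit test asserts on a function's output. This is plain Python over a list of dictionaries, not Deequ's Scala API, and it does not use Spark; the function names, the `Check` tuple shape, the metric definitions, and the sample rows are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass
class CheckResult:
    description: str
    metric: float
    passed: bool


def completeness(rows: List[Row], column: str) -> float:
    """Fraction of rows in which `column` is present and not None (hypothetical definition)."""
    if not rows:
        return 1.0
    return sum(r.get(column) is not None for r in rows) / len(rows)


def uniqueness(rows: List[Row], column: str) -> float:
    """Fraction of non-null values of `column` that occur exactly once."""
    values = [r[column] for r in rows if r.get(column) is not None]
    if not values:
        return 1.0
    counts: Dict[Any, int] = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(counts[v] == 1 for v in values) / len(values)


def non_negative(rows: List[Row], column: str) -> float:
    """Fraction of non-null values of `column` that are >= 0."""
    values = [r[column] for r in rows if r.get(column) is not None]
    if not values:
        return 1.0
    return sum(v >= 0 for v in values) / len(values)


# A check is (description, metric over the whole dataset, assertion on that metric).
Check = Tuple[str, Callable[[List[Row]], float], Callable[[float], bool]]


def verify(rows: List[Row], checks: List[Check]) -> List[CheckResult]:
    """Compute each metric and evaluate its assertion, like running a test suite."""
    results = []
    for description, metric_fn, assertion in checks:
        value = metric_fn(rows)
        results.append(CheckResult(description, value, assertion(value)))
    return results


# Hypothetical sample data: id 3 is duplicated and one amount is missing.
SAMPLE_ROWS: List[Row] = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 5.5},
    {"id": 3, "amount": 7.0},
]

CHECKS: List[Check] = [
    ("id is unique", lambda rs: uniqueness(rs, "id"), lambda m: m == 1.0),
    ("amount is at least 70% complete", lambda rs: completeness(rs, "amount"), lambda m: m >= 0.7),
    ("amount is never negative", lambda rs: non_negative(rs, "amount"), lambda m: m == 1.0),
]

if __name__ == "__main__":
    for result in verify(SAMPLE_ROWS, CHECKS):
        status = "PASS" if result.passed else "FAIL"
        print(f"{status}  {result.description} (metric = {result.metric:.2f})")
```

With the sample rows, the duplicated `id` 3 makes the uniqueness check fail (metric 0.50), while the completeness check (0.75) and the non-negativity check (1.00) pass. Separating the metric from the assertion lets the same metric be reused under different thresholds.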

Alternatives To Net.jgp.labs.spark

Alternative Project Comparisons

Net.jgp.labs.spark vs Spark

Net.jgp.labs.spark vs Synapseml

Net.jgp.labs.spark vs Hudi

Net.jgp.labs.spark vs Bigdl

Net.jgp.labs.spark vs Sparkinternals

Net.jgp.labs.spark vs Spark Nlp

Net.jgp.labs.spark vs Coolplayspark

Net.jgp.labs.spark vs Koalas

Net.jgp.labs.spark vs Spark Notebook

Net.jgp.labs.spark vs Deequ

Popular Spark Projects

Data Science Ipython Notebooks ⭐ 25,668

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

most recent commit 6 months ago

Redash ⭐ 24,479

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

dependent packages 3 · total releases 2 · latest release May 05, 2020 · most recent commit 3 months ago

Docker_practice ⭐ 23,279

Learn and understand Docker&Container technologies, with real DevOps practice!

total releases 9 · latest release December 01, 2021 · most recent commit 4 months ago

Data Engineering Zoomcamp ⭐ 19,461

Free Data Engineering course!

most recent commit 3 months ago

Bigdata Notes ⭐ 14,872

Big Data Getting-Started Guide

most recent commit 4 months ago

Popular Apache Spark Projects

Mlflow ⭐ 16,343

Open source platform for the machine learning lifecycle

dependent packages 440 · total releases 86 · latest release December 07, 2023 · most recent commit 3 months ago

Data Engineer Handbook ⭐ 5,650

This is a repo with links to everything you'd ever want to learn about data engineering

most recent commit 3 months ago

Lakefs ⭐ 3,900

lakeFS - Data version control for your data lake | Git for data

dependent packages 5 · total releases 83 · latest release November 13, 2023 · most recent commit 3 months ago

Feathr ⭐ 1,886

Feathr – A scalable, unified data and AI engineering platform for enterprise

dependent packages 4 · total releases 48 · latest release June 30, 2023 · most recent commit 3 months ago

Oryx ⭐ 1,793

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

dependent packages 2 · total releases 14 · latest release November 25, 2018 · most recent commit 3 years ago
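Oryx's description names the lambda architecture. As a rough illustration of that general pattern (not Oryx's code or API, and with no Spark or Kafka involved), the hypothetical `LambdaCounter` class below counts events per key. It keeps an append-only master dataset, a batch view recomputed from that dataset, and a speed view updated incrementally for events the batch view has not yet absorbed; queries merge the two views.

```python
from collections import Counter
from typing import List


class LambdaCounter:
    """Toy lambda architecture that counts events per key (hypothetical example)."""

    def __init__(self) -> None:
        self.master: List[str] = []           # immutable, append-only master dataset
        self.batch_view: Counter = Counter()  # recomputed from the master dataset
        self.speed_view: Counter = Counter()  # covers events the batch view has not seen

    def ingest(self, key: str) -> None:
        # New events are appended to the master dataset and also applied
        # incrementally to the speed layer's real-time view.
        self.master.append(key)
        self.speed_view[key] += 1

    def run_batch(self) -> None:
        # The batch layer recomputes its view from scratch over everything
        # ingested so far, so the speed view for those events can be dropped.
        self.batch_view = Counter(self.master)
        self.speed_view = Counter()

    def query(self, key: str) -> int:
        # The serving layer answers queries by merging both views.
        return self.batch_view[key] + self.speed_view[key]


if __name__ == "__main__":
    counter = LambdaCounter()
    for key in ["click", "view", "click"]:
        counter.ingest(key)
    print(counter.query("click"))  # 2, served entirely from the speed view
    counter.run_batch()
    counter.ingest("click")
    print(counter.query("click"))  # 3, batch view (2) plus speed view (1)
```

The sketch simplifies one important detail: here a batch run covers every event ingested so far, so the speed view can simply be cleared. In a real deployment the batch layer works on a snapshot while new events keep arriving, so only the portion absorbed by the new batch view can be discarded from the speed layer.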

Popular Data Processing Categories