Project Name	Stars	Repos Using This	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language	Description
Koalas	3,291	1	16	7 months ago	47	October 19, 2021	112	apache-2.0	Python	Koalas: pandas API on Apache Spark
Ballista	2,244		13	3 years ago	4	May 10, 2020		apache-2.0		Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Graphframes	944	2	8	7 months ago	1	December 05, 2018	165	apache-2.0	Scala	
Mobius	937	6		3 months ago	22	January 29, 2017	88	mit	C#	C# and F# language binding and extensions to Apache Spark
Spark Redis	926		3	6 months ago	5	June 14, 2022	133	bsd-3-clause	Scala	A connector for Spark that allows reading and writing to/from Redis cluster
Spark Daria	738		1	2 years ago	7	February 09, 2022	11	mit	Scala	Essential Spark extensions and helper methods ✨😲
Datafusion	626			5 years ago				apache-2.0	Rust	DataFusion has now been donated to the Apache Arrow project
Metorikku	536			a year ago	126	February 27, 2023	65	mit	Scala	A simplified, lightweight ETL Framework based on Apache Spark
Spark Avro	535	47	39	5 years ago	8	October 30, 2017	77	apache-2.0	Scala	Avro Data Source for Apache Spark
Traceml	490	45	12	2 months ago	10	November 25, 2021	6	apache-2.0	Python	Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

Alternatives To Apache Spark Node

Koalas ⭐ 3,291

Koalas: pandas API on Apache Spark

dependent packages 16; total releases 47; most recent commit 7 months ago

Ballista ⭐ 2,244

Distributed compute platform implemented in Rust, and powered by Apache Arrow.

dependent packages 13; total releases 4; most recent commit 3 years ago

Graphframes ⭐ 944

dependent packages 8; total releases 1; most recent commit 7 months ago

Mobius ⭐ 937

C# and F# language binding and extensions to Apache Spark

total releases 22; most recent commit 3 months ago

Spark Redis ⭐ 926

A connector for Spark that allows reading and writing to/from Redis cluster

dependent packages 3; total releases 5; most recent commit 6 months ago

Spark Daria ⭐ 738

Essential Spark extensions and helper methods ✨😲

dependent packages 1; total releases 7; most recent commit 2 years ago

Datafusion ⭐ 626

DataFusion has now been donated to the Apache Arrow project

most recent commit 5 years ago

Metorikku ⭐ 536

A simplified, lightweight ETL Framework based on Apache Spark

total releases 126; most recent commit a year ago

Spark Avro ⭐ 535

Avro Data Source for Apache Spark

dependent packages 39; total releases 8; most recent commit 5 years ago

Traceml ⭐ 490

Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

dependent packages 12; total releases 10; most recent commit 2 months ago

Popular Dataframe Projects

Polars ⭐ 24,900

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

dependent packages 346; total releases 328; latest release December 01, 2023; most recent commit a month ago

Modin ⭐ 9,275

Modin: Scale your Pandas workflows by changing a single line of code

dependent packages 25; total releases 89; latest release November 17, 2023; most recent commit 3 months ago

Pygwalker ⭐ 8,698

PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis

dependent packages 3; total releases 124; latest release December 10, 2023; most recent commit 3 months ago

Vaex ⭐ 8,161

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

dependent packages 29; total releases 69; latest release July 21, 2023; most recent commit 2 months ago

Cudf ⭐ 6,936

cuDF - GPU DataFrame Library

dependent packages 3; total releases 31; latest release October 12, 2023; most recent commit 3 months ago

Popular Spark Projects

Spark ⭐ 37,661

Apache Spark - A unified analytics engine for large-scale data processing

dependent packages 939; total releases 46; latest release May 09, 2021; most recent commit 3 months ago

Data Science Ipython Notebooks ⭐ 25,668

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

most recent commit 6 months ago

Redash ⭐ 24,479

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

dependent packages 3; total releases 2; latest release May 05, 2020; most recent commit 3 months ago

Docker_practice ⭐ 23,279

Learn and understand Docker&Container technologies, with real DevOps practice!

total releases 9; latest release December 01, 2021; most recent commit 4 months ago

Data Engineering Zoomcamp ⭐ 19,461

Free Data Engineering course!

most recent commit 3 months ago
