Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Spark | 35,897 | 2,394 | 882 | 8 hours ago | 46 | May 09, 2021 | 269 | apache-2.0 | Scala | |
Apache Spark - A unified analytics engine for large-scale data processing | ||||||||||
Cookbook | 11,769 | 2 months ago | 110 | apache-2.0 | ||||||
The Data Engineering Cookbook | ||||||||||
God Of Bigdata | 7,992 | 2 months ago | 2 | |||||||
Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive... | ||||||||||
Zeppelin | 6,058 | 32 | 23 | 2 days ago | 2 | June 21, 2017 | 141 | apache-2.0 | Java | |
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more. | ||||||||||
Sparkinternals | 4,665 | 2 years ago | 27 | |||||||
Notes talking about the design and implementation of Apache Spark | ||||||||||
Iceberg | 4,334 | 7 hours ago | 4 | May 23, 2022 | 1,351 | apache-2.0 | Java | |||
Apache Iceberg | ||||||||||
Bigdl | 4,222 | 10 | 18 hours ago | 16 | April 19, 2021 | 744 | apache-2.0 | Jupyter Notebook | ||
Fast, distributed, secure AI for Big Data | ||||||||||
Tensorflowonspark | 3,851 | 5 | 19 days ago | 32 | April 21, 2022 | 13 | apache-2.0 | Python | ||
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters. | ||||||||||
Spark Nlp | 3,271 | 2 | 2 | a day ago | 90 | March 05, 2021 | 37 | apache-2.0 | Scala | |
State of the Art Natural Language Processing | ||||||||||
Koalas | 3,228 | 1 | 12 | 6 months ago | 47 | October 19, 2021 | 109 | apache-2.0 | Python | |
Koalas: pandas API on Apache Spark |
A curated list of awesome Apache Spark packages and resources.
Apache Spark is an open-source cluster-computing framework. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since. Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance (Wikipedia 2017).
Users of Apache Spark may choose between the Python, R, Scala, and Java programming languages to interface with the Apache Spark APIs.
`dplyr`.
`joblib` backend for running tasks on Spark clusters.
SparkSQL has several built-in Data Sources for files. These include `csv`, `json`, `parquet`, `orc`, and `avro`. It also supports JDBC databases as well as Apache Hive. Additional data sources can be added by including the packages listed below, or writing your own.
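As a rough illustration of the built-in sources mentioned above, the following sketch wraps the standard `DataFrameReader` chain (`spark.read.format(...).options(...).load(...)`) in two small helpers. The helper names `read_file_source` and `read_jdbc_table` and the `BUILTIN_FILE_FORMATS` tuple are hypothetical and not part of Spark. The `spark` argument is assumed to be an existing `pyspark.sql.SparkSession`.

```python
# Minimal sketch: thin wrappers over the DataFrameReader chain
# spark.read.format(...).options(...).load(...).
# Helper names and the format tuple are illustrative, not part of Spark.

# File formats the text above names as built in to Spark SQL.
BUILTIN_FILE_FORMATS = ("csv", "json", "parquet", "orc", "avro")


def read_file_source(spark, fmt, path, **options):
    """Load `path` through one of the built-in file data sources.

    `spark` is assumed to be a pyspark.sql.SparkSession (any object that
    exposes the same `read` attribute works). Extra keyword arguments are
    passed to the reader as options, e.g. header="true" for csv.
    """
    if fmt not in BUILTIN_FILE_FORMATS:
        raise ValueError(f"not a built-in file format: {fmt!r}")
    return spark.read.format(fmt).options(**options).load(path)


def read_jdbc_table(spark, url, table, **options):
    """Load a database table through the JDBC data source.

    `url` is a JDBC connection string and `table` the table to read; the
    matching JDBC driver must be available to Spark.
    """
    reader = spark.read.format("jdbc")
    return reader.options(url=url, dbtable=table, **options).load()
```

With a live session, a call such as `read_file_source(spark, "csv", "people.csv", header="true")` would return a DataFrame; the file name here is a placeholder. Avro support ships as the separate `spark-avro` module, so reading `avro` may require adding that package to the session first.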
`spark.ml` and `scikit-learn`
`o.a.s.ml` models without dependency on `SparkSession`.
`DataFrames` and `RDDs`.
Wikipedia. 2017. "Apache Spark." Wikipedia, the Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Apache_Spark&oldid=781182753.
This work (Awesome Spark, by awesome-spark/awesome-spark), identified by Maciej Szymkiewicz, is free of known copyright restrictions.
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. This compilation is not endorsed by The Apache Software Foundation.
Inspired by sindresorhus/awesome.