A curated list of awesome Apache Spark packages and resources.
Apache Spark is an open-source cluster-computing framework. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since. Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance (Wikipedia 2017).
Users of Apache Spark may choose between the Python, R, Scala and Java programming languages to interface with the Apache Spark APIs.
`joblib` backend for running tasks on Spark clusters.
Spark SQL has several built-in data sources for files, among them `avro`. It also supports JDBC databases as well as Apache Hive. Additional data sources can be added by including the packages listed below, or by writing your own.
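As a rough illustration of how built-in sources are selected, the sketch below goes through PySpark's generic `DataFrameReader` (`spark.read.format(...).options(...).load(...)`). The helper names `load_source` and `jdbc_options`, the file paths, the JDBC URL and the credentials are all hypothetical and only serve the example. It assumes PySpark is installed and, for the JDBC case, that a matching driver JAR is on the classpath.

```python
from typing import Any, Dict, Optional


def jdbc_options(url: str, table: str, user: str, password: str) -> Dict[str, str]:
    """Collect the option keys understood by Spark's JDBC data source."""
    return {"url": url, "dbtable": table, "user": user, "password": password}


def load_source(spark: Any, fmt: str, path: Optional[str] = None, **options: Any):
    """Load a DataFrame by data source short name (hypothetical helper).

    `spark` is an existing SparkSession; `fmt` is a short name such as
    "json", "csv" or "jdbc"; `options` are passed through to the reader.
    """
    reader = spark.read.format(fmt).options(**options)
    # File-based sources take a path; JDBC is configured purely via options.
    return reader.load(path) if path is not None else reader.load()


def demo() -> None:
    # Imported lazily so the helpers above can be defined without PySpark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("builtin-sources-sketch").getOrCreate()

    # Hypothetical input files.
    events = load_source(spark, "json", "data/events.json")
    people = load_source(spark, "csv", "data/people.csv",
                         header="true", inferSchema="true")

    # Hypothetical database; requires the PostgreSQL JDBC driver.
    orders = load_source(spark, "jdbc", **jdbc_options(
        "jdbc:postgresql://localhost:5432/shop", "public.orders",
        "spark", "secret"))

    for df in (events, people, orders):
        df.printSchema()
    spark.stop()
```

A source provided by an external package is selected the same way once the package is on the classpath: only the string passed to `format` changes.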
`o.a.s.ml` models without dependency on
Wikipedia. 2017. "Apache Spark." Wikipedia, the Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Apache_Spark&oldid=781182753.
This work (Awesome Spark, by awesome-spark/awesome-spark), identified by Maciej Szymkiewicz, is free of known copyright restrictions.
Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. This compilation is not endorsed by The Apache Software Foundation.
Inspired by sindresorhus/awesome.