Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Daft | 1,012 | 3 | | | 3 months ago | 53 | December 05, 2023 | 101 | apache-2.0 | Rust |
Distributed DataFrame for Python designed for the cloud, powered by Rust | ||||||||||
Metorikku | 536 | | | | a year ago | 126 | February 27, 2023 | 65 | mit | Scala |
A simplified, lightweight ETL Framework based on Apache Spark | ||||||||||
Geni | 268 | | | | 5 months ago | 33 | October 14, 2020 | 14 | apache-2.0 | Clojure |
A Clojure dataframe library that runs on Spark | ||||||||||
Spark With Python | 98 | | | | 4 years ago | | | | mit | Jupyter Notebook |
Fundamentals of Spark with Python (using PySpark), code examples | ||||||||||
Pyspark Algorithms | 33 | | | | 4 years ago | | | 2 | other | Python |
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2 | ||||||||||
Pandas Farm | 11 | | | | 7 years ago | | | | mit | Python |
Parallelize pandas operations easily on your personal small cluster | ||||||||||
Pdf2dataset | 8 | | | | 4 years ago | 15 | September 13, 2020 | 9 | apache-2.0 | Python |
Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features |