Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Spark | 37,661 | | 2,394 | 939 | 5 months ago | 46 | May 09, 2021 | 186 | apache-2.0 | Scala |
Apache Spark - A unified analytics engine for large-scale data processing | ||||||||||
Data Science Ipython Notebooks | 25,668 | | | | 9 months ago | | | 34 | other | Python |
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines. | ||||||||||
Bigdata Notes | 14,872 | | | | 6 months ago | | | 39 | | Java |
A beginner's guide to big data :star: | ||||||||||
Cookbook | 12,557 | | | | 6 months ago | | | 111 | apache-2.0 | |
The Data Engineering Cookbook | ||||||||||
God Of Bigdata | 8,483 | | | | a year ago | | | 3 | | |
Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive... | ||||||||||
Delta | 6,656 | 45 | 5 months ago | 24 | May 24, 2023 | 601 | apache-2.0 | HTML | ||
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs | ||||||||||
H2o 3 | 6,618 | | 62 | 33 | 5 months ago | 49 | August 09, 2023 | 2,746 | apache-2.0 | Jupyter Notebook |
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc. | ||||||||||
Zeppelin | 6,259 | | 32 | 31 | 3 months ago | 2 | June 21, 2017 | 160 | apache-2.0 | Java |
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more. | ||||||||||
Risingwave | 5,799 | | | | 5 months ago | 14 | December 07, 2023 | 1,010 | apache-2.0 | Rust |
The distributed streaming database. Engineered to offer the simplest and most cost-efficient way for stream processing and management. | ||||||||||
Synapseml | 4,989 | 6 | a month ago | 12 | November 27, 2023 | 335 | mit | Scala | ||
Simple and Distributed Machine Learning |