Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Data Science Ipython Notebooks | 25,668 | | | | 6 months ago | | | 34 | other | Python |
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines. | ||||||||||
Ds Cheatsheets | 11,535 | | | | a year ago | | | 7 | mit | |
List of Data Science Cheatsheets to rule the world | ||||||||||
Dagster | 9,467 | 2 | 133 | | 3 months ago | 585 | December 07, 2023 | 2,343 | apache-2.0 | Python |
An orchestration platform for the development, production, and observation of data assets. | ||||||||||
H2o 3 | 6,618 | 62 | 33 | | 3 months ago | 49 | August 09, 2023 | 2,746 | apache-2.0 | Jupyter Notebook |
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc. | ||||||||||
Mage Ai | 6,324 | | | | 3 months ago | 314 | December 06, 2023 | 189 | apache-2.0 | Python |
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data. | ||||||||||
Synapseml | 4,967 | 6 | | | 3 days ago | 12 | November 27, 2023 | 335 | mit | Scala |
Simple and Distributed Machine Learning | ||||||||||
Koalas | 3,291 | 1 | 16 | | 7 months ago | 47 | October 19, 2021 | 112 | apache-2.0 | Python |
Koalas: pandas API on Apache Spark | ||||||||||
Spark Notebook | 3,147 | | | | a year ago | | | 207 | apache-2.0 | JavaScript |
Interactive and Reactive Data Science using Scala and Spark. | ||||||||||
Benchm Ml | 1,839 | | | | 2 years ago | | | 11 | mit | R |
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.). | ||||||||||
Spark Py Notebooks | 1,515 | | | | a year ago | | | 9 | other | Jupyter Notebook |
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks |