Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Data Science Ipython Notebooks | 25,668 | | | | 6 months ago | | | 34 | other | Python |
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines. | ||||||||||
Awesome Bigdata | 12,699 | | | | a month ago | | | 38 | mit | |
A curated list of awesome big data frameworks, resources and other awesomeness. | ||||||||||
Trino | 9,118 | 29 | 2 months ago | 83 | November 30, 2023 | 2,496 | apache-2.0 | Java | ||
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io) | ||||||||||
Vaex | 8,161 | 2 | 29 | 24 days ago | 69 | July 21, 2023 | 508 | mit | Python | |
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀 | ||||||||||
Catboost | 7,564 | 12 | 2 months ago | 20 | September 19, 2023 | 539 | apache-2.0 | Python | ||
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU. | ||||||||||
H2O-3 | 6,618 | 62 | 33 | 2 months ago | 49 | August 09, 2023 | 2,746 | apache-2.0 | Jupyter Notebook | |
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc. | ||||||||||
Pachyderm | 6,035 | 1 | 2 months ago | 613 | December 04, 2023 | 897 | apache-2.0 | Go | ||
Data-Centric Pipelines and Data Versioning | ||||||||||
Feast | 5,053 | 28 | 2 months ago | 116 | September 07, 2023 | 149 | apache-2.0 | Python | ||
Feature Store for Machine Learning | ||||||||||
Synapseml | 4,943 | 6 | 15 days ago | 12 | November 27, 2023 | 335 | mit | Scala | ||
Simple and Distributed Machine Learning | ||||||||||
Koalas | 3,291 | 1 | 16 | 6 months ago | 47 | October 19, 2021 | 112 | apache-2.0 | Python | |
Koalas: pandas API on Apache Spark |