Data Science Study Materials

"Big data is at the foundation of all the megatrends that are happening." – Chris Lynch
Alternatives To Data Science Study Materials
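Data Science Ipython Notebooks
Stars: 25,668; Most recent commit: 9 months ago; Open issues: 34; License: other; Language: Python
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Trino
Stars: 9,118; Downloads / repos using this / packages using this: 29; Most recent commit: 5 months ago; Total releases: 83; Latest release: November 30, 2023; Open issues: 2,496; License: apache-2.0; Language: Java
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Vaex
Stars: 8,161; Downloads / repos using this / packages using this: 229; Most recent commit: 4 months ago; Total releases: 69; Latest release: July 21, 2023; Open issues: 508; License: mit; Language: Python
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

Catboost
Stars: 7,564; Downloads / repos using this / packages using this: 12; Most recent commit: 5 months ago; Total releases: 20; Latest release: September 19, 2023; Open issues: 539; License: apache-2.0; Language: Python
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

H2o 3
Stars: 6,618; Downloads / repos using this / packages using this: 6233; Most recent commit: 5 months ago; Total releases: 49; Latest release: August 09, 2023; Open issues: 2,746; License: apache-2.0; Language: Jupyter Notebook
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Pachyderm
Stars: 6,035; Downloads / repos using this / packages using this: 1; Most recent commit: 5 months ago; Total releases: 613; Latest release: December 04, 2023; Open issues: 897; License: apache-2.0; Language: Go
Data-Centric Pipelines and Data Versioning

Feast
Stars: 5,342; Downloads / repos using this / packages using this: 28; Most recent commit: 5 days ago; Total releases: 116; Latest release: September 07, 2023; Open issues: 149; License: apache-2.0; Language: Python
The Open Source Feature Store for Machine Learning

Synapseml
Stars: 4,989; Downloads / repos using this / packages using this: 6; Most recent commit: a month ago; Total releases: 12; Latest release: November 27, 2023; Open issues: 335; License: mit; Language: Scala
Simple and Distributed Machine Learning

Koalas
Stars: 3,291; Downloads / repos using this / packages using this: 116; Most recent commit: 9 months ago; Total releases: 47; Latest release: October 19, 2021; Open issues: 112; License: apache-2.0; Language: Python
Koalas: pandas API on Apache Spark

Graphscope
Stars: 3,033; Downloads / repos using this / packages using this: 1; Most recent commit: 5 months ago; Total releases: 452; Latest release: December 09, 2023; Open issues: 302; License: apache-2.0; Language: C++
🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba (graph analytics, graph queries, graph machine learning)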