Spark Ray Data Science

Supporting content (slides and exercises) for the Pearson video series covering best practices for developing scalable applications with Spark and Ray in the context of a data scientist's standard workflow.
Alternatives To Spark Ray Data Science
Data Science Ipython Notebooks: 25,668 stars; most recent commit 6 months ago; 34 open issues; license: other; language: Python
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Ds Cheatsheets: 11,535 stars; most recent commit a year ago; 7 open issues; license: MIT
List of Data Science Cheatsheets to rule the world

Dagster: 9,467 stars; most recent commit 3 months ago; 585 releases (latest: December 07, 2023); 2,343 open issues; license: Apache-2.0; language: Python
An orchestration platform for the development, production, and observation of data assets.

H2O-3: 6,618 stars; most recent commit 3 months ago; 49 releases (latest: August 09, 2023); 2,746 open issues; license: Apache-2.0; language: Jupyter Notebook
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Mage Ai: 6,324 stars; most recent commit 3 months ago; 314 releases (latest: December 06, 2023); 189 open issues; license: Apache-2.0; language: Python
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.

Synapseml: 4,960 stars; 12 releases (latest: November 27, 2023); 335 open issues; license: MIT; language: Scala
Simple and Distributed Machine Learning

Koalas: 3,291 stars; most recent commit 7 months ago; 47 releases (latest: October 19, 2021); 112 open issues; license: Apache-2.0; language: Python
Koalas: pandas API on Apache Spark

Spark Notebook: 3,147 stars; most recent commit a year ago; 207 open issues; license: Apache-2.0; language: JavaScript
Interactive and Reactive Data Science using Scala and Spark.

Benchm Ml: 1,839 stars; most recent commit 2 years ago; 11 open issues; license: MIT; language: R
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).

Spark Py Notebooks: 1,515 stars; most recent commit a year ago; 9 open issues; license: other; language: Jupyter Notebook
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks