Awesome Open Source
Search results for machine learning spark
216 search results found
Data Science Ipython Notebooks
⭐
25,668
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Horovod
⭐
13,950
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Angel
⭐
6,690
A Flexible and Powerful Parameter Server for large-scale machine learning
H2o 3
⭐
6,618
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Mage Ai
⭐
6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Synapseml
⭐
4,967
Simple and Distributed Machine Learning
Pipeline
⭐
4,158
PipelineAI
Tensorflowonspark
⭐
3,851
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Spark Nlp
⭐
3,578
State of the Art Natural Language Processing
Scio
⭐
2,505
A Scala API for Apache Beam and Google Cloud Dataflow.
Transmogrifai
⭐
2,099
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Easyml
⭐
1,966
Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.
Spark
⭐
1,963
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Benchm Ml
⭐
1,839
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Fugue
⭐
1,821
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
.github
⭐
1,722
ApacheCN open source organization: announcements, introduction, members, events, and contact channels
Spark Ml Source Analysis
⭐
1,710
An analysis of the principles behind Spark ML algorithms and of their concrete source-code implementations
Petastorm
⭐
1,693
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Spark Py Notebooks
⭐
1,515
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Optimus
⭐
1,446
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Seldon Server
⭐
1,420
Machine Learning Platform and Recommendation Engine built on Kubernetes
Spark Sklearn
⭐
1,039
(Deprecated) Scikit-learn integration package for Apache Spark
Sparkling Water
⭐
957
Sparkling Water provides H2O functionality inside a Spark cluster
Around Dataengineering
⭐
926
A Data Engineering & Machine Learning Knowledge Hub
Sparklyr
⭐
922
R interface for Apache Spark
Sparkctr
⭐
896
CTR prediction model based on Spark (LR, GBDT, DNN)
Zingg
⭐
828
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Machinelearning
⭐
684
Machine learning resources, including algorithms, papers, datasets, examples, and so on.
Dist Keras
⭐
611
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Onedal
⭐
584
oneAPI Data Analytics Library (oneDAL)
Sparklearning
⭐
573
Learning Apache Spark, including code and data. Most parts can run locally.
Data Science Learning Resources
⭐
499
A collection of data science and machine learning resources that I've found helpful (I only post what I've read!)
Complete Life Cycle Of A Data Science Project
⭐
499
Complete-Life-Cycle-of-a-Data-Science-Project
Featran
⭐
465
A Scala feature transformation library for data science and machine learning
Data On Eks
⭐
439
DoEKS is a tool to build, deploy and scale Data Platforms on Amazon EKS
Agile_data_code_2
⭐
435
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Machinelearning
⭐
406
Machine Learning
Stockinference Spark
⭐
376
Stock inference engine using Spring XD, Apache Geode / GemFire and Spark MLlib.
Ytk Learn
⭐
351
Ytk-learn is a distributed machine learning library which implements most of the popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Sagemaker Spark
⭐
285
A Spark library for Amazon SageMaker.
Sk Dist
⭐
283
Distributed scikit-learn meta-estimators in PySpark
Data Science
⭐
269
Projects and awesome list for all Data Science fields
Geni
⭐
268
A Clojure dataframe library that runs on Spark
Jpmml Sparkml
⭐
265
Java library and command-line application for converting Apache Spark ML pipelines to PMML
Openuba
⭐
264
A robust and flexible open source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security Industry. [PRE-ALPHA]
Glow
⭐
251
An open-source toolkit for large-scale genomic analysis
Hydro Serving
⭐
248
MLOps Platform
Data_science_blogs
⭐
232
A repository to keep track of all the code that I end up writing for my blog posts.
Rasterframes
⭐
226
Geospatial Raster support for Spark DataFrames
Isolation Forest
⭐
211
A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Rumble
⭐
194
⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Book Resources
⭐
181
Whylogs Java
⭐
179
Profile and monitor your ML data pipeline end-to-end
Spark Ml Streaming
⭐
175
Visualize streaming machine learning in Spark
Setl
⭐
173
A simple Spark-powered ETL framework that just works 🍺
Mydatascienceportfolio
⭐
172
Applying Data Science and Machine Learning to Solve Real World Business Problems
Opaque Sql
⭐
171
An encrypted data analytics platform
Cape Dataframes
⭐
162
Privacy transformations on Spark and Pandas dataframes backed by a simple policy language.
Qb
⭐
160
QANTA Quiz Bowl AI
Cumf_als
⭐
157
CUDA Matrix Factorization Library with Alternating Least Square (ALS)
Scalable Data Science Platform
⭐
153
Content for architecting a data science platform for products using Luigi, Spark & Flask.
Data Algorithms With Spark
⭐
151
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Spark Ext
⭐
147
Spark Extension : ML transformers, SQL aggregations, etc that are missing in Apache Spark
Albedo
⭐
142
A recommender system for discovering GitHub repos, built with Apache Spark
Bigdata
⭐
142
Hadoop, HBase, Storm, Spark, etc.
Sparkling Graph
⭐
134
SparklingGraph provides an easy-to-use set of features that lets you process large-scale graphs using Spark and GraphX.
Lift
⭐
129
The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large scale machine learning workflows.
Rikai
⭐
127
Parquet-based ML data format optimized for working with unstructured data
Data Science Tutorials
⭐
124
Python Tutorials for Data Science
Ml Resource
⭐
110
A concise resource repository for machine learning
Spark Mllib Twitter Sentiment Analysis
⭐
103
🌟 ✨ Analyze and visualize Twitter Sentiment on a world map using Spark MLlib
Xlearning Xdml
⭐
101
extremely distributed machine learning
Spark Nlp Models
⭐
100
Models and Pipelines for the Spark NLP library
Ros_hadoop
⭐
98
Hadoop splittable InputFormat for ROS. Process rosbag with Hadoop Spark and other HDFS compatible systems.
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Pyspark2pmml
⭐
93
Python library for converting Apache Spark ML pipelines to PMML
Machine Learning With Spark Second Edition
⭐
92
Machine Learning with Spark - Second Edition, by Packt
Sift
⭐
91
Knowledge extraction from web data
Mloperator
⭐
89
Machine Learning Operator & Controller for Kubernetes
Spark_python_ml_examples
⭐
81
Spark 2.0 Python Machine Learning examples
Mleap
⭐
76
MLeap allows for easily putting Spark ML pipelines into production
Resilient Ml Research Platform
⭐
76
Spark_scala_ml_examples
⭐
75
Spark 2.0 Scala Machine Learning examples
Generator Mitosis
⭐
75
A micro-service infrastructure generator based on Yeoman/Chatbot, Kubernetes/Docker Swarm, Traefik, Ansible, Jenkins, Spark, Hadoop, Kafka, etc.
Holoclean Legacy Deprecated
⭐
74
A Machine Learning System for Data Enrichment.
Scalableml
⭐
71
COM6012 Scalable Machine Learning - University of Sheffield
Spark Redis Ml
⭐
66
A Spark package for loading Spark ML models to Redis-ML
Mltoolkits
⭐
65
learningOrchestra is a distributed Machine Learning integration tool that facilitates and streamlines iterative processes in a Data Science project.
Pypmml
⭐
64
Python PMML scoring library
Pyspark Twitter Stream Mining
⭐
63
Real-time Machine Learning with Apache Spark on Twitter Public Stream
Awesome Ai Kubernetes
⭐
62
❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
W2v
⭐
62
Word2Vec models with Twitter data using Spark. Blog:
Frovedis
⭐
62
Framework of vectorized and distributed data analytics
Sparkml
⭐
61
Spark ML with pyspark
Spark
⭐
60
Apache Spark (Scala, PySpark, SparkR) Code, Tricks, and References
Nlp_spark
⭐
58
Natural Language Processing with Spark's MLlib
Learning
⭐
57
Walkthrough notebooks for Deep Learning, Machine Learning, Reinforcement Learning, Spark, Statistics, Algorithms, Scala, Python
Bigdataanalytics_infoh515
⭐
56
Material for the Big Data Analytics exercise classes - INFOH515 - Big Data : Distributed Data Management and Scalable Analytics - Université Libre de Bruxelles
Playdata Zeppelin Notebook
⭐
55
Zeppelin example: classifying news articles about fires
Epitweetr
⭐
54
ECDC Early warning tool using Twitter data
1-100 of 216 search results