Awesome Open Source
Search results for data science spark
119 search results found
Data Science Ipython Notebooks
⭐
25,668
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Ds Cheatsheets
⭐
11,535
List of Data Science Cheatsheets to rule the world
Dagster
⭐
9,467
An orchestration platform for the development, production, and observation of data assets.
H2o 3
⭐
6,618
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Mage Ai
⭐
6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Synapseml
⭐
4,967
Simple and Distributed Machine Learning
Koalas
⭐
3,291
Koalas: pandas API on Apache Spark
Spark Notebook
⭐
3,147
Interactive and Reactive Data Science using Scala and Spark.
Benchm Ml
⭐
1,839
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Spark Py Notebooks
⭐
1,515
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Optimus
⭐
1,446
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Pixiedust
⭐
1,035
Python Helper library for Jupyter Notebooks
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Splink
⭐
939
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Around Dataengineering
⭐
926
A Data Engineering & Machine Learning Knowledge Hub
Zingg
⭐
828
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Vegas
⭐
682
The missing MatPlotLib for Scala + Spark
Data Science With Ruby
⭐
664
Practical Data Science with Ruby based tools.
Dist Keras
⭐
611
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Onedal
⭐
584
oneAPI Data Analytics Library (oneDAL)
Data Science Learning Resources
⭐
499
A collection of data science and machine learning resources that I've found helpful (I only post what I've read!)
Complete Life Cycle Of A Data Science Project
⭐
499
Complete-Life-Cycle-of-a-Data-Science-Project
Traceml
⭐
490
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Featran
⭐
465
A Scala feature transformation library for data science and machine learning
Popmon
⭐
461
Monitor the stability of a Pandas or Spark dataframe ⚙︎
Agile_data_code_2
⭐
435
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Datacompy
⭐
339
Pandas and Spark DataFrame comparison for humans and more!
Sk Dist
⭐
283
Distributed scikit-learn meta-estimators in PySpark
Datavines
⭐
275
Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
Data Science
⭐
269
Projects and awesome list for all Data Science fields
Butterfree
⭐
269
A tool for building feature stores.
Geni
⭐
268
A Clojure dataframe library that runs on Spark
Openuba
⭐
264
A robust and flexible open source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security Industry. [PRE-ALPHA]
Kamu Cli
⭐
263
New generation decentralized data lake and a streaming data pipeline
Pysparkling
⭐
253
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Data_science_blogs
⭐
232
A repository to keep track of all the code that I end up writing for my blog posts.
Pyspark Cheatsheet
⭐
230
🐍 Quick reference guide to common patterns & functions in PySpark.
Intro_ds
⭐
229
Code to accompany Mastering Data Science from PT press
Rumble
⭐
194
⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Flytekit
⭐
175
Extensible Python SDK for developing Flyte tasks and workflows. Simple to get started and learn and highly extensible.
Setl
⭐
173
A simple Spark-powered ETL framework that just works 🍺
Mydatascienceportfolio
⭐
172
Applying Data Science and Machine Learning to Solve Real World Business Problems
Spark Alchemy
⭐
169
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Visions
⭐
166
Type System for Data Analysis in Python
Cape Dataframes
⭐
162
Privacy transformations on Spark and Pandas dataframes backed by a simple policy language.
Scalable Data Science Platform
⭐
153
Content for architecting a data science platform for products using Luigi, Spark & Flask.
Python Bigdata
⭐
128
Data science and Big Data with Python
Data Science Tutorials
⭐
124
Python Tutorials for Data Science
Ml Resource
⭐
110
A concise resource repository for machine learning
Spark R Notebooks
⭐
109
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Pulsar Spark
⭐
103
Spark Connector to read and write with Pulsar
Gallia Core
⭐
79
A schema-aware Scala library for data transformation
Tiledb Vcf
⭐
79
Efficient variant-call data storage and retrieval library using the TileDB storage library.
Holoclean Legacy Deprecated
⭐
74
A Machine Learning System for Data Enrichment.
Sit742
⭐
72
SIT742: Modern Data Science
Udacity Data Engineer Nanodegree
⭐
64
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree
Pythom
⭐
64
Code supporting Data Science articles at The Marketing Technologist, Floryn Tech Blog, and Pythom.nl
W2v
⭐
62
Word2Vec models with Twitter data using Spark.
Awesome Ai Kubernetes
⭐
62
❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Visualize Data With Python
⭐
60
A Jupyter notebook using some standard techniques for data science and data engineering to analyze data for the 2017 flooding in Houston, TX.
Kafka Streaming Click Analysis
⭐
56
Use Kafka and Apache Spark streaming to perform click stream analytics
Cowait
⭐
54
Containerized distributed programming framework for Python
Books
⭐
53
A collection of online books for data science, computer science and coding!
Prosto
⭐
53
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Spotify Song Recommendation Ml
⭐
52
UC Berkeley team's submission for RecSys Challenge 2018
Mastering Spark For Data Science
⭐
43
Mastering Spark for Data Science, published by Packt
Architect_big_data_solutions_with_spark
⭐
42
Code, labs, and lectures for the course
Big Data
⭐
37
Python tools for big data
Posts
⭐
34
A list of all my posts and personal projects
Groovy Data Science
⭐
34
Some Data Science examples using Groovy
Pyspark Algorithms
⭐
33
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Awesome Tools
⭐
32
A curated list of awesome tools and libraries for specific domains
Ides
⭐
32
Intelligent Data Exploration Service (IDES): a one-stop Data + AI data solution!
Spark Studyclub
⭐
31
Apache Spark study group organized by the Data Engineering Latam community
Spark Notebook Ml Labs
⭐
30
Data Science with Apache Spark and Spark Notebook
Learning Spark
⭐
29
Tidy up Spark and Hadoop tutorials.
Spark Ray Data Science
⭐
29
Supporting content (slides and exercises) for the Pearson video series covering best practices for developing scalable applications with Spark and Ray in the context of a data scientist's standard workflow.
Bdr Analytics Py
⭐
29
Common data science and data engineering utilities to help us perform analytics. Our toolbox for data scientists, licensed under Apache-2.0
Odsc_india_2018
⭐
26
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Practical Data Science With Hadoop And Spark
⭐
23
Snorkel
⭐
23
Snorkel - Bootstrap your Data Science
Awesome Sparklyr
⭐
22
An awesome sparklyr related package collection
Datasciencebox
⭐
21
Create and manage instances for data science
Learnanalytics Microsoftml
⭐
19
Introduction to Statistical Machine Learning with MicrosoftML
Ds30_5
⭐
18
Data Science in 30 Minutes #5: Spark
Spark For Data Science
⭐
17
Code repository for Spark for Data Science by Packt
Sparklanes
⭐
16
A lightweight data processing framework for Apache Spark
Data Mill
⭐
16
A K8s-based infrastructure for analytics
Datascience Environment
⭐
16
Docker Environment for data science
Address Index Data
⭐
16
Pyspark For Data Processing
⭐
16
Code for my presentation: Using PySpark to Process Boat Loads of Data
Interview Notes
⭐
15
Summary notes on Python, big data, and MySQL
Rheoceros
⭐
15
Cloud-based AI / ML workflow and data application development framework
Bigdata_docker
⭐
13
Big Data Docker Data Science Spark Spark3 Hadoop HDFS Scala Python Artificial Intelligence Machine Learning Jupyter Lab Notebook
Nyc_taxi_trip_duration
⭐
13
Develop ML models to predict taxi trip duration in NYC. Ranked: Top 6% | RMSLE: 0.377 (Kaggle) | #DS
R_bio
⭐
12
R + bioinformatics, starting from the ground up
Distributed Machine Learning
⭐
12
PySpark, Databricks, H2O, MLlib
Big_data_course_rimini_2021
⭐
11
This repository contains all the teaching material used during the "Laboratorio Big Data" course, in collaboration with the municipality of Rimini.
Airflowjob
⭐
11
Airflow POC demo: 1) environment setup 2) Airflow DAG 3) Spark/ML pipeline | #DE
Conch Bigdata
⭐
10
Big Data
Related Searches
Python Data Science (6,905)
Machine Learning Data Science (5,390)
Jupyter Notebook Data Science (3,734)
Scala Spark (3,279)
Python Spark (2,053)
Java Spark (1,587)
Apache Spark (1,207)
Spark Hadoop (1,188)
Jupyter Notebook Spark (1,151)
Deep Learning Data Science (1,039)
1-100 of 119 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.