Awesome Open Source
Search results for data science pyspark
46 search results found
Synapseml
⭐
4,967
Simple and Distributed Machine Learning
Machine Learning
⭐
2,607
🌎 machine learning tutorials (mainly in Python3)
Spark Py Notebooks
⭐
1,515
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Optimus
⭐
1,446
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Hopsworks
⭐
1,041
Hopsworks - Data-Intensive AI platform with a Feature Store
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Kuwala
⭐
610
Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we feed third-party data into data science models and products with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demograp
Pandapy
⭐
483
PandaPy has the speed of NumPy and the usability of Pandas, 10x to 50x faster (by @firmai)
Datacompy
⭐
339
Pandas and Spark DataFrame comparison for humans and more!
Sk Dist
⭐
283
Distributed scikit-learn meta-estimators in PySpark
Butterfree
⭐
269
A tool for building feature stores.
Pysparkling
⭐
253
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Data_science_blogs
⭐
232
A repository to keep track of all the code that I end up writing for my blog posts.
Pyspark Cheatsheet
⭐
230
🐍 Quick reference guide to common patterns & functions in PySpark.
Nyc Transport
⭐
144
A Unified Database of NYC transport (subway, taxi/Uber, and citibike) data.
Spark R Notebooks
⭐
109
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Dataanalysiswithpythonandpyspark
⭐
102
Code repository for the "PySpark in Action" book
Anovos
⭐
78
Anovos - An Open Source Library for Scalable feature engineering Using Apache-Spark
W2v
⭐
62
Word2Vec models with Twitter data using Spark. Blog:
Stork
⭐
47
Make your libraries magically appear in Databricks.
Pyspark Algorithms
⭐
33
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Data Analytics Services
⭐
33
This repo collects the open-source work of the Analytics Service within NHS Digital Data Services
Spark Studyclub
⭐
31
Apache Spark study group organized by the Data Engineering Latam community
Decorators4ds
⭐
27
Useful decorators every Data Scientist should know
Odsc_india_2018
⭐
26
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Courses
⭐
25
Just the stuff from the faculty (homework, projects, lectures)
Springboard Data Science Immersive
⭐
23
Pyspark K8s Boilerplate
⭐
23
Boilerplate for PySpark on Cloud Kubernetes
Data Science Learning Paths
⭐
22
Practical data science courses - from basic to intermediate
Ds30_5
⭐
18
Data Science in 30 Minutes #5: Spark
Sparklanes
⭐
16
A lightweight data processing framework for Apache Spark
Pyspark For Data Processing
⭐
16
Code for my presentation: Using PySpark to Process Boat Loads of Data
Rheoceros
⭐
15
Cloud-based AI / ML workflow and data application development framework
Nyc_taxi_trip_duration
⭐
13
Develop ML models to predict taxi trip duration in NYC. Ranked: Top 6% | RMSLE: 0.377 (Kaggle) | #DS
Dataquest
⭐
12
Data Science Massive Open Online Course: All the code, notes and supplementary materials generated during the course of my data scientific learning.
Distributed Machine Learning
⭐
12
PySpark, Databricks, h2o, MLlib
Data Engineering
⭐
9
Common data manipulations in different languages and frameworks.
Databricks Datascience Titanic
⭐
9
A walk-through of data science basics using PySpark, MLflow and the Titanic dataset
Gift
⭐
9
Gold Idea First Templates covering data, analytics and visualization.
Pyspark Template
⭐
8
A Python PySpark Project with Poetry
Anova_in_pyspark
⭐
8
Custom one-way ANOVA implementation using PySpark
Analysis
⭐
8
Repo for approaches to practical data science problems, including notebook demos and working scripts | #DS | #analysis
Aws Glue Monorepo Style
⭐
7
Example of AWS Glue jobs and workflow deployment with Terraform in monorepo style. The code here supports the miniseries of articles about AWS Glue and Python.
Optimus Examples
⭐
6
Examples for Optimus, a Data Cleansing Library for Big Data.
Spark Data Analysis Projects
⭐
6
A collection of data analysis projects done using PySpark via Jupyter notebooks.
Datascience Playground
⭐
6
A scalable, cloud-ready environment for Data Science using Docker
Cheatsheets
⭐
6
This repo contains all the cheatsheets that I found important.
Msc In Machine Learning And Artificial Intelligence
⭐
6
Master of Science in Machine Learning & Artificial Intelligence - Indian Institute of Technology Madras & Liverpool John Moores University
Sms Spam Filter Using Hortonworks
⭐
5
Build Spam Filter Model on HDP using Watson Studio Local
Cookiecutter Ds Docker
⭐
5
A Docker-based Data Science cookiecutter (for myself)
Docker Spark Anaconda
⭐
5
Spark and Anaconda in Docker
Apachespark Pyspark 2023
⭐
5
PySpark is a Python library for distributed data processing that lets you process large volumes of data on clusters using the Apache Spark framework, offering high performance and an integrated set of tools for large-scale data analysis and management.
Related Searches
Python Data Science (6,905)
Machine Learning Data Science (5,390)
Jupyter Notebook Data Science (3,869)
R Data Science (1,164)
Deep Learning Data Science (1,039)
Data Science Pandas (948)
Spark Pyspark (773)
Python Pyspark (689)
Html Data Science (671)
Copyright 2018-2024 Awesome Open Source. All rights reserved.