Awesome Open Source
Search results for big data pyspark
37 search results found
Synapseml
⭐
4,967
Simple and Distributed Machine Learning
Spark Py Notebooks
⭐
1,515
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Optimus
⭐
1,446
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Sparkling Water
⭐
957
Sparkling Water provides H2O functionality inside Spark cluster
Sparklearning
⭐
451
A comprehensive Spark guide collated from multiple sources that can be referred to learn more about Spark or as an interview refresher.
Gimel
⭐
230
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Geopyspark
⭐
151
GeoTrellis for PySpark
Data Algorithms With Spark
⭐
151
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Pyspark Cheatsheet
⭐
140
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Big Data Mapreduce Course
⭐
135
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Aut
⭐
128
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Spark R Notebooks
⭐
109
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Big Data Engineering Coursera Yandex
⭐
91
Big Data for Data Engineers Coursera Specialization from Yandex
Bitcoin Value Predictor
⭐
90
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Anovos
⭐
78
Anovos - An open source library for scalable feature engineering using Apache Spark
Mmtf Pyspark
⭐
64
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Cuallee
⭐
56
A data quality acceleration library to get data sets verified in a friendly interface
Big_data
⭐
55
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Mmtf Workshop 2018
⭐
53
Structural Bioinformatics Training Workshop & Hackathon 2018
Spark Select
⭐
53
A library for Spark DataFrame using MinIO Select API
Data_processing_course
⭐
53
Some class materials for a data processing course using PySpark
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Pyspark Algorithms
⭐
33
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Check Engine
⭐
30
Data validation library for PySpark 3.0.0
Detecting Malicious Url Machine Learning
⭐
23
Pyspark Setup Demo
⭐
21
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Spark And Kafka_iot Data Processing And Analytics
⭐
21
Final Project for IoT: Big Data Processing and Analytics class in UCSC Extension. Analyzing U.S nationwide temperature from IoT sensors in real-time
Graphlet
⭐
21
PyPi module for Graphlet AI Knowledge Graph Factory
Aws Youtube Analytics
⭐
20
It aims to securely manage, streamline, and analyze structured and semi-structured YouTube video data based on video categories and trending metrics.
Pyspark
⭐
15
spark (scala and python)
Big Data Course
⭐
14
Practice course on Big Data
Pyspark On Aws Emr
⭐
13
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.
Bigdata Spark
⭐
12
BerkeleyX: CS100.1x, Introduction to Big Data with Apache Spark
D Pandisim
⭐
11
Distributed pandemic simulator that uses the power of Spark to generate huge bulks of contact-tracing data.
Pyspark Dataframe Made Easy
⭐
10
pyspark dataframe made easy
Subredditrecommend
⭐
9
Recommends subreddits to you based on a Word2Vec neural net!
Data Analytics Machine Learning Big Data
⭐
8
Slides, code and more for my class: Data Analytics and Machine Learning on Big Data
Shingho
⭐
7
Shingho is a PySpark based statistical library designed for Big Data applications.
Cloudera_material
⭐
7
Cloudera_Material: Study material to help people prepare for the Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collaborate.
Spark On K8s
⭐
7
Presenting 3 ways to run Spark over containers, this project is recommended to those who seek to explore Big Data out of a Hadoop Cluster.
Traffic Data Analysis With Apache Spark Based On Mobile Robot Data
⭐
7
Mobile robot data were analyzed with Apache Spark to produce five statistical results: travel time, waiting time, average speed, occupancy, and density.
Optimus Examples
⭐
6
Examples for Optimus, a data cleansing library for Big Data.
Iceberg Intro Workshop
⭐
5
Hands-on workshop with Apache Iceberg
Coursera Bigdata Specialization
⭐
5
Coursera Specialization: Big Data for Data Engineers
Spark Streaming In Python
⭐
5
Apache Spark 3 - Structured Streaming Course Material
Az Databricks Realtime Alert System
⭐
5
Building a real-time alert monitoring pipeline that sends email notifications using Azure Event Hubs, Azure Databricks, and an Azure Logic App
Copyright 2018-2024 Awesome Open Source. All rights reserved.