Awesome Open Source
Search results for "spark pyspark"
533 search results found
Synapseml (⭐ 4,566): Simple and Distributed Machine Learning
Spark Nlp (⭐ 3,440): State of the Art Natural Language Processing
Ibis (⭐ 3,160): The flexibility of Python with the scale and performance of modern SQL.
Linkis (⭐ 3,136): Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
Petastorm (⭐ 1,614): The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Spark Py Notebooks (⭐ 1,515): Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Mleap (⭐ 1,474): MLeap: Deploy ML Pipelines to Production
Awesome Spark (⭐ 1,461): A curated list of awesome Apache Spark packages and resources.
Optimus (⭐ 1,406): 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Sparkmagic (⭐ 1,254): Jupyter magics and kernels for working with remote Spark clusters
Bigflow (⭐ 1,122): Baidu Bigflow is an interface for writing distributed computing programs that provides many simple, flexible, and powerful APIs. Using Bigflow, you can easily handle data of any scale. Bigflow processes more than 4 PB of data inside Baidu and runs about 10k jobs every day.
Pyspark Example Project (⭐ 1,034): Example project implementing best practices for PySpark ETL jobs and applications.
Pyspark Tutorial (⭐ 959): PySpark-Tutorial provides basic algorithms using PySpark
Sparkling Water (⭐ 954): Sparkling Water provides H2O functionality inside a Spark cluster
Scriptis (⭐ 767): Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, and resource management, and intelligent diagnosis.
Devops Python Tools (⭐ 675): 80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr, etc.
Eat_pyspark_in_10_days (⭐ 534): PySpark 🍒🥭 is delicious, just eat it! 😋😋
Sparklearning (⭐ 451): A comprehensive Spark guide collated from multiple sources that can be used to learn more about Spark or as an interview refresher.
Findspark (⭐ 428)
Learningpyspark (⭐ 409): Code base for the Learning PySpark book (in preparation)
Spark Syntax (⭐ 391): This is a repo documenting the best practices in PySpark.
Miscellaneous (⭐ 376): Includes notes on Apache Spark, Spark for Physics, Jupyter notebook examples for Spark, Oracle and other DB systems.
Sparklingpandas (⭐ 338): Sparkling Pandas
Datacompy (⭐ 316): Pandas and Spark DataFrame comparison for humans and more!
Spark Standalone Cluster On Docker (⭐ 311): Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ⚡️
Learning Pyspark (⭐ 294): Code repository for Learning PySpark by Packt
Sagemaker Spark (⭐ 285): A Spark library for Amazon SageMaker.
Sk Dist (⭐ 283): Distributed scikit-learn meta-estimators in PySpark
Cc Pyspark (⭐ 280): Process Common Crawl data with Python and Spark
Spark Gotchas (⭐ 276): Spark Gotchas. A subjective compilation of Apache Spark tips and tricks
Butterfree (⭐ 268): A tool for building feature stores.
Pyspark Style Guide (⭐ 264): This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring topics across the PySpark repos we've encountered.
Spark Jupyter Aws (⭐ 255): A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Pysparkling (⭐ 253): A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Data_science_blogs (⭐ 232): A repository to keep track of all the code that I end up writing for my blog posts.
Gimel (⭐ 230): Big Data Processing Framework - Unified Data API or SQL on Any Storage
Pyspark Cheatsheet (⭐ 230): 🐍 Quick reference guide to common patterns & functions in PySpark.
Joblib Spark (⭐ 226): Joblib Apache Spark Backend
Zeppelin Notebooks (⭐ 206): Gallery of Apache Zeppelin notebooks
Dbldatagen (⭐ 203): Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for tests, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines
Azure Cosmosdb Spark (⭐ 194): Apache Spark Connector for Azure Cosmos DB
Learningapachespark (⭐ 192): LearningApacheSpark
Hnswlib (⭐ 178): Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Cloud Dataproc (⭐ 173): Cloud Dataproc: Samples and Utils
Drunken Data Quality (⭐ 167): Spark package for checking data quality
Spark Practice (⭐ 153): Apache Spark (PySpark) Practice on Real Data
Data Algorithms With Spark (⭐ 151): O'Reilly book: Data Algorithms with Spark, by Mahmoud Parsian
Geopyspark (⭐ 151): GeoTrellis for PySpark
Spark Iforest (⭐ 147): Isolation Forest on Spark
Pyspark Cheatsheet (⭐ 140): PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Big Data Mapreduce Course (⭐ 131): Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Handyspark (⭐ 129): HandySpark - bringing pandas-like capabilities to Spark DataFrames
Aut (⭐ 128): The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Spark Extension (⭐ 127): A library that provides useful extensions to Apache Spark and PySpark.
Aliyun Emapreduce Demo (⭐ 123)
Mastering Big Data Analytics With Pyspark (⭐ 118): Mastering Big Data Analytics with PySpark, published by Packt
Pyspark Stubs (⭐ 116): Apache (Py)Spark type annotations (stub files).
Spark Df Profiling (⭐ 115): Create HTML profiling reports from Apache Spark DataFrames
Spark Knn Recommender (⭐ 113): Item- and user-based KNN recommendation algorithms using PySpark
Spark R Notebooks (⭐ 109): R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Spark With Python (⭐ 98): Fundamentals of Spark with Python (using PySpark), code examples
Medium Articles (⭐ 97): Repo for all my code on the articles I post on Medium
Movalytics Data Warehouse (⭐ 93): Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Relation_extraction (⭐ 93): Relation Extraction using Deep Learning (CNN)
Big Data Engineering Coursera Yandex (⭐ 91): Big Data for Data Engineers Coursera Specialization from Yandex
Pyspark Predictive Maintenance (⭐ 85): Predictive Maintenance using PySpark
Phrase At Scale (⭐ 84): Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary length. Can be used in languages other than English.
Spark_python_ml_examples (⭐ 81): Spark 2.0 Python Machine Learning examples
Pyspark Cassandra (⭐ 81): PySpark Cassandra brings back the fun in working with Cassandra data in PySpark.
Azure Databricks Nyc Taxi Workshop (⭐ 80): An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset
Pyspark Cookbook (⭐ 76): PySpark Cookbook, published by Packt
Python Spark Streaming (⭐ 73)
Learn By Examples (⭐ 72): Real-world Spark pipeline examples
Jupyterlab Sparkmonitor (⭐ 72): JupyterLab extension that enables monitoring of Apache Spark jobs launched from within a notebook
Pyspark Cassandra (⭐ 67): pyspark-cassandra is a Python port of the awesome DataStax Spark Cassandra connector. Compatible with Spark 2.0, 2.1, 2.2, 2.3 and 2.4
Jgit Spark Connector (⭐ 67): jgit-spark-connector is a library for running scalable data retrieval pipelines that process any number of Git repositories for source code analysis.
Delta Architecture (⭐ 66): Streaming data changes to a data lake with a Debezium and Delta Lake pipeline
Pyspark_dist_explore (⭐ 64): Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights into your Spark DataFrames.
Pypmml (⭐ 64): Python PMML scoring library
Pyspark Twitter Stream Mining (⭐ 63): Real-time Machine Learning with Apache Spark on Twitter Public Stream
W2v (⭐ 62): Word2Vec models with Twitter data using Spark. Blog:
Sparkml (⭐ 61): Spark ML with PySpark
Spark (⭐ 60): Apache Spark (Scala, PySpark, SparkR) Code, Tricks, and References
Pysparkgeoanalysis (⭐ 60): 🌐 Interactive Workshop on GeoAnalysis using PySpark
Apachespark (⭐ 59): This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Sparkly (⭐ 56): Helpers & syntactic sugar for PySpark.
Pyspark Setup Guide (⭐ 54): A guide for setting up Spark + PySpark under Ubuntu Linux
Data_processing_course (⭐ 53): Some class materials for a data processing course using PySpark
Big_data (⭐ 53): A collection of tutorials on Hadoop, MapReduce, Spark, and Docker
Spark Select (⭐ 53): A library for Spark DataFrames using the MinIO Select API
Mlflow Spark Summit 2019 (⭐ 52): MLflow Spark Summit 2019 Presentation
Pyspark Elastic (⭐ 52): PySpark for Elasticsearch
Spark Training (⭐ 52): Repository used for Spark training sessions
Soda Spark (⭐ 49): Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Spark Hive Udf (⭐ 47): Example project showing how to use Hive UDFs in Apache Spark
Sparkudfexamples (⭐ 46): Spark SQL UDF examples
Datapipelines Essentials Python (⭐ 45): Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Smv (⭐ 41): Spark Modularized View
Spark Nba Analytics (⭐ 41): Analyzing NBA data using Spark 2.1
Spark Dgraph Connector (⭐ 40): A connector for Apache Spark and PySpark to Dgraph databases.
Related Searches
Scala Spark (3,279)
Python Spark (2,044)
Java Spark (1,596)
Jupyter Spark (1,284)
Spark Hadoop (1,199)
Apache Spark (1,178)
Jupyter Notebook Spark (1,151)
Spark Kafka (985)
Spark Streaming (817)
Python Pyspark (782)
1-100 of 533 search results
Copyright 2018-2023 Awesome Open Source. All rights reserved.