Awesome Open Source
Search results for spark pyspark
248 search results found
Synapseml
⭐
5,040
Simple and Distributed Machine Learning
Spark Nlp
⭐
3,578
State of the Art Natural Language Processing
Linkis
⭐
3,283
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
Mleap
⭐
1,479
MLeap: Deploy ML Pipelines to Production
Optimus
⭐
1,472
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Awesome Spark
⭐
1,461
A curated list of awesome Apache Spark packages and resources.
Sparkmagic
⭐
1,272
Jupyter magics and kernels for working with remote Spark clusters
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Pyspark Tutorial
⭐
959
PySpark-Tutorial provides basic algorithms using PySpark
Sparkling Water
⭐
957
Sparkling Water provides H2O functionality inside Spark cluster
Scriptis
⭐
767
Scriptis is for interactive data analysis with script development (SQL, Pyspark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Devops Python Tools
⭐
709
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Eat_pyspark_in_10_days
⭐
534
pyspark🍒🥭 is delicious, just eat it!😋😋
Sparklearning
⭐
451
A comprehensive Spark guide collated from multiple sources that can be referred to learn more about Spark or as an interview refresher.
Spark Syntax
⭐
391
This is a repo documenting the best practices in PySpark.
Miscellaneous
⭐
382
Includes notes on Apache Spark, Spark for Physics, Jupyter notebook examples for Spark, Oracle and other DB systems.
Datacompy
⭐
339
Pandas and Spark DataFrame comparison for humans and more!
Sparklingpandas
⭐
338
Sparkling Pandas
Spark Standalone Cluster On Docker
⭐
311
Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ⚡
Sagemaker Spark
⭐
285
A Spark library for Amazon SageMaker.
Sk Dist
⭐
283
Distributed scikit-learn meta-estimators in PySpark
Cc Pyspark
⭐
280
Process Common Crawl data with Python and Spark
Spark Gotchas
⭐
276
Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks
Butterfree
⭐
269
A tool for building feature stores.
Pyspark Style Guide
⭐
264
This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring topics across the PySpark repos we've encountered.
Spark Jupyter Aws
⭐
255
A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
Pysparkling
⭐
253
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Dbldatagen
⭐
234
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines
Learningapachespark
⭐
233
LearningApacheSpark
Hnswlib
⭐
233
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Data_science_blogs
⭐
232
A repository to keep track of all the code that I end up writing for my blog posts.
Gimel
⭐
230
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Joblib Spark
⭐
226
Joblib Apache Spark Backend
Zeppelin Notebooks
⭐
206
Gallery of Apache Zeppelin notebooks
Azure Cosmosdb Spark
⭐
194
Apache Spark Connector for Azure Cosmos DB
Cloud Dataproc
⭐
173
Cloud Dataproc: Samples and Utils
Drunken Data Quality
⭐
167
Spark package for checking data quality
Spark Practice
⭐
153
Apache Spark (PySpark) Practice on Real Data
Spark Extension
⭐
152
A library that provides useful extensions to Apache Spark and PySpark.
Data Algorithms With Spark
⭐
151
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Geopyspark
⭐
151
GeoTrellis for PySpark
Spark Iforest
⭐
147
Isolation Forest on Spark
Pyspark Cheatsheet
⭐
140
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Big Data Mapreduce Course
⭐
135
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Handyspark
⭐
129
HandySpark - bringing pandas-like capabilities to Spark dataframes
Movalytics Data Warehouse
⭐
127
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Aliyun Emapreduce Demo
⭐
123
Spark Df Profiling
⭐
115
Create HTML profiling reports from Apache Spark DataFrames
Spark R Notebooks
⭐
109
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Medium Articles
⭐
97
Repo for all my code on the articles I post on medium
Relation_extraction
⭐
93
Relation Extraction using Deep Learning (CNN)
Pyspark Predictive Maintenance
⭐
85
Predictive Maintenance using Pyspark
Phrase At Scale
⭐
84
Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English
Spark_python_ml_examples
⭐
81
Spark 2.0 Python Machine Learning examples
Azure Databricks Nyc Taxi Workshop
⭐
80
An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset
Jupyterlab Sparkmonitor
⭐
72
JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
Learn By Examples
⭐
72
Real-world Spark pipelines examples
Pyspark Cassandra
⭐
67
pyspark-cassandra is a Python port of the awesome @datastax Spark Cassandra connector. Compatible w/ Spark 2.0, 2.1, 2.2, 2.3 and 2.4
Jgit Spark Connector
⭐
67
jgit-spark-connector is a library for running scalable data retrieval pipelines that process any number of Git repositories for source code analysis.
Delta Architecture
⭐
66
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Pyspark_dist_explore
⭐
64
Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights in your Spark DataFrames.
Pypmml
⭐
64
Python PMML scoring library
Pyspark Twitter Stream Mining
⭐
63
Real-time Machine Learning with Apache Spark on Twitter Public Stream
W2v
⭐
62
Word2Vec models with Twitter data using Spark. Blog:
Sparkml
⭐
61
Spark ML with pyspark
Sparkly
⭐
60
Helpers & syntactic sugar for PySpark.
Pysparkgeoanalysis
⭐
60
🌐 Interactive Workshop on GeoAnalysis using PySpark
Spark
⭐
60
Apache Spark (Scala, PySpark, SparkR) Code, Tricks, and References
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics needed in real-life work as a data engineer, using PySpark and Spark SQL for development. The course ends with a few case studies.
Big_data
⭐
55
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Pyspark Setup Guide
⭐
54
A guide for setting up Spark + PySpark under Ubuntu Linux
Spark Training
⭐
52
Repository used for Spark Trainings
Pyspark Elastic
⭐
52
PySpark for Elasticsearch
Soda Spark
⭐
49
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Spark Hive Udf
⭐
47
Example project showing how to use Hive UDFs in Apache Spark
Sparkudfexamples
⭐
46
Spark SQL UDF examples
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Spark Dgraph Connector
⭐
41
A connector for Apache Spark and PySpark to Dgraph databases.
Smv
⭐
41
Spark Modularized View
Openspark
⭐
39
The out-of-the-box environment for Hadoop/Spark applications
Spark Ml Intro
⭐
37
PySpark Machine Learning Examples
Azure Databricks
⭐
37
Azure Databricks - Advent of 2020 Blogposts
Shparkley
⭐
33
Spark implementation of computing Shapley values using Monte Carlo approximation
Spark Twitter Sentiment Analysis
⭐
33
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Pyspark Algorithms
⭐
33
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Dlsa
⭐
33
Distributed least squares approximation (dlsa) implemented with Apache Spark
Spark Studyclub
⭐
31
Grupo de Estudios de Apache Spark organizado por la comunidad Data Engineering Latam
Aliyun Cupid Sdk
⭐
30
SDK for open source framework to interact with MaxCompute
Deepgold
⭐
29
DeepGold uses convolutional network features to learn from mineral data
Mongo Spark Jupyter
⭐
29
Docker environment that spins up MongoDB replica set, Spark, and Jupyter Lab. Example code uses PySpark and the MongoDB Spark Connector.
Basin
⭐
29
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Sparkdataset
⭐
28
Instant search for and access to many datasets in Pyspark.
Kafka Compose
⭐
28
🎼 Docker compose files for various kafka stacks
Docker Pyspark
⭐
28
Docker image of Apache Spark with its Python interface, pyspark.
Isarn Sketches Spark
⭐
27
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Odsc_india_2018
⭐
26
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Spark Fundamentals
⭐
24
Elevate big data skills with Apache Spark's core concepts and examples
Sparglim
⭐
22
Sparglim✨ makes PySpark apps configurable and Spark Connect Server deployment easier!
Spark For Data Engineers
⭐
22
Apache Spark for data engineers
1-100 of 248 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.