Awesome Open Source
Search results for pyspark
496 search results found
Cheatsheets Ai
⭐
13,281
Essential Cheat Sheets for deep learning and machine learning researchers https://medium.com/@kailashahirwar/essential-cheat
Synapseml
⭐
4,989
Simple and Distributed Machine Learning
Spark Nlp
⭐
3,578
State of the Art Natural Language Processing
Ibis
⭐
3,404
The flexibility of Python with the scale and performance of modern SQL.
Linkis
⭐
3,250
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
Machine Learning
⭐
2,607
🌎 machine learning tutorials (mainly in Python3)
Petastorm
⭐
1,693
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Spark Py Notebooks
⭐
1,515
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Mleap
⭐
1,479
MLeap: Deploy ML Pipelines to Production
Awesome Spark
⭐
1,461
A curated list of awesome Apache Spark packages and resources.
Optimus
⭐
1,447
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Sparkmagic
⭐
1,272
Jupyter magics and kernels for working with remote Spark clusters
Bigflow
⭐
1,122
Baidu Bigflow is an interface that allows for writing distributed computing programs and provides lots of simple, flexible, powerful APIs. Using Bigflow, you can easily handle data of any scale. Bigflow processes 4P+ data inside Baidu and runs about 10k jobs every day.
Sparkit Learn
⭐
1,054
PySpark + Scikit-learn = Sparkit-learn
Hopsworks
⭐
1,041
Hopsworks - Data-Intensive AI platform with a Feature Store
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Pyspark Tutorial
⭐
959
PySpark-Tutorial provides basic algorithms using PySpark
Sparkling Water
⭐
957
Sparkling Water provides H2O functionality inside Spark cluster
Pyspark Examples
⭐
778
PySpark RDD, DataFrame and Dataset examples in Python
Scriptis
⭐
767
Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Devops Python Tools
⭐
709
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Kuwala
⭐
610
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data into data science models and products with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demograp
Quinn
⭐
572
pyspark methods to enhance developer productivity 📣 👯 🎉
Eat_pyspark_in_10_days
⭐
534
pyspark🍒🥭 is delicious, just eat it!😋😋
Pandapy
⭐
483
PandaPy has the speed of NumPy and the usability of Pandas, and is 10x to 50x faster (by @firmai)
Sparklearning
⭐
451
A comprehensive Spark guide collated from multiple sources that can be used to learn more about Spark or as an interview refresher.
Chispa
⭐
443
PySpark test helper methods with beautiful error messages
Findspark
⭐
428
Learningpyspark
⭐
409
Code base for the Learning PySpark book (in preparation)
Spark Syntax
⭐
391
This is a repo documenting the best practices in PySpark.
Miscellaneous
⭐
382
Includes notes on Apache Spark, Spark for Physics, Jupyter notebook examples for Spark, Oracle and other DB systems.
Gather Deployment
⭐
351
Gathers Python deployment, infrastructure and practices.
Datacompy
⭐
339
Pandas and Spark DataFrame comparison for humans and more!
Sparklingpandas
⭐
338
Sparkling Pandas
Tdigest
⭐
332
t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark
Spark Standalone Cluster On Docker
⭐
311
Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ⚡
Learning Pyspark
⭐
294
Code repository for Learning PySpark by Packt
Sagemaker Spark
⭐
285
A Spark library for Amazon SageMaker.
Sk Dist
⭐
283
Distributed scikit-learn meta-estimators in PySpark
Cc Pyspark
⭐
280
Process Common Crawl data with Python and Spark
Spark Gotchas
⭐
276
Spark Gotchas. A subjective compilation of Apache Spark tips and tricks
Butterfree
⭐
269
A tool for building feature stores.
Pyspark Style Guide
⭐
264
This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring topics across the PySpark repos we've encountered.
Spark Jupyter Aws
⭐
255
A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Pysparkling
⭐
253
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Dbldatagen
⭐
234
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines
Pyspark Tutorials
⭐
233
Code snippets and tutorials for working with social science data in PySpark
Learningapachespark
⭐
233
LearningApacheSpark
Morphl Community Edition
⭐
233
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
Hnswlib
⭐
233
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Data_science_blogs
⭐
232
A repository to keep track of all the code that I end up writing for my blog posts.
Pyspark Cheatsheet
⭐
230
🐍 Quick reference guide to common patterns & functions in PySpark.
Gimel
⭐
230
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Joblib Spark
⭐
226
Joblib Apache Spark Backend
Zeppelin Notebooks
⭐
206
Gallery of Apache Zeppelin notebooks
Sql Data Analysis And Visualization Projects
⭐
200
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.
Azure Cosmosdb Spark
⭐
194
Apache Spark Connector for Azure Cosmos DB
Mack
⭐
188
Delta Lake helper methods in PySpark
Cloud Dataproc
⭐
173
Cloud Dataproc: Samples and Utils
Hunter
⭐
170
A threat hunting / data analysis environment based on Python, Pandas, PySpark and Jupyter Notebook.
Drunken Data Quality
⭐
167
Spark package for checking data quality
Spark Practice
⭐
153
Apache Spark (PySpark) Practice on Real Data
Spark Extension
⭐
152
A library that provides useful extensions to Apache Spark and PySpark.
Geopyspark
⭐
151
GeoTrellis for PySpark
Data Algorithms With Spark
⭐
151
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Pyspark Pictures
⭐
149
Learn the pyspark API through pictures and simple examples
Spark Iforest
⭐
147
Isolation Forest on Spark
Nyc Transport
⭐
144
A Unified Database of NYC transport (subway, taxi/Uber, and citibike) data.
Osci
⭐
140
Open Source Contributor Index
Pyspark Cheatsheet
⭐
140
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Repo 2019
⭐
135
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
Big Data Mapreduce Course
⭐
135
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Handyspark
⭐
129
HandySpark - bringing pandas-like capabilities to Spark dataframes
Aut
⭐
128
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Aliyun Emapreduce Demo
⭐
123
Mastering Big Data Analytics With Pyspark
⭐
118
Mastering Big Data Analytics with PySpark, Published by Packt
Movalytics Data Warehouse
⭐
117
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Pyspark Stubs
⭐
116
Apache (Py)Spark type annotations (stub files).
Ai Deployment
⭐
116
Focuses on putting AI models into production and on model deployment
Spark Df Profiling
⭐
115
Create HTML profiling reports from Apache Spark DataFrames
Spark Knn Recommender
⭐
113
Item and User-based KNN recommendation algorithms using PySpark
Spark R Notebooks
⭐
109
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Replay
⭐
109
A Comprehensive Framework for Building End-to-End Recommendation Systems with State-of-the-Art Models
Machinelearning
⭐
106
Machine learning for beginners (data science enthusiasts)
Dataproc Templates
⭐
103
Dataproc templates and pipelines for solving simple in-cloud data tasks
Dataanalysiswithpythonandpyspark
⭐
102
Code repository for the "PySpark in Action" book
Dampr
⭐
101
Python Data Processing library
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Medium Articles
⭐
97
Repo for all my code on the articles I post on medium
Relation_extraction
⭐
93
Relation extraction using deep learning (CNN)
Big Data Engineering Coursera Yandex
⭐
91
Big Data for Data Engineers Coursera Specialization from Yandex
Bitcoin Value Predictor
⭐
90
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Pyspark Csv
⭐
87
An external PySpark module that works like R's read.csv or pandas' read_csv, with automatic type inference and null value handling. Parses CSV data into SchemaRDD. No installation required; simply include pyspark_csv.py via SparkContext.
Kdd Cup 99 Spark
⭐
87
PySpark solution to the KDDCup99
Pyspark Predictive Maintenance
⭐
85
Predictive maintenance using PySpark
Phrase At Scale
⭐
84
Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English
Pyspark Tutorial
⭐
82
Jupyter notebooks for pyspark tutorials given at the university
Spark_python_ml_examples
⭐
81
Spark 2.0 Python Machine Learning examples
Pyspark Cassandra
⭐
81
PySpark Cassandra brings back the fun in working with Cassandra data in PySpark.
Azure Databricks Nyc Taxi Workshop
⭐
80
An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset
Related Searches
Spark Pyspark (773)
Python Pyspark (689)
1-100 of 496 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.