Awesome Open Source
Search results for python pyspark
194 search results found
Ibis
⭐
4,794
the portable Python dataframe library
Machine Learning
⭐
2,607
🌎 machine learning tutorials (mainly in Python3)
Mleap
⭐
1,479
MLeap: Deploy ML Pipelines to Production
Sparkmagic
⭐
1,272
Jupyter magics and kernels for working with remote Spark clusters
Sparkit Learn
⭐
1,054
PySpark + Scikit-learn = Sparkit-learn
Hopsworks
⭐
1,041
Hopsworks - Data-Intensive AI platform with a Feature Store
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Pyspark Examples
⭐
778
PySpark RDD, DataFrame, and Dataset examples in Python
Devops Python Tools
⭐
709
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Kuwala
⭐
610
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We have set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demograp
Eat_pyspark_in_10_days
⭐
534
pyspark 🍒🥭 is delicious, just eat it! 😋😋
Pandapy
⭐
483
PandaPy has the speed of NumPy and the usability of Pandas, 10x to 50x faster (by @firmai)
Gather Deployment
⭐
351
Gathers Python deployment, infrastructure and practices.
Datacompy
⭐
339
Pandas and Spark DataFrame comparison for humans and more!
Sparklingpandas
⭐
338
Sparkling Pandas
Spark Standalone Cluster On Docker
⭐
311
Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ⚡
Sagemaker Spark
⭐
285
A Spark library for Amazon SageMaker.
Sk Dist
⭐
283
Distributed scikit-learn meta-estimators in PySpark
Cc Pyspark
⭐
280
Process Common Crawl data with Python and Spark
Butterfree
⭐
269
A tool for building feature stores.
Pyspark Style Guide
⭐
264
This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring topics across the PySpark repos we've encountered.
Pysparkling
⭐
253
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Dbldatagen
⭐
234
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines
Morphl Community Edition
⭐
233
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
Learningapachespark
⭐
233
LearningApacheSpark
Data_science_blogs
⭐
232
A repository to keep track of all the code that I end up writing for my blog posts.
Gimel
⭐
230
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Joblib Spark
⭐
226
Joblib Apache Spark Backend
Mack
⭐
188
Delta Lake helper methods in PySpark
Spark Extension
⭐
152
A library that provides useful extensions to Apache Spark and PySpark.
Geopyspark
⭐
151
GeoTrellis for PySpark
Data Algorithms With Spark
⭐
151
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Pyspark Cheatsheet
⭐
140
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Movalytics Data Warehouse
⭐
133
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Handyspark
⭐
129
HandySpark - bringing pandas-like capabilities to Spark dataframes
Spark Df Profiling
⭐
115
Create HTML profiling reports from Apache Spark DataFrames
Replay
⭐
109
A Comprehensive Framework for Building End-to-End Recommendation Systems with State-of-the-Art Models
Machinelearning
⭐
106
Machine learning for beginners (data science enthusiasts)
Dataanalysiswithpythonandpyspark
⭐
102
Code repository for the "PySpark in Action" book
Dampr
⭐
101
Python Data Processing library
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Relation_extraction
⭐
93
Relation extraction using deep learning (CNN)
Phrase At Scale
⭐
84
Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English
Spark_python_ml_examples
⭐
81
Spark 2.0 Python Machine Learning examples
Anovos
⭐
78
Anovos - An open source library for scalable feature engineering using Apache Spark
Pyspark Cassandra
⭐
67
pyspark-cassandra is a Python port of the awesome @datastax Spark Cassandra connector. Compatible w/ Spark 2.0, 2.1, 2.2, 2.3 and 2.4
Jgit Spark Connector
⭐
67
jgit-spark-connector is a library for running scalable data retrieval pipelines that process any number of Git repositories for source code analysis.
Pyspark_dist_explore
⭐
64
Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights in your Spark DataFrames.
Pypmml
⭐
64
Python PMML scoring library
Mmtf Pyspark
⭐
64
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Pyspark Twitter Stream Mining
⭐
63
Real-time Machine Learning with Apache Spark on Twitter Public Stream
Sparkly
⭐
60
Helpers & syntactic sugar for PySpark.
Apachespark
⭐
59
This repository will help you learn Databricks concepts through examples. It covers the important topics needed in real-life work as a data engineer. We will be using PySpark and Spark SQL for development. At the end of the course we also cover a few case studies.
Replay
⭐
53
RecSys Library
Towardsdataengineering
⭐
52
This repo contains commands that data engineers use in day-to-day work.
Pyspark Elastic
⭐
52
PySpark for Elasticsearch
Spark Training
⭐
52
Repository used for Spark Trainings
Soda Spark
⭐
49
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Cluster Pack
⭐
44
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Smv
⭐
41
Spark Modularized View
Dsq
⭐
39
Distributed Streaming Quantiles (for PySpark)
Azure Databricks
⭐
37
Azure Databricks - Advent of 2020 Blogposts
Spark_app_twitter
⭐
36
A data engineering project (Twitter monitor app)
Pyspark Algorithms
⭐
33
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Spark Twitter Sentiment Analysis
⭐
33
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Data Analytics Services
⭐
33
This repo collects the open-source work of the Analytics Service within NHS Digital Data Services
Dlsa
⭐
33
Distributed least squares approximation (dlsa) implemented with Apache Spark
Shparkley
⭐
33
Spark implementation of computing Shapley values using Monte Carlo approximation
Ceja
⭐
30
PySpark phonetic and string matching algorithms
Deepgold
⭐
29
DeepGold uses convolutional network features to learn mineral data
Mongo Spark Jupyter
⭐
29
Docker environment that spins up MongoDB replica set, Spark, and Jupyter Lab. Example code uses PySpark and the MongoDB Spark Connector.
Sparkdataset
⭐
28
Instant search for and access to many datasets in PySpark.
Kafka Compose
⭐
28
🎼 Docker compose files for various kafka stacks
Isarn Sketches Spark
⭐
27
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Pybase
⭐
26
Codebase for Python
Amazon Emr Vscode Toolkit
⭐
25
A VS Code Extension to make it easier to manage and develop Spark jobs on EMR
Springboard Data Science Immersive
⭐
23
Spark For Data Engineers
⭐
22
Apache Spark for data engineers
Sparglim
⭐
22
Sparglim ✨ makes PySpark apps configurable and deploying a Spark Connect server easier!
Data Science Learning Paths
⭐
22
Practical data science courses - from basic to intermediate
Gsw_passing_network
⭐
21
Terraglue
⭐
21
Providing an easy way to deploy a Glue job in any AWS account using Terraform
Graphlet
⭐
21
PyPi module for Graphlet AI Knowledge Graph Factory
Spark Movies Etl
⭐
21
Spark data pipeline that ingests and transforms movie ratings data.
Spark And Kafka_iot Data Processing And Analytics
⭐
21
Final project for the IoT: Big Data Processing and Analytics class at UCSC Extension. Analyzing U.S. nationwide temperature from IoT sensors in real time
Pyspark Setup Demo
⭐
21
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Gutenberg
⭐
21
A content-based recommender system for books using the Project Gutenberg text corpus
Sparkclean
⭐
20
A Scalable Data Cleaning Library for PySpark.
Spark Tdd Example
⭐
20
A simple Spark TDD example
Data Engineering Zoomcamp
⭐
20
Data Engineering examples covering Airflow and Mage for workflows; dbt for BigQuery, Redshift, ClickHouse; Spark and Kafka for Batch/Streaming Processing
Pyspark Distributed Kmodes
⭐
20
Covid 19 Data Engineering Pipeline
⭐
19
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.
Spark And Mllib Projects
⭐
18
This repository contains Spark, MLlib, PySpark and Dataframes projects
Pbspark
⭐
17
Protobuf PySpark conversion
Admml
⭐
17
ADMM based Scalable Machine Learning on Spark
Nlp_model_selection_app
⭐
16
Pyspark For Data Processing
⭐
16
Code for my presentation: Using PySpark to Process Boat Loads of Data
Kafka Twitter Spark Streaming
⭐
16
Counting Tweets Per User in Real-Time
Sparklanes
⭐
16
A lightweight data processing framework for Apache Spark
1-100 of 194 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.