Awesome Open Source
Search results for pyspark
496 search results found
Anovos
⭐
78
Anovos - An Open Source Library for Scalable Feature Engineering Using Apache Spark
Pyspark Cookbook
⭐
76
PySpark Cookbook, published by Packt
Python Spark Streaming
⭐
73
Jupyterlab Sparkmonitor
⭐
72
JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
Learn By Examples
⭐
72
Real-world Spark pipeline examples
Jgit Spark Connector
⭐
67
jgit-spark-connector is a library for running scalable data retrieval pipelines that process any number of Git repositories for source code analysis.
Pyspark Cassandra
⭐
67
pyspark-cassandra is a Python port of the awesome @datastax Spark Cassandra connector. Compatible w/ Spark 2.0, 2.1, 2.2, 2.3 and 2.4
Delta Architecture
⭐
66
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Pypmml
⭐
64
Python PMML scoring library
Mmtf Pyspark
⭐
64
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Pyspark_dist_explore
⭐
64
Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights into your Spark DataFrames.
Pyspark Twitter Stream Mining
⭐
63
Real-time Machine Learning with Apache Spark on Twitter Public Stream
Nsl Kdd
⭐
63
PySpark solution to the NSL-KDD dataset: https://www.unb.ca/cic/datasets/nsl.html
W2v
⭐
62
Word2Vec models with Twitter data using Spark. Blog:
Sparkml
⭐
61
Spark ML with pyspark
Sparkly
⭐
60
Helpers & syntactic sugar for PySpark.
Spark
⭐
60
Apache Spark (Scala, PySpark, SparkR) Code, Tricks, and References
Pysparkgeoanalysis
⭐
60
🌐 Interactive Workshop on GeoAnalysis using PySpark
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Cuallee
⭐
56
A data quality acceleration library to get data sets verified in a friendly interface
Big_data
⭐
55
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Pyspark Setup Guide
⭐
54
A guide for setting up Spark + PySpark under Ubuntu Linux
Replay
⭐
53
RecSys Library
Spark Select
⭐
53
A library for Spark DataFrame using MinIO Select API
Mmtf Workshop 2018
⭐
53
Structural Bioinformatics Training Workshop & Hackathon 2018
Data_processing_course
⭐
53
Some class materials for a data processing course using PySpark
Pyspark Elastic
⭐
52
PySpark for Elasticsearch
Towardsdataengineering
⭐
52
This repo contains commands that data engineers use in day-to-day work.
Mlflow Spark Summit 2019
⭐
52
MLflow Spark Summit 2019 Presentation
Spark Training
⭐
52
Repository used for Spark Trainings
Soda Spark
⭐
49
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Apollo
⭐
48
Advanced similarity and duplicate source code proof of concept for our research efforts.
Stork
⭐
47
Make your libraries magically appear in Databricks.
Spark Hive Udf
⭐
47
Example project showing how to use Hive UDFs in Apache Spark
Terraform Emr Pyspark
⭐
46
Quickstart PySpark with Anaconda on AWS/EMR using Terraform
Dms Smm695
⭐
46
Teaching material for a B-school, post-grad module on Data Management Systems
Sparkora
⭐
46
Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟
Sparkudfexamples
⭐
46
Spark SQL UDF examples
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Cluster Pack
⭐
44
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Song Playlist Recommendation
⭐
43
This project was a joint effort by Lucas De Oliveira, Chandrish Ambati, and Anish Mukherjee to create song and playlist embeddings for recommendations in a distributed fashion using a 1M playlist dataset by Spotify.
Emr Bootstrap Pyspark
⭐
43
Quickstart PySpark with Anaconda on AWS/EMR
Smv
⭐
41
Spark Modularized View
Spark Dgraph Connector
⭐
41
A connector for Apache Spark and PySpark to Dgraph databases.
Spark Nba Analytics
⭐
41
Analyzing NBA data using Spark 2.1
Dsq
⭐
39
Distributed Streaming Quantiles (for PySpark)
Openspark
⭐
39
The out-of-the-box environment for Hadoop/Spark applications
Pydata_berlin2016_materials
⭐
39
Collection of pointers to slides and repositories from speakers at PyData Berlin 2016
Pytest Spark
⭐
38
pytest plugin for running tests with PySpark support
Spark Ml Intro
⭐
37
PySpark Machine Learning Examples
Azure Databricks
⭐
37
Azure Databricks - Advent of 2020 Blogposts
Machine Learning With Pyspark
⭐
37
Source Code for 'Machine Learning with PySpark' by Pramod Singh
Learn Data Munging
⭐
37
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
Spark Social Science
⭐
36
Automated Spark Cluster Builds with RStudio or PySpark for Policy Research
Spark_app_twitter
⭐
36
A data engineering project (Twitter monitor app)
Pyjaws
⭐
36
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
Pyspark Cassandra
⭐
35
Utilities and examples to assist in working with PySpark and Cassandra.
Getting_started_with_pyspark
⭐
34
Materials for class Getting Started with Pyspark
Spark Twitter Sentiment Analysis
⭐
33
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Pyspark Algorithms
⭐
33
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Data Analytics Services
⭐
33
This repo collects the open-source work of the Analytics Service within NHS Digital Data Services
Luigi Sample
⭐
33
Sample repo for luigi tasks & config
Shparkley
⭐
33
Spark implementation of computing Shapley Values using monte-carlo approximation
Dlsa
⭐
33
Distributed least squares approximation (dlsa) implemented with Apache Spark
Spark Studyclub
⭐
31
Grupo de Estudios de Apache Spark organizado por la comunidad Data Engineering Latam
Gmm
⭐
31
Gaussian Mixture Model Implementation in Pyspark
Aliyun Cupid Sdk
⭐
30
SDK for open source frameworks to interact with MaxCompute
Ceja
⭐
30
PySpark phonetic and string matching algorithms
Check Engine
⭐
30
Data validation library for PySpark 3.0.0
Mongo Spark Jupyter
⭐
29
Docker environment that spins up MongoDB replica set, Spark, and Jupyter Lab. Example code uses PySpark and the MongoDB Spark Connector.
Pysparkcookbook
⭐
29
A repository for a PySpark Cookbook by Tomasz Drabas and Denny Lee
Deepgold
⭐
29
DeepGold using convolution network features to learn mineral data
Spark Tree Plotting
⭐
29
A simple tool for plotting Spark ML's Decision Trees
Basin
⭐
29
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Kafka Compose
⭐
28
🎼 Docker compose files for various kafka stacks
Sparkdltrigger
⭐
28
Repo for the article "Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics"
Docker Pyspark
⭐
28
Docker image of Apache Spark with its Python interface, pyspark.
Sparkdataset
⭐
28
Instant search for and access to many datasets in Pyspark.
Daas Client
⭐
27
Python client library for DaaS (Deployment-as-a-Service)
Decorators4ds
⭐
27
Useful decorators every Data Scientist should know
Hands On Big Data Analytics With Pyspark
⭐
27
Hands-On Big Data Analytics with PySpark, Published by Packt
Isarn Sketches Spark
⭐
27
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Odsc_india_2018
⭐
26
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Pyspark Asyncactions
⭐
26
Asynchronous actions for PySpark
Farsante
⭐
26
Fake Pandas / PySpark DataFrame creator
Python_mozetl
⭐
26
ETL jobs for Firefox Telemetry
Pybase
⭐
26
Codebase for Python
Courses
⭐
25
Just the stuff from the faculty (homework, projects, lectures)
Mmtf Genomics
⭐
25
Methods for mapping genomic data onto 3D protein structure.
Amazon Emr Vscode Toolkit
⭐
25
A VS Code Extension to make it easier to manage and develop Spark jobs on EMR
Spark Fundamentals
⭐
24
Elevate big data skills with Apache Spark's core concepts and examples
Dummyrdd
⭐
24
A pure python mock of pyspark's RDD
Springboard Data Science Immersive
⭐
23
Detecting Malicious Url Machine Learning
⭐
23
Pyspark K8s Boilerplate
⭐
23
Boilerplate for PySpark on Cloud Kubernetes
Spark For Data Engineers
⭐
22
Apache Spark for data engineers
Spark3d
⭐
22
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Sparglim
⭐
22
Sparglim✨ makes PySpark apps configurable and Spark Connect Server deployment easier!
Jobanalytics_and_search
⭐
22
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Data Science Learning Paths
⭐
22
Practical data science courses - from basic to intermediate
Related Searches
Spark Pyspark (773)
Python Pyspark (689)
101-200 of 496 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.