Awesome Open Source
Search results for spark pyspark
314 search results found
Openspark
⭐
39
An out-of-the-box environment for Hadoop/Spark applications
Pytest Spark
⭐
38
pytest plugin to run tests with PySpark support
Spark Ml Intro
⭐
37
PySpark Machine Learning Examples
Learn Data Munging
⭐
37
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
Azure Databricks
⭐
37
Azure Databricks - Advent of 2020 Blogposts
Spark Social Science
⭐
36
Automated Spark Cluster Builds with RStudio or PySpark for Policy Research
Pyjaws
⭐
36
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
Getting_started_with_pyspark
⭐
34
Materials for class Getting Started with Pyspark
Spark Twitter Sentiment Analysis
⭐
33
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Dlsa
⭐
33
Distributed least squares approximation (dlsa) implemented with Apache Spark
Pyspark Algorithms
⭐
33
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Shparkley
⭐
33
Spark implementation of computing Shapley values using Monte Carlo approximation
Spark Studyclub
⭐
31
Apache Spark study group organized by the Data Engineering Latam community
Aliyun Cupid Sdk
⭐
30
SDK for open-source frameworks to interact with MaxCompute
Basin
⭐
29
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Mongo Spark Jupyter
⭐
29
Docker environment that spins up MongoDB replica set, Spark, and Jupyter Lab. Example code uses PySpark and the MongoDB Spark Connector.
Spark Tree Plotting
⭐
29
A simple tool for plotting Spark ML's Decision Trees
Pysparkcookbook
⭐
29
A repository for a PySpark Cookbook by Tomasz Drabas and Denny Lee
Deepgold
⭐
29
DeepGold: using convolutional network features to learn from mineral data
Sparkdataset
⭐
28
Instant search for and access to many datasets in Pyspark.
Docker Pyspark
⭐
28
Docker image of Apache Spark with its Python interface, pyspark.
Sparkdltrigger
⭐
28
Repo for the article "Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics"
Kafka Compose
⭐
28
🎼 Docker Compose files for various Kafka stacks
Isarn Sketches Spark
⭐
27
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Hands On Big Data Analytics With Pyspark
⭐
27
Hands-On Big Data Analytics with PySpark, Published by Packt
Pyspark Asyncactions
⭐
26
Asynchronous actions for PySpark
Odsc_india_2018
⭐
26
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Spark Fundamentals
⭐
24
Elevate big data skills with Apache Spark's core concepts and examples
Dummyrdd
⭐
24
A pure python mock of pyspark's RDD
Sparglim
⭐
22
Sparglim ✨ makes PySpark apps configurable and makes deploying a Spark Connect server easier!
Jobanalytics_and_search
⭐
22
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Spark3d
⭐
22
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Spark For Data Engineers
⭐
22
Apache Spark for data engineers
Gsw_passing_network
⭐
21
Spark And Kafka_iot Data Processing And Analytics
⭐
21
Final project for the IoT: Big Data Processing and Analytics class at UCSC Extension. Analyzes U.S. nationwide temperature data from IoT sensors in real time.
Terraglue
⭐
21
Providing an easy way to deploy a Glue job in any AWS account using Terraform
Spark Movies Etl
⭐
21
Spark data pipeline that ingests and transforms movie ratings data.
Pyspark
⭐
21
Pyspark Setup Demo
⭐
21
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Data Engineering Zoomcamp
⭐
20
Data Engineering examples covering Airflow and Mage for workflows; dbt for BigQuery, Redshift, ClickHouse; Spark and Kafka for Batch/Streaming Processing
Pyspark Distributed Kmodes
⭐
20
Spark Tdd Example
⭐
20
A simple Spark TDD example
Lasagna
⭐
20
A Docker Compose template that builds an interactive development environment for PySpark with Jupyter Lab, MinIO as object storage, Hive Metastore, Trino and Kafka
Spark
⭐
20
Example code for the book 『빅데이터 분석을 위한 스파크 2 프로그래밍』 (Spark 2 Programming for Big Data Analysis)
Docker Spark Mesos
⭐
20
Docker image with Spark and Mesos installed. Used for driving Spark on a Mesos cluster with Docker.
Sparkclean
⭐
20
A Scalable Data Cleaning Library for PySpark.
Pmml4s Spark
⭐
19
PMML scoring library for Spark as SparkML Transformer
Spark Sframe
⭐
19
This project contains the code to translate between Apache Spark and SFrame.
Oshinko S2i
⭐
19
A place for s2i images and utilities for Spark application builders on OpenShift
Covid 19 Data Engineering Pipeline
⭐
19
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.
Ds30_5
⭐
18
Data Science in 30 Minutes #5: Spark
Spark Hdfs On Kubernetes
⭐
18
Pyspark Arrow Pandas
⭐
18
Presentation about Pyspark and how Arrow makes it faster
Spark And Mllib Projects
⭐
18
This repository contains Spark, MLlib, PySpark and Dataframes projects
Pyspark Testing
⭐
18
Unit and integration testing with PySpark can be tough to figure out, let's make that easier.
Setup Spark
⭐
17
✨ Setup Apache Spark in GitHub Action workflows
Bigdata Workshop Es
⭐
17
Big Data workshop in Spanish
Admml
⭐
17
ADMM based Scalable Machine Learning on Spark
Pyspark_dl_pipeline
⭐
17
Spark Sparql Connector
⭐
17
spark-sparql-connector
Phil_stopwatch
⭐
17
Pbspark
⭐
17
protobuf pyspark conversion
Kafka Twitter Spark Streaming
⭐
16
Counting Tweets Per User in Real-Time
Pyspark For Data Processing
⭐
16
Code for my presentation: Using PySpark to Process Boat Loads of Data
Django Libspark
⭐
16
Apache Spark API for Django
Nlp_model_selection_app
⭐
16
Sparklanes
⭐
16
A lightweight data processing framework for Apache Spark
Rheoceros
⭐
15
Cloud-based AI / ML workflow and data application development framework
Pyspark
⭐
15
spark (scala and python)
Link Prediction
⭐
15
[UNMAINTAINED] Link prediction for complex networks based on PySpark and MySQL.
Spark Fits
⭐
15
FITS data source for Spark SQL and DataFrames
Listenbrainz Labs
⭐
15
A collection of tools/scripts to explore the ListenBrainz data using Apache Spark.
Pyspark Emr
⭐
15
A toolset to streamline running Spark Python jobs on EMR
Pyspark Lsh
⭐
15
Locality-sensitive hashing in PySpark.
Big Data Course
⭐
14
Practice course on Big Data
Pyspark K8s Example
⭐
14
Pipeasy Spark
⭐
14
An easy way to define data preprocessing pipelines (similar to sklearn-pandas but for Spark ML)
Live_log_analyzer_spark
⭐
14
Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.
Rasppi Cluster
⭐
14
An efficient quick-start tool to build a Raspberry Pi (or Debian-based) Cluster with popular ecosystem like Hadoop, Spark
Hdinsight Pyspark Cntk Integration
⭐
14
Instructions and examples for installing CNTK on an HDInsight cluster and running CNTK-Pyspark applications from Jupyter notebooks.
Pypmml Spark
⭐
13
Python PMML scoring library for PySpark as SparkML Transformer
Machine Learning Course
⭐
13
Machine Learning Course @ Santa Clara University
Nyc_taxi_trip_duration
⭐
13
ML models to predict taxi trip duration in NYC. Ranked: Top 6% | RMSLE: 0.377 (Kaggle) | #DS
Pyconca 2016 Spark Tutorial
⭐
13
Materials for Mike's PyCon Canada 2016 PySpark Tutorial
Pyspark Ml Examples
⭐
13
Spark ML Tutorial and Examples for Beginners
Pyspark S3 Parquet Example
⭐
13
This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job is then submitted to an Apache Spark instance running on AWS EMR, which uses a SQLContext to create a temporary table from a DataFrame. SQL queries can then be run against the temporary table.
Workshop Spark
⭐
13
Code for Spark workshops with a Docker-based development environment
Docker Datascience Ultimate
⭐
13
Customized Jupyter Spark Docker images with everything you need
Pyspark On Aws Emr
⭐
13
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.
Moive Question Robot Based On Spark Neo4j
⭐
13
Sparkling Titanic
⭐
12
Training models with Apache Spark, PySpark for Titanic Kaggle competition
Sentry Spark
⭐
12
Apache Spark Sentry Integration
Tidypyspark
⭐
12
Make pyspark sing dplyr
Pyspark For Beginners
⭐
12
PySpark for Beginners by Packt Publishing
Distributed Machine Learning
⭐
12
PySpark, Databricks, h2o, MLlib
Nyc_taxi_pipeline
⭐
12
Design/Implement stream/batch architecture on NYC taxi data | #DE
Bigdata Spark
⭐
12
BerkeleyX: CS100.1x, Introduction to Big Data with Apache Spark
Marshmallow Pyspark
⭐
12
Marshmallow serializer integration with pyspark
Spark Slurm
⭐
12
Tool for running/managing ad hoc spark clusters on a Slurm cluster
Relk
⭐
12
RELK -- The Research Elastic Stack (Kafka, Beats, Zookeeper, Logstash, ElasticSearch, Kibana, Spark, & Jupyter -- All in Docker)
Related Searches
Scala Spark (3,279)
Python Spark (2,053)
Java Spark (1,587)
Apache Spark (1,207)
Spark Hadoop (1,188)
Jupyter Notebook Spark (1,151)
Spark Kafka (985)
Spark Streaming (817)
Python Pyspark (792)
Shell Spark (705)
101-200 of 314 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.