Awesome Open Source
Search results for spark pyspark
314 search results found
Py Hadoop Tutorial
⭐
11
Source Material for using Python and Hadoop together
Spark Pubsub
⭐
11
Google Cloud Pubsub connector for Spark Streaming
Orange3 Spark
⭐
11
A set of widgets for Python's Orange Machine Learning to work with Apache Spark ML
Spark_bazel
⭐
11
Spark Application with Bazel
Docker Aut
⭐
11
Docker image for the Archives Unleashed Toolkit
Sparky
⭐
11
Tool to pentest spark clusters
Mobile_trends_using_spark_and_dash
⭐
11
Asynchronous, classic OOP on the Spark engine with a light front-end
D Pandisim
⭐
11
Distributed pandemic simulator that uses the power of Spark to generate huge volumes of contact-tracing data.
Pyspark_amld2019
⭐
11
Workshop materials for AMLD2019 on PySpark.
Pyspark Project Example
⭐
11
A simple example for PySpark based project.
Pyspark Docker
⭐
10
Emr Demo
⭐
10
Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.
Sparksql Stats
⭐
10
A basic framework, built on the PySpark library, that uses Spark SQL to connect to a MySQL database and run statistical analysis on the data
Pyspark Tutorial
⭐
10
A learning journey into the Python API of Apache Spark from an ETL-developer perspective
Dijkstra Hadoop Spark
⭐
10
Dijkstra Algorithm - Python Hadoop Streaming and Pyspark
Betfair Data Analysis
⭐
10
Explore, analyse and visualise Betfair Historical Data Feed using PySpark.
Pyspark Tutorial
⭐
10
A short tutorial notebook on PySpark
Sparkql
⭐
10
sparkql: Apache Spark SQL DataFrame schema management for sensible humans
Chicago Taxi Trips Analysis
⭐
10
Analysis of City Of Chicago Taxi Trip Dataset Using AWS EMR, Spark, PySpark, Zeppelin and Airbnb's Superset
Sarplus
⭐
10
pronounced sUrplus as it's simply better if not best!
Pyspark Dataframe Made Easy
⭐
10
PySpark DataFrame made easy
Pysparklyr
⭐
9
Extension to {sparklyr} that allows you to interact with Spark & Databricks Connect
Sparkitecture
⭐
9
A collection of “cookbook-style” scripts for simplifying data engineering and machine learning in Apache Spark.
Lookalike Modelling
⭐
9
Finding customer lookalikes using Machine Learning in PySpark
Tensorflow Spark Docker
⭐
9
Contains TensorFlow, Hadoop, and Spark, making it easy to run TensorFlow on Spark via Docker.
Hackathonclt2019
⭐
9
Typedpyspark
⭐
9
Type-annotate your spark dataframes and validate them
Ipython Notebook Spark
⭐
9
ipython notebook with spark in action
Subredditrecommend
⭐
9
Recommends you Subreddits based on Word2Vec neural net!
Veronica
⭐
9
Big data processing and machine learning platform, just like using SQL
Sparktraining
⭐
9
Training material for the course "Introduction to Apache Spark APIs for Data Processing" https://sparktraining.web.cern.ch/
Vim Sparkshell
⭐
9
control spark-shell from vim
Docker Jupyter Spark
⭐
9
Docker image for Jupyter notebooks with PySpark
Driver_safety_analysis
⭐
8
Estimating driver safety for connected cars
Anova_in_pyspark
⭐
8
Custom one-way ANOVA implementation using PySpark
Yelp_sampling
⭐
8
Data Analytics Machine Learning Big Data
⭐
8
Slides, code and more for my class: Data Analytics and Machine Learning on Big Data
Spark Xarray
⭐
8
This is an experimental project that seeks to integrate PySpark and xarray for Climate Data Analysis.
Xmltocsv_stackexchange
⭐
8
Xml to Csv converter for Large files using Apache Spark
Pygotham_spark_streaming_demo
⭐
8
PyGotham 2017: Spark Streaming for World Domination (and other projects)
Sparksnake
⭐
8
Improving the development of Spark applications deployed as jobs on AWS services like Glue and EMR
Analysis
⭐
8
Repo for practical approaches to data science problems, including notebook demos and working scripts | #DS | #analysis
Pyspark Template
⭐
8
A Python PySpark project with Poetry
Cca175_master_preparation
⭐
8
Airflow
⭐
8
This set of code and instructions is intended to instantiate a compiled environment from a set of Docker images (Airflow webserver, Airflow scheduler, PostgreSQL, PySpark), with a data pipeline that consumes data from a weather API, processes it with PySpark, and stores it in PostgreSQL.
Pysparkpivot
⭐
7
PySparkPivot is a small python module for Spark, to manipulate PySpark Dataframes/RDD into spreadsheet-style pivot tables for data aggregation.
Divolte Spark
⭐
7
Utilities for using data created by Divolte collector in Spark, Spark Streaming and PySpark
Spark_ver_bigdatahw_jiuzhang
⭐
7
Homework for the Big Data course at Jiuzhang, re-written in Python and Spark!
Pyspark Machine Learning
⭐
7
A collection of machine learning examples using PySpark
Pyspark Boilerplate Mehdio
⭐
7
Pyspark boilerplate for running prod ready data pipeline
Spark On K8s
⭐
7
Presents three ways to run Spark on containers. This project is recommended for those who want to explore Big Data outside a Hadoop cluster.
Cloudera_material
⭐
7
Cloudera_Material: Study material to help people prepare for the Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collaborate.
Spark Examples
⭐
7
Spark examples to go with my presentation on 10/25/2014
Livy Server Docker
⭐
7
Dockers
⭐
7
Aws Etl
⭐
7
This is an ETL application on AWS with general open sales and customer data that you can find here: https://github.com/camposvinicius/data/blob/main/A (a zipped file containing several CSVs, to which we will apply transformations).
Drpyspark
⭐
7
Handy utilities for debugging and tuning pyspark programs. A work in progress.
Traffic Data Analysis With Apache Spark Based On Mobile Robot Data
⭐
7
Mobile robot data were analyzed with Apache Spark to produce five statistical results: travel time, waiting time, average speed, occupancy, and density.
Spark_linear_regression
⭐
7
Spark (PySpark) linear regression on clickthrough rate (CTR) prediction from Kaggle
Shingho
⭐
7
Shingho is a PySpark based statistical library designed for Big Data applications.
Spark For Noobs By A Noob
⭐
7
Jupyter notebooks for learning PySpark
Spark Docker Cluster
⭐
6
Example stand-alone Apache Spark cluster running in Docker containers
Pyspark Recipe Zhcn
⭐
6
pyspark-recipes in Chinese
Stock Visualizer
⭐
6
Based on Kafka, Node.js, D3.js, Google finance API.
Spark Data Analysis Projects
⭐
6
A collection of data analysis projects done using PySpark via Jupyter notebooks.
Mockrdd
⭐
6
A Python3 module for testing PySpark code
Sparklyr_start
⭐
6
Materials to start using Spark in R (sparklyr package). Examples for local and cluster usages
Spark Tutorials
⭐
6
PySpark notebooks to learn Apache Spark (WIP)
Spark Slowly Changing Dimension
⭐
6
Spark implementation of Slowly Changing Dimension type 2
Workshop Introduction To Machine Learning
⭐
6
Come ready to discover the goals and approaches of machine learning, and how to build effective algorithms and solutions!
Big Data Cluster
⭐
6
The goal of this project is to build a docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Postgres, Cassandra, Hue, Zeppelin, Kadmin, Kafka Control Center and pgAdmin. This cluster is solely intended for usage in a development environment. Do not use it to run any production workloads.
Pyspark Connectors
⭐
6
Databrickstraining
⭐
6
Repository for Microsoft Databricks Training Events - Hosted by BlueGranite
Pyspark_pandas
⭐
6
Pyspark + pandas. This may get merged into the SparklingPandas project.
Py3spark Docker Boilerplate
⭐
6
Minimal installation for working with Python3 & Spark via Docker
Datascience Playground
⭐
6
A scalable, cloud-ready environment for Data Science using Docker
Spark Datadog Relay
⭐
6
Implements SparkFirehoseListener for forwarding Spark events to statsd
Pyspark Ide Starter
⭐
6
Basic Python project and settings to run PySpark on your IDE
Zeppelin Clojure Interpreter
⭐
6
Clojure Plugin for zeppelin
Improved Spark Viz
⭐
6
🐼 WIP Improved visualizations in Spark
Spark Sql Etl Framework
⭐
6
Multi-stage, config driven, SQL based ETL framework using PySpark
Distcomputing
⭐
6
Spark Mesos Airflow Tutorial
⭐
6
Dataday2018
⭐
6
Repo for code of Data Day 2018
Map_reduce Ntua
⭐
6
Lab exercise of Advanced Topics in Database Systems course in NTUA regarding Map Reduce
Sparkintro
⭐
5
Scsparkl
⭐
5
scSPARKL is an Apache Spark-based pipeline for performing a variety of preprocessing and downstream analyses of scRNA-seq data.
Google Collab Tutorial
⭐
5
Contains various setups for Google Colab and explains how to use it.
Alpine Python3 Numpy Pandas Sparkcontainer Spark Submit
⭐
5
Uses a python3.6 Alpine base image and adds Java, pandas, NumPy, PySpark, and Spark as run dependencies. This image can be used as the container image when you run spark-submit on Kubernetes.
Spark For Dummies
⭐
5
Mastering Spark 2 from the very beginning
Spark Notebooks
⭐
5
End To End Machine Learning Model With Mllib In Pyspark
⭐
5
End-to-end machine learning model with MLlib in PySpark for a binary classification problem with imbalanced classes
Spark Structured Streaming Kafka
⭐
5
Spark Structured Streaming + Kafka + Delta pipeline.
Pyspark Docker
⭐
5
PySpark in Docker Containers
Hackathonclt2018
⭐
5
Machine Learning Pipeline Lr Pyspark
⭐
5
Power Plant ML Pipeline Application - Apache Spark
Spark Traffic
⭐
5
Batch processing of offline traffic big data with Spark
Webcat
⭐
5
Dsr Spark Appliedml
⭐
5
DSR Class - Applied Machine Learning with Apache Spark
Pyspark Analytics Workshop
⭐
5
Related Searches
Scala Spark (3,279)
Python Spark (2,053)
Java Spark (1,587)
Apache Spark (1,207)
Spark Hadoop (1,188)
Jupyter Notebook Spark (1,151)
Spark Kafka (985)
Spark Streaming (817)
Python Pyspark (792)
Shell Spark (705)
201-300 of 314 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.