Awesome Open Source
Search results for python apache spark
apache-spark
python
147 search results found
Spark
⭐
35,923
Apache Spark - A unified analytics engine for large-scale data processing
Mlflow
⭐
14,527
Open source platform for the machine learning lifecycle
Bigdl
⭐
4,223
Fast, distributed, secure AI for Big Data
Koalas
⭐
3,228
Koalas: pandas API on Apache Spark
Analytics Zoo
⭐
2,565
Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Sparkit Learn
⭐
1,054
PySpark + Scikit-learn = Sparkit-learn
Spark Sklearn
⭐
1,039
(Deprecated) Scikit-learn integration package for Apache Spark
Incubator Livy
⭐
773
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Splink
⭐
658
Fast, accurate and scalable probabilistic data linkage using your choice of SQL backend
Dist Keras
⭐
611
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Flintrock
⭐
604
A command-line tool for launching Apache Spark clusters.
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Sparkmeasure
⭐
561
This is the development repository for sparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task and stage metrics data.
Quinn
⭐
456
PySpark methods to enhance developer productivity 📣 👯 🎉
Agile_data_code_2
⭐
435
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Sparktorch
⭐
297
Train and run PyTorch models on Apache Spark.
Sparkflow
⭐
290
Easy-to-use library for bringing TensorFlow to Apache Spark
Pysparkling
⭐
253
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Sql Data Analysis And Visualization Projects
⭐
200
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and PySpark.
Bigdata Playground
⭐
154
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL
Sparknotebook
⭐
142
An example of running Apache Spark using Scala in an IPython notebook
Albedo
⭐
142
A recommender system for discovering GitHub repos, built with Apache Spark
Pyspark Cheatsheet
⭐
140
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Griffon Vm
⭐
129
Griffon Data Science Virtual Machine
Aut
⭐
122
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Pyspark Stubs
⭐
116
Apache (Py)Spark type annotations (stub files).
Frank Kanes Taming Big Data With Apache Spark And Python
⭐
106
Frank Kane's Taming Big Data with Apache Spark and Python, published by Packt
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Qbox Blog Code
⭐
88
Code reference from my Qbox blog posts.
Maggy
⭐
81
Distribution transparent Machine Learning experiments on Apache Spark
Fink Broker
⭐
66
Astronomy Broker based on Apache Spark
Mmtf Pyspark
⭐
64
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Pyspark Twitter Stream Mining
⭐
63
Real-time Machine Learning with Apache Spark on Twitter Public Stream
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics that come up in real-world data engineering work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Sparker
⭐
53
SparkER: an Entity Resolution framework for Apache Spark
Sparkora
⭐
46
Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟
Serverless Spark Workshop
⭐
45
Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Liquidsvm
⭐
45
Support vector machines (SVMs) and related kernel-based learning algorithms are a well-known class of machine learning algorithms, for non-parametric classification and regression. liquidSVM is an implementation of SVMs whose key features are: fully integrated hyper-parameter selection, extreme speed on both small and large data sets, full flexibility for experts, and inclusion of a variety of different learning scenarios: multi-class classification, ROC, and Neyman-Pearson learning, and least-s
Decision Tree Visualization Spark
⭐
39
🌲 Decision Tree Visualization for Apache Spark
Financial Market Data Analysis
⭐
39
Real-Time Financial Market Data Processing and Prediction application
Spark With Python My Learning Notes
⭐
39
ETL pipeline using PySpark (Spark - Python)
Spylon
⭐
38
Utilities to work with Scala/Java code with py4j
Lstm Tensorspark
⭐
35
Implementation of an LSTM with TensorFlow, distributed on Apache Spark
Spark Twitter Sentiment Analysis
⭐
33
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Spark Gpu
⭐
33
GPU Acceleration for Apache Spark
Dlsa
⭐
33
Distributed least squares approximation (dlsa) implemented with Apache Spark
Awesome Tools
⭐
32
Curated list of awesome tools and libraries for specific domains
Pysparkcheatsheet
⭐
30
PySpark Cheatsheet
Baskerville
⭐
30
Security Analytics Engine - Anomaly Detection in Web Traffic
Btcspark
⭐
28
A toolkit for using Apache Spark to efficiently query Bitcoin blockchain data.
Isarn Sketches Spark
⭐
27
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Pyspark Asyncactions
⭐
26
Asynchronous actions for PySpark
Location Based Restaurants Recommendation System
⭐
23
Big Data Management and Analysis Final Project
Spark Programming In Python
⭐
23
Apache Spark 3 - Spark Programming in Python for Beginners
Spark For Data Engineers
⭐
22
Apache Spark for data engineers
Learn Hadoop And Spark
⭐
22
This repository focuses on gathering a curated list of resources to learn Hadoop for free.
Cassandra Spark Analytics
⭐
21
Supercharge your analysis of Cassandra data with Apache Spark
Covid 19 Data Engineering Pipeline
⭐
18
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via Github Actions.
Sql Based Etl With Apache Spark On Amazon Eks
⭐
18
A solution that provides declarative data processing capability, and workflow orchestration automation to help your business users (such as analysts and data scientists) access their data and create meaningful insights without the need for manual IT processes.
Cerebro System
⭐
18
Data System for Optimized Deep Learning Model Selection
Py Pair
⭐
17
Pairwise association measures of statistical variable types
Admml
⭐
17
ADMM based Scalable Machine Learning on Spark
Sparklanes
⭐
16
A lightweight data processing framework for Apache Spark
Spark Streaming Visualize
⭐
15
Simple demonstration of how to build a complex real time machine learning visualization tool.
Amazon Emr Vscode Toolkit
⭐
14
A VS Code Extension to make it easier to manage and develop Spark jobs on EMR
Live_log_analyzer_spark
⭐
14
Spark application for analyzing Apache access logs and detecting anomalies. Accompanied by a Medium article.
Dsgrid
⭐
13
Python package for working with demand-side grid projects, datasets and queries
Edx Introduction To Big Data With Apache Spark
⭐
13
Bigdata Spark
⭐
12
BerkeleyX: CS100.1x, Introduction to Big Data with Apache Spark
Dead Salmon Brain
⭐
11
Apache Spark-based framework for analyzing A/B experiments
Amazon Emr Cli
⭐
11
A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs
Spark Tensors
⭐
10
Temporary repository for implementing tensor factorization algorithms on Apache Spark
Spark Pipeline
⭐
9
Machine learning pipeline for Apache Spark
Ml_streaming_spark
⭐
9
An introduction to machine learning techniques in the high velocity case (including sequential learning) with Apache Spark.
Pybda
⭐
9
💻💻💻 A command-line tool for analyzing big biological data sets on distributed HPC clusters.
Spark Python Knn
⭐
8
Function for computing K-NN in Apache Spark
Spark In A Box
⭐
8
Template-based Dockerfile generator for Apache Spark applications.
Apache Spark Etl Pipeline Example
⭐
8
Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computing.
Hgn
⭐
8
Hybrid Girvan Newman. Code for the "A Distributed Hybrid Community Detection Methodology for Social Networks" paper.
Spark Python Celery Demo
⭐
8
Apache Spark Python 3 Async Results with Celery
Financial Data Project In Azure
⭐
8
Free High-Quality Financial Data in Azure
Spectral Clustering On Apache Spark
⭐
7
Uses Spark’s built-in power iteration clustering (PIC) to simulate an approximate variant of spectral clustering.
Sparkql
⭐
7
sparkql: Apache Spark SQL DataFrame schema management for sensible humans
Traffic Data Analysis With Apache Spark Based On Mobile Robot Data
⭐
7
Mobile robot data analyzed with Apache Spark to produce five statistics: travel time, waiting time, average speed, occupancy, and density.
Music Recommender System
⭐
7
Music Recommender System using Apache Spark and Python
Spark Privacy Preserver
⭐
7
Anonymizing Library for Apache Spark
Pydata London2015
⭐
7
Cve 2022 33891
⭐
7
Apache Spark Command Injection PoC Exploit for CVE-2022-33891
Pyspark Connectors
⭐
6
Pyspark_pandas
⭐
6
Pyspark + pandas. This may get merged into the SparklingPandas project.
Databrickstraining
⭐
6
Repository for Microsoft Databricks Training Events - Hosted by BlueGranite
Genespark
⭐
5
geneSpark is a bioinformatics software program written in Python and Apache Spark for big data epigenetic histone modification ChIP-seq analysis.
Spark Prometheus
⭐
5
Prometheus Connector for Apache Spark
Transactional Datalake Using Apache Iceberg On Aws Glue
⭐
5
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming and DMS
Aws Glue Streaming Etl With Apache Iceberg
⭐
5
Streaming ETL job cases in AWS Glue to integrate Iceberg and creating an in-place updatable data lake on Amazon S3
Data_analysis_pandas_spark_koalas
⭐
5
Data Analytics with Pandas, Spark, Koalas & Visualization
Pyspark Docker
⭐
5
PySpark in Docker Containers
Spark Streaming In Python
⭐
5
Apache Spark 3 - Structured Streaming Course Material
Spark For Dummies
⭐
5
Mastering Spark 2 from the very beginning
Related Searches
Python Python3 (857,414)
Python Ml (20,195)
Python Pytorch (14,667)
Python Docker (14,113)
Python Machine Learning (14,099)
Python Tensorflow (13,736)
Python Deep Learning (13,092)
Python Jupyter Notebook (12,976)
Python Aws (7,633)
Python Ai (6,875)
1-100 of 147 search results
Copyright 2018-2023 Awesome Open Source. All rights reserved.