Awesome Open Source
Search results for python apache spark
86 search results found
Spark
⭐
37,661
Apache Spark - A unified analytics engine for large-scale data processing
Mlflow
⭐
16,343
Open source platform for the machine learning lifecycle
Bigdl
⭐
4,728
Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using bigdl-llm
Koalas
⭐
3,291
Koalas: pandas API on Apache Spark
Analytics Zoo
⭐
2,592
Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
Sparkit Learn
⭐
1,054
PySpark + Scikit-learn = Sparkit-learn
Spark Sklearn
⭐
1,039
(Deprecated) Scikit-learn integration package for Apache Spark
Splink
⭐
939
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Incubator Livy
⭐
840
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Flintrock
⭐
627
A command-line tool for launching Apache Spark clusters.
Dist Keras
⭐
611
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Sparkmeasure
⭐
603
This is the development repository for sparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task and stage metrics data.
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Sparkflow
⭐
301
Easy to use library to bring Tensorflow on Apache Spark
Sparktorch
⭐
297
Train and run Pytorch models on Apache Spark.
Spark Programming In Python
⭐
269
Apache Spark 3 - Spark Programming in Python for Beginners
Pysparkling
⭐
253
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Bigdata Playground
⭐
154
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL
Sparknotebook
⭐
142
An example of running Apache Spark using Scala in ipython notebook
Albedo
⭐
142
A recommender system for discovering GitHub repos, built with Apache Spark
Pyspark Cheatsheet
⭐
140
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Frank Kanes Taming Big Data With Apache Spark And Python
⭐
106
Frank Kane's Taming Big Data with Apache Spark and Python, published by Packt
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Maggy
⭐
88
Distribution transparent Machine Learning experiments on Apache Spark
Qbox Blog Code
⭐
88
Code reference from my Qbox blog posts.
Mmtf Pyspark
⭐
64
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Pyspark Twitter Stream Mining
⭐
63
Real-time Machine Learning with Apache Spark on Twitter Public Stream
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in practice, using PySpark and Spark SQL for development. The course ends with a few case studies.
Sparker
⭐
53
SparkER: an Entity Resolution framework for Apache Spark
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Liquidsvm
⭐
45
Support vector machines (SVMs) and related kernel-based learning algorithms are a well-known class of machine learning algorithms, for non-parametric classification and regression. liquidSVM is an implementation of SVMs whose key features are: fully integrated hyper-parameter selection, extreme speed on both small and large data sets, full flexibility for experts, and inclusion of a variety of different learning scenarios: multi-class classification, ROC, and Neyman-Pearson learning, and least-s
Decision Tree Visualization Spark
⭐
39
🌲 Decision Tree Visualization for Apache Spark
Financial Market Data Analysis
⭐
39
Real-Time Financial Market Data Processing and Prediction application
Spark With Python My Learning Notes
⭐
39
ETL pipeline using pyspark (Spark - Python)
Spylon
⭐
38
Utilities to work with Scala/Java code with py4j
Lstm Tensorspark
⭐
35
Implementation of an LSTM with TensorFlow, distributed on Apache Spark
Dlsa
⭐
33
Distributed least squares approximation (dlsa) implemented with Apache Spark
Spark Twitter Sentiment Analysis
⭐
33
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Spark Gpu
⭐
33
GPU Acceleration for Apache Spark
Awesome Tools
⭐
32
curated list of awesome tools and libraries for specific domains
Pysparkcheatsheet
⭐
30
PySpark Cheatsheet
Baskerville
⭐
30
Security Analytics Engine - Anomaly Detection in Web Traffic
Btcspark
⭐
28
A toolkit for using apache spark to efficiently query Bitcoin Blockchain data.
Isarn Sketches Spark
⭐
27
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Amazon Emr Cli
⭐
26
A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs
Amazon Emr Vscode Toolkit
⭐
25
A VS Code Extension to make it easier to manage and develop Spark jobs on EMR
Sql Based Etl With Apache Spark On Amazon Eks
⭐
23
A solution that provides declarative data processing capability, and workflow orchestration automation to help your business users (such as analysts and data scientists) access their data and create meaningful insights without the need for manual IT processes.
Learn Hadoop And Spark
⭐
22
This repository focuses on gathering a curated list of resources to learn Hadoop for free.
Spark For Data Engineers
⭐
22
Apache Spark for data engineers
Cassandra Spark Analytics
⭐
21
Supercharge your analysis of Cassandra data with Apache Spark
Covid 19 Data Engineering Pipeline
⭐
19
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via Github Actions.
Cerebro System
⭐
18
Data System for Optimized Deep Learning Model Selection
Admml
⭐
17
ADMM based Scalable Machine Learning on Spark
Py Pair
⭐
17
Pairwise association measures of statistical variable types
Sparklanes
⭐
16
A lightweight data processing framework for Apache Spark
Spark Streaming Visualize
⭐
15
Simple demonstration of how to build a complex real time machine learning visualization tool.
Dsgrid
⭐
14
Python package for working with demand-side grid projects, datasets and queries
Live_log_analyzer_spark
⭐
14
Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.
Edx Introduction To Big Data With Apache Spark
⭐
13
Bigdata Spark
⭐
12
BerkeleyX: CS100.1x, Introduction to Big Data with Apache Spark
Data_ai_for_all
⭐
10
Data Analysis, Analytics, Science, AI & ML, LLM etc.
Spark Tensors
⭐
10
Temporary repository for implementing tensor factorization algorithms on Apache Spark
Sparkql
⭐
10
sparkql: Apache Spark SQL DataFrame schema management for sensible humans
Spark Pipeline
⭐
9
Machine learning pipeline for Apache Spark
Ml_streaming_spark
⭐
9
An introduction to machine learning techniques in the high velocity case (including sequential learning) with Apache Spark.
Spark Python Knn
⭐
8
Function for computing K-NN in Apache Spark
K8s Bigdata
⭐
8
Apache Spark with HDFS cluster within Kubernetes
Hgn
⭐
8
Hybrid Girvan Newman. Code for the "A Distributed Hybrid Community Detection Methodology for Social Networks" paper.
Apache Spark Etl Pipeline Example
⭐
8
Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computing.
Financial Data Project In Azure
⭐
8
Free High-Quality Financial Data in Azure
Spark In A Box
⭐
8
Template-based Dockerfile generator for Apache Spark applications.
Spark Privacy Preserver
⭐
7
Anonymizing Library for Apache Spark
Traffic Data Analysis With Apache Spark Based On Mobile Robot Data
⭐
7
Mobile robot data were analyzed with Apache Spark to produce five statistical results: travel time, waiting time, average speed, occupancy, and density.
Music Recommender System
⭐
7
Music Recommender System using Apache Spark and Python
Transactional Datalake Using Apache Iceberg On Aws Glue
⭐
7
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming and DMS
Pyspark_pandas
⭐
6
Pyspark + pandas. This may get merged into the SparklingPandas project.
Stock Price Prediction Spark Cassandra
⭐
6
This is a data pipeline for predicting stock prices using Apache Spark, Apache Cassandra, and machine learning techniques. It collects and preprocesses stock data from Alpha Vantage API, engineers features, trains models, and performs data analysis and predictions.
Spark Databricks
⭐
6
🔥 Master Apache Spark & Databricks! Dive into a world of big data with exclusive insights from Udemy courses, personal notes, and practical guides. Whether you're starting out or scaling new heights in data engineering, this is your ultimate resource hub! 🌟🚀
Pyspark Connectors
⭐
6
Chopin2
⭐
6
Domain-Agnostic Supervised Learning with Hyperdimensional Computing
Databrickstraining
⭐
6
Repository for Microsoft Databricks Training Events - Hosted by BlueGranite
Spark For Dummies
⭐
5
Mastering Spark 2 from the very beginning
Aws Glue Streaming Etl With Apache Iceberg
⭐
5
Streaming ETL job cases in AWS Glue that integrate Iceberg and create an in-place updatable data lake on Amazon S3
Spark Prometheus
⭐
5
Prometheus Connector for Apache Spark
Apachespark Pyspark 2023
⭐
5
PySpark is a distributed data processing library for Python that processes large volumes of data on clusters using the Apache Spark framework, offering high performance and an integrated set of tools for large-scale data analysis and handling.
Spark Structured Streaming Kafka
⭐
5
Spark Structured Streaming + Kafka + Delta pipeline.
Pyspark Docker
⭐
5
PySpark in Docker Containers
Irecognize
⭐
5
iRecognize: Visual Intelligence Made Easy
Genespark
⭐
5
geneSpark is a bioinformatics software program written in Python and Apache Spark for big data epigenetic histone modification ChIP-seq analysis.
Copyright 2018-2024 Awesome Open Source. All rights reserved.