Awesome Open Source
Search results for python apache spark
Filters: apache-spark, python
104 search results found
Spark (⭐ 37,661): Apache Spark - A unified analytics engine for large-scale data processing
Mlflow (⭐ 16,343): Open source platform for the machine learning lifecycle
Bigdl (⭐ 4,728): Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using bigdl-llm
Koalas (⭐ 3,291): Koalas: pandas API on Apache Spark
Analytics Zoo (⭐ 2,592): Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
Sparkit Learn (⭐ 1,054): PySpark + Scikit-learn = Sparkit-learn
Spark Sklearn (⭐ 1,039): (Deprecated) Scikit-learn integration package for Apache Spark
Splink (⭐ 939): Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Incubator Livy (⭐ 840): Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Flintrock (⭐ 627): A command-line tool for launching Apache Spark clusters.
Dist Keras (⭐ 611): Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Sparkmeasure (⭐ 603): This is the development repository for sparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task and stage metrics data.
Goodreads_etl_pipeline (⭐ 593): An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Agile_data_code_2 (⭐ 435): Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Sparkflow (⭐ 301): Easy-to-use library to bring TensorFlow to Apache Spark
Sparktorch (⭐ 297): Train and run PyTorch models on Apache Spark.
Spark Programming In Python (⭐ 269): Apache Spark 3 - Spark Programming in Python for Beginners
Pysparkling (⭐ 253): A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Sql Data Analysis And Visualization Projects (⭐ 200): SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and PySpark.
Bigdata Playground (⭐ 154): A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL
Albedo (⭐ 142): A recommender system for discovering GitHub repos, built with Apache Spark
Sparknotebook (⭐ 142): An example of running Apache Spark using Scala in an IPython notebook
Pyspark Cheatsheet (⭐ 140): PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Griffon Vm (⭐ 129): Griffon Data Science Virtual Machine
Aut (⭐ 128): The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Pyspark Stubs (⭐ 116): Apache (Py)Spark type annotations (stub files).
Frank Kanes Taming Big Data With Apache Spark And Python (⭐ 106): Frank Kane's Taming Big Data with Apache Spark and Python, published by Packt
Dataproc Templates (⭐ 103): Dataproc templates and pipelines for solving simple in-cloud data tasks
Spark With Python (⭐ 98): Fundamentals of Spark with Python (using PySpark), code examples
Qbox Blog Code (⭐ 88): Code reference from my Qbox blog posts.
Maggy (⭐ 88): Distribution transparent Machine Learning experiments on Apache Spark
Fink Broker (⭐ 66): Astronomy Broker based on Apache Spark
Mmtf Pyspark (⭐ 64): Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Pyspark Twitter Stream Mining (⭐ 63): Real-time Machine Learning with Apache Spark on Twitter Public Stream
Apachespark (⭐ 59): This repository will help you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-life work, using PySpark and Spark SQL for development. At the end of the course we also cover a few case studies.
Serverless Spark Workshop (⭐ 56): Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service
Sparker (⭐ 53): SparkER: an Entity Resolution framework for Apache Spark
Sparkora (⭐ 46): Powerful rapid automatic EDA and feature engineering library with a very easy-to-use API 🌟
Liquidsvm (⭐ 45): Support vector machines (SVMs) and related kernel-based learning algorithms are a well-known class of machine learning algorithms for non-parametric classification and regression. liquidSVM is an implementation of SVMs whose key features are: fully integrated hyper-parameter selection, extreme speed on both small and large data sets, full flexibility for experts, and inclusion of a variety of different learning scenarios: multi-class classification, ROC, and Neyman-Pearson learning, and least-s
Datapipelines Essentials Python (⭐ 45): Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Decision Tree Visualization Spark (⭐ 39): 🌲 Decision Tree Visualization for Apache Spark
Spark With Python My Learning Notes (⭐ 39): ETL pipeline using PySpark (Spark - Python)
Financial Market Data Analysis (⭐ 39): Real-Time Financial Market Data Processing and Prediction application
Spylon (⭐ 38): Utilities to work with Scala/Java code with py4j
Pyjaws (⭐ 36): PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
Lstm Tensorspark (⭐ 35): Implementation of an LSTM with TensorFlow, distributed on Apache Spark
Spark Twitter Sentiment Analysis (⭐ 33): Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Dlsa (⭐ 33): Distributed least squares approximation (dlsa) implemented with Apache Spark
Spark Gpu (⭐ 33): GPU Acceleration for Apache Spark
Awesome Tools (⭐ 32): Curated list of awesome tools and libraries for specific domains
Pysparkcheatsheet (⭐ 30): PySpark Cheatsheet
Baskerville (⭐ 30): Security Analytics Engine - Anomaly Detection in Web Traffic
Btcspark (⭐ 28): A toolkit for using Apache Spark to efficiently query Bitcoin Blockchain data.
Isarn Sketches Spark (⭐ 27): Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Amazon Emr Cli (⭐ 26): A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs
Pyspark Asyncactions (⭐ 26): Asynchronous actions for PySpark
Amazon Emr Vscode Toolkit (⭐ 25): A VS Code Extension to make it easier to manage and develop Spark jobs on EMR
Location Based Restaurants Recommendation System (⭐ 23): Big Data Management and Analysis Final Project
Sql Based Etl With Apache Spark On Amazon Eks (⭐ 23): A solution that provides declarative data processing capability and workflow orchestration automation to help your business users (such as analysts and data scientists) access their data and create meaningful insights without the need for manual IT processes.
Learn Hadoop And Spark (⭐ 22): This repository focuses on gathering a curated list of resources to learn Hadoop for FREE.
Spark For Data Engineers (⭐ 22): Apache Spark for data engineers
Cassandra Spark Analytics (⭐ 21): Supercharge your analysis of Cassandra data with Apache Spark
Covid 19 Data Engineering Pipeline (⭐ 19): A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.
Cerebro System (⭐ 18): Data System for Optimized Deep Learning Model Selection
Py Pair (⭐ 17): Pairwise association measures of statistical variable types
Admml (⭐ 17): ADMM based Scalable Machine Learning on Spark
Sparklanes (⭐ 16): A lightweight data processing framework for Apache Spark
Spark Streaming Visualize (⭐ 15): Simple demonstration of how to build a complex real-time machine learning visualization tool.
Dsgrid (⭐ 14): Python package for working with demand-side grid projects, datasets and queries
Live_log_analyzer_spark (⭐ 14): Spark application for analyzing Apache access logs and detecting anomalies, accompanied by a Medium article.
Edx Introduction To Big Data With Apache Spark (⭐ 13)
Bigdata Spark (⭐ 12): BerkeleyX: CS100.1x, Introduction to Big Data with Apache Spark
Dead Salmon Brain (⭐ 11): Apache Spark based framework for analyzing A/B experiments
Spark Tensors (⭐ 10): Temporary repository for implementing tensor factorization algorithms on Apache Spark
Data_ai_for_all (⭐ 10): Data Analysis, Analytics, Science, AI & ML, LLM etc.
Sparkql (⭐ 10): sparkql: Apache Spark SQL DataFrame schema management for sensible humans
Spark Pipeline (⭐ 9): Machine learning pipeline for Apache Spark
Pybda (⭐ 9): 💻💻💻 A command-line tool for analysis of big biological data sets for distributed HPC clusters.
Ml_streaming_spark (⭐ 9): An introduction to machine learning techniques in the high-velocity case (including sequential learning) with Apache Spark.
Hgn (⭐ 8): Hybrid Girvan Newman. Code for the "A Distributed Hybrid Community Detection Methodology for Social Networks" paper.
Apache Spark Etl Pipeline Example (⭐ 8): Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general-purpose cluster computing.
Spark Python Celery Demo (⭐ 8): Apache Spark Python 3 Async Results with Celery
Spark In A Box (⭐ 8): Template-based Dockerfile generator for Apache Spark applications.
K8s Bigdata (⭐ 8): Apache Spark with HDFS cluster within Kubernetes
Spark Python Knn (⭐ 8): Function for computing K-NN in Apache Spark
Financial Data Project In Azure (⭐ 8): Free High-Quality Financial Data in Azure
Spark Privacy Preserver (⭐ 7): Anonymizing Library for Apache Spark
Pydata London2015 (⭐ 7)
Spectral Clustering On Apache Spark (⭐ 7): Uses Spark's built-in power iteration clustering (PIC) to simulate an approximate variant of spectral clustering.
Music Recommender System (⭐ 7): Music Recommender System using Apache Spark and Python
Traffic Data Analysis With Apache Spark Based On Mobile Robot Data (⭐ 7): Mobile robot data were analyzed with Apache Spark to produce five statistical results: travel time, waiting time, average speed, occupancy, and density.
Cve 2022 33891 (⭐ 7): Apache Spark Command Injection PoC Exploit for CVE-2022-33891
Transactional Datalake Using Apache Iceberg On Aws Glue (⭐ 7): Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming and DMS
Pyspark_pandas (⭐ 6): PySpark + pandas. This may get merged into the SparklingPandas project.
Spark Databricks (⭐ 6): 🔥 Master Apache Spark & Databricks! Dive into a world of big data with exclusive insights from Udemy courses, personal notes, and practical guides. Whether you're starting out or scaling new heights in data engineering, this is your ultimate resource hub! 🌟🚀
Pyspark Connectors (⭐ 6)
Chopin2 (⭐ 6): Domain-Agnostic Supervised Learning with Hyperdimensional Computing
Databrickstraining (⭐ 6): Repository for Microsoft Databricks Training Events - Hosted by BlueGranite
Stock Price Prediction Spark Cassandra (⭐ 6): This is a data pipeline for predicting stock prices using Apache Spark, Apache Cassandra, and machine learning techniques. It collects and preprocesses stock data from the Alpha Vantage API, engineers features, trains models, and performs data analysis and predictions.
Aws Glue Streaming Etl With Apache Iceberg (⭐ 5): Streaming ETL job cases in AWS Glue for integrating Iceberg and creating an in-place updatable data lake on Amazon S3
1-100 of 104 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.