Awesome Open Source
Search results for apache spark
583 search results found
Spark
⭐
37,661
Apache Spark - A unified analytics engine for large-scale data processing
Mlflow
⭐
16,343
Open source platform for the machine learning lifecycle
Data Engineer Handbook
⭐
5,650
This is a repo with links to everything you'd ever want to learn about data engineering
Hudi
⭐
5,129
Upserts, Deletes And Incremental Processing on Big Data.
Synapseml
⭐
4,989
Simple and Distributed Machine Learning
Bigdl
⭐
4,728
Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using bigdl-llm
Sparkinternals
⭐
4,665
Notes on the design and implementation of Apache Spark
Lakefs
⭐
3,900
lakeFS - Data version control for your data lake | Git for data
Spark Nlp
⭐
3,578
State of the Art Natural Language Processing
Coolplayspark
⭐
3,447
Coolplay Spark: Spark source code analysis, Spark libraries, and more
Koalas
⭐
3,291
Koalas: pandas API on Apache Spark
Spark Notebook
⭐
3,148
Interactive and Reactive Data Science using Scala and Spark.
Deequ
⭐
3,044
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Analytics Zoo
⭐
2,592
Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Spark On K8s Operator
⭐
2,526
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Ballista
⭐
2,244
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Transmogrifai
⭐
2,099
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Spark
⭐
1,963
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Vega
⭐
1,904
A new, arguably faster implementation of Apache Spark from scratch in Rust
Feathr
⭐
1,886
Feathr – A scalable, unified data and AI engineering platform for enterprise
Oryx
⭐
1,793
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
Docker Spark
⭐
1,783
Apache Spark docker image
Awesome Spark
⭐
1,461
A curated list of awesome Apache Spark packages and resources.
Dr Elephant
⭐
1,301
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Spark Doc Zh
⭐
1,186
Chinese translation of the official Apache Spark documentation
Sparkit Learn
⭐
1,054
PySpark + Scikit-learn = Sparkit-learn
Spark Sklearn
⭐
1,039
(Deprecated) Scikit-learn integration package for Apache Spark
Splink
⭐
939
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Mobius
⭐
937
C# and F# language binding and extensions to Apache Spark
Sparklyr
⭐
922
R interface for Apache Spark
Livy
⭐
911
Livy is an open source REST interface for interacting with Apache Spark from anywhere
Tispark
⭐
872
TiSpark is built for running Apache Spark on top of TiDB/TiKV
Incubator Livy
⭐
840
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Extraction Framework
⭐
802
The software used to extract structured data from Wikipedia
Kafka Storm Starter
⭐
729
[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Streaming Readings
⭐
640
Papers and reading materials related to streaming systems
Flintrock
⭐
627
A command-line tool for launching Apache Spark clusters.
Docker Spark
⭐
626
Docker build for Apache Spark
Spark Rapids
⭐
619
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Dist Keras
⭐
611
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Sparkmeasure
⭐
603
This is the development repository for sparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task and stage metrics data.
Spark
⭐
597
Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Sparklearning
⭐
573
Learning Apache Spark, including code and data. Most parts can run locally.
Quinn
⭐
572
pyspark methods to enhance developer productivity 📣 👯 🎉
Learningsparkv2
⭐
570
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Openscoring
⭐
565
REST web service for the true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models
Spline
⭐
553
Data Lineage Tracking And Visualization Solution
Awesome Kafka
⭐
549
A list about Apache Kafka
Parquet Dotnet
⭐
457
Fully managed Apache Parquet implementation
Sparkle
⭐
442
Haskell on Apache Spark.
Agile_data_code_2
⭐
435
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Sparkling
⭐
423
A Clojure library for Apache Spark: fast, fully featured, and developer-friendly
Spark Corenlp
⭐
409
Stanford CoreNLP wrapper for Apache Spark
Machinelearning
⭐
406
Machine Learning
Spark Perf
⭐
346
Performance tests for Apache Spark
Eclairjs Node
⭐
340
Node.js API for Apache Spark with Remote Client
Wirbelsturm
⭐
333
[PROJECT IS NO LONGER MAINTAINED] Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Morpheus
⭐
330
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Parquet Dotnet
⭐
319
🏐 Apache Parquet for modern .NET
Incubator Hivemall
⭐
308
Mirror of Apache Hivemall (incubating)
Mist
⭐
303
Serverless proxy for Spark cluster
Sparkflow
⭐
301
Easy-to-use library to bring TensorFlow to Apache Spark
Delight
⭐
299
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
Sparktorch
⭐
297
Train and run PyTorch models on Apache Spark.
Data Accelerator
⭐
295
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Akka Analytics
⭐
281
Large-scale event processing with Akka Persistence and Apache Spark
Spark Gotchas
⭐
276
Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks
Spark Programming In Python
⭐
269
Apache Spark 3 - Spark Programming in Python for Beginners
Cuelake
⭐
266
Use SQL to build ELT pipelines on a data lakehouse.
Spark Jupyter Aws
⭐
255
A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Pysparkling
⭐
253
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Spark Indexedrdd
⭐
247
An efficient updatable key-value store for Apache Spark
Succinct
⭐
239
Enabling queries on compressed data.
Spark Workshop
⭐
231
Apache Spark™ and Scala Workshops
Azure Event Hubs Spark
⭐
225
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Ruby Spark
⭐
215
Ruby wrapper for Apache Spark
Spark_dbscan
⭐
215
DBSCAN clustering algorithm on top of Apache Spark
Databricks
⭐
212
Repository of sample Databricks notebooks
Sql Data Analysis And Visualization Projects
⭐
200
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and PySpark.
Spark Snowflake
⭐
196
Snowflake Data Source for Apache Spark.
Azure Cosmosdb Spark
⭐
194
Apache Spark Connector for Azure Cosmos DB
Rumble
⭐
194
⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Sparkrdma
⭐
191
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Vn.vitk
⭐
189
A Vietnamese Text Processing Toolkit
Spark.jl
⭐
180
Julia binding for Apache Spark
Whylogs Java
⭐
179
Profile and monitor your ML data pipeline end-to-end
Awesome Ai Infrastructures
⭐
171
Infrastructures™ for Machine Learning Training/Inference in Production.
Learning Hadoop And Spark
⭐
160
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Spark Authorizer
⭐
158
A Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Spark Operator
⭐
155
Operator for managing the Spark clusters on Kubernetes and OpenShift.
Bigdata Playground
⭐
154
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Spark Ext
⭐
147
Spark Extension: ML transformers, SQL aggregations, etc. that are missing in Apache Spark
Dbscan On Spark
⭐
146
An implementation of DBSCAN running on top of Apache Spark
Spark On Lambda
⭐
144
Apache Spark on AWS Lambda
Albedo
⭐
142
A recommender system for discovering GitHub repos, built with Apache Spark
Sparknotebook
⭐
142
An example of running Apache Spark using Scala in an IPython notebook
Pyspark Cheatsheet
⭐
140
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Sansa Stack
⭐
139
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
Hydrograph
⭐
138
A visual ETL development and debugging tool for big data
1-100 of 583 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.