Awesome Open Source
Search results for apache spark
830 search results found
Spark
⭐
36,808
Apache Spark - A unified analytics engine for large-scale data processing
Cookbook
⭐
11,769
The Data Engineering Cookbook
God Of Bigdata
⭐
8,483
Focused on big data learning and interview preparation; start your path to big data mastery. Flink/Spark/Hadoop/Hbase/Hive.
Zeppelin
⭐
6,156
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Iceberg
⭐
4,790
Apache Iceberg
Sparkinternals
⭐
4,665
Notes talking about the design and implementation of Apache Spark
Bigdl
⭐
4,392
Accelerating LLM with low-bit (INT3 / INT4 / NF4 / INT5 / INT8) optimizations using bigdl-llm
Tensorflowonspark
⭐
3,851
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Spark Nlp
⭐
3,436
State of the Art Natural Language Processing
Koalas
⭐
3,291
Koalas: pandas API on Apache Spark
Roaringbitmap
⭐
3,209
A better compressed bitset in Java: used by Apache Spark, Netflix Atlas and Tablesaw
Scio
⭐
2,484
A Scala API for Apache Beam and Google Cloud Dataflow.
Spark On K8s Operator
⭐
2,430
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Ballista
⭐
2,244
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Spark
⭐
1,930
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Elasticsearch Hadoop
⭐
1,915
🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop
Spark Cassandra Connector
⭐
1,912
DataStax Connector for Apache Spark to Apache Cassandra
Vega
⭐
1,904
A new arguably faster implementation of Apache Spark from scratch in Rust
Awesome Spark
⭐
1,461
A curated list of awesome Apache Spark packages and resources.
Carbondata
⭐
1,376
High performance data store solution
Dr Elephant
⭐
1,301
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Spark Doc Zh
⭐
1,186
Chinese edition of the official Apache Spark documentation
Killrweather
⭐
1,174
KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka for fast, streaming computations on time series data in asynchronous event-driven environments.
Kylo
⭐
1,035
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Adam
⭐
955
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Coding Now
⭐
925
Study notes, along with eBooks and video resources I have gone through and blogs I have collected and found worthwhile,
Spark Scala Tutorial
⭐
922
A free tutorial for Apache Spark.
Tispark
⭐
862
TiSpark is built for running Apache Spark on top of TiDB/TiKV
Incubator Livy
⭐
819
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Kafka Storm Starter
⭐
729
[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Incubator Toree
⭐
719
Mirror of Apache Toree (Incubating)
Sparkr Pkg
⭐
649
R frontend for Spark
Reference Apps
⭐
615
Spark reference applications
Elasticsearch Spark Recommender
⭐
603
Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch
Sparkmeasure
⭐
591
This is the development repository for sparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task and stage metrics data.
Spark Rapids
⭐
577
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Cassandra Lucene Index
⭐
574
Lucene based secondary indexes for Cassandra
Sparklearning
⭐
573
Learning Apache Spark, including code and data. Most parts can run locally.
Streaming Benchmarks
⭐
560
Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, Apache Flink, ...
Spline
⭐
538
Data Lineage Tracking And Visualization Solution
Hivemall
⭐
508
Scalable machine learning library for Apache Hive/Spark/Pig
Shc
⭐
484
The Apache Spark - Apache HBase Connector is a library to support Spark accessing HBase table as external data source or sink.
Ballista
⭐
411
Experimental Distributed Compute Platform based on Kubernetes and Apache Arrow
Machinelearning
⭐
406
Machine Learning
Sparkler
⭐
401
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stockinference Spark
⭐
376
Stock inference engine using Spring XD, Apache Geode / GemFire and Spark ML Lib.
Spark Training
⭐
365
Apache Spark training material
Graphx
⭐
353
Former GraphX development repository. GraphX has been merged into Apache Spark; please submit pull requests there.
Distributed Java
⭐
336
Distributed Java.《分布式 Java》
Hyperspace
⭐
334
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Bahir
⭐
325
Mirror of Apache Bahir
Spark Standalone Cluster On Docker
⭐
311
Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ⚡️
Every Single Day I Tldr
⭐
307
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Neo4j Spark Connector
⭐
297
Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Sparkflow
⭐
290
Easy-to-use library to bring TensorFlow to Apache Spark
Transport
⭐
278
A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Apache Hive, and Presto.
Azure Event Hubs
⭐
277
☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs
Jpmml Sparkml
⭐
265
Java library and command-line application for converting Apache Spark ML pipelines to PMML
Sparkstreaming
⭐
253
Real-time logging implemented with Spark Streaming + Flume + Kafka + HBase + Hadoop + Zookeeper
Rust Dataframe
⭐
250
A Rust DataFrame implementation, built on Apache Arrow
Spark Indexedrdd
⭐
247
An efficient updatable key-value store for Apache Spark
Sql Spark Connector
⭐
242
Apache Spark Connector for SQL Server and Azure SQL
Succinct
⭐
239
Enabling queries on compressed data.
Spark Programming In Python
⭐
234
Apache Spark 3 - Spark Programming in Python for Beginners
Joblib Spark
⭐
226
Joblib Apache Spark Backend
Azure Event Hubs Spark
⭐
225
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Ruby Spark
⭐
215
Ruby wrapper for Apache Spark
Spark_dbscan
⭐
215
DBSCAN clustering algorithm on top of Apache Spark
Bigdl Tutorials
⭐
201
Step-by-step Deep Learning Tutorials on Apache Spark using BigDL
Spark Elastic
⭐
197
This project combines Apache Spark and Elasticsearch to enable mining & prediction for Elasticsearch.
Awesome Scala
⭐
196
A curated list of awesome Scala frameworks, libraries and software.
Jupyter Spark
⭐
192
Jupyter Notebook extension for Apache Spark integration
Learningapachespark
⭐
192
LearningApacheSpark
Sparkrdma
⭐
191
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Vn.vitk
⭐
189
A Vietnamese Text Processing Toolkit
Spark Snowflake
⭐
188
Snowflake Data Source for Apache Spark.
Spark.jl
⭐
180
Julia binding for Apache Spark
Aws Glue Data Catalog Client For Apache Hive Metastore
⭐
171
The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog
Sparkmonitor
⭐
164
Monitor Apache Spark from Jupyter Notebook
Spark Authorizer
⭐
158
A Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Spark Metrics
⭐
150
Spark metrics related custom classes and sinks (e.g. Prometheus)
Dbscan On Spark
⭐
146
An implementation of DBSCAN running on top of Apache Spark
Incubator Wayang
⭐
142
Apache Wayang (incubating) is the first cross-platform data processing system.
Spookystuff
⭐
135
Scalable query engine for web scraping/data mashup/acceptance QA, powered by Apache Spark
Spark Tsne
⭐
134
Distributed t-SNE via Apache Spark
Apache Spark Node
⭐
134
Node.js bindings for Apache Spark DataFrame APIs
Envelope
⭐
133
Build configuration-driven ETL pipelines on Apache Spark
Hdfs_fdw
⭐
131
PostgreSQL foreign data wrapper for HDFS
Mastering Apache Spark
⭐
130
This is the repository for my YouTube course on end-to-end Apache Spark on the AIEngineering YouTube channel
Example Spark Kafka
⭐
118
Apache Spark and Apache Kafka integration example
Docker Spark
⭐
118
Docker image for a general Apache Spark client
Pyspark Stubs
⭐
116
Apache (Py)Spark type annotations (stub files).
Spark Df Profiling
⭐
115
Create HTML profiling reports from Apache Spark DataFrames
Bdutil
⭐
114
[DEPRECATED] Script used to manage Hadoop and Spark instances on Google Compute Engine
Drizzle Spark
⭐
113
Drizzle integration with Apache Spark
Spark Atlas Connector
⭐
112
A Spark Atlas connector to track data lineage in Apache Atlas
Stocator
⭐
108
Stocator is a high-performing connector to object storage for Apache Spark, achieving performance by leveraging object storage semantics.
Distributed Dataset
⭐
107
A distributed data processing framework in Haskell.
Frank Kanes Taming Big Data With Apache Spark And Python
⭐
106
Frank Kane's Taming Big Data with Apache Spark and Python, published by Packt
Ispark
⭐
104
An Apache Spark-shell backend for IPython
1-100 of 830 search results
Copyright 2018-2023 Awesome Open Source. All rights reserved.