Awesome Open Source
Search results for apache spark
536 search results found
Spark
⭐
37,661
Apache Spark - A unified analytics engine for large-scale data processing
Cookbook
⭐
12,557
The Data Engineering Cookbook
God Of Bigdata
⭐
8,483
Focused on big data study and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive.
Iceberg
⭐
5,179
Apache Iceberg
Bigdl
⭐
4,728
Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using bigdl-llm
Sparkinternals
⭐
4,665
Notes talking about the design and implementation of Apache Spark
Tensorflowonspark
⭐
3,851
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Spark Nlp
⭐
3,578
State of the Art Natural Language Processing
Roaringbitmap
⭐
3,308
A better compressed bitset in Java: used by Apache Spark, Netflix Atlas, Tablesaw, and many others
Koalas
⭐
3,291
Koalas: pandas API on Apache Spark
Spark On K8s Operator
⭐
2,526
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Scio
⭐
2,505
A Scala API for Apache Beam and Google Cloud Dataflow.
Ballista
⭐
2,244
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Spark
⭐
1,963
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Spark Cassandra Connector
⭐
1,929
DataStax Connector for Apache Spark to Apache Cassandra
Elasticsearch Hadoop
⭐
1,914
🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop
Vega
⭐
1,904
A new arguably faster implementation of Apache Spark from scratch in Rust
Awesome Spark
⭐
1,461
A curated list of awesome Apache Spark packages and resources.
Carbondata
⭐
1,401
High performance data store solution
Dr Elephant
⭐
1,301
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Spark Doc Zh
⭐
1,186
Chinese translation of the official Apache Spark documentation
Killrweather
⭐
1,174
KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka for fast, streaming computations on time series data in asynchronous event-driven environments.
Kylo
⭐
1,035
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Adam
⭐
966
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Coding Now
⭐
925
Study notes, plus eBooks I have read, video resources, and blogs I have collected and consider worthwhile, ...
Spark Scala Tutorial
⭐
922
A free tutorial for Apache Spark.
Tispark
⭐
872
TiSpark is built for running Apache Spark on top of TiDB/TiKV
Incubator Livy
⭐
840
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Kafka Storm Starter
⭐
729
[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Incubator Toree
⭐
721
Mirror of Apache Toree (Incubating)
Sparkr Pkg
⭐
649
R frontend for Spark
Spark Rapids
⭐
619
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Reference Apps
⭐
615
Spark reference applications
Sparkmeasure
⭐
603
This is the development repository for sparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task and stage metrics data.
Elasticsearch Spark Recommender
⭐
603
Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch
Cassandra Lucene Index
⭐
574
Lucene based secondary indexes for Cassandra
Sparklearning
⭐
573
Learning Apache Spark, including code and data. Most parts can run locally.
Streaming Benchmarks
⭐
560
Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, Apache Flink, ...
Spline
⭐
553
Data Lineage Tracking And Visualization Solution
Hivemall
⭐
508
Scalable machine learning library for Apache Hive/Spark/Pig
Shc
⭐
484
The Apache Spark - Apache HBase Connector is a library to support Spark accessing HBase table as external data source or sink.
Ballista
⭐
411
Experimental Distributed Compute Platform based on Kubernetes and Apache Arrow
Machinelearning
⭐
406
Machine Learning
Sparkler
⭐
401
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stockinference Spark
⭐
376
Stock inference engine using Spring XD, Apache Geode / GemFire and Spark MLlib.
Spark Training
⭐
365
Apache Spark training material
Graphx
⭐
353
Former GraphX development repository. GraphX has been merged into Apache Spark; please submit pull requests there.
Distributed Java
⭐
336
Distributed Java (《分布式 Java》, "Distributed Java").
Hyperspace
⭐
334
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Bahir
⭐
325
Mirror of Apache Bahir
Spark Standalone Cluster On Docker
⭐
311
Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ⚡
Every Single Day I Tldr
⭐
311
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Sparkflow
⭐
301
Easy to use library to bring Tensorflow on Apache Spark
Neo4j Spark Connector
⭐
300
Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Transport
⭐
288
A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Apache Hive, and Presto.
Azure Event Hubs
⭐
277
☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs
Spark Programming In Python
⭐
269
Apache Spark 3 - Spark Programming in Python for Beginners
Jpmml Sparkml
⭐
265
Java library and command-line application for converting Apache Spark ML pipelines to PMML
Sparkstreaming
⭐
253
Real-time logging implemented with Spark Streaming+Flume+Kafka+HBase+Hadoop+Zookeeper
Rust Dataframe
⭐
250
A Rust DataFrame implementation, built on Apache Arrow
Spark Indexedrdd
⭐
247
An efficient updatable key-value store for Apache Spark
Sql Spark Connector
⭐
242
Apache Spark Connector for SQL Server and Azure SQL
Succinct
⭐
239
Enabling queries on compressed data.
Learningapachespark
⭐
233
LearningApacheSpark
Joblib Spark
⭐
226
Joblib Apache Spark Backend
Azure Event Hubs Spark
⭐
225
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Ruby Spark
⭐
215
Ruby wrapper for Apache Spark
Spark_dbscan
⭐
215
DBSCAN clustering algorithm on top of Apache Spark
Spark Elastic
⭐
197
This project combines Apache Spark and Elasticsearch to enable mining & prediction for Elasticsearch.
Awesome Scala
⭐
197
A curated list of awesome Scala frameworks, libraries and software.
Spark Snowflake
⭐
196
Snowflake Data Source for Apache Spark.
Jupyter Spark
⭐
192
Jupyter Notebook extension for Apache Spark integration
Sparkrdma
⭐
191
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Vn.vitk
⭐
189
A Vietnamese Text Processing Toolkit
Aws Glue Data Catalog Client For Apache Hive Metastore
⭐
184
The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog
Spark.jl
⭐
180
Julia binding for Apache Spark
Sparkmonitor
⭐
164
Monitor Apache Spark from Jupyter Notebook
Incubator Wayang
⭐
162
Apache Wayang(incubating) is the first cross-platform data processing system.
Spark Authorizer
⭐
158
A Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Spark Metrics
⭐
150
Spark metrics related custom classes and sinks (e.g. Prometheus)
Dbscan On Spark
⭐
146
An implementation of DBSCAN running on top of Apache Spark
Spookystuff
⭐
137
Scalable query engine for web scraping/data mashup/acceptance QA, powered by Apache Spark
Apache Spark Node
⭐
134
Node.js bindings for Apache Spark DataFrame APIs
Spark Tsne
⭐
134
Distributed t-SNE via Apache Spark
Envelope
⭐
133
Build configuration-driven ETL pipelines on Apache Spark
Hdfs_fdw
⭐
131
PostgreSQL foreign data wrapper for HDFS
Mastering Apache Spark
⭐
130
This is the repository for my YouTube course on End to End Apache Spark on the AIEngineering YouTube channel
Example Spark Kafka
⭐
118
Apache Spark and Apache Kafka integration example
Docker Spark
⭐
118
Docker image for a general Apache Spark client
Pyspark Stubs
⭐
116
Apache (Py)Spark type annotations (stub files).
Spark Df Profiling
⭐
115
Create HTML profiling reports from Apache Spark DataFrames
Bdutil
⭐
114
[DEPRECATED] Script used to manage Hadoop and Spark instances on Google Compute Engine
Drizzle Spark
⭐
113
Drizzle integration with Apache Spark
Spark Atlas Connector
⭐
112
A Spark Atlas connector to track data lineage in Apache Atlas
Bunsen
⭐
110
Explore, transform, and analyze FHIR data with Apache Spark
Distributed Dataset
⭐
107
A distributed data processing framework in Haskell.
Frank Kanes Taming Big Data With Apache Spark And Python
⭐
106
Frank Kane's Taming Big Data with Apache Spark and Python, published by Packt
Ispark
⭐
104
An Apache Spark-shell backend for IPython
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Eclairjs Nashorn
⭐
94
JavaScript API for Apache Spark
1-100 of 536 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.