Awesome Open Source
Search results for scala spark
1,206 search results found
Spark
⭐
37,661
Apache Spark - A unified analytics engine for large-scale data processing
Bigdata Notes
⭐
14,872
A beginner's guide to big data ⭐
Deeplearning4j
⭐
13,873
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learn...
It_book
⭐
8,543
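This project collects more than a thousand good, commonly used books that I have read or heard about over the years; the book you are looking for may well be here. It covers the internet...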
Angel
⭐
6,690
A Flexible and Powerful Parameter Server for large-scale machine learning
Synapseml
⭐
5,108
Simple and Distributed Machine Learning
Bigdl
⭐
4,728
Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using bigdl-llm
Tensorflowonspark
⭐
3,851
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Learning Spark
⭐
3,804
Example code from Learning Spark book
Spark Nlp
⭐
3,578
State of the Art Natural Language Processing
Deequ
⭐
3,044
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Spark Jobserver
⭐
2,837
REST job server for Apache Spark
Analytics Zoo
⭐
2,592
Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
Scio
⭐
2,505
A Scala API for Apache Beam and Google Cloud Dataflow.
Bigdataguide
⭐
2,355
Big data learning: study big data from scratch, including learning videos and interview materials for each stage of big data study
Ballista
⭐
2,244
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Zio Quill
⭐
2,152
Compile-time Language Integrated Queries for Scala
Szt Bigdata
⭐
2,055
Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟
Spark Cassandra Connector
⭐
1,929
DataStax Connector for Apache Spark to Apache Cassandra
Kyuubi
⭐
1,849
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Docker Spark
⭐
1,783
Apache Spark docker image
Almond
⭐
1,611
A Scala kernel for Jupyter
Mleap
⭐
1,479
MLeap: Deploy ML Pipelines to Production
Spark Testing Base
⭐
1,475
Base classes to use when writing tests with Spark
Awesome Spark
⭐
1,461
A curated list of awesome Apache Spark packages and resources.
Movie_recommend
⭐
1,441
A Spark-based movie recommendation system, including a web crawler project, a website, an admin back end, and the Spark recommendation system
Carbondata
⭐
1,401
High performance data store solution
Ecosystem
⭐
1,289
Integration of TensorFlow with other open-source frameworks
Utils4s
⭐
1,033
A collection of test cases and related materials gathered while using Scala and Spark
Spark Csv
⭐
1,009
CSV Data Source for Apache Spark 1.x
Data Algorithms Book
⭐
973
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Adam
⭐
966
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Sparkling Water
⭐
957
Sparkling Water provides H2O functionality inside Spark cluster
Graphframes
⭐
944
Spark Redis
⭐
926
A connector for Spark that allows reading and writing to/from Redis cluster
Spark Scala Tutorial
⭐
922
A free tutorial for Apache Spark.
Livy
⭐
911
Livy is an open source REST interface for interacting with Apache Spark from anywhere
Sparkctr
⭐
909
CTR prediction model based on Spark (LR, GBDT, DNN)
Frameless
⭐
882
Expressive types for Spark.
Tispark
⭐
872
TiSpark is built for running Apache Spark on top of TiDB/TiKV
Incubator Livy
⭐
840
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Flint
⭐
796
A Time Series Library for Apache Spark
Scriptis
⭐
767
Scriptis is for interactive data analysis with script development (SQL, Pyspark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Kafka Storm Starter
⭐
729
[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Incubator Toree
⭐
721
Mirror of Apache Toree (Incubating)
Pkpmspark
⭐
697
awesome 3D data mining, data analysis & recommendation
Fregata
⭐
674
A lightweight, super fast, large-scale machine learning library on Spark.
Delta Sharing
⭐
654
An open protocol for secure data sharing
Justenoughscalaforspark
⭐
643
A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.
Blinkdb
⭐
625
BlinkDB: Sub-Second Approximate Queries on Very Large Data.
Spark Rapids
⭐
619
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Pythondatascience Collections
⭐
615
The most complete collection of data analysis resources (including Python, web crawlers, databases, big data, Tableau, statistics, etc.)
Reference Apps
⭐
615
Spark reference applications
Freestyle
⭐
615
A cohesive & pragmatic framework of FP centric Scala libraries
Sparkmeasure
⭐
603
This is the development repository for sparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task and stage metrics data.
Sparknet
⭐
601
Distributed Neural Networks for Spark
Sparklearning
⭐
573
Learning Apache Spark, including code and data. Most parts can run locally.
Learningsparkv2
⭐
570
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Spline
⭐
553
Data Lineage Tracking And Visualization Solution
Metorikku
⭐
536
A simplified, lightweight ETL Framework based on Apache Spark
Spark Avro
⭐
535
Avro Data Source for Apache Spark
Eat_pyspark_in_10_days
⭐
534
pyspark🍒🥭 is delicious, just eat it!😋😋
Sparta
⭐
525
Real Time Analytics and Data Pipelines based on Spark Streaming
Sparklens
⭐
520
Qubole Sparklens tool for performance tuning Apache Spark
Spark Redshift
⭐
514
Redshift data source for Apache Spark
Magellan
⭐
509
Geo Spatial Data Analytics on Spark
Clickhouse Native Jdbc
⭐
502
ClickHouse Native Protocol JDBC implementation
Piflow
⭐
498
πflow is a big data flow engine with spark support
Shc
⭐
484
The Apache Spark - Apache HBase Connector is a library to support Spark accessing HBase table as external data source or sink.
Keystone
⭐
472
Simplifying robust end-to-end machine learning on Apache Spark.
Featran
⭐
465
A Scala feature transformation library for data science and machine learning
Streamdm
⭐
456
Stream Data Mining Library for Spark Streaming
Spark Sql Perf
⭐
452
Spark Scala Examples
⭐
443
This project provides Apache Spark SQL, RDD, DataFrame and Dataset examples in Scala language
Kotlin Spark Api
⭐
425
This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Learningspark
⭐
406
Scala examples for learning to use Spark
Spark Fast Tests
⭐
385
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
Connectors
⭐
383
This library allows Scala and Java-based projects (including Apache Flink, Apache Hive, Apache Beam, and PrestoDB) to read from and write to Delta Lake.
Spark Training
⭐
365
Apache Spark training material
Graphx
⭐
353
Former GraphX development repository. GraphX has been merged into Apache Spark; please submit pull requests there.
Spark Jobserver
⭐
348
REST job server for Spark. Note that this is *not* the mainline open source version. For that, go to https://github.com/spark-jobserver/spark-jobserver. This fork now serves as a semi-private repo for Ooyala.
Spark Perf
⭐
346
Performance tests for Apache Spark
Hyperspace
⭐
334
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Bahir
⭐
325
Mirror of Apache Bahir
Cloudflow
⭐
323
Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Spark Sql On Hbase
⭐
319
Native, optimized access to HBase Data through Spark SQL/Dataframe Interfaces
Spark Standalone Cluster On Docker
⭐
311
Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ⚡
Every Single Day I Tldr
⭐
311
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Spark Mongodb
⭐
308
Spark library for easy MongoDB access
Koober
⭐
301
Neo4j Spark Connector
⭐
300
Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Sparklint
⭐
293
A tool for monitoring and tuning Spark jobs for efficiency.
Spark Hbase Connector
⭐
287
Connect Spark to HBase for reading and writing data with ease
Sagemaker Spark
⭐
285
A Spark library for Amazon SageMaker.
Big Data Rosetta Code
⭐
283
Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Akka Analytics
⭐
281
Large-scale event processing with Akka Persistence and Apache Spark
Hbase Rdd
⭐
278
Spark RDD to read, write and delete from HBase
Geni
⭐
268
A Clojure dataframe library that runs on Spark
Spark Tfrecord
⭐
255
Read and write Tensorflow TFRecord data from Apache Spark.
Sparkstreaming
⭐
253
Real-time logging implemented with Spark Streaming + Flume + Kafka + HBase + Hadoop + Zookeeper
Related Searches
Scala Sbt (4,158)
Scala Akka (2,120)
Python Spark (2,053)
Java Scala (1,794)
Java Spark (1,587)
Scala Play Framework (1,309)
Apache Spark (1,207)
Spark Hadoop (1,188)
Jupyter Notebook Spark (1,151)
Plugin Scala (1,079)
1-100 of 1,206 search results
Copyright 2018-2025 Awesome Open Source. All rights reserved.