Awesome Open Source
Search results for scala spark
2,251 search results found
Spark
⭐
35,923
Apache Spark - A unified analytics engine for large-scale data processing
Bigdata Notes
⭐
13,291
A beginner's guide to big data ⭐️
Deeplearning4j
⭐
12,965
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.
It_book
⭐
8,543
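A collection of over a thousand good, commonly used books that the author has read or heard about over the years; the book you are looking for may well be here. Coverage includes the internet…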
Angel
⭐
6,657
A Flexible and Powerful Parameter Server for large-scale machine learning
Zeppelin
⭐
6,060
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Delta
⭐
6,041
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Synapseml
⭐
4,295
Simple and Distributed Machine Learning
Hudi
⭐
4,255
Upserts, Deletes And Incremental Processing on Big Data.
Bigdl
⭐
4,223
Fast, distributed, secure AI for Big Data
Tensorflowonspark
⭐
3,851
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Learning Spark
⭐
3,804
Example code from Learning Spark book
Spark Nlp
⭐
3,273
State of the Art Natural Language Processing
Spark Notebook
⭐
3,138
Interactive and Reactive Data Science using Scala and Spark.
Deequ
⭐
2,806
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Spark Jobserver
⭐
2,735
REST job server for Apache Spark
Analytics Zoo
⭐
2,565
Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
Scio
⭐
2,454
A Scala API for Apache Beam and Google Cloud Dataflow.
Ballista
⭐
2,244
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Zio Quill
⭐
2,125
Compile-time Language Integrated Queries for Scala
Transmogrifai
⭐
2,099
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Bigdataguide
⭐
1,994
Big data learning: learn big data from scratch, including study videos for each learning stage and interview materials
Spark Cassandra Connector
⭐
1,903
DataStax Spark Cassandra Connector
Docker Spark
⭐
1,783
Apache Spark docker image
Szt Bigdata
⭐
1,702
Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟
Kyuubi
⭐
1,627
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Spark The Definitive Guide
⭐
1,558
Spark: The Definitive Guide's Code Repository
Almond
⭐
1,516
A Scala kernel for Jupyter
Aas
⭐
1,493
Code to accompany Advanced Analytics with Spark from O'Reilly Media
Awesome Spark
⭐
1,461
A curated list of awesome Apache Spark packages and resources.
Mleap
⭐
1,454
MLeap: Deploy ML Pipelines to Production
Movie_recommend
⭐
1,441
A Spark-based movie recommendation system, including a crawler project, a website, an admin back-end system, and the Spark recommendation system
Spark Testing Base
⭐
1,427
Base classes to use when writing tests with Spark
Carbondata
⭐
1,364
High performance data store solution
Lakesoul
⭐
1,304
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Ecosystem
⭐
1,289
Integration of TensorFlow with other open-source frameworks
Geotrellis
⭐
1,269
GeoTrellis is a geographic data processing engine for high performance applications.
Killrweather
⭐
1,174
KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka for fast, streaming computations on time series data in asynchronous event-driven environments.
Utils4s
⭐
1,033
A collection of test cases and related materials gathered while using Scala and Spark
Pixiedust
⭐
1,030
Python Helper library for Jupyter Notebooks
Spark Csv
⭐
1,009
CSV Data Source for Apache Spark 1.x
Data Algorithms Book
⭐
973
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Adam
⭐
946
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Sparkling Water
⭐
945
Sparkling Water provides H2O functionality inside a Spark cluster
Spark Scala Tutorial
⭐
922
A free tutorial for Apache Spark.
Graphframes
⭐
922
Livy
⭐
911
Livy is an open source REST interface for interacting with Apache Spark from anywhere
Spark Redis
⭐
885
A connector for Spark that allows reading and writing to/from Redis cluster
Tispark
⭐
856
TiSpark is built for running Apache Spark on top of TiDB/TiKV
Frameless
⭐
851
Expressive types for Spark.
Flint
⭐
796
A Time Series Library for Apache Spark
Extraction Framework
⭐
792
The software used to extract structured data from Wikipedia
Incubator Livy
⭐
773
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Sparkctr
⭐
769
CTR prediction models based on Spark (LR, GBDT, DNN)
Scriptis
⭐
767
Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Kafka Storm Starter
⭐
729
[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Incubator Toree
⭐
707
Mirror of Apache Toree (Incubating)
Pkpmspark
⭐
697
awesome 3D data mining, data analysis & recommendation
Vegas
⭐
682
The missing MatPlotLib for Scala + Spark
Fregata
⭐
674
A lightweight, super fast, large-scale machine learning library on Spark.
Justenoughscalaforspark
⭐
643
A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.
Blinkdb
⭐
625
BlinkDB: Sub-Second Approximate Queries on Very Large Data.
Reference Apps
⭐
615
Spark reference applications
Pythondatascience Collections
⭐
615
The most complete collection of data analysis resources (including Python, web crawlers, databases, big data, Tableau, statistics, etc.)
Freestyle
⭐
615
A cohesive & pragmatic framework of FP centric Scala libraries
Sparknet
⭐
601
Distributed Neural Networks for Spark
Delta Sharing
⭐
590
An open protocol for secure data sharing
Sparklearning
⭐
573
Learning Apache Spark, including code and data. Most parts can run locally.
Learningsparkv2
⭐
570
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Sparkmeasure
⭐
561
This is the development repository for sparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task and stage metrics data.
Spark Rapids
⭐
540
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Metorikku
⭐
536
A simplified, lightweight ETL Framework based on Apache Spark
Spark Avro
⭐
535
Avro Data Source for Apache Spark
Eat_pyspark_in_10_days
⭐
534
pyspark🍒🥭 is delicious, just eat it!😋😋
Sparta
⭐
524
Real Time Analytics and Data Pipelines based on Spark Streaming
Spark Redshift
⭐
514
Redshift data source for Apache Spark
Magellan
⭐
509
Geo Spatial Data Analytics on Spark
Spline
⭐
503
Data Lineage Tracking And Visualization Solution
Piflow
⭐
485
πflow is a big data flow engine with spark support
Clickhouse Native Jdbc
⭐
484
ClickHouse Native Protocol JDBC implementation
Shc
⭐
484
The Apache Spark - Apache HBase Connector is a library to support Spark accessing HBase table as external data source or sink.
Keystone
⭐
472
Simplifying robust end-to-end machine learning on Apache Spark.
Featran
⭐
465
A Scala feature transformation library for data science and machine learning
Streamdm
⭐
456
Stream Data Mining Library for Spark Streaming
Sparklens
⭐
454
Qubole Sparklens tool for performance tuning Apache Spark
Spark Sql Perf
⭐
452
Spark Xml
⭐
446
XML data source for Spark SQL and DataFrames
Spark Scala Examples
⭐
443
This project provides Apache Spark SQL, RDD, DataFrame and Dataset examples in Scala language
Spark Solr
⭐
440
Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.
Spark Corenlp
⭐
409
Stanford CoreNLP wrapper for Apache Spark
Learningspark
⭐
406
Scala examples for learning to use Spark
Spark Fast Tests
⭐
385
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
Kotlin Spark Api
⭐
379
This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Connectors
⭐
377
This library allows Scala and Java-based projects (including Apache Flink, Apache Hive, Apache Beam, and PrestoDB) to read from and write to Delta Lake.
Spark Excel
⭐
370
A Spark plugin for reading and writing Excel files
Spark Training
⭐
365
Apache Spark training material
Graphx
⭐
353
Former GraphX development repository. GraphX has been merged into Apache Spark; please submit pull requests there.
Spark Jobserver
⭐
348
REST job server for Spark. Note that this is *not* the mainline open source version. For that, go to https://github.com/spark-jobserver/spark-jobserver. This fork now serves as a semi-private repo for Ooyala.
Spark Perf
⭐
346
Performance tests for Apache Spark
Hyperspace
⭐
334
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
1-100 of 2,251 search results