Awesome Open Source
Search results for scala big data
big-data
scala
86 search results found
Spark
⭐
37,661
Apache Spark - A unified analytics engine for large-scale data processing
Flink
⭐
22,747
Apache Flink
Bigdata Notes
⭐
14,872
A beginner's guide to big data ⭐
Predictionio
⭐
12,541
PredictionIO, a machine learning server for developers and ML engineers.
Cmak
⭐
11,908
CMAK is a tool for managing Apache Kafka clusters
Synapseml
⭐
5,133
Simple and Distributed Machine Learning
Bigdataguide
⭐
2,355
Big data learning: study big data from scratch, with learning videos and interview materials for each stage
Carbondata
⭐
1,401
High performance data store solution
Utils4s
⭐
1,033
A collection of test cases and related materials gathered while using Scala and Spark
Adam
⭐
966
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Sparkling Water
⭐
957
Sparkling Water provides H2O functionality inside Spark cluster
Tispark
⭐
872
TiSpark is built for running Apache Spark on top of TiDB/TiKV
Incubator Livy
⭐
840
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Samza
⭐
792
Mirror of Apache Samza
Gearpump
⭐
758
Lightweight real-time big data streaming engine over Akka
Delta Sharing
⭐
654
An open protocol for secure data sharing
Spark Rapids
⭐
619
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Nussknacker
⭐
564
Low-code tool for automating actions on real time data | Stream processing for the users.
Spline
⭐
553
Data Lineage Tracking And Visualization Solution
Metorikku
⭐
536
A simplified, lightweight ETL Framework based on Apache Spark
Magellan
⭐
509
Geo Spatial Data Analytics on Spark
Kotlin Spark Api
⭐
425
This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Morpheus
⭐
339
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Hyperspace
⭐
334
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Every Single Day I Tldr
⭐
311
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Big Data Rosetta Code
⭐
283
Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Predictionio Sdk Php
⭐
269
PredictionIO PHP SDK
Geni
⭐
268
A Clojure dataframe library that runs on Spark
Parquet4s
⭐
267
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Succinct
⭐
239
Enabling queries on compressed data.
Gimel
⭐
230
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Azure Event Hubs Spark
⭐
225
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Flink Notes
⭐
223
Flink study notes
Bigdata
⭐
219
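A learning path for big data processing technologies (continuously updated...). Bigdata notes --> taking it slowly~ Big data technologies include offline processing, real-time processing, OLAP, etc., such as Hadoop, Spark, Flink, Hive,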
Sparkrdma
⭐
191
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Predictionio Sdk Ruby
⭐
190
PredictionIO Ruby SDK
Setl
⭐
181
A simple Spark-powered ETL framework that just works 🍺
Qbeast Spark
⭐
171
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
Bigdata Playground
⭐
154
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Spark On Lambda
⭐
144
Apache Spark on AWS Lambda
Eel Sdk
⭐
140
Big Data Toolkit for the JVM
Sparkling Graph
⭐
134
SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.
Flink Web
⭐
133
Apache Flink Website
Flink Shaded
⭐
130
Apache Flink shaded artifacts repository
Clustering4ever
⭐
109
C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
Predictionio Sdk Java
⭐
106
PredictionIO Java SDK
Crunch
⭐
100
Mirror of Apache Crunch (Incubating)
Cloudberry
⭐
89
Big Data Visualization
Splash
⭐
86
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Flowman
⭐
85
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Spark Acid
⭐
79
ACID Data Source for Apache Spark based on Hive ACID
Rocket Bi
⭐
79
A free, open-source, web-based self-service BI tailor-made for ClickHouse, Google BigQuery, MySQL, PostgreSQL, Vertica
Predictionio Template Recommender
⭐
78
PredictionIO Recommendation Engine Template (Scala-based parallelized engine)
Predictionio Template Ecom Recommender
⭐
72
PredictionIO E-Commerce Recommendation Engine Template (Scala-based parallelized engine)
Cleanframes
⭐
70
type-class based data cleansing library for Apache Spark SQL
Awesome Ai Kubernetes
⭐
62
❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Salt Core
⭐
60
Looking at big data? Add a little salt.
Spark Records
⭐
58
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Spark Docker
⭐
53
Official Dockerfile for Apache Spark
Predictionio Template Similar Product
⭐
50
PredictionIO Similar Product Engine Template (Scala-based parallelized engine)
Spark Betweenness
⭐
44
k Betweenness Centrality algorithm for Spark using GraphX
Docker Spark Cluster
⭐
44
A Spark cluster setup running on Docker containers
Yuzhouwan
⭐
42
Code Library for My Blog
Books
⭐
39
A collection of books covering C and C++, git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.
Predictionio Template Attribute Based Classifier
⭐
38
PredictionIO Classification Engine Template (Scala-based parallelized engine)
Flink Book
⭐
38
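Learning materials for big data, stream computing, real-time computing, and the Flink framework. Companion code for the best-selling book 《深入理解Flink核心设计与实践原理》 (In-Depth Understanding of Flink Core Design and Practice Principles); every Flink feature explained in the book comes with complete, runnable code for readers to run and test. The whole project contains [182 Jav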
Predictionio Template Java Ecom Recommender
⭐
37
PredictionIO E-Commerce Recommendation Engine Template (Java-based parallelized engine)
Sharpetl
⭐
36
Write ETL using your favorite SQL dialects
Sparkdemo
⭐
34
Complete Spark example code (Java, Scala)
Predictionio Template Text Classifier
⭐
33
Text Classification Engine
Ides
⭐
32
Intelligent Data Exploration Service: a one-stop Data + AI data solution!
Enceladus
⭐
28
Dynamic Conformance Engine
Spark Root
⭐
24
Apache Spark Data Source for ROOT File Format
Movies Analytics In Spark And Scala
⭐
24
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Sparkucx
⭐
23
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Insightedge
⭐
22
InsightEdge Core
Data Flare
⭐
21
Data quality control tool built on Spark and Deequ
Bigdata
⭐
21
Materials for Bigdatatech Con - Boston
Bigdata Project
⭐
20
Big data notes
Pramen
⭐
20
Resilient data pipeline framework running on Apache Spark
Akka Http File Server
⭐
19
akka-http file server for large file upload/download
Centrifuge
⭐
18
Data quality tools for Big Data
Sparkprogramminginscala
⭐
18
Apache Spark Course Material
Spark Greenplum Connector
⭐
18
ITSumma Spark Greenplum Connector
Pyspark
⭐
15
Spark (Scala and Python)
Bigdata
⭐
15
A beginner's big data study notes ⭐
Spark Streaming Monitoring With Lightning
⭐
15
Plot live stats as graphs from an Apache Spark application using Lightning-viz
Bigdata Learning
⭐
14
Big data learning, mainly covering Kafka, ZooKeeper, Hive, HBase, and Spark
Spark Genome Alignment Demo
⭐
13
An example of how bioinformatics and big data tools can play nicely together
Bigdata_docker
⭐
13
Big Data Docker Data Science Spark Spark3 Hadoop HDFS Scala Python Artificial Intelligence Machine Learning Jupyter Lab Notebook
Kafka Manager
⭐
12
Kafka Manager - A tool for managing Apache Kafka.
Easterbunny
⭐
11
EasterBunny data analysis
Scray
⭐
11
Lambda Architecture Framework for Big Data, Spark, Versioned Data, NoSQL and SQL-Stores.
Bigdata Examples
⭐
9
Big data examples for Spark and Flink
Yasp
⭐
9
Yet Another SPark Framework
Metadata Digger
⭐
9
Big Data tool for metadata extraction (Exif), enrichment (using DeepLearning) and analysis
Lambdaconf 2017 Bigdata
⭐
9
Materials for "Big Data Pipelines with Scala" Workshop at LambdaConf 2017
Bigdatademo
⭐
9
A demo of using Kafka, Spark, Hive, Cassandra, etc. with Docker. It produces a production-ready environment for any kind of big data project related to the Hadoop ecosystem
Pygmql
⭐
9
Python Library for data analysis based on GMQL
Clusterindices
⭐
9
This package contains the code for executing clustering validity indices in Spark. The package includes BD-Silhouette, BD-Dunn, Davies-Bouldin and WSSSE indices.
Related Searches
Scala Sbt (4,178)
Scala Spark (3,279)
Scala Akka (2,120)
Java Scala (1,794)
Scala Play Framework (1,309)
Plugin Scala (1,079)
Scala Kafka (969)
Scala Functional Programming (942)
Scala Scalajs (887)
Scala Apache (705)
Copyright 2018-2025 Awesome Open Source. All rights reserved.