Awesome Open Source
Search results for scala big data
100 search results found
Spark
⭐
37,661
Apache Spark - A unified analytics engine for large-scale data processing
Flink
⭐
22,747
Apache Flink
Bigdata Notes
⭐
14,872
Introductory guide to big data ⭐
Predictionio
⭐
12,549
PredictionIO, a machine learning server for developers and ML engineers.
Cmak
⭐
11,651
CMAK is a tool for managing Apache Kafka clusters
Zeppelin
⭐
6,259
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Synapseml
⭐
4,960
Simple and Distributed Machine Learning
Hudi
⭐
4,901
Upserts, Deletes And Incremental Processing on Big Data.
Bigdataguide
⭐
2,355
Big data learning: learn big data from scratch, including study videos for each learning stage and interview materials
Carbondata
⭐
1,401
High performance data store solution
Utils4s
⭐
1,033
A collection of test cases and related materials gathered while using Scala and Spark
Adam
⭐
966
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Sparkling Water
⭐
957
Sparkling Water provides H2O functionality inside Spark cluster
Tispark
⭐
872
TiSpark is built for running Apache Spark on top of TiDB/TiKV
Incubator Livy
⭐
840
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Samza
⭐
792
Mirror of Apache Samza
Gearpump
⭐
758
Lightweight real-time big data streaming engine over Akka
Delta Sharing
⭐
654
An open protocol for secure data sharing
Spark Rapids
⭐
619
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Nussknacker
⭐
564
Low-code tool for automating actions on real time data | Stream processing for the users.
Spline
⭐
553
Data Lineage Tracking And Visualization Solution
Metorikku
⭐
536
A simplified, lightweight ETL Framework based on Apache Spark
Magellan
⭐
509
Geo Spatial Data Analytics on Spark
Kotlin Spark Api
⭐
425
This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Hyperspace
⭐
334
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Morpheus
⭐
329
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Every Single Day I Tldr
⭐
311
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Mist
⭐
303
Serverless proxy for Spark cluster
Big Data Rosetta Code
⭐
283
Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Predictionio Sdk Php
⭐
269
PredictionIO PHP SDK
Geni
⭐
268
A Clojure dataframe library that runs on Spark
Parquet4s
⭐
267
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Succinct
⭐
239
Enabling queries on compressed data.
Gimel
⭐
230
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Azure Event Hubs Spark
⭐
225
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Flink Notes
⭐
223
Flink study notes
Bigdata
⭐
219
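A learning path for big data processing technologies (continuously updated...). Bigdata notes --> slowly but surely~ Big data technologies include offline processing, real-time processing, OLAP, etc., such as hadoop, spark, flink, hive,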
Predictionio Sdk Python
⭐
198
PredictionIO Python SDK
Predictionio Sdk Ruby
⭐
191
PredictionIO Ruby SDK
Sparkrdma
⭐
191
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Setl
⭐
173
A simple Spark-powered ETL framework that just works 🍺
Qbeast Spark
⭐
171
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
Incubator Wayang
⭐
162
Apache Wayang(incubating) is the first cross-platform data processing system.
Bigdata Playground
⭐
154
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL
Spark On Lambda
⭐
144
Apache Spark on AWS Lambda
Eel Sdk
⭐
140
Big Data Toolkit for the JVM
Sparkling Graph
⭐
134
SparklingGraph provides an easy-to-use set of features that give you the ability to process large-scale graphs using Spark and GraphX.
Flink Web
⭐
133
Apache Flink Website
Flink Shaded
⭐
130
Apache Flink shaded artifacts repository
Griffon Vm
⭐
129
Griffon Data Science Virtual Machine
Aut
⭐
128
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Maha
⭐
126
A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Clustering4ever
⭐
109
C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
Spark Website
⭐
109
Apache Spark Website
Predictionio Sdk Java
⭐
106
PredictionIO Java SDK
Bigdata_project
⭐
104
E-commerce big data project: recommender system (Java and Scala)
Crunch
⭐
100
Mirror of Apache Crunch (Incubating)
Samza Hello Samza
⭐
99
Mirror of Apache Samza
Cloudberry
⭐
89
Big Data Visualization
Splash
⭐
86
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Flowman
⭐
85
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Rocket Bi
⭐
79
A free, open-source, web-based self-service BI tailor-made for clickhouse, google bigquery, mysql, postgresql, vertica
Spark Acid
⭐
79
ACID Data Source for Apache Spark based on Hive ACID
Predictionio Template Recommender
⭐
78
PredictionIO Recommendation Engine Template (Scala-based parallelized engine)
Doric
⭐
73
Type safety for spark columns
Predictionio Template Ecom Recommender
⭐
72
PredictionIO E-Commerce Recommendation Engine Template (Scala-based parallelized engine)
Cleanframes
⭐
70
type-class based data cleansing library for Apache Spark SQL
Awesome Ai Kubernetes
⭐
62
❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Salt Core
⭐
60
Looking at big data? Add a little salt.
Spark Records
⭐
58
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Spark Docker
⭐
53
Official Dockerfile for Apache Spark
Spark Select
⭐
53
A library for Spark DataFrame using MinIO Select API
Predictionio Template Similar Product
⭐
50
PredictionIO Similar Product Engine Template (Scala-based parallelized engine)
Spark Betweenness
⭐
44
k Betweenness Centrality algorithm for Spark using GraphX
Docker Spark Cluster
⭐
44
A Spark cluster setup running on Docker containers
Yuzhouwan
⭐
42
Code Library for My Blog
Books
⭐
39
A collection of books covering C & C++, git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.
Flink Book
⭐
38
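Big data, stream computing, real-time computing: learning materials for the Flink framework. Companion code for the best-selling book 《深入理解Flink核心设计与实践原理》 (Understanding Flink's Core Design and Practical Principles in Depth); every Flink feature explained in the book comes with complete, runnable code for readers to run and test. The whole project contains [182 Jav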
Predictionio Template Attribute Based Classifier
⭐
38
PredictionIO Classification Engine Template (Scala-based parallelized engine)
Predictionio Template Java Ecom Recommender
⭐
37
PredictionIO E-Commerce Recommendation Engine Template (Java-based parallelized engine)
Sharpetl
⭐
36
Write ETL using your favorite SQL dialects
Sparkdemo
⭐
34
Complete Spark example code demo (Java, Scala)
Predictionio Template Text Classifier
⭐
33
Text Classification Engine
Ides
⭐
32
Intelligent Data Exploration Service, a one-stop Data + AI data solution!
Telemetry Batch View
⭐
32
A Scala framework to build derived datasets, aka batch views, of Telemetry data.
Enceladus
⭐
28
Dynamic Conformance Engine
Predictionio Template Skeleton
⭐
25
PredictionIO vanilla engine template (Scala-based parallelized engine)
Spark Root
⭐
24
Apache Spark Data Source for ROOT File Format
Movies Analytics In Spark And Scala
⭐
24
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Jetprobe
⭐
24
🚀 Validation DSL for data pipelines
Sparkucx
⭐
23
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Insightedge
⭐
22
InsightEdge Core
Scala Polars
⭐
21
Polars for Scala & Java projects!
Data Flare
⭐
21
Data quality control tool built on spark and deequ
Bigdata
⭐
21
Materials for Bigdatatech Con - Boston
Bigdata Project
⭐
20
Big data notes
Bandar Log
⭐
20
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Pramen
⭐
20
Resilient data pipeline framework running on Apache Spark
Akka Http File Server
⭐
19
akka-http file server for large file upload/download
Sparkprogramminginscala
⭐
18
Apache Spark Course Material
Related Searches
Scala Sbt (4,178)
Scala Spark (3,279)
Scala Akka (2,120)
Java Scala (1,794)
Scala Play Framework (1,309)
Plugin Scala (1,079)
Scala Kafka (969)
Scala Functional Programming (942)
Scala Scalajs (887)
Scala Apache (705)
Copyright 2018-2024 Awesome Open Source. All rights reserved.