Awesome Open Source

Search results for scala big data

100 search results found

Spark ⭐ 37,661

Apache Spark - A unified analytics engine for large-scale data processing

Flink ⭐ 22,747

Bigdata Notes ⭐ 14,872

A beginner's guide to big data ⭐

Predictionio ⭐ 12,549

PredictionIO, a machine learning server for developers and ML engineers.

Cmak ⭐ 11,651

CMAK is a tool for managing Apache Kafka clusters

Zeppelin ⭐ 6,259

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Synapseml ⭐ 4,960

Simple and Distributed Machine Learning

Upserts, Deletes And Incremental Processing on Big Data.

Bigdataguide ⭐ 2,355

Big data learning: learn big data from scratch, including study videos for each stage of learning and interview materials

Carbondata ⭐ 1,401

High performance data store solution

Utils4s ⭐ 1,033

A collection of test cases and related materials gathered while using Scala and Spark

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

Sparkling Water ⭐ 957

Sparkling Water provides H2O functionality inside Spark cluster

Tispark ⭐ 872

TiSpark is built for running Apache Spark on top of TiDB/TiKV

Incubator Livy ⭐ 840

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

Mirror of Apache Samza

Gearpump ⭐ 758

Lightweight real-time big data streaming engine over Akka

Delta Sharing ⭐ 654

An open protocol for secure data sharing

Spark Rapids ⭐ 619

Spark RAPIDS plugin - accelerate Apache Spark with GPUs

Nussknacker ⭐ 564

Low-code tool for automating actions on real-time data | Stream processing for the users.

Data Lineage Tracking And Visualization Solution

Metorikku ⭐ 536

A simplified, lightweight ETL Framework based on Apache Spark

Magellan ⭐ 509

Geo Spatial Data Analytics on Spark

Kotlin Spark Api ⭐ 425

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x

Hyperspace ⭐ 334

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Morpheus ⭐ 329

Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.

Every Single Day I Tldr ⭐ 311

A daily digest of the articles or videos I've found interesting, that I want to share with you.

Serverless proxy for Spark cluster

Big Data Rosetta Code ⭐ 283

Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code

Predictionio Sdk Php ⭐ 269

PredictionIO PHP SDK

A Clojure dataframe library that runs on Spark

Parquet4s ⭐ 267

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

Succinct ⭐ 239

Enabling queries on compressed data.

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Azure Event Hubs Spark ⭐ 225

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Flink Notes ⭐ 223

Flink study notes

Bigdata ⭐ 219

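A learning path for big data processing technologies (continuously updated...). Bigdata notes --> taking it slowly~ The big data technologies covered include offline processing, real-time processing, OLAP, and so on, such as hadoop, spark, flink, hive,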

Predictionio Sdk Python ⭐ 198

PredictionIO Python SDK

Predictionio Sdk Ruby ⭐ 191

PredictionIO Ruby SDK

Sparkrdma ⭐ 191

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

A simple Spark-powered ETL framework that just works 🍺

Qbeast Spark ⭐ 171

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

Incubator Wayang ⭐ 162

Apache Wayang (incubating) is the first cross-platform data processing system.

Bigdata Playground ⭐ 154

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Spark On Lambda ⭐ 144

Apache Spark on AWS Lambda

Eel Sdk ⭐ 140

Big Data Toolkit for the JVM

Sparkling Graph ⭐ 134

SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX.

Flink Web ⭐ 133

Apache Flink Website

Flink Shaded ⭐ 130

Apache Flink shaded artifacts repository

Griffon Vm ⭐ 129

Griffon Data Science Virtual Machine

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.

Clustering4ever ⭐ 109

C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.

Spark Website ⭐ 109

Apache Spark Website

Predictionio Sdk Java ⭐ 106

PredictionIO Java SDK

Bigdata_project ⭐ 104

E-commerce big data project: recommender system (Java and Scala)

Mirror of Apache Crunch (Incubating)

Samza Hello Samza ⭐ 99

Mirror of Apache Samza

Cloudberry ⭐ 89

Big Data Visualization

Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange

Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.

Rocket Bi ⭐ 79

A free, open-source, web-based self-service BI tool tailor-made for ClickHouse, Google BigQuery, MySQL, PostgreSQL, and Vertica

Spark Acid ⭐ 79

ACID Data Source for Apache Spark based on Hive ACID

Predictionio Template Recommender ⭐ 78

PredictionIO Recommendation Engine Template (Scala-based parallelized engine)

Type safety for spark columns

Predictionio Template Ecom Recommender ⭐ 72

PredictionIO E-Commerce Recommendation Engine Template (Scala-based parallelized engine)

Cleanframes ⭐ 70

type-class based data cleansing library for Apache Spark SQL

Awesome Ai Kubernetes ⭐ 62

❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc

Salt Core ⭐ 60

Looking at big data? Add a little salt.

Spark Records ⭐ 58

Bulletproof Apache Spark jobs with fast root cause analysis of failures.

Spark Docker ⭐ 53

Official Dockerfile for Apache Spark

Spark Select ⭐ 53

A library for Spark DataFrame using MinIO Select API

Predictionio Template Similar Product ⭐ 50

PredictionIO Similar Product Engine Template (Scala-based parallelized engine)

Spark Betweenness ⭐ 44

k Betweenness Centrality algorithm for Spark using GraphX

Docker Spark Cluster ⭐ 44

A Spark cluster setup running on Docker containers

Yuzhouwan ⭐ 42

Code Library for My Blog

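A collection of books covering C & C++, git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.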

Flink Book ⭐ 38

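Learning materials on big data, stream computing, real-time computing, and the Flink framework. Companion code for the best-selling book 《深入理解Flink核心设计与实践原理》; every Flink feature explained in the book comes with complete, runnable code for readers to run and test. The whole project contains [182 Jav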

Predictionio Template Attribute Based Classifier ⭐ 38

PredictionIO Classification Engine Template (Scala-based parallelized engine)

Predictionio Template Java Ecom Recommender ⭐ 37

PredictionIO E-Commerce Recommendation Engine Template (Java-based parallelized engine)

Sharpetl ⭐ 36

Write ETL using your favorite SQL dialects

Sparkdemo ⭐ 34

Comprehensive Spark example code (Java, Scala)

Predictionio Template Text Classifier ⭐ 33

Text Classification Engine

Intelligent Data Exploration Service: a one-stop Data + AI data solution!

Telemetry Batch View ⭐ 32

A Scala framework to build derived datasets, aka batch views, of Telemetry data.

Enceladus ⭐ 28

Dynamic Conformance Engine

Predictionio Template Skeleton ⭐ 25

PredictionIO vanilla engine template (Scala-based parallelized engine)

Spark Root ⭐ 24

Apache Spark Data Source for ROOT File Format

Movies Analytics In Spark And Scala ⭐ 24

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Jetprobe ⭐ 24

🚀 Validation DSL for data pipelines

Sparkucx ⭐ 23

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer

Insightedge ⭐ 22

InsightEdge Core

Scala Polars ⭐ 21

Polars for Scala & Java projects!

Data Flare ⭐ 21

Data quality control tool built on spark and deequ

Materials for Bigdatatech Con - Boston

Bigdata Project ⭐ 20

Notes on big data

Bandar Log ⭐ 20

Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.

Resilient data pipeline framework running on Apache Spark

Akka Http File Server ⭐ 19

akka-http file server for large file upload/download

Sparkprogramminginscala ⭐ 18

Apache Spark Course Material
