Awesome Open Source


Search results for spark flink

126 search results found

Bigdata Notes ⭐ 14,872

Big data getting-started guide ⭐

Flink Learning ⭐ 13,801

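Flink learning blog: http://www.54tianzhisheng.cn/. It covers Flink fundamentals, concepts, internals, hands-on practice, performance tuning, and source code analysis. It includes learning examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, and shares large production projects built on Flink (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Please support my column 《Flink 实战与性能优化》 (Flink in Practice and Performance Optimization).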

God Of Bigdata ⭐ 8,483

Focused on big data learning and interviews: start your path to big data mastery. Flink/Spark/Hadoop/HBase/Hive.

Zeppelin ⭐ 6,259

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Risingwave ⭐ 5,799

The distributed streaming database. Engineered to offer the simplest and most cost-efficient way for stream processing and management.

Iceberg ⭐ 5,179

Dataspherestudio ⭐ 2,860

DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Analytics Zoo ⭐ 2,592

Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray

Bigdataguide ⭐ 2,355

Big data learning from scratch, with study videos and interview materials for every stage of learning big data.

Lakesoul ⭐ 2,248

LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.

Szt Bigdata ⭐ 2,055

Big data passenger-flow analysis system for the Shenzhen Metro 🚇🚄🌟

Quicksql ⭐ 1,939

A Flexible, Fast, Federated (3F) SQL Analysis Middleware for Multiple Data Sources

Incubator Paimon ⭐ 1,647

Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.

Bigdata Interview ⭐ 1,397

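🎯 🌟 [Big data interview questions] A collection of big data interview questions I gathered online, with my own summarized answers. Currently includes Hadoop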

Bigdata Growth ⭐ 1,256

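A big data knowledge base covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.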

Taier ⭐ 1,220

Taier is a big data development platform for submission, scheduling, operation and maintenance, and indicator information display

Coding Now ⭐ 925

Study notes, plus eBooks and video resources I have read and blogs I collected and consider worthwhile,

Hadoop_study ⭐ 817

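Regularly updated documentation for commonly used big data components in the Hadoop ecosystem, focusing in order on: Flink, Solr, Spark SQL, ES, Scala, Kafka, HBase/Phoenix, Redis, Kerberos (the project includes Hadoop mind maps in Evernote, simple Scala demos, and training code for common utility classes with sensitive data removed; continuously updated!)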

Streaming Readings ⭐ 640

Papers and readings on streaming systems

Wedatasphere ⭐ 624

WeDataSphere is a financial grade, one-stop big data platform suite.

Pythondatascience Collections ⭐ 615

A comprehensive collection of data analysis materials (including Python, web scraping, databases, big data, Tableau, statistics, and more)

Streaming Benchmarks ⭐ 560

Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, Apache Flink, ...

Data Engineering Interview Questions ⭐ 554

More than 2,000 data engineer interview questions.

Featran ⭐ 465

A Scala feature transformation library for data science and machine learning

Stream computing platform for bigdata

Ecommercerecommendsystem ⭐ 350

Real-time product recommendation system built on big data. Frontend: Vue + TypeScript + ElementUI; backend: Spring + Spark

Cloudflow ⭐ 323

Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.

Big Whale ⭐ 290

Scheduling of offline jobs and monitoring of real-time jobs for Spark, Flink, and similar engines

Compass ⭐ 284

Compass is a task diagnosis platform for bigdata

Kamu Cli ⭐ 263

New generation decentralized data lake and a streaming data pipeline

Cloudshuffleservice ⭐ 204

Cloud Shuffle Service (CSS) is a general-purpose remote shuffle solution for compute engines, including Spark/Flink/MapReduce.

Big Data ⭐ 190

An open-source, systematic big data learning tutorial: Spark, Hadoop, Hive, HBase, and Flink tutorials, plus Linux, from beginner to expert

Bigdata Hub ⭐ 187

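A knowledge system for data platform construction and big data technology, covering mainstream frameworks such as Hadoop, Hive, Spark, and Flink, along with related frameworks,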

Sparkstreaming ⭐ 183

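💥 🚀 Wraps Spark Streaming to adjust the batch time dynamically (computation runs whenever data is available); 🚀 supports adding and removing topics at runtime; 🚀 wraps Spark Streaming 1.6 with Kafka 0.10 to support SSL.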

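A roundup of big data topics, including distributed storage engines, distributed computing engines, data warehouse construction, and more. Keywords: Hadoop, HBase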

Kafka Book ⭐ 167

Code accompanying the book 《Kafka技术内幕》 (Kafka Internals)

Bigdata ⭐ 142

Hadoop, HBase, Storm, Spark, etc.

Sansa Stack ⭐ 139

Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/

Bigdata Learning ⭐ 136

Big data learning notes

A distributed scheduling framework supporting DAG workflow for big data and regular jobs, providing programmable job types across different languages.

Java_learning_practice ⭐ 118

The path to advanced Java: frequently asked interview algorithms, Akka, multithreading, NIO, Netty, Spring Boot, Spark&&F, etc.

Xichuan_note ⭐ 114

xichuan's study notes, covering Java, Spring, other common Java frameworks, and big data components

Pulsar Spark ⭐ 103

Spark Connector to read and write with Pulsar

Flink Spark Submiter ⭐ 92

Submit Flink/Spark jobs from a local IDEA to YARN/k8s clusters

Focusbigdata ⭐ 89

[Path to big data mastery: learning roadmap + interview experiences + résumés]

Recsys 2017 Online Learning Tutorial ⭐ 81

Bigdata Learning Notes ⭐ 79

This project is deprecated. For the collected and organized notes, see:

User Guide Smack ⭐ 66

[Cloudframeworks] SMACK Big Data Architecture - user guide

Ammonium ⭐ 64

Impatient fork of Ammonite

A quotation-based Scala DSL for scalable data analysis.

A temporary home for LinkedIn's changes to Apache Iceberg (incubating)

Bigdataparty ⭐ 54

All-in-One Dockerfiles for big data components

Cloud Bigdata Book ⭐ 53

Awesome Pulsar ⭐ 53

A curated list of Pulsar tools, integrations and resources.

Model Serving Tutorial ⭐ 53

Code and presentation for Strata Model Serving tutorial

Fb_scraper ⭐ 52

FBLYZE is a Facebook scraping system and analysis system.

Learnbasicbigdatatech ⭐ 44

🚀Some projects on Big Data Analysis like Spark, Hive, Presto and Data Visualization like Superset

Streambench ⭐ 41

Measuring the performance of popular streaming engines with Yahoo's Streaming Benchmark

Garmadon ⭐ 39

Java event log collector for Hadoop and frameworks

Data Ingestion Platform ⭐ 39

Bigdata Getting Started ⭐ 37

Hands-on projects with big data frameworks (Hadoop, Spark, Storm, Flink)

Archived Sansa Examples ⭐ 37

Usage examples for the SANSA Stack

Sharpetl ⭐ 36

Write ETL using your favorite SQL dialects

Bullet Core ⭐ 34

Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink.

Streaming Analytics platform, built with Apache Flink and Kafka

Archived Sansa Query ⭐ 31

SANSA Query Layer

Dockerfiles ⭐ 31

Multi docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )

Framework Of Bigdata ⭐ 30

Big data interview questions: the road from zero to architect. Flink, Spark, Hive, HBase, Hadoop, K

Real Time Data Warehouse ⭐ 29

Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi

Rts_practice ⭐ 28

Documentation placeholder and utilities for all the other containers.

Archived Sansa Inference ⭐ 27

A general Inference API based on two of the most popular Big Data processing engines: Apache Spark and Apache Flink

Rzf.github.io ⭐ 27

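✏️ [Computer fundamentals + Java fundamentals + big data fundamentals and advanced topics + interview guide] A project covering most of the core knowledge in computer fundamentals, Java, big data, and interview preparation. Learn, interview, and improve together!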

Kafka Spark Flink Example ⭐ 26

Kafka streaming with Spark and Flink example

Peel is a framework that helps you to define, execute, analyze, and share experiments for distributed systems and algorithms.

Cassandra.realtime ⭐ 25

Different ways to process data into Cassandra in realtime with technologies such as Kafka, Spark, Akka, Flink

Bigdata Doc ⭐ 25

Big data study notes, a learning roadmap, and curated technical case studies.

Archived Sansa Owl ⭐ 25

SANSA Stack OWL (Web Ontology Language) API

Open Stream Processing Benchmark ⭐ 24

This repository contains the code base for the Open Stream Processing Benchmark.

Seatunnel Example ⭐ 23

seatunnel plugin developing examples.

Snailtrail ⭐ 23

SnailTrail implementation

Fastdata Cluster ⭐ 22

Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)

Idocuments ⭐ 20

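A collection of documentation related to Java development, covering foundational system services (big data, stream computing, NoSQL, etc.), technical terminology, JAR packages, development tools, and more. Continuously updated…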

A compiler for Pig Latin to Spark and Flink.

Terasort ⭐ 20

TeraSort for Spark and Flink which uses a range partitioner based on sampling

Snowplow Scala Analytics Sdk ⭐ 20

Scala SDK for working with Snowplow enriched events in Spark, AWS Lambda, Flink et al.

Eskimo is a state of the art Big Data Infrastructure and Management Web Console to build, manage and operate Big Data 2.0 Analytics clusters on Kubernetes. This is the git repository of Eskimo Community Edition.

Dataflow Runner ⭐ 19

Run templatable playbooks of Hadoop/Spark/et al jobs on Amazon EMR

Bigdata Book ⭐ 18

Hundreds of big data eBooks with download links, covering computer fundamentals, Java, Hadoop, Spark, Flink, k

Bigdataanalysisweb ⭐ 18

Big data performance test analysis platform (including a WebUI display). BigDataAnalysisWeb analyzes throughput changes in Storm, Spark, and Flink and presents them as charts.

Realtime Dashboard Example ⭐ 18

This is a real-time dashboard example using Spark Streaming and Node.js

Awesome Big Data ⭐ 17

Lessons learned while studying big data and distributed systems

Qs Hadoop ⭐ 17

Learning the big data ecosystem

Resume Bjkonglu ⭐ 17

Notes on research experience with Spark and Flink

Pulsar Hub ⭐ 17

The canonical source of StreamNative Hub.

Bigdata_learning ⭐ 16

Code for learning big data components

Bigdataguider ⭐ 15

Big data study notes covering Hadoop, Spark, Hive, ZooKeeper, and more

Sparkdeepdoc ⭐ 15

In-depth resources from Apache Spark, Cloudera, my own practice, and so on. Most important is what I think.


Copyright 2018-2024 Awesome Open Source. All rights reserved.