Awesome Open Source

Search results for apache big data

124 search results found

Spark ⭐ 37,661

Apache Spark - A unified analytics engine for large-scale data processing

Flink ⭐ 22,747

Cookbook ⭐ 12,557

The Data Engineering Cookbook

God Of Bigdata ⭐ 8,483

Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/HBase/Hive.

Apache Beam is a unified programming model for Batch and Streaming data processing.

Ignite ⭐ 4,626

Calcite ⭐ 4,216

Koalas ⭐ 3,291

Koalas: pandas API on Apache Spark

Flume ⭐ 2,475

Mirror of Apache Flume

Parquet Mr ⭐ 2,296

Ambari ⭐ 2,030

Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.

Spark ⭐ 1,963

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Drill ⭐ 1,856

Apache Drill is a distributed MPP query layer for self-describing data

Bookkeeper ⭐ 1,828

Apache BookKeeper - a scalable, fault-tolerant, and low-latency storage service optimized for append-only workloads

Carbondata ⭐ 1,401

High performance data store solution

Spark Doc Zh ⭐ 1,186

Chinese translation of the official Apache Spark documentation

Phoenix ⭐ 1,006

Mirror of Apache Phoenix

Accumulo ⭐ 1,003

Apache Accumulo

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

Coding Now ⭐ 925

Notes from my studies, along with eBooks I have read, video resources, and blogs I have collected that I consider worthwhile,

Tispark ⭐ 872

TiSpark is built for running Apache Spark on top of TiDB/TiKV

Dataflowjavasdk ⭐ 853

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

Incubator Livy ⭐ 840

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

Mirror of Apache Sqoop

Mirror of Apache Samza

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

Spark Rapids ⭐ 619

Spark RAPIDS plugin - accelerate Apache Spark with GPUs

Mirror of Apache Giraph

Parquetviewer ⭐ 574

Simple Windows desktop application for viewing & querying Apache Parquet files

Nussknacker ⭐ 564

Low-code tool for automating actions on real time data | Stream processing for the users.

Data Lineage Tracking And Visualization Solution

Bigtop is an Apache Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components.

Bigdata Ecosystem ⭐ 536

BigData Ecosystem Dataset

Datawave ⭐ 512

DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.

Hudi Resources ⭐ 509

A collection of resources related to Apache Hudi

Mirror of Apache Helix

Sparkler ⭐ 401

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Couchdb Fauxton ⭐ 361

Fauxton is the new Web UI for CouchDB

Apex Core ⭐ 346

Mirror of Apache Apex core

Hyperspace ⭐ 334

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Morpheus ⭐ 330

Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.

Parquet Dotnet ⭐ 319

🏐 Apache Parquet for modern .NET

Parquet Cpp ⭐ 312

Every Single Day I Tldr ⭐ 311

A daily digest of the articles or videos I've found interesting, that I want to share with you.

Trafodion ⭐ 243

Apache Trafodion

Couchdb Docker ⭐ 242

Semi-official Apache CouchDB Docker images

Succinct ⭐ 239

Enabling queries on compressed data.

Node Hbase ⭐ 232

Asynchronous HBase client for NodeJs using REST

Azure Event Hubs Spark ⭐ 225

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Calcite Avatica ⭐ 225

Apache Calcite Avatica

Flink Notes ⭐ 223

Flink study notes

Sparkrdma ⭐ 191

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Spark.jl ⭐ 180

Julia binding for Apache Spark

TipDM modeling platform, an open source data mining tool.

Mirror of Apache Knox

Incubator Wayang ⭐ 162

Apache Wayang (incubating) is the first cross-platform data processing system.

Bigdata Playground ⭐ 154

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Metamodel ⭐ 144

Mirror of Apache Metamodel

Storm Doc Zh ⭐ 143

Chinese translation of the official Apache Storm documentation

Parquetsharp ⭐ 142

ParquetSharp is a .NET library for reading and writing Apache Parquet files.

Flink Web ⭐ 133

Apache Flink Website

Incubator Liminal ⭐ 131

Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.

Apex Malhar ⭐ 131

Mirror of Apache Apex malhar

Flink Shaded ⭐ 130

Apache Flink shaded artifacts repository

Mirror of Apache Tajo

Mirror of Apache Hama

Mnemonic ⭐ 115

Apache Mnemonic - A non-volatile hybrid memory storage oriented library

The Apache Gora open source framework provides an in-memory data model and persistence for big data.

Calcite Avatica Go ⭐ 110

Mirror of Apache Calcite - Avatica Go SQL Driver

Frank Kanes Taming Big Data With Apache Spark And Python ⭐ 106

Frank Kane's Taming Big Data with Apache Spark and Python, published by Packt

Mirror of Apache Crunch (Incubating)

Spark With Python ⭐ 98

Fundamentals of Spark with Python (using PySpark), code examples

Mirror of Apache Falcon

Airavata ⭐ 92

A general purpose Distributed Systems Framework

Mirror of Apache REEF

Predictionio Template Recommender ⭐ 78

PredictionIO Recommendation Engine Template (Scala-based parallelized engine)

The Apache Ignite Book ⭐ 72

All code samples, scripts and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above

Cleanframes ⭐ 70

type-class based data cleansing library for Apache Spark SQL

Apache Spark Hands On ⭐ 64

Educational notes and hands-on problems with solutions for the Hadoop ecosystem

Incubator Tez ⭐ 60

Mirror of Apache Tez (Incubating)

Mirror of Apache Lens

Mirror of Apache OODT

Data_processing_course ⭐ 53

Some class materials for a data processing course using PySpark

Doris Website ⭐ 51

Apache Doris Website

Phoenix Connectors ⭐ 48

Apache Phoenix Connectors

Scalable R for Machine Learning

Phoenix Queryserver ⭐ 41

Apache Phoenix Query Server

Predictionio Template Attribute Based Classifier ⭐ 38

PredictionIO Classification Engine Template (Scala-based parallelized engine)

Flink Book ⭐ 38

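Big data, stream computing, real-time computing: learning materials for the Flink framework. Companion code for the best-selling book 《深入理解Flink核心设计与实践原理》 (Understanding Flink's Core Design and Practical Principles in Depth); every Flink feature covered in the book comes with complete, runnable code for readers to run and test. The whole project contains [182 Jav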

Accumulo Examples ⭐ 34

Apache Accumulo Examples

Ambari Metrics ⭐ 34

Apache Ambari Metrics is a sub project of Apache Ambari.

Predictionio Template Text Classifier ⭐ 33

Text Classification Engine

Deploy a secured, clustered, auto-scaling NiFi service in AWS.

Apache Kibble - a tool to collect, aggregate and visualize data about any software project

Beam Site ⭐ 27

Apache Beam Site

Apache Hive Essentials Second Edition ⭐ 27

Apache Hive Essentials, Second Edition published by Packt

Airavata Django Portal ⭐ 27

Apache Airavata Django Portal Framework

1-100 of 124 search results
