Awesome Open Source

Search results for apache hadoop

206 search results found

Spark ⭐ 37,661

Apache Spark - A unified analytics engine for large-scale data processing

Cookbook ⭐ 12,557

The Data Engineering Cookbook

God Of Bigdata ⭐ 8,483

Focused on big data learning and interview preparation; begin the path to becoming a big data master. Flink/Spark/Hadoop/Hbase/Hive.

Bigdl ⭐ 4,728

Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using bigdl-llm

Ignite ⭐ 4,626

Calcite ⭐ 4,216

Tensorflowonspark ⭐ 3,851

TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.

Nutch ⭐ 2,742

Apache Nutch is an extensible and scalable web crawler

Ambari ⭐ 2,030

Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.

Elasticsearch Hadoop ⭐ 1,914

🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop

Drill ⭐ 1,856

Apache Drill is a distributed MPP query layer for self-describing data

Atlas ⭐ 1,685

Carbondata ⭐ 1,401

High performance data store solution

Dr Elephant ⭐ 1,301

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark

Hadoop Docker ⭐ 1,169

Hadoop docker image

Impala ⭐ 1,044

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Awesome Hadoop ⭐ 987

A curated list of amazingly awesome Hadoop and Hadoop ecosystem resources

Coding Now ⭐ 925

Notes from my studies, along with some eBooks I have read, video resources, and blogs I have collected that I consider good,

Mirror of Apache Sqoop

Mirror of Apache Pig

Sparkr Pkg ⭐ 649

R frontend for Spark

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

Mirror of Apache Giraph

Data Lineage Tracking And Visualization Solution

Bigtop is an Apache Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components.

Bigdata Ecosystem ⭐ 536

BigData Ecosystem Dataset

Hadoopinternals ⭐ 424

Diagrams describing Apache Hadoop internals (2.3.0 or later).

Mirror of Apache Eagle

Former GraphX development repository. GraphX has been merged into Apache Spark; please submit pull requests there.

Apex Core ⭐ 346

Mirror of Apache Apex core

Easyhadoop ⭐ 310

Apache Hadoop management system

Behemoth ⭐ 284

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Sparkstreaming ⭐ 253

Real-time log processing implemented with Spark Streaming+Flume+Kafka+HBase+Hadoop+Zookeeper

Hive Jdbc Uber Jar ⭐ 252

Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version

Kerberos_and_hadoop ⭐ 248

Kerberos and Hadoop: The Madness beyond the Gate

Trafodion ⭐ 243

Apache Trafodion

Node Hbase ⭐ 232

Asynchronous HBase client for NodeJs using REST

Calcite Avatica ⭐ 225

Apache Calcite Avatica

Emr Dynamodb Connector ⭐ 210

Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB

Sparkrdma ⭐ 191

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Incubator Wayang ⭐ 162

Apache Wayang (incubating) is the first cross-platform data processing system.

Docker Flink ⭐ 157

Apache Flink docker image

Bigdata Playground ⭐ 154

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL

Logparser ⭐ 153

Easy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Flink, Beam, Storm, Drill, ...
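As a rough illustration of what "parsing access logs" involves, here is a minimal Python sketch that splits one line in the Apache "combined" format (`%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"`), which NGINX's predefined `combined` format also follows. This is not Logparser's API (that project is Java); the `COMBINED` pattern and `parse_line` helper are hypothetical, and the regex assumes no escaped quotes inside the request, referer, or user-agent fields.

```python
import re
from typing import Optional

# Hypothetical pattern for the Apache/NGINX "combined" access-log format.
# Assumes quoted fields contain no escaped (\") quotes.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one combined-format line, or None if it does not match."""
    m = COMBINED.fullmatch(line.strip())
    if m is None:
        return None  # not a combined-format line
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    # %b logs "-" when no response body bytes were sent.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


if __name__ == "__main__":
    sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
              '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
    print(parse_line(sample))
```

A single compiled regex keeps the sketch dependency-free; a production parser would also need to handle custom `LogFormat` directives and escaped characters.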

Hdfs_fdw ⭐ 131

PostgreSQL foreign data wrapper for HDFS

Parquet Rs ⭐ 129

Apache Parquet implementation in Rust

Mirror of Apache Tajo

A tool and library for easily deploying applications on Apache YARN

Docker Spark ⭐ 118

Docker image for a general Apache Spark client

[DEPRECATED] Script used to manage Hadoop and Spark instances on Google Compute Engine

The Apache Gora open source framework provides an in-memory data model and persistence for big data.

Calcite Avatica Go ⭐ 110

Mirror of Apache Calcite - Avatica Go SQL Driver

Mirror of Apache DataFu

Linkedin Gradle Plugin For Apache Hadoop ⭐ 106

Crux is a reporting application for HBase. Crux provides a simple web based graphical interface to access HBase, query data and create reports. Crux is open sourced under Apache Software Foundation License v2.0.

Spark With Python ⭐ 98

Fundamentals of Spark with Python (using PySpark), code examples

Mirror of Apache REEF

Halyard is an extremely horizontally scalable Triplestore with support for Named Graphs, designed for integration of extremely large Semantic Data Models, and for storage and SPARQL 1.1 querying of the whole Linked Data universe snapshots.

Docker Cloudera Quickstart ⭐ 87

Docker Cloudera Quick Start Image

Pig on Apache Spark

Phphiveadmin ⭐ 81

An Apache Hive management system

Mirror of Apache Chukwa

Docker Spark ⭐ 77

🚢 Docker image for Apache Spark

SQL backend to dplyr for Impala

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.

The Apache Ignite Book ⭐ 72

All code samples, scripts, and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above

Apache Spark Hands On ⭐ 64

Educational notes and hands-on problems with solutions for the Hadoop ecosystem

Hcatalog ⭐ 61

Mirror of Apache HCatalog

Incubator Tez ⭐ 60

Mirror of Apache Tez (Incubating)

Stormtweetssentimentanalysis ⭐ 60

Computes sentiment analysis of tweets of US States in real-time using Storm.

R and Hadoop Integrated Programming Environment

Presto Yarn ⭐ 52

Cascading Flink ⭐ 52

Cascading on Apache Flink®

Doris Website ⭐ 51

Apache Doris Website

Bigtop is a project for the development of packaging and tests of the Apache Hadoop ecosystem. The primary goal of Bigtop is to build a community around the packaging and interoperability testing of Hadoop-related projects. This includes testing at various levels (packaging, platform, runtime, upgrade, etc.) developed by a community with a focus on the system as a whole, rather than individual projects.

Stand-alone ANSI SQL for Cascading on Apache Hadoop

Python Driver for Apache Drill.

Hadoop FSImage Analyzer (HFSA)

Code Of Spark Big Data Business Trilogy ⭐ 42

This is code of book "Spark Big Data Business Trilogy"

Yarn Prometheus Exporter ⭐ 39

Export Hadoop YARN (ResourceManager) metrics in Prometheus format
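To show roughly what "metrics in Prometheus format" means, here is a minimal Python sketch that turns a flat dict of numeric ResourceManager cluster metrics (shaped like the `clusterMetrics` object returned by the YARN REST endpoint `/ws/v1/cluster/metrics`) into Prometheus text exposition lines. The `yarn_` prefix, the snake_case naming, and the choice to emit every field as a gauge are assumptions for illustration, not the exporter's actual metric names or behavior.

```python
import re


def to_snake_case(name: str) -> str:
    # Insert "_" between a lowercase letter/digit and a following uppercase letter,
    # e.g. "appsRunning" -> "apps_running", "availableMB" -> "available_mb".
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()


def to_prometheus(cluster_metrics: dict, prefix: str = "yarn_") -> str:
    """Render numeric fields as Prometheus gauges (hypothetical naming scheme)."""
    lines = []
    for key in sorted(cluster_metrics):
        value = cluster_metrics[key]
        # Skip non-numeric fields; bool is excluded explicitly because it subclasses int.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        metric = prefix + to_snake_case(key)
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample values are made up; field names follow the clusterMetrics JSON style.
    print(to_prometheus({"appsRunning": 3, "availableMB": 8192}), end="")
```

With the sample input, this prints a `# TYPE` line followed by a sample line for `yarn_apps_running` and `yarn_available_mb`. A real exporter would also fetch the JSON over HTTP and serve the text on a `/metrics` endpoint.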

Vertica Hadoop Connector ⭐ 38

Vertica Hadoop Connector

Cdh Package ⭐ 38

Hive Driver ⭐ 38

Driver for connection to Apache Hive via Thrift API

Hive Jdbc Driver ⭐ 38

An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC

Xxhadoop ⭐ 37

Data analysis using Hadoop/Spark/Storm/Elasticsearch/machine learning, etc. These are my daily notes, code, and demos. Don't fork, just star!

Testing framework for Collaborative Filtering

Avro Maven Plugin ⭐ 34

Maven 2 Plugin for processing Apache Avro files. Avro is a subproject of Apache Hadoop.

Ambari Metrics ⭐ 34

Apache Ambari Metrics is a sub project of Apache Ambari.

Flume Logs ⭐ 34

Apache Flume to process log files on Hadoop cluster

Nutch Newsclassify ⭐ 33

A news classification system based on Nutch

Ansible Ambari ⭐ 33

Quickly deploy Hadoop with the help of Ansible and Apache Ambari

Recommender ⭐ 33

NReco Recommender is a .NET port of Apache Mahout CF java engine (standalone, non-Hadoop version)

Docker Hadoop Ubuntu ⭐ 32

A Hadoop image on Ubuntu

Jmxtrans Lib ⭐ 32

JMXTrans configuration for Hadoop/Cassandra/ZooKeeper

Freebase2rdf ⭐ 30

Cascading plus City of Palo Alto open data

Related Searches

Java Apache (4,331)

Php Apache (2,627)

Java Hadoop (2,117)

Shell Apache (1,492)

Javascript Apache (1,450)

Python Apache (1,438)

Docker Apache (1,277)

Spark Hadoop (1,188)

Hadoop Hdfs (1,082)

Mysql Apache (961)

1-100 of 206 search results

Copyright 2018-2024 Awesome Open Source. All rights reserved.