Awesome Open Source

Search results for hadoop

2,009 search results found

Data Lineage Tracking And Visualization Solution

Bigtop is an Apache Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components.

Hadoop2x Eclipse Plugin ⭐ 549

Eclipse plugin for Hadoop 2.2.0 and 2.4.1

Hadoop Lzo ⭐ 541

Refactored version of code.google.com/hadoop-gpl-compression for hadoop 0.20

Elephantdb ⭐ 540

Distributed database specialized in exporting key/value data from Hadoop

Bigdata Ecosystem ⭐ 536

BigData Ecosystem Dataset

Spark Redshift ⭐ 514

Redshift data source for Apache Spark

Aircompressor ⭐ 510

A port of Snappy, LZO, LZ4, and Zstandard to Java

Ruby on Hadoop: Efficient, effective Hadoop streaming & bulk data processing. Write micro scripts for terabyte-scale data

Opensoc ⭐ 499

OpenSOC Apache Hadoop Code

Gis Tools For Hadoop ⭐ 495

The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.

A Scala productivity framework for Hadoop.

Kafka Connect Hdfs ⭐ 473

Kafka Connect HDFS connector

Commoncrawl ⭐ 466

Common Crawl support library to access 2008-2012 crawl archives (ARC files)

Marmaray ⭐ 444

Generic Data Ingestion & Dispersal Library for Hadoop

Tuiblogs ⭐ 443

Excellent computer programming blogs and articles; share excellent blogs and sites

Hadoopinternals ⭐ 424

Diagrams describing Apache Hadoop internals (2.3.0 or later).

An open-source columnar data format designed for fast, real-time analytics on big data.

Storm Yarn ⭐ 419

Storm-yarn enables Storm clusters to be deployed into machines managed by Hadoop YARN.

Hadoop Ansible ⭐ 416

Ansible playbook that installs a Hadoop cluster, with HBase, Hive, Presto for analytics, and Ganglia, Smokeping, Fluentd, Elasticsearch and Kibana for monitoring and centralized log indexing.

Mirror of Apache Eagle

Iceberg ⭐ 409

Iceberg is a table format for large, slow-moving tabular data

Venice, Derived Data Platform for Planet-Scale Workloads.

Stream computing platform for bigdata

Oozie - workflow engine for Hadoop

Bigdata ⭐ 358

💎🔥 Big data study notes

Former GraphX development repository. GraphX has been merged into Apache Spark; please submit pull requests there.

Big_data_architect_skills ⭐ 353

Skills that a big data architect should master

Trendingtopics ⭐ 351

Rails app for tracking trends in server logs - powered by the Cloudera Hadoop Distribution on EC2

Ytk Learn ⭐ 351

Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).

Cloudbreak ⭐ 348

CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.

Gather Deployment ⭐ 347

Gathers Python deployment, infrastructure and practices.

Apex Core ⭐ 346

Mirror of Apache Apex core

Cloudeon ⭐ 345

CloudEon uses Kubernetes to install and deploy open-source big data components, running an open-source big data platform in containers. This lets you spend less effort on managing and maintaining the underlying resources.

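An e-commerce project developed with SpringCloud Alibaba: the mobile app is built with Flutter 2.x, the mini program with uni-app, and the admin side with 3.0 + Element Plus. Payments accept digital currencies (Bitcoin, Ethereum UDST), and the backend uses Hadoop, Flink, and other big...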

Spatial Framework For Hadoop ⭐ 343

The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis.

Cascading ⭐ 337

Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.

Elasticluster ⭐ 334

Create clusters of VMs on the cloud and configure them with Ansible.

Set of Kubernetes solutions for reusing idle resources of nodes by running extra batch jobs

Cascading ⭐ 321

Cascading is a feature rich API for defining and executing complex and fault tolerant data processing workflows on various cluster computing platforms. Please see https://github.com/cwensel/cascading for access to all WIP branches.

Easyhadoop ⭐ 310

Apache hadoop management system

Gohadoop ⭐ 301

Official MAAS repository mirror (may be out of date). Development happens in Launchpad (https://git.launchpad.net/maas/).

Big Whale ⭐ 290

Scheduling of offline jobs and monitoring of real-time jobs for Spark, Flink, and similar engines

Riemann Jvm Profiler ⭐ 288

Sends stacktrace-level performance data from a JVM process to Riemann.

Android Nosql ⭐ 287

Lightweight, simple structured NoSQL database for Android

Hops Hadoop is a distribution of Apache Hadoop with distributed metadata.

Sagemaker Spark ⭐ 285

A Spark library for Amazon SageMaker.

Behemoth ⭐ 284

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Compass ⭐ 284

Compass is a task diagnosis platform for bigdata

This is a research prototype of the Dryad and DryadLINQ data-parallel processing frameworks running on Hadoop YARN.

Gridgain Old ⭐ 278

Sparkonhbase ⭐ 277

Hadoop Connectors ⭐ 276

Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.

Slimfast ⭐ 272

Slimming down jars since 2016

Parquet4s ⭐ 267

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

Datacube ⭐ 264

Multidimensional data storage with rollups for numerical data

Parkour ⭐ 261

Hadoop MapReduce in idiomatic Clojure.

Facebook Hive Udfs ⭐ 259

Facebook's Hive UDFs

Graph Analytics Engine

Docker Ambari ⭐ 259

Docker image with Ambari

Demo_11.11_storm Spark Hadoop ⭐ 257

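An example experiment combining hadoop, storm, and spark that simulates Taobao's Double 11 shopping festival and aggregates total sales from detailed order information. Rough workflow: Stage 1 (Storm real-time reports); Stage 2 (offline reports); Stage 3 (ad hoc and multidimensional queries over large-scale orders); Stage 4 (data mining and graph computation).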

Spark Jupyter Aws ⭐ 255

A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support

Sparkstreaming ⭐ 253

Real-time logging implemented with Spark Streaming+Flume+Kafka+HBase+Hadoop+Zookeeper

Hive Jdbc Uber Jar ⭐ 252

Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version

Hadoop Mini Clusters ⭐ 251

hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE

Kerberos_and_hadoop ⭐ 248

Kerberos and Hadoop: The Madness beyond the Gate

Hadoopy ⭐ 244

Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials.

Trafodion ⭐ 243

Apache Trafodion

Bisheserver ⭐ 242

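This system is my graduation project, titled "Design and Implementation of a Movie Recommendation System Based on User Profiles". It mainly uses Django as...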

Es Fastloader ⭐ 242

Quickly build large-scale ElasticSearch indices by using the fault tolerance and parallelism of Hadoop

An end-to-end machine learning and data mining framework on Hadoop

Node Hbase ⭐ 232

Asynchronous HBase client for NodeJs using REST

Parquet Go ⭐ 228

Go package to read and write parquet files. Parquet is a file format that stores nested data structures in a flat columnar layout. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.

Hadoop Tutorials Examples ⭐ 228

Source, data, and tutorials for the blog post video series on Hue, the Web UI for Hadoop.

Weathertop ⭐ 226

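Daily notes on learning J2EE and Linux components, suited to anyone who wants to learn or review the fundamentals. The content currently planned includes...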

Bigdata_docker ⭐ 226

Big Data Ecosystem Docker

Calcite Avatica ⭐ 225

Apache Calcite Avatica

Rubydoop ⭐ 225

Write Hadoop jobs in JRuby

Bigdata ⭐ 219

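A learning path for big data processing technologies (continuously updated...). Bigdata notes --> slowly but surely~ Big data technologies covered include offline processing, real-time processing, OLAP, and more, such as hadoop, spark, flink, hive, ...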

Hadoop Docker ⭐ 210

A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark

Emr Dynamodb Connector ⭐ 210

Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB

Commoncrawl Crawler ⭐ 208

The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)

Hadoop Pcap ⭐ 202

Hadoop library to read packet capture (PCAP) files

Hadoop Attack Library ⭐ 200

A collection of pentest tools and resources targeting Hadoop environments

Hadoop Book ⭐ 198

Source code to accompany the book "Hadoop in Practice", published by Manning.

Haproxy Configs ⭐ 198

80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Kubernetes, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.

A fast to develop, fast to run, Go based toolkit for ETL and feature extraction on Hadoop.

Programming Video Tutorials ⭐ 195

Video tutorials: Java, big data, cloud computing, Android, Hadoop, Docker, mysql, spark, CRM, OA..

Wonderdog ⭐ 193

Bulk loading for Elasticsearch

s3mper - Consistent Listing for S3

Sparkrdma ⭐ 191

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Big Data ⭐ 190

An open-source, systematic big data learning tutorial: spark, hadoop, hive, hbase, and flink tutorials, plus linux, from beginner to expert

Wifiprobeanalysis ⭐ 189

Commercial big data analysis based on WIFI probes

Bigdata Hub ⭐ 187

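A knowledge base on data infrastructure building and big data technologies, covering mainstream frameworks such as hadoop, hive, spark, and flink, along with their related frameworks, ...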

Snzip, a compression/decompression tool based on snappy

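A roundup of big data topics, including distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase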

Related Searches

Java Hadoop (2,117)

Spark Hadoop (1,188)

Hadoop Hdfs (1,082)

Hadoop Mapreduce (851)

Shell Hadoop (772)

Python Hadoop (761)

Hadoop Hive (703)

101-200 of 2,009 search results

Copyright 2018-2024 Awesome Open Source. All rights reserved.