Search results for hadoop
2,009 search results found
Spline
⭐
553
Data Lineage Tracking And Visualization Solution
Bigtop
⭐
549
Bigtop is an Apache Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components.
Hadoop2x Eclipse Plugin
⭐
549
Eclipse plugin for Hadoop 2.2.0, 2.4.1
Hadoop Lzo
⭐
541
Refactored version of code.google.com/hadoop-gpl-compression for hadoop 0.20
Elephantdb
⭐
540
Distributed database specialized in exporting key/value data from Hadoop
Bigdata Ecosystem
⭐
536
BigData Ecosystem Dataset
Spark Redshift
⭐
514
Redshift data source for Apache Spark
Aircompressor
⭐
510
A port of Snappy, LZO, LZ4, and Zstandard to Java
Wukong
⭐
503
Ruby on Hadoop: Efficient, effective Hadoop streaming & bulk data processing. Write micro scripts for terabyte-scale data
Opensoc
⭐
499
OpenSOC Apache Hadoop Code
Gis Tools For Hadoop
⭐
495
The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.
Scoobi
⭐
485
A Scala productivity framework for Hadoop.
Kafka Connect Hdfs
⭐
473
Kafka Connect HDFS connector
Commoncrawl
⭐
466
Common Crawl support library to access 2008-2012 crawl archives (ARC files)
Tez
⭐
446
Apache Tez
Marmaray
⭐
444
Generic Data Ingestion & Dispersal Library for Hadoop
Tuiblogs
⭐
443
Excellent computer programming blogs and articles (share excellent blogs and sites)
Hadoopinternals
⭐
424
Diagrams describing Apache Hadoop internals (2.3.0 or later).
Indexr
⭐
422
An open-source columnar data format designed for fast, real-time analytics on big data.
Storm Yarn
⭐
419
Storm-yarn enables Storm clusters to be deployed into machines managed by Hadoop YARN.
Hadoop Ansible
⭐
416
Ansible playbook that installs a Hadoop cluster, with HBase, Hive, Presto for analytics, and Ganglia, Smokeping, Fluentd, Elasticsearch and Kibana for monitoring and centralized log indexing.
Eagle
⭐
410
Mirror of Apache Eagle
Iceberg
⭐
409
Iceberg is a table format for large, slow-moving tabular data
Venice
⭐
402
Venice, Derived Data Platform for Planet-Scale Workloads.
Sylph
⭐
396
Stream computing platform for bigdata
Oozie
⭐
378
Oozie - workflow engine for Hadoop
Kite
⭐
366
Kite SDK
Bigdata
⭐
358
💎🔥 Big data study notes
Graphx
⭐
353
Former GraphX development repository. GraphX has been merged into Apache Spark; please submit pull requests there.
Big_data_architect_skills
⭐
353
Skills a big data architect should master
Trendingtopics
⭐
351
Rails app for tracking trends in server logs - powered by the Cloudera Hadoop Distribution on EC2
Ytk Learn
⭐
351
Ytk-learn is a distributed machine learning library that implements most of the popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Cloudbreak
⭐
348
CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.
Gather Deployment
⭐
347
Gathers Python deployment, infrastructure and practices.
Apex Core
⭐
346
Mirror of Apache Apex core
Cloudeon
⭐
345
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.
Shopzz
⭐
344
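An e-commerce project developed with SpringCloud Alibaba. The mobile app is built with Flutter2.x, the mini program with uni-app, and the admin side with 3.0 + Element Plus. Payment supports digital currencies (Bitcoin, Ethereum UDST), and the backend uses Hadoop, Flink, and other big...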
Spatial Framework For Hadoop
⭐
343
The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis.
Cascading
⭐
337
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.
Elasticluster
⭐
334
Create clusters of VMs on the cloud and configure them with Ansible.
Caelus
⭐
323
Set of Kubernetes solutions for reusing idle resources of nodes by running extra batch jobs
Cascading
⭐
321
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing workflows on various cluster computing platforms. Please see https://github.com/cwensel/cascading for access to all WIP branches.
Easyhadoop
⭐
310
Apache Hadoop management system
Gohadoop
⭐
301
Maas
⭐
299
Official MAAS repository mirror (may be out of date). Development happens in Launchpad (https://git.launchpad.net/maas/).
Big Whale
⭐
290
Scheduling of offline jobs (Spark, Flink, etc.) and monitoring of real-time jobs
Riemann Jvm Profiler
⭐
288
Sends stacktrace-level performance data from a JVM process to Riemann.
Android Nosql
⭐
287
Lightweight, simple structured NoSQL database for Android
Hops
⭐
285
Hops Hadoop is a distribution of Apache Hadoop with distributed metadata.
Sagemaker Spark
⭐
285
A Spark library for Amazon SageMaker.
Behemoth
⭐
284
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Compass
⭐
284
Compass is a task diagnosis platform for bigdata
Dryad
⭐
281
This is a research prototype of the Dryad and DryadLINQ data-parallel processing frameworks running on Hadoop YARN.
Gridgain Old
⭐
278
Sparkonhbase
⭐
277
SparkOnHBase
Hadoop Connectors
⭐
276
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
Slimfast
⭐
272
Slimming down jars since 2016
Parquet4s
⭐
267
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Datacube
⭐
264
Multidimensional data storage with rollups for numerical data
Parkour
⭐
261
Hadoop MapReduce in idiomatic Clojure.
Facebook Hive Udfs
⭐
259
Facebook's Hive UDFs
Faunus
⭐
259
Graph Analytics Engine
Docker Ambari
⭐
259
Docker image with Ambari
Demo_11.11_storm Spark Hadoop
⭐
257
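An example experiment combining hadoop_storm_spark that simulates Taobao's Double 11 shopping festival and aggregates total sales from order details. Rough flow: stage 1 (Storm real-time reports), stage 2 (offline reports), stage 3 (ad hoc and multidimensional queries over large-scale orders), stage 4 (data mining and graph computation)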
Spark Jupyter Aws
⭐
255
A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
Sparkstreaming
⭐
253
Real-time logs implemented with Spark Streaming+Flume+Kafka+HBase+Hadoop+Zookeeper
Hive Jdbc Uber Jar
⭐
252
Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Hadoop Mini Clusters
⭐
251
hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE
Kerberos_and_hadoop
⭐
248
Kerberos and Hadoop: The Madness beyond the Gate
Hadoopy
⭐
244
Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials.
Trafodion
⭐
243
Apache Trafodion
Bisheserver
⭐
242
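This system is my graduation project, titled "Design and Implementation of a Movie Recommendation System Based on User Profiles". It mainly uses Django as...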
Es Fastloader
⭐
242
Quickly build large-scale ElasticSearch indices by using the fault tolerance and parallelism of Hadoop
Shifu
⭐
235
An end-to-end machine learning and data mining framework on Hadoop
Node Hbase
⭐
232
Asynchronous HBase client for NodeJs using REST
Parquet Go
⭐
228
Go package to read and write parquet files. parquet is a file format to store nested data structures in a flat columnar data format. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Hadoop Tutorials Examples
⭐
228
Source, data and tutorials of the blog post video series of Hue, the Web UI for Hadoop.
Weathertop
⭐
226
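Day-to-day notes from studying J2EE and Linux components, suited to readers who want to learn or review the fundamentals. Currently planned content includes...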
Bigdata_docker
⭐
226
Big Data Ecosystem Docker
Calcite Avatica
⭐
225
Apache Calcite Avatica
Rubydoop
⭐
225
Write Hadoop jobs in JRuby
Bigdata
⭐
219
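A learning path for big data processing technologies (continuously updated...). Bigdata notes --> slowly~ Big data technologies covered include offline processing, real-time processing, OLAP, and more, such as hadoop, spark, flink, hive,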
Hadoop Docker
⭐
210
Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark
Emr Dynamodb Connector
⭐
210
Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB
Commoncrawl Crawler
⭐
208
The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)
Inviso
⭐
203
Hadoop Pcap
⭐
202
Hadoop library to read packet capture (PCAP) files
Hadoop Attack Library
⭐
200
A collection of pentest tools and resources targeting Hadoop environments
Hadoop Book
⭐
198
Source code to accompany the book "Hadoop in Practice", published by Manning.
Haproxy Configs
⭐
198
80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Kubernetes, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.
Crunch
⭐
196
A fast to develop, fast to run, Go based toolkit for ETL and feature extraction on Hadoop.
Programming Video Tutorials
⭐
195
Video tutorials: Java, big data, cloud computing, Android, Hadoop, Docker, mysql, spark, CRM, OA..
Wonderdog
⭐
193
Bulk loading for Elasticsearch
S3mper
⭐
192
s3mper - Consistent Listing for S3
Sparkrdma
⭐
191
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Big Data
⭐
190
An open-source, systematic big data learning tutorial. Spark learning; hadoop, hive, hbase, and flink tutorials; linux from beginner to expert
Wifiprobeanalysis
⭐
189
Commercial big data analysis technology based on WiFi probes
Bigdata Hub
⭐
187
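A knowledge system for data construction and big data technology, covering mainstream frameworks such as hadoop, hive, spark, and flink and related framework families,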
Snzip
⭐
184
Snzip, a compression/decompression tool based on snappy
Dpkb
⭐
182
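A roundup of big data topics, including distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase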
Related Searches
Java Hadoop (2,117)
Spark Hadoop (1,188)
Hadoop Hdfs (1,082)
Hadoop Mapreduce (851)
Shell Hadoop (772)
Python Hadoop (761)
Hadoop Hive (703)
101-200 of 2,009 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.