Awesome Open Source
Search results for java hadoop
800 search results found
Spark
⭐
37,661
Apache Spark - A unified analytics engine for large-scale data processing
Bigdata Notes
⭐
14,872
A getting-started guide to big data ⭐
Deeplearning4j
⭐
13,397
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learning using automatic differentiation.
Trino
⭐
9,118
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
It_book
⭐
8,543
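This project collects over a thousand good, commonly used books that I have read or heard about over the years; the book you are looking for may well be here. It covers the internet ...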
God Of Bigdata
⭐
8,483
Focused on big data learning and interviews; start your path to big data mastery. Flink/Spark/Hadoop/Hbase/Hive.
H2o 3
⭐
6,618
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Hive
⭐
5,222
Apache Hive
Ignite
⭐
4,626
Apache Ignite
Calcite
⭐
4,216
Apache Calcite
Dataspherestudio
⭐
2,860
DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Nutch
⭐
2,742
Apache Nutch is an extensible and scalable web crawler
Bigdataguide
⭐
2,355
Big data learning from scratch, including study videos and interview materials for each stage of learning big data
Ambari
⭐
2,030
Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.
Elasticsearch Hadoop
⭐
1,914
🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop
Drill
⭐
1,856
Apache Drill is a distributed MPP query layer for self-describing data
Xlearning
⭐
1,729
AI on Hadoop
Gaffer
⭐
1,724
A large-scale entity and relation database supporting aggregation of properties
Flink Streaming Platform Web
⭐
1,698
A Flink-based web platform for real-time stream computing
Atlas
⭐
1,685
Apache Atlas
Easyreport
⭐
1,635
EasyReport is a simple and easy-to-use web report system for Java (supports Hadoop, HBase, and various relational ...
Mongo Hadoop
⭐
1,511
MongoDB Connector for Hadoop
Movie_recommend
⭐
1,441
A Spark-based movie recommendation system, including a crawler project, a website, an admin back end, and the Spark recommendation system
Carbondata
⭐
1,401
High performance data store solution
Cascalog
⭐
1,378
Data processing on Hadoop without the hassle.
Dr Elephant
⭐
1,301
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Taier
⭐
1,220
Taier is a big data development platform for submission, scheduling, operation and maintenance, and indicator information display
Javapdf
⭐
1,177
🍣 100 Java e-books and technical books in PDF (take pride in downloading and reading, shame in merely liking and bookmarking)
Elephant Bird
⭐
1,100
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
Kylo
⭐
1,035
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Addax
⭐
1,034
Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.
Data Algorithms Book
⭐
973
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Coding Now
⭐
925
Study notes, plus some eBooks I have read, video resources, and blogs I have collected along the way and consider good, ...
Livy
⭐
911
Livy is an open source REST interface for interacting with Apache Spark from anywhere
Mr4c
⭐
890
Datasketches Java
⭐
856
A software library of stochastic streaming algorithms, a.k.a. sketches.
Sqoop
⭐
820
Mirror of Apache Sqoop
Hadoop_study
⭐
817
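Regularly updated documentation for commonly used big data components in the Hadoop ecosystem. Focus, in order: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos. (The project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code; continuously updated.)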
Useractionanalyzeplatform
⭐
810
Big data platform for e-commerce user behavior analysis
Cdap
⭐
735
An open source framework for building data analytic applications.
Hive Json Serde
⭐
706
Read - Write JSON SerDe for Apache Hive.
Oozie
⭐
687
Mirror of Apache Oozie
Geometry Api Java
⭐
679
The Esri Geometry API for Java enables developers to write custom applications for analysis of spatial data. This API is used in the Esri GIS Tools for Hadoop and other 3rd-party data processing solutions.
Pig
⭐
659
Mirror of Apache Pig
Orc
⭐
645
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Giraph
⭐
582
Mirror of Apache Giraph
Bigtop
⭐
549
Bigtop is an Apache Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components.
Hadoop2x Eclipse Plugin
⭐
549
Eclipse plugin for Hadoop 2.2.0, 2.4.1
Elephantdb
⭐
540
Distributed database specialized in exporting key/value data from Hadoop
Aircompressor
⭐
510
A port of Snappy, LZO, LZ4, and Zstandard to Java
Kafka Connect Hdfs
⭐
473
Kafka Connect HDFS connector
Tez
⭐
446
Apache Tez
Marmaray
⭐
444
Generic Data Ingestion & Dispersal Library for Hadoop
Tuiblogs
⭐
443
Excellent computer programming blogs and articles (share excellent blogs and sites)
Indexr
⭐
422
An open-source columnar data format designed for fast, real-time analytics on big data.
Storm Yarn
⭐
419
Storm-yarn enables Storm clusters to be deployed into machines managed by Hadoop YARN.
Iceberg
⭐
409
Iceberg is a table format for large, slow-moving tabular data
Venice
⭐
402
Venice, Derived Data Platform for Planet-Scale Workloads.
Sylph
⭐
396
Stream computing platform for big data
Oozie
⭐
378
Oozie - workflow engine for Hadoop
Kite
⭐
366
Kite SDK
Bigdata
⭐
358
💎🔥 Big data study notes
Cloudbreak
⭐
348
CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.
Apex Core
⭐
346
Mirror of Apache Apex core
Cloudeon
⭐
345
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.
Shopzz
⭐
344
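An e-commerce project developed with SpringCloud Alibaba; the mobile app is built with Flutter2.x, the mini program with uni-app, the admin 3.0 + Element Plus, with digital currency (Bitcoin, Ethereum UDST) payment integrated; the back end uses Hadoop, Flink, and other big ...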
Spatial Framework For Hadoop
⭐
343
The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis.
Cascading
⭐
337
Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster.
Big Whale
⭐
290
Scheduling of offline tasks and monitoring of real-time tasks for Spark, Flink, and others
Hops
⭐
285
Hops Hadoop is a distribution of Apache Hadoop with distributed metadata.
Compass
⭐
284
Compass is a task diagnosis platform for big data
Behemoth
⭐
284
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Gridgain Old
⭐
278
Slimfast
⭐
272
Slimming down jars since 2016
Datacube
⭐
264
Multidimensional data storage with rollups for numerical data
Faunus
⭐
259
Graph Analytics Engine
Facebook Hive Udfs
⭐
259
Facebook's Hive UDFs
Demo_11.11_storm Spark Hadoop
⭐
257
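An example experiment combining Hadoop, Storm, and Spark that simulates Taobao's Double 11 shopping festival and aggregates total sales from order details. Rough workflow: stage 1 (real-time reports with Storm), stage 2 (offline reports), stage 3 (ad hoc and multidimensional queries over large-scale orders), stage 4 (data mining and graph computing)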
Sparkstreaming
⭐
253
Spark Streaming+Flume+Kafka+HBase+Hadoop+Zookeeper implementation of real-time log ...
Hive Jdbc Uber Jar
⭐
252
Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Hadoop Mini Clusters
⭐
251
hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE
Es Fastloader
⭐
242
Quickly build large-scale Elasticsearch indices by using the fault tolerance and parallelism of Hadoop
Shifu
⭐
235
An end-to-end machine learning and data mining framework on Hadoop
Calcite Avatica
⭐
225
Apache Calcite Avatica
Emr Dynamodb Connector
⭐
210
Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB
Commoncrawl Crawler
⭐
208
The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)
Hadoop Pcap
⭐
202
Hadoop library to read packet capture (PCAP) files
Hadoop Book
⭐
198
Source code to accompany the book "Hadoop in Practice", published by Manning.
Programming Video Tutorials
⭐
195
Video tutorials: Java, big data, cloud computing, Android, Hadoop, Docker, mysql, spark, CRM, OA..
Wonderdog
⭐
193
Bulk loading for Elasticsearch
S3mper
⭐
192
s3mper - Consistent Listing for S3
Sparkrdma
⭐
191
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Wifiprobeanalysis
⭐
189
Commercial big data analysis technology based on Wi-Fi probes
Bigdata Hub
⭐
187
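A knowledge base for data construction and big data technology, covering the mainstream hadoop, hive, spark, and flink frameworks and related frameworks, ...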
Javaorbigdata Interview
⭐
180
A compilation of interview knowledge points for Java developers and big data developers
Hadoop
⭐
178
Hadoop on Mesos
Hadoopintellijplugin
⭐
174
IntelliJ IDEA Plugin for Hadoop
Recommendsys
⭐
173
Recommendation project (real-time and offline recommendation)
Terrapin
⭐
168
Serving system for batch generated data sets
Incubator Wayang
⭐
162
Apache Wayang (incubating) is the first cross-platform data processing system.
Related Searches
Java Spring (21,350)
Java Jar (7,924)
Java Testing (7,163)
Java Database (6,015)
Java Mysql (5,954)
Javascript Java (5,468)
Java Algorithms (4,705)
Java Apache (4,283)
Java Cloud Computing (4,240)
Java Json (3,692)
1-100 of 800 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.