Awesome Open Source

Search results for java hadoop

800 search results found

Spark ⭐ 37,661

Apache Spark - A unified analytics engine for large-scale data processing

Bigdata Notes ⭐ 14,872

A beginner's guide to big data ⭐

Deeplearning4j ⭐ 13,397

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.

Trino ⭐ 9,118

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

It_book ⭐ 8,543

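This project collects over a thousand good, commonly used books that I have read or heard about over the years; the book you are looking for may well be here. Covers the internet…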

God Of Bigdata ⭐ 8,483

Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/HBase/Hive.

H2o 3 ⭐ 6,618

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Ignite ⭐ 4,626

Calcite ⭐ 4,216

Dataspherestudio ⭐ 2,860

DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Nutch ⭐ 2,742

Apache Nutch is an extensible and scalable web crawler

Bigdataguide ⭐ 2,355

Learn big data from scratch; includes learning videos and interview materials for each stage of big data study

Ambari ⭐ 2,030

Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.

Elasticsearch Hadoop ⭐ 1,914

🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop

Drill ⭐ 1,856

Apache Drill is a distributed MPP query layer for self-describing data

Xlearning ⭐ 1,729

Gaffer ⭐ 1,724

A large-scale entity and relation database supporting aggregation of properties

Flink Streaming Platform Web ⭐ 1,698

A Flink-based web platform for real-time stream computing

Atlas ⭐ 1,685

Easyreport ⭐ 1,635

A simple and easy-to-use web report system for Java. EasyReport is a simple, easy-to-use web reporting tool (supports Hadoop, HBase, and various relational…

Mongo Hadoop ⭐ 1,511

MongoDB Connector for Hadoop

Movie_recommend ⭐ 1,441

A Spark-based movie recommendation system, comprising a crawler project, a website, an admin back end, and a Spark recommendation system

Carbondata ⭐ 1,401

High performance data store solution

Cascalog ⭐ 1,378

Data processing on Hadoop without the hassle.

Dr Elephant ⭐ 1,301

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark

Taier ⭐ 1,220

Taier is a big data development platform for submission, scheduling, operation and maintenance, and indicator information display

Javapdf ⭐ 1,177

🍣 100 Java technical e-books in PDF (take pride in downloading and reading them, not in merely liking and bookmarking)

Elephant Bird ⭐ 1,100

Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Addax ⭐ 1,034

Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.

Data Algorithms Book ⭐ 973

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Coding Now ⭐ 925

Notes from my studies, along with e-books I have read, video resources, and blogs I have collected over time that I consider good, …

Livy is an open source REST interface for interacting with Apache Spark from anywhere

Datasketches Java ⭐ 856

A software library of stochastic streaming algorithms, a.k.a. sketches.

Mirror of Apache Sqoop

Hadoop_study ⭐ 817

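Regularly updated documentation for commonly used big data components in the Hadoop ecosystem, with emphasis in this order: Flink, Solr, Spark SQL, ES, Scala, Kafka, HBase/Phoenix, Redis, Kerberos. (The project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code. Continuously updated!)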

Useractionanalyzeplatform ⭐ 810

Big data platform for analyzing e-commerce user behavior

An open source framework for building data analytic applications.

Hive Json Serde ⭐ 706

Read - Write JSON SerDe for Apache Hive.

Mirror of Apache Oozie

Geometry Api Java ⭐ 679

The Esri Geometry API for Java enables developers to write custom applications for analysis of spatial data. This API is used in the Esri GIS Tools for Hadoop and other 3rd-party data processing solutions.

Mirror of Apache Pig

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

Mirror of Apache Giraph

Bigtop is an Apache Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components.

Hadoop2x Eclipse Plugin ⭐ 549

Eclipse plugin for Hadoop 2.2.0, 2.4.1

Elephantdb ⭐ 540

Distributed database specialized in exporting key/value data from Hadoop

Aircompressor ⭐ 510

A port of Snappy, LZO, LZ4, and Zstandard to Java

Kafka Connect Hdfs ⭐ 473

Kafka Connect HDFS connector

Marmaray ⭐ 444

Generic Data Ingestion & Dispersal Library for Hadoop

Tuiblogs ⭐ 443

A collection of excellent computer programming blogs, articles, and sites

An open-source columnar data format designed for fast, real-time analytics on big data.

Storm Yarn ⭐ 419

Storm-yarn enables Storm clusters to be deployed into machines managed by Hadoop YARN.

Iceberg ⭐ 409

Iceberg is a table format for large, slow-moving tabular data

Venice, Derived Data Platform for Planet-Scale Workloads.

Stream computing platform for bigdata

Oozie - workflow engine for Hadoop

Bigdata ⭐ 358

💎🔥 Big data study notes

Cloudbreak ⭐ 348

CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.

Apex Core ⭐ 346

Mirror of Apache Apex core

Cloudeon ⭐ 345

CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.

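An e-commerce project developed with Spring Cloud Alibaba. The mobile app is built with Flutter 2.x, the mini program with uni-app, and the admin … with 3.0 + Element Plus; payments integrate digital currencies (Bitcoin, Ethereum UDST), and the back end uses Hadoop, Flink, and other big…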

Spatial Framework For Hadoop ⭐ 343

The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis.

Cascading ⭐ 337

Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.

Big Whale ⭐ 290

Scheduling of offline jobs and monitoring of real-time jobs for Spark, Flink, and similar engines

Hops Hadoop is a distribution of Apache Hadoop with distributed metadata.

Compass ⭐ 284

Compass is a task diagnosis platform for bigdata

Behemoth ⭐ 284

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Gridgain Old ⭐ 278

Slimfast ⭐ 272

Slimming down jars since 2016

Datacube ⭐ 264

Multidimensional data storage with rollups for numerical data

Graph Analytics Engine

Facebook Hive Udfs ⭐ 259

Facebook's Hive UDFs

Demo_11.11_storm Spark Hadoop ⭐ 257

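An experimental example combining Hadoop, Storm, and Spark that simulates Taobao's Double 11 shopping festival and aggregates total sales from order details. Rough flow: stage 1 (real-time reports with Storm), stage 2 (offline reports), stage 3 (ad hoc and multidimensional queries over large-scale orders), stage 4 (data mining and graph computation)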

Sparkstreaming ⭐ 253

Spark Streaming + Flume + Kafka + HBase + Hadoop + ZooKeeper for real-time log…

Hive Jdbc Uber Jar ⭐ 252

Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version

Hadoop Mini Clusters ⭐ 251

hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE

Es Fastloader ⭐ 242

Quickly build large-scale ElasticSearch indices by using the fault tolerance and parallelism of Hadoop

An end-to-end machine learning and data mining framework on Hadoop

Calcite Avatica ⭐ 225

Apache Calcite Avatica

Emr Dynamodb Connector ⭐ 210

Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB

Commoncrawl Crawler ⭐ 208

The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)

Hadoop Pcap ⭐ 202

Hadoop library to read packet capture (PCAP) files

Hadoop Book ⭐ 198

Source code to accompany the book "Hadoop in Practice", published by Manning.

Programming Video Tutorials ⭐ 195

Video tutorials: Java, big data, cloud computing, Android, Hadoop, Docker, MySQL, Spark, CRM, OA..

Wonderdog ⭐ 193

Bulk loading for Elasticsearch

s3mper - Consistent Listing for S3

Sparkrdma ⭐ 191

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Wifiprobeanalysis ⭐ 189

Commercial big data analytics based on Wi-Fi probes

Bigdata Hub ⭐ 187

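A knowledge system for data engineering and big data technology, covering mainstream frameworks such as Hadoop, Hive, Spark, and Flink and their related frameworks, …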

Javaorbigdata Interview ⭐ 180

A compilation of interview topics for Java developers and big data developers

Hadoop on Mesos

Hadoopintellijplugin ⭐ 174

IntelliJ IDEA Plugin for Hadoop

Recommendsys ⭐ 173

Recommendation project (real-time and offline recommendation)

Terrapin ⭐ 168

Serving system for batch generated data sets

Incubator Wayang ⭐ 162

Apache Wayang(incubating) is the first cross-platform data processing system.

1-100 of 800 search results
