Awesome Open Source
Search results for java big data
448 search results found
Spark
⭐
36,842
Apache Spark - A unified analytics engine for large-scale data processing
Flink
⭐
22,018
Apache Flink
Shardingsphere
⭐
18,820
Distributed SQL transaction & query engine for data sharding, scaling, encryption, and more - on any database.
Presto
⭐
15,102
The official home of the Presto distributed SQL query engine for big data
Bigdata Notes
⭐
14,410
Beginner's guide to big data ⭐️
Questdb
⭐
12,566
An open source time-series database for fast ingest and SQL queries
Trino
⭐
8,571
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
God Of Bigdata
⭐
8,483
Focused on big data study and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/HBase/Hive.
Beam
⭐
7,158
Apache Beam is a unified programming model for Batch and Streaming data processing.
Kafka Ui
⭐
6,904
Open-Source Web UI for Apache Kafka Management
H2o 3
⭐
6,493
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Zeppelin
⭐
6,161
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Hazelcast
⭐
5,580
Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.
Starrocks
⭐
5,465
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
Hive
⭐
5,095
Apache Hive
Vespa
⭐
4,819
The open big data serving engine. https://vespa.ai
Ignite
⭐
4,548
Apache Ignite
Hudi
⭐
4,523
Upserts, Deletes And Incremental Processing on Big Data.
Calcite
⭐
4,039
Apache Calcite
Iotdb
⭐
4,034
Apache IoTDB
Crate
⭐
3,771
CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.
Chunjun
⭐
3,749
A data integration framework
Fastjson2
⭐
3,034
🚄 FASTJSON2 is a Java JSON library with excellent performance.
Avro
⭐
2,581
Apache Avro is a data serialization system.
Flume
⭐
2,448
Mirror of Apache Flume
Incubator Hugegraph
⭐
2,423
A graph database that supports more than 100 billion data items, with high performance and scalability (includes OLTP engine, REST API, and backends)
Bigdataguide
⭐
2,257
Big data learning: study big data from scratch, including learning videos and interview materials for each stage of study
Parquet Mr
⭐
2,166
Apache Parquet
Ambari
⭐
1,991
Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.
Poli
⭐
1,920
An easy-to-use BI server built for SQL lovers. Power data analysis in SQL and gain faster business insights.
Flinkstreamsql
⭐
1,873
Extends the real-time SQL of open source Flink; mainly implements joins between streams and dimension tables, and supports all native Flink SQL syntax
Drill
⭐
1,837
Apache Drill is a distributed MPP query layer for self-describing data
Bookkeeper
⭐
1,788
Apache BookKeeper - a scalable, fault tolerant and low latency storage service optimized for append-only workloads
Gaffer
⭐
1,713
A large-scale entity and relation database supporting aggregation of properties
Genie
⭐
1,635
Distributed Big Data Orchestration Service
Lakesoul
⭐
1,513
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Parquet Format
⭐
1,467
Apache Parquet
Bitsail
⭐
1,447
BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data items every day.
Mysql_perf_analyzer
⭐
1,420
MySQL performance monitoring and analysis.
Carbondata
⭐
1,386
High performance data store solution
Dremio Oss
⭐
1,218
Dremio - the missing link in modern data
Spark Doc Zh
⭐
1,186
Chinese edition of the official Apache Spark documentation
Hazelcast Jet
⭐
1,065
Distributed Stream and Batch Processing
Datumbox Framework
⭐
1,057
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Egads
⭐
1,052
A Java package to automatically detect anomalies in large scale time-series data
Phoenix
⭐
996
Mirror of Apache Phoenix
Accumulo
⭐
995
Apache Accumulo
Adam
⭐
955
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Odd Platform
⭐
953
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Coding Now
⭐
925
Study notes, along with some eBooks I have read, video resources, and blogs I have collected that I consider worthwhile,
Dataflowjavasdk
⭐
853
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Sqoop
⭐
820
Mirror of Apache Sqoop
Rakam Api
⭐
796
📈 Collect customer event data from your apps. (Note that this project only includes the API collector, not the visualization platform)
Samza
⭐
783
Mirror of Apache Samza
Flink Boot
⭐
725
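The Lazy Squirrel Flink-Boot scaffold lets Flink fully embrace the Spring ecosystem, so developers can build distributed stream-processing programs using the Java web development model; Lazy Squirrel makes crossing domains simpler. Lazy Squirrel aims to give developers a lower barrier to entry ... ORM framework, Hibernate Validator validation framework, Spring Retry retry framework, and so on; see the scaffold features below for details.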
Ozone
⭐
689
Scalable, redundant, and distributed object store for Apache Hadoop
Oozie
⭐
685
Mirror of Apache Oozie
Orc
⭐
626
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Courses
⭐
590
Answers for Quizzes & Assignments that I have taken
Flink Kubernetes Operator
⭐
586
Apache Flink Kubernetes Operator
Giraph
⭐
582
Mirror of Apache Giraph
Amoro
⭐
553
Amoro is a Lakehouse management system built on open data lake formats.
Incubator Celeborn
⭐
552
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
Bigtop
⭐
532
Bigtop is an Apache Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components.
Mockneat
⭐
511
MockNeat - the modern faker lib.
Datawave
⭐
493
DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
Halodb
⭐
472
A fast, log structured key-value store.
Kafka Connect Hdfs
⭐
461
Kafka Connect HDFS connector
Cogcomp Nlp
⭐
448
CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.
Helix
⭐
432
Mirror of Apache Helix
Tez
⭐
430
Apache Tez
Stroom
⭐
412
Stroom is a highly scalable data storage, processing and analysis platform.
Sparkler
⭐
401
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Sylph
⭐
396
Stream computing platform for bigdata
Knowage Server
⭐
376
Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.
Smooks
⭐
374
Extensible data integration Java framework for building XML and non-XML fragment-based applications
Ecommercerecommendsystem
⭐
350
Real-time big data product recommendation system. Front end: Vue + TypeScript + ElementUI; back end: Spring + Spark
Tugraph Analytics
⭐
349
TuGraph-analytics is a distributed streaming computing engine based on a graph model.
Apex Core
⭐
346
Mirror of Apache Apex core
Cloudbreak
⭐
343
CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.
Zdh_web
⭐
335
Big data collection and extraction platform
Centurion
⭐
318
Kotlin Bigdata Toolkit
Parquet Cpp
⭐
312
Apache Parquet
Cloudeon
⭐
308
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.
Every Single Day I Tldr
⭐
307
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Janusgraph Externals
⭐
305
A collection of external tools that make JanusGraph more convenient and efficient to use.
Bingheguide
⭐
270
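🔥🔥🔥 📚 This repository is a technical summary of what the author, Binghe, has learned over many years of development and architecture work at major internet companies, intended to give everyone a clear and detailed learning tutorial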
Flink Ml
⭐
265
Machine learning library of Apache Flink
Bigdata File Viewer
⭐
256
A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Helicalinsight
⭐
256
Helical Insight software is the world's first Open Source Business Intelligence framework, which helps you make sense of your data and make well-informed decisions.
Compass
⭐
243
Compass is a task diagnosis platform for bigdata
Succinct
⭐
239
Enabling queries on compressed data.
Shifu
⭐
235
An end-to-end machine learning and data mining framework on Hadoop
Calcite Avatica
⭐
211
Apache Calcite Avatica
Tibigdata
⭐
201
TiDB connectors for Flink/Hive/Presto
Datacompare
⭐
195
Low-code platform for big data comparison and data profiling
Sparkrdma
⭐
191
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Fluo
⭐
181
Apache Fluo
Javaorbigdata Interview
⭐
180
A compilation of interview topics for Java developers and big data developers
Tipdm
⭐
178
TipDM modeling platform, an open source data mining tool.
Related Searches
Java Spring Boot (11,982)
Javascript Java (6,275)
Java Database (5,963)
Java Server (5,922)
Java Mysql (5,770)
Java Algorithms (4,705)
Java Apache (4,280)
Java Cloud (4,240)
Java Json (3,686)
Java Command Line (3,348)
1-100 of 448 search results
Copyright 2018-2023 Awesome Open Source. All rights reserved.