Awesome Open Source
Search results for java big data
214 search results found
Spark
⭐
37,661
Apache Spark - A unified analytics engine for large-scale data processing
Flink
⭐
22,747
Apache Flink
Bigdata Notes
⭐
14,872
A beginner's guide to big data ⭐
Questdb
⭐
13,178
An open source time-series database for fast ingest and SQL queries
Trino
⭐
9,118
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
God Of Bigdata
⭐
8,483
Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/HBase/Hive.
Kafka Ui
⭐
7,779
Open-Source Web UI for Apache Kafka Management
Beam
⭐
7,355
Apache Beam is a unified programming model for Batch and Streaming data processing.
Starrocks
⭐
7,191
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Winner of InfoWorld’s 2023 BOSSIE Award for best open source software.
H2o 3
⭐
6,618
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Zeppelin
⭐
6,259
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Hazelcast
⭐
5,738
Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.
Hive
⭐
5,222
Apache Hive
Vespa
⭐
5,115
AI + Data, online. https://vespa.ai
Ignite
⭐
4,626
Apache Ignite
Calcite
⭐
4,216
Apache Calcite
Iotdb
⭐
4,157
Apache IoTDB
Chunjun
⭐
3,893
A data integration framework
Crate
⭐
3,864
CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.
Fastjson2
⭐
3,251
🚄 FASTJSON2 is a Java JSON library with excellent performance.
Avro
⭐
2,691
Apache Avro is a data serialization system.
Flume
⭐
2,475
Mirror of Apache Flume
Bigdataguide
⭐
2,355
Big data learning: learn big data from scratch, including study videos and interview materials for each stage of learning
Parquet Mr
⭐
2,296
Apache Parquet
Lakesoul
⭐
2,248
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Alldata
⭐
2,130
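🔥🔥 AllData is a definable data middle platform product for big data, with the data platform as its foundation, the data middle platform as its bridge, and the machine learning platform as its middle layer ...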
Ambari
⭐
2,030
Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.
Flinkstreamsql
⭐
1,972
Extends the real-time SQL of open-source Flink; mainly implements joins between streams and dimension tables, and supports all native Flink SQL syntax
Poli
⭐
1,920
An easy-to-use BI server built for SQL lovers. Power data analysis in SQL and gain faster business insights.
Drill
⭐
1,856
Apache Drill is a distributed MPP query layer for self-describing data
Bookkeeper
⭐
1,828
Apache BookKeeper - a scalable, fault-tolerant, low-latency storage service optimized for append-only workloads
Gaffer
⭐
1,724
A large-scale entity and relation database supporting aggregation of properties
Genie
⭐
1,659
Distributed Big Data Orchestration Service
Incubator Paimon
⭐
1,647
Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
Parquet Format
⭐
1,559
Apache Parquet
Bitsail
⭐
1,514
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
Mysql_perf_analyzer
⭐
1,420
MySQL performance monitoring and analysis.
Carbondata
⭐
1,401
High performance data store solution
Dremio Oss
⭐
1,260
Dremio - the missing link in modern data
Spark Doc Zh
⭐
1,186
Chinese edition of the official Apache Spark documentation
Egads
⭐
1,136
A Java package to automatically detect anomalies in large scale time-series data
Datumbox Framework
⭐
1,089
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Hazelcast Jet
⭐
1,065
Distributed Stream and Batch Processing
Odd Platform
⭐
1,047
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Phoenix
⭐
1,006
Mirror of Apache Phoenix
Accumulo
⭐
1,003
Apache Accumulo
Adam
⭐
966
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Coding Now
⭐
925
Study notes, along with some eBooks I have read, video resources, and blogs I have collected and consider worthwhile, ...
Dataflowjavasdk
⭐
853
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Sqoop
⭐
820
Mirror of Apache Sqoop
Rakam Api
⭐
798
📈 Collect customer event data from your apps. (Note that this project only includes the API collector, not the visualization platform)
Samza
⭐
792
Mirror of Apache Samza
Flink Boot
⭐
725
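The Lazy Squirrel (懒松鼠) Flink-Boot scaffold lets Flink fully embrace the Spring ecosystem, so developers can build distributed stream-processing programs in a Java web development style; Lazy Squirrel makes crossing domains simpler. It aims to let developers get started more easily ... ORM framework, the Hibernate Validator validation framework, the Spring Retry retry framework, and so on; see the scaffold features below for details.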
Incubator Celeborn
⭐
725
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
Oozie
⭐
687
Mirror of Apache Oozie
Flink Kubernetes Operator
⭐
657
Apache Flink Kubernetes Operator
Orc
⭐
645
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Amoro
⭐
617
Amoro is a Lakehouse management system built on open data lake formats.
Courses
⭐
590
Answers for Quizzes & Assignments that I have taken
Giraph
⭐
582
Mirror of Apache Giraph
Tugraph Analytics
⭐
557
TuGraph Analytics is the fastest OLAP graph database.
Bigtop
⭐
549
Bigtop is an Apache Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components.
Datawave
⭐
512
DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
Mockneat
⭐
511
MockNeat - the modern faker lib.
Kafka Connect Hdfs
⭐
473
Kafka Connect HDFS connector
Halodb
⭐
472
A fast, log structured key-value store.
Cogcomp Nlp
⭐
448
CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.
Tez
⭐
446
Apache Tez
Helix
⭐
440
Mirror of Apache Helix
Stroom
⭐
417
Stroom is a highly scalable data storage, processing and analysis platform.
Sparkler
⭐
401
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Sylph
⭐
396
Stream computing platform for bigdata
Knowage Server
⭐
387
Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.
Zdh_web
⭐
379
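Big data collection and extraction platform; zdh_web is the visual management platform for the zdh family of services, covering data collection, scheduling, permissions, and approvals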
Smooks
⭐
377
Extensible data integration Java framework for building XML and non-XML fragment-based applications
Ecommercerecommendsystem
⭐
350
Real-time big data product recommendation system. Frontend: Vue + TypeScript + ElementUI; backend: Spring + Spark
Cloudbreak
⭐
348
CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.
Apex Core
⭐
346
Mirror of Apache Apex core
Cloudeon
⭐
345
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.
Centurion
⭐
318
Kotlin Bigdata Toolkit
Parquet Cpp
⭐
312
Apache Parquet
Every Single Day I Tldr
⭐
311
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Janusgraph Externals
⭐
305
A collection of external tools that make JanusGraph more convenient and efficient to use.
Bingheguide
⭐
291
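🔥🔥🔥 📚 This repository is a technical summary of what the author, Binghe (冰河), has learned over years of development and architecture work at major internet companies, aiming to provide everyone with a clear and detailed learning ...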
Compass
⭐
284
Compass is a task diagnosis platform for bigdata
Flink Ml
⭐
270
Machine learning library of Apache Flink
Bigdata File Viewer
⭐
269
A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Helicalinsight
⭐
256
Helical Insight is the world’s first open source Business Intelligence framework, which helps you make sense of your data and make well-informed decisions.
Succinct
⭐
239
Enabling queries on compressed data.
Shifu
⭐
235
An end-to-end machine learning and data mining framework on Hadoop
Calcite Avatica
⭐
225
Apache Calcite Avatica
Tibigdata
⭐
201
TiDB connectors for Flink/Hive/Presto
Datacompare
⭐
195
Big data comparison and data profiling platform: low-code data comparison and data profiling
Sparkrdma
⭐
191
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Bigdata Hub
⭐
187
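Knowledge base for data engineering and big data technologies, covering mainstream frameworks such as Hadoop, Hive, Spark, and Flink, and their related frameworks, ...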
Erd Online
⭐
186
ERD Online is online collaborative data warehouse design software. It requires no local installation and lets you work with databases online. It is an excellent alternative to desktop data modeling tools.
Fluo
⭐
183
Apache Fluo
Javaorbigdata Interview
⭐
180
A collection of interview topics for Java developers and big data developers
Tipdm
⭐
178
TipDM modeling platform, an open-source data mining tool.
Siembol
⭐
176
An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.
1-100 of 214 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.