Awesome Open Source
Search results for java spark
650 search results found
Spark
⭐
37,661
Apache Spark - A unified analytics engine for large-scale data processing
Bigdata Notes
⭐
14,872
A beginner's guide to big data ⭐
Flink Learning
⭐
13,801
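Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink fundamentals, concepts, internals, hands-on practice, performance tuning, and source-code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and more, plus case studies of large production Flink projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Your support for my column "Flink in Action and Performance Optimization" is welcome.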
Technology Talk
⭐
13,579
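[Big-tech interview column] A technical guide for Java programmers, with interview questions, system architecture, career tips, mainstream middleware, and more, …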
Deeplearning4j
⭐
13,397
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learning using automatic differentiation.
Spark
⭐
9,543
A simple expressive web framework for java. Spark has a kotlin DSL https://github.com/perwendel/spark-kotlin
It_book
⭐
8,543
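This project collects over a thousand good, commonly used books I have read or heard of over the years; the book you are looking for may well be here. It covers the internet …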
God Of Bigdata
⭐
8,483
Focused on big data learning and interviews; start your path to big data mastery. Flink/Spark/Hadoop/Hbase/Hive.
Angel
⭐
6,690
A Flexible and Powerful Parameter Server for large-scale machine learning
H2o 3
⭐
6,618
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Zeppelin
⭐
6,259
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Technical Books
⭐
5,519
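😆 Books written by leading internet technology experts in China and abroad: computer fundamentals, networking, front end, back end, databases, architecture, big data, deep learning.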
Iceberg
⭐
5,179
Apache Iceberg
Hudi
⭐
4,901
Upserts, Deletes And Incremental Processing on Big Data.
Learning Spark
⭐
3,804
Example code from Learning Spark book
Roaringbitmap
⭐
3,308
A better compressed bitset in Java: used by Apache Spark, Netflix Atlas, Tablesaw, and many others
Dataspherestudio
⭐
2,860
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Bigdataguide
⭐
2,355
Big data learning: learn big data from scratch, with study videos and interview materials for each stage of learning
Spring Boot Quick
⭐
2,282
🌿 Quick-start learning examples based on springboot, integrating open source frameworks I have come across, such as rabbitmq (delay queue), K…
Lakesoul
⭐
2,248
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Ballista
⭐
2,244
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Easyml
⭐
1,966
Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.
Quicksql
⭐
1,939
A Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources
Elasticsearch Hadoop
⭐
1,914
🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop
Gaffer
⭐
1,724
A large-scale entity and relation database supporting aggregation of properties
Jvm Profiler
⭐
1,717
JVM Profiler Sending Metrics to Kafka, Console Output or Custom Reporter
Incubator Paimon
⭐
1,647
Apache Paimon(incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
Elassandra
⭐
1,633
Elassandra = Elasticsearch + Apache Cassandra
Movie_recommend
⭐
1,441
A Spark-based movie recommendation system, including a crawler project, a web site, an admin back end, and the Spark recommendation system
Seldon Server
⭐
1,420
Machine Learning Platform and Recommendation Engine built on Kubernetes
Tutorial
⭐
1,414
A summary of the back-end (Java, Golang) full-stack knowledge architecture
Carbondata
⭐
1,401
High performance data store solution
Dr Elephant
⭐
1,301
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Taier
⭐
1,220
Taier is a big data development platform for submission, scheduling, operation and maintenance, and indicator information display
Spark Doc Zh
⭐
1,186
Chinese edition of the official Apache Spark documentation
Spark
⭐
1,141
A simple Android sparkline chart view.
Kylo
⭐
1,035
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Data Algorithms Book
⭐
973
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Adam
⭐
966
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Spark Redis
⭐
926
A connector for Spark that allows reading and writing to/from Redis cluster
Coding Now
⭐
925
Notes from my studies, along with eBooks I have read, video resources, and blogs I have bookmarked and consider good, …
Livy
⭐
911
Livy is an open source REST interface for interacting with Apache Spark from anywhere
Spark
⭐
904
A performance profiler for Minecraft clients, servers, and proxies.
Zingg
⭐
828
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Datasophon
⭐
823
The next generation of cloud-native big data management expert, aiming to help users rapidly build stable, efficient, and scalable cloud-native platforms for big data.
Hadoop_study
⭐
817
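Regularly updated documentation for commonly used big data components in the Hadoop ecosystem, focusing in order on: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos (the project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code; continuously updated!)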
Useractionanalyzeplatform
⭐
810
Big data platform for e-commerce user behavior analysis
Nessie
⭐
762
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
Cdap
⭐
735
An open source framework for building data analytic applications.
Incubator Celeborn
⭐
725
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
Mongo Spark
⭐
692
The MongoDB Spark Connector
Coral
⭐
680
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
Kafka Spark Consumer
⭐
616
High Performance Kafka Connector for Spark Streaming. Supports Multi Topic Fetch, Kafka Security. Reliable offset management in Zookeeper. No Data-loss. No dependency on HDFS and WAL. In-built PID rate controller. Supports Message Handler. Offset Lag checker.
Wiki2vec
⭐
587
Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby
Yanagishima
⭐
584
Web UI for Trino, Hive and SparkSQL
Cassandra Lucene Index
⭐
574
Lucene based secondary indexes for Cassandra
Spark
⭐
548
Cross-platform real-time collaboration client optimized for business and organizations.
Gpt Web Java
⭐
546
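JDK8-based AI chatbot! WeChat official account: Midjourney image generation, redemption codes. Web: supports ChatGPT, Midjourney image generation, SD image generation, redemption codes, Epay (易支付) payments, official-account traffic referral, email registration 🔥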
Clickhouse Native Jdbc
⭐
502
ClickHouse Native Protocol JDBC implementation
Marmaray
⭐
444
Generic Data Ingestion & Dispersal Library for Hadoop
Dl4j Tutorials
⭐
429
dl4j basic tutorials. Companion videos: https://space.bilibili.com/327018681/#/
Iceberg
⭐
409
Iceberg is a table format for large, slow-moving tabular data
Learningspark
⭐
406
Scala examples for learning to use Spark
Sparkler
⭐
401
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Sylph
⭐
396
Stream computing platform for bigdata
Connectors
⭐
383
This library allows Scala and Java-based projects (including Apache Flink, Apache Hive, Apache Beam, and PrestoDB) to read from and write to Delta Lake.
Zdh_web
⭐
379
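Big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, covering data collection, scheduling, permissions, approval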
Stockinference Spark
⭐
376
Stock inference engine using Spring XD, Apache Geode / GemFire and Spark ML Lib.
Ecommercerecommendsystem
⭐
350
Real-time big data product recommendation system. Front end: Vue + TypeScript + ElementUI; back end: Spring + Spark
Distributed Java
⭐
336
Distributed Java.《分布式 Java》
Spark Bigquery Connector
⭐
332
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Incubator Uniffle
⭐
332
Uniffle is a high performance, general purpose Remote Shuffle Service.
Learning Spark Examples
⭐
320
Examples for learning spark
Every Single Day I Tldr
⭐
311
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Big Whale
⭐
290
Scheduling of offline tasks and monitoring of real-time tasks for Spark, Flink, and others
Transport
⭐
288
A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Apache Hive, and Presto.
Compass
⭐
284
Compass is a task diagnosis platform for bigdata
Azure Event Hubs
⭐
277
☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs
Datavines
⭐
275
Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
Jpmml Sparkml
⭐
265
Java library and command-line application for converting Apache Spark ML pipelines to PMML
Remoteshuffleservice
⭐
262
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
Demo_11.11_storm Spark Hadoop
⭐
257
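An example experiment combining Hadoop, Storm, and Spark that simulates Taobao's Double 11 shopping festival and aggregates total sales from order details. Rough workflow: stage 1 (real-time reports with Storm), stage 2 (offline reports), stage 3 (ad hoc and multidimensional queries over large-scale orders), stage 4 (data mining and graph computation)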
Sparkstreaming
⭐
253
Spark Streaming+Flume+Kafka+HBase+Hadoop+Zookeeper implementation of real-time log…
Firestorm
⭐
240
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers
Succinct
⭐
239
Enabling queries on compressed data.
Hnswlib
⭐
233
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Atlas
⭐
223
OSM in memory
Opendl
⭐
219
The Deep Learning training framework on Spark
Cloudshuffleservice
⭐
204
Cloud Shuffle Service(CSS) is a general purpose remote shuffle solution for compute engines, including Spark/Flink/MapReduce.
Datacompare
⭐
195
Big data comparison and data profiling platform: low code, data comparison and data profiling
Extentreports Java
⭐
195
Extent Reporting Library, Java
Rumble
⭐
194
⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Sparkrdma
⭐
191
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Video Stream Analytics
⭐
191
Emotional_analysis
⭐
190
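[Graduation project] Spark-based analysis of NetEase Cloud Music data [1. graph computation 2. machine-learning prediction of song categories 3. comment word cloud 4. comment time periods 5. top comments chart 6. top hot songs chart 7. user gender ratio 8. user zodiac sign ratio 9. user age ratio 10. nationwide geographic distribution of users 11. hot-comment search, etc.]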
Vn.vitk
⭐
189
A Vietnamese Text Processing Toolkit
Wifiprobeanalysis
⭐
189
Commercial big data analysis technology based on WiFi probes
Bigdata Hub
⭐
187
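A knowledge system for data infrastructure and big data technology, covering mainstream frameworks such as hadoop, hive, spark, and flink and related frameworks, …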
Aws Glue Data Catalog Client For Apache Hive Metastore
⭐
184
The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog
Javaorbigdata Interview
⭐
180
A collection of interview topics for Java developers and big data developers
Related Searches
Java Spring (21,350)
Java Spring Boot (11,982)
Java Video Game (8,093)
Java Gradle (8,072)
Java Docker (6,180)
Java Sdk (5,864)
Java Rest (4,956)
Java Algorithms (4,737)
Javascript Java (4,659)
Java Mysql (4,593)
1-100 of 650 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.