Awesome Open Source
Search results for java spark
1,034 search results found
Spark
⭐
35,928
Apache Spark - A unified analytics engine for large-scale data processing
Bigdata Notes
⭐
13,291
A getting-started guide to big data ⭐️
Flink Learning
⭐
13,198
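flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, internals, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, plus large-scale case studies of Flink in production (PV/UV, log storage, real-time deduplication of ten-billion-record datasets, monitoring and alerting). Please support my column "Flink in Action and Performance Optimization".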
Technology Talk
⭐
13,004
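A roundup of commonly used technical frameworks in the Java ecosystem, open-source middleware, system architecture, databases, architecture case studies from large companies, common third-party libraries, and project management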
Deeplearning4j
⭐
12,966
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.
Spark
⭐
9,409
A simple, expressive web framework for Java. Spark has a Kotlin DSL: https://github.com/perwendel/spark-kotlin
It_book
⭐
8,543
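This project collects over a thousand good, commonly used books I have read or heard of over the years; the book you are looking for may well be here. It covers the internet…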
Doris
⭐
8,432
Apache Doris is an easy-to-use, high performance and unified analytics database.
God Of Bigdata
⭐
7,992
Focused on big data study and interview preparation; your road to big data mastery starts here. Flink/Spark/Hadoop/HBase/Hive.
H2o 3
⭐
6,303
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Alluxio
⭐
6,260
Alluxio, data orchestration for analytics and machine learning in the cloud
Zeppelin
⭐
6,062
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Technical Books
⭐
5,129
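😆 Which books have top tech experts in China and abroad written: computer fundamentals, networking, front end, back end, databases, architecture, big data, deep learning.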
Iceberg
⭐
4,340
Apache Iceberg
Hudi
⭐
4,259
Upserts, Deletes And Incremental Processing on Big Data.
Learning Spark
⭐
3,804
Example code from Learning Spark book
Roaringbitmap
⭐
3,065
A better compressed bitset in Java
Dataspherestudio
⭐
2,557
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Ballista
⭐
2,244
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Spring Boot Quick
⭐
2,152
🌿 Quick-learning examples based on Spring Boot, integrating open-source frameworks I have come across, such as rabbitmq (delay queues), K…
Bigdataguide
⭐
1,994
Learn big data from scratch: includes study videos and interview materials for every stage of learning big data
Quicksql
⭐
1,939
A Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources
Elasticsearch Hadoop
⭐
1,904
🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop
Easyml
⭐
1,894
Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.
Gaffer
⭐
1,702
A large-scale entity and relation database supporting aggregation of properties
Jvm Profiler
⭐
1,661
JVM Profiler Sending Metrics to Kafka, Console Output or Custom Reporter
Elassandra
⭐
1,633
Elassandra = Elasticsearch + Apache Cassandra
Gatk
⭐
1,442
Official code repository for GATK versions 4 and up
Movie_recommend
⭐
1,441
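A Spark-based movie recommendation system, including a crawler project, a website, a back-office management system, and the Spark recommendation engine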
Seldon Server
⭐
1,420
Machine Learning Platform and Recommendation Engine built on Kubernetes
Carbondata
⭐
1,364
High performance data store solution
Aws Serverless Java Container
⭐
1,351
A Java wrapper to run Spring, Jersey, Spark, and other apps inside AWS Lambda.
Dr Elephant
⭐
1,301
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Tutorial
⭐
1,267
A summary of the full-stack knowledge system for back-end development (Java, Golang)
Spark
⭐
1,141
A simple Android sparkline chart view.
Taier
⭐
1,122
Taier is a big data development platform for submission, scheduling, operation and maintenance, and indicator information display
Spark Doc Zh
⭐
1,101
Chinese translation of the official Apache Spark documentation
Kylo
⭐
1,035
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Data Algorithms Book
⭐
973
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Adam
⭐
946
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Coding Now
⭐
925
Study notes, plus eBooks and video resources I have gone through, and blogs I have collected and consider worthwhile…
Livy
⭐
911
Livy is an open source REST interface for interacting with Apache Spark from anywhere
Spark Redis
⭐
885
A connector for Spark that allows reading and writing to/from Redis cluster
Hadoop_study
⭐
817
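Regularly updated documentation on commonly used big data components in the Hadoop ecosystem, with focus in this order: Flink Solr Sparksql ES Scala Kafka Hbase/phoenix Redis Kerberos (the project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and anonymized train code. Continuously updated!!!)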
Useractionanalyzeplatform
⭐
810
A big data platform for e-commerce user behavior analysis
Spark
⭐
786
A performance profiler for Minecraft clients, servers, and proxies.
Zingg
⭐
738
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Cdap
⭐
707
An open source framework for building data analytic applications.
Mongo Spark
⭐
676
The MongoDB Spark Connector
Nessie
⭐
642
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
Kafka Spark Consumer
⭐
616
High-performance Kafka connector for Spark Streaming. Supports multi-topic fetch and Kafka security, reliable offset management in ZooKeeper, no data loss, no dependency on HDFS and WAL, a built-in PID rate controller, a message handler, and an offset lag checker.
Wiki2vec
⭐
587
Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby
Yanagishima
⭐
584
Web UI for Trino, Hive and SparkSQL
Cassandra Lucene Index
⭐
574
Lucene based secondary indexes for Cassandra
Coral
⭐
558
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
Spark
⭐
528
Cross-platform real-time collaboration client optimized for business and organizations.
Clickhouse Native Jdbc
⭐
484
ClickHouse Native Protocol JDBC implementation
Marmaray
⭐
444
Generic Data Ingestion & Dispersal Library for Hadoop
Dl4j Tutorials
⭐
429
Basic dl4j tutorials; companion videos: https://space.bilibili.com/327018681/#/
Incubator Celeborn
⭐
420
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
Iceberg
⭐
409
Iceberg is a table format for large, slow-moving tabular data
Learningspark
⭐
406
Scala examples for learning to use Spark
Sparkler
⭐
401
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Sylph
⭐
396
Stream computing platform for bigdata
Connectors
⭐
377
This library allows Scala and Java-based projects (including Apache Flink, Apache Hive, Apache Beam, and PrestoDB) to read from and write to Delta Lake.
Stockinference Spark
⭐
376
Stock inference engine using Spring XD, Apache Geode / GemFire and Spark MLlib.
Ecommercerecommendsystem
⭐
350
A real-time product recommendation system built on big data. Front end: Vue + TypeScript + ElementUI; back end: Spring + Spark
Distributed Java
⭐
336
Distributed Java.《分布式 Java》
Learning Spark Examples
⭐
320
Examples for learning Spark
Every Single Day I Tldr
⭐
304
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Spark Bigquery Connector
⭐
296
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Zdh_web
⭐
282
A big data collection and extraction platform
Transport
⭐
277
A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Apache Hive, and Presto.
Azure Event Hubs
⭐
277
☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs
Jpmml Sparkml
⭐
265
Java library and command-line application for converting Apache Spark ML pipelines to PMML
Remoteshuffleservice
⭐
262
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
Demo_11.11_storm Spark Hadoop
⭐
257
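An experiment combining Hadoop, Storm, and Spark that simulates Taobao's Double 11 shopping festival and aggregates total sales from detailed order information. Overall flow: Phase 1 (Storm real-time reports); Phase 2 (offline reports); Phase 3 (ad hoc and multidimensional queries over large-scale orders); Phase 4 (data mining and graph computation)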
Incubator Uniffle
⭐
254
Uniffle is a high performance, general purpose Remote Shuffle Service.
Sparkstreaming
⭐
253
Real-time log processing with Spark Streaming+Flume+Kafka+HBase+Hadoop+Zookeeper
Firestorm
⭐
240
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers
Succinct
⭐
239
Enabling queries on compressed data.
Big Whale
⭐
225
Scheduling of offline jobs and monitoring of real-time jobs for Spark, Flink, and similar engines
Opendl
⭐
219
The Deep Learning training framework on Spark
Atlas
⭐
208
OSM in memory
Cloudshuffleservice
⭐
204
Cloud Shuffle Service(CSS) is a general purpose remote shuffle solution for compute engines, including Spark/Flink/MapReduce.
Rumble
⭐
194
⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Video Stream Analytics
⭐
191
Sparkrdma
⭐
191
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Emotional_analysis
⭐
190
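[Graduation project] Spark-based data analysis of NetEase Cloud Music [1. graph computation 2. machine-learning prediction of song categories 3. comment word cloud 4. comment time-of-day distribution 5. top comments ranking 6. top hot songs ranking 7. user gender ratio 8. user zodiac sign ratio 9. user age ratio 10. nationwide geographic distribution of users 11. hot comment search, and more..]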
Wifiprobeanalysis
⭐
189
Commercial big data analytics based on WiFi probes
Vn.vitk
⭐
189
A Vietnamese Text Processing Toolkit
Javaorbigdata Interview
⭐
180
A compilation of interview topics for Java developers and big data developers
Extentreports Java
⭐
179
Extent Reporting Library, Java
Whylogs Java
⭐
179
Profile and monitor your ML data pipeline end-to-end
Hnswlib
⭐
178
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Spark Kafka Writer
⭐
177
Write your Spark data to Kafka seamlessly
Kafka Book
⭐
167
Code for the book 《Kafka技术内幕》 (Kafka Technical Internals)
Aws Glue Data Catalog Client For Apache Hive Metastore
⭐
164
The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog
Distml
⭐
164
DistML provides a supplement to MLlib to support model parallelism on Spark
Dcos Commons
⭐
162
DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.
1-100 of 1,034 search results
Copyright 2018-2023 Awesome Open Source. All rights reserved.