Awesome Open Source


Search results for java spark

650 search results found

Spark ⭐ 37,661

Apache Spark - A unified analytics engine for large-scale data processing

Bigdata Notes ⭐ 14,872

A getting-started guide to big data

Flink Learning ⭐ 13,801

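Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink introduction, concepts, principles, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and more, plus shared large-scale production project cases (PV/UV, log storage, real-time deduplication of ten billion records, monitoring and alerting). Please support my column 《Flink 实战与性能优化》 (Flink in Practice and Performance Optimization).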

Technology Talk ⭐ 13,579

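[Big-Tech Interview Column] A technical guide every Java programmer needs, covering interview questions, system architecture, career tips, mainstream middleware, and more, so that…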

Deeplearning4j ⭐ 13,397

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.

Spark ⭐ 9,543

A simple, expressive web framework for Java. Spark has a Kotlin DSL: https://github.com/perwendel/spark-kotlin

It_book ⭐ 8,543

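This project collects over a thousand good, commonly used books that I have read or heard of over the years; the book you are looking for might be here. It covers the Internet indus…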

God Of Bigdata ⭐ 8,483

Focused on big data learning and interviews: the road to becoming a big data master starts here. Flink/Spark/Hadoop/HBase/Hive.

Angel ⭐ 6,690

A Flexible and Powerful Parameter Server for large-scale machine learning

H2o 3 ⭐ 6,618

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Zeppelin ⭐ 6,259

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Technical Books ⭐ 5,519

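😆 Books written by leading Internet technologists in China and abroad: computer fundamentals, networking, front end, back end, databases, architecture, big data, and deep learning.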

Iceberg ⭐ 5,179

Upserts, Deletes And Incremental Processing on Big Data.

Learning Spark ⭐ 3,804

Example code from Learning Spark book

Roaringbitmap ⭐ 3,308

A better compressed bitset in Java: used by Apache Spark, Netflix Atlas, Tablesaw, and many others

Dataspherestudio ⭐ 2,860

DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Bigdataguide ⭐ 2,355

Learn big data from scratch, with study videos and interview materials for every stage of learning big data.

Spring Boot Quick ⭐ 2,282

🌿 Quick-learning examples based on Spring Boot, integrating open-source frameworks I have come across, such as RabbitMQ (delay queues), K…

Lakesoul ⭐ 2,248

LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.

Ballista ⭐ 2,244

Distributed compute platform implemented in Rust, and powered by Apache Arrow.

Easyml ⭐ 1,966

Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.

Quicksql ⭐ 1,939

A Flexible, Fast, Federated (3F) SQL Analysis Middleware for Multiple Data Sources

Elasticsearch Hadoop ⭐ 1,914

🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop

Gaffer ⭐ 1,724

A large-scale entity and relation database supporting aggregation of properties

Jvm Profiler ⭐ 1,717

JVM Profiler Sending Metrics to Kafka, Console Output or Custom Reporter

Incubator Paimon ⭐ 1,647

Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.

Elassandra ⭐ 1,633

Elassandra = Elasticsearch + Apache Cassandra

Movie_recommend ⭐ 1,441

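A Spark-based movie recommendation system, including a crawler project, a website, a back-office management system, and the Spark recommendation engine.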

Seldon Server ⭐ 1,420

Machine Learning Platform and Recommendation Engine built on Kubernetes

Tutorial ⭐ 1,414

A summary of the full-stack back-end (Java, Golang) knowledge architecture

Carbondata ⭐ 1,401

High performance data store solution

Dr Elephant ⭐ 1,301

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark

Taier ⭐ 1,220

Taier is a big data development platform for submission, scheduling, operation and maintenance, and indicator information display

Spark Doc Zh ⭐ 1,186

Chinese translation of the official Apache Spark documentation

Spark ⭐ 1,141

A simple Android sparkline chart view.

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Data Algorithms Book ⭐ 973

MapReduce, Spark, Java, and Scala for Data Algorithms Book

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

Spark Redis ⭐ 926

A connector for Spark that allows reading from and writing to a Redis cluster

Coding Now ⭐ 925

Study notes, along with e-books and video resources I have gone through and blogs I have collected and consider quite good, …

Livy is an open source REST interface for interacting with Apache Spark from anywhere

A performance profiler for Minecraft clients, servers, and proxies.

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

Datasophon ⭐ 823

A next-generation, cloud-native big data management expert that aims to help users rapidly build stable, efficient, and scalable cloud-native big data platforms.

Hadoop_study ⭐ 817

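Regularly updated documentation for big data components commonly used in the Hadoop ecosystem, with emphasis, in order, on: Flink, Solr, Spark SQL, ES, Scala, Kafka, HBase/Phoenix, Redis, Kerberos (the project includes Hadoop mind maps in Evernote, simple Scala demos, and common utility classes and train code with sensitive information removed; continuously updated!!!)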

Useractionanalyzeplatform ⭐ 810

E-commerce user behavior analysis big data platform

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

An open source framework for building data analytic applications.

Incubator Celeborn ⭐ 725

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.

Mongo Spark ⭐ 692

The MongoDB Spark Connector

Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.

Kafka Spark Consumer ⭐ 616

High-performance Kafka connector for Spark Streaming. Supports multi-topic fetch and Kafka security, with reliable offset management in ZooKeeper, no data loss, and no dependency on HDFS or the WAL. Includes a built-in PID rate controller, message handler support, and an offset lag checker.

Wiki2vec ⭐ 587

Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby

Yanagishima ⭐ 584

Web UI for Trino, Hive and SparkSQL

Cassandra Lucene Index ⭐ 574

Lucene based secondary indexes for Cassandra

Cross-platform real-time collaboration client optimized for business and organizations.

Gpt Web Java ⭐ 546

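An AI chatbot based on JDK 8! WeChat Official Account, Midjourney image generation, and card-key redemption; the web version supports ChatGPT, Midjourney and SD image generation, card-key redemption, 易支付 (Yi Zhifu) payments, Official Account traffic generation, and email registration 🔥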

Clickhouse Native Jdbc ⭐ 502

ClickHouse Native Protocol JDBC implementation

Marmaray ⭐ 444

Generic Data Ingestion & Dispersal Library for Hadoop

Dl4j Tutorials ⭐ 429

Companion videos for the dl4j basics tutorial: https://space.bilibili.com/327018681/#/

Iceberg ⭐ 409

Iceberg is a table format for large, slow-moving tabular data

Learningspark ⭐ 406

Scala examples for learning to use Spark

Sparkler ⭐ 401

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stream computing platform for big data

Connectors ⭐ 383

This library allows Scala and Java-based projects (including Apache Flink, Apache Hive, Apache Beam, and PrestoDB) to read from and write to Delta Lake.

Zdh_web ⭐ 379

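Big data collection and extraction platform; zdh_web is the visual management platform for the zdh series of services, covering data collection, scheduling, permissions, and approvals.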

Stockinference Spark ⭐ 376

Stock inference engine using Spring XD, Apache Geode / GemFire and Spark MLlib.

Ecommercerecommendsystem ⭐ 350

Real-time product recommendation system built on big data. Front end: Vue + TypeScript + ElementUI; back end: Spring + Spark

Distributed Java ⭐ 336

Distributed Java (Chinese title: 《分布式 Java》)

Spark Bigquery Connector ⭐ 332

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.

Incubator Uniffle ⭐ 332

Uniffle is a high performance, general purpose Remote Shuffle Service.

Learning Spark Examples ⭐ 320

Examples for learning spark

Every Single Day I Tldr ⭐ 311

A daily digest of the articles or videos I've found interesting, that I want to share with you.

Big Whale ⭐ 290

Scheduling of offline jobs and monitoring of real-time jobs for Spark, Flink, and similar engines

Transport ⭐ 288

A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Apache Hive, and Presto.

Compass ⭐ 284

Compass is a task diagnosis platform for big data

Azure Event Hubs ⭐ 277

☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs

Datavines ⭐ 275

Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.

Jpmml Sparkml ⭐ 265

Java library and command-line application for converting Apache Spark ML pipelines to PMML

Remoteshuffleservice ⭐ 262

Remote shuffle service for Apache Spark to store shuffle data on remote servers.

Demo_11.11_storm Spark Hadoop ⭐ 257

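An example experiment combining Hadoop, Storm, and Spark that simulates Taobao's Double 11 shopping festival and aggregates total sales from detailed order information. Overall flow: stage 1 (Storm real-time reports), stage 2 (offline reports), stage 3 (ad hoc and multidimensional queries over large-scale orders), stage 4 (data mining and graph computing).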

Sparkstreaming ⭐ 253

Real-time log processing with Spark Streaming + Flume + Kafka + HBase + Hadoop + ZooKeeper

Firestorm ⭐ 240

Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers

Succinct ⭐ 239

Enabling queries on compressed data.

Hnswlib ⭐ 233

Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs

The Deep Learning training framework on Spark

Cloudshuffleservice ⭐ 204

Cloud Shuffle Service (CSS) is a general-purpose remote shuffle solution for compute engines, including Spark/Flink/MapReduce.

Datacompare ⭐ 195

Big data comparison and data profiling platform: low-code data comparison and data profiling

Extentreports Java ⭐ 195

Extent Reporting Library, Java

⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Sparkrdma ⭐ 191

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Video Stream Analytics ⭐ 191

Emotional_analysis ⭐ 190

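[Graduation project] Spark-based NetEase Cloud Music data analysis [1. graph computing 2. machine-learning prediction of song categories 3. comment word cloud 4. comment time periods 5. top comments chart 6. hot songs chart 7. user gender ratio 8. user zodiac sign ratio 9. user age ratio 10. nationwide geographic distribution of users 11. hot comment search, and more...]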

Vn.vitk ⭐ 189

A Vietnamese Text Processing Toolkit

Wifiprobeanalysis ⭐ 189

Commercial big data analytics based on Wi-Fi probes

Bigdata Hub ⭐ 187

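A knowledge system for data infrastructure and big data technologies, covering mainstream frameworks such as Hadoop, Hive, Spark, and Flink, along with related frameworks, …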

Aws Glue Data Catalog Client For Apache Hive Metastore ⭐ 184

The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-the-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog.

Javaorbigdata Interview ⭐ 180

Interview knowledge points for Java developers and big data developers




Copyright 2018-2024 Awesome Open Source. All rights reserved.