Awesome Open Source

Search results for spark hadoop

489 search results found

Spark ⭐ 37,661

Apache Spark - A unified analytics engine for large-scale data processing

Data Science Ipython Notebooks ⭐ 25,668

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Bigdata Notes ⭐ 14,872

A beginner's guide to big data ⭐

Deeplearning4j ⭐ 13,483

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.

Cookbook ⭐ 12,557

The Data Engineering Cookbook

Doris ⭐ 11,243

Apache Doris is an easy-to-use, high performance and unified analytics database.

It_book ⭐ 8,543

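This project collects more than a thousand good, commonly used books that I have read or heard about over the years; the book you are looking for may well be here. It covers the internet industry…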

God Of Bigdata ⭐ 8,483

Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive.

H2o 3 ⭐ 6,618

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Alluxio ⭐ 6,612

Alluxio, data orchestration for analytics and machine learning in the cloud

Bigdl ⭐ 4,728

Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using bigdl-llm

Tensorflowonspark ⭐ 3,851

TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.

The flexibility of Python with the scale and performance of modern SQL.

Dataspherestudio ⭐ 2,860

DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Bigdataguide ⭐ 2,355

Big data learning: learn big data from scratch, including study videos for each learning stage and interview materials

Szt Bigdata ⭐ 2,055

Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟

Elasticsearch Hadoop ⭐ 1,914

🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop

Kyuubi ⭐ 1,849

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Docker Spark ⭐ 1,783

Apache Spark docker image

Gaffer ⭐ 1,724

A large-scale entity and relation database supporting aggregation of properties

Movie_recommend ⭐ 1,441

A Spark-based movie recommendation system, comprising a crawler project, a website, an admin back end, and the Spark recommendation system

Carbondata ⭐ 1,401

High performance data store solution

Bigdata Interview ⭐ 1,397

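🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from the web, with my own summarized answers. Currently covers Hadoop…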

Awesome Opensource Data Engineering ⭐ 1,331

An Awesome List of Open-Source Data Engineering Projects

Dr Elephant ⭐ 1,301

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark

Caffeonspark ⭐ 1,272

Distributed deep learning on Hadoop and Spark clusters.

Bigdata Growth ⭐ 1,256

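A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.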

Taier ⭐ 1,220

Taier is a big data development platform for submission, scheduling, operation and maintenance, and indicator information display

Dockerfiles ⭐ 1,171

50+ DockerHub public images for Docker & Kubernetes - DevOps, CI/CD, GitHub Actions, CircleCI, Jenkins, TeamCity, Alpine, CentOS, Debian, Fedora, Ubuntu, Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Data Algorithms Book ⭐ 973

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Coding Now ⭐ 925

Study notes, plus e-books I have read, video resources, and blogs I have collected and consider worthwhile, …

Livy is an open source REST interface for interacting with Apache Spark from anywhere

Hadoop_study ⭐ 817

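Regularly updated documentation for commonly used big data components in the Hadoop ecosystem, with focus in this order: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos. (The project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code. Continuously updated!)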

Useractionanalyzeplatform ⭐ 810

A big data platform for e-commerce user behavior analysis

Docker Spark ⭐ 769

An open source framework for building data analytic applications.

Devops Python Tools ⭐ 709

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Sparkr Pkg ⭐ 649

R frontend for Spark

Digandburied ⭐ 645

Digging pits and filling them in

Flintrock ⭐ 627

A command-line tool for launching Apache Spark clusters.

Wedatasphere ⭐ 624

WeDataSphere is a financial grade, one-stop big data platform suite.

Dist Keras ⭐ 611

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

Aws Glue Libs ⭐ 568

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

Data Engineering Interview Questions ⭐ 554

More than 2,000 data engineer interview questions.

Data Lineage Tracking And Visualization Solution

Spark Redshift ⭐ 514

Redshift data source for Apache Spark

Marmaray ⭐ 444

Generic Data Ingestion & Dispersal Library for Hadoop

Iceberg ⭐ 409

Iceberg is a table format for large, slow-moving tabular data

Stream computing platform for bigdata

Former GraphX development repository. GraphX has been merged into Apache Spark; please submit pull requests there.

Big_data_architect_skills ⭐ 353

Skills a big data architect should master

Ytk Learn ⭐ 351

Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).

Elasticluster ⭐ 334

Create clusters of VMs on the cloud and configure them with Ansible.

Big Whale ⭐ 290

Scheduling of offline jobs (Spark, Flink, etc.) and monitoring of real-time jobs

Sagemaker Spark ⭐ 285

A Spark library for Amazon SageMaker.

Compass ⭐ 284

Compass is a task diagnosis platform for bigdata

Demo_11.11_storm Spark Hadoop ⭐ 257

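An example experiment combining Hadoop, Storm, and Spark that simulates Taobao's Double 11 shopping festival and aggregates total sales from order details. Rough workflow: stage 1 (real-time reports with Storm), stage 2 (offline reports), stage 3 (ad hoc and multidimensional queries over large-scale orders), stage 4 (data mining and graph computation).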

Spark Jupyter Aws ⭐ 255

A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support

Sparkstreaming ⭐ 253

Spark Streaming + Flume + Kafka + HBase + Hadoop + Zookeeper implementing real-time log…

Bisheserver ⭐ 242

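This system is my graduation project, titled "Design and Implementation of a Movie Recommendation System Based on User Profiles". It mainly uses Django as…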

Hadoop Tutorials Examples ⭐ 228

Source, data, and tutorials for the blog post and video series of Hue, the Web UI for Hadoop.

Bigdata_docker ⭐ 226

Big Data Ecosystem Docker

Bigdata ⭐ 219

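A learning path for big data processing technologies (continuously updated...). Bigdata notes --> taking it slowly~ The technologies covered include offline processing, real-time processing, OLAP, and so on, such as hadoop, spark, flink, hive, …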

Hadoop Docker ⭐ 210

A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark

Sparkrdma ⭐ 191

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Big Data ⭐ 190

An open source, systematic big data learning tutorial. Spark learning; Hadoop, Hive, HBase, and Flink tutorials; Linux; from beginner to expert

Wifiprobeanalysis ⭐ 189

Commercial big data analysis technology based on Wi-Fi probes

Bigdata Hub ⭐ 187

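A knowledge system for data construction and big data technology, covering mainstream frameworks such as hadoop, hive, spark, and flink and related frameworks, …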

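A roundup of big data topics, including distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase…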

Magpie contains a number of scripts for running Big Data software in HPC environments, including Hadoop and Spark. There is support for Lustre, Slurm, Moab, Torque, LSF, Flux, and more.

Javaorbigdata Interview ⭐ 180

A compilation of interview knowledge points for Java developers or big data developers

Airflow Pipeline ⭐ 168

An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR

Set of Machine Learning and Stochastic Optimization tools based on Hadoop, Spark and Storm. https://pkghosh.wordpress.com/

Juicy Bigdata ⭐ 162

🎉🎉🐳 Datawhale's Introduction to Big Data Processing tutorial | An opening course for the big data technology track 🎉🎉

Incubator Wayang ⭐ 162

Apache Wayang(incubating) is the first cross-platform data processing system.

Learning Hadoop And Spark ⭐ 160

Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning

Aliyun Emapreduce Datasources ⭐ 157

Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.

Ipython Spark Docker ⭐ 151

Bigdata ⭐ 142

Hadoop, HBase, Storm, Spark, etc.

Sparktraining ⭐ 140

Examples for Spark Training in chinahadoop.cn

Csdn Code ⭐ 138

No longer maintained --> moved to https://github.com/vbay/tutorials

Bigdata Learning ⭐ 136

Big data study notes

Logvision ⭐ 136

A distributed real-time log analysis and intrusion detection system

Hdfs_fdw ⭐ 131

PostgreSQL foreign data wrapper for HDFS

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Beymani ⭐ 126

Hadoop, Spark and Storm based anomaly detection implementations for data quality, cyber security, fraud detection etc.

Shuttle ⭐ 123

Shuttle: Highly Available, High Performance Remote Shuffle Service

Vagrant Hadoop Spark Cluster ⭐ 121

Vagrant project to spin up a cluster of 4 32-bit CentOS6.5 Linux virtual machines with Hadoop v2.6.0 and Spark v1.1.1

Variantspark ⭐ 121

machine learning for genomic variants

Docker Spark ⭐ 118

Docker image for a general Apache Spark client

Spark Terasort ⭐ 116

Xichuan_note ⭐ 114

xichuan's study summary notes, covering Java, Spring, other common Java frameworks, and big data components

[DEPRECATED] Script used to manage Hadoop and Spark instances on Google Compute Engine

A Spark Streaming monitoring platform supporting job deployment, alerting, and automatic restart

Asakusafw ⭐ 113

Asakusa Framework

Spark Summit North America 2018 06 ⭐ 112

spark-summit-north-america-2018-06. For more details, please visit

Logisland ⭐ 106

Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.

Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm

Xlearning Xdml ⭐ 101

extremely distributed machine learning

1-100 of 489 search results

Copyright 2018-2024 Awesome Open Source. All rights reserved.