Awesome Open Source
Search results for spark hadoop
489 search results found
Spark
⭐
37,661
Apache Spark - A unified analytics engine for large-scale data processing
Data Science Ipython Notebooks
⭐
25,668
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Bigdata Notes
⭐
14,872
A beginner's guide to big data ⭐
Deeplearning4j
⭐
13,397
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learning using automatic differentiation.
Cookbook
⭐
12,557
The Data Engineering Cookbook
Doris
⭐
11,243
Apache Doris is an easy-to-use, high performance and unified analytics database.
It_book
⭐
8,543
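This project collects over a thousand good, commonly used books I have read or heard of over the years; the book you are looking for might just be here. It includes the internet indus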
God Of Bigdata
⭐
8,483
Focused on big data learning and interviews: start your path to becoming a big data god. Flink/Spark/Hadoop/Hbase/Hive.
H2o 3
⭐
6,618
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Alluxio
⭐
6,612
Alluxio, data orchestration for analytics and machine learning in the cloud
Bigdl
⭐
4,728
Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using bigdl-llm
Tensorflowonspark
⭐
3,851
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Ibis
⭐
3,404
The flexibility of Python with the scale and performance of modern SQL.
Dataspherestudio
⭐
2,860
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Bigdataguide
⭐
2,355
Learn big data from scratch, including study videos for every stage of big data learning and interview materials
Szt Bigdata
⭐
2,055
Shenzhen Metro big data passenger-flow analysis system 🚇🚄🌟
Elasticsearch Hadoop
⭐
1,914
🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop
Kyuubi
⭐
1,849
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Docker Spark
⭐
1,783
Apache Spark docker image
Gaffer
⭐
1,724
A large-scale entity and relation database supporting aggregation of properties
Movie_recommend
⭐
1,441
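A Spark-based movie recommendation system, including a web crawler project, a website, a back-end management system, and the Spark recommendation system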
Carbondata
⭐
1,401
High performance data store solution
Bigdata Interview
⭐
1,397
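🎯 🌟[Big data interview questions] Big-data-related interview questions I have collected online, shared together with my own answer summaries. Currently includes Hadoop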
Awesome Opensource Data Engineering
⭐
1,331
An Awesome List of Open-Source Data Engineering Projects
Dr Elephant
⭐
1,301
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Caffeonspark
⭐
1,272
Distributed deep learning on Hadoop and Spark clusters.
Bigdata Growth
⭐
1,256
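A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.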
Taier
⭐
1,220
Taier is a big data development platform for submission, scheduling, operation and maintenance, and indicator information display
Dockerfiles
⭐
1,171
50+ DockerHub public images for Docker & Kubernetes - DevOps, CI/CD, GitHub Actions, CircleCI, Jenkins, TeamCity, Alpine, CentOS, Debian, Fedora, Ubuntu, Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak
Kylo
⭐
1,035
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Data Algorithms Book
⭐
973
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Coding Now
⭐
925
Study notes, along with eBooks and video resources I have gone through and blogs I have collected that I consider fairly good,
Livy
⭐
911
Livy is an open source REST interface for interacting with Apache Spark from anywhere
Hadoop_study
⭐
817
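Regularly updated documentation on commonly used big data components in the Hadoop ecosystem, focusing, in order, on: Flink, Solr, Spark SQL, ES, Scala, Kafka, HBase/Phoenix, Redis, Kerberos (the project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code; continuously updated!!!)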
Useractionanalyzeplatform
⭐
810
Big data platform for e-commerce user behavior analysis
Docker Spark
⭐
769
Cdap
⭐
735
An open source framework for building data analytic applications.
Devops Python Tools
⭐
709
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Sparkr Pkg
⭐
649
R frontend for Spark
Digandburied
⭐
645
Digging holes and filling them in
Flintrock
⭐
627
A command-line tool for launching Apache Spark clusters.
Wedatasphere
⭐
624
WeDataSphere is a financial grade, one-stop big data platform suite.
Dist Keras
⭐
611
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Aws Glue Libs
⭐
568
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
Data Engineering Interview Questions
⭐
554
More than 2,000 data engineer interview questions.
Spline
⭐
553
Data Lineage Tracking And Visualization Solution
Spark Redshift
⭐
514
Redshift data source for Apache Spark
Marmaray
⭐
444
Generic Data Ingestion & Dispersal Library for Hadoop
Iceberg
⭐
409
Iceberg is a table format for large, slow-moving tabular data
Sylph
⭐
396
Stream computing platform for big data
Graphx
⭐
353
Former GraphX development repository. GraphX has been merged into Apache Spark; please submit pull requests there.
Big_data_architect_skills
⭐
353
Skills a big data architect should master
Ytk Learn
⭐
351
Ytk-learn is a distributed machine learning library which implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Elasticluster
⭐
334
Create clusters of VMs on the cloud and configure them with Ansible.
Big Whale
⭐
290
Scheduling of offline jobs such as Spark and Flink, and monitoring of real-time jobs
Sagemaker Spark
⭐
285
A Spark library for Amazon SageMaker.
Compass
⭐
284
Compass is a task diagnosis platform for big data
Demo_11.11_storm Spark Hadoop
⭐
257
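An example experiment combining Hadoop, Storm, and Spark that simulates Taobao's Double 11 shopping festival and aggregates total sales from detailed order information. Rough workflow: Stage 1 (Storm real-time reports), Stage 2 (offline reports), Stage 3 (ad hoc and multidimensional queries over large-scale orders), Stage 4 (data mining and graph computation)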
Spark Jupyter Aws
⭐
255
A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Sparkstreaming
⭐
253
Real-time logging implemented with Spark Streaming + Flume + Kafka + HBase + Hadoop + ZooKeeper
Bisheserver
⭐
242
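This system is my graduation design project, titled "Design and Implementation of a Movie Recommendation System Based on User Profiles". It mainly uses Django as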
Hadoop Tutorials Examples
⭐
228
Source, data and tutorials of the blog post video series of Hue, the Web UI for Hadoop.
Bigdata_docker
⭐
226
Big Data Ecosystem Docker
Bigdata
⭐
219
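A learning path for big data processing technologies (continuously updated...). Big data notes --> taking it slowly~ The big data technologies covered include offline processing, real-time processing, OLAP, etc., such as hadoop, spark, flink, hive,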
Hadoop Docker
⭐
210
A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark
Sparkrdma
⭐
191
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Big Data
⭐
190
An open-source, systematic big data learning tutorial: Spark, Hadoop, Hive, HBase, and Flink tutorials, plus Linux, from beginner to expert
Wifiprobeanalysis
⭐
189
Commercial big data analysis based on Wi-Fi probes
Bigdata Hub
⭐
187
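A body of knowledge on data construction and big data technology, including mainstream frameworks such as Hadoop, Hive, Spark, and Flink, along with related frameworks,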
Dpkb
⭐
182
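A compilation of big-data-related content, including distributed storage engines, distributed computing engines, data warehouse construction, etc. Keywords: Hadoop, HBase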
Magpie
⭐
182
Magpie contains a number of scripts for running Big Data software in HPC environments, including Hadoop and Spark. There is support for Lustre, Slurm, Moab, Torque, LSF, Flux, and more.
Javaorbigdata Interview
⭐
180
A compilation of interview knowledge points for Java developers and big data developers
Airflow Pipeline
⭐
168
An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Avenir
⭐
164
Set of Machine Learning and Stochastic Optimization tools based on Hadoop, Spark and Storm https://pkghosh.wordpress.com/
Juicy Bigdata
⭐
162
🎉🎉🐳 Datawhale introductory tutorial on big data processing | The opening course of the big data technology track 🎉🎉
Incubator Wayang
⭐
162
Apache Wayang(incubating) is the first cross-platform data processing system.
Learning Hadoop And Spark
⭐
160
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Aliyun Emapreduce Datasources
⭐
157
Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.
Ipython Spark Docker
⭐
151
Bigdata
⭐
142
hadoop, hbase, storm, spark, etc.
Sparktraining
⭐
140
Examples for Spark Training in chinahadoop.cn
Csdn Code
⭐
138
No longer maintained --> moved to https://github.com/vbay/tutorials
Bigdata Learning
⭐
136
Big data learning notes
Logvision
⭐
136
Distributed real-time log analysis and intrusion detection system
Hdfs_fdw
⭐
131
PostgreSQL foreign data wrapper for HDFS
Aut
⭐
128
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Beymani
⭐
126
Hadoop, Spark and Storm based anomaly detection implementations for data quality, cyber security, fraud detection etc.
Shuttle
⭐
123
Shuttle: Highly Available, High-Performance Remote Shuffle Service
Vagrant Hadoop Spark Cluster
⭐
121
Vagrant project to spin up a cluster of four 32-bit CentOS 6.5 Linux virtual machines with Hadoop v2.6.0 and Spark v1.1.1
Variantspark
⭐
121
machine learning for genomic variants
Docker Spark
⭐
118
Docker image for general apache spark client
Spark Terasort
⭐
116
Spark Terasort
Xichuan_note
⭐
114
xichuan's study summary notes, covering Java, Spring, other common Java frameworks, and big-data-related components
Bdutil
⭐
114
[DEPRECATED] Script used to manage Hadoop and Spark instances on Google Compute Engine
Teddy
⭐
113
Spark Streaming monitoring platform supporting job deployment, alerting, and auto-restart
Asakusafw
⭐
113
Asakusa Framework
Spark Summit North America 2018 06
⭐
112
spark-summit-north-america-2018-06. For more details, please visit
Logisland
⭐
106
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.
Chombo
⭐
102
Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm
Xlearning Xdml
⭐
101
extremely distributed machine learning
Related Searches
Scala Spark (3,279)
Java Hadoop (2,117)
Python Spark (2,053)
Java Spark (1,587)
Apache Spark (1,207)
Jupyter Notebook Spark (1,151)
Hadoop Hdfs (1,075)
Spark Kafka (985)
Hadoop Mapreduce (847)
Spark Streaming (817)
1-100 of 489 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.