Awesome Open Source
Search results for hadoop big data
426 search results found
Spark
⭐
36,793
Apache Spark - A unified analytics engine for large-scale data processing
Data Science Ipython Notebooks
⭐
25,242
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Presto
⭐
15,086
The official home of the Presto distributed SQL query engine for big data
Bigdata Notes
⭐
14,410
A beginner's guide to big data ⭐️
Cookbook
⭐
11,769
The Data Engineering Cookbook
Trino
⭐
8,540
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
God Of Bigdata
⭐
8,483
Focused on big data study and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive.
H2o 3
⭐
6,485
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Hive
⭐
5,086
Apache Hive
Ignite
⭐
4,540
Apache Ignite
Calcite
⭐
4,032
Apache Calcite
H2o 2
⭐
2,242
Please visit https://github.com/h2oai/h2o-3 for latest H2O
Bigdataguide
⭐
2,224
Big data learning: learn big data from scratch, including study videos and interview materials for every stage of learning
Ambari
⭐
1,991
Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.
Drill
⭐
1,837
Apache Drill is a distributed MPP query layer for self describing data
Gaffer
⭐
1,713
A large-scale entity and relation database supporting aggregation of properties
Poseidon
⭐
1,543
A search engine which can hold 100 trillion lines of log data.
Moosefs
⭐
1,475
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Bigdata Interview
⭐
1,397
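🎯 🌟 [Big data interview questions] Big data interview questions I have collected online, together with summaries of my own answers. Currently includes Hadoop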
Carbondata
⭐
1,376
High performance data store solution
Bigdata Growth
⭐
1,047
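A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.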
Coding Now
⭐
925
Study notes, plus some eBooks I have read, video resources, and blogs I have collected over time that I consider good,
Sqoop
⭐
820
Mirror of Apache Sqoop
Ozone
⭐
688
Scalable, redundant, and distributed object store for Apache Hadoop
Oozie
⭐
685
Mirror of Apache Oozie
Orc
⭐
625
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Wedatasphere
⭐
593
WeDataSphere is a financial grade, one-stop big data platform suite.
Giraph
⭐
582
Mirror of Apache Giraph
Spline
⭐
538
Data Lineage Tracking And Visualization Solution
Bigdata Ecosystem
⭐
536
BigData Ecosystem Dataset
Bigtop
⭐
531
Bigtop is an Apache Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components.
Kafka Connect Hdfs
⭐
459
Kafka Connect HDFS connector
Tez
⭐
430
Apache Tez
Sylph
⭐
396
Stream computing platform for bigdata
Big_data_architect_skills
⭐
353
Skills a big data architect should master
Apex Core
⭐
346
Mirror of Apache Apex core
Cloudbreak
⭐
343
CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.
Cloudeon
⭐
307
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.
Parquet4s
⭐
251
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Trafodion
⭐
243
Apache Trafodion
Compass
⭐
243
Compass is a task diagnosis platform for bigdata
Shifu
⭐
235
An end-to-end machine learning and data mining framework on Hadoop
Node Hbase
⭐
232
Asynchronous HBase client for NodeJs using REST
Bigdata_docker
⭐
226
Big Data Ecosystem Docker
Calcite Avatica
⭐
211
Apache Calcite Avatica
Hadoop Attack Library
⭐
200
A collection of pentest tools and resources targeting Hadoop environments
Sparkrdma
⭐
191
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Bigdata
⭐
183
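A learning path for big data processing technologies (continuously updated...). Bigdata notes --> slowly but surely~ Big data technologies include offline processing, real-time processing, OLAP, and so on, such as hadoop, spark, flink, hive,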
Javaorbigdata Interview
⭐
180
A compilation of interview topics for Java developers and big data developers
Juicy Bigdata
⭐
162
🎉🎉🐳 Datawhale's introductory tutorial on big data processing | The opening course of the big data technology track 🎉🎉
Bigdata Playground
⭐
154
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Incubator Wayang
⭐
142
Apache Wayang (incubating) is the first cross-platform data processing system.
Bigdata
⭐
142
hadoop, hbase, storm, spark, etc.
Eel Sdk
⭐
140
Big Data Toolkit for the JVM
Bigdata Learning
⭐
136
Big data study notes
Bigdata Hub
⭐
136
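A knowledge system for data infrastructure and big data technology, covering mainstream frameworks such as hadoop, hive, spark, and flink and their related frameworks,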
Hdfs Shell
⭐
129
HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Griffon Vm
⭐
129
Griffon Data Science Virtual Machine
Tajo
⭐
129
Mirror of Apache Tajo
Aut
⭐
128
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Xichuan_note
⭐
114
xichuan's study summary notes, covering java, spring, other commonly used java frameworks, and big data components
Asakusafw
⭐
113
Asakusa Framework
Calcite Avatica Go
⭐
110
Mirror of Apache Calcite - Avatica Go SQL Driver
Gora
⭐
109
The Apache Gora open source framework provides an in-memory data model and persistence for big data.
Logisland
⭐
106
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Streamx
⭐
95
kafka-connect-s3: Ingest data from Kafka to object stores (S3)
Reef
⭐
92
Mirror of Apache REEF
Ni
⭐
81
Say "ni" to data of any size
Cqu_bigdata
⭐
77
Labs and slides (PPT) for the "Big Data Course Group" at the College of Computer Science, Chongqing University
Flowman
⭐
76
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Euphoria
⭐
74
Euphoria is an open source Java API for creating unified big-data processing flows. It provides an engine independent programming model which can express both batch and stream transformations.
Mynote
⭐
72
This project is deprecated; for the collected and organized notes, see:
Jumbune
⭐
69
Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,
The Apache Ignite Book
⭐
66
All code samples, scripts and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above
Apache Spark Hands On
⭐
64
Educational notes and hands-on problems with solutions for the Hadoop ecosystem
Incubator Tez
⭐
60
Mirror of Apache Tez (Incubating)
Bigdataparty
⭐
54
An all-in-one Dockerfile for big data components
Big_data
⭐
53
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Serverless Spark Workshop
⭐
47
Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service
Doris Website
⭐
45
Apache Doris Website
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Docker Spark Cluster
⭐
44
A Spark cluster setup running on Docker containers
Docker Hadoop Workbench
⭐
44
A Hadoop cluster based on Docker, including Hive and Spark.
Yuzhouwan
⭐
40
Code Library for My Blog
Big Data Lite
⭐
38
Samples for the Oracle Big Data Lite VM
Big Data
⭐
36
Python tools for big data
Nectar
⭐
35
Open source framework for predictive modeling on Apache Hadoop
Sparkdemo
⭐
34
Complete Spark example code demo (Java, Scala)
Fluo Uno
⭐
34
Apache Fluo Uno
Ambari Metrics
⭐
34
Apache Ambari Metrics is a sub project of Apache Ambari.
Telemetry Analysis Service
⭐
33
Telemetry Analysis Service
Dockerfiles
⭐
31
Multiple Docker container images for the main big data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)
Awesome Tools
⭐
31
curated list of awesome tools and libraries for specific domains
Bigdata Docker
⭐
30
Build a big data development and learning environment with Docker
Learning Spark
⭐
29
Tidy up Spark and Hadoop tutorials.
Flokkr
⭐
28
Documentation placeholder and utilities for all the other containers.
Rzf.github.io
⭐
27
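✏️ [Computer fundamentals + Java basics + big data basics and advanced topics + interview guide] A project covering most of the core knowledge in computer fundamentals, Java, big data, and interview preparation. Study, interview, and improve together!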
Bigdatas
⭐
27
A db-hdfs tool for transferring large volumes of database data to Hadoop HDFS, similar to Sqoop. The bboss bigdata tool adds good monitoring, an event-driven model, high performance, and support for distributed task execution.
Enceladus
⭐
26
Dynamic Conformance Engine
1-100 of 426 search results
Copyright 2018-2023 Awesome Open Source. All rights reserved.