Awesome Open Source
Search results for hadoop
2,011 search results found
Spark
⭐
37,661
Apache Spark - A unified analytics engine for large-scale data processing
Data Science Ipython Notebooks
⭐
25,668
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Xgboost
⭐
25,253
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
Luigi
⭐
17,046
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.
Apijson
⭐
16,459
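🏆 A zero-code, full-featured, highly secure ORM library 🚀 Back-end APIs and docs require no code; the front end (client) customizes the data and structure of the returned JSON. 🏆 A JSON Transmission Protocol and an ORM Library 🚀 provides APIs and Docs without writing any code.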
Bigdata Notes
⭐
14,872
A beginner's guide to big data ⭐
Deeplearning4j
⭐
13,290
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.
Cookbook
⭐
12,557
The Data Engineering Cookbook
Doris
⭐
11,047
Apache Doris is an easy-to-use, high performance and unified analytics database.
Trino
⭐
9,118
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
It_book
⭐
8,543
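This project collects over a thousand good, commonly used books that I have read or heard of over the years; the book you are looking for may well be here. It covers the internet ind…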
God Of Bigdata
⭐
8,483
Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive.
School Of Sre
⭐
7,516
At LinkedIn, we are using this curriculum for onboarding our entry-level talents into the SRE role.
H2o 3
⭐
6,618
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Alluxio
⭐
6,544
Alluxio, data orchestration for analytics and machine learning in the cloud
Hive
⭐
5,222
Apache Hive
Bigdl
⭐
4,728
Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using bigdl-llm
Ignite
⭐
4,626
Apache Ignite
Calcite
⭐
4,216
Apache Calcite
Tensorflowonspark
⭐
3,851
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Scalding
⭐
3,433
A Scala API for Cascading
Ibis
⭐
3,404
The flexibility of Python with the scale and performance of modern SQL.
Hadoop Book
⭐
3,009
Example source code accompanying O'Reilly's "Hadoop: The Definitive Guide" by Tom White
Dataspherestudio
⭐
2,860
DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Nutch
⭐
2,742
Apache Nutch is an extensible and scalable web crawler
Expert_readed_books
⭐
2,692
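Latest 2021 roundup of recommended reading for engineers: books on computer science, software technology, entrepreneurship, ideas, mathematics, and biographies.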
Mrjob
⭐
2,584
Run MapReduce jobs on Hadoop or Amazon Web Services
Bigdataguide
⭐
2,355
Learning big data from scratch, including study videos for each stage of learning and interview materials.
Testing Distributed Systems
⭐
2,349
Curated list of resources on testing distributed systems
Winutils
⭐
2,261
Windows binaries for Hadoop versions (built from the git commit ID used for the ASF release)
H2o 2
⭐
2,242
Please visit https://github.com/h2oai/h2o-3 for latest H2O
Devops Bash Tools
⭐
2,224
1000+ DevOps Bash Scripts - AWS, GCP, Kubernetes, Docker, CI/CD, APIs, SQL, PostgreSQL, MySQL, Hive, Impala, Kafka, Hadoop, Jenkins, GitHub, GitLab, BitBucket, Azure DevOps, TeamCity, Spotify, MP3, LDAP, Code/Build Linting, pkg mgmt for Linux, Mac, Python, Perl, Ruby, NodeJS, Golang, Advanced dotfiles: .bashrc, .vimrc, .gitconfig, .screenrc, tmux..
Szt Bigdata
⭐
2,055
Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟
Ambari
⭐
2,030
Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.
Docker Hadoop
⭐
1,955
Apache Hadoop Docker image
Elasticsearch Hadoop
⭐
1,914
🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop
Drill
⭐
1,856
Apache Drill is a distributed MPP query layer for self describing data
Kyuubi
⭐
1,849
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Docker Spark
⭐
1,783
Apache Spark docker image
Xlearning
⭐
1,729
AI on Hadoop
Gaffer
⭐
1,724
A large-scale entity and relation database supporting aggregation of properties
Flink Streaming Platform Web
⭐
1,698
A Flink-based web platform for real-time stream computing
Atlas
⭐
1,685
Apache Atlas
Easyreport
⭐
1,635
A simple and easy to use Web Report System for Java. EasyReport is a simple, easy-to-use web reporting tool (supports Hadoop, HBase, and various relational…
Poseidon
⭐
1,543
A search engine which can hold 100 trillion lines of log data.
Winutils
⭐
1,512
winutils.exe, hadoop.dll, and hdfs.dll binaries for Hadoop on Windows
Mongo Hadoop
⭐
1,511
MongoDB Connector for Hadoop
Moosefs
⭐
1,509
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Hadoop Cluster Docker
⭐
1,445
Run Hadoop Cluster within Docker Containers
Movie_recommend
⭐
1,441
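A Spark-based movie recommendation system, including a crawler project, a website, an admin back-end system, and the Spark recommendation system.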
Carbondata
⭐
1,401
High performance data store solution
Bigdata Interview
⭐
1,397
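🎯 🌟 [Big data interview questions] Big data interview questions I collected online, together with my own summarized answers. Currently includes Hadoop…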
Cascalog
⭐
1,378
Data processing on Hadoop without the hassle.
Awesome Opensource Data Engineering
⭐
1,331
An Awesome List of Open-Source Data Engineering Projects
Hdfs
⭐
1,330
A native Go client for HDFS
Dr Elephant
⭐
1,301
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Caffeonspark
⭐
1,272
Distributed deep learning on Hadoop and Spark clusters.
Taier
⭐
1,220
Taier is a big data development platform for submission, scheduling, operation and maintenance, and indicator information display
Javapdf
⭐
1,177
🍣 100 Java e-books and technical books in PDF (take pride in downloading and reading, not in merely liking and bookmarking)
Dockerfiles
⭐
1,171
50+ DockerHub public images for Docker & Kubernetes - DevOps, CI/CD, GitHub Actions, CircleCI, Jenkins, TeamCity, Alpine, CentOS, Debian, Fedora, Ubuntu, Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak
Hadoop Docker
⭐
1,169
Hadoop Docker image
Bigdata Growth
⭐
1,162
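A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.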
Nagios Plugins
⭐
1,111
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Elephant Bird
⭐
1,100
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
Impala
⭐
1,044
Apache Impala
Kylo
⭐
1,035
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Addax
⭐
1,034
Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.
Studybooks
⭐
999
My study materials, including books, websites, and more
Awesome Hadoop
⭐
987
A curated list of amazingly awesome Hadoop and Hadoop ecosystem resources
Data Algorithms Book
⭐
973
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Coding Now
⭐
925
Notes from my studies, plus e-books I have read, video resources, and blogs I have collected and consider good, …
Docker Hive
⭐
918
Livy
⭐
911
Livy is an open source REST interface for interacting with Apache Spark from anywhere
Mr4c
⭐
890
Datasketches Java
⭐
856
A software library of stochastic streaming algorithms, a.k.a. sketches.
Snakebite
⭐
854
A pure Python HDFS client
Tesser
⭐
841
Clojure reducers, but for parallel execution: locally and on distributed systems.
Sqoop
⭐
820
Mirror of Apache Sqoop
Hadoop_study
⭐
817
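Regularly updated documentation for commonly used big data components in the Hadoop ecosystem. Focus areas, in order: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos. (The project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code; continuously updated!)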
Useractionanalyzeplatform
⭐
810
A big data platform for e-commerce user behavior analysis
Docker Spark
⭐
769
Cdap
⭐
735
An open source framework for building data analytic applications.
Ozone
⭐
727
Scalable, redundant, and distributed object store for Apache Hadoop
Devops Python Tools
⭐
709
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Hive Json Serde
⭐
706
Read - Write JSON SerDe for Apache Hive.
Tony
⭐
696
TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Oozie
⭐
687
Mirror of Apache Oozie
Geometry Api Java
⭐
679
The Esri Geometry API for Java enables developers to write custom applications for analysis of spatial data. This API is used in the Esri GIS Tools for Hadoop and other 3rd-party data processing solutions.
Hawq
⭐
677
Apache HAWQ
Pig
⭐
659
Mirror of Apache Pig
Sparkr Pkg
⭐
649
R frontend for Spark
Digandburied
⭐
645
Digging holes and filling them in
Orc
⭐
645
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Flintrock
⭐
627
A command-line tool for launching Apache Spark clusters.
Wedatasphere
⭐
624
WeDataSphere is a financial grade, one-stop big data platform suite.
Dist Keras
⭐
611
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Chill
⭐
598
Scala extensions for the Kryo serialization library
Giraph
⭐
582
Mirror of Apache Giraph
Aws Glue Libs
⭐
568
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
Data Engineering Interview Questions
⭐
554
More than 2,000 data engineer interview questions.
Related Searches
Java Hadoop (2,117)
Spark Hadoop (1,188)
Hadoop Hdfs (1,082)
Hadoop Mapreduce (851)
Shell Hadoop (772)
Python Hadoop (761)
Hadoop Hive (703)
1-100 of 2,011 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.