Awesome Open Source
Search results for hdfs
794 search results found
Seaweedfs
⭐
21,063
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
Cat
⭐
18,237
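CAT is a foundational component for server-side projects, providing clients in multiple languages including Java, C/C++, Node.js, Python, and Go. It is already in Meituan-Dianping's infrastructure middleware frameworks (MVC framework, RPC framework, database framework, cache framework, etc.,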
Bigdata Notes
⭐
14,872
A beginner's guide to big data ⭐
Ceph
⭐
12,859
Ceph is a distributed object, block, and file storage platform
Mycat Server
⭐
9,431
Juicefs
⭐
9,252
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
God Of Bigdata
⭐
8,483
Focused on big data learning and interviews; the road to becoming a big data master starts here. Flink/Spark/Hadoop/Hbase/Hive.
Tensorflowonspark
⭐
3,851
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Ibis
⭐
3,404
The flexibility of Python with the scale and performance of modern SQL.
Smart_open
⭐
3,065
Utils for streaming large files (S3, HDFS, gzip, bz2...)
Docker Hadoop
⭐
1,955
Apache Hadoop docker image
Xlearning
⭐
1,729
AI on Hadoop
Tiledb
⭐
1,700
The Universal Storage Engine
Poseidon
⭐
1,543
A search engine which can hold 100 trillion lines of log data.
Drake
⭐
1,472
Data workflow tool, like a "Make for data"
Bigdata Interview
⭐
1,397
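🎯 🌟 [Big data interview questions] Sharing big data interview questions I collected online, along with summaries of my own answers. Currently includes Hadoop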
Hdfs
⭐
1,330
A native go client for HDFS
Bigdata Growth
⭐
1,256
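A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.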
Hopsworks
⭐
1,041
Hopsworks - Data-Intensive AI platform with a Feature Store
Addax
⭐
1,034
Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.
Snakebite
⭐
854
A pure python HDFS client
Sqoop
⭐
820
Mirror of Apache Sqoop
Devops Python Tools
⭐
709
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Hawq
⭐
677
Apache HAWQ
Juicesync
⭐
570
A tool to move your data between any clouds or regions.
Daily Deeplearning
⭐
532
🔥 Machine learning / deep learning / Python / algorithm interview / NLP tutorial / 剑指offer
Sparta
⭐
526
Real Time Analytics and Data Pipelines based on Spark Streaming
Minos
⭐
508
Minos is beyond a hadoop deployment system.
Docker Hadoop Spark Workbench
⭐
503
[EXPERIMENTAL] This repo includes deployment instructions for running HDFS/Spark inside docker containers. Also includes spark-notebook and HDFS FileBrowser.
Kafka Connect Ui
⭐
494
Web tool for Kafka Connect
Kafka Connect Hdfs
⭐
473
Kafka Connect HDFS connector
Storm Yarn
⭐
419
Storm-yarn enables Storm clusters to be deployed into machines managed by Hadoop YARN.
Zdh_web
⭐
379
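A big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, covering data collection, scheduling, permissions, and approvals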
Kite
⭐
366
Kite SDK
Bigdata
⭐
358
💎🔥 Big data study notes
Cloudeon
⭐
345
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.
Spindle
⭐
333
Next-generation web analytics processing with Scala, Spark, and Parquet.
Deeplog
⭐
320
Pytorch Implementation of DeepLog.
Rlink Rs
⭐
316
High-performance Stream Processing Framework. An alternative to Apache Flink.
Golang Distributed Filesystem
⭐
312
HDFS-alike in Go. Written to learn the language and get a job.
Packetpig
⭐
309
Packetpig - Open Source Big Data Security Analytics
Tensorspark
⭐
302
TensorFlow on Spark
Hops
⭐
285
Hops Hadoop is a distribution of Apache Hadoop with distributed metadata.
Divolte Collector
⭐
275
Divolte Collector
Storagetapper
⭐
269
StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Bigdata File Viewer
⭐
269
A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Faunus
⭐
259
Graph Analytics Engine
Hdfs
⭐
257
API and command line interface for HDFS
Pysparkling
⭐
253
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Hadoop Mini Clusters
⭐
251
hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE
Omniduct
⭐
247
A toolkit providing a uniform interface for connecting to and extracting data from a wide variety of (potentially remote) data stores (including HDFS, Hive, Presto, MySQL, etc).
Hadoopy
⭐
244
Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials.
Wradlib
⭐
238
weather radar data processing - python package
Hdfs Mount
⭐
234
A tool to mount HDFS as a local Linux file system
Bigdata_docker
⭐
226
Big Data Ecosystem Docker
Zeppelin Notebooks
⭐
206
Gallery of Apache Zeppelin notebooks
Hadoop Attack Library
⭐
200
A collection of pentest tools and resources targeting Hadoop environments
Rumble
⭐
194
⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Wifiprobeanalysis
⭐
189
Commercial big data analytics technology based on Wi-Fi probes
Spark Notes
⭐
183
Magpie
⭐
182
Magpie contains a number of scripts for running Big Data software in HPC environments, including Hadoop and Spark. There is support for Lustre, Slurm, Moab, Torque, LSF, Flux, and more.
Terrapin
⭐
168
Serving system for batch generated data sets
Tiledb Py
⭐
167
Python interface to the TileDB storage engine
Juicy Bigdata
⭐
162
🎉🎉🐳 Datawhale's introductory tutorial on big data processing | The opening course of the big data technology track 🎉🎉
Dcos Commons
⭐
162
DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.
Bigdata In Practice
⭐
154
Big data practice projects: Hadoop, Spark, Kafka, Hbase, Flink.....
Lambda Arch
⭐
151
A full big data pipeline (Lambda Architecture) with Spark, Kafka, HDFS and Cassandra.
Ipython Spark Docker
⭐
151
Kube Yarn
⭐
135
Running YARN on Kubernetes with PetSet controller.
Distributed Graph Analytics
⭐
135
Distributed Graph Analytics (DGA) is a compendium of graph analytics written for Bulk-Synchronous-Parallel (BSP) processing frameworks such as Giraph and GraphX. The analytics included are High Betweenness Set Extraction, Weakly Connected Components, Page Rank, Leaf Compression, and Louvain Modularity.
Hsuntzu
⭐
134
HDFS compress tar zip snappy gzip uncompress untar codec hadoop spark
Hadoop Hdfs Fsimage Exporter
⭐
131
Exports Hadoop HDFS content statistics to Prometheus
Hdfs_fdw
⭐
131
PostgreSQL foreign data wrapper for HDFS
Cobrix
⭐
131
A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Portainer
⭐
130
Apache Mesos framework for building Docker images on a cluster of machines
Hdfs Shell
⭐
129
HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS
Plsc
⭐
129
Paddle Large Scale Classification Tools, supports ArcFace, CosFace, PartialFC, Data Parallel + Model Parallel. Model includes ResNet, ViT, Swin, DeiT, CaiT, FaceViT, MoCo, MAE, ConvMAE, CAE.
Hadoopdemo
⭐
128
Simple Hadoop application examples, including MapReduce, word count, basic HDFS operations, web log analysis, Zoo
Skein
⭐
126
A tool and library for easily deploying applications on Apache YARN
Elbencho
⭐
125
A distributed storage benchmark for file systems, object stores & block devices with support for GPUs
Elasticctr
⭐
125
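ElasticCTR, the PaddlePaddle elastic-computing recommendation system, is an open-source, enterprise-grade recommender system solution based on Kubernetes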
Ssm
⭐
123
Smart Storage Management for Big Data, a comprehensive hot/cold data optimized solution
Gowfs
⭐
122
A Go client binding for Hadoop HDFS using WebHDFS.
Vagrant Hadoop Spark Cluster
⭐
121
Vagrant project to spin up a cluster of 4 32-bit CentOS6.5 Linux virtual machines with Hadoop v2.6.0 and Spark v1.1.1
Rubix
⭐
121
Cache File System optimized for columnar formats and object stores
Difacto_dmlc
⭐
118
Distributed FM and LR based on Parameter Server with Ftrl
Kylin Docker
⭐
116
This repository tracks the code and files for building a Docker image with Apache Kylin.
Mpich2 Yarn
⭐
112
Running MPICH2 on Yarn
Kubernetes Yarn
⭐
111
Dynamometer
⭐
110
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Hadron
⭐
110
Construct and run Hadoop MapReduce programs in Haskell
Nnanalytics
⭐
106
NameNodeAnalytics is a self-help utility for scouting and maintaining the namespace of an HDFS instance.
Kafka Connect Fs
⭐
106
Kafka Connect FileSystem Connector
Play Videos In Hdfs
⭐
102
This project plays video files stored in HDFS (Hadoop) online in a web page.
Megfile
⭐
99
Megvii FILE Library - Working with Files in Python same as the standard library
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Ros_hadoop
⭐
98
Hadoop splittable InputFormat for ROS. Process rosbag with Hadoop Spark and other HDFS compatible systems.
Tiledb R
⭐
96
R interface to TileDB: The Modern Database
Wifi
⭐
95
A big data query and analysis system based on information captured over Wi-Fi
My Tutorial
⭐
93
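I want to build up my own system of knowledge. My job is in big data, so the focus is mainly on big data, starting from the mainstream frameworks Hadoop, S Big data development is tedious, and a correct runtime environment is the first step to success, so I try to work through the whole process of setup, deployment, and development, 单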
Related Searches
Hadoop Hdfs (1,082)
Java Hdfs (752)
Spark Hdfs (573)
1-100 of 794 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.