Project Name	Stars	Repos Using This	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language
Seaweedfs	20,994		2	a day ago	296	April 24, 2021	312	apache-2.0	Go
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
Ceph	12,859	5	1	3 months ago	1	August 26, 2014	705	other	C++
Ceph is a distributed object, block, and file storage platform
Juicefs	9,252		1	3 months ago	136	November 28, 2023	120	apache-2.0	Go
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Smart_open	3,065			a month ago			94	mit	Python
Utils for streaming large files (S3, HDFS, gzip, bz2...)
Tiledb	1,700		6	3 months ago	87	November 05, 2022	133	mit	C++
The Universal Storage Engine
Kafka Connect Ui	494			4 months ago			24	other	JavaScript
Web tool for Kafka Connect
Bigdata File Viewer	269			6 months ago			2	gpl-2.0	Java
A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Storagetapper	269			2 years ago	4	November 19, 2021	21	mit	Go
StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Pysparkling	253	7	1	a year ago	69	November 13, 2022	9	other	Python
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Rumble	194			a year ago	4	December 03, 2019	134	other	Java
⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Alternatives To Smart_open

Seaweedfs ⭐ 20,994

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.

dependent packages 2, total releases 296, most recent commit a day ago

Ceph ⭐ 12,859

Ceph is a distributed object, block, and file storage platform

dependent packages 1, total releases 1, most recent commit 3 months ago

Juicefs ⭐ 9,252

JuiceFS is a distributed POSIX file system built on top of Redis and S3.

dependent packages 1, total releases 136, most recent commit 3 months ago

Smart_open ⭐ 3,065

Utils for streaming large files (S3, HDFS, gzip, bz2...)

most recent commit a month ago

Tiledb ⭐ 1,700

The Universal Storage Engine

dependent packages 6, total releases 87, most recent commit 3 months ago

Kafka Connect Ui ⭐ 494

Web tool for Kafka Connect

most recent commit 4 months ago

Bigdata File Viewer ⭐ 269

A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

most recent commit 6 months ago

Storagetapper ⭐ 269

StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service

total releases 4, most recent commit 2 years ago

Pysparkling ⭐ 253

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

dependent packages 1, total releases 69, most recent commit a year ago

Rumble ⭐ 194

⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

total releases 4, most recent commit a year ago

Popular Hdfs Projects

Cat ⭐ 18,237

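CAT is a foundational component for server-side projects. It provides clients in multiple languages, including Java, C/C++, Node.js, Python, and Go, and is already in Meituan-Dianping's infrastructure middleware frameworks (MVC framework, RPC framework, database framework, cache framework, etc.,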

total releases 5, latest release February 25, 2019, most recent commit 4 months ago

Bigdata Notes ⭐ 14,872

A beginner's guide to big data :star:

most recent commit 3 months ago

Mycat Server ⭐ 9,431

most recent commit 6 months ago

God Of Bigdata ⭐ 8,483

Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive.

most recent commit 8 months ago

Tensorflowonspark ⭐ 3,851

TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.

total releases 32, latest release April 21, 2022, most recent commit 9 months ago

Popular S3 Projects

Minio ⭐ 42,715

The Object Store for AI Data Infrastructure

dependent packages 129, total releases 377, latest release December 01, 2023, most recent commit 3 months ago

Rclone ⭐ 42,258

"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Yandex Files

dependent packages 43, total releases 268, latest release November 26, 2023, most recent commit 3 months ago

Transfer.sh ⭐ 14,640

Easy and fast file sharing from the command-line.

dependent packages 2, total releases 27, latest release December 04, 2023, most recent commit 4 months ago

Siyuan ⭐ 14,236

A privacy-first, self-hosted, fully open source personal knowledge management software, written in typescript and golang.

total releases 1, latest release July 07, 2022, most recent commit 3 months ago

Airbyte ⭐ 12,918

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

dependent packages 11, total releases 311, latest release December 08, 2023, most recent commit 3 months ago

Popular Data Storage Categories