Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Roapi | 2,969 | | | | 4 months ago | 17 | March 20, 2022 | 37 | apache-2.0 | Rust |
Create full-fledged APIs for slowly moving datasets without writing a single line of code. | ||||||||||
Petastorm | 1,693 | | 8 | | 5 months ago | 86 | February 03, 2023 | 174 | apache-2.0 | Python |
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code. | ||||||||||
Tech.ml.dataset | 616 | | | | 3 months ago | 251 | January 05, 2021 | 10 | epl-1.0 | Clojure |
A high-performance data processing system for Clojure | ||||||||||
Kartothek | 163 | | | | a year ago | 38 | December 10, 2021 | 77 | mit | Python |
A consistent table management library in Python | ||||||||||
Datasets.jl | 104 | | | | 5 months ago | | | 22 | mit | Julia |
Scientificsummarizationdatasets | 88 | | | | 5 years ago | | | 2 | | Jupyter Notebook |
Datasets I have created for scientific summarization, and a trained BertSum model | ||||||||||
Dendrite | 67 | | | | 3 years ago | 27 | February 09, 2020 | 3 | other | Java |
Dendrite is a library for querying large datasets on a single host at near-interactive speeds. | ||||||||||
Spark Mail | 45 | | | | 5 years ago | | | 3 | other | HTML |
Tutorial on parsing Enron email to Avro and then exploring the email set using Spark. | ||||||||||
Snowset | 41 | | | | 3 years ago | | | 1 | | Jupyter Notebook |
Snowflake dataset containing statistics for 70 million queries over a 14-day period | ||||||||||
Rasterly | 38 | | | | 4 years ago | 2 | June 08, 2020 | | other | R |
Rapidly generate raster images from large datasets in R with Plotly.js |