Fastdup

fastdup is a powerful free tool designed to rapidly extract valuable insights from your image and video datasets. It helps you improve the quality of your dataset's images and labels and reduce your data operations costs at an unparalleled scale.
Alternatives To Fastdup
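  • Pytorch Cyclegan And Pix2pix: Image-to-Image Translation in PyTorch. Stars: 19,434. Most recent commit: 6 days ago. Open issues: 476. License: other. Language: Python.
  • Tensor2tensor: Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research. Stars: 12,996. Most recent commit: a month ago. Total releases: 79. Latest release: June 17, 2020. Open issues: 588. License: apache-2.0. Language: Python.
  • Stylegan: StyleGAN - Official TensorFlow Implementation. Stars: 12,815. Most recent commit: 7 months ago. Open issues: 11. License: other. Language: Python.
  • Stylegan2: StyleGAN2 - Official TensorFlow Implementation. Stars: 8,836. Most recent commit: a year ago. Open issues: 27. License: other. Language: Python.
  • Pix2pix: Image-to-image translation with conditional adversarial nets. Stars: 8,452. Most recent commit: 2 years ago. Open issues: 76. License: other. Language: Lua.
  • Spade: Semantic Image Synthesis with SPADE. Stars: 6,990. Most recent commit: a year ago. Total releases: 11. Latest release: August 25, 2022. Open issues: 95. License: other. Language: Python.
  • Open_nsfw: Not Suitable for Work (NSFW) classification using deep neural network Caffe models. Stars: 5,230. Most recent commit: 4 years ago. Open issues: 41. License: bsd-2-clause. Language: Python.
  • Stargan V2: StarGAN v2 - Official PyTorch Implementation (CVPR 2020). Stars: 2,896. Most recent commit: 9 months ago. Open issues: 84. License: other. Language: Python.
  • Ffhq Dataset: Flickr-Faces-HQ Dataset (FFHQ). Stars: 2,842. Most recent commit: 4 months ago. Open issues: 5. License: other. Language: Python.
  • Covid Chestxray Dataset: We are building an open database of COVID-19 cases with chest X-ray or CT images. Stars: 2,841. Most recent commit: 7 months ago. Open issues: 41. Language: Jupyter Notebook.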


Manage, Clean & Curate Visual Data - Fast and at Scale

An unsupervised and free tool for image and video dataset analysis.

  • Report Bug: https://github.com/visual-layer/fastdup/issues
  • Read Blog: https://medium.com/@amiralush/large-image-datasets-today-are-a-mess-e3ea4c9e8d22
  • Quickstart: https://visual-layer.readme.io/docs/getting-started
  • Enterprise Edition: https://visual-layer.com/
  • About us: https://visual-layer.com/
  • Slack: https://visualdatabase.slack.com/join/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA#/shared-invite/email
  • Discussion forum: https://visual-layer.readme.io/discuss
  • LinkedIn: https://www.linkedin.com/company/visual-layer/
  • YouTube: https://www.youtube.com/@visual-layer4035

We've released fastdup V1.0! See the release notes for details.

What's Included

fastdup lets you identify -

Additional features -

Why fastdup?

  • Quality: Find and remove anomalies and outliers from your dataset, including duplicates and similar images and videos at a large scale.
  • Cost: Reduce data operation costs by intelligently sampling high-quality or novel datasets before labeling and assessing labeled data quality.
  • Scale: fastdup's C++ graph engine is highly efficient and can handle up to 400M images on a single CPU machine.

Setting up

Prerequisites

Supported Python versions:

PyPi

Supported operating systems:

  • Windows 10
  • Windows 11
  • Windows Server 2019
  • Windows WSL
  • Ubuntu 20.04 LTS
  • Ubuntu 18.04 LTS
  • macOS 10+ (Intel)
  • macOS 10+ (M1)
  • Amazon Linux 2
  • CentOS 7
  • RedHat 4.8

Installation

Option 1 - Install fastdup via PyPI:

# upgrade pip to its latest version
pip install -U pip

# install fastdup
pip install fastdup
    
# Alternatively, use explicit python version (XX)
python3.XX -m pip install fastdup 

Option 2 - Install fastdup via an Ubuntu 20.04 Docker image on DockerHub:

docker pull karpadoni/fastdup-ubuntu-20.04

Detailed installation instructions and common errors are covered in the documentation.
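
After installing, it can help to confirm that the package is visible to the interpreter you plan to use. The following is a minimal sketch using only the Python standard library; the helper name installed_version is made up for this example and is not part of fastdup.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical helper, not a fastdup API: it only reads package metadata.
version = installed_version("fastdup")
print("fastdup version:", version if version else "not installed")
```

If this prints "not installed", the pip command was probably run with a different Python interpreter than the one executing the check; the explicit python3.XX -m pip form above avoids that mismatch.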

Getting Started

Run fastdup with only 3 lines of code.


Visualize the result.


Here are the 8 lines of code you'll need in most cases.

import fastdup

# work_dir: folder where fastdup writes its output; images_dir: folder with the images to analyze
fd = fastdup.create(work_dir, images_dir)
fd.run(nearest_neighbors_k=5, cc_threshold=0.96)

fd.vis.duplicates_gallery()    # create a visual gallery of found duplicates
fd.vis.outliers_gallery()      # create a visual gallery of anomalies
fd.vis.component_gallery()     # create a visualization of connected components
fd.vis.stats_gallery()         # create a visualization of images statistics (for example blur)
fd.vis.similarity_gallery()    # create a gallery of similar images

View the API docs here.
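
To give an intuition for what a threshold such as cc_threshold=0.96 does, here is a toy sketch of grouping items into connected components by similarity. It assumes that the threshold is a minimum similarity between image embeddings; this is an illustration in pure Python with made-up 2-D vectors, not fastdup's actual implementation, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_components(embeddings, threshold=0.96):
    """Group items linked by pairwise similarity at or above the threshold.

    Every pair that reaches `threshold` is joined by an edge; the groups
    returned are the connected components of that graph (union-find).
    """
    names = list(embeddings)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Made-up 2-D "embeddings"; real image embeddings have many more dimensions.
toy = {
    "cat_1.jpg": [1.0, 0.0],
    "cat_1_copy.jpg": [0.99, 0.05],
    "dog_1.jpg": [0.0, 1.0],
}
components = similarity_components(toy, threshold=0.96)
print(components)
```

In this toy data the two cat files land in one component and the dog image stays alone. Raising the threshold makes groups tighter; lowering it merges more loosely related images.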

Learn from Examples

  • Quick dataset analysis: uses the Oxford-IIIT Pet Dataset to demonstrate how to visualize similarity clusters, find duplicates and outliers, and analyze the images in each cluster.
  • Cleaning and preparing a dataset: uses fastdup to clean and analyze the Food-101 dataset. Cleaning includes identifying and removing duplicates, broken images, and outliers, as well as the darkest, brightest, and blurriest images. fastdup also finds similarity clusters and reports the percentage of images that fall within them.
  • Preparing an image dataset for training: analyzes the Imagenette dataset, a 10-class, 13k-image subset of ImageNet, for similar and outlier images.
  • Preparing an object dataset for training: loads and analyzes the mini-coco dataset, which is labeled with bounding boxes and classes. Using fastdup, we discover duplicates, outliers, and possibly mislabeled bounding boxes.

Getting Help

Get help from the fastdup team or community members via the following channels -

Community Contributions

The following are community-contributed blog posts about fastdup -

What our users think about fastdup


License

fastdup is licensed under the Creative Commons 4.0 license. See LICENSE.

For any queries, reach us at [email protected]

Disclaimer

Usage Tracking

We have added experimental crash report collection using sentry.io. It does not collect user data other than anonymized IP address data, and it logs only the fastdup library's own actions. We do NOT collect folder names, user names, image names, or image content; we collect only aggregate performance statistics such as total number of images, average runtime per image, total free memory, total free disk space, and number of cores. Collecting fastdup crashes will help us improve stability.

The code for the data collection is found here. On macOS we use Google Crashpad.

It is always possible to opt out of the experimental crash report collection via either of the following two options:

  • Define an environment variable called SENTRY_OPT_OUT
  • Call run() with turi_param='run_sentry=0'
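
The two opt-out options can be sketched in Python as follows. The value "1" for the environment variable is an assumption (the text only says the variable must be defined), and whether fd.run() forwards turi_param in the same way as run() is also an assumption; the fastdup calls are left as comments so the snippet runs without the library installed.

```python
import os

# Option 1: define SENTRY_OPT_OUT before fastdup is imported or run.
# The value "1" is an assumption; the text only requires the variable to exist.
os.environ["SENTRY_OPT_OUT"] = "1"

# Option 2: pass the flag through run(). Keyword and value are taken from the text.
run_kwargs = {"turi_param": "run_sentry=0"}

# With fastdup installed, the call would look roughly like this
# (work_dir and images_dir are placeholders for your own paths):
#   import fastdup
#   fd = fastdup.create(work_dir, images_dir)
#   fd.run(**run_kwargs)
```

Setting the variable in the shell or in your deployment configuration has the same effect as setting it from Python, as long as it is defined before fastdup starts.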

About Visual-Layer

fastdup was founded by the authors of XGBoost, Apache TVM and Turi Create: Danny Bickson, Carlos Guestrin, and Amir Alush.

Learn more about Visual Layer at https://visual-layer.com/.
