Vit Explain


Explainability for Vision Transformers (in PyTorch)

This repository implements methods for explainability in Vision Transformers.

See also

Currently implemented:

  • Attention Rollout.

  • Gradient Attention Rollout for class specific explainability. This is our attempt to further build upon and improve Attention Rollout.

  • TBD: Attention Flow (work in progress).

Includes some tweaks and tricks to get it working:

  • Different attention head fusion methods (see the sketch after this list).
  • Removing the lowest attentions.
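
A minimal sketch of how plain Attention Rollout and these tweaks fit together. This is illustrative, not the repository's exact code: attention_rollout and its arguments are assumed names, and the per-layer attention maps are assumed to have been collected already (for example with forward hooks), each of shape [batch, heads, tokens, tokens]:

import torch

def attention_rollout(attentions, head_fusion='max'):
    tokens = attentions[0].size(-1)
    result = torch.eye(tokens)
    with torch.no_grad():
        for attn in attentions:
            # Fuse the attention heads into a single [tokens, tokens] map
            if head_fusion == 'mean':
                fused = attn.mean(dim=1)[0]
            elif head_fusion == 'min':
                fused = attn.min(dim=1).values[0]
            else:  # 'max'
                fused = attn.max(dim=1).values[0]
            # Optionally zero out the weakest attentions here (see 'Tricks and Tweaks' below)
            # Account for the residual connection, re-normalize, and accumulate across layers
            a = (fused + torch.eye(tokens)) / 2
            a = a / a.sum(dim=-1, keepdim=True)
            result = a @ result
    # Attention flowing from the CLS token (row 0) to the image patch tokens
    return result[0, 1:]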


Usage

  • From code:
import torch
from vit_grad_rollout import VITAttentionGradRollout

# Load DeiT-Tiny from torch hub and explain its prediction for class 243.
# input_tensor is assumed to be a preprocessed image batch of shape [1, 3, 224, 224].
model = torch.hub.load('facebookresearch/deit:main',
                       'deit_tiny_patch16_224', pretrained=True)
grad_rollout = VITAttentionGradRollout(model, discard_ratio=0.9, head_fusion='max')
mask = grad_rollout(input_tensor, category_index=243)

  • From the command line:
python vit_explain.py --image_path <image path> --head_fusion <mean, min or max> --discard_ratio <number between 0 and 1> --category_index <category index>

If category_index isn't specified, Attention Rollout will be used, otherwise Gradient Attention Rollout will be used.
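
For example, to get a class-specific explanation for ImageNet class 243 (the image path below is just a placeholder):

python vit_explain.py --image_path examples/dog.png --head_fusion max --discard_ratio 0.9 --category_index 243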

Note that by default this uses the 'Tiny' DeiT model from "Training data-efficient image transformers & distillation through attention", hosted on torch hub.

Where did the Transformer pay attention to in this image?

[Example images: input image | vanilla Attention Rollout | Attention Rollout with discard_ratio + max fusion]

Gradient Attention Rollout for class specific explainability

The attention that flows through the transformer passes along information belonging to different classes. Plain Attention Rollout lets us see which locations the network paid attention to, but it tells us nothing about whether those locations were actually used for the final classification.

We can multiply the attention with the gradient of the target class output, and take the average among the attention heads (while masking out negative attentions) to keep only attention that contributes to the target category (or categories).
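
A rough sketch of that weighting step, assuming attn is one layer's attention map of shape [batch, heads, tokens, tokens] and grad is the gradient of the target class score with respect to that attention (the function name is illustrative, not the repository's exact code):

import torch

def grad_weighted_fusion(attn, grad):
    # Weight every attention value by its gradient w.r.t. the target class score
    weighted = attn * grad
    # Keep only positive contributions, then average across the attention heads
    fused = weighted.clamp(min=0).mean(dim=1)[0]   # -> [tokens, tokens]
    return fused

The fused map can then be pushed through the same rollout accumulation shown earlier, in place of the mean/min/max head fusion.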

Where does the Transformer see a Dog (category 243), and a Cat (category 282)?

Where does the Transformer see a Musket dog (category 161) and a Parrot (category 87)?

Tricks and Tweaks to get this working

Filtering the lowest attentions in every layer

--discard_ratio <value between 0 and 1>

Removes noise by keeping the strongest attentions.
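
In isolation, this step just zeroes out the weakest entries of each layer's fused attention map before it enters the rollout accumulation. A small sketch with illustrative names (fused is a [tokens, tokens] tensor):

import torch

def discard_lowest(fused, discard_ratio=0.9):
    flat = fused.view(-1)
    num_discard = int(flat.numel() * discard_ratio)
    _, idx = flat.topk(num_discard, largest=False)   # indices of the weakest attentions
    idx = idx[idx != 0]                              # never drop the CLS-to-CLS entry
    flat[idx] = 0                                    # flat is a view, so fused is modified in place
    return fused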

Results for different values:

Different Attention Head Fusions

The Attention Rollout method suggests taking the average attention across the attention heads, but empirically it looks like taking the minimum value, or the maximum value combined with --discard_ratio, works better.

--head_fusion <mean, min or max>

[Example images: input image | mean fusion | min fusion]



Requirements

pip install timm
