Awesome Open Source
Awesome Open Source

Making interpretations useful (CDEP)

Regularizes interpretations (computed via contextual decomposition) to improve neural networks. Official code for Interpretations are useful: penalizing explanations to align neural networks with prior knowledges (ICML 2020 pdf).

Note: this repo is actively maintained. For any questions please file an issue.



  • fully-contained data/models/code for reproducing and experimenting with CDEP
  • the src folder contains the core code for running and penalizing contextual decomposition
  • in addition, we run experiments on 4 datasets, each of which are located in their own folders
    • notebooks in these folders show demos for different kinds of text


ISIC skin-cancer classification - using CDEP, we can learn to avoid spurious patches present in the training set, improving test performance!

The segmentation maps of the patches can be downloaded here

ColorMNIST - penalizing the contributions of individual pixels allows us to teach a network to learn a digit's shape instead of its color, improving its test accuracy from 0.5% to 25.1%

Fixing text gender biases - CDEP can help to learn spurious biases in a dataset, such as gendered words

using CDEP on your own data

using CDEP requires two steps:

  1. run CD/ACD on your model. Specifically, 3 things must be altered:
  • the pred_ims function must be replaced by a function you write using your own trained model. This function gets predictions from a model given a batch of examples.
  • the model must be replaced with your model
  • the current CD implementation doesn't always work for all types of networks. If you are getting an error inside of, you may need to write a custom function that iterates through the layers of your network (for examples see
  1. add CD scores to the loss function (see notebooks)

related work

  • ACD (ICLR 2019 pdf, github) - extends CD to CNNs / arbitrary DNNs, and aggregates explanations into a hierarchy
  • PDR framework (PNAS 2019 pdf) - an overarching framewwork for guiding and framing interpretable machine learning
  • TRIM (ICLR 2020 workshop pdf, github) - using simple reparameterizations, allows for calculating disentangled importances to transformations of the input (e.g. assigning importances to different frequencies)
  • DAC (arXiv 2019 pdf, github) - finds disentangled interpretations for random forests


  • feel free to use/share this code openly
  • if you find this code useful for your research, please cite the following:
  title={Interpretations are useful: penalizing explanations to align neural networks with prior knowledge},
  author={Rieger, Laura and Singh, Chandan and Murdoch, William and Yu, Bin},
  booktitle={International Conference on Machine Learning},

Alternative Project Comparisons
Related Awesome Lists
Top Programming Languages
Top Projects

Get A Weekly Email With Trending Projects For These Topics
No Spam. Unsubscribe easily at any time.
Python (797,356
Jupyter Notebook (151,571
Network (37,696
Machine Learning (36,569
Ml (36,569
Deep Learning (36,026
Pytorch (20,778
Ai (18,772
Artificial Intelligence (18,772
Neural Network (15,419
Convolutional Neural Networks (12,614
Data Science (9,989
Recurrent Neural Networks (4,397
Fairness (387
Explainable Ai (279
Explainability (94
Fairness Ml (59
Interpretable Deep Learning (56
Feature Importance (52
Cdep (5