Awesome Open Source

Papers Reading List.

  • This is a collection of papers on reducing model size and on ASIC/FPGA accelerators for machine learning, especially applications built on deep neural networks. (Inspired by Neural-Networks-on-Silicon)
  • Tutorials:
    • Hardware Accelerator: Efficient Processing of Deep Neural Networks. (link)
    • Model Compression: Model Compression and Acceleration for Deep Neural Networks. (link)

Our Contributions

  • TODO

Network Compression

This field is changing rapidly; the entries below may be somewhat outdated.

Parameter Sharing

  • structured matrices
    • Structured Convolution Matrices for Energy-efficient Deep learning. (IBM Research–Almaden)
    • Structured Transforms for Small-Footprint Deep Learning. (Google Inc)
    • An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections.
    • Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank.
  • Hashing
    • Functional Hashing for Compressing Neural Networks. (Baidu Inc)
    • Compressing Neural Networks with the Hashing Trick. (Washington University + NVIDIA)
  • Learning compact recurrent neural networks. (University of Southern California + Google)

Teacher-Student Mechanism (Distilling)

  • Distilling the Knowledge in a Neural Network. (Google Inc)
  • Sequence-Level Knowledge Distillation. (Harvard University)
  • Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. (TuSimple)

Fixed-precision training and storage

  • Binary/Ternary Neural Networks
    • XNOR-Net, Ternary Weight Networks (TWNs), Binary-net and their variants.
  • Deep neural networks are robust to weight binarization and other non-linear distortions. (IBM Research–Almaden)
  • Recurrent Neural Networks With Limited Numerical Precision. (ETH Zurich + Montréal@Yoshua Bengio)
  • Neural Networks with Few Multiplications. (Montréal@Yoshua Bengio)
  • 1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs. (Tsinghua University + Microsoft)
  • Towards the Limit of Network Quantization. (Samsung US R&D Center)
  • Incremental Network Quantization: Towards Lossless CNNs with Low-precision Weights. (Intel Labs China)
  • Loss-aware Binarization of Deep Networks. (Hong Kong University of Science and Technology)
  • Trained Ternary Quantization. (Tsinghua University + Stanford University + NVIDIA)

Sparsity regularizers & Pruning

  • Learning both Weights and Connections for Efficient Neural Networks. (Song Han, Stanford University)
  • Deep Compression, EIE. (Song Han, Stanford University)
  • Dynamic Network Surgery for Efficient DNNs. (Intel)
  • Compression of Neural Machine Translation Models via Pruning. (Stanford University)
  • Accelerating Deep Convolutional Networks using low-precision and sparsity. (Intel)
  • Faster CNNs with Direct Sparse Convolutions and Guided Pruning. (Intel)
  • Exploring Sparsity in Recurrent Neural Networks. (Baidu Research)
  • Pruning Convolutional Neural Networks for Resource Efficient Inference. (NVIDIA)
  • Pruning Filters for Efficient ConvNets. (University of Maryland + NEC Labs America)
  • Soft Weight-Sharing for Neural Network Compression. (University of Amsterdam, reddit discussion)
  • Sparsely-Connected Neural Networks: Towards Efficient VLSI Implementation of Deep Neural Networks. (McGill University)
  • Training Compressed Fully-Connected Networks with a Density-Diversity Penalty. (University of Washington)
  • Bayesian Compression
    • Bayesian Sparsification of Recurrent Neural Networks
    • Bayesian Compression for Deep Learning
    • Structured Bayesian Pruning via Log-Normal Multiplicative Noise

Tensor Decomposition

  • Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications. (Samsung, etc)
  • Learning compact recurrent neural networks. (University of Southern California + Google)
  • Tensorizing Neural Networks. (Skolkovo Institute of Science and Technology, etc)
  • Ultimate tensorization: compressing convolutional and FC layers alike. (Moscow State University, etc)
  • Efficient and Accurate Approximations of Nonlinear Convolutional Networks. (@CVPR2015)
  • Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. (New York University, etc.)
  • Convolutional neural networks with low-rank regularization. (Princeton University, etc.)
  • Learning with Tensors: Why Now and How? (Tensor-Learn Workshop @ NIPS'16)

Conditional (Adaptive) Computing

  • Adaptive Computation Time for Recurrent Neural Networks. (Google DeepMind@Alex Graves)
  • Variable Computation in Recurrent Neural Networks. (New York University + Facebook AI Research)
  • Spatially Adaptive Computation Time for Residual Networks. (github link, Google, etc.)
  • Hierarchical Multiscale Recurrent Neural Networks. (Montréal)
  • Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. (Google Brain, etc.)
  • Adaptive Neural Networks for Fast Test-Time Prediction. (Boston University, etc)
  • Dynamic Deep Neural Networks: Optimizing Accuracy-Efficiency Trade-offs by Selective Execution. (University of Michigan)
  • Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. (@Yoshua Bengio)
  • Multi-Scale Dense Convolutional Networks for Efficient Prediction. (Cornell University, etc)

Hardware Accelerator

Benchmark and Platform Analysis

  • Fathom: Reference Workloads for Modern Deep Learning Methods. (Harvard University)
  • DeepBench: Open-Source Tool for benchmarking DL operations.
  • BENCHIP: Benchmarking Intelligence Processors.
  • DAWNBench: An End-to-End Deep Learning Benchmark and Competition. (Stanford)
  • MLPerf: A broad ML benchmark suite for measuring performance of ML software frameworks, ML hardware accelerators, and ML cloud platforms.

Recurrent Neural Networks

  • FPGA-based Low-power Speech Recognition with Recurrent Neural Networks. (Seoul National University)
  • Accelerating Recurrent Neural Networks in Analytics Servers: Comparison of FPGA, CPU, GPU, and ASIC. (Intel)
  • ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA. (FPGA 2017, Best Paper Award)
  • DNPU: An 8.1TOPS/W Reconfigurable CNN-RNN Processor for General-Purpose Deep Neural Networks. (KAIST, ISSCC 2017)
  • Hardware Architecture of Bidirectional Long Short-Term Memory Neural Network for Optical Character Recognition. (University of Kaiserslautern, etc)
  • Efficient Hardware Mapping of Long Short-Term Memory Neural Networks for Automatic Speech Recognition. (Master Thesis@Georgios N. Evangelopoulos)
  • Hardware Accelerators for Recurrent Neural Networks on FPGA. (Purdue University, ISCAS 2017)
  • Accelerating Recurrent Neural Networks: A Memory Efficient Approach. (Nanjing University)
  • A Fast and Power Efficient Architecture to Parallelize LSTM based RNN for Cognitive Intelligence Applications.
  • An Energy-Efficient Reconfigurable Architecture for RNNs Using Dynamically Adaptive Approximate Computing.
  • A Systolically Scalable Accelerator for Near-Sensor Recurrent Neural Network Inference.
  • A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications
  • E-PUR: An Energy-Efficient Processing Unit for Recurrent Neural Networks
  • C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs (FPGA 2018, Peking Univ, Syracuse Univ, CUNY)
  • DeltaRNN: A Power-efficient Recurrent Neural Network Accelerator. (FPGA 2018, ETHZ, BenevolentAI)
  • Towards Memory Friendly Long-Short Term Memory Networks (LSTMs) on Mobile GPUs (MICRO 2018)
  • E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs (HPCA 2019)

Convolutional Neural Networks

Conference Papers

NIPS 2016

  • Dynamic Network Surgery for Efficient DNNs. (Intel Labs China)
  • Memory-Efficient Backpropagation Through Time. (Google DeepMind)
  • PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions. (Moscow State University, etc.)
  • Learning Structured Sparsity in Deep Neural Networks. (University of Pittsburgh)
  • LightRNN: Memory and Computation-Efficient Recurrent Neural Networks. (Nanjing University + Microsoft Research)

ICASSP 2017

  • LogNet: Energy-Efficient Neural Networks Using Logarithmic Computation. (Stanford University)
  • Extended Low Rank Plus Diagonal Adaptation for Deep and Recurrent Neural Networks. (Microsoft)
  • Fixed-Point Optimization of Deep Neural Networks with Adaptive Step Size Retraining. (Seoul National University)
  • Implementation of Efficient, Low Power Deep Neural Networks on Next-Generation Intel Client Platforms (Demos). (Intel)
  • Knowledge Distillation for Small-Footprint Highway Networks. (TTI-Chicago, etc)
  • Automatic Node Selection for Deep Neural Networks Using Group Lasso Regularization. (Doshisha University, etc)
  • Accelerating Deep Convolutional Networks Using Low-Precision and Sparsity. (Intel Labs)

CVPR 2017

  • Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning. (MIT)
  • Network Sketching: Exploiting Binary Structure in Deep CNNs. (Intel Labs China + Tsinghua University)
  • Spatially Adaptive Computation Time for Residual Networks. (Google, etc)
  • A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation. (University of Pittsburgh, etc)

ICML 2017

  • Deep Tensor Convolution on Multicores. (MIT)
  • Beyond Filters: Compact Feature Map for Portable Deep Model. (Peking University + University of Sydney)
  • Combined Group and Exclusive Sparsity for Deep Neural Networks. (UNIST)
  • Delta Networks for Optimized Recurrent Network Computation. (Institute of Neuroinformatics, etc)
  • MEC: Memory-efficient Convolution for Deep Neural Network. (IBM Research)
  • Deciding How to Decide: Dynamic Routing in Artificial Neural Networks. (California Institute of Technology)
  • Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning. (ETH Zurich, etc)
  • Analytical Guarantees on Numerical Precision of Deep Neural Networks. (University of Illinois at Urbana-Champaign)
  • Variational Dropout Sparsifies Deep Neural Networks. (Skoltech, etc)
  • Adaptive Neural Networks for Fast Test-Time Prediction. (Boston University, etc)
  • Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank. (The City University of New York, etc)

ICCV 2017

  • Channel Pruning for Accelerating Very Deep Neural Networks. (Xi’an Jiaotong University + Megvii Inc.)
  • ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression. (Nanjing University, etc)
  • Learning Efficient Convolutional Networks through Network Slimming. (Intel Labs China, etc)
  • Performance Guaranteed Network Acceleration via High-Order Residual Quantization. (Shanghai Jiao Tong University + Peking University)
  • Coordinating Filters for Faster Deep Neural Networks. (University of Pittsburgh + Duke University, etc, github link)

NIPS 2017

  • Towards Accurate Binary Convolutional Neural Network. (DJI)
  • Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations. (ETH Zurich)
  • TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning. (Duke University, etc, github link)
  • Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks. (Intel)
  • Bayesian Compression for Deep Learning. (University of Amsterdam, etc)
  • Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon. (Nanyang Technological Univ)
  • Training Quantized Nets: A Deeper Understanding. (University of Maryland)
  • Structured Bayesian Pruning via Log-Normal Multiplicative Noise. (Yandex, etc)
  • Runtime Neural Pruning. (Tsinghua University)
  • The Reversible Residual Network: Backpropagation Without Storing Activations. (University of Toronto, github link)
  • Compression-aware Training of Deep Networks. (Toyota Research Institute + EPFL)

ICLR 2018

  • Oral
    • Training and Inference with Integers in Deep Neural Networks. (Tsinghua University)
  • Poster
    • Learning Sparse NNs Through L0 Regularization
    • Learning Intrinsic Sparse Structures within Long Short-Term Memory
    • Variational Network Quantization
    • Alternating Multi-BIT Quantization for Recurrent Neural Networks
    • Mixed Precision Training
    • Multi-Scale Dense Networks for Resource Efficient Image Classification
    • Efficient Sparse-Winograd CNNs
    • Compressing Word Embeddings via Deep Compositional Code Learning
    • Mixed Precision Training of Convolutional Neural Networks using Integer Operations
    • Adaptive Quantization of Neural Networks
    • Espresso: Efficient Forward Propagation for Binary Deep Neural Networks
    • WRPN: Wide Reduced-Precision Networks
    • Deep Rewiring: Training Very Sparse Deep Networks
    • Loss-aware Weight Quantization of Deep Network
    • Learning to Share: Simultaneous Parameter Tying and Sparsification in Deep Learning
    • Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
    • Large scale distributed neural network training through online distillation
    • Learning Discrete Weights Using the Local Reparameterization Trick
    • Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
    • Training wide residual networks for deployment using a single bit for each weight
    • The High-Dimensional Geometry of Binary Neural Networks
  • Workshop
    • To Prune or Not to Prune: Exploring the Efficacy of Pruning for Model Compression

CVPR 2018

  • Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions
  • ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
  • Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
  • BlockDrop: Dynamic Inference Paths in Residual Networks
  • SYQ: Learning Symmetric Quantization for Efficient Deep Neural Networks
  • Two-Step Quantization for Low-Bit Neural Networks
  • Towards Effective Low-Bitwidth Convolutional Neural Networks
  • Explicit Loss-Error-Aware Quantization for Low-Bit Deep Neural Networks
  • CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization
  • “Learning-Compression” Algorithms for Neural Net Pruning
  • Wide Compression: Tensor Ring Nets
  • NestedNet: Learning Nested Sparse Structures in Deep Neural Networks
  • Interleaved Structured Sparse Convolutional Neural Networks
  • NISP: Pruning Networks Using Neuron Importance Score Propagation
  • Learning Compact Recurrent Neural Networks With Block-Term Tensor Decomposition
  • HydraNets: Specialized Dynamic Architectures for Efficient Inference
  • Learning Time/Memory-Efficient Deep Architectures With Budgeted Super Networks

ECCV 2018

  • ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
  • A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers
  • Learning Compression from Limited Unlabeled Data
  • AMC: AutoML for Model Compression and Acceleration on Mobile Devices
  • Training Binary Weight Networks via Semi-Binary Decomposition
  • Clustering Convolutional Kernels to Compress Deep Neural Networks
  • Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm
  • Data-Driven Sparse Structure Selection for Deep Neural Networks
  • Coreset-Based Neural Network Compression
  • Convolutional Networks with Adaptive Inference Graphs
  • Value-aware Quantization for Training and Inference of Neural Networks
  • LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
  • Deep Expander Networks: Efficient Deep Networks from Graph Theory
  • Extreme Network Compression via Filter Group Approximation
  • Constraint-Aware Deep Neural Network Compression

ICML 2018

  • Compressing Neural Networks using the Variational Information Bottleneck
  • DCFNet: Deep Neural Network with Decomposed Convolutional Filters
  • Deep k-Means Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions
  • Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization
  • High Performance Zero-Memory Overhead Direct Convolutions
  • Kronecker Recurrent Units
  • Learning Compact Neural Networks with Regularization
  • StrassenNets: Deep Learning with a Multiplication Budget
  • Weightless: Lossy Weight Encoding for Deep Neural Network Compression
  • WSNet: Compact and Efficient Networks Through Weight Sampling

NIPS 2018

  • workshops
  • Scalable Methods for 8-bit Training of Neural Networks
  • Frequency-Domain Dynamic Pruning for Convolutional Neural Networks
  • Sparsified SGD with Memory
  • Training Deep Neural Networks with 8-bit Floating Point Numbers
  • KDGAN: Knowledge Distillation with Generative Adversarial Networks
  • Knowledge Distillation by On-the-Fly Native Ensemble
  • Multiple Instance Learning for Efficient Sequential Data Classification on Resource-Constrained Devices
  • Moonshine: Distilling with Cheap Convolutions
  • HitNet: Hybrid Ternary Recurrent Neural Network
  • FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network
  • Training DNNs with Hybrid Block Floating Point
  • Reversible Recurrent Neural Networks
  • Norm Matters: Efficient and Accurate Normalization Schemes in Deep Networks
  • Synaptic Strength for Convolutional Neural Network
  • TETRIS: Tile-matching the Tremendous Irregular Sparsity
  • Learning Sparse Neural Networks via Sensitivity-Driven Regularization
  • Pelee: A Real-Time Object Detection System on Mobile Devices
  • Learning Versatile Filters for Efficient Convolutional Neural Networks
  • Multi-Task Zipping via Layer-wise Neuron Sharing
  • A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication
  • GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
  • ATOMO: Communication-Efficient Learning via Atomic Sparsification
  • Gradient Sparsification for Communication-Efficient Distributed Optimization

ICLR 2019

  • Poster:
    • Rethinking the Value of Network Pruning
    • Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach
    • Dynamic Channel Pruning: Feature Boosting and Suppression
    • Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking
    • Slimmable Neural Networks
    • RotDCF: Decomposition of Convolutional Filters for Rotation-Equivariant Deep Networks
    • Dynamic Sparse Graph for Efficient Deep Learning
    • Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition
    • Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds
    • Learning Recurrent Binary/Ternary Weights
    • Double Viterbi: Weight Encoding for High Compression Ratio and Fast On-Chip Reconstruction for Deep Neural Network
    • Relaxed Quantization for Discretized Neural Networks
    • Integer Networks for Data Compression with Latent-Variable Models
    • Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters
    • A Systematic Study of Binary Neural Networks' Optimisation
    • Analysis of Quantized Models
  • Oral:
    • The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

CVPR 2019

  • All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification
  • Towards Optimal Structured CNN Pruning via Generative Adversarial Learning
  • T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor
  • Fully Learnable Group Convolution for Acceleration of Deep Neural Networks
  • others to be added
