Awesome Open Source


Combining CV with NLP tasks, with a focus on Medical Report Generation, Image/Video Captioning, VQA, Anchor-free Object Detection, and Weakly Supervised Segmentation.

Papers and Codes/Notes

Image Video Captioning


    • Show and Tell: A Neural Image Caption Generator, Oriol Vinyals et al, CVPR 2015, Google(pdf)
    • Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, Kelvin Xu et al, ICML 2015(pdf)(code)
    • Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, PAMI 2016(pdf)(code)
    • Areas of Attention for Image Captioning, ICCV 2017(pdf)
    • Rethinking the Form of Latent States in Image Captioning, ECCV 2018, CUHK(pdf)
    • Recurrent Fusion Network for Image Captioning, ECCV 2018, Tencent AI Lab, Fudan University(pdf)
    • Move Forward and Tell: A Progressive Generator of Video Descriptions, ECCV 2018, CUHK(pdf)
    • Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks, CVPR 2016(pdf)

  • Reinforcement Learning

    • Improving Reinforcement Learning Based Image Captioning with Natural Language Prior, 2018, Tencent/IBM(pdf)
    • End-to-End Video Captioning with Multitask Reinforcement Learning(pdf)
  • Others

    • A Neural Compositional Paradigm for Image Captioning, NIPS 2018, CUHK(pdf)

Paragraph Description Generation

    • DenseCap: Fully Convolutional Localization Networks for Dense Captioning, Justin Johnson et al, CVPR 2016, Stanford(homepage)(code)
    • A Hierarchical Approach for Generating Descriptive Image Paragraphs, Jonathan Krause et al, CVPR 2017, Stanford(homepage)(dense-caption code)
    • Recurrent Topic-Transition GAN for Visual Paragraph Generation, ICCV 2017
    • Diverse and Coherent Paragraph Generation from Images, ECCV 2018(code)

Visual Question Answering

    • Multi-level Attention Networks for Visual Question Answering, CVPR 2017
    • Motion-Appearance Co-Memory Networks for Video Question Answering, 2018
    • Deep Attention Neural Tensor Network for Visual Question Answering, ECCV 2018, HIT
    • Question-Guided Hybrid Convolution for Visual Question Answering, Peng Gao et al, ECCV 2018, CUHK(pdf)

Medical Report Generation


    • Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation, CVPR 2016(pdf)
    • TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays, Xiaosong Wang et al, CVPR 2018, NIH(pdf)(author's homepage)
    • On the Automatic Generation of Medical Imaging Reports, Baoyu Jing et al., ACL 2018, CMU(pdf)(author's homepage)
    • Multimodal Recurrent Model with Attention for Automated Radiology Report Generation, Yuan Xue et al., MICCAI 2018, PSU(pdf)
    • Attention-Based Abnormal-Aware Fusion Network for Radiology Report Generation, Xiancheng Xie et al., 2019, Fudan University
    • Addressing Data Bias Problems for Chest X-ray Image Report Generation, Philipp Harzig et al., 2019, University of Augsburg(pdf)
  • Reinforcement Learning

    • Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation, Christy Y. Li et al, NIPS 2018, CMU(pdf)(author's homepage)
  • Knowledge Graph

    • Knowledge-Driven Encode, Retrieve, Paraphrase for Medical Image Report Generation, Christy Y. Li et al, AAAI 2019, DU(pdf)
  • Other

    • TextRay: Mining Clinical Reports to Gain a Broad Understanding of Chest X-rays, MICCAI 2018(pdf)
  • Blogs

Medical Image Processing

Common Datasets

Medical Tasks

  • Detection

    • CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning, 2018, Andrew Ng
    • Attention-Guided Curriculum Learning for Weakly Supervised Classification and Localization of Thoracic Diseases on Chest Radiographs, Yuxing Tang et al, MICCAI-MLMI oral 2018, NIH(pdf)
    • DeepRadiologyNet: Radiologist Level Pathology Detection in CT Head Images
    • Lesion Region Detection Methods for Lung CT Images
    • Benign/Malignant Lung Tumor Prediction Based on Quantitative Radiomics
  • Enhance

    • Super Resolution
      • Image Super-Resolution Using Deep Convolutional Networks
      • Deeply-Recursive Convolutional Network for Image Super-Resolution
  • Segmentation

    • U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015 MICCAI
    • A 3D Coarse-to-Fine Framework for Automatic Pancreas Segmentation


  • Weakly-supervised

    • Learning Deep Features for Discriminative Localization, Bolei Zhou et al, CVPR 2016, MIT(pdf)(code)(note)
  • Anchor-based

    • SSD: Single Shot MultiBox Detector, Wei Liu et al, ECCV 2016, UNC Chapel Hill(pdf)(code)(blog)
    • YOLO9000: Better, Faster, Stronger, Joseph Redmon et al, CVPR 2017(pdf)(project)(code)
    • FPN, Feature Pyramid Networks for Object Detection, Tsung-Yi Lin et al., CVPR 2017, FAIR(pdf)(blog)
  • Anchor-free

    • YOLO, You Only Look Once: Unified, Real-Time Object Detection, Joseph Redmon et al, CVPR 2016(pdf)(note)
    • CornerNet, CornerNet: Detecting Objects as Paired Keypoints, Hei Law et al, ECCV 2018, University of Michigan(pdf)(code)(blog)
    • FCOS, FCOS: Fully Convolutional One-Stage Object Detection, Zhi Tian et al, ICCV 2019, University of Adelaide(pdf)(code)(blog)
    • CenterNet, Objects as Points, Xingyi Zhou et al, 2019, UT Austin(pdf)(code)
  • Others

    • Bag of Freebies for Training Object Detection Neural Networks, Zhi Zhang et al, 2019, Amazon (Mu Li)(pdf)
    • Deformable Convolutional Networks, Jifeng Dai et al, ICCV 2017, Microsoft Research Asia(pdf)(code)


  • Semantic Segmentation
    • PSPNet, Pyramid Scene Parsing Network, Hengshuang Zhao et al., CVPR 2017, CUHK(pdf)(code)
  • Instance Segmentation
    • Mask R-CNN, Kaiming He et al, ICCV 2017(Best Paper), Facebook AI Research (FAIR)(pdf)(code)

Weakly Supervised Segmentation

  • Bounding Box Supervision
    • Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation, Liang-Chieh Chen et al., ICCV 2015, UCLA(pdf)(deeplab-v1-code)(model)(note)
    • BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation, Jifeng Dai et al., ICCV 2015, Microsoft Research(pdf)
    • Simple Does It: Weakly Supervised Instance and Semantic Segmentation, Anna Khoreva et al., CVPR 2017, Max Planck Institute for Informatics(pdf)(code)(tf-code)
    • Box-driven Class-wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation, Chunfeng Song et al, CVPR 2019, CASIA(pdf)
  • Image Label Supervision
    • From Image-level to Pixel-level Labeling with Convolutional Networks, Pedro O. Pinheiro, CVPR 2015, Idiap Research Institute, Martigny(pdf)(note)
    • DSRG, Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing, Zilong Huang et al., CVPR 2018, HUST(pdf)(code)
    • SSENet, Self-supervised Scale Equivariant Network for Weakly Supervised Semantic Segmentation, Yude Wang et al., 2019, CAS(pdf)(code)
  • Others
    • DenseCRF, Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, Philipp Krähenbühl et al., NIPS 2011, Stanford University(pdf)(homepage)(code)
    • A Comprehensive Analysis of Weakly-Supervised Semantic Segmentation in Different Image Domains, Lyndon Chan et al., 2019(pdf)
  • Good References


  • BLEU
    • BLEU: a method for automatic evaluation of machine translation, Kishore Papineni et al, ACL 2002(pdf)
  • CIDEr
    • CIDEr: Consensus-based Image Description Evaluation, CVPR 2015(pdf)(note)


  • Visual Commonsense Reasoning (VCR)

    • From Recognition to Cognition: Visual Commonsense Reasoning, Rowan Zellers et al, 2018, Paul G. Allen School(homepage)(pdf)
  • Language Model

    • Transformer: Attention Is All You Need, Ashish Vaswani et al, NIPS 2017, Google Brain/Research(pdf)(code)(blog)
    • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin et al, 2018, Google AI Language(pdf)(code)(slides)
    • ELMo: Deep contextualized word representations, Matthew E. Peters et al, NAACL 2018, Paul G. Allen School(homepage)(pdf)(code-tf)
  • Teacher Forcing Policy

    • A learning algorithm for continually running fully recurrent neural networks, Ronald J. Williams et al, Neural Computation 1989(pdf)(note)
    • Professor Forcing: A New Algorithm for Training Recurrent Networks, Alex Lamb et al, NIPS 2016(pdf)
  • Classification

    • VGG, Very Deep Convolutional Networks for Large-Scale Image Recognition, Karen Simonyan et al., ICLR 2015(pdf)
    • Inception, Going Deeper with Convolutions, Christian Szegedy et al, CVPR 2015, Google(pdf)
    • ResNet, Deep Residual Learning for Image Recognition, Kaiming He et al, CVPR 2016, Microsoft Research(pdf)(code)(blog)
    • SENet: Squeeze-and-Excitation Networks, Jie Hu et al, CVPR 2018, Momenta (a Chinese autonomous driving company) and Oxford University(pdf)(code)(blog)
