
Breaking Medical Data Sharing Boundaries by Employing Artificial Radiographs

Provides the necessary Python code used in our paper: Breaking Medical Data Sharing Boundaries by Employing Artificial Radiographs.

Update

  • MS-SSIM histogram results are uploaded to the SSIM.Results folder.
  • Added code for the federated learning GAN in the Federated_GAN folder.

Abstract

Artificial intelligence (AI) has the potential to change medicine fundamentally. Here, expert knowledge provided by AI can enhance diagnosis through comprehensive image features. Unfortunately, existing algorithms often fall short of expectations, as the databases used for training are usually too small, incomplete, and heterogeneous in quality. Additionally, data protection constitutes a serious obstacle to data sharing. We propose to use generative models (GMs) to produce high-resolution artificial radiographs, which are free of personal identifying information. Blinded analyses by computer vision and radiology experts confirmed the high similarity of artificial and real radiographs. The combination of pooled GMs improves the performance of computer vision algorithms trained on smaller datasets, and the integration of artificial data into patient data repositories can compensate for underrepresented disease entities. By integrating federated learning strategies, even small hospitals can participate in the training of GMs. We envision that our approach could lead to scalable databases of anonymous medical images, enabling standardized radiomic analyses at multiple sites.

Prerequisites

  • TensorFlow 1.9.0
  • PyTorch 1.1.0

Datasets used in the study

Preprocessing CheXpert dataset

  • Image preprocessing (a hedged sketch follows this list):
python -m preprocess.CheXpert_process
  • CheXpert label generation:
preprocess.Stanford_CSV.ipynb
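
For orientation, here is a minimal preprocessing sketch, not the repository's actual pipeline: the paths, output naming, and the 256×256 target resolution are assumptions; the real logic lives in preprocess.CheXpert_process.

```python
# A minimal preprocessing sketch, NOT the repository's actual pipeline:
# paths, output naming, and the 256x256 target resolution are assumptions.
import glob
import os
from PIL import Image

SRC = 'CheXpert-v1.0/train'   # hypothetical location of the raw CheXpert images
DST = 'CheXpert_256'          # hypothetical output directory
SIZE = 256                    # assumed GAN training resolution

os.makedirs(DST, exist_ok=True)
for i, path in enumerate(sorted(glob.glob(os.path.join(SRC, '**', '*.jpg'),
                                          recursive=True))):
    img = Image.open(path).convert('L')            # force single-channel grayscale
    img = img.resize((SIZE, SIZE), Image.LANCZOS)  # downsample to GAN resolution
    img.save(os.path.join(DST, '%06d.png' % i))    # sequential names avoid collisions
```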

Artificial radiographs generation

The artificial radiographs used in this paper are generated by a progressive growing GAN (ProGAN). The GAN part of this repository is modified from the official NVIDIA implementation. Artificial radiographs can be prepared by running the following steps:

  • Prepare *.tfrecords files:
python dataset_tool.py create_from_images_labels [output Dir] [input image Dir] [label Dir] [label file] 
for example: 
python dataset_tool.py create_from_images_labels /media/tianyu.han/mri-scratch/DeepLearning/dataset/CheXpert_256 /media/tianyu.han/mri-scratch/DeepLearning/CheXpert_Dataset/images/ /media/tianyu.han/mri-scratch/DeepLearning/CheXpert_Dataset/labels/ label_CheXpert.npy
  • Edit config.py to specify the dataset and training configuration by uncommenting/editing specific lines.
  • Run the training script with python train.py
  • After training has converged, one can use ProGAN.reproduce_trainset.ipynb to produce artificial radiographs.
  • The trained NIH and CheXpert GAN models can be downloaded from this link.
  • For evaluating the training of GANs, one can compute the Fréchet Inception Distance and MS-SSIM using ProGAN.metrics.frechet_inception_distance.py and SSIM.ms_ssim.py (a minimal MS-SSIM sketch follows this list).
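
As a hedged illustration of the MS-SSIM evaluation (the actual script is SSIM.ms_ssim.py; the 'generated/' directory, pair count, and image size below are assumptions), one could compute pairwise MS-SSIM between generated images with TensorFlow 1.x:

```python
# A minimal MS-SSIM sketch, assuming 256x256 grayscale PNGs in a hypothetical
# 'generated/' directory; tf.image.ssim_multiscale ships with TensorFlow >= 1.8.
import glob
import numpy as np
import tensorflow as tf
from PIL import Image

def load(path):
    # load a radiograph as float32 in [0, 1] with shape (1, H, W, 1)
    img = np.asarray(Image.open(path).convert('L'), dtype=np.float32) / 255.0
    return img[None, :, :, None]

paths = sorted(glob.glob('generated/*.png'))
a = tf.placeholder(tf.float32, [1, 256, 256, 1])
b = tf.placeholder(tf.float32, [1, 256, 256, 1])
msssim = tf.image.ssim_multiscale(a, b, max_val=1.0)

rng = np.random.RandomState(0)
scores = []
with tf.Session() as sess:
    for _ in range(1000):  # MS-SSIM over random image pairs
        i, j = rng.choice(len(paths), size=2, replace=False)
        scores.append(sess.run(msssim, {a: load(paths[i]), b: load(paths[j])})[0])
print('mean MS-SSIM over %d pairs: %.4f' % (len(scores), np.mean(scores)))
```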

Classifying Chest Radiograph

Inspired by the previous work of CheXNet, we trained a DenseNet-121 classifier (initialized with ImageNet weights) to evaluate the performance of the generated radiographs. The classifier code is contained in the Thorax Classifier folder; a minimal sketch of the setup follows below.
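
The following is a sketch in PyTorch, not the repository's retrain.py: the 14-pathology output size and the sigmoid/BCE multi-label head are assumptions in the style of CheXNet.

```python
# A minimal DenseNet-121 classifier sketch; the 14-pathology output size and
# the sigmoid/BCE multi-label setup are assumptions in the style of CheXNet.
import torch.nn as nn
import torchvision.models as models

NUM_PATHOLOGIES = 14  # assumed number of pathology labels

model = models.densenet121(pretrained=True)  # start from ImageNet weights
model.classifier = nn.Sequential(
    nn.Linear(model.classifier.in_features, NUM_PATHOLOGIES),
    nn.Sigmoid(),         # independent probability per pathology
)
criterion = nn.BCELoss()  # multi-label binary cross-entropy
```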

  • Modify the paths in retrain.py and run python retrain.py.
  • All classifier models used in the study can be downloaded via this link.
  • To obtain confidence intervals, one can use bootstrapping by running python -m Analysis.bootstrapping (a sketch follows this list).
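
For illustration, here is a sketch of case-level bootstrapping for AUC confidence intervals; the AUC metric, 10,000 resamples, and the 95% interval are common conventions assumed here, not read from Analysis.bootstrapping.

```python
# A minimal bootstrapping sketch; AUC, 10,000 resamples, and the 95% interval
# are assumed conventions, not read from Analysis.bootstrapping.
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc(y_true, y_score, n_boot=10000, seed=0):
    rng = np.random.RandomState(seed)
    n, aucs = len(y_true), []
    while len(aucs) < n_boot:
        idx = rng.randint(0, n, n)              # resample cases with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue                            # AUC undefined with one class only
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [2.5, 97.5])   # 95% confidence interval
    return float(np.mean(aucs)), (lo, hi)
```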

Exploring the pathological correlation of generated radiographs

For each pathology, 5,000 random artificial radiographs with a pathology label drawn from a uniform distribution between 0.0 and 1.0 were generated. The images were then rated by the classifier network, and Pearson's correlation coefficient was calculated for each pairing of pathologies. The experimental code to reproduce our findings is uploaded to the Correlation folder; a minimal sketch of the correlation step follows below.

  • Our correlation results can be found here.
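
The correlation step itself reduces to a Pearson matrix over the classifier's scores; a minimal sketch, assuming the scores are stored as an (n_images, n_pathologies) array (the file name is hypothetical):

```python
# A minimal correlation sketch; 'classifier_scores.npy' is a hypothetical file
# holding classifier outputs with shape (n_images, n_pathologies).
import numpy as np

scores = np.load('classifier_scores.npy')
corr = np.corrcoef(scores, rowvar=False)  # Pearson r for every pathology pair
print(corr.round(2))
```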

Domain adaptation

  • To make this approach applicable across datasets, domain adaptation techniques such as Cycle-GAN-based image translation are also considered in this study. We successfully trained a Cycle-GAN to translate between the NIH and CheXpert styles (a sketch of the underlying cycle-consistency term follows this list).
  • The pretrained NIH2CheXpert Cycle-GAN model can be downloaded here.
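
For intuition, here is a sketch of the cycle-consistency term that drives such a translation; the generators G (NIH → CheXpert) and F (CheXpert → NIH) and the weighting factor are generic Cycle-GAN conventions, not this repository's code.

```python
# A minimal cycle-consistency sketch; G (NIH -> CheXpert), F (CheXpert -> NIH)
# and the weighting factor are generic Cycle-GAN conventions, not repo code.
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_consistency_loss(G, F, x_nih, x_chex, lam=10.0):
    # translate each image to the other domain and back, then penalize
    # the L1 deviation from the original radiograph
    loss_nih = l1(F(G(x_nih)), x_nih)
    loss_chex = l1(G(F(x_chex)), x_chex)
    return lam * (loss_nih + loss_chex)
```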

Citation

If you use this code for your research, please cite our paper.

@article{han2019breaking,
  title={Breaking Medical Data Sharing Boundaries by Employing Artificial Radiographs},
  author={Han, Tianyu and Nebelung, Sven and Haarburger, Christoph and Horst, Nicolas and Reinartz, Sebastian and Merhof, Dorit and Kiessling, Fabian and Schulz, Volkmar and Truhn, Daniel},
  journal={BioRxiv},
  pages={841619},
  year={2019},
  publisher={Cold Spring Harbor Laboratory}
}

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments
