Awesome Open Source

List of Medical (Imaging) Datasets

I maintain this list mostly as a personal braindump of interesting medical datasets, with a focus on medical imaging.
Rather than grouping or clustering datasets, I maintain a set of keywords for each.
See the commit log for a list of additions over time.
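
As a rough illustration of how the per-dataset keywords can be used, here is a minimal sketch that filters entries by keyword. The `datasets` dict layout and the helper names `parse_keywords` and `find` are assumptions made for this example and are not part of the list itself; the three entries are copied from the list below.

```python
def parse_keywords(line):
    """Return the lowercase keywords from a 'Keywords: a, b, c' line."""
    prefix = "Keywords:"
    if not line.startswith(prefix):
        return set()
    parts = line[len(prefix):].split(",")
    return {p.strip().lower() for p in parts if p.strip()}


# Hypothetical in-memory layout: dataset name -> its "Keywords:" line.
datasets = {
    "CheXpert": "Keywords: very-large, X-ray, labels",
    "DeepLesion": "Keywords: very-large, CT, labels",
    "fastMRI": "Keywords: large, MRI, k-space",
}


def find(datasets, *wanted):
    """Return the sorted names of datasets that carry every wanted keyword."""
    wanted = {w.lower() for w in wanted}
    return sorted(
        name for name, line in datasets.items()
        if wanted <= parse_keywords(line)
    )


print(find(datasets, "very-large", "labels"))  # ['CheXpert', 'DeepLesion']
```

Matching is case-insensitive, so `find(datasets, "ct")` also finds entries tagged `CT`.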

Please feel free to contribute!

Disclaimer: please remember to solve real clinical problems ☺

Main Medical Imaging List

CheXpert

224,316 chest radiographs of 65,240 patients, with labels from reports
Keywords: very-large, X-ray, labels

ChestXray-NIHCC

100,000+ radiographs
Keywords: very-large, X-ray, labels

MIMIC-CXR

371,920 chest x-rays associated with 227,943 imaging studies
3/16/2019: Not yet linked with MIMIC ICU data. See news article
v2: free-text radiology reports
Need to request access
Keywords: very-large, X-ray, labels

PadChest

160,000 images from 67,000 patients that were interpreted and reported by radiologists
Labeled with 174 different radiographic findings, 19 differential diagnoses, and 104 anatomic locations, organized as a hierarchical taxonomy and mapped to the standard Unified Medical Language System (UMLS)
Keywords: very-large, X-ray, labels

IBM Xray Eye Gaze

1000+ cases with eye gaze, radiology reports, dictation, and segmentations, built on the MIMIC-CXR database
code to reproduce experiments
Keywords: medium, X-ray, labels

Cancer Imaging Archive

Several collections
Many images of various kinds, including CT, MR, pathology, and PT, with diagnoses
Keywords: very-large, CT, MR, labels

National Lung Screening Trial

Part of Cancer Imaging Archive
50000+ patients with CT data, some pathology, limited availability
Keywords: very-large, CT, labels

DeepLesion

32000+ CT scans with annotations, meta-data, semantic labels from radiological reports
Keywords: very-large, CT, labels

EchoNet-Dynamic

10,000+ labeled echocardiogram videos and human expert tracing
Keywords: very-large, ultrasound, labels

ABCD Neurocognitive Prediction Challenge

MRI for 8500 young (9-10yo) subjects (about 4100 for training)
Keywords: large, MRI

AAPM Sparse-View CT Reconstruction Challenge

4,000 simulated sinogram/image pairs of 2D breast CTs
Keywords: large, CT, reconstruction

Cross-Sectional Multidomain Lexical Processing

two large scale neuroimaging datasets on reading and language development
Over 3000 MRI, fMRI
article | more resources
Keywords: large, MRI

Neurite-OASIS

414 T1 MRIs from the OASIS dataset, processed using FreeSurfer and SAMSEG
Includes original images, along with processed volumes and resulting anatomical segmentation maps
Keywords: large, MRI, segmentations, labels, annotations, processed

MRNet

1,370 knee MRI exams with diagnosis (healthy/ACL tear/meniscal tear)
Keywords: large, MRI, labels

fastMRI

k-space data
1500 fully sampled knee MRIs, 10K clinical MRIs, and 6.5K brain MRIs.
Part of a challenge
Keywords: large, MRI, k-space

OCMR

Open-Access Multi-Coil k-Space Dataset for Cardiovascular Magnetic Resonance Imaging
k-space data, roughly 250 volumes
Keywords: medium, MRI, k-space

PREVENT-AD

1704 MRI, 556 amyloid and tau CSF samples, blood markers, genetic info, and longitudinal cognitive data on ~400 at-risk individuals
Keywords: medium, MRI, genetics, labels

Medical Segmentation Decathlon

10 Medical image datasets with segmentations
2000+ CT & MR images of various organs from different sources
Keywords: medium, MRI, segmentations

MASSIVE

Multiple Acquisitions for Standardization of Structural Imaging Validation and Evaluation
8000 diffusion-weighted volumes
10 3D FLAIR, T1-, and T2-weighted datasets of a single healthy subject
Keywords: large, MRI

AOMIC: the Amsterdam Open MRI Collection

1000+ subjects with fMRI and other modalities, with annotated event files; raw and preprocessed
Keywords: medium, fMRI

MRIdata

List of MRI k-space datasets

Cancer Imaging Archive: LDCT

601 series of CT projection data, reconstructed images, and clinical data reports
Keywords: medium, CT, reconstruction

Brain MRI LGG FLAIR abnormality segmentation

Brain MRI images together with manual FLAIR abnormality segmentation masks
110 subjects from TCIA LGG collection with lower-grade glioma cases
Keywords: medium, brain, MRI, segmentation, LGG, FLAIR

Studyforrest

Few subjects, but many modalities: T1, T2, SWI, Angio, DWI, fMRI during Forrest Gump at 3T (audio+visual+eyetracking+physio) and 7T (audio+physio only), some audio tasks, and other important visual tasks
Keywords: small, multi-modal

Lung Image Database Consortium

LIDC-IDRI consists of diagnostic and lung cancer screening CTs.
1018 cases with some Radiologist Annotations/Segmentations and nodule counts
Also available through LUng Nodule Analysis (LUNA) challenge
Keywords: large, CT, labels

UK Biobank

All imaging
Fundus imaging
Keywords: very-large

BrixIA: COVID19 severity score assessment database

4703 CXR of COVID19 patients, manually annotated Brixia score
Keywords: large, x-ray, covid

COVID-CT

349 CT images collected from several COVID19-related papers
Image captions
Keywords: medium, CT, covid

Pneumonia X-Ray

~5000 xrays
Keywords: medium, x-ray, pneumonia

Medical Imaging Data Resource Center (MIDRC)

998 chest x-ray examinations from 361 COVID+ patients
Annotations with appearance classification and airspace disease grading
Clinical variables
Keywords: large, x-ray, covid

BIMCV-COVID19

1350+ Xrays, 150+ CTs, 800 diagnoses
Keywords: medium, CT, covid

MosMedData Covid19

1000+ CTs of COVID19 patients
50 are annotated per pixel
Keywords: large, CT, covid, segmentations

COVID-19 LUNG CT LESION SEGMENTATION CHALLENGE

~250 chest CTs with positive RT-PCR SARS-CoV-2, annotations of COVID-19 lesions
Keywords: medium, CT, covid, annotations, segmentations

MedSeg COVID-19 CT

~100 segmented CT slices
Keywords: medium, CT, segmentations, covid

COVID-Chest XRay

~150 xrays, ongoing, some hospital data
Keywords: medium, x-ray, covid

BSTI COVID19

ongoing, about 60 patients at last check, CT
paper pdf
Keywords: medium, CT, covid

RICORD

1000 X-rays and 240 CTs with annotations (paper)
Keywords: large, CT, covid, segmentations

FIRE (Fundus Image Registration Dataset)

129 retinal images.
Keywords: small, fundus

DRIVE: Digital Retinal Images for Vessel Extraction

40 retinal images with segmentations
Keywords: small, retinal, segmentations

FLARE: Fast and Low GPU memory Abdominal oRgan sEgmentation

500+ CT scans from 11+ countries with Abdominal Organ Segmentation (the liver, kidney, spleen, and pancreas)
Keywords: large, abdominal, CT

ADNI

Various imaging (longitudinal MRI), Genetics, Clinical data
Several thousand patients
Keywords: large, MRI, genetics, clinical

VISCERAL

~120 image volumes (whole body CT and MRI images)
more than 1900 annotated anatomical structures
Keywords: medium, MRI, CT, whole-body, manual-segmentation

Mindboggle

Seems like 101 manually labelled brain MRIs
Keywords: medium, MRI, brain, manual-segmentation

Cross-Sectional Multidomain Lexical Processing

3000 brain scans (T1w, bold, events)
Standardized tests, scores, demographics
Keywords: large, MRI, fMRI, tests

Duke Breast Cancer Screening DBT

A curated dataset of digital breast tomosynthesis images from 5,060 patients.
Keywords: large, tomosynthesis, DBT, breast, detection

CBIS-DDSM (Curated Breast Imaging Subset of DDSM)

2600+ scanned film mammography studies
Keywords: large, x-ray

Neuromorphometrics

63 manually labelled brain scans. Costs ($1500?) Discussion
Keywords: medium, MRI, brain, manual-segmentation, costly

Automatic Non-rigid Histological Image Registration

This is a challenge for ISBI2019

7-Tesla rs-fMRI

22 participants with cognitive and physiological measures, and 7T rs-fMRI

SpineWeb

200+ subjects across several datasets (CTs, Xrays, MRIs)

Whole-Heart and Great Vessel Segmentation from 3D Cardiovascular MRI in Congenital Heart Disease

20 cardiac MR images in Congenital Heart Disease

Longitudinal Neuroimaging in Children

paper
~50 children (~10yo) with a single follow-up, with MRI, fMRI, and assessments
Keywords: medium, fMRI, longitudinal

Longitudinal Neuroimaging on arithmetic processing in children

paper
3T fMRI of 132 typically developing children, 2 time points, four tasks
Keywords: medium, fMRI, longitudinal

Narratives

aggregates auditory story-listening fMRI datasets acquired over the course of roughly seven years
Keywords: medium, fMRI

ATLAS: Anatomical Tracings of Lesions After Stroke

229 T1-weighted MRI scans (n=220) with lesion segmentation
MNI152 standard-space T1-weighted average structural template image
A .csv file containing lesion metadata
paper
Keywords: medium, MRI, segmentations

MITOS_WSI_CMC

21 Canine mammary carcinoma whole slide images.
Annotated by 2/3 experts
Keywords: small, 2D, whole slide imaging

FeTA Dataset

48 manually annotated in utero fetal MR
Keywords: small, mri, fetal, labels

SIMON

Single volunteer, 73 sessions at multiple sites over ~17 years
MRI, at least T1 at each session, with other modalities varying by session.
Phenotype file provided
Keywords: small, MRI, longitudinal

BigBrain

Single volume (histological space, 100 micron) with GM/WM surfaces and cortical layers
ftp://bigbrain.loris.ca | interactive
Keywords: small, histology, high-resolution, segmentations

100 micron MRI of Human Brain

Single volume, ultra-high resolution MRI dataset (100-micron)
Keywords: small, MRI, brain

Natural Scenes Dataset (CMRR initiative)

8-subject large-scale fMRI (40 sessions, high sampling, high resolution). T1w, T2w, T2*w MRI
Video description
Keywords: small, MRI, brain, fMRI

Brain Catalogue

(ex-vivo) brain MRIs of brains of different animals
Keywords: small, MRI, brain, animals

Multishell diffusion

Diffusion MRI of three healthy traveling adults
Keywords: small, MRI, diffusion, brain

Pre-Natal MRI

Prenatal brain MRI samples (looks like single subject?)
Keywords: small, MRI, fetal

Non-imaging

PhysioNet / Pulmonary Edema Severity Grades Based on MIMIC-CXR

This dataset is curated based on MIMIC-CXR, containing 3 metadata files that consist of pulmonary edema severity grades extracted from the MIMIC-CXR dataset through different means: 1) by regular expression (regex) from radiology reports, 2) by expert labeling from radiology reports, and 3) by consensus labeling from chest radiographs.
Keywords: pulmonary edema, severity grades, chest x-ray, radiology reports, MIMIC-CXR

PhysioNet / Computing in Cardiology 2019 Challenge

predict sepsis in an ICU population
5000 ICU patients in three separate hospital systems

eICU-CRD

detailed information about critical care stays for over 200,000 admissions at 200+ hospitals across the US.
With access to MIMIC, can access eICU-CRD immediately after signing an updated DUA.
paper

Non-medical but useful / fun

Moments in Time

Other lists or pooling resources (relevant xkcd)

