Final Project for OMSCS CS7643 Deep Learning
Using the RSNA pneumonia chest X-ray images for pneumonia classification, we ensemble various popular computer vision (CV) model architectures to incorporate more recent model architecture (e.g., EfficientNet) and data perturbation to further improve model external validity.
- Local
git clone https://github.gatech.edu/zwang3313/CS7643_Project.git
git checkout jason
cd cs7643_project
conda env create -f environment.yaml
- In case the model not included in
keras.applications
, need to install classification models Zoo - Keras (and TensorFlow Keras)
pip install git+https://github.com/qubvel/classification_models.git
- Download Kaggle datafiles
https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data - Run data pipeline scripts
python tfrecords_writing.py
- Colab
Tfrecord_writing.py
- Reformat and resize images
- Write image files into batch (average size of 32 with 1:1 case-control ratio)
- Write into
tfrecords
- Parameters
python tfrecords_writing.py --help
optional arguments:
-h, --help show this help message and exit
--width WIDTH, -wt WIDTH
Resized image width (default is 227)
--height HEIGHT, -ht HEIGHT
Resized image height (default is 227)
--data_dir DATA_DIR, -dd DATA_DIR
Zip file path (default is rsna-pneumonia-detection-challenge.zip)
--batch_size BATCH_SIZE, -bs BATCH_SIZE
Batch size in each tfrecord file (default is 32)
--num_shards NUM_SHARDS, -ns NUM_SHARDS
Number of shards (default is 200, must be less than 945)
- Execution
python tfrecords_writing.py
- Output
- There are total 26,684 (6,012 cases and 20,672 controls) in original dataset
- For this project, we randomly pick (3212 cases and 3200 controls) to form 200 tfrecords with average batch size of 32
- 200
tfrecord
(specified bynum_shards
) files saved in./tfrecords
See Training_[Model Name].ipynb
notebook for each model training implementation details and outputs.
Or open notebook in colab (This is for Densenet121)
Please note that Training_vgg16.ipynb is only tested in local Windows machine, for running it in Colab or Linux, it may need some adaptations.
See image_aug
function in utils_[Model Name].py
for data augmentation strategies for each model type.
See models_[Model Name].py
for data augmentation strategies for each model type.