Identifying high-risk breast cancer using digital pathology images
checkpoints/                      # model checkpoints
csv_dir/                          # mapping relationship
datasets/                         # python pickle (tensor) datasets
models/                           # swin transformer model class
split_biopsies.ipynb
prepare_datasets.ipynb
train_pipeline_resnet50.ipynb
train_pipeline_swin.py
test_ensemble.ipynb
submit.ipynb
license.txt
Run split_biopsies.ipynb
It exports several CSV files to the csv_dir/ folder that record the mapping between downsampled slice information and labels.
Run prepare_datasets.ipynb
The script exports one dictionary for each of the train, test, and holdout sets to the datasets/ folder, for example:
pd.to_pickle({'x': holdout_x_list, 'y': holdout_y_list, 'id': holdout_biopsy_id_list}, f'./datasets/holdout.pkl')
- x: slice tensor, cropped and normalized to 224x224x3 resolution.
- y: label, one of {0, 1, 2, 3, 4}.
- id: the BiopsyID that the slice image belongs to.
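A minimal sketch of reading one of these dictionaries back and grouping slices by biopsy. It assumes the file was written without compression, in which case it is an ordinary pickle that the standard library can read (pd.read_pickle should work equally well). The helper names load_split and slices_per_biopsy, the demo BiopsyIDs, and the string stand-ins for the slice tensors are hypothetical; loading the real files also requires the tensor library used to create them to be importable.

```python
import pickle
import tempfile
from collections import Counter, defaultdict
from pathlib import Path


def load_split(path):
    """Load one split dictionary with keys 'x', 'y', and 'id'."""
    with open(path, "rb") as f:
        split = pickle.load(f)
    # The three lists should be aligned: one entry per slice.
    n = len(split["x"])
    if not (len(split["y"]) == n and len(split["id"]) == n):
        raise ValueError("x, y and id must have the same length")
    return split


def slices_per_biopsy(split):
    """Map each BiopsyID to the indices of the slices that belong to it."""
    groups = defaultdict(list)
    for index, biopsy_id in enumerate(split["id"]):
        groups[biopsy_id].append(index)
    return dict(groups)


# Stand-in data (hypothetical): strings replace the real 224x224x3 tensors.
demo = {
    "x": ["slice_a", "slice_b", "slice_c"],
    "y": [0, 0, 3],
    "id": ["B001", "B001", "B002"],
}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "holdout.pkl"
    with open(demo_path, "wb") as f:
        pickle.dump(demo, f)
    split = load_split(demo_path)

groups = slices_per_biopsy(split)
label_counts = Counter(split["y"])
print(groups)
print(label_counts)
```

Grouping by BiopsyID is useful because one biopsy contributes several slices, so any per-biopsy evaluation has to collect the slice-level entries first.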
- Run train_pipeline_resnet50.ipynb for the ResNet-50 model
- Run train_pipeline_swin.py for the Swin-Large model
Both scripts save the model parameters to the checkpoints/ folder.
Run test_ensemble.ipynb
The script loads the ResNet and Swin models' parameters and outputs the prediction logits for the holdout set.
We then calibrate the two outputs, ensemble them by averaging their scores, and export the final prediction results for the holdout set.
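The averaging step can be sketched as follows. The calibration method is not specified here, so this example assumes temperature scaling purely for illustration; the function names, temperature values, and logits are all made up and are not taken from test_ensemble.ipynb.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a temperature above 1 softens them."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def ensemble(resnet_logits, swin_logits, t_resnet=1.0, t_swin=1.0):
    """Average the calibrated class probabilities of the two models."""
    p_resnet = softmax(resnet_logits, t_resnet)
    p_swin = softmax(swin_logits, t_swin)
    return [(a + b) / 2.0 for a, b in zip(p_resnet, p_swin)]


# Made-up logits for one slice over the five classes {0, 1, 2, 3, 4}.
resnet_logits = [2.0, 0.5, 0.1, -1.0, -2.0]
swin_logits = [0.2, 1.8, 0.3, -0.5, -1.5]

# Assumed temperatures; in practice they would be fitted on held-out data.
scores = ensemble(resnet_logits, swin_logits, t_resnet=1.5, t_swin=1.2)
predicted = max(range(len(scores)), key=scores.__getitem__)
print(scores, predicted)
```

Averaging probabilities rather than raw logits keeps the two models on a comparable scale, which is the reason for calibrating before ensembling.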
- Specify the prediction result CSV, then run submit.ipynb (our team's best final result is in submit_1220.csv).