Merge branch 'MIC-DKFZ:master' into master

MIC-DKFZ · Oct 11, 2023 · 7e537d5
2 parents 4482ac1 + de48541
commit 7e537d5
Showing 52 changed files with 391 additions and 156 deletions.
diff --git a/.github/workflows/codespell.yml b/.github/workflows/codespell.yml
@@ -0,0 +1,22 @@
+---
+name: Codespell
+
+on:
+  push:
+    branches: [master]
+  pull_request:
+    branches: [master]
+
+permissions:
+  contents: read
+
+jobs:
+  codespell:
+    name: Check for spelling errors
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+      - name: Codespell
+        uses: codespell-project/actions-codespell@v2
diff --git a/documentation/competitions/AutoPETII.md b/documentation/competitions/AutoPETII.md
@@ -0,0 +1,129 @@
+# Look Ma, no code: fine tuning nnU-Net for the AutoPET II challenge by only adjusting its JSON plans
+
+Please cite our paper :-*
+
+```text
+COMING SOON
+```
+
+## Intro
+
+See the [Challenge Website](https://autopet-ii.grand-challenge.org/) for details on the challenge.
+
+Our solution to this challenge requires no code changes at all. All we do is optimize nnU-Net's hyperparameters 
+(architecture, batch size, patch size) through modifying the nnUNetPlans.json file.
+
+## Prerequisites
+Use the latest pytorch version!
+
+We recommend you use the latest nnU-Net version as well! We ran our trainings with commit 913705f which you can try in case something doesn't work as expected:
+`pip install git+https://github.com/MIC-DKFZ/nnUNet.git@913705f`
+
+## How to reproduce our trainings
+
+### Download and convert the data
+1. Download and extract the AutoPET II dataset
+2. Convert it to nnU-Net format by running `python nnunetv2/dataset_conversion/Dataset221_AutoPETII_2023.py FOLDER` where FOLDER is the extracted AutoPET II dataset.
+
+### Experiment planning and preprocessing
+We deviate a little from the standard nnU-Net procedure because all our experiments are based on just the 3d_fullres configuration.
+
+Run the following commands:
+   - `nnUNetv2_extract_fingerprint -d 221` extracts the dataset fingerprint 
+   - `nnUNetv2_plan_experiment -d 221` does the planning for the plain unet
+   - `nnUNetv2_plan_experiment -d 221 -pl ResEncUNetPlanner` does the planning for the residual encoder unet
+   - `nnUNetv2_preprocess -d 221 -c 3d_fullres` runs all the preprocessing we need
+
+### Modification of plans files
+Please read the [information on how to modify plans files](../explanation_plans_files.md) first!!!
+
+
+It is easier to have everything in one plans file, so the first thing we do is transfer the ResEnc UNet to the 
+default plans file. We use the configuration inheritance feature of nnU-Net to make it use the same data as the 
+3d_fullres configuration.
+Add the following to the 'configurations' dict in 'nnUNetPlans.json':
+
+```json
+        "3d_fullres_resenc": {
+            "inherits_from": "3d_fullres",
+            "UNet_class_name": "ResidualEncoderUNet",
+            "n_conv_per_stage_encoder": [
+                1,
+                3,
+                4,
+                6,
+                6,
+                6
+            ],
+            "n_conv_per_stage_decoder": [
+                1,
+                1,
+                1,
+                1,
+                1
+            ]
+        },
+```
+
+(These values are basically just copied from the 'nnUNetResEncUNetPlans.json' file, with everything redundant omitted thanks to inheritance from 3d_fullres.)
+
+Now we crank up the patch and batch sizes. Add the following configurations:
+```json
+        "3d_fullres_resenc_bs80": {
+            "inherits_from": "3d_fullres_resenc",
+            "batch_size": 80
+        },
+        "3d_fullres_resenc_192x192x192_b24": {
+            "inherits_from": "3d_fullres_resenc",
+            "patch_size": [
+                192,
+                192,
+                192
+            ],
+            "batch_size": 24
+        }
+```
+
+Save the file (and check for potential syntax errors!).
+
+### Run trainings
+Training each model requires 8 Nvidia A100 40GB GPUs. Expect training to run for 5-7 days. You'll need a really good 
+CPU to handle the data augmentation! 128C/256T are a must! If you have fewer threads available, scale down nnUNet_n_proc_DA accordingly.
+
+```bash
+nnUNet_compile=T nnUNet_n_proc_DA=28 nnUNetv2_train 221 3d_fullres_resenc_bs80 0 -num_gpus 8
+nnUNet_compile=T nnUNet_n_proc_DA=28 nnUNetv2_train 221 3d_fullres_resenc_bs80 1 -num_gpus 8
+nnUNet_compile=T nnUNet_n_proc_DA=28 nnUNetv2_train 221 3d_fullres_resenc_bs80 2 -num_gpus 8
+nnUNet_compile=T nnUNet_n_proc_DA=28 nnUNetv2_train 221 3d_fullres_resenc_bs80 3 -num_gpus 8
+nnUNet_compile=T nnUNet_n_proc_DA=28 nnUNetv2_train 221 3d_fullres_resenc_bs80 4 -num_gpus 8
+
+nnUNet_compile=T nnUNet_n_proc_DA=28 nnUNetv2_train 221 3d_fullres_resenc_192x192x192_b24 0 -num_gpus 8
+nnUNet_compile=T nnUNet_n_proc_DA=28 nnUNetv2_train 221 3d_fullres_resenc_192x192x192_b24 1 -num_gpus 8
+nnUNet_compile=T nnUNet_n_proc_DA=28 nnUNetv2_train 221 3d_fullres_resenc_192x192x192_b24 2 -num_gpus 8
+nnUNet_compile=T nnUNet_n_proc_DA=28 nnUNetv2_train 221 3d_fullres_resenc_192x192x192_b24 3 -num_gpus 8
+nnUNet_compile=T nnUNet_n_proc_DA=28 nnUNetv2_train 221 3d_fullres_resenc_192x192x192_b24 4 -num_gpus 8
+```
+
+Done!
+
+(We also provide pretrained weights in case you don't want to invest the GPU resources, see below)
+
+## How to make predictions with pretrained weights
+Our final model is an ensemble of two configurations:
+- ResEnc UNet with batch size 80
+- ResEnc UNet with patch size 192x192x192 and batch size 24
+
+To run inference with these models, do the following:
+
+1. Download the pretrained model weights from [Zenodo](https://zenodo.org/record/8362371)
+2. Install both .zip files using `nnUNetv2_install_pretrained_model_from_zip`
+3. Make sure 
+4. Now you can run inference on new cases with `nnUNetv2_predict`:
+   - `nnUNetv2_predict -i INPUT -o OUTPUT1 -d 221 -c 3d_fullres_resenc_bs80 -f 0 1 2 3 4 -step_size 0.6 --save_probabilities`   
+   - `nnUNetv2_predict -i INPUT -o OUTPUT2 -d 221 -c 3d_fullres_resenc_192x192x192_b24 -f 0 1 2 3 4 --save_probabilities`
+   - `nnUNetv2_ensemble -i OUTPUT1 OUTPUT2 -o OUTPUT_ENSEMBLE`
+
+Note that our inference Docker omitted TTA via mirroring along the axial direction during prediction (only sagittal + 
+coronal mirroring). This was done to keep the inference time below 10 minutes per image on a T4 GPU (we actually never 
+tested whether we could have left this enabled). Just leave it on! You can also leave the step_size at default for the 
+3d_fullres_resenc_bs80.
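Note on the plans-file step in AutoPETII.md above: hand-editing JSON is where syntax errors tend to creep in, so the same change could also be scripted. The following is a minimal sketch, assuming the plans file has a top-level `configurations` dict as the document describes. The helper name `add_configurations` and the toy plans content in the demo are hypothetical, and the path to a real `nnUNetPlans.json` is left for the reader to supply. The configuration values themselves are copied from the document.

```python
import json
import os
import tempfile

# Configurations copied from the AutoPETII.md walkthrough.
NEW_CONFIGURATIONS = {
    "3d_fullres_resenc": {
        "inherits_from": "3d_fullres",
        "UNet_class_name": "ResidualEncoderUNet",
        "n_conv_per_stage_encoder": [1, 3, 4, 6, 6, 6],
        "n_conv_per_stage_decoder": [1, 1, 1, 1, 1],
    },
    "3d_fullres_resenc_bs80": {
        "inherits_from": "3d_fullres_resenc",
        "batch_size": 80,
    },
    "3d_fullres_resenc_192x192x192_b24": {
        "inherits_from": "3d_fullres_resenc",
        "patch_size": [192, 192, 192],
        "batch_size": 24,
    },
}


def add_configurations(plans_file: str, new_configurations: dict) -> dict:
    """Hypothetical helper: merge entries into the 'configurations' dict of a plans file."""
    with open(plans_file) as f:
        plans = json.load(f)  # raises json.JSONDecodeError if the file is malformed

    configurations = plans["configurations"]
    for name, config in new_configurations.items():
        parent = config.get("inherits_from")
        # Every parent must already exist or be part of the same batch of additions.
        if parent is not None and parent not in configurations and parent not in new_configurations:
            raise KeyError(f"{name} inherits from unknown configuration {parent}")
        configurations[name] = config

    with open(plans_file, "w") as f:
        json.dump(plans, f, indent=4)
    return plans


if __name__ == "__main__":
    # Toy stand-in for a real plans file (assumed minimal content, not real nnU-Net output).
    with tempfile.TemporaryDirectory() as tmp:
        plans_file = os.path.join(tmp, "nnUNetPlans.json")
        with open(plans_file, "w") as f:
            json.dump({"configurations": {"3d_fullres": {"batch_size": 2}}}, f)
        updated = add_configurations(plans_file, NEW_CONFIGURATIONS)
        print(sorted(updated["configurations"]))
```

Because `json.load` fails loudly on a malformed file and `json.dump` always writes valid JSON, this covers the "check for syntax errors" advice without a separate validation pass.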
diff --git a/documentation/dataset_format.md b/documentation/dataset_format.md
@@ -21,7 +21,7 @@ images). So these images could for example be a T1 and a T2 MRI (or whatever els
 channels MUST have the same geometry (same shape, spacing (if applicable) etc.) and
 must be co-registered (if applicable). Input channels are identified by nnU-Net by their FILE_ENDING: a four-digit integer at the end 
 of the filename. Image files must therefore follow the following naming convention: {CASE_IDENTIFIER}_{XXXX}.{FILE_ENDING}. 
-Hereby, XXXX is the 4-digit modality/channel identifier (should be unique for each modality/chanel, e.g., “0000” for T1, “0001” for 
+Hereby, XXXX is the 4-digit modality/channel identifier (should be unique for each modality/channel, e.g., “0000” for T1, “0001” for 
 T2 MRI, …) and FILE_ENDING is the file extension used by your image format (.png, .nii.gz, ...). See below for concrete examples.
 The dataset.json file connects channel names with the channel identifiers in the 'channel_names' key (see below for details).
 

diff --git a/documentation/how_to_use_nnunet.md b/documentation/how_to_use_nnunet.md
@@ -189,7 +189,7 @@ wait
 **Important: The first time a training is run nnU-Net will extract the preprocessed data into uncompressed numpy 
 arrays for speed reasons! This operation must be completed before starting more than one training of the same 
 configuration! Wait with starting subsequent folds until the first training is using the GPU! Depending on the 
-dataset size and your System this should oly take a couple of minutes at most.**
+dataset size and your System this should only take a couple of minutes at most.**
 
 If you insist on running DDP multi-GPU training, we got you covered:
 

diff --git a/documentation/set_environment_variables.md b/documentation/set_environment_variables.md
@@ -3,7 +3,7 @@
 nnU-Net requires some environment variables so that it always knows where the raw data, preprocessed data and trained 
 models are. Depending on the operating system, these environment variables need to be set in different ways.
 
-Variables can either be set permanently (recommended!) or you can decide to set them everytime you call nnU-Net. 
+Variables can either be set permanently (recommended!) or you can decide to set them every time you call nnU-Net. 
 
 # Linux & MacOS
 

diff --git a/nnunetv2/batch_running/collect_results_custom_Decathlon.py b/nnunetv2/batch_running/collect_results_custom_Decathlon.py
@@ -23,7 +23,7 @@ def collect_results(trainers: dict, datasets: List, output_file: str,
                             expected_output_folder = get_output_folder(d, module, plans, c)
                             if isdir(expected_output_folder):
                                 results_folds = []
-                                f.write("%s,%s,%s,%s,%s" % (d, c, module, plans, r))
+                                f.write(f"{d},{c},{module},{plans},{r}")
                                 for fl in folds:
                                     expected_output_folder_fold = get_output_folder(d, module, plans, c, fl)
                                     expected_summary_file = join(expected_output_folder_fold, "validation",
@@ -36,8 +36,8 @@ def collect_results(trainers: dict, datasets: List, output_file: str,
                                         foreground_mean = load_summary_json(expected_summary_file)['foreground_mean'][
                                             'Dice']
                                         results_folds.append(foreground_mean)
-                                        f.write(",%02.4f" % foreground_mean)
-                                f.write(",%02.4f\n" % np.nanmean(results_folds))
+                                        f.write(f",{foreground_mean:02.4f}")
+                                f.write(f",{np.nanmean(results_folds):02.4f}\n")
 
 
 def summarize(input_file, output_file, folds: Tuple[int, ...], configs: Tuple[str, ...], datasets, trainers):
@@ -61,7 +61,7 @@ def summarize(input_file, output_file, folds: Tuple[int, ...], configs: Tuple[st
         for t in trainers.keys():
             trainer_locs = valid_entries & (txt[:, 2] == t)
             for pl in trainers[t]:
-                f.write("%s__%s" % (t, pl))
+                f.write(f"{t}__{pl}")
                 trainer_plan_locs = trainer_locs & (txt[:, 3] == pl)
                 r = []
                 for d in valid_configs.keys():
@@ -83,13 +83,13 @@ def summarize(input_file, output_file, folds: Tuple[int, ...], configs: Tuple[st
                                 r.append(np.nan)
                             else:
                                 mean_dice = np.mean([float(i) for i in fold_results])
-                                f.write(",%02.4f" % mean_dice)
+                                f.write(f",{mean_dice:02.4f}")
                                 r.append(mean_dice)
                         else:
                             print('missing:', t, pl, d, v)
                             f.write(",nan")
                             r.append(np.nan)
-                f.write(",%02.4f\n" % np.mean(r))
+                f.write(f",{np.mean(r):02.4f}\n")
 
 
 if __name__ == '__main__':

diff --git a/nnunetv2/batch_running/release_trainings/nnunetv2_v1/collect_results.py b/nnunetv2/batch_running/release_trainings/nnunetv2_v1/collect_results.py
@@ -23,7 +23,7 @@ def collect_results(trainers: dict, datasets: List, output_file: str,
                             expected_output_folder = get_output_folder(d, module, plans, c)
                             if isdir(expected_output_folder):
                                 results_folds = []
-                                f.write("%s,%s,%s,%s,%s" % (d, c, module, plans, r))
+                                f.write(f"{d},{c},{module},{plans},{r}")
                                 for fl in folds:
                                     expected_output_folder_fold = get_output_folder(d, module, plans, c, fl)
                                     expected_summary_file = join(expected_output_folder_fold, "validation",
@@ -36,8 +36,8 @@ def collect_results(trainers: dict, datasets: List, output_file: str,
                                         foreground_mean = load_summary_json(expected_summary_file)['foreground_mean'][
                                             'Dice']
                                         results_folds.append(foreground_mean)
-                                        f.write(",%02.4f" % foreground_mean)
-                                f.write(",%02.4f\n" % np.nanmean(results_folds))
+                                        f.write(f",{foreground_mean:02.4f}")
+                                f.write(f",{np.nanmean(results_folds):02.4f}\n")
 
 
 def summarize(input_file, output_file, folds: Tuple[int, ...], configs: Tuple[str, ...], datasets, trainers):
@@ -61,7 +61,7 @@ def summarize(input_file, output_file, folds: Tuple[int, ...], configs: Tuple[st
         for t in trainers.keys():
             trainer_locs = valid_entries & (txt[:, 2] == t)
             for pl in trainers[t]:
-                f.write("%s__%s" % (t, pl))
+                f.write(f"{t}__{pl}")
                 trainer_plan_locs = trainer_locs & (txt[:, 3] == pl)
                 r = []
                 for d in valid_configs.keys():
@@ -83,13 +83,13 @@ def summarize(input_file, output_file, folds: Tuple[int, ...], configs: Tuple[st
                                 r.append(np.nan)
                             else:
                                 mean_dice = np.mean([float(i) for i in fold_results])
-                                f.write(",%02.4f" % mean_dice)
+                                f.write(f",{mean_dice:02.4f}")
                                 r.append(mean_dice)
                         else:
                             print('missing:', t, pl, d, v)
                             f.write(",nan")
                             r.append(np.nan)
-                f.write(",%02.4f\n" % np.mean(r))
+                f.write(f",{np.mean(r):02.4f}\n")
 
 
 if __name__ == '__main__':
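Note on the formatting changes in the two `collect_results` files above: the hunks swap `%`-style formatting for f-strings and keep the same format spec (`02.4f`). The sketch below checks on a few made-up values that both spellings produce identical text. The function names `format_old` and `format_new` are hypothetical and exist only for this comparison.

```python
def format_old(value: float) -> str:
    """Formatting as written before the change (printf-style)."""
    return ",%02.4f" % value


def format_new(value: float) -> str:
    """Formatting as written after the change (f-string, same spec)."""
    return f",{value:02.4f}"


if __name__ == "__main__":
    # Made-up sample values, including nan as produced by np.nanmean on empty input.
    for value in (0.5, 0.91234567, 12.3456789, float("nan")):
        assert format_old(value) == format_new(value)
        print(format_new(value))
```

The minimum width of 2 in `02.4f` never takes effect here, since four decimals plus the leading digit and the dot already exceed two characters, so `.4f` would presumably yield the same output.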

diff --git a/nnunetv2/dataset_conversion/Dataset221_AutoPETII_2023.py b/nnunetv2/dataset_conversion/Dataset221_AutoPETII_2023.py
@@ -0,0 +1,70 @@
+from batchgenerators.utilities.file_and_folder_operations import *
+import shutil
+from nnunetv2.dataset_conversion.generate_dataset_json import generate_dataset_json
+from nnunetv2.paths import nnUNet_raw, nnUNet_preprocessed
+
+
+def convert_autopet(autopet_base_dir:str = '/media/isensee/My Book1/AutoPET/nifti/FDG-PET-CT-Lesions',
+                     nnunet_dataset_id: int = 221):
+    task_name = "AutoPETII_2023"
+
+    foldername = "Dataset%03.0d_%s" % (nnunet_dataset_id, task_name)
+
+    # setting up nnU-Net folders
+    out_base = join(nnUNet_raw, foldername)
+    imagestr = join(out_base, "imagesTr")
+    labelstr = join(out_base, "labelsTr")
+    maybe_mkdir_p(imagestr)
+    maybe_mkdir_p(labelstr)
+
+    patients = subdirs(autopet_base_dir, prefix='PETCT', join=False)
+    n = 0
+    identifiers = []
+    for pat in patients:
+        patient_acquisitions = subdirs(join(autopet_base_dir, pat), join=False)
+        for pa in patient_acquisitions:
+            n += 1
+            identifier = f"{pat}_{pa}"
+            identifiers.append(identifier)
+            if not isfile(join(imagestr, f'{identifier}_0000.nii.gz')):
+                shutil.copy(join(autopet_base_dir, pat, pa, 'CTres.nii.gz'), join(imagestr, f'{identifier}_0000.nii.gz'))
+            if not isfile(join(imagestr, f'{identifier}_0001.nii.gz')):
+                shutil.copy(join(autopet_base_dir, pat, pa, 'SUV.nii.gz'), join(imagestr, f'{identifier}_0001.nii.gz'))
+            if not isfile(join(labelstr, f'{identifier}.nii.gz')):
+                shutil.copy(join(autopet_base_dir, pat, pa, 'SEG.nii.gz'), join(labelstr, f'{identifier}.nii.gz'))
+
+    generate_dataset_json(out_base, {0: "CT", 1:"CT"},
+                          labels={
+                              "background": 0,
+                              "tumor": 1
+                          },
+                          num_training_cases=n, file_ending='.nii.gz',
+                          dataset_name=task_name, reference='https://autopet-ii.grand-challenge.org/',
+                          release='release',
+                          # overwrite_image_reader_writer='NibabelIOWithReorient',
+                          description=task_name)
+
+    # manual split
+    splits = []
+    for fold in range(5):
+        val_patients = patients[fold :: 5]
+        splits.append(
+            {
+                'train': [i for i in identifiers if not any([i.startswith(v) for v in val_patients])],
+                'val': [i for i in identifiers if any([i.startswith(v) for v in val_patients])],
+            }
+        )
+    pp_out_dir = join(nnUNet_preprocessed, foldername)
+    maybe_mkdir_p(pp_out_dir)
+    save_json(splits, join(pp_out_dir, 'splits_final.json'), sort_keys=False)
+
+
+if __name__ == '__main__':
+    import argparse
+    parser = argparse.ArgumentParser()
+    parser.add_argument('input_folder', type=str,
+                        help="The downloaded and extracted autopet dataset (must have PETCT_XXX subfolders)")
+    parser.add_argument('-d', required=False, type=int, default=221, help='nnU-Net Dataset ID, default: 221')
+    args = parser.parse_args()
+    autopet_base = args.input_folder
+    convert_autopet(autopet_base, args.d)
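Note on the manual split in `Dataset221_AutoPETII_2023.py` above: the split is made per patient rather than per acquisition, so all acquisitions of one patient end up on the same side of a fold. The following is a minimal sketch of that logic on toy data. The function name `patient_level_splits` and the patient and acquisition names are hypothetical, and only the slicing and prefix matching mirror the script.

```python
from typing import Dict, List


def patient_level_splits(patients: List[str], identifiers: List[str],
                         num_folds: int = 5) -> List[Dict[str, List[str]]]:
    """Assign every num_folds-th patient to validation, the rest to training."""
    splits = []
    for fold in range(num_folds):
        val_patients = patients[fold::num_folds]
        # An identifier is "<patient>_<acquisition>", so matching on the patient
        # prefix keeps all acquisitions of one patient in the same subset.
        val = [i for i in identifiers if any(i.startswith(v) for v in val_patients)]
        train = [i for i in identifiers if not any(i.startswith(v) for v in val_patients)]
        splits.append({"train": train, "val": val})
    return splits


if __name__ == "__main__":
    # Toy data: 7 patients with fixed-width ids, two acquisitions each.
    patients = [f"PETCT_{k:03d}" for k in range(7)]
    identifiers = [f"{p}_{a}" for p in patients for a in ("A", "B")]
    for fold, split in enumerate(patient_level_splits(patients, identifiers)):
        print(fold, split["val"])
```

Prefix matching relies on no patient id being a prefix of another one. Fixed-width ids, as in the toy data, satisfy that.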
diff --git a/nnunetv2/dataset_conversion/generate_dataset_json.py b/nnunetv2/dataset_conversion/generate_dataset_json.py
@@ -76,7 +76,7 @@ def generate_dataset_json(output_folder: str,
             labels[l] = int(labels[l])
 
     dataset_json = {
-        'channel_names': channel_names,  # previously this was called 'modality'. I didnt like this so this is
+        'channel_names': channel_names,  # previously this was called 'modality'. I didn't like this so this is
         # channel_names now. Live with it.
         'labels': labels,
         'numTraining': num_training_cases,

diff --git a/nnunetv2/ensembling/ensemble.py b/nnunetv2/ensembling/ensemble.py
@@ -144,7 +144,7 @@ def ensemble_crossvalidations(list_of_trained_model_folders: List[str],
         for f in folds:
             if not isdir(join(tr, f'fold_{f}', 'validation')):
                 raise RuntimeError(f'Expected model output directory does not exist. You must train all requested '
-                                   f'folds of the speficied model.\nModel: {tr}\nFold: {f}')
+                                   f'folds of the specified model.\nModel: {tr}\nFold: {f}')
             files_here = subfiles(join(tr, f'fold_{f}', 'validation'), suffix='.npz', join=False)
             if len(files_here) == 0:
                 raise RuntimeError(f"No .npz files found in folder {join(tr, f'fold_{f}', 'validation')}. Rerun your "

diff --git a/nnunetv2/evaluation/evaluate_predictions.py b/nnunetv2/evaluation/evaluate_predictions.py
@@ -27,8 +27,8 @@ def key_to_label_or_region(key: str):
     except ValueError:
         key = key.replace('(', '')
         key = key.replace(')', '')
-        splitted = key.split(',')
-        return tuple([int(i) for i in splitted])
+        split = key.split(',')
+        return tuple([int(i) for i in split if len(i) > 0])
 
 
 def save_summary_json(results: dict, output_file: str):
@@ -227,7 +227,7 @@ def evaluate_folder_entry_point():
                         help='Output file. Optional. Default: pred_folder/summary.json')
     parser.add_argument('-np', type=int, required=False, default=default_num_processes,
                         help=f'number of processes used. Optional. Default: {default_num_processes}')
-    parser.add_argument('--chill', action='store_true', help='dont crash if folder_pred doesnt have all files that are present in folder_gt')
+    parser.add_argument('--chill', action='store_true', help='dont crash if folder_pred does not have all files that are present in folder_gt')
     args = parser.parse_args()
     compute_metrics_on_folder2(args.gt_folder, args.pred_folder, args.djfile, args.pfile, args.o, args.np, chill=args.chill)
 
@@ -245,7 +245,7 @@ def evaluate_simple_entry_point():
                         help='Output file. Optional. Default: pred_folder/summary.json')
     parser.add_argument('-np', type=int, required=False, default=default_num_processes,
                         help=f'number of processes used. Optional. Default: {default_num_processes}')
-    parser.add_argument('--chill', action='store_true', help='dont crash if folder_pred doesnt have all files that are present in folder_gt')
+    parser.add_argument('--chill', action='store_true', help='dont crash if folder_pred does not have all files that are present in folder_gt')
 
     args = parser.parse_args()
     compute_metrics_on_folder_simple(args.gt_folder, args.pred_folder, args.l, args.o, args.np, args.il, chill=args.chill)
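Note on the `key_to_label_or_region` hunk in `evaluate_predictions.py` above: the added `if len(i) > 0` filter matters for keys that are the string form of a one-element tuple, such as `(1,)`. Splitting on the comma leaves an empty trailing piece, and `int('')` raises. The sketch below reproduces both variants side by side. The `try` branch is not visible in the hunk, so `return int(key)` is an assumption, and the `_old` and `_new` suffixes are only for this comparison.

```python
from typing import Tuple, Union


def key_to_label_or_region_old(key: str) -> Union[int, Tuple[int, ...]]:
    try:
        return int(key)  # assumed: this branch is not shown in the diff
    except ValueError:
        key = key.replace('(', '')
        key = key.replace(')', '')
        splitted = key.split(',')
        return tuple([int(i) for i in splitted])


def key_to_label_or_region_new(key: str) -> Union[int, Tuple[int, ...]]:
    try:
        return int(key)  # assumed: this branch is not shown in the diff
    except ValueError:
        key = key.replace('(', '')
        key = key.replace(')', '')
        split = key.split(',')
        # Skip empty pieces such as the one after the trailing comma in "(1,)".
        return tuple([int(i) for i in split if len(i) > 0])


if __name__ == "__main__":
    for key in ("3", "(1, 2)", "(1,)"):
        print(key, "->", key_to_label_or_region_new(key))
    try:
        key_to_label_or_region_old("(1,)")
    except ValueError as e:
        print("old variant fails on '(1,)':", e)
```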