OpenSlide-based processing and filtering of slides (currently only tissue filtering; more will follow). The process is configured with a JSON config file.
Tissue detection runs on a higher (lower-resolution) OpenSlide pyramid level to speed up processing: coarse tiles are sampled and discarded if their tissue coverage is too low. The remaining tiles are then divided into patches for training and other uses.
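A minimal pure-Python sketch of this tile filtering and patch splitting follows. It is not the code in `tile_generator.py` and rests on assumptions: the thumbnail at the processing level is given as a list of grayscale rows, pixels darker than a hypothetical `BACKGROUND_CUTOFF` of 220 count as tissue (slide background is usually near-white), and patches are laid out with a stride of `patch_size * (1 - overlap)`. All function names are hypothetical, and the real tool may detect tissue differently.

```python
from typing import List, Tuple

# Hypothetical cutoff: pixels darker than this count as tissue, because the
# glass background of a slide is usually close to white (255).
BACKGROUND_CUTOFF = 220


def tissue_fraction(gray: List[List[int]], x0: int, y0: int, size: int) -> float:
    """Fraction of tissue pixels in the size x size tile whose top-left corner is (x0, y0)."""
    tissue = total = 0
    for row in gray[y0:y0 + size]:
        for value in row[x0:x0 + size]:
            total += 1
            if value < BACKGROUND_CUTOFF:
                tissue += 1
    return tissue / total if total else 0.0


def keep_tiles(gray: List[List[int]], tile_size: int,
               tissue_coverage: float = 0.75) -> List[Tuple[int, int]]:
    """Top-left corners (in thumbnail pixels) of full tiles with enough tissue."""
    height, width = len(gray), len(gray[0])
    kept = []
    for y0 in range(0, height - tile_size + 1, tile_size):
        for x0 in range(0, width - tile_size + 1, tile_size):
            if tissue_fraction(gray, x0, y0, tile_size) >= tissue_coverage:
                kept.append((x0, y0))
    return kept


def patch_origins(tile_px: int, patch_size: int, overlap: float) -> List[int]:
    """1-D patch offsets inside a tile; overlap is a fraction in [0, 1)."""
    stride = max(1, int(patch_size * (1 - overlap)))
    return list(range(0, tile_px - patch_size + 1, stride))


# 4 x 4 thumbnail: left half tissue (dark), right half background (white).
thumbnail = [[100, 100, 255, 255] for _ in range(4)]
print(keep_tiles(thumbnail, tile_size=2))                       # [(0, 0), (0, 2)]
print(patch_origins(tile_px=512, patch_size=256, overlap=0.5))  # [0, 128, 256]
```

The patch offsets are one-dimensional; in this sketch the 2-D patch grid of a kept tile is every combination of x and y offsets, each scaled back from the processing level to full resolution.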
Supported annotation formats are .xml (Camelyon17 and some other public datasets) and .geojson (QuPath). Currently only binary annotations (tumor vs. non-tumor) are supported.
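The two annotation formats and the binary labelling can be illustrated with the sketch below. It assumes the ASAP XML layout used by the Camelyon annotations (`Annotation` elements holding `Coordinate` elements with `Order`, `X`, `Y` attributes) and a QuPath GeoJSON export containing `Polygon` or `MultiPolygon` geometries. Only exterior rings are read, holes are ignored, and a point is labelled by a simple ray-casting test; the helper names are hypothetical and this is not how `tile_generator.py` necessarily parses or labels annotations.

```python
import json
import xml.etree.ElementTree as ET
from typing import List, Tuple

Polygon = List[Tuple[float, float]]


def load_geojson_polygons(text: str) -> List[Polygon]:
    """Exterior rings of all (Multi)Polygon features in a QuPath-style GeoJSON export."""
    data = json.loads(text)
    # Accept a FeatureCollection, a single Feature, or a bare list of Features.
    features = data.get("features", [data]) if isinstance(data, dict) else data
    polygons: List[Polygon] = []
    for feature in features:
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Polygon":
            rings = [geometry["coordinates"][0]]
        elif geometry.get("type") == "MultiPolygon":
            rings = [polygon[0] for polygon in geometry["coordinates"]]
        else:
            continue  # points and lines have no area to label
        for ring in rings:
            polygons.append([(float(pt[0]), float(pt[1])) for pt in ring])
    return polygons


def load_asap_xml_polygons(text: str) -> List[Polygon]:
    """Polygons from ASAP-style XML, as used by the Camelyon annotations."""
    root = ET.fromstring(text)
    polygons: List[Polygon] = []
    for annotation in root.iter("Annotation"):
        points = sorted(annotation.iter("Coordinate"),
                        key=lambda c: int(c.get("Order", "0")))
        polygons.append([(float(c.get("X")), float(c.get("Y"))) for c in points])
    return polygons


def point_in_polygon(x: float, y: float, polygon: Polygon) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray, so y1 != y2
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def binary_label(x: float, y: float, polygons: List[Polygon]) -> str:
    """Binary labelling: inside any annotation -> tumor, otherwise non-tumor."""
    return "tumor" if any(point_in_polygon(x, y, p) for p in polygons) else "non-tumor"


square_geojson = json.dumps({"type": "FeatureCollection", "features": [{
    "type": "Feature", "properties": {},
    "geometry": {"type": "Polygon",
                 "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]]}}]})
square_xml = ('<ASAP_Annotations><Annotations><Annotation Name="A" Type="Polygon">'
              '<Coordinates><Coordinate Order="0" X="0" Y="0"/><Coordinate Order="1" X="10" Y="0"/>'
              '<Coordinate Order="2" X="10" Y="10"/><Coordinate Order="3" X="0" Y="10"/>'
              '</Coordinates></Annotation></Annotations></ASAP_Annotations>')
print(binary_label(5, 5, load_geojson_polygons(square_geojson)))  # tumor
print(binary_label(15, 5, load_asap_xml_polygons(square_xml)))    # non-tumor
```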
Supported slide formats are currently .tif and .svs.
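A sketch of how slides in these formats might be gathered from the `slides_dir` or `slides_file` settings described in the configuration table. The function name `collect_slides`, the recursive search, the case-insensitive suffix check, and the merging of both sources are assumptions for illustration, not documented behaviour of `tile_generator.py`.

```python
from pathlib import Path
from typing import List, Optional

# Slide formats listed as supported in this README.
SUPPORTED_SUFFIXES = {".tif", ".svs"}


def collect_slides(slides_dir: Optional[str] = None,
                   slides_file: Optional[str] = None) -> List[Path]:
    """Gather supported slides from a directory tree and/or a text file of absolute paths."""
    candidates = set()
    if slides_dir:
        # rglob("*") also descends into subdirectories.
        candidates.update(p for p in Path(slides_dir).rglob("*") if p.is_file())
    if slides_file:
        for line in Path(slides_file).read_text().splitlines():
            if line.strip():  # skip blank lines
                candidates.add(Path(line.strip()))
    # Keep only supported formats; the set removes duplicates from both sources.
    return sorted(p for p in candidates if p.suffix.lower() in SUPPORTED_SUFFIXES)
```

For example, `collect_slides(slides_dir="/data/slides")` (a hypothetical path) would return every .tif and .svs file below that directory in sorted order.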
This script is designed to be used together with QuPath in case no annotations are available yet. The main file is `tile_generator.py`: configure the process via the config file, then execute this file to start the process.
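As an illustration, a config file using the keys from the configuration table can be written from Python. This is a sketch under assumptions: the file name `my_config.json`, every value except the stated defaults, and the nesting of `label_dict` (class name mapped to `type` and `threshold`) are hypothetical. Check `tile_generator.py` for the exact keys, structure, and the location where it expects the config file.

```python
import json

# Hypothetical example values; only tissue_coverage, processing_level and
# output_format use the defaults stated in the configuration table.
config = {
    "tissue_coverage": 0.75,
    "keep_annotated_tiles_despite_too_little_tissue_coverage": False,
    "processing_level": 5,
    "blocked_threads": 0,
    "patches_per_tile": 4,
    "overlap": 0.0,
    "annotation_overlap": 0.5,
    "patch_size": 256,
    "slides_dir": "/data/slides",  # alternatively, use "slides_file" with absolute paths
    "annotation_dir": "/data/annotations",
    "annotation_file_format": "geojson",
    "output_path": "/data/patches",
    "skip_unlabeled_slides": True,
    "save_annotated_only": False,
    "output_format": "png",
    "show_mode": False,
    # Assumed nesting: class name -> {"type", "threshold"}.
    # The unannotated class (non-tumor) goes first; Python dicts and
    # json.dump both preserve insertion order.
    "label_dict": {
        "non-tumor": {"type": "==", "threshold": 1.0},
        "tumor": {"type": ">=", "threshold": 0.5},
    },
}

with open("my_config.json", "w") as fh:
    json.dump(config, fh, indent=4)
```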
NOTE: There is currently a bug with OpenSlide on Unix systems where image data is not loaded properly. To fix it, follow openslide/openslide-python#58 (comment).
Dictionary Entry | Explanation |
---|---|
tissue_coverage | Minimum tissue coverage in [0, 1] a tile needs in order to be kept; default is 0.75 |
keep_annotated_tiles_despite_too_little_tissue_coverage | Legacy option. Old behaviour: keep annotated tiles even if they are not covered by tissue. New behaviour (allows easier tile clean-up around the edges): discard tiles with too little tissue coverage regardless of annotation status. |
processing_level | OpenSlide pyramid (downsampling) level used for processing; lowering the level increases precision but takes more time. Default is 5 |
blocked_threads | Number of threads that won't be used by the program |
patches_per_tile | Number of patches grouped into one tile for lower-resolution operations such as tissue detection |
overlap | Overlap fraction in [0, 1) between neighbouring unannotated patches |
annotation_overlap | Overlap fraction in [0, 1) between neighbouring annotated patches |
patch_size | Edge length in pixels of the square output patches |
slides_dir | Directory containing the slides (which may be organised in subdirectories) |
slides_file | Text file listing the absolute paths of all slides to process |
annotation_dir | Directory where the annotations are located |
annotation_file_format | File format of the input annotations ("xml","geojson") |
output_path | Output directory where the resulting images are stored |
skip_unlabeled_slides | Boolean to skip slides without an annotation file |
save_annotated_only | Boolean to only save annotated patches |
output_format | Image output format; default is "png" |
show_mode | Boolean to enable plotting of some intermediate results/visualizations |
label_dict | Structure that sets, per class, the operator and threshold used to check that class's coverage. At most one unannotated tissue type (e.g. non-tumor) is allowed, and it must come first for implementation reasons. |
type | Comparison operator for a label_dict class: "==", ">=" or "<=" |
threshold | Coverage threshold for a label_dict class |
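One way the `label_dict` rules could be evaluated for a single patch is sketched below. It assumes the per-class coverage fractions of the patch are already computed and that the first class, in `label_dict` order, whose rule passes is assigned. The helper name `assign_label` and this first-match policy are hypothetical and not taken from `tile_generator.py`.

```python
import operator
from typing import Dict, Optional

# The three comparison operators allowed for "type".
OPERATORS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le}


def assign_label(coverage: Dict[str, float],
                 label_dict: Dict[str, Dict]) -> Optional[str]:
    """Return the first class whose coverage satisfies its operator and threshold."""
    for name, rule in label_dict.items():
        compare = OPERATORS[rule["type"]]
        if compare(coverage.get(name, 0.0), rule["threshold"]):
            return name
    return None  # no rule matched


label_dict = {
    "non-tumor": {"type": "==", "threshold": 1.0},
    "tumor": {"type": ">=", "threshold": 0.5},
}
print(assign_label({"non-tumor": 0.3, "tumor": 0.7}, label_dict))  # tumor
```

In this sketch, a result of `None` would mean the patch matches no class and is skipped; because dict order decides which rule is tried first, placing the unannotated class first makes a fully non-tumor patch resolve immediately.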