Python script to do tiling and basic filtering of histological slides

DBO-DKFZ/wsi_preprocessing

wsi_preprocessing

Processing and tiling of histological slides

openslide-based processing and filtering of whole-slide images. Only tissue filtering is implemented right now; more filters will follow. The process is configured using a JSON config file.

Tissue detection runs on a higher (downscaled) level of the slide to speed up the process. Coarse tiles are sampled at that level and discarded if they do not have enough tissue coverage. The remaining tiles are then divided into patches for training etc.
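The coarse filtering step can be pictured with a minimal sketch. This is not the code from this repository: it assumes a binary tissue mask (1 = tissue, 0 = background) has already been computed on the downscaled level, the function names `tile_coverage` and `select_tiles` are hypothetical, and treating the threshold as "greater than or equal" is an assumption. The default of 0.75 is the `tissue_coverage` default from the config.

```python
def tile_coverage(mask, row, col, tile_size):
    """Fraction of tissue pixels in the tile whose top-left corner is (row, col)."""
    rows = mask[row:row + tile_size]
    pixels = [value for r in rows for value in r[col:col + tile_size]]
    return sum(pixels) / len(pixels) if pixels else 0.0


def select_tiles(mask, tile_size, tissue_coverage=0.75):
    """Return the top-left corners of all tiles with enough tissue coverage."""
    height, width = len(mask), len(mask[0])
    kept = []
    # Walk the mask in non-overlapping steps of one tile.
    for row in range(0, height - tile_size + 1, tile_size):
        for col in range(0, width - tile_size + 1, tile_size):
            # Assumption: a tile is kept when coverage >= threshold.
            if tile_coverage(mask, row, col, tile_size) >= tissue_coverage:
                kept.append((row, col))
    return kept


# Toy 4x4 mask: the left half is tissue-rich, the right half is mostly background.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]
tiles = select_tiles(mask, tile_size=2)
```

With this toy mask only the two left-hand tiles survive; the kept tiles would then be split into patches at full resolution.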

Supported annotation types are .xml (Camelyon17 and some other public datasets) and .geojson (QuPath). Right now only binary annotations are supported (tumor vs. non-tumor).
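To give an idea of what a .geojson annotation contains, here is a small sketch that reads polygons from a GeoJSON string and tests whether a point lies inside one. It is an illustration only, not this repository's parser: `load_polygons` and `point_in_polygon` are hypothetical helpers, only plain `Polygon` geometries are handled, and reading the label from `properties.classification.name` is an assumption about the exported layout that should be checked against your own files.

```python
import json


def load_polygons(geojson_text):
    """Return (label, exterior_ring) pairs from a GeoJSON string."""
    data = json.loads(geojson_text)
    # Accept a FeatureCollection, a bare list of features, or a single feature.
    if isinstance(data, dict) and data.get("type") == "FeatureCollection":
        features = data["features"]
    elif isinstance(data, list):
        features = data
    else:
        features = [data]

    polygons = []
    for feature in features:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            continue  # other geometry types are ignored in this sketch
        # Assumption: the class label is stored under properties.classification.name.
        classification = feature.get("properties", {}).get("classification") or {}
        label = classification.get("name", "unlabeled")
        polygons.append((label, geometry["coordinates"][0]))
    return polygons


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the closed ring."""
    inside = False
    # GeoJSON rings repeat the first vertex at the end, so this covers every edge.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


example = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]],
        },
        "properties": {"classification": {"name": "Tumor"}},
    }],
})
label, ring = load_polygons(example)[0]
patch_center_is_annotated = point_in_polygon(5, 5, ring)
```

A real pipeline would more likely rasterize the polygons or compute the overlap area per patch; the point test is just the simplest way to show how annotation geometry relates to patch positions.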

Supported slide formats are currently .tif and .svs.

Usage:

This script is designed to be used together with QuPath in case there are no annotations. The main file is "tile_generator.py": configure the process via the config file and execute this file to start the process.

Additional information:

NOTE: Right now there is a bug on Unix systems regarding openslide where image data isn't properly loaded. To fix this, follow the instructions in openslide/openslide-python#58 (comment)

Config Explanation:

| Dictionary Entry | Explanation |
| --- | --- |
| tissue_coverage | Threshold in [0, 1] for how much tissue coverage is necessary. Default is 0.75. |
| keep_annotated_tiles_despite_too_little_tissue_coverage | Legacy option. Old behaviour: keep annotated tiles even if not covered by tissue. New behaviour (to allow easier tile clean-up around the edges): discard tiles with too little tissue coverage regardless of annotation status. |
| processing_level | Level of downscaling by openslide. Lowering the level increases precision but takes more time. Default is 5. |
| blocked_threads | Number of threads that won't be used by the program |
| patches_per_tile | Number of patches used for lower-resolution operations like tissue detection |
| overlap | Value in [0, 1) to set the overlap between neighbouring unannotated patches |
| annotation_overlap | Value in [0, 1) to set the overlap between neighbouring annotated patches |
| patch_size | Output size in pixels of the square patches |
| slides_dir | Directory where the different slides and subdirs are located |
| slides_file | txt file containing the absolute paths to all slides to process |
| annotation_dir | Directory where the annotations are located |
| annotation_file_format | File format of the input annotations ("xml", "geojson") |
| output_path | Output directory where the resulting images will be stored |
| skip_unlabeled_slides | Boolean to skip slides without an annotation file |
| save_annotated_only | Boolean to only save annotated patches |
| output_format | Image output format. Default is "png". |
| show_mode | Boolean to enable plotting of some intermediate results/visualizations |
| label_dict | Structure to set up the operator and the threshold for checking the coverage of a certain class. Up to one unannotated tissue type (e.g. non-tumor) is possible and must go first for implementation reasons. |
| type | Operator type for a `label_dict` entry: "==", ">=" or "<=" |
| threshold | Coverage threshold for the individual class in a `label_dict` entry |
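As a rough illustration of how these entries fit together, the sketch below builds a config in Python, checks the documented value ranges, and writes it out as JSON. The keys are the ones listed above, but everything else is an assumption: all values and paths are placeholders, the nesting of `label_dict` (one `type`/`threshold` pair per class name) is a guess at the structure, and `validate`, `patch_stride` and `label_matches` are hypothetical helpers that are not part of this repository. In particular, interpreting `overlap` as a fraction of `patch_size` is an assumption.

```python
import json
import operator

# Placeholder values; compare against the config file shipped with the repository.
config = {
    "tissue_coverage": 0.75,
    "processing_level": 5,
    "blocked_threads": 2,
    "patches_per_tile": 4,
    "overlap": 0.0,
    "annotation_overlap": 0.5,
    "patch_size": 256,
    "slides_dir": "/path/to/slides",
    "annotation_dir": "/path/to/annotations",
    "annotation_file_format": "geojson",
    "output_path": "/path/to/output",
    "skip_unlabeled_slides": False,
    "save_annotated_only": False,
    "output_format": "png",
    "show_mode": False,
    # Assumed layout: the unannotated class (non-tumor) goes first.
    "label_dict": {
        "non_tumor": {"type": "==", "threshold": 0.0},
        "tumor": {"type": ">=", "threshold": 0.5},
    },
}

OPERATORS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le}


def validate(cfg):
    """Return a list of problems found in the documented value ranges."""
    errors = []
    if not 0 <= cfg["tissue_coverage"] <= 1:
        errors.append("tissue_coverage must be in [0, 1]")
    for key in ("overlap", "annotation_overlap"):
        if not 0 <= cfg[key] < 1:
            errors.append(f"{key} must be in [0, 1)")
    if cfg["annotation_file_format"] not in ("xml", "geojson"):
        errors.append("annotation_file_format must be 'xml' or 'geojson'")
    for name, rule in cfg["label_dict"].items():
        if rule["type"] not in OPERATORS:
            errors.append(f"label_dict[{name!r}] has an unknown operator")
    return errors


def patch_stride(patch_size, overlap):
    """Pixel step between neighbouring patches for a fractional overlap."""
    return max(1, round(patch_size * (1 - overlap)))


def label_matches(coverage, rule):
    """Apply a label_dict rule (operator plus threshold) to a class coverage."""
    return OPERATORS[rule["type"]](coverage, rule["threshold"])


problems = validate(config)
config_json = json.dumps(config, indent=2)
```

Under these assumptions, an `annotation_overlap` of 0.5 with a `patch_size` of 256 would place annotated patches 128 pixels apart, and a patch with 60 % tumor coverage would satisfy the `tumor` rule above.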
