Processing and tiling of histological slides

Repository setup

We suggest using Miniconda as the package manager for your system. Create and activate the conda environment with:

conda env create -f environment.yml
conda activate wsi-pre2

Coding best practices

The main branch is protected. Please check out your own development branch and draft a pull request to merge your changes.

This repository uses pre-commit hooks and GitHub Actions for code quality, inspired by the Lightning Hydra Template.

The pre-commit library comes installed with the conda environment. You should then set up pre-commit, which uses the hooks from .pre-commit-config.yaml:

pre-commit install

After that, your code will be automatically reformatted on every new commit.

To reformat all files in the project, run:

pre-commit run -a

To update the hook versions in .pre-commit-config.yaml, run:

pre-commit autoupdate

Run tiling with provided config files

The main script to run is tile_generator.py. We provide configs in the configs/ folder which generate tables of patch locations with the corresponding pixel sizes. The tables are stored as one .csv file per slide in the configured output_path. Multiprocessing is enabled by default, so multiple slides can be processed simultaneously.

As an example, the tiling of TCGA slides with patch_size=256 can be started as follows:

python tile_generator.py --config configs/tcga-crc_256.json
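
Once a run has finished, the per-slide tables can be inspected with a few lines of Python. The following is only a minimal sketch: the function name `count_patches_per_slide` is hypothetical, and it assumes that the .csv files sit directly in output_path, that each file starts with a header row, and that each remaining row describes one patch. The column layout is not described here, so the sketch only counts rows.

```python
import csv
from pathlib import Path


def count_patches_per_slide(output_path):
    """Return {csv file stem: number of data rows} for every .csv in output_path.

    Hypothetical helper, not part of tile_generator.py.
    """
    counts = {}
    for csv_file in sorted(Path(output_path).glob("*.csv")):
        with csv_file.open(newline="") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row (assumed to be present)
            counts[csv_file.stem] = sum(1 for _ in reader)
    return counts
```

Calling `count_patches_per_slide("path/to/output_path")` then gives a quick overview of how many rows were written for each slide.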

Config parameters

The table shows descriptions for the most important config parameters:

| Dictionary Entry | Description |
| --- | --- |
| check_resolution | Perform a resolution check of all slides before extracting patches |
| use_tissue_detection | Enable or disable tissue detection |
| remove_top_border | Useful for Camelyon slides. Default is false |
| save_patches | Store the patch images, as older pipelines did. In this project the default is false |
| zip_patches | Experimental: tests whether zipped patch image directories increase transfer speeds. Default is false |
| tissue_coverage | Threshold in [0,1] for the required tissue coverage. Default is 0.8 |
| processing_level | Level of downscaling by openslide. Lowering the level increases precision but takes more time. Default is 3 |
| blocked_threads | Number of threads that won't be used by the program |
| patches_per_tile | Number of patches used for lower resolution operations like tissue detection |
| overlap | Value in [0,1] that sets the overlap between neighbouring unannotated patches |
| annotation_overlap | Value in [0,1] that sets the overlap between neighbouring annotated patches |
| patch_size | Output size in pixels of the square patches |
| calibration | |
| use_non_pixel_lengths | Activate calibration and use micrometers instead of pixels |
| patch_size_microns | Specify the patch size in micrometers. At 0.25 $\mu\text{m}$/pixel, 64 $\mu\text{m}$ correspond to 256 pixels |
| resize | Whether to resize the patches specified in micrometers to the given patch_size |
| dataset | Name of the dataset |
| slides_dir | Directory where the slides and subdirectories are located |
| slideinfo_file | A .csv file with filenames and labels |
| annotation_dir | Directory where the annotations are located |
| annotation_file_format | File format of the input annotations ("xml","geojson") |
| output_path | Output directory where the resulting files will be stored |
| skip_unlabeled_slides | Boolean to skip slides without an annotation file |
| save_annotated_only | Boolean to only save annotated patches |
| output_format | Image output format. Either "jpeg" or "png" |
| metadata_format | Format in which slide metadata is stored. Default is "csv" |
| write_slideinfo | Write information about the processed slide |
| show_mode | Boolean to enable plotting of some intermediate results/visualizations |
| label_dict | Structure that sets the operator and the threshold for checking the coverage of a certain class |
| type | Operator type ["==", ">=", "<="] |
| threshold | Coverage threshold for the individual class |
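
To make a few of these parameters more concrete, the sketch below shows the arithmetic they suggest. All three function names are hypothetical and none of this is taken from tile_generator.py. It assumes that patch_size_microns is converted by dividing by the slide resolution in micrometers per pixel, that overlap is a fraction of the patch edge (restricted here to values below 1 so that the step stays positive), and that the type entry of label_dict selects a comparison between a class coverage and its threshold.

```python
import operator

# Comparison operators named by the "type" entry of label_dict.
OPERATORS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le}


def microns_to_pixels(patch_size_microns, microns_per_pixel):
    """Convert a patch edge length in micrometers to a whole number of pixels."""
    return round(patch_size_microns / microns_per_pixel)


def stride_for_overlap(patch_size, overlap):
    """Step in pixels between neighbouring patch origins.

    Assumes overlap is the fraction of the patch edge shared by two neighbours.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    return max(1, round(patch_size * (1 - overlap)))


def coverage_matches(coverage, op_type, threshold):
    """Compare a class coverage against its threshold with the configured operator."""
    return OPERATORS[op_type](coverage, threshold)


print(microns_to_pixels(64, 0.25))         # 256, the example from the table
print(stride_for_overlap(256, 0.5))        # 128
print(coverage_matches(0.85, ">=", 0.8))   # True
```

Under these assumptions, an overlap of 0 places patches edge to edge, and the default tissue_coverage of 0.8 behaves like a ">=" check against 0.8.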