v22.02.00

@GPUtester GPUtester released this 02 Feb 15:58
· 455 commits to main since this release

Version 22.02.00 (Feb 02, 2022)

This version is available through both Conda (https://anaconda.org/rapidsai/cucim) and PyPI (https://pypi.org/project/cucim/22.02.00/).

🚨 Breaking Changes

  • Update cucim.skimage API to match scikit-image 0.19 (#190) @glee77

🐛 Bug Fixes

  • Fix a bug in v21.12.01 (#191) @gigony
    • Fix GPU memory leak when using nvJPEG API (when device='cuda' parameter is used in read_region method).
  • Fix segfault for preferred_memory_capacity in Python 3.9+ (#214) @gigony

📖 Documentation

🚀 New Features

  1. Update cucim.skimage API to match scikit-image 0.19 (#190) @glee77
  2. Support multi-threads and batch, and support nvJPEG for JPEG-compressed images (#191) @gigony
  3. Allow CuPy 10 (#195) @jakikham

1. Update cucim.skimage API to match scikit-image 0.19 (🚨 Breaking Changes)

channel_axis support

scikit-image 0.19 adds a channel_axis argument that should now be used instead of the multichannel boolean.

The multichannel argument will likely be removed in scikit-image 1.0, so cuCIM now supports channel_axis as well.

This pulls changes from many scikit-image 0.19.0 PRs related to deprecating multichannel in favor of channel_axis. A few other minor PRs related to deprecations and updates to color.label2rgb are incorporated here as well.

The changes are mostly non-breaking. The exceptions are that a couple of deprecated functions (rgb2grey, grey2rgb) have been removed, the default value of label2rgb's bg_label argument has changed, and the deprecated alpha argument has been removed from gray2rgb.
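
As a minimal sketch of the migration, the legacy boolean maps onto the new argument as follows: multichannel=True corresponds to channel_axis=-1, and multichannel=False corresponds to channel_axis=None. Both helper names below are hypothetical and are not part of cuCIM or scikit-image. NumPy stands in for CuPy here so that the snippet runs without a GPU.

```python
import numpy as np


def multichannel_to_channel_axis(multichannel):
    """Translate the deprecated boolean into a channel_axis value.

    Hypothetical helper, for illustration only.
    """
    return -1 if multichannel else None


def split_channels(image, channel_axis=None):
    """Return a list of per-channel arrays (a one-item list if there are no channels)."""
    if channel_axis is None:
        return [image]
    # Move the channel axis to the front so iteration yields one channel at a time.
    return list(np.moveaxis(image, channel_axis, 0))


rgb = np.zeros((32, 48, 3), dtype=np.float32)
channels = split_channels(rgb, channel_axis=multichannel_to_channel_axis(True))
print(len(channels), channels[0].shape)  # 3 (32, 48)
```

Unlike the boolean, channel_axis can also name a leading axis (for example channel_axis=0 for channels-first data), which is the main reason for the change.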

Implements:

Update float32 dtype support to match scikit-image 0.19 behavior

Makes float32 and float16 handling consistent with scikit-image 0.19: most functions support float32, and float16 is promoted to float32.
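
The promotion rule can be sketched as below. This illustrates the described behavior using NumPy dtypes and is not cuCIM's internal code. The function name is hypothetical, and the fallback to float64 for all other dtypes (such as integers) is an assumption based on scikit-image's convention rather than something stated in these notes.

```python
import numpy as np


def supported_float_type(dtype):
    """Pick the floating-point dtype a function would compute in (illustrative)."""
    dtype = np.dtype(dtype)
    if dtype == np.float16:
        return np.dtype(np.float32)  # float16 is promoted to float32
    if dtype == np.float32:
        return np.dtype(np.float32)  # float32 is kept as-is
    # Assumption: everything else (float64, integer types, bool) uses float64.
    return np.dtype(np.float64)


for name in ("float16", "float32", "float64", "uint8"):
    print(name, "->", supported_float_type(name))
```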

Deprecate APIs

Introduces new deprecations as in scikit-image 0.19.

Specifically:

  • selem -> footprint
  • grey -> gray
  • iterations -> num_iter
  • max_iter -> max_num_iter
  • min_iter -> min_num_iter
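
One way to picture the keyword renames is as a translation step applied before a function runs. The sketch below is hypothetical: neither the function nor the mapping table is cuCIM API. The grey -> gray entry is left out because it is assumed here to concern function and module names rather than keyword arguments.

```python
import warnings

# Old keyword -> new keyword, taken from the list above.
RENAMED_KWARGS = {
    "selem": "footprint",
    "iterations": "num_iter",
    "max_iter": "max_num_iter",
    "min_iter": "min_num_iter",
}


def translate_kwargs(kwargs):
    """Return a copy of kwargs with deprecated names replaced by the new ones."""
    translated = {}
    for name, value in kwargs.items():
        new_name = RENAMED_KWARGS.get(name, name)
        if new_name != name:
            warnings.warn(
                f"`{name}` is deprecated; use `{new_name}` instead.",
                FutureWarning,
                stacklevel=2,
            )
        if new_name in translated:
            raise TypeError(f"got both the old and new spelling of `{new_name}`")
        translated[new_name] = value
    return translated


with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    print(translate_kwargs({"selem": None, "iterations": 3}))
    # {'footprint': None, 'num_iter': 3}
```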

2. Supporting Multithreading and Batch Processing

cuCIM now supports loading the entire image using multiple threads. It also supports batch loading of images.

If the device parameter of the read_region() method is "cuda", cuCIM loads the relevant portion of the image file (compressed tile data) into GPU memory using cuFile (GDS, GPUDirect Storage), then decompresses the data using nvJPEG's Batched Image Decoding API.

The current GPU implementation is not yet efficient, and its performance is poor compared to the CPU implementation. We plan to improve it in upcoming versions.

Example API Usages

The following parameters are added to the read_region method:

  • num_workers: number of workers (threads) to use for loading the image. (default: 1)
  • batch_size: number of images to load at once. (default: 1)
  • drop_last: whether to drop the last batch if the number of images is not divisible by the batch size. (default: False)
  • prefetch_factor: number of samples loaded in advance by each worker. (default: 2)
  • shuffle: whether to shuffle the input locations. (default: False)
  • seed: seed value for random value generation. (default: 0)
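
To make the interplay of batch_size, drop_last, shuffle, and seed concrete, here is a pure-Python sketch of the batching semantics described above. It is not cuCIM's implementation (which runs in native code with worker threads), the function name is hypothetical, and the order cuCIM produces for a given seed will differ from Python's random module.

```python
import random


def iter_batches(locations, batch_size=1, drop_last=False, shuffle=False, seed=0):
    """Yield lists of locations, mimicking the documented parameter semantics."""
    locations = list(locations)
    if shuffle:
        # A seeded generator keeps the order reproducible across runs.
        random.Random(seed).shuffle(locations)
    for start in range(0, len(locations), batch_size):
        batch = locations[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            return  # discard the incomplete final batch
        yield batch


locations = [(x, 0) for x in range(0, 1000, 100)]  # 10 locations
print([len(b) for b in iter_batches(locations, batch_size=4)])                  # [4, 4, 2]
print([len(b) for b in iter_batches(locations, batch_size=4, drop_last=True)])  # [4, 4]
```

With 10 locations and a batch size of 4, the last batch holds only 2 items, so drop_last=True reduces the output from three batches to two.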

Loading the entire image using multiple threads

from cucim import CuImage

img = CuImage("input.tif")

region = img.read_region(level=1, num_workers=8)  # read whole image at level 1 using 8 workers

Loading batched images using multiple threads

You can pass the locations of the regions as a list/tuple of locations or as a NumPy array of locations
(e.g., ((<x for loc 1>, <y for loc 1>), (<x for loc 2>, <y for loc 2>))).
Each element of a location should be of int type (int64), and the dimension of the location should be
equal to the dimension of the size.
You can pass any iterator of locations (the dimensions of the input don't matter; each item is flattened once if it is itself an iterator).

For example, you can feed the following iterator:

  • [0, 0, 100, 0] or (0, 0, 100, 0) would be interpreted as a list of (0, 0) and (100, 0).
  • ((sx, sy) for sy in range(0, height, patch_size) for sx in range(0, width, patch_size)) would iterate over the locations of the patches.
  • [(0, 100), (0, 200)] would be interpreted as a list of (0, 100) and (0, 200).
  • A NumPy array such as np.array(((0, 100), (0, 200))) or np.array((0, 100, 0, 200)) is also accepted, and using a NumPy array is faster than using a Python list/tuple.
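
The "flatten once" rule can be sketched in plain Python as follows. This only illustrates the interpretation described above for two-dimensional locations. The helper name is hypothetical, and cuCIM's own parsing is implemented natively.

```python
from collections.abc import Iterable


def normalize_locations(locations, ndim=2):
    """Flatten one level of nesting, then regroup into ndim-sized tuples."""
    flat = []
    for item in locations:
        if isinstance(item, Iterable):
            flat.extend(int(v) for v in item)  # item is itself a location
        else:
            flat.append(int(item))  # item is a single coordinate
    if len(flat) % ndim != 0:
        raise ValueError("number of coordinates must be a multiple of ndim")
    return [tuple(flat[i:i + ndim]) for i in range(0, len(flat), ndim)]


print(normalize_locations([0, 0, 100, 0]))      # [(0, 0), (100, 0)]
print(normalize_locations([(0, 0), (100, 0)]))  # [(0, 0), (100, 0)]
print(normalize_locations((x, 0) for x in range(0, 300, 100)))
# [(0, 0), (100, 0), (200, 0)]
```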
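
import numpy as np
from cucim import CuImage

# Configure a per-process image cache.
cache = CuImage.cache("per_process", memory_capacity=1024)

img = CuImage("image.tif")

# (x, y) locations of the patches to read.
locations = [[0,   0], [100,   0], [200,   0], [300,   0],
             [0, 200], [100, 200], [200, 200], [300, 200]]
# locations = np.array(locations)  # a NumPy array is faster than a list

# With batch_size set, read_region returns an iterator of batches.
region = img.read_region(locations, (224, 224), batch_size=4, num_workers=8)

for batch in region:
    patches = np.asarray(batch)  # one batch of 4 patches
    print(patches.shape)
    for patch in patches:
        print(patch.shape)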

# (4, 224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (4, 224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)

Loading image using nvJPEG and cuFile (GDS, GPUDirect Storage)

If "cuda" is specified for the device parameter of the read_region() method, cuCIM uses nvJPEG with GPUDirect Storage to load images.

In this case, use CuPy instead of NumPy to access the results. The image cache (CuImage.cache) is not used.

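import cupy as cp
from cucim import CuImage

img = CuImage("image.tif")

# (x, y) locations of the patches to read.
locations = [[0,   0], [100,   0], [200,   0], [300,   0],
             [0, 200], [100, 200], [200, 200], [300, 200]]
# locations = np.array(locations)  # optional; requires `import numpy as np`

# device="cuda" selects the nvJPEG/cuFile path described above.
region = img.read_region(locations, (224, 224), batch_size=4, device="cuda")

for batch in region:
    patches = cp.asarray(batch)  # one batch of 4 patches as a CuPy array
    print(patches.shape)
    for patch in patches:
        print(patch.shape)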

# (4, 224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (4, 224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)
# (224, 224, 3)

Experimental Results

We compared cuCIM's performance against tifffile for loading entire images.

System Information
Experiment Setup

Benchmarked loading several images with Tifffile.

Results
  • JPEG2000 YCbCr: TUPAC-TR-467.svs, 55MB, 19920x26420, tile size 240x240
    • cuCIM [6 threads]: 2.7688472287729384
    • tifffile [6 threads]: 7.4588409311138095
    • cuCIM [12 threads]: 2.1468488964252175
    • tifffile [12 threads]: 6.142562598735094
  • JPEG: image.tif (256x256 multi-resolution/tiled TIF conversion of TUPAC-TR-467.svs), 238MB, 19920x26420, tile size 256x256
    • cuCIM [6 threads]: 0.6951584462076426
    • tifffile [6 threads]: 1.0252630705013872
    • cuCIM [12 threads]: 0.5354489935562015
    • tifffile [12 threads]: 1.5688881931826473
  • JPEG2000 RGB: CMU-1-JP2K-33005.svs, 126MB, 46000x32893, tile size 240x240
    • cuCIM [6 threads]: 9.2361351958476
    • tifffile [6 threads]: 27.936951795965435
    • cuCIM [12 threads]: 7.4136177686043085
    • tifffile [12 threads]: 22.46532293939963
  • JPEG: 0005f7aaab2800f6170c399693a96917.tiff, 46MB, 27648x29440, tile size 512x512
    • cuCIM [6 threads]: 0.7972335423342883
    • tifffile [6 threads]: 0.926042037177831
    • cuCIM [12 threads]: 0.6366931471042335
    • tifffile [12 threads]: 0.9512427857145667
  • JPEG: 000920ad0b612851f8e01bcc880d9b3d.tiff, 14MB, 15360x13312, tile size 512x512
    • cuCIM [6 threads]: 0.2257618647068739
    • tifffile [6 threads]: 0.25579613661393524
    • cuCIM [12 threads]: 0.1840262952260673
    • tifffile [12 threads]: 0.2717844221740961
  • JPEG: 001d865e65ef5d2579c190a0e0350d8f.tiff, 71MB, 28672x34560, tile size 512x512
    • cuCIM [6 threads]: 0.9925791253335774
    • tifffile [6 threads]: 1.131185239739716
    • cuCIM [12 threads]: 0.8037087645381689
    • tifffile [12 threads]: 1.1474561678245663
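
For a quick read of these numbers, the ratio tifffile / cuCIM gives the relative speedup per file and thread count. The snippet below recomputes that ratio for the first three files, with values copied verbatim from the list above. It assumes the values are elapsed times (lower is better), which the notes do not state explicitly. The dictionary layout is only for illustration.

```python
# (cuCIM, tifffile) measurements copied from the results above.
results = {
    ("TUPAC-TR-467.svs", 6): (2.7688472287729384, 7.4588409311138095),
    ("TUPAC-TR-467.svs", 12): (2.1468488964252175, 6.142562598735094),
    ("image.tif", 6): (0.6951584462076426, 1.0252630705013872),
    ("image.tif", 12): (0.5354489935562015, 1.5688881931826473),
    ("CMU-1-JP2K-33005.svs", 6): (9.2361351958476, 27.936951795965435),
    ("CMU-1-JP2K-33005.svs", 12): (7.4136177686043085, 22.46532293939963),
}

# Ratio > 1 means cuCIM needed less time than tifffile.
speedups = {
    key: t_tifffile / t_cucim for key, (t_cucim, t_tifffile) in results.items()
}

for (name, threads), ratio in speedups.items():
    print(f"{name} [{threads} threads]: tifffile / cuCIM = {ratio:.2f}")
```

Under that assumption, cuCIM is faster in every one of these six measurements, with ratios ranging from roughly 1.5 to 3.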

3. Allow CuPy 10

Relaxes version constraints to allow CuPy 10 (in meta.yaml).

cupy 9.* => cupy >=9,<11.0.0a0
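
A minimal sketch of what the relaxed pin admits, using a plain major-version check. This is a simplification: conda and pip perform full version-specifier matching, and the <11.0.0a0 upper bound is written that way so that pre-releases of CuPy 11 are excluded too. The function name is hypothetical.

```python
def cupy_version_allowed(version):
    """Approximate `>=9,<11.0.0a0` by checking only the major version."""
    major = int(version.split(".")[0])
    return 9 <= major < 11


for v in ("8.6.0", "9.6.0", "10.1.0", "11.0.0a1"):
    print(v, cupy_version_allowed(v))
```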

🛠️ Improvements