Add ImageDecoder operator (onnx#5294)

### Description

- Adds a new ImageDecoder operator, to be used in preprocessing models.
- Tests use generated images and OpenCV as a reference.

### Motivation and Context

Currently, image decoding happens outside of ONNX, which causes portability and deployment problems.

---------

Signed-off-by: Joaquin Anton <[email protected]>
meilofveeningen-rl · Aug 10, 2023 · 4ad6d31 · 4ad6d31
1 parent 6deac42
commit 4ad6d31
Showing 39 changed files with 1,462 additions and 3 deletions.
diff --git a/docs/Changelog.md b/docs/Changelog.md
@@ -24092,6 +24092,66 @@ This version of the operator has been available since version 20 of the default
 <dd>Constrain grid types to float tensors.</dd>
 </dl>
 
+### <a name="ImageDecoder-20"></a>**ImageDecoder-20**</a>
+
+  Loads and decodes an image from a file. If it can't decode for any reason (e.g. corrupted encoded
+  stream, invalid format), it will return an empty matrix.
+  The following image formats are supported:
+  * BMP
+  * JPEG (note: Lossless JPEG support is optional)
+  * JPEG2000
+  * TIFF
+  * PNG
+  * WebP
+  * Portable image format (PBM, PGM, PPM, PXM, PNM)
+  Decoded images follow a channel-last layout: (Height, Width, Channels).
+  **JPEG chroma upsampling method:**
+  When upsampling the chroma components by a factor of 2, the pixels are linearly interpolated so that the
+  centers of the output pixels are 1/4 and 3/4 of the way between input pixel centers.
+  When rounding, 0.5 is rounded down and up at alternate pixel locations to prevent bias towards
+  larger values (ordered dither pattern).
+  Considering adjacent input pixels A, B, and C, B is upsampled to pixels B0 and B1 so that
+  ```
+  B0 = round_half_down((1/4) * A + (3/4) * B)
+  B1 = round_half_up((3/4) * B + (1/4) * C)
+  ```
+  This method is the default chroma upsampling method in the well-established libjpeg-turbo library,
+  also referred to as "smooth" or "fancy" upsampling.
+
+#### Version
+
+This version of the operator has been available since version 20 of the default ONNX operator set.
+
+#### Attributes
+
+<dl>
+<dt><tt>pixel_format</tt> : string (default is RGB)</dt>
+<dd>Pixel format. Can be one of "RGB", "BGR", or "Grayscale".</dd>
+</dl>
+
+#### Inputs
+
+<dl>
+<dt><tt>encoded_stream</tt> (non-differentiable) : T1</dt>
+<dd>Encoded stream</dd>
+</dl>
+
+#### Outputs
+
+<dl>
+<dt><tt>image</tt> (non-differentiable) : T2</dt>
+<dd>Decoded image</dd>
+</dl>
+
+#### Type Constraints
+
+<dl>
+<dt><tt>T1</tt> : tensor(uint8)</dt>
+<dd>Constrain input types to 8-bit unsigned integer tensor.</dd>
+<dt><tt>T2</tt> : tensor(uint8)</dt>
+<dd>Constrain output types to 8-bit unsigned integer tensor.</dd>
+</dl>
+
 ### <a name="RegexFullMatch-20"></a>**RegexFullMatch-20**</a>
 
   RegexFullMatch performs a full regex match on each element of the input tensor. If an element fully matches the regex pattern specified as an attribute, the corresponding element in the output is True and it is False otherwise. [RE2](https://github.com/google/re2/wiki/Syntax) regex syntax is used.
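
Note on the JPEG chroma upsampling rule in the ImageDecoder-20 entry above: a minimal pure-Python sketch of the 1/4 and 3/4 interpolation for a single row may make the rounding concrete. The function name `upsample_chroma_row` is hypothetical, and replicating the border sample at the row edges is an assumption, since the operator description does not spell out edge handling. Adding 1 or 2 before the right shift by 2 gives round-half-down and round-half-up respectively.

```python
def upsample_chroma_row(row):
    """Upsample one row of 8-bit chroma samples horizontally by a factor of 2.

    For each input sample B with left neighbour A and right neighbour C:
        B0 = round_half_down((1/4) * A + (3/4) * B)
        B1 = round_half_up((3/4) * B + (1/4) * C)
    """
    n = len(row)
    out = []
    for i, b in enumerate(row):
        # Assumption: at the row edges the missing neighbour is replaced by B itself.
        a = row[i - 1] if i > 0 else b
        c = row[i + 1] if i < n - 1 else b
        # (x + 1) >> 2 rounds x / 4 to nearest with exact halves going down.
        out.append((a + 3 * b + 1) >> 2)
        # (x + 2) >> 2 rounds x / 4 to nearest with exact halves going up.
        out.append((3 * b + c + 2) >> 2)
    return out


if __name__ == "__main__":
    print(upsample_chroma_row([10, 20, 40]))
```

For the row `[10, 20, 40]` this yields `[10, 13, 17, 25, 35, 40]`: the value 12.5 at a B1 position rounds up to 13, while 17.5 at a B0 position rounds down to 17.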

diff --git a/docs/Operators.md b/docs/Operators.md
@@ -71,6 +71,7 @@ For an operator input/output's differentiability, it can be differentiable,
 |<a href="#Hardmax">Hardmax</a>|<a href="Changelog.md#Hardmax-13">13</a>, <a href="Changelog.md#Hardmax-11">11</a>, <a href="Changelog.md#Hardmax-1">1</a>|
 |<a href="#Identity">Identity</a>|<a href="Changelog.md#Identity-19">19</a>, <a href="Changelog.md#Identity-16">16</a>, <a href="Changelog.md#Identity-14">14</a>, <a href="Changelog.md#Identity-13">13</a>, <a href="Changelog.md#Identity-1">1</a>|
 |<a href="#If">If</a>|<a href="Changelog.md#If-19">19</a>, <a href="Changelog.md#If-16">16</a>, <a href="Changelog.md#If-13">13</a>, <a href="Changelog.md#If-11">11</a>, <a href="Changelog.md#If-1">1</a>|
+|<a href="#ImageDecoder">ImageDecoder</a>|<a href="Changelog.md#ImageDecoder-20">20</a>|
 |<a href="#InstanceNormalization">InstanceNormalization</a>|<a href="Changelog.md#InstanceNormalization-6">6</a>, <a href="Changelog.md#InstanceNormalization-1">1</a>|
 |<a href="#IsInf">IsInf</a>|<a href="Changelog.md#IsInf-10">10</a>|
 |<a href="#IsNaN">IsNaN</a>|<a href="Changelog.md#IsNaN-13">13</a>, <a href="Changelog.md#IsNaN-9">9</a>|
@@ -12063,6 +12064,276 @@ expect(
 </details>
 
 
+### <a name="ImageDecoder"></a><a name="imagedecoder">**ImageDecoder**</a>
+
+  Loads and decodes an image from a file. If it can't decode for any reason (e.g. corrupted encoded
+  stream, invalid format), it will return an empty matrix.
+  The following image formats are supported:
+  * BMP
+  * JPEG (note: Lossless JPEG support is optional)
+  * JPEG2000
+  * TIFF
+  * PNG
+  * WebP
+  * Portable image format (PBM, PGM, PPM, PXM, PNM)
+  Decoded images follow a channel-last layout: (Height, Width, Channels).
+  **JPEG chroma upsampling method:**
+  When upsampling the chroma components by a factor of 2, the pixels are linearly interpolated so that the
+  centers of the output pixels are 1/4 and 3/4 of the way between input pixel centers.
+  When rounding, 0.5 is rounded down and up at alternate pixel locations to prevent bias towards
+  larger values (ordered dither pattern).
+  Considering adjacent input pixels A, B, and C, B is upsampled to pixels B0 and B1 so that
+  ```
+  B0 = round_half_down((1/4) * A + (3/4) * B)
+  B1 = round_half_up((3/4) * B + (1/4) * C)
+  ```
+  This method is the default chroma upsampling method in the well-established libjpeg-turbo library,
+  also referred to as "smooth" or "fancy" upsampling.
+
+#### Version
+
+This version of the operator has been available since version 20 of the default ONNX operator set.
+
+#### Attributes
+
+<dl>
+<dt><tt>pixel_format</tt> : string (default is RGB)</dt>
+<dd>Pixel format. Can be one of "RGB", "BGR", or "Grayscale".</dd>
+</dl>
+
+#### Inputs
+
+<dl>
+<dt><tt>encoded_stream</tt> (non-differentiable) : T1</dt>
+<dd>Encoded stream</dd>
+</dl>
+
+#### Outputs
+
+<dl>
+<dt><tt>image</tt> (non-differentiable) : T2</dt>
+<dd>Decoded image</dd>
+</dl>
+
+#### Type Constraints
+
+<dl>
+<dt><tt>T1</tt> : tensor(uint8)</dt>
+<dd>Constrain input types to 8-bit unsigned integer tensor.</dd>
+<dt><tt>T2</tt> : tensor(uint8)</dt>
+<dd>Constrain output types to 8-bit unsigned integer tensor.</dd>
+</dl>
+
+
+#### Examples
+
+<details>
+<summary>image_decoder_decode_bmp_rgb</summary>
+
+```python
+node = onnx.helper.make_node(
+    "ImageDecoder",
+    inputs=["data"],
+    outputs=["output"],
+    pixel_format="RGB",
+)
+
+data, output = generate_test_data(".bmp", "RGB")
+expect(
+    node,
+    inputs=[data],
+    outputs=[output],
+    name="test_image_decoder_decode_bmp_rgb",
+)
+```
+
+</details>
+
+
+<details>
+<summary>image_decoder_decode_jpeg2k_rgb</summary>
+
+```python
+node = onnx.helper.make_node(
+    "ImageDecoder",
+    inputs=["data"],
+    outputs=["output"],
+    pixel_format="RGB",
+)
+
+data, output = generate_test_data(".jp2", "RGB")
+expect(
+    node,
+    inputs=[data],
+    outputs=[output],
+    name="test_image_decoder_decode_jpeg2k_rgb",
+)
+```
+
+</details>
+
+
+<details>
+<summary>image_decoder_decode_jpeg_bgr</summary>
+
+```python
+node = onnx.helper.make_node(
+    "ImageDecoder",
+    inputs=["data"],
+    outputs=["output"],
+    pixel_format="BGR",
+)
+
+data, output = generate_test_data(".jpg", "BGR")
+expect(
+    node,
+    inputs=[data],
+    outputs=[output],
+    name="test_image_decoder_decode_jpeg_bgr",
+)
+```
+
+</details>
+
+
+<details>
+<summary>image_decoder_decode_jpeg_grayscale</summary>
+
+```python
+node = onnx.helper.make_node(
+    "ImageDecoder",
+    inputs=["data"],
+    outputs=["output"],
+    pixel_format="Grayscale",
+)
+
+data, output = generate_test_data(".jpg", "Grayscale")
+expect(
+    node,
+    inputs=[data],
+    outputs=[output],
+    name="test_image_decoder_decode_jpeg_grayscale",
+)
+```
+
+</details>
+
+
+<details>
+<summary>image_decoder_decode_jpeg_rgb</summary>
+
+```python
+node = onnx.helper.make_node(
+    "ImageDecoder",
+    inputs=["data"],
+    outputs=["output"],
+    pixel_format="RGB",
+)
+
+data, output = generate_test_data(".jpg", "RGB")
+expect(
+    node,
+    inputs=[data],
+    outputs=[output],
+    name="test_image_decoder_decode_jpeg_rgb",
+)
+```
+
+</details>
+
+
+<details>
+<summary>image_decoder_decode_png_rgb</summary>
+
+```python
+node = onnx.helper.make_node(
+    "ImageDecoder",
+    inputs=["data"],
+    outputs=["output"],
+    pixel_format="RGB",
+)
+
+data, output = generate_test_data(".png", "RGB")
+expect(
+    node,
+    inputs=[data],
+    outputs=[output],
+    name="test_image_decoder_decode_png_rgb",
+)
+```
+
+</details>
+
+
+<details>
+<summary>image_decoder_decode_pnm_rgb</summary>
+
+```python
+node = onnx.helper.make_node(
+    "ImageDecoder",
+    inputs=["data"],
+    outputs=["output"],
+    pixel_format="RGB",
+)
+
+data, output = generate_test_data(".pnm", "RGB")
+expect(
+    node,
+    inputs=[data],
+    outputs=[output],
+    name="test_image_decoder_decode_pnm_rgb",
+)
+```
+
+</details>
+
+
+<details>
+<summary>image_decoder_decode_tiff_rgb</summary>
+
+```python
+node = onnx.helper.make_node(
+    "ImageDecoder",
+    inputs=["data"],
+    outputs=["output"],
+    pixel_format="RGB",
+)
+
+data, output = generate_test_data(".tiff", "RGB")
+expect(
+    node,
+    inputs=[data],
+    outputs=[output],
+    name="test_image_decoder_decode_tiff_rgb",
+)
+```
+
+</details>
+
+
+<details>
+<summary>image_decoder_decode_webp_rgb</summary>
+
+```python
+node = onnx.helper.make_node(
+    "ImageDecoder",
+    inputs=["data"],
+    outputs=["output"],
+    pixel_format="RGB",
+)
+
+data, output = generate_test_data(".webp", "RGB")
+expect(
+    node,
+    inputs=[data],
+    outputs=[output],
+    name="test_image_decoder_decode_webp_rgb",
+)
+```
+
+</details>
+
+
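
Note on the operator contract documented above (encoded bytes in, channel-last image out, empty result on failure, `pixel_format` selecting the channel order): the toy sketch below illustrates it for one of the listed formats, binary PPM, using only the Python standard library. It is not the reference implementation from this commit. The helper names `encode_ppm` and `decode_ppm` are hypothetical, only the simple `P6` header written by the encoder is handled (no comments, maxval 255), nested lists stand in for uint8 tensors, and the grayscale weights (0.299, 0.587, 0.114) are an assumption because the operator text does not state a conversion formula.

```python
def encode_ppm(pixels):
    """Encode an H x W x 3 nested list of RGB values (0..255) as a binary PPM stream."""
    h, w = len(pixels), len(pixels[0])
    header = f"P6\n{w} {h}\n255\n".encode("ascii")
    body = bytes(v for row in pixels for px in row for v in px)
    return header + body


def decode_ppm(stream, pixel_format="RGB"):
    """Decode a binary PPM stream into an H x W x C nested list.

    Returns an empty list if the stream cannot be decoded, mirroring the
    "empty matrix" behaviour described for the operator.
    """
    if pixel_format not in ("RGB", "BGR", "Grayscale"):
        raise ValueError(f"unsupported pixel_format: {pixel_format}")
    try:
        # maxsplit=3 keeps newline bytes inside the pixel data intact.
        magic, dims, maxval, body = stream.split(b"\n", 3)
        w, h = (int(tok) for tok in dims.split())
        if magic != b"P6" or int(maxval) != 255 or len(body) != w * h * 3:
            return []
    except ValueError:
        return []

    image = []
    for y in range(h):
        row = []
        for x in range(w):
            off = 3 * (y * w + x)
            r, g, b = body[off], body[off + 1], body[off + 2]
            if pixel_format == "RGB":
                row.append([r, g, b])
            elif pixel_format == "BGR":
                row.append([b, g, r])
            else:
                # Assumed luma weights; a single channel keeps the layout channel-last.
                row.append([(299 * r + 587 * g + 114 * b + 500) // 1000])
        image.append(row)
    return image


if __name__ == "__main__":
    img = [[[255, 0, 0], [0, 255, 0]], [[0, 0, 255], [10, 10, 10]]]
    stream = encode_ppm(img)
    print(decode_ppm(stream, "BGR")[0][0])
    print(decode_ppm(b"not an image"))
```

The round trip mirrors the shape of the test cases listed above: encode a generated image, feed the bytes to the decoder, and compare against a reference for the chosen `pixel_format`.
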
 ### <a name="InstanceNormalization"></a><a name="instancenormalization">**InstanceNormalization**</a>
 
   Carries out instance normalization as described in the paper