Add ImageDecoder operator (onnx#5294)
### Description
<!-- - Describe your changes. -->
- Adds a new ImageDecoder operator, to be used in preprocessing models.
- Tests use generated images and opencv as a reference.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve? -->
Currently, image decoding happens outside of ONNX, which causes
portability and deployment problems.
<!-- - If it fixes an open issue, please link to the issue here. -->

---------

Signed-off-by: Joaquin Anton <[email protected]>
jantonguirao authored Aug 10, 2023
1 parent 6deac42 commit 4ad6d31
Showing 39 changed files with 1,462 additions and 3 deletions.
60 changes: 60 additions & 0 deletions docs/Changelog.md
@@ -24092,6 +24092,66 @@ This version of the operator has been available since version 20 of the default
<dd>Constrain grid types to float tensors.</dd>
</dl>

### <a name="ImageDecoder-20"></a>**ImageDecoder-20**</a>

Loads and decodes an image from a file. If it can't decode for any reason (e.g. corrupted encoded
stream, invalid format), it will return an empty matrix.
The following image formats are supported:
* BMP
* JPEG (note: Lossless JPEG support is optional)
* JPEG2000
* TIFF
* PNG
* WebP
* Portable image format (PBM, PGM, PPM, PXM, PNM)

Decoded images follow a channel-last layout: (Height, Width, Channels).

**JPEG chroma upsampling method:**

When upsampling the chroma components by a factor of 2, the pixels are linearly interpolated so that the
centers of the output pixels are 1/4 and 3/4 of the way between input pixel centers.
When rounding, 0.5 is rounded down and up at alternate pixel locations to prevent bias towards
larger values (ordered dither pattern).
Considering adjacent input pixels A, B, and C, B is upsampled to pixels B0 and B1 so that:
```
B0 = round_half_down((1/4) * A + (3/4) * B)
B1 = round_half_up((3/4) * B + (1/4) * C)
```
This method is the default chroma upsampling method in the well-established libjpeg-turbo library,
where it is also referred to as "smooth" or "fancy" upsampling.
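
The two formulas can be evaluated with integer arithmetic only. The sketch below is a minimal illustration for a single row of chroma samples, not the operator's implementation. The function name `upsample_chroma_row` is hypothetical, and replicating the first and last sample as their own missing neighbor is an assumption, since the text does not say how row edges are handled.

```python
def upsample_chroma_row(row):
    """Upsample one row of chroma samples by 2 (1/4 and 3/4 weights)."""
    n = len(row)
    out = []
    for i, b in enumerate(row):
        # Assumption: at the row edges, the missing neighbor is the sample itself.
        a = row[i - 1] if i > 0 else b
        c = row[i + 1] if i < n - 1 else b
        # (3*B + A) / 4 with ties rounded down: add 1 before dividing by 4.
        out.append((3 * b + a + 1) >> 2)
        # (3*B + C) / 4 with ties rounded up: add 2 before dividing by 4.
        out.append((3 * b + c + 2) >> 2)
    return out


print(upsample_chroma_row([0, 2, 4]))
```

Adding 1 or 2 before the shift is what makes an exact tie (a remainder of 2 when dividing by 4) fall to the lower or the higher integer respectively.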

#### Version

This version of the operator has been available since version 20 of the default ONNX operator set.

#### Attributes

<dl>
<dt><tt>pixel_format</tt> : string (default is RGB)</dt>
<dd>Pixel format. Can be one of "RGB", "BGR", or "Grayscale".</dd>
</dl>
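
As an illustration of how the "RGB" and "BGR" settings relate in a channel-last layout, the sketch below reverses the channel order of a small image held as nested lists. The helper name `rgb_to_bgr` is hypothetical, and the text does not describe how an implementation produces each format internally. This only shows the ordering relationship.

```python
def rgb_to_bgr(image):
    """Reverse the channel order of an image stored as [row][col][channel]."""
    return [[pixel[::-1] for pixel in row] for row in image]


# One row, two pixels: pure red and pure blue in RGB order.
rgb = [[[255, 0, 0], [0, 0, 255]]]
bgr = rgb_to_bgr(rgb)
print(bgr)
```

Applying the same reversal twice returns the original image, so the helper converts in both directions.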

#### Inputs

<dl>
<dt><tt>encoded_stream</tt> (non-differentiable) : T1</dt>
<dd>Encoded stream</dd>
</dl>

#### Outputs

<dl>
<dt><tt>image</tt> (non-differentiable) : T2</dt>
<dd>Decoded image</dd>
</dl>

#### Type Constraints

<dl>
<dt><tt>T1</tt> : tensor(uint8)</dt>
<dd>Constrain input types to 8-bit unsigned integer tensor.</dd>
<dt><tt>T2</tt> : tensor(uint8)</dt>
<dd>Constrain output types to 8-bit unsigned integer tensor.</dd>
</dl>
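
Since `T1` is `tensor(uint8)`, the encoded file contents need to be presented as unsigned bytes. A minimal sketch with NumPy follows, assuming the stream is passed as a one-dimensional tensor. The payload here is only the 8-byte PNG signature, used as a stand-in for a complete encoded file.

```python
import numpy as np

# Stand-in payload: the PNG file signature, not a complete image.
encoded_bytes = b"\x89PNG\r\n\x1a\n"

# View the raw bytes as a 1-D uint8 array without copying.
encoded_stream = np.frombuffer(encoded_bytes, dtype=np.uint8)

print(encoded_stream.dtype, encoded_stream.shape)
```

In practice the bytes would typically come from reading a file in binary mode.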

### <a name="RegexFullMatch-20"></a>**RegexFullMatch-20**</a>

RegexFullMatch performs a full regex match on each element of the input tensor. If an element fully matches the regex pattern specified as an attribute, the corresponding element in the output is True and it is False otherwise. [RE2](https://github.com/google/re2/wiki/Syntax) regex syntax is used.
271 changes: 271 additions & 0 deletions docs/Operators.md
@@ -71,6 +71,7 @@ For an operator input/output's differentiability, it can be differentiable,
|<a href="#Hardmax">Hardmax</a>|<a href="Changelog.md#Hardmax-13">13</a>, <a href="Changelog.md#Hardmax-11">11</a>, <a href="Changelog.md#Hardmax-1">1</a>|
|<a href="#Identity">Identity</a>|<a href="Changelog.md#Identity-19">19</a>, <a href="Changelog.md#Identity-16">16</a>, <a href="Changelog.md#Identity-14">14</a>, <a href="Changelog.md#Identity-13">13</a>, <a href="Changelog.md#Identity-1">1</a>|
|<a href="#If">If</a>|<a href="Changelog.md#If-19">19</a>, <a href="Changelog.md#If-16">16</a>, <a href="Changelog.md#If-13">13</a>, <a href="Changelog.md#If-11">11</a>, <a href="Changelog.md#If-1">1</a>|
|<a href="#ImageDecoder">ImageDecoder</a>|<a href="Changelog.md#ImageDecoder-20">20</a>|
|<a href="#InstanceNormalization">InstanceNormalization</a>|<a href="Changelog.md#InstanceNormalization-6">6</a>, <a href="Changelog.md#InstanceNormalization-1">1</a>|
|<a href="#IsInf">IsInf</a>|<a href="Changelog.md#IsInf-10">10</a>|
|<a href="#IsNaN">IsNaN</a>|<a href="Changelog.md#IsNaN-13">13</a>, <a href="Changelog.md#IsNaN-9">9</a>|
@@ -12063,6 +12064,276 @@ expect(
</details>


### <a name="ImageDecoder"></a><a name="imagedecoder">**ImageDecoder**</a>

Loads and decodes an image from a file. If it can't decode for any reason (e.g. corrupted encoded
stream, invalid format), it will return an empty matrix.
The following image formats are supported:
* BMP
* JPEG (note: Lossless JPEG support is optional)
* JPEG2000
* TIFF
* PNG
* WebP
* Portable image format (PBM, PGM, PPM, PXM, PNM)

Decoded images follow a channel-last layout: (Height, Width, Channels).

**JPEG chroma upsampling method:**

When upsampling the chroma components by a factor of 2, the pixels are linearly interpolated so that the
centers of the output pixels are 1/4 and 3/4 of the way between input pixel centers.
When rounding, 0.5 is rounded down and up at alternate pixel locations to prevent bias towards
larger values (ordered dither pattern).
Considering adjacent input pixels A, B, and C, B is upsampled to pixels B0 and B1 so that:
```
B0 = round_half_down((1/4) * A + (3/4) * B)
B1 = round_half_up((3/4) * B + (1/4) * C)
```
This method is the default chroma upsampling method in the well-established libjpeg-turbo library,
where it is also referred to as "smooth" or "fancy" upsampling.
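
To make the two rounding modes concrete, the sketch below evaluates the formulas literally with exact fractions. The names `round_half_down`, `round_half_up` and `upsample_pixel` follow the notation in the text but are otherwise hypothetical helpers, written as `ceil(x - 1/2)` and `floor(x + 1/2)`.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)


def round_half_down(x):
    # Nearest integer; an exact .5 goes to the lower integer.
    return math.ceil(x - HALF)


def round_half_up(x):
    # Nearest integer; an exact .5 goes to the higher integer.
    return math.floor(x + HALF)


def upsample_pixel(a, b, c):
    """Return (B0, B1) for pixel B with left neighbor A and right neighbor C."""
    b0 = round_half_down(Fraction(1, 4) * a + Fraction(3, 4) * b)
    b1 = round_half_up(Fraction(3, 4) * b + Fraction(1, 4) * c)
    return b0, b1


print(upsample_pixel(0, 2, 4))
```

With A = 0, B = 2, C = 4 the unrounded values are 1.5 and 2.5, so the tie is broken downwards for B0 and upwards for B1.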

#### Version

This version of the operator has been available since version 20 of the default ONNX operator set.

#### Attributes

<dl>
<dt><tt>pixel_format</tt> : string (default is RGB)</dt>
<dd>Pixel format. Can be one of "RGB", "BGR", or "Grayscale".</dd>
</dl>
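
The sketch below maps each `pixel_format` value to the output shape a caller might expect under the channel-last layout. The channel counts are an assumption (three for "RGB" and "BGR", one for "Grayscale"), because the text only states the (Height, Width, Channels) ordering. `expected_output_shape` is a hypothetical helper.

```python
# Assumed channel counts per pixel format (not stated in the operator text).
CHANNELS = {"RGB": 3, "BGR": 3, "Grayscale": 1}


def expected_output_shape(height, width, pixel_format="RGB"):
    """Return the assumed (Height, Width, Channels) shape of the decoded image."""
    if pixel_format not in CHANNELS:
        raise ValueError(f"unsupported pixel_format: {pixel_format!r}")
    return (height, width, CHANNELS[pixel_format])


print(expected_output_shape(32, 32, "Grayscale"))
```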

#### Inputs

<dl>
<dt><tt>encoded_stream</tt> (non-differentiable) : T1</dt>
<dd>Encoded stream</dd>
</dl>

#### Outputs

<dl>
<dt><tt>image</tt> (non-differentiable) : T2</dt>
<dd>Decoded image</dd>
</dl>

#### Type Constraints

<dl>
<dt><tt>T1</tt> : tensor(uint8)</dt>
<dd>Constrain input types to 8-bit unsigned integer tensor.</dd>
<dt><tt>T2</tt> : tensor(uint8)</dt>
<dd>Constrain output types to 8-bit unsigned integer tensor.</dd>
</dl>
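
To illustrate the input and output contract (encoded bytes in, channel-last pixels out), the sketch below decodes the simplest listed format, a binary PPM, by hand. It is not the operator's implementation: `decode_ppm_p6` is a hypothetical helper that handles only 8-bit "P6" files without header comments, and returning an empty list on failure is only a loose analogue of the "empty matrix" behavior described above.

```python
import re


def decode_ppm_p6(encoded):
    """Decode an 8-bit binary PPM into nested lists indexed [row][col][channel]."""
    # Header: magic, width, height, maxval, then exactly one whitespace byte.
    match = re.match(rb"P6\s+(\d+)\s+(\d+)\s+(\d+)\s", encoded)
    if match is None:
        return []
    width, height, maxval = (int(group) for group in match.groups())
    pixels = encoded[match.end():]
    if maxval != 255 or len(pixels) < width * height * 3:
        return []
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            start = (y * width + x) * 3
            row.append(list(pixels[start:start + 3]))  # [R, G, B]
        rows.append(row)
    return rows


# A 2x1 image: one red pixel followed by one blue pixel.
ppm = b"P6\n2 1\n255\n" + bytes([255, 0, 0, 0, 0, 255])
print(decode_ppm_p6(ppm))
```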


#### Examples

<details>
<summary>image_decoder_decode_bmp_rgb</summary>

```python
node = onnx.helper.make_node(
    "ImageDecoder",
    inputs=["data"],
    outputs=["output"],
    pixel_format="RGB",
)

data, output = generate_test_data(".bmp", "RGB")
expect(
    node,
    inputs=[data],
    outputs=[output],
    name="test_image_decoder_decode_bmp_rgb",
)
```

</details>
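
The examples rely on a `generate_test_data` helper that is not part of the hunks shown here. Going by the commit description (generated images, with OpenCV as the reference decoder), a helper of that kind might look like the sketch below. Everything in it is an assumption: the name `make_reference_pair`, the random test image, the single-channel grayscale shape, and the choice to return `(None, None)` when OpenCV or NumPy is missing.

```python
def make_reference_pair(extension, pixel_format="RGB", height=32, width=32):
    """Hypothetical helper: return (encoded uint8 stream, reference decoded image)."""
    try:
        import cv2
        import numpy as np
    except ImportError:
        return None, None  # dependencies unavailable; the caller should skip

    # Deterministic synthetic image in BGR order, which OpenCV uses natively.
    rng = np.random.default_rng(0)
    image_bgr = rng.integers(0, 256, size=(height, width, 3), dtype=np.uint8)

    ok, encoded = cv2.imencode(extension, image_bgr)
    if not ok:
        return None, None
    encoded = encoded.reshape(-1)  # flatten to a 1-D uint8 stream

    if pixel_format == "Grayscale":
        # Assumption: grayscale output keeps a trailing channel axis of size 1.
        decoded = cv2.imdecode(encoded, cv2.IMREAD_GRAYSCALE)[:, :, np.newaxis]
    else:
        decoded = cv2.imdecode(encoded, cv2.IMREAD_COLOR)  # BGR order
        if pixel_format == "RGB":
            decoded = cv2.cvtColor(decoded, cv2.COLOR_BGR2RGB)
    return encoded, decoded


data, output = make_reference_pair(".png", "RGB")
```

The reference image comes from decoding the encoded stream again rather than from the synthetic input, so lossy formats such as JPEG are compared against what a decoder actually produces.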


<details>
<summary>image_decoder_decode_jpeg2k_rgb</summary>

```python
node = onnx.helper.make_node(
    "ImageDecoder",
    inputs=["data"],
    outputs=["output"],
    pixel_format="RGB",
)

data, output = generate_test_data(".jp2", "RGB")
expect(
    node,
    inputs=[data],
    outputs=[output],
    name="test_image_decoder_decode_jpeg2k_rgb",
)
```

</details>


<details>
<summary>image_decoder_decode_jpeg_bgr</summary>

```python
node = onnx.helper.make_node(
    "ImageDecoder",
    inputs=["data"],
    outputs=["output"],
    pixel_format="BGR",
)

data, output = generate_test_data(".jpg", "BGR")
expect(
    node,
    inputs=[data],
    outputs=[output],
    name="test_image_decoder_decode_jpeg_bgr",
)
```

</details>


<details>
<summary>image_decoder_decode_jpeg_grayscale</summary>

```python
node = onnx.helper.make_node(
    "ImageDecoder",
    inputs=["data"],
    outputs=["output"],
    pixel_format="Grayscale",
)

data, output = generate_test_data(".jpg", "Grayscale")
expect(
    node,
    inputs=[data],
    outputs=[output],
    name="test_image_decoder_decode_jpeg_grayscale",
)
```

</details>


<details>
<summary>image_decoder_decode_jpeg_rgb</summary>

```python
node = onnx.helper.make_node(
    "ImageDecoder",
    inputs=["data"],
    outputs=["output"],
    pixel_format="RGB",
)

data, output = generate_test_data(".jpg", "RGB")
expect(
    node,
    inputs=[data],
    outputs=[output],
    name="test_image_decoder_decode_jpeg_rgb",
)
```

</details>


<details>
<summary>image_decoder_decode_png_rgb</summary>

```python
node = onnx.helper.make_node(
    "ImageDecoder",
    inputs=["data"],
    outputs=["output"],
    pixel_format="RGB",
)

data, output = generate_test_data(".png", "RGB")
expect(
    node,
    inputs=[data],
    outputs=[output],
    name="test_image_decoder_decode_png_rgb",
)
```

</details>


<details>
<summary>image_decoder_decode_pnm_rgb</summary>

```python
node = onnx.helper.make_node(
    "ImageDecoder",
    inputs=["data"],
    outputs=["output"],
    pixel_format="RGB",
)

data, output = generate_test_data(".pnm", "RGB")
expect(
    node,
    inputs=[data],
    outputs=[output],
    name="test_image_decoder_decode_pnm_rgb",
)
```

</details>


<details>
<summary>image_decoder_decode_tiff_rgb</summary>

```python
node = onnx.helper.make_node(
    "ImageDecoder",
    inputs=["data"],
    outputs=["output"],
    pixel_format="RGB",
)

data, output = generate_test_data(".tiff", "RGB")
expect(
    node,
    inputs=[data],
    outputs=[output],
    name="test_image_decoder_decode_tiff_rgb",
)
```

</details>


<details>
<summary>image_decoder_decode_webp_rgb</summary>

```python
node = onnx.helper.make_node(
    "ImageDecoder",
    inputs=["data"],
    outputs=["output"],
    pixel_format="RGB",
)

data, output = generate_test_data(".webp", "RGB")
expect(
    node,
    inputs=[data],
    outputs=[output],
    name="test_image_decoder_decode_webp_rgb",
)
```

</details>


### <a name="InstanceNormalization"></a><a name="instancenormalization">**InstanceNormalization**</a>

Carries out instance normalization as described in the paper