forked from onnx/onnx
Commit
Add RegexFullMatch operator (onnx#5401)
### Description

This PR introduces the `RegexFullMatch` operator, as originally proposed in onnx#5317. The `RegexFullMatch` operator takes one string tensor as input and returns a bool tensor of identical shape indicating if each element fully matches the regex pattern encoded in the `pattern` string attribute. This attribute is a string and we expect valid [re2](https://github.com/google/re2) regex.

Some examples are as follows:

```
RegexFullMatch(["www.google.com", "www.facebook.com", "www.bbc.co.uk"], "www\.[\w.-]+\.\bcom\b") => [True, True, False]
RegexFullMatch([["[email protected]", "[email protected]"], ["not email", "[email protected]"]], "(\W|^)[\w.\-]{0,25}@(yahoo|gmail)\.com(\W|$)") => [[True, False], [False, True]]
```

### Motivation and Context

Closes onnx#5317. Following discussion at the last Operators SIG Weekly the "engine" attribute has been dropped in favour of simply using [re2](https://github.com/google/re2) syntax for now. This reflects the fact that both [Tensorflow](https://www.tensorflow.org/api_docs/python/tf/strings/regex_full_match) and [PyTorch](https://pytorch.org/text/0.15.0/transforms.html#regextokenizer) operators requiring regex use re2 already.

---------

Signed-off-by: Aditya Goel <[email protected]>
Signed-off-by: Chun-Wei Chen <[email protected]>
Signed-off-by: Aditya Goel <[email protected]>
Co-authored-by: Chun-Wei Chen <[email protected]>
Co-authored-by: Christian Bourjau <[email protected]>
Co-authored-by: Xavier Dupré <[email protected]>
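
The full-match semantics described in the commit message can be sketched in plain Python. This is an illustration only, not the ONNX reference implementation. It uses the standard-library `re` module as a stand-in for RE2 (the two syntaxes are not identical), and the helper name `regex_full_match` and the use of nested lists in place of tensors are assumptions made for the example.

```python
import re


def regex_full_match(x, pattern):
    """Return bools with the same nesting as ``x`` (hypothetical helper).

    ``x`` is a string or an arbitrarily nested list of strings, standing in
    for a string tensor. Stdlib ``re`` is assumed here instead of RE2.
    """
    compiled = re.compile(pattern)

    def visit(item):
        if isinstance(item, list):
            # Recurse so the output keeps the input's shape, including
            # empty dimensions such as [[], []].
            return [visit(child) for child in item]
        # fullmatch succeeds only if the whole string matches the pattern,
        # unlike re.search, which accepts a match anywhere in the string.
        return compiled.fullmatch(item) is not None

    return visit(x)


urls = ["www.google.com", "www.facebook.com", "www.bbc.co.uk"]
print(regex_full_match(urls, r"www\.[\w.-]+\.\bcom\b"))  # [True, True, False]
print(regex_full_match([[], []], r".*"))  # [[], []]
```

The recursion mirrors the "identical shape" requirement: every string maps to exactly one bool, and an input with no elements yields an output with no elements.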
1 parent 0f1f98c, commit 2e0908d
Showing 32 changed files with 483 additions and 5 deletions.
@@ -118,6 +118,7 @@ For an operator input/output's differentiability, it can be differentiable,
|<a href="#ReduceMin">ReduceMin</a>|<a href="Changelog.md#ReduceMin-18">18</a>, <a href="Changelog.md#ReduceMin-13">13</a>, <a href="Changelog.md#ReduceMin-12">12</a>, <a href="Changelog.md#ReduceMin-11">11</a>, <a href="Changelog.md#ReduceMin-1">1</a>|
|<a href="#ReduceProd">ReduceProd</a>|<a href="Changelog.md#ReduceProd-18">18</a>, <a href="Changelog.md#ReduceProd-13">13</a>, <a href="Changelog.md#ReduceProd-11">11</a>, <a href="Changelog.md#ReduceProd-1">1</a>|
|<a href="#ReduceSum">ReduceSum</a>|<a href="Changelog.md#ReduceSum-13">13</a>, <a href="Changelog.md#ReduceSum-11">11</a>, <a href="Changelog.md#ReduceSum-1">1</a>|
|<a href="#RegexFullMatch">RegexFullMatch</a>|<a href="Changelog.md#RegexFullMatch-20">20</a>|
|<a href="#Reshape">Reshape</a>|<a href="Changelog.md#Reshape-19">19</a>, <a href="Changelog.md#Reshape-14">14</a>, <a href="Changelog.md#Reshape-13">13</a>, <a href="Changelog.md#Reshape-5">5</a>, <a href="Changelog.md#Reshape-1">1</a>|
|<a href="#Resize">Resize</a>|<a href="Changelog.md#Resize-19">19</a>, <a href="Changelog.md#Resize-18">18</a>, <a href="Changelog.md#Resize-13">13</a>, <a href="Changelog.md#Resize-11">11</a>, <a href="Changelog.md#Resize-10">10</a>|
|<a href="#ReverseSequence">ReverseSequence</a>|<a href="Changelog.md#ReverseSequence-10">10</a>|

@@ -22588,6 +22589,121 @@ expect(
</details>


### <a name="RegexFullMatch"></a><a name="regexfullmatch">**RegexFullMatch**</a>

RegexFullMatch performs a full regex match on each element of the input tensor. If an element fully matches the regex pattern specified as an attribute, the corresponding element in the output is True and it is False otherwise. [RE2](https://github.com/google/re2/wiki/Syntax) regex syntax is used.

#### Version

This version of the operator has been available since version 20 of the default ONNX operator set.

#### Attributes

<dl>
<dt><tt>pattern</tt> : string</dt>
<dd>Regex pattern to match on. This must be valid RE2 syntax.</dd>
</dl>

#### Inputs

<dl>
<dt><tt>X</tt> (non-differentiable) : T1</dt>
<dd>Tensor with strings to match on.</dd>
</dl>

#### Outputs

<dl>
<dt><tt>Y</tt> (non-differentiable) : T2</dt>
<dd>Tensor of bools indicating if each input string fully matches the regex pattern specified.</dd>
</dl>

#### Type Constraints

<dl>
<dt><tt>T1</tt> : tensor(string)</dt>
<dd>Inputs must be UTF-8 strings</dd>
<dt><tt>T2</tt> : tensor(bool)</dt>
<dd>Outputs are bools and are True where there is a full regex match and False otherwise.</dd>
</dl>


#### Examples

<details>
<summary>basic</summary>

```python
node = onnx.helper.make_node(
    "RegexFullMatch",
    inputs=["X"],
    outputs=["Y"],
    pattern=r"www\.[\w.-]+\.\bcom\b",
)

x = np.array(["www.google.com", "www.facebook.com", "www.bbc.co.uk"]).astype(
    object
)
result = np.array([True, True, False])
expect(node, inputs=[x], outputs=[result], name="test_regex_full_match_basic")
```

</details>


<details>
<summary>match_email_domain</summary>

```python
node = onnx.helper.make_node(
    "RegexFullMatch",
    inputs=["X"],
    outputs=["Y"],
    pattern=r"(\W|^)[\w.\-]{0,25}@(yahoo|gmail)\.com(\W|$)",
)

x = np.array(
    [
        ["[email protected]", "[email protected]"],
        ["not email", "[email protected]"],
    ]
).astype(object)
result = np.array([[True, False], [False, True]])
expect(
    node,
    inputs=[x],
    outputs=[result],
    name="test_regex_full_match_email_domain",
)
```

</details>


<details>
<summary>match_empty</summary>

```python
node = onnx.helper.make_node(
    "RegexFullMatch",
    inputs=["X"],
    outputs=["Y"],
    pattern=r"(\W|^)[\w.\-]{0,25}@(yahoo|gmail)\.com(\W|$)",
)

x = np.array([[], []]).astype(object)
result = np.array([[], []]).astype(bool)
expect(
    node,
    inputs=[x],
    outputs=[result],
    name="test_regex_full_match_empty",
)
```

</details>


### <a name="Relu"></a><a name="relu">**Relu**</a>

Relu takes one input data (Tensor<T>) and produces one output data

@@ -6,7 +6,7 @@
* [Overall Test Coverage](#overall-test-coverage)
# Node Test Coverage
## Summary
-Node tests have covered 177/190 (93.16%, 5 generators excluded) common operators.
+Node tests have covered 178/191 (93.19%, 5 generators excluded) common operators.

Node tests have covered 0/0 (N/A) experimental operators.

@@ -15258,6 +15258,78 @@ expect(
</details>


### RegexFullMatch
There are 3 test cases, listed as following:
<details>
<summary>basic</summary>

```python
node = onnx.helper.make_node(
    "RegexFullMatch",
    inputs=["X"],
    outputs=["Y"],
    pattern=r"www\.[\w.-]+\.\bcom\b",
)

x = np.array(["www.google.com", "www.facebook.com", "www.bbc.co.uk"]).astype(
    object
)
result = np.array([True, True, False])
expect(node, inputs=[x], outputs=[result], name="test_regex_full_match_basic")
```

</details>
<details>
<summary>match_email_domain</summary>

```python
node = onnx.helper.make_node(
    "RegexFullMatch",
    inputs=["X"],
    outputs=["Y"],
    pattern=r"(\W|^)[\w.\-]{0,25}@(yahoo|gmail)\.com(\W|$)",
)

x = np.array(
    [
        ["[email protected]", "[email protected]"],
        ["not email", "[email protected]"],
    ]
).astype(object)
result = np.array([[True, False], [False, True]])
expect(
    node,
    inputs=[x],
    outputs=[result],
    name="test_regex_full_match_email_domain",
)
```

</details>
<details>
<summary>match_empty</summary>

```python
node = onnx.helper.make_node(
    "RegexFullMatch",
    inputs=["X"],
    outputs=["Y"],
    pattern=r"(\W|^)[\w.\-]{0,25}@(yahoo|gmail)\.com(\W|$)",
)

x = np.array([[], []]).astype(object)
result = np.array([[], []]).astype(bool)
expect(
    node,
    inputs=[x],
    outputs=[result],
    name="test_regex_full_match_empty",
)
```

</details>


### Relu
There are 1 test cases, listed as following:
<details>

@@ -0,0 +1,67 @@
# Copyright (c) ONNX Project Contributors
#
# SPDX-License-Identifier: Apache-2.0

import numpy as np

import onnx
from onnx.backend.test.case.base import Base
from onnx.backend.test.case.node import expect


class RegexFullMatch(Base):
    @staticmethod
    def export_basic() -> None:
        node = onnx.helper.make_node(
            "RegexFullMatch",
            inputs=["X"],
            outputs=["Y"],
            pattern=r"www\.[\w.-]+\.\bcom\b",
        )

        x = np.array(["www.google.com", "www.facebook.com", "www.bbc.co.uk"]).astype(
            object
        )
        result = np.array([True, True, False])
        expect(node, inputs=[x], outputs=[result], name="test_regex_full_match_basic")

    @staticmethod
    def export_match_email_domain() -> None:
        node = onnx.helper.make_node(
            "RegexFullMatch",
            inputs=["X"],
            outputs=["Y"],
            pattern=r"(\W|^)[\w.\-]{0,25}@(yahoo|gmail)\.com(\W|$)",
        )

        x = np.array(
            [
                ["[email protected]", "[email protected]"],
                ["not email", "[email protected]"],
            ]
        ).astype(object)
        result = np.array([[True, False], [False, True]])
        expect(
            node,
            inputs=[x],
            outputs=[result],
            name="test_regex_full_match_email_domain",
        )

    @staticmethod
    def export_match_empty() -> None:
        node = onnx.helper.make_node(
            "RegexFullMatch",
            inputs=["X"],
            outputs=["Y"],
            pattern=r"(\W|^)[\w.\-]{0,25}@(yahoo|gmail)\.com(\W|$)",
        )

        x = np.array([[], []]).astype(object)
        result = np.array([[], []]).astype(bool)
        expect(
            node,
            inputs=[x],
            outputs=[result],
            name="test_regex_full_match_empty",
        )
Binary file not shown.
1 change: 1 addition & 0 deletions
onnx/backend/test/data/node/test_regex_full_match_basic/test_data_set_0/input_0.pb
@@ -0,0 +1 @@
2www.google.com2www.facebook.com2www.bbc.co.ukBX

Binary file added (+12 Bytes)
onnx/backend/test/data/node/test_regex_full_match_basic/test_data_set_0/output_0.pb
Binary file not shown.

Binary file added (+187 Bytes)
onnx/backend/test/data/node/test_regex_full_match_email_domain/model.onnx
Binary file not shown.

1 change: 1 addition & 0 deletions
onnx/backend/test/data/node/test_regex_full_match_email_domain/test_data_set_0/input_0.pb
@@ -0,0 +1 @@
2[email protected][email protected] not email2[email protected]X

Binary file added (+15 Bytes)
onnx/backend/test/data/node/test_regex_full_match_email_domain/test_data_set_0/output_0.pb
Binary file not shown.
Binary file not shown.

Binary file added (+9 Bytes)
onnx/backend/test/data/node/test_regex_full_match_empty/test_data_set_0/input_0.pb
Binary file not shown.

Binary file added (+11 Bytes)
onnx/backend/test/data/node/test_regex_full_match_empty/test_data_set_0/output_0.pb
Binary file not shown.