Fix Categorify inference and testing #1874
@@ -13,6 +13,15 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-from merlin.dag import BaseOperator, ColumnSelector  # noqa pylint: disable=unused-import
+from merlin.dag import (  # noqa pylint: disable=unused-import
+    BaseOperator,
+    ColumnSelector,
+    DataFormats,
+)

-Operator = BaseOperator
+
+# Avoid TENSOR_TABLE by default (for now)
+class Operator(BaseOperator):
+    @property
+    def supported_formats(self):
+        return DataFormats.PANDAS_DATAFRAME | DataFormats.CUDF_DATAFRAME
Comment on lines +26 to +27
With NVIDIA-Merlin/systems#389 applied to [...] Since I am not entirely sure which NVTabular operations are supported with [...] In a follow-up to this PR, it probably makes sense to add [...]

@oliverholworthy and @jperez999 do you have a chance to look at Rick's comments above? Thanks.

This should be alright. All of the operators in nvtabular were created with data frames in mind. If we ever decide to add in tensor table, we can make the change then. If this speeds up the runs and solves breaking issues, I say we should do it.
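For readers unfamiliar with flag-style format declarations, the `supported_formats` override discussed in this thread can be illustrated with a small standalone sketch. It uses Python's standard `enum.Flag` as a stand-in: the `Formats` enum, the `BaseOp` and `DataFrameOnlyOp` classes, and the `supports` helper are all hypothetical names for illustration and are not the `merlin.dag` API. In particular, the assumption that the base class advertises every format is made only for the sake of the example.

```python
from enum import Flag, auto


class Formats(Flag):
    """Hypothetical stand-in for a data-format flag enum."""

    PANDAS_DATAFRAME = auto()
    CUDF_DATAFRAME = auto()
    TENSOR_TABLE = auto()


class BaseOp:
    """Hypothetical base operator; assumed here to advertise every format."""

    @property
    def supported_formats(self):
        return (
            Formats.PANDAS_DATAFRAME
            | Formats.CUDF_DATAFRAME
            | Formats.TENSOR_TABLE
        )


class DataFrameOnlyOp(BaseOp):
    """Mirrors the idea in the diff: advertise only the dataframe formats."""

    @property
    def supported_formats(self):
        return Formats.PANDAS_DATAFRAME | Formats.CUDF_DATAFRAME


def supports(op, fmt):
    """Return True if `fmt` is among the formats `op` advertises."""
    return bool(op.supported_formats & fmt)
```

Under these assumptions, `supports(DataFrameOnlyOp(), Formats.TENSOR_TABLE)` is false, so a caller that consults the flag would convert the data to a dataframe before invoking the operator, while subclasses that do handle tensor tables could widen the flag again later.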
@@ -734,3 +734,8 @@ def test_categorify_inference():
     output_tensors = inference_op.transform(cats.input_columns, input_tensors)
     for key in input_tensors:
         assert output_tensors[key].dtype == np.dtype("int64")
+
+    # Check results are consistent with python code path
+    expect = workflow.transform(df)
+    got = pd.DataFrame(output_tensors)
+    assert_eq(expect, got)
Comment on lines +738 to +741
This test was only checking the data type of the result. Now it also checks if the result is correct.
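Why a dtype check alone is not enough can be shown with a toy example in plain Python. This is only a sketch under stated assumptions: the encoders, the `vocab` mapping, and the convention that unseen values map to `0` are hypothetical and do not describe how NVTabular's `Categorify` actually encodes data.

```python
def encode_reference(values, vocab):
    """Reference path: map each value to its id (0 for unseen, by assumption)."""
    return [vocab.get(v, 0) for v in values]


def encode_buggy(values, vocab):
    """Deliberately wrong path: correct element type, but every id is shifted."""
    return [vocab.get(v, 0) + 1 for v in values]


def types_ok(encoded):
    """The weaker check: every output element is an integer."""
    return all(isinstance(x, int) for x in encoded)


def values_ok(encoded, values, vocab):
    """The stronger check: outputs match the reference path exactly."""
    return encoded == encode_reference(values, vocab)


vocab = {"a": 1, "b": 2}
values = ["a", "b", "c"]
bad = encode_buggy(values, vocab)
good = encode_reference(values, vocab)
```

Here `types_ok(bad)` passes even though `values_ok(bad, values, vocab)` fails, which is the kind of regression the added comparison against `workflow.transform(df)` is meant to catch.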
The base_operator.py file was renamed to operator.py in NVIDIA-Merlin/core#359. Therefore, this fix should be valid for >=23.08.