[BUG]: Nondeterministic results from gnn_fraud_detection_pipeline example #1676

dagardner-nv · 2024-05-01T16:47:40Z

Version

24.06 & 23.11

Which installation method(s) does this occur on?

Source

Describe the bug.

Running this pipeline from the run.py script yields different results with each run.

Minimum reproducible example

cd ${MORPHEUS_ROOT}/examples/gnn_fraud_detection_pipeline
python run.py --output_file output1.csv
python run.py --output_file output2.csv
${MORPHEUS_ROOT}/scripts/compare_data_files.py --index_col=index output1.csv output2.csv
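
For readers without the Morpheus scripts at hand, the comparison step above can be approximated with a short standalone check. This is only a sketch and not the implementation of compare_data_files.py: the helper names (`load_rows`, `values_match`, `mismatched_indices`) and the `abs_tol` parameter are hypothetical, and it assumes both CSV files contain an `index` column.

```python
import csv
import math


def load_rows(path, index_col="index"):
    """Read a CSV file into a dict keyed by the value of the index column."""
    with open(path, newline="") as fh:
        return {row[index_col]: row for row in csv.DictReader(fh)}


def values_match(a, b, abs_tol):
    """Compare numerically when both cells parse as floats, otherwise as raw values."""
    try:
        return math.isclose(float(a), float(b), rel_tol=0.0, abs_tol=abs_tol)
    except (TypeError, ValueError):
        return a == b


def mismatched_indices(path_a, path_b, index_col="index", abs_tol=1e-6):
    """Return the index values (as strings, sorted) whose rows differ between the files."""
    rows_a = load_rows(path_a, index_col)
    rows_b = load_rows(path_b, index_col)
    bad = []
    for key in sorted(set(rows_a) | set(rows_b)):
        row_a, row_b = rows_a.get(key), rows_b.get(key)
        # A row missing from one file, or with different columns, counts as a mismatch.
        if row_a is None or row_b is None or row_a.keys() != row_b.keys():
            bad.append(key)
        elif not all(values_match(row_a[c], row_b[c], abs_tol) for c in row_a):
            bad.append(key)
    return bad
```

Calling `mismatched_indices("output1.csv", "output2.csv")` after the two runs would list the rows that differ; a deterministic pipeline should give an empty list.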

Relevant log output

Results do not match. Diff 191/265 (72.075472 %). First 10 mismatched rows:
              1000  1001  client_node  merchant_node    1004  1005  1006  ...  ind_emb_57  ind_emb_58  ind_emb_59  ind_emb_60  ind_emb_61  ind_emb_62  ind_emb_63
index                                                                     ...
753   res    64.87     1        80776          91780  100482     1     1  ...    0.802462   -0.290038    0.427980    0.198512    0.051006   -0.343573    0.464877
      val    64.87     1        80776          91780  100482     1     1  ...    0.331588   -0.743460    0.057602    0.929850   -1.243031   -0.505327    0.655568
757   res  1039.87     0        86378          91782  100499     1     1  ...   -0.224234   -0.638873    3.208107   -0.581114    0.397186    0.054665   -0.771643
      val  1039.87     0        86378          91782  100499     1     1  ...   -0.180428   -0.802617    2.976311   -0.472270    0.143358    0.099732   -0.657378
758   res   130.00     1        60551          92009  100486     1     1  ...    1.158630    0.328182   -1.933454    1.046066   -1.425990   -0.595764    3.954228
      val   130.00     1        60551          92009  100486     1     1  ...    1.257934    0.483261   -1.617801    1.167611   -1.563025   -0.663233    3.842766
759   res   429.91     0        53182          91831  100510     1     1  ...    0.037058    3.374917    5.680276   -1.872390    2.608538    0.077500   -1.728548
      val   429.91     0        53182          91831  100510     1     1  ...    0.091996    3.370662    5.754779   -1.836666    2.449290    0.051542   -1.720287
760   res    19.50     1        87501          91775  100519     1     1  ...   -3.088483    3.069178    0.803048    1.461829    1.134519    1.522516    3.125674
      val    19.50     1        87501          91775  100519     1     1  ...   -3.210214    2.942406    1.007788    0.901810    1.105968    1.198515    2.616581
761   res  9100.00     0        64035          94642  100757     1     1  ...   -0.372750   -1.881383   -1.552402    0.933195   -2.579973   -0.910605    1.146505
      val  9100.00     0        64035          94642  100757     1     1  ...   -0.399015   -1.882237   -1.528848    0.836805   -2.585224   -0.798560    1.101774
762   res    30.42     0        57394          91775  100607     1     1  ...    0.019567    0.589125    2.286478    0.793889    0.114065    0.116889   -2.347970
      val    30.42     0        57394          91775  100607     1     1  ...   -0.102164    0.462353    2.491218    0.233870    0.085513   -0.207112   -2.857063
764   res    25.74     1        74296          91784  100484     1     1  ...   -1.295459    1.133638    2.393340    0.342958    0.540464    2.828200   -1.021909
      val    25.74     1        74296          91784  100484     1     1  ...   -1.636326    1.124704    2.607330   -0.247052    0.490438    2.303633   -1.600628
765   res   191.62     1        70354          96216  100483     1     1  ...   -1.228315    0.555677    4.470200   -0.938110    1.238917   -0.028751   -2.252760
      val   191.62     1        70354          96216  100483     1     1  ...   -1.447451    0.673515    4.479450   -0.968101    1.217442   -0.229318   -2.322386
766   res   116.87     1        77701          91790  100482     1     1  ...    1.203686   -0.882477    0.483784   -0.034068    0.431342   -0.423109    1.098541
      val   116.87     1        77701          91790  100482     1     1  ...    1.494646   -0.591068    0.493658   -0.357471    0.327692   -0.152491    0.899048

[20 rows x 180 columns]

Full env printout

[Paste the results of print_env.sh here, it will be hidden by default]

Other/Misc.

No response

Code of Conduct

I agree to follow Morpheus' Code of Conduct
I have searched the open bugs and have found no duplicates for this bug report

dagardner-nv · 2024-05-01T16:53:53Z

I tried running with a single thread and limiting the output to just the index and prediction fields; the results still do not match:

Results do not match. Diff 64/265 (24.150943 %). First 10 mismatched rows:
           prediction
index                
753   res    0.448414
      val    0.141227
758   res    0.291023
      val    0.219923
762   res    0.010932
      val    0.001276
764   res    0.003048
      val    0.001248
766   res    0.001374
      val    0.004702
767   res    0.000720
      val    0.001853
771   res    0.001419
      val    0.002863
772   res    0.845840
      val    0.910736
777   res    0.001184
      val    0.036856
781   res    0.120291
      val    0.072192

Fix the issue by switching the sampler used during inference from subsampling to full sampling.

Closes #1676

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
- Tad ZeMicheal (https://github.com/tzemicheal)

Approvers:
- David Gardner (https://github.com/dagardner-nv)
- https://github.com/raykallen

URL: #1677
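
To illustrate why moving from subsampling to full sampling at inference time would remove run-to-run differences, here is a toy sketch in plain Python. It is not the Morpheus or DGL code: the graph, the features, and the `aggregate` and `embed_all` helpers are all hypothetical, and the "model" is just a mean over neighbor features. The point is only that a random neighbor subset changes the aggregated value between runs, while using every neighbor does not.

```python
import random

# Hypothetical toy graph: node -> list of neighbor nodes (not taken from the example data).
GRAPH = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2], 4: [0]}
# Hypothetical scalar feature for each node.
FEATURES = {0: 1.0, 1: 2.0, 2: 4.0, 3: 8.0, 4: 16.0}


def aggregate(node, fanout=None, rng=None):
    """Mean of neighbor features; fanout=None means use every neighbor (full sampling)."""
    neighbors = GRAPH[node]
    if fanout is not None and fanout < len(neighbors):
        # Subsampling: pick a random subset, so the result depends on the RNG state.
        if rng is None:
            rng = random
        neighbors = rng.sample(neighbors, fanout)
    return sum(FEATURES[n] for n in neighbors) / len(neighbors)


def embed_all(fanout=None, rng=None):
    """Aggregate every node in a fixed order."""
    return [aggregate(node, fanout, rng) for node in sorted(GRAPH)]
```

With `fanout=None`, repeated calls to `embed_all()` return identical lists. With `fanout=2` and an unseeded generator, node 0 can aggregate a different pair of neighbors on each run, which is the kind of variation reported in this issue.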

dagardner-nv added the bug Something isn't working label May 1, 2024

github-project-automation bot added this to Morpheus Boards May 1, 2024

github-project-automation bot moved this to Todo in Morpheus Boards May 1, 2024

tzemicheal self-assigned this May 2, 2024

tzemicheal mentioned this issue May 2, 2024

Fix non-deterministic output of gnn sampler #1677

Merged

rapids-bot bot closed this as completed in #1677 May 21, 2024

github-project-automation bot moved this from Todo to Done in Morpheus Boards May 21, 2024
