[BUG]: Nondeterministic results from gnn_fraud_detection_pipeline example #1676

Closed
2 tasks done
dagardner-nv opened this issue May 1, 2024 · 1 comment · Fixed by #1677

Contributor

dagardner-nv commented May 1, 2024

Version

24.06 & 23.11

Which installation method(s) does this occur on?

Source

Describe the bug.

Running this pipeline from the run.py script yields different results with each run.

Minimum reproducible example

cd ${MORPHEUS_ROOT}/examples/gnn_fraud_detection_pipeline
python run.py --output_file output1.csv
python run.py --output_file output2.csv
${MORPHEUS_ROOT}/scripts/compare_data_files.py --index_col=index output1.csv output2.csv
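
For readers without a Morpheus checkout, the comparison step above can be approximated with a few lines of standard-library Python. This is only a rough sketch of the idea (match rows by the `index` column, then compare cells with a small numeric tolerance); it is not the implementation of `compare_data_files.py`, and the function names and the `abs_tol` default are assumptions made for illustration.

```python
import csv
import math


def load_rows(path, index_col="index"):
    """Read a CSV into {index value: {column name: raw string}}."""
    with open(path, newline="") as fh:
        return {row[index_col]: row for row in csv.DictReader(fh)}


def values_match(a, b, abs_tol):
    """Compare two cells numerically when both parse as floats, else as strings."""
    try:
        return math.isclose(float(a), float(b), rel_tol=0.0, abs_tol=abs_tol)
    except (TypeError, ValueError):
        return a == b


def mismatched_rows(path_a, path_b, index_col="index", abs_tol=1e-6):
    """Return the sorted index values whose rows differ between the two files.

    Hypothetical helper: abs_tol is an assumed tolerance, not a Morpheus default.
    """
    rows_a = load_rows(path_a, index_col)
    rows_b = load_rows(path_b, index_col)
    bad = []
    for key in sorted(set(rows_a) | set(rows_b)):
        ra, rb = rows_a.get(key), rows_b.get(key)
        # A row missing from one file, or with different columns, is a mismatch.
        if ra is None or rb is None or ra.keys() != rb.keys():
            bad.append(key)
            continue
        if any(not values_match(ra[col], rb[col], abs_tol) for col in ra):
            bad.append(key)
    return bad
```

Calling `mismatched_rows("output1.csv", "output2.csv")` on the two files produced above would list the index values that disagree between runs; an empty list would indicate the runs agree within the chosen tolerance.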

Relevant log output


Results do not match. Diff 191/265 (72.075472 %). First 10 mismatched rows:
              1000  1001  client_node  merchant_node    1004  1005  1006  ...  ind_emb_57  ind_emb_58  ind_emb_59  ind_emb_60  ind_emb_61  ind_emb_62  ind_emb_63
index                                                                     ...
753   res    64.87     1        80776          91780  100482     1     1  ...    0.802462   -0.290038    0.427980    0.198512    0.051006   -0.343573    0.464877
      val    64.87     1        80776          91780  100482     1     1  ...    0.331588   -0.743460    0.057602    0.929850   -1.243031   -0.505327    0.655568
757   res  1039.87     0        86378          91782  100499     1     1  ...   -0.224234   -0.638873    3.208107   -0.581114    0.397186    0.054665   -0.771643
      val  1039.87     0        86378          91782  100499     1     1  ...   -0.180428   -0.802617    2.976311   -0.472270    0.143358    0.099732   -0.657378
758   res   130.00     1        60551          92009  100486     1     1  ...    1.158630    0.328182   -1.933454    1.046066   -1.425990   -0.595764    3.954228
      val   130.00     1        60551          92009  100486     1     1  ...    1.257934    0.483261   -1.617801    1.167611   -1.563025   -0.663233    3.842766
759   res   429.91     0        53182          91831  100510     1     1  ...    0.037058    3.374917    5.680276   -1.872390    2.608538    0.077500   -1.728548
      val   429.91     0        53182          91831  100510     1     1  ...    0.091996    3.370662    5.754779   -1.836666    2.449290    0.051542   -1.720287
760   res    19.50     1        87501          91775  100519     1     1  ...   -3.088483    3.069178    0.803048    1.461829    1.134519    1.522516    3.125674
      val    19.50     1        87501          91775  100519     1     1  ...   -3.210214    2.942406    1.007788    0.901810    1.105968    1.198515    2.616581
761   res  9100.00     0        64035          94642  100757     1     1  ...   -0.372750   -1.881383   -1.552402    0.933195   -2.579973   -0.910605    1.146505
      val  9100.00     0        64035          94642  100757     1     1  ...   -0.399015   -1.882237   -1.528848    0.836805   -2.585224   -0.798560    1.101774
762   res    30.42     0        57394          91775  100607     1     1  ...    0.019567    0.589125    2.286478    0.793889    0.114065    0.116889   -2.347970
      val    30.42     0        57394          91775  100607     1     1  ...   -0.102164    0.462353    2.491218    0.233870    0.085513   -0.207112   -2.857063
764   res    25.74     1        74296          91784  100484     1     1  ...   -1.295459    1.133638    2.393340    0.342958    0.540464    2.828200   -1.021909
      val    25.74     1        74296          91784  100484     1     1  ...   -1.636326    1.124704    2.607330   -0.247052    0.490438    2.303633   -1.600628
765   res   191.62     1        70354          96216  100483     1     1  ...   -1.228315    0.555677    4.470200   -0.938110    1.238917   -0.028751   -2.252760
      val   191.62     1        70354          96216  100483     1     1  ...   -1.447451    0.673515    4.479450   -0.968101    1.217442   -0.229318   -2.322386
766   res   116.87     1        77701          91790  100482     1     1  ...    1.203686   -0.882477    0.483784   -0.034068    0.431342   -0.423109    1.098541
      val   116.87     1        77701          91790  100482     1     1  ...    1.494646   -0.591068    0.493658   -0.357471    0.327692   -0.152491    0.899048

[20 rows x 180 columns]

Full env printout


[Paste the results of print_env.sh here, it will be hidden by default]

Other/Misc.

No response

Code of Conduct

  • I agree to follow Morpheus' Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report
@dagardner-nv dagardner-nv added the bug Something isn't working label May 1, 2024
Contributor Author

dagardner-nv commented May 1, 2024

I tried running with a single thread and limiting the output to just the index and prediction fields; the results still do not match:

Results do not match. Diff 64/265 (24.150943 %). First 10 mismatched rows:
           prediction
index                
753   res    0.448414
      val    0.141227
758   res    0.291023
      val    0.219923
762   res    0.010932
      val    0.001276
764   res    0.003048
      val    0.001248
766   res    0.001374
      val    0.004702
767   res    0.000720
      val    0.001853
771   res    0.001419
      val    0.002863
772   res    0.845840
      val    0.910736
777   res    0.001184
      val    0.036856
781   res    0.120291
      val    0.072192

@tzemicheal tzemicheal self-assigned this May 2, 2024
rapids-bot bot pushed a commit that referenced this issue May 21, 2024
Fixes the nondeterministic results by switching the sampler used during inference from subsampling to full sampling.



Closes #1676 

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - Tad ZeMicheal (https://github.com/tzemicheal)

Approvers:
  - David Gardner (https://github.com/dagardner-nv)
  - https://github.com/raykallen

URL: #1677
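
The commit message above attributes the fix to replacing subsampling with full sampling at inference time. The toy sketch below illustrates why that kind of change can remove run-to-run variation: aggregating over a random subset of a node's neighbors gives a result that depends on which neighbors were drawn, while aggregating over all of them does not. It uses only the Python standard library and is not the Morpheus or GNN code from #1677; the node names, feature values, and the `mean_aggregate` helper are all hypothetical.

```python
import random

# Toy graph: node -> feature values of its neighbours (made-up data).
NEIGHBOR_FEATURES = {
    "txn_753": [0.2, 1.4, -0.7, 3.1, 0.9, -1.2],
    "txn_758": [2.0, 2.5, -0.3, 0.1],
}


def mean_aggregate(node, fanout=None, rng=None):
    """Mean of a node's neighbour features.

    fanout=None uses every neighbour (full sampling). With a fanout smaller
    than the neighbour count, only a random subset is used (subsampling).
    """
    feats = NEIGHBOR_FEATURES[node]
    if fanout is not None and fanout < len(feats):
        rng = rng or random
        feats = rng.sample(feats, fanout)
    return sum(feats) / len(feats)


# Full sampling: the same value on every call.
print("full:   ", mean_aggregate("txn_753"), mean_aggregate("txn_753"))
# Subsampling: the value depends on which neighbours were drawn.
print("sampled:", mean_aggregate("txn_753", fanout=3), mean_aggregate("txn_753", fanout=3))
```

In this sketch, seeding the random generator would also make the subsampled result repeatable, but the value would still depend on the chosen seed; using all neighbors removes that dependence entirely.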
@github-project-automation github-project-automation bot moved this from Todo to Done in Morpheus Boards May 21, 2024