Bug Fix - Fix numa error on grace cpu in gpu-copy #658
@microsoft-github-policy-service agree company="Microsoft"
/azp run
Azure Pipelines successfully started running 3 pipeline(s).
superbench/benchmarks/micro_benchmarks/gpu_copy_performance/gpu_copy.cu
Thanks for the fix! I left one minor comment.
One major concern: with the current fix, the dtoh/htod performance tests for those non-CPU NUMA nodes are simply skipped. We need to find a way to allocate memory buffers on non-CPU NUMA nodes (instead of relying on the current numa_run_on_node) and get these dtoh/htod tests done.
Totally agree. This case should only occur on single-socket Grace machines. We will need a workaround to handle setting node affinity for such nodes. I will work on it and create a separate PR.
LGTM
87dc0ec to 41492d6
/azp run
Azure Pipelines successfully started running 3 pipeline(s).
/azp run ansible-integration-test
Azure Pipelines successfully started running 1 pipeline(s).
Head branch was pushed to by a user without write access
LGTM 👍
/azp run
Azure Pipelines successfully started running 3 pipeline(s).
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #658   +/-   ##
=======================================
  Coverage   85.80%   85.80%
=======================================
  Files          97       97
  Lines        6923     6923
=======================================
  Hits         5940     5940
  Misses        983      983
The GPU Copy BW Performance benchmark currently fails on NVIDIA Grace systems. These systems expose memory-only NUMA nodes (nodes with no CPU cores assigned), so numa_run_on_node fails for such nodes and the benchmark halts completely.
This fix checks whether each NUMA node has any CPU cores assigned. If a node has none, that node is skipped during args creation and the benchmark continues with the remaining nodes.
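
The skip logic described above can be illustrated with a minimal, dependency-free sketch. It is not the code from this PR: the actual change in gpu_copy.cu may well query libnuma (for example via numa_node_to_cpus) rather than sysfs. The sketch assumes a Linux system that exposes each node's CPUs in /sys/devices/system/node/node<N>/cpulist, and the function names CountCpusInList, ReadNodeCpuList, and NodesWithCpus are hypothetical, introduced only for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: count the CPUs in a Linux cpulist string such as "0-3,8".
// An empty or whitespace-only string means the node has no CPUs (memory-only node).
// Assumes the input is well formed, as the kernel-generated file is.
int CountCpusInList(const std::string &cpulist) {
    int count = 0;
    std::stringstream ss(cpulist);
    std::string range;
    while (std::getline(ss, range, ',')) {
        // Trim surrounding whitespace, including the trailing newline from sysfs.
        std::size_t begin = range.find_first_not_of(" \t\n");
        if (begin == std::string::npos) continue;
        std::size_t end = range.find_last_not_of(" \t\n");
        range = range.substr(begin, end - begin + 1);

        std::size_t dash = range.find('-');
        if (dash == std::string::npos) {
            count += 1;  // a single CPU id, e.g. "8"
        } else {
            int lo = std::stoi(range.substr(0, dash));
            int hi = std::stoi(range.substr(dash + 1));
            if (hi >= lo) count += hi - lo + 1;  // an inclusive range, e.g. "0-3"
        }
    }
    return count;
}

// Hypothetical helper: read the cpulist of one NUMA node from sysfs.
// Returns an empty string if the file cannot be read.
std::string ReadNodeCpuList(int node) {
    std::ifstream file("/sys/devices/system/node/node" + std::to_string(node) + "/cpulist");
    std::string line;
    if (file) std::getline(file, line);
    return line;
}

// Hypothetical helper: given one cpulist string per node (index == node id),
// return only the node ids that have at least one CPU. Memory-only nodes are skipped.
std::vector<int> NodesWithCpus(const std::vector<std::string> &cpulists) {
    std::vector<int> nodes;
    for (std::size_t i = 0; i < cpulists.size(); ++i) {
        if (CountCpusInList(cpulists[i]) > 0) nodes.push_back(static_cast<int>(i));
    }
    return nodes;
}
```

In a benchmark, the list returned by NodesWithCpus would be the set of nodes for which CPU-affinity calls such as numa_run_on_node are attempted. As the review discussion notes, skipping is only a stopgap: dtoh/htod measurements against memory-only nodes would still need buffers allocated on those nodes by some other means.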