Why does the same tensor require multiple copies, and what does cur_copy mean?
Answered by slaren on Dec 30, 2024
It is used to reduce synchronization overhead when using pipeline parallelism. With a single GPU there is only one copy.
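To make the idea concrete, here is a minimal sketch of how rotating between several copies of a buffer might look. This is an illustration only, not the library's code: the `CopyRing` class, its `write_next` method, and the `slots_used` helper are hypothetical names, and the assumption that `cur_copy` is an index that advances through `n_copies` slots is mine, not something stated in the answer.

```python
from dataclasses import dataclass, field


@dataclass
class CopyRing:
    """Hypothetical ring of buffer copies (illustration only, not library code)."""
    n_copies: int
    cur_copy: int = 0  # assumed meaning: index of the copy to use next
    slots: list = field(init=False)

    def __post_init__(self):
        self.slots = [None] * self.n_copies

    def write_next(self, batch):
        """Store a batch in the current slot, then advance to the next slot."""
        used = self.cur_copy
        self.slots[used] = batch
        self.cur_copy = (self.cur_copy + 1) % self.n_copies
        return used


def slots_used(n_copies, n_batches):
    """Return the slot index chosen for each of n_batches consecutive batches."""
    ring = CopyRing(n_copies)
    return [ring.write_next(f"batch{i}") for i in range(n_batches)]


single = slots_used(1, 4)  # one copy: every batch reuses slot 0
multi = slots_used(4, 6)   # four copies: slots rotate 0, 1, 2, 3, 0, 1
```

Under these assumptions, a single copy means each new batch would overwrite the slot the previous batch used, so the writer would have to wait for the previous consumer to finish. With several copies, consecutive batches land in different slots, so a wait is only needed when the ring wraps around.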
Answer selected by ysay-d