ci: mark model_parallel tests as cuda specific #35269
Conversation
```diff
@@ -3046,6 +3046,7 @@ def test_multi_gpu_data_parallel_forward(self):
         with torch.no_grad():
             _ = model(**self._prepare_for_class(inputs_dict, model_class))
 
+    @require_torch_gpu
```
`@require_torch_multi_gpu` on the next line was previously made non-CUDA specific by 11c27dd. Right now it only requires that the system has multiple GPUs of any kind.

I wonder whether it makes sense to rename `@require_torch_gpu` to `@require_torch_cuda` to avoid the naming collision? I can follow up on that in a separate PR.
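For context, a minimal sketch of how the two decorators now stack on a model-parallel test after this change (the test name and body here are illustrative, not copied from the diff):

```python
@require_torch_gpu        # added by this PR: parallelize() is CUDA-only
@require_torch_multi_gpu  # device-agnostic since 11c27dd: any multi-GPU setup passes
def test_model_parallelization(self):
    ...
```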
cc @ydshieh on this! 🤗
Hi @dvrogozh, I am not sure I follow "was previously made non-CUDA specific by 11c27dd". Is the commit link the correct one?
> Is the commit link the correct one?

Yes, that's the correct commit and link. Unfortunately it's a huge commit with multiple changes inside, and one of them was to make `@require_torch_multi_gpu` non-CUDA specific. See the change in `src/transformers/testing_utils.py`, line 743 in the old code, line 788 in the new code. The commit changed `torch.cuda.device_count()` to `device_count`, where the latter is computed via a newly added function (which works for CUDA and XPU at the moment):

```python
def get_device_count():
    ...
```
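For reference, a minimal sketch of what such a helper can look like, assuming the CUDA/XPU branching described above (the exact body is in the linked commit and may differ):

```python
import torch

from transformers.utils import is_torch_xpu_available


def get_device_count():
    # Count XPU devices when an Intel XPU stack is available, otherwise
    # fall back to counting CUDA devices.
    if is_torch_xpu_available():
        return torch.xpu.device_count()
    return torch.cuda.device_count()
```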
Currently `@require_torch_multi_gpu` looks like this and is not CUDA specific (after 11c27dd); see `src/transformers/testing_utils.py`, lines 769 to 781 at 18e896b:

```python
def require_torch_multi_gpu(test_case):
    """
    Decorator marking a test that requires a multi-GPU setup (in PyTorch). These tests are skipped on a machine without
    multiple GPUs.

    To run *only* the multi_gpu tests, assuming all test names contain multi_gpu: $ pytest -sv ./tests -k "multi_gpu"
    """
    if not is_torch_available():
        return unittest.skip(reason="test requires PyTorch")(test_case)

    device_count = get_device_count()

    return unittest.skipUnless(device_count > 1, "test requires multiple GPUs")(test_case)
```
Note that `@require_torch_gpu` is still CUDA-specific; see `src/transformers/testing_utils.py`, lines 984 to 986 at 18e896b:

```python
def require_torch_gpu(test_case):
    """Decorator marking a test that requires CUDA and PyTorch."""
    return unittest.skipUnless(torch_device == "cuda", "test requires CUDA")(test_case)
```
> I am not sure I follow "was previously made non-CUDA specific by 11c27dd".

The point I am trying to make is that `@require_torch_multi_gpu` and `@require_torch_gpu` are named similarly but have diverged (after 11c27dd) in what they actually signify, which leads to confusion. My proposal is to rename `@require_torch_gpu` to `@require_torch_cuda`.
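A minimal sketch of that proposal (hypothetical, not part of this PR): keep the existing body, rename the decorator, and optionally alias the old name during a deprecation period:

```python
import unittest

# torch_device is resolved at module level in transformers.testing_utils


def require_torch_cuda(test_case):
    """Decorator marking a test that requires CUDA and PyTorch."""
    return unittest.skipUnless(torch_device == "cuda", "test requires CUDA")(test_case)


# hypothetical backward-compatibility alias
require_torch_gpu = require_torch_cuda
```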
We have a series of work on device-agnostic testing: `require_torch_accelerator`, `require_torch_multi_accelerator`, etc. were added for that purpose. The `require_torch_xxx_gpu` decorators should not be modified IMO, but I guess @Titus-von-Koelle wasn't aware of the existing methods when #31098 was added.
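For comparison, the device-agnostic decorators follow roughly this shape (paraphrased; the exact implementation lives in `testing_utils.py` and may differ):

```python
import unittest

# torch_device is the backend string resolved by transformers.testing_utils
# ("cuda", "xpu", "mps", "cpu", ...)


def require_torch_accelerator(test_case):
    """Decorator marking a test that requires any accelerator (not just CUDA) and PyTorch."""
    return unittest.skipUnless(
        torch_device is not None and torch_device != "cpu", "test requires accelerator"
    )(test_case)
```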
lgtm
LGTM. The discussed topic about (potential) renaming etc. should be handled in a follow-up PR.
The `parallelize()` API is deprecated in favor of accelerate's `device_map="auto"` and therefore is not accepting new features. At the same time, the `parallelize()` implementation is currently CUDA-specific. This commit marks the respective CI tests with `@require_torch_gpu`.

Fixes: #35252

Signed-off-by: Dmitry Rogozhkin <[email protected]>

CC: @ArthurZucker @SunMarc
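For readers arriving via the deprecation note: the replacement for `parallelize()` is accelerate-backed dispatch at load time. A minimal usage sketch (the checkpoint name is just an example):

```python
from transformers import AutoModelForCausalLM

# device_map="auto" lets accelerate place model shards across all available
# devices, replacing the deprecated model.parallelize() API.
model = AutoModelForCausalLM.from_pretrained("gpt2", device_map="auto")
```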