ci: mark model_parallel tests as cuda specific #35269

dvrogozh · 2024-12-13T20:30:07Z

parallelize() API is deprecated in favor of accelerate's device_map="auto" and therefore is not accepting new features. At the same time parallelize() implementation is currently CUDA-specific. This commit marks respective ci tests with @require_torch_gpu.

Fixes: #35252
CC: @ArthurZucker @SunMarc

dvrogozh · 2024-12-13T20:36:53Z

tests/test_modeling_common.py

@@ -3046,6 +3046,7 @@ def test_multi_gpu_data_parallel_forward(self):
            with torch.no_grad():
                _ = model(**self._prepare_for_class(inputs_dict, model_class))

+    @require_torch_gpu


@require_torch_multi_gpu on the next line was previously made non-CUDA specific by 11c27dd. Right now it sets requirement for having any multi-GPUs on the system.

I wonder, maybe it makes sense to rename @require_torch_gpu to @require_torch_cuda to avoid naming collision? I can follow up on that in separate PR.

cc @ydshieh on this! 🤗

Hi @dvrogozh I am not sure I follow with was previously made non-CUDA specific by 11c27dd. Is the commit link is the correct one?

Is the commit link is the correct one?

Yes, that's correct commit and link. Unfortunately it's a huge commit with multiple changes inside and one of them was to make @require_torch_multi_gpu non-cuda specific. See a change in src/transformers/testing_utils.py, line 743 in old code, line 788 in new code, Commit changed torch.cuda.device_count() to device_count and the latter is calculated via newly added function (which works for CUDA and XPU at the moment):

transformers/src/transformers/testing_utils.py

Line 238 in 18e896b

def get_device_count():

Currently @require_torch_multi_gpu looks like this and is not CUDA specific (after 11c27dd):

transformers/src/transformers/testing_utils.py

Lines 769 to 781 in 18e896b

def require_torch_multi_gpu(test_case):

"""

Decorator marking a test that requires a multi-GPU setup (in PyTorch). These tests are skipped on a machine without

multiple GPUs.

To run *only* the multi_gpu tests, assuming all test names contain multi_gpu: $ pytest -sv ./tests -k "multi_gpu"

"""

if not is_torch_available():

return unittest.skip(reason="test requires PyTorch")(test_case)

device_count = get_device_count()

return unittest.skipUnless(device_count > 1, "test requires multiple GPUs")(test_case)

Note that @require_torch_gpu is still CUDA-specific:

transformers/src/transformers/testing_utils.py

Lines 984 to 986 in 18e896b

def require_torch_gpu(test_case):

"""Decorator marking a test that requires CUDA and PyTorch."""

return unittest.skipUnless(torch_device == "cuda", "test requires CUDA")(test_case)

I am not sure I follow with was previously made non-CUDA specific by 11c27dd.

The point I am trying to make is that @require_torch_multi_gpu and @require_torch_gpu are named similarly, but diverged (after 11c27dd) in what they actually signify which leads to confusion. My proposal is to rename @require_torch_gpu to @require_torch_cuda.

We have a series of work regarding device agnostic, and require_torch_accelerator, require_torch_multi_accelerator etc are added for device agnostic. The ``require_torch_xxx_gpu` should not be modified IMO, but I guess @Titus-von-Koelle wasn't realizing the existing methods #31098 is added at that time.

ArthurZucker

lgtm

ArthurZucker · 2024-12-16T12:53:11Z

tests/test_modeling_common.py

@@ -3046,6 +3046,7 @@ def test_multi_gpu_data_parallel_forward(self):
            with torch.no_grad():
                _ = model(**self._prepare_for_class(inputs_dict, model_class))

+    @require_torch_gpu


cc @ydshieh on this! 🤗

`parallelize()` API is deprecated in favor of accelerate's `device_map="auto"` and therefore is not accepting new features. At the same time `parallelize()` implementation is currently CUDA-specific. This commit marks respective ci tests with `@require_torch_gpu`. Fixes: huggingface#35252 Signed-off-by: Dmitry Rogozhkin <[email protected]>

HuggingFaceDocBuilderDev · 2024-12-24T16:49:30Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ydshieh

LGTM. The discussed topic about (potential) naming etc. should be in a follow PR

`parallelize()` API is deprecated in favor of accelerate's `device_map="auto"` and therefore is not accepting new features. At the same time `parallelize()` implementation is currently CUDA-specific. This commit marks respective ci tests with `@require_torch_gpu`. Fixes: huggingface#35252 Signed-off-by: Dmitry Rogozhkin <[email protected]>

dvrogozh commented Dec 13, 2024

View reviewed changes

This was referenced Dec 13, 2024

gpt2: enable model_parallel for xpu backend #35253

Closed

xpu: parallelize() not supported for PyTorch XPU backend #35252

Closed

ArthurZucker approved these changes Dec 16, 2024

View reviewed changes

dvrogozh force-pushed the tests branch from 4aadd0b to 5a5308e Compare December 16, 2024 18:20

SunMarc requested a review from ydshieh December 23, 2024 12:03

ydshieh approved these changes Jan 7, 2025

View reviewed changes

ydshieh merged commit 9fd123a into huggingface:main Jan 7, 2025
27 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: mark model_parallel tests as cuda specific #35269

ci: mark model_parallel tests as cuda specific #35269

dvrogozh commented Dec 13, 2024

dvrogozh Dec 13, 2024

ArthurZucker Dec 16, 2024

ydshieh Jan 6, 2025

dvrogozh Jan 6, 2025

ydshieh Jan 7, 2025 •

edited

Loading

ArthurZucker left a comment

ArthurZucker Dec 16, 2024

HuggingFaceDocBuilderDev commented Dec 24, 2024

ydshieh left a comment

	def require_torch_multi_gpu(test_case):
	"""
	Decorator marking a test that requires a multi-GPU setup (in PyTorch). These tests are skipped on a machine without
	multiple GPUs.

	To run only the multi_gpu tests, assuming all test names contain multi_gpu: $ pytest -sv ./tests -k "multi_gpu"
	"""
	if not is_torch_available():
	return unittest.skip(reason="test requires PyTorch")(test_case)

	device_count = get_device_count()

	return unittest.skipUnless(device_count > 1, "test requires multiple GPUs")(test_case)

	def require_torch_gpu(test_case):
	"""Decorator marking a test that requires CUDA and PyTorch."""
	return unittest.skipUnless(torch_device == "cuda", "test requires CUDA")(test_case)

ci: mark model_parallel tests as cuda specific #35269

ci: mark model_parallel tests as cuda specific #35269

Conversation

dvrogozh commented Dec 13, 2024

dvrogozh Dec 13, 2024

Choose a reason for hiding this comment

ArthurZucker Dec 16, 2024

Choose a reason for hiding this comment

ydshieh Jan 6, 2025

Choose a reason for hiding this comment

dvrogozh Jan 6, 2025

Choose a reason for hiding this comment

ydshieh Jan 7, 2025 • edited Loading

Choose a reason for hiding this comment

ArthurZucker left a comment

Choose a reason for hiding this comment

ArthurZucker Dec 16, 2024

Choose a reason for hiding this comment

HuggingFaceDocBuilderDev commented Dec 24, 2024

ydshieh left a comment

Choose a reason for hiding this comment

ydshieh Jan 7, 2025 •

edited

Loading