
[docs] MPS #28016

Merged
merged 3 commits into huggingface:main from mps-docs on Dec 15, 2023

Conversation

stevhliu
Member

As a part of a larger effort to clean up the Trainer API docs in #27986, this PR moves the Trainer for accelerated PyTorch training on Mac section to the currently empty Training on Specialized Hardware page.

Other updates include lightly rewriting the section so it doesn't read as though it were copied directly from the blog post, and removing the link to the paywalled setup article 🙂

@stevhliu stevhliu requested a review from amyeroberts December 13, 2023 20:17
Collaborator

@amyeroberts left a comment

Very nice - thanks for reworking and tidying up!

Note: Most of the strategies introduced in the [single GPU section](perf_train_gpu_one) (such as mixed precision training or gradient accumulation) and the [multi-GPU section](perf_train_gpu_many) are generic and apply to model training in general, so make sure to have a look at them before diving into this section.
<Tip warning={true}>

Some PyTorch operations are not implemented in MPS yet and will throw an error. To avoid this, you should set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU kernels instead (you'll still see a `UserWarning`).
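
For reference, a minimal sketch of how that fallback is typically enabled, assuming an Apple silicon machine with an MPS-enabled PyTorch build (the variable has to be set before `torch` is imported):

```python
# Minimal sketch: enable CPU fallback for ops that have no MPS kernel yet.
# PYTORCH_ENABLE_MPS_FALLBACK must be set before torch is imported.
import os

os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch

x = torch.randn(2, 3, device="mps")
# Ops without an MPS implementation now run on the CPU instead of raising an
# error (a UserWarning is still emitted for each fallback).
```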
Collaborator


Is there a way to have the `Trainer` just use the CPU entirely and ignore the MPS backend?

Member Author


I think you can set `use_cpu=True` here, but cc'ing @pacman100 who'll know more about it 🙂

use_cpu: bool = field(
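
A minimal sketch of what that would look like, assuming a standard `Trainer` setup (the other arguments here are placeholders):

```python
from transformers import TrainingArguments

# use_cpu=True forces CPU-only training, so the MPS (or CUDA) device is ignored.
args = TrainingArguments(
    output_dir="out",  # placeholder output directory
    use_cpu=True,
)
```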

Comment on lines -575 to -427
2. Distributed setups `gloo` and `nccl` are not working with the `mps` device.
This means that currently only a single GPU of the `mps` device type can be used.
Collaborator


Is this no longer the case?

Member Author


I believe it's still true; I didn't see `mps` among the supported backends for `torch.distributed` (mentioned in the second-to-last paragraph of the new doc).
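
A quick sanity check of the available `torch.distributed` backends on macOS might look like this (a rough sketch):

```python
import torch.distributed as dist

# gloo and nccl are the usual backends; neither targets the mps device,
# so distributed training on Apple silicon isn't currently supported.
print(dist.is_available())       # distributed support compiled into this build?
print(dist.is_gloo_available())  # CPU backend
print(dist.is_nccl_available())  # NVIDIA GPU backend; False on macOS
```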

@stevhliu stevhliu merged commit ebfdb9c into huggingface:main Dec 15, 2023
8 checks passed
@stevhliu stevhliu deleted the mps-docs branch December 15, 2023 21:17
iantbutler01 pushed a commit to BismuthCloud/transformers that referenced this pull request Dec 16, 2023
* mps docs

* toctree
staghado pushed a commit to staghado/transformers that referenced this pull request Jan 15, 2024
* mps docs

* toctree