diff --git a/README.md b/README.md
index 31051297d10c..5f5452ce232d 100755
--- a/README.md
+++ b/README.md
@@ -121,7 +121,7 @@ DeepSpeed has been integrated with several different popular open-source DL fram
 | | Documentation |
 | ---------------------------------------------------------------------------------------------- | -------------------------------------------- |
-| | [Transformers with DeepSpeed](https://huggingface.co/docs/transformers/main/main_classes/deepspeed) |
+| | [Transformers with DeepSpeed](https://huggingface.co/docs/transformers/deepspeed) |
 | | [Accelerate with DeepSpeed](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) |
 | | [Lightning with DeepSpeed](https://lightning.ai/docs/pytorch/stable/advanced/model_parallel.html#deepspeed) |
 | | [MosaicML with DeepSpeed](https://docs.mosaicml.com/projects/composer/en/latest/trainer/using_the_trainer.html?highlight=deepspeed#deepspeed-integration) |
diff --git a/blogs/deepspeed-chat/japanese/README.md b/blogs/deepspeed-chat/japanese/README.md
index 23ead99f5c8d..3ec570a9ea40 100644
--- a/blogs/deepspeed-chat/japanese/README.md
+++ b/blogs/deepspeed-chat/japanese/README.md
@@ -332,7 +332,7 @@ DeepSpeedは、以下のような機能を提供します。
 DeepSpeedは、Microsoftの[AI at Scale initiative](https://www.microsoft.com/en-us/research/project/ai-at-scale/)の一部で、次世代AIの機能の大規模な実現を進めています。詳細は[こちら](https://innovation.microsoft.com/en-us/exploring-ai-at-scale)をご覧ください。DeepSpeedは、[Megatron-Turing NLG (530B)](https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/), [Jurassic-1 (178B)](https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf), [BLOOM (176B)](https://huggingface.co/blog/bloom-megatron-deepspeed), [GLM (130B)](https://github.com/THUDM/GLM-130B), [YaLM (100B)](https://github.com/yandex/YaLM-100B) を含め、様々な大規模モデルを学習するのに使用されてきました。
 
-またDeepSpeedは、 [Hugging Face Transformers](https://huggingface.co/docs/transformers/main/main_classes/deepspeed), [Hugging Face Accelerate](https://huggingface.co/docs/accelerate/usage_guides/deepspeed), [PyTorch Lightning](https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.strategies.DeepSpeedStrategy.html), [MosaicML Composer](https://docs.mosaicml.com/en/latest/trainer/using_the_trainer.html?highlight=deepspeed#deepspeed-integration), [Determined AI](https://docs.determined.ai/latest/training/apis-howto/deepspeed/overview.html) など、多くの著名なオープンソースの深層学習フレームワークのバックエンドとして利用されています。
+またDeepSpeedは、 [Hugging Face Transformers](https://huggingface.co/docs/transformers/deepspeed), [Hugging Face Accelerate](https://huggingface.co/docs/accelerate/usage_guides/deepspeed), [PyTorch Lightning](https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.strategies.DeepSpeedStrategy.html), [MosaicML Composer](https://docs.mosaicml.com/en/latest/trainer/using_the_trainer.html?highlight=deepspeed#deepspeed-integration), [Determined AI](https://docs.determined.ai/latest/training/apis-howto/deepspeed/overview.html) など、多くの著名なオープンソースの深層学習フレームワークのバックエンドとして利用されています。
 
 DeepSpeedについてのより詳しい情報は、以下をご覧ください。
diff --git a/deepspeed/autotuning/README.md b/deepspeed/autotuning/README.md
index b1fa435364d2..fc76ed1e9f8d 100755
--- a/deepspeed/autotuning/README.md
+++ b/deepspeed/autotuning/README.md
@@ -336,7 +336,7 @@ The Autotuner stops exploring the space when any of the following conditions mee
 ## Using Autotuning with Hugging Face
 
-Hugging Face users can set some configurations values to ["auto"](https://huggingface.co/transformers/main_classes/deepspeed.html?highlight=gradient_accumulation_steps#shared-configuration).
+Hugging Face users can set some configuration values to ["auto"](https://huggingface.co/docs/transformers/deepspeed#deepspeed-and-trainer-parameters).
 `"auto"` means the value will be set to the default in Hugging Face or be overwritten using the supplied values from the command line arguments.
 In DeepSpeed Autotuning, if the user-provided DeepSpeed configuration file has "auto" keywords, they are treated as the value "auto".
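For reference, the `"auto"` mechanism described in the autotuning hunk above is easiest to see in a concrete configuration. The sketch below is illustrative only and not part of the patch: the particular fields and the `ds_config_auto.json` file name are assumptions, but each `"auto"` entry is one that the Hugging Face Trainer fills in from its own command-line arguments and that DeepSpeed Autotuning treats as tunable.

```python
# Illustrative sketch: a DeepSpeed config that leans on "auto" values, written out as JSON.
# Field choices and the file name are assumptions for this example, not project defaults.
import json

ds_config = {
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "fp16": {"enabled": "auto"},
    "zero_optimization": {"stage": 2},
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": "auto", "weight_decay": "auto"},
    },
}

with open("ds_config_auto.json", "w") as f:
    json.dump(ds_config, f, indent=2)

# A Hugging Face user would then pass this file to their training script,
# e.g. via the `--deepspeed ds_config_auto.json` argument handled by the Trainer.
```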
set some configurations values to ["auto"](https://huggingface.co/transformers/main_classes/deepspeed.html?highlight=gradient_accumulation_steps#shared-configuration). +Hugging Face users can set some configurations values to ["auto"](https://huggingface.co/docs/transformers/deepspeed#deepspeed-and-trainer-parameters). `"auto"` means the value will be set to the default in Hugging Face or be overwritten using the supplied values from the command line arguments. In DeepSpeed Autotuning, if the user-provided DeepSpeed configuration file has "auto" keywords, they are treated as the value "auto". diff --git a/docs/_tutorials/getting-started.md b/docs/_tutorials/getting-started.md index ce9e3ee9a892..36dcdf41d9d0 100644 --- a/docs/_tutorials/getting-started.md +++ b/docs/_tutorials/getting-started.md @@ -9,7 +9,7 @@ tags: getting-started * Installing is as simple as `pip install deepspeed`, [see more details](/tutorials/advanced-install/). * To get started with DeepSpeed on AzureML, please see the [AzureML Examples GitHub](https://github.com/Azure/azureml-examples/tree/main/cli/jobs/deepspeed) -* DeepSpeed has direct integrations with [HuggingFace Transformers](https://github.com/huggingface/transformers) and [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning). HuggingFace Transformers users can now easily accelerate their models with DeepSpeed through a simple ``--deepspeed`` flag + config file [See more details](https://huggingface.co/docs/transformers/main_classes/deepspeed). PyTorch Lightning provides easy access to DeepSpeed through the Lightning Trainer [See more details](https://pytorch-lightning.readthedocs.io/en/stable/advanced/multi_gpu.html?highlight=deepspeed#deepspeed). +* DeepSpeed has direct integrations with [HuggingFace Transformers](https://github.com/huggingface/transformers) and [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning). HuggingFace Transformers users can now easily accelerate their models with DeepSpeed through a simple ``--deepspeed`` flag + config file [See more details](https://huggingface.co/docs/transformers/deepspeed). PyTorch Lightning provides easy access to DeepSpeed through the Lightning Trainer [See more details](https://pytorch-lightning.readthedocs.io/en/stable/advanced/multi_gpu.html?highlight=deepspeed#deepspeed). * DeepSpeed on AMD can be used via our [ROCm images](https://hub.docker.com/r/deepspeed/rocm501/tags), e.g., `docker pull deepspeed/rocm501:ds060_pytorch110`. * DeepSpeed also supports Intel Xeon CPU, Intel Data Center Max Series XPU, Intel Gaudi HPU, Huawei Ascend NPU etc, please refer to the [accelerator setup guide](/tutorials/accelerator-setup-guide/) diff --git a/docs/_tutorials/zero.md b/docs/_tutorials/zero.md index c84339ece9e5..0bb95cbcddd8 100644 --- a/docs/_tutorials/zero.md +++ b/docs/_tutorials/zero.md @@ -13,7 +13,7 @@ ZeRO leverages the aggregate computation and memory resources of data parallelis * **Stage 1**: The optimizer states (e.g., for [Adam optimizer](https://arxiv.org/abs/1412.6980), 32-bit weights, and the first, and second moment estimates) are partitioned across the processes, so that each process updates only its partition. -* **Stage 2**: The reduced 32-bit gradients for updating the model weights are also partitioned such that each process retains only the gradients corresponding to its portion of the optimizer states. 
diff --git a/docs/_tutorials/zero.md b/docs/_tutorials/zero.md
index c84339ece9e5..0bb95cbcddd8 100644
--- a/docs/_tutorials/zero.md
+++ b/docs/_tutorials/zero.md
@@ -13,7 +13,7 @@ ZeRO leverages the aggregate computation and memory resources of data parallelis
 * **Stage 1**: The optimizer states (e.g., for [Adam optimizer](https://arxiv.org/abs/1412.6980), 32-bit weights, and the first, and second moment estimates) are partitioned across the processes, so that each process updates only its partition.
 
-* **Stage 2**: The reduced 32-bit gradients for updating the model weights are also partitioned such that each process retains only the gradients corresponding to its portion of the optimizer states.
+* **Stage 2**: The reduced 16-bit gradients for updating the model weights are also partitioned such that each process retains only the gradients corresponding to its portion of the optimizer states.
 
 * **Stage 3**: The 16-bit model parameters are partitioned across the processes. ZeRO-3 will automatically collect and partition them during the forward and backward passes.
diff --git a/docs/code-docs/source/zero3.rst b/docs/code-docs/source/zero3.rst
index aa8139a654a1..a24313cadb7a 100644
--- a/docs/code-docs/source/zero3.rst
+++ b/docs/code-docs/source/zero3.rst
@@ -10,7 +10,7 @@ communication efficiency.
 #. **ZeRO Stage 1**: The optimizer states (e.g., for `Adam optimizer <https://arxiv.org/abs/1412.6980>`_, 32-bit weights, and the first, and second moment estimates) are partitioned across the processes, so that each process updates only its partition.
 
-#. **ZeRO Stage 2**: The reduced 32-bit gradients for updating the model weights are also partitioned such that each process retains only the gradients corresponding to its portion of the optimizer states.
+#. **ZeRO Stage 2**: The reduced 16-bit gradients for updating the model weights are also partitioned such that each process retains only the gradients corresponding to its portion of the optimizer states.
 
 #. **ZeRO Stage 3**: The 16-bit model parameters are partitioned across the processes. ZeRO-3 will automatically collect and partition them during the forward and backward passes.
diff --git a/docs/index.md b/docs/index.md
index 1efdcea132d2..3279682b42d4 100755
--- a/docs/index.md
+++ b/docs/index.md
@@ -94,7 +94,7 @@ DeepSpeed has been integrated with several different popular open-source DL fram
 | | Documentation |
 | ---------------------------------------------------------------------------------------------- | -------------------------------------------- |
-| | [Transformers with DeepSpeed](https://huggingface.co/docs/transformers/main/main_classes/deepspeed) |
+| | [Transformers with DeepSpeed](https://huggingface.co/docs/transformers/deepspeed) |
 | | [Accelerate with DeepSpeed](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) |
 | | [Lightning with DeepSpeed](https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.strategies.DeepSpeedStrategy.html) |
 | | [MosaicML with DeepSpeed](https://docs.mosaicml.com/en/latest/trainer/using_the_trainer.html?highlight=deepspeed#deepspeed-integration) |
diff --git a/examples/README.md b/examples/README.md
index 5dfc26c17613..14393ef4545d 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -5,5 +5,5 @@ If you are looking for examples using DeepSpeed please see the following resourc
 1. [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples)
 2. [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed)
 3. [DeepSpeed + AzureML](https://github.com/Azure/azureml-examples/tree/main/v1/python-sdk/workflows/train/deepspeed)
-4. [DeepSpeed + Hugging Face Transformers Integration](https://huggingface.co/docs/transformers/main_classes/deepspeed)
+4. [DeepSpeed + Hugging Face Transformers Integration](https://huggingface.co/docs/transformers/deepspeed)
 5. [DeepSpeed + PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.utilities.deepspeed.html)
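Finally, the ZeRO stage descriptions corrected above (Stage 1: optimizer states, Stage 2: reduced 16-bit gradients, Stage 3: 16-bit parameters) map directly onto the `zero_optimization.stage` field of a DeepSpeed config. The helper below is an illustrative sketch and not part of the patch or of the DeepSpeed API; its name and the placeholder values are assumptions.

```python
# Illustrative sketch: map a ZeRO stage number onto a minimal DeepSpeed config dict.
# The helper name and the placeholder values are assumptions for this example.
import json

def zero_config(stage: int) -> dict:
    """Return a minimal DeepSpeed config for ZeRO stage 1, 2, or 3."""
    if stage not in (1, 2, 3):
        raise ValueError("ZeRO stage must be 1, 2, or 3")
    return {
        "train_micro_batch_size_per_gpu": 4,   # placeholder batch settings
        "gradient_accumulation_steps": 1,
        "fp16": {"enabled": True},             # 16-bit training assumed by the stage descriptions
        "zero_optimization": {
            # stage 1: partition optimizer states; stage 2: also partition gradients;
            # stage 3: also partition the 16-bit model parameters
            "stage": stage,
        },
    }

print(json.dumps(zero_config(3), indent=2))

# Such a dict (or the JSON file it is dumped to) is what gets passed to
# deepspeed.initialize(model=..., model_parameters=..., config=...) or to the
# framework integrations touched by the patch above.
```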