docs: minor spelling tweaks (#623)
Co-authored-by: Jeff Rasley <[email protected]>
brettkoonce and jeffra authored Jan 5, 2021
1 parent d38ad6a commit 46d2e28
Showing 6 changed files with 10 additions and 10 deletions.
8 changes: 4 additions & 4 deletions docs/_pages/features.md
@@ -79,7 +79,7 @@ DeepSpeed.

### Optimizer State and Gradient Partitioning
Optimizer State and Gradient Partitioning in ZeRO reduces the memory consumption of the
-model states (optimizer states, gradients and parmaeters) by 8x compared to standard
+model states (optimizer states, gradients and parameters) by 8x compared to standard
data parallelism by partitioning these states across data parallel process instead of
replicating them.
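
As an illustrative aside (not part of this commit's diff), a minimal sketch of the kind of config fragment that enables this behavior; ZeRO stage 2 covers optimizer state and gradient partitioning, and exact field names should be checked against the config documentation.

```python
# Minimal sketch, assuming standard ZeRO config keys; values are placeholders.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                   # 1 = optimizer states, 2 = optimizer states + gradients
        "contiguous_gradients": True  # reduce memory fragmentation during backward
    }
}
```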

@@ -150,8 +150,8 @@ Please see the [core API doc](https://deepspeed.readthedocs.io/) for more detail

### Activation Checkpointing API

-DeepSpeed's Activation Checkpoinitng API supports activation checkpoint partitioning,
-cpu checkpoiniting, and contiguous memory optimizations, while also allowing layerwise
+DeepSpeed's Activation Checkpointing API supports activation checkpoint partitioning,
+cpu checkpointing, and contiguous memory optimizations, while also allowing layerwise
profiling. Please see the [core API doc](https://deepspeed.readthedocs.io/) for more details.
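
As a hedged illustration (not from the diffed file), roughly how the checkpointing API can be dropped into a model; exact signatures, and the knobs for partitioning, CPU checkpointing, contiguous memory, and profiling, are documented in the core API doc linked above.

```python
# Rough sketch, assuming deepspeed.checkpointing.checkpoint behaves as a
# drop-in for torch.utils.checkpoint.checkpoint; a prior call to
# deepspeed.checkpointing.configure(...) may be required (see the API doc).
import torch
import torch.nn as nn
import deepspeed

class CheckpointedBlock(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x):
        # Activations inside this block are recomputed during backward
        # instead of being stored through the whole forward pass.
        return deepspeed.checkpointing.checkpoint(
            lambda y: self.fc2(torch.relu(self.fc1(y))), x
        )
```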


@@ -190,7 +190,7 @@ NVIDIA, or any training optimizer that extends torch's `torch.optim.Optimizer` c
We introduce an efficient implementation of Adam optimizer on CPU that improves the parameter-update
performance by nearly an order of magnitude. We use the AVX SIMD instructions on Intel-x86 architecture
for the CPU-Adam implementation. We support both AVX-512 and AVX-2 instruction sets. DeepSpeed uses
-AVX-2 by defualt which can be switched to AVX-512 by setting the build flag, `DS_BUILD_AVX512` to 1 when
+AVX-2 by default which can be switched to AVX-512 by setting the build flag, `DS_BUILD_AVX512` to 1 when
installing DeepSpeed. Using AVX-512, we observe 5.1x to 6.5x speedups considering the model-size between
1 to 10 billion parameters with respect to torch-adam.
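
For illustration only (not part of the diff), a sketch of driving the CPU Adam optimizer directly; `DeepSpeedCPUAdam` is the assumed entry point here, and the AVX-512 code path would be selected at install time via the `DS_BUILD_AVX512=1` build flag described above.

```python
# Sketch assuming deepspeed.ops.adam.DeepSpeedCPUAdam; parameters live on the
# CPU, so the vectorized Adam update described above runs there.
import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam

model = torch.nn.Linear(1024, 1024)                        # CPU parameters
optimizer = DeepSpeedCPUAdam(model.parameters(), lr=1e-3)

loss = model(torch.randn(8, 1024)).sum()
loss.backward()
optimizer.step()                                           # CPU-side Adam update
```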

2 changes: 1 addition & 1 deletion docs/_posts/2020-09-08-sparse-attention-news.md
@@ -12,4 +12,4 @@ DeepSpeed offers sparse attention kernels, an instrumental technology to support
* Brief overview, see our [press release]({{ site.press_release_v3 }}).
* Detailed technology deep dive, see our [blog post](https://www.deepspeed.ai/news/2020/09/08/sparse-attention.html).
* Tutorial on how to use sparse attention, see our [Sparse attention tutorial](https://www.deepspeed.ai/tutorials/sparse-attention/).
-* The source code for our sparse attention kernels can be found in the [DeepSpeed repo](https://github.com/microsoft/deepspeed) and BERT pre-training code useing sparse attention can be found in the [DeepSpeedExamples repo](https://github.com/microsoft/deepspeedexamples).
+* The source code for our sparse attention kernels can be found in the [DeepSpeed repo](https://github.com/microsoft/deepspeed) and BERT pre-training code using sparse attention can be found in the [DeepSpeedExamples repo](https://github.com/microsoft/deepspeedexamples).
4 changes: 2 additions & 2 deletions docs/_tutorials/onebit-adam.md
@@ -120,7 +120,7 @@ Alternatively, we show how the standard `mpirun` launcher can be used for launch
mpirun -np [#processes] -ppn [#GPUs on each node] -hostfile [hostfile] [MPI flags] bash run_squad_mpi_onebitadam.sh
```
-For example, in order to use 32 GPUs (4GPUs/node, 8 nodes in total), with the support of InfiniBand, you can use the `mpirun` launcher packaged with the MVAPICH2 library. Please run the folowing command:
+For example, in order to use 32 GPUs (4GPUs/node, 8 nodes in total), with the support of InfiniBand, you can use the `mpirun` launcher packaged with the MVAPICH2 library. Please run the following command:
```shell
mpirun -np 32 -ppn 4 -hostfile hosts -env MV2_USE_CUDA=1 -env MV2_SUPPORT_DL=1 -env MV2_ENABLE_AFFINITY=0 -env MV2_SMP_USE_CMA=0 bash run_squad_mpi_onebitadam.sh
@@ -166,7 +166,7 @@ We fixed the learning rate to 3e-5. The table below shows the F1 and the EM scor
***Training Speed and Scalability:***
-1-bit Adam enables up to 2.7x overall speedup in training speed for SQuAD fine-tuning. This is made possible by up to 6.2x faster througput during the compressed stage of the algorithm as shown in Figure 1.
+1-bit Adam enables up to 2.7x overall speedup in training speed for SQuAD fine-tuning. This is made possible by up to 6.2x faster throughput during the compressed stage of the algorithm as shown in Figure 1.
![SQuAD Finetuning](/assets/images/squad-scaling.png){: .align-center}
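
As a hedged illustration (not part of the diffed tutorial), the rough shape of a config selecting 1-bit Adam; the values below are placeholders, with `freeze_step` standing in for the uncompressed warmup phase that precedes the compressed stage measured in Figure 1.

```python
# Illustrative config fragment, not the tutorial's actual settings.
ds_config = {
    "train_batch_size": 32,
    "optimizer": {
        "type": "OneBitAdam",
        "params": {
            "lr": 3e-5,          # matches the fixed learning rate noted above
            "freeze_step": 400,  # warmup steps before compression kicks in
            "cuda_aware": True   # assumes a CUDA-aware MPI stack such as MVAPICH2-GDR
        }
    },
    "fp16": {"enabled": True}
}
```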
2 changes: 1 addition & 1 deletion docs/_tutorials/pipeline.md
@@ -75,7 +75,7 @@ net = PipelineModule(layers=net, num_stages=2)
```
`PipelineModule` uses its `layers` argument as the sequence of layers that
comprise the model. After initialization, `net` is divided into two pipeline
-stages and its layers moved to the correpsonding GPUs. If more than two GPUs
+stages and its layers moved to the corresponding GPUs. If more than two GPUs
are present, DeepSpeed will also use hybrid data parallelism.
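
To make the flow above concrete, an illustrative sketch (not part of the diffed tutorial) of a model expressed as a list of layers and handed to `PipelineModule`:

```python
# Sketch only: assumes the script is launched with the deepspeed launcher so
# distributed initialization has already happened when PipelineModule is built.
import torch.nn as nn
from deepspeed.pipe import PipelineModule

layers = [
    nn.Linear(1024, 1024),
    nn.ReLU(),
    nn.Linear(1024, 10),
]
net = PipelineModule(layers=layers, num_stages=2)  # split across 2 pipeline stages
```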

**Note:** The total number of GPUs must be divisible by the number of pipeline
2 changes: 1 addition & 1 deletion docs/_tutorials/progressive_layer_dropping.md
@@ -95,7 +95,7 @@ Note that the above configuration assumes training on 64 X 32GB V100 GPUs. Each

Table 1. Pre-training hyperparameters

-**Note:** DeepSpeed now supports PreLayerNorm as the default way for training BERT, because of its ability to avoid vanishing gradient, stablize optimization, and performance gains, as described in our fastest BERT training [blog post](https://www.deepspeed.ai/news/2020/05/27/fastest-bert-training.html). We therefore support the switchable Transformer block directly on the the BERT with PreLayerNorm. The implementation can be found at "example\bing_bert\nvidia\modelingpreln_layerdrop.py".
+**Note:** DeepSpeed now supports PreLayerNorm as the default way for training BERT, because of its ability to avoid vanishing gradient, stabilize optimization, and performance gains, as described in our fastest BERT training [blog post](https://www.deepspeed.ai/news/2020/05/27/fastest-bert-training.html). We therefore support the switchable Transformer block directly on the the BERT with PreLayerNorm. The implementation can be found at "example\bing_bert\nvidia\modelingpreln_layerdrop.py".
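
For orientation only (not part of the diff), the assumed shape of the config fragment that switches progressive layer dropping on; key names and values here are guesses to be checked against the tutorial's full configuration.

```python
# Assumed config fragment; theta and gamma shape the keep-probability schedule
# and are placeholders rather than the tutorial's actual settings.
ds_config = {
    "progressive_layer_drop": {
        "enabled": True,
        "theta": 0.5,
        "gamma": 0.001
    }
}
```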

## Fine-tuning with DeepSpeed on GLUE Tasks

2 changes: 1 addition & 1 deletion docs/_tutorials/zero.md
@@ -79,7 +79,7 @@ Next, we need to update the DeepSpeed json configuration, as shown below, to ena
}
```

-In the above changes, we have set the _stage_ field to 2, and configured other optimization knobs that are available in ZeRO stage 2. For example, we have enabled _contiguous_gradients_ to reduce memory fragmenation during backward pass. A full description of these optimization knobs is available [here](/docs/config-json/#zero-optimizations-for-fp16-training). With these changes, we can now launch the training run.
+In the above changes, we have set the _stage_ field to 2, and configured other optimization knobs that are available in ZeRO stage 2. For example, we have enabled _contiguous_gradients_ to reduce memory fragmentation during backward pass. A full description of these optimization knobs is available [here](/docs/config-json/#zero-optimizations-for-fp16-training). With these changes, we can now launch the training run.
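
As a hedged sketch (not from the diffed tutorial), one way the updated JSON configuration typically reaches the training code, assuming it was saved as `ds_config.json` and the script is started with the `deepspeed` launcher:

```python
# Sketch only: wiring a saved ZeRO stage 2 config into deepspeed.initialize.
import argparse
import torch
import deepspeed

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1)
parser = deepspeed.add_config_arguments(parser)  # adds --deepspeed_config, etc.
args = parser.parse_args()                       # e.g. --deepspeed_config ds_config.json

model = torch.nn.Linear(1024, 1024)
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args, model=model, model_parameters=model.parameters()
)
```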

Here is a screenshot of the training log:

