forked from espnet/espnet
Merge pull request espnet#5856 from jctian98/deepspeed
Add DeepSpeed trainer for large-scale training
Showing 11 changed files with 504 additions and 5 deletions.
@@ -0,0 +1,39 @@
```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 1,
  "gradient_clipping": 1.0,
  "bf16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "contiguous_gradients": true,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "allgather_bucket_size": 5e8
  },
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.001,
      "betas": [
        0.9,
        0.95
      ],
      "eps": 1e-8,
      "weight_decay": 3e-7,
      "adam_w_mode": true
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": 0.0001,
      "warmup_num_steps": 30000
    }
  },
  "wall_clock_breakdown": false,
  "steps_per_print": 1000
}
```
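The `scheduler` section of the config above ramps the learning rate from `warmup_min_lr` to `warmup_max_lr` over `warmup_num_steps` and then holds it. The following is a minimal sketch of that shape, not DeepSpeed's own code. The function name `warmup_lr` and the `warmup_type` argument are illustrative assumptions: DeepSpeed's scheduler documentation describes log and linear ramp types, the config does not set one, and the ramp that actually applies depends on the installed version's default.

```python
import math


def warmup_lr(step, min_lr=0.0, max_lr=1e-4, warmup_steps=30000, warmup_type="log"):
    """Illustrative warmup schedule using the values from the config above.

    Hypothetical helper: it mirrors the documented shape of a warmup
    schedule but is not taken from the DeepSpeed source.
    """
    if step >= warmup_steps:
        # After warmup the learning rate stays at its maximum.
        return max_lr
    if warmup_type == "log":
        # Logarithmic ramp: fast growth early, flattening near the end.
        gamma = math.log(step + 1) / math.log(warmup_steps)
    else:
        # Linear ramp: proportional to the fraction of warmup completed.
        gamma = step / warmup_steps
    return min_lr + (max_lr - min_lr) * gamma


if __name__ == "__main__":
    for s in (0, 100, 1000, 10000, 30000, 50000):
        print(s, warmup_lr(s), warmup_lr(s, warmup_type="linear"))
```

Under this sketch the rate never exceeds `warmup_max_lr` (0.0001), which is lower than the optimizer's `lr` (0.001). How the two values interact in a real run should be checked against the DeepSpeed documentation.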
@@ -0,0 +1,64 @@
```yaml
# A toy example of how DeepSpeed is used in ESPnet.
# With DeepSpeed, users only need to specify the model- and dataloader-related items here.
# Other settings should be specified in the deepspeed_config file, such as:
# * optimization
# * training dtype or automatic mixed precision (AMP) setup
# * gradient accumulation
# * gradient clipping
# * model saving and loading
# * learning rate scheduler
# * ...
#
# DeepSpeed also provides advanced trainer features that make it easier to train
# very large models, such as:
# * ZeRO-1/2/3 optimization
# * parameter offload
# * activation checkpointing
# * ...
#
# The provided conf/deepspeed_zero2.json covers only a simple DeepSpeed use case.
# Depending on the model architecture and cluster characteristics, advanced users are
# encouraged to tune the config file following the official documentation:
# https://deepspeed.readthedocs.io/en/latest/
#
# Note: batch-size settings are determined by the ESPnet dataloader configuration,
# not by the values specified in the DeepSpeed config.
#
# Before training with DeepSpeed, make sure it has been installed.
# DeepSpeed compiles some torch extensions the first time they are used, so make sure
# ${CUDA_HOME} is set in your environment and points to a complete CUDA installation
# that is compatible with the CUDA version of your PyTorch build. Compatibility only
# requires matching the major CUDA version, e.g., CUDA 11.x releases are compatible
# with each other.

use_deepspeed: true
deepspeed_config: conf/deepspeed_zero2.json

batch_type: folded
batch_size: 64
max_epoch: 200

encoder: transformer
encoder_conf:
    output_size: 256
    attention_heads: 4
    linear_units: 2048
    num_blocks: 12
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.0
    input_layer: conv2d
    normalize_before: true

decoder: transformer
decoder_conf:
    attention_heads: 4
    linear_units: 2048
    num_blocks: 6
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    self_attention_dropout_rate: 0.0
    src_attention_dropout_rate: 0.0

model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1
    length_normalized_loss: false
```
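The comments in the training config ask for three things before the first run: DeepSpeed is installed, `${CUDA_HOME}` is set, and the CUDA toolkit matches PyTorch's CUDA build at the major-version level. A minimal pre-flight sketch of those checks follows. The helper names `cuda_major`, `same_cuda_major`, and `preflight` are hypothetical and not part of ESPnet or DeepSpeed, and the version strings are passed in by hand (they could be read from `torch.version.cuda` and `nvcc --version`).

```python
import importlib.util
import os


def cuda_major(version):
    """Return the major component of a CUDA version string such as '11.8'."""
    return int(version.split(".")[0])


def same_cuda_major(toolkit_version, torch_cuda_version):
    """Apply the rule from the config comments: only the major version must match."""
    return cuda_major(toolkit_version) == cuda_major(torch_cuda_version)


def preflight():
    """Collect basic environment facts without importing torch or deepspeed."""
    return {
        # find_spec returns None when the top-level package is not installed.
        "deepspeed_installed": importlib.util.find_spec("deepspeed") is not None,
        # None here means CUDA_HOME is unset, so extension builds may fail.
        "cuda_home": os.environ.get("CUDA_HOME"),
    }


if __name__ == "__main__":
    print(preflight())
    print(same_cuda_major("11.8", "11.3"))
    print(same_cuda_major("12.1", "11.8"))
```

The sketch only reports what it finds. Whether a given toolkit and PyTorch pair actually builds the extensions still has to be confirmed on the target machine.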
@@ -0,0 +1,39 @@
```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 1,
  "gradient_clipping": 1.0,
  "bf16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "contiguous_gradients": true,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "allgather_bucket_size": 5e8
  },
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.001,
      "betas": [
        0.9,
        0.95
      ],
      "eps": 1e-8,
      "weight_decay": 3e-7,
      "adam_w_mode": true
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": 0.0001,
      "warmup_num_steps": 30000
    }
  },
  "wall_clock_breakdown": false,
  "steps_per_print": 1000
}
```
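When tuning a config like the one above, it can help to load it and print the handful of fields that matter most before launching a job. The sketch below does that with Python's standard `json` module. The `summarize` helper and the trimmed `CONFIG_TEXT` are illustrative assumptions, not part of ESPnet or DeepSpeed. Note that `json` parses `5e8` as a float, so the sketch casts bucket sizes to `int` for display.

```python
import json

# Trimmed copy of the config above, kept inline so the sketch is self-contained.
CONFIG_TEXT = """
{
  "gradient_accumulation_steps": 1,
  "bf16": {"enabled": true},
  "zero_optimization": {
    "stage": 2,
    "reduce_bucket_size": 5e8,
    "allgather_bucket_size": 5e8
  },
  "optimizer": {"type": "Adam", "params": {"lr": 0.001}},
  "scheduler": {"type": "WarmupLR", "params": {"warmup_max_lr": 0.0001}}
}
"""


def summarize(config_text):
    """Return the key training settings from a DeepSpeed-style JSON config."""
    cfg = json.loads(config_text)
    zero = cfg.get("zero_optimization", {})
    return {
        "zero_stage": zero.get("stage", 0),
        "bf16": cfg.get("bf16", {}).get("enabled", False),
        # Scientific notation parses as float; cast for a readable count.
        "reduce_bucket_size": int(zero.get("reduce_bucket_size", 0)),
        "grad_accum_steps": cfg.get("gradient_accumulation_steps", 1),
        # Surface both rates so a mismatch between them is easy to spot.
        "optimizer_lr": cfg["optimizer"]["params"]["lr"],
        "scheduler_max_lr": cfg["scheduler"]["params"]["warmup_max_lr"],
    }


summary = summarize(CONFIG_TEXT)

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value}")
```

For a file on disk, the same helper can be called with the text read from the path given as `deepspeed_config` in the training YAML.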