Merge with latest upstream #5

Open · wants to merge 80 commits into base: main

80 commits
27d7865
fix circular dependencies when using Lora (#284)
zhangsheng377 Nov 9, 2023
6824e31
Universal ckp fixes (#276)
mosheisland Nov 9, 2023
4c0bc7f
Enable the combination of sequence length warmup and RoPE (#285)
conglongli Nov 9, 2023
e7f0201
Add the __init__.py for including the vision folder in installation p…
yuanwu2017 Nov 14, 2023
8760390
Change the text name of sample for compatible with Huggingface traine…
yuanwu2017 Nov 14, 2023
15355af
Enable the args.deepspeed_config to use dict type (#290)
yuanwu2017 Nov 14, 2023
155ce98
Add tgs metrics (#286)
CokeDong Nov 14, 2023
37050b8
Fix lm_eval_harness for GPT models (#292)
conglongli Nov 15, 2023
2348eed
universal-ckp: support llama model (#287)
mosheisland Nov 16, 2023
8415d03
fuse q and kve parameters for qga case (#291)
RezaYazdaniAminabadi Nov 21, 2023
b93495a
fix dropout of flash attention (#295)
tohtana Nov 21, 2023
7ca477d
fix typo error (#299)
inkcherry Nov 27, 2023
61d5d61
fix reshape for split qga (#307)
inkcherry Nov 30, 2023
bdef2b0
support huggingface tokenizer (#306)
zhangjian94cn Dec 5, 2023
aad7ad9
add RMSnorm torch fallback path (#312)
inkcherry Dec 6, 2023
b2b8b01
Update Universal Checkpointing README visualization PR (#314)
lekurile Dec 12, 2023
71e8407
Universal Checkpoint for Sequence Parallelism (#305)
samadejacobs Dec 14, 2023
d65921c
Revert "Modify the args_default usage" (#325)
yuanwu2017 Jan 5, 2024
a4f8079
bug fix on args.deepspeed_config_dict (#328)
ys950902 Jan 10, 2024
a3635ad
Add worker option for preprocess_data.py (#339)
xu-song Jan 23, 2024
141bfbe
Align variable before all_reduce with other files (#340)
kamil-kaczor Jan 23, 2024
b09c6a6
use fused_adam in deepspeed (#335)
ys950902 Jan 24, 2024
f9323e3
fix rms norm import on non cuda device (#341)
nrailg Jan 25, 2024
11f2d93
Supervised Fine-tuning for HugginFace pretrained weight. (#318)
inkcherry Jan 26, 2024
89e03fd
Fixed incorrect argument supplied to deepspeed init (#323)
SulRash Jan 29, 2024
6d4c535
Track additional metrics with W&B in `megatron/training.py` (#343)
saforem2 Feb 1, 2024
d47f3cd
Clean up UC scripts and update UC README (#344)
lekurile Feb 16, 2024
3557944
Add TensorBoard analysis script to Universal Checkpointing Example (#…
lekurile Feb 21, 2024
3a30913
Remove duplicate ctx save backward in cross_entropy.py (#347)
Feb 21, 2024
ea82c14
Update Megatron type check (#346)
lekurile Feb 21, 2024
c934137
Track additional metrics with W&B in `megatron/training.py` (#348)
saforem2 Feb 21, 2024
9ba0dcb
Add steps for running TensorBoard analysis in Universal Checkpointing…
lekurile Feb 22, 2024
81d68a3
Support configuration of RoPE theta (#351)
mosheisland Feb 25, 2024
31e2584
Support loading checkpoint specific tag (#352)
mosheisland Feb 26, 2024
a9856ce
Update pretrain_bert.py (#355)
lzzmm Feb 27, 2024
df0e2e4
Support universal checkpoint for GPTModel (#361)
mosheisland Mar 10, 2024
ebe8025
MOE: Support disable top2 2nd expert sampling (#362)
mosheisland Mar 10, 2024
888a63a
fix an issue for DP on Megatron-DeepSpeed (#368)
ys950902 Apr 2, 2024
3c5f475
remove contiguous copy for flash-attn opbuilder (#372)
YizhouZ Apr 2, 2024
bcedecd
Support MoE for GPTModelPipe (#373)
mosheisland Apr 9, 2024
7eb36a1
get distributed backend name via accelerator and check loss_scale bef…
polisettyvarma May 13, 2024
634e37b
Add steps and results for running ZeRO stage 3 withUniversal Checkpoi…
xylian86 Jun 26, 2024
527957e
Add Zero Bubble Pipeline Parallelism H1 Schedule (#396)
nvmdava Jun 27, 2024
f2d7589
Fix ParallelMLP and enable accelerator test (#403)
xinyu-intel Jun 27, 2024
ea4b67a
Fix test_deallocate_output_tensor (#404)
xinyu-intel Jun 27, 2024
08f5a99
Fixed missing BookCorpus dataset. (#407)
costin-eseanu Jul 1, 2024
c3a13be
Set proper arguments when constructing models in unit tests (#408)
xinyu-intel Jul 1, 2024
330f9f2
use split/squeeze instead of slice for performance (#409)
polisettyvarma Jul 8, 2024
af06d14
improve performance by keeping attention_mask on device and run ops f…
polisettyvarma Jul 8, 2024
ec3f1f4
Improve RoPE perf by using cached sin/cos tensors (#410)
polisettyvarma Jul 11, 2024
354e420
Extend test utilities to support more accelerators (#418)
xinyu-intel Jul 12, 2024
73252c0
clear document (#395)
inkcherry Jul 12, 2024
0971e68
add PyTorch profiler support (#414)
polisettyvarma Jul 15, 2024
73029ed
[Wandb] Refine wandb logging function (#416)
billishyahao Jul 16, 2024
fc989b8
add kill switch file support to gracefully exit training at runtime (…
polisettyvarma Jul 17, 2024
7d23e33
add support to run custom Hf tokenizer for training and dataset pre-p…
polisettyvarma Jul 18, 2024
13f2673
improve repeat_kv GQA perf (#419)
polisettyvarma Jul 19, 2024
3af2e25
acquire device when required (#420)
polisettyvarma Jul 19, 2024
08b9376
Add basic compilation test (#426)
loadams Jul 19, 2024
3afd267
Update yml to be valid (#427)
loadams Jul 19, 2024
8822a5c
Update/add GPT/Llama universal checkpointing scripts (#391)
lekurile Jul 29, 2024
1bfc35c
fixing the bug of flash_attn import and the wrong gather index when u…
YJHMITWEB Aug 1, 2024
53b241f
add fused_rms_norm support on XPU device (#431)
ys950902 Aug 4, 2024
61350c5
pass batch_dim_idx to deepspeed sequence parallel distributed attenti…
YJHMITWEB Aug 7, 2024
f132876
[LLaMa] Adding support converting checkpoint from mds to hf (#432)
billishyahao Aug 10, 2024
cdf5194
add device check when import ipex (#436)
ys950902 Aug 14, 2024
b7b2d5e
fix TFLOPs calculation (#371)
polisettyvarma Aug 19, 2024
4f9f1f6
fix nan issue when running megatron-deepspeed (#434)
ys950902 Aug 24, 2024
8e9d973
enable empty cache on XPU device (#438)
ys950902 Aug 26, 2024
543543a
[wandb] disable wandb more gracefully (#422)
billishyahao Aug 27, 2024
1280f59
[Bug] Fix crash when logging optimizer state to tb (#417)
billishyahao Aug 27, 2024
0d6e379
Enable Sequence Parallelism (#429)
polisettyvarma Sep 4, 2024
598c092
grad_wei can't be NoneType when running with DeepSpeed, for zero3 wil…
ys950902 Sep 20, 2024
8be7f48
fix init issue for rms_norm in squence_parallel (#448)
ys950902 Oct 4, 2024
4448492
enable profiler for specific ranks (#451)
ranzhejiang Oct 8, 2024
deb95cd
fix init issue for silently ignoring the deepspeed config (#452)
xylian86 Oct 17, 2024
6acc370
fix moe tflops (#445)
ranzhejiang Oct 18, 2024
676a482
Adding the new feature of FPDT (#441)
YJHMITWEB Dec 5, 2024
c3df187
[tool]GQA convert support (#454)
inkcherry Dec 18, 2024
f4157be
Fix import error in `deepspeed_to_megatron.py` (#455)
hotsuyuki Dec 24, 2024
35 changes: 35 additions & 0 deletions .github/workflows/python.yml
@@ -0,0 +1,35 @@
name: python

on:
  workflow_dispatch:
  pull_request:
    branches:
      '**'
  schedule:
    - cron: "0 0 * * *"

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  unit-tests:
    strategy:
      matrix:
        pyVersion: ["3.7", "3.8", "3.9", "3.10"]
      fail-fast: false

    runs-on: ubuntu-22.04
    container:
      image: deepspeed/gh-builder:py${{ matrix.pyVersion }}

    steps:
      - uses: actions/checkout@v4

      - name: environment
        run: |
          which python
          python --version
      - name: Install Megatron-DeepSpeed
        run: |
          pip3 install .
6 changes: 4 additions & 2 deletions README.md
@@ -131,7 +131,8 @@ python tools/preprocess_data.py \
--output-prefix my-bert \
--vocab-file bert-vocab.txt \
--tokenizer-type BertWordPieceLowerCase \
--split-sentences
--split-sentences \
--workers 5
</pre>

The output will be two files named, in this case, `my-bert_text_sentence.bin` and `my-bert_text_sentence.idx`. The `--data-path` specified in later BERT training is the full path and new filename, but without the file extension.
@@ -150,7 +151,8 @@ python tools/preprocess_data.py \
--dataset-impl mmap \
--tokenizer-type GPT2BPETokenizer \
--merge-file gpt2-merges.txt \
--append-eod
--append-eod \
--workers 5
</pre>

Here the output files are named `my-gpt2_text_document.bin` and `my-gpt2_text_document.idx`. As before, in GPT training, use the longer name without the extension as `--data-path`.
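For orientation (not part of this PR's diff), the preprocessed prefix produced above is what training later consumes through `--data-path`, without the `.bin`/`.idx` extension. A minimal, hypothetical sketch of that usage; the remaining model and optimizer arguments are omitted and the exact flag set depends on the training script:

```
# Hypothetical usage sketch: pass the prefix (no .bin/.idx extension) to training.
DATA_PATH=my-gpt2_text_document        # produced by tools/preprocess_data.py above

python pretrain_gpt.py \
    --data-path ${DATA_PATH} \
    --vocab-file gpt2-vocab.json \
    --merge-file gpt2-merges.txt \
    --tokenizer-type GPT2BPETokenizer
    # remaining model/training arguments omitted for brevity
```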
3 changes: 2 additions & 1 deletion examples_deepspeed/MoE/ds_evalharness.sh
@@ -28,7 +28,7 @@ TASKS="lambada"
VOCAB_FILE=/data/Megatron-LM/data/gpt2-vocab.json
MERGE_FILE=/data/Megatron-LM/data/gpt2-merges.txt

export HF_DATASETS_OFFLINE=1
# export HF_DATASETS_OFFLINE=1

# Dummy arguments to make megatron happy. No need to configure them.
# The reason we don't need to configure them and many other arguments is
@@ -53,6 +53,7 @@ CMD="../../tasks/eval_harness/evaluate.py \
--no-load-rng \
--inference \
--disable-moe-token-dropping \
--tokenizer-type GPT2BPETokenizer \
--adaptive_seq_len\
--eval_fp32\
--task_list $TASKS\
8 changes: 4 additions & 4 deletions examples_deepspeed/MoE/readme_evalharness.md
@@ -11,11 +11,10 @@ This particular setup uses the normal deepspeed checkpoint and requires no conve
On login console with external network

Get lm-eval harness (https://github.com/EleutherAI/lm-evaluation-harness) and `best-download==0.0.7` needed to download some tasks.
Below package version numbers are what we tested that work.
```
(maybe need pip install --upgrade pip)
pip install best-download==0.0.7
pip install lm-eval
(previously we used "pip install git+https://github.com/EleutherAI/lm-evaluation-harness" to install, but later found the command above has less dependency issues)
pip install best-download==0.0.7 lm-eval==0.2.0 datasets==1.15.1 transformers==4.20.1 huggingface-hub==0.8.1
```

2. Pre-download needed datasets
@@ -33,7 +32,8 @@ Then install datasets for the tasks:
```
python ../../tasks/eval_harness/download.py --task_list hellaswag,lambada,triviaqa,webqs,winogrande,piqa,arc_challenge,arc_easy,openbookqa,race,boolq,cb,copa,rte,wic,wsc,multirc,record,anli_r1,anli_r2,anli_r3,wikitext,logiqa,mathqa,mc_taco,mrpc,prost,pubmedqa,qnli,qqp,sciq,sst,wnli
```
and make sure that `export HF_DATASETS_OFFLINE=1`

Previously we set `export HF_DATASETS_OFFLINE=1` to make the dataset offline after the above manual download. But somehow now this could trigger error on some kind of online verification for some of the datasets, so it's recommended to only set offline mode when necessary.
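As one illustrative way to follow that advice (not taken from this PR), the variable can be scoped to a single command instead of being exported for the whole shell session; the invocation below is only a placeholder for the actual evaluation command:

```
# Illustrative: enable the Hugging Face datasets offline switch for one run only,
# rather than exporting it globally in the script.
HF_DATASETS_OFFLINE=1 python ../../tasks/eval_harness/evaluate.py --task_list $TASKS   # remaining arguments as in ds_evalharness.sh
```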

<!-- If there are things like custom tokenizers, pre-download those too, e.g.:

5 changes: 3 additions & 2 deletions examples_deepspeed/compression/ds_evalharness.sh
@@ -1,4 +1,4 @@
# This is an example zero-shot eval script. Please first read the readme_evalharness.md under the same directory.
# This is an example zero-shot eval script. Please first read the readme_evalharness.md under the ../MoE directory.

# CHECKPOINT_PATH=/blob/users/minjiaz/compression_library/checkpoint/125M10L_Compression_Test_INT8_64gpu_lr6e-5_tokens5.25B_nocl_alpha-no_pp/global_step2000/
# CHECKPOINT_PATH=/blob/users/conglli/project/gpt3_with_pile/checkpoint/gpt3-with-pile-0.125B-lr-2.4e-3-minlr-6.0e-5-bs-2048-gpus-64-zero-0-mp-1-pp-1-no_pp-cl-startseqlen-72-step-27638-token-60B/global_step71000/
@@ -31,7 +31,7 @@ TASKS="lambada,wikitext"
VOCAB_FILE=/blob/data/the_pile_public_merged_nopreprocessing/gpt2-vocab.json
MERGE_FILE=/blob/data/the_pile_public_merged_nopreprocessing/gpt2-merges.txt

export HF_DATASETS_OFFLINE=1
# export HF_DATASETS_OFFLINE=1

# Dummy arguments to make megatron happy. No need to configure them.
# The reason we don't need to configure them and many other arguments is
@@ -56,6 +56,7 @@ CMD="../../tasks/eval_harness/evaluate.py \
--no-load-rng \
--inference \
--disable-moe-token-dropping \
--tokenizer-type GPT2BPETokenizer \
--adaptive_seq_len\
--eval_fp32\
--task_list $TASKS\
@@ -0,0 +1,34 @@
{
  "train_batch_size": GBSIZE,
  "train_micro_batch_size_per_gpu": MBSIZE,
  "steps_per_print": LOG_INTERVAL,

  "zero_optimization": {
    "stage": ZERO_STAGE
  },

  "gradient_clipping": 1.0,
  "prescale_gradients": PRESCALE_GRAD,

  "fp16": {
    "enabled": true,
    "loss_scale": 0,
    "loss_scale_window": 500,
    "hysteresis": 2,
    "min_loss_scale": 1,
    "initial_scale_power": 11
  },

  "wall_clock_breakdown" : false,
  "curriculum_learning": {
    "enabled": true,
    "curriculum_type": "seqlen",
    "min_difficulty": CONFIG_CL_MIN,
    "max_difficulty": CONFIG_CL_MAX,
    "schedule_type": "fixed_linear",
    "schedule_config": {
      "total_curriculum_step": CONFIG_CL_DURATION,
      "difficulty_step": 8
    }
  }
}
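The uppercase tokens (GBSIZE, MBSIZE, LOG_INTERVAL, ZERO_STAGE, PRESCALE_GRAD, CONFIG_CL_*) are template placeholders, so this JSON is not valid until a launcher script substitutes concrete values. A hedged sketch of such a substitution, with file names and values chosen purely for illustration:

```
# Illustrative only: fill the placeholders to produce a config DeepSpeed can parse.
TEMPLATE=ds_config_gpt_TEMPLATE.json    # hypothetical name for the template above
CONFIG=ds_config_gpt.json

sed -e "s/GBSIZE/256/" \
    -e "s/MBSIZE/4/" \
    -e "s/LOG_INTERVAL/10/" \
    -e "s/ZERO_STAGE/1/" \
    -e "s/PRESCALE_GRAD/true/" \
    -e "s/CONFIG_CL_MIN/80/" \
    -e "s/CONFIG_CL_MAX/2048/" \
    -e "s/CONFIG_CL_DURATION/10000/" \
    ${TEMPLATE} > ${CONFIG}
```

The materialized file is then handed to training via `--deepspeed_config`; note that this PR also allows `args.deepspeed_config` to be passed as a dict (commit 15355af).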