Actions: microsoft/DeepSpeed

nv-accelerate-v100

5,028 workflow runs

Add the missing view operations from sequence parallel(async).
nv-accelerate-v100 #12510: Pull request #6750 synchronize by loadams
December 16, 2024 22:49 14m 38s inkcherry:ds_overlap_fix
Zero2: avoid graph breaks in torch.compile by using param_idx
nv-accelerate-v100 #12509: Pull request #6803 synchronize by loadams
December 16, 2024 22:15 6m 22s nelyahu:zero2_param_idx
Fix --enable_each_rank_log when used with PDSH multi-node runner
nv-accelerate-v100 #12508: Pull request #6863 synchronize by loadams
December 16, 2024 21:28 11m 20s akeshet:akeshet/pdsh_rank_log
Fix: forbid repeated deepspeed.initialize on training objects
nv-accelerate-v100 #12507: Pull request #6874 synchronize by traincheck-team
December 16, 2024 21:02 Action required traincheck-team:fix-6848-forbid-repeated-init
Fix: forbid repeated deepspeed.initialize on training objects
nv-accelerate-v100 #12506: Pull request #6874 synchronize by traincheck-team
December 16, 2024 20:59 Action required traincheck-team:fix-6848-forbid-repeated-init
Support pure meta model lm_head tp
nv-accelerate-v100 #12505: Pull request #6812 synchronize by loadams
December 16, 2024 19:34 11m 36s Yejing-Lai:lyj/lm_head_replace
Add MLP/lm_head tp grain size setting.
nv-accelerate-v100 #12504: Pull request #6828 synchronize by loadams
December 16, 2024 19:33 28m 3s Yejing-Lai:lyj/tp_grain_size
Add the missing view operations from sequence parallel(async).
nv-accelerate-v100 #12503: Pull request #6750 synchronize by loadams
December 16, 2024 19:33 28m 32s inkcherry:ds_overlap_fix
Fix --enable_each_rank_log when used with PDSH multi-node runner
nv-accelerate-v100 #12502: Pull request #6863 synchronize by loadams
December 16, 2024 19:06 11m 33s akeshet:akeshet/pdsh_rank_log
Fix --enable_each_rank_log when used with PDSH multi-node runner
nv-accelerate-v100 #12499: Pull request #6863 synchronize by loadams
December 16, 2024 17:16 11m 38s akeshet:akeshet/pdsh_rank_log
Add arctic model support by adding w2 to all_reduce
nv-accelerate-v100 #12498: Pull request #6856 synchronize by tjruwase
December 16, 2024 12:24 11m 57s pi314ever:arctic-enabling-upstream
nv-accelerate-v100
nv-accelerate-v100 #12496: Scheduled
December 16, 2024 00:08 3m 51s master
Stage3: Use new torch grad accumulation hooks API
nv-accelerate-v100 #12495: Pull request #6773 synchronize by deepcharm
December 15, 2024 12:43 Action required deepcharm:stage3-use-new-grad-acc-api
Stage3: Use new torch grad accumulation hooks API
nv-accelerate-v100 #12494: Pull request #6773 synchronize by deepcharm
December 15, 2024 12:39 Action required deepcharm:stage3-use-new-grad-acc-api
nv-accelerate-v100
nv-accelerate-v100 #12493: Scheduled
December 15, 2024 00:08 3m 48s master
Use ds-specific module id to avoid conflicts
nv-accelerate-v100 #12490: Pull request #6847 synchronize by loadams
December 14, 2024 00:43 54m 51s olruwase/pr_6772
Fix assertion for offloading states
nv-accelerate-v100 #12489: Pull request #6855 synchronize by loadams
December 14, 2024 00:42 41m 58s tohtana/fix_offload_states_assert
nv-accelerate-v100
nv-accelerate-v100 #12488: Scheduled
December 14, 2024 00:07 11m 29s master
Fix assertion for offloading states
nv-accelerate-v100 #12487: Pull request #6855 synchronize by loadams
December 14, 2024 00:05 12m 48s tohtana/fix_offload_states_assert
Remove warnings from autodoc and sphinx
nv-accelerate-v100 #12486: Pull request #6788 synchronize by loadams
December 13, 2024 23:30 11m 45s loadams/autodoc-pydantic-cleanup
Update real_accelerator.py
nv-accelerate-v100 #12484: Pull request #6845 synchronize by loadams
December 13, 2024 21:54 19m 36s keiwoo:master