Actions: microsoft/DeepSpeed

nv-lightning-v100

5,135 workflow runs

nv-lightning-v100
nv-lightning-v100 #13875: Scheduled
December 28, 2024 00:20 5m 47s master
[BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm
nv-lightning-v100 #13874: Pull request #6909 synchronize by hj-wei
December 27, 2024 03:06 Action required hj-wei:dev_hjwei
nv-lightning-v100
nv-lightning-v100 #13871: Scheduled
December 27, 2024 00:20 5m 47s master
Stage3: Use new torch grad accumulation hooks API
nv-lightning-v100 #13870: Pull request #6773 synchronize by loadams
December 26, 2024 20:09 5m 56s deepcharm:stage3-use-new-grad-acc-api
Stage3: Use new torch grad accumulation hooks API
nv-lightning-v100 #13868: Pull request #6773 synchronize by loadams
December 26, 2024 17:40 6m 34s deepcharm:stage3-use-new-grad-acc-api
[BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm
nv-lightning-v100 #13867: Pull request #6909 synchronize by loadams
December 26, 2024 17:15 Action required hj-wei:dev_hjwei
Use ds-specific module id to avoid conflicts
nv-lightning-v100 #13866: Pull request #6847 synchronize by loadams
December 26, 2024 17:13 18m 43s olruwase/pr_6772
Fix checkpointable_layers Logic
nv-lightning-v100 #13865: Pull request #6881 synchronize by loadams
December 26, 2024 17:12 14m 13s Quentin-Anthony:qanthony/fix-act-recomp
Update Gaudi2 jobs to latest 1.19 build
nv-lightning-v100 #13864: Pull request #6905 synchronize by loadams
December 26, 2024 17:12 6m 5s raza-sikander:master
Add fp8_gemm fallback for non-triton systems
nv-lightning-v100 #13862: Pull request #6916 opened by oelayan7
December 26, 2024 08:52 Action required oelayan7:fp8_gemm_no_triton
nv-lightning-v100
nv-lightning-v100 #13861: Scheduled
December 26, 2024 00:20 5m 52s master
[BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm
nv-lightning-v100 #13857: Pull request #6909 synchronize by hj-wei
December 25, 2024 02:18 Action required hj-wei:dev_hjwei
Add the missing view operations from sequence parallel(async).
nv-lightning-v100 #13856: Pull request #6750 synchronize by inkcherry
December 25, 2024 01:50 Action required inkcherry:ds_overlap_fix
nv-lightning-v100
nv-lightning-v100 #13855: Scheduled
December 25, 2024 00:20 5m 35s master
[BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm
nv-lightning-v100 #13854: Pull request #6909 opened by hj-wei
December 24, 2024 07:38 Action required hj-wei:dev_hjwei
[inf] Add config var to enable keeping module on host
nv-lightning-v100 #13853: Pull request #6846 synchronize by oelayan7
December 24, 2024 06:49 6m 33s oelayan7:keep_module_on_host
nv-lightning-v100
nv-lightning-v100 #13852: Scheduled
December 24, 2024 00:21 6m 51s master
Tecorigin sdaa accelerator
nv-lightning-v100 #13851: Pull request #6903 synchronize by tjruwase
December 23, 2024 23:13 Action required siqi654321:Tecorigin-SDAA-accelerator