Self-hosted runner (AMD mi210 scheduled CI caller) #313

Triggered via workflow run September 23, 2024 02:39
@ydshieh
completed 78b2929
Status Failure
Total duration 6h 8m 20s
Artifacts 3
DeepSpeed CI / Check Runner Status: 6s
Example CI / Check Runner Status: 6s
Model CI / Check Runner Status: 6s
Torch pipeline CI / Check Runner Status: 7s
Matrix: DeepSpeed CI / Check Runners
Matrix: Example CI / Check Runners
Matrix: Model CI / Check Runners
Matrix: Torch pipeline CI / Check Runners
Matrix: DeepSpeed CI / Setup
Matrix: DeepSpeed CI / Examples directory
Matrix: DeepSpeed CI / PyTorch pipelines
Matrix: DeepSpeed CI / Torch ROCm deepspeed tests
Matrix: Example CI / Setup
Matrix: Example CI / Examples directory
Matrix: Example CI / PyTorch pipelines
Matrix: Example CI / Torch ROCm deepspeed tests
Matrix: Model CI / Setup
Matrix: Model CI / Examples directory
Matrix: Model CI / PyTorch pipelines
Matrix: Model CI / Torch ROCm deepspeed tests
Matrix: Torch pipeline CI / Setup
Matrix: Torch pipeline CI / Examples directory
Matrix: Torch pipeline CI / PyTorch pipelines
Matrix: Torch pipeline CI / Torch ROCm deepspeed tests
Matrix: DeepSpeed CI / Single GPU tests
Waiting for pending jobs
Matrix: Example CI / Single GPU tests
Waiting for pending jobs
Matrix: Model CI / Single GPU tests
Waiting for pending jobs
Matrix: Torch pipeline CI / Single GPU tests
Waiting for pending jobs
DeepSpeed CI / Slack Report / Send results to webhook: 17s
Example CI / Slack Report / Send results to webhook: 19s
Model CI / Slack Report / Send results to webhook: 18s
Torch pipeline CI / Slack Report / Send results to webhook: 16s

Annotations

15 errors and 1 warning
Model CI / Check Runners (multi-gpu)
Process completed with exit code 139.
Model CI / Check Runners (single-gpu)
The job was canceled because "multi-gpu" failed.
Model CI / Check Runners (single-gpu)
The operation was canceled.
DeepSpeed CI / Check Runners (multi-gpu)
The job running on runner hf-amd-mi210-ci-2gpu-2 has exceeded the maximum execution time of 360 minutes.
DeepSpeed CI / Check Runners (multi-gpu)
The operation was canceled.
DeepSpeed CI / Check Runners (single-gpu)
The job running on runner hf-amd-mi210-ci-1gpu-1 has exceeded the maximum execution time of 360 minutes.
DeepSpeed CI / Check Runners (single-gpu)
The operation was canceled.
Example CI / Check Runners (single-gpu)
The job running on runner hf-amd-mi210-ci-1gpu-3 has exceeded the maximum execution time of 360 minutes.
Example CI / Check Runners (single-gpu)
The operation was canceled.
Example CI / Check Runners (multi-gpu)
The job was canceled because "single-gpu" failed.
Example CI / Check Runners (multi-gpu)
The operation was canceled.
Torch pipeline CI / Check Runners (single-gpu)
The job running on runner hf-amd-mi210-ci-1gpu-2 has exceeded the maximum execution time of 360 minutes.
Torch pipeline CI / Check Runners (single-gpu)
The operation was canceled.
Torch pipeline CI / Check Runners (multi-gpu)
The job was canceled because "single-gpu" failed.
Torch pipeline CI / Check Runners (multi-gpu)
The operation was canceled.
DeepSpeed CI / Slack Report / Send results to webhook
No files were found with the provided path: ci_results_run_torch_cuda_extensions_gpu. No artifacts will be uploaded.

Artifacts

Produced during runtime
Name, Size, Status
ci_results_run_examples_gpu, 266 Bytes, Expired
ci_results_run_models_gpu, 154 Bytes, Expired
ci_results_run_pipelines_torch_gpu, 280 Bytes, Expired