Sample H100 job #1235

Open
wants to merge 5 commits into base: main from msaroufim/h100

Update h100_benchmark.yml

defeb5b
PyTorch Bot / Dr.CI completed Nov 7, 2024 in 0s

Dr.CI classification results

{
  "FAILED": [
    {
      "workflowId": 11728298718,
      "workflowUniqueId": 89543087,
      "id": 32671465306,
      "runnerName": "i-0d506f63c6bcd8768",
      "authorEmail": "[email protected]",
      "name": "Run Regression Tests / test (CPU Nightly, linux.4xlarge, --pre torch --index-url https://download.pytorch.org/whl/nightl... / linux-job",
      "jobName": "test (CPU Nightly, linux.4xlarge, --pre torch --index-url https://download.pytorch.org/whl/nightl... / linux-job",
      "conclusion": "failure",
      "completed_at": "2024-11-07T17:48:59.000000000Z",
      "html_url": "https://github.com/pytorch/ao/actions/runs/11728298718/job/32671465306",
      "head_branch": "msaroufim/h100",
      "pr_number": 1235,
      "head_sha": "defeb5bf2b628f65f0b2bbffc1d876a534577fca",
      "head_sha_timestamp": "2024-11-07T17:35:59.000000000Z",
      "failure_captures": [
        "test/prototype/test_parametrization.py::TestFakeSparsity::test_jit_trace"
      ],
      "failure_lines": [
        "FAILED test/prototype/test_parametrization.py::TestFakeSparsity::test_jit_trace - MemoryError: std::bad_alloc"
      ],
      "failure_context": [
        "+ pytest test --verbose -s",
        "+ LD_LIBRARY_PATH=/opt/conda/lib/:/opt/rh/devtoolset-9/root/usr/lib64:/opt/rh/devtoolset-9/root/usr/lib:",
        "+ export LD_LIBRARY_PATH=/opt/conda/lib/:/opt/rh/devtoolset-9/root/usr/lib64:/opt/rh/devtoolset-9/root/usr/lib:",
        "+ CONDA=/opt/conda",
        "+ export CONDA=/opt/conda",
        "++ dirname /opt/conda/condabin",
        "+++ dirname /opt/conda/condabin/conda",
        "++++ which conda",
        "+ pip install .",
        "+ pip install -r dev-requirements.txt",
        "+ pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu",
        "+ python -m pip install --upgrade pip"
      ],
      "time": "2024-11-07T17:38:08.000000000Z"
    },
    {
      "workflowId": 11728298718,
      "workflowUniqueId": 89543087,
      "id": 32671467309,
      "runnerName": "i-053337cce24cfbb4d",
      "authorEmail": "[email protected]",
      "name": "Run Regression Tests / test (CUDA 2.5, linux.g5.12xlarge.nvidia.gpu, torch==2.5.0 --index-url https://download.pytorch.o... / linux-job",
      "jobName": "test (CUDA 2.5, linux.g5.12xlarge.nvidia.gpu, torch==2.5.0 --index-url https://download.pytorch.o... / linux-job",
      "conclusion": "failure",
      "completed_at": "2024-11-07T18:39:26.000000000Z",
      "html_url": "https://github.com/pytorch/ao/actions/runs/11728298718/job/32671467309",
      "head_branch": "msaroufim/h100",
      "pr_number": 1235,
      "head_sha": "defeb5bf2b628f65f0b2bbffc1d876a534577fca",
      "head_sha_timestamp": "2024-11-07T17:35:59.000000000Z",
      "failure_captures": [
        "RuntimeError: Command docker exec -t afdd7ebc4de25af394eec30791420fb2402754c60be8810182a6d66790a1de8c /exec failed with exit code 1"
      ],
      "failure_lines": [
        "RuntimeError: Command docker exec -t afdd7ebc4de25af394eec30791420fb2402754c60be8810182a6d66790a1de8c /exec failed with exit code 1"
      ],
      "failure_context": [
        "+ pytest test --verbose -s",
        "+ LD_LIBRARY_PATH=/opt/conda/lib/:/opt/rh/devtoolset-9/root/usr/lib64:/opt/rh/devtoolset-9/root/usr/lib:",
        "+ export LD_LIBRARY_PATH=/opt/conda/lib/:/opt/rh/devtoolset-9/root/usr/lib64:/opt/rh/devtoolset-9/root/usr/lib:",
        "+ CONDA=/opt/conda",
        "+ export CONDA=/opt/conda",
        "++ dirname /opt/conda/condabin",
        "+++ dirname /opt/conda/condabin/conda",
        "++++ which conda",
        "+ pip install .",
        "+ pip install -r dev-requirements.txt",
        "+ pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu121",
        "+ python -m pip install --upgrade pip"
      ],
      "time": "2024-11-07T17:38:10.000000000Z"
    },
    {
      "workflowId": 11728298718,
      "workflowUniqueId": 89543087,
      "id": 32671467572,
      "runnerName": "i-0b87745dddd573f8c",
      "authorEmail": "[email protected]",
      "name": "Run Regression Tests / test (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://download.pytorc... / linux-job",
      "jobName": "test (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://download.pytorc... / linux-job",
      "conclusion": "failure",
      "completed_at": "2024-11-07T18:44:52.000000000Z",
      "html_url": "https://github.com/pytorch/ao/actions/runs/11728298718/job/32671467572",
      "head_branch": "msaroufim/h100",
      "pr_number": 1235,
      "head_sha": "defeb5bf2b628f65f0b2bbffc1d876a534577fca",
      "head_sha_timestamp": "2024-11-07T17:35:59.000000000Z",
      "failure_captures": [
        "test/sparsity/test_fast_sparse_training.py::TestRuntimeSemiStructuredSparsity::test_runtime_weight_sparsification_compile"
      ],
      "failure_lines": [
        "FAILED test/sparsity/test_fast_sparse_training.py::TestRuntimeSemiStructuredSparsity::test_runtime_weight_sparsification_compile - RuntimeError: CUDA error: internal error when calling cusparseLtMatmulAlgSelectionInit( &handle, &alg_sel, &matmul, CUSPARSELT_MATMUL_ALG_DEFAULT)"
      ],
      "failure_context": [
        "+ pytest test --verbose -s",
        "+ LD_LIBRARY_PATH=/opt/conda/lib/:/opt/rh/devtoolset-9/root/usr/lib64:/opt/rh/devtoolset-9/root/usr/lib:",
        "+ export LD_LIBRARY_PATH=/opt/conda/lib/:/opt/rh/devtoolset-9/root/usr/lib64:/opt/rh/devtoolset-9/root/usr/lib:",
        "+ CONDA=/opt/conda",
        "+ export CONDA=/opt/conda",
        "++ dirname /opt/conda/condabin",
        "+++ dirname /opt/conda/condabin/conda",
        "++++ which conda",
        "+ pip install .",
        "+ pip install -r dev-requirements.txt",
        "+ pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121",
        "+ python -m pip install --upgrade pip"
      ],
      "time": "2024-11-07T17:38:11.000000000Z"
    }
  ],
  "FLAKY": [],
  "BROKEN_TRUNK": [],
  "UNSTABLE": []
}
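
For triage, a payload shaped like the one above can be reduced to one line per failing job. The following is a minimal sketch, not part of Dr.CI: the `summarize` helper and the `SAMPLE` constant are hypothetical names, and `SAMPLE` is a trimmed copy of the results above that keeps only the `id` and `failure_lines` fields. It assumes each classification key maps to a list of job objects, as it does in this payload.

```python
import json


def summarize(payload: str) -> dict:
    """Map each classification to a list of (job id, first failure line)."""
    results = json.loads(payload)
    summary = {}
    for classification, jobs in results.items():
        summary[classification] = [
            # Fall back to a placeholder if a job carries no failure line.
            (job["id"], (job.get("failure_lines") or ["<no failure line>"])[0])
            for job in jobs
        ]
    return summary


# Trimmed copy of the payload above (assumption: only these fields are needed).
SAMPLE = json.dumps({
    "FAILED": [
        {
            "id": 32671465306,
            "failure_lines": [
                "FAILED test/prototype/test_parametrization.py::TestFakeSparsity::test_jit_trace - MemoryError: std::bad_alloc"
            ],
        },
        {
            "id": 32671467309,
            "failure_lines": [
                "RuntimeError: Command docker exec -t afdd7ebc4de25af394eec30791420fb2402754c60be8810182a6d66790a1de8c /exec failed with exit code 1"
            ],
        },
        {
            "id": 32671467572,
            "failure_lines": [
                "FAILED test/sparsity/test_fast_sparse_training.py::TestRuntimeSemiStructuredSparsity::test_runtime_weight_sparsification_compile - RuntimeError: CUDA error: internal error when calling cusparseLtMatmulAlgSelectionInit( &handle, &alg_sel, &matmul, CUSPARSELT_MATMUL_ALG_DEFAULT)"
            ],
        },
    ],
    "FLAKY": [],
    "BROKEN_TRUNK": [],
    "UNSTABLE": [],
})

if __name__ == "__main__":
    for classification, jobs in summarize(SAMPLE).items():
        print(f"{classification}: {len(jobs)} job(s)")
        for job_id, line in jobs:
            print(f"  {job_id}: {line}")
```

Read this way, the three failures look unrelated to each other: a `std::bad_alloc` on the CPU nightly job, a `docker exec` failure on the CUDA 2.5 job, and a cuSPARSELt internal error on the CUDA nightly job. Dr.CI classified none of them as flaky, broken trunk, or unstable.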