I get the following error when I run a statevector simulation job on a 2-node Azure cluster (8 GPUs per node). Any idea what the reason could be?
Answered by ymagchi on Oct 23, 2024
Replies: 2 comments 6 replies
Hi @hovnatan, could you please share your compute environment and job script?
0 replies
It is an Azure-managed cluster with 2 nodes and 8 A100 GPUs per node. The environment brings up our Docker image from https://hub.docker.com/r/bluequbit/multigpu/tags and then runs the job `python ghz_cusvaer.py` in it. It automatically inserts `mpirun -np ...` before the command.
On Mon, Oct 14, 2024 at 9:07 PM Takuma Yamaguchi wrote:
Hi @hovnatan,
Could you please share your compute environments and job script?
The error occurred during MPI communication and it could depend on the target machine.
6 replies
Thank you for the log.
I think some environment variables for UCX have been explicitly provided:
Is it possible to run without these restrictions?
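To check which UCX settings are actually present inside the container, something like the following could be run at the top of the job script. This is only a minimal sketch: it assumes the variables in question use the standard `UCX_` prefix, and the helper names `ucx_settings` and `without_ucx` are made up for illustration.

```python
import os


def ucx_settings(env):
    """Return only the UCX_* entries of an environment mapping."""
    return {k: v for k, v in env.items() if k.startswith("UCX_")}


def without_ucx(env):
    """Return a copy of the environment with every UCX_* entry removed."""
    return {k: v for k, v in env.items() if not k.startswith("UCX_")}


if __name__ == "__main__":
    # Print whatever UCX_* variables this process inherited.
    for name, value in sorted(ucx_settings(os.environ).items()):
        print(f"{name}={value}")
```

The cleaned mapping from `without_ucx(os.environ)` can be passed as `env=` to `subprocess.run` for a local test launch. Whether that is enough for a multi-node run depends on how `mpirun` forwards the environment to the other node, so the variables may also need to be removed where the platform sets them.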