change to run with multiple ranks
[email protected]:[quickstart] $ diff vm-tsw-2x2v.lua vm-tsw-2x2v_orig.lua
101c101
< decompCuts = {5,2}, -- Cuts in each configuration direction
---
> decompCuts = {1,1}, -- Cuts in each configuration direction
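As a quick sanity check on the change above, here is a minimal sketch of how the rank count follows from decompCuts. It assumes, as I understand the gkyl convention, that the number of MPI ranks requested from the launcher should equal the product of the cuts; the helper name ranks_needed is hypothetical and not part of gkyl.

```python
from math import prod


def ranks_needed(decomp_cuts):
    """Return the number of MPI ranks implied by a decompCuts table.

    Assumption: one rank per sub-domain, so the total is the product
    of the cuts in each configuration-space direction.
    """
    if not decomp_cuts:
        raise ValueError("decompCuts must have at least one entry")
    if any(int(c) != c or c < 1 for c in decomp_cuts):
        raise ValueError("each cut must be a positive integer")
    return prod(int(c) for c in decomp_cuts)


if __name__ == "__main__":
    # Modified input file: decompCuts = {5,2}
    print(ranks_needed([5, 2]))
    # Original input file: decompCuts = {1,1}
    print(ranks_needed([1, 1]))
```

With the modified input this gives 10 ranks, versus 1 for the original serial setup.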
output
Wed Apr 10 2024 20:21:49.000000000
Gkyl built with 9663ea594e80+
Gkyl built on Apr 10 2024 16:08:26
Initializing Vlasov-Maxwell simulation ...
Initialization completed in 1.01092 sec
Starting main loop of Vlasov-Maxwell simulation ...
Step 0 at time 0. Time step 0.0360652. Completed 0%
0123456789 Step 139 at time 5.0130661. Time step 3.606522e-02. Completed 10%
0123456789 Step 278 at time 10.026132. Time step 3.606522e-02. Completed 20%
0123456789 Step 416 at time 15.003133. Time step 3.606522e-02. Completed 30%
0123456789 Step 555 at time 20.016199. Time step 3.606522e-02. Completed 40%
0123456789 Step 694 at time 25.029265. Time step 3.606522e-02. Completed 50%
0123456789 Step 832 at time 30.006266. Time step 3.606522e-02. Completed 60%
0123456789 Step 971 at time 35.019332. Time step 3.606522e-02. Completed 70%
0123456789 Step 1110 at time 40.032398. Time step 3.606522e-02. Completed 80%
0123456789 Step 1248 at time 45.009399. Time step 3.606522e-02. Completed 90%
0123456789 Step 1387 at time 50.000000. Time step 3.606522e-02. Completed 100%
0
Total number of time-steps 1388
Number of forward-Euler calls 5548
Number of RK stage-2 failures 0
Number of RK stage-3 failures 0
Solver took 102.58454 s ( 0.073908 s/step) (75.273%)
Solver BCs took 8.60794 s ( 0.006202 s/step) ( 6.316%)
Field solver took 1.14156 s ( 0.000822 s/step) ( 0.838%)
Field solver BCs 0.27180 s ( 0.000196 s/step) ( 0.199%)
Function field solver took 0.00000 s ( 0.000000 s/step) ( 0.000%)
Moment calculations took 8.34645 s ( 0.006013 s/step) ( 6.124%)
Integrated moment calculations took 5.64078 s ( 0.004064 s/step) ( 4.139%)
Field energy calculations took 0.04717 s ( 0.000034 s/step) ( 0.035%)
Collision solver(s) took 0.00000 s ( 0.000000 s/step) ( 0.000%)
Collision (other) took 0.00000 s ( 0.000000 s/step) ( 0.000%)
Source updaters took 0.00000 s ( 0.000000 s/step) ( 0.000%)
Stepper combine/copy took 3.44240 s ( 0.002480 s/step) ( 2.526%)
Forward Euler combine took 0.00000 s ( 0.000000 s/step) ( 0.000%)
Time spent in barrier function 0.21111 s ( 0.000152 s/step) ( 0.155%)
Data write took 5.74470 s ( 0.004139 s/step) ( 4.215%)
Write restart took 0.02960 s ( 0.000021 s/step) ( 0.022%)
[Unaccounted for] 6.19993 s ( 0.004467 s/step) ( 4.549%)
Main loop completed in 136.28258 s ( 0.098186 s/step) ( 100%)
Wed Apr 10 2024 20:24:07.000000000
[1712795047.136074] [a877:220516:0] tag_match.c:62 UCX WARN unexpected tag-receive descriptor 0x1aaadc0 was not matched
[1712795047.136109] [a877:220516:0] tag_match.c:62 UCX WARN unexpected tag-receive descriptor 0x1aab0c0 was not matched
<.... snip .... UCX WARN unexpected tag-receive appears 21 times>
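To see which processes emit the warnings and how many each one produces, a saved copy of the output can be tallied with a short script. This is only a sketch: it assumes the line layout shown above ([timestamp] [host:pid:thread] source UCX WARN message), and the function name count_ucx_warnings is made up for illustration.

```python
import re
from collections import Counter

# Matches lines shaped like the UCX warnings in the output above:
# [timestamp] [host:pid:thread] source_file:line UCX WARN message
UCX_WARN = re.compile(
    r"^\[(?P<ts>\d+\.\d+)\]\s+"
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+):(?P<tid>\d+)\]\s+"
    r"(?P<src>\S+)\s+UCX\s+WARN\s+(?P<msg>.*)$"
)


def count_ucx_warnings(lines):
    """Count UCX WARN lines per (host, pid); other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = UCX_WARN.match(line.strip())
        if match:
            counts[(match.group("host"), int(match.group("pid")))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Main loop completed in 136.28258 s",
        "[1712795047.136074] [a877:220516:0] tag_match.c:62 UCX WARN "
        "unexpected tag-receive descriptor 0x1aaadc0 was not matched",
        "[1712795047.136109] [a877:220516:0] tag_match.c:62 UCX WARN "
        "unexpected tag-receive descriptor 0x1aab0c0 was not matched",
    ]
    for (host, pid), n in count_ucx_warnings(sample).items():
        print(f"{host} pid {pid}: {n} warning(s)")
```

Passing an open file object for the Slurm output works the same way as the sample list, since the function only iterates over lines.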
Hello,
On the Purdue Anvil system (CPUs only) I'm hitting OpenMPI UCX (InfiniBand) warnings at the end of execution of the vm-tsw-2x2v.lua example here (https://gkyl.readthedocs.io/en/latest/quickstart/inputFiles/vm-tsw-2x2v.html).
The change to run vm-tsw-2x2v.lua on multiple ranks, the output from execution including the warnings, and the job submission scripts are pasted below.
A quick search led me to this GitHub issue:
openucx/ucx#6331 (comment)
which indicates that some messages that were sent were not received before MPI_Finalize was called.
Note, I also hit these warnings on SDSC Expanse. In both cases the system install of OpenMPI was used.
change to run with multiple ranks
output
job scripts
slurm submission script
run script
two small changes to gkyl pre-g0
One was for the ADIOS URL, from before #178 was merged, and the other was to pick up the correct Python version.