Suppose we have two function calls, A and B, such that B waits for the completion of A, as shown below:
A = func1(args)
B = func2(args) # logic in B requires A to be completed and waits to make sure A is completed
In the ideal case, where A and B are executed one after the other, everything works fine. But if A and B are submitted to the same worker (say, x), a deadlock can occur if B is ready for execution before A is. Deadlock description: as B executes, the logic in B waits for A's completion. Since A is also submitted to the same worker x, A can start only once B is done, which causes a deadlock.
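The hazard described above can be sketched without unidist or MPI at all. The snippet below is only an illustration, assuming a thread pool as a stand-in for unidist workers; `run`, `task_a`, and `task_b` are hypothetical names, and the timeout exists only so the example terminates instead of hanging.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run(num_workers, timeout=0.5):
    """Submit B before A and report whether B ever saw A complete."""
    a_done = threading.Event()

    def task_a():
        # A simply signals that it has completed.
        a_done.set()

    def task_b():
        # B blocks until A completes. The timeout stands in for
        # "waits forever" so that the sketch can finish.
        return a_done.wait(timeout)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        fut_b = pool.submit(task_b)  # B reaches the worker first
        pool.submit(task_a)          # A is queued behind B
        return fut_b.result()


one_worker = run(1)   # A is stuck behind B on the only worker
two_workers = run(2)  # A runs on the second worker while B waits
print(one_worker, two_workers)
```

With a single worker, B occupies the only execution slot while waiting for A, so A never starts until B gives up. With two workers, A runs concurrently and B proceeds.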
An example of the case can be demonstrated with the below
Execute with UNIDIST_MPI_SPAWN=False mpiexec -n 6 python rep.py #worker using 3 unidist CPUs (-n 5) to make sure A and B are scheduled to the same worker.
This code would work with UNIDIST_MPI_SPAWN=False mpiexec -n 6 python rep.py, as A and B would be scheduled to different CPUs. The same code would also pass with Ray, as Ray is able to spawn additional workers when no workers are available.
Proposed solution.
Unidist should be able to spawn additional workers, as Ray does, if all workers have been busy for longer than a certain time threshold and there are tasks pending execution. This would also require sending the pending tasks to the newly spawned workers, similar to work stealing in Dask.
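As a rough illustration of the proposal, here is a toy thread-based pool that adds a worker when every worker is busy and a task is still pending after a threshold. This is not unidist's MPI backend or its API: `ElasticPool`, its methods, and the `threshold` parameter are hypothetical names invented for this sketch, and a shared queue stands in for re-routing pending tasks to the new worker.

```python
import queue
import threading


class ElasticPool:
    """Toy pool that grows when tasks stay pending while all workers are busy."""

    def __init__(self, initial_workers=1, threshold=0.1):
        self.tasks = queue.Queue()
        self.threshold = threshold  # seconds a task may wait before we check
        self.lock = threading.Lock()
        self.busy = 0
        self.workers = 0
        for _ in range(initial_workers):
            self._spawn()

    def _spawn(self):
        with self.lock:
            self.workers += 1
        threading.Thread(target=self._work, daemon=True).start()

    def _work(self):
        while True:
            func = self.tasks.get()
            with self.lock:
                self.busy += 1
            try:
                func()
            finally:
                with self.lock:
                    self.busy -= 1

    def submit(self, func):
        self.tasks.put(func)
        # After the threshold elapses, check whether the pool is starved.
        timer = threading.Timer(self.threshold, self._check)
        timer.daemon = True
        timer.start()

    def _check(self):
        with self.lock:
            starved = self.busy == self.workers and not self.tasks.empty()
        if starved:
            # All workers are blocked and work is pending: add a worker,
            # which pulls the pending task from the shared queue.
            self._spawn()


a_done = threading.Event()
b_done = threading.Event()
b_saw_a = []


def task_b():
    # B waits for A; the timeout only keeps the sketch from hanging.
    b_saw_a.append(a_done.wait(2.0))
    b_done.set()


pool = ElasticPool(initial_workers=1, threshold=0.1)
pool.submit(task_b)      # B occupies the only worker
pool.submit(a_done.set)  # A is pending behind B
b_done.wait(3.0)
print(b_saw_a, pool.workers)
```

In this sketch the new worker picks up A from a shared queue, so no explicit stealing step is needed. In a backend where tasks are already assigned to a specific worker, the pending task would additionally have to be moved to the new worker, which is the part the proposal compares to work stealing.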