runcuda.sh overwrites environment variables #284

lilyminium · 2023-06-14T08:05:02Z

Targets involving workqueue seem to wrap commands in data/runcuda.sh. This unfortunately overwrites variables in the local environment, trying to load quite an old version of CUDA (4/5), if the hostname matches some patterns. I think it would be easier for users to configure their own environments, and easier to debug issues.

The variables configured include:

CUDA_HOME
PATH
LD_LIBRARY_PATH
INCLUDE
BAK
CUDA_CACHE_PATH
OPENMM_CUDA_COMPILER
OPENMM_PLUGIN_DIR
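
To check which of these variables actually change on a given worker, one option is to dump them before and after the wrapper runs and diff the two outputs. This is a minimal sketch: `dump_cuda_env` is a hypothetical helper name, not part of ForceBalance, and it only reports exported variables.

```shell
# Hypothetical debugging helper (not part of ForceBalance).
# Prints the exported values of the variables listed above, sorted by name.
# Variables that are unset or not exported simply do not appear in the output.
dump_cuda_env() {
    env | grep -E '^(CUDA_HOME|PATH|LD_LIBRARY_PATH|INCLUDE|BAK|CUDA_CACHE_PATH|OPENMM_CUDA_COMPILER|OPENMM_PLUGIN_DIR)=' | sort
}

# Show the current values in this shell.
dump_cuda_env
```

Running the same function from inside the wrapped job and comparing the two outputs with `diff` should show which values were replaced.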

I'm using forcebalance 1.9.5.

leeping · 2023-06-14T13:37:58Z

Thanks for bringing this up. I think runcuda.sh only wraps around targets that involve running OpenMM MD simulations using the npt.py / nvt.py scripts (such as Liquid_OpenMM). The Work Queue target that OpenFF usually uses is not wrapped by runcuda.sh because they use OpenMM to do energy minimizations and single-point calculations. You are right that the environment variables in runcuda.sh are largely out of date and most of them should be deleted. I agree it would be better if the user could specify their own environment variables.

A quick hack for a power user would be to edit the runcuda.sh file in their local install. A longer-range solution would be to add an option in the FB input file to specify a shell script that loads custom CUDA environment variables (which could include logic that loads different variables depending on the host name, if desired). The FB code would then include the environment file in the WQ input file list, and runcuda.sh would source the file if it is present.
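
The "source the file if it is present" idea could look roughly like the sketch below. The names `FB_CUDA_ENV_FILE`, `cuda_env.sh`, `load_user_env`, and `run_wrapped` are all assumptions made for illustration. None of them is an existing ForceBalance option or function, and the real runcuda.sh may be structured differently.

```shell
# Sketch of an opt-in wrapper body. All names here are hypothetical.

# Source a user-supplied environment file only if it was shipped with the job.
load_user_env() {
    env_file="${1:-./cuda_env.sh}"   # note: plain sh has no function-local variables
    if [ -f "$env_file" ]; then
        # The user's file decides what to export, including any per-host logic.
        . "$env_file"
    fi
    # When the file is absent, the caller's environment is left untouched.
}

# Load the optional environment file, then run the wrapped command unchanged.
run_wrapped() {
    load_user_env "${FB_CUDA_ENV_FILE:-./cuda_env.sh}"
    "$@"
}
```

With this shape, a wrapper script would end in something like `run_wrapped "$@"`, so a worker with no environment file runs the command in exactly the environment it was started with.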

lilyminium · 2023-06-15T07:26:50Z

Thanks for the suggestion! I did just comment out that block for my own use -- I've also written my own submission script that loads up the necessary environment for each worker, so I'm not sure FB needs to handle it at all (unless I missed an existing facility to handle workqueue workers!)

And yes, I bumped into this with the Liquid_SMIRNOFF target that subclasses Liquid :)

leeping · 2023-06-15T20:44:32Z

I think if someone uses a single WQ job submission script (on a cluster) but uses WQ for different types of jobs (such as distributing QM calculations, or running FB Liquid simulations), it could be helpful for the applications to change the environment variables. We can probably comment out all of the code blocks, leaving them as examples for any user who wants to customize their worker's environment, and then FB can default to not loading anything.
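
As a rough illustration of "commented-out examples, nothing loaded by default", the host-specific logic could be kept in a form like the following. The function name, host pattern, and paths are placeholders and are not taken from the current runcuda.sh.

```shell
# Sketch only: the host pattern and paths below are placeholders.
configure_worker_env() {
    # Use the given host name, or ask the system if none was passed.
    host="${1:-$(hostname)}"
    case "$host" in
        # Example block, disabled by default. Uncomment and adapt to enable.
        # gpu-node-*)
        #     export CUDA_HOME=/path/to/cuda
        #     export PATH="$CUDA_HOME/bin:$PATH"
        #     ;;
        *)
            # Default: leave the worker's environment exactly as it was.
            ;;
    esac
}
```

A user who needs per-host settings would uncomment and edit one block, while everyone else gets the environment their own submission script set up.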
