This repository has been archived by the owner on Dec 19, 2024. It is now read-only.

#Draft/WIP: Simplify lookup of scratch directory #16

Closed

Conversation

rahmans1
Contributor

Briefly, what does this PR introduce?

Different systems use different scratch spaces, so the scratch directory can be set in the template file specific to the site the jobs are submitted from. For example, on OSG (https://github.com/eic/job_submission_condor/blob/main/templates/osg_csv.sh.in):

export TMPDIR=${_CONDOR_SCRATCH_DIR}

On JLAB slurm,
export TMPDIR=/scratch/slurm

On CEDAR/narval slurm,

export TMPDIR=$SLURM_TMPDIR
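
As a rough sketch only, the three cases above could also be gathered into one helper that a template calls with its own site label. The function name `select_scratch_dir` and the labels `osg`, `jlab` and `cedar` are hypothetical and are not taken from the existing templates.

```shell
# Hypothetical helper: map a site label to the scratch directory listed above.
# The labels (osg, jlab, cedar) are illustrative names, not from the templates.
select_scratch_dir() {
  case "$1" in
    osg)   printf '%s\n' "${_CONDOR_SCRATCH_DIR:-}" ;;  # set by HTCondor on the worker
    jlab)  printf '%s\n' "/scratch/slurm" ;;            # fixed path on JLAB slurm
    cedar) printf '%s\n' "${SLURM_TMPDIR:-}" ;;         # set by Slurm on CEDAR/narval
    *)     return 1 ;;                                  # unknown site label
  esac
}

# Possible use inside a site template (label is an assumption):
#   export TMPDIR="$(select_scratch_dir jlab)"
```

Keeping a plain `export TMPDIR=...` line in each template, as proposed above, is simpler. The helper only matters if one script has to serve several sites.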

What kind of change does this PR introduce?

  • Bug fix (issue #__)
  • New feature (issue #__)
  • Documentation update
  • Other: __

Please check if this PR fulfills the following:

  • Tests for the changes have been added
  • Documentation has been added / updated
  • Changes have been communicated to collaborators

Does this PR introduce breaking changes? What changes might users need to make to their code?

Does this PR change default behavior?

@rahmans1 rahmans1 requested a review from wdconinc May 16, 2024 17:42
@rahmans1 rahmans1 marked this pull request as draft May 16, 2024 17:43
@wdconinc
Contributor

How does this deal with TMPDIRs that are only set on the node itself, like the /localscratch/epicprod.$JOBID?

@rahmans1
Contributor Author

> How does this deal with TMPDIRs that are only set on the node itself, like the /localscratch/epicprod.$JOBID?

The $SLURM_TMPDIR variable usually points to the /localscratch location, but JLab, for example, recommends using /scratch/slurm as the working directory: https://scicomp2015.jlab.org/docs/slurm_file

Similarly, $_CONDOR_SCRATCH_DIR points to /srv for jobs submitted through OSG, but on BNL condor that variable is set to /tmp, and that location does not seem to be writable.

It would be an easy fix if all of the servers had the same variable pointing to the same location, but that does not seem to be the case. This may require a bit of trial and error, and it may simply not work.
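
One way to reduce the trial and error might be to check candidates at run time and take the first one that is actually writable. This is a minimal sketch, not what the PR implements; the function name `first_writable_dir` and the candidate order in the usage comment are assumptions.

```shell
# Print the first argument that is an existing, writable directory.
# Returns non-zero if no candidate qualifies.
first_writable_dir() {
  for dir in "$@"; do
    # Skip empty entries, e.g. from variables that are unset on this system.
    [ -n "$dir" ] || continue
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Possible use in an environment script (candidate order is an assumption):
#   if scratch="$(first_writable_dir "${_CONDOR_SCRATCH_DIR:-}" "${SLURM_TMPDIR:-}" /scratch/slurm)"; then
#     export TMPDIR="$scratch"
#   fi
```

This would skip a location such as the non-writable /tmp seen on BNL condor, but it cannot tell which writable location a site actually recommends.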

But I was thinking of something like export TMPDIR=/localscratch/epicprod$SLURM_ARRAY_JOB_ID.$SLURM_ARRAY_TASK_ID in the template file, if we know the pattern of the folders that slurm creates automatically. The environment script doesn't get executed until the job is running, so the variables should be populated by then.
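
A sketch of that idea, with a guard for the case where the array variables are not set (for example outside a running array job). The path pattern is copied from the suggestion above and is still only a guess at how slurm names the folder; the function name `slurm_array_scratch` is hypothetical.

```shell
# Build the per-task scratch path from the pattern proposed above.
# The pattern is an unverified guess at the folder name slurm creates.
slurm_array_scratch() {
  # Both IDs are only set inside a running array job.
  if [ -z "${SLURM_ARRAY_JOB_ID:-}" ] || [ -z "${SLURM_ARRAY_TASK_ID:-}" ]; then
    return 1
  fi
  printf '%s\n' "/localscratch/epicprod${SLURM_ARRAY_JOB_ID}.${SLURM_ARRAY_TASK_ID}"
}

# Possible use in the template; TMPDIR is left untouched if the IDs are missing:
#   if scratch="$(slurm_array_scratch)"; then export TMPDIR="$scratch"; fi
```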

Basically, _CONDOR_SCRATCH_DIR works for all OSG submissions, but it's not going to work for dedicated running on BNL condor or on the slurm servers.

@rahmans1 rahmans1 force-pushed the feature-user-defined-scratch-dir-on-remote-node branch from a732ade to c030509 Compare May 17, 2024 12:33
@rahmans1 rahmans1 closed this Dec 19, 2024
2 participants