Using cgroups to limit job resource usage #7853

Open
chrisburr opened this issue Oct 23, 2024 · 2 comments

@chrisburr
Member

We recently failed to add support for cgroups in #7723 because no D-Bus session bus instance is started for the non-interactive jobs launched by the batch system.

@fstagni pointed out to me that ALICE has made progress with this in:

https://indico.cern.ch/event/1338689/contributions/6010982/attachments/2952948/5191366/CHEP2024_Subdivision_GM.pdf

We should see whether we can benefit from it as well.

cc @sfayer as you helped me understand this in the first place.

@fstagni
Contributor

fstagni commented Oct 23, 2024

A few additional minor points after talking with the author from ALICE:

  • they patched HTCondor and Slurm
  • the worker node (WN) needs to run RHEL 9
  • they create a "slot cgroup" and sub-cgroups inside it
  • they first create the cgroup, launch Apptainer inside it, and then run the job inside Apptainer (IIUC we were trying to do the opposite)
  • check the backup slides for some implementation details (and caveats)
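A minimal sketch of that nesting (slot cgroup, then a sub-cgroup, then Apptainer, then the job), assuming a cgroup v2 host where the batch system has already delegated the slot cgroup to the job user. This is not ALICE's implementation: the mount point, the leaf names "pilot" and "job", the memory-only limit and the launch command are all illustrative assumptions. The one hard constraint it encodes is cgroup v2's "no internal processes" rule, which is why the caller must leave the slot cgroup before controllers can be enabled for its children.

```python
import os
import subprocess
from pathlib import Path

# Assumption: unified cgroup v2 hierarchy mounted here (the default on RHEL 9).
CGROUP_ROOT = Path("/sys/fs/cgroup")


def current_cgroup(proc_cgroup_text: str) -> str:
    """Return the cgroup v2 path from the contents of /proc/self/cgroup.

    On a cgroup v2 host the relevant line has the form "0::/some/path".
    """
    for line in proc_cgroup_text.splitlines():
        hierarchy_id, _controllers, path = line.split(":", 2)
        if hierarchy_id == "0":
            return path
    raise RuntimeError("no cgroup v2 entry found")


def setup_job_cgroup(slot_dir: Path, pid: int, memory_max: str) -> Path:
    """Split a delegated slot cgroup into two leaves and return the job leaf.

    cgroup v2 does not let a non-root cgroup both hold processes and enable
    controllers for its children, so `pid` is first moved into a leaf.
    The leaf names "pilot" and "job" are hypothetical.
    """
    pilot_dir = slot_dir / "pilot"
    job_dir = slot_dir / "job"
    pilot_dir.mkdir(exist_ok=True)
    job_dir.mkdir(exist_ok=True)
    # Move the caller out of the slot cgroup itself (any other processes in
    # the slot cgroup would have to be moved too; this sketch only moves pid).
    (pilot_dir / "cgroup.procs").write_text(f"{pid}\n")
    # Allow the children to use the memory controller...
    (slot_dir / "cgroup.subtree_control").write_text("+memory\n")
    # ...and cap only the job leaf.
    (job_dir / "memory.max").write_text(f"{memory_max}\n")
    return job_dir


def launch_in_cgroup(job_dir: Path, argv: list[str]) -> subprocess.Popen:
    """Start argv with the child process already placed in job_dir."""

    def enter_cgroup() -> None:
        # Runs in the forked child before exec: it writes its own PID, so
        # whatever it execs (e.g. apptainer) starts inside job_dir.
        (job_dir / "cgroup.procs").write_text(f"{os.getpid()}\n")

    return subprocess.Popen(argv, preexec_fn=enter_cgroup)
```

On a worker node this might be used as `slot = CGROUP_ROOT / current_cgroup(Path("/proc/self/cgroup").read_text()).lstrip("/")`, then `job = setup_job_cgroup(slot, os.getpid(), "2G")` and `launch_in_cgroup(job, ["apptainer", "exec", "image.sif", "./job.sh"])`, where the image and script names are placeholders. Entering the cgroup before exec'ing Apptainer matches the ordering described above: the container and everything it starts inherit the limit without needing a session D-Bus.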

@sfayer
Member

sfayer commented Oct 23, 2024

Hi,

I had a quick look at our production HTCondor at IC, and we do seem to have the patch already: a few of the cgroup control files in the per-slot cgroup tree are owned by the job user (rather than root), so it should be possible to create custom groups below that.

We're running the feature release of HTCondor (to fix an unrelated cgroup bug!), so I'm not sure whether it'll be available at most HTCondor sites yet, but it certainly looks promising... I can't comment on the other batch systems; from the talk, I got the impression they were less optimistic about getting the patches generally available for those.

Regards,
Simon
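A job could test for the ownership described in this comment with something like the sketch below (Python, assuming cgroup v2). The kernel's cgroup v2 documentation describes delegation as giving the user write access to the cgroup directory and its `cgroup.procs`, `cgroup.threads` and `cgroup.subtree_control` files; which of these the patched HTCondor actually chowns is not confirmed here, so the default file list is an assumption and is exposed as a parameter.

```python
from pathlib import Path

# Assumption: the files a job needs to own to manage sub-cgroups. The kernel
# docs also list "cgroup.threads"; add it if your batch system delegates it.
DEFAULT_FILES = ("cgroup.procs", "cgroup.subtree_control")


def is_delegated(cgroup_dir: Path, uid: int, files=DEFAULT_FILES) -> bool:
    """Return True if cgroup_dir and the given control files are owned by uid."""
    paths = [cgroup_dir] + [cgroup_dir / name for name in files]
    try:
        return all(p.stat().st_uid == uid for p in paths)
    except FileNotFoundError:
        # Missing directory or control file: treat as not delegated.
        return False
```

Called as `is_delegated(slot_dir, os.getuid())` from inside the job, a `True` result suggests the job can create and populate sub-cgroups itself; a `False` result means either the site's HTCondor lacks the patch or it delegates a different set of files than assumed here.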

fstagni added this to the v8.0 milestone on Oct 29, 2024