
enhancement(slurmctld): Signal children processes #45

Closed


jamesbeedy
Contributor

Slurm process tracking is currently not configured to kill child processes of a job. These changes set SignalChildrenProcesses=yes in cgroup.conf to enable this functionality.

Fixes #37

These changes remove the unused "cluster_name" from the relation
data sent to slurmd on the slurmd relation. Additionally, make the
"cluster_name" property private.

These changes add a peer relation for the slurmctld charm and
replace the use of the slurmd interface for obtaining the
ingress_address with the new slurmctld-peer relation.

The reason for this change is that we do not want to depend on
the existence of the slurmd relation in order to know our IP.

Using a peer relation, we can always resolve our own address as long
as Juju knows the IP address of the unit.

Slurm process tracking is currently not configured to kill child
processes of a job. By default, set SignalChildrenProcesses=yes in
cgroup.conf to enable this functionality.
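As a rough illustration of the default described above, here is a minimal Python sketch of how a charm might render cgroup.conf with SignalChildrenProcesses=yes unless an operator overrides it. The names `DEFAULT_CGROUP_OPTIONS` and `render_cgroup_conf` are hypothetical and not taken from the slurmctld charm. The sketch also assumes cgroup.conf is written as plain `Key=Value` lines.

```python
from typing import Mapping, Optional

# Hypothetical defaults for illustration; only SignalChildrenProcesses comes from this PR.
DEFAULT_CGROUP_OPTIONS: dict[str, str] = {
    "SignalChildrenProcesses": "yes",
}


def render_cgroup_conf(overrides: Optional[Mapping[str, str]] = None) -> str:
    """Render cgroup.conf as Key=Value lines, applying overrides on top of the defaults."""
    options = dict(DEFAULT_CGROUP_OPTIONS)
    if overrides:
        options.update(overrides)
    # One Key=Value pair per line, ending the file with a newline.
    return "".join(f"{key}={value}\n" for key, value in options.items())


if __name__ == "__main__":
    print(render_cgroup_conf(), end="")
```

Keeping the option in a defaults mapping means an operator can still opt out, e.g. `render_cgroup_conf({"SignalChildrenProcesses": "no"})`.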
@jamesbeedy
Contributor Author

Closing this until the Slurm charms support this configuration.

@jamesbeedy jamesbeedy closed this Nov 24, 2024
@NucciTheBoss
Member

NucciTheBoss commented Nov 25, 2024

@jamesbeedy what specifically do you need for the charms to support this configuration in cgroup.conf? Do we need to modify which cgroup version is running on the machine?

@jamesbeedy
Contributor Author

> @jamesbeedy what specifically do you need for the charms to support this configuration in cgroup.conf? Do we need to modify which cgroup version is running on the machine?

It needs Slurm > 23.02.

@NucciTheBoss
Member

Ah, that'll do it. We'll have at least Slurm 23.11 once the fixes for the apt charm library have landed for Noble. That should be sometime this week.
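Since the option needs Slurm newer than 23.02, a charm would probably want to gate it on the installed version. A minimal sketch, assuming the version is available as a string such as `23.11.4`; `parse_slurm_version` and `supports_signal_children_processes` are hypothetical helpers, not part of the Slurm charms.

```python
import re


def parse_slurm_version(version: str) -> tuple[int, int]:
    """Extract (year, month) from a Slurm version string such as "23.11.4"."""
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Slurm version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_signal_children_processes(version: str) -> bool:
    """Return True if the installed Slurm is newer than the 23.02 release series."""
    return parse_slurm_version(version) > (23, 2)
```

Comparing (year, month) tuples keeps the check independent of patch releases, so every 23.02.x release is rejected while 23.11 and later pass.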

Development

Successfully merging this pull request may close these issues.

Enable SignalChildrenProcesses by default
2 participants