Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sackd): add sackd operator #53

Merged
merged 6 commits into from
Dec 10, 2024
Merged

Conversation

dsloanm
Copy link
Contributor

@dsloanm dsloanm commented Dec 9, 2024

Adds a new Slurm charm for deploying sackd: the login node daemon which gets configuration files from the controller in 'configless' setups and provides (future) support for the 'auth/slurm' authentication method.

Follows from investigation and discussion in https://github.com/orgs/charmed-hpc/discussions/11

Adds a new Slurm charm for deploying sackd: the login node daemon
which gets configuration files from the controller in 'configless'
setups and provides support for the 'auth/slurm' authentication
method.
@NucciTheBoss
Copy link
Member

IMPORTANT: We'll need to update the CHARMCRAFT_AUTH secret to include publishing rights for the sackd operator. I have already registered the name on Charmhub.

@NucciTheBoss NucciTheBoss added the enhancement New feature or request label Dec 9, 2024
Copy link
Member

@NucciTheBoss NucciTheBoss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great start for your first "from scratch" charm!

I left a few comments. Overall, I think this charm is good, but it needs to be refactored to not use juju_systemd_notices to inform Juju when sackd has started or stopped. juju_systemd_notices is built to solve the issue of when we try to start the slurmd service but there's a delay between when the compute node shares its data via Juju and when it's entered into the slurm.conf configuration file on the primary control node. sackd doesn't have this problem as sackd nodes don't need to be entered into slurm.conf, so we don't need juju_systemd_notices to watch the service.

Let me know if you have any questions!

charms/sackd/charmcraft.yaml Outdated Show resolved Hide resolved
charms/sackd/build.yaml Outdated Show resolved Hide resolved
charms/sackd/src/charm.py Outdated Show resolved Hide resolved
charms/sackd/src/charm.py Outdated Show resolved Hide resolved
charms/sackd/src/charm.py Outdated Show resolved Hide resolved
test-requirements.txt Outdated Show resolved Hide resolved
charms/sackd/requirements.txt Outdated Show resolved Hide resolved
charms/sackd/terraform/variables.tf Outdated Show resolved Hide resolved
charms/sackd/terraform/outputs.tf Show resolved Hide resolved
charms/slurmctld/src/interface_sackd.py Outdated Show resolved Hide resolved
@dsloanm
Copy link
Contributor Author

dsloanm commented Dec 10, 2024

Looks like the release of cosl 0.0.46 about an hour ago is causing integration test issues.

Looking into it now but as a quick fix, hacking PYDEPS = ["cosl==0.0.45", "pydantic"] into external/lib/charms/grafana_agent/v0/cos_agent.py gets slurmctld building again.

@dsloanm
Copy link
Contributor Author

dsloanm commented Dec 10, 2024

Added jsonschema-specifications >= 2023.03.6 as a temp fix for the build.

@NucciTheBoss
Copy link
Member

Added jsonschema-specifications >= 2023.03.6 as a temp fix for the build.

Should be able to revert this change now with the release of cosl 0.00.47: https://pypi.org/project/cosl/

@dsloanm
Copy link
Contributor Author

dsloanm commented Dec 10, 2024

Should be able to revert this change now with the release of cosl 0.00.47: https://pypi.org/project/cosl/

Yep, that's it gone now.

@NucciTheBoss
Copy link
Member

Just one tiny nit. The commit message for a819163 is feat(sack) when it should be feat(sackd).

@NucciTheBoss NucciTheBoss self-requested a review December 10, 2024 17:14
Copy link
Member

@NucciTheBoss NucciTheBoss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with the removal of juju_systemd_notices :shipit:

I'll update the repository secret in the background while you fix the commit message for a819163. We'll worry about the interface library later when we have more time for QA work.

Great work on this!

@dsloanm
Copy link
Contributor Author

dsloanm commented Dec 10, 2024

Just one tiny nit. The commit message for a819163 is feat(sack) when it should be feat(sackd).

Ah.

Fixed now.

@NucciTheBoss NucciTheBoss merged commit d438339 into charmed-hpc:main Dec 10, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants