This repository has been archived by the owner on Aug 9, 2024. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
These changes make a number of modifications to the slurm charms.

* Removes no longer used interfaces
* remove slurmctld dependency on slurmdbd and slurmd
* support partitions with 0 nodes
* refactor relation data
* recreate how configs are written
* support user supplied partition configuration
* start modeling config
* add type checking
* remove unused code
* support partition events and slurmd node events
* remove upgrade-charm hook as it did nothing
* use systemd and apt charm libs
* rename interfaces
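The commit message mentions user-supplied configuration and a rework of how configs are written. The sketch below is not taken from the charm's source. It is one hypothetical way a multiline `Key=Value` string, such as the value of the `slurm-conf-parameters` option, could be parsed and layered over charm defaults. The function names `parse_conf_parameters`, `merge_conf`, and `render_conf` are assumptions made up for illustration, and the sketch ignores Slurm details such as case-insensitive keys.

```python
def parse_conf_parameters(raw: str) -> dict[str, str]:
    """Parse a multiline `Key=Value` string into a dict.

    Hypothetical helper: blank lines and `#` comments are skipped, and
    only the first `=` splits key from value, so values such as
    `task=30,network=30` survive intact.
    """
    params: dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected Key=Value, got: {line!r}")
        params[key.strip()] = value.strip()
    return params


def merge_conf(defaults: dict[str, str], user: dict[str, str]) -> dict[str, str]:
    """Return defaults overlaid with user values (user wins on conflicts)."""
    return {**defaults, **user}


def render_conf(params: dict[str, str]) -> str:
    """Render parameters as `Key=Value` lines, one per line."""
    return "\n".join(f"{key}={value}" for key, value in params.items()) + "\n"


if __name__ == "__main__":
    # Assumed defaults for illustration only; not the charm's real defaults.
    defaults = {"ClusterName": "osd-cluster", "ProctrackType": "proctrack/cgroup"}
    user = parse_conf_parameters("FirstJobId=1234\nJobAcctGatherFrequency=task=30,network=30")
    print(render_conf(merge_conf(defaults, user)), end="")
```

Letting user values override defaults is one possible design choice; a charm could just as reasonably reject keys that it manages itself.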
1 parent 81d201d, commit 0619c6e
Showing 22 changed files with 1,252 additions and 1,983 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,3 +8,5 @@ __pycache__/ | |
.idea | ||
.vscode/ | ||
version | ||
.mypy_cache | ||
.ruff_cache |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,28 +22,15 @@ links: | |
source: | ||
- https://github.com/charmed-hpc/slurmctld-operator | ||
|
||
peers: | ||
slurmctld-peer: | ||
interface: slurmctld-peer | ||
requires: | ||
slurmd: | ||
interface: slurmd | ||
slurmdbd: | ||
interface: slurmdbd | ||
slurmrestd: | ||
interface: slurmrestd | ||
influxdb-api: | ||
interface: influxdb-api | ||
elasticsearch: | ||
interface: elasticsearch | ||
fluentbit: | ||
interface: fluentbit | ||
provides: | ||
prolog-epilog: | ||
interface: prolog-epilog | ||
grafana-source: | ||
interface: grafana-source | ||
scope: global | ||
|
||
assumes: | ||
- juju | ||
|
@@ -58,134 +45,80 @@ bases: | |
channel: "22.04" | ||
architectures: [amd64] | ||
|
||
parts: | ||
charm: | ||
build-packages: [git] | ||
charm-python-packages: [setuptools] | ||
|
||
# Create a version file and pack it into the charm. This is dynamically generated | ||
# as part of the build process for a charm to ensure that the git revision of the | ||
# charm is always recorded in this version file. | ||
version-file: | ||
plugin: nil | ||
build-packages: | ||
- git | ||
override-build: | | ||
VERSION=$(git -C $CRAFT_PART_SRC/../../charm/src describe --dirty --always) | ||
echo "Setting version to $VERSION" | ||
echo $VERSION > $CRAFT_PART_INSTALL/version | ||
stage: | ||
- version | ||
|
||
config: | ||
options: | ||
custom-slurm-repo: | ||
type: string | ||
default: "" | ||
description: > | ||
Use a custom repository for Slurm installation. | ||
This can be set to the Organization's local mirror/cache of packages and | ||
supersedes the Omnivector repositories. Alternatively, it can be used to | ||
track a `testing` Slurm version, e.g. by setting to | ||
`ppa:omnivector/osd-testing`. | ||
|
||
Note: The configuration `custom-slurm-repo` must be set *before* | ||
deploying the units. Changing this value after deploying the units will | ||
not reinstall Slurm. | ||
cluster-name: | ||
type: string | ||
default: osd-cluster | ||
description: > | ||
description: | | ||
Name to be recorded in database for jobs from this cluster. | ||
This is important if a single database is used to record information from | ||
multiple Slurm-managed clusters. | ||
|
||
default-partition: | ||
type: string | ||
default: "" | ||
description: > | ||
description: | | ||
Default Slurm partition. This is only used if defined, and must match an | ||
existing partition. | ||
custom-config: | ||
slurm-conf-parameters: | ||
type: string | ||
default: "" | ||
description: > | ||
User supplied Slurm configuration. | ||
This value supplements the charm supplied `slurm.conf` that is used for | ||
Slurm Controller and Compute nodes. | ||
description: | | ||
User supplied Slurm configuration as a multiline string. | ||
Example usage: | ||
$ juju config slurmcltd custom-config="FirstJobId=1234" | ||
proctrack-type: | ||
type: string | ||
default: proctrack/cgroup | ||
description: > | ||
Identifies the plugin to be used for process tracking on a job step | ||
basis. | ||
cgroup-config: | ||
$ juju config slurmcltd slurm-conf-parameters="$(cat additional.conf)" | ||
|
||
cgroup-parameters: | ||
type: string | ||
default: | | ||
CgroupAutomount=yes | ||
ConstrainCores=yes | ||
description: > | ||
Configuration content for `cgroup.conf`. | ||
description: | | ||
User supplied configuration for `cgroup.conf`. | ||
health-check-params: | ||
default: "" | ||
type: string | ||
description: > | ||
description: | | ||
Extra parameters for NHC command. | ||
This option can be used to customize how NHC is called, e.g. to send an | ||
e-mail to an admin when NHC detects an error set this value to | ||
e-mail to an admin when NHC detects an error set this value to. | ||
`-M [email protected]`. | ||
|
||
health-check-interval: | ||
default: 600 | ||
type: int | ||
description: Interval in seconds between executions of the Health Check. | ||
|
||
health-check-state: | ||
default: "ANY,CYCLE" | ||
type: string | ||
description: Only run the Health Check on nodes in this state. | ||
|
||
acct-gather-frequency: | ||
type: string | ||
default: "task=30" | ||
description: > | ||
Accounting and profiling sampling intervals for the acct_gather plugins. | ||
Note: A value of `0` disables the periodic sampling. In this case, the | ||
accounting information is collected when the job terminates. | ||
|
||
Example usage: | ||
$ juju config slurmcltd acct-gather-frequency="task=30,network=30" | ||
acct-gather-custom: | ||
type: string | ||
default: "" | ||
description: > | ||
User supplied `acct_gather.conf` configuration. | ||
This value supplements the charm supplied `acct_gather.conf` file that is | ||
used for configuring the acct_gather plugins. | ||
|
||
actions: | ||
show-current-config: | ||
description: > | ||
description: | | ||
Display the currently used `slurm.conf`. | ||
Note: This file only exists in `slurmctld` charm and is automatically | ||
distributed to all compute nodes by Slurm. | ||
|
||
Example usage: | ||
$ juju run-action slurmctld/leader --format=json --wait | jq .[].results.slurm.conf | xargs -I % -0 python3 -c 'print(%)' | ||
|
||
```bash | ||
juju run slurmctld/leader show-current-config \ | ||
--quiet --format=json | jq .[].results.slurm.conf | xargs -I % -0 python3 -c 'print(%)' | ||
``` | ||
|
||
drain: | ||
description: > | ||
description: | | ||
Drain specified nodes. | ||
Example usage: | ||
$ juju run-action slurmctld/leader drain nodename=node-[1,2] reason="Updating kernel" | ||
$ juju run slurmctld/leader drain nodename=node-[1,2] reason="Updating kernel" | ||
params: | ||
nodename: | ||
type: string | ||
|
@@ -197,24 +130,17 @@ actions: | |
- nodename | ||
- reason | ||
resume: | ||
description: > | ||
description: | | ||
Resume specified nodes. | ||
Note: Newly added nodes will remain in the `down` state until configured, | ||
with the `node-configured` action. | ||
|
||
Example usage: $ juju run-action slurmctld/leader resume nodename=node-[1,2] | ||
Example usage: $ juju run slurmctld/leader resume nodename=node-[1,2] | ||
params: | ||
nodename: | ||
type: string | ||
description: > | ||
description: | | ||
The nodes to resume, using the Slurm format, e.g. `node-[1,2]`. | ||
required: | ||
- nodename | ||
|
||
influxdb-info: | ||
description: > | ||
Get InfluxDB info. | ||
This action returns the host, port, username, password, database, and | ||
retention policy regarding to InfluxDB. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,3 @@ | ||
ops==2.* | ||
influxdb==5.3.1 | ||
jinja2==3.1.3 | ||
distro | ||
pycryptodome |