Make sure the slurm user exists before starting slurm(d/ctld/dbd) #73
Comments
On 2017-01-17 12:57, Johan Guldmyr wrote:
This one time slurmd couldn't find the unix user slurm when it started.
Maybe because ypbind was not up yet.
We could add extra systemd settings to the systemd scripts in:
/etc/systemd/system/
Not sure which of After, Wants or Requires should be used.
Well, what if you don't use ypbind at all? (ldap for example).
…--
Ulf Tigerstedt || Senior systems specialist
CSC Oy || NeIC NT1 / NDGF GSM +358503818558
Johannesbergsvägen 17 || Närpes || Finland
Having it require ypbind didn't help anyway - maybe the startup parallelism is still too aggressive. Requires=remote-fs.target did not solve it either. Maybe one can have systemd restart slurmd once or twice, with some delay in between, if it fails to start? For LDAP I created fgci-org/fgci-ansible#176
Adding Restart=on-failure and increasing the restart interval in /etc/systemd/system/slurmd.service.d/slurmd_extra.conf helps in some cases. For services that have Restart= configured, the default is to attempt a restart up to five times, but with only a 100ms interval between attempts.
From /etc/systemd/system.conf on CentOS 7.3:
    #DefaultRestartSec=100ms
    #DefaultStartLimitInterval=10s
    #DefaultStartLimitBurst=5
The drop-in:
    [Service]
    Restart=on-failure
    RestartSec=20
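Besides restarting on failure, the unit could also wait for the account to become resolvable before slurmd starts, which would work the same way for ypbind and LDAP. A minimal sketch, assuming a helper script of our own: the function name wait_for_user and its defaults (5 tries, 20 seconds apart) are made up for illustration and are not part of Slurm or of the merged change.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): wait until a Unix account resolves.
# usage: wait_for_user NAME [TRIES] [DELAY_SECONDS]
wait_for_user() {
    user=$1
    tries=${2:-5}     # assumed default: 5 attempts
    delay=${3:-20}    # assumed default: 20 seconds between attempts
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        # id(1) looks the name up through NSS, so NIS, LDAP and local
        # accounts are all covered by the same check.
        if id -u "$user" >/dev/null 2>&1; then
            return 0
        fi
        # No point in sleeping after the final attempt.
        if [ "$attempt" -lt "$tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    echo "user '$user' not resolvable after $tries attempts" >&2
    return 1
}
```

A script that sources this and ends with `wait_for_user slurm` could be hooked in through an ExecStartPre= line in the same drop-in file. The total wait should stay below the unit's start timeout (90 seconds by default in systemd), otherwise the start job is aborted before the last attempt.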
- currently defaults to doing this only for slurmd, but optionally allows enabling it for slurmdbd and slurmctld as well - #73
Closing this, as #74 has been merged.
When rebooting a compute node many times in sequence, slurmd quite often couldn't find the Unix user slurm when it started.
Maybe because ypbind was not up yet "completely".
We could add extra systemd settings to the systemd scripts in:
/etc/systemd/system/slurmd.service.d/myfile.conf
Not sure which settings should be used.