Missing PID file for pscheduler service #1364

Closed
arlake228 opened this issue Nov 3, 2023 · 1 comment

@arlake228
Collaborator

A number of users have reported issues where the pScheduler service shows as not running on the Toolkit web page. This appears to be caused by a missing PID file for one of the services. So far, I have only been able to access a host with this issue once, and the service in question was pscheduler-archiver. I don't have enough data points to know whether it is limited to this service or affects others as well, since any one process missing a PID file would cause the Toolkit to display it this way. It also seems to be most prevalent on EL9 systems.

If you look at systemctl status pscheduler-archiver while in this state, it is listed as running. Also, systemctl show pscheduler-archiver has NRestarts as 0, which makes me think this wasn't the result of some automatic restart by systemd. I was able to fix this by restarting the archiver process by hand with systemctl, at which point the PID file was created. What's also strange is that before the restart no results were getting archived, but after restarting the process I started getting results on the host, so I think this is more than just a display issue on the Toolkit page. Output of the systemctl commands is below:

systemctl status pscheduler-archiver

# systemctl status pscheduler-archiver
● pscheduler-archiver.service - pScheduler server - archiver
     Loaded: loaded (/usr/lib/systemd/system/pscheduler-archiver.service; enabled; preset: disabled)
     Active: active (running) since Fri 2023-09-29 06:18:24 CEST; 1 month 4 days ago
   Main PID: 987086 (archiver)
      Tasks: 9 (limit: 76198)
     Memory: 121.4M
        CPU: 23min 20.660s
     CGroup: /system.slice/pscheduler-archiver.service
             ├─236298 /usr/bin/python3 /usr/libexec/pscheduler/daemons/archiver --dsn @/etc/pscheduler/database/database-dsn --pid-file /run/pscheduler-ser>
             ├─236306 python3 /usr/libexec/pscheduler/classes/archiver/esmond/archive
             ├─236308 python3 /usr/libexec/pscheduler/classes/archiver/http/archive
             ├─236309 python3 /usr/libexec/pscheduler/classes/archiver/http/archive
             └─987086 /usr/bin/python3 /usr/libexec/pscheduler/daemons/archiver --dsn @/etc/pscheduler/database/database-dsn --pid-file /run/pscheduler-ser>

Nov 03 15:03:58 perfsonar01-iep-grid.saske.sk archiver[225158]: archiver WARNING  2964: Failed to archive https://perfsonar01-iep-grid.saske.sk/pscheduler/>
Nov 03 15:03:58 perfsonar01-iep-grid.saske.sk archiver[225158]: archiver WARNING  2964: Gave up archiving https://perfsonar01-iep-grid.saske.sk/pscheduler/>
Nov 03 15:04:35 perfsonar01-iep-grid.saske.sk archiver[225158]: archiver WARNING  2967: Failed to archive https://perfsonar01-iep-grid.saske.sk/pscheduler/>
Nov 03 15:04:35 perfsonar01-iep-grid.saske.sk archiver[225158]: archiver WARNING  2967: Gave up archiving https://perfsonar01-iep-grid.saske.sk/pscheduler/>
Nov 03 15:04:38 perfsonar01-iep-grid.saske.sk archiver[225158]: archiver WARNING  2970: Failed to archive https://perfsonar01-iep-grid.saske.sk/pscheduler/>
Nov 03 15:04:38 perfsonar01-iep-grid.saske.sk archiver[225158]: archiver WARNING  2970: Gave up archiving https://perfsonar01-iep-grid.saske.sk/pscheduler/>
Nov 03 15:06:50 perfsonar01-iep-grid.saske.sk archiver[236298]: archiver WARNING  2973: Failed to archive https://perfsonar01-iep-grid.saske.sk/pscheduler/>
Nov 03 15:06:50 perfsonar01-iep-grid.saske.sk archiver[236298]: archiver WARNING  2973: Gave up archiving https://perfsonar01-iep-grid.saske.sk/pscheduler/>
Nov 03 15:07:29 perfsonar01-iep-grid.saske.sk archiver[236298]: archiver WARNING  2976: Failed to archive https://perfsonar01-iep-grid.saske.sk/pscheduler/>
Nov 03 15:07:29 perfsonar01-iep-grid.saske.sk archiver[236298]: archiver WARNING  2976: Gave up archiving https://perfsonar01-iep-grid.saske.sk/pscheduler/>

systemctl show pscheduler-archiver

# systemctl show pscheduler-archiver
Type=simple
ExitType=main
Restart=always
PIDFile=/run/pscheduler-server/archiver/pid
NotifyAccess=none
RestartUSec=3s
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
TimeoutAbortUSec=1min 30s
TimeoutStartFailureMode=terminate
TimeoutStopFailureMode=terminate
RuntimeMaxUSec=infinity
RuntimeRandomizedExtraUSec=0
WatchdogUSec=0
WatchdogTimestampMonotonic=0
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=987086
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
ReloadResult=success
CleanResult=success
UID=972
GID=972
NRestarts=0
OOMPolicy=stop
ExecMainStartTimestamp=Fri 2023-09-29 06:18:24 CEST
ExecMainStartTimestampMonotonic=914780404147
ExecMainExitTimestampMonotonic=0
ExecMainPID=987086
ExecMainCode=0
ExecMainStatus=0
ExecStartPre={ path=/bin/mkdir ; argv[]=/bin/mkdir -p /run/pscheduler-server/archiver ; ignore_errors=yes ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStartPre={ path=/bin/chown ; argv[]=/bin/chown pscheduler:pscheduler /run/pscheduler-server/archiver ; ignore_errors=yes ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStartPre={ path=/bin/chmod ; argv[]=/bin/chmod 755 /run/pscheduler-server/archiver ; ignore_errors=yes ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStartPre={ path=/bin/mkdir ; argv[]=/bin/mkdir -p /run/pscheduler-server/archiver/tmp ; ignore_errors=yes ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStartPre={ path=/bin/chmod ; argv[]=/bin/chmod 700 /run/pscheduler-server/archiver/tmp ; ignore_errors=yes ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStartPre={ path=/bin/chown ; argv[]=/bin/chown pscheduler:pscheduler /run/pscheduler-server/archiver/tmp ; ignore_errors=yes ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0>
ExecStartPre={ path=/bin/sh ; argv[]=/bin/sh -c if [ -r /etc/pscheduler/daemons/archiver.conf ]; then opts=$(sed -e 's/#.*$//' /etc/pscheduler/daemons/archiver.conf); echo OPTIONS=$opts > /run/pschedu>
ExecStartPreEx={ path=/bin/mkdir ; argv[]=/bin/mkdir -p /run/pscheduler-server/archiver ; flags=ignore-failure ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStartPreEx={ path=/bin/chown ; argv[]=/bin/chown pscheduler:pscheduler /run/pscheduler-server/archiver ; flags=ignore-failure ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/>
ExecStartPreEx={ path=/bin/chmod ; argv[]=/bin/chmod 755 /run/pscheduler-server/archiver ; flags=ignore-failure ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStartPreEx={ path=/bin/mkdir ; argv[]=/bin/mkdir -p /run/pscheduler-server/archiver/tmp ; flags=ignore-failure ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStartPreEx={ path=/bin/chmod ; argv[]=/bin/chmod 700 /run/pscheduler-server/archiver/tmp ; flags=ignore-failure ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStartPreEx={ path=/bin/chown ; argv[]=/bin/chown pscheduler:pscheduler /run/pscheduler-server/archiver/tmp ; flags=ignore-failure ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; statu>
ExecStartPreEx={ path=/bin/sh ; argv[]=/bin/sh -c if [ -r /etc/pscheduler/daemons/archiver.conf ]; then opts=$(sed -e 's/#.*$//' /etc/pscheduler/daemons/archiver.conf); echo OPTIONS=$opts > /run/psche>
ExecStart={ path=/usr/libexec/pscheduler/daemons/archiver ; argv[]=/usr/libexec/pscheduler/daemons/archiver --dsn @/etc/pscheduler/database/database-dsn $OPTIONS --pid-file /run/pscheduler-server/arch>
ExecStartEx={ path=/usr/libexec/pscheduler/daemons/archiver ; argv[]=/usr/libexec/pscheduler/daemons/archiver --dsn @/etc/pscheduler/database/database-dsn $OPTIONS --pid-file /run/pscheduler-server/ar>
ExecStopPost={ path=/bin/rm ; argv[]=/bin/rm -rf /run/pscheduler-server/archiver ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStopPostEx={ path=/bin/rm ; argv[]=/bin/rm -rf /run/pscheduler-server/archiver ; flags= ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
Slice=system.slice
ControlGroup=/system.slice/pscheduler-archiver.service
ControlGroupId=204352
MemoryCurrent=127320064
MemoryAvailable=infinity
CPUUsageNSec=1393784665000
TasksCurrent=9
IPIngressBytes=[no data]
IPIngressPackets=[no data]
IPEgressBytes=[no data]
IPEgressPackets=[no data]
IOReadBytes=18446744073709551615
IOReadOperations=18446744073709551615
IOWriteBytes=18446744073709551615
IOWriteOperations=18446744073709551615
Delegate=no
CPUAccounting=yes
CPUWeight=[not set]
StartupCPUWeight=[not set]
CPUShares=[not set]
StartupCPUShares=[not set]
CPUQuotaPerSecUSec=infinity
CPUQuotaPeriodUSec=infinity
IOAccounting=no
IOWeight=[not set]
StartupIOWeight=[not set]
BlockIOAccounting=no
BlockIOWeight=[not set]
StartupBlockIOWeight=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
DevicePolicy=auto
TasksAccounting=yes
TasksMax=76198
IPAccounting=no
ManagedOOMSwap=auto
ManagedOOMMemoryPressure=auto
ManagedOOMMemoryPressureLimit=0
ManagedOOMPreference=none
Environment=TMPDIR=/run/pscheduler-server/archiver/tmp
EnvironmentFiles=/run/pscheduler-server/archiver/options (ignore_errors=yes)
UMask=0022
LimitCPU=infinity
LimitCPUSoft=infinity
LimitFSIZE=infinity
LimitFSIZESoft=infinity
LimitDATA=infinity
LimitDATASoft=infinity
LimitSTACK=infinity
LimitSTACKSoft=8388608
LimitCORE=infinity
LimitCORESoft=0
LimitRSS=infinity
LimitRSSSoft=infinity
LimitNOFILE=32768
LimitNOFILESoft=32768
LimitAS=infinity
LimitASSoft=infinity
LimitNPROC=32768
LimitNPROCSoft=32768
LimitMEMLOCK=8388608
LimitMEMLOCKSoft=8388608
LimitLOCKS=infinity
LimitLOCKSSoft=infinity
LimitSIGPENDING=47624
LimitSIGPENDINGSoft=47624
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=infinity
LimitRTTIMESoft=infinity
OOMScoreAdjust=0
CoredumpFilter=0x33
Nice=0
IOSchedulingClass=2
IOSchedulingPriority=4
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
CPUAffinityFromNUMA=no
NUMAPolicy=n/a
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardOutput=journal
StandardError=journal
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SyslogLevel=6
SyslogFacility=3
LogLevelMax=-1
LogRateLimitIntervalUSec=0
LogRateLimitBurst=0
SecureBits=0
CapabilityBoundingSet=cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net>
User=pscheduler
Group=pscheduler
DynamicUser=no
RemoveIPC=no
PrivateTmp=no
PrivateDevices=no
ProtectClock=no
ProtectKernelTunables=no
ProtectKernelModules=no
ProtectKernelLogs=no
ProtectControlGroups=no
PrivateNetwork=no
PrivateUsers=no
PrivateMounts=no
PrivateIPC=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
UtmpMode=init
IgnoreSIGPIPE=yes
NoNewPrivileges=no
SystemCallErrorNumber=2147483646
LockPersonality=no
RuntimeDirectoryPreserve=no
RuntimeDirectoryMode=0755
StateDirectoryMode=0755
CacheDirectoryMode=0755
LogsDirectoryMode=0755
ConfigurationDirectoryMode=0755
TimeoutCleanUSec=infinity
MemoryDenyWriteExecute=no
RestrictRealtime=no
RestrictSUIDSGID=no
RestrictNamespaces=no
MountAPIVFS=no
KeyringMode=private
ProtectProc=default
ProcSubset=all
ProtectHostname=no
KillMode=control-group
KillSignal=15
RestartKillSignal=15
FinalKillSignal=9
SendSIGKILL=yes
SendSIGHUP=no
WatchdogSignal=6
Id=pscheduler-archiver.service
Names=pscheduler-archiver.service
Requires=system.slice sysinit.target
WantedBy=postgresql.service multi-user.target
Conflicts=shutdown.target
Before=psconfig-pscheduler-agent.service multi-user.target shutdown.target
After=basic.target system.slice systemd-journald.socket sysinit.target
Description=pScheduler server - archiver
LoadState=loaded
ActiveState=active
FreezerState=running
SubState=running
FragmentPath=/usr/lib/systemd/system/pscheduler-archiver.service
UnitFileState=enabled
UnitFilePreset=disabled
StateChangeTimestamp=Fri 2023-09-29 06:18:24 CEST
StateChangeTimestampMonotonic=914780406237
InactiveExitTimestamp=Fri 2023-09-29 06:18:24 CEST
InactiveExitTimestampMonotonic=914780368485
ActiveEnterTimestamp=Fri 2023-09-29 06:18:24 CEST
ActiveEnterTimestampMonotonic=914780404242
ActiveExitTimestamp=Fri 2023-09-29 06:18:23 CEST
ActiveExitTimestampMonotonic=914779007828
InactiveEnterTimestamp=Fri 2023-09-29 06:18:23 CEST
InactiveEnterTimestampMonotonic=914779077808
CanStart=yes
CanStop=yes
CanReload=no
CanIsolate=no
CanFreeze=yes
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnSuccessJobMode=fail
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Fri 2023-09-29 06:18:24 CEST
ConditionTimestampMonotonic=914780365844
AssertTimestamp=Fri 2023-09-29 06:18:24 CEST
AssertTimestampMonotonic=914780365847
Transient=no
Perpetual=no
StartLimitIntervalUSec=10s
StartLimitBurst=5
StartLimitAction=none
FailureAction=none
SuccessAction=none
InvocationID=2e951bbf89824dc1a0068b77faf31f06
CollectMode=inactive

@mfeit-internet2
Member

Close after toolkit code is changed to ask systemd instead of relying on the PID file.
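
To illustrate what "asking systemd" could look like, here is a minimal Python sketch built on `systemctl is-active`. It is not the Toolkit's actual implementation. The helper names (`unit_is_active`, `pscheduler_running`) and the injectable `run` and `is_active` parameters are assumptions made for this example, and the list of units to check is left to the caller.

```python
import subprocess


def unit_is_active(unit, run=subprocess.run):
    """Return True if `systemctl is-active <unit>` prints 'active'."""
    result = run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit just means "not active" here
    )
    return result.stdout.strip() == "active"


def pscheduler_running(units, is_active=unit_is_active):
    """Report 'running' only if every listed unit is active.

    This mirrors the behavior described in the issue, where a single
    service failing the check makes the whole service show as down.
    """
    return all(is_active(unit) for unit in units)
```

A call such as `pscheduler_running(["pscheduler-archiver"])` would then reflect systemd's view of the unit rather than the presence of /run/pscheduler-server/archiver/pid.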

@arlake228 arlake228 moved this from In Progress to In Review in perfSONAR Nov 10, 2023
@arlake228 arlake228 moved this from In Review to Done in perfSONAR Nov 10, 2023