
Supervisord-based Container Doesn't Function #22

Open
mfeit-internet2 opened this issue Sep 22, 2021 · 4 comments

@mfeit-internet2
Member

Ignacio Peluaga Lozada writes:

I have been experiencing issues with the testpoint image lately. I created a container using the latest image (4.4.1) and always got "Run did not complete: Missed" on pscheduler's CLI, regardless of the task or target node. On the same host that ran the Docker container, I installed the toolkit v4.4.1 and everything worked. I also tried some other Docker image versions:

- perfsonar/testpoint:v4.4.0: same issues as with v4.4.1.
- perfsonar/testpoint:v4.3.4: worked fine.
- perfsonar/testpoint:systemd: worked fine.

Therefore, I believe the problem is with perfSONAR's supervisord-based v4.4.x testpoint images. Is anyone else experiencing this?

Internet2 saw this as well. The runner service fails to start.

@mfeit-internet2 changed the title from "Supervisord-based Container Doesn't Functions" to "Supervisord-based Container Doesn't Function" on Sep 22, 2021
@DanielNeto
Contributor

DanielNeto commented Sep 28, 2021

It seems to be a problem with the Docker version. With the latest version, 20.10.8, it doesn't work, but with versions 20.10.0 and 19.03.9 the tasks run correctly, even though the pscheduler processes keep restarting.
I still haven't figured out what changed between versions to cause this.

Here is a snippet of the container log with Docker 19.03.9:

2021-09-28 18:56:42,952 INFO spawned: 'pscheduler-scheduler' with pid 1430
2021-09-28 18:56:42,952 INFO success: pscheduler-runner entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-28 18:56:43,600 INFO exited: pscheduler-archiver (exit status 1; not expected)
2021-09-28 18:56:43,768 INFO spawned: 'pscheduler-archiver' with pid 1432
2021-09-28 18:56:44,300 INFO success: pscheduler-scheduler entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-28 18:56:44,344 INFO exited: pscheduler-ticker (exit status 1; not expected)
2021-09-28 18:56:44,900 INFO spawned: 'pscheduler-ticker' with pid 1435
2021-09-28 18:56:44,900 INFO success: pscheduler-archiver entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-28 18:56:44,900 INFO exited: pscheduler-runner (exit status 1; not expected)
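
To make the restart loop in a log like this easier to see, here is a minimal Python sketch that counts unexpected exits per program. It assumes the supervisord log line format shown in the snippet above; the function name `count_unexpected_exits` and the `SAMPLE` constant are made up for illustration and are not part of perfSONAR or supervisor.

```python
import re
from collections import Counter

# Matches supervisord lines of the form seen in the snippet, e.g.
# "... INFO exited: pscheduler-archiver (exit status 1; not expected)"
EXIT_RE = re.compile(
    r"INFO exited: (?P<name>\S+) \(exit status (?P<status>\d+); not expected\)"
)


def count_unexpected_exits(log_text):
    """Return a Counter mapping program name to its number of unexpected exits."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EXIT_RE.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


# Log lines copied from the snippet above (hypothetical variable name).
SAMPLE = """\
2021-09-28 18:56:42,952 INFO spawned: 'pscheduler-scheduler' with pid 1430
2021-09-28 18:56:42,952 INFO success: pscheduler-runner entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-28 18:56:43,600 INFO exited: pscheduler-archiver (exit status 1; not expected)
2021-09-28 18:56:43,768 INFO spawned: 'pscheduler-archiver' with pid 1432
2021-09-28 18:56:44,300 INFO success: pscheduler-scheduler entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-28 18:56:44,344 INFO exited: pscheduler-ticker (exit status 1; not expected)
2021-09-28 18:56:44,900 INFO spawned: 'pscheduler-ticker' with pid 1435
2021-09-28 18:56:44,900 INFO success: pscheduler-archiver entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-28 18:56:44,900 INFO exited: pscheduler-runner (exit status 1; not expected)
"""

if __name__ == "__main__":
    for name, n in count_unexpected_exits(SAMPLE).most_common():
        print(f"{name}: {n} unexpected exit(s)")
```

Running it against a longer capture of the container log (for example, the saved output of `docker logs`) should show whether every pscheduler daemon is cycling or only some of them.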

@pmooo

pmooo commented Nov 24, 2021

Updating supervisor resolved this issue on my images using supervisord and Docker 20.10.1+.

It must be a version released after this merge to the API that supervisor uses:

docker/docker-py@1757c97

You may have to work around the yum repo by installing supervisor with python3 pip in the Dockerfile.

@yorickps

Experienced exactly the same issue. Using the systemd-based image now.

@MiddelkoopT
Contributor

Fixes for this have been applied to the 5.0.0 branch. This lets supervisord manage the processes instead of using --daemon options. Look at /etc/supervisord.conf for the changes.
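
For anyone maintaining a custom image, a rough way to check whether a supervisord.conf still starts programs with a daemonizing flag is sketched below. Supervisor expects the programs it manages to stay in the foreground rather than daemonize themselves. The sample config, the command paths, and the `programs_using_daemon_flag` name are assumptions for illustration only; they are not copied from the perfSONAR images.

```python
import configparser


def programs_using_daemon_flag(conf_text):
    """Return names of [program:x] sections whose command has a --daemon* flag."""
    # interpolation=None so supervisord's %(...)s expansions are left untouched.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    flagged = []
    for section in parser.sections():
        if not section.startswith("program:"):
            continue
        command = parser.get(section, "command", fallback="")
        if any(token.startswith("--daemon") for token in command.split()):
            flagged.append(section[len("program:"):])
    return flagged


# Hypothetical config: the paths and options below are invented for this example.
SAMPLE_CONF = """\
[supervisord]
nodaemon=true

[program:pscheduler-runner]
command=/path/to/runner --daemon --pid-file /var/run/pscheduler-runner.pid

[program:pscheduler-ticker]
command=/path/to/ticker
"""

if __name__ == "__main__":
    for name in programs_using_daemon_flag(SAMPLE_CONF):
        print(f"{name}: command still passes a --daemon option")
```

To use it on a real container, read /etc/supervisord.conf into a string and pass that in place of `SAMPLE_CONF`.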
