Docker swarm incompatibility #50
Comments
The same issue:

I'm running into this issue on #70, but I'm not using docker swarm, just docker.

moby/moby#24862

I've managed to get a little further by replacing … I'm now stuck on the following error:

```json
{"timestamp":"2022-01-28T19:11:04.686241153Z","level":"error","source":"baggageclaim","message":"baggageclaim.api.volume-server.create-volume-async.failed-to-create","data":{"error":"operation not permitted","handle":"cbd0b4dd-84f8-4a9d-4b01-8ac8c27a968e","privileged":true,"session":"4.1.10","strategy":{"type":"import","path":"/usr/local/concourse/resource-types/docker-image/rootfs.tgz","follow_symlinks":false}}}
```

It might be because I'm trying to create a privileged docker-image container.
I've just brought up the worker service in docker swarm successfully with sysbox-runc. But it requires me to set the default runtime on every node that will run worker containers, because docker stack does not support the `runtime` option:

```json
{
    "runtimes": {
        "sysbox-runc": {
            "path": "/usr/bin/sysbox-runc"
        }
    },
    "default-runtime": "sysbox-runc"
}
```

When I try running the …
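For reference, that daemon-level default has to be present on every node that can schedule a worker task. A sketch of applying it per node (assumes systemd, the usual Sysbox install path, and that `/etc/docker/daemon.json` holds no other settings you want to keep):

```shell
# Sketch: install the daemon config above on each worker node, then
# restart Docker so sysbox-runc becomes the default runtime.
# WARNING: this overwrites any existing /etc/docker/daemon.json;
# merge by hand if you already have settings there.
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
    "runtimes": {
        "sysbox-runc": {
            "path": "/usr/bin/sysbox-runc"
        }
    },
    "default-runtime": "sysbox-runc"
}
EOF
sudo systemctl restart docker
# Confirm the default runtime took effect:
docker info --format '{{ .DefaultRuntime }}'
```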
However, the reason shown in the logs is strange: it's not something about permissions, but …

Here's my docker-compose.yml:

```yaml
version: '3.9'

services:
  web:
    image: concourse/concourse
    command: web
    ports:
      - published: 8084
        target: 8080
        mode: host
    networks:
      - concourse
    deploy:
      mode: global
      placement:
        constraints:
          - "node.role == manager"
    secrets:
      - authorized_worker_keys
      - session_signing_key
      - tsa_host_key
      - tsa_host_key.pub
    environment:
      CONCOURSE_EXTERNAL_URL: https://concourse.xxxxxxxxxxxx.com
      CONCOURSE_POSTGRES_HOST: xxxxxxxxxxxx
      CONCOURSE_POSTGRES_USER: concourse
      CONCOURSE_POSTGRES_PASSWORD: xxxxxxxxxxxx
      CONCOURSE_POSTGRES_DATABASE: concourse
      CONCOURSE_ADD_LOCAL_USER: balthild:xxxxxxxxxxxx
      CONCOURSE_MAIN_TEAM_LOCAL_USER: balthild
      CONCOURSE_SESSION_SIGNING_KEY: /run/secrets/session_signing_key
      CONCOURSE_TSA_AUTHORIZED_KEYS: /run/secrets/authorized_worker_keys
      CONCOURSE_TSA_HOST_KEY: /run/secrets/tsa_host_key
      CONCOURSE_TSA_PUBLIC_KEY: /run/secrets/tsa_host_key.pub
    logging:
      driver: "json-file"
      options:
        max-file: "5"
        max-size: "10m"
  worker:
    image: concourse/concourse
    command: worker
    networks:
      - concourse
    #privileged: true
    #runtime: sysbox-runc
    depends_on: [web]
    stop_signal: SIGUSR2
    deploy:
      mode: global
      placement:
        constraints:
          - "node.role != manager"
    secrets:
      - tsa_host_key.pub
      - worker_key
      - worker_key.pub
    environment:
      CONCOURSE_TSA_PUBLIC_KEY: /run/secrets/tsa_host_key.pub
      CONCOURSE_TSA_WORKER_PRIVATE_KEY: /run/secrets/worker_key
      CONCOURSE_TSA_HOST: web:2222
      CONCOURSE_RUNTIME: containerd
      CONCOURSE_BIND_IP: 0.0.0.0
      CONCOURSE_BAGGAGECLAIM_BIND_IP: 0.0.0.0
      # avoid using loopbacks
      CONCOURSE_BAGGAGECLAIM_DRIVER: overlay
      # work with docker-compose's dns
      CONCOURSE_CONTAINERD_DNS_PROXY_ENABLE: "true"
    logging:
      driver: "json-file"
      options:
        max-file: "5"
        max-size: "10m"

secrets:
  session_signing_key:
    file: ./keys/web/session_signing_key
  authorized_worker_keys:
    file: ./keys/web/authorized_worker_keys
  tsa_host_key:
    file: ./keys/web/tsa_host_key
  tsa_host_key.pub:
    file: ./keys/web/tsa_host_key.pub
  worker_key:
    file: ./keys/worker/worker_key
  worker_key.pub:
    file: ./keys/worker/worker_key.pub

networks:
  concourse:
    driver: overlay
```
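For comparison, the two commented-out worker lines above do work under plain `docker-compose up`; `docker stack deploy` ignores per-service `privileged` and `runtime`, which is the whole problem. A minimal sketch of the worker service for the non-swarm case:

```yaml
# Non-swarm sketch: plain docker-compose honors these per-service
# fields, which `docker stack deploy` does not support.
services:
  worker:
    image: concourse/concourse
    command: worker
    privileged: true        # required by the Concourse worker
    # runtime: sysbox-runc  # alternative to privileged mode
    stop_signal: SIGUSR2
```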
Update: the real message shows the actual error is produced by the kernel, and it can be viewed with …

It's said that support for idmapped layers in overlayfs will be available in Linux 5.19 (the current mainline kernel is 5.18).
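Going by the 5.19 figure above, a rough sketch for checking whether a node's running kernel meets that bar (it only compares major.minor and ignores vendor backports):

```shell
# Rough check: is the running kernel >= 5.19?
# Only compares major.minor; vendor backports are not detected.
kernel_at_least() {
    # usage: kernel_at_least <major> <minor> <version-string>
    major=$(echo "$3" | cut -d. -f1)
    minor=$(echo "$3" | cut -d. -f2)
    [ "$major" -gt "$1" ] || { [ "$major" -eq "$1" ] && [ "$minor" -ge "$2" ]; }
}

if kernel_at_least 5 19 "$(uname -r)"; then
    echo "idmapped overlayfs layers: likely supported"
else
    echo "idmapped overlayfs layers: kernel too old"
fi
```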
It might be worth noting in the documentation that this won't work on docker swarm, due to the privileged mode requirement.
The database and web containers will work just fine; however, the worker node will fail with some very cryptic error messages like:

```json
{"timestamp":"2019-09-30T14:31:24.520408669Z","level":"error","source":"guardian","message":"guardian.starting-guardian-backend","data":{"error":"bulk starter: mounting subsystem 'cpuset' in '/sys/fs/cgroup/cpuset': operation not permitted"}}
```

and

```json
{"timestamp":"2019-09-30T14:31:24.528488853Z","level":"error","source":"worker","message":"worker.garden-runner.logging-runner-exited","data":{"error":"Exit trace for group:\ngdn exited with error: exit status 1\ndns-proxy exited with nil\n","session":"8"}}
```

which disappear rather quickly because the following error gets spammed repeatedly:

```json
{"timestamp":"2019-09-30T14:31:28.144058311Z","level":"error","source":"worker","message":"worker.beacon-runner.beacon.forward-conn.failed-to-dial","data":{"addr":"127.0.0.1:7777","error":"dial tcp 127.0.0.1:7777: connect: connection refused","network":"tcp","session":"4.1.5"}}
```
The web node also registers the worker node, leading to further confusion.
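Note that the spammed dial errors on `127.0.0.1:7777` are a symptom, not the cause: the garden backend (`gdn`) exited, so nothing is listening on that port. When digging through this, it helps to look at the earliest error rather than the repeated one; a sketch (the service name `concourse_worker` is an assumption, substitute your own):

```shell
# Pull the first few error-level lines from the worker's logs;
# the root cause is usually near the top, before the dial spam.
docker service logs concourse_worker 2>&1 \
  | grep '"level":"error"' \
  | head -n 5
```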
Hopefully this saves someone else a couple of painful hours.