Permission denied when creating workers. Rootfs. #27

Closed
larssb opened this issue Oct 15, 2018 · 15 comments

Comments

@larssb

larssb commented Oct 15, 2018

Hopefully someone here can help me with this. I'm running Concourse CI v4.2.1 via docker-compose.
The Docker version on the host is 17.09.1-ce. I can set up Concourse successfully. However, the tasks of the pipeline I have pushed fail with the following error:

runc run: exit status 1: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"rootfs_linux.go:58: mounting \\\"/worker-state/4.2.1/assets/bin/init\\\" to rootfs \\\"/worker-state/volumes/live/3e85b13b-522b-428a-6d14-5d0d605e45bb/volume\\\" at \\\"/worker-state/volumes/live/3e85b13b-522b-428a-6d14-5d0d605e45bb/volume/tmp/garden-init\\\" caused \\\"open /worker-state/volumes/live/3e85b13b-522b-428a-6d14-5d0d605e45bb/volume/tmp/garden-init: permission denied\\\"\""

TRIED:

  • Checking that the worker container is running in privileged mode. It is.
  • Adding
cap_add:
  - SYS_ADMIN
security_opt:
  - apparmor=unconfined
  - seccomp=unconfined

to the worker container. It still fails.

The host running Docker is on a rather old Linux kernel, v4.2.8. However, that is still above the minimum requirement mentioned on https://concourse-ci.org/install.html

Any help will be highly appreciated. Thank you.
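
For anyone repeating the kernel comparison described above, it can be scripted. This is only a minimal sketch: the helper name kernel_at_least is made up for illustration, and the MIN_KERNEL default of 3.19 is a placeholder that does not come from this thread, so take the real minimum from the install page linked above. Passing this check says nothing about which features a vendor kernel was built with.

```shell
#!/bin/sh
# Succeeds (exit 0) when kernel version $1 is at least minimum version $2.
# Only the leading numeric "major.minor.patch" part is compared, so a vendor
# suffix such as "-qnap" is ignored. (Hypothetical helper, not a Concourse tool.)
kernel_at_least() {
    have=${1%%[!0-9.]*}
    want=${2%%[!0-9.]*}
    IFS=. read -r h1 h2 h3 h_rest <<EOF
$have
EOF
    IFS=. read -r w1 w2 w3 w_rest <<EOF
$want
EOF
    # Missing components (for example "4.2") count as zero.
    h1=${h1:-0}; h2=${h2:-0}; h3=${h3:-0}
    w1=${w1:-0}; w2=${w2:-0}; w3=${w3:-0}
    [ "$h1" -gt "$w1" ] && return 0
    [ "$h1" -lt "$w1" ] && return 1
    [ "$h2" -gt "$w2" ] && return 0
    [ "$h2" -lt "$w2" ] && return 1
    [ "$h3" -ge "$w3" ]
}

# Placeholder minimum (assumption); override with the documented value.
MIN_KERNEL=${MIN_KERNEL:-3.19}
if kernel_at_least "$(uname -r)" "$MIN_KERNEL"; then
    echo "kernel $(uname -r) meets the assumed minimum $MIN_KERNEL"
else
    echo "kernel $(uname -r) is older than the assumed minimum $MIN_KERNEL"
fi
```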

@larssb
Author

larssb commented Dec 14, 2018

I still have the issue. I tried what @sam701 describes in preparing rootfs caused "permission denied" #2892. It made no difference.
The /sys/fs/cgroup path exists as well.

Anyone?

My setup is:

  • Concourse v4.2.2. (updated from v4.2.1)
  • QNAP kernel v4.2.8. It is a modified BusyBox-based distro
  • Docker version v17.09.1

Thank you.

@larssb
Author

larssb commented Dec 15, 2018

Setting $CONCOURSE_BAGGAGECLAIM_DRIVER to overlay did not help. The problem persists.

@larssb
Author

larssb commented Dec 16, 2018

Interestingly, when I set $CONCOURSE_BAGGAGECLAIM_DRIVER to naive I get the following error:

iptables: create-instance-chains: iptables: No chain/target/match by that name.

What could be the culprit? Thanks.
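
One common cause of this iptables message is a netfilter kernel module that is missing or not loaded. A rough way to check is sketched below. The helper name missing_modules is hypothetical, and the module names in the usage comment are only examples; this thread does not establish which modules Guardian actually needs. Modules compiled directly into the kernel do not show up in lsmod, so a "missing" result is a hint, not proof.

```shell
#!/bin/sh
# Reads lsmod-style output on stdin and prints every module named in the
# arguments that is not listed there. Exit status is 0 only if none are
# missing. (Hypothetical helper for illustration.)
missing_modules() {
    loaded=$(awk '{ print $1 }')
    status=0
    for mod in "$@"; do
        if ! printf '%s\n' "$loaded" | grep -qxF "$mod"; then
            echo "$mod"
            status=1
        fi
    done
    return $status
}

# Example usage on the Docker host (module names are examples only):
#   lsmod | missing_modules nf_conntrack iptable_nat iptable_filter
```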

@larssb
Author

larssb commented Dec 16, 2018

I do have an image_resource property on my task

@vito
Member

vito commented Jan 3, 2019

Hmm, I don't know of any silver bullet for this as kernel errors can be hard to track down, especially with modified kernels. 😕

Have you been able to try this on a vanilla Linux kernel? This may be too big of an ask for production, since I'm assuming you have your reasons for using that kernel, but for the sake of debugging that seems to me like the most obvious variable to change.

We've seen problems on 'variant' kernels before (namely Google's Container-Optimized OS) because they sometimes strip out features that Concourse needs for containerization (possibly even just nested containerization, since it seems like your outer Docker host is working fine).

@larssb
Author

larssb commented Jan 3, 2019

Hi @vito,

Thank you for replying to the post.

Yeah, I can imagine that this issue is quite hard. At the same time, I think you are onto something regarding a kernel feature not being enabled.
In early 2018 I had an issue just installing Concourse on my QNAP, on a kernel below v4.2.8. This was caused by "....After our engineer checked, the issue was caused by the "conntrack" kernel module that was not open in QTS 4.3.4.....". Quoting QNAP support.
Now I can install Concourse and the "Worker" runs. However, I have the issue reported in this thread.
I have opened a support case with QNAP and I am going back and forth with them. I've given them the docker-compose.yml file I used to install Concourse on my QNAP, as well as an example pipeline that throws the error on all resources when they fetch/poll.

I think I'll wait on them and update this case as the QNAP support case progresses.

Thank you.

@larssb
Author

larssb commented Jan 14, 2019

I'm back again. I'm growing pretty tired of the support I get from QNAP. So I was wondering whether, with a bit of assistance, I could troubleshoot more specifically why this happens and get to the root cause.

@vito would you by chance have any ideas? Or anybody else?

Thank you very much.

@vito
Member

vito commented Jan 15, 2019

@larssb Sorry, no ideas here, but maybe others have had experience with Docker/runC/containers in general on QNAP and maybe ran into the same problem? It likely has something to do with user namespaces.
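
If user namespaces are the suspect, a quick check of what the kernel exposes can narrow things down. The sketch below is an assumption-laden starting point, not a Concourse diagnostic: the function name userns_status is made up, and it only looks at two procfs entries. /proc/self/ns/user exists when the kernel was built with user-namespace support, and /proc/sys/user/max_user_namespaces (present on newer kernels only) disables creation when set to 0. Some distributions add their own switches that this does not cover.

```shell
#!/bin/sh
# Prints "unsupported", "disabled" or "available" for user namespaces.
# $1 is the procfs mount point (default /proc); it is a parameter only so the
# function can be pointed at a test directory. (Hypothetical helper.)
userns_status() {
    proc_root=${1:-/proc}
    ns_link="$proc_root/self/ns/user"
    if [ ! -e "$ns_link" ] && [ ! -L "$ns_link" ]; then
        echo "unsupported"   # kernel built without user namespaces
        return 1
    fi
    max_file="$proc_root/sys/user/max_user_namespaces"
    if [ -r "$max_file" ] && [ "$(cat "$max_file")" = "0" ]; then
        echo "disabled"      # supported, but creation is switched off
        return 1
    fi
    echo "available"
}
```

Running userns_status both on the host and inside the worker container may show whether the two differ.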

@avoidik

avoidik commented Jan 28, 2019

@larssb for the iptables: create-instance-chains: iptables: No chain/target/match by that name. error please check #29

@larssb
Author

larssb commented Jan 29, 2019

Hi @avoidik,

I'm not for downgrading (checked #29). I'm not even sure that is possible on a QNAP and its ContainerStation. But thank you very much for chiming in and suggesting it.

Have a great one.

@FilBot3

FilBot3 commented Feb 15, 2019

@larssb which version did you downgrade to? I'm on 4.2.2 on Oracle Linux with the UEK R4 kernel, and I can get the worker to start up, but the same problem exists for me.

{
  "timestamp":"1550257062.072880745",
  "source":"guardian",
  "message":"guardian.api.garden-server.create.failed",
  "log_level":2,
  "data":{
    "error":"runc run: exit status 1: container_linux.go:348: starting container process caused \"process_linux.go:402: container init caused \\\"rootfs_linux.go:46: preparing rootfs caused \\\\\\\"permission denied\\\\\\\"\\\"\"\n",
    "request":{
      "Handle":"ab1c8bbe-ec8f-4c41-7534-f10e081b5351",
      "GraceTime":0,
      "RootFSPath":"raw:///opt/concourse-ci/volumes/live/f3f53232-bd2a-428b-7cf6-925ea9e080c8/volume/rootfs",
      "BindMounts":[
        {
          "src_path":"/opt/concourse-ci/volumes/live/450737c4-a993-4d97-517a-19e65ee902c1/volume",
          "dst_path":"/scratch",
          "mode":1
        },
        {
          "src_path":"/opt/concourse-ci/volumes/live/00d9a1b3-8ff6-42cb-533f-4f35e70beab1/volume",
          "dst_path":"/tmp/build/dadbfeaa",
          "mode":1
        }
      ],
      "Network":"",
      "Privileged":false,
      "Limits":{
        "bandwidth_limits":{},
        "cpu_limits":{},
        "disk_limits":{},
        "memory_limits":{},
        "pid_limits":{}
      }
    },
    "session":"3.1.47"
  }
}

And I thought some people had the right ideas going in other threads, but I haven't seen any actual resolutions yet. However, if you're downgrading and it worked, which version did you downgrade to?

@larssb
Author

larssb commented Feb 15, 2019

Hi @predatorian3,

I never downgraded. Do not want to do that. So I'm coldstarting Concourse on another machine, away from my QNAP, and running the jobs I need. However, I still want Concourse to work on my QNAP.

The worker also starts for me. The issue happens whenever a job in a pipeline tries to fetch a resource, start a task, or the like. Then that error is thrown.

QNAP ended up concluding that they cannot help me in this case. Something along the lines of "we do not support 3rd party products", even though it works on so many machines other than the QNAP NAS I have.

So what is one to do 👎

@robinhuiser

Not sure if this helps, but I got a similar error when deploying Concourse on Microk8s. In the end it was not a permission problem but a runtime setting: once I set CONCOURSE_RUNTIME=containerd, the issue was solved.

@taylorsilva
Member

^ This is likely due to changes in the kernel that Guardian does not support. It is similar to cgroups v2 not being supported by Guardian, where the only solution is to switch to containerd or use cgroups v1.

I'm going to close this issue because it's very old and likely not relevant to other users anymore.

Thanks for sharing the new info @robinhuiser
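
Since the cgroups v1 versus v2 distinction matters here, one way to tell which hierarchy a host or container sees is sketched below. The function name cgroup_version is hypothetical. The check relies on the cgroup.controllers file, which sits at the root of a cgroups v2 (unified) mount; a hybrid setup that mounts v2 in a subdirectory would be reported as v1 by this sketch.

```shell
#!/bin/sh
# Prints "v2", "v1" or "none" depending on what is mounted at the cgroup root.
# $1 is the cgroup mount point (default /sys/fs/cgroup); it is a parameter
# only so the function can be tested against a scratch directory.
cgroup_version() {
    cg_root=${1:-/sys/fs/cgroup}
    if [ -f "$cg_root/cgroup.controllers" ]; then
        echo "v2"
    elif [ -d "$cg_root" ] && [ -n "$(ls -A "$cg_root" 2>/dev/null)" ]; then
        echo "v1"
    else
        echo "none"
        return 1
    fi
}
```

Comparing the output on the host with the output inside the worker container can show whether the container sees the hierarchy the worker expects.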

@larssb
Author

larssb commented Feb 20, 2022

Allow me to come back and chime in. It seems I've finally succeeded in solving this. The culprit seems to be something along the following lines:

  • cgroups v1 is used on the QNAP I'm using
  • kernel v5.10.60-qnap

The most recent iteration of the error is: mounting cgroup to rootfs at /sys/fs/cgroup caused: invalid argument: unknown

I tried one more debugging session on this and found that it could be related to cgroups v1 being used.

I ended up switching to docker run instead of docker-compose, as docker-compose does not support setting cgroupns in a docker-compose.yaml file. See this link for more on that.

On the docker run command, the important parameters are:

  • --cgroupns=host
  • -v /sys/fs/cgroup:/sys/fs/cgroup:rw

With the above, no error. Freaking awesome!

N.B. I did also try with containerd as the runtime for concourse. However, that did not change anything in this case.


It sucks though that this is necessary as pretty much no isolation is left and the container is pretty insecure.
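
Put together, the two docker run parameters listed above might be used roughly as follows. This is a sketch under assumptions, not the exact command from this comment: the function name worker_run_command is made up, the image name concourse/concourse and the --privileged and --detach flags are assumptions based on the usual worker setup, and the worker's own environment (keys, TSA host, work directory) is left out entirely. The --cgroupns flag needs Docker 20.10 or later. The function only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Prints (does not execute) a docker run command line that shares the host's
# cgroup namespace and bind-mounts the host's cgroup hierarchy read-write.
# $1 is the image to run (default: concourse/concourse, an assumption).
worker_run_command() {
    image=${1:-concourse/concourse}
    echo "docker run --detach --privileged --cgroupns=host -v /sys/fs/cgroup:/sys/fs/cgroup:rw $image worker"
}

# Review the command first; add the worker's environment variables before use.
worker_run_command
```

As noted above, sharing the host cgroup namespace and mounting /sys/fs/cgroup read-write removes much of the container's isolation, so this is best limited to hosts where that trade-off is acceptable.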
