Nomad cluster is unable to pull docker images from private registry #236

Closed
Zortaniac opened this issue Aug 25, 2023 · 10 comments
Labels
bug Something isn't working

Comments

@Zortaniac
Contributor

Describe the bug
I'm unable to get nomad/docker to pull images from a private registry. As far as I can see, the problem seems to be related to the image cache.
First I added an auth config to the nomad configuration, but it worked neither for nomad nor for docker itself inside the running container.
I then tried adding the private registry to noProxy in the daemon.json, but that causes docker to crash during startup. A better way would probably be to let the image proxy authenticate against the private registry.

To Reproduce

  1. Add a nomad cluster resource
  2. Run a nomad job with an image from a private registry
  3. See that nomad/docker is unable to pull the image

Expected behavior
It should be possible to allow nomad/docker to pull an image from a private registry.

@nicholasjackson
Contributor

Thanks @Zortaniac, this is going to be an issue with the pull-through cache. In the past, I have dealt with this by using the copy_image stanza on the nomad cluster.

copy_image {
  name     = "myprivate.com/org/image:tag"
  username = env("REGISTRY_USERNAME")
  password = env("REGISTRY_PASSWORD")
}

Then, use the image on nomad without any authentication.

copy_image first downloads the image to the local docker cache using authentication, then copies a tarball of that image to the nomad node's local docker cache. When you run a nomad job that uses this image, docker finds it locally and does not attempt to pull it through the cache.

That said, this is not a fix but a workaround; I am looking at this now to see if I can fix this so that authenticated images can be pulled through the cache. It should be possible.

@nicholasjackson nicholasjackson added the bug Something isn't working label Aug 27, 2023
@Zortaniac
Contributor Author

Thanks @nicholasjackson. I will use the workaround for now.

I'm not that familiar with the docker image cache, but if I understood correctly, the cache needs to be able to authenticate against the private registry.
So I was wondering whether it would be an option to add a registry stanza that allows configuring additional registries in the docker image cache.
I can try to add that stanza myself if you haven't built a fix yet.

@nicholasjackson
Contributor

@Zortaniac I have been looking into this. The pull-through cache should just pass the auth through, so I wonder if I am missing a setting. A registry stanza is a great idea; the pull-through cache is great for saving bandwidth and avoiding pull limits on Docker Hub.

@Zortaniac
Contributor Author

As I said, I haven't used the docker cache yet, but as far as I understand the documentation, it does not pass through private repositories for security reasons: if the cache is publicly reachable, it might accidentally allow access to private images.

I will see whether I can get this implemented in the coming days.

@nicholasjackson
Contributor

Ah, we don’t use that one. We use this Nginx config, which was written by a Googler as an example for Cloud Run.

https://github.com/shipyard-run/docker-registry-proxy

@Zortaniac
Contributor Author

Thanks for the heads-up. It seems the same applies to that one as well. At least there is some configuration to authenticate against private repositories.

@Zortaniac
Contributor Author

@nicholasjackson, short update on this topic. I actually extended the image cache resource with some options to specify repositories, but it turned out to not be the problem.

I noticed that our repository isn't actually private (that is fine btw), but the current proxy settings interfere with the image download in a way that causes authentication issues. I basically 'solved' it by removing the HTTP_PROXY environment variables altogether. This is of course not a real fix, because the cache isn't used anymore.
A better approach seems to be either to pass a --registry-mirror option to the docker daemon or to add registry-mirrors to the daemon.json.

Since the docker daemon is started by supervisord I assume passing the option isn't really possible? The daemon.json would need to be altered dynamically. Is there already a place in the code where something similar is done, to get an idea how it could be easily solved?

I also noticed some other problems, some I was already able to fix and will send a PR when I'm done:

  1. The default image cache is hard-coded and always started, regardless of whether another image cache resource is configured.
  2. The environment settings on nomad resources are ignored/overwritten, making it impossible to set HTTP_PROXY (fixed).
  3. The networks configuration is inconsistent on the image cache resource and makes it hard to attach a network resource (fixed).

@nicholasjackson
Contributor

Registry mirrors, sadly, would not solve the problem as you would still need to store the authentication credentials in the mirror. It might be a better option than using the HTTP proxy.

The current solution, where copy_image can be used to pull an authenticated image and then add that image to the remote clusters without authentication, is workable. But it would be helpful to expose a stanza block for image cache configuration that would allow the proxy to be configured for registries outside the current list and for authenticated repos.

@Zortaniac
Contributor Author

Sorry if I didn't make it clear. I already extended the stanza to allow for repository configuration. It gets passed to the right environment variables within the image proxy as described in the README and looks like this:

resource "image_cache" "private" {
  network {
    id = resource.network.my.id
  }

  registry {
    hostname = "my.image.reg"
    auth {
      hostname = "auth-my.image.reg"
      username = "user"
      password = "password"
    }
  }
}

That's how I noticed that the authentication doesn't seem to be the problem but the proxy configuration on the nomad container itself.

@Zortaniac
Contributor Author

@nicholasjackson, I opened PR #244, which adds support for specifying (private) registries. I tested it with gitlab.com; others might require some additional work.
I integrated the image cache resource into the nomad cluster and made it possible to specify the NO_PROXY environment variable. It seems that the trouble with our gitlab registry is due to it running on a non-standard port, which confuses the proxy cache.
I will try to add some tests, but feedback is already welcome.

@eveld eveld added core and removed core labels Oct 31, 2023
@eveld eveld modified the milestone: v0.1.0 Oct 31, 2023