
stacker ideas from <nixpkgs>.dockerTools #105

Open
CajuM opened this issue Nov 6, 2020 · 4 comments

Comments

@CajuM

CajuM commented Nov 6, 2020

  • Use a programming language for image generation; for example, nixpkgs uses a purely functional programming language, Nix expressions. This would greatly increase the expressiveness of stackerfiles and improve code reuse.
  • Use hashes instead of git revisions for dependency tracking, and avoid using git for this purpose, thereby decoupling from any VCS.
  • Make stackerfiles pure in order to ensure a one-to-one mapping between source and image, possibly enabling reproducible builds. Environment substitutions would have to be removed to this end.
  • Address images by the hash of their evaluated stackerfile, for caching purposes.
  • Create a distributed build environment by generating a DAG from the evaluated stackerfile and triggering remote builds, e.g. by ssh-ing into builders and running stacker build stacker.nix -A myC3Image (a sketch follows this list).
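
(Illustrative sketch of the last point: evaluate the stackerfile into a dependency DAG, order it topologically, and dispatch builds over ssh. The graph, builder hosts, and the stacker build stacker.nix -A invocation are assumptions, not existing stacker behaviour.)

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
)

// deps maps each image target to the targets it derives from
// (hypothetical result of evaluating a stackerfile into a DAG).
var deps = map[string][]string{
	"base":       {},
	"myC3Image":  {"base"},
	"otherImage": {"base"},
}

var builders = []string{"builder1.example.com", "builder2.example.com"}

// topoSort returns the targets in an order where every dependency
// comes before the images that use it.
func topoSort(g map[string][]string) []string {
	var order []string
	visited := map[string]bool{}
	var visit func(n string)
	visit = func(n string) {
		if visited[n] {
			return
		}
		visited[n] = true
		for _, d := range g[n] {
			visit(d)
		}
		order = append(order, n)
	}
	for n := range g {
		visit(n)
	}
	return order
}

func main() {
	// For simplicity this dispatches builds one at a time; a real tool
	// would run independent targets in parallel.
	for i, target := range topoSort(deps) {
		host := builders[i%len(builders)]
		// ssh into the builder and run the (hypothetical) build command.
		cmd := exec.Command("ssh", host, "stacker", "build", "stacker.nix", "-A", target)
		out, err := cmd.CombinedOutput()
		if err != nil {
			log.Fatalf("build of %s on %s failed: %v\n%s", target, host, err, out)
		}
		fmt.Printf("built %s on %s\n", target, host)
	}
}
```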


@tych0
Collaborator

tych0 commented Nov 19, 2020

Make stackerfiles pure in order to ensure a one-to-one mapping between source and image.

It's not really clear how useful this would be; e.g. you can't use yum any more, because the repos could point to other stuff, etc. You could use stacker to accomplish fully reproducible builds if you want, but it's very painful to do, and not likely to be any kind of default any time soon.

Address images by the hash of their evaluated stackerfile, for caching purposes.

The caching code uses the hash of the input stacker file, but also the hashes of the import values. So I think this is covered.
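
(Illustrative sketch of that caching idea, not stacker's actual cache code: derive a key from the stackerfile contents plus the hash of every import, so a change in either invalidates the cache. File name and import hashes are placeholders.)

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
	"sort"
)

// cacheKey hashes the stackerfile together with the (already computed)
// hashes of its imports.
func cacheKey(stackerfile string, importHashes map[string]string) (string, error) {
	h := sha256.New()

	body, err := os.ReadFile(stackerfile)
	if err != nil {
		return "", err
	}
	h.Write(body)

	// Sort import names so the key is independent of map iteration order.
	names := make([]string, 0, len(importHashes))
	for name := range importHashes {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Fprintf(h, "%s=%s\n", name, importHashes[name])
	}

	return fmt.Sprintf("%x", h.Sum(nil)), nil
}

func main() {
	key, err := cacheKey("stacker.yaml", map[string]string{
		"https://example.com/rootfs.tar": "sha256:abc...",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("cache key:", key)
}
```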

Create a distributed build environment by generating a DAG from the evaluated stackerfile and triggering remote builds, e.g. by ssh-ing into builders and running stacker build stacker.nix -A myC3Image

This seems like something better suited for a higher level tool, vs stacker itself. Unless there's some nice way to embed k8s as a go library ;)

@CajuM
Author

CajuM commented Nov 19, 2020

Make stackerfiles pure in order to ensure a one-to-one mapping between source and image.

It's not really clear how useful this would be; e.g. you can't use yum any more, because the repos could point to other stuff, etc. You could use stacker to accomplish fully reproducible builds if you want, but it's very painful to do, and not likely to be any kind of default any time soon.

Agreed; if it were indeed a default, it would require package manager support for repo metadata snapshots. Actually, to enable reproducible builds you'd be unable to use the network at all outside stacker's downloader, so as not to bypass the consistent-input guarantees.

Address images by the hash of their evaluated stackerfile, for caching purposes.

The caching code uses the hash of the input stacker file, but also the hashes of the import values. So I think this is covered.

I was referring here to deriving from an existing image: you would use the hash of the sources as the tag instead of the git commit, to decouple from the VCS.
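
(Illustrative sketch of what such a source-derived tag could look like: walk the build context in a stable order and hash file paths plus contents. The directory layout and the src- prefix are made up.)

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// sourceTag returns a deterministic tag for a source directory.
func sourceTag(dir string) (string, error) {
	h := sha256.New()
	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		rel, err := filepath.Rel(dir, path)
		if err != nil {
			return err
		}
		fmt.Fprintf(h, "%s\n", rel) // include the path so renames change the tag
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		_, err = io.Copy(h, f)
		return err
	})
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("src-%x", h.Sum(nil)[:8]), nil
}

func main() {
	tag, err := sourceTag(".")
	if err != nil {
		panic(err)
	}
	fmt.Println(tag) // e.g. src-1a2b3c4d5e6f7a8b
}
```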

Create a distributed build environment by generating a DAG from the evaluated stackerfile and triggering remote builds, e.g. by ssh-ing into builders and running stacker build stacker.nix -A myC3Image

This seems like something better suited for a higher level tool, vs stacker itself. Unless there's some nice way to embed k8s as a go library ;)

Agreed

In the current workflow, as I recall, there is a tight coupling between the VCS and stacker: it ensures that it's using the latest parent image by walking the git tree until it finds a modification.

I was thinking that, in the context of reproducible builds, you would be able to just ignore versioning, or any concept of newer/older layers. You would have a working directory with a stacker.nix and call stacker build . -A img1 -A img2; if any dependent image is missing from the zot repo, indexed by its tag, which maps one-to-one to its source, you build and upload it.
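
(Minimal sketch of that check, assuming tags derived from a source hash and a zot registry speaking the standard OCI distribution API; the registry address, repo name, and the build/upload commands are placeholders, not a fixed workflow.)

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"os/exec"
)

// tagExists does a HEAD request against the OCI distribution endpoint
// /v2/<name>/manifests/<reference>; a 200 means the image is already there.
func tagExists(registry, repo, tag string) (bool, error) {
	url := fmt.Sprintf("http://%s/v2/%s/manifests/%s", registry, repo, tag)
	req, err := http.NewRequest(http.MethodHead, url, nil)
	if err != nil {
		return false, err
	}
	req.Header.Set("Accept", "application/vnd.oci.image.manifest.v1+json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

func main() {
	registry, repo, tag := "zot.example.com:5000", "img1", "src-1a2b3c4d"

	ok, err := tagExists(registry, repo, tag)
	if err != nil {
		log.Fatal(err)
	}
	if ok {
		fmt.Println("image already in zot, nothing to do")
		return
	}

	// Missing: build it locally, then push it (commands are illustrative).
	for _, args := range [][]string{
		{"stacker", "build", "-f", "stacker.yaml"},
		{"skopeo", "copy", "oci:oci:" + repo, "docker://" + registry + "/" + repo + ":" + tag},
	} {
		if out, err := exec.Command(args[0], args[1:]...).CombinedOutput(); err != nil {
			log.Fatalf("%v failed: %v\n%s", args, err, out)
		}
	}
}
```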

Currently what we do is start from the image we want to build and walk the git tree until we find the latest change in the source of its parent image. If we had a one-to-one mapping between image and source, we'd know exactly which image we depend upon and whether it's missing, without a VCS.

Now that I think of it, it's not really necessary to have a one-to-one mapping between source and image hash, just its tag, in order to eliminate the VCS from the workflow. That's only necessary if you want reproducible builds, which, as you said, is of limited use and difficult to implement.

@tych0
Collaborator

tych0 commented Nov 19, 2020

In the current workflow, as I recall, there is a tight coupling between the VCS and stacker: it ensures that it's using the latest parent image by walking the git tree until it finds a modification.

There's no required coupling with git, although stacker does add the current git hash to the generated metadata if you happen to be in a git repo. You can push images named for their git hash via stacker publish, but that's also for convention's sake: it's not clear how one would derive a hash of all the inputs without actually downloading them to the local disk, and the point of the caching convention is, in part, to avoid that local download.

Now that I think of it, it's not really necessary to have a one-to-one mapping between source and image hash, just its tag, in order to eliminate the VCS from the workflow.

I think it depends on what you're after. It's a nice idea; the question is just how to communicate these hashes to everyone. A convention via git hashes is probably the dumbest way, but perhaps there are others.

@CajuM
Author

CajuM commented Nov 19, 2020

The way nix does it is by having developers embed the hashes into their source; for stacker you'd have to add the hash of each import as an extra field in the stackerfile. This wouldn't work for images without a source-to-binary mapping, though, so those should rely only on tags. Also, I was thinking of something like:

from: img_var # if we were to use a programming language we would be able to pass arbitrary images as parents in a reusable way
# if the image described by img_var has been cached by zot, download it; its tag will be the same as its source. Otherwise, build it.
# and at the end, and this should be external to stacker, upload using skopeo
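
(Illustrative sketch of the hash-per-import idea, in the spirit of nix's fixed-output fetches: download the import, hash it, and abort on mismatch. The field name, URL, and expected hash are hypothetical.)

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"net/http"
)

// fetchVerified downloads url and checks its sha256 against expected.
func fetchVerified(url, expected string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}

	got := fmt.Sprintf("%x", sha256.Sum256(body))
	if got != expected {
		return nil, fmt.Errorf("import %s: hash mismatch: got %s, want %s", url, got, expected)
	}
	return body, nil
}

func main() {
	// Hypothetical import declared in a stackerfile as:
	//   import:
	//     - url: https://example.com/rootfs.tar
	//       hash: "e3b0c442..."
	data, err := fetchVerified("https://example.com/rootfs.tar", "e3b0c442...")
	if err != nil {
		fmt.Println("build would abort:", err)
		return
	}
	fmt.Printf("verified %d bytes\n", len(data))
}
```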
