FR: Build Dockerfiles with incremental cache support #35

Closed
stabai opened this issue Sep 26, 2022 · 7 comments

@stabai

stabai commented Sep 26, 2022

Feature request:
Allow Bazel to build Dockerfile images in a way that takes advantage of incremental build caching. The dockerfile_image rule in rules_docker contrib allows building such images, but it's incredibly slow because of the lack of build cache.

@cgrindel added the enhancement, need: discussion, and prioritized labels and removed the need: discussion label Sep 27, 2022
@cgrindel added the need: discussion label and removed the prioritized label Oct 5, 2022
@cgrindel
Member

Thanks for your proposal. @thesayyn has been thinking about this, as well. He will comment on this issue with his thoughts. Hopefully, we can come up with a solution that satisfies your goals.

@cgrindel added the question label and removed the enhancement and need: discussion labels Oct 20, 2022
@thesayyn
Collaborator

There are some problems that will prevent us from doing this the way you want:

  • Bazel has its own cache, so anything that falls outside of it is not tracked by Bazel. Getting the Docker build cache included in the Bazel cache would therefore not be trivial.
  • An alternative is to keep the Docker build cache local to the host and let it leak into the sandbox, but this would increase the likelihood of non-hermetic builds.
  • The biggest drawback is that a Dockerfile is not hermetic by design: instructions such as RUN apt-get install curl or make build are not repeatable, which would lead to non-reproducible builds.
  • Bazel already knows how to [cross-]compile various languages, so it doesn't make sense to delegate this task to Docker.
  • It would require rules_oci to declare a dependency on a docker_toolchain, which we are not willing to take on.

I have been thinking for a while about how to tackle this problem without sacrificing a whole lot. A combination of gazelle + rules_oci is the way to do this. To explain the idea roughly:

Given a Dockerfile:

FROM node:12

RUN apt-get update && apt-get install -y curl

WORKDIR /app

COPY index.js  .

RUN yarn install --production

CMD ["node", "index.js"]

EXPOSE 3000

A gazelle plugin for rules_oci would translate it to:

WORKSPACE

debian_archive(
    name = "debian_amd64_curl",
    urls = ["debian_archive_url"],
)

oci_pull(
    name = "node_12",
    repository = "index.docker.io/library/node",
    ref = "12",
)

translate_pnpm_lock(
    name = "npm",
    yarn_lock = "yarn.lock",
)

BUILD

js_binary(
    name = "app",
    entry_point = "index.js",
)

oci_image(
    name = "image",
    base = "@node_12//:image",
    tars = [
        ":app",
    ],
    workdir = "/app",
    cmd = ["index.js"],
    ports = [3000],
)
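
As a hedged illustration of the BUILD side of this idea: oci_image's tars attribute takes tar archives as layers, so the application files would first need to be packaged into one. The sketch below is not the output of any existing gazelle plugin. It assumes rules_pkg is available as @rules_pkg and that the base image is exposed as @node_12//:image, as in the sketch above. The app_layer name is hypothetical, and the yarn install and EXPOSE steps are deliberately left out.

```starlark
load("@rules_oci//oci:defs.bzl", "oci_image")
load("@rules_pkg//pkg:tar.bzl", "pkg_tar")

# Package the application source into a tar layer rooted at /app.
# "app_layer" is a hypothetical name chosen for this sketch.
pkg_tar(
    name = "app_layer",
    srcs = ["index.js"],
    package_dir = "/app",
)

# Stack the layer on top of the pulled base image.
# The base label is an assumption carried over from the sketch above.
oci_image(
    name = "image",
    base = "@node_12//:image",
    tars = [":app_layer"],
    workdir = "/app",
    cmd = ["index.js"],
)
```

Because each target is a separate Bazel action, editing index.js should only invalidate app_layer and image, while the pulled base image stays cached.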

@mvgijssel

Does the debian_archive rule exist somewhere? As this would solve 90% of the problem for me!

@michaeljs1990

michaeljs1990 commented Aug 14, 2023

> Does the debian_archive rule exist somewhere? As this would solve 90% of the problem for me!

This is from the distroless repo: https://github.com/GoogleContainerTools/distroless/blob/main/private/remote/debian_archive.bzl. However, it's not very easy to use, and to keep your packages up to date you will need automated tooling. They have written debian_package_manager, also located in that repo, but it only works with the main repo and channels. You can modify it easily to work with other channels, but if you want to pull from a different repo, such as postgres, you will not have a fun time.

I spent a few hours on it but found that, the way the rules are currently set up, it is fairly difficult to just pull them out and use them. I unfortunately had to fall back to rules_docker because of the timeline I am working on. However, I really want to use rules_oci and have created https://github.com/michaeljs1990/rules_oci_helpers, where I'm going to start building out the tooling that will let me switch over to it. It will likely be a few weeks before it's at a point where others can use it.

Additionally, https://github.com/GoogleContainerTools/rules_distroless looks like it will eventually serve the same purpose, and I have asked about helping with that repo in another ticket. Ideally, my rule set would be merged into rules_distroless or replaced by it in the future.

@yeukhon

yeukhon commented Nov 1, 2023

I come here with a similar frustration. I understand this is not an easy problem, but it seems like it is impossible to build anything without breaking out of Bazel.

For example, to build a base image with long-established packages (mysql, postgres, curl, etc.), we need to work outside of Bazel with a vanilla Dockerfile (or an alternative to the docker CLI such as https://buildah.io/). Upstream maintainers have spent half of their lives building rpm and deb packages. It isn't as simple as just downloading a deb; we'd have to find all the required debs too...

We should find a solution together. Falling back to rules_docker is unreasonable because it is a dead project and won't receive updates for newer versions of Bazel.

Slapping on yet another rule seems like npm's approach. While I favor keeping rules_oci as simple as possible, the fact that oci_image, a macro whose purpose is to build an image, lacks the ability to actually run actions seems like a missed opportunity.

@thesayyn
Collaborator

Now that https://github.com/chainguard-dev/rules_apko exists and we are working on https://github.com/GoogleContainerTools/rules_distroless (bazel-contrib/SIG-rules-authors#88), this should be less frustrating.

dockerfile_image and container_run_and_commit were a mistake. They might be comfortable to use, but they barely do the right thing: they are not compatible with platforms and not hermetic (and therefore a root cause of cache misses).

NOTE: Whatever dockerfile_image did can be done in a genrule.
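
To make the note above concrete, here is a minimal sketch of what such a genrule might look like. It is not taken from rules_docker or rules_oci. It assumes a docker CLI on the action's PATH and a reachable Docker daemon on the host. The target name, image tag, and file names are hypothetical.

```starlark
# BUILD.bazel: a non-hermetic sketch, not a recommended pattern.
genrule(
    name = "dockerfile_image_tar",  # hypothetical target name
    srcs = [
        "Dockerfile",
        "index.js",
    ],
    outs = ["image.tar"],
    cmd = """
        ctx="$$(mktemp -d)"
        # Copy inputs into a scratch directory, dereferencing symlinks so
        # that docker sees regular files in its build context.
        cp -L $(SRCS) "$$ctx"
        # Assumption: docker is on PATH and the daemon is running.
        docker build -t genrule_demo:latest "$$ctx"
        docker save -o "$@" genrule_demo:latest
        rm -rf "$$ctx"
    """,
    # Run outside the sandbox and skip remote execution and caching,
    # since the action depends on host state (the Docker daemon).
    local = True,
)
```

This carries the drawbacks listed earlier in the thread: the output depends on whatever the daemon and the network return at build time, so Bazel cannot treat it as reproducible.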

@thesayyn pinned this issue Dec 12, 2023
@thesayyn added the need: discussion label and removed the question label Dec 12, 2023
@thesayyn changed the title from "Build Dockerfiles with incremental cache support" to "FR: Build Dockerfiles with incremental cache support" Dec 12, 2023
@thesayyn
Collaborator

Closing as completed #570

@thesayyn unpinned this issue May 24, 2024