-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add support to explicitly specify a CPU architecture for the to-be-copied image instead of relying on auto-detection #961
Conversation
Head branch was pushed to by a user without write access
src/index.ts
Outdated
/** | ||
* The image architecture to be copied. | ||
* | ||
* @default - "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please explain what this default means in the doc string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
src/index.ts
Outdated
* | ||
* @default - "" | ||
*/ | ||
readonly imageArch?: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably lean towards this. What do you think?
readonly imageArch?: string; | |
readonly architecture?: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imageArch
is intentionally used to avoid any potential confusion with the architecture field referenced by lambda.SingletonFunction
in the ECRDeployment
class.
Currently, our Dockerfile uses a hard-coded architecture GOARCH=amd64
. If we'd like to allow customers to control what's used for the lambda function, we can uplift such a constraint and then expose lambdaArch
as a sibling field of imageArch
.
Hope that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You didn't really explain the problem and the solution in the PR body, so I have to ask about it here to properly understand the motivation and the solution. I'll summarize to see if I understand correctly:
- This Lambda is copying and image from location A to location B.
- I know that an image can contain multiple architectures.
- In this PR, we are making it possible to copy only one of the architectures in that image over.
Here are my questions:
imageArch?: string
makes it so we can only copy 1 architecture over. Why not multiple? (imageArch?: string[]
)- What is the motivation? It seems like the most logical behavior of this construct would be to copy all architectures over. So either that is not the behavior, or this PR is intended to save space and time for the copies, by only copying a subset out of potentially dozens of images.
- If the behavior is NOT that all architectures are copied, then wouldn't that be a more logical and safer fix than having people specify their image architecture in Yet Another Place? (Potentially still with the subsetting behavior if we want to save on space and time).
Again, it would have helped me a lot if the line of thought had been spelled out more clearly in the PR body, because I'm left guessing a bit here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great comment! I've updated the PR description (copied below) to provide more clarity.
Regarding why we don't use string[]
, the reason is that the underlying containers/image
lib doesn't give us such an option, as shown here.
Note that the current system behavior is that the destination image is always amd64
even if it's copied from a multi-arch source image. For example, if the multi-arch source image supports both amd64
and arm64
, this PR will give users the flexibility to choose between those two, instead of having to live with amd64
.
===========================================================================
What's the problem?
It's observed that when copying a multi-arch source image from location A to B in ECR, we always end up with amd64
in the destination image at location B. The hypothesis is that the underlying lambda function used to auto-detect the architecture is backed up by Dockerfile with a hard-coded runtime.GOARCH.
Proposed Solution
Instead of solely relying on an auto-detection mechanism, this PR is meant to add a feature to allow users to explicitly specify a CPU architecture for the to-be-copied image, by triggering the underlying logic here. This will unblock the usage of Graviton instances for Amazon SageMaker and other computing platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the point of view of a user, this documentation string is at worst lying, and at best unhelpful:
@default - the underlying lambda auto-detects the relevant architecture (e.g., amd64, arm64)
* https://github.com/containers/image/blob/main/internal/pkg/platform/platform_matcher.go#L161
This description is true for the library that we use. But from the point of view of a CDK user, they don't control the platform of the Lambda that runs the library. That library will always run on amd64
, because we chose what platform to build and run the Lambda for, and we always pick amd64
because that's just who we are 😉 . So the library will helpfully autodetect amd64
in all cases, and there's nothing that CDK user can do about it.
We should just state that the default if unspecified is amd64
.
Question: what happens if the target image only has an arm64
variant and not an amd64
viarant, and we leave the default off? Will the library then copy the one variant that's there, or will it fail because amd64
doesn't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding why we don't use string[], the reason is that the underlying containers/image lib doesn't give us such an option, as shown here.
I am looking at the documentation of this package, and I can at least see a type ImageListSelection
that seems to indicate it's possible to copy a list of images, and a GitHub issue saying it's currently possible to copy all architectures.
https://pkg.go.dev/github.com/containers/image/v5/copy#Image
Do these things not actually work? (I'm not up on all the intricacies of Docker but I hope you are)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See inline comments.
d5cf620
to
3643177
Compare
src/index.ts
Outdated
* | ||
* @default - "" | ||
*/ | ||
readonly imageArch?: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You didn't really explain the problem and the solution in the PR body, so I have to ask about it here to properly understand the motivation and the solution. I'll summarize to see if I understand correctly:
- This Lambda is copying and image from location A to location B.
- I know that an image can contain multiple architectures.
- In this PR, we are making it possible to copy only one of the architectures in that image over.
Here are my questions:
imageArch?: string
makes it so we can only copy 1 architecture over. Why not multiple? (imageArch?: string[]
)- What is the motivation? It seems like the most logical behavior of this construct would be to copy all architectures over. So either that is not the behavior, or this PR is intended to save space and time for the copies, by only copying a subset out of potentially dozens of images.
- If the behavior is NOT that all architectures are copied, then wouldn't that be a more logical and safer fix than having people specify their image architecture in Yet Another Place? (Potentially still with the subsetting behavior if we want to save on space and time).
Again, it would have helped me a lot if the line of thought had been spelled out more clearly in the PR body, because I'm left guessing a bit here.
test/example.ecr-deployment.ts
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are the assertions in the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike other typescript files under the test folder, example.ecr-deployment.ts
doesn't have a single line of expect
statement for assertion. My understanding is that it's a smoke test to check if anything chokes up when we interact with the ECRDeployment class with different flavors of arguments.
501271a
to
f97986a
Compare
…-be-copied image instead of relying on auto-detection.
What's the problem?
It's observed that when copying a multi-arch source image from location A to B in ECR, we always end up with
amd64
in the destination image at location B. The hypothesis is that the underlying lambda function used to auto-detect the architecture is backed up by Dockerfile with a hard-coded runtime.GOARCH.Proposed Solution
Instead of solely relying on an auto-detection mechanism, this PR is meant to add a feature to allow users to explicitly specify a CPU architecture for the to-be-copied image, by triggering the underlying logic here. This will unblock the usage of Graviton instances for Amazon SageMaker and other computing platforms.
As a side note, I've run
npm run release
on my dev box to refreshAPI.md