feat: Add support to explicitly specify a CPU architecture for the to-be-copied image instead of relying on auto-detection #961

xiongbo-sjtu · 2024-11-29T04:08:42Z

What's the problem?

It's observed that when copying a multi-arch source image from location A to B in ECR, we always end up with amd64 in the destination image at location B. The hypothesis is that the underlying lambda function used to auto-detect the architecture is backed up by Dockerfile with a hard-coded runtime.GOARCH.

Proposed Solution

Instead of solely relying on an auto-detection mechanism, this PR is meant to add a feature to allow users to explicitly specify a CPU architecture for the to-be-copied image, by triggering the underlying logic here. This will unblock the usage of Graviton instances for Amazon SageMaker and other computing platforms.

As a side note, I've run npm run release on my dev box to refresh API.md

lambda/main.go

lambda/main_test.go

mrgrain · 2024-11-29T10:34:34Z

src/index.ts

+  /**
+   * The image architecture to be copied.
+   *
+   * @default - ""


please explain what this default means in the doc string

mrgrain · 2024-11-29T10:36:24Z

src/index.ts

+   *
+   * @default - ""
+   */
+  readonly imageArch?: string;


I'd probably lean towards this. What do you think?

Suggested change

readonly imageArch?: string;

readonly architecture?: string;

imageArch is intentionally used to avoid any potential confusion with the architecture field referenced by lambda.SingletonFunction in the ECRDeployment class.

Currently, our Dockerfile uses a hard-coded architecture GOARCH=amd64. If we'd like to allow customers to control what's used for the lambda function, we can uplift such a constraint and then expose lambdaArch as a sibling field of imageArch.

Hope that makes sense.

You didn't really explain the problem and the solution in the PR body, so I have to ask about it here to properly understand the motivation and the solution. I'll summarize to see if I understand correctly:

This Lambda is copying and image from location A to location B.

I know that an image can contain multiple architectures.

In this PR, we are making it possible to copy only one of the architectures in that image over.

Here are my questions:

imageArch?: string makes it so we can only copy 1 architecture over. Why not multiple? (imageArch?: string[])

What is the motivation? It seems like the most logical behavior of this construct would be to copy all architectures over. So either that is not the behavior, or this PR is intended to save space and time for the copies, by only copying a subset out of potentially dozens of images.

If the behavior is NOT that all architectures are copied, then wouldn't that be a more logical and safer fix than having people specify their image architecture in Yet Another Place? (Potentially still with the subsetting behavior if we want to save on space and time).

Again, it would have helped me a lot if the line of thought had been spelled out more clearly in the PR body, because I'm left guessing a bit here.

Great comment! I've updated the PR description (copied below) to provide more clarity.

Regarding why we don't use string[], the reason is that the underlying containers/image lib doesn't give us such an option, as shown here.

Note that the current system behavior is that the destination image is always amd64 even if it's copied from a multi-arch source image. For example, if the multi-arch source image supports both amd64 and arm64, this PR will give users the flexibility to choose between those two, instead of having to live with amd64.

===========================================================================

What's the problem?

It's observed that when copying a multi-arch source image from location A to B in ECR, we always end up with amd64 in the destination image at location B. The hypothesis is that the underlying lambda function used to auto-detect the architecture is backed up by Dockerfile with a hard-coded runtime.GOARCH.

Proposed Solution

Instead of solely relying on an auto-detection mechanism, this PR is meant to add a feature to allow users to explicitly specify a CPU architecture for the to-be-copied image, by triggering the underlying logic here. This will unblock the usage of Graviton instances for Amazon SageMaker and other computing platforms.

From the point of view of a user, this documentation string is at worst lying, and at best unhelpful:

@default - the underlying lambda auto-detects the relevant architecture (e.g., amd64, arm64) * https://github.com/containers/image/blob/main/internal/pkg/platform/platform_matcher.go#L161

This description is true for the library that we use. But from the point of view of a CDK user, they don't control the platform of the Lambda that runs the library. That library will always run on amd64, because we chose what platform to build and run the Lambda for, and we always pick amd64 because that's just who we are 😉 . So the library will helpfully autodetect amd64 in all cases, and there's nothing that CDK user can do about it.

We should just state that the default if unspecified is amd64.

Question: what happens if the target image only has an arm64 variant and not an amd64 viarant, and we leave the default off? Will the library then copy the one variant that's there, or will it fail because amd64 doesn't exist?

Regarding why we don't use string[], the reason is that the underlying containers/image lib doesn't give us such an option, as shown here.

I am looking at the documentation of this package, and I can at least see a type ImageListSelection that seems to indicate it's possible to copy a list of images, and a GitHub issue saying it's currently possible to copy all architectures.

https://pkg.go.dev/github.com/containers/image/v5/copy#Image

Do these things not actually work? (I'm not up on all the intricacies of Docker but I hope you are)

mrgrain

See inline comments.

src/index.ts

rix0rrr · 2024-12-17T10:22:26Z

src/index.ts

+   *
+   * @default - ""
+   */
+  readonly imageArch?: string;


You didn't really explain the problem and the solution in the PR body, so I have to ask about it here to properly understand the motivation and the solution. I'll summarize to see if I understand correctly:

This Lambda is copying and image from location A to location B.

I know that an image can contain multiple architectures.

In this PR, we are making it possible to copy only one of the architectures in that image over.

Here are my questions:

imageArch?: string makes it so we can only copy 1 architecture over. Why not multiple? (imageArch?: string[])

What is the motivation? It seems like the most logical behavior of this construct would be to copy all architectures over. So either that is not the behavior, or this PR is intended to save space and time for the copies, by only copying a subset out of potentially dozens of images.

If the behavior is NOT that all architectures are copied, then wouldn't that be a more logical and safer fix than having people specify their image architecture in Yet Another Place? (Potentially still with the subsetting behavior if we want to save on space and time).

Again, it would have helped me a lot if the line of thought had been spelled out more clearly in the PR body, because I'm left guessing a bit here.

rix0rrr · 2024-12-17T10:23:55Z

test/example.ecr-deployment.ts

Where are the assertions in the tests?

Unlike other typescript files under the test folder, example.ecr-deployment.ts doesn't have a single line of expect statement for assertion. My understanding is that it's a smoke test to check if anything chokes up when we interact with the ECRDeployment class with different flavors of arguments.

…-be-copied image instead of relying on auto-detection.

Addressed

cdklabs-automation enabled auto-merge November 29, 2024 04:08

xiongbo-sjtu changed the title ~~Specify CPU architecture for both image and Lambda~~ feat: Add support to specify CPU architecture for both image and Lambda Nov 29, 2024

auto-merge was automatically disabled November 29, 2024 04:28
Head branch was pushed to by a user without write access

xiongbo-sjtu force-pushed the main branch from 3af917b to 1e088dd Compare November 29, 2024 04:28

xiongbo-sjtu changed the title ~~feat: Add support to specify CPU architecture for both image and Lambda~~ feat: Add support to specify CPU architecture for the to-be-copied image Nov 29, 2024

mrgrain reviewed Nov 29, 2024

View reviewed changes

lambda/main.go Show resolved Hide resolved

mrgrain reviewed Nov 29, 2024

View reviewed changes

lambda/main_test.go Outdated Show resolved Hide resolved

mrgrain reviewed Nov 29, 2024

View reviewed changes

lambda/main_test.go Outdated Show resolved Hide resolved

mrgrain reviewed Nov 29, 2024

View reviewed changes

mrgrain previously requested changes Nov 29, 2024

View reviewed changes

xiongbo-sjtu force-pushed the main branch 4 times, most recently from d5cf620 to 3643177 Compare November 29, 2024 20:02

xiongbo-sjtu requested a review from mrgrain November 29, 2024 20:26

jiayiwang7 assigned mrgrain and rix0rrr and unassigned mrgrain Dec 16, 2024

rix0rrr requested changes Dec 17, 2024

View reviewed changes

xiongbo-sjtu force-pushed the main branch 3 times, most recently from 501271a to f97986a Compare December 18, 2024 20:18

xiongbo-sjtu changed the title ~~feat: Add support to specify CPU architecture for the to-be-copied image~~ feat: Add support to explicitly specify a CPU architecture for the to-be-copied image instead of relying on auto-detection Dec 18, 2024

feat: Add support to explicitly specify a CPU architecture for the to…

91c77ef

…-be-copied image instead of relying on auto-detection.

xiongbo-sjtu force-pushed the main branch from f97986a to 91c77ef Compare December 18, 2024 20:37

Make a single-element array

7804e70

rix0rrr approved these changes Dec 20, 2024

View reviewed changes

Ignore packaging file

9979db5

rix0rrr added this pull request to the merge queue Dec 23, 2024

Merged via the queue into cdklabs:main with commit fbab401 Dec 23, 2024
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add support to explicitly specify a CPU architecture for the to-be-copied image instead of relying on auto-detection #961

feat: Add support to explicitly specify a CPU architecture for the to-be-copied image instead of relying on auto-detection #961

xiongbo-sjtu commented Nov 29, 2024 •

edited

Loading

mrgrain Nov 29, 2024

xiongbo-sjtu Nov 29, 2024

mrgrain Nov 29, 2024

xiongbo-sjtu Nov 29, 2024

rix0rrr Dec 17, 2024 •

edited

Loading

xiongbo-sjtu Dec 18, 2024 •

edited

Loading

rix0rrr Dec 19, 2024 •

edited

Loading

rix0rrr Dec 19, 2024 •

edited

Loading

mrgrain left a comment

rix0rrr Dec 17, 2024 •

edited

Loading

rix0rrr Dec 17, 2024

xiongbo-sjtu Dec 18, 2024

feat: Add support to explicitly specify a CPU architecture for the to-be-copied image instead of relying on auto-detection #961

feat: Add support to explicitly specify a CPU architecture for the to-be-copied image instead of relying on auto-detection #961

Conversation

xiongbo-sjtu commented Nov 29, 2024 • edited Loading

What's the problem?

Proposed Solution

mrgrain Nov 29, 2024

Choose a reason for hiding this comment

xiongbo-sjtu Nov 29, 2024

Choose a reason for hiding this comment

mrgrain Nov 29, 2024

Choose a reason for hiding this comment

xiongbo-sjtu Nov 29, 2024

Choose a reason for hiding this comment

rix0rrr Dec 17, 2024 • edited Loading

Choose a reason for hiding this comment

xiongbo-sjtu Dec 18, 2024 • edited Loading

Choose a reason for hiding this comment

What's the problem?

Proposed Solution

rix0rrr Dec 19, 2024 • edited Loading

Choose a reason for hiding this comment

rix0rrr Dec 19, 2024 • edited Loading

Choose a reason for hiding this comment

mrgrain left a comment

Choose a reason for hiding this comment

rix0rrr Dec 17, 2024 • edited Loading

Choose a reason for hiding this comment

rix0rrr Dec 17, 2024

Choose a reason for hiding this comment

xiongbo-sjtu Dec 18, 2024

Choose a reason for hiding this comment

xiongbo-sjtu commented Nov 29, 2024 •

edited

Loading

rix0rrr Dec 17, 2024 •

edited

Loading

xiongbo-sjtu Dec 18, 2024 •

edited

Loading

rix0rrr Dec 19, 2024 •

edited

Loading

rix0rrr Dec 19, 2024 •

edited

Loading

rix0rrr Dec 17, 2024 •

edited

Loading