Re-add "Change the generated image to bci-busybox:15.6 (#252)" with additional fixes #311

Merged (2 commits) on Nov 26, 2024

Conversation


@jiaqiluo jiaqiluo commented Nov 13, 2024

rancher/rancher#46100

This PR includes:

More information about the node-driver-downloading issue:

When provisioning an RKE2/K3s node-driver cluster in Rancher, Rancher deploys a Job that runs the rancher-machine container. This container first downloads the external node driver, then invokes the rancher-machine binary to create the VM in the cloud (code). The container runs with a security context configured as runAsUser: 1000 and runAsGroup: 1000. In the original PR, the Job fails to move the downloaded node driver to /usr/local/bin/ due to a permissions issue.

The fix changes the ownership of /usr/local/bin to the machine user (UID 1000). This lets the running container move the node driver into /usr/local/bin but not modify the existing binaries, which are owned by root. In addition, the security context set on the container ensures that the process runs as a non-root user.
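As a rough illustration of the runtime step described above, the sketch below moves a downloaded driver into a target directory only when the current user can write to that directory. The function name `install_driver` and its arguments are hypothetical; this is not the actual rancher-machine entrypoint, only a minimal sketch of the permission check involved.

```shell
#!/bin/sh
# install_driver SRC DEST_DIR
# Hypothetical helper (not from the PR): make SRC executable and move it
# into DEST_DIR, failing early if the current user cannot add entries there.
install_driver() {
  src=$1
  dest_dir=$2

  # Adding a file to a directory needs write and search permission on the
  # directory itself, which is why the ownership of the directory matters.
  if [ ! -d "$dest_dir" ] || [ ! -w "$dest_dir" ] || [ ! -x "$dest_dir" ]; then
    echo "cannot install into $dest_dir as uid $(id -u)" >&2
    return 1
  fi

  chmod +x "$src" && mv "$src" "$dest_dir/"
}
```

In a container running as UID 1000, a call such as `install_driver ./docker-machine-driver-linode /usr/local/bin` would be expected to succeed only if /usr/local/bin is writable by that user.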

Dev validation

We can use -e CATTLE_MACHINE_PROVISION_IMAGE=$IMAGE in the docker run command to override the rancher-machine image used by the v1 provisioning framework.
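For illustration, the override could be passed roughly as follows. Only the `-e CATTLE_MACHINE_PROVISION_IMAGE=$IMAGE` flag comes from this PR; the default image names, the remaining `docker run` flags, and the helper name `rancher_run_cmd` are assumptions. The sketch prints the command rather than running it so it can be reviewed first.

```shell
#!/bin/sh
# IMAGE: the rancher-machine image built from the PR branch.
# The default below is a placeholder, not a real published image.
IMAGE="${IMAGE:-registry.example.com/rancher/machine:dev}"
# RANCHER_IMAGE: assumed Rancher server image; substitute the version under test.
RANCHER_IMAGE="${RANCHER_IMAGE:-rancher/rancher:latest}"

# rancher_run_cmd MACHINE_IMAGE RANCHER_IMAGE
# Hypothetical helper: print a docker run command that starts Rancher with
# the machine provisioning image overridden. Nothing is executed here.
rancher_run_cmd() {
  printf 'docker run -d --restart=unless-stopped --privileged -p 80:80 -p 443:443 -e CATTLE_MACHINE_PROVISION_IMAGE=%s %s\n' "$1" "$2"
}

rancher_run_cmd "$IMAGE" "$RANCHER_IMAGE"
```

The printed line can be pasted into a terminal once the image names look right. The helper does no quoting, so it assumes image references without spaces.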

The Docker image built from this PR was tested in Rancher by provisioning a node-driver Linode K3s cluster, and the cluster was successfully provisioned.

Below are the pod logs:

Downloading driver from https://<IP>/assets/docker-machine-driver-linode
Doing /etc/rancher/ssl
docker-machine-driver-linode
docker-machine-driver-linode: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped
Running pre-create checks...
(aaaa-jiaqi-ta-qbwvq-vrsrl) Generating a secure disposable linode-root-pass...
Creating machine...
(aaaa-jiaqi-ta-qbwvq-vrsrl) Creating Linode machine instance...
(aaaa-jiaqi-ta-qbwvq-vrsrl) Waiting for Machine Running...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Provisioning with custom install script via SSH, not installing Docker...

@jiaqiluo jiaqiluo changed the title from Re-add "Change the generated image to bci-busybox:15.6 (#252)" to Re-add "Change the generated image to bci-busybox:15.6 (#252)" with extra fixes Nov 13, 2024
@jiaqiluo jiaqiluo changed the title from Re-add "Change the generated image to bci-busybox:15.6 (#252)" with extra fixes to Re-add "Change the generated image to bci-busybox:15.6 (#252)" with additional fixes Nov 13, 2024
@jiaqiluo jiaqiluo requested review from a team and pjbgf November 13, 2024 23:14
pjbgf previously approved these changes Nov 15, 2024
@pjbgf pjbgf left a comment

A few optional changes, otherwise LGTM.

package/Dockerfile (3 review threads, resolved)
@jiaqiluo

jiaqiluo commented Nov 15, 2024

Hi @pjbgf, I tested your suggested change but encountered the same issue that the PR aims to address. It turns out that while useradd -m -u 1000 machine does create the /home/machine directory, it is not copied to the final stage of the image. The current approach - manually creating the directory and setting its ownership to machine - appears to be the simplest and most straightforward solution. Please let me know what you think.
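The approach described here (create the home directory explicitly, then hand it to the machine user) might look roughly like the sketch below. The helper name `prepare_home`, its arguments, and the idea of a staging root that gets copied into the final image stage are assumptions for illustration; the actual commands in package/Dockerfile may differ.

```shell
#!/bin/sh
# prepare_home ROOT NAME UID GID
# Hypothetical helper: create ROOT/home/NAME explicitly and set its owner,
# instead of relying on useradd -m to have created it in an earlier stage.
prepare_home() {
  root=$1
  name=$2
  uid=$3
  gid=$4
  home="$root/home/$name"

  # Create the directory under the tree that ends up in the final image.
  mkdir -p "$home" || return 1

  # Give it to the non-root user the container will run as.
  chown "$uid:$gid" "$home"
}
```

In a build stage running as root, this could be invoked as `prepare_home /staging machine 1000 1000`, where `/staging` is a placeholder for whatever directory the final stage copies from.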

@jiaqiluo jiaqiluo requested a review from pjbgf November 15, 2024 22:05
@jakefhyde

jakefhyde commented Nov 25, 2024

@jiaqiluo Just to confirm before you merge: the provisioning failures we noticed are no longer seen, correct? I see that you tested with Linode. Could you also confirm that CI works when you run make provisioning-tests locally from rancher?

@jiaqiluo

Hi @jakefhyde, could you explain how the rancher-machine is being used in the provisioning test suites? To perform the tests you suggested, do I need to build a rancher-machine Docker image from this PR and use it in rancher/rancher?

@jakefhyde

> Hi @jakefhyde, could you explain how the rancher-machine is being used in the provisioning test suites? To perform the tests you suggested, do I need to build a rancher-machine Docker image from this PR and use it in rancher/rancher?

Yes, exactly. Rancher CI uses whatever version of machine is configured in dapper, which is currently v0.15.0-rancher122. I would build a machine image, point dapper at your version, and then run make provisioning-tests.

@jiaqiluo

Using a custom Docker image built from this PR branch in rancher/rancher to run the provisioning tests, all tests passed successfully (link), confirming that the PR resolves the issue encountered with the initial fix attempt.

@jiaqiluo jiaqiluo merged commit 442150f into rancher:master Nov 26, 2024
2 checks passed
@jiaqiluo jiaqiluo deleted the re-support-bci-box branch November 26, 2024 23:07