process-user-data: create generic file provider for userdata #2172

Draft
wants to merge 1 commit into
base: main

mkulke
Collaborator

@mkulke mkulke commented Nov 28, 2024

The docker provider currently looks up a certain path to find a user data file. We can generalize this to use it also for config disks.

In mkosi, the process-user-data service has a soft dependency on a mount unit that mounts a config disk with the label "cidata". The service can then consume the user data from the mount point.

This will enable mkosi x86_64 images to work on libvirt unmodified.
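
A minimal sketch of the idea described above, not the code in this PR: a generic file provider could be selected by probing a list of candidate paths and using the first one that exists. The names `findUserDataFile`, `fileExists`, and `userDataPaths` are hypothetical, and the candidate list contains only the cidata mount path that shows up in the logs later in this thread.

```go
package main

import (
	"fmt"
	"os"
)

// userDataPaths is a hypothetical candidate list. Only the cidata mount
// path is taken from this thread; other providers could add their own.
var userDataPaths = []string{"/run/media/cidata/user-data"}

// fileExists reports whether path exists and is a regular file.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	if err != nil {
		return false
	}
	return info.Mode().IsRegular()
}

// findUserDataFile returns the first candidate for which exists reports true.
// The exists check is injected so the selection logic can be tested without
// touching the filesystem.
func findUserDataFile(candidates []string, exists func(string) bool) (string, bool) {
	for _, p := range candidates {
		if exists(p) {
			return p, true
		}
	}
	return "", false
}

func main() {
	if p, ok := findUserDataFile(userDataPaths, fileExists); ok {
		fmt.Println("provider: File, userDataPath:", p)
		return
	}
	fmt.Println("no user data file found")
}
```

With this shape, supporting another source of user data files would only mean appending a path to the candidate list.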

@mkulke mkulke requested a review from a team as a code owner November 28, 2024 12:49
@mkulke mkulke marked this pull request as draft November 28, 2024 12:49
@mkulke mkulke force-pushed the mkulke/add-support-for-cidata-isos-in-mkosi branch from 5898e39 to 337136e Compare November 28, 2024 12:50
Member

@stevenhorsman stevenhorsman left a comment

This looks good to me. I will try to test it in my mkosi libvirt e2e test once I've validated (or given up on) the separate mkosi cloud-init approach.

Review thread on src/cloud-api-adaptor/pkg/paths/paths.go (outdated, resolved)
The docker provider currently looks up a certain path to find a user
data file. We can generalize this to use it also for config disks.

In mkosi, the process-user-data service has a soft dependency on a mount
unit that mounts a config disk with the label "cidata". The service
can then consume the user data from the mount point.

This will enable mkosi x86_64 images to work on libvirt unmodified.

Signed-off-by: Magnus Kulke <[email protected]>
@mkulke mkulke force-pushed the mkulke/add-support-for-cidata-isos-in-mkosi branch from 337136e to f08f4b5 Compare November 28, 2024 15:47
@stevenhorsman
Member

FYI, my test run that uses this rather than initdata is failing: https://github.com/stevenhorsman/cloud-api-adaptor/actions/runs/12071315597/job/33663455543 and it looks like the image pull is going wrong. I will try to debug this locally to see if I can work out what's happening with daemon.json etc.

@mkulke
Collaborator Author

mkulke commented Nov 28, 2024

yes, I can see the same on azure:

CreateContainer fails: rpc error: code = Internal desc = failed to pull manifest error sending request for url (https://index.docker.io/v2/library/nginx/manifests/sha256:e56797eab4a5300158cc015296229e13a390f82bfc88803f45b08912fd5e3348)

Stack backtrace:
   0: anyhow::kind::Adhoc::new
   1: image_rs::image::ImageClient::pull_image::{{closure}}.14396
   2: <kata_agent::storage::image_pull_handler::ImagePullHandler as kata_agent::storage::StorageHandler>::create_device::{{closure}}

@mkulke
Collaborator Author

mkulke commented Nov 28, 2024

gah, it's not consistent, sometimes the container will come up.

NAME                         READY   STATUS              RESTARTS      AGE
kube-relay                   1/1     Running             0             9m20s
nginx-caa-66759984cb-29mdw   0/1     RunContainerError   0             2m30s
nginx-caa-66759984cb-2tc9k   0/1     CrashLoopBackOff    1 (17s ago)   2m30s
nginx-caa-66759984cb-9z9cp   1/1     Running             0             2m30s
nginx-caa-66759984cb-dxgqr   0/1     RunContainerError   0             2m30s
nginx-caa-66759984cb-kbl4t   0/1     ContainerCreating   0             2m30s
nginx-caa-66759984cb-q8mw7   1/1     Running             0             2m30s
nginx-caa-66759984cb-rt49k   0/1     CrashLoopBackOff    1 (19s ago)   2m30s
nginx-caa-66759984cb-stlnq   1/1     Running             0             2m30s
nginx-caa-66759984cb-w6chl   0/1     CrashLoopBackOff    1 (17s ago)   2m30s
nginx-caa-66759984cb-x8bml   0/1     CrashLoopBackOff    1 (18s ago)   2m30s

@stevenhorsman
Member

Yeah, I saw a few tests that passed too. Sorry for the lack of debug info, my test VM's kcli seems to be on strike:

# sudo kcli create pool -p /var/lib/libvirt/images default
Creating pool default...
Pool default already there.Leaving...
# kcli download image ubuntu2204
Grabbing image ubuntu2204 from url https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img
Image ubuntu2204 not Added because Pool default not found

so I'm setting up a new test env, which means a new mkosi image build with a different ssh key, so I might not be of much use tonight!

@stevenhorsman
Member

stevenhorsman commented Nov 29, 2024

This might be user error, but when logging into the VM I'm seeing:

[root@fedora ~]# ls /run/peerpod
policy.rego
[root@fedora ~]# journalctl -u process-user-data
Nov 29 10:54:38 fedora process-user-data[431]: 2024/11/29 10:54:38 [userdata/provision] unsupported user data provider, we extract and calculate initdata hash only.
Nov 29 10:54:38 fedora process-user-data[431]: 2024/11/29 10:54:38 [userdata/provision] File /run/peerpod/initdata not found, skipped initdata processing.
Nov 29 10:54:38 fedora bash[469]: cat: /run/peerpod/initdata.digest: No such file or directory
Nov 29 10:54:38 fedora bash[462]: ERROR: Algorithm "sha256" expects a size of 32 bytes, got: 0
Nov 29 10:54:38 fedora bash[517]: cat: /run/peerpod/initdata.digest: No such file or directory
Nov 29 10:54:38 fedora bash[515]: ERROR: Algorithm "sha384" expects a size of 48 bytes, got: 0
Nov 29 10:54:38 fedora systemd[1]: Finished process-user-data.service - Process user data.
[root@fedora ~]# ls /run/media/cidata/user-data
/run/media/cidata/user-data

@mkulke
Collaborator Author

mkulke commented Nov 29, 2024

yeah, this is not pretty, but not an error, I think. It tries to measure initdata.digest as a non-critical post-execution step, even if the file is not present. I didn't invest too much into it, since attestation-agent was supposed to handle that in the near future. But we can probably do something like a path watcher that triggers only when /run/peerpod/initdata.digest is written.
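
A path watcher as suggested here would most likely live in systemd configuration. As a simpler alternative, and purely as a sketch, the measurement step could be guarded by a check that the digest file exists and is non-empty, which would avoid the "expects a size of 32 bytes, got: 0" noise in the journal. The helper `digestPresent` and the constant `digestPath` are hypothetical names; only the file path is taken from the log above.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// digestPath is the file the post-execution step tries to read in the log above.
const digestPath = "/run/peerpod/initdata.digest"

// digestPresent reports whether a non-empty regular file exists at path.
// A missing file is not treated as an error; any other stat failure is returned.
func digestPresent(path string) (bool, error) {
	info, err := os.Stat(path)
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return info.Mode().IsRegular() && info.Size() > 0, nil
}

func main() {
	ok, err := digestPresent(digestPath)
	if err != nil {
		fmt.Fprintln(os.Stderr, "stat failed:", err)
		os.Exit(1)
	}
	if !ok {
		fmt.Println("no initdata digest found, skipping measurement")
		return
	}
	// Placeholder: the actual measurement command is not shown in this thread.
	fmt.Println("initdata digest present, measurement step would run here")
}
```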

@stevenhorsman
Member

Sorry, I'm not too worried about the initdata message, just posting the full log. I think the unsupported user data provider bit and lack of daemon.json is the key point which suggests to me that the FileUserDataProvider isn't being created?

@mkulke
Collaborator Author

mkulke commented Nov 29, 2024

oh. yes, good catch. this means that /run/media/cidata/user-data does not exist, can you confirm?

@stevenhorsman
Member

It does:

# cat //run/media/cidata/user-data
#cloud-config

write_files:
  - path: /run/peerpod/agent-config.toml
    content: |
      server_addr = 'unix:///run/kata-containers/agent.sock'
      guest_components_procs = 'none'
      image_registry_auth = 'file:///run/peerpod/auth.json'
  - path: /run/peerpod/daemon.json
    content: |
      {

that's why I think it might be user error here. I'm doing a build with some debug added to pud to try and check I'm even using the correct version of code...

@mkulke
Collaborator Author

mkulke commented Nov 29, 2024

if it is present then it's maybe a race condition. Can you restart the process-user-data unit to see whether it picks up the file?

or compare the timestamps of the logs unit and the process-user-data unit?

@stevenhorsman
Member

stevenhorsman commented Nov 29, 2024

Ok, so after rebuilding everything with my dodgy debug I have some more results:

  • Runs 1 & 2 worked and created the pod correctly:
[root@fedora ~]# ls /run/peerpod
agent-config.toml  auth.json  daemon.json  policy.rego
[root@fedora ~]# journalctl -u process-user-data
Nov 29 13:52:21 fedora systemd[1]: Starting process-user-data.service - Process user data...
Nov 29 13:52:21 fedora process-user-data[536]: Looking for /run/media/cidata/user-data, err: &v%!(EXTRA <nil>)2024/11/29 13:52:21 [userdata/provision] HasUserDataFile
Nov 29 13:52:21 fedora process-user-data[536]: 2024/11/29 13:52:21 [userdata/provision] provider: File, userDataPath: /run/media/cidata/user-data
Nov 29 13:52:21 fedora process-user-data[536]: 2024/11/29 13:52:21 [userdata/provision] Wrote /run/peerpod/agent-config.toml
Nov 29 13:52:21 fedora process-user-data[536]: 2024/11/29 13:52:21 [userdata/provision] Wrote /run/peerpod/daemon.json
Nov 29 13:52:21 fedora process-user-data[536]: 2024/11/29 13:52:21 [userdata/provision] Wrote /run/peerpod/auth.json
Nov 29 13:52:21 fedora process-user-data[536]: 2024/11/29 13:52:21 [userdata/provision] File /run/peerpod/initdata not found, skipped initdata processing.
Nov 29 13:52:21 fedora bash[564]: cat: /run/peerpod/initdata.digest: No such file or directory
Nov 29 13:52:22 fedora bash[560]: ERROR: Algorithm "sha256" expects a size of 32 bytes, got: 0
Nov 29 13:52:22 fedora bash[643]: cat: /run/peerpod/initdata.digest: No such file or directory
Nov 29 13:52:22 fedora bash[632]: ERROR: Algorithm "sha384" expects a size of 48 bytes, got: 0
Nov 29 13:52:22 fedora systemd[1]: Finished process-user-data.service - Process user data.
[root@fedora ~]# cat /run/media/cidata/user-data
#cloud-config

write_files:
  - path: /run/peerpod/agent-config.toml
    content: |
  • Run 3 failed and it looks like /run/media/cidata/user-data was completely missing there:
[root@fedora ~]# ls /run/peerpod
policy.rego
[root@fedora ~]# journalctl -u process-user-data
Nov 29 13:46:11 fedora systemd[1]: Starting process-user-data.service - Process user data...
Nov 29 13:46:11 fedora process-user-data[545]: Looking for /run/media/cidata/user-data, err: &v%!(EXTRA *fs.PathError=stat /run/media/cidata/user-data: no such file or directory)2024/1>
Nov 29 13:46:11 fedora process-user-data[545]: 2024/11/29 13:46:11 [userdata/provision] bunsupported user data provider, we extract and calculate initdata hash only.
Nov 29 13:46:11 fedora process-user-data[545]: 2024/11/29 13:46:11 [userdata/provision] File /run/peerpod/initdata not found, skipped initdata processing.
Nov 29 13:46:11 fedora bash[575]: cat: /run/peerpod/initdata.digest: No such file or directory
Nov 29 13:46:12 fedora bash[565]: ERROR: Algorithm "sha256" expects a size of 32 bytes, got: 0
Nov 29 13:46:12 fedora bash[645]: cat: /run/peerpod/initdata.digest: No such file or directory
Nov 29 13:46:12 fedora bash[639]: ERROR: Algorithm "sha384" expects a size of 48 bytes, got: 0
Nov 29 13:46:12 fedora systemd[1]: Finished process-user-data.service - Process user data.
[root@fedora ~]# ls /run/media/cidata/user-data
ls: cannot access '/run/media/cidata/user-data': No such file or directory

I retried ls /run/media/cidata/user-data for a few minutes until the VM got deleted, and the file never appeared.

I used exactly the same deployment and images for these runs, so I'm not sure why /run/media/cidata/user-data was present in two runs but not in the third. This might explain why the CI tests pass sometimes and not others?

I tried a 4th and 5th time and both worked, so the failure doesn't seem "sticky" either (but it might be in the CI, due to name clashes or incomplete cleanup).

@mkulke
Collaborator Author

mkulke commented Nov 29, 2024

it looks like there's a (race) condition that prevents the run-media-cidata.mount unit from starting, and process-user-data doesn't depend on it. I think there is only casual coupling between the mount unit and process-user-data.service. That wouldn't explain why it fails on Azure, though, since there is no cidata ISO device there.
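
One way to make the service less sensitive to mount ordering, sketched here as an assumption rather than as the fix for this PR, is to poll for the user data file with a timeout before falling back. This only helps when the mount is late; it does nothing for the case where the mount unit never starts, as in the failed run above. The functions `waitForFile` and `waitForFileTimeout` and the 5-second timeout are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"time"
)

// waitForFile polls at the given interval until path exists or ctx is done.
func waitForFile(ctx context.Context, path string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if _, err := os.Stat(path); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("waiting for %s: %w", path, ctx.Err())
		case <-ticker.C:
			// Next polling round.
		}
	}
}

// waitForFileTimeout is a convenience wrapper that derives the context
// from a plain timeout.
func waitForFileTimeout(path string, timeout, interval time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return waitForFile(ctx, path, interval)
}

func main() {
	// The timeout and interval are arbitrary values chosen for illustration.
	err := waitForFileTimeout("/run/media/cidata/user-data", 5*time.Second, 200*time.Millisecond)
	if err != nil {
		fmt.Println("user data file did not appear:", err)
		return
	}
	fmt.Println("user data file is available")
}
```

An explicit ordering dependency between the two systemd units would address the same race declaratively; the polling loop is only a fallback for cases where the unit files cannot be changed.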

2 participants