Add Fedora CoreOS to continuous integration tests #714

Open
debarshiray opened this issue Feb 25, 2021 · 19 comments
Labels
1. Feature request A request for a new feature 2. CI Automation of testing, analysis and other actions

Comments

@debarshiray
Member

debarshiray commented Feb 25, 2021

We want Fedora CoreOS to be one of the primary supported platforms, just like Fedora Silverblue and Workstation or nowadays RHEL 8. Therefore, it would be good to run our continuous integration tests also on Fedora CoreOS hosts.

This will help avoid regressions like #656 and #712

@HarryMichal HarryMichal added 1. Feature request A request for a new feature 2. CI Automation of testing, analysis and other actions labels Feb 25, 2021
@HarryMichal
Member

HarryMichal commented Feb 25, 2021

TL;DR, summary at the end

For its CI, Toolbox uses Zuul hosted on SoftwareFactory. Just to give a bit of background, the main reason why we use SF + Zuul is that they offer running tests on "native" Fedora hosts, not only in containers as is the case with Travis/GitHub Actions/...

Also worth noting is that the development of Zuul and SoftwareFactory is in very close contact with the work on Fedora CI. If we cannot come up with a solution, we could ask the folks in that initiative for advice.

A few days ago I asked the folks behind SF and Zuul about the possibility of adding support for Fedora CoreOS. The response was that it could be just a matter of finding an image and providing a configuration in SF using that image (https://softwarefactory-project.io/cgit/config/tree/nodepool/virt_images).

To me, this sounds quite feasible. The only obstacle I see (and it may be large) is the fact that Zuul is built on top of Ansible. And I don't know how well Ansible plays with CoreOS. And I don't mean this in the sense of running Ansible inside of CoreOS but in the sense of Ansible operating CoreOS (e.g. installing packages). From what I understand, Zuul/SF execute a series of steps before and after running tests in the environment. Considering the fact that Zuul runs 100% of the time on "classic" package-based systems, I cannot say that the same steps will work without any problems on CoreOS. But this is currently only speculation on my part.

Jumping forward, let's say Zuul supports FCOS, how do we build Toolbox & execute the tests?

The easiest solution to me seems to be to use a container. And either we can use a pre-built one with all the dependencies already in place or just create a generic (Fedora?) container, install dependencies and build. Both solutions have their pros and cons but I don't see anything complicated here.

Running the tests will be a bit more of a "fun" thing to do :). For our system tests we use bats, which is a minimalist testing framework. To get it, we can either clone it with git or layer an RPM. If we choose to layer, then there is also the choice between rebooting (which according to the Zuul folks should be totally fine) and applying the changes live with rpm-ostree ex apply-live.
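
A rough sketch of the two routes for getting bats, written as a small Python helper that only lists the commands for each one. Everything named here is an assumption for illustration: the helper `bats_setup_commands`, the package name `bats`, and the upstream clone URL. None of it has been tried on an FCOS node in Zuul.

```python
from typing import List

# Assumed upstream location of bats; the thread does not name a URL.
BATS_REPO = "https://github.com/bats-core/bats-core.git"


def bats_setup_commands(method: str, live: bool = True) -> List[List[str]]:
    """Return the commands (as argv lists) for one way of getting bats.

    method: "clone" to fetch bats with git, "layer" to layer the RPM.
    live:   only used for "layer"; apply the change live instead of rebooting.
    """
    if method == "clone":
        return [["git", "clone", "--depth", "1", BATS_REPO]]
    if method == "layer":
        # Assumes the package is simply called "bats".
        commands = [["sudo", "rpm-ostree", "install", "bats"]]
        if live:
            # Command spelling taken from the comment above.
            commands.append(["sudo", "rpm-ostree", "ex", "apply-live"])
        else:
            commands.append(["sudo", "systemctl", "reboot"])
        return commands
    raise ValueError(f"unknown method: {method!r}")


if __name__ == "__main__":
    for argv in bats_setup_commands("layer", live=True):
        print(" ".join(argv))
```

Returning argv lists instead of running anything keeps the sketch safe to execute anywhere; a CI job would hand each list to its own task runner.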

Summary:

  • Toolbox uses Zuul hosted on SoftwareFactory
  • We can use any image/label available in SF. FCOS needs to be added first.
  • Support for FCOS in Zuul is related to support of FCOS in Ansible
  • Zuul folks work with Fedora CI folks. Help is close.
  • To build Toolbox on FCOS and run the tests we'll need some special steps but nothing much to worry about.

@miabbott

Support for FCOS in Zuul is related to support of FCOS in Ansible

This might be a non-starter for FCOS. We are actively trying to keep python out of FCOS and Ansible has a requirement on python.

See coreos/fedora-coreos-tracker#592 and coreos/fedora-coreos-tracker#578

That being said, it may be possible to layer in python via an Ignition config but there is a natural tension between managing configs on the host via Ansible and wanting to do it declaratively via Ignition.

I skimmed the docs about adding a diskimage and they seem very specific to traditional RHEL/Fedora style images. I'd be interested to hear from a SF/Zuul expert on this topic.

@HarryMichal
Member

Support for FCOS in Zuul is related to support of FCOS in Ansible

This might be a non-starter for FCOS. We are actively trying to keep python out of FCOS and Ansible has a requirement on python.

See coreos/fedora-coreos-tracker#592 and coreos/fedora-coreos-tracker#578

That being said, it may be possible to layer in python via an Ignition config but there is a natural tension between managing configs on the host via Ansible and wanting to do it declaratively via Ignition.

I'm aware of the effort and respect it. Note the wording of "FCOS in Ansible", not "Ansible in FCOS" :). I also mention it in the longer part of the comment:

And I don't mean this in the sense of running Ansible inside of CoreOS but in the sense of Ansible operating CoreOS (e.g. installing packages). From what I understand, Zuul/SF execute a series of steps before and after running tests in the environment. Considering the fact that Zuul runs 100% of the time on "classic" package-based systems, I cannot say that the same steps will work without any problems on CoreOS. But this is currently only speculation on my part.

That said, the lack of Python could still be a problem. This is better discussed with the Zuul folks.

I skimmed the docs about adding a diskimage and they seem very specific to traditional RHEL/Fedora style images. I'd be interested to hear from a SF/Zuul expert on this topic.

When I asked the folks about the images, I was asking in the context of adding FCOS and Ubuntu images. I didn't get a feeling from their response that they are against the idea.

Discussion from #softwarefactory on Freenode:

harrymichal	Hi folks! I've got a question regarding operating systems available in Zuul in SF. Would it be possible to add to the pool Fedora CoreOS and possibly Ubuntu?
		FCOS probably shouldn't be "much" of a problem but I suppose Ubuntu might not be included because SF only wants Fedora + CentOS ecosystem?

tristanC	harrymichal: hello, you can find the list of image, and how they are built in https://softwarefactory-project.io/cgit/config/tree/nodepool/virt_images
		harrymichal: basically, if there is a cloud qcow available, then we just need to virt-customize it to add the zuul ssh keys and some tools like git or rsync

harrymichal	tristanC: So, if I were to provide a cloud qcow for Ubuntu, you wouldn't be against adding it?

tristanC	harrymichal: i think that's ok, what is the use-case though? :-)

harrymichal	In the future, we want our tool to be "officially" supported on Ubuntu. The best way to do that is to test. We want to prevent CI duplication and just use Zuul to run our tests.
		tristanC: What powers Zuul? As in the machines. OpenStack?

tristanC	harrymichal: the zuul at softwarefactory-project.io is running on OpenStack instances provided by vexxhost, and the deployment is managed by zuul itself through this project: https://softwarefactory-project.io/cgit/software-factory/sf-infra/tree/README.md
		and the nodepool-builder service, that manage images update does use nested-kvm to enable virt-customize

harrymichal	tristanC: Thank you for the answer. I'm asking because Fedora CoreOS has several qcow images separated by different Cloud providers.
		I'm now wondering if using Fedora CoreOS will proceed without any problems. It is "a bit different" than traditional Fedora. Packages are not installed using dnf but layered on top of the base image using rpm-ostree. Hmm... We won't know until we try :).
		I'll try to submit the contribution before the end of the week.

tristanC	harrymichal: Zuul can uses different Cloud providers to run job workload, for example ansible/awx jobs are running aws
		harrymichal: so perhaps we could add a new resources providers for running those new coreos jobs
		harrymichal: when using config/nodepool/virt_images, we could add a new set of role to build the rpm-ostree image too, the images are just ansible playbook that needs to produce a qcow2, it doesn't have to be using virt-customize

harrymichal	tristanC: Ah, interesting. Didn't know that about Zuul. Cool!
		tristanC: One more question. Is it possible to restart the system used in a job during the job?

tristanC	harrymichal: yes that should be possible
		zuul doesn't mind if the job goes offline, it only wait for the ansible-playbook command exit code node* goes offline

harrymichal	Awesome!

@travier
Member

travier commented Feb 25, 2021

We also have the option of building toolbox inside of a podman container on the FCOS VMs before running the tests on the VM itself. The best scenario for us would be to produce Ignition configs that perform the tests and report success directly as running Ansible might become tricky quickly.

@miabbott

harrymichal: basically, if there is a cloud qcow available, then we just need to virt-customize it to add the zuul ssh keys and some tools like git or rsync

Seems like an early experiment would be to take the FCOS qcow and try using virt-customize to crack it open and drop some binaries on it.
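
As a starting point for that experiment, here is a minimal sketch that assembles a virt-customize command line in Python. It only uses the `-a`, `--ssh-inject` and `--copy-in` options; the helper name `virt_customize_argv`, the file names, and the destination directory are hypothetical. Whether libguestfs can inspect an ostree-based FCOS image at all is exactly what the experiment would have to find out.

```python
from typing import List


def virt_customize_argv(image: str, ssh_user: str, pubkey_path: str,
                        binaries: List[str], dest_dir: str) -> List[str]:
    """Build (but do not run) a virt-customize command line.

    image:       path to the qcow2 image to modify in place
    ssh_user:    account that receives the public key
    pubkey_path: local file holding the public key to inject
    binaries:    local files to copy into the image
    dest_dir:    directory inside the image that receives the files
    """
    argv = ["virt-customize", "-a", image,
            "--ssh-inject", f"{ssh_user}:file:{pubkey_path}"]
    for path in binaries:
        # --copy-in takes LOCALPATH:REMOTEDIR
        argv += ["--copy-in", f"{path}:{dest_dir}"]
    return argv


if __name__ == "__main__":
    # All paths below are placeholders.
    print(" ".join(virt_customize_argv(
        "fedora-coreos.qcow2", "core", "zuul.pub", ["./rsync"], "/var/tmp")))
```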

@HarryMichal
Member

@miabbott, I have no clue how to work with virt-customize. Would you be so kind as to take care of this initial testing?

@cgwalters
Collaborator

Mmm...what about the option of adding Prow and/or CoreOS CI to this repo? We now have added good support for nested virt to Prow.

Also tangentially related to this is coreos/fedora-coreos-config#862 (comment)

@HarryMichal
Member

Mmm...what about the option of adding Prow and/or CoreOS CI to this repo? We now have added good support for nested virt to Prow.

Also tangentially related to this is coreos/fedora-coreos-config#862 (comment)

My way of thinking here is to make use of what Toolbox already has to reduce maintenance burden. Does it sound too complicated to add FCOS to the existing CI? If yes, then we can go in the direction of adding Prow/CoreOS CI.

@cgwalters
Collaborator

cgwalters commented Mar 5, 2021

Considering the fact that Zuul runs 100% of the time on "classic" package-based systems,

I suspect the real first problem is that Zuul's OpenStack focus assumes that the systems under test use cloud-init, not Ignition. (EDIT: To clarify, Ignition and FCOS support OpenStack, but it's common for systems talking to OpenStack to assume the guest uses cloud-init)

and applying the changes live with rpm-ostree ex apply-live.

This should be totally fine, though I'd actually just say to use rpm-ostree usroverlay and rpm -Uvh or even skip RPM entirely and just make install or rsync the binaries over.

Honestly though, I am not super concerned about this side of things - my instinct says that the /boot ro mount thing was unusual and not likely to reoccur. I think by far the biggest win is going to be the opposite direction i.e. gating FCOS on toolbox working.

Because what history says is far more likely to happen is e.g. a podman change breaks toolbox - and FCOS' CI is where we gate everything together before it ships to users. (And once Silverblue rebases on FCOS, we would achieve the important property of not shipping an ostree commit to users unless toolbox works.)

@debarshiray
Member Author

what history says is far more likely to happen is e.g. a podman change
breaks toolbox

Yes, I agree. Historically most of the breakages have been Podman regressions. So any progress in that direction is welcome. We (mostly @HarryMichal) once tried to get Toolbox added to Podman's Fedora gating CI, but that ended up getting lost in the weeds.

we would achieve the important property of not shipping an ostree commit to
users unless toolbox works

Yes, that would be awesome.

I filed this issue because I felt that there were some really frustrated CoreOS users out there who feel that Toolbox is always broken for them. Until a few months back, it was due to the rootful use-case. Now that sudo toolbox works, unfortunately, they got hit with #656 and #712

So, as part of treating CoreOS as a primary platform, I was looking for a way to avoid such things in the future. But ultimately it's up to you. :) If you are happy to only gate CoreOS images on Toolbox, then that's definitely fine by me. If you want to do something else, or do multiple things, then that's also fine with me.

I'll take anything that reduces the number of user-facing breakages as a win.

@travier
Member

travier commented Mar 26, 2021

We now have tests in Fedora CoreOS CI, but of course that does not cover changes made here, so this is still relevant. I'll have to take a look at the Zuul setup.

@HarryMichal
Member

@travier We can take a look at it together if you want. Just let me know.

@travier
Member

travier commented Apr 16, 2021

Current plan based on discussion with Zuul/SF maintainers/members:

  • Add Fedora CoreOS QCOW2 images for all streams (stable, testing, next) to SF nodepool
  • Write a Butane config to pass as OpenStack userdata with:
    • Zuul SSH key for those images
    • first boot script that installs Python 3 and other Ansible deps
    • disable auto-updates (i.e. disable Zincati)
    • make sure we are running cgroups v2 (not yet the default on stable & testing)
    • force update and reboot the node
    • do all of that before sshd is started
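
Part of the Butane config above could look roughly like what the following Python sketch renders. It covers only the SSH key, masking Zincati, and a first-boot unit ordered before sshd; the cgroups v2 switch and the forced update are left out. The unit name `zuul-node-prepare.service`, the stamp file path, the `python3` package name, the `core` user, and the Butane spec version are all assumptions, not something agreed on in the discussion.

```python
import json
import textwrap

# Hypothetical first-boot unit. It runs once (guarded by a stamp file),
# is ordered before sshd so Zuul can only connect after setup, and reboots
# into the deployment that contains Python.
FIRST_BOOT_UNIT = """\
[Unit]
Description=Prepare node for Zuul (hypothetical)
Wants=network-online.target
After=network-online.target
Before=sshd.service
ConditionPathExists=!/var/lib/zuul-node-prepared

[Service]
Type=oneshot
ExecStart=/usr/bin/rpm-ostree install --allow-inactive python3
ExecStart=/usr/bin/touch /var/lib/zuul-node-prepared
ExecStart=/usr/bin/systemctl --no-block reboot

[Install]
WantedBy=multi-user.target
"""


def render_butane(zuul_pubkey: str) -> str:
    """Render a Butane config (YAML text) for a Zuul test node."""
    # Indent the unit so it sits inside the YAML block scalar below.
    unit = textwrap.indent(FIRST_BOOT_UNIT, " " * 8)
    return (
        "variant: fcos\n"
        "version: 1.4.0\n"
        "passwd:\n"
        "  users:\n"
        "    - name: core\n"
        "      ssh_authorized_keys:\n"
        # json.dumps yields a double-quoted string, which is valid YAML too.
        f"        - {json.dumps(zuul_pubkey)}\n"
        "systemd:\n"
        "  units:\n"
        "    - name: zincati.service\n"
        "      mask: true\n"
        "    - name: zuul-node-prepare.service\n"
        "      enabled: true\n"
        "      contents: |\n"
        f"{unit}"
    )


if __name__ == "__main__":
    print(render_butane("ssh-ed25519 AAAA...placeholder zuul"), end="")
```

The rendered text would still have to be converted to an Ignition config with the butane tool before being passed as OpenStack user data.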

Then we can create a new playbook to:

  • Run shell commands to run the job:
    • Build everything inside a container
    • Setup for testing
    • Testing
    • Status report?
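
The job flow listed above (build, setup, test, report) amounts to running shell commands in order and stopping at the first failure. A minimal Python sketch of that pattern follows; the helper `run_steps` is hypothetical, and the commands in the demo are stand-ins that only print the stage name, not the real build or test invocations.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_steps(steps: Sequence[Tuple[str, List[str]]]) -> Tuple[bool, List[str]]:
    """Run named commands in order and stop at the first failure.

    Returns (success, report), where report has one line per step that ran.
    """
    report: List[str] = []
    for name, argv in steps:
        result = subprocess.run(argv)
        if result.returncode != 0:
            report.append(f"FAIL {name} (exit {result.returncode})")
            return False, report
        report.append(f"ok   {name}")
    return True, report


if __name__ == "__main__":
    def placeholder(stage: str) -> List[str]:
        # Stand-in command; a real job would build in a container, etc.
        return [sys.executable, "-c", f"print({stage!r})"]

    ok, report = run_steps([
        ("build in container", placeholder("build")),
        ("setup for testing", placeholder("setup")),
        ("testing", placeholder("test")),
    ])
    print("\n".join(report))
    sys.exit(0 if ok else 1)
```

Exiting non-zero on failure matters here because, as noted in the IRC discussion, Zuul only looks at the exit code of the ansible-playbook command.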

@travier
Member

travier commented May 12, 2021

See also discussion in containers/podman#10296

@debarshiray
Member Author

See also discussion in containers/podman#10296

It got done. The Toolbox test suite is now run as part of Podman's downstream Fedora gating.

@debarshiray
Member Author

Any updates on getting a Fedora CoreOS host added to the CI?

@travier
Member

travier commented Dec 7, 2021

Sorry, I have not been able to get to this and other issues are keeping me busy right now. 😕

@HarryMichal
Member

Sorry, I have not been able to get to this and other issues are keeping me busy right now. 😕

Also dropped the ball on this.

@sumantro93

I can help with this. Can someone please tell me what has been done so far? Maybe I can drive this home.
