Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFE] Adding CI checks for toolbox in podman #10296

Closed
travier opened this issue May 10, 2021 · 69 comments
Closed

[RFE] Adding CI checks for toolbox in podman #10296

travier opened this issue May 10, 2021 · 69 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@travier
Copy link
Member

travier commented May 10, 2021

/kind feature

Description

containers/toolbox is a popular (in Fedora CoreOS & Fedora Silverblue) downstream user of podman and has been broken several times by podman updates. We have CI checks now in Fedora CoreOS and soon in toolbox itself but it would be even better if you could run the toolbox test suite on podman at least once before releases or directly in CI.

I'll be setting up toolbox CI on Fedora CoreOS soon so feel free to reach out.

Non-exhaustive list of issues:
#10274
#10288
coreos/fedora-coreos-tracker#734

@openshift-ci openshift-ci bot added the kind/feature Categorizes issue or PR as related to a new feature. label May 10, 2021
@mheon
Copy link
Member

mheon commented May 10, 2021

We do have a number of CI checks added by the Toolbox team previously, but I don't believe any of them use Toolbox itself, but instead test the ways Toolbox uses Podman. We are definitely not opposed to adding to this suite.

I think @baude was opposed to having Toolbox tests gate Podman changes, but I could be misremembering here.

@rhatdan
Copy link
Member

rhatdan commented May 10, 2021

Yes, we would be fine with gating our release, but would be reluctant on every PR, unless they are specific to Podman, like we currently have. This issue would be making sure toolbox breakage did not block Podman PRs being merged, which we have seen in CI systems in the past.

@travier
Copy link
Member Author

travier commented May 11, 2021

Agree that doing this for each PR would probably be too much & blocking on toolbox bugs for podman. Once just before release to catch latent bugs would be great.

@baude
Copy link
Member

baude commented May 11, 2021

I just want to reply for the record given @mheon comment. This has been an ongoing discussion with the TB team. We actually were not opposed to running their tests in our CI for every PR, but in the same breath, we also were not going to have the tests be blockers. I think most podman contributors who saw a busted test would give it a look-see and see about fixing it, but it wouldn't be required.

I think that offer is still quite reasonable. The TB team was supposed to provide us with their tests so that we could create a CI instance and run it. That offer is still open. Ideally they would integrate it themselves with a PR but we could also assist happily.

One last item we also agreed on as well. If we cannot figure out why a test is failing AND it lingers due to something like unavailibility, we would temporarily take the tests offline because we pay for our CI resources.

Bottom line for me ... lets get the tests running in our CI. If someone outside our team can do the work, then it can be done as soon as the tests are ready. If not, I'll prioritize amongst all the things we have on our plate (as in, it will get done as we catch up).

Thanks for taking this up!

@matthiasclasen
Copy link

Let me translate how 'we don't want this to be a blocker' reads from the other side:

We like to continue breaking your stuff.

That is very disappointing.

@garrett
Copy link

garrett commented May 12, 2021

I agree with @matthiasclasen.


I've been using Silverblue for a while now (almost 2 years) and it's generally fine, except for numerous Toolbox issues. Almost all of the Toolbox issues have come from Podman breaking things in version bumps.

Like most others using Silverblue, having Toolbox and/or Podman not working prevents me from actually doing work until I fix it, as everything I work on is inside of a Toolbox container for development. (Generally, the fix is to roll back to the previous deployment of Silverblue, apply a hotfix from bodhi as an override replace, and/or nuke all my containers and rebuild my dev environment — which I thankfully have scripted, but others don't.)

Some of these have happened in beta, but a worrying amount of them have hit released versions.

There definitely needs to be some CI somewhere that would prevent a broken podman+toolbox combo from making its way into Silverblue. (Ideally, it would not make it into normal Fedora Workstation/Server either, as people do rely on toolbox and podman working there as well, but it's usually not quite as critical as it is in Silverblue.)

It makes sense to try to catch these issues as early as possible, so having a "does it break toolbox?" test would be beneficial.

(We've also hit a handful of Podman stability issues in Cockpit-Podman as well, but that's a different, but somewhat related topic.)

@debarshiray
Copy link
Member

The consensus was:

  1. We will add unit tests to Podman's own upstream test suite that mimics the ways in which Toolbox uses Podman. These would get run on each pull request, be blocking, etc.. We have done that over time, and we need to keep adding more as the code evolves and bugs are found, etc..
  2. Get Toolbox's upstream test suite run as part of Podman's Fedora gating. This is the get Toolbox tests added to Podman part. @edsantiago was helping @HarryMichal with this, but I think it got stuck somewhere in the complexities of Fedora gating.

I think an important aspect here is that the latest regression involving the PidFile is about backwards compatibility with older Podman releases. This is generally a more difficult thing to test against, and something that affects Toolbox a lot because of the containers being treated as pets.

(There are tests in Podman that test upgrades, and in fact there was a test that could have caught it, but it was disabled because it wasn't obvious how to get it to work. Looks like that's getting fixed now.)

In Toolbox, so far, we have relied upon manual code review and testing to protect backwards compatibility. I think that we should devote a bit more attention to have better automated tests for upgrades in the future.

@mheon
Copy link
Member

mheon commented May 12, 2021 via email

@baude
Copy link
Member

baude commented May 12, 2021

@debarshiray you are exactly correct (and I stand corrected), it was gating tests that we targeted. And shy of the anomaly as @mheon describes, that shouild sufficiently satisfy both parties. Thank you for the clarification. And! The gating IS blocking, so that should satisfy folks here.

@edsantiago
Copy link
Member

I've dropped the ball on this too. I was waiting for my requirements. My requirements are:

  • an rpm package named toolbox-tests
  • that provides /usr/share/toolbox/test/system/ containing one or more *.bats files
  • and also all RPM Requires needed to run said tests (I don't know if this requirement is met, because)
  • Once installed, a user (root or rootless as needed) can run bats /usr/share/toolbox/test/system, while cd'ed to any random directory (not just the test directory), and all tests will pass.

As of this writing, toolbox-tests-0.0.99.1-1.fc34 does not meet the last requirement:

# bats /usr/share/toolbox/test/system/
bats: libs/bats-support/load does not exist [message repeated a total of 7 times]

   (from function `source' in file /usr/share/toolbox/test/system/001-version.bats, line 3,
    from function `bats_run_setup_file' in file /usr/libexec/bats-core/bats-exec-file, line 87,
    from function `main' in test file /usr/libexec/bats-core/bats-exec-file, line 258)
     `bats_run_setup_file' failed
...

This is beyond my ability to fix. Once it is fixed, please let me know and I will work to hook toolbox-tests into podman fedora gating. Thanks!

@HarryMichal
Copy link
Member

* [ ]  Once installed, a user (root or rootless as needed) can run `bats /usr/share/toolbox/test/system`, while cd'ed to any random directory (not just the test directory), and all tests will pass.

I just landed a fix for this. I'll discuss with @debarshiray making a new release with the fix included.

Still, I know the tests will not work locally, since in the tests we rely on some functionality that is backed by the exact testing environment (Zuul). I'll have to amend that to make the tests executable on other systems.

@HarryMichal
Copy link
Member

The tests should be fairly easy to run now since I merged containers/toolbox#774. I used the PR extensively on a FCOS host. So, after this I'll just update the test README and the Requires in the RPM package.

@github-actions
Copy link

github-actions bot commented Jul 1, 2021

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Jul 1, 2021

@debarshiray @garrett What is going on with this?

@HarryMichal
Copy link
Member

Toolbox has a new release (v0.0.99.2) that fixes the tool-related problems @edsantiago mentioned.

The tests themselves are completely functional but still have a few downsides:

  • are destructive (run podman system reset with default settings)
  • depend on bats-support and bats-assert which are not packaged in Fedora (need to be set up manually)

The rpm is not ready due to missing Requires and the missing dependencies. It is being worked on.

@edsantiago
Copy link
Member

depend on bats-support and bats-assert which are not packaged in Fedora (need to be set up manually)

That's sort of a big hassle. I don't think podman gating tests should be in the business of managing and installing non-rpm content. At this point I consider this a showstopper.

I just installed toolbox-tests-0.0.99.2-1.fc35.x86_64 and am still seeing the same errors I reported on May 12, presumably due to this bats-* requirement.

@HarryMichal
Copy link
Member

depend on bats-support and bats-assert which are not packaged in Fedora (need to be set up manually)

That's sort of a big hassle. I don't think podman gating tests should be in the business of managing and installing non-rpm content. At this point I consider this a showstopper.

Agreed.

I just installed toolbox-tests-0.0.99.2-1.fc35.x86_64 and am still seeing the same errors I reported on May 12, presumably due to this bats-* requirement.

Right. This is related to the fact that the two libraries are not packaged, yet.

I remember you trying to run the tests some time ago and bumping into an issue with your current folder not being inside of the container causing the command to error out. That should be fixed now.

@edsantiago
Copy link
Member

@HarryMichal it looks like bats-support is licensed as Creative Commons, could you just include the necessary files in your -tests rpm?

@HarryMichal
Copy link
Member

@HarryMichal it looks like bats-support is licensed as Creative Commons, could you just include the necessary files in your -tests rpm?

Aaah, that should work. I'll do it that way then.

@HarryMichal
Copy link
Member

@edsantiago We've got a build in Fedora Rawhide that should contain all the pieces to make the system tests runnable. The build is already in stable so either wait for it to become available to you or get it from koji. The build is based on a snapshot of main, so any changes can be easily made.

@edsantiago
Copy link
Member

The build is already in stable

I'm not seeing that. I see .0.99.2-1.fc35 (from June) in stable. I downloaded the koji build you cited, but can't install:

  - nothing provides glibc >= 2.33.9000-54.fc35 needed by toolbox-0.0.99.2^1.git9820550c82bb-1.fc35.x86_64

@HarryMichal
Copy link
Member

By stable I meant available via dnf in Rawhide. I suspect you tried to install the package on Fedora 34?

@edsantiago
Copy link
Member

No, I was on Rawhide. I'm pretty sure it's not in stable nor updates-testing.

@HarryMichal
Copy link
Member

Well, the build is still very fresh. I'm merely reporting what Bodhi says. Curious about the glibc requirement, though. Will test it out.

@olivergs
Copy link

olivergs commented Aug 9, 2021

@edsantiago We have fixed the prroblems in the test suite and I've sent a new build to rawhide.

https://koji.fedoraproject.org/koji/taskinfo?taskID=73573768

I suppose it will be available in the next compose. Could you please check it works for you?

@edsantiago
Copy link
Member

tests pass as user fedora on rawhide. Tomorrow I'll look into what it will take to add this to the podman gating tests.

@edsantiago
Copy link
Member

I have the gating-test fixes necessary to run toolbox-tests as part of podman gating tests in rawhide. They pass using toolbox-0.0.99.2^3.git075b9a8d2779-1.fc35.x86_64 from the above koji link.

@containers/podman-maintainers where do we want to go with this? Are we really OK with running these tests as part of our rawhide gating?

@edsantiago
Copy link
Member

the diffs. I am not pushing to fedpkg, nor even submitting as a PR, because (1) the ^3 build is not yet in stable, and (2) we need discussion from the podman team.

@edsantiago
Copy link
Member

@olivergs @HarryMichal I believe we have buy-in to enable this in podman. I just did a final review, and hit a showstopper: as best I can see, you do not have a gating.yaml in your source tree: meaning, you do not have gating tests enabled or running on your package. Please enable gating tests on rawhide. Once we see those enabled and passing, we will proceed on our end.

@olivergs
Copy link

Hi @edsantiago. Ok. I'll work on that during the week and tell you when ready.

Thanks!

@edsantiago
Copy link
Member

Hi @olivergs, how goes your effort to set up toolbox gating tests?

Also, are you replicating these tests on f33 and f34? We have a podman release coming soon, and would really welcome feedback from the toolbox team. It's probably too late to integrate gating tests for this release, but if you have a way of running your tests on our bodhi (f33, f34) we would be most grateful.

@olivergs
Copy link

Hi Ed. As we talked about this yesterday, I'm giving the finishing touches to this. The tests are running but them fail. I'm currently trying to get the artifacts for the test to know what is failing. I will give more feedback when this is finished.

@olivergs
Copy link

Hi again. The gating in fedora is already working for fedora 35. Rawhide fails because there is not a f36 toolbox image yet due to this bug: https://pagure.io/fedora-infrastructure/issue/10145 and f34 needs a backport of containers/toolbox#857 .

@edsantiago You should be able to check it in f35 right now.

On the other hand, I tested latest podman locally from this build: https://koji.fedoraproject.org/koji/taskinfo?taskID=74625958 and executed the test suite with the latest upstream for toolbox. It passes without major incidents and the usual workflow for toolbox seems ok.

@edsantiago
Copy link
Member

Submitted PR for f35. Confirmed that it works in 1minutetip. Also confirmed hat you have an f35 bodhi and that tests pass.

@edsantiago
Copy link
Member

That PR is now merged. Toolbox gating tests are now enabled for podman on f35. But @containers/podman-maintainers I think we have a problem coming up.

For any given Fedora NN, toolbox seems to want to pull a toolbox image with the tag fNN. That means that as soon as fNN branches from Rawhide and Rawhide becomes fNN+1, podman rawhide bodhi will break because the toolbox team will not have a tagged image. We are seeing this right now with f36; and because of the bug mentioned above it may take months to actually get such a tagged image. That would presumably mean months of broken podman bodhis.

I kind of don't think this is acceptable. I would like to ask the toolbox team to make their process more robust, perhaps to pre-tag an fNN+1 image in advance (even if it's bogus), or to otherwise find a solution to this. But this isn't my call to make, and perhaps I'm making it out to be more serious than it is. I would appreciate further thoughts from @olivergs and/or other knowledgeable parties.

@travier
Copy link
Member Author

travier commented Sep 3, 2021

Toolbox now has support for basic config files that we can use to default to a working image on Rawhide and branched Fedora. We of course would need to bump this config from time to time but this rather easy to do.

@travier
Copy link
Member Author

travier commented Sep 3, 2021

And thanks for the work here!

@debarshiray
Copy link
Member

debarshiray commented Sep 7, 2021

podman rawhide bodhi will break because the toolbox team will
not have a tagged image.

Building a new image is a matter of doing a fedpkg container-build. It's pretty trivial, and various people (eg., @juhp) help out with it already. You could do that if you notice that Rawhide gating is missing an image after branching.

We are seeing this right now with f36;
and because of the bug mentioned above it may take months to
actually get such a tagged image. That would presumably mean
months of broken podman bodhis.

Umm... honest question: what makes you think that it will take months? I do see activity on the Fedora infrastructure ticket.

I kind of don't think this is acceptable. I would like to ask the
toolbox team to make their process more robust

I am not sure what the Toolbox developers can do if Fedora's OCI infrastructure is broken, other than talking to the Fedora infra' folks and trying to get it prioritized. Given that this is free software and community and all that, anybody else can help convey that message too. :)

@debarshiray
Copy link
Member

The issue with the Fedora infrastructure got fixed, and we have new image builds in place now.

@edsantiago
Copy link
Member

Thanks! I see gating tests failing. Please update here when those are fixed, we will then look into enabling this on podman rawhide.

@olivergs
Copy link

olivergs commented Sep 9, 2021

@edsantiago The image seems to be in stage yet. As soon as it reaches main regisitry we should be able to check.

On the other hand, the infra bug reappeared. I've reopened the issue and asked for fixing it again.

@juhp
Copy link
Contributor

juhp commented Sep 10, 2021

fedora-toolbox:36 should be in the stable fedora registry now.

@olivergs
Copy link

Thanks @juhp !. Yes. It is in stable and after re-running the CI, the gating passes.

https://osci-jenkins-1.ci.fedoraproject.org/job/fedora-ci/job/dist-git-pipeline/job/master/72523/

@olivergs
Copy link

@edsantiago The image seems to be in stage yet. As soon as it reaches main regisitry we should be able to check.

On the other hand, the infra bug reappeared. I've reopened the issue and asked for fixing it again.

The infra bug seemed to be just a node that did not updated correctly. Now seems everything works as expected.

@edsantiago
Copy link
Member

Do you have a link to a working bodhi? All I see is toolbox-0.0.99.2^3.git075b9a8d2779-6.fc36, which fails. I see the jenkins link you posted, but I don't see that actually being reflected in bodhi.

@olivergs
Copy link

I've rerun the build because the last time I just run the CI work.

Here you have a bodhi update: https://bodhi.fedoraproject.org/updates/FEDORA-2021-306c96fe51

@edsantiago
Copy link
Member

Thank you! I've submitted a PR for rawhide.

@olivergs
Copy link

Thank you @edsantiago !

@edsantiago
Copy link
Member

Toolbox gating tests are now running and passing in rawhide and in f35. Thank you everyone for your perseverance and patience.

@debarshiray
Copy link
Member

Yay! That's wonderful to hear. Thanks, @edsantiago

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Nov 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/feature Categorizes issue or PR as related to a new feature. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

No branches or pull requests