[RFE] Adding CI checks for toolbox in podman #10296

travier · 2021-05-10T16:29:21Z

/kind feature

Description

containers/toolbox is a popular (in Fedora CoreOS & Fedora Silverblue) downstream user of podman and has been broken several times by podman updates. We have CI checks now in Fedora CoreOS and soon in toolbox itself but it would be even better if you could run the toolbox test suite on podman at least once before releases or directly in CI.

I'll be setting up toolbox CI on Fedora CoreOS soon so feel free to reach out.

Non-exhaustive list of issues:
#10274
#10288
coreos/fedora-coreos-tracker#734

mheon · 2021-05-10T16:57:04Z

We do have a number of CI checks added by the Toolbox team previously, but I don't believe any of them use Toolbox itself, but instead test the ways Toolbox uses Podman. We are definitely not opposed to adding to this suite.

I think @baude was opposed to having Toolbox tests gate Podman changes, but I could be misremembering here.

rhatdan · 2021-05-10T18:45:02Z

Yes, we would be fine with gating our release, but would be reluctant on every PR, unless they are specific to Podman, like we currently have. This issue would be making sure toolbox breakage did not block Podman PRs being merged, which we have seen in CI systems in the past.

travier · 2021-05-11T17:12:17Z

Agree that doing this for each PR would probably be too much & blocking on toolbox bugs for podman. Once just before release to catch latent bugs would be great.

baude · 2021-05-11T20:06:29Z

I just want to reply for the record given @mheon comment. This has been an ongoing discussion with the TB team. We actually were not opposed to running their tests in our CI for every PR, but in the same breath, we also were not going to have the tests be blockers. I think most podman contributors who saw a busted test would give it a look-see and see about fixing it, but it wouldn't be required.

I think that offer is still quite reasonable. The TB team was supposed to provide us with their tests so that we could create a CI instance and run it. That offer is still open. Ideally they would integrate it themselves with a PR but we could also assist happily.

One last item we also agreed on as well. If we cannot figure out why a test is failing AND it lingers due to something like unavailibility, we would temporarily take the tests offline because we pay for our CI resources.

Bottom line for me ... lets get the tests running in our CI. If someone outside our team can do the work, then it can be done as soon as the tests are ready. If not, I'll prioritize amongst all the things we have on our plate (as in, it will get done as we catch up).

Thanks for taking this up!

matthiasclasen · 2021-05-12T11:28:03Z

Let me translate how 'we don't want this to be a blocker' reads from the other side:

We like to continue breaking your stuff.

That is very disappointing.

garrett · 2021-05-12T12:08:37Z

I agree with @matthiasclasen.

I've been using Silverblue for a while now (almost 2 years) and it's generally fine, except for numerous Toolbox issues. Almost all of the Toolbox issues have come from Podman breaking things in version bumps.

Like most others using Silverblue, having Toolbox and/or Podman not working prevents me from actually doing work until I fix it, as everything I work on is inside of a Toolbox container for development. (Generally, the fix is to roll back to the previous deployment of Silverblue, apply a hotfix from bodhi as an override replace, and/or nuke all my containers and rebuild my dev environment — which I thankfully have scripted, but others don't.)

Some of these have happened in beta, but a worrying amount of them have hit released versions.

There definitely needs to be some CI somewhere that would prevent a broken podman+toolbox combo from making its way into Silverblue. (Ideally, it would not make it into normal Fedora Workstation/Server either, as people do rely on toolbox and podman working there as well, but it's usually not quite as critical as it is in Silverblue.)

It makes sense to try to catch these issues as early as possible, so having a "does it break toolbox?" test would be beneficial.

(We've also hit a handful of Podman stability issues in Cockpit-Podman as well, but that's a different, but somewhat related topic.)

debarshiray · 2021-05-12T14:45:42Z

The consensus was:

We will add unit tests to Podman's own upstream test suite that mimics the ways in which Toolbox uses Podman. These would get run on each pull request, be blocking, etc.. We have done that over time, and we need to keep adding more as the code evolves and bugs are found, etc..
Get Toolbox's upstream test suite run as part of Podman's Fedora gating. This is the get Toolbox tests added to Podman part. @edsantiago was helping @HarryMichal with this, but I think it got stuck somewhere in the complexities of Fedora gating.

I think an important aspect here is that the latest regression involving the PidFile is about backwards compatibility with older Podman releases. This is generally a more difficult thing to test against, and something that affects Toolbox a lot because of the containers being treated as pets.

(There are tests in Podman that test upgrades, and in fact there was a test that could have caught it, but it was disabled because it wasn't obvious how to get it to work. Looks like that's getting fixed now.)

In Toolbox, so far, we have relied upon manual code review and testing to protect backwards compatibility. I think that we should devote a bit more attention to have better automated tests for upgrades in the future.

mheon · 2021-05-12T14:58:58Z

Normally we work with the Toolbox team to ensure that each Podman release is tested and working before it reaches stable. This release was an early RC that slipped in through a series of errors. The normal process should have caught this one. This doesn’t mean we shouldn’t improve gating and upstream CI, but this RC slipping out was an anomaly and I don’t expect it to happen again - the process will improve to prevent it.

…

On Wed, May 12, 2021 at 10:46 Debarshi Ray ***@***.***> wrote: The consensus was: 1. We will add unit tests to Podman's own upstream test suite that mimics the ways in which Toolbox uses Podman. These would get run on each pull request, be blocking, etc.. We have done that over time, and we need to keep adding more as the code evolves and bugs are found, etc.. 2. Get Toolbox's upstream test suite run as part of Podman's Fedora gating. This is the *get Toolbox tests added to Podman* part. @edsantiago <https://github.com/edsantiago> was helping @HarryMichal <https://github.com/HarryMichal> with this, but I think it got stuck somewhere in the complexities of Fedora gating. I think an important aspect here is that the latest regression <#10274> involving the PidFile is about backwards compatibility with older Podman releases. This is generally a more difficult thing to test against, and something that affects Toolbox a lot because of the containers being treated as *pets*. (There are tests in Podman that test upgrades, and in fact there was a test that could have caught it, but it was disabled because it wasn't obvious how to get it to work. Looks <#10288> like that's getting fixed now.) In Toolbox, so far, we have relied upon manual code review and testing to protect backwards compatibility. I think that we should devote a bit more attention to have better automated tests for upgrades in the future. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#10296 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3AOCF4EB3OWTGE5FAKSFTTNKIDBANCNFSM44RT6J4Q> .

baude · 2021-05-12T15:49:31Z

@debarshiray you are exactly correct (and I stand corrected), it was gating tests that we targeted. And shy of the anomaly as @mheon describes, that shouild sufficiently satisfy both parties. Thank you for the clarification. And! The gating IS blocking, so that should satisfy folks here.

edsantiago · 2021-05-12T18:33:43Z

I've dropped the ball on this too. I was waiting for my requirements. My requirements are:

an rpm package named toolbox-tests
that provides /usr/share/toolbox/test/system/ containing one or more *.bats files
and also all RPM Requires needed to run said tests (I don't know if this requirement is met, because)
Once installed, a user (root or rootless as needed) can run bats /usr/share/toolbox/test/system, while cd'ed to any random directory (not just the test directory), and all tests will pass.

As of this writing, toolbox-tests-0.0.99.1-1.fc34 does not meet the last requirement:

# bats /usr/share/toolbox/test/system/
bats: libs/bats-support/load does not exist [message repeated a total of 7 times]
 ✗
   (from function `source' in file /usr/share/toolbox/test/system/001-version.bats, line 3,
    from function `bats_run_setup_file' in file /usr/libexec/bats-core/bats-exec-file, line 87,
    from function `main' in test file /usr/libexec/bats-core/bats-exec-file, line 258)
     `bats_run_setup_file' failed
...

This is beyond my ability to fix. Once it is fixed, please let me know and I will work to hook toolbox-tests into podman fedora gating. Thanks!

HarryMichal · 2021-05-24T15:35:36Z

* [ ]  Once installed, a user (root or rootless as needed) can run `bats /usr/share/toolbox/test/system`, while cd'ed to any random directory (not just the test directory), and all tests will pass.

I just landed a fix for this. I'll discuss with @debarshiray making a new release with the fix included.

Still, I know the tests will not work locally, since in the tests we rely on some functionality that is backed by the exact testing environment (Zuul). I'll have to amend that to make the tests executable on other systems.

HarryMichal · 2021-05-31T22:47:35Z

The tests should be fairly easy to run now since I merged containers/toolbox#774. I used the PR extensively on a FCOS host. So, after this I'll just update the test README and the Requires in the RPM package.

github-actions · 2021-07-01T00:08:13Z

A friendly reminder that this issue had no activity for 30 days.

rhatdan · 2021-07-01T09:26:38Z

@debarshiray @garrett What is going on with this?

HarryMichal · 2021-07-01T14:00:50Z

Toolbox has a new release (v0.0.99.2) that fixes the tool-related problems @edsantiago mentioned.

The tests themselves are completely functional but still have a few downsides:

are destructive (run podman system reset with default settings)
depend on bats-support and bats-assert which are not packaged in Fedora (need to be set up manually)

The rpm is not ready due to missing Requires and the missing dependencies. It is being worked on.

edsantiago · 2021-07-01T14:41:39Z

depend on bats-support and bats-assert which are not packaged in Fedora (need to be set up manually)

That's sort of a big hassle. I don't think podman gating tests should be in the business of managing and installing non-rpm content. At this point I consider this a showstopper.

I just installed toolbox-tests-0.0.99.2-1.fc35.x86_64 and am still seeing the same errors I reported on May 12, presumably due to this bats-* requirement.

HarryMichal · 2021-07-02T03:14:26Z

depend on bats-support and bats-assert which are not packaged in Fedora (need to be set up manually)

That's sort of a big hassle. I don't think podman gating tests should be in the business of managing and installing non-rpm content. At this point I consider this a showstopper.

Agreed.

I just installed toolbox-tests-0.0.99.2-1.fc35.x86_64 and am still seeing the same errors I reported on May 12, presumably due to this bats-* requirement.

Right. This is related to the fact that the two libraries are not packaged, yet.

I remember you trying to run the tests some time ago and bumping into an issue with your current folder not being inside of the container causing the command to error out. That should be fixed now.

edsantiago · 2021-07-06T18:38:45Z

@HarryMichal it looks like bats-support is licensed as Creative Commons, could you just include the necessary files in your -tests rpm?

HarryMichal · 2021-07-16T18:44:24Z

@HarryMichal it looks like bats-support is licensed as Creative Commons, could you just include the necessary files in your -tests rpm?

Aaah, that should work. I'll do it that way then.

HarryMichal · 2021-07-28T18:56:29Z

@edsantiago We've got a build in Fedora Rawhide that should contain all the pieces to make the system tests runnable. The build is already in stable so either wait for it to become available to you or get it from koji. The build is based on a snapshot of main, so any changes can be easily made.

edsantiago · 2021-07-28T21:01:25Z

The build is already in stable

I'm not seeing that. I see .0.99.2-1.fc35 (from June) in stable. I downloaded the koji build you cited, but can't install:

  - nothing provides glibc >= 2.33.9000-54.fc35 needed by toolbox-0.0.99.2^1.git9820550c82bb-1.fc35.x86_64

HarryMichal · 2021-07-28T21:17:59Z

By stable I meant available via dnf in Rawhide. I suspect you tried to install the package on Fedora 34?

edsantiago · 2021-07-28T21:20:30Z

No, I was on Rawhide. I'm pretty sure it's not in stable nor updates-testing.

HarryMichal · 2021-07-28T21:24:05Z

Well, the build is still very fresh. I'm merely reporting what Bodhi says. Curious about the glibc requirement, though. Will test it out.

olivergs · 2021-08-09T18:19:51Z

@edsantiago We have fixed the prroblems in the test suite and I've sent a new build to rawhide.

https://koji.fedoraproject.org/koji/taskinfo?taskID=73573768

I suppose it will be available in the next compose. Could you please check it works for you?

edsantiago · 2021-08-09T21:04:35Z

tests pass as user fedora on rawhide. Tomorrow I'll look into what it will take to add this to the podman gating tests.

edsantiago · 2021-08-10T20:31:00Z

I have the gating-test fixes necessary to run toolbox-tests as part of podman gating tests in rawhide. They pass using toolbox-0.0.99.2^3.git075b9a8d2779-1.fc35.x86_64 from the above koji link.

@containers/podman-maintainers where do we want to go with this? Are we really OK with running these tests as part of our rawhide gating?

edsantiago · 2021-08-10T20:38:35Z

the diffs. I am not pushing to fedpkg, nor even submitting as a PR, because (1) the ^3 build is not yet in stable, and (2) we need discussion from the podman team.

edsantiago · 2021-08-16T15:49:05Z

@olivergs @HarryMichal I believe we have buy-in to enable this in podman. I just did a final review, and hit a showstopper: as best I can see, you do not have a gating.yaml in your source tree: meaning, you do not have gating tests enabled or running on your package. Please enable gating tests on rawhide. Once we see those enabled and passing, we will proceed on our end.

olivergs · 2021-08-17T08:43:16Z

Hi @edsantiago. Ok. I'll work on that during the week and tell you when ready.

Thanks!

edsantiago · 2021-08-23T15:37:20Z

Hi @olivergs, how goes your effort to set up toolbox gating tests?

Also, are you replicating these tests on f33 and f34? We have a podman release coming soon, and would really welcome feedback from the toolbox team. It's probably too late to integrate gating tests for this release, but if you have a way of running your tests on our bodhi (f33, f34) we would be most grateful.

olivergs · 2021-08-24T09:02:47Z

Hi Ed. As we talked about this yesterday, I'm giving the finishing touches to this. The tests are running but them fail. I'm currently trying to get the artifacts for the test to know what is failing. I will give more feedback when this is finished.

olivergs · 2021-08-30T16:59:52Z

Hi again. The gating in fedora is already working for fedora 35. Rawhide fails because there is not a f36 toolbox image yet due to this bug: https://pagure.io/fedora-infrastructure/issue/10145 and f34 needs a backport of containers/toolbox#857 .

@edsantiago You should be able to check it in f35 right now.

On the other hand, I tested latest podman locally from this build: https://koji.fedoraproject.org/koji/taskinfo?taskID=74625958 and executed the test suite with the latest upstream for toolbox. It passes without major incidents and the usual workflow for toolbox seems ok.

edsantiago · 2021-08-31T19:14:55Z

Submitted PR for f35. Confirmed that it works in 1minutetip. Also confirmed hat you have an f35 bodhi and that tests pass.

edsantiago · 2021-09-01T00:49:25Z

That PR is now merged. Toolbox gating tests are now enabled for podman on f35. But @containers/podman-maintainers I think we have a problem coming up.

For any given Fedora NN, toolbox seems to want to pull a toolbox image with the tag fNN. That means that as soon as fNN branches from Rawhide and Rawhide becomes fNN+1, podman rawhide bodhi will break because the toolbox team will not have a tagged image. We are seeing this right now with f36; and because of the bug mentioned above it may take months to actually get such a tagged image. That would presumably mean months of broken podman bodhis.

I kind of don't think this is acceptable. I would like to ask the toolbox team to make their process more robust, perhaps to pre-tag an fNN+1 image in advance (even if it's bogus), or to otherwise find a solution to this. But this isn't my call to make, and perhaps I'm making it out to be more serious than it is. I would appreciate further thoughts from @olivergs and/or other knowledgeable parties.

travier · 2021-09-03T18:05:02Z

Toolbox now has support for basic config files that we can use to default to a working image on Rawhide and branched Fedora. We of course would need to bump this config from time to time but this rather easy to do.

travier · 2021-09-03T18:05:42Z

And thanks for the work here!

debarshiray · 2021-09-07T15:46:07Z

podman rawhide bodhi will break because the toolbox team will
not have a tagged image.

Building a new image is a matter of doing a fedpkg container-build. It's pretty trivial, and various people (eg., @juhp) help out with it already. You could do that if you notice that Rawhide gating is missing an image after branching.

We are seeing this right now with f36;
and because of the bug mentioned above it may take months to
actually get such a tagged image. That would presumably mean
months of broken podman bodhis.

Umm... honest question: what makes you think that it will take months? I do see activity on the Fedora infrastructure ticket.

I kind of don't think this is acceptable. I would like to ask the
toolbox team to make their process more robust

I am not sure what the Toolbox developers can do if Fedora's OCI infrastructure is broken, other than talking to the Fedora infra' folks and trying to get it prioritized. Given that this is free software and community and all that, anybody else can help convey that message too. :)

debarshiray · 2021-09-08T22:40:55Z

The issue with the Fedora infrastructure got fixed, and we have new image builds in place now.

edsantiago · 2021-09-09T11:48:08Z

Thanks! I see gating tests failing. Please update here when those are fixed, we will then look into enabling this on podman rawhide.

olivergs · 2021-09-09T11:50:20Z

@edsantiago The image seems to be in stage yet. As soon as it reaches main regisitry we should be able to check.

On the other hand, the infra bug reappeared. I've reopened the issue and asked for fixing it again.

juhp · 2021-09-10T02:53:07Z

fedora-toolbox:36 should be in the stable fedora registry now.

olivergs · 2021-09-10T08:56:23Z

Thanks @juhp !. Yes. It is in stable and after re-running the CI, the gating passes.

https://osci-jenkins-1.ci.fedoraproject.org/job/fedora-ci/job/dist-git-pipeline/job/master/72523/

olivergs · 2021-09-10T09:27:15Z

@edsantiago The image seems to be in stage yet. As soon as it reaches main regisitry we should be able to check.

On the other hand, the infra bug reappeared. I've reopened the issue and asked for fixing it again.

The infra bug seemed to be just a node that did not updated correctly. Now seems everything works as expected.

edsantiago · 2021-09-13T14:29:09Z

Do you have a link to a working bodhi? All I see is toolbox-0.0.99.2^3.git075b9a8d2779-6.fc36, which fails. I see the jenkins link you posted, but I don't see that actually being reflected in bodhi.

olivergs · 2021-09-13T15:37:59Z

I've rerun the build because the last time I just run the CI work.

Here you have a bodhi update: https://bodhi.fedoraproject.org/updates/FEDORA-2021-306c96fe51

edsantiago · 2021-09-13T15:49:33Z

Thank you! I've submitted a PR for rawhide.

olivergs · 2021-09-13T16:25:08Z

Thank you @edsantiago !

edsantiago · 2021-10-05T12:10:08Z

Toolbox gating tests are now running and passing in rawhide and in f35. Thank you everyone for your perseverance and patience.

debarshiray · 2021-10-05T13:21:52Z

Yay! That's wonderful to hear. Thanks, @edsantiago

openshift-ci bot added the kind/feature Categorizes issue or PR as related to a new feature. label May 10, 2021

travier mentioned this issue May 10, 2021

[BUG] toolbox broken after 2:3.2.0-0.1.rc1.fc34 update fedora-silverblue/issue-tracker#166

Closed

travier mentioned this issue May 12, 2021

Add Fedora CoreOS to continuous integration tests containers/toolbox#714

Open

HarryMichal mentioned this issue May 31, 2021

Move image caching from Zuul playbook containers/toolbox#751

Closed

github-actions bot added the stale-issue label Jul 1, 2021

github-actions bot removed the stale-issue label Jul 2, 2021

HarryMichal mentioned this issue Aug 9, 2021

tests: Fixed tests to setup the XDG_RUNTIME_DIR variable when empty containers/toolbox#857

Merged

edsantiago closed this as completed Oct 5, 2021

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Nov 23, 2023

github-actions bot locked as resolved and limited conversation to collaborators Nov 23, 2023

[RFE] Adding CI checks for toolbox in podman #10296

[RFE] Adding CI checks for toolbox in podman #10296

Comments

travier commented May 10, 2021 • edited Loading

mheon commented May 10, 2021

rhatdan commented May 10, 2021

travier commented May 11, 2021

baude commented May 11, 2021

matthiasclasen commented May 12, 2021

garrett commented May 12, 2021 • edited Loading

debarshiray commented May 12, 2021

mheon commented May 12, 2021 via email

baude commented May 12, 2021

edsantiago commented May 12, 2021

HarryMichal commented May 24, 2021

HarryMichal commented May 31, 2021

github-actions bot commented Jul 1, 2021

rhatdan commented Jul 1, 2021

HarryMichal commented Jul 1, 2021

edsantiago commented Jul 1, 2021

HarryMichal commented Jul 2, 2021

edsantiago commented Jul 6, 2021

HarryMichal commented Jul 16, 2021

HarryMichal commented Jul 28, 2021

edsantiago commented Jul 28, 2021

HarryMichal commented Jul 28, 2021

edsantiago commented Jul 28, 2021

HarryMichal commented Jul 28, 2021

olivergs commented Aug 9, 2021

edsantiago commented Aug 9, 2021

edsantiago commented Aug 10, 2021

edsantiago commented Aug 10, 2021

edsantiago commented Aug 16, 2021

olivergs commented Aug 17, 2021

edsantiago commented Aug 23, 2021

olivergs commented Aug 24, 2021

olivergs commented Aug 30, 2021

edsantiago commented Aug 31, 2021

edsantiago commented Sep 1, 2021

travier commented Sep 3, 2021 • edited Loading

travier commented Sep 3, 2021

debarshiray commented Sep 7, 2021 • edited Loading

debarshiray commented Sep 8, 2021

edsantiago commented Sep 9, 2021

olivergs commented Sep 9, 2021

juhp commented Sep 10, 2021 • edited Loading

olivergs commented Sep 10, 2021

olivergs commented Sep 10, 2021

edsantiago commented Sep 13, 2021

olivergs commented Sep 13, 2021

edsantiago commented Sep 13, 2021

olivergs commented Sep 13, 2021

edsantiago commented Oct 5, 2021

debarshiray commented Oct 5, 2021

travier commented May 10, 2021 •

edited

Loading

garrett commented May 12, 2021 •

edited

Loading

travier commented Sep 3, 2021 •

edited

Loading

debarshiray commented Sep 7, 2021 •

edited

Loading

juhp commented Sep 10, 2021 •

edited

Loading