-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ci: Packit: Run cockpit storage tests in PRs #1161
ci: Packit: Run cockpit storage tests in PRs #1161
Conversation
Can one of the admins verify this patch? |
This comment was marked as resolved.
This comment was marked as resolved.
b96740b
to
1f1d698
Compare
So f38 is fine, rawhide failed because it seems to be uninstallable in the udisks-daily COPR. That may be part of the "too much churn in rawhide" issue from the description, or spot an actual bug. @vojtechtrefny , @tbzatek I'll leave it at that for now, and let you do a round of review and commenting, in particular of the points in the description. Thanks! |
@vojtechtrefny , @tbzatek : FYI, I'll be on PTO for two weeks from tomorrow on. Please ask @marusak and @jelly for questions around this. Thanks! |
1f1d698
to
b89a505
Compare
Interesting, I get a consistent LVM failure in F39 and rawhide, while F38 succeeds. Cockpit so far has run the udisks-daily COPR only on Fedora 38 + updates-testing, that's why we didn't see that previously. That might just be exactly the thing that I'm looking for with these tests. I'll reproduce/investigate tomorrow, maybe we can prevent a bug from landing in the next release :-) In the meantime this is blocked, moving to draft. This reproduces easily locally. I prepare a fedora-39 cockpit test VM, the test works. I run |
Indeed that regression is already reported as issue #1170 |
So this is actually exactly what I'd like to achieve -- with this in place, we would have noticed the regression right in PR #1158 which introduced it. @vojtechtrefny , @tbzatek : So I propose to block this PR until #1170 is fixed, so that you don't start with red statuses. But could we already discuss this approach in general? I have a bunch of questions in the description that we should agree on. Happy to schedule a gmeet as well! |
Ah, I was confused -- in fact, #1170 does affect Fedora 38 as well, but @mvollmer already added a "naughty pattern" (failure suppression) for it. I now applied that to F39/40 as well, so that they should succeed now. So this is once again unblocked. But it nicely demonstrates that we can avoid introducing such regression in the first place with this revdeps testing 😁 |
b89a505
to
c64e526
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thank you.
@tbzatek I discussed this mostly with @vojtechtrefny in slack yesterday (maybe you didn't see these messages? bridge may not work?). Please let me know if you have further questions, or doubts about the expectations laid out in the description. Thanks! |
Exactly, I believe this regression would have been caught right in the iscsi fixes pull request. On the other hand, the failure in the lvm2 module was not covered by udisks tests either but well, tests will never cover every line of code. At least not at the time of their creation, one is expecting some stuff would never fail... There's another bugreport that would have been caught the same way I believe: #1172 So running Cockpit tests is always a great asset. To be fair, I rarely look at our Jenkins nightly runs. I also tend to ignore the unstable tests in our PR runs. I do however check rawhide even though there's always some known issue, it is also the first place where regressions introduced by other packages or the kernel do show up first. Fixing tests in UDisks is often time consuming and not always failing test means a problem in the real world use. There are number of random failures caused by timing or race conditions (often on the tests side) or simply a slow system that can't keep up with uevent processing. In the past couple of years we've invested quite a bit of time to fix tests and various corner cases while the time spent on this may have been invested somewhere else. It's always about the tradeoff of quality vs. quantity. On an understaffed project like UDisks there's nobody else that would take care of such bugfixing however. I would say: let's try this and let's see what our habit of checking Cockpit tests will be.
That's fine, I have absolutely no problem with ignoring unknown errors :-)
Urgent PRs are quite an exception in UDisks, perhaps only shortly before upcoming release. Backports in Fedora and RHEL are always manual, not even tracking upstream releases. Upstream PRs can stay open for a while.
Rawhide is important. It lets us spot kernel changes that would soon get to RHEL. Block layer, device drivers, filesystem drives - there have been various changes that looked innocent from kernel POV but made a big difference in userspace and UDisks especially. Same thing with userspace package rebases - e.g. util-linux. Even c9s often gets ahead of rawhide and is a good candidate to the most bleeding edge distro. |
Jenkins, ok to test. |
The integration test results look rather sad, but I suppose these are unrelated? can they be retried? |
Welcome to our world. Actually the number of failures this time is rather low. These might be partially fixed (or silenced or removed) once #1168 and #1162 are merged. However tests are failing randomly and as I said above, not every test failure indicates an issue in the real world. It's also incredibly time consuming chasing every single test failure like these. |
So is anything still blocking this PR? |
Nope, let's go for it! |
🚀 yay! Please holler if you see any trouble with this. |
See https://cockpit-project.org/blog/tmt-cross-project-testing.html
I tested this against my fork in martinpitt#2 . That also demonstrates that this "just works" in forks without magic setup, which makes it easy to try out stuff and collaborate.
When done seriously, this is an ongoing commitment. In our recent years the Cockpit team has reported a couple dozen udisks bug reports found by our tests, some of them were actual regressions like these:
With this setup, we have a good chance of prevent landing such regressions.
This approach is fairly new at least for us in the Fedora world, so let's treat this as an experiment. We do need to talk about commitments and expectations, though. From our side, I propose:
We don't expect you to become experts in our tests. When they fail, it would be nice if you could have a quick look at the log and see if it's something obvious (for example, the screenshot shows an error message, or the journal shows an udisks crash). But in general, we expect that someone from our team will investigate and discuss that with you. The PR where that happened provides a nice place to collect notes.
For now, contacting us is manual, i.e. writing a comment like "hey @martinpitt @marusak @mvollmer please have a look at the cockpit failure here". I put that list into the test plan, I'm happy to add it anywhere else where it's easier for you to find. In the future this can hopefully be automated, see RFE: Notify on failure + tag github user(s) along with custom failure message packit/packit-service#1911
We don't expect you to block PRs on these tests, especially not if the testing farm has infrastructure problems . Each Monday morning they tend to run into provisioning errors. Just ignore these then -- we see them too in our projects, and will usually prod #testing-farm then.
We would like you to at least give us a chance on that workday to look at a failed test (not infra failure, a "real" one) before you land a PR, so that this all makes sense. If something is urgent and you quick-land, then at least we still retain the benefit of having a trace in which PR tests started to fail, but then the damage is possibly already done. Note that this isn't a big new commitment from our side: we already spend many hours every week looking at regressions, and it takes a lot more effort to track them down weeks after they happened. So this will acually reduce both our and your time spent on hunting down regressions, because the context (PR) is fresh and small, as opposed to "whatever changed in Fedora in the last 4 weeks".
I set this up for Fedora latest and rawhide for the time being. If you think that rawhide has too much churn, of course feel free to drop that from .packit.yaml, or I do it here right away when you want to start slow. We have enabled rawhide in cockpit projects a while ago, and it does often fail due to unrelated reasons. Improving this is pretty much exactly why I start this initiative, and at least knowing about problems is still better IMHO than entirely ignoring rawhide, but it can get annoying.
Please let me know about any other question that you may have. I'm happy to discuss them here or in a gmeet for higher bandwidth.
Thanks!