[ACTION NEEDED] Fix flaky integration tests at distribution level #1670
Comments
@RyanL1997 @ps48 Can you please provide your inputs?
We're working on it. A while back I asked about the failures in opensearch-project/opensearch-build#4635; as far as I can tell, the distribution failures aren't coming from our tests but from somewhere in the pipeline. I've marked our distribution issues with "help wanted" where applicable.
It also looks like many of the manifests are still showing an unavailable status.
Tagging @zelinh here to provide his inputs.
Here are some reasons that it may show as unavailable:
E.g. for the 2.14 integration tests autocut, of the three most recent manifests at the time of writing, two are unavailable (most recent, second most recent (available), third most recent).
I saw these in both of the unavailable runs. It seems the process was terminated by the timeout limit when we ran the integ tests for observabilityDashboards; as a result, it didn't run through the full test-recording process.
https://build.ci.opensearch.org/job/integ-test-opensearch-dashboards/5856/consoleFull
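To sketch what that would mean (a hypothetical harness only; the command, arguments, and timeout value are all invented): if the outer timeout hard-kills the runner, the report files never get flushed, which would explain why the recordings are missing. Sending SIGTERM before SIGKILL at least gives the runner a chance to write its results:

```ts
// Hypothetical CI harness sketching the failure mode: a hard kill at the
// timeout means the test runner never flushes its reports to disk.
import { spawn } from "node:child_process";

function runWithTimeout(cmd: string, args: string[], limitMs: number): Promise<void> {
  return new Promise((resolve, reject) => {
    const proc = spawn(cmd, args, { stdio: "inherit" });
    let killTimer: ReturnType<typeof setTimeout> | undefined;

    const termTimer = setTimeout(() => {
      proc.kill("SIGTERM"); // polite stop: the runner can still write reports
      killTimer = setTimeout(() => proc.kill("SIGKILL"), 30_000); // last resort
    }, limitMs);

    proc.on("exit", (code) => {
      clearTimeout(termTimer);
      if (killTimer !== undefined) clearTimeout(killTimer);
      code === 0 ? resolve() : reject(new Error(`runner exited with code ${code}`));
    });
    proc.on("error", reject);
  });
}

// e.g. a two-hour budget for the observabilityDashboards suite (value invented)
runWithTimeout("yarn", ["cypress", "run"], 2 * 60 * 60 * 1000)
  .catch(() => process.exit(1));
```

Whether the actual Jenkins job does a soft or a hard kill at the timeout is exactly the open question here.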
Hypothesis: the failing tests are flaky, and the timeouts only happen when the tests pass (i.e., something later in the test suite is consuming all the time); we only get the failure message when an earlier test fails and cuts the run short. Based on this hypothesis I opened opensearch-project/opensearch-dashboards-functional-test#1250 to fix the flakiness, but I'm still not sure what's causing the timeouts.
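The usual shape of this kind of flakiness fix in a Cypress spec (a sketch only; the selector, text, and timeout below are invented, not the actual change in #1250) is to replace fixed sleeps with retryable assertions:

```ts
// Flaky: a fixed sleep races against slow rendering, so the next command
// sometimes runs before the panel exists.
cy.wait(5000);
cy.get(".observability-panel").contains("Logs").click();

// Sturdier: .should() retries until the element is visible (up to the
// given timeout), so timing no longer decides the outcome.
cy.get(".observability-panel", { timeout: 30000 })
  .should("be.visible")
  .contains("Logs")
  .click();
```

With the retryable form, a slow render just consumes more of the timeout budget instead of flipping the test result.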
For completeness, I've checked the recent pipeline logs after the flakiness fix was merged, and I'm not seeing any integ-test failures for observability. https://build.ci.opensearch.org/blue/rest/organizations/jenkins/pipelines/integ-test-opensearch-dashboards/runs/5899/log/?start=0 I can find the interruption exception, but no indication of what specifically is being interrupted (is some test hanging?).
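If something is hanging, one mitigation (a sketch, assuming the suite is Cypress-based like opensearch-dashboards-functional-test; all values are arbitrary) is to keep the runner's own timeouts well below the pipeline's, so a stuck test fails by name in the report instead of surfacing only as an interruption:

```ts
// cypress.config.ts (values invented): keep the runner's budgets well
// below the pipeline's overall timeout so a stuck test fails in the
// report rather than stalling the whole run.
import { defineConfig } from "cypress";

export default defineConfig({
  defaultCommandTimeout: 30_000, // each cy.* command must settle within 30s
  pageLoadTimeout: 60_000,       // full page loads get 60s
  requestTimeout: 15_000,        // requests awaited via cy.wait() get 15s
});
```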
Tagging @rishabh6788 to look into the above failure ^
Currently just held up by #1822.
What is the bug?
It was observed in 2.13.0 and several previous releases that this component manually signed off on the release despite failing integration tests. See opensearch-project/opensearch-build#4433 (comment)
The flakiness of the test runs takes a lot of the release team's time when collecting the go/no-go decision and significantly lowers confidence in the release bundles.
How can one reproduce the bug?
Steps to reproduce the behavior:
What is the expected behavior?
Tests should be consistently passing.
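One way to make "consistently passing" measurable (purely illustrative; the command, run count, and threshold are assumptions, not part of any existing tooling) is to rerun the suite several times and treat anything below a 100% pass rate as flaky:

```ts
// Hypothetical flakiness probe: run the suite N times and report the
// pass rate; any failed rerun marks the suite as flaky.
import { spawnSync } from "node:child_process";

const RUNS = 5; // arbitrary
let passes = 0;
for (let i = 0; i < RUNS; i++) {
  const { status } = spawnSync("yarn", ["cypress", "run"], { stdio: "inherit" });
  if (status === 0) passes++;
}
console.log(`pass rate: ${passes}/${RUNS}`);
process.exit(passes === RUNS ? 0 : 1); // non-zero exit signals flakiness
```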
Do you have any additional context?
Please note that this is a hard blocker for the 2.14.0 release, as per the discussion here.