
Integration tests seem to be non-deterministic and regularly fail on main #1108

Open
james-garner-canonical opened this issue Sep 23, 2024 · 3 comments
Labels: kind/bug, kind/CI, kind/doc, kind/test, priority/high

Comments

@james-garner-canonical (Contributor) commented Sep 23, 2024

EDIT: See my comment below for a table of current test failures on main.

Description

While trying to fix an issue with the tests run against PRs (e.g. #1088 identifies a breakage with /merge), I had multiple integration test failures on a relatively simple PR against main (#1106, pinning a dependency to the version before a recent release).


To troubleshoot this, I made a simpler PR against main (#1107 editing CONTRIBUTORS), which also has integration test failures.

Here is a table showing the number of times each test failed over 4 runs of the integration tests on these two PRs.

| Test | #1106 | #1107 |
| --- | --- | --- |
| test_app_relation_destroy_block_until_done | 4 | 4 |
| test_deploy_bundle_with_multiple_overlays_with_include_files | 4 | 4 |
| test_deploy_bundle_with_overlay_as_argument | 4 | 4 |
| test_wait_for_idle_more_units_than_needed | 3 | 3 |
| test_upgrade_local_charm - juju... | 2 | 4 |
| test_wait_for_idle_with_not_enough_units | 2 | 2 |
| test_unit_annotations - asyncio.excep... | 2 | 0 |
| test_action - juju.errors.JujuA... | 1 | 1 |
| test_upgrade_local_charm_resource | 0 | 2 |
| test_attach_resource - asyncio.except... | 0 | 1 |

It would be ideal if the tests were deterministic.

Short of fixing the tests themselves, it would be good if the current state of the tests were prominently documented in the contributing guidelines: which test failures are likely to just be flaky tests and shouldn't block a merge. A separate issue can then be opened against that list to fix the individual tests.
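One way to make that list machine-readable as well as documented (just a sketch; the marker name and registration below are illustrative, not an existing convention in this repo) is a custom pytest marker that contributors and CI can deselect:

```python
# Illustrative only: mark known-flaky integration tests so they can be
# deselected with `pytest tests/integration -m "not flaky"`.
# The marker would need to be registered, e.g. in pyproject.toml:
#   [tool.pytest.ini_options]
#   markers = ["flaky: known-intermittent integration test, see #1108"]
import pytest


@pytest.mark.flaky
def test_known_intermittent_example():
    # Placeholder body; in practice the tests listed in the table above
    # would carry the marker instead.
    assert True
```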

Urgency

Blocker for our release

Python-libjuju version

main?

Juju version

The version the GitHub workflow is using (it doesn't look like the Juju version, etc., is printed during test setup).

Reproduce / Test

Run integration tests on main.
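A rough way to build a table like the one above locally (a sketch only, not existing tooling in this repo; it assumes pytest can drive the integration suite directly and that repeated in-process runs are acceptable):

```python
# Hypothetical helper: re-run the integration suite N times and count how
# often each test fails.
import collections

import pytest

failures = collections.Counter()


class FailureTally:
    """Minimal pytest plugin that records the node IDs of failing tests."""

    def pytest_runtest_logreport(self, report):
        if report.when == "call" and report.failed:
            failures[report.nodeid] += 1


N_RUNS = 4  # the table above compares 4 runs per PR
for _ in range(N_RUNS):
    pytest.main(["tests/integration", "-q"], plugins=[FailureTally()])

for nodeid, count in failures.most_common():
    print(f"{count}/{N_RUNS}  {nodeid}")
```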

jujubot added a commit that referenced this issue Sep 23, 2024
#1106

#### Description

Pin the Python kubernetes version to fix a recent breakage in the Jenkins tests.

The latest update to the Python kubernetes library (v31, 3 days ago) breaks the Jenkins `github-check-merge-juju-python-libjuju` test due to a failure to build a new dependency (durationpy).

I thought this might be the fix for issue #1088, but that's been open since August 9. Should we switch to stricter dependency versioning across the board to avoid breakages of this nature? In setup.py, 4 dependencies now specify both minimum and maximum versions, 5 specify only a minimum, and 2 have no version specification. In tox.ini, only 1 dependency (kubernetes) specifies a (maximum) version; tox.ini should probably have the same version constraints as setup.py.
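For illustration, the stricter style would look something like this in setup.py (the version floors and ceilings here are made up, not the repo's actual constraints):

```python
# Sketch of min/max pinning for every dependency; versions are illustrative.
from setuptools import setup

setup(
    name="example",
    install_requires=[
        # Cap below the kubernetes release that introduced the durationpy build failure.
        "kubernetes>=12.0.1,<31.0.0",
        # Hypothetical floor/ceiling pair for an otherwise unpinned dependency.
        "somedep>=1.2,<2.0",
    ],
)
```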


#### QA Steps

All tests pass, except for integration tests, which are flaky (see issue #1108).
@james-garner-canonical added the kind/test, priority/normal, kind/CI, kind/bug, and kind/doc labels on Sep 23, 2024
jujubot added a commit that referenced this issue Sep 27, 2024
…remove-eol-schema

#1113

Remove `3.2.X` schemas. Delete `_client*.py` and run `make client`.

#### Description

Moving towards the goal of having schemas and generated code in python-libjuju only for the latest supported Juju versions (`3.1.9`, `3.3.6`, `3.5.4`, `3.5.3`), and using client-only schemas (see #1099), this PR removes the schemas for EOL Juju 3.2 and reruns code generation, removing the `_client*.py` files and then running `make client`.


#### QA Steps

CI steps should all continue to pass, except for integration testing, which should continue to fail with the usual suspects (see #1108 for a non-exhaustive table of tests that sometimes fail on `main`).


#### Notes

To hopefully simplify the diffs, this is the first PR in a planned sequence of PRs that will depend on each other. Subsequent PRs will be:
1. replace current schemas with latest release client only schemas (`3.1.9`, `3.3.6`) and regenerate code
2. add client only schema for `3.4.5` and regenerate code
3. add client only schema for `3.5.3` and regenerate code
@james-garner-canonical added the priority/high label and removed the priority/normal label on Oct 1, 2024
@dimaqq (Contributor) commented Oct 1, 2024

I'm running into a new, frequently (or always) failing test:
FAILED tests/integration/test_charmhub.py::test_subordinate_false_field_exists

Maybe it deserves to be added to the list.

@james-garner-canonical (Contributor, Author) commented Oct 1, 2024

commit=50b42d013aee01536416e6334f99443f2b4f1e4c
n_jobs=50
n_failing_tests=26

| Path | Test | # jobs | % jobs |
| --- | --- | --- | --- |
| test_application.py | test_app_relation_destroy_block_until_done | 50 | 100.00% |
| test_model.py | test_deploy_bundle_with_multiple_overlays_with_include_files | 50 | 100.00% |
| test_model.py | test_deploy_bundle_with_overlay_as_argument | 50 | 100.00% |
| test_model.py | test_wait_for_idle_more_units_than_needed | 41 | 82.00% |
| test_charmhub.py | test_subordinate_false_field_exists | 14 | 28.00% |
| test_application.py | test_app_destroy | 11 | 22.00% |
| test_application.py | test_upgrade_local_charm | 11 | 22.00% |
| test_application.py | test_upgrade_local_charm_resource | 8 | 16.00% |
| test_crossmodel.py | test_relate_with_offer | 8 | 16.00% |
| test_model.py | test_deploy_bundle_with_storage_constraint | 8 | 16.00% |
| test_application.py | test_app_remove_wait_flag | 7 | 14.00% |
| test_application.py | test_action | 6 | 12.00% |
| test_machine.py | test_machine_ssh | 5 | 10.00% |
| test_model.py | test_destroy_units | 5 | 10.00% |
| test_model.py | test_unit_annotations | 4 | 8.00% |
| test_model.py | test_wait_for_idle_with_not_enough_units | 4 | 8.00% |
| test_model.py | test_local_file_resource_charm | 3 | 6.00% |
| test_secrets.py | test_list_secrets | 3 | 6.00% |
| test_model.py | test_attach_resource | 2 | 4.00% |
| test_model.py | test_deploy_with_base | 2 | 4.00% |
| test_model.py | test_storage_pools_on_lxd | 2 | 4.00% |
| test_controller.py | test_secrets_backend_lifecycle | 1 | 2.00% |
| test_model.py | test_add_and_list_storage | 1 | 2.00% |
| test_model.py | test_add_manual_machine_ssh | 1 | 2.00% |
| test_model.py | test_add_manual_machine_ssh_root | 1 | 2.00% |
| test_unit.py | test_run | 1 | 2.00% |

How many failing tests does each job have?

| # tests failing | # jobs |
| --- | --- |
| 3 | `*` |
| 4 | `****` |
| 5 | `************` |
| 6 | `****************` |
| 7 | `**************` |
| 8 | `*` |
| 9 | `**` |

How many tests fail once, twice, etc?

| # fails | # tests |
| --- | --- |
| 1 | `*****` |
| 2 | `***` |
| 3 | `**` |
| 4 | `**` |
| 5 | `**` |
| 6 | `*` |
| 7 | `*` |
| 8 | `***` |
| 11 | `**` |
| 14 | `*` |
| 41 | `*` |
| 50 | `***` |
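For context, both histograms are derived from a per-job record of which tests failed, roughly like this (hypothetical data and structure, not the actual tooling used to collect the numbers above):

```python
from collections import Counter

# Hypothetical input: job id -> names of the tests that failed in that job.
failures_by_job = {
    "job-01": {"test_action", "test_app_destroy"},
    "job-02": {"test_action"},
    "job-03": {"test_action", "test_app_destroy", "test_machine_ssh"},
}

# "How many failing tests does each job have?"
per_job = Counter(len(failed) for failed in failures_by_job.values())
for n_failing, n_jobs in sorted(per_job.items()):
    print(f"{n_failing} failing tests: {'*' * n_jobs}")

# "How many tests fail once, twice, etc?"
per_test = Counter(t for failed in failures_by_job.values() for t in failed)
fail_counts = Counter(per_test.values())
for n_fails, n_tests in sorted(fail_counts.items()):
    print(f"{n_fails} fails: {'*' * n_tests}")
```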

Previous tables from 30 jobs are preserved below for reference:

commit=50b42d013aee01536416e6334f99443f2b4f1e4c
n_jobs=30
n_failing_tests=20

| Path | Test | # jobs | % jobs |
| --- | --- | --- | --- |
| test_application.py | test_app_relation_destroy_block_until_done | 30 | 100.00% |
| test_model.py | test_deploy_bundle_with_multiple_overlays_with_include_files | 30 | 100.00% |
| test_model.py | test_deploy_bundle_with_overlay_as_argument | 30 | 100.00% |
| test_model.py | test_wait_for_idle_more_units_than_needed | 24 | 80.00% |
| test_application.py | test_app_destroy | 8 | 26.67% |
| test_application.py | test_upgrade_local_charm | 8 | 26.67% |
| test_charmhub.py | test_subordinate_false_field_exists | 8 | 26.67% |
| test_application.py | test_app_remove_wait_flag | 6 | 20.00% |
| test_application.py | test_upgrade_local_charm_resource | 6 | 20.00% |
| test_application.py | test_action | 5 | 16.67% |
| test_model.py | test_destroy_units | 3 | 10.00% |
| test_model.py | test_unit_annotations | 3 | 10.00% |
| test_machine.py | test_machine_ssh | 2 | 6.67% |
| test_model.py | test_attach_resource | 2 | 6.67% |
| test_model.py | test_wait_for_idle_with_not_enough_units | 2 | 6.67% |
| test_model.py | test_add_manual_machine_ssh | 1 | 3.33% |
| test_model.py | test_add_manual_machine_ssh_root | 1 | 3.33% |
| test_model.py | test_deploy_with_base | 1 | 3.33% |
| test_model.py | test_local_file_resource_charm | 1 | 3.33% |
| test_unit.py | test_run | 1 | 3.33% |

How many failing tests does each job have?

| # tests failing | # jobs |
| --- | --- |
| 3 | `*` |
| 4 | `***` |
| 5 | `*********` |
| 6 | `**********` |
| 7 | `*****` |
| 8 | `*` |
| 9 | `*` |

How many tests fail once, twice, etc?

| # fails | # tests |
| --- | --- |
| 1 | `*****` |
| 2 | `***` |
| 3 | `**` |
| 5 | `*` |
| 6 | `**` |
| 8 | `***` |
| 24 | `*` |
| 30 | `***` |

@james-garner-canonical (Contributor, Author) commented Oct 14, 2024

Running the integration tests serially seems to have helped a lot, but we still get some intermittent failures. For example, even the quarantined integration tests regularly pass now, but just now we had this failure:

FAILED tests/integration/test_model.py::test_deploy_bundle_with_storage_constraint

https://github.com/juju/python-libjuju/actions/runs/11319443801/job/31475510517?pr=1158

and in a second run:

FAILED tests/integration/test_model.py::test_deploy_local_bundle_include_base64

https://github.com/juju/python-libjuju/actions/runs/11319443801/job/31476578410

with the third run passing.
