ci: run integration tests serially #1143

Conversation

james-garner-canonical

Description

Integration tests are incredibly flaky currently. Perhaps this is related to the parallel execution of the tests. Certainly the fact that the tests are randomly distributed across workers doesn't help with figuring out what works and what doesn't.

QA Steps

Run integration tests and see.

@james-garner-canonical

Thinking about this, I suspect the integration tests will hit the 150-minute limit and time out, since the -n auto version uses 4 workers and runs in just under an hour. A better approach will probably be to separate the integration tests out by file into 4+ separate tox environments, each running serially. This would get us deterministic execution, and potentially even a significant speedup in integration test run time, depending on how many environments we split across.
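
As a rough sketch of the split-by-file idea, the snippet below distributes test files round-robin across a fixed number of groups and prints one serial pytest invocation per group. The helper names (`split_round_robin`, `pytest_command`) are hypothetical, and the file list contains only the three integration test files named in this thread, not the full suite. How each group would be wired into a tox environment is left out, since that depends on the project's tox configuration.

```python
def split_round_robin(test_files, n_groups):
    """Distribute test files across n_groups buckets, one file at a time."""
    groups = [[] for _ in range(n_groups)]
    for i, path in enumerate(sorted(test_files)):
        groups[i % n_groups].append(path)
    return groups


def pytest_command(files):
    # No -n option is passed, so pytest runs these files serially
    # in a single process, in a fixed order.
    return ["pytest", *files]


# Only the files mentioned in this thread; the real suite has more.
files = [
    "tests/integration/test_application.py",
    "tests/integration/test_crossmodel.py",
    "tests/integration/test_model.py",
]

groups = split_round_robin(files, 2)
for group in groups:
    print(" ".join(pytest_command(group)))
```

Round-robin ignores how long each file takes, so one group can end up much slower than the others; balancing by measured duration would be a natural refinement.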

@james-garner-canonical

Woah, it actually completed. 73 minutes! There sure is a lot of parallelism overhead if 4 workers only get us down to around 75% of that time (55 minutes).
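
To put numbers on that overhead, here is a small worked calculation using the two wall-clock figures quoted above (73 minutes serial, roughly 55 minutes with 4 workers). It treats a single run of each as representative, which is an assumption given how much run times vary in this thread.

```python
serial_minutes = 73    # serial run reported above
parallel_minutes = 55  # approximate -n auto run time reported above
workers = 4

ratio = parallel_minutes / serial_minutes    # parallel time as a share of serial time
speedup = serial_minutes / parallel_minutes  # how many times faster the parallel run is
efficiency = speedup / workers               # speedup per worker (1.0 would be ideal)

print(f"parallel/serial time: {ratio:.0%}")  # 75%
print(f"speedup: {speedup:.2f}x")            # 1.33x
print(f"efficiency: {efficiency:.0%}")       # 33%
```

In other words, four workers buy about a 1.33x speedup, or about a third of the ideal 4x.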

Three failed tests:

FAILED tests/integration/test_model.py::test_deploy_bundle_with_storage_constraint
FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files

test_deploy_bundle_with_overlay_as_argument and test_deploy_bundle_with_multiple_overlays_with_include_files were also both 100% failures across 50 runs of the tests with -n auto, but the third failure, test_deploy_bundle_with_storage_constraint, only failed 16% of the time -- weird.

Also of note is that there's a third test that would fail 100% of the time, but didn't fail here -- test_app_relation_destroy_block_until_done. So that test can pass when run serially, but always fails when run with pytest-xdist's automatic parallelization via -n auto.

@dimaqq

dimaqq commented Oct 8, 2024

This is clearly an improvement!

@james-garner-canonical

james-garner-canonical commented Oct 8, 2024

Running the integration tests twice on this PR shows that while this greatly reduces flakiness, we aren't quite at deterministic integration testing yet -- the two runs have two failed tests in common, but each has one additional failing test that passes in the other. I'm going to run tests again to collect additional data, and maybe create a second draft PR to parallelize exploring this.

First run on this PR

# https://github.com/juju/python-libjuju/actions/runs/11224778599/job/31202956855
FAILED tests/integration/test_model.py::test_deploy_bundle_with_storage_constraint
FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files
====== 3 failed, 111 passed, 37 skipped, 1 warning in 4417.29s (1:13:37) =======

Second run

# https://github.com/juju/python-libjuju/actions/runs/11224778599/job/31205481140
FAILED tests/integration/test_model.py::test_deploy_bundle_local_charm_series_manifest
FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files
====== 3 failed, 111 passed, 37 skipped, 1 warning in 4314.72s (1:11:54) =======

The third run on this PR got stuck and failed due to a timeout, so we've still got that to contend with. The last output before it hung:

# https://github.com/juju/python-libjuju/actions/runs/11224778599/job/31207774444
tests/integration/test_model.py::test_deploy_bundle_local_charms 
[gw0] [ 50%] PASSED tests/integration/test_model.py::test_deploy_bundle_local_charms 

First run from #1144

# https://github.com/juju/python-libjuju/actions/runs/11226829336/job/31208051522
FAILED tests/integration/test_crossmodel.py::test_relate_with_offer - juju.er...
FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files
====== 3 failed, 111 passed, 37 skipped, 1 warning in 4627.02s (1:17:07) =======

We have two very consistent failing tests, but weirdly all 3 completed runs have an extra test failure unique to them.

Common failures: test_deploy_bundle_with_overlay_as_argument, test_deploy_bundle_with_multiple_overlays_with_include_files.

Unique failures: test_deploy_bundle_with_storage_constraint, test_deploy_bundle_local_charm_series_manifest, test_relate_with_offer.
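
The common/unique split above can be reproduced with plain set operations. This is a minimal sketch over the three completed runs listed so far; the dictionary keys are just descriptive labels chosen for this example.

```python
runs = {
    "first run on this PR": {
        "test_deploy_bundle_with_storage_constraint",
        "test_deploy_bundle_with_overlay_as_argument",
        "test_deploy_bundle_with_multiple_overlays_with_include_files",
    },
    "second run on this PR": {
        "test_deploy_bundle_local_charm_series_manifest",
        "test_deploy_bundle_with_overlay_as_argument",
        "test_deploy_bundle_with_multiple_overlays_with_include_files",
    },
    "first run from #1144": {
        "test_relate_with_offer",
        "test_deploy_bundle_with_overlay_as_argument",
        "test_deploy_bundle_with_multiple_overlays_with_include_files",
    },
}

# Tests that failed in every run.
common = set.intersection(*runs.values())

# Tests that failed in a given run but not in all runs.
unique = {label: failed - common for label, failed in runs.items()}

print("Common failures:", sorted(common))
for label, tests in unique.items():
    print(f"Only in {label}:", sorted(tests))
```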

The second run on #1144 runs long and has multiple failures

# https://github.com/juju/python-libjuju/actions/runs/11226829336/job/31210231133
FAILED tests/integration/test_crossmodel.py::test_relate_with_offer - juju.er...
FAILED tests/integration/test_model.py::test_deploy_bundle_local_charm_series_manifest
FAILED tests/integration/test_model.py::test_deploy_bundle_with_storage_constraint
FAILED tests/integration/test_model.py::test_deploy_local_charm - asyncio.exc...
FAILED tests/integration/test_model.py::test_wait_local_charm_blocked - async...
FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files
====== 7 failed, 107 passed, 37 skipped, 1 warning in 6981.15s (1:56:21) =======

From running the tests in my fork too, we have a first run

# https://github.com/james-garner-canonical/python-libjuju/actions/runs/11224761806/job/31202185686
FAILED tests/integration/test_crossmodel.py::test_relate_with_offer - juju.er...
FAILED tests/integration/test_model.py::test_deploy_bundle_with_storage_constraint
FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files
====== 4 failed, 110 passed, 37 skipped, 1 warning in 4310.46s (1:11:50) =======

Featuring the two consistent failures and two of the unique ones.

And a second run

# https://github.com/james-garner-canonical/python-libjuju/actions/runs/11226791048/job/31210229626
FAILED tests/integration/test_crossmodel.py::test_relate_with_offer - juju.er...
FAILED tests/integration/test_model.py::test_deploy_bundle_local_charms - asy...
FAILED tests/integration/test_model.py::test_deploy_bundle_with_storage_constraint
FAILED tests/integration/test_model.py::test_deploy_local_charm - asyncio.exc...
FAILED tests/integration/test_model.py::test_wait_local_charm_waiting_timeout
FAILED tests/integration/test_model.py::test_deploy_bundle - requests.excepti...
FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multi_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files
====== 9 failed, 105 passed, 37 skipped, 1 warning in 6338.12s (1:45:38) =======

Which looks significantly worse ... the two consistent failures, two of the unique ones, and five more ... note also the significantly longer running time


https://github.com/juju/python-libjuju/actions/runs/11226829336/job/31213287003

# https://github.com/juju/python-libjuju/actions/runs/11226829336/job/31213287003
FAILED tests/integration/test_model.py::test_deploy_bundle_with_storage_constraint
FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files
====== 3 failed, 111 passed, 37 skipped, 1 warning in 4461.86s (1:14:21) =======

https://github.com/juju/python-libjuju/actions/runs/11224778599/job/31212363093

# https://github.com/juju/python-libjuju/actions/runs/11224778599/job/31212363093
FAILED tests/integration/test_model.py::test_deploy_bundle_with_storage_constraint
FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files
====== 3 failed, 111 passed, 37 skipped, 1 warning in 4474.78s (1:14:34) =======

https://github.com/james-garner-canonical/python-libjuju/actions/runs/11226791048/job/31213202448

# https://github.com/james-garner-canonical/python-libjuju/actions/runs/11226791048/job/31213202448
FAILED tests/integration/test_model.py::test_deploy_bundle_local_charm_series_manifest
FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files
====== 3 failed, 111 passed, 37 skipped, 1 warning in 4683.80s (1:18:03) =======

https://github.com/james-garner-canonical/python-libjuju/actions/runs/11224761806/job/31213344497

# https://github.com/james-garner-canonical/python-libjuju/actions/runs/11224761806/job/31213344497
FAILED tests/integration/test_model.py::test_deploy_bundle_local_charms - asy...
FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files
====== 3 failed, 111 passed, 37 skipped, 1 warning in 4943.01s (1:22:23) =======

https://github.com/juju/python-libjuju/actions/runs/11226829336/job/31219977035

# https://github.com/juju/python-libjuju/actions/runs/11226829336/job/31219977035
FAILED tests/integration/test_application.py::test_action - AssertionError: m...
FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files
====== 3 failed, 111 passed, 37 skipped, 1 warning in 5175.93s (1:26:15) =======

https://github.com/juju/python-libjuju/actions/runs/11224778599/job/31219987411

# https://github.com/juju/python-libjuju/actions/runs/11224778599/job/31219987411
FAILED tests/integration/test_crossmodel.py::test_relate_with_offer - juju.er...
FAILED tests/integration/test_model.py::test_deploy_bundle_local_charms - asy...
FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files
====== 4 failed, 110 passed, 37 skipped, 1 warning in 4645.05s (1:17:25) =======

https://github.com/james-garner-canonical/python-libjuju/actions/runs/11226791048/job/31219991541

# https://github.com/james-garner-canonical/python-libjuju/actions/runs/11226791048/job/31219991541
FAILED tests/integration/test_model.py::test_deploy_bundle_local_charm_series_manifest
FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files
====== 3 failed, 111 passed, 37 skipped, 1 warning in 5140.36s (1:25:40) =======

https://github.com/james-garner-canonical/python-libjuju/actions/runs/11224761806/job/31219993294

# https://github.com/james-garner-canonical/python-libjuju/actions/runs/11224761806/job/31219993294
FAILED tests/integration/test_model.py::test_deploy_bundle_local_charms - asy...
FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files
====== 3 failed, 111 passed, 37 skipped, 1 warning in 4579.61s (1:16:19) =======

https://github.com/james-garner-canonical/python-libjuju/actions/runs/11224761806/job/31260868384

# https://github.com/james-garner-canonical/python-libjuju/actions/runs/11224761806/job/31260868384
FAILED tests/integration/test_crossmodel.py::test_relate_with_offer - juju.er...
FAILED tests/integration/test_model.py::test_deploy_bundle_local_charms - asy...
FAILED tests/integration/test_model.py::test_deploy_bundle_with_storage_constraint
FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument
FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files
====== 5 failed, 109 passed, 37 skipped, 1 warning in 4873.29s (1:21:13) =======

@james-garner-canonical

james-garner-canonical commented Oct 8, 2024

Here are tables summarising the (so far) 15 runs of the serialised tests. I'll probably continue to edit the previous comment with output, and this comment to update the tables.

commit=501cc36b7a1da0bfc329894e71e478dba900dc28
n_jobs=15
n_failing_tests=12

| path | test | # jobs | % jobs |
|---|---|---|---|
| test_model.py | test_deploy_bundle_with_multiple_overlays_with_include_files | 15 | 100.00% |
| test_model.py | test_deploy_bundle_with_overlay_as_argument | 15 | 100.00% |
| test_model.py | test_deploy_bundle_with_storage_constraint | 7 | 46.67% |
| test_crossmodel.py | test_relate_with_offer | 6 | 40.00% |
| test_model.py | test_deploy_bundle_local_charms | 5 | 33.33% |
| test_model.py | test_deploy_bundle_local_charm_series_manifest | 4 | 26.67% |
| test_model.py | test_deploy_local_charm | 2 | 13.33% |
| test_application.py | test_action | 1 | 6.67% |
| test_model.py | test_deploy_bundle | 1 | 6.67% |
| test_model.py | test_deploy_bundle_with_multi_overlay_as_argument | 1 | 6.67% |
| test_model.py | test_wait_local_charm_blocked | 1 | 6.67% |
| test_model.py | test_wait_local_charm_waiting_timeout | 1 | 6.67% |

How many failing tests does each job have?

# tests failing # jobs
3 **********
4 **
5 *
6
7 *
8
9 *

How many tests fail once, twice, etc?

# fails # tests
1 *****
2 *
4 *
5 *
6 *
7 *
15 **
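
Tables like these can be generated from the pytest short summaries rather than tallied by hand. The sketch below is one possible way to do it, not necessarily how the figures above were produced: it extracts `FAILED path::test` lines with a regular expression and counts how many jobs each test fails in. The function names are hypothetical, and the sample input is just two of the job summaries quoted earlier in this thread, so the percentages it prints cover 2 jobs rather than all 15.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Matches pytest short-summary lines such as
# "FAILED tests/integration/test_model.py::test_deploy_bundle - requests.excepti..."
FAILED_LINE = re.compile(r"^FAILED\s+(\S+?)::(\S+)", re.MULTILINE)


def failed_tests(log_text):
    """Return the set of (file name, test name) pairs marked FAILED in one job log."""
    return {
        (PurePosixPath(path).name, test)
        for path, test in FAILED_LINE.findall(log_text)
    }


def tally(logs):
    """Count, for each failing test, how many job logs it fails in."""
    counts = Counter()
    for log_text in logs:
        counts.update(failed_tests(log_text))
    return counts


# Sample input only: two of the summaries quoted in the previous comment.
sample_logs = [
    "\n".join([
        "FAILED tests/integration/test_application.py::test_action - AssertionError: m...",
        "FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument",
        "FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files",
    ]),
    "\n".join([
        "FAILED tests/integration/test_crossmodel.py::test_relate_with_offer - juju.er...",
        "FAILED tests/integration/test_model.py::test_deploy_bundle_local_charms - asy...",
        "FAILED tests/integration/test_model.py::test_deploy_bundle_with_overlay_as_argument",
        "FAILED tests/integration/test_model.py::test_deploy_bundle_with_multiple_overlays_with_include_files",
    ]),
]

counts = tally(sample_logs)
for (path, test), n_jobs in counts.most_common():
    print(f"{path}\t{test}\t{n_jobs}\t{n_jobs / len(sample_logs):.2%}")

# Histogram: how many jobs have 3 failing tests, 4 failing tests, and so on.
per_job = Counter(len(failed_tests(log)) for log in sample_logs)
for n_failing in sorted(per_job):
    print(n_failing, "*" * per_job[n_failing])
```

Jobs that time out or fail before pytest prints a summary contribute no `FAILED` lines, so they would need to be excluded from the job count separately.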

@james-garner-canonical

In addition to tests still failing apparently at random, and tests sometimes failing to terminate, the integration test suite can also sometimes fail due to external causes, such as the controller charm download being interrupted during bootstrap:

https://github.com/james-garner-canonical/python-libjuju/actions/runs/11226791048/job/31260866663
Quickly failed with

ERROR cannot deploy controller application: deploying charmhub controller charm: downloading charm "juju-controller" from origin {charm-hub charm 0xc000c9a368 3.4/stable amd64/ubuntu/22.04 }: cannot retrieve "https://api.charmhub.io/api/v1/charms/download/WV4pShb4jnG1eXAB8HMypujHRKRXMRW9_101.charm": cannot get archive: Get "https://canonical-bos01.cdn.snapcraftcontent.com/download-origin/canonical-lgw01/WV4pShb4jnG1eXAB8HMypujHRKRXMRW9_101.charm?token=1728435600_75b8b638a12f49d6f06bf7ebffbb9ad044e99563": read tcp 10.220.190.148:45804->91.189.91.42:443: read: connection reset by peer

https://github.com/juju/python-libjuju/actions/runs/11226829336/job/31260861822
Quickly failed with

ERROR cannot deploy controller application: deploying charmhub controller charm: downloading charm "juju-controller" from origin {charm-hub charm 0xc000c88428 3.4/stable amd64/ubuntu/22.04 }: unexpected EOF

https://github.com/juju/python-libjuju/actions/runs/11224778599/job/31260863660
Quickly failed with

ERROR cannot deploy controller application: deploying charmhub controller charm: downloading charm "juju-controller" from origin {charm-hub charm 0xc000b30068 3.4/stable amd64/ubuntu/22.04 }: unexpected EOF

@james-garner-canonical

Closing in favour of #1149

jujubot added a commit that referenced this pull request Oct 9, 2024
…ine-and-serialise

#1149

Tests in `integration/test_model.py` seem to be flaky even when run serially. All tests in `integration/test_crossmodel.py` are currently skipped, except one which used to be skipped, and is currently flaky even when run serially.

This PR:
* Serialises all integration tests following #1143
* Skips two tests from `test_model.py` that seem to always fail currently, whether run in serial or in parallel, following #1145
* Moves the flaky tests noted above into a separate job, so that the job running the remaining integration tests will hopefully have a shot at succeeding

As a bonus feature, this split of the tests into two runners with `-n 1` seems to be faster than the original method of running all the integration tests in a single runner with `-n auto` (which worked out to be 4 processes on GitHub).