Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve some falky tests and improve CI times #2401

Merged
merged 54 commits into from
Nov 18, 2024
Merged

Conversation

AurelienFT
Copy link
Contributor

@AurelienFT AurelienFT commented Oct 28, 2024

Linked Issues/PRs

Fix #2408
Fix #2407
Fix #2406
Fix #2351

Fix #2393
Fix #2394
Fix #2395

Description

This PR fix an issue in P2P heartbeat. The problem was that P2P heartbeat was updated only if new blocks were received or produced. This means that if we start the node from an existing db but doesn't produce blocks and not connect it to anyone it will send block height 0 to the peers that connects to him. We believe that this fix, resolves #2408 #2407 #2406 and #2351.

For #2394 we just increased the timeouts.
For #2393 we removed the panic in the test and just let p2p reconnect
For #2395 we launch this test using multi-threads mode of Tokio to follow the convention of all the others tests that launch a node using FuelCoreDriver. Also we added a kill of the driver to try to kill the node in a more graceful way in all of the test, it should fix a lot of flakyness in these tests

This PR also change the CI workflow by removing all docker related jobs and codecov job. These two set of jobs has been moved to separated workflow that are not triggered automatically but can be triggered manually on the "Actions" tab of this repository (after the merge of this PR).

The tests launched by the CI job now use nextest that allow us to add timeout for each test and provide more detailed output. The timeout is currently 5 min (and 8 for two really big tests) because we have tests that take a long time but we should lower it in the future.
The steps on the matrix are not cancelled anymore when one failed to allow possible other success and cache their success for a relaunch of the tests.

There is still more improve to do on our tests especially on timeout and rapid execution but this should improve a lot our workflow.

Checklist

  • Breaking changes are clearly marked as such in the PR description and changelog
  • New behavior is reflected in tests
  • The specification matches the implemented behavior (link update PR if changes are needed)

Before requesting review

  • I have reviewed the code myself
  • I have created follow-up issues caused by this PR and linked them here

@AurelienFT AurelienFT added the no changelog Skip the CI check of the changelog modification label Oct 28, 2024
@AurelienFT AurelienFT self-assigned this Oct 28, 2024
@AurelienFT AurelienFT changed the title Test adding nextest Try resolve some falky tests and reduce timeout Oct 28, 2024
@AurelienFT AurelienFT changed the title Try resolve some falky tests and reduce timeout Resolve some falky tests and improve CI times Nov 12, 2024
Copy link
Collaborator

@xgreenx xgreenx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failed. And the time required to do the dry run is enormously big. It shouldn't be so slow. We have a problem that we need to investigate; increasing the timeout for it will not help.

image

@AurelienFT
Copy link
Contributor Author

@xgreenx do we want to block all the improvements that are provided by this PR for this test or we can leave the issue related to this test open and still merge this ? Because even if there is still a problem on this test I think we improve a lot of things here.

…xecutor(it uses 1024).

Speed up state rewind test by using less blocks to re-run.
Speedup gas price test by using less maximum block height.
Speed up all tests by not creating all family colums in the RocksDB database. Related issue: facebook/rocksdb#5117
Fixed deploy large contract e2e test(before it was not tested).
Buffered dry runs in e2e tests to avoid resource exhaustion.
Use interval in e2e tests, to avoid issues with block broadcasting.
Also run p2p only once.
xgreenx
xgreenx previously approved these changes Nov 17, 2024
@AurelienFT
Copy link
Contributor Author

Ok boss 😂😂

@xgreenx xgreenx merged commit f9b1fe2 into master Nov 18, 2024
31 checks passed
@xgreenx xgreenx deleted the resolve_flaky_tests branch November 18, 2024 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no changelog Skip the CI check of the changelog modification
Projects
None yet
4 participants