Relayer can still crash upon startup due to bad RPCs #4887

Open
Tracked by #3321
tkporter opened this issue Nov 21, 2024 · 1 comment · May be fixed by #5115

tkporter commented Nov 21, 2024

Problem

Most relayer startup crashes have been fixed (the ones related to building contract syncs, see #4811), but startup can still fail, e.g. here:

let mailboxes = settings
    .build_mailboxes(settings.destination_chains.iter(), &core_metrics)
    .await?;
let validator_announces = settings
    .build_validator_announces(settings.origin_chains.iter(), &core_metrics)
    .await?;

Some additional context:

  • build_mailboxes and build_validator_announces (and possibly other calls in the from_settings function) iterate over the list of configured chains. If any chain returns an error (for example because of a flaky RPC), the `?` propagates it and the relayer crashes.
  • As the relayer covers more chains, the chance that at least one RPC is flaky at startup increases.
  • Instead of crashing, the relayer should at least log an error.
  • The relayer should also record a critical error (example here) for the affected chain, so that we alert on it.

Solution

We probably don't want to take down the whole relayer if a single chain is having issues.
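A minimal sketch of that idea, using only the standard library. Everything here is a hypothetical stand-in and not the relayer's actual API: `Mailbox`, `build_mailbox`, `build_mailboxes_lenient`, and the chain named "flaky" are invented for illustration. The point is only that each chain's result is handled on its own, so one failure is logged and recorded instead of being propagated with `?`.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a per-chain contract handle.
#[derive(Debug, Clone, PartialEq)]
struct Mailbox {
    chain: String,
}

// Hypothetical builder: pretends the chain named "flaky" has a bad RPC.
fn build_mailbox(chain: &str) -> Result<Mailbox, String> {
    if chain == "flaky" {
        Err(format!("RPC unreachable for {chain}"))
    } else {
        Ok(Mailbox { chain: chain.to_string() })
    }
}

// Builds what it can. Failures are logged and returned per chain so the
// caller can raise a critical-error metric for them instead of exiting.
fn build_mailboxes_lenient(
    chains: &[&str],
) -> (HashMap<String, Mailbox>, HashMap<String, String>) {
    let mut built = HashMap::new();
    let mut failed = HashMap::new();
    for &chain in chains {
        match build_mailbox(chain) {
            Ok(mailbox) => {
                built.insert(chain.to_string(), mailbox);
            }
            Err(err) => {
                eprintln!("failed to build mailbox for {chain}: {err}");
                failed.insert(chain.to_string(), err);
            }
        }
    }
    (built, failed)
}
```

Calling `build_mailboxes_lenient(&["ethereum", "flaky", "arbitrum"])` would return two built mailboxes and one recorded failure; the relayer could then keep running for the healthy chains and alert on the failed one.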


kamiyaa commented Jan 6, 2025

Talked with @daniel-savu about this

Takeaways:

  1. Ensure there are no unwrap()s in the tasks the relayer spawns.
  2. Perhaps replace try_join_all with join_all, so that one failed future does not affect all the other futures.
  3. Unit test panic!()s in all tasks spawned by the relayer to ensure liveness is not lost when a single task dies.
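The isolation these takeaways aim for can be sketched with standard-library threads as an analog. This is an assumption-laden illustration, not the relayer's code: the relayer spawns async tasks rather than OS threads, and `run_isolated` and the "flaky" chain are hypothetical names. Each worker's outcome is collected separately, so a panic in one worker surfaces as an `Err` for that chain only.

```rust
use std::thread;

// Runs one hypothetical per-chain task per thread and reports every
// outcome, instead of stopping at the first failure.
fn run_isolated(chains: &[&str]) -> Vec<(String, Result<u32, String>)> {
    let handles: Vec<_> = chains
        .iter()
        .map(|&chain| {
            let name = chain.to_string();
            let handle = thread::spawn({
                let name = name.clone();
                move || {
                    // Simulate a task that dies for one chain.
                    if name == "flaky" {
                        panic!("simulated task panic");
                    }
                    name.len() as u32
                }
            });
            (name, handle)
        })
        .collect();

    handles
        .into_iter()
        .map(|(name, handle)| {
            // join() returns Err if the thread panicked; the other
            // threads and their results are unaffected.
            let outcome = handle
                .join()
                .map_err(|_| format!("task for {name} panicked"));
            (name, outcome)
        })
        .collect()
}
```

A unit test along the lines of takeaway 3 would assert that the healthy chains still return `Ok` when one worker panics, which is the property that fail-fast combinators such as try_join_all do not give you.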

kamiyaa linked a pull request on Jan 7, 2025 that will close this issue
cmcewen moved this from Sprint to In Review in Hyperlane Tasks on Jan 8, 2025