Load Balanced RPC redirecting to out of sync nodes #70

yohanelly95 · 2024-07-18T10:29:26Z

Validators using the load-balanced RPC were sometimes redirected to a faulty, out-of-sync RPC endpoint that consistently returned a stale block number, leading to validator downtime. Restarting the validator node did not fix it. The out-of-sync RPC was found to be https://skale-node2.01node.com:10136/, but the affected endpoint could change over time.

load-balanced RPC: https://mainnet.skalenodes.com/v1/turbulent-unique-scheat

dmytrotkk · 2024-07-19T18:42:50Z

Thanks for opening this issue, @yohanelly95. We’ll look into it.

For context: we cannot check if a node is synced on each call, as it would add significant overhead to the Nginx proxy. Instead, we check block timestamps every three hours and remove out-of-sync endpoints from the rotation.

Here’s how it works: we check the block timestamps on all endpoints, identify the highest one, and compare it to the others. The maximum allowed slippage is 300 seconds (5 minutes). Given the average block time of 10.5 seconds (for the Razor chain), a node can lag by roughly 28 blocks (300 / 10.5 ≈ 28.6) and still pass this check.

skale-proxy/proxy/endpoints.py

Line 69 in 41b1e88

if is_node_out_of_sync(node['block_ts'], max_ts):

ALLOWED_TIMESTAMP_DIFF = 300
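
The check described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in `proxy/endpoints.py`: only the call `is_node_out_of_sync(node['block_ts'], max_ts)` and the constant `ALLOWED_TIMESTAMP_DIFF = 300` come from the snippet above. The function body, the helpers `filter_synced` and `max_block_lag`, and the sample node data are assumptions made for the example.

```python
ALLOWED_TIMESTAMP_DIFF = 300  # seconds, the value quoted in the comment above


def is_node_out_of_sync(block_ts: int, max_ts: int) -> bool:
    # Assumed body: the issue shows only the call site, not the implementation.
    return max_ts - block_ts > ALLOWED_TIMESTAMP_DIFF


def filter_synced(nodes: list[dict]) -> list[dict]:
    """Hypothetical helper: keep nodes whose latest block is recent enough."""
    if not nodes:
        return []
    # The highest block timestamp across all endpoints is the reference point.
    max_ts = max(node['block_ts'] for node in nodes)
    return [n for n in nodes if not is_node_out_of_sync(n['block_ts'], max_ts)]


def max_block_lag(avg_block_time: float) -> int:
    """Hypothetical helper: whole blocks a node can trail by and still pass."""
    return int(ALLOWED_TIMESTAMP_DIFF // avg_block_time)


# Made-up sample data for illustration only.
nodes = [
    {'name': 'node-a', 'block_ts': 1_721_300_000},
    {'name': 'node-b', 'block_ts': 1_721_299_800},  # 200 s behind: kept
    {'name': 'node-c', 'block_ts': 1_721_299_600},  # 400 s behind: dropped
]
print([n['name'] for n in filter_synced(nodes)])  # ['node-a', 'node-b']
print(max_block_lag(10.5))                        # 28
```

Under these assumptions, a node is compared only against the freshest endpoint at the moment the check runs. Since the check is described as running every three hours, a node that falls behind between runs would presumably stay in rotation until the next one.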

We will come back to you after additional checks on our side.

yohanelly95 · 2024-09-02T07:23:52Z

Hey @dmytrotkk! Are there any updates on how we can handle this?

DmytroNazarenko added the support label Jul 19, 2024

DmytroNazarenko added this to SKALE Engineering 🚀 Jul 19, 2024

DmytroNazarenko assigned dmytrotkk Jul 19, 2024

DmytroNazarenko moved this to To Do in SKALE Engineering 🚀 Oct 14, 2024

PolinaKiporenko removed the status in SKALE Engineering 🚀 Nov 15, 2024

PolinaKiporenko closed this as completed Nov 25, 2024

github-project-automation bot moved this to Ready For Release Candidate in SKALE Engineering 🚀 Nov 25, 2024

PolinaKiporenko removed this from SKALE Engineering 🚀 Nov 25, 2024
