Decrease load on DB #2291
Conversation
We have seen that in some cases, when storage is slow or the node is far behind on its event stream, it can trigger many fetches of old proposals that other nodes have all garbage collected. These fetches accumulate over time, making the problem worse. This change limits parallelism in proposal fetching: instead of spawning a new task each time we need to fetch a proposal, we spawn a fixed number of worker tasks, each of which fetches only one proposal at a time. A scanner task follows the event stream, detects when a proposal needs to be fetched, and broadcasts the fetch request to the worker tasks; the request is picked up as soon as a worker is free.
// If we fail fetching the proposal, don't let it clog up the fetching task. Just push
// it back onto the queue and move on to the next proposal.
sender.broadcast_direct((view, leaf)).await.ok();
Does this create a busy loop if the task is failing consistently?
I guess it kind of does. I was thinking we would be rate limited by the finite number of workers and time spent waiting on I/O for each failure. But I guess that's not guaranteed, depending on what the failure is. I'll add a sleep here
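The fix discussed above can be sketched as a retry loop that sleeps after each failure. This is a simplified synchronous sketch, not the actual async implementation; `fetch_proposal` here is a hypothetical stand-in that always fails, to exercise the failure path.

```rust
use std::thread::sleep;
use std::time::Duration;

// Hypothetical stand-in for the real fetch, which asks peers over the
// network. Here it fails every time, to exercise the retry path.
fn fetch_proposal(_view: u64) -> Result<(), ()> {
    Err(())
}

// Retry a fetch, sleeping after each failure. Without the sleep, a proposal
// that consistently fails to fetch (e.g. one already garbage collected by
// every peer) would be re-queued immediately and spin the workers in a
// busy loop.
fn fetch_with_backoff(view: u64, max_attempts: u32, delay: Duration) -> (bool, u32) {
    let mut attempts = 0;
    while attempts < max_attempts {
        attempts += 1;
        match fetch_proposal(view) {
            Ok(()) => return (true, attempts),
            // Rate-limit the retry instead of looping immediately.
            Err(()) => sleep(delay),
        }
    }
    (false, attempts)
}
```

The sleep bounds how often a single failing request can occupy a worker, so even a permanently missing proposal only costs one worker a small, fixed fraction of its time.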
Backport failed for release-thehounds. Please cherry-pick the changes locally and resolve any conflicts:
git fetch origin release-thehounds
git worktree add -d .worktree/backport-2291-to-release-thehounds origin/release-thehounds
cd .worktree/backport-2291-to-release-thehounds
git switch --create backport-2291-to-release-thehounds
git cherry-pick -x 4f4efbaee341f679fa33bf676e3b0f90904ce295
While debugging this, I also noticed that we don't "unfill" the leaf payload when storing anchor leaves, so DB performance degrades when blocks are very large: we store and load these large payloads when we think we are only dealing with leaves.
This PR:
- Limits parallelism in proposal fetching. Instead of spawning a new task each time we need to fetch a proposal, we spawn a fixed number of worker tasks, each of which fetches only one proposal at a time. A scanner task follows the event stream, detects when a proposal needs to be fetched, and broadcasts the fetch request to the worker tasks; the request is picked up as soon as a worker is free.
- Unfills leaf payloads before storing decided leaves.
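The scanner/worker split described above can be illustrated with a minimal sketch. The real change uses async tasks and a broadcast channel; this sketch uses std threads and an `mpsc` channel behind a mutex instead, and `fetch_proposal` and `run_fetch_pool` are hypothetical names introduced for illustration. The point it demonstrates is the bound: at most `num_workers` fetches are ever in flight, no matter how many requests the scanner queues.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Hypothetical fetch: returns the proposal for a view. In the real system
// this asks peers over the network and may fail.
fn fetch_proposal(view: u64) -> Result<u64, ()> {
    Ok(view)
}

// Run a fixed-size worker pool over a queue of fetch requests. Each worker
// takes one request at a time, so parallelism is capped at `num_workers`.
fn run_fetch_pool(views: Vec<u64>, num_workers: usize) -> Vec<u64> {
    let (tx, rx) = mpsc::channel::<u64>();
    let rx = Arc::new(Mutex::new(rx));

    let handles: Vec<_> = (0..num_workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut fetched = Vec::new();
                loop {
                    // Take the next request; stop when the scanner closes
                    // the channel. The lock is released before fetching.
                    let view = match rx.lock().unwrap().recv() {
                        Ok(v) => v,
                        Err(_) => break,
                    };
                    if let Ok(p) = fetch_proposal(view) {
                        fetched.push(p);
                    }
                }
                fetched
            })
        })
        .collect();

    // The "scanner": in the real change it follows the event stream and
    // queues a request whenever it detects a missing proposal.
    for view in views {
        tx.send(view).unwrap();
    }
    drop(tx); // close the channel so workers exit

    let mut all: Vec<u64> = handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect();
    all.sort();
    all
}
```

Compared with spawning one task per missing proposal, a backlog of old proposals now queues up behind a fixed number of workers instead of multiplying concurrent storage and network load.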
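The "unfill" fix can be sketched as dropping the optional payload before persisting. The `Leaf` type and field names here are hypothetical simplifications of whatever the sequencer codebase actually uses; the sketch only shows the shape of the idea, that the stored copy carries leaf metadata but not the full block payload.

```rust
// Hypothetical simplified leaf: small metadata plus an optionally
// attached ("filled") full block payload.
#[derive(Clone, Debug, PartialEq)]
struct Leaf {
    view: u64,
    payload: Option<Vec<u8>>,
}

impl Leaf {
    // Detach the payload so the anchor-leaf storage holds only the
    // small leaf metadata, not the full block.
    fn unfill(&mut self) {
        self.payload = None;
    }
}

// Build the copy that actually gets written to the DB.
fn storable_copy(leaf: &Leaf) -> Leaf {
    let mut to_store = leaf.clone();
    to_store.unfill();
    to_store
}
```

With very large blocks, this is the difference between writing and reading a few bytes of metadata per decided leaf and moving the entire block payload through the DB on every anchor-leaf operation.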