Decrease load on DB #2291

Merged

jbearer merged 4 commits into main from jb/proposal-fetching-worker on Nov 18, 2024
Conversation

jbearer
Member

@jbearer jbearer commented Nov 15, 2024

We have seen that in some cases, when storage is slow or the node is way behind on its event stream, it can trigger many fetches of old proposals, which other nodes have all garbage collected. These fetches accumulate over time, leading to a worsening problem.

While debugging this, I also noticed that we don't "unfill" the leaf payload when storing anchor leaves, so DB performance gets worse when blocks are very large, because we are storing/loading these large payloads when we think we are only dealing with leaves.

This PR:

Limits the degree of parallelism in proposal fetching. Instead of dynamically spawning a new task each time we need to fetch a proposal, we spawn a fixed number of worker tasks, each of which fetches only one proposal at a time. A scanner task follows the event stream, detects when a proposal needs to be fetched, and broadcasts the fetch request to the worker tasks; it is picked up as soon as a worker is free.

Unfills leaf payloads before storing decided leaves.
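The payload-unfilling step might look something like the following sketch. The `Leaf` type, its fields, and `unfill_payload` are hypothetical stand-ins for illustration, not the actual repository types:

```rust
/// Hypothetical leaf type: the block payload is optional ("filled" or not).
#[derive(Clone, Debug, PartialEq)]
struct Leaf {
    height: u64,
    payload: Option<Vec<u8>>, // large block payload, if filled
}

impl Leaf {
    /// Drop the payload before persisting, so the anchor-leaf storage holds
    /// only leaf metadata instead of full blocks.
    fn unfill_payload(&mut self) {
        self.payload = None;
    }
}

fn main() {
    let mut leaf = Leaf {
        height: 42,
        payload: Some(vec![0u8; 1024]), // a filled leaf carrying a large payload
    };
    leaf.unfill_payload();
    assert!(leaf.payload.is_none());
    println!("stored leaf {} without payload", leaf.height);
}
```

The point of the sketch is only the ordering: strip the payload first, then hand the leaf to storage, so large blocks never hit the leaf tables.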

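The fixed worker pool described in this PR could be sketched with plain threads and a shared queue. Names like `spawn_fetch_workers` and `ViewNumber` are illustrative; the real implementation uses async tasks and a broadcast channel rather than std threads, but the bound on concurrency works the same way:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a proposal fetch request (a view number).
type ViewNumber = u64;

/// Spawn a fixed pool of workers; each pulls one request at a time from the
/// shared queue, so at most `num_workers` fetches run concurrently.
fn spawn_fetch_workers(
    num_workers: usize,
) -> (mpsc::Sender<ViewNumber>, Vec<thread::JoinHandle<Vec<ViewNumber>>>) {
    let (tx, rx) = mpsc::channel::<ViewNumber>();
    let rx = Arc::new(Mutex::new(rx));
    let handles = (0..num_workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut fetched = Vec::new();
                loop {
                    // Dequeue the next request; the lock is released before
                    // the worker does any fetching work.
                    let req = match rx.lock().unwrap().recv() {
                        Ok(v) => v,
                        Err(_) => break, // queue closed: the scanner is done
                    };
                    // A real worker would fetch the proposal here; we record it.
                    fetched.push(req);
                }
                fetched
            })
        })
        .collect();
    (tx, handles)
}

fn main() {
    let (tx, handles) = spawn_fetch_workers(3);
    // The "scanner" enqueues views that need fetching.
    for view in 0..10u64 {
        tx.send(view).unwrap();
    }
    drop(tx); // close the queue so workers exit
    let total: usize = handles.into_iter().map(|h| h.join().unwrap().len()).sum();
    assert_eq!(total, 10); // every request handled exactly once
    println!("fetched {total} proposals");
}
```

The design choice this models: backlog accumulates cheaply in the queue instead of as live tasks, so a slow node can never have more than `num_workers` fetches in flight at once.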
@jbearer jbearer changed the title Limit number of simultaneous proposal fetches Decrease load on DB Nov 16, 2024

```rust
// If we fail fetching the proposal, don't let it clog up the fetching task. Just push
// it back onto the queue and move onto the next proposal.
sender.broadcast_direct((view, leaf)).await.ok();
```
Collaborator

Does this create a busy loop if the task is failing consistently?

Member Author

I guess it kind of does. I was thinking we would be rate limited by the finite number of workers and the time spent waiting on I/O for each failure. But that's not guaranteed, depending on what the failure is. I'll add a sleep here.
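The sleep-on-failure fix discussed here might look like the following sketch; `retry_after_failure` and the delay value are hypothetical illustrations, not the PR's actual code:

```rust
use std::thread;
use std::time::Duration;

/// Wait between attempts so a consistently failing fetch cannot spin the
/// worker in a busy loop. (The delay and attempt cap are illustrative.)
fn retry_after_failure<F: FnMut() -> bool>(
    mut fetch: F,
    delay: Duration,
    max_attempts: u32,
) -> bool {
    for attempt in 0..max_attempts {
        if fetch() {
            return true;
        }
        if attempt + 1 < max_attempts {
            // This sleep is the fix: without it, a fetch that fails
            // immediately would be retried in a tight loop.
            thread::sleep(delay);
        }
    }
    false
}

fn main() {
    let mut calls = 0;
    let ok = retry_after_failure(
        || {
            calls += 1;
            calls >= 3 // simulated fetch: fails twice, then succeeds
        },
        Duration::from_millis(1),
        5,
    );
    assert!(ok);
    assert_eq!(calls, 3);
    println!("succeeded after {calls} attempts");
}
```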

@jbearer jbearer merged commit 4f4efba into main Nov 18, 2024
18 checks passed
@jbearer jbearer deleted the jb/proposal-fetching-worker branch November 18, 2024 19:16
Contributor

Backport failed for release-thehounds, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

```shell
git fetch origin release-thehounds
git worktree add -d .worktree/backport-2291-to-release-thehounds origin/release-thehounds
cd .worktree/backport-2291-to-release-thehounds
git switch --create backport-2291-to-release-thehounds
git cherry-pick -x 4f4efbaee341f679fa33bf676e3b0f90904ce295
```

3 participants