Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is memoization a plausible option for pulsers? #4636

Open
nfrisby opened this issue Sep 17, 2024 · 2 comments
Open

Is memoization a plausible option for pulsers? #4636

nfrisby opened this issue Sep 17, 2024 · 2 comments
Assignees
Labels
🤝 consensus Tickets that the fine folks over at ouroboros-consensus want to keep track of performance

Comments

@nfrisby
Copy link
Contributor

nfrisby commented Sep 17, 2024

The "obvious" approach to Issue IntersectMBO/ouroboros-consensus#1255 is for Consensus to call the tick function more often, instead of only when blocks arrive. There are some remaining subtleties (eg which state to forecast from in the ChainSync client and minting logic), but the main idea is that the ledger interface wouldn't need to change --- Consensus should just call it more often.

However, Consensus has "over-specified" its types somewhat, such that ledger states must evolve according to a strict tick -> apply -> tick -> apply -> ... schema. So we'd have to re-architect things a bit/dilute those types.

Also, various threads read the ledger state at various times, "on demand". For example, the ChainSync client doesn't necessarily use the latest ledger state when forecasting. If the cost of forecasting is improved by incremental ticking of the ledger state you're forecasting from, then the ChainSync clients would need to be using ledger states that have been incrementally ticked as the clock advances.

There are more trade-offs to consider here, since the current ChainSync client is not necessarily forecasting from the tip of its selection, eg if the peer has sent headers that branch off from a previous block. (Moreover, we anticipate changing it to forecast from the tip of the immutable selection, ie k blocks back. So maybe we'd need to also be ticking that ledger state?) The minting logic, on the other hand, does do its forecasts from the tip of the node's selection --- that's probably the one everyone immediately imagines, and it wouldn't be too bad to handle that as a special case.

Let me try to sketch the challenge another way. Suppose we do change Consensus so that it's constantly ticking and updating the ledger state at the tip of its selection, as the wall clock changes. This would improve latency in the minting logic (aka "the leadership check"), for example. If the next block extends our selection, then it would also improve latency in validating that next block. But suppose that instead the next block is a fork (even just one block deep). Then all of the ticking we did of our selection was for not: we weren't ticking the one-older ledger state, which is the one we need to use to validate the new block(s).


This Issue is to ask for the Ledger Team to consider an alternative approach to the incremental ticking that would avoid the above complaints/concerns on the Consensus side. I'll sketch the idea, and then you can shoot it down :D

  • When creating a pulser, also create an (unforced) stream of thunks that represent the pieces of work to be done.
  • The pulser merely chooses how many thunks to force.

The desired upshot is this: all of the existing Consensus code would not need to change. Consensus would merely add a new thread that wakes up every second and ticks the ledger state at the tip of the node's selection to the slot of the current wall clock. That would force the thunks, and then classic memoization would ensure that anytime the other threads tick and/or forecast any (recent) ledger state, it'll be fast since those thunks will have already been evaluated.


To summarize:

Q1) Will repeatedly forecasting (at most 3k/f slots) from some ledger state ever pay costs that would have been eliminated by incrementally ticking that ledger state before forecasting from it?

Q2) Do you have any openness to relying on memoization in the pulsers, so that repeatedly ticking/forecasting from some ledger state will only pay the pulser costs once?

Q3) Please don't ignore the short fork scenario. (This is a re-quest-ion, I suppose 😁.)

Let's convene to see where our options are at after you've replied to the above questions.

@nfrisby nfrisby added 🤝 consensus Tickets that the fine folks over at ouroboros-consensus want to keep track of performance labels Sep 17, 2024
@nfrisby
Copy link
Contributor Author

nfrisby commented Sep 17, 2024

I'll assign to @lehins for now, but delegation is an option, of course.

@nfrisby
Copy link
Contributor Author

nfrisby commented Dec 18, 2024

Just had a nice chat with Alexey, including this Issue. Summary:

  • We both think it's appealing and that it seems plausible. Would require some reworking of today's pulsers.

  • For future reference: I had a misconception about how today's pulsers are scheduled. I had thought it was "by slot X, (X/deadline) of the overall work must have been completed". Instead it's "each time the TICK rule is run, we do the next 1.25/(expected number of blocks in epoch) portion of work". The latter is informed by the fact that the Consensus layer calls TICK once per block (on a single extending chain), and that 5% of the slots in an epoch are expected to have a block in them (though the Ledger assumes only 80% of them will succeed). Ultimately, the difference is whether a "missed" block leads to a extra work on the next block or whether all missed work is delayed until the final deadline.

  • My misunderstanding might be apparent in the Description of this Issue, but I think it's just a matter of mutatis mutandi --- no real obstruction.

But the actual scheduling mechanics do deserve some more thought, which might benefit the final design. EG Alexey shared the nice idea of sparking the pulser work, so that it doesn't slowdown the adoption/propagation of the block that incurred it (crucially, sparks don't get scheduled immediately). My original motivation for the memoization was somewhat orthogonal to that: I was thinking about utilizing the idle time between two blocks, for example. But -- with memoization --- if a block arrived later than its slot, its validation wouldn't force any unevaluated thunks. However, it might also arrive while the node is opportunistically forcing the next pulser chunk, which could delay the block's validation. 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤝 consensus Tickets that the fine folks over at ouroboros-consensus want to keep track of performance
Projects
None yet
Development

No branches or pull requests

2 participants