UTxO-HD targeting main #1267

Open
wants to merge 13 commits into
base: main

Conversation

jasagredo
Contributor

@jasagredo jasagredo commented Sep 26, 2024

Description

The UTxO-HD changes span ouroboros-consensus, ouroboros-consensus-diffusion and ouroboros-consensus-cardano. The core change is:

  • The UTxO set is extracted from the LedgerState in the form of LedgerTables.
  • These tables are stored in the LedgerDB, which can keep them in memory or on disk.
  • When performing an action that requires UTxOs, we have to ask the LedgerDB for those. This might perform IO.

Here I will explain how I would review this enormous PR. Instead of listing files I will describe concepts, and my suggestion is to go look at the mentioned files (or search for the concepts) then mark the file as viewed to offload it from the brain.

The ledger tables

  • The first step would be to understand the concept of LedgerTables, see Ouroboros.Consensus.Ledger.Tables.* modules. The LedgerTables are parametrized by l (in the end it will be by blk) and by mk (or MapKinds). MapKinds are just types parametrized by the Key and Value of l. These will be TxIn|TxOut for unitary blocks and CanonicalTxIn|HardForkTxOut for hard fork blocks.
  • LedgerTables are barbies-like, see Ouroboros.Consensus.Ledger.Tables.Combinators.
  • LedgerTables are (most commonly) empty (EmptyMK), a (possibly restricted) UTxO set (ValuesMK), a set of TxIns (KeysMK), a sequence of differences (DiffMK) or a combination of values + diffs (TrackingMK). The only non-obvious one is DiffMK which is a map of sequences of changes to a value (in the UTxO case values don't change, they are created and destroyed, so there will be at most 2 elements there). On top of that there is a DiffSeqMK which is a fingertree of differences. Only used in V1 (see below).
  • The LedgerState is itself parametrized by this same mk. The data instances will then make use of that mk to define tables associated with the block. So the Byron ledger state ignores it, the Shelley ledger state has a new field with the tables and the hard fork ledger state will propagate the mk through the telescope, therefore having an mk of the particular state in the Telescope.
  • The LedgerTables can live on their own, which for unitary blocks doesn't make a difference, but for the Cardano Block, we go from an mk passed to the Telescope (therefore tables at the tip of the Telescope) to CardanoLedgerTables, in which each value is a HardForkTxOut. This cost is non-trivial and we only want to pay it when applying a new block/transaction.
  • LedgerTables can be extracted and injected into the ledger state via (un)stowLedgerTables.
  • The ledger tables of the Extended ledger state are the same as the ones from the LedgerState.
  • A very important bit that maybe was not clear above is that the HardForkBlock has no canonical tables because our definitions are not compositional for the HF block, only the CardanoBlock has "hard fork tables". See the constraints of HasHardForkLedgerTables.
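As a rough illustration of the "parametrized by mk" idea, here is a minimal sketch with toy types. It is not the real API: `Tables`, `mapTables`, `forgetValues`, `forgetAll` and `restrictTables` are hypothetical names, the key and value types are fixed to `Int` and `String`, and the map kinds are simplified stand-ins for the ones in Ouroboros.Consensus.Ledger.Tables.*.

```haskell
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RankNTypes     #-}

import           Data.Kind (Type)
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Data.Set (Set)
import qualified Data.Set as Set

-- Toy key and value types; in the PR these are derived from the block type.
type TxIn  = Int
type TxOut = String

-- Simplified stand-ins for three of the map kinds (not the real definitions).
data    EmptyMK  k v = EmptyMK            deriving (Eq, Show)
newtype KeysMK   k v = KeysMK (Set k)     deriving (Eq, Show)
newtype ValuesMK k v = ValuesMK (Map k v) deriving (Eq, Show)

-- A single table whose contents are chosen by the map kind 'mk'.
newtype Tables (mk :: Type -> Type -> Type) = Tables (mk TxIn TxOut)

-- Barbies-style map: change the map kind with a natural transformation.
mapTables :: (forall k v. mk1 k v -> mk2 k v) -> Tables mk1 -> Tables mk2
mapTables f (Tables t) = Tables (f t)

-- Keep only the keys of a set of values.
forgetValues :: ValuesMK k v -> KeysMK k v
forgetValues (ValuesMK m) = KeysMK (Map.keysSet m)

-- Drop the table contents entirely, whatever the map kind was.
forgetAll :: mk k v -> EmptyMK k v
forgetAll _ = EmptyMK

-- Restrict a UTxO set to the given keys.
restrictTables :: Tables KeysMK -> Tables ValuesMK -> Tables ValuesMK
restrictTables (Tables (KeysMK ks)) (Tables (ValuesMK m)) =
  Tables (ValuesMK (Map.restrictKeys m ks))

keysOf :: Tables KeysMK -> Set TxIn
keysOf (Tables (KeysMK ks)) = ks

valuesOf :: Tables ValuesMK -> Map TxIn TxOut
valuesOf (Tables (ValuesMK m)) = m

exampleUtxo :: Tables ValuesMK
exampleUtxo = Tables (ValuesMK (Map.fromList [(1, "a"), (2, "b"), (3, "c")]))

exampleKeys :: Tables KeysMK
exampleKeys = Tables (KeysMK (Set.fromList [1, 3]))
```

Loaded in GHCi, `keysOf (mapTables forgetValues exampleUtxo)` and `valuesOf (restrictTables exampleKeys exampleUtxo)` show how the same `Tables` wrapper carries different contents depending on the map kind.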

Applying and ticking (Ouroboros.Consensus.Ledger.Abstract/Basics)

When ticking a block, some differences might be created, and no values are needed. So the types go from l EmptyMK to Ticked1 l DiffMK. This happens in at least two cases: when going from Byron to Shelley (all values are created here) and when going from Shelley to Allegra (AVVM addresses are deleted). See the relevant functions: translateLedgerStateByronToShelley and translateLedgerStateShelleyToAllegra.

When applying a block, we get the inputs needed (getBlockKeySets then read those from the LedgerDB), tick the ledger state without tables (possibly creating diffs), apply those diffs on the values from the LedgerDB, then call the ledger rules. We then diff the input and output tables to get a set of differences from applying a block, to which we will prepend the ones from ticking. See applyBlockResult and the Shelley functions for applying blocks.
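The block application flow described above can be sketched with plain maps. This is a toy model under stated assumptions, not the code in applyBlockResult: `Delta`, `applyDiff`, `diffValues`, `Block` and `applyBlockSketch` are hypothetical names, the "ledger rules" are reduced to removing consumed inputs and adding produced outputs, and diffs are combined per key with the later diff winning.

```haskell
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Data.Set (Set)
import qualified Data.Set as Set

type TxIn   = Int
type TxOut  = String
type Values = Map TxIn TxOut

-- A UTxO entry is either created or destroyed, never modified.
data Delta v = Insert v | Delete deriving (Eq, Show)
type Diff = Map TxIn (Delta TxOut)

-- Apply a diff on top of a set of values.
applyDiff :: Values -> Diff -> Values
applyDiff = Map.foldlWithKey' step
  where
    step vs k (Insert v) = Map.insert k v vs
    step vs k Delete     = Map.delete k vs

-- Diff two value sets, assuming values under the same key never change.
diffValues :: Values -> Values -> Diff
diffValues before after =
  Map.union (Map.map Insert (after `Map.difference` before))
            (Map.map (const Delete) (before `Map.difference` after))

-- Toy block: the inputs it consumes and the outputs it produces.
data Block = Block { consumed :: Set TxIn, produced :: Values }

applyBlockSketch
  :: (Set TxIn -> Values)  -- read the needed keys from the "LedgerDB"
  -> Diff                  -- differences created by ticking
  -> Block
  -> Diff                  -- tick differences followed by block differences
applyBlockSketch readKeys tickDiff blk =
  let inputs    = readKeys (consumed blk)          -- values the block needs
      ticked    = applyDiff inputs tickDiff        -- tick diffs on top of them
      outputs   = Map.union (produced blk)         -- stand-in for the ledger rules
                            (ticked `Map.withoutKeys` consumed blk)
      blockDiff = diffValues ticked outputs        -- diff input and output tables
  in  Map.union blockDiff tickDiff                 -- later (block) diff wins per key

exampleBlock :: Block
exampleBlock = Block (Set.fromList [1]) (Map.fromList [(3, "c")])
```

The key point is that the ledger rules only ever see the small set of values the block touches, and the result leaving this function is a diff rather than a full UTxO set.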

The story with transactions is pretty similar.

The LedgerDB versions (Ouroboros.Consensus.Storage.LedgerDB)

There are two flavors of the LedgerDB, each one having two implementations:

  • V1 (Ouroboros.Consensus.Storage.LedgerDB.V1): we keep a sequence of EmptyMK ledger states and dump the values into a BackingStore. We can get back values from the backing store at any ledger state, by opening a BackingStoreValueHandle and reading from it. The BackingStore consists of a "complete" UTxO set at some anchor and then a sequence of differences. To get values at a given point we have to read the anchor, then reapply the differences up to the desired point. This is "wasteful" if done in memory (why keep diffs and have to reapply them every time if we can just apply them in place?) but it is useful for the on-disk implementation, which puts the "complete" UTxO set on the disk, offloading it from memory. There are two implementations:
    • OnDisk: It uses LMDB underneath. See the Ouroboros.Consensus.Storage.LedgerDB.V1.BackingStore.Impl.LMDB.* modules.
    • InMemory: Not intended for real use. As mentioned above it is wasteful. It serves as a reference impl for the OnDisk implementation.
  • V2 (Ouroboros.Consensus.Storage.LedgerDB.V2): We keep a sequence of StateRefs, which are EmptyMK ledger states together with a tables handle from which we can read values monadically. This is very similar to the previous LedgerDB, in which we kept a sequence of (complete) LedgerStates. There are two implementations:
    • InMemory
    • LSM: still a WIP
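To make the V1 read path concrete, here is a minimal sketch of "anchor plus a sequence of differences". It is a pure toy model, not the BackingStore API: `ToyBackingStore`, `readAt` and `flush` are hypothetical names, and a list stands in for the fingertree of differences.

```haskell
import           Data.List (foldl')
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Data.Set (Set)
import qualified Data.Set as Set

type TxIn   = Int
type TxOut  = String
type Values = Map TxIn TxOut

data Delta v = Insert v | Delete deriving (Eq, Show)
type Diff = Map TxIn (Delta TxOut)

applyDiff :: Values -> Diff -> Values
applyDiff = Map.foldlWithKey' step
  where
    step vs k (Insert v) = Map.insert k v vs
    step vs k Delete     = Map.delete k vs

-- A "complete" UTxO set at the anchor plus one diff per later ledger state.
data ToyBackingStore = ToyBackingStore
  { anchor :: Values   -- would live on disk in the OnDisk implementation
  , diffs  :: [Diff]   -- oldest first
  }

-- Read the given keys as of the n-th ledger state after the anchor:
-- read them at the anchor, then reapply the first n differences.
readAt :: Int -> ToyBackingStore -> Set TxIn -> Values
readAt n bs ks =
  let atAnchor = Map.restrictKeys (anchor bs) ks
      relevant = map (`Map.restrictKeys` ks) (take n (diffs bs))
  in  foldl' applyDiff atAnchor relevant

-- Advance the anchor by n states, folding the oldest diffs into it.
flush :: Int -> ToyBackingStore -> ToyBackingStore
flush n bs = ToyBackingStore
  { anchor = foldl' applyDiff (anchor bs) (take n (diffs bs))
  , diffs  = drop n (diffs bs)
  }

exampleStore :: ToyBackingStore
exampleStore = ToyBackingStore
  { anchor = Map.fromList [(1, "a"), (2, "b")]
  , diffs  = [ Map.fromList [(1, Delete), (3, Insert "c")]
             , Map.fromList [(2, Delete)]
             ]
  }

allKeys :: Set TxIn
allKeys = Set.fromList [1, 2, 3]
```

In this model only the restricted anchor read would touch the disk; the differences stay in memory, which is the trade-off the V1 description refers to.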

Evaluating forks

In order to evaluate forks, we created the concept of Forkers, where each LedgerDB implementation has its own concept. They are just an abstract interface that allows querying for values and pushing differences that eventually can be dumped back into the LedgerDB (only by ChainSelection, others use ReadOnlyForkers). Note that they allocate resources so there is some juggling with ResourceRegistries there.
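A forker can be pictured as a record of actions over some shared store. The sketch below is only an illustration with hypothetical names (`ToyForker`, `readValues`, `pushDiff`, `commit`, `newToyForker`); it uses an `IORef` as the "LedgerDB" and leaves out resource registries and closing entirely.

```haskell
import           Data.IORef
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Data.Set (Set)
import qualified Data.Set as Set

type TxIn   = Int
type TxOut  = String
type Values = Map TxIn TxOut

data Delta v = Insert v | Delete deriving (Eq, Show)
type Diff = Map TxIn (Delta TxOut)

applyDiff :: Values -> Diff -> Values
applyDiff = Map.foldlWithKey' step
  where
    step vs k (Insert v) = Map.insert k v vs
    step vs k Delete     = Map.delete k vs

-- Abstract interface: query values, push differences, dump them back.
-- A read-only forker would expose only 'readValues'.
data ToyForker = ToyForker
  { readValues :: Set TxIn -> IO Values
  , pushDiff   :: Diff -> IO ()
  , commit     :: IO ()
  }

newToyForker :: IORef Values -> IO ToyForker
newToyForker store = do
  pending <- newIORef []                       -- pushed diffs, newest first
  let current = do
        base <- readIORef store
        ds   <- readIORef pending
        pure (foldr (flip applyDiff) base ds)  -- oldest diff is applied first
  pure ToyForker
    { readValues = \ks -> (`Map.restrictKeys` ks) <$> current
    , pushDiff   = \d -> modifyIORef' pending (d :)
    , commit     = current >>= writeIORef store >> writeIORef pending []
    }

exampleDiff :: Diff
exampleDiff = Map.fromList [(1, Delete), (2, Insert "b")]

exampleKeys :: Set TxIn
exampleKeys = Set.fromList [1, 2]
```

Pushed differences are visible through the forker immediately but reach the shared store only on `commit`, which mirrors the idea that a candidate fork is evaluated on the side before chain selection adopts it.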

Ledger queries (Ouroboros.Consensus.Ledger.Query)

Some queries will have to look at the UTxO set, in particular GetUtxoByAddress, GetUtxoWhole and GetUtxoByTxIn. We categorize them by means of QueryFootprint. We will process each one of them differently.

Other queries use QFNoTables; GetUtxoByTxIn uses QFLookupTables and will have to read a single value from the tables; GetUtxoWhole and GetUtxoByAddress use QFTraverseTables, as they will have to scan the whole UTxO set.

For the HardForkBlock there is another class Ouroboros.Consensus.HardFork.Combinator.Ledger.Query.BlockSupportsHFLedgerQuery which has faster implementations than projecting the tables into the particular tip of the Telescope, because we can usually judge whether we want the result without upgrading the TxOut to the latest era.

In essence, queries are now monadic. Queries that don't look at the UTxO set are artificially monadic (just a pure of the already existing logic).
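The footprint classification can be sketched as a type-level tag on queries. The constructor names below follow the ones mentioned above, but everything else is an assumption made for illustration: the toy `Query` type, the hypothetical `GetTipSlot` query, the `TablesHandle` record, and the `answer` function are not the real definitions in Ouroboros.Consensus.Ledger.Query.

```haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE GADTs          #-}
{-# LANGUAGE KindSignatures #-}

import           Data.Functor.Identity (Identity (..))
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Data.Set (Set)
import qualified Data.Set as Set

type TxIn   = Int
type Addr   = String
type TxOut  = (Addr, Int)          -- toy output: owner address and amount
type Values = Map TxIn TxOut

data QueryFootprint = QFNoTables | QFLookupTables | QFTraverseTables

-- Toy queries, tagged with how much of the tables they need.
data Query (fp :: QueryFootprint) result where
  GetTipSlot       :: Query 'QFNoTables Int            -- hypothetical table-free query
  GetUtxoByTxIn    :: Set TxIn -> Query 'QFLookupTables Values
  GetUtxoByAddress :: Addr -> Query 'QFTraverseTables Values
  GetUtxoWhole     :: Query 'QFTraverseTables Values

-- Monadic access to the tables (this might perform IO in a real LedgerDB).
data TablesHandle m = TablesHandle
  { readKeys :: Set TxIn -> m Values
  , readAll  :: m Values
  }

-- Every query is answered monadically; table-free ones are just 'pure'.
answer :: Monad m => Int -> TablesHandle m -> Query fp r -> m r
answer tip _ GetTipSlot           = pure tip
answer _   h (GetUtxoByTxIn ks)   = readKeys h ks
answer _   h (GetUtxoByAddress a) = Map.filter ((== a) . fst) <$> readAll h
answer _   h GetUtxoWhole         = readAll h

-- In-memory handle for experimenting without IO.
pureHandle :: Values -> TablesHandle Identity
pureHandle vs = TablesHandle
  { readKeys = \ks -> Identity (Map.restrictKeys vs ks)
  , readAll  = Identity vs
  }

exampleUtxo :: Values
exampleUtxo = Map.fromList [(1, ("alice", 5)), (2, ("bob", 7))]

exampleKeys :: Set TxIn
exampleKeys = Set.fromList [2]
```

For example, `runIdentity (answer 42 (pureHandle exampleUtxo) (GetUtxoByTxIn exampleKeys))` reads only the requested key, while the address query has to go through `readAll`.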

The mempool

The mempool in essence will have to acquire (read only) forkers on the LedgerDB at the tip, then read values for the incoming transactions and apply them. The returned diffs are appended to the ones in the mempool, which keeps a TrackingMK with the current values and past diffs.

When revalidating transactions we cannot know if the UTxO set changed so we will have to re-read the values from the (new) forker.

The internal state is now a TMVar because we need to acquire >> read tables >> update where read tables is in IO and the others are in STM.
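The acquire >> read tables >> update pattern can be sketched as follows. This is a simplified model, not the mempool implementation: `ToyMempool`, `Tx`, `applyTx` and `tryAddTx` are hypothetical names, validation is reduced to "all inputs are available", and the sketch restores the state on synchronous exceptions only (it makes no attempt at asynchronous-exception safety).

```haskell
import           Control.Concurrent.STM
import           Control.Exception (onException)
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Data.Set (Set)
import qualified Data.Set as Set

type TxIn   = Int
type TxOut  = String
type Values = Map TxIn TxOut

data Delta v = Insert v | Delete deriving (Eq, Show)
type Diff = Map TxIn (Delta TxOut)

applyDiff :: Values -> Diff -> Values
applyDiff = Map.foldlWithKey' step
  where
    step vs k (Insert v) = Map.insert k v vs
    step vs k Delete     = Map.delete k vs

data Tx = Tx { txInputs :: Set TxIn, txOutputs :: Values }

-- Toy mempool state: accumulated diffs of the accepted transactions.
data ToyMempool = ToyMempool { mpDiff :: Diff, mpTxs :: [Tx] }

-- Pure step: a tx is valid if all its inputs exist once the mempool's
-- own diffs are applied on top of the values read from the ledger DB.
applyTx :: Values -> ToyMempool -> Tx -> Maybe ToyMempool
applyTx fromDb mp tx
  | Map.keysSet available == txInputs tx =
      Just mp { mpDiff = Map.union txDiff (mpDiff mp), mpTxs = mpTxs mp ++ [tx] }
  | otherwise = Nothing
  where
    ins       = txInputs tx
    available = applyDiff (Map.restrictKeys fromDb ins)
                          (Map.restrictKeys (mpDiff mp) ins)
    txDiff    = Map.union (Map.map Insert (txOutputs tx))
                          (Map.fromSet (const Delete) ins)

tryAddTx :: TMVar ToyMempool -> (Set TxIn -> IO Values) -> Tx -> IO Bool
tryAddTx var readTables tx = do
  mp     <- atomically (takeTMVar var)                 -- acquire (STM)
  fromDb <- readTables (txInputs tx)                   -- read tables (IO)
              `onException` atomically (putTMVar var mp)
  case applyTx fromDb mp tx of                         -- update (STM)
    Just mp' -> atomically (putTMVar var mp') >> pure True
    Nothing  -> atomically (putTMVar var mp)  >> pure False

exampleTx :: Tx
exampleTx = Tx (Set.fromList [1]) (Map.fromList [(2, "b")])
```

Taking the `TMVar` makes other writers block while the IO read is in flight, which a plain `TVar` could not express because the read cannot run inside a single STM transaction.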

The snapshots

We now store snapshots in a new format:

  • V1-OnDisk: a copy of the lmdb database and a (Haskell-CBOR) serialization of the LedgerState.
  • V*-InMemory: a (Haskell-CBOR) serialization of the UTxO set and a (Haskell-CBOR) serialization of the LedgerState.

Note that for V2 we can take snapshots of the immutable tip at any time, but for V1 we first have to flush some differences from the BackingStore into the anchor to advance it to the immutable tip.

This is abstracted by either implementation in Ouroboros.Consensus.Storage.LedgerDB.V*...tryTakeSnapshot

The forging loop

The forging loop didn't change much. Each iteration runs with a resource registry (to allocate the forkers). Then we use the forker to provide values for the mempool snapshot acquisition, in case of a revalidation.

Changes in Byron/Shelley/Cardano

The changes here are mostly fulfilling everything that was described above, to make all the types match. There are some specific things which are interesting to look at because they might be non-trivial:

  • Translation functions (with the two examples I already mentioned)
  • The TxIn|TxOut data instances, the LedgerState data instance and the HasLedgerTables instances
  • applyBlock for Shelley. The Cardano one is just the HFC one, which injects the CardanoTables into the tip of the Telescope (here is where we do the costly step, but it usually won't be that costly because the UTxO set for a block is small).
  • The Cardano.Ledger module which defines the CardanoTxIn and CardanoTxOut.

Other changes

The rest of the changes are mainly a matter of following GHC and adjusting the types here and there. Most other code doesn't use tables so an abstract mk or EmptyMK is used to make the kind well-formed.

@jasagredo jasagredo force-pushed the utxo-hd-main branch 2 times, most recently from b2d53b0 to 9395433 Compare October 8, 2024 12:59
@jasagredo jasagredo force-pushed the utxo-hd-main branch 6 times, most recently from eb278d3 to 6b427d2 Compare October 24, 2024 10:22
Contributor Author

@jasagredo jasagredo left a comment

I did a pass over the non-testing libraries.

@jasagredo jasagredo marked this pull request as ready for review October 24, 2024 13:29
@jasagredo jasagredo changed the title WIP: UTxO-HD targeting main UTxO-HD targeting main Oct 24, 2024
Contributor

@nfrisby nfrisby left a comment

This is the result of my first pass on the Ouroboros.Consensus.Ledger.Tables.* modules.

Contributor

@nfrisby nfrisby left a comment

Another round of comments. This is all of the *Hard* files, except for Query.hs.

Contributor

@nfrisby nfrisby left a comment

All *Query*hs files, except:

  • ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/ChainDB/Impl/Query.hs
  • ouroboros-consensus/test/consensus-test/Test/Consensus/MiniProtocol/LocalStateQuery/Server.hs
  • ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Mempool/Query.hs

Contributor

@nfrisby nfrisby left a comment

*LedgerDB* files, except I stopped when I got to the LMDB impl. I'll pick up there tomorrow.

Contributor

@nfrisby nfrisby left a comment

My previous review was the *LedgerDB* files up to but excluding LMDB.

This review picks up there and stops before V2.

Contributor

@nfrisby nfrisby left a comment

This is the LedgerDB*V2 files

Contributor

@nfrisby nfrisby left a comment

These are the LedgerDB files after V2, i.e. the tests.

jasagredo and others added 12 commits December 10, 2024 14:50
This type for storing resources was reinventing the wheel: `quickcheck-dynamic`
already keeps track of resources by storing a `Var` for each action result.

`IOSim` support for tests is also removed. It would be straightforward to revive
`IOSim` support in the future, if necessary.
* Rename `MockState` to `MockMonad`.
* Remove exception handler hoop jumping in `mBSClose` and `mBSVHClose`.
* Tag `ReadAfterWrite` and `RangeReadAfterWrite` only once per action sequence.
* Resolve some TODOs
I had initially decided it was best to replace uses of `ltcollapse` by some new
`ltfoldMap` function, but after some thinking it's best to instead use
`ltcollapse` as is, but removing the use of monoids there, which currently has
no effect.

The reason I think this is the right call is because ledger tables are currently
a single-constructor newtype, and they will be for at least a while. We do not
know what ledger tables will look like when we store more parts of the ledger
state, so let's cross that bridge when we get there.
Labels
Projects
Status: 🏗 In progress
4 participants