Consolidate TxLimits #1175

Merged: 6 commits merged into main from nfrisby/consolidate-txlimits on Sep 4, 2024

Conversation

@nfrisby (Contributor) commented Jul 2, 2024

Closes #1177.

Beyond that Issue, this PR also changes the mempool's definition of "full". The mempool will accept transactions until the tx chosen to enter the mempool next would cause some component of the capacity vector (byte size, ExUnits steps, ExUnits memory, ref script byte size) to exceed its limit. The default capacity is twice the capacity of a block. Such a capacity can admit more txs than could fit into two back-to-back blocks, but no single component (eg ExUnits) of the mempool's contents' total size can exceed what would fit in two blocks.

The mempool capacity has two consequences.

  • It implements latency hiding: txs can spread across the network a little in advance, so that two blocks on a chain whose slots are very close together can still both be full (if enough txs have been submitted).
  • It bounds the potential burst of CPU load necessary to fill an empty mempool, which otherwise could be a DoS vector.
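
For illustration, the notion of "full" described above amounts to a componentwise check before admitting the next tx. This is only a sketch with hypothetical names (the real types are `TxMeasure`, `TxLimits`, and friends), not the PR's actual code:

```haskell
-- Hypothetical simplification of the multi-dimensional capacity vector.
data MeasureV = MeasureV
  { mvBytes          :: !Word  -- byte size
  , mvExUnitSteps    :: !Word  -- ExUnits steps
  , mvExUnitMem      :: !Word  -- ExUnits memory
  , mvRefScriptBytes :: !Word  -- reference script byte size
  }

-- Componentwise addition of two measures.
plusV :: MeasureV -> MeasureV -> MeasureV
plusV (MeasureV a b c d) (MeasureV w x y z) =
  MeasureV (a + w) (b + x) (c + y) (d + z)

-- Every component must stay within its limit.
withinCapacity :: MeasureV -> MeasureV -> Bool
withinCapacity m cap =
     mvBytes m          <= mvBytes cap
  && mvExUnitSteps m    <= mvExUnitSteps cap
  && mvExUnitMem m      <= mvExUnitMem cap
  && mvRefScriptBytes m <= mvRefScriptBytes cap

-- The mempool admits the next tx only if the new total still fits, where the
-- capacity defaults to twice the block's capacity vector.
canAdmit :: MeasureV -> MeasureV -> MeasureV -> Bool
canAdmit capacity current nextTx =
  (current `plusV` nextTx) `withinCapacity` capacity
```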

Also closes #1168 by superseding the "ad-hoc" mempool reference scripts capacity limit.

This PR also moves the logic for choosing which prefix of the mempool to put in the block out of the forging functions and into the NodeKernel forging thread --- the mempool snapshot is the perfect place to do that, since tx measurements (each using the correct ledger state) are already determined.

@nfrisby (Contributor, Author) commented Jul 8, 2024

Phew! I think it's compiling without warnings and I'm optimistic the tests will pass on CI.

But the commit history is a total mess, and some new comments are missing, at the very least.

@fraser-iohk (Contributor) commented:

I also have some questions about what the TxMeasure types should look like, especially in the context of #1178, but they're probably better suited for discussion on a call.

@nfrisby force-pushed the nfrisby/consolidate-txlimits branch 2 times, most recently from 6a9ec88 to fb968f8 on July 10, 2024 14:12
@nfrisby (Contributor, Author) commented Jul 10, 2024

I rebased

@nfrisby force-pushed the nfrisby/consolidate-txlimits branch from fb968f8 to 489476f on July 10, 2024 14:36
@nfrisby (Contributor, Author) commented Jul 10, 2024

The next step for this PR is to merge #1182. Eventually I'll rebase this one on top of that.

@nfrisby force-pushed the nfrisby/consolidate-txlimits branch 3 times, most recently from 282735a to cedd6f7 on July 11, 2024 16:33
@nfrisby (Contributor, Author) commented Jul 11, 2024

I don't know if we want to merge this before the Chang fork, so this remains in Draft --- I think that's the only reason, though. We still need to consider the benchmark numbers below.

  10000 transactions
    setup mempool:             FAIL (0.17s)
      156  μs ±  11 μs, 166 KB allocated, 132 B  copied, 6.0 MB peak memory, 179% more than baseline
      Use -p '$0=="Just adding.10000 transactions.setup mempool"' to rerun this test only.
    setup mempool + benchmark: OK (0.73s)
      23.1 ms ± 1.0 ms,  68 MB allocated, 4.0 MB copied,  11 MB peak memory, 19% more than baseline
    test:                      OK (0.02s)
    txs length:                OK
  20000 transactions
    setup mempool:             FAIL (0.19s)
      330  μs ±  27 μs, 327 KB allocated, 252 B  copied,  13 MB peak memory, 176% more than baseline
      Use -p '$0=="Just adding.20000 transactions.setup mempool"' to rerun this test only.
    setup mempool + benchmark: OK (0.13s)
      46.0 ms ± 4.5 ms, 139 MB allocated, 5.9 MB copied,  15 MB peak memory, 17% more than baseline
    test:                      OK (0.06s)
    txs length:                OK

I'm not surprised the benchmark itself is a bit slower: the arithmetic in the mempool is now routed through dictionaries, since the size (it's just ByteSize in this test) is now a polymorphic field in the finger tree measure instead of a strict monomorphic Word32.

I'm pretty sure the setup is ~3x slower simply because I'm using foldMap at type ByteSize instead of sum $ fmap at the known type Word32 --- the total size calculation is done during the setup. (foldMap' wins some of that back, but not all 🤔.) Moreover, Word32 has more optimized instances; in particular, the Generics-based InstantiatedAt machinery has not been performance tuned. Something to consider, if we see meaningful degradation in the real system.
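
For reference, a minimal sketch of the shapes being compared here, using a hypothetical ByteSize stand-in rather than the real type:

```haskell
import Data.Foldable (foldMap')
import Data.Word (Word32)

-- Hypothetical stand-in for the real ByteSize type; the real type's Monoid
-- instance goes via the Generics-based InstantiatedAt machinery.
newtype ByteSize = ByteSize Word32

instance Semigroup ByteSize where
  ByteSize x <> ByteSize y = ByteSize (x + y)

instance Monoid ByteSize where
  mempty = ByteSize 0

-- Monomorphic: GHC can specialise and unbox the Word32 accumulator.
totalMono :: [Word32] -> Word32
totalMono = sum

-- Polymorphic monoid fold: routed through the Semigroup/Monoid dictionaries,
-- and the lazy foldMap builds thunks along the way.
totalPoly :: [ByteSize] -> ByteSize
totalPoly = foldMap id

-- Strict variant: avoids the thunk build-up but still pays for dictionaries.
totalPoly' :: [ByteSize] -> ByteSize
totalPoly' = foldMap' id
```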

@nfrisby force-pushed the nfrisby/consolidate-txlimits branch 2 times, most recently from 288ccbb to ccfc73b on July 11, 2024 18:35
@nfrisby force-pushed the nfrisby/consolidate-txlimits branch from 87f7f18 to a866dbb on July 12, 2024 23:46
@nfrisby (Contributor, Author) commented Jul 12, 2024

OK. I added a few commits, after chatting with Esgen.

  • consensus-cardano: use ConwayMeasure in Babbage
  • TOSQUASH fixup overflow workarounds and their explanations
  • consensus: use Natural for ByteSize

The last one supersedes the middle one. The last commit's message already asserts that the performance penalty is acceptable, but that claim deserves some due diligence.

I have a relatively simple idea for that: use (a trivial new) db-analyser to emit the (new) TxMeasure for two recent epochs from mainnet. Build a benchmark that simply adds and removes "transactions" that are nothing except one of those recorded measurements (in the same order as observed on mainnet). This can be done at the TxSeq level, and so won't require the menagerie of blk instances.

Such a benchmark will yield a characteristic cost (eg "total runtime / number of txs"), which I'm hoping/expecting will be much smaller than the (comparably averaged) cost of fully validating a tx.
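
A rough sketch of the shape such a benchmark could take, using hypothetical types and a placeholder loader standing in for the db-analyser output (none of this is the PR's actual code):

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
module MeasureReplayBench (main) where

import Data.FingerTree (FingerTree, Measured (..), empty, (|>))
import Data.List (foldl')
import Test.Tasty.Bench (bench, defaultMain, nf)

-- Hypothetical stand-in for the recorded multi-dimensional TxMeasure.
data M = M { mBytes :: !Word, mSteps :: !Word, mMem :: !Word }

instance Semigroup M where
  M a b c <> M x y z = M (a + x) (b + y) (c + z)

instance Monoid M where
  mempty = M 0 0 0

-- Each "transaction" is nothing but its recorded measurement.
newtype FakeTx = FakeTx M

instance Measured M FakeTx where
  measure (FakeTx m) = m

-- Replay the recorded measurements in mainnet order, inserting each one just
-- as the mempool's finger tree would on arrival.
replayInserts :: [M] -> FingerTree M FakeTx
replayInserts = foldl' (\ft m -> ft |> FakeTx m) empty

-- Placeholder: in the real experiment these would be emitted by db-analyser
-- for two recent mainnet epochs.
loadRecordedMeasures :: IO [M]
loadRecordedMeasures = pure []

main :: IO ()
main = do
  recorded <- loadRecordedMeasures
  defaultMain
    [ bench "replay measured inserts" $
        nf (mBytes . measure . replayInserts) recorded
    ]
```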

@nfrisby force-pushed the nfrisby/consolidate-txlimits branch 2 times, most recently from 9ed2a72 to 94de084 on July 12, 2024 23:54
@nfrisby (Contributor, Author) commented Aug 26, 2024

I haven't reorganized the commits into a nice sequence yet, but review can commence --- I will ensure the final diff matches that of the current tip (ie fe778a3).

@nfrisby (Contributor, Author) commented Aug 28, 2024

I'm now attempting to split the monolithic commit. I consider it to have the following components.

  • forging thread no longer splits the given list, see snapshotTake instead

  • refactor the TxLimits class and CanHardFork

  • removed the Babbage mempool ref scripts hotfix

  • remove the block size override BoundedMeasure

  • multi-dimensional tx sizes in the mempool itself, careful with overflow and deadlock DoS, and supplanting the txSize argument

I'm hoping I can tease out a separate commit for at least a few of those.

Edit: I pulled out a couple. In particular, it's nice to have the one that removes Mempool.getTxSize isolated. The final commit is still quite large, but its changes are not easy to disentangle.

@nfrisby force-pushed the nfrisby/consolidate-txlimits branch from f15f603 to a7bd1c5 on August 28, 2024 18:07
@amesgen (Member) left a comment

Excellent 🎉

Only minor comments

@nfrisby force-pushed the nfrisby/consolidate-txlimits branch from a7bd1c5 to 88e8bfd on September 3, 2024 21:49
@nfrisby (Contributor, Author) commented Sep 3, 2024

Some collisions with ouroboros-network (they made SizeInBytes a newtype instead of a type synonym) during the rebase, but easy enough to fix (disclaimer: I haven't run the tests locally).
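
For context, a toy illustration of the kind of collision meant here (the real SizeInBytes lives in ouroboros-network and may differ in detail):

```haskell
import Data.Coerce (coerce)
import Data.Word (Word32)

-- Previously (roughly): type SizeInBytes = Word32
-- Now (roughly):
newtype SizeInBytes = SizeInBytes Word32
  deriving (Eq, Ord, Show)

-- Call sites that used to pass a bare Word32 now need an explicit wrap ...
limit :: SizeInBytes
limit = SizeInBytes 65536

-- ... or a zero-cost coercion.
limit' :: SizeInBytes
limit' = coerce (65536 :: Word32)
```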

@nfrisby enabled auto-merge September 3, 2024 21:50
@nfrisby force-pushed the nfrisby/consolidate-txlimits branch 3 times, most recently from 703c758 to 2f5f06c on September 3, 2024 22:42
@nfrisby disabled auto-merge September 4, 2024 02:31
@nfrisby force-pushed the nfrisby/consolidate-txlimits branch from 2f5f06c to 6ad371d on September 4, 2024 02:32
@nfrisby (Contributor, Author) commented Sep 4, 2024

The mempool-bench is failing.

Just adding
  10000 transactions
    setup mempool:             FAIL (0.25s)
      1.75 μs ± 153 ns, 6.2 KB allocated,   3 B  copied, 5.0 MB peak memory, 231% more than baseline
      Use -p '$0=="Just adding.10000 transactions.setup mempool"' to rerun this test only.
    setup mempool + benchmark: OK (0.37s)
      24.4 ms ± 741 μs,  68 MB allocated, 4.0 MB copied,  11 MB peak memory, 31% more than baseline
    test:                      OK (0.03s)
    cmds length:               OK
  20000 transactions
    setup mempool:             FAIL (0.28s)
      1.80 μs ±  99 ns, 6.4 KB allocated,   6 B  copied,  13 MB peak memory, 239% more than baseline
      Use -p '$0=="Just adding.20000 transactions.setup mempool"' to rerun this test only.
    setup mempool + benchmark: OK (0.15s)
      48.7 ms ± 3.0 ms, 141 MB allocated, 5.9 MB copied,  15 MB peak memory, 25% more than baseline
    test:                      OK (0.06s)
    cmds length:               OK

Joris helpfully shared the numbers from the latest baseline file:

Just adding
  10000 transactions
    setup mempool:             OK (0.16s)
      528  ns ±  42 ns, 2.7 KB allocated,   1 B  copied, 5.0 MB peak memory
    setup mempool + benchmark: OK (0.28s)
      18.5 ms ± 1.1 ms,  57 MB allocated, 3.4 MB copied, 9.0 MB peak memory
    test:                      OK (0.02s)
    cmds length:               OK
  20000 transactions
    setup mempool:             OK (0.19s)
      530  ns ±  46 ns, 2.7 KB allocated,   1 B  copied,  12 MB peak memory
    setup mempool + benchmark: OK (1.22s)
      38.6 ms ± 521 μs, 119 MB allocated, 7.6 MB copied,  24 MB peak memory
    test:                      OK (0.03s)
    cmds length:               OK

The failing case is the setup step, which increases from roughly 530 nanoseconds to roughly 1.8 microseconds.

It is very plausible that the additional calculation in computeMempoolCapacity explains this: it used to be * 2 at the known type Word32 and is now a small amount of arithmetic and a call to stimes involving a couple of dictionaries. It's harmless for the behavior of a real node (setup happens once per execution, and it's still only microseconds).
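
Roughly the shape of the difference being described, as a hypothetical simplification (not the actual computeMempoolCapacity code):

```haskell
import Data.Semigroup (stimes)
import Data.Word (Word32)

-- Before (roughly): double a known monomorphic byte count.
defaultCapacityMono :: Word32 -> Word32
defaultCapacityMono blockBytes = blockBytes * 2

-- After (roughly): combine blockCount copies of a polymorphic measure via its
-- Semigroup dictionary, plus a little surrounding arithmetic.
-- (Assumes blockCount >= 1, as stimes requires a positive count here.)
defaultCapacityPoly :: Semigroup a => Word32 -> a -> a
defaultCapacityPoly blockCount blockMeasure = stimes blockCount blockMeasure
```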

So I'm merging despite these failures --- they are not required "checks" and this benchmark is the only FAIL within the checks.

BoundedMeasure is no longer necessary, as of PR
#1182.
The 'LedgerSupportsMempool.txInBlockSize' method suffices, and is already in
scope within the mempool itself.

Adding sizes to the `MempoolSnapshot` interface replaces the other use of
`getTxSize`, in the `NodeKernel`.
Transaction size, block capacity, and mempool capacity are multi-dimensional
vectors (incl ExUnits, etc), instead of merely bytes: see `TxMeasure`.
`TxLimits` has fittingly been promoted from the `Capacity` module to the proper
`SupportsMempool` module, cannibalizing some methods from the
`LedgerSupportsMempool` class.

A transaction cannot be added if it would push any component of the size over
that component's capacity.

The capacity override is still only specified in byte size, but the value is
interpreted as a block count (rounded up).
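
A minimal sketch of that rounded-up interpretation (hypothetical helper, not the actual code):

```haskell
-- Interpret a byte-size override as a whole number of blocks, rounding up so
-- that any non-zero remainder still buys an entire extra block.
-- (Assumes blockBytes > 0.)
overrideToBlockCount ::
     Word  -- ^ capacity of one block, in bytes
  -> Word  -- ^ requested override, in bytes
  -> Word
overrideToBlockCount blockBytes overrideBytes =
  (overrideBytes + blockBytes - 1) `div` blockBytes
```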

Enforce block capacity _before_ the forging logic.
Now the forging logic simply includes whatever transactions it's given, which is
reasonable and simpler. It's the NodeKernel logic that uses the mempool's
finger tree in order to slice an appropriately-sized prefix, which is then
passed to the now-dumb forging function.
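
A rough sketch of that slicing step (hypothetical types; the real code works on the mempool's TxSeq and TxMeasure): the measured finger tree lets the NodeKernel take the largest prefix whose accumulated measure still fits within the block capacity.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}

import Data.FingerTree (FingerTree, Measured (..), split)

-- Hypothetical two-component measure; the real TxMeasure has more dimensions.
data M = M { mBytes :: !Word, mExSteps :: !Word }

instance Semigroup M where
  M a b <> M x y = M (a + x) (b + y)

instance Monoid M where
  mempty = M 0 0

-- Componentwise comparison: every dimension must fit.
fitsWithin :: M -> M -> Bool
fitsWithin (M a b) (M x y) = a <= x && b <= y

newtype Tx = Tx M

instance Measured M Tx where
  measure (Tx m) = m

-- Largest prefix whose accumulated measure still fits in the capacity; the
-- forging function then simply includes everything it is handed.
takeFittingPrefix :: M -> FingerTree M Tx -> FingerTree M Tx
takeFittingPrefix capacity txs =
  fst (split (\acc -> not (acc `fitsWithin` capacity)) txs)
```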

Explicit attention is given to overflow and the DoS vector of providing an
erroneously massive tx that would block the mempool's growth, were it not for
the new guards.

Also, anachronistically use ConwayMeasure in Babbage, since it should have been
counting ref script bytes, as Conway now does.

Many comments are improved and also updated for the new scheme.
@nfrisby force-pushed the nfrisby/consolidate-txlimits branch from 6ad371d to 9c022f3 on September 4, 2024 13:40
@nfrisby enabled auto-merge September 4, 2024 13:40
@nfrisby added this pull request to the merge queue Sep 4, 2024
Merged via the queue into main with commit d6e35dc Sep 4, 2024
12 of 15 checks passed
@nfrisby deleted the nfrisby/consolidate-txlimits branch September 4, 2024 15:26
github-merge-queue bot pushed a commit that referenced this pull request Sep 24, 2024
Closes #1226

In addition to the previously valid/invalid txs (purely based on the
UTxO ledger rules), we add an optional per-tx limit to the mock block.

As a second step, we generate very large txs that are larger than an
entire mempool, in order to test that we do *not* block when adding them
(just like the other txs), which is important as explained in #1226.

---

One way to validate this PR is to introduce a bug that would cause us to
block on such transactions, and observe that the test now indeed catches
that.

For example, retrying when the per-tx limit is not satisfied (this is
basically what happened in #1168 and was fixed by #1225) via
```diff
diff --git a/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Mempool/Update.hs b/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Mempool/Update.hs
index 372ea15c29..31abe25fbf 100644
--- a/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Mempool/Update.hs
+++ b/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Mempool/Update.hs
@@ -170,25 +170,8 @@ pureTryAddTx ::
   -> TryAddTx blk
 pureTryAddTx cfg wti tx is =
   case runExcept $ txMeasure cfg (isLedgerState is) tx of
-    Left err ->
-      -- The transaction does not have a valid measure (eg its ExUnits is
-      -- greater than what this ledger state allows for a single transaction).
-      --
-      -- It might seem simpler to remove the failure case from 'txMeasure' and
-      -- simply fully validate the tx before determining whether it'd fit in
-      -- the mempool; that way we could reject invalid txs ASAP. However, for a
-      -- valid tx, we'd pay that validation cost every time the node's
-      -- selection changed, even if the tx wouldn't fit. So it'd very much be
-      -- as if the mempool were effectively over capacity! What's worse, each
-      -- attempt would not be using 'extendVRPrevApplied'.
-      TryAddTx
-        Nothing
-        (MempoolTxRejected tx err)
-        (TraceMempoolRejectedTx
-         tx
-         err
-         (isMempoolSize is)
-        )
+    Left _err ->
+      NotEnoughSpaceLeft
     Right txsz
       -- Check for overflow
       --
```
or here
```diff
diff --git a/ouroboros-consensus/src/unstable-mock-block/Ouroboros/Consensus/Mock/Ledger/Block.hs b/ouroboros-consensus/src/unstable-mock-block/Ouroboros/Consensus/Mock/Ledger/Block.hs
index 743e11bc61..e01d8cfe5a 100644
--- a/ouroboros-consensus/src/unstable-mock-block/Ouroboros/Consensus/Mock/Ledger/Block.hs
+++ b/ouroboros-consensus/src/unstable-mock-block/Ouroboros/Consensus/Mock/Ledger/Block.hs
@@ -452,10 +452,7 @@ instance TxLimits (SimpleBlock c ext) where
   -- But not 'maxbound'!, since the mempool sometimes holds multiple blocks worth.
   blockCapacityTxMeasure _cfg _st = IgnoringOverflow simpleBlockCapacity

-  txMeasure cfg _st =
-        fmap IgnoringOverflow
-      . checkTxSize (simpleLedgerMockConfig cfg)
-      . simpleGenTx
+  txMeasure _cfg _st = pure . IgnoringOverflow . txSize . simpleGenTx

 simpleBlockCapacity :: ByteSize32
 simpleBlockCapacity = ByteSize32 512
```
will cause many test failures with `FailureDeadlock [Labelled (ThreadId []) (Just "main")]` via `io-sim`'s nice deadlock detection.

---

Stacked on top of #1175 to avoid boring rebase work