Merge bitcoin#30126: cluster mempool: cluster linearization algorithm

647fa37 bench: add cluster linearization improvement benchmark (Pieter Wuille)
2854979 clusterlin: permit passing in existing linearization to Linearize (Pieter Wuille)
97d9871 clusterlin: add LinearizationChunking class (Pieter Wuille)
d5918dc clusterlin: randomize the SearchCandidateFinder search order (Pieter Wuille)
991ff9a clusterlin: use bounded BFS exploration (optimization) (Pieter Wuille)
d9b235e bench: Candidate finding and linearization benchmarks (Pieter Wuille)
46aad9b clusterlin: add Linearize function (Pieter Wuille)
ee0ddfe clusterlin: add chunking algorithm (Pieter Wuille)
2a41f15 clusterlin: add SearchCandidateFinder class (Pieter Wuille)
4828079 clusterlin: add AncestorCandidateFinder class (Pieter Wuille)
58f7e01 tests: framework for testing DepGraph class (Pieter Wuille)
a6e07e7 clusterlin: introduce cluster_linearize.h with Cluster and DepGraph types (Pieter Wuille)

Pull request description:

  Part of cluster mempool: bitcoin#30289

  This introduces low-level cluster linearization code, including tests and some benchmarks. It is currently not hooked up to anything.

  Ultimately, what this PR adds is a function `Linearize` which operates on instances of `DepGraph` (instances of which represent pre-processed transaction clusters) to produce and/or improve linearizations for that cluster.

  To provide assurance, the code heavily relies on fuzz tests. A novel approach is used here, where the fuzz input is parsed using the serialization.h framework rather than `FuzzedDataProvider`, with a custom serializer/deserializer for `DepGraph` objects. By including serialization, it's possible to ascertain that the format can represent every relevant cluster, as well as potentially permitting the construction of ad-hoc fuzz inputs from clusters (not included in this PR, but used during development).

  ---

  The `Linearize(depgraph, iteration_limit, rng_seed, old_linearization)` function is an implementation of the (single) LIMO algorithm, with the $S$ in every iteration found as the best out of (a) the best remaining ancestor set and (b) randomized computationally-bounded search. It incrementally builds up a linearization by finding good topologically-valid subsets to move to the front, in such a way that the resulting linearization has a diagram that is at least as good as the `old_linearization` passed in (if any).

  * Despite using both best ancestor set and search, this is not Double LIMO, as no intersections between these are involved; just the best of the two.
  * The `iteration_limit` and `rng_seed` only control the (b) randomized search. Even with 0 iterations, the result will be as good as the old linearization, and the included sets at every point will have a feerate at least as high as the best remaining ancestor set at that point.

  The search algorithm used in the (b) step is very basic, and largely matches Section 2.1 of [How to Linearize your Cluster](https://delvingbitcoin.org/t/how-to-linearize-your-cluster/303#h-21-searching-6). See bitcoin#30286 for optimizations to make it more efficient.

  For background and references, see [Introduction to cluster linearization](https://delvingbitcoin.org/t/introduction-to-cluster-linearization/1032).

ACKs for top commit:
  instagibbs:
    reACK 647fa37
  glozow:
    reACK 647fa37, both code and mermaid diagram look correct to me
  sdaftuar:
    ACK 647fa37

Tree-SHA512: 52c8aa3d1d91190bf1265a947d2712e9d12f745313ffceef6ae7e3ff517d01d8b3b9b4ce6066298d59751c4ba90555a3c0171229868ba50100f588a2aa6a486d
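Option (a) in each LIMO iteration, the best remaining ancestor set, is the same choice a plain ancestor sort makes repeatedly. The following is a minimal sketch under stated assumptions, not the PR's `AncestorCandidateFinder`: it uses a hypothetical `Tx` struct with a 64-bit ancestor bitmask in place of `DepGraph`, assumes transaction indices are already in a topological order, and recomputes every ancestor-set feerate from scratch each round instead of updating them incrementally.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical transaction record for this sketch only (not DepGraph).
struct Tx {
    int64_t fee;
    int32_t size;        // assumed > 0
    uint64_t ancestors;  // bitmask of ancestor indices; bit i is always treated as set for tx i
};

// Ancestor sort: repeatedly pick the remaining transaction whose ancestor set
// (restricted to remaining transactions) has the highest feerate, and move that
// whole set to the front. Assumes at most 64 transactions and that index order
// is topological (every ancestor has a lower index than its descendants).
std::vector<int> AncestorSort(const std::vector<Tx>& txs)
{
    const int n = static_cast<int>(txs.size());
    uint64_t remaining = (n == 64) ? ~uint64_t{0} : ((uint64_t{1} << n) - 1);
    std::vector<int> order;
    while (remaining != 0) {
        int best = -1;
        int64_t best_fee = 0, best_size = 1;
        uint64_t best_set = 0;
        for (int i = 0; i < n; ++i) {
            if (((remaining >> i) & 1) == 0) continue;
            // Remaining ancestors of tx i, including tx i itself.
            const uint64_t set = (txs[i].ancestors | (uint64_t{1} << i)) & remaining;
            int64_t fee = 0, size = 0;
            for (int j = 0; j < n; ++j) {
                if ((set >> j) & 1) {
                    fee += txs[j].fee;
                    size += txs[j].size;
                }
            }
            // Compare fee/size > best_fee/best_size without division.
            if (best == -1 || fee * best_size > best_fee * size) {
                best = i;
                best_fee = fee;
                best_size = size;
                best_set = set;
            }
        }
        // Append the chosen set in index order, which is topological by assumption.
        for (int j = 0; j < n; ++j) {
            if ((best_set >> j) & 1) order.push_back(j);
        }
        remaining &= ~best_set;
    }
    return order;
}
```

For example, with a parent of fee 1, its child of fee 5 and an unrelated transaction of fee 4 (all of size 1), the unrelated transaction (feerate 4) is placed before the parent/child ancestor set (combined feerate 6/2 = 3).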
AreaLayer · Jul 26, 2024 · 37bd70a
2 parents ec700f0 + 647fa37

Showing 9 changed files with 2,142 additions and 0 deletions.
diff --git a/src/Makefile.am b/src/Makefile.am
@@ -132,6 +132,7 @@ BITCOIN_CORE_H = \
   chainparamsseeds.h \
   checkqueue.h \
   clientversion.h \
+  cluster_linearize.h \
   coins.h \
   common/args.h \
   common/bloom.h \

diff --git a/src/Makefile.bench.include b/src/Makefile.bench.include
@@ -25,6 +25,7 @@ bench_bench_bitcoin_SOURCES = \
   bench/checkblock.cpp \
   bench/checkblockindex.cpp \
   bench/checkqueue.cpp \
+  bench/cluster_linearize.cpp \
   bench/crypto_hash.cpp \
   bench/data.cpp \
   bench/data.h \

diff --git a/src/Makefile.test.include b/src/Makefile.test.include
@@ -83,6 +83,7 @@ BITCOIN_TESTS =\
   test/bloom_tests.cpp \
   test/bswap_tests.cpp \
   test/checkqueue_tests.cpp \
+  test/cluster_linearize_tests.cpp \
   test/coins_tests.cpp \
   test/coinstatsindex_tests.cpp \
   test/common_url_tests.cpp \
@@ -302,6 +303,7 @@ test_fuzz_fuzz_SOURCES = \
  test/fuzz/buffered_file.cpp \
  test/fuzz/chain.cpp \
  test/fuzz/checkqueue.cpp \
+ test/fuzz/cluster_linearize.cpp \
  test/fuzz/coins_view.cpp \
  test/fuzz/coinscache_sim.cpp \
  test/fuzz/connman.cpp \

diff --git a/src/Makefile.test_util.include b/src/Makefile.test_util.include
@@ -10,6 +10,7 @@ EXTRA_LIBRARIES += \
 TEST_UTIL_H = \
   test/util/blockfilter.h \
   test/util/chainstate.h \
+  test/util/cluster_linearize.h \
   test/util/coins.h \
   test/util/index.h \
   test/util/json.h \

diff --git a/src/bench/cluster_linearize.cpp b/src/bench/cluster_linearize.cpp
@@ -0,0 +1,214 @@
+// Copyright (c) The Bitcoin Core developers
+// Distributed under the MIT software license, see the accompanying
+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
+
+#include <bench/bench.h>
+
+#include <util/bitset.h>
+#include <cluster_linearize.h>
+
+using namespace cluster_linearize;
+
+namespace {
+
+/** Construct a linear graph. These are pessimal for AncestorCandidateFinder, as they maximize
+ *  the number of ancestor set feerate updates. The best ancestor set is always the topmost
+ *  remaining transaction, whose removal requires updating all remaining transactions' ancestor
+ *  set feerates. */
+template<typename SetType>
+DepGraph<SetType> MakeLinearGraph(ClusterIndex ntx)
+{
+    DepGraph<SetType> depgraph;
+    for (ClusterIndex i = 0; i < ntx; ++i) {
+        depgraph.AddTransaction({-int32_t(i), 1});
+        if (i > 0) depgraph.AddDependency(i - 1, i);
+    }
+    return depgraph;
+}
+
+/** Construct a wide graph (one root, with N-1 children that are otherwise unrelated, with
+ *  increasing feerates). These graphs are pessimal for the LIMO step in Linearize, because
+ *  rechunking is needed after every candidate (the last transaction gets picked every time).
+ */
+template<typename SetType>
+DepGraph<SetType> MakeWideGraph(ClusterIndex ntx)
+{
+    DepGraph<SetType> depgraph;
+    for (ClusterIndex i = 0; i < ntx; ++i) {
+        depgraph.AddTransaction({int32_t(i) + 1, 1});
+        if (i > 0) depgraph.AddDependency(0, i);
+    }
+    return depgraph;
+}
+
+// Construct a difficult graph. These need at least sqrt(2^(n-1)) iterations in the best
+// known algorithms (purely empirically determined).
+template<typename SetType>
+DepGraph<SetType> MakeHardGraph(ClusterIndex ntx)
+{
+    DepGraph<SetType> depgraph;
+    for (ClusterIndex i = 0; i < ntx; ++i) {
+        if (ntx & 1) {
+            // Odd cluster size.
+            //
+            // Mermaid diagram code for the resulting cluster for 11 transactions:
+            // ```mermaid
+            // graph BT
+            // T0["T0: 1/2"];T1["T1: 14/2"];T2["T2: 6/1"];T3["T3: 5/1"];T4["T4: 7/1"];
+            // T5["T5: 5/1"];T6["T6: 7/1"];T7["T7: 5/1"];T8["T8: 7/1"];T9["T9: 5/1"];
+            // T10["T10: 7/1"];
+            // T1-->T0;T1-->T2;T3-->T2;T4-->T3;T4-->T5;T6-->T5;T4-->T7;T8-->T7;T4-->T9;T10-->T9;
+            // ```
+            if (i == 0) {
+                depgraph.AddTransaction({1, 2});
+            } else if (i == 1) {
+                depgraph.AddTransaction({14, 2});
+                depgraph.AddDependency(0, 1);
+            } else if (i == 2) {
+                depgraph.AddTransaction({6, 1});
+                depgraph.AddDependency(2, 1);
+            } else if (i == 3) {
+                depgraph.AddTransaction({5, 1});
+                depgraph.AddDependency(2, 3);
+            } else if ((i & 1) == 0) {
+                depgraph.AddTransaction({7, 1});
+                depgraph.AddDependency(i - 1, i);
+            } else {
+                depgraph.AddTransaction({5, 1});
+                depgraph.AddDependency(i, 4);
+            }
+        } else {
+            // Even cluster size.
+            //
+            // Mermaid diagram code for the resulting cluster for 10 transactions:
+            // ```mermaid
+            // graph BT
+            // T0["T0: 1"];T1["T1: 3"];T2["T2: 1"];T3["T3: 4"];T4["T4: 0"];T5["T5: 4"];T6["T6: 0"];
+            // T7["T7: 4"];T8["T8: 0"];T9["T9: 4"];
+            // T1-->T0;T2-->T0;T3-->T2;T3-->T4;T5-->T4;T3-->T6;T7-->T6;T3-->T8;T9-->T8;
+            // ```
+            if (i == 0) {
+                depgraph.AddTransaction({1, 1});
+            } else if (i == 1) {
+                depgraph.AddTransaction({3, 1});
+                depgraph.AddDependency(0, 1);
+            } else if (i == 2) {
+                depgraph.AddTransaction({1, 1});
+                depgraph.AddDependency(0, 2);
+            } else if (i & 1) {
+                depgraph.AddTransaction({4, 1});
+                depgraph.AddDependency(i - 1, i);
+            } else {
+                depgraph.AddTransaction({0, 1});
+                depgraph.AddDependency(i, 3);
+            }
+        }
+    }
+    return depgraph;
+}
+
+/** Benchmark that does search-based candidate finding with 10000 iterations.
+ *
+ * Its goal is measuring how much time every additional search iteration in linearization costs.
+ */
+template<typename SetType>
+void BenchLinearizePerIterWorstCase(ClusterIndex ntx, benchmark::Bench& bench)
+{
+    const auto depgraph = MakeHardGraph<SetType>(ntx);
+    const auto iter_limit = std::min<uint64_t>(10000, uint64_t{1} << (ntx / 2 - 1));
+    uint64_t rng_seed = 0;
+    bench.batch(iter_limit).unit("iters").run([&] {
+        SearchCandidateFinder finder(depgraph, rng_seed++);
+        auto [candidate, iters_performed] = finder.FindCandidateSet(iter_limit, {});
+        assert(iters_performed == iter_limit);
+    });
+}
+
+/** Benchmark for linearization improvement of a trivial linear graph using just ancestor sort.
+ *
+ * Its goal is measuring how much time linearization may take without any search iterations.
+ *
+ * If P is the resulting time of BenchLinearizePerIterWorstCase, and N is the resulting time of
+ * BenchLinearizeNoItersWorstCase*, then an invocation of Linearize with max_iterations=m should
+ * take no more than roughly N+m*P time. This may however be an overestimate, as the worst cases
+ * do not coincide (the ones that are worst for linearization without any search happen to be ones
+ * that do not need many search iterations).
+ *
+ * This benchmark exercises a worst case for AncestorCandidateFinder, but for which improvement is
+ * cheap.
+ */
+template<typename SetType>
+void BenchLinearizeNoItersWorstCaseAnc(ClusterIndex ntx, benchmark::Bench& bench)
+{
+    const auto depgraph = MakeLinearGraph<SetType>(ntx);
+    uint64_t rng_seed = 0;
+    std::vector<ClusterIndex> old_lin(ntx);
+    for (ClusterIndex i = 0; i < ntx; ++i) old_lin[i] = i;
+    bench.run([&] {
+        Linearize(depgraph, /*max_iterations=*/0, rng_seed++, old_lin);
+    });
+}
+
+/** Benchmark for linearization improvement of a trivial wide graph using just ancestor sort.
+ *
+ * Its goal is measuring how much time improving a linearization may take without any search
+ * iterations, similar to the previous function.
+ *
+ * This benchmark exercises a worst case for improving an existing linearization, but for which
+ * AncestorCandidateFinder is cheap.
+ */
+template<typename SetType>
+void BenchLinearizeNoItersWorstCaseLIMO(ClusterIndex ntx, benchmark::Bench& bench)
+{
+    const auto depgraph = MakeWideGraph<SetType>(ntx);
+    uint64_t rng_seed = 0;
+    std::vector<ClusterIndex> old_lin(ntx);
+    for (ClusterIndex i = 0; i < ntx; ++i) old_lin[i] = i;
+    bench.run([&] {
+        Linearize(depgraph, /*max_iterations=*/0, rng_seed++, old_lin);
+    });
+}
+
+} // namespace
+
+static void LinearizePerIter16TxWorstCase(benchmark::Bench& bench) { BenchLinearizePerIterWorstCase<BitSet<16>>(16, bench); }
+static void LinearizePerIter32TxWorstCase(benchmark::Bench& bench) { BenchLinearizePerIterWorstCase<BitSet<32>>(32, bench); }
+static void LinearizePerIter48TxWorstCase(benchmark::Bench& bench) { BenchLinearizePerIterWorstCase<BitSet<48>>(48, bench); }
+static void LinearizePerIter64TxWorstCase(benchmark::Bench& bench) { BenchLinearizePerIterWorstCase<BitSet<64>>(64, bench); }
+static void LinearizePerIter75TxWorstCase(benchmark::Bench& bench) { BenchLinearizePerIterWorstCase<BitSet<75>>(75, bench); }
+static void LinearizePerIter99TxWorstCase(benchmark::Bench& bench) { BenchLinearizePerIterWorstCase<BitSet<99>>(99, bench); }
+
+static void LinearizeNoIters16TxWorstCaseAnc(benchmark::Bench& bench) { BenchLinearizeNoItersWorstCaseAnc<BitSet<16>>(16, bench); }
+static void LinearizeNoIters32TxWorstCaseAnc(benchmark::Bench& bench) { BenchLinearizeNoItersWorstCaseAnc<BitSet<32>>(32, bench); }
+static void LinearizeNoIters48TxWorstCaseAnc(benchmark::Bench& bench) { BenchLinearizeNoItersWorstCaseAnc<BitSet<48>>(48, bench); }
+static void LinearizeNoIters64TxWorstCaseAnc(benchmark::Bench& bench) { BenchLinearizeNoItersWorstCaseAnc<BitSet<64>>(64, bench); }
+static void LinearizeNoIters75TxWorstCaseAnc(benchmark::Bench& bench) { BenchLinearizeNoItersWorstCaseAnc<BitSet<75>>(75, bench); }
+static void LinearizeNoIters99TxWorstCaseAnc(benchmark::Bench& bench) { BenchLinearizeNoItersWorstCaseAnc<BitSet<99>>(99, bench); }
+
+static void LinearizeNoIters16TxWorstCaseLIMO(benchmark::Bench& bench) { BenchLinearizeNoItersWorstCaseLIMO<BitSet<16>>(16, bench); }
+static void LinearizeNoIters32TxWorstCaseLIMO(benchmark::Bench& bench) { BenchLinearizeNoItersWorstCaseLIMO<BitSet<32>>(32, bench); }
+static void LinearizeNoIters48TxWorstCaseLIMO(benchmark::Bench& bench) { BenchLinearizeNoItersWorstCaseLIMO<BitSet<48>>(48, bench); }
+static void LinearizeNoIters64TxWorstCaseLIMO(benchmark::Bench& bench) { BenchLinearizeNoItersWorstCaseLIMO<BitSet<64>>(64, bench); }
+static void LinearizeNoIters75TxWorstCaseLIMO(benchmark::Bench& bench) { BenchLinearizeNoItersWorstCaseLIMO<BitSet<75>>(75, bench); }
+static void LinearizeNoIters99TxWorstCaseLIMO(benchmark::Bench& bench) { BenchLinearizeNoItersWorstCaseLIMO<BitSet<99>>(99, bench); }
+
+BENCHMARK(LinearizePerIter16TxWorstCase, benchmark::PriorityLevel::HIGH);
+BENCHMARK(LinearizePerIter32TxWorstCase, benchmark::PriorityLevel::HIGH);
+BENCHMARK(LinearizePerIter48TxWorstCase, benchmark::PriorityLevel::HIGH);
+BENCHMARK(LinearizePerIter64TxWorstCase, benchmark::PriorityLevel::HIGH);
+BENCHMARK(LinearizePerIter75TxWorstCase, benchmark::PriorityLevel::HIGH);
+BENCHMARK(LinearizePerIter99TxWorstCase, benchmark::PriorityLevel::HIGH);
+
+BENCHMARK(LinearizeNoIters16TxWorstCaseAnc, benchmark::PriorityLevel::HIGH);
+BENCHMARK(LinearizeNoIters32TxWorstCaseAnc, benchmark::PriorityLevel::HIGH);
+BENCHMARK(LinearizeNoIters48TxWorstCaseAnc, benchmark::PriorityLevel::HIGH);
+BENCHMARK(LinearizeNoIters64TxWorstCaseAnc, benchmark::PriorityLevel::HIGH);
+BENCHMARK(LinearizeNoIters75TxWorstCaseAnc, benchmark::PriorityLevel::HIGH);
+BENCHMARK(LinearizeNoIters99TxWorstCaseAnc, benchmark::PriorityLevel::HIGH);
+
+BENCHMARK(LinearizeNoIters16TxWorstCaseLIMO, benchmark::PriorityLevel::HIGH);
+BENCHMARK(LinearizeNoIters32TxWorstCaseLIMO, benchmark::PriorityLevel::HIGH);
+BENCHMARK(LinearizeNoIters48TxWorstCaseLIMO, benchmark::PriorityLevel::HIGH);
+BENCHMARK(LinearizeNoIters64TxWorstCaseLIMO, benchmark::PriorityLevel::HIGH);
+BENCHMARK(LinearizeNoIters75TxWorstCaseLIMO, benchmark::PriorityLevel::HIGH);
+BENCHMARK(LinearizeNoIters99TxWorstCaseLIMO, benchmark::PriorityLevel::HIGH);
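The wide-graph benchmark is described as pessimal because rechunking is needed after every candidate. Chunking a linearization can be sketched as follows. This is a minimal illustration, not the chunking algorithm or `LinearizationChunking` class added by this PR: it assumes a hypothetical `FeeSize` pair and takes the transactions' (fee, size) values already in linearization order. The cumulative (size, fee) totals at the resulting chunk boundaries are the points of the feerate diagram that `Linearize` compares against the old linearization.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical (fee, size) pair used only in this sketch.
struct FeeSize {
    int64_t fee;
    int32_t size;  // assumed > 0
};

// True if a has a strictly higher feerate than b (cross-multiplied, no division).
bool HigherFeerate(const FeeSize& a, const FeeSize& b)
{
    return a.fee * b.size > b.fee * a.size;
}

// Chunk a linearization given as (fee, size) pairs in linearization order:
// each transaction starts a new chunk, which is merged into the preceding chunk
// for as long as its feerate is strictly higher. The returned chunk feerates are
// therefore non-increasing.
std::vector<FeeSize> ChunkFeerates(const std::vector<FeeSize>& txs_in_order)
{
    std::vector<FeeSize> chunks;
    for (const FeeSize& tx : txs_in_order) {
        chunks.push_back(tx);
        while (chunks.size() >= 2 && HigherFeerate(chunks.back(), chunks[chunks.size() - 2])) {
            const FeeSize last = chunks.back();
            chunks.pop_back();
            chunks.back().fee += last.fee;
            chunks.back().size += last.size;
        }
    }
    return chunks;
}
```

For instance, transactions with fee/size 1/1, 3/1 and 2/1 in that order chunk into 4/2 and 2/1: the second transaction is merged into the first, while the third only ties the merged chunk's feerate of 2 and therefore stays separate.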