Refactor order tests for deterministic behavior #114

Ayiga · 2024-09-05T12:55:14Z

Closes #113

This PR:

Refactors the order_test file to use ProxyGlobalState as a means of submitting and determing transactions to include as this already contains logic to wait for the transactions to be in a corrected state.

It removes all uses and reliance on async_sleep in these tests as a means of waiting for the GlobalState to be in the correct state, and instead relies on the functions in ProxyGlobalState to determine when this state is reached.

It also refactors and reorganizes much of the logic contained in the tests in a way that SHOULD match what was there before, but avoids the conditionals within the loops so that the loop's logic can appear more linear.

This PR does not:

Aim to modify the logic of the existing tests in any meaningful way. It attemps to keep the same logic, but just re-order and reorganize the order of the events that are occurring within the tests.

dailinsubjam · 2024-09-05T17:40:44Z

crates/marketplace/src/testing/order_test.rs

+/// determining which transactions are included in the block, and how their
+/// order is adjusted before being included for consensus.
+#[derive(Clone, Debug)]
+enum RoundTransactionBehavior {


crates/marketplace/src/testing/order_test.rs

The current order tests can be flaky at times as they rely on waiting for the builder's logic to process via `async_sleep` without being able to determine if its in a state that is ready. To address this specifically the code has been refactored to work with `ProxyGlobalState` instead of the `GlobalState` directly as `ProxyGlobalState` already has some mechanisms for waiting for things to be completed. This also aids in simplifying a bit of the internal logic, as we were reimplementing what `ProxyGlobalState` was already doing. In addition to these changes this also attempts to refactor the code in such a way as to remove the conditionals that occur within the body of the for each loops. This is done by adding an enum for round behavior that can adjust the transactions returned by the bundle call as needed. The order of progression within the for each loop has also been adjusted to perform the consensus portion at the top of the loop before transactions are submitted. Since this order isn't that important, it seems better to have the consensus to occur at the top of the loop rather than after the transactions as it makes it more clear which portion of the logic we are currently in. I.E. are we currently considering consensus' role, or the builder's role.

When a fork happens such that we have separate requests submitted for the builder, we are expecting the builder to be aware of these and to issue any missing transactions for a block in a subsequent block. At the moment, this is not occurring, and the assert at the bottom of the `test_builder_order_chain_fork` that ensures that the two transaction history forks don't match is backwards. Additional asserts have been added when producing the next propsoal to verify that the transaction lists are the same in the event that their previous lists were the same, or differ if their previous lists differ while not having any transactions in the fork, or if the transactions are the same when the previous proposal contained all of the missing transactions.

The current `BuilderState` extending behavior fails to anticipate enough of the various edge cases that determine whether the current `BuilderState` is the best, or one of the only `BuilderState`s that should be extended. Due to this behavior many more `BuilderState`s are spawned that really should not be spawned. This causes a lot of unecessary computation and memory usage that will never get cleaned up as the `Sender`s that are registered for them get overwritten by subsequent clones that have the same `BuilderStateId`. This effectively causes an async memory leak, async processing leak, and a potential fork-bomb that will effectively prevent the builder from being responsive. Additionally, the race condition that stems from multiple `BuilderState`s spawning a clone that extends them for the next `QuorumProposal` means that we often recieve an unexpected list of transactions as a result. This behavior can be best seen by the test marketplace::testing::order_test::test_builder_order_chain_fork`. To fix these issues a few changes have been maded. First a new method has been added called `spawn_clone_that_extends_self` that ensures a standard way of spawning a new `BuilderState` that extends the current `BuilderState`. Another method has been added called `am_i_the_best_builder_state_to_extend` that attempts to ensure that we are the best fit, or one of the only fits that can extend the current `BuilderState` based on the given `QuorumProposal`. Finally, as a safe guard, a check has been added when calling `spawn_clone` that ensures that the same `BuilderStateId` will not be registered more than once. IE, if there is already an async process spawned, there is no need to spawn another. Remove newly added asserts as they fail to represent the correct anticipated intermediate state

The `calc_proposal_msg` is currently returning the same `view_number` for its justify_qc that it is for the entire `QuorumProposal`. Instead the `justify_qc` view should come from the passed previous proposal's view_number.

The `BuiltFromProposedBlock` definitions is going to be referenced in `GlobalState`. In order to prepare for this change the struct needs to be moved to a different crate in order to avoid circular dependencies.

In order to be able to consistently pick the correct `BuilderState` to extend when we receive a `QuorumProposal`, additional information is needed when understanding the spawned `BuilderState`s. In order to add this additional information the `BuiltFromProposedBlock` struct has been added as a `Pair` to the value stored in the `spawned_builder_states`. This addition will allow for inspectors to better understand the `QuorumProposal` information that the `BuilderState` was spawned from. This field is also made an `Option` to make it easier to initialize.

With the addition of the the `BuiltForProposedBlock` to the `spawned_builder_states` we can not make better decisions about which `BlockerState` a given `QuorumProposal` can extend. As such, this splits the determiniation of the best `BuilderState`s to extend from the `am_i_the_best_builder_state_to_extend` call. Additionally the new `best_builder_states_to_extend` function is deliberately made into a function rather than a `method` in order to ensure that information contained within the current `BuilderState` that is only visible to that `BuilderState` is used. This should allow for more consistent determiniations of the proper `BuilderState` to extend.

shenkeyao

I left a question about the function to find the best states and a minor comment. Looks good!

shenkeyao · 2024-09-11T18:45:05Z

crates/marketplace/src/service.rs

+    pub spawned_builder_states: HashMap<
+        BuilderStateId<TYPES>,
+        (
+            Option<BuiltFromProposedBlock<TYPES>>,


Nit: Can we have a comment about when it can be None (IIUC it's on the initialization)?

Technically it can be None at any point. At the moment the only time it should be None is at initialization. However, it would continue to operate somewhat decently even if None was passed more often. The only problem with that would be that a race condition would be present in the fork test again.

I can make this comment though. Or if you think it's important, we can also potentially replace the Tuple with a data structure that represents this information more clearly.

No need to replace the data structure. A short comment sounds good!

Addressed in 15e7c02

shenkeyao · 2024-09-11T18:46:11Z

crates/marketplace/src/builder_state.rs

@@ -172,7 +155,206 @@ pub struct BuilderState<TYPES: NodeType> {
    pub instance_state: Arc<TYPES::InstanceState>,
 }

+/// [best_builder_states_to_extend] is a utility function that is used to


Great documentation!

shenkeyao · 2024-09-11T18:57:57Z

crates/marketplace/src/builder_state.rs

+        quorum_proposal: Arc<Proposal<TYPES, QuorumProposal<TYPES>>>,
+    ) -> bool {
+        let best_builder_states_to_extend =
+            best_builder_states_to_extend(quorum_proposal.clone(), self.global_state.clone()).await;


Looks like this is the only time we call the best_builder_states_to_extend function, and what we care about here is only if the best states contain the current state -- if my understanding is correct, can we pass the current state to best_builder_states_to_extend and simplify its logic to just check if the current state is one of the best options?

This comment is true. I believe we could do what you're suggesting, but then we'd lose sight over the potential duplicates. I'm not certain whether this is desirable or not given the globals of best_builder_states_to_extend. My hesitation here is to try and avoid any BuilderState specific knowledge from changing the answer, as this leads to non-deterministic behavior.

Makes sense. I think for this test we probably don't need all the info from best_builder_states_to_extend, but for future tests or debugging it may be very helpful. I added this issue for us to determine whether to simplify it after other refactor we want to do: #124. But for now, let's keep it!

@shenkeyao

Addresses feeback given by @shenkeyao: #114 (comment)

Ayiga force-pushed the ts/enh/non-deterministic-tests branch from c203cc5 to 0e1b39a Compare September 5, 2024 19:07

dailinsubjam reviewed Sep 5, 2024

View reviewed changes

Ayiga added 4 commits September 10, 2024 07:06

Fix comment on fork test to match assert

a96d8d3

Ayiga force-pushed the ts/enh/non-deterministic-tests branch from d545123 to 2d5ed84 Compare September 10, 2024 13:07

Ayiga added 4 commits September 10, 2024 12:41

Relocate BuiltFromProposedBlock to utils.rs

2191e7a

The `BuiltFromProposedBlock` definitions is going to be referenced in `GlobalState`. In order to prepare for this change the struct needs to be moved to a different crate in order to avoid circular dependencies.

shenkeyao reviewed Sep 11, 2024

View reviewed changes

Add comment describing the Option value of spawned_builder_states

15e7c02

Addresses feeback given by @shenkeyao: #114 (comment)

shenkeyao approved these changes Sep 11, 2024

View reviewed changes

Ayiga merged commit a8ee1c0 into main Sep 12, 2024
5 checks passed

Ayiga mentioned this pull request Sep 12, 2024

Back fix the logic fixes for reorder test refactor to legacy #127

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor order tests for deterministic behavior #114

Refactor order tests for deterministic behavior #114

Ayiga commented Sep 5, 2024 •

edited

Loading

dailinsubjam Sep 5, 2024

shenkeyao left a comment

shenkeyao Sep 11, 2024

Ayiga Sep 11, 2024

shenkeyao Sep 11, 2024

Ayiga Sep 11, 2024

shenkeyao Sep 11, 2024

shenkeyao Sep 11, 2024

Ayiga Sep 11, 2024

shenkeyao Sep 11, 2024

Refactor order tests for deterministic behavior #114

Refactor order tests for deterministic behavior #114

Conversation

Ayiga commented Sep 5, 2024 • edited Loading

This PR:

This PR does not:

Choose a reason for hiding this comment

shenkeyao left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Ayiga commented Sep 5, 2024 •

edited

Loading