-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor order tests for deterministic behavior #114
Conversation
c203cc5
to
0e1b39a
Compare
/// determining which transactions are included in the block, and how their | ||
/// order is adjusted before being included for consensus. | ||
#[derive(Clone, Debug)] | ||
enum RoundTransactionBehavior { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this!
The current order tests can be flaky at times as they rely on waiting for the builder's logic to process via `async_sleep` without being able to determine if its in a state that is ready. To address this specifically the code has been refactored to work with `ProxyGlobalState` instead of the `GlobalState` directly as `ProxyGlobalState` already has some mechanisms for waiting for things to be completed. This also aids in simplifying a bit of the internal logic, as we were reimplementing what `ProxyGlobalState` was already doing. In addition to these changes this also attempts to refactor the code in such a way as to remove the conditionals that occur within the body of the for each loops. This is done by adding an enum for round behavior that can adjust the transactions returned by the bundle call as needed. The order of progression within the for each loop has also been adjusted to perform the consensus portion at the top of the loop before transactions are submitted. Since this order isn't that important, it seems better to have the consensus to occur at the top of the loop rather than after the transactions as it makes it more clear which portion of the logic we are currently in. I.E. are we currently considering consensus' role, or the builder's role.
When a fork happens such that we have separate requests submitted for the builder, we are expecting the builder to be aware of these and to issue any missing transactions for a block in a subsequent block. At the moment, this is not occurring, and the assert at the bottom of the `test_builder_order_chain_fork` that ensures that the two transaction history forks don't match is backwards. Additional asserts have been added when producing the next propsoal to verify that the transaction lists are the same in the event that their previous lists were the same, or differ if their previous lists differ while not having any transactions in the fork, or if the transactions are the same when the previous proposal contained all of the missing transactions.
The current `BuilderState` extending behavior fails to anticipate enough of the various edge cases that determine whether the current `BuilderState` is the best, or one of the only `BuilderState`s that should be extended. Due to this behavior many more `BuilderState`s are spawned that really should not be spawned. This causes a lot of unecessary computation and memory usage that will never get cleaned up as the `Sender`s that are registered for them get overwritten by subsequent clones that have the same `BuilderStateId`. This effectively causes an async memory leak, async processing leak, and a potential fork-bomb that will effectively prevent the builder from being responsive. Additionally, the race condition that stems from multiple `BuilderState`s spawning a clone that extends them for the next `QuorumProposal` means that we often recieve an unexpected list of transactions as a result. This behavior can be best seen by the test marketplace::testing::order_test::test_builder_order_chain_fork`. To fix these issues a few changes have been maded. First a new method has been added called `spawn_clone_that_extends_self` that ensures a standard way of spawning a new `BuilderState` that extends the current `BuilderState`. Another method has been added called `am_i_the_best_builder_state_to_extend` that attempts to ensure that we are the best fit, or one of the only fits that can extend the current `BuilderState` based on the given `QuorumProposal`. Finally, as a safe guard, a check has been added when calling `spawn_clone` that ensures that the same `BuilderStateId` will not be registered more than once. IE, if there is already an async process spawned, there is no need to spawn another. Remove newly added asserts as they fail to represent the correct anticipated intermediate state
d545123
to
2d5ed84
Compare
The `calc_proposal_msg` is currently returning the same `view_number` for its justify_qc that it is for the entire `QuorumProposal`. Instead the `justify_qc` view should come from the passed previous proposal's view_number.
The `BuiltFromProposedBlock` definitions is going to be referenced in `GlobalState`. In order to prepare for this change the struct needs to be moved to a different crate in order to avoid circular dependencies.
In order to be able to consistently pick the correct `BuilderState` to extend when we receive a `QuorumProposal`, additional information is needed when understanding the spawned `BuilderState`s. In order to add this additional information the `BuiltFromProposedBlock` struct has been added as a `Pair` to the value stored in the `spawned_builder_states`. This addition will allow for inspectors to better understand the `QuorumProposal` information that the `BuilderState` was spawned from. This field is also made an `Option` to make it easier to initialize.
With the addition of the the `BuiltForProposedBlock` to the `spawned_builder_states` we can not make better decisions about which `BlockerState` a given `QuorumProposal` can extend. As such, this splits the determiniation of the best `BuilderState`s to extend from the `am_i_the_best_builder_state_to_extend` call. Additionally the new `best_builder_states_to_extend` function is deliberately made into a function rather than a `method` in order to ensure that information contained within the current `BuilderState` that is only visible to that `BuilderState` is used. This should allow for more consistent determiniations of the proper `BuilderState` to extend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a question about the function to find the best states and a minor comment. Looks good!
pub spawned_builder_states: HashMap< | ||
BuilderStateId<TYPES>, | ||
( | ||
Option<BuiltFromProposedBlock<TYPES>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Can we have a comment about when it can be None
(IIUC it's on the initialization)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically it can be None
at any point. At the moment the only time it should be None
is at initialization. However, it would continue to operate somewhat decently even if None
was passed more often. The only problem with that would be that a race condition would be present in the fork test again.
I can make this comment though. Or if you think it's important, we can also potentially replace the Tuple
with a data structure that represents this information more clearly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to replace the data structure. A short comment sounds good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 15e7c02
@@ -172,7 +155,206 @@ pub struct BuilderState<TYPES: NodeType> { | |||
pub instance_state: Arc<TYPES::InstanceState>, | |||
} | |||
|
|||
/// [best_builder_states_to_extend] is a utility function that is used to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great documentation!
quorum_proposal: Arc<Proposal<TYPES, QuorumProposal<TYPES>>>, | ||
) -> bool { | ||
let best_builder_states_to_extend = | ||
best_builder_states_to_extend(quorum_proposal.clone(), self.global_state.clone()).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is the only time we call the best_builder_states_to_extend
function, and what we care about here is only if the best states contain the current state -- if my understanding is correct, can we pass the current state to best_builder_states_to_extend
and simplify its logic to just check if the current state is one of the best options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is true. I believe we could do what you're suggesting, but then we'd lose sight over the potential duplicates. I'm not certain whether this is desirable or not given the globals of best_builder_states_to_extend
. My hesitation here is to try and avoid any BuilderState
specific knowledge from changing the answer, as this leads to non-deterministic behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I think for this test we probably don't need all the info from best_builder_states_to_extend
, but for future tests or debugging it may be very helpful. I added this issue for us to determine whether to simplify it after other refactor we want to do: #124. But for now, let's keep it!
Addresses feeback given by @shenkeyao: #114 (comment)
Closes #113
This PR:
Refactors the
order_test
file to useProxyGlobalState
as a means of submitting and determing transactions to include as this already contains logic to wait for the transactions to be in a corrected state.It removes all uses and reliance on
async_sleep
in these tests as a means of waiting for theGlobalState
to be in the correct state, and instead relies on the functions inProxyGlobalState
to determine when this state is reached.It also refactors and reorganizes much of the logic contained in the tests in a way that SHOULD match what was there before, but avoids the conditionals within the loops so that the loop's logic can appear more linear.
This PR does not:
Aim to modify the logic of the existing tests in any meaningful way. It attemps to keep the same logic, but just re-order and reorganize the order of the events that are occurring within the tests.