Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor order tests for deterministic behavior #114

Merged
merged 9 commits into from
Sep 12, 2024

Conversation

Ayiga
Copy link
Member

@Ayiga Ayiga commented Sep 5, 2024

Closes #113

This PR:

Refactors the order_test file to use ProxyGlobalState as a means of submitting and determing transactions to include as this already contains logic to wait for the transactions to be in a corrected state.

It removes all uses and reliance on async_sleep in these tests as a means of waiting for the GlobalState to be in the correct state, and instead relies on the functions in ProxyGlobalState to determine when this state is reached.

It also refactors and reorganizes much of the logic contained in the tests in a way that SHOULD match what was there before, but avoids the conditionals within the loops so that the loop's logic can appear more linear.

This PR does not:

Aim to modify the logic of the existing tests in any meaningful way. It attemps to keep the same logic, but just re-order and reorganize the order of the events that are occurring within the tests.

@Ayiga Ayiga force-pushed the ts/enh/non-deterministic-tests branch from c203cc5 to 0e1b39a Compare September 5, 2024 19:07
/// determining which transactions are included in the block, and how their
/// order is adjusted before being included for consensus.
#[derive(Clone, Debug)]
enum RoundTransactionBehavior {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this!

crates/marketplace/src/testing/order_test.rs Show resolved Hide resolved
The current order tests can be flaky at times as they rely on waiting for the builder's
logic to process via `async_sleep` without being able to determine if its in a state that
is ready.

To address this specifically the code has been refactored to work with `ProxyGlobalState`
instead of the `GlobalState` directly as `ProxyGlobalState` already has some mechanisms
for waiting for things to be completed.  This also aids in simplifying a bit of the internal
logic, as we were reimplementing what `ProxyGlobalState` was already doing.

In addition to these changes this also attempts to refactor the code in such a way as to
remove the conditionals that occur within the body of the for each loops.  This is done
by adding an enum for round behavior that can adjust the transactions returned by the
bundle call as needed.

The order of progression within the for each loop has also been adjusted to perform the
consensus portion at the top of the loop before transactions are submitted.  Since this
order isn't that important, it seems better to have the consensus to occur at the top of
the loop rather than after the transactions as it makes it more clear which portion of the
logic we are currently in.  I.E. are we currently considering consensus' role, or the
builder's role.
When a fork happens such that we have separate requests submitted for the builder,
we are expecting the builder to be aware of these and to issue any missing transactions
for a block in a subsequent block.  At the moment, this is not occurring, and the
assert at the bottom of the `test_builder_order_chain_fork` that ensures that the
two transaction history forks don't match is backwards.

Additional asserts have been added when producing the next propsoal to verify that
the transaction lists are the same in the event that their previous lists were the
same, or differ if their previous lists differ while not having any transactions
in the fork, or if the transactions are the same when the previous proposal contained
all of the missing transactions.
The current `BuilderState` extending behavior fails to anticipate enough of the
various edge cases that determine whether the current `BuilderState` is the best,
or one of the only `BuilderState`s that should be extended.  Due to this behavior
many more `BuilderState`s are spawned that really should not be spawned.  This
causes a lot of unecessary computation and memory usage that will never get cleaned
up as the `Sender`s that are registered for them get overwritten by subsequent
clones that have the same `BuilderStateId`. This effectively causes an async
memory leak, async processing leak, and a potential fork-bomb that will effectively
prevent the builder from being responsive.

Additionally, the race condition that stems from multiple `BuilderState`s spawning
a clone that extends them for the next `QuorumProposal` means that we often
recieve an unexpected list of transactions as a result.  This behavior can be
best seen by the test
marketplace::testing::order_test::test_builder_order_chain_fork`.

To fix these issues a few changes have been maded.  First a new method has been
added called `spawn_clone_that_extends_self` that ensures a standard way of
spawning a new `BuilderState` that extends the current `BuilderState`. Another
method has been added called `am_i_the_best_builder_state_to_extend` that
attempts to ensure that we are the best fit, or one of the only fits that can
extend the current `BuilderState` based on the given `QuorumProposal`. Finally,
as a safe guard, a check has been added when calling `spawn_clone` that ensures
that the same `BuilderStateId` will not be registered more than once.  IE, if
there is already an async process spawned, there is no need to spawn another.

Remove newly added asserts as they fail to represent the correct anticipated intermediate state
@Ayiga Ayiga force-pushed the ts/enh/non-deterministic-tests branch from d545123 to 2d5ed84 Compare September 10, 2024 13:07
The `calc_proposal_msg` is currently returning the same `view_number` for its justify_qc
that it is for the entire `QuorumProposal`. Instead the `justify_qc` view should come
from the passed previous proposal's view_number.
The `BuiltFromProposedBlock` definitions is going to be referenced in `GlobalState`.
In order to prepare for this change the struct needs to be moved to a different
crate in order to avoid circular dependencies.
In order to be able to consistently pick the correct `BuilderState` to extend when
we receive a `QuorumProposal`, additional information is needed when understanding
the spawned `BuilderState`s. In order to add this additional information the
`BuiltFromProposedBlock` struct has been added as a `Pair` to the value stored
in the `spawned_builder_states`.

This addition will allow for inspectors to better understand the `QuorumProposal`
information that the `BuilderState` was spawned from.  This field is also made an
`Option` to make it easier to initialize.
With the addition of the the `BuiltForProposedBlock` to the `spawned_builder_states`
we can not make better decisions about which `BlockerState` a given `QuorumProposal`
can extend.  As such, this splits the determiniation of the best `BuilderState`s to
extend from the `am_i_the_best_builder_state_to_extend` call.

Additionally the new `best_builder_states_to_extend` function is deliberately
made into a function rather than a `method` in order to ensure that information
contained within the current `BuilderState` that is only visible to that
`BuilderState` is used.  This should allow for more consistent determiniations of
the proper `BuilderState` to extend.
Copy link
Member

@shenkeyao shenkeyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a question about the function to find the best states and a minor comment. Looks good!

pub spawned_builder_states: HashMap<
BuilderStateId<TYPES>,
(
Option<BuiltFromProposedBlock<TYPES>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we have a comment about when it can be None (IIUC it's on the initialization)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically it can be None at any point. At the moment the only time it should be None is at initialization. However, it would continue to operate somewhat decently even if None was passed more often. The only problem with that would be that a race condition would be present in the fork test again.

I can make this comment though. Or if you think it's important, we can also potentially replace the Tuple with a data structure that represents this information more clearly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to replace the data structure. A short comment sounds good!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 15e7c02

@@ -172,7 +155,206 @@ pub struct BuilderState<TYPES: NodeType> {
pub instance_state: Arc<TYPES::InstanceState>,
}

/// [best_builder_states_to_extend] is a utility function that is used to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great documentation!

quorum_proposal: Arc<Proposal<TYPES, QuorumProposal<TYPES>>>,
) -> bool {
let best_builder_states_to_extend =
best_builder_states_to_extend(quorum_proposal.clone(), self.global_state.clone()).await;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is the only time we call the best_builder_states_to_extend function, and what we care about here is only if the best states contain the current state -- if my understanding is correct, can we pass the current state to best_builder_states_to_extend and simplify its logic to just check if the current state is one of the best options?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is true. I believe we could do what you're suggesting, but then we'd lose sight over the potential duplicates. I'm not certain whether this is desirable or not given the globals of best_builder_states_to_extend. My hesitation here is to try and avoid any BuilderState specific knowledge from changing the answer, as this leads to non-deterministic behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I think for this test we probably don't need all the info from best_builder_states_to_extend, but for future tests or debugging it may be very helpful. I added this issue for us to determine whether to simplify it after other refactor we want to do: #124. But for now, let's keep it!

@Ayiga Ayiga merged commit a8ee1c0 into main Sep 12, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants