ChainUpdates: model bad peer behavior #3856
Conversation
The BFT protocol uses block numbers as its `SelectView`, so the induced order on blocks is not total. When we generate a rollback, we usually end up with a chain of the same length, which then does not improve upon the previous chain.
Right now, the peer disconnection logic relies on the interplay of the BlockFetch and ChainSync clients to catch invalid behavior, so this commit adds an actual ChainSync client for every peer. Concretely, consider the case where a peer wants to extend an invalid block. In that case, the ChainSync client will disconnect, either when the extending header is received, or via the invalid block rejector in a background thread. In contrast, when we simply add a block (together with a punishment) to the ChainDB, this punishment will *not* be enacted, as the block is never validated: it is not reachable via any (valid) block in the VolDB.
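To illustrate the first point, here is a minimal, self-contained Haskell sketch (the names `BlockNo` and `preferCandidate` are illustrative stand-ins, not the actual consensus API): when the selection criterion is just the block number, a candidate chain of equal length after a rollback is not strictly preferred, so no chain switch happens.

```haskell
import Data.Word (Word64)

-- Stand-in for BFT's SelectView: just the block number.
newtype BlockNo = BlockNo Word64
  deriving (Eq, Ord, Show)

-- A candidate improves on our chain only if it is strictly "better",
-- i.e. strictly longer under this (non-total-on-blocks) order.
preferCandidate :: BlockNo -> BlockNo -> Bool
preferCandidate ours candidate = candidate > ours

main :: IO ()
main = do
  -- Rollback followed by re-extension to the same length: no improvement.
  print (preferCandidate (BlockNo 5) (BlockNo 5))  -- False
  -- Strictly longer candidate: preferred.
  print (preferCandidate (BlockNo 5) (BlockNo 6))  -- True
```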
Nothing major. Lots of small things and a couple open-ended questions.
ouroboros-consensus-test/test-infra/Test/Util/ChainUpdates/Tests.hs
Finished a full pass. Similar kinds of comments as in the review above.
    Invalid ->
      counterexample "Invalid behavior not caught" $
        tabulateFailureMode $
          property (isLeft blockFetchRes || isLeft chainSyncRes)
Please write a short comment here explaining why either might catch it and why it's OK whichever does.
ouroboros-consensus-test/test-consensus/Test/Consensus/MiniProtocol/BlockFetch/Client.hs
@@ -181,8 +221,6 @@ runBlockFetchTest BlockFetchClientTestSetup{..} = withRegistry \registry -> do
        bfClient
        bfServer

-     -- On every tick, we schedule updates to the shared chain fragment
-     -- (mocking ChainSync).
      forkTicking peerId =
You could preserve the comment, saying that it's the ChainSync server that's being mocked.
    varChains         <- uncheckedNewTVarM Map.empty
    varControlMessage <- uncheckedNewTVarM Mux.Continue
    varFetchedBlocks  <- uncheckedNewTVarM (0 <$ peerUpdates)
    varCandidates     <- uncheckedNewTVarM Map.empty
What are the invariants among these? E.g. TickWatcher updates both varChains and producerStateVars. It seems like those two are two views onto "the peer's current chain". Is that right? Maybe combine them into a single map of pairs?
    forkChainSync peerId =
-     forkLinkedThread registry ("BracketSync" <> condense peerId) $
+     forkThread registry ("BracketSync" <> condense peerId) $
Hmm. Would it be equivalent to keep it a linked thread, but catch ChainSyncClientException, record it, and then terminate successfully? That way any (unexpected) other kind of exception would still be propagated via the link.
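The suggested pattern might look like the following sketch. It uses only base; the `ChainSyncClientException` constructors here are illustrative stand-ins, and the project's `forkLinkedThread` is replaced by plain `IO` to keep the example self-contained. The point is the shape: the expected exception is caught and recorded, while anything else escapes and would be rethrown via the thread link.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception (Exception, catch, throwIO)

-- Illustrative stand-in for the real ChainSyncClientException.
data ChainSyncClientException = ForkTooDeep | InvalidBlock
  deriving Show
instance Exception ChainSyncClientException

-- Run the client action; record an expected ChainSync exception and
-- terminate successfully. Any other exception propagates unchanged
-- (and so would still be seen through a linked thread).
runRecording :: IO () -> IO (Maybe ChainSyncClientException)
runRecording client =
  (client >> pure Nothing)
    `catch` \(e :: ChainSyncClientException) -> pure (Just e)

main :: IO ()
main = do
  r <- runRecording (throwIO InvalidBlock)
  print r  -- Just InvalidBlock
```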
Actually, I have the same question about the BlockFetch clients too, I suppose.
That also sounds like a possible approach; I am not necessarily opposed. I think the main thing is that we would then no longer be able to see independently whether ChainSync or BlockFetch threw an exception (i.e. we would only have Either ChainSyncEx BlockFetchEx instead of These ChainSyncEx BlockFetchEx). Right now, we use this information e.g. for labelling, see tabulateFailureMode, which results e.g. in:
Expected failure due to (3388 in total):
64.14% ChainSync
34.65% BlockFetch,ChainSync
 1.21% BlockFetch
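The Either-vs-These distinction above can be made concrete with a small sketch. The type mirrors `These` from the `these` package but is defined locally to stay self-contained; the `label` function and the `String` payloads are illustrative, not the test's actual code.

```haskell
-- Either a b loses the "both happened" case; These keeps it.
data These a b = This a | That b | These a b
  deriving Show

-- Labelling in the style of the tabulated output above: with Either we
-- could only ever report one source of failure, but These also yields
-- the combined "BlockFetch,ChainSync" bucket.
label :: These String String -> String
label (This cs)     = cs
label (That bf)     = bf
label (These cs bf) = bf <> "," <> cs

main :: IO ()
main = do
  putStrLn (label (This "ChainSync"))                  -- ChainSync
  putStrLn (label (These "ChainSync" "BlockFetch"))    -- BlockFetch,ChainSync
```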
    topLevelConfig
    chainDbView
    ntnVersion
    (pure Mux.Continue)
Does this mean the ChainSync client never terminates itself successfully?
... I would guess that bracketSyncWithFetchClient terminates the ChainSync client (successfully?) when the BlockFetch client terminates itself successfully?
I agree that "graceful" termination would be somewhat nicer, but the straightforward approach (reading from a TVar here, which we set to Terminate at the end) does not work. Maybe one just needs to adapt the ChainSync client implementation in a few places to better respect these control messages; right now, we do that just here:
In any case, the same thing is present in the actual ChainSync client test:
We chatted. It's awkward: the ChainSync client ignores the control message until it returns to the top of its loop. But in this case, at the end of the test, we won't be sending another message, so it will ignore the control message indefinitely.
Also, other tests do this. So it seems like a fine "over"-simplification to keep.
Edit: ooof, but I do want #3856 (comment) , so Esgen will try to see how much extra work it is to get this graceful termination to happen.
ouroboros-consensus-test/test-consensus/Test/Consensus/MiniProtocol/BlockFetch/Client.hs
    bfcoPeerOutcomes <-
      flip Map.traverseWithKey blockFetchThreads \peerId threadId -> do
        blockFetchRes <- waitThread threadId
        chainSyncRes  <- (Map.! peerId) <$> readTVarIO varChainSyncRes
Hmm. Instead of this explicit mutvar stuff, could you also capture the ChainSync ThreadId, also send Mux.Terminate to the ChainSync client control vars, and read those threads' results the same way you're doing for BlockFetch? (Maybe you'd need to shut down a peer's ChainSync before shutting down its BlockFetch, to appease bracketSyncWithFetchClient?)
Related to #3856 (comment)
      , InvalidChainBehavior
      ]
    numPeers <- case behaviorValidity behavior of
      -- Interaction of multiple clients can hide invalid behavior.
Oof, this is a disappointing gap. Let's chat on a call.
    genChainUpdates behavior maxRollback 20 >>= genSchedule strat
      where
        strat = case behaviorValidity behavior of
          -- Multiple updates per tick can hide invalid behavior.
Same
Same point as in #3856 (comment)
The `serverUpdates` only depend on the `securityParam`, and the `clientUpdates` additionally depend on `serverUpdates` due to `removeLateClientUpdates`.
Superseded by IntersectMBO/ouroboros-consensus#492
Description
Closes CAD-4314
Previously, we did not generate any ChainUpdate behavior modeling a malicious or ill-configured peer. This meant that existing tests could only check that our disconnection routines are not overly restrictive, but not that they are strict enough. This PR adds this possibility, namely by randomly toggling some blocks to be invalid. As the result does not necessarily model bad behavior, we add a function to "classify" a sequence of chain updates after the fact.
Concerning the BlockFetch client test: as certain invalid behavior is only caught by the ChainSync client, we now also run it in the background.
Checklist
interface-CHANGELOG.md