
ChainUpdates: model bad peer behavior #3856

Closed
wants to merge 14 commits into master from amesgen/CAD-4314-model-bad-peers

Conversation

@amesgen amesgen commented Jul 4, 2022

Description

Closes CAD-4314

Previously, we did not generate any ChainUpdate behavior modeling a malicious or ill-configured peer. This meant that existing tests could only check that our disconnection routines are not overly restrictive, but not that they are strict enough.

This PR adds that possibility, namely by randomly toggling some blocks to be invalid. As the resulting updates do not necessarily model bad behavior, we also add a function to "classify" a sequence of chain updates after the fact.
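
For intuition, here is a minimal, self-contained sketch of what such an after-the-fact classification could look like; all names are hypothetical, the real (and richer) code lives in Test.Util.ChainUpdates:

```haskell
-- Hypothetical sketch, not the PR's actual API: after generating a sequence
-- of chain updates (some blocks randomly toggled to be invalid), classify it
-- as modeling valid or invalid peer behavior.
data Validity = Valid | Invalid
  deriving (Show, Eq)

data ChainUpdate blk
  = AddBlock blk          -- extend the current chain by one block
  | SwitchFork Int [blk]  -- roll back a number of blocks, then add new ones

-- A sequence models bad behavior iff it ever serves a block marked invalid.
classifyUpdates :: (blk -> Bool) -> [ChainUpdate blk] -> Validity
classifyUpdates isInvalidBlock updates
    | any hasInvalid updates = Invalid
    | otherwise              = Valid
  where
    hasInvalid (AddBlock b)      = isInvalidBlock b
    hasInvalid (SwitchFork _ bs) = any isInvalidBlock bs
```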

Concerning the BlockFetch client test: as certain invalid behavior is only caught by the ChainSync client, we now also run a ChainSync client in the background.

Checklist

  • Branch
    • Commit sequence broadly makes sense
    • Commits have useful messages
    • New tests are added if needed and existing tests are updated
    • If this branch changes Consensus and has any consequences for downstream repositories or end users, said changes must be documented in interface-CHANGELOG.md
    • If this branch changes Network and has any consequences for downstream repositories or end users, said changes must be documented in interface-CHANGELOG.md
    • If serialization changes, user-facing consequences (e.g. replay from genesis) are confirmed to be intentional.
  • Pull Request
    • Self-reviewed the diff
    • Useful pull request description at least containing the following information:
      • What does this PR change?
      • Why were these changes needed?
      • How does this affect downstream repositories and/or end-users?
      • Which ticket does this PR close (if any)? If it does, is it linked?
    • Reviewer requested

@amesgen amesgen requested review from nfrisby and dnadales as code owners July 4, 2022 07:48
@amesgen amesgen added the consensus (issues related to ouroboros-consensus) and testing labels Jul 4, 2022
@amesgen amesgen force-pushed the amesgen/CAD-4314-model-bad-peers branch from 10579e5 to 644413a on July 6, 2022 12:47
@iohk-bors iohk-bors bot deleted the branch master July 6, 2022 15:30
@iohk-bors iohk-bors bot closed this Jul 6, 2022
@amesgen amesgen reopened this Jul 6, 2022
@amesgen amesgen changed the base branch from amesgen/CAD-4193-BlockFetch-client-test to master July 6, 2022 16:01
@amesgen amesgen force-pushed the amesgen/CAD-4314-model-bad-peers branch from 644413a to 205193d on July 14, 2022 16:30
@amesgen amesgen requested a review from a team as a code owner July 14, 2022 16:30
@amesgen amesgen force-pushed the amesgen/CAD-4314-model-bad-peers branch from 205193d to 77852a5 on July 14, 2022 16:37
@amesgen amesgen removed the request for review from a team July 14, 2022 16:37
@amesgen amesgen force-pushed the amesgen/CAD-4314-model-bad-peers branch 2 times, most recently from c542272 to f0aa7d0 on July 21, 2022 08:36
The BFT protocol uses block numbers as its `SelectView`, so the induced order on
blocks is not total. When we generate a rollback, we usually end up with the
same chain length, which then does not improve upon the previous chain.

Right now, the peer disconnection logic relies on the interplay of the
BlockFetch and the ChainSync client to catch invalid behavior, so this commit
adds an actual ChainSync client for every peer.

Concretely, consider the case when a peer wants to extend an invalid block. In
that case, the ChainSync client will disconnect, either when the extending
header is received, or via the invalid block rejector in a background thread.

In contrast, when we simply add a block (together with a punishment) to the
ChainDB, this punishment will *not* be enacted: the block is never validated,
as it is not reachable via any (valid) block in the VolDB.
@amesgen amesgen force-pushed the amesgen/CAD-4314-model-bad-peers branch from f0aa7d0 to cae6ffd on October 18, 2022 16:22
@nfrisby (Contributor) left a comment

Nothing major. Lots of small things and a couple of open-ended questions.

4 resolved review threads (outdated) on ouroboros-consensus-test/src/Test/Util/ChainUpdates.hs
@nfrisby (Contributor) left a comment

Finished a full pass. Similar kinds of comments as in the review above.

Invalid ->
  counterexample "Invalid behavior not caught" $
    tabulateFailureMode $
      property (isLeft blockFetchRes || isLeft chainSyncRes)
Contributor:

Please write a short comment here explaining why either might catch it and why it's OK whichever does.

@@ -181,8 +221,6 @@ runBlockFetchTest BlockFetchClientTestSetup{..} = withRegistry \registry -> do
       bfClient
       bfServer

-      -- On every tick, we schedule updates to the shared chain fragment
-      -- (mocking ChainSync).
       forkTicking peerId =
Contributor:

You could preserve the comment, saying that it's the ChainSync server that's being mocked.

varChains <- uncheckedNewTVarM Map.empty
varControlMessage <- uncheckedNewTVarM Mux.Continue
varFetchedBlocks <- uncheckedNewTVarM (0 <$ peerUpdates)
varCandidates <- uncheckedNewTVarM Map.empty
Contributor:

What are the invariants among these?

E.g., TickWatcher updates both varChains and producerStateVars. It seems like those two are two views onto "the peer's current chain". Is that right? Maybe combine them into a single map of pairs?
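
For illustration, a minimal sketch of the "single map of pairs" idea, with placeholder types and hypothetical names rather than the test's actual state:

```haskell
import           Control.Concurrent.STM (TVar, newTVarIO)
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

type PeerId = Int                   -- placeholder
data Chain = Chain                  -- placeholder for the peer's current chain
data ProducerState = ProducerState  -- placeholder for the mocked server state

-- One TVar holding a map of pairs, so the two views onto "the peer's current
-- chain" cannot get out of sync by construction.
newPeerState :: IO (TVar (Map PeerId (Chain, ProducerState)))
newPeerState = newTVarIO Map.empty
```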


forkChainSync peerId =
-  forkLinkedThread registry ("BracketSync" <> condense peerId) $
+  forkThread registry ("BracketSync" <> condense peerId) $
Contributor:

Hmm. Would it be equivalent to keep it a linked thread, but catch ChainSyncClientException, record them, and then terminate successfully. That way any (unexpected) other kind of exception would still be propagated via the link?
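
A minimal sketch of this pattern, with a stand-in exception type instead of the real ChainSyncClientException:

```haskell
import Control.Concurrent.STM (TVar, atomically, modifyTVar)
import Control.Exception (Exception, SomeException, fromException, throwIO, try)

-- Stand-in for ChainSyncClientException (the real type lives in
-- Ouroboros.Consensus.MiniProtocol.ChainSync.Client).
newtype ExpectedClientException = ExpectedClientException String
  deriving (Show)

instance Exception ExpectedClientException

-- Run the client action such that the *expected* exception is recorded and
-- the thread terminates successfully, while any other exception is rethrown
-- and hence still propagated via the thread link.
recordExpected :: TVar [ExpectedClientException] -> IO () -> IO ()
recordExpected var client = do
    res <- try client
    case res of
      Right () -> pure ()
      Left e
        | Just ce <- fromException e -> atomically $ modifyTVar var (ce :)
        | otherwise                  -> throwIO (e :: SomeException)
```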

Contributor:

Actually, I have the same question about the BlockFetch clients too, I suppose.

Member Author (@amesgen):

That also sounds like a possible approach; I am not necessarily opposed. I think the main thing here is that we would then no longer be able to see independently whether ChainSync or BlockFetch threw an exception (i.e. we would only have Either ChainSyncEx BlockFetchEx instead of These ChainSyncEx BlockFetchEx). Right now, we use this information e.g. for labelling, see tabulateFailureMode, which results in output like

      Expected failure due to (3388 in total):
      64.14% ChainSync
      34.65% BlockFetch,ChainSync
       1.21% BlockFetch
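
For concreteness, a toy sketch of why keeping both results matters for this labelling (hypothetical names; not the test's actual tabulateFailureMode):

```haskell
-- With both results available we can label *which* client(s) failed; a single
-- linked thread would only ever surface the first exception.
data FailureMode
  = ChainSyncOnly
  | BlockFetchOnly
  | Both
  deriving (Show, Eq)

failureMode :: Either csEx () -> Either bfEx () -> Maybe FailureMode
failureMode chainSyncRes blockFetchRes =
  case (chainSyncRes, blockFetchRes) of
    (Left _,  Left _ ) -> Just Both
    (Left _,  Right _) -> Just ChainSyncOnly
    (Right _, Left _ ) -> Just BlockFetchOnly
    (Right _, Right _) -> Nothing
```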

topLevelConfig
chainDbView
ntnVersion
(pure Mux.Continue)
Contributor:

Does this mean the ChainSync client never terminates itself successfully?

... I would guess that the bracketSyncWithFetchClient terminates the ChainSync client (successfully?) when the BlockFetch client terminates itself successfully?

Member Author (@amesgen):

I agree that "graceful" termination would be somewhat nicer, but the straightforward approach (reading from a TVar here, which we set to Terminate at the end) does not work. Maybe one just needs to adapt the ChainSync client implementation in a few places to better respect these control messages; right now, we only do that here:

https://github.com/input-output-hk/ouroboros-network/blob/2793b6993c8f6ed158f432055fa4ef581acdb661/ouroboros-consensus/src/Ouroboros/Consensus/MiniProtocol/ChainSync/Client.hs#L631

In any case, the same thing is present in the actual ChainSync client test:

https://github.com/input-output-hk/ouroboros-network/blob/2793b6993c8f6ed158f432055fa4ef581acdb661/ouroboros-consensus-test/test-consensus/Test/Consensus/MiniProtocol/ChainSync/Client.hs#L296

@nfrisby (Contributor) commented Nov 28, 2022

We chatted. It's awkward: the ChainSync client ignores the control message until it returns to the top of its loop. But in this case, at the end of the test, we won't be sending another message, so it will ignore the control message indefinitely.

Also, other tests do this. So it seems like a fine "over"-simplification to keep.

Edit: ooof, but I do want #3856 (comment) , so Esgen will try to see how much extra work it is to get this graceful termination to happen.
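
A minimal sketch of the graceful-termination idea under discussion, using a stand-in ControlMessage type rather than the real Mux API:

```haskell
import Control.Concurrent.STM (TVar, atomically, readTVar)

-- Stand-in for Mux.ControlMessage.
data ControlMessage = Continue | Terminate

-- The client re-reads the control message at the top of every iteration of
-- its loop and stops cleanly once the test finally sets it to Terminate.
clientLoop :: TVar ControlMessage -> IO () -> IO ()
clientLoop varControlMessage step = go
  where
    go = do
      cm <- atomically (readTVar varControlMessage)
      case cm of
        Terminate -> pure ()        -- terminate successfully
        Continue  -> step >> go     -- process one more message, then re-check
```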

bfcoPeerOutcomes <-
  flip Map.traverseWithKey blockFetchThreads \peerId threadId -> do
    blockFetchRes <- waitThread threadId
    chainSyncRes  <- (Map.! peerId) <$> readTVarIO varChainSyncRes
Contributor:

Hmm. Instead of this explicit mutvar stuff, could you also capture the ChainSync ThreadId, also send Mux.Terminate to the ChainSync client control vars, and read those threads' results the same as you're doing for BlockFetch? (Maybe you'd need to shutdown a peer's ChainSync before shutting down its BlockFetch, to appease bracketSyncWithFetchClient?)

Member Author (@amesgen):

Related to #3856 (comment)

    , InvalidChainBehavior
    ]
  numPeers <- case behaviorValidity behavior of
    -- Interaction of multiple clients can hide invalid behavior.
Contributor:

Oof, this is a disappointing gap. Let's chat on a call.

    genChainUpdates behavior maxRollback 20 >>= genSchedule strat
  where
    strat = case behaviorValidity behavior of
      -- Multiple updates per tick can hide invalid behavior.
Contributor:

Same

@amesgen (Member Author) commented Nov 28, 2022

Same point as in #3856 (comment)
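
To make the restriction being discussed concrete, a hypothetical sketch of the generator shape (names are illustrative, not the test's actual API):

```haskell
import Test.QuickCheck (Gen, choose)

data BehaviorValidity = ValidBehavior | InvalidBehavior

-- When the generated updates model invalid behavior, restrict to a single
-- peer (and, analogously, one update per tick), since interleaving with
-- other peers can mask the invalid behavior.
genNumPeers :: BehaviorValidity -> Gen Int
genNumPeers InvalidBehavior = pure 1
genNumPeers ValidBehavior   = choose (1, 3)
```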

The `serverUpdates` only depend on the `securityParam`, and the `clientUpdates`
additionally depend on `serverUpdates` due to `removeLateClientUpdates`.
@amesgen (Member Author) commented Nov 6, 2023

Superseded by IntersectMBO/ouroboros-consensus#492

@amesgen amesgen closed this Nov 6, 2023
@amesgen amesgen deleted the amesgen/CAD-4314-model-bad-peers branch November 6, 2023 10:02