Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trin is occasionally sending invalid/incomplete content #1610

Open
kdeme opened this issue Dec 20, 2024 · 0 comments
Open

Trin is occasionally sending invalid/incomplete content #1610

kdeme opened this issue Dec 20, 2024 · 0 comments

Comments

@kdeme
Copy link
Contributor

kdeme commented Dec 20, 2024

I noticed this issue when running fluffy and requesting (FindContent) a lot of block headers and bodies. Occasionally the block body send back fails validation. It fails already at the SSZ decoding step. The uTP socket went through proper connection closing (received the remote FIN).

Now this per se is perhaps not an actual issue. There might be some error cases where a uTP socket gets closed with an early FIN without having send all the bytes of the content. However what I am often seeing is that when validation fails a part of the content received is a valid part of the encoded block, but it is always the later part of the block body. A big chunk of bytes of the beginning of the content is missing.

Occasionally I also see the validation fail because 0 data bytes got send. Curious to hear when that would occur, but this is probably more of an normal error case where a socket got closed early.

I have seen it only so far with block bodies, but I am only testing block headers & block bodies. And headers don't require uTP connection, so perhaps it will occur for any data requested that is bigger than the discv5 payload size.

Manually requesting (JSON-RPC FindContent) the same content again from the same node usually works (= valid data) so it is an issue that happens rarely. I have however been able to reproduce this by retrying it semi-manually in a loop.

Typical error logs from requesting fluffy node:

WRN 2024-12-20 17:50:22.843+01:00 Validation of block body failed            topics="portal_hist" error="SSZ PortalBlockBodyLegacy: object dynamic portion starts at invalid offset" node=enr:-LS4QCJ4igzi-8VZzG7fuwwzzbab3u5kgDo2XlpiyBdzhSgsdxpnIJo1HjD3jVydTpBOyhJHvQTPHXau9YnykxZPjCuEZ1tVrWOqdCBjOTNlMTI5ZGNmYTAwNmM1YjQwZjY3M2I3Yzk4OGU5MjNkZGViNmI2gmlkgnY0gmlwhI_0qIWJc2VjcDI1NmsxoQIIu0viHReNIKnjhuSCXDMcC6TLKEDF5oMmqogiPISN4oN1ZHCCIzE contentKey=0x010e8434574dd477d537a77007c7607715ae73fff4447347abd320b8255fd548f5

And the case of data bytes send = 0:

WRN 2024-12-20 17:50:16.730+01:00 Validation of block body failed            topics="portal_hist" error="SSZ PortalBlockBodyLegacy: input of insufficient size" node=enr:-LS4QKBFkP1FnZr7sdzj-bicU6KhWCtpRby6leiRTjO8MgNmOM2mzRLvC9okXvI57OP69SjI9DTGEAzKj6w2VPIkuKuEZ1tWemOqdCBjOTNlMTI5ZGNmYTAwNmM1YjQwZjY3M2I3Yzk4OGU5MjNkZGViNmI2gmlkgnY0gmlwhLKAMrKJc2VjcDI1NmsxoQIjF3KPpXlg6aJ10rzOCiBGTMIkhuEZAGssSi8VlhQ-vIN1ZHCCIzE contentKey=0x01be29a407ad4522d25366cc2e7913bb6005c965fca007af7e6f6773fe29a7445e

I think it is also reproducible with a Trin node as requesting node, which makes me deduce that the issue is on the content sender side (hence the creation of this issue). And I've only seen it occur so far with Trin nodes as sender of the content. But I am not 100% sure as it depends on what the error message I am seeing actually mean (see logs below).

To reproduce:

# Run trin, I used v0.1.0
RUST_LOG=info cargo run -- --web3-http-address http://127.0.0.1:8547 --web3-transport http --discovery-port 9010 --bootnodes default --mb 200 --no-stun

Run:

while true; do curl -s -X POST -H 'Content-Type: application/json' -d '{"jsonrpc":"2.0","id":"1","method":"portal_historyFindContent","params":["enr:-LS4QKsj-3uj6kWH0rLl8-LOf3eeI7o6TkGNk5OMXecV4ETwY0u-F8JqCfAD1P6OusOFZoENox95jBmOvXx5dIK5Fn2EZ1tWG2OqdCBjOTNlMTI5ZGNmYTAwNmM1YjQwZjY3M2I3Yzk4OGU5MjNkZGViNmI2gmlkgnY0gmlwhIDHliOJc2VjcDI1NmsxoQOYfB94zD_LtTgL22BsteJJUnzT8mEQ9BxV3ASaXXuhcYN1ZHCCIzE", "0x017ec54d5714a58002c993856e0d952452fe39d494808647fbdf3ebcf6e3545b9c"]}' http://localhost:8547 | jq ; sleep 2; done

It might take a while but eventually you will hit a response which will take much longer and look like:

{
  "jsonrpc": "2.0",
  "id": "1",
  "error": {
    "code": -32099,
    "message": "FindContent request timeout: FailedValidation(\"Network: History, Reason: Content validation failed for content key BlockBody(BlockBodyKey { block_hash: [126, 197, 77, 87, 20, 165, 128, 2, 201, 147, 133, 110, 13, 149, 36, 82, 254, 57, 212, 148, 128, 134, 71, 251, 223, 62, 188, 246, 227, 84, 91, 156] }) with error: Block Body content has invalid encoding: BytesInvalid(\\\"Invalid block body ssz bytes\\\")\")"
  }
}

or

{
  "jsonrpc": "2.0",
  "id": "1",
  "error": {
    "code": -32099,
    "message": "Failed to parse error message from history subnetwork: Error(\"expected value\", line: 1, column: 1)"
  }
}

I'm not sure which/if one of those relates to the error case of the beginning of the content that is missing, but I assuming here that one of the two does, hence the creation of this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant