Content that is available in the network "not found" depending on network key / node id #1596
Comments
Here is another one:
ENRs that fail but have the content:
Initially I wasn't able to reproduce the issue, but now I am, and I think that is an important fact for this issue. After some debugging, I think I know what the problem is. The receiving client has its kbuckets full and can't store the requesting Enr in them. While processing the request, at some later point we only have the NodeId and we try to find the Enr. We look for it in various places (the first one being the kbuckets). From what I observed, it's possible that we find a stale Enr (e.g. with a wrong ip_address or port) and try to send the data to it. We had similar problems in the past and it seems they are not fully fixed (one related PR: #935). I'm going to create a PR that adds extra debug logs to confirm my suspicion.
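To make the suspected failure mode concrete, here is a minimal sketch, assuming a simplified routing table and hypothetical type names (this is not Trin's actual API): a NodeId-to-Enr lookup that falls back across sources and can hand back a stale record when the kbuckets were too full to store the fresh Enr from the request.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

// Hypothetical, simplified types -- a sketch of the suspected lookup path,
// not Trin's actual routing table or discv5 API.
type NodeId = [u8; 32];

#[derive(Clone)]
struct Enr {
    socket: SocketAddr, // ip + udp port advertised in the record
    seq: u64,           // record sequence number
}

struct RoutingTable {
    kbuckets: HashMap<NodeId, Enr>,       // full buckets may refuse new inserts
    fallback_cache: HashMap<NodeId, Enr>, // older records seen elsewhere
}

impl RoutingTable {
    /// Lookup done later during request processing, when only the NodeId is left.
    /// If the fresh Enr from the request was never stored (buckets full), this
    /// can return an older record with an outdated ip/port.
    fn find_enr(&self, id: &NodeId) -> Option<Enr> {
        self.kbuckets
            .get(id)
            .or_else(|| self.fallback_cache.get(id))
            .cloned()
    }
}

fn send_content_over_utp(requester: &NodeId, table: &RoutingTable) -> Result<(), String> {
    // The uTP destination is derived from whatever Enr the lookup returns. If
    // that record is stale, the transfer is attempted towards the wrong
    // address and times out, even though the FindContent request arrived fine.
    let enr = table
        .find_enr(requester)
        .ok_or_else(|| "no Enr known for requester".to_string())?;
    println!("dialing uTP to {} (Enr seq {})", enr.socket, enr.seq);
    Ok(())
}

fn main() {
    // The requester's fresh Enr never made it into the kbuckets, but an old
    // record with a different port is still sitting in the fallback cache.
    let requester: NodeId = [1u8; 32];
    let stale = Enr {
        socket: "10.0.0.5:9001".parse().unwrap(),
        seq: 1,
    };
    let table = RoutingTable {
        kbuckets: HashMap::new(), // pretend the relevant bucket was full
        fallback_cache: HashMap::from([(requester, stale)]),
    };
    let _ = send_content_over_utp(&requester, &table);
}
```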
I don't really understand why you need the ENR. Is that some specific discv5 module or utp-over-discv5 module interface thing? Because in theory you don't really need that ENR to be able to respond, as long as you have the
I believe that we get the IP address + port from the Enr.
Yes, but you will have those from the request and thus, in theory, can pass them along for the response & uTP setup. So I assume it is a module-specific interface design that makes you require this.
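As a rough illustration of that suggestion (again hypothetical names, not an actual Trin or discv5 interface), the handler could carry the requester's observed socket address from the incoming FindContent request through to the uTP setup, so that no Enr lookup by NodeId is needed at that point:

```rust
use std::net::SocketAddr;

// Hypothetical request type: what the transport already knows when the
// FindContent request arrives.
struct InboundRequest {
    node_id: [u8; 32],
    source: SocketAddr, // address the request actually came from
}

// Everything needed to set up the return uTP transfer, captured at request time.
struct UtpTarget {
    node_id: [u8; 32],
    socket: SocketAddr,
}

fn handle_find_content(req: &InboundRequest) -> UtpTarget {
    // Instead of re-resolving the Enr from the NodeId later (which may return
    // a stale record), reuse the socket address observed on the request itself.
    UtpTarget {
        node_id: req.node_id,
        socket: req.source,
    }
}

fn main() {
    let req = InboundRequest {
        node_id: [2u8; 32],
        source: "192.0.2.10:9009".parse().unwrap(),
    };
    let target = handle_find_content(&req);
    println!(
        "uTP transfer for node {:02x?}... will go to {}",
        &target.node_id[..4],
        target.socket
    );
}
```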
@morph-dev pointed out that the issue seems to be occurring when, for the same NodeId, a stale Enr is found. After looking back at my original tests, I did notice the following:
So this definitely looks like a plausible case for why it would fail. With the current design in Trin, where it needs the ENR data instead of re-using the IP/port from the original request, this would indeed occasionally hit this issue. Note that this is a valid scenario and will occasionally cause NAT'ed peers without a proper network setup to hit this. I did some new tests:
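Independent of those tests, here is a small sketch of the kind of Enr-versus-observed-address mismatch described above, the thing that extra debug logging could surface (hypothetical types and made-up example addresses, not Trin code):

```rust
use std::net::SocketAddr;

// Hypothetical, simplified stored record for illustration.
struct StoredEnr {
    seq: u64,
    socket: SocketAddr, // address advertised in the stored (possibly stale) Enr
}

/// The kind of check extra debug logging could perform while handling a request:
/// compare the address we would dial (from the stored Enr) with the address the
/// request was actually received from.
fn log_address_mismatch(node_id_hex: &str, stored: &StoredEnr, observed: SocketAddr) {
    if stored.socket != observed {
        eprintln!(
            "WARN: node {node_id_hex}: stored Enr (seq {}) points at {}, but the \
             request came from {}; uTP setup towards the stored address will likely fail",
            stored.seq, stored.socket, observed
        );
    }
}

fn main() {
    // Made-up example of a NAT'ed peer: the stored record still carries an
    // internal address, while the request arrived from the public one.
    let stored = StoredEnr {
        seq: 3,
        socket: "192.168.1.20:9000".parse().unwrap(),
    };
    let observed: SocketAddr = "203.0.113.7:31337".parse().unwrap();
    log_address_mismatch("0xabc...", &stored, observed);
}
```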
This is an issue that I first encountered when running a Fluffy node, but the same is reproducible with a Trin node.
The issue:
For some content key(s) there are deterministic failures when doing a `RecursiveFindContent` from a node with a specific `NodeId` / network key. This traces back to the node(s) that store the content failing to set up a uTP connection to retrieve the data. The node that is looking for the content does end up requesting it from the node that has the content.

When then directly targeting the node that has the content with a `FindContent` request, this will fail. Changing the network key and running the same call will most likely work (unless perhaps you hit by chance another `NodeId` or `NodeId` range that does not work? I'm not sure about this exactly, as I don't understand what causes this in the first place).
How to reproduce the issue
Here is a way to reproduce this for one specific content key / content id combo.
1. Build Trin: `cargo build`
2. Run Trin with the specific network key: `cargo run -- --web3-transport http --unsafe-private-key "0x24b336eb9522dbe3afa8f31a2ec277333538ff211480d32633c45c29477bebe8"`
3. Call the `FindContent` method:
curl -s -X POST -H 'Content-Type: application/json' -d '{"jsonrpc":"2.0","id":"1","method":"portal_historyFindContent","params":["enr:-LS4QJi4_7PnH3BsKPej9J8--A1AWczIYojJIU_NWdWCIgFbflGJ4f4x35InkIMyQuBdz_XCaNx44V2X1O-dw_zprFKEZyImyGOqdCA2ZmQ0NzRmYWQ2MjNhNWY2MzVmOTFiOWVmZGE2MjI3NmJiYjE2MmZkgmlkgnY0gmlwhI_0qIWJc2VjcDI1NmsxoQIIu0viHReNIKnjhuSCXDMcC6TLKEDF5oMmqogiPISN4oN1ZHCCIzE", "0x018a24f51c42f5c1e216351c6c2ab29d2ae25fc4f366ea690a4e13c640844412e7"]}' http://localhost:8545 | jq
For Trin this will result in this not-so-great error message:
But if you look at the Trin node logs, there is also this error:
Now run Trin without the `--unsafe-private-key` flag (so a different network key is used) and repeat the same `FindContent` call:
cargo run -- --web3-transport http
curl -s -X POST -H 'Content-Type: application/json' -d '{"jsonrpc":"2.0","id":"1","method":"portal_historyFindContent","params":["enr:-LS4QJi4_7PnH3BsKPej9J8--A1AWczIYojJIU_NWdWCIgFbflGJ4f4x35InkIMyQuBdz_XCaNx44V2X1O-dw_zprFKEZyImyGOqdCA2ZmQ0NzRmYWQ2MjNhNWY2MzVmOTFiOWVmZGE2MjI3NmJiYjE2MmZkgmlkgnY0gmlwhI_0qIWJc2VjcDI1NmsxoQIIu0viHReNIKnjhuSCXDMcC6TLKEDF5oMmqogiPISN4oN1ZHCCIzE", "0x018a24f51c42f5c1e216351c6c2ab29d2ae25fc4f366ea690a4e13c640844412e7"]}' http://localhost:8545 | jq
It will properly download and show the block body.
As mentioned, the issue seems to be rather on the other end, the node that has the content. The reason I'm creating an issue here on the Trin repo is that so far the nodes that I've seen fail on this were Trin nodes. On the requesting side it is reproducible with both Trin and Fluffy nodes.
fyi: I used tag `v0.1.0` to build Trin from.

Linking to the Fluffy issue that I created here first: status-im/nimbus-eth1#2901