Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename portal_*RecursiveFindContent to portal_*GetContent #344

Merged
merged 9 commits into from
Oct 14, 2024

Conversation

bhartnett
Copy link
Contributor

@bhartnett bhartnett commented Oct 1, 2024

I recently started working on building a portal rpc client library to be used in Nimbus to fetch the history network data, headers, bodies and receipts from portal in preparation for EIP-4444. When implementing the code I realized that as a user of the portal RPC API it would be nice to have a higher level endpoint that is able to lookup the content in the local database first before trying to fetch the content from the network.

Without this new endpoint I had to first call portal_historyLocalContent, and then if a not found error was returned then call portal_historyRecursiveFindContent to look up the content over the network. This works but doesn't store the fetched content in the local database unless I also call portal_historyStore after getting the content from the network lookup. Storing the content in the local database after fetching from the network is a good idea so that a second call for the same content can simply get the data from the local database (assuming that the data is within the node's radius).

Having to do multiple RPC calls isn't ideal and it would be helpful to have an endpoint that does the following:

  1. Fetch the content from the local database and return it if it exists.
  2. Otherwise fetch the content from the network using recursive find content.
  3. If the content is found, validate the content before storing in the local database then return the content.

This PR introduces portal_historyGetContent, portal_stateGetContent and portal_beaconGetContent RPC endpoints which should behave as described above.

@morph-dev
Copy link
Contributor

Just want to point out that there was recent discussion about portal_historyRecursiveFindContent query and that it seems that different clients behave differently. My understanding is that "trin" first queries local database, and only then tries to get data from the network, while "fluffy" skips local lookup.

Might be good opportunity to clarify behavior of that endpoint as well.

"$ref": "#/components/contentDescriptors/ContentKey"
}
],
"result": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method will also require a ContentNotFoundError, see PR: #343.

Copy link
Contributor Author

@bhartnett bhartnett Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add once that PR is merged.

},
{
"name": "portal_beaconGetContent",
"summary": "Get content from the local database if it exists, otherwise look up the target content key in the network.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think we need to somewhere explicitly specify that the content needs to get verified by the RPC server and throw an error if verification fails.

Else we will end up with differences between clients like we have now for the portal_*RecursiveFindContent method (which as @morph-dev mentioned, we should clearify also, but perhaps in another PR)

Copy link
Contributor Author

@bhartnett bhartnett Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll add more detail to the description in that case.

@bhartnett
Copy link
Contributor Author

Just want to point out that there was recent discussion about portal_historyRecursiveFindContent query and that it seems that different clients behave differently. My understanding is that "trin" first queries local database, and only then tries to get data from the network, while "fluffy" skips local lookup.

Might be good opportunity to clarify behavior of that endpoint as well.

Yes I agree, it's something that should be discussed and resolved if possible. My first impression from looking through the current JSON-RPC spec is that it isn't very clear or detailed about the expected behavior having only a single line description. For the portal_historyRecursiveFindContent query @kdeme 's point was that looking up the content from the database isn't according to spec and I would agree based on my reading of the description for that endpoint.

My thoughts are that it might be useful to have portal_historyRecursiveFindContent just do the network lookup which can be used for debugging or other specific use cases and portal_historyGetContent could be the 'higher level' query that avoids having to do multiple RPC calls when the data needs to be queried from the database or the network.

@pipermerriam
Copy link
Member

It will be meaningful to expand the hive test coverage asap to cover these JSON-RPC endpoint semantics so that our clients behave the same, otherwise we're heading down the same path as the original batch of execution clients which all had different behavior and even different payloads for a long time.

Copy link
Member

@pipermerriam pipermerriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review the actual specifics of implementation, but I support the overall direction this takes. Moving trin's current behavior towards this API seems like the right move.

I also agree on adding specific language about which endpoints should validate data would be a big improvement as well.

@bhartnett
Copy link
Contributor Author

bhartnett commented Oct 2, 2024

It will be meaningful to expand the hive test coverage asap to cover these JSON-RPC endpoint semantics so that our clients behave the same, otherwise we're heading down the same path as the original batch of execution clients which all had different behavior and even different payloads for a long time.

Yes it would be good to have hive verify the behavior of the portal JSON-RPC API in clients. Probably a good starting point would be to expand the spec description of all methods where there is any ambiguity so that all teams clearly understand the expected behavior. That can be done in a separate PR.

@acolytec3
Copy link
Contributor

Just want to point out that there was recent discussion about portal_historyRecursiveFindContent query and that it seems that different clients behave differently. My understanding is that "trin" first queries local database, and only then tries to get data from the network, while "fluffy" skips local lookup.

Might be good opportunity to clarify behavior of that endpoint as well.

I can confirm that ultralight currently queries the local DB first before going to the network for this.

@KolbyML
Copy link
Member

KolbyML commented Oct 2, 2024

I didn't review the actual specifics of implementation, but I support the overall direction this takes. Moving trin's current behavior towards this API seems like the right move.

I also agree on adding specific language about which endpoints should validate data would be a big improvement as well.

portal_historyRecursiveFindContent already does this in Trin. We should just write a Hive test to ensure all clients check locally first.

^ Because of this, I don't think we should add portal_*GetContent, and instead, we should better define portal_historyRecursiveFindContent instead of making 2 duplicate RPC methods. I don't think these methods are distinct I think this is just a situation of us missing a test case on Hive. If we want to rename portal_historyRecursiveFindContent I am open to it, but other then that this is just a sign we need more documentation/testing

@KolbyML
Copy link
Member

KolbyML commented Oct 2, 2024

@kdeme @pipermerriam @bhartnett ethereum/hive#1170 here is a test to make sure all implementations of portal_*RecursiveFindContent checks locally before initiating the RFC.

If you could review it that would be nice as well

Copy link
Member

@KolbyML KolbyML left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

portal_*RecursiveFindContent is supposed to do what this PR proposes. This situation was caused by us not having test coverage for this in hive. So certain implementations had different outcomes. To rectify the situation ethereum/hive#1170 I have written a new Hive test to ensure all Portal Clients match the expected behavior of checking their local DB first before initiating a RFC when the portal_*RecursiveFindContent
method is called.

@KolbyML
Copy link
Member

KolbyML commented Oct 2, 2024

I will also prioritize getting negative cases for History done (so validation is tested). Every Portal JSON-RPC method should validate the data it fetches from the network. An error should be thrown if your Portal Client can't validate something retrieved from the network. We currently do this in Trin

I think this assumption is required for any JSON-RPC method that retrieves information from the network. So if there is any ambiguity in this, more tests should be written (I have been planning to do it, I just didn't think it was a priority till now) to prove validation is being done.

Also we should update our documentation to make this requirement more obvious. If the data isn't validated it might as well be considered malicious.

@pipermerriam
Copy link
Member

I don't feel strongly about this but I'll explain my reasoning for my loose support for this endpoint.

The portal_historyRecursiveFindContent directly correlates to the RFC functionality and it somewhat implies that it is getting the content from the network. It's good UX to give it back from the database... however this behavior could be considered to be surprising. I think of portal_historyRecursiveFindContent as a low level API.

The proposed endpoint is obviously a high level API, and having it handle the negotiation of how and where it gets the content is less surprising.

This allows for an endpoint with good UX (the getContent one), and an endpoint that is low level and good for either low level use cases or simply for testing...

I'm not sure these reasons should be considered compelling, but from a purely architectural and design perspective, the proposed solution seems clean. Kolby's direction of just getting the tests to enforce the current trin behavior are simple and easy and pragmatic. That's the extent to which I'll weigh in here. No need to prove me wrong as I'm not sure I'm correct.

@KolbyML
Copy link
Member

KolbyML commented Oct 3, 2024

I don't feel strongly about this but I'll explain my reasoning for my loose support for this endpoint.

The portal_historyRecursiveFindContent directly correlates to the RFC functionality and it somewhat implies that it is getting the content from the network. It's good UX to give it back from the database... however this behavior could be considered to be surprising. I think of portal_historyRecursiveFindContent as a low level API.

The proposed endpoint is obviously a high level API, and having it handle the negotiation of how and where it gets the content is less surprising.

This allows for an endpoint with good UX (the getContent one), and an endpoint that is low level and good for either low level use cases or simply for testing...

I'm not sure these reasons should be considered compelling, but from a purely architectural and design perspective, the proposed solution seems clean. Kolby's direction of just getting the tests to enforce the current trin behavior are simple and easy and pragmatic. That's the extent to which I'll weigh in here. No need to prove me wrong as I'm not sure I'm correct.

I don't see the value in a low-level portal_historyRecursiveFindContent, which only does an RFC and nothing else, not even validation. The RFC mechanism itself is implementation-specific, so there isn't much to test there, and hence there is no value in exclusively exposing this. If users want lower-level access, they won't use the JSON-RPC because it is slow to begin with (a lot of encoding/decoding). They will build on top of a Portal implementation itself, basically building their software as an extension. Portal's JSON-RPC isn't meant to be a low-level interface; it is meant to be functional for users and also nice to use to test if implementations are specification compliant.

This allows for an endpoint with good UX (the getContent one), and an endpoint that is low level and good for either low level use cases or simply for testing...

There would be no advantage for testing by having both portal_*GetContent and portal_*RecursiveFindContent. The proposed changes do not make it possible to test more edge cases of the specification, so the amount of specification details we could test would remain the same. There wouldn't be a performance difference, really, as well, and validation would be required regardless (for reasons said in my earlier posts). So the only distinction would be checking storage first and storing the content if found; the RFC is the bottleneck, so removing checking storage and storing the content if found wouldn't really be an optimization. So I don't see a reason for having 2 different methods. If anything, checking storage and storing the content if found is a performance optimization

Currently, in Trin, portal_historyRecursiveFindContent implements everything this PR outlines. If the name portal_*RecursiveFindContent is non-descriptive and hence the name is bad UX, we can rename it to portal_*GetContent. I don't think the "low-level" stripped-down portal_*RecursiveFindContent use case exists in any real manner. As I stated earlier the outline is already the expectation of portal_*RecursiveFindContent in Trin and should be for the rest of the clients as well. I will admit we are missing some testing in Hive in that regard, and that was an oversight by me (I didn't think of these edge cases well writing the tests)

TL:DR if the name portal_*RecursiveFindContent is bad UX, we can rename it to portal_*GetContent. I can't think of anything we gain by making an rpc call which only does RecursiveFindContent. We must validate the content regardless. Having a RPC method which only does RFC wouldn't help with testing the specification or have any meaningful performance impacts. If someone is concerned with performance they won't use the JSON-RPC in the first place and instead opt to build an extension ontop of the Portal Client implementation bypassing the JSON-RPC althougher.

@bhartnett bhartnett requested a review from kdeme October 3, 2024 03:39
@bhartnett
Copy link
Contributor Author

I didn't review the actual specifics of implementation, but I support the overall direction this takes. Moving trin's current behavior towards this API seems like the right move.
I also agree on adding specific language about which endpoints should validate data would be a big improvement as well.

portal_historyRecursiveFindContent already does this in Trin. We should just write a Hive test to ensure all clients check locally first.

^ Because of this, I don't think we should add portal_*GetContent, and instead, we should better define portal_historyRecursiveFindContent instead of making 2 duplicate RPC methods. I don't think these methods are distinct I think this is just a situation of us missing a test case on Hive. If we want to rename portal_historyRecursiveFindContent I am open to it, but other then that this is just a sign we need more documentation/testing

I would argue that the spec should be updated to make the expected behavior clear and hive testing is secondary to this but of course still required.

@bhartnett
Copy link
Contributor Author

I don't feel strongly about this but I'll explain my reasoning for my loose support for this endpoint.
The portal_historyRecursiveFindContent directly correlates to the RFC functionality and it somewhat implies that it is getting the content from the network. It's good UX to give it back from the database... however this behavior could be considered to be surprising. I think of portal_historyRecursiveFindContent as a low level API.
The proposed endpoint is obviously a high level API, and having it handle the negotiation of how and where it gets the content is less surprising.
This allows for an endpoint with good UX (the getContent one), and an endpoint that is low level and good for either low level use cases or simply for testing...
I'm not sure these reasons should be considered compelling, but from a purely architectural and design perspective, the proposed solution seems clean. Kolby's direction of just getting the tests to enforce the current trin behavior are simple and easy and pragmatic. That's the extent to which I'll weigh in here. No need to prove me wrong as I'm not sure I'm correct.

I don't see the value in a low-level portal_historyRecursiveFindContent, which only does an RFC and nothing else, not even validation. The RFC mechanism itself is implementation-specific, so there isn't much to test there, and hence there is no value in exclusively exposing this. If users want lower-level access, they won't use the JSON-RPC because it is slow to begin with (a lot of encoding/decoding). They will build on top of a Portal implementation itself, basically building their software as an extension. Portal's JSON-RPC isn't meant to be a low-level interface; it is meant to be functional for users and also nice to use to test if implementations are specification compliant.

This allows for an endpoint with good UX (the getContent one), and an endpoint that is low level and good for either low level use cases or simply for testing...

There would be no advantage for testing by having both portal_*GetContent and portal_*RecursiveFindContent. The proposed changes do not make it possible to test more edge cases of the specification, so the amount of specification details we could test would remain the same. There wouldn't be a performance difference, really, as well, and validation would be required regardless (for reasons said in my earlier posts). So the only distinction would be checking storage first and storing the content if found; the RFC is the bottleneck, so removing checking storage and storing the content if found wouldn't really be an optimization. So I don't see a reason for having 2 different methods. If anything, checking storage and storing the content if found is a performance optimization

Currently, in Trin, portal_historyRecursiveFindContent implements everything this PR outlines. If the name portal_*RecursiveFindContent is non-descriptive and hence the name is bad UX, we can rename it to portal_*GetContent. I don't think the "low-level" stripped-down portal_*RecursiveFindContent use case exists in any real manner. As I stated earlier the outline is already the expectation of portal_*RecursiveFindContent in Trin and should be for the rest of the clients as well. I will admit we are missing some testing in Hive in that regard, and that was an oversight by me (I didn't think of these edge cases well writing the tests)

TL:DR if the name portal_*RecursiveFindContent is bad UX, we can rename it to portal_*GetContent. I can't think of anything we gain by making an rpc call which only does RecursiveFindContent. We must validate the content regardless. Having a RPC method which only does RFC wouldn't help with testing the specification or have any meaningful performance impacts. If someone is concerned with performance they won't use the JSON-RPC in the first place and instead opt to build an extension ontop of the Portal Client implementation bypassing the JSON-RPC althougher.

There was some discussion within the Nimbus team about possible use cases of the portal JSON-RPC API for fetching history data for EIP-4444. The two possible scenarios/use cases were raised that an execution client such as Nimbus might need:

  1. The EL uses the portal history data as a part of the sync process as a replacement for the devp2p eth/68 protocol in which case we want to download data over the network because we don't have it stored locally and we don't want to validate it in Fluffy because the EL will do the full validation as apart of the sync. The lower level implementation of portal_historyRecursiveFindContent without validation makes sense for this use case perhaps.
  2. After implementing EIP-4444 the EL will drop history data older than a year in which case if we want to look up a block older than a year such as for an RPC call then we want to fetch the data from portal but it should be validated because in this case it replaces a database lookup which is normally a trusted source and therefore the portal lookup should have similar trust assumptions and be validated too. For this scenario we would prefer the portal_historyGetContent endpoint.

Looking through the spec it appears that we already have many endpoints that could be considered low level such as portal_*Store, portal_*LocalContent for local database access so I'm not sure why we wouldn't want the same for the network/DHT.

I'm also curious about the expected or current behavior of the other endpoints in Trin and other clients. For portal_*Offer and portal_*Gossip do you store in the local database before sending to the peer/s?

@KolbyML
Copy link
Member

KolbyML commented Oct 3, 2024

There was some discussion within the Nimbus team about possible use cases of the portal JSON-RPC API for fetching history data for EIP-4444. The two possible scenarios/use cases were raised that an execution client such as Nimbus might need:

  1. The EL uses the portal history data as a part of the sync process as a replacement for the devp2p eth/68 protocol in which case we want to download data over the network because we don't have it stored locally and we don't want to validate it in Fluffy because the EL will do the full validation as apart of the sync. The lower level implementation of portal_historyRecursiveFindContent without validation makes sense for this use case perhaps.
  2. After implementing EIP-4444 the EL will drop history data older than a year in which case if we want to look up a block older than a year such as for an RPC call then we want to fetch the data from portal but it should be validated because in this case it replaces a database lookup which is normally a trusted source and therefore the portal lookup should have similar trust assumptions and be validated too. For this scenario we would prefer the portal_historyGetContent endpoint.

I would argue that if Nimbus EL was using Portal they wouldn't use the JSON-RPC API to begin with to fetch history. The encoding/decoding through the JSON-RPC would kill performance by itself. If anything, they would be using a channel directly, and Fluffy is just another running service.

So, both of the use cases you listed wouldn't be done through the Portal's JSON-RPC to begin with. Nimbus EL would send messages directly to Fluffy's service removing the overhead of the JSON-RPC. So I am not sure why Portal's JSON-RPC is being conflated with use cases where it doesn't make sense to use the JSON-RPC in the first place and instead opting for a more direct connection where Nimbus would have finer control over what fluffy does and the messages sent.

Maybe for a demo? But I don't think we should write our JSON-RPC specification for power users who won't use the JSON-RPC long-term anyway, if at all.

Also, if we wanted to turn validation off, that would be a method param. I don't see why it would have to be its own method.

Looking through the spec it appears that we already have many endpoints that could be considered low level such as portal_*Store, portal_*LocalContent for local database access so I'm not sure why we wouldn't want the same for the network/DHT.

I see portal_*Store and portal_*LocalContent as self-node maintenance endpoints

Use cases of portal_*Store,

  • seeding a database
  • setting up tests

Without this functionality it is extremely hard to do either of those.

For portal_*LocalContent

  • used to check if your client holds the value
  • checking to see if implementations store the expected objects

The way this PR defines portal_*GetContent is identical to how portal_*RecursiveFindContent works in Trin today.

RecursiveFindContent is implementation specific. portal_*Store and portal_*LocalContent are not implementation specific.

I don't think this is about exposing low-level functionality portal_*GetContent and portal_*RecursiveFindContent are for fetching content from the network if you are fetching content from the network any implementation worth it's salt would make obvious performance improvements and security guarantees

  • check the local database first because RFC is very expensive in comparison
  • validate the data
  • store the data if it fits the storage criteria

Having an RPC endpoint that exposed a raw RFC wouldn't help test anything that portal_*GetContent/portal_*RecursiveFindContent couldn't already test.

As I said earlier, it doesn't make sense for Nimbus EL to use a high-level connection such as the JSON-RPC; instead, they would opt for a lower-level channel or pipe, or however they call it in nim. Where they would have finer control over what is being done. I don't think the JSON-RPC makes sense for the Execution Client use case. Intensive web3 applications and rollups often build on top of ELs (not using the JSON-RPC) for a reason.

I'm also curious about the expected or current behavior of the other endpoints in Trin and other clients. For portal_*Offer and portal_*Gossip do you store in the local database before sending to the peer/s?

no, as every client that calls portal_*Offer and portal_*Gossip are bridges that don't really want to store it, they already have access to all of it or they wouldn't be able to gossip it in the first place.

So realistically, I don't see why Nimbus would use the JSON-RPC instead of a channel. If it is short-term for a proof of concept, then it doesn't matter if Fluffy validates the content, as they will end up replacing the JSON-RPC with a channel regardless.

@kdeme
Copy link
Collaborator

kdeme commented Oct 3, 2024

Here is my take on this by replying on many of the comments, but if that is a bit much to read, a TLDR:

  1. I am fine with using the portal_*RecursiveFindContent implementation for this as long as specs are clarified and currently no testing is reliant on this. Meaning: no testing would be affected by not testing the network path because it incidentally is already holding the data locally.
  2. I am also fine / in favour of renaming portal_*RecursiveFindContent to portal_*getContent in case of point 1.
  3. We need to clear out and explicitly specify which API methods do validation and where. You might say all of them, but perhaps for some of the lower level ones there is good reason not too, e.g. no outgoing path validation for offer for testing with invalid data?
  4. No need for some specific change to the API only for Nimbus integration needs.

Then:

portal_*RecursiveFindContent is supposed to do what this PR proposes. This situation was caused by us not having test coverage for this in hive.

This is your interpretation of the specification. Mine is different, and from the naming of the method I directly concluded that it should not look locally but only on the network.
The situation is foremost caused by the specification not being explicit about this (not blaming anyone here, I wrote part of that json-rpc spec). Tests would be very useful to discover this difference between clients.

The portal_historyRecursiveFindContent directly correlates to the RFC functionality and it somewhat implies that it is getting the content from the network. It's good UX to give it back from the database... however this behavior could be considered to be surprising. I think of portal_historyRecursiveFindContent as a low level API.

Completely agree with this, and this is also why it is implemented in Fluffy as it is (not the validation part, this is more obvious that it needs to be added for this specific method, but also not explicitly specified).

don't see the value in a low-level portal_historyRecursiveFindContent

I am not sure if there currently is one either.
One thing that we have to think of when testing with the version that looks in local db first is that if you want to specifically test (and typically that is what you want to test with this call) the actual lookup over the p2p network, then you have to be careful that this node did not receive the data already through some other means (e.g. local storage API, or getting it gossiped). Else the lookup over the network would never happen, it would just grab it locally and the test would pass without testing what you want. I think for 1-to-1 tests this is typically not an issue. But in more complex setups where you want to test with several nodes, this might become an issue (e.g. POKE activating, recursive gossip occurring, etc.).

But, because I'm not in favour of adding code/specs that are currently not used, I don't hold this as a reason to keep the lower-level call. It can always be re-added when it turns out to be needed.

if the name portal_*RecursiveFindContent is bad UX, we can rename it to portal_*GetContent.

Yes, I was going to suggest the same naming change. As this method is something that and end users might/will end up using, I think a more simple name like that is valuable.

In general, I think at a certain moment a bigger effort will need to be done to move certain low level API to a portal_debug_ namespace.

The encoding/decoding through the JSON-RPC would kill performance by itself.

It is true that this is less efficient, yet to this day every full node setup communicates via JSON-RPC (engine_api) between nodes. So there must be some value to having an API there with good UX.

Maybe for a demo? But I don't think we should write our JSON-RPC specification for power users who won't use the JSON-RPC long-term anyway, if at all.

Sort of, but also not really. We mostly want to avoid doing too much integration work without knowing whether this is something users want to use in an integrated setup. And at the same time, the JSON-RPC version would still be useful when users want to use another Portal client, or when Fluffy gets used with other EL clients.
In general it is also good to have a higher level Portal API with good UX / easy to understand descriptions.

RecursiveFindContent is implementation specific.

I don't fully agree with this. While it is implementation specific, its result should not be. That is, you should be able to test with this call whether a node can actually find a piece of content that is somewhere available on the network. That is something that can be tested for all clients. But yes, this can be done with the current call.

Also, if we wanted to turn validation off, that would be a method param. I don't see why it would have to be its own method.

I'm not saying that we need to do specific adjustments to the JSON-RPC regarding validation because Nimbus might want this. If Nimbus wants this we can add some non specified methods first to fluffy, and if it turns out to be useful for other clients, eventually this would end up in the specification. As long as that is not proven, it should not get added / affect other clients.

But regarding validation, we do need to specify this better in the specifications for the current API. It is mostly "obvious" and implied for the "higher" level methods. But I wouldn't say this for all.

The actual reason why Fluffy does not have this is because it was easier to test with bogus data in the early days, especially for those lower level calls like FindContent and Offer. Of course, the project has evolved and some of that lower level functionality is not needed any more and tests evolve to basically test more than just the data transfers. So naturally the JSON-RPC API can also evolve.

I am fine with that, but it should be explicitly stated in the specification, especially for the lower level calls.
And it should also be clearly stated whether this is about only incoming data or also outgoing data.
For example, I could imagine an offer (or perhaps even a gossip) call where the data does not get validating by the client that sends out the call, in order to be able to test whether a bogus data gets properly validated on the receiving node or not.

@KolbyML
Copy link
Member

KolbyML commented Oct 3, 2024

Here is my take on this by replying on many of the comments, but if that is a bit much to read, a TLDR:

  1. I am fine with using the portal_*RecursiveFindContent implementation for this as long as specs are clarified and currently no testing is reliant on this. Meaning: no testing would be affected by not testing the network path because it incidentally is already holding the data locally.

I am confused what this means. I have nothing against better documenting how JSON-RPC methods should behave

  1. I am also fine / in favour of renaming portal_*RecursiveFindContent to portal_*getContent in case of point 1.

I would be fine with this as well

  1. We need to clear out and explicitly specify which API methods do validation and where. You might say all of them, but perhaps for some of the lower level ones there is good reason not too, e.g. no outgoing path validation for offer for testing with invalid data?

I get what you mean I agree documenting which JSON-RPC methods are supposed to validate data is important and something we are missing

  1. No need for some specific change to the API only for Nimbus integration needs.

I assume each Portal client will have distinct integration methods mostly likely dependent on which programming language is used

Then:

portal_*RecursiveFindContent is supposed to do what this PR proposes. This situation was caused by us not having test coverage for this in hive.

This is your interpretation of the specification. Mine is different, and from the naming of the method I directly concluded that it should not look locally but only on the network. The situation is foremost caused by the specification not being explicit about this (not blaming anyone here, I wrote part of that json-rpc spec). Tests would be very useful to discover this difference between clients.

I think this point is very valid and I agree we should better document the JSON-RPC

don't see the value in a low-level portal_historyRecursiveFindContent

I am not sure if there currently is one either. One thing that we have to think of when testing with the version that looks in local db first is that if you want to specifically test (and typically that is what you want to test with this call) the actual lookup over the p2p network, then you have to be careful that this node did not receive the data already through some other means (e.g. local storage API, or getting it gossiped). Else the lookup over the network would never happen, it would just grab it locally and the test would pass without testing what you want. I think for 1-to-1 tests this is typically not an issue. But in more complex setups where you want to test with several nodes, this might become an issue (e.g. POKE activating, recursive gossip occurring, etc.).

But, because I'm not in favour of adding code/specs that are currently not used, I don't hold this as a reason to keep the lower-level call. It can always be re-added when it turns out to be needed.

👍

if the name portal_*RecursiveFindContent is bad UX, we can rename it to portal_*GetContent.

Yes, I was going to suggest the same naming change. As this method is something that and end users might/will end up using, I think a more simple name like that is valuable.

👍

In general, I think at a certain moment a bigger effort will need to be done to move certain low level API to a portal_debug_ namespace.

That will probably be a future discussion but it will be interesting what ends up happening

The encoding/decoding through the JSON-RPC would kill performance by itself.

It is true that this is less efficient, yet to this day every full node setup communicates via JSON-RPC (engine_api) between nodes. So there must be some value to having an API there with good UX.

full node setup communicates via JSON-RPC (engine_api) work to phase this out to use a binary ssz protocol is being actively discussed. This is due to more performance-critical proposals being discussed. This transition from JSON-RPC to binary ssz is proposed to be the first SSZ component on the execution layer. It is heavily being pushed due to latency in transferring proofs.

Maybe for a demo? But I don't think we should write our JSON-RPC specification for power users who won't use the JSON-RPC long-term anyway, if at all.

Sort of, but also not really. We mostly want to avoid doing too much integration work without knowing whether this is something users want to use in an integrated setup. And at the same time, the JSON-RPC version would still be useful when users want to use another Portal client, or when Fluffy gets used with other EL clients. In general it is also good to have a higher level Portal API with good UX / easy to understand descriptions.

I think execution clients will only opt to use Portal Implementations implemented in their native language. So I see Fluffy being used with non-nim EL's as hypothetical. users want to use another Portal client If a user wants to use Portal and they aren't a EL so they are using the JSON-RPC then of course they would use the JSON-RPC I don't see how this helps the cause of portal_*RecursiveFindContent without validation.

If other EL's were to integrate Fluffy for performance they wouldn't use the JSON-RPC, they would use FFI bindings then interface with Fluffy directly skipping the overhead. In this case I don't see why we would disable validation on portal_*RecursiveFindContent as they wouldn't be dependent on the JSON-RPC long term. I don't think EL clients are the user's of the Portal JSON-RPC higher level users are. EL clients are lower level users so they will opt for a lower level connection.

We mostly want to avoid doing too much integration work without knowing whether this is something users want to use in an integrated setup.

This would be a demo/proof of concept then. It is a proof of concept I don't see how the delay of portal_*RecursiveFindContent validating the content is good is a blocker

In general it is also good to have a higher level Portal API with good UX / easy to understand descriptions.

I think everyone can say this is generally good. Good UX which is easy to understand is a win

RecursiveFindContent is implementation specific.

I don't fully agree with this. While it is implementation specific, its result should not be. That is, you should be able to test with this call whether a node can actually find a piece of content that is somewhere available on the network. That is something that can be tested for all clients. But yes, this can be done with the current call.

The example given is only true in ideal conditions. In practice network conditions are not ideal which is why clients will opt for optimized/smarter RFC implementations which can't be tested.

The only thing we test from raw RFC is that they can find the content in ideal conditions, we can't create a more complex hive test which takes advantage of implementation-specific details. I don't think we should standardize the RFC implementation as the diversity in implementations and strategies is good for the health of the network IMO

Also, if we wanted to turn validation off, that would be a method param. I don't see why it would have to be its own method.

I'm not saying that we need to do specific adjustments to the JSON-RPC regarding validation because Nimbus might want this. If Nimbus wants this we can add some non specified methods first to fluffy, and if it turns out to be useful for other clients, eventually this would end up in the specification. As long as that is not proven, it should not get added / affect other clients.

I think this is a reasonable approach and is commonly done by EL/CL teams

But regarding validation, we do need to specify this better in the specifications for the current API. It is mostly "obvious" and implied for the "higher" level methods. But I wouldn't say this for all.

The actual reason why Fluffy does not have this is because it was easier to test with bogus data in the early days, especially for those lower level calls like FindContent and Offer. Of course, the project has evolved and some of that lower level functionality is not needed any more and tests evolve to basically test more than just the data transfers. So naturally the JSON-RPC API can also evolve.

I am fine with that, but it should be explicitly stated in the specification, especially for the lower level calls. And it should also be clearly stated whether this is about only incoming data or also outgoing data. For example, I could imagine an offer (or perhaps even a gossip) call where the data does not get validating by the client that sends out the call, in order to be able to test whether a bogus data gets properly validated on the receiving node or not.

👍 I agree we need to better document our JSON-RPC specification

@kdeme
Copy link
Collaborator

kdeme commented Oct 3, 2024

I am confused what this means. I have nothing against better documenting how JSON-RPC methods should behave

I'm talking about a possible use case for the RecursiveFindContent call without local db lookup. It is perhaps not something useful in any of the tests today, or perhaps it is already occuring. If you want to test RecursiveFindContent, then you want to test the network lookup. Not the local database lookup. If you then set up the test and, unknowingly, data already arrives on the node that needs to do the lookup, then the test is pointless as it will directly find the data in the database.

In fluffy, the "native" RecursiveFindContent does not do a db lookup first. That is done a layer above. Perhaps this is different in Trin?

full node setup communicates via JSON-RPC (engine_api) work to phase this out to use a binary ssz protocol is being actively discussed.

I'm well aware of this. That was not my point. My point is that they didn't start with the SSZ protocol solution (which by the way is still a protocol solution, not fully integrated). That apparently was a step too much to take at once, for good reason?

So I see Fluffy being used with non-nim EL's as hypothetical. ``

I was just giving an example, of course its hypothetical, as is anything else in terms of integration, and I think having the (higher level good UX) json-rpc (or any other protocol for that matter, e.g. REST with binary/SSZ) is useful to have, even if alone for fast PoC'ing. But there is more, portal_bridges use it, its the first access to users interested in the network, etc.

(regarding nimbus: I would also prefer the integration to be native, but there are other reasons why in nimbus/fluffy we want to start with the json-rpc version as PoC, but they are Nim specific and I won't bore you with them)

I also didn't state that I didn't want validation on these calls, on the contrary. I state that it should be specified and that some of the lower level calls might benefit for testing purposes not to have always validation, see the example I gave.

@KolbyML
Copy link
Member

KolbyML commented Oct 3, 2024

I am confused what this means. I have nothing against better documenting how JSON-RPC methods should behave

I'm talking about a possible use case for the RecursiveFindContent call without local db lookup. It is perhaps not something useful in any of the tests today, or perhaps it is already occuring. If you want to test RecursiveFindContent, then you want to test the network lookup. Not the local database lookup. If you then set up the test and, unknowingly, data already arrives on the node that needs to do the lookup, then the test is pointless as it will directly find the data in the database.

I don't think this creates any meaningful challenges in testing. We can get around those potential edge cases fairly easily. In practice, I haven't faced a challenge like this yet.

In fluffy, the "native" RecursiveFindContent does not do a db lookup first. That is done a layer above. Perhaps this is different in Trin?

This is done a layer above on Trin as well

full node setup communicates via JSON-RPC (engine_api) work to phase this out to use a binary ssz protocol is being actively discussed.

I'm well aware of this. That was not my point. My point is that they didn't start with the SSZ protocol solution (which by the way is still a protocol solution, not fully integrated). That apparently was a step too much to take at once, for good reason?

I don't think a data availability solution is comparable to the complexity and security concerns of a consensus implementation. I agree we need better documentation, but I don't this PR proposes anything not already possible to do with the JSON-RPC. I think the current RPC specification (minus documentation) is sufficient for a PoC.

So I see Fluffy being used with non-nim EL's as hypothetical. ``

I was just giving an example, of course its hypothetical, as is anything else in terms of integration, and I think having the (higher level good UX) json-rpc (or any other protocol for that matter, e.g. REST with binary/SSZ) is useful to have, even if alone for fast PoC'ing. But there is more, portal_bridges use it, its the first access to users interested in the network, etc.

I want to specify that the functionality this PR proposes already exists in Trin, this is more a question of documenting expect behaviors and renaming JSON-RPC methods for better UX. Which I fully agree with.

I agree we should have a higher level good UX json-rpc. I don't think we disagree on this at all. I don't believe my complaints have ever been targeted at that.

my complaints were more that there is no distinction between portal_*GetContent and the portal_*RecursiveFindContent we have in Trin. If we added portal_*GetContent and reduce the scope portal_*RecursiveFindContent there wouldn't be a good reason to keep portal_*RecursiveFindContent.

So I think we should

  • rename portal_*RecursiveFindContent to portal_*GetContent
  • and better document all the JSON-RPC endpoints

I think we both agree on these 2 points which are the main important ones?

"trin_bridge" will eventually be built as an extension of Trin instead of using the JSON-RPC. Currently, we use custom Offer and Gossip endpoints for the bridge that isn't in the portal-network-specs as they only really have a use case in bridging, so eventually, we can remove the special endpoints when we integrate Trin into the bridge and stop using the JSON-RPC.

(regarding nimbus: I would also prefer the integration to be native, but there are other reasons why in nimbus/fluffy we want to start with the json-rpc version as PoC, but they are Nim specific and I won't bore you with them)

I think building a PoC using the json-rpc is very valid. I don't think we should spec out special JSON-RPC methods for the sole purpose of building PoC's though

I also didn't state that I didn't want validation on these calls, on the contrary.

I think some of this is confusion from when I was talking to @bhartnett

I state that it should be specified and that some of the lower level calls might benefit for testing purposes not to have always validation, see the example I gave.

I think we should write our tests using real-world data. I understand that early on being able to quickly test things with fake data was useful, but I don't think there is a problem testing with real data now-a-days. So maybe those "benefits" were helpful in the past, but now that we have more infrastructure built we can test things just as easily with real data.

@bhartnett
Copy link
Contributor Author

bhartnett commented Oct 4, 2024

So I think we should

  • rename portal_*RecursiveFindContent to portal_*GetContent
  • and better document all the JSON-RPC endpoints

I've pushed an update to remove the new endpoint and have renamed the portal_*RecursiveFindContent endpoints. Note that I also renamed the portal_*TraceRecursiveFindContent endpoints for consistency. I also updated the descriptions to be more detailed. Updating the descriptions of any other endpoints can be a separate PR.

As a side note I noticed that the state network doesn't have a portal_stateTraceRecursiveFindContent. Is there a particular reason for this?

@KolbyML
Copy link
Member

KolbyML commented Oct 4, 2024

As a side note I noticed that the state network doesn't have a portal_stateTraceRecursiveFindContent. Is there a particular reason for this?

It should be included. It was probably just missed when the endpoint was being added

Copy link
Member

@KolbyML KolbyML left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit: looks good.

In the future, maybe we should have a markdown file for specifying the requirements of each endpoint like the engine_api has as it would probably be easier to read when someone is implementing the endpoints. Of course, this can be done in a future PR.

"name": "portal_historyRecursiveFindContent",
"summary": "Look up a target content key in the network",
"name": "portal_historyGetContent",
"summary": "Get content from the local database if it exists, otherwise look up the target content key in the network. After fetching from the network the content is validated and stored in the local database before being returned.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"summary": "Get content from the local database if it exists, otherwise look up the target content key in the network. After fetching from the network the content is validated and stored in the local database before being returned.",
"summary": "Get content from the local database if it exists, otherwise look up the target content key in the network. After fetching from the network the content is validated and stored in the local database if storage criteria is met before being returned.",

Maybe we should be a little more specific, as we won't store it if the storage criteria aren't met.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good point, will update.

@bhartnett
Copy link
Contributor Author

As a side note I noticed that the state network doesn't have a portal_stateTraceRecursiveFindContent. Is there a particular reason for this?

It should be included. It was probably just missed when the endpoint was being added

I'll add it in that case.

@bhartnett bhartnett changed the title Add portal_*GetContent JSON-RPC call to get content from local db or network Rename portal_*RecursiveFindContent to portal_*GetContent Oct 4, 2024
@KolbyML
Copy link
Member

KolbyML commented Oct 14, 2024

There has been no opposition to this PR in over a week and I sent a message in the discord with no opposition about merging this PR.

Since there are 3 approvals and no opposition for a prolonged period of time I am going to merge the PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants