Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename portal_*Gossip JSON-RPC endpoint to portal_*PutContent #346

Merged
merged 5 commits into from
Dec 14, 2024

Conversation

bhartnett
Copy link
Contributor

I suggest that we rename the portal_*Gossip JSON-RPC endpoint to portal_*PutContent and also update the method definition to include storing the content in the local database after performing some basic validations.

Changes in this PR:

  • Renamed portal_*Gossip JSON-RPC endpoint to portal_*PutContent
  • Updated the endpoint spec definition to include validation and storage in the local db

This is a similar update to #344. My reasoning for this change is that I think we should have a high level endpoint counterpart to portal_*GetContent with similar naming and implementation storing in the local database when it makes sense to do so.

@bhartnett
Copy link
Contributor Author

cc @KolbyML @kdeme

@KolbyML
Copy link
Member

KolbyML commented Oct 18, 2024

Gossip's main usecase is gossiping to other nodes on the network. Normally this is done via bridges. I am not sure if this is an endpoint high level users would actual use.

Because gossip is normal used by bridges which don't want to store that content I am not sure if I see the utility of that.

Also validation would be limited via this API call as for example if we are gossiping block bodies we can't validate canonicalness without a network look on the bridge side, which I don't think we should do that as bandwidth is very limited.

Anyways I am very open to what others think. I think portal_GetContent really makes sense for users, but I don't see how that is the same with gossip

@bhartnett
Copy link
Contributor Author

Gossip's main usecase is gossiping to other nodes on the network. Normally this is done via bridges. I am not sure if this is an endpoint high level users would actual use.

Because gossip is normal used by bridges which don't want to store that content I am not sure if I see the utility of that.

Also validation would be limited via this API call as for example if we are gossiping block bodies we can't validate canonicalness without a network look on the bridge side, which I don't think we should do that as bandwidth is very limited.

Anyways I am very open to what others think. I think portal_GetContent really makes sense for users, but I don't see how that is the same with gossip

Yes, it is used by bridges and in that case the bridge is usually connected to a local portal client node which is also participating in the network so it makes sense to store the content locally if it falls within the local nodes radius so that the content can be shared with other nodes on request. If disabling local storage is a requirement then clients can set their storage capacity to zero to prevent the local storage.

For the validation, I agree we shouldn't do network calls so we can't validate canonicalness but other local validations should still be done.

The purpose for the rename was to bring some symmetry and a counterpart to the portal_GetContent but not necessarily that it would be used by end users.

@morph-dev
Copy link
Contributor

Because gossip is normal used by bridges which don't want to store that content I am not sure if I see the utility of that.

In my opinion, this is not good argument (of course, storing it locally would only happen if it falls within radius):

  1. as @bhartnett said, you can set radius to zero and not store it
  2. you will be offered this content again by the network and you will store it anyway (again, assuming it falls within radius). So why not do it from the start
    • for example, bridge's node is node A. It offers it to B, which offers it to C, which might offer it to A again

With this in mind, I don't see why we shouldn't add this to the spec.


Regarding validation, I don't have strong opinion. It can be either fully validated (as if content came from other node on the network), it can be validated as much as portal_*Store requests, or it can be configurable with a flag.

@KolbyML
Copy link
Member

KolbyML commented Oct 18, 2024

@bhartnett

Anyways I will probably get used to the name portal_*PutContent over time.

I think we should be more specific what we mean by validation in the RPC's description. With the examples given I understand where you guys are coming from for the storage example now. Also we wouldn't want to set a bridges radius to 0 or it would be easily detectable which is bad. So storing it first isn't a bad thing thinking more about it.

Other then that the PR looks good

Regarding validation, I don't have strong opinion. It can be either fully validated (as if content came from other node on the network)

That would add a lot of overhead in terms of my block body or block receipt example. Also if a node offers you not provable data maybe that is something peerscore should handle. It could be dangerous though if an dos attack was to occur on the network. So I think a flag or no network-call are the only 2 feasible options. Maybe the flag approach could prevent naive users from doing weird stuff if it is enabled by default.

@bhartnett
Copy link
Contributor Author

bhartnett commented Oct 18, 2024

That would add a lot of overhead in terms of my block body or block receipt example. Also if a node offers you not provable data maybe that is something peerscore should handle. It could be dangerous though if an dos attack was to occur on the network. So I think a flag or no network-call are the only 2 feasible options. Maybe the flag approach could prevent naive users from doing weird stuff if it is enabled by default.

@KolbyML I agree, full validation would slow down the bridge significantly and if we did end up deciding to do full validation then yes it should be configurable and perhaps enabled by default to prevent surprises.

Having said that, I think this PR shouldn't try to cover categorizing the types of validation because that needs to be done for all the endpoints and it probably should be handled as a separate PR. It would be good to have the types/categories of validation clearly defined.

Regarding the validation wording, sure I'm happy to update to something more clear. My suggestion is that for this PR, the validation should be the same type as done in portal_*GetContent.

@KolbyML
Copy link
Member

KolbyML commented Oct 19, 2024

Regarding the validation wording, sure I'm happy to update to something more clear. My suggestion is that for this PR, the validation should be the same type as done in portal_*GetContent.

maybe for

  • portal_*GetContent we can call it "Full Validation"
  • portal_*PutContent we can call it "Local Validation"

in the descriptions for the respective endpoints

I think that is a simple enough scheme which gets the point across

@bhartnett
Copy link
Contributor Author

Regarding the validation wording, sure I'm happy to update to something more clear. My suggestion is that for this PR, the validation should be the same type as done in portal_*GetContent.

maybe for

  • portal_*GetContent we can call it "Full Validation"
  • portal_*PutContent we can call it "Local Validation"

in the descriptions for the respective endpoints

I think that is a simple enough scheme which gets the point across

Yes something like this could work well enough for now I guess but one issue I see is that 'full validation' should probably include checking if the data is canonical which would include a network call. For example if you look up a trie node from an uncle block which shouldn't exist in the portal network a malicious node could return it to you if you don't check it in some way. This is very much an edge case and I'm not suggesting we actually need to do this kind of validation but my point is that 'full validation' should probably include canonical checks which we shouldn't do in this case. So probably a different type of validation is needed here.

In my last update I'm describing the validation in portal_*GetContent as 'validation for correctness' and the validation in portal_*PutContent as 'validate that the content is well formed'. Hopefully this is ok for now.

@KolbyML
Copy link
Member

KolbyML commented Oct 21, 2024

In my last update I'm describing the validation in portal_*GetContent as 'validation for correctness' and the validation in portal_*PutContent as 'validate that the content is well formed'. Hopefully this is ok for now.

Both examples sound like the same thing but worded differently. I don't think it clear enough. If we are specifying validation here, we should specify it. If we want to properly write the description for validation for portal_*PutContent in a future PR, which should add the word validation to the portal_*PutContent in that follow up PR.

Yes something like this could work well enough for now I guess but one issue I see is that 'full validation' should probably include checking if the data is canonical which would include a network call. For example if you look up a trie node from an uncle block which shouldn't exist in the portal network a malicious node could return it to you if you don't check it in some way. This is very much an edge case and I'm not suggesting we actually need to do this kind of validation but my point is that 'full validation' should probably include canonical checks which we shouldn't do in this case. So probably a different type of validation is needed here.

I think Full Validation would mean the fullest validation possible. portal_*GetContent is unlikely to be used on the State network other than maybe the wallet use case. Where I assume eth_* calls would be used to reduce code duplication and complexity.

If the validation part isn't relevant to this PR we shouldn't add it to the description and properly do it in a follow up PR. If the validation description is an important part and wants to be kept in this PR people should be able to understand what it means.

@bhartnett
Copy link
Contributor Author

In my last update I'm describing the validation in portal_*GetContent as 'validation for correctness' and the validation in portal_*PutContent as 'validate that the content is well formed'. Hopefully this is ok for now.

Both examples sound like the same thing but worded differently. I don't think it clear enough. If we are specifying validation here, we should specify it. If we want to properly write the description for validation for portal_*PutContent in a future PR, which should add the word validation to the portal_*PutContent in that follow up PR.

Yes something like this could work well enough for now I guess but one issue I see is that 'full validation' should probably include checking if the data is canonical which would include a network call. For example if you look up a trie node from an uncle block which shouldn't exist in the portal network a malicious node could return it to you if you don't check it in some way. This is very much an edge case and I'm not suggesting we actually need to do this kind of validation but my point is that 'full validation' should probably include canonical checks which we shouldn't do in this case. So probably a different type of validation is needed here.

I think Full Validation would mean the fullest validation possible. portal_*GetContent is unlikely to be used on the State network other than maybe the wallet use case. Where I assume eth_* calls would be used to reduce code duplication and complexity.

If the validation part isn't relevant to this PR we shouldn't add it to the description and properly do it in a follow up PR. If the validation description is an important part and wants to be kept in this PR people should be able to understand what it means.

I understand where you are coming from and agree with your points above in general but I would argue the the terms 'Full validation' and 'Local validation' are also not completely clear until we define these terms/categories in another location in the spec. I agree that should be done in a separate PR.

In that case, I'm happy to remove the references to validation if that is preferred. Note that portal_*GetContent already mentions validation and I didn't introduce that in this PR (previous PR) so one could argue that mentioning validation in portal_*PutContent at the same level of detail does no harm as we know it will require some form of validation.

@KolbyML
Copy link
Member

KolbyML commented Oct 21, 2024

In that case, I'm happy to remove the references to validation if that is preferred. Note that portal_*GetContent already mentions validation and I didn't introduce that in this PR (previous PR) so one could argue that mentioning validation in portal_*PutContent at the same level of detail does no harm as we know it will require some form of validation.

I think the distinguishing factor here is now validation means 2 different things for 2 different endpoints. If its meaning was consistent I would be more lenient with it. I was lenient with it in the portal_*GetContent call because at the time it was the only usage of the word "validation" in our json-rpc schema.

Well, of course, it wouldn't be too hard for us to figure out what that means; I think the specs should be written explicitly so even an outsider could understand what is implied.

In short I think using the same word to define 2 different expectations is misleading and does harm to people trying to use our specs.

If this is a high level call I don't think we should say the call does validation as that isn't really relevant to the person making the call it is more of an implementation detail in that case I think it would make more sense to define it #345 here. And instead try to focus on higher level description in our JSON-RPC schema then leave implementation details like how validation is handle in the Specification.md per method.

If that is the case though in our json-rpc schema all of the endpoints would contain the word validation in their description to the point it is implicit and since validation would mean a different thing per endpoint I don't think any endpoint should specify it in their description as it would be happening for every JSON-RPC method, but validation would mean different things for most of them or their would be groups of endpoints which do the same form of validation but you get the point.

@bhartnett
Copy link
Contributor Author

In that case, I'm happy to remove the references to validation if that is preferred. Note that portal_*GetContent already mentions validation and I didn't introduce that in this PR (previous PR) so one could argue that mentioning validation in portal_*PutContent at the same level of detail does no harm as we know it will require some form of validation.

I think the distinguishing factor here is now validation means 2 different things for 2 different endpoints. If its meaning was consistent I would be more lenient with it. I was lenient with it in the portal_*GetContent call because at the time it was the only usage of the word "validation" in our json-rpc schema.

Well, of course, it wouldn't be too hard for us to figure out what that means; I think the specs should be written explicitly so even an outsider could understand what is implied.

In short I think using the same word to define 2 different expectations is misleading and does harm to people trying to use our specs.

If this is a high level call I don't think we should say the call does validation as that isn't really relevant to the person making the call it is more of an implementation detail in that case I think it would make more sense to define it #345 here. And instead try to focus on higher level description in our JSON-RPC schema then leave implementation details like how validation is handle in the Specification.md per method.

If that is the case though in our json-rpc schema all of the endpoints would contain the word validation in their description to the point it is implicit and since validation would mean a different thing per endpoint I don't think any endpoint should specify it in their description as it would be happening for every JSON-RPC method, but validation would mean different things for most of them or their would be groups of endpoints which do the same form of validation but you get the point.

This sounds reasonable to me. I've just pushed another change which removes the references to validation from the method descriptions for both rpc methods portal_*GetContent and portal_*PutContent .

Copy link
Member

@KolbyML KolbyML left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit: looks good to me. Devcon is very close so I am not sure if the team's would let us update the clients to this until after Devcon.

But I am happy with merging this now.

@bhartnett
Copy link
Contributor Author

@KolbyML Any thoughts on getting this one in the specs?

@KolbyML
Copy link
Member

KolbyML commented Nov 28, 2024

@KolbyML Any thoughts on getting this one in the specs?

I would try and get @kdeme to review it.

I think @pipermerriam would be for this as it makes the RPC commands simpler and requires less domain specific knowledge.

After they take a look I don't really see a blocker for this PR

Copy link
Collaborator

@kdeme kdeme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinion on this naming from my side.

The put version is a little more consistent with the get, that's true.

@bhartnett
Copy link
Contributor Author

@pipermerriam Did you have any thoughts on this PR?

"GossipResult": {
"name": "gossipResult",
"PutContentResult": {
"name": "PutContentResult",
"description": "Returns the number of peers that the content was gossiped to",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this should also return information about whether the content was added to the database, and possibly if the data was already in the database.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning a boolean to indicate if the data was written to the database makes sense. This can be implemented in the same way as portal_*Store currently does. Indicating if the data was already in the database would require a second database lookup to check if the key/data exists before overwriting so I'd say we shouldn't do this for performance reasons.

"name": "portal_beaconGossip",
"summary": "Send the provided content item to interested peers. Clients may choose to send to some or all peers.",
"name": "portal_beaconPutContent",
"summary": "Store the content in the local database if storage criteria is met, then send the content to interested peers. Clients may choose to send to some or all peers.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"summary": "Store the content in the local database if storage criteria is met, then send the content to interested peers. Clients may choose to send to some or all peers.",
"summary": "Store the content in the local database if storage criteria is met, then send the content to interested peers using the client's default gossip mechanisms.",

Copy link
Member

@pipermerriam pipermerriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm in favor of this. Is this endpoint in use anywhere? I think my only concern, which is minor is that without an active use case, it's difficult to know if we're getting the API correct. This concern isn't a blocker and I'm fine with moving this forward.

@bhartnett
Copy link
Contributor Author

bhartnett commented Dec 3, 2024

I think I'm in favor of this. Is this endpoint in use anywhere? I think my only concern, which is minor is that without an active use case, it's difficult to know if we're getting the API correct. This concern isn't a blocker and I'm fine with moving this forward.

Yes, the Fluffy State and History Bridges both use this endpoint to gossip/send content into the network. Other bridges will likely also be using it for sending content.

@bhartnett
Copy link
Contributor Author

I've applied the updates as suggested by @pipermerriam

@KolbyML KolbyML merged commit 69bd32c into ethereum:master Dec 14, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants