Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(WIP) Add a draft spec for RMN OffChain Blessing #1043

Closed
wants to merge 2 commits into from

Conversation

rstout
Copy link
Contributor

@rstout rstout commented Jun 17, 2024

Motivation

Solution

@rstout rstout requested a review from a team as a code owner June 17, 2024 20:30
@rstout rstout force-pushed the rs/commit_plugin_rmn_ocb_spec branch from d6610f5 to 2223947 Compare June 17, 2024 21:58
@rstout rstout force-pushed the rs/commit_plugin_rmn_ocb_spec branch from 2223947 to 0b67563 Compare June 18, 2024 20:45
@rstout rstout force-pushed the rs/commit_plugin_rmn_ocb_spec branch from 0b67563 to 80558ae Compare June 19, 2024 14:45
@rstout rstout force-pushed the rs/commit_plugin_rmn_ocb_spec branch from 80558ae to a1d27cf Compare June 19, 2024 15:00
@rstout rstout force-pushed the rs/commit_plugin_rmn_ocb_spec branch from a1d27cf to 31ec17f Compare June 24, 2024 22:46
Copy link
Contributor

@makramkd makramkd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall makes sense but is a bit tough to follow. There is an implicit state machine in the existing impl that basically matches what you have laid out here:

  1. Determine intervals to be used in the next round
  2. If previous outcome is not nil, read messages from readable chains, include those in the observation.
  3. In outcome, agree on intervals.

re: your point here:

the complications that can arise if a report is not successfully transmitted (as we explicitly only continue once we know the previous report has been committed).

This is typically what ShouldAccept and ShouldTransmit are used for, it seems a bit odd to me to have states regarding report acceptance onchain.

Some things that would help me personally:

  • State machine diagram with clear start and terminal states
  • Handling the nil outcome in the spec code

case ReportGenerated(signed_intervals):
return CommitReport(signed_intervals)
case _:
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to figure out how token/gas prices fit in this design.

pass

# TODO: doc
def query(self, previous_outcome: CommitOutcome) -> CommitQuery:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are missing validation,shouldTransmit,shouldAccept phases, is it a followup?

Copy link
Contributor

@dimkouv dimkouv Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example we need to define what's happening after e.g. shouldAccept returns false. In that case and based on the previous outcome state we'll keep waiting (until reaching retries limit) for the report to be committed even though this will never happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought much about validation, shouldTransmit, shouldAccept honestly. I don't think there's much we can do besides exhaust the "max committed seq num" checks. How acceptable that is depends on how often we expect shouldTransmit/shouldAccept to fail.

@rstout rstout requested review from a team, elatoskinas, RayXpub and jasonmci as code owners June 28, 2024 18:43
@rstout rstout force-pushed the rs/commit_plugin_rmn_ocb_spec branch from 9df28ae to 977d264 Compare June 28, 2024 18:47
@rstout rstout removed request for a team, jasonmci, elatoskinas and RayXpub June 28, 2024 18:48
@@ -0,0 +1,382 @@
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rstout rstout force-pushed the rs/commit_plugin_rmn_ocb_spec branch from 977d264 to 3132f5a Compare July 1, 2024 17:12
- Determine intervals to be used in the next round
2. BuildingReport
- Build a report from the intervals determined in the previous round
3. CheckingForUpdatedMaxCommittedSeqNums
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be easier to follow the states if we name them in relation to the state of the report? Like SelectingMessagesForReport, BuildingReport, WaitingForReportInclusion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

observed_merkle_roots = self.flatten_merkle_root_observations(observations)
signed_intervals = self.get_signed_intervals_to_report(intervals, query.signed_intervals,
observed_merkle_roots)
return ReportGenerated(signed_intervals)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its possible that signed_intervals ends up being empty (unlikely, but say briefly >f_chain nodes goes down in either RMN or CCIP after providing a set of intervals). For that case I think we'd want to go back to ChoosingIntervals to start gathering potentially a larger batch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, great call out!

pass

# The OCR3 implementation of Outcome
def outcome(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe worth noting somewhere that if this returns an error repeatedly, we could get stuck in a state as if outcome errors the next round has the same prevOutcome. However since its a pure function we expect to fully control the error scenarios

# have different merkle roots for the same chain, this chain is not included in the output. Additionally, if there
# are chains that don't require RMN support, these chains will be in observed_merkle_roots but not
# rmn_signed_intervals, and will be included in the output (with an empty set of RMN signatures).
def get_signed_intervals_to_report(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function is worth spelling out in the spec. You'd need CCIP f_chain and RMN f_chain defined on class or as inputs. We can include them in config for now (can just copy in CommitConfig in the commit_plugin.py)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'll be simpler to implement this first in the Go code and then port it over

pass

# Verify the RMN signatures on the given signed_intervals
def verify_signed_intervals(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe to be called inside of get_signed_intervals_to_report? Then if an sig doesn't validate we exclude that observation but don't error

) -> Dict[ChainSelector, SignedInterval]:
pass

# Given a list of SequenceNumbersObservation, return a flattened consensus on the max committed sequence number
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth noting same logic as here

@rstout rstout force-pushed the rs/commit_plugin_rmn_ocb_spec branch from 3132f5a to 367a8b4 Compare July 2, 2024 18:09
Copy link
Contributor

@kaleofduty kaleofduty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! here are some initial questions

def request_max_seq_nums(
self,
chains: List[ChainSelector]
) -> Dict[ChainSelector, int]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be signed?



@dataclass
class SignedInterval:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this name a little confusing, it's the root that's signed, no?

return ReportNotYetTransmitted(previous_max_committed_seq_nums, attempts + 1)

# The OCR3 implementation of Report
def report(self, outcome: CommitOutcome) -> Optional[CommitReport]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be interesting to split into multiple reports in case they get too large?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah potentially once we have enough chains s.t. roots can't easily fit in one report (long way away though)

Comment on lines +372 to +373
observed_merkle_roots = self.flatten_merkle_root_observations(observations)
signed_intervals = self.get_signed_intervals_to_report(intervals, query.signed_intervals,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where do we check that only nodes that have been assigned the role of reading from a chain include it in its intervals?

# If we are choosing the next intervals this round, we need to query RMN for the max uncommitted sequence
# numbers it has for each source chain, so we can set appropriate upper ranges for our intervals.
case SelectingIntervalsForReport():
rmn_max_seq_nums = self.rmn_client.request_max_seq_nums(self.all_source_chains)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this sync or async?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed live - we want async because:

  • Avoids long query timeout, which leads to longer deltaProgress (slowing OCR leader rotation)
  • Can do the network calls themselves in parallel

pass

# The OCR3 implementation of Outcome
def outcome(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double checking,this function is pure, right?

return SequenceNumbersObservation(self.get_max_committed_seq_nums(), {})

# Given a list of MerkleRootObservations, return a flattened consensus on the merkle root for each source chain
def flatten_merkle_root_observations(self, observations: List[CommitObservation]) -> Dict[ChainSelector, bytes]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this function work internally?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max voted on root which has at least f+1 votes

# exhausted, or return ReportNotYetTransmitted with an incremented "attempts" value otherwise
case WaitingForReportTransmission(previous_max_committed_seq_nums, attempts):
max_committed_seq_nums = self.flatten_max_committed_seq_nums_observations(observations)
if self.max_committed_seq_nums_are_updated(max_committed_seq_nums, previous_max_committed_seq_nums):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this evaluate to true whenever they change, or only when they match what we transmitted?

dest_chain: ChainSelector
chain_readers: Dict[ChainSelector, ChainReader]
f: int
max_check_report_persisted_attempts: int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider making this wall clock time?
consider checking vs SharedConfig

Copy link
Contributor

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Aug 18, 2024
Copy link
Contributor

This PR was closed because it has been stalled for 10 days with no activity.

@github-actions github-actions bot closed this Aug 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants