-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(async-flow): panic if a guest fulfills during replay #10160
Conversation
quick nit: settles. An early rejection is as incorrect and an early fulfillment, and actually more likely to occur. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to handle the rejection case too :)
Deploying agoric-sdk with Cloudflare Pages
|
fb1407a
to
cf3fe86
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, feel free to merge once the following is addressed:
- Small documentation update to reflect in which case the flow goes from running to failed.
- I am uncomfortable defaulting the test async flow panic handler to just log, and I think it's safe to make it fail the test. Please let m know if I'm misunderstanding something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically the flow can also go from running to failed if the guest does something illegal like passing a non passable to a guest wrapper. It's not just for internal failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the diagram.
packages/async-flow/test/_utils.js
Outdated
* @param {PreparationOptions} [opts] | ||
*/ | ||
export const prepareTestAsyncFlowTools = (t, zone, opts) => { | ||
const { panicHandler = e => t.log('Handled panic', e) } = opts || {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the default behavior be to t.fail
the test? I know that's what the unhandled rejection handler ava sets up would do, but I think that's the correct default behavior, and here it'd just be more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so you're talking about the default panicHandler
. I presume you mean that the tests that had unhandled rejections, but expected them, would override the panicHandler
option?
@@ -149,8 +150,12 @@ const testBadShortReplay = async (t, zone) => { | |||
|
|||
const { guestMethod } = { | |||
async guestMethod(_gOrch7, _g1, _p3) { | |||
t.log(' badReplay return early'); | |||
resolveStep(true); | |||
t.log(' badShortReplay return early'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this log line before the return
?
@@ -10,6 +10,7 @@ | |||
|
|||
- ***Replaying***. To start ***Replaying***, the activation first translates the saved activation arguments from host to guest, invokes the guest function, and starts the membrane replaying from its durable log. The replay is finished when the last log entry has been replayed. Once replaying is finished, the activation has caught up and transitions back to ***Running***. | |||
|
|||
- ***Failed***. If during the ***Replaying*** state the guest activation fails to exactly reproduce its previously logged behavior, it goes into the inactive ***Failed*** state, with a diagnostic explaining how the replay failed, so it can be repaired by another future upgrade. As of the next reincarnation, the failure status is cleared and it starts ***Replaying*** again, hoping not to fail this time. If replay failed because the guest async function did not reproduce its previous behavior, then the upgrade needs to replace the function with one which does. If the replay failed because of a failure of the `asyncFlow` mechanism, whether a bug or merely hitting a case that is not yet implemented, then the upgrade needs to replace the relevant part of `asyncFlow`'s mechanism. | |||
- ***Failed***. If during the ***Replaying*** state the guest activation fails to exactly reproduce its previously logged behavior, it goes into the inactive ***Failed*** state, with a diagnostic explaining how the replay failed, so it can be repaired by another future upgrade. As of the next reincarnation, the failure status is cleared and it starts ***Replaying*** again, hoping not to fail this time. If replay failed because the guest async function did not reproduce its previous behavior, then the upgrade needs to replace the function with one which does. If a ***Replaying*** or ***Running*** guest failed because of a failure of the `asyncFlow` mechanism, whether a bug or merely hitting a case that is not yet implemented, then an upgrade needs to replace the relevant part of the `asyncFlow`'s mechanism. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also need to mention that a Running guest can go into failed state if it interacts with a host wrapper in an unsupported way (e.g. uses non passable or otherwise non recognized values as arguments)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does "a case that is not yet implemented" cover that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworded this bullet to be more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some things may be not yet implemented. But others may just be intrinsic limitations of async-flow or of our object model. Passing a non-passable value as argument is one of those purely guest error that async-flow isn't responsible for.
I just realized, can we add an explicit test of the new panic handler? I'm thinking something like a guest that passes a locally defined function as argument, and verify that trips up the panic handler. |
A guest that passes a locally defined function doesn't cause a panic, it just shows up as a |
That is surprising, and not what I expected at all. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Small nit, as relying on precise turn counts is likely not reliable long term. That said, we likely need to update all tests to be more resilient to that, so feel free to ignore for now.
I'm satisfied that the new panic handler is exercised in some of these tests already, It would be nice to also cover the case of guest misbehavior in playing state, but that's something we can add as part of #9933
@@ -103,6 +111,8 @@ const testBadHostFirstPlay = async (t, zone) => { | |||
// Notice that the bad call was not recorded in the log | |||
]); | |||
t.log('badHost firstPlay done'); | |||
// Allow panicHandler to be called. | |||
await null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm tempted to say we should await eventLoopIteration()
(aka await nextCrank()
) here
870a055
to
1daa37f
Compare
closes: #9465
closes: #10148
Description
This pull request introduces several changes to the
async-flow
package, focusing on enhancing error handling, updating documentation, and improving test robustness. Key changes include updating panicking if the guest function settles during replay (unexpected early completion), Failure state handling, adding a panic handler, and refactoring tests to utilize a new utility function.Security Considerations
Behaviour is more correct by guaranteeing the guest behaviour actually matches the recorded log.
Scaling Considerations
n/a
Documentation Considerations
n/a
Testing Considerations
Converted an existing
test.failing
to a passing test, with a minor change to useeventLoopIteration
to paper over multiple turns needed for the async flow to register an early completion failure.Upgrade Considerations
n/a