-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KREAd exit subscriber not robust to v9-zoe upgrades #8714
Comments
"contract behavior" didn't make clear to me what the bug is. I hope I found a helpful refinement. |
Per @Chris-Hibbert on our agreed approach post the Upgrade All Vats review meeting "@warner |
The creatorFacet is available from I should be able to test this with a3p and cosgov quickly... |
E(kreadKit.creatorFacet).reviveMarketExitSubscribers(...)Is that the one? It seems to be available...
core eval materials used for testing: |
@Chris-Hibbert it seems like we don't expect to treat this as a bug and "fix" the KREAd contract; rather, it looks like we'll accept this limitation in the KREAd contract and use the We could
Thoughts? Preferences? |
Those sound like reasonable approaches to me. I feel like a spectator here: I didn't do any of the relevant investigation, and don't have strong opinions on issue management. If it's left to me, I'll flip a mental coin and choose whichever is easiest. Separate from issue management, the one step I'm unsure how to carry out is where code goes to cause the KREAd revival to be invoked on Zoe upgrade. I presume that shouldn't be included in a core-eval that's concerned with upgrading Zoe (since it's customer code). How does it tag along? |
Agreement durning sprint planning was that we should include upgrading KREAd in the proposal that upgrades Zoe. Created #8837. |
As part of #8336, I'm looking at 23 promises decided by v9-zoe and subscribed by v63-KREAd (as of the "run-18" dataset, which ends at block 13194537 on 21-dec-2023).
kp8631817
is one example.@gibson042 's classifier says:
This is the KREAd contract waiting for Zoe to announce that a Seat has exited, by calling
getUpdateSince()
on a subscriber hosted by v9-zoe. Someone invokedmakeSellCharacterInvitation
(in crank 36017975) to make a "Sell Character" invitation, which invokedkreadKit.market.sellCharacter()
, which created thehandler
function and registered it to be run when the invitation was redeemed. A moment later (in c36018002) that redemption invoked the handler, at the end of which is a call tomarketFacet.handleExitCharacter(newEntry)
.handleExitCharacter
finishes with:The
E(seat).getSubscriber()
calls into ZCF, which sendsgetExitSubscriber()
over to Zoe, which returns a reference to onw of its subscriber objects.Potential Failure Modes
For #8336 we look at what happens if either vat is upgraded/restarted. We'll focus on v9-zoe getting restarted, because that's what we're planning to do soon.
When the upstream (deciding) vat is upgraded/restarted, all of its outstanding Promises get rejected with a special "disconnection" value. Downstream (subscribing) vats might be confused by this apparent rejection. There are two kinds of problems that could happen here.
The first is when the subscribing vat mistakenly thinks the disconnection is a failure, and reacts badly (immediately after the upgrade). This can happen when the subscriber uses
await
(which will throw an error when the promise is disconnected) and doesn't explicitly catch+handle the disconnection value. It can also happen when the subscriber uses.then
orE.when()
and includes an "errback" argument (the second argument to.then
, or the third argument toE.when()
), or a.catch
.The second is when the subscribing vat might not realize that it needs to acquire a replacement promise, and thus will fail to react correctly to the eventual fulfillment or rejection (perhaps days or months after the upgrade). Clients which use the
observeNotifier()
helper function are safe against this: assuming thesubscriber
object is durable (and zoe's is), the client will react to the disconnect by callinggetUpdateSince()
to request a new promise.subscribeLatest()
andsubscribeEach()
also know how to re-connect after disconnection. However a rawgetUpdateSince()
call is not safe, and can lead to loss of progress or other failure-to-react.handleExitCharacter failure modes
The code above uses
E.when()
with only two arguments. So it will not succumb to the first failure mode. When v9-zoe is upgraded, thegetUpdateSince()
promise will reject, and the contract will effectively ignore the rejection (it might log anUnhandledPromiseRejectionWarning
, but no userspace code will run).However, because it uses a bare
getUpdateSince
, it will not re-subscribe. So when the seat finally does exit, it will fail to run the three last lines:marketFacet.updateMetrics()
market.characterEntries.delete()
marketFacet.deleteNode(.. getStorageNode)
Skipping the metrics update is probably not a big deal, nor is leaving the storage node around (it's a storage/IAVL-space leak, and might also confuse external followers). But leaving the character in
market.characterEntries
seems more problematic: does that mean the character is still available to be purchased?Mitigations
The
handleExitCharacter()
function is also called bycreatorFacet.reviveMarketExitSubscribers()
. In the version I'm reading, this is only called from the proposal which launched KREAd, but it looks like thecreatorFacet
is part of thekreadKit
that gets stashed in theproduce
table. That means a future proposal could reach it and callreviveMarketExitSubscribers()
again, to reconnect these values.So if we upgrade zoe, and KREAd indeed stops following the exitSubscriber, perhaps we could recover with a governance proposal that did a CORE_EVAL to invoke
reviveMarketExitSubscribers()
. To avoid leaving things in the marketplace, this would need to happen before any of those seats were exited, otherwise other mitigations would need to be made (probably involving upgrading the KREAd contract code itself, instead of merely invoking pre-existing methods on it).If these 23 promises are all for outstanding sell offers, another conceivable mitigation would be to convince everybody making those offers to close them before we trigger the Zoe upgrade, so there would be nothing to be forgotten. They could be re-opened after the Zoe upgrade.
The text was updated successfully, but these errors were encountered: