Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KREAd exit subscriber not robust to v9-zoe upgrades #8714

Closed
warner opened this issue Jan 5, 2024 · 7 comments
Closed

KREAd exit subscriber not robust to v9-zoe upgrades #8714

warner opened this issue Jan 5, 2024 · 7 comments
Assignees
Labels
bug Something isn't working Zoe package: Zoe

Comments

@warner
Copy link
Member

warner commented Jan 5, 2024

As part of #8336, I'm looking at 23 promises decided by v9-zoe and subscribed by v63-KREAd (as of the "run-18" dataset, which ends at block 13194537 on 21-dec-2023). kp8631817 is one example.

@gibson042 's classifier says:

% grep kp8631817 report.txt |jq .
{
  "kpid": "kp8631817",
  "decider": "v9",
  "subscriber": "v63",
  "type": "E(zoeInstanceAdmin).getExitSubscriber(likelySeatHandle) .getUpdateSince(...)",
  "details": {
    "contractBundleID": "b1-853acd6ba3993f0f19d6c5b0a88c9a722c9b41da17cf7f98ff7705e131860c4737d7faa758ca2120773632dbaf949e4bcce2a2cbf2db224fa09cd165678f64ac"
  }
}

This is the KREAd contract waiting for Zoe to announce that a Seat has exited, by calling getUpdateSince() on a subscriber hosted by v9-zoe. Someone invoked makeSellCharacterInvitation (in crank 36017975) to make a "Sell Character" invitation, which invoked kreadKit.market.sellCharacter(), which created the handler function and registered it to be run when the invitation was redeemed. A moment later (in c36018002) that redemption invoked the handler, at the end of which is a call to marketFacet.handleExitCharacter(newEntry).

handleExitCharacter finishes with:

          const subscriber = E(seat).getSubscriber();
          void E.when(E(subscriber).getUpdateSince(), () => {
            marketFacet.updateMetrics('character', {
              marketplaceAverageLevel: {
                type: 'remove',
                value: characterLevel,
              },
            });

            market.characterEntries.delete(asset.name);

            void marketFacet.deleteNode(recorderKit.recorder.getStorageNode());
          });

The E(seat).getSubscriber() calls into ZCF, which sends getExitSubscriber() over to Zoe, which returns a reference to onw of its subscriber objects.

Potential Failure Modes

For #8336 we look at what happens if either vat is upgraded/restarted. We'll focus on v9-zoe getting restarted, because that's what we're planning to do soon.

When the upstream (deciding) vat is upgraded/restarted, all of its outstanding Promises get rejected with a special "disconnection" value. Downstream (subscribing) vats might be confused by this apparent rejection. There are two kinds of problems that could happen here.

The first is when the subscribing vat mistakenly thinks the disconnection is a failure, and reacts badly (immediately after the upgrade). This can happen when the subscriber uses await (which will throw an error when the promise is disconnected) and doesn't explicitly catch+handle the disconnection value. It can also happen when the subscriber uses .then or E.when() and includes an "errback" argument (the second argument to .then, or the third argument to E.when()), or a .catch.

The second is when the subscribing vat might not realize that it needs to acquire a replacement promise, and thus will fail to react correctly to the eventual fulfillment or rejection (perhaps days or months after the upgrade). Clients which use the observeNotifier() helper function are safe against this: assuming the subscriber object is durable (and zoe's is), the client will react to the disconnect by calling getUpdateSince() to request a new promise. subscribeLatest() and subscribeEach() also know how to re-connect after disconnection. However a raw getUpdateSince() call is not safe, and can lead to loss of progress or other failure-to-react.

handleExitCharacter failure modes

The code above uses E.when() with only two arguments. So it will not succumb to the first failure mode. When v9-zoe is upgraded, the getUpdateSince() promise will reject, and the contract will effectively ignore the rejection (it might log an UnhandledPromiseRejectionWarning, but no userspace code will run).

However, because it uses a bare getUpdateSince, it will not re-subscribe. So when the seat finally does exit, it will fail to run the three last lines:

  • marketFacet.updateMetrics()
  • market.characterEntries.delete()
  • marketFacet.deleteNode(.. getStorageNode)

Skipping the metrics update is probably not a big deal, nor is leaving the storage node around (it's a storage/IAVL-space leak, and might also confuse external followers). But leaving the character in market.characterEntries seems more problematic: does that mean the character is still available to be purchased?

Mitigations

The handleExitCharacter() function is also called by creatorFacet.reviveMarketExitSubscribers(). In the version I'm reading, this is only called from the proposal which launched KREAd, but it looks like the creatorFacet is part of the kreadKit that gets stashed in the produce table. That means a future proposal could reach it and call reviveMarketExitSubscribers() again, to reconnect these values.

So if we upgrade zoe, and KREAd indeed stops following the exitSubscriber, perhaps we could recover with a governance proposal that did a CORE_EVAL to invoke reviveMarketExitSubscribers(). To avoid leaving things in the marketplace, this would need to happen before any of those seats were exited, otherwise other mitigations would need to be made (probably involving upgrading the KREAd contract code itself, instead of merely invoking pre-existing methods on it).

If these 23 promises are all for outstanding sell offers, another conceivable mitigation would be to convince everybody making those offers to close them before we trigger the Zoe upgrade, so there would be nothing to be forgotten. They could be re-opened after the Zoe upgrade.

@warner warner added bug Something isn't working Zoe package: Zoe labels Jan 5, 2024
@dckc dckc changed the title KREAd contract behavior when v9-zoe is upgraded KREAd exit subscriber not robust to v9-zoe upgrades Jan 8, 2024
@dckc
Copy link
Member

dckc commented Jan 8, 2024

"contract behavior" didn't make clear to me what the bug is. I hope I found a helpful refinement.

@aj-agoric
Copy link

Per @Chris-Hibbert on our agreed approach post the Upgrade All Vats review meeting

"@warner
has done a light survey of the contract, and says there's a method in the creatorFacet that looks intended to repair things after the kind of break he's worried about. First priority should be figuring out what it would take to invoke that, and whether it would address the problem w/out the need to write any new code for the contract."

@dckc
Copy link
Member

dckc commented Jan 11, 2024

... there's a method in the creatorFacet that looks intended to repair things after the kind of break he's worried about. First priority should be figuring out what it would take to invoke that

The creatorFacet is available from kreadKit in the bootstrap promise space. start-kread-proposal.js put it there..
https://github.com/Kryha/KREAd/blob/KREAd-rc1/agoric/contract/src/proposal/start-kread-proposal.js#L210C16-L210C24
https://github.com/Kryha/KREAd/blob/KREAd-rc1/agoric/contract/src/proposal/start-kread-proposal.js#L321

I should be able to test this with a3p and cosgov quickly...

@dckc
Copy link
Member

dckc commented Jan 11, 2024

E(kreadKit.creatorFacet).reviveMarketExitSubscribers(...)

Is that the one? It seems to be available...

dapp-offer-up-agd-1  | 2024-01-11T17:34:28.445Z SwingSet: vat: v1: { kreadKit: { instance: Object [Alleged: InstanceHandle] {}, publicFacet: Object [Alleged: KreadKit public] {}, governor: Object [Alleged: InstanceHandle] {}, creatorFacet: Object [Alleged: KreadKit creator] {}, adminFacet: Object [Alleged: adminFacet] {}, governorCreatorFacet: Object [Alleged: ContractGovernorKit creator] {}, governorAdminFacet: Object [Alleged: adminFacet] {}, label: 'KREAd', privateArgs: { powers: { storageNode: Object [Alleged: ChainStorageNode] {}, marshaller: Object [Alleged: Board readonlyMarshaller] {} }, clock: Object [Alleged: timerClock] {}, seed: 303 } } }
dapp-offer-up-agd-1  | 2024-01-11T17:34:28.446Z SwingSet: vat: v1: { creatorFacet: Object [Alleged: KreadKit creator] {} }
dapp-offer-up-agd-1  | 2024-01-11T17:34:28.449Z SwingSet: ls: v54: Logging sent error stack (TypeError#1)
dapp-offer-up-agd-1  | 2024-01-11T17:34:28.450Z SwingSet: ls: v54: TypeError#1: target has no method tellMeYourMethods , has [ 'initializeBaseAssets', 'initializeCharacterNamesEntries', 'initializeMetrics', 'makeMintItemInvitation', 'publishItemCollection', 'reviveMarketExitSubscribers' ]
dapp-offer-up-agd-1  | 2024-01-11T17:34:28.450Z SwingSet: ls: v54: TypeError: target has no method "tellMeYourMethods", has ["initializeBaseAssets","initializeCharacterNamesEntries","initializeMetrics","makeMintItemInvitation","publishItemCollection","reviveMarketExitSubscribers"]

core eval materials used for testing:

@dckc
Copy link
Member

dckc commented Jan 29, 2024

@Chris-Hibbert it seems like we don't expect to treat this as a bug and "fix" the KREAd contract; rather, it looks like we'll accept this limitation in the KREAd contract and use the reviveMarketExitSubscribers method during Zoe upgrades.

We could

  1. re-scope this to something like Revive KREAd subscribers during zoe upgrades. Or
  2. make a new issue and
    1. close this bug as wontfix. Or
    2. re-scope this as a question (aka design issue) and close it as resolved

Thoughts? Preferences?

@Chris-Hibbert
Copy link
Contributor

Those sound like reasonable approaches to me. I feel like a spectator here: I didn't do any of the relevant investigation, and don't have strong opinions on issue management.

If it's left to me, I'll flip a mental coin and choose whichever is easiest.

Separate from issue management, the one step I'm unsure how to carry out is where code goes to cause the KREAd revival to be invoked on Zoe upgrade. I presume that shouldn't be included in a core-eval that's concerned with upgrading Zoe (since it's customer code). How does it tag along?

@Chris-Hibbert
Copy link
Contributor

Agreement durning sprint planning was that we should include upgrading KREAd in the proposal that upgrades Zoe. Created #8837.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Zoe package: Zoe
Projects
None yet
Development

No branches or pull requests

6 participants