xsnap upgrade by restart-time forced upgrade of all vats by kernel #8405

Open
warner opened this issue Sep 28, 2023 · 4 comments
Labels
enhancement (New feature or request), SwingSet (package: SwingSet), xsnap (the XS execution tool)

Comments

warner commented Sep 28, 2023

What is the Problem Being Solved?

We've been thinking for a long time about how we're going to deploy a non-snapshot-compatible new version of xsnap (#6361). We might do this to provide a new feature, improve performance, or fix a security bug.

The primary constraint is that any transcript replays must remain consistent. Generally, when we restart the kernel, we need to bring all active vat workers back up to their current states (whatever state they were in when the kernel last shut down). We do that by loading the most recent heap snapshot, then we replay transcript entries until the new worker has received all the same deliveries as the previous kernel had observed. While doing this replay, we satisfy syscall responses with data from the transcript, rather than executing them for real, so that the replayed vat does not influence any other vats being brought back up (or communicate with the outside world). To ensure that the syscall responses match the syscalls being made, we insist that every syscall made by the worker during replay must exactly match the syscall recorded in the transcript. If these deviate, we declare an "anachrophobia error" and panic the kernel, because something has gone very wrong.
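To make the consistency requirement concrete, here is a minimal sketch (not the actual vat-warehouse/transcript code; replayOneEntry, the entry shape, and worker.deliver are assumptions) of how replay satisfies syscalls from the transcript and detects divergence:

```js
// Sketch only: each transcript entry records a delivery plus the syscalls (and
// their results) that the vat made while handling it. During replay we hand the
// worker a syscall handler that answers from the recorded data and insists on an
// exact match, rather than executing the syscall for real.
const sameSyscall = (a, b) => JSON.stringify(a) === JSON.stringify(b);

async function replayOneEntry(worker, entry) {
  let index = 0;
  const replaySyscallHandler = vatSyscallObject => {
    const expected = entry.syscalls[index];
    index += 1;
    if (!expected || !sameSyscall(vatSyscallObject, expected.d)) {
      // "anachrophobia": the worker no longer behaves the way history says it did
      throw Error(`anachrophobia: syscall #${index} diverged during replay`);
    }
    return expected.response; // satisfied from the transcript, not executed for real
  };
  await worker.deliver(entry.d, replaySyscallHandler); // re-issue the recorded delivery
}
```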

One such deviation would happen if we perform the replay with a version of xsnap that does not behave the same as the one used to record the transcript. Behavior differences are only tolerable if they do not occur during replay. For example, we could fix a bug that has not been triggered yet: the transcript will make no claims about the xsnap behavior when exposed to the bad input, so it could plausibly have been generated from either version. However we cannot tolerate metering differences (a few extra computrons during replay could make the difference between exhausting a metering limit or not), and many changes we might like to make to xsnap would change the metering behavior.

In addition, if the new version of xsnap is unable to (accurately) load a heap snapshot produced by the earlier version, then we cannot use replay-from-last-snapshot. This limits our ability to roll out more significant structural changes to the XS architecture.

The only safe time to change xsnap is when we don't have any vat transcript to replay. This happens when we upgrade the vat: we throw out the old transcript and heap snapshots, and start again in a fresh worker, carrying over only the durable vatstore data (presented to userspace in the baggage object).

So far, to make more drastic changes to xsnap, we've had two general approaches in mind.

Multiple Simultaneous Versions of xsnap

The first, proposed by me (@warner), is to give the kernel a way to host multiple versions of xsnap at the same time (#6596). In this approach, each vat continues to run on the same version of xsnap for the entire incarnation. We record some metadata about the vat, and the kernel uses that to decide which version of xsnap it should launch when starting the worker. When the vat is finally upgraded (or "restarted", i.e. E(adminNode).upgradeVat() but using the same source bundle as the original), the metadata is updated to point to the newest available xsnap version. This way, new vats, and updated vats, will all use the latest xsnap, while load-time replays of old vats will retain their consistent behavior. After upgrading the kernel, operators are responsible for upgrading all vats too. Once all upgrades are complete, the old xsnap will no longer be used. As long as each kernel supports two (overlapping) versions of xsnap, and all vats are upgraded before the next kernel upgrade, we should have no problems.
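As a rough illustration (not the real kernel code; the vat metadata shape and the workerStarters map are assumptions), the per-vat lookup might look like:

```js
// Sketch: each vat records which xsnap version its current incarnation was built
// on; the kernel keeps a starter for every version it still supports, and consults
// the metadata whenever it brings that vat's worker back up.
function startWorkerForVat(vatID, vatMetadata, workerStarters) {
  const version = vatMetadata.workerVersion; // e.g. 'xsnap-v1', updated on vat upgrade
  const start = workerStarters[version];
  if (!start) {
    throw Error(`vat ${vatID} needs ${version}, which this kernel no longer ships`);
  }
  return start(vatID); // old vats keep replaying on the version that recorded them
}
```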

The downside of this approach is the build-time complexity of hosting multiple xsnap versions in the same kernel package. #6596 explores this, defining a package named worker-v1 (to hold the first version of xsnap), and proposing a worker-v2 for the subsequent version. Another approach, which @kriskowal proposed, is to take advantage of package.json "aliases" (https://docs.npmjs.com/cli/v10/using-npm/package-spec/), where the dependencies section can declare one name for use by import (e.g. xsnap-v1) and a name+version for use by the yarn install process (e.g. @agoric/[email protected]). We'd have to experiment with it, and our use of git submodules in the xsnap build process might make things complicated.
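For illustration, the npm alias syntax would let one kernel package depend on two copies of xsnap under different import names (the version ranges here are made up):

```json
{
  "dependencies": {
    "xsnap-v1": "npm:@agoric/xsnap@^1.0.0",
    "xsnap-v2": "npm:@agoric/xsnap@^2.0.0"
  }
}
```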

Replay The Whole Incarnation Under The New xsnap

The second approach, raised by @mhofman in #7855, is to replay the entire incarnation's transcript at kernel restart time, for every vat. This requires the new version of xsnap to produce the same syscalls, but we'd tolerate computron/metering differences (i.e. tolerate greater usage the second time around, even if that would have caused the original to be terminated). We'd hope that vat code was not so sensitive to the XS version that it might perform syscalls in a different order.

The benefit of this approach is that we wouldn't need multiple versions of xsnap in a single kernel. The downside is that replaying the entire incarnation is expensive, and we'd have to do it for every vat in the system, even ones that have been idle for a long time. This cost can be reduced if we manage to perform vat upgrades fairly soon before the kernel/xsnap upgrade (reducing the size of the most recent incarnation by making sure it is very new). And there are some clever precomputation tricks we might do to give validators a way to start the replay process in parallel with their normal execution, a week or two ahead of time, and then the final restart would only need to do the last few minutes of updates. But that represents a significant engineering complexity.

Third Approach: Force-Restart All Vats at Kernel Restart

A third approach came up today in a meeting with @ivanlei . If we manage to get all vats upgradable (#8104), then we could decide that each kernel restart will immediately execute a vat restart (null upgrade) on all vats, before performing any other work. We'd need the kernel to inject an upgrade-vat event into the run-queue, for all vats, ahead of anything else that might be on the run-queue at the time. All upgrades would complete before allowing any other messages to be delivered.

This would avoid the complexity of having multiple versions of xsnap simultaneously, and it would avoid the cost of replaying all vats from the start of their current incarnation.

The downsides are:

  • Normally, when we upgrade a vat, we send a final BringOutYourDead to the old incarnation, giving it a chance to shed any imports that are garbage but not yet collected. Without this, the new incarnation might be left hanging on to some imports which could not then be collected (at least not until we design and implement some sort of reconciliation process, where the kernel and the vat can figure out which c-list entries are unused by the vat).
  • The kernel would need to know enough to restart all vats. Contract vats are normally upgraded by Zoe, which (I think) uses privateArgs provided by whoever is telling Zoe to restart them. The kernel may not store enough information to generate the right vatParameters for the vat: it can look in vatOptions to see what parameters were used the previous time, but for contract vats, for example, that won't be correct (the contract would re-initialize things that should instead be re-used). It might be necessary to get Zoe involved: restart all static vats, then ask Zoe to restart all contract vats. However, that would risk Zoe talking with vats that have not been restarted yet, as well as being a significant layering violation.
  • Finally, ability to restart all production vats #8104 is not necessarily going to result in all vats being upgradable. We decided during the Vaults/bulldozer release to not attempt to make vat-bootstrap upgradeable. In some cases, it's just a question of more work: we tried to record all the important data into baggage, and could construct a new vat that does the right thing when launched as the second incarnation. But in some cases the vat holds critical state in RAM or in merely-virtual objects, and that data is simply unavailable to an upgrade. If we went with a kernel-upgrades-all-vats solution, these non-upgradable vats would have to be terminated or indefinitely suspended, and we could only tolerate that for a small number of easily-replaceable vats.

So, it's an idea worth exploring, but not immediately obvious that it would work.

mhofman commented Sep 28, 2023

However we cannot tolerate metering differences (a few extra computrons during replay could make the difference between exhausting a metering limit or not), and many changes we might like to make to xsnap would change the metering behavior.

but we'd tolerate computron/metering differences (i.e. tolerate greater usage the second time around, even if that would have caused the original to be terminated)

We already do not enforce metering during normal replays, so we do in fact accept differences in metering, and we don't need to make an upgrade replay any different in that regard.

We'd hope that vat code was not so sensitive to the XS version that it might perform syscalls in a different order.

Current evidence shows that this is already the case: transcripts from the pismo era are compatible (identical minus metering) across wide ranges of XS versions, even with dramatic changes to allocation and gc behavior. We still need to confirm this holds for transcripts made in the current "vaults" era, but I don't see any reason why we would have regressed.

we could decide that each kernel restart

I don't understand what you mean by the "kernel restart" part. I assume we're not talking about simply restarting the process which hosts the kernel, because that isn't deterministic.

Force-Restart All Vats

While I would love to get to a point where the kernel can unilaterally decide to restart any vat without involvement from any user code (to support a healthier page-out of inactive vats, for example), I am worried that not involving the vat at all brings too many complications. One case mentioned is the ability for liveslots to do some cleanup. But I also worry about the case where the vat cannot in fact restart correctly. We'd now have to tombstone that vat until it can be upgraded to something that will hopefully restart in the future.

But in general I agree, it would be preferable to be able to rely on upgrades. I think #7855 is more generic because it doesn't rely on upgrades, but instead upgrades can optimize the performance of the replay. Most of the engineering complexity of #7855 is in the pre-compute optimization. We already know how to replay transcripts, and updating transcripts with new snapshot hashes (and maybe computron usage if we want to be thorough) is pretty straightforward too.

If we end up supporting cleaning of vat resources without liveslots involvement, we could actually have both: attempt a force restart of a vat, and if that fails, fallback to a replay.

warner commented Jan 30, 2024

In a meeting last week, we agreed that Force-Restart would give us the features that we care about most. In some sense, it would provide our chain with the most satisfying workflow: a chain-halting upgrade which does all of:

  • install a new version of xsnap (replacing the old one entirely)
  • ensure that all vats are using the new xsnap, as well as the latest Endo/lockdown, liveslots and supervisor
  • perform all lengthy upgrade tasks immediately, while validators are expecting delays

Those are better than an approach which:

  • leaves some vats running the old xsnap/XS, which might have a security problem that we're trying to address
  • leaves some vats running an old Endo/lockdown/liveslots, same
  • has surprising delays at arbitrary times in the future

In addition, the support-multiple-versions approach would get unwieldy if we're successful at upgrading xsnap/XS on a monthly cadence: who knows how many versions would be in play at the same time.

I generally treat SwingSet as a separate product (with its own release schedule, and non-chain customers). In the support-multiple-versions approach, our SwingSet release decision-making process includes which versions to support, e.g. you might have a supported-versions table like:

swingset version | supports xsnap versions
swingset-v1      | xsnap-v1
swingset-v2      | xsnap-v1, xsnap-v2
swingset-v3      | xsnap-v2
swingset-v4      | xsnap-v2, xsnap-v3

However in practice, when we are preparing to release swingset-v3 and considering whether we can drop support for xsnap-v1, we would survey customers and find out what xsnap versions they still need, and try to satisfy their requirements by retaining the old ones while still encouraging them to upgrade their vats so we could drop those versions. There would be a constant tension between the swingset complication needed to maintain support, vs the chain-side effort needed to stop requiring the old versions. If a swingset release went ahead anyways and dropped support for a still-needed version, the chain would be unable to take advantage of that release, making it harder to deploy important fixes or features without using expensive old-release maintenance branches.

That said, force-restart is an awkward process to impose upon kernel customers. It obligates vat authors to anticipate changes in Endo/lockdown/liveslots which their deployed code must tolerate, and/or it forces Endo/lockdown/liveslots authors to anticipate compatibility requirements of deployed vat code (or swingset authors to refrain from incorporating newer versions of those components). We've had discussions (TODO ticket) about marking liveslots/supervisor bundles with version/feature indicators, and vat bundles with feature requirements, so there is at least enough metadata for something to discover an incompatibility early enough to avoid problems. But that won't magically enable old abandoned contracts to become compatible with a new liveslots (or e.g. its embedded Endo components, qv #8826) that changes some significant API.

I've been trying to find analogies with desktop operating systems (e.g. Linux, macOS, Windows). Creating a new vat is like launching an application. Halting the swingset kernel is like hibernating the computer. Upgrading a vat is like upgrading an application, which involves stopping it and starting it again (carrying over only the durable document state). Liveslots and the Endo environment are like dynamic libraries, provided by the OS but used by any given incarnation of a program. Vat code imports other components (including several from Endo) and embeds them into its bundle; these are like statically-linked libraries, sampled at compile/link time.

It's not a perfect analogy, but in this view, we might try to maximize the correspondence between a desktop OS upgrade and a chain upgrade. The chain upgrade is the (only) opportunity to replace the kernel, liveslots, lockdown/Endo, supervisor, and xsnap. It does not mandate a change of liveslots/lockdown/Endo/supervisor (since those come from bundles, tracked separately for each vat, and it's easy to store bundles in the DB and keep re-using them until the vat is upgraded). It does mandate a change of the kernel, since there's obviously only one kernel. A change of xsnap is mandated with the force-restart approach, and optional with the support-multiple-xsnaps approach.

Upgrading a desktop OS risks compatibility with existing applications (I've certainly held off upgrading macOS until I was sure my main applications would keep working). Forcibly restarting a vat (and thus switching to the new liveslots/lockdown/endo bundles) risks compatibility too: if the old vat bundles are doing something too old, or the new kernel-provided components are doing something too new, that vat might fail its restart, and then it's kinda stuck. In the desktop OS world this is managed with compatibility testing on both sides (OS vendors test popular applications against new OS versions, and application vendors test existing applications against upcoming beta/seed versions of the OS). Abandoned applications suffer the worst fates.

Tasks

We identified a couple of tasks needed to implement the restart-all-vats approach.

  • First, we need all vats to be restartable (null-upgrade), unilaterally, by the kernel, without involvement by whoever launched that vat. This requires:
    • vatParameters need to be retained, so they can be supplied again in the startVat message (see the sketch after this list). Currently, the kernel neither remembers the vatParameter capdata, nor does it retain refcounts on any objects therein, because I didn't want a memory leak. This would be a tradeoff: vatParameter objects would be retained indefinitely (at least until the next admin-node-driven upgrade replaces the vatParameters with new ones). maybe have kernel retain vatParameters, to enable kernel-unilateral vat upgrade #8947
    • On the chain, we must terminate all non-restartable vats, so a kernel-driven restart doesn't provoke any problems, or leave us thinking that it failed. This means terminating the current price-feed and price-authority vats, both of which we've decided are not upgradable and want to be replaced outright. Probably also the old governance/vote-counter vats. Many other vats must be deliberately upgraded to versions that are restartable, which means fixing (and deploying) everything in the "upgrade all vats" project, including all the vats that don't really need new features or bugfixes but which need to become restartable anyways. v1-bootstrap is not upgradeable and needs its authorities transferred elsewhere, then terminated.
    • All static vats will be restarted too (bootstrap, vat-admin, vattp, timer). Bootstrap needs to be terminated. The others are supplied by the kernel (their sources live in packages/SwingSet/src/vats/), don't use vatParameters, and are ready to be restarted, but we've identified promise-disconnection issues (eg test that v48-vaultFactory re-subscribes to timer when v5-timer is upgraded #8727, test v23-feeDistributor re-subscribes to timer notifier when v5-timer is upgraded #8730) that will require other vats to be fixed before we can restart vat-timer or vat-vat-admin.
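As a sketch of that first sub-task (hypothetical kernel-side helpers; today's kernel discards this capdata), retaining vatParameters means persisting the capdata and holding refcounts on its slots:

```js
// Sketch: keep the vatParameters capdata so a later kernel-initiated startVat can
// re-supply it, and pin its object references so they survive until the next
// adminNode-driven upgrade replaces them (the acknowledged memory-retention tradeoff).
function retainVatParameters(kvStore, incrementRefCount, vatID, vatParametersCapData) {
  kvStore.set(`${vatID}.vatParameters`, JSON.stringify(vatParametersCapData));
  for (const kref of vatParametersCapData.slots) {
    incrementRefCount(kref); // retained indefinitely, by design
  }
}
```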

Then we'll need a controller API to indicate that all vats should be restarted as the kernel is brought up. This must either be a flag to makeSwingsetController(), or a state-changing API which is called just before invoking makeSwingsetController(), because it needs to take action before vat-warehouse does the worker preloads. A state-changing API call is probably better, because it needs to be called exactly once, on the first kernel launch after the chain-halting upgrade, and the logic to decide what option to pass into makeSwingsetController() sounds messy. #8954 tracks this.
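A sketch of the state-changing flavor (markAllVatsForRestart is a hypothetical name, and the controller signature is simplified); the only point is the ordering relative to makeSwingsetController():

```js
// Sketch: the one-shot state change must land in the swing-store before the
// controller is built, because vat-warehouse preloads workers during controller
// construction and needs to see the restart-all-vats flag first.
import { makeSwingsetController } from '@agoric/swingset-vat';

async function bootAfterChainHaltingUpgrade(kernelStorage, deviceEndowments, markAllVatsForRestart) {
  markAllVatsForRestart(kernelStorage); // called exactly once, on the first post-upgrade launch
  const controller = await makeSwingsetController(kernelStorage, deviceEndowments);
  return controller;
}
```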

Once the kernel starts, it needs to suspend processing of the run-queue (if there was anything leftover from the previous boot, it must not be executed until all upgrades are done). Then it needs to restart one vat at a time, probably in vatID order (it might be good to restart all static vats before doing any dynamic ones). For each vat, we do the same thing as processUpgradeVat, except that we forego the last bringOutYourDead (because we can't use the old worker; that xsnap is gone), and if the startVat fails, we have a different recovery path (see below). The kernel-side restart processing will push run-queue items (promise disconnections, non-durable object orphaning), as might the startVat delivery (new messages sent to previously-imported objects). All these items get enqueued to run after all vat restarts complete.
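The shape of that restart pass might look like the following (every kernel method here is a hypothetical placeholder for internal kernel machinery):

```js
// Sketch: run-queue servicing stays paused while every vat gets a null upgrade in
// vatID order; anything the upgrades push (promise disconnections, orphaning, new
// messages from startVat) only runs after the loop finishes.
async function restartAllVats(kernel) {
  kernel.pauseRunQueue();
  for (const vatID of kernel.allVatIDs().sort()) { // static vats first, then dynamic
    // like processUpgradeVat, minus the final bringOutYourDead: the old worker
    // (and the old xsnap that could run it) is already gone
    await kernel.restartVat(vatID); // failure handling is the "Suspended Vats" topic below
  }
  kernel.resumeRunQueue(); // leftover pre-restart items, then upgrade-provoked ones
}
```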

"Suspended Vats"

(extracted to #8955)

What happens if the restart fails? Specifically, the startVat delivery (which is what runs buildRootObject, and thus prepare in ZCF-based contract vats) might fail, perhaps if liveslots notices that the vat code failed to re-define all durable Kinds, or if consistency checks in the vat code itself trigger during startup.

For a normal vat upgrade, the kernel rolls back the upgrade, leaving the vat in its old state, and rejects the promises returned by E(adminNode).upgrade() so the userspace code that asked for an upgrade can decide what to do. But if we're restarting vats to switch to a new xsnap, we have no way to return to the old version: that old heap snapshot and transcript are unusable without the old xsnap to run them.

In addition, this restart-all-vats plan might be practical for now, when we have 87 vats on chain, but not when we have a thousand.

@mhofman introduced the idea of "suspending" the vats (he originally used the term "pause", but we agreed that would conflict with a smaller-yet-overlapping feature that stops pulling items off the run-queue for a while, like when the vat has exceeded its runtime budget). "Hibernation" might be another word to use, but I'm thinking that "suspended animation" (no activity, needs significant/risky effort to revive) captures the idea pretty well.

The name refers to the state of a vat in the middle of the usual upgrade process, after the old worker has been shut down (and the old snapshot/transcript effectively deleted), but before the new worker is started (with the new vat bundle). In this state, there is no worker and no transcript.

Just like we currently "page-in" offline vats on-demand when a delivery arrives, starting a worker (from heap snapshot and transcript), we can imagine "unsuspending" suspended vats on-demand when a delivery arrives. Unlike page-in, which can be different on each validator/follower, unsuspending would happen in-consensus (all validators have the same set of suspended vats at the same time).

This leads to a vat lifetime shaped like:

  • createVat starts the lifetime, and vat termination ends it
  • that lifetime is broken up into "incarnations", separated by periods of suspension (perhaps just for a moment, or for months)
  • each incarnation ends when the vat is upgraded, suspended, or terminated
    • ending the incarnation means deleting the heap snapshot/transcript, perhaps after a final BOYD
  • each incarnation starts when the vat is upgraded, unsuspended, or created
    • starting the incarnation means sampling the current liveslots/supervisor bundles, initializing a worker, and delivering startVat
  • within each incarnation, the vat might be online or offline at any given moment, different for each validator/follower
  • bringing a vat online means loading a heap snapshot and replaying a transcript
  • bringing a vat offline means killing the worker

This "unsuspend" revivification would take longer than a snapshot+replay, because we have to execute the whole startVat (which, traditionally, is kind of expensive), and this delay might happen at an inconvenient time.

And it might fail (since it might be using a different xsnap/liveslots/Endo), or at least it might fail in different ways than a transcript replay (which is "shouldn't fail" enough that we panic the kernel if it occurs). If it does fail, since we can't return to the old state, our best option is to leave the vat in a suspended state, and set a flag that inhibits automatic unsuspension so we don't get stuck in a loop.

With suspension, our restart-all-vats process now looks like:

  • the chain-halting upgrade invokes the swingset/controller API that says "I want to restart all vats"
    • that marks all vats as suspended: all heap snapshots/transcripts are deleted (in practice, transcripts are truncated, not necessarily deleted, but the effect is equivalent)
  • the kernel starts, and some in-consensus set of vats are unsuspended immediately: probably all static vats, and any vats marked with the criticalVatFlag
    • if any of these vats fails to unsuspend, the kernel should panic and the chain should halt, awaiting a better xsnap/liveslots/supervisor which doesn't have the problem (this is why we need to terminate all non-restartable vats first)
      • this is similar to the worker-preload that vat-warehouse does, except that it must be in-consensus, whereas each validator could preload a different number/set of workers without consensus issues
  • the remaining vats are left suspended, and will be unsuspended on-demand the first time a delivery is made to each
    • an unsuspension error in these vats will mark the vat as deliberately suspended, inhibiting automatic unsuspension until some manual process (perhaps a normal vat-upgrade) clears the flag and allows unsuspension to resume

Over time, most vats will remain in a suspended state, and only active vats will have an active transcript/heap-snapshot. Vats which are idle across multiple upgrades will not experience the intermediate versions. The kernel work will be proportional to the number of non-idle vats, rather than the total number of vats.

We might want a second flag, perhaps named restartCriticalFlag, distinct from criticalVatFlag (which means "panic the kernel if this vat is ever terminated"), to control the unsuspend-on-restart behavior. Setting this flag on a vat means more delay during restart-all-vats, but it also means we refuse to proceed without proof that it can restart (which lets us discover the problem right away).

The difference between "pausing" a vat and "suspending" one is that the "paused" flag just inhibits run-queue message delivery: there is still a worker, but each time we pull something off the run-queue for the vat, instead of delivering it, we push it off to a side-queue that will be serviced later, when the vat is unpaused. Vats which are suspended do not have any worker state, and will need a startVat to generate some.

I think we need another flag to distinguish between "suspended by a restart-all-vats event", which means we automatically start the new incarnation when a delivery arrives, and "deliberately suspended" (because of error), where we do not. Vats which are deliberately suspended are also paused. Maybe we can use combinations of two boolean flags:

  • suspended=no, paused=no : deliver as usual
  • suspended=yes, paused=no : startVat on demand
  • suspended=yes, paused=yes: enqueue would-be deliveries, uncertain about unsuspend working
  • suspended=no, paused=yes : enqueue would-be deliveries, confident about page-in working
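A tiny sketch of how those two flags might gate delivery (flag names from the list above; the routing function itself is hypothetical):

```js
// Sketch: paused always wins (deliveries go to a side-queue), suspended without
// paused triggers an on-demand startVat, and neither flag means the normal path.
function routeDelivery(vatState) {
  const { suspended, paused } = vatState;
  if (paused) {
    return 'enqueue';                // hold on the side-queue until unpaused
  }
  if (suspended) {
    return 'unsuspend-then-deliver'; // startVat on demand, then deliver
  }
  return 'deliver';                  // usual path (page-in the worker if offline)
}
```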

Minor metering overruns would set paused=yes but not suspend the vat. This might also just be implemented by a more sophisticated kernel scheduler, with an input-queue-per-vat instead of a single merged run-queue, by just deciding to not service that vat's input queue until it accumulated more service priority.

More severe vat errors would be dealt with by setting suspended=yes paused=yes, deleting the worker state (leaving the durable state), which inhibits both delivery and automatic restart until someone calls E(adminNode).upgrade() to mark the vat as ready for work. This upgrade would be expected to provide code that resolves the original problem. upgrade() would clear the paused flag, and would also clear the suspended flag as a side-effect of launching the new incarnation right away.

It might make sense to skip the page-in preload for vats which are currently paused: why waste the memory and CPU when we know it will take something special to unpause them. Likewise we might preemptively page-out the worker when a vat gets paused.

We might want to introduce an E(adminNode).resume() or unpause() to let userspace (zoe? core-eval?) clear the paused flag (and deliver any queued messages) for a vat that got itself paused by overrunning its meter, sort of like paying a parking fine to get your car un-booted.

Run-Queue Handling

We cannot guarantee that the run-queue will be empty when the worker is restarted. We do not want previously-queued deliveries to be interleaved with the restart work. And basically we want to pretend that all vats upgrade simultaneously. So we want all startVats to happen in a group, followed by any leftover deliveries from before the chain restart, followed by deliveries provoked by the upgrades.

The vat restarts should be executed in a loop, not by pushing upgrade-vat entries onto the run-queue. This also means GC actions and routing cranks will be deferred until after the restarts finish.

I suspect that we'll see some pathologies in this sequence. We have some code patterns where an ephemeral publisher in VatA is being followed by a subscriber in VatB. When VatA is restarted, VatB will get a rejected promise, which will prompt it to ask the publisher for a new one, but the publisher will be gone, which ought to prompt it to ask a higher-up (durable) object for a replacement publisher. If VatB is also being restarted in this sequence, I can imagine seeing some wasted messages, which could be avoided if we restarted them in a different order. But this is probably just an inefficiency, not a functionality problem.

Lack of a final BOYD

Our normal vat-upgrade process delivers one last BringOutYourDead to the old incarnation before shutting it down. That allows the vat to drop imports and durable storage that was only kept alive by in-RAM pillars or recently-dropped exports.

An abrupt restart, without this final BOYD, will leave these objects pinned by the vat. Until we get a mark-and-sweep GC system in liveslots, the new incarnation won't have enough information to realize that they can be dropped, so this will effectively constitute a storage leak.

Our current reapInterval of 200 deliveries, coupled with our current unlimited GC (until #8417 is deployed), probably limits the severity of this leak. Doing BOYD less frequently, or less completely, will make it worse.

We talked about finding ways to let the kernel participate in this cleanup, by having liveslots store more information in the vatstore. This would unfortunately introduce more coupling between the kernel and liveslots (weakening the abstraction boundary between them), however it might help us clean up this garbage faster.

One idea was to have liveslots store its memory pillar data in the vatstore (in a new ${vatID}.vs.ram.${baseref} section), however this would result in more syscalls (a performance issue) and might cause GC-sensitive syscall behavior (which we try really hard to avoid). The raising of a RAM pillar is generally a deterministic function of vat behavior, but it is dropped when a finalizer runs, which is not. We could defer removing the vatstore record until BOYD (deterministic), but that would remove its utility for BOYD-less kernel decisions.
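A sketch of that idea, using the existing vatstore syscalls but a hypothetical key layout (vat-side keys are prefixed by the kernel into the ${vatID}.vs. namespace mentioned above):

```js
// Sketch: liveslots would write a record whenever a RAM pillar is raised, and, in
// the deferred variant, delete it only at BOYD time (deterministic) rather than
// when a finalizer runs (GC-sensitive, so it must not drive syscalls directly).
function recordRamPillar(syscall, baseref) {
  syscall.vatstoreSet(`ram.${baseref}`, '1'); // kernel sees ${vatID}.vs.ram.${baseref}
}
function clearRamPillarsAtBOYD(syscall, droppedBaserefs) {
  for (const baseref of droppedBaserefs) {
    syscall.vatstoreDelete(`ram.${baseref}`); // deferred to BOYD to stay deterministic
  }
}
```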

But, vats in the suspended state have no RAM pillars at all, so the current vatstore contents are complete and sufficient (they document all export and virtual-data pillars). It's just that using them requires an expensive mark-and-sweep GC pass. Our second idea was to have the kernel implement this pass, sometime during suspension, rather than liveslots. The big issue is how long it would take to sweep everything. This might be easier to tackle once we've addressed the chain stability problems and purged the enormous piles of unneeded objects, reducing the cost of this operation.

Signalling Restart Readiness

When will it be safe to trigger a restart-all-vats event? What tools can we provide to make this state visible?

At last week's kernel meeting (2024-01-24), we discussed ways for vat bundles to export metadata that indicates their environmental requirements, like "I need to be run in a liveslots that gives me vatPowers.foo", or "I need my globalThis.HandledPromise to have feature X". Likewise, we could annotate the liveslots/supervisor bundle with data about what features it offers, either to the vat code it loads, or to the kernel (like a budget parameter to dispatch.bringOutYourDead()). In some cases this would enable improved compatibility, in others it would merely let us signal an error earlier.

We might use this to let vats signal that they're prepared to be restarted unilaterally. The kernel could look at these flags across all vats and provide a controller.areAllVatsReadyForRestart(). If we exposed this on an RPC service, then chain governance participants could avoid initiating a restart-all-vats-type chain upgrade until it reported true.
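The aggregation itself would be trivial; a sketch (the per-vat restartable flag is an assumed piece of metadata, not something vats export today):

```js
// Sketch: the controller answers true only when every vat has advertised that a
// unilateral, kernel-driven restart is safe.
function areAllVatsReadyForRestart(vatMetadataByID) {
  return [...vatMetadataByID.values()].every(meta => meta.restartable === true);
}
```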

Some other variant of this might make it easier to determine when it's safe to deploy a liveslots/Endo/lockdown which changes the features that are available.

We might hope that this signal gets set when we upgrade or terminate the last non-restartable vat, and then never gets reset again (because we never deploy a new non-restartable vat). So its utility might be too limited to be worth deploying. The value of a which-features-are-in-use aspect would depend upon our ability to identify which such features are relevant, which is historically something that happens after deployment, not before.

warner commented Oct 1, 2024

@siarhei-agoric and I were discussing this today, and we realized that we might be able to change the cosmos-sdk chain-halting-upgrade timing to help this out.

The governance proposal that says "halt as of block 1234" would be changed to mean "open the halt window at block 1234". Internally, this would put the chain into "ready to halt" mode, which means it stops servicing the action queue (so no new inputs to kernel devices). And it invokes some special controller.suspendAllVats() method, which is like controller.run() but instead:

Then, cosmic-swingset says "ok, I'm ready to halt now", and the node exits. Validator operators would need to wait for this indication before they replace the software and restart with the new version. We might need a controller.resumeAllVats() call just after the restart, to allow the kernel to page-in vats and resume deliveries.

Depending upon how long we think the suspend will take, controller.suspendAllVats() might be subject to a run-policy (and thus be spread out over multiple blocks), or it might run until completion in a single block.
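A sketch of the host-side sequence (suspendAllVats and resumeAllVats are proposed, not existing, controller methods; the run-policy and commit plumbing are simplified):

```js
// Sketch: once the halt window opens, the host stops feeding new actions to the
// kernel, drains all vats into the suspended state, commits, and only then tells
// the operator it is safe to swap binaries. After the upgrade, vats are resumed.
async function openHaltWindow(controller, hostStorage) {
  await controller.suspendAllVats(); // possibly spread across blocks by a run-policy
  await hostStorage.commit();
  console.log('ready to halt; safe to stop the node and install the new version');
}

async function afterUpgradeRestart(controller) {
  await controller.resumeAllVats(); // page vats back in and resume deliveries
}
```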

This would require cosmos changes to allow a module to delay an upgrade-based halt. I have no idea how big a deal that would be; we plan to pull @JeancarloBarrios and/or @mhofman into the conversation.

mhofman commented Oct 1, 2024

we realized that we might be able to change the cosmos-sdk chain-halting-upgrade timing to help this out.

This would require cosmos changes to allow a module to delay upgrade-based halt. I have no idea how big a deal that would be, we plan to pull @JeancarloBarrios and/or @mhofman into the conversation.

This sounds similar to #6263, which last time I looked was really not feasible in cosmos-sdk without heavy modifications.
