From 4580d4f64830d8031e3f079f1a682d65554d7433 Mon Sep 17 00:00:00 2001 From: Brian Warner Date: Fri, 25 Oct 2024 12:41:44 -0700 Subject: [PATCH] feat(swingset): add "upgrade events" management Host applications are responsible for calling `upgradeSwingset(kernelStorage)` before creating the controller, at least when the kernel package version has been changed, to perform any necessary upgrades to the kernel state. Previously, if the upgrades needed to inject messages into the run-queue (e.g. the `dispatch.notify` messages used to remediate leftover problems from #9039), `upgradeSwingset` would inject them automatically, which could intermingle their cranks with anything still in the run-queue when the upgrade occurs (like if the previous block had leftover work to do). With this commit, these messages are instead held internally by the kernel, in a new kvStore key (`upgradeEvents`), so the host can choose when to inject them, for instance *after* it has drained the run-queue of any leftover work. This introduces a new host-app responsibility: when `upgradeEvents()` indicates that modifications have been made, the app must call `controller.injectQueuedUpgradeEvents()` some time before the next `hostStorage.commit()`. That will add the remediation messages to the acceptance queue, so the next `controller.run()` will execute them. --- packages/SwingSet/docs/host-app.md | 28 +++++++++-- .../SwingSet/src/controller/controller.js | 2 + .../src/controller/upgradeSwingset.js | 47 +++++++++---------- packages/SwingSet/src/kernel/kernel.js | 5 ++ .../SwingSet/src/kernel/state/kernelKeeper.js | 26 ++++++++++ packages/SwingSet/test/kernel.test.js | 43 +++++++++++++++++ packages/SwingSet/test/state.test.js | 22 ++++++++- .../SwingSet/test/upgrade-swingset.test.js | 29 +++++++----- 8 files changed, 158 insertions(+), 44 deletions(-) diff --git a/packages/SwingSet/docs/host-app.md b/packages/SwingSet/docs/host-app.md index 60a3efb0ed0c..581406c39078 100644 --- a/packages/SwingSet/docs/host-app.md +++ b/packages/SwingSet/docs/host-app.md @@ -18,22 +18,40 @@ SwingSet provides robust and deterministic computation, even in the face of unex ### Kernel Upgrade -The life cycle of a SwingSet kernel begins with the one and only call to `initializeSwingset()`, which populates the swingstore DB for the first time. After that, the kernel is presumed to be immortal, but its execution is broken up into a series of reboots. Each reboot (e.g. each time the host application is started), the app must build a new controller with `makeSwingsetController()`, to have something to run. +The life cycle of a SwingSet kernel begins with the one and only call to `initializeSwingset()`, which populates the SwingStore DB for the first time. After that, the kernel is presumed to be immortal, but its execution is broken up into a series of reboots. Each reboot (e.g. each time the host application is started), the app must build a new controller with `makeSwingsetController()`, to have something to run. -From time to time, the host app will be upgraded to use a newer version of the SwingSet kernel code (e.g. a new version of the `@agoric/swingset-vat` package). This newer version might require an upgrade to the kernel's internal state. For example, the way it represents some data about a vat might be made more efficient, and the upgrade process needs to examine and rewrite vat state to switch to the new representation. Or, a bug might be fixed, and the upgrade process needs to locate and remediate any consequences of the bug having been present for earlier execution. +From time to time, the host app will be upgraded to use a newer version of the SwingSet kernel code (e.g. a new version of this `@agoric/swingset-vat` package). The newer version might require an upgrade to the kernel's persisted/durable state. For example, the way it represents some vat metadata might be made more efficient, and the upgrade process needs to examine and rewrite vat state to use the new representation. Or, a bug might be fixed, and the upgrade process needs to locate and remediate any consequences of the bug having been present during earlier execution. -To make the resulting state changes occur at a predictable time, upgrades do not happen automatically. Instead, each time the host app reboots with a new version of the kernel code, it must call `upgradeSwingset(kernelStorage)`. It must do this *before* calling `makeSwingsetController()`, as that function will throw an error if given a swingstore that has not been upgraded. +To make the resulting state changes occur deterministically, upgrades are not automatic. Instead, each time the host app reboots with a new version of the kernel code, it must call `upgradeSwingset(kernelStorage)`. It must do this *before* calling `makeSwingsetController()`, as that function will throw an error if given a SwingStore that has not been upgraded. -It is safe to call `upgradeSwingset` on reboots that do not change the version of the kernel code: the function is idempotent, and will do nothing if the swingstore is already up-to-date. +It is safe to call `upgradeSwingset` on reboots that do not change the version of the kernel code: the function is idempotent, and will do nothing if the SwingStore is already up-to-date. + +Some upgrades (in particular bug remediations) need to add events to the kernel's run-queue. To avoid having these events be intermingled with work that might already be on the run-queue at reboot time (e.g. work leftover from previous runs), these events are not automatically injected at that time. Instead, the kernel remembers what needs to be done, and waits for the host to invoke `controller.injectQueuedUpgradeEvents()` at a time of their choosing. This should be done before the next `commit()`, to avoid the risk of them being lost by a reboot. So most host applications will start each reboot with a sequence like this: ```js const { hostStorage, kernelStorage } = openSwingStore(baseDirectory); upgradeSwingset(kernelStorage); -const c = makeSwingsetController(kernelStorage, deviceEndowments); +const controller = makeSwingsetController(kernelStorage, deviceEndowments); +controller.injectQueuedUpgradeEvents(); +``` + +followed by later code to execute runs. Inputs can be fed all-at-once, or each processed in their own run, but at the end of the block, the host app must `commit()` the DB: + +```js +async function doBlock(deviceInputs) { + for (const input of deviceInputs) { + injectDeviceInput(input); + await controller.run(); + } + hostStorage.commit(); + emitDeviceOutputs(); +} ``` +The actual signature of `upgradeSwingset` is `const { modified } = upgradeSwingset(kernelStorage)`, and the `modified` flag indicates whether anything actually got upgraded. Host applications which have other means to keep track of software upgrades may wish to assert that `modified === false` in reboots that are not associated with a change to the kernel package version. They can also safely skip the `injectQueuedUpgradeEvents` call if nothing was modified. + ### Crank Execution For convenience in discussion, we split execution into "blocks". During a block, the host may call one or more device inputs, such as inbound messages, or timer wakeup events. The end of the block is marked by one or more calls to `controller.run()`, followed by a `hostStorage.commit()`, followed by the host-provided device endowments doing whatever kind of outbound IO they need to do. diff --git a/packages/SwingSet/src/controller/controller.js b/packages/SwingSet/src/controller/controller.js index 05027542e2b7..c3091989814b 100644 --- a/packages/SwingSet/src/controller/controller.js +++ b/packages/SwingSet/src/controller/controller.js @@ -388,6 +388,8 @@ export async function makeSwingsetController( return kernel.deviceNameToID(deviceName); }, + injectQueuedUpgradeEvents: () => kernel.injectQueuedUpgradeEvents(), + /** * Queue a method call into the named vat * diff --git a/packages/SwingSet/src/controller/upgradeSwingset.js b/packages/SwingSet/src/controller/upgradeSwingset.js index c26cb8d89310..6f81a6d7d1ad 100644 --- a/packages/SwingSet/src/controller/upgradeSwingset.js +++ b/packages/SwingSet/src/controller/upgradeSwingset.js @@ -5,10 +5,13 @@ import { getAllDynamicVats, getAllStaticVats, incrementReferenceCount, - addToQueue, } from '../kernel/state/kernelKeeper.js'; import { enumeratePrefixedKeys } from '../kernel/state/storageHelper.js'; +/** + * @import {RunQueueEvent} from '../types-internal.js'; + */ + const upgradeVatV0toV1 = (kvStore, defaultReapDirtThreshold, vatID) => { // This is called, once per vat, when upgradeSwingset migrates from // v0 to v1 @@ -96,11 +99,13 @@ const upgradeVatV0toV1 = (kvStore, defaultReapDirtThreshold, vatID) => { * `hostStorage.commit()` afterwards. * * @param {SwingStoreKernelStorage} kernelStorage - * @returns {boolean} true if any changes were made + * @returns {{ modified: boolean }} */ export const upgradeSwingset = kernelStorage => { const { kvStore } = kernelStorage; let modified = false; + /** @type {RunQueueEvent[]} */ + const upgradeEvents = []; let vstring = kvStore.get('version'); if (vstring === undefined) { vstring = '0'; @@ -327,40 +332,30 @@ export const upgradeSwingset = kernelStorage => { // only the upgraded vat will see anything, and those deliveries // won't make it past liveslots). - const kernelStats = JSON.parse(getRequired('kernelStats')); - // copied from kernel/state/stats.js, awkward to factor out - const incStat = (stat, delta = 1) => { - assert.equal(stat, 'acceptanceQueueLength'); - kernelStats[stat] += delta; - const maxStat = `${stat}Max`; - if ( - kernelStats[maxStat] !== undefined && - kernelStats[stat] > kernelStats[maxStat] - ) { - kernelStats[maxStat] = kernelStats[stat]; - } - const upStat = `${stat}Up`; - if (kernelStats[upStat] !== undefined) { - kernelStats[upStat] += delta; - } - }; - + let count = 0; for (const [kpid, vatID] of buggyKPIDs) { - const m = harden({ type: 'notify', vatID, kpid }); + // account for the reference to this kpid in upgradeEvents incrementReferenceCount(getRequired, kvStore, kpid, `enq|notify`); - addToQueue('acceptanceQueue', m, getRequired, kvStore, incStat); + upgradeEvents.push({ type: 'notify', vatID, kpid }); + count += 1; } - kvStore.set('kernelStats', JSON.stringify(kernelStats)); - - console.log(` - #9039 remediation complete`); + console.log(` - #9039 remediation complete, ${count} notifies to inject`); modified = true; version = 3; } + if (upgradeEvents.length) { + assert(modified); + // stash until host calls controller.injectQueuedUpgradeEvents() + const oldEvents = JSON.parse(kvStore.get('upgradeEvents') || '[]'); + const events = [...oldEvents, ...upgradeEvents]; + kvStore.set('upgradeEvents', JSON.stringify(events)); + } + if (modified) { kvStore.set('version', `${version}`); } - return modified; + return harden({ modified }); }; harden(upgradeSwingset); diff --git a/packages/SwingSet/src/kernel/kernel.js b/packages/SwingSet/src/kernel/kernel.js index 6f438cf1b5bb..5b21dbe08941 100644 --- a/packages/SwingSet/src/kernel/kernel.js +++ b/packages/SwingSet/src/kernel/kernel.js @@ -215,6 +215,10 @@ export default function buildKernel( return deviceID; } + function injectQueuedUpgradeEvents() { + kernelKeeper.injectQueuedUpgradeEvents(); + } + function addImport(forVatID, what) { if (!started) { throw Error('must do kernel.start() before addImport()'); @@ -2207,6 +2211,7 @@ export default function buildKernel( pinObject, vatNameToID, deviceNameToID, + injectQueuedUpgradeEvents, queueToKref, kpRegisterInterest, kpStatus, diff --git a/packages/SwingSet/src/kernel/state/kernelKeeper.js b/packages/SwingSet/src/kernel/state/kernelKeeper.js index d65420208c92..3481c61bb444 100644 --- a/packages/SwingSet/src/kernel/state/kernelKeeper.js +++ b/packages/SwingSet/src/kernel/state/kernelKeeper.js @@ -143,6 +143,7 @@ export const CURRENT_SCHEMA_VERSION = 3; // gcActions = JSON(gcActions) // reapQueue = JSON([vatIDs...]) // pinnedObjects = ko$NN[,ko$NN..] +// upgradeEvents = JSON([events..]) // ko.nextID = $NN // ko$NN.owner = $vatID @@ -182,6 +183,8 @@ export const CURRENT_SCHEMA_VERSION = 3; // v3: // * change `version` to `'3'` // * perform remediation for bug #9039 +// (after v3, does not get its own version) +// * `upgradeEvents` recognized, but omitted if empty /** @type {(s: string) => string[]} s */ export function commaSplit(s) { @@ -1267,6 +1270,27 @@ export default function makeKernelKeeper( return dequeue('acceptanceQueue'); } + function injectQueuedUpgradeEvents() { + // refcounts: Any krefs in `upgradeEvents` must have a refcount to + // represent the list's hold on those objects. When + // upgradeSwingset() creates these events, it must also + // incref(kref), otherwise we run the risk of dropping the kref by + // the time injectQueuedUpgradeEvents() is called. We're nominally + // removing each event from upgradeEvents (decref), then pushing + // it onto the run-queue (incref), but since those two cancel each + // other out, we don't actually need to modify any reference + // counts from within this function. Note that + // addToAcceptanceQueue does not increment refcounts, just kernel + // queue-length stats. + + const events = JSON.parse(kvStore.get('upgradeEvents') || '[]'); + kvStore.delete('upgradeEvents'); + for (const e of events) { + assert(e.type, `not an event`); + addToAcceptanceQueue(e); + } + } + function allocateMeter(remaining, threshold) { if (remaining !== 'unlimited') { assert.typeof(remaining, 'bigint'); @@ -1966,6 +1990,8 @@ export default function makeKernelKeeper( getAcceptanceQueueLength, getNextAcceptanceQueueMsg, + injectQueuedUpgradeEvents, + allocateMeter, addMeterRemaining, setMeterThreshold, diff --git a/packages/SwingSet/test/kernel.test.js b/packages/SwingSet/test/kernel.test.js index f18a0720ff3d..ef17e333a343 100644 --- a/packages/SwingSet/test/kernel.test.js +++ b/packages/SwingSet/test/kernel.test.js @@ -1817,3 +1817,46 @@ test('reap gc-krefs 12', async t => { test('reap gc-krefs overrideNever', async t => { await reapGCKrefsTest(t, 12, true); }); + +test('injectQueuedUpgradeEvents', async t => { + const endowments = makeKernelEndowments(); + const { kernelStorage } = endowments; + const { kvStore } = kernelStorage; + await initializeKernel({}, kernelStorage); + const kernel = buildKernel(endowments, {}, {}); + await kernel.start(); + // TODO: extract `queueLength`/`dumpQueue` from kernelKeeper.js? + { + const [start, end] = JSON.parse(kvStore.get('acceptanceQueue')); + t.deepEqual([start, end], [1, 1]); + } + + // These events would fail if they actually got processed: the vatID + // and all the krefs are fake, the methargs are not valid capdata, + // and we haven't updated refcounts. But all we care about is that + // they make it onto the acceptance queue. + const vatID = 'v1'; + const e1 = harden({ type: 'notify', vatID, kpid: 'kp25' }); + // kernelKeeper.incrementRefCount(kpid, `enq|notify`); + const target = 'ko11'; + const methargs = ''; + const msg = harden({ methargs, result: 'kp33' }); + const e2 = harden({ type: 'send', target, msg }); + kvStore.set('upgradeEvents', JSON.stringify([e1, e2])); + + kernel.injectQueuedUpgradeEvents([e1, e2]); + + { + const [start, end] = JSON.parse(kvStore.get('acceptanceQueue')); + t.deepEqual([start, end], [1, 3]); + t.deepEqual(JSON.parse(kvStore.get('acceptanceQueue.1')), e1); + t.deepEqual(JSON.parse(kvStore.get('acceptanceQueue.2')), e2); + } + + // It'd be nice to also check that the stats were incremented: + // stats.acceptanceQueueLength shoule be 2, acceptanceQueueLengthUp + // should be 2, and acceptanceQueueLengthMax should be 2. But the + // stats are held in RAM until a crank is executed, and we're not + // doing that here. And the kernel hides the kernelKeeper so we + // can't call getStats() to check. +}); diff --git a/packages/SwingSet/test/state.test.js b/packages/SwingSet/test/state.test.js index 8a5076b14a6d..16450ab7b175 100644 --- a/packages/SwingSet/test/state.test.js +++ b/packages/SwingSet/test/state.test.js @@ -1109,12 +1109,15 @@ test('dirt upgrade', async t => { // it requires a manual upgrade let k2; + let ks2; { const serialized = store.serialize(); const { kernelStorage } = initSwingStore(null, { serialized }); - upgradeSwingset(kernelStorage); + const { modified } = upgradeSwingset(kernelStorage); + t.true(modified); k2 = makeKernelKeeper(kernelStorage, CURRENT_SCHEMA_VERSION); // works this time k2.loadStats(); + ks2 = kernelStorage; } t.true(k2.kvStore.has(`kernel.defaultReapDirtThreshold`)); @@ -1157,6 +1160,12 @@ test('dirt upgrade', async t => { t.deepEqual(JSON.parse(k2.kvStore.get(`${v3}.options`)).reapDirtThreshold, { never: true, }); + + { + // upgrade is idempotent + const { modified } = upgradeSwingset(ks2); + t.false(modified); + } }); test('v2 upgrade', async t => { @@ -1177,15 +1186,24 @@ test('v2 upgrade', async t => { // it requires a manual upgrade let k2; + let ks2; { const serialized = store.serialize(); const { kernelStorage } = initSwingStore(null, { serialized }); - upgradeSwingset(kernelStorage); + const { modified } = upgradeSwingset(kernelStorage); + t.true(modified); k2 = makeKernelKeeper(kernelStorage, CURRENT_SCHEMA_VERSION); // works this time k2.loadStats(); + ks2 = kernelStorage; } t.true(k2.kvStore.has(`vats.terminated`)); t.deepEqual(JSON.parse(k2.kvStore.get(`vats.terminated`)), []); t.is(k2.kvStore.get(`version`), '3'); + + { + // upgrade is idempotent + const { modified } = upgradeSwingset(ks2); + t.false(modified); + } }); diff --git a/packages/SwingSet/test/upgrade-swingset.test.js b/packages/SwingSet/test/upgrade-swingset.test.js index a14e409b83ed..3ef59ccf9955 100644 --- a/packages/SwingSet/test/upgrade-swingset.test.js +++ b/packages/SwingSet/test/upgrade-swingset.test.js @@ -1,4 +1,3 @@ -/* eslint-disable no-underscore-dangle */ // @ts-nocheck // eslint-disable-next-line import/order @@ -162,7 +161,8 @@ test('upgrade kernel state', async t => { upgradeSwingset(kernelStorage); // now we should be good to go - const _controller = await makeSwingsetController(kernelStorage); + const controller = await makeSwingsetController(kernelStorage); + controller.injectQueuedUpgradeEvents(); t.true(kvStore.has('kernel.defaultReapDirtThreshold')); // the kernel-wide threshold gets a .gcKrefs (to meet our upcoming @@ -245,7 +245,8 @@ test('upgrade non-reaping kernel state', async t => { upgradeSwingset(kernelStorage); // now we should be good to go - const _controller = await makeSwingsetController(kernelStorage); + const controller = await makeSwingsetController(kernelStorage); + controller.injectQueuedUpgradeEvents(); t.true(kvStore.has('kernel.defaultReapDirtThreshold')); t.deepEqual(JSON.parse(kvStore.get('kernel.defaultReapDirtThreshold')), { @@ -374,7 +375,7 @@ test('v3 upgrade', async t => { } kvStore.set('acceptanceQueue', JSON.stringify([accHead, accTail])); - let stats = JSON.parse(kvStore.get('kernelStats')); + const stats = JSON.parse(kvStore.get('kernelStats')); stats.runQueueLength += runQueue.length; stats.runQueueLengthUp += runQueue.length; stats.runQueueLengthMax = runQueue.length; @@ -394,8 +395,10 @@ test('v3 upgrade', async t => { // upgrade it upgradeSwingset(kernelStorage); + // now we should be good to go - const _controller = await makeSwingsetController(kernelStorage); + const controller = await makeSwingsetController(kernelStorage); + controller.injectQueuedUpgradeEvents(); // check state by mutating our dumped copy and then comparing // against a new dump @@ -415,12 +418,16 @@ test('v3 upgrade', async t => { const np5 = JSON.stringify({ type: 'notify', vatID, kpid: p5 }); data[`acceptanceQueue.${tail - 1}`] = np5; - // stats are updated with the queue changes - stats = JSON.parse(data.kernelStats); - stats.acceptanceQueueLength += 3; - stats.acceptanceQueueLengthUp += 3; - stats.acceptanceQueueLengthMax = stats.acceptanceQueueLength; - data.kernelStats = JSON.stringify(stats); + // note: the in-RAM copy of kernelStats will have the elevated + // acceptance-queue counters, but these are not written to the + // kvStore until a crank is executed, so the data we're comparing + // won't see them + // + // stats = JSON.parse(data.kernelStats); + // stats.acceptanceQueueLength += 3; + // stats.acceptanceQueueLengthUp += 3; + // stats.acceptanceQueueLengthMax = stats.acceptanceQueueLength; + // data.kernelStats = JSON.stringify(stats); // the refcounts should now be one larger, because of the queued // notifies