Skip to content

Commit

Permalink
feat(swingset): add "upgrade events" management
Browse files Browse the repository at this point in the history
Host applications are responsible for calling
`upgradeSwingset(kernelStorage)` before creating the controller, at
least when the kernel package version has been changed, to perform any
necessary upgrades to the kernel state.

Previously, if the upgrades needed to inject messages into the
run-queue (e.g. the `dispatch.notify` messages used to remediate
leftover problems from #9039), `upgradeSwingset` would inject them
automatically, which could intermingle their cranks with anything
still in the run-queue when the upgrade occurs (like if the previous
block had leftover work to do).

With this commit, these messages are instead held internally by the
kernel, in a new kvStore key (`upgradeEvents`), so the host can choose
when to inject them, for instance *after* it has drained the run-queue
of any leftover work.

This introduces a new host-app responsibility: when `upgradeEvents()`
indicates that modifications have been made, the app must call
`controller.injectQueuedUpgradeEvents()` some time before the next
`hostStorage.commit()`. That will add the remediation messages to the
acceptance queue, so the next `controller.run()` will execute them.
  • Loading branch information
warner committed Oct 25, 2024
1 parent 182cdf6 commit 4580d4f
Show file tree
Hide file tree
Showing 8 changed files with 158 additions and 44 deletions.
28 changes: 23 additions & 5 deletions packages/SwingSet/docs/host-app.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,22 +18,40 @@ SwingSet provides robust and deterministic computation, even in the face of unex

### Kernel Upgrade

The life cycle of a SwingSet kernel begins with the one and only call to `initializeSwingset()`, which populates the swingstore DB for the first time. After that, the kernel is presumed to be immortal, but its execution is broken up into a series of reboots. Each reboot (e.g. each time the host application is started), the app must build a new controller with `makeSwingsetController()`, to have something to run.
The life cycle of a SwingSet kernel begins with the one and only call to `initializeSwingset()`, which populates the SwingStore DB for the first time. After that, the kernel is presumed to be immortal, but its execution is broken up into a series of reboots. Each reboot (e.g. each time the host application is started), the app must build a new controller with `makeSwingsetController()`, to have something to run.

From time to time, the host app will be upgraded to use a newer version of the SwingSet kernel code (e.g. a new version of the `@agoric/swingset-vat` package). This newer version might require an upgrade to the kernel's internal state. For example, the way it represents some data about a vat might be made more efficient, and the upgrade process needs to examine and rewrite vat state to switch to the new representation. Or, a bug might be fixed, and the upgrade process needs to locate and remediate any consequences of the bug having been present for earlier execution.
From time to time, the host app will be upgraded to use a newer version of the SwingSet kernel code (e.g. a new version of this `@agoric/swingset-vat` package). The newer version might require an upgrade to the kernel's persisted/durable state. For example, the way it represents some vat metadata might be made more efficient, and the upgrade process needs to examine and rewrite vat state to use the new representation. Or, a bug might be fixed, and the upgrade process needs to locate and remediate any consequences of the bug having been present during earlier execution.

To make the resulting state changes occur at a predictable time, upgrades do not happen automatically. Instead, each time the host app reboots with a new version of the kernel code, it must call `upgradeSwingset(kernelStorage)`. It must do this *before* calling `makeSwingsetController()`, as that function will throw an error if given a swingstore that has not been upgraded.
To make the resulting state changes occur deterministically, upgrades are not automatic. Instead, each time the host app reboots with a new version of the kernel code, it must call `upgradeSwingset(kernelStorage)`. It must do this *before* calling `makeSwingsetController()`, as that function will throw an error if given a SwingStore that has not been upgraded.

It is safe to call `upgradeSwingset` on reboots that do not change the version of the kernel code: the function is idempotent, and will do nothing if the swingstore is already up-to-date.
It is safe to call `upgradeSwingset` on reboots that do not change the version of the kernel code: the function is idempotent, and will do nothing if the SwingStore is already up-to-date.

Some upgrades (in particular bug remediations) need to add events to the kernel's run-queue. To avoid having these events be intermingled with work that might already be on the run-queue at reboot time (e.g. work leftover from previous runs), these events are not automatically injected at that time. Instead, the kernel remembers what needs to be done, and waits for the host to invoke `controller.injectQueuedUpgradeEvents()` at a time of their choosing. This should be done before the next `commit()`, to avoid the risk of them being lost by a reboot.

So most host applications will start each reboot with a sequence like this:

```js
const { hostStorage, kernelStorage } = openSwingStore(baseDirectory);
upgradeSwingset(kernelStorage);
const c = makeSwingsetController(kernelStorage, deviceEndowments);
const controller = makeSwingsetController(kernelStorage, deviceEndowments);
controller.injectQueuedUpgradeEvents();
```

followed by later code to execute runs. Inputs can be fed all-at-once, or each processed in their own run, but at the end of the block, the host app must `commit()` the DB:

```js
async function doBlock(deviceInputs) {
for (const input of deviceInputs) {
injectDeviceInput(input);
await controller.run();
}
hostStorage.commit();
emitDeviceOutputs();
}
```

The actual signature of `upgradeSwingset` is `const { modified } = upgradeSwingset(kernelStorage)`, and the `modified` flag indicates whether anything actually got upgraded. Host applications which have other means to keep track of software upgrades may wish to assert that `modified === false` in reboots that are not associated with a change to the kernel package version. They can also safely skip the `injectQueuedUpgradeEvents` call if nothing was modified.

### Crank Execution

For convenience in discussion, we split execution into "blocks". During a block, the host may call one or more device inputs, such as inbound messages, or timer wakeup events. The end of the block is marked by one or more calls to `controller.run()`, followed by a `hostStorage.commit()`, followed by the host-provided device endowments doing whatever kind of outbound IO they need to do.
Expand Down
2 changes: 2 additions & 0 deletions packages/SwingSet/src/controller/controller.js
Original file line number Diff line number Diff line change
Expand Up @@ -388,6 +388,8 @@ export async function makeSwingsetController(
return kernel.deviceNameToID(deviceName);
},

injectQueuedUpgradeEvents: () => kernel.injectQueuedUpgradeEvents(),

/**
* Queue a method call into the named vat
*
Expand Down
47 changes: 21 additions & 26 deletions packages/SwingSet/src/controller/upgradeSwingset.js
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,13 @@ import {
getAllDynamicVats,
getAllStaticVats,
incrementReferenceCount,
addToQueue,
} from '../kernel/state/kernelKeeper.js';
import { enumeratePrefixedKeys } from '../kernel/state/storageHelper.js';

/**
* @import {RunQueueEvent} from '../types-internal.js';
*/

const upgradeVatV0toV1 = (kvStore, defaultReapDirtThreshold, vatID) => {
// This is called, once per vat, when upgradeSwingset migrates from
// v0 to v1
Expand Down Expand Up @@ -96,11 +99,13 @@ const upgradeVatV0toV1 = (kvStore, defaultReapDirtThreshold, vatID) => {
* `hostStorage.commit()` afterwards.
*
* @param {SwingStoreKernelStorage} kernelStorage
* @returns {boolean} true if any changes were made
* @returns {{ modified: boolean }}
*/
export const upgradeSwingset = kernelStorage => {
const { kvStore } = kernelStorage;
let modified = false;
/** @type {RunQueueEvent[]} */
const upgradeEvents = [];
let vstring = kvStore.get('version');
if (vstring === undefined) {
vstring = '0';
Expand Down Expand Up @@ -327,40 +332,30 @@ export const upgradeSwingset = kernelStorage => {
// only the upgraded vat will see anything, and those deliveries
// won't make it past liveslots).

const kernelStats = JSON.parse(getRequired('kernelStats'));
// copied from kernel/state/stats.js, awkward to factor out
const incStat = (stat, delta = 1) => {
assert.equal(stat, 'acceptanceQueueLength');
kernelStats[stat] += delta;
const maxStat = `${stat}Max`;
if (
kernelStats[maxStat] !== undefined &&
kernelStats[stat] > kernelStats[maxStat]
) {
kernelStats[maxStat] = kernelStats[stat];
}
const upStat = `${stat}Up`;
if (kernelStats[upStat] !== undefined) {
kernelStats[upStat] += delta;
}
};

let count = 0;
for (const [kpid, vatID] of buggyKPIDs) {
const m = harden({ type: 'notify', vatID, kpid });
// account for the reference to this kpid in upgradeEvents
incrementReferenceCount(getRequired, kvStore, kpid, `enq|notify`);
addToQueue('acceptanceQueue', m, getRequired, kvStore, incStat);
upgradeEvents.push({ type: 'notify', vatID, kpid });
count += 1;
}

kvStore.set('kernelStats', JSON.stringify(kernelStats));

console.log(` - #9039 remediation complete`);
console.log(` - #9039 remediation complete, ${count} notifies to inject`);
modified = true;
version = 3;
}

if (upgradeEvents.length) {
assert(modified);
// stash until host calls controller.injectQueuedUpgradeEvents()
const oldEvents = JSON.parse(kvStore.get('upgradeEvents') || '[]');
const events = [...oldEvents, ...upgradeEvents];
kvStore.set('upgradeEvents', JSON.stringify(events));
}

if (modified) {
kvStore.set('version', `${version}`);
}
return modified;
return harden({ modified });
};
harden(upgradeSwingset);
5 changes: 5 additions & 0 deletions packages/SwingSet/src/kernel/kernel.js
Original file line number Diff line number Diff line change
Expand Up @@ -215,6 +215,10 @@ export default function buildKernel(
return deviceID;
}

function injectQueuedUpgradeEvents() {
kernelKeeper.injectQueuedUpgradeEvents();
}

function addImport(forVatID, what) {
if (!started) {
throw Error('must do kernel.start() before addImport()');
Expand Down Expand Up @@ -2207,6 +2211,7 @@ export default function buildKernel(
pinObject,
vatNameToID,
deviceNameToID,
injectQueuedUpgradeEvents,
queueToKref,
kpRegisterInterest,
kpStatus,
Expand Down
26 changes: 26 additions & 0 deletions packages/SwingSet/src/kernel/state/kernelKeeper.js
Original file line number Diff line number Diff line change
Expand Up @@ -143,6 +143,7 @@ export const CURRENT_SCHEMA_VERSION = 3;
// gcActions = JSON(gcActions)
// reapQueue = JSON([vatIDs...])
// pinnedObjects = ko$NN[,ko$NN..]
// upgradeEvents = JSON([events..])

// ko.nextID = $NN
// ko$NN.owner = $vatID
Expand Down Expand Up @@ -182,6 +183,8 @@ export const CURRENT_SCHEMA_VERSION = 3;
// v3:
// * change `version` to `'3'`
// * perform remediation for bug #9039
// (after v3, does not get its own version)
// * `upgradeEvents` recognized, but omitted if empty

/** @type {(s: string) => string[]} s */
export function commaSplit(s) {
Expand Down Expand Up @@ -1267,6 +1270,27 @@ export default function makeKernelKeeper(
return dequeue('acceptanceQueue');
}

function injectQueuedUpgradeEvents() {
// refcounts: Any krefs in `upgradeEvents` must have a refcount to
// represent the list's hold on those objects. When
// upgradeSwingset() creates these events, it must also
// incref(kref), otherwise we run the risk of dropping the kref by
// the time injectQueuedUpgradeEvents() is called. We're nominally
// removing each event from upgradeEvents (decref), then pushing
// it onto the run-queue (incref), but since those two cancel each
// other out, we don't actually need to modify any reference
// counts from within this function. Note that
// addToAcceptanceQueue does not increment refcounts, just kernel
// queue-length stats.

const events = JSON.parse(kvStore.get('upgradeEvents') || '[]');
kvStore.delete('upgradeEvents');
for (const e of events) {
assert(e.type, `not an event`);
addToAcceptanceQueue(e);
}
}

function allocateMeter(remaining, threshold) {
if (remaining !== 'unlimited') {
assert.typeof(remaining, 'bigint');
Expand Down Expand Up @@ -1966,6 +1990,8 @@ export default function makeKernelKeeper(
getAcceptanceQueueLength,
getNextAcceptanceQueueMsg,

injectQueuedUpgradeEvents,

allocateMeter,
addMeterRemaining,
setMeterThreshold,
Expand Down
43 changes: 43 additions & 0 deletions packages/SwingSet/test/kernel.test.js
Original file line number Diff line number Diff line change
Expand Up @@ -1817,3 +1817,46 @@ test('reap gc-krefs 12', async t => {
test('reap gc-krefs overrideNever', async t => {
await reapGCKrefsTest(t, 12, true);
});

test('injectQueuedUpgradeEvents', async t => {
const endowments = makeKernelEndowments();
const { kernelStorage } = endowments;
const { kvStore } = kernelStorage;
await initializeKernel({}, kernelStorage);
const kernel = buildKernel(endowments, {}, {});
await kernel.start();
// TODO: extract `queueLength`/`dumpQueue` from kernelKeeper.js?
{
const [start, end] = JSON.parse(kvStore.get('acceptanceQueue'));
t.deepEqual([start, end], [1, 1]);
}

// These events would fail if they actually got processed: the vatID
// and all the krefs are fake, the methargs are not valid capdata,
// and we haven't updated refcounts. But all we care about is that
// they make it onto the acceptance queue.
const vatID = 'v1';
const e1 = harden({ type: 'notify', vatID, kpid: 'kp25' });
// kernelKeeper.incrementRefCount(kpid, `enq|notify`);
const target = 'ko11';
const methargs = '';
const msg = harden({ methargs, result: 'kp33' });
const e2 = harden({ type: 'send', target, msg });
kvStore.set('upgradeEvents', JSON.stringify([e1, e2]));

kernel.injectQueuedUpgradeEvents([e1, e2]);

{
const [start, end] = JSON.parse(kvStore.get('acceptanceQueue'));
t.deepEqual([start, end], [1, 3]);
t.deepEqual(JSON.parse(kvStore.get('acceptanceQueue.1')), e1);
t.deepEqual(JSON.parse(kvStore.get('acceptanceQueue.2')), e2);
}

// It'd be nice to also check that the stats were incremented:
// stats.acceptanceQueueLength shoule be 2, acceptanceQueueLengthUp
// should be 2, and acceptanceQueueLengthMax should be 2. But the
// stats are held in RAM until a crank is executed, and we're not
// doing that here. And the kernel hides the kernelKeeper so we
// can't call getStats() to check.
});
22 changes: 20 additions & 2 deletions packages/SwingSet/test/state.test.js
Original file line number Diff line number Diff line change
Expand Up @@ -1109,12 +1109,15 @@ test('dirt upgrade', async t => {

// it requires a manual upgrade
let k2;
let ks2;
{
const serialized = store.serialize();
const { kernelStorage } = initSwingStore(null, { serialized });
upgradeSwingset(kernelStorage);
const { modified } = upgradeSwingset(kernelStorage);
t.true(modified);
k2 = makeKernelKeeper(kernelStorage, CURRENT_SCHEMA_VERSION); // works this time
k2.loadStats();
ks2 = kernelStorage;
}

t.true(k2.kvStore.has(`kernel.defaultReapDirtThreshold`));
Expand Down Expand Up @@ -1157,6 +1160,12 @@ test('dirt upgrade', async t => {
t.deepEqual(JSON.parse(k2.kvStore.get(`${v3}.options`)).reapDirtThreshold, {
never: true,
});

{
// upgrade is idempotent
const { modified } = upgradeSwingset(ks2);
t.false(modified);
}
});

test('v2 upgrade', async t => {
Expand All @@ -1177,15 +1186,24 @@ test('v2 upgrade', async t => {

// it requires a manual upgrade
let k2;
let ks2;
{
const serialized = store.serialize();
const { kernelStorage } = initSwingStore(null, { serialized });
upgradeSwingset(kernelStorage);
const { modified } = upgradeSwingset(kernelStorage);
t.true(modified);
k2 = makeKernelKeeper(kernelStorage, CURRENT_SCHEMA_VERSION); // works this time
k2.loadStats();
ks2 = kernelStorage;
}

t.true(k2.kvStore.has(`vats.terminated`));
t.deepEqual(JSON.parse(k2.kvStore.get(`vats.terminated`)), []);
t.is(k2.kvStore.get(`version`), '3');

{
// upgrade is idempotent
const { modified } = upgradeSwingset(ks2);
t.false(modified);
}
});
29 changes: 18 additions & 11 deletions packages/SwingSet/test/upgrade-swingset.test.js
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
/* eslint-disable no-underscore-dangle */
// @ts-nocheck

// eslint-disable-next-line import/order
Expand Down Expand Up @@ -162,7 +161,8 @@ test('upgrade kernel state', async t => {
upgradeSwingset(kernelStorage);

// now we should be good to go
const _controller = await makeSwingsetController(kernelStorage);
const controller = await makeSwingsetController(kernelStorage);
controller.injectQueuedUpgradeEvents();

t.true(kvStore.has('kernel.defaultReapDirtThreshold'));
// the kernel-wide threshold gets a .gcKrefs (to meet our upcoming
Expand Down Expand Up @@ -245,7 +245,8 @@ test('upgrade non-reaping kernel state', async t => {
upgradeSwingset(kernelStorage);

// now we should be good to go
const _controller = await makeSwingsetController(kernelStorage);
const controller = await makeSwingsetController(kernelStorage);
controller.injectQueuedUpgradeEvents();

t.true(kvStore.has('kernel.defaultReapDirtThreshold'));
t.deepEqual(JSON.parse(kvStore.get('kernel.defaultReapDirtThreshold')), {
Expand Down Expand Up @@ -374,7 +375,7 @@ test('v3 upgrade', async t => {
}
kvStore.set('acceptanceQueue', JSON.stringify([accHead, accTail]));

let stats = JSON.parse(kvStore.get('kernelStats'));
const stats = JSON.parse(kvStore.get('kernelStats'));
stats.runQueueLength += runQueue.length;
stats.runQueueLengthUp += runQueue.length;
stats.runQueueLengthMax = runQueue.length;
Expand All @@ -394,8 +395,10 @@ test('v3 upgrade', async t => {

// upgrade it
upgradeSwingset(kernelStorage);

// now we should be good to go
const _controller = await makeSwingsetController(kernelStorage);
const controller = await makeSwingsetController(kernelStorage);
controller.injectQueuedUpgradeEvents();

// check state by mutating our dumped copy and then comparing
// against a new dump
Expand All @@ -415,12 +418,16 @@ test('v3 upgrade', async t => {
const np5 = JSON.stringify({ type: 'notify', vatID, kpid: p5 });
data[`acceptanceQueue.${tail - 1}`] = np5;

// stats are updated with the queue changes
stats = JSON.parse(data.kernelStats);
stats.acceptanceQueueLength += 3;
stats.acceptanceQueueLengthUp += 3;
stats.acceptanceQueueLengthMax = stats.acceptanceQueueLength;
data.kernelStats = JSON.stringify(stats);
// note: the in-RAM copy of kernelStats will have the elevated
// acceptance-queue counters, but these are not written to the
// kvStore until a crank is executed, so the data we're comparing
// won't see them
//
// stats = JSON.parse(data.kernelStats);
// stats.acceptanceQueueLength += 3;
// stats.acceptanceQueueLengthUp += 3;
// stats.acceptanceQueueLengthMax = stats.acceptanceQueueLength;
// data.kernelStats = JSON.stringify(stats);

// the refcounts should now be one larger, because of the queued
// notifies
Expand Down

0 comments on commit 4580d4f

Please sign in to comment.