fix(swingset): use "dirt" to schedule vat reap/bringOutYourDead
NOTE: deployed kernels require a new `upgradeSwingset()` call upon
(at least) first boot after upgrading to this version of the kernel
code. See below for details.

`dispatch.bringOutYourDead()`, aka "reap", triggers garbage collection
inside a vat, and gives it a chance to drop imported c-list vrefs that
are no longer referenced by anything inside the vat.

Previously, each vat had a configurable parameter named
`reapInterval`, which defaulted to a kernel-wide
`defaultReapInterval` (but could be set separately for each vat). The
kernel-wide value defaults to 1, mainly for unit testing, but real
applications set it to something like 1000.

This caused BOYD to happen once every 1000 deliveries, plus an extra
BOYD just before we save an XS heap-state snapshot.

This commit switches to a "dirt"-based BOYD scheduler, wherein we
consider the vat to get more and more dirty as it does work, and
eventually it reaches a `reapDirtThreshold` that triggers the
BOYD (which resets the dirt counter).
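
As a rough illustration of the scheduling idea (not the kernel's actual code), the sketch below accumulates dirt per category and reports when any counter reaches its threshold. The helper names `addDirt` and `shouldReap` are hypothetical; only the `deliveries`/`gcKrefs`/`computrons` keys and the `'never'` value come from this commit.

```javascript
// Hypothetical sketch of a dirt-based BOYD trigger; not the kernel's code.
// Missing dirt keys are treated as zero.
const addDirt = (dirt, category, amount = 1) => ({
  ...dirt,
  [category]: (dirt[category] || 0) + amount,
});

// True when any counter has reached its numeric threshold.
// A threshold of 'never' disables that category.
const shouldReap = (dirt, threshold) =>
  Object.entries(threshold).some(
    ([category, limit]) => limit !== 'never' && (dirt[category] || 0) >= limit,
  );

const threshold = { deliveries: 3, gcKrefs: 20, computrons: 'never' };
let dirt = {};
for (let i = 0; i < 3; i += 1) {
  dirt = addDirt(dirt, 'deliveries');
}
const reapNow = shouldReap(dirt, threshold);
if (reapNow) {
  dirt = {}; // a BOYD resets the dirt counters
}
```

Keeping one counter per category means a new kind of dirt only needs a new key in both records.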

We continue to track `dirt.deliveries` as before, with the same
defaults. But we add a new `dirt.gcKrefs` counter, which is
incremented by the krefs we submit to the vat in GC deliveries. For
example, calling `dispatch.dropExports([kref1, kref2])` would increase
`dirt.gcKrefs` by two.

The `reapDirtThreshold.gcKrefs` limit defaults to 20. For normal use
patterns, this will trigger a BOYD after ten krefs have been dropped
and retired (each kref counts once when dropped and once when
retired). We choose this value to allow the #8928 slow vat
termination process to trigger BOYD frequently enough to keep the BOYD
cranks small: since these will be happening constantly (in the
"background"), we don't want them to take more than 500ms or so. Given
the current size of the large vats that #8928 seeks to terminate, 10
krefs seems like a reasonable limit. And of course we don't want to
perform too many BOYDs, so `gcKrefs: 20` is about the smallest
threshold we'd want to use.
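
The arithmetic behind "ten krefs" can be sketched as follows, assuming (as the paragraph above implies) that each kref shows up in two GC deliveries, one drop and one retire, and that every kref in a GC delivery adds one unit of `gcKrefs` dirt. The delivery shape, the `'drop'`/`'retire'` labels, and the `countBoyds` helper are invented for illustration.

```javascript
// Hypothetical model: each GC delivery carries a list of krefs, and every
// kref in it adds one unit of gcKrefs dirt.
const GC_KREFS_THRESHOLD = 20; // default reapDirtThreshold.gcKrefs

const countBoyds = (gcDeliveries, limit = GC_KREFS_THRESHOLD) => {
  let gcKrefs = 0;
  let boyds = 0;
  for (const { krefs } of gcDeliveries) {
    gcKrefs += krefs.length;
    if (gcKrefs >= limit) {
      boyds += 1;
      gcKrefs = 0; // a BOYD resets the dirt counter
    }
  }
  return boyds;
};

// Ten krefs, each dropped and then retired: 10 * 2 = 20 units of dirt.
const krefs = Array.from({ length: 10 }, (_, i) => `ko${i + 1}`);
const boydCount = countBoyds([
  { type: 'drop', krefs },
  { type: 'retire', krefs },
]);
```

Under this model, nine krefs would leave the counter at 18, just short of the threshold.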

External APIs continue to accept `reapInterval`, and now also accept
`reapGCKrefs`, and `neverReap` (a boolean which inhibits all BOYD,
even new forms of dirt added in the future).

* kernel config record
  * takes `config.defaultReapInterval` and `defaultReapGCKrefs`
  * takes `vat.NAME.creationOptions.reapInterval` and `.reapGCKrefs`
    and `.neverReap`
* `controller.changeKernelOptions()` still takes `defaultReapInterval`
   but now also accepts `defaultReapGCKrefs`
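
A kernel config using these knobs might look like the fragment below. The vat names, `sourceSpec` paths, and numbers are placeholders, and the `vats.NAME.creationOptions` nesting is an assumption based on the bullets above.

```javascript
// Illustrative config fragment; names and values are placeholders.
const config = {
  defaultReapInterval: 1000, // kernel-wide threshold for dirt.deliveries
  defaultReapGCKrefs: 20, // kernel-wide threshold for dirt.gcKrefs
  vats: {
    exampleVat: {
      sourceSpec: './vat-example.js', // hypothetical path
      creationOptions: {
        reapInterval: 200, // per-vat override for deliveries
        reapGCKrefs: 50, // per-vat override for gcKrefs
      },
    },
    quietVat: {
      sourceSpec: './vat-quiet.js', // hypothetical path
      creationOptions: { neverReap: true }, // inhibit all BOYD for this vat
    },
  },
};
```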

The APIs available to userspace code (through `vatAdminSvc`) are
unchanged (partially due to upgrade/backwards-compatibility
limitations), and continue to only support setting `reapInterval`.
Internally, this just modifies `reapDirtThreshold.deliveries`.

* `E(vatAdminSvc).createVat(bcap, { reapInterval })`
* `E(adminNode).upgrade(bcap, { reapInterval })`
* `E(adminNode).changeOptions({ reapInterval })`

Internally, the kernel-wide state records `defaultReapDirtThreshold`
instead of `defaultReapInterval`, and each vat records
`.reapDirtThreshold` in its `vNN.options` key instead of
`vNN.reapInterval`. The vat-level records override the kernel-wide
values. The current dirt level is recorded in `vNN.reapDirt`.
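
The override relationship can be pictured as a shallow merge. This is only a sketch: `effectiveThreshold` is a hypothetical helper, and the kernel may combine the two records differently.

```javascript
// Hypothetical helper: per-vat entries override the kernel-wide defaults.
const effectiveThreshold = (kernelDefault, vatThreshold = {}) => ({
  ...kernelDefault,
  ...vatThreshold,
});

const defaultReapDirtThreshold = {
  deliveries: 1000,
  gcKrefs: 20,
  computrons: 'never',
};
// As it might appear under `vNN.options`: only the overridden key is stored.
const vatOptions = { reapDirtThreshold: { deliveries: 200 } };
const merged = effectiveThreshold(
  defaultReapDirtThreshold,
  vatOptions.reapDirtThreshold,
);
```

Storing only the overridden keys per vat is what lets later changes to the kernel-wide defaults reach vats that never set their own value.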

NOTE: deployed kernels require explicit state upgrade, with:

```js
import { upgradeSwingset } from '@agoric/swingset-vat';
// ...
upgradeSwingset(kernelStorage);
```

This must be called after upgrading to the new kernel code/release,
and before calling `buildVatController()`. It is safe to call on every
reboot (it will only modify the swingstore when the kernel version has
changed). If changes are made, the host application is responsible for
committing them, as well as recording any export-data updates (if the
host configured the swingstore with an export-data callback).
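
The required ordering can be sketched with injected collaborators, so that nothing here depends on the exact SwingSet or swing-store signatures. `launch` and `buildController` are hypothetical names; the boolean return value of `upgradeSwingset()` and the `hostStorage.commit()` call follow the doc comment in `upgradeSwingset.js`.

```javascript
// Sketch of a host-application boot sequence. All collaborators are
// injected; `launch` and `buildController` are hypothetical names.
const launch = async ({
  upgradeSwingset,
  buildController,
  hostStorage,
  kernelStorage,
}) => {
  // 1. Upgrade the kernel state first; returns true if anything changed.
  const modified = upgradeSwingset(kernelStorage);
  // 2. The host application is responsible for committing those changes.
  if (modified) {
    await hostStorage.commit();
  }
  // 3. Only then build the controller on the upgraded state.
  return buildController(kernelStorage);
};
```

Because the upgrade is a no-op on an up-to-date database, a host can run this sequence unconditionally on every boot.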

During this upgrade, the old `reapCountdown` value is used to
initialize the vat's `reapDirt.deliveries` counter, so the upgrade
shouldn't disrupt the existing schedule. Vats which used
`reapInterval = 'never'` (e.g. comms) will get a
`reapDirtThreshold.never = true`, so they continue to inhibit BOYD.
Any per-vat settings that match the kernel-wide settings are removed,
so the kernel-wide values (and any later changes to them) take
precedence; i.e. such per-vat settings are not sticky.
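
A worked example of the per-vat arithmetic, mirroring the logic of `upgradeVatV0toV1` in this commit but stripped of the kvStore plumbing. `migrateReapState` is a hypothetical stand-in, and the numbers are made up.

```javascript
// Hypothetical stand-in for the per-vat v0 -> v1 migration arithmetic.
const migrateReapState = (reapInterval, reapCountdown, defaultDeliveries) => {
  if (reapInterval === 'never') {
    // vats that never reaped (e.g. comms) keep inhibiting BOYD
    return { reapDirt: {}, reapDirtThreshold: { never: true } };
  }
  // deliveries already made since the last BOYD
  const deliveries = Math.max(reapInterval - reapCountdown, 0);
  const reapDirtThreshold = {};
  if (reapInterval !== defaultDeliveries) {
    reapDirtThreshold.deliveries = reapInterval; // keep a real override
  }
  return { reapDirt: { deliveries }, reapDirtThreshold };
};

// Matches the kernel-wide default: the per-vat setting is dropped.
const sameAsDefault = migrateReapState(1000, 400, 1000);
// Differs from the default: the override survives.
const overridden = migrateReapState(200, 50, 1000);
const neverReaped = migrateReapState('never', 'never', 1000);
```

In the first case a vat that was 400 deliveries away from its next BOYD starts the new scheme with 600 units of `deliveries` dirt, so its next BOYD lands at the same point.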

We do not track dirt when the corresponding threshold is 'never', or
if `neverReap` is true, to avoid incrementing the comms dirt counters
forever.

This design leaves room for adding `.computrons` to the dirt record,
as well as tracking a separate `snapshotDirt` counter (to trigger XS
heap snapshots, à la #6786). We add `reapDirtThreshold.computrons`, but
do not yet expose an API to set it.

Future work includes:
* upgrade vat-vat-admin to let userspace set `reapDirtThreshold`

New tests were added to exercise the upgrade process, and other tests
were updated to match the new internal initialization pattern.

We now reset the dirt counter upon any BOYD, so this also happens to
help with #8665 (doing a `reapAllVats()` resets the delivery counters,
so future BOYDs will be delayed, which is what we want). But we should
still change `controller.reapAllVats()` to avoid BOYDs on vats which
haven't received any deliveries.

closes #8980
warner authored and kriskowal committed Aug 27, 2024
1 parent 8fb92a9 commit 0d26d90
Showing 23 changed files with 1,321 additions and 144 deletions.
39 changes: 35 additions & 4 deletions packages/SwingSet/src/controller/initializeKernel.js
@@ -9,14 +9,32 @@ import { insistVatID } from '../lib/id.js';
 import { makeVatSlot } from '../lib/parseVatSlots.js';
 import { insistStorageAPI } from '../lib/storageAPI.js';
 import { makeVatOptionRecorder } from '../lib/recordVatOptions.js';
-import makeKernelKeeper from '../kernel/state/kernelKeeper.js';
+import makeKernelKeeper, {
+  DEFAULT_DELIVERIES_PER_BOYD,
+  DEFAULT_GC_KREFS_PER_BOYD,
+} from '../kernel/state/kernelKeeper.js';
 import { exportRootObject } from '../kernel/kernel.js';
 import { makeKernelQueueHandler } from '../kernel/kernelQueue.js';

+/**
+ * @typedef { import('../types-external.js').SwingSetKernelConfig } SwingSetKernelConfig
+ * @typedef { import('../types-external.js').SwingStoreKernelStorage } SwingStoreKernelStorage
+ * @typedef { import('../types-internal.js').InternalKernelOptions } InternalKernelOptions
+ * @typedef { import('../types-internal.js').ReapDirtThreshold } ReapDirtThreshold
+ */
+
 function makeVatRootObjectSlot() {
   return makeVatSlot('object', true, 0);
 }

+/**
+ * @param {SwingSetKernelConfig} config
+ * @param {SwingStoreKernelStorage} kernelStorage
+ * @param {*} [options]
+ * @returns {Promise<string | undefined>} KPID of the bootstrap message
+ * result promise
+ */
+
 export async function initializeKernel(config, kernelStorage, options = {}) {
   const {
     verbose = false,
@@ -25,6 +43,9 @@ export async function initializeKernel(config, kernelStorage, options = {}) {
   const logStartup = verbose ? console.debug : () => 0;
   insistStorageAPI(kernelStorage.kvStore);

+  const CURRENT_VERSION = 1;
+  kernelStorage.kvStore.set('version', `${CURRENT_VERSION}`);
+
   const kernelSlog = null;
   const kernelKeeper = makeKernelKeeper(kernelStorage, kernelSlog);
   const optionRecorder = makeVatOptionRecorder(kernelKeeper, bundleHandler);
@@ -33,14 +54,22 @@ export async function initializeKernel(config, kernelStorage, options = {}) {
   assert(!wasInitialized);
   const {
     defaultManagerType,
-    defaultReapInterval,
+    defaultReapInterval = DEFAULT_DELIVERIES_PER_BOYD,
+    defaultReapGCKrefs = DEFAULT_GC_KREFS_PER_BOYD,
     relaxDurabilityRules,
     snapshotInitial,
     snapshotInterval,
   } = config;
+  /** @type { ReapDirtThreshold } */
+  const defaultReapDirtThreshold = {
+    deliveries: defaultReapInterval,
+    gcKrefs: defaultReapGCKrefs,
+    computrons: 'never', // TODO no knob?
+  };
+  /** @type { InternalKernelOptions } */
   const kernelOptions = {
     defaultManagerType,
-    defaultReapInterval,
+    defaultReapDirtThreshold,
     relaxDurabilityRules,
     snapshotInitial,
     snapshotInterval,
@@ -49,7 +78,7 @@ export async function initializeKernel(config, kernelStorage, options = {}) {

   for (const id of Object.keys(config.idToBundle || {})) {
     const bundle = config.idToBundle[id];
-    assert.equal(bundle.moduleFormat, 'endoZipBase64');
+    assert(bundle.moduleFormat === 'endoZipBase64');
     if (!kernelKeeper.hasBundle(id)) {
       kernelKeeper.addBundle(id, bundle);
     }
@@ -86,6 +115,8 @@ export async function initializeKernel(config, kernelStorage, options = {}) {
         'useTranscript',
         'critical',
         'reapInterval',
+        'reapGCKrefs',
+        'neverReap',
         'nodeOptions',
       ]);
       const vatID = kernelKeeper.allocateVatIDForNameIfNeeded(name);
2 changes: 1 addition & 1 deletion packages/SwingSet/src/controller/initializeSwingset.js
@@ -397,7 +397,7 @@ export async function initializeSwingset(
         enableSetup: true,
         managerType: 'local',
         useTranscript: false,
-        reapInterval: 'never',
+        neverReap: true,
       },
     };
   }
193 changes: 193 additions & 0 deletions packages/SwingSet/src/controller/upgradeSwingset.js
@@ -0,0 +1,193 @@
+import {
+  DEFAULT_REAP_DIRT_THRESHOLD_KEY,
+  DEFAULT_GC_KREFS_PER_BOYD,
+  getAllDynamicVats,
+  getAllStaticVats,
+} from '../kernel/state/kernelKeeper.js';
+
+const upgradeVatV0toV1 = (kvStore, defaultReapDirtThreshold, vatID) => {
+  // This is called, once per vat, when upgradeSwingset migrates from
+  // v0 to v1
+
+  // schema v0:
+  // Each vat has a `vNN.reapInterval` and `vNN.reapCountdown`.
+  // vNN.options has a `.reapInterval` property (however it was not
+  // updated by processChangeVatOptions). Either all are numbers, or
+  // all are 'never'.
+
+  const oldReapIntervalKey = `${vatID}.reapInterval`;
+  const oldReapCountdownKey = `${vatID}.reapCountdown`;
+  const vatOptionsKey = `${vatID}.options`;
+
+  // schema v1:
+  // Each vat has a `vNN.reapDirt`, and vNN.options has a
+  // `.reapDirtThreshold` property (which overrides kernel-wide
+  // `defaultReapDirtThreshold`)
+
+  const reapDirtKey = `${vatID}.reapDirt`;
+
+  assert(kvStore.has(oldReapIntervalKey), oldReapIntervalKey);
+  assert(kvStore.has(oldReapCountdownKey), oldReapCountdownKey);
+  assert(!kvStore.has(reapDirtKey), reapDirtKey);
+
+  // initialize or upgrade state
+  const reapDirt = {}; // all missing keys are treated as zero
+  const threshold = {};
+
+  const reapIntervalString = kvStore.get(oldReapIntervalKey);
+  assert(reapIntervalString !== undefined);
+  const reapCountdownString = kvStore.get(oldReapCountdownKey);
+  assert(reapCountdownString !== undefined);
+  const intervalIsNever = reapIntervalString === 'never';
+  const countdownIsNever = reapCountdownString === 'never';
+  assert(
+    (intervalIsNever && countdownIsNever) ||
+      (!intervalIsNever && !countdownIsNever),
+    `reapInterval=${reapIntervalString}, reapCountdown=${reapCountdownString}`,
+  );
+
+  if (!intervalIsNever && !countdownIsNever) {
+    // deduce delivery count from old countdown values
+    const reapInterval = Number.parseInt(reapIntervalString, 10);
+    const reapCountdown = Number.parseInt(reapCountdownString, 10);
+    const deliveries = reapInterval - reapCountdown;
+    reapDirt.deliveries = Math.max(deliveries, 0); // just in case
+    if (reapInterval !== defaultReapDirtThreshold.deliveries) {
+      threshold.deliveries = reapInterval;
+    }
+  }
+
+  // old vats that were never reaped (eg comms) used
+  // reapInterval='never', so respect that and set the other
+  // threshold values to never as well
+  if (intervalIsNever) {
+    threshold.never = true;
+  }
+  kvStore.delete(oldReapIntervalKey);
+  kvStore.delete(oldReapCountdownKey);
+  kvStore.set(reapDirtKey, JSON.stringify(reapDirt));
+
+  // remove .reapInterval from options, replace with .reapDirtThreshold
+  const options = JSON.parse(kvStore.get(vatOptionsKey));
+  delete options.reapInterval;
+  options.reapDirtThreshold = threshold;
+  kvStore.set(vatOptionsKey, JSON.stringify(options));
+};
+
+/**
+ * (maybe) upgrade the kernel state to the current schema
+ *
+ * This function is responsible for bringing the kernel's portion of
+ * swing-store (kernelStorage) up to the current version. The host app
+ * must call this each time it launches with a new version of
+ * swingset, before using makeSwingsetController() to build the
+ * kernel/controller (which will throw an error if handed an old
+ * database). It is ok to call it only on those reboots, but it is
+ * also safe to call on every reboot, because upgradeSwingset() is a
+ * no-op if the DB is already up-to-date.
+ *
+ * If an upgrade is needed, this function will modify the DB state, so
+ * the host app must be prepared for export-data callbacks being
+ * called during the upgrade, and it is responsible for doing a
+ * `hostStorage.commit()` afterwards.
+ *
+ * @param { SwingStoreKernelStorage } kernelStorage
+ * @returns { boolean } true if any changes were made
+ */
+export const upgradeSwingset = kernelStorage => {
+  const { kvStore } = kernelStorage;
+  let modified = false;
+  let vstring = kvStore.get('version');
+  if (vstring === undefined) {
+    vstring = '0';
+  }
+  let version = Number(vstring);
+
+  /**
+   * @param {string} key
+   * @returns {string}
+   */
+  function getRequired(key) {
+    if (!kvStore.has(key)) {
+      throw Error(`storage lacks required key ${key}`);
+    }
+    // @ts-expect-error already checked .has()
+    return kvStore.get(key);
+  }
+
+  // kernelKeeper.js has a large comment that defines our current
+  // kvStore schema, with a section describing the deltas. The upgrade
+  // steps applied here must match.
+
+  // schema v0:
+  // The kernel overall has `kernel.defaultReapInterval`.
+  // Each vat has a `vNN.reapInterval` and `vNN.reapCountdown`.
+  // vNN.options has a `.reapInterval` property (however it was not
+  // updated by processChangeVatOptions, so do not rely upon its
+  // value). Either all are numbers, or all are 'never'.
+
+  if (version < 1) {
+    // schema v1:
+    // The kernel overall has `kernel.defaultReapDirtThreshold`.
+    // Each vat has a `vNN.reapDirt`, and vNN.options has a
+    // `.reapDirtThreshold` property
+
+    // So:
+    // * replace `kernel.defaultReapInterval` with
+    // `kernel.defaultReapDirtThreshold`
+    // * replace vat's `vNN.reapInterval`/`vNN.reapCountdown` with
+    // `vNN.reapDirt` and a `vNN.reapDirtThreshold` in `vNN.options`
+    // * then do per-vat upgrades with upgradeVatV0toV1
+
+    // upgrade from old kernel.defaultReapInterval
+
+    const oldDefaultReapIntervalKey = 'kernel.defaultReapInterval';
+    assert(kvStore.has(oldDefaultReapIntervalKey));
+    assert(!kvStore.has(DEFAULT_REAP_DIRT_THRESHOLD_KEY));
+
+    /**
+     * @typedef { import('../types-internal.js').ReapDirtThreshold } ReapDirtThreshold
+     */
+
+    /** @type ReapDirtThreshold */
+    const threshold = {
+      deliveries: 'never',
+      gcKrefs: 'never',
+      computrons: 'never',
+    };
+
+    const oldValue = getRequired(oldDefaultReapIntervalKey);
+    if (oldValue !== 'never') {
+      const value = Number.parseInt(oldValue, 10);
+      assert.typeof(value, 'number');
+      threshold.deliveries = value;
+      // if BOYD wasn't turned off entirely (eg
+      // defaultReapInterval='never', which only happens in unit
+      // tests), then pretend we wanted a gcKrefs= threshold all
+      // along, so all vats get a retroactive gcKrefs threshold, which
+      // we need for the upcoming slow-vat-deletion to not trigger
+      // gigantic BOYD and break the kernel
+      threshold.gcKrefs = DEFAULT_GC_KREFS_PER_BOYD;
+    }
+    harden(threshold);
+    kvStore.set(DEFAULT_REAP_DIRT_THRESHOLD_KEY, JSON.stringify(threshold));
+    kvStore.delete(oldDefaultReapIntervalKey);
+
+    // now upgrade all vats
+    for (const [_name, vatID] of getAllStaticVats(kvStore)) {
+      upgradeVatV0toV1(kvStore, threshold, vatID);
+    }
+    for (const vatID of getAllDynamicVats(getRequired)) {
+      upgradeVatV0toV1(kvStore, threshold, vatID);
+    }
+
+    modified = true;
+    version = 1;
+  }
+
+  if (modified) {
+    kvStore.set('version', `${version}`);
+  }
+  return modified;
+};
+harden(upgradeSwingset);
2 changes: 1 addition & 1 deletion packages/SwingSet/src/index.js
@@ -9,7 +9,7 @@ export {
   loadBasedir,
   loadSwingsetConfigFile,
 } from './controller/initializeSwingset.js';
-
+export { upgradeSwingset } from './controller/upgradeSwingset.js';
 export {
   buildMailboxStateMap,
   buildMailbox,