-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dirty-vat -driven BOYD scheduler #8980
Comments
I keep running into new wrinkles as I try to minimize the changes (and make sure all the hard-to-change callers can keep using the old API), so I'll write up my plan first: The old code:
We don't want / can't change vat-vat-admin or device-vat-admin, so we'll continue to accept The upgrade process needs to:
The new code will do:
|
Previously, vat state initialization (e.g. setting counters to zero) happened lazily, the first time that `provideVatKeeper()` was called. When creating a new vat, the pattern was: vk = kernelKeeper.provideVatKeeper(); vk.setSourceAndOptions(source, options); Now, we initialize both counters and source/options explicitly, in `recordVatOptions`, when the static/dynamic vat is first defined: kernelKeeper.createVatState(vatID, source, options); In the future, this will make it easier for `provideVatKeeper` to rely upon the presence of all vat state keys, especially `options`. Previously, vatKeeper.getOptions() would tolerate a missing key by returning empty options. The one place where this was needed (terminating a barely-created vat because startVat() failed, using getOptions().critical) was changed, so this tolerance is no longer needed, and was removed. The tolerance caused bug #9157 (kernel doesn't panic when it should), which continues to be present, but at least won't cause a crash. refs #8980
`dispatch.bringOutYourDead()`, aka "reap", triggers garbage collection inside a vat, and gives it a chance to drop imported c-list vrefs that are no longer referenced by anything inside the vat. Previously, each vat has a configurable parameter named `reapInterval`, which defaults to a kernel-wide `defaultReapInterval` (but can be set separately for each vat). This defaults to 1, mainly for unit testing, but real applications set it to something like 200. This caused BOYD to happen once every 200 deliveries, plus an extra BOYD just before we save an XS heap-state snapshot. This commit switches to a "dirt"-based BOYD scheduler, wherein we consider the vat to get more and more dirty as it does work, and eventually it reaches a `reapDirtThreshold` that triggers the BOYD (which resets the dirt counter). We continue to track `dirt.deliveries` as before, with the same defaults. But we add a new `dirt.gcKrefs` counter, which is incremented by the krefs we submit to the vat in GC deliveries. For example, calling `dispatch.dropImports([kref1, kref2])` would increase `dirt.gcKrefs` by two. The `reapDirtThreshold.gcKrefs` limit defaults to 20. For normal use patterns, this will trigger a BOYD after ten krefs have been dropped and retired. We choose this value to allow the #8928 slow vat termination process to trigger BOYD frequently enough to keep the BOYD cranks small: since these will be happening constantly (in the "background"), we don't want them to take more than 500ms or so. Given the current size of the large vats that #8928 seeks to terminate, 10 krefs seems like a reasonable limit. And of course we don't want to perform too many BOYDs, so `gcKrefs: 20` is about the smallest threshold we'd want to use. External APIs continue to accept `reapInterval`, and now also accept `reapGCKrefs`. * kernel config record * takes `config.defaultReapInterval` and `defaultReapGCKrefs` * takes `vat.NAME.creationOptions.reapInterval` and `.reapGCKrefs` * `controller.changeKernelOptions()` still takes `defaultReapInterval` but now also accepts `defaultReapGCKrefs` The APIs available to userspace code (through `vatAdminSvc`) are unchanged (partially due to upgrade/backwards-compatibility limitations), and continue to only support setting `reapInterval`. Internally, this just modifies `reapDirtThreshold.deliveries`. * `E(vatAdminSvc).createVat(bcap, { reapInterval })` * `E(adminNode).upgrade(bcap, { reapInterval })` * `E(adminNode).changeOptions({ reapInterval })` Internally, the kernel-wide state records `defaultReapDirtThreshold` instead of `defaultReapInterval`, and each vat records `.reapDirtThreshold` in their `vNN.options` key instead of `vNN.reapInterval`. The current dirt level is recorded in `vNN.reapDirt`. The kernel will automatically upgrade both the kernel-wide and the per-vat state upon the first reboot with the new kernel code. The old `reapCountdown` value is used to initialize the vat's `reapDirt.deliveries` counter, so the upgrade shouldn't disrupt the existing schedule. Vats which used `reapInterval = 'never'` (eg comms) will get a `reapDirtThreshold` of all 'never' values, so they continue to inhibit BOYD. Otherwise, all vats get a `threshold.gcKrefs` of 20. We do not track dirt when the corresponding threshold is 'never', to avoid incrementing the comms dirt counters forever. This design leaves room for adding `.computrons` to the dirt record, as well as tracking a separate `snapshotDirt` counter (to trigger XS heap snapshots, ala #6786). We add `reapDirtThreshold.computrons`, but do not yet expose an API to set it. Future work includes: * upgrade vat-vat-admin to let userspace set `reapDirtThreshold` New tests were added to exercise the upgrade process, and other tests were updated to match the new internal initialization pattern. We now reset the dirt counter upon any BOYD, so this also happens to help with #8665 (doing a `reapAllVats()` resets the delivery counters, so future BOYDs will be delayed, which is what we want). But we should still change `controller.reapAllVats()` to avoid BOYDs on vats which haven't received any deliveries. closes #8980
Previously, vat state initialization (e.g. setting counters to zero) happened lazily, the first time that `provideVatKeeper()` was called. When creating a new vat, the pattern was: vk = kernelKeeper.provideVatKeeper(); vk.setSourceAndOptions(source, options); Now, we initialize both counters and source/options explicitly, in `recordVatOptions`, when the static/dynamic vat is first defined: kernelKeeper.createVatState(vatID, source, options); In the future, this will make it easier for `provideVatKeeper` to rely upon the presence of all vat state keys, especially `options`. Previously, vatKeeper.getOptions() would tolerate a missing key by returning empty options. The one place where this was needed (terminating a barely-created vat because startVat() failed, using getOptions().critical) was changed, so this tolerance is no longer needed, and was removed. The tolerance caused bug #9157 (kernel doesn't panic when it should), which continues to be present, but at least won't cause a crash. refs #8980
`dispatch.bringOutYourDead()`, aka "reap", triggers garbage collection inside a vat, and gives it a chance to drop imported c-list vrefs that are no longer referenced by anything inside the vat. Previously, each vat has a configurable parameter named `reapInterval`, which defaults to a kernel-wide `defaultReapInterval` (but can be set separately for each vat). This defaults to 1, mainly for unit testing, but real applications set it to something like 200. This caused BOYD to happen once every 200 deliveries, plus an extra BOYD just before we save an XS heap-state snapshot. This commit switches to a "dirt"-based BOYD scheduler, wherein we consider the vat to get more and more dirty as it does work, and eventually it reaches a `reapDirtThreshold` that triggers the BOYD (which resets the dirt counter). We continue to track `dirt.deliveries` as before, with the same defaults. But we add a new `dirt.gcKrefs` counter, which is incremented by the krefs we submit to the vat in GC deliveries. For example, calling `dispatch.dropImports([kref1, kref2])` would increase `dirt.gcKrefs` by two. The `reapDirtThreshold.gcKrefs` limit defaults to 20. For normal use patterns, this will trigger a BOYD after ten krefs have been dropped and retired. We choose this value to allow the #8928 slow vat termination process to trigger BOYD frequently enough to keep the BOYD cranks small: since these will be happening constantly (in the "background"), we don't want them to take more than 500ms or so. Given the current size of the large vats that #8928 seeks to terminate, 10 krefs seems like a reasonable limit. And of course we don't want to perform too many BOYDs, so `gcKrefs: 20` is about the smallest threshold we'd want to use. External APIs continue to accept `reapInterval`, and now also accept `reapGCKrefs`. * kernel config record * takes `config.defaultReapInterval` and `defaultReapGCKrefs` * takes `vat.NAME.creationOptions.reapInterval` and `.reapGCKrefs` * `controller.changeKernelOptions()` still takes `defaultReapInterval` but now also accepts `defaultReapGCKrefs` The APIs available to userspace code (through `vatAdminSvc`) are unchanged (partially due to upgrade/backwards-compatibility limitations), and continue to only support setting `reapInterval`. Internally, this just modifies `reapDirtThreshold.deliveries`. * `E(vatAdminSvc).createVat(bcap, { reapInterval })` * `E(adminNode).upgrade(bcap, { reapInterval })` * `E(adminNode).changeOptions({ reapInterval })` Internally, the kernel-wide state records `defaultReapDirtThreshold` instead of `defaultReapInterval`, and each vat records `.reapDirtThreshold` in their `vNN.options` key instead of `vNN.reapInterval`. The current dirt level is recorded in `vNN.reapDirt`. The kernel will automatically upgrade both the kernel-wide and the per-vat state upon the first reboot with the new kernel code. The old `reapCountdown` value is used to initialize the vat's `reapDirt.deliveries` counter, so the upgrade shouldn't disrupt the existing schedule. Vats which used `reapInterval = 'never'` (eg comms) will get a `reapDirtThreshold` of all 'never' values, so they continue to inhibit BOYD. Otherwise, all vats get a `threshold.gcKrefs` of 20. We do not track dirt when the corresponding threshold is 'never', to avoid incrementing the comms dirt counters forever. This design leaves room for adding `.computrons` to the dirt record, as well as tracking a separate `snapshotDirt` counter (to trigger XS heap snapshots, ala #6786). We add `reapDirtThreshold.computrons`, but do not yet expose an API to set it. Future work includes: * upgrade vat-vat-admin to let userspace set `reapDirtThreshold` New tests were added to exercise the upgrade process, and other tests were updated to match the new internal initialization pattern. We now reset the dirt counter upon any BOYD, so this also happens to help with #8665 (doing a `reapAllVats()` resets the delivery counters, so future BOYDs will be delayed, which is what we want). But we should still change `controller.reapAllVats()` to avoid BOYDs on vats which haven't received any deliveries. closes #8980
Previously, vat state initialization (e.g. setting counters to zero) happened lazily, the first time that `provideVatKeeper()` was called. When creating a new vat, the pattern was: vk = kernelKeeper.provideVatKeeper(); vk.setSourceAndOptions(source, options); Now, we initialize both counters and source/options explicitly, in `recordVatOptions`, when the static/dynamic vat is first defined: kernelKeeper.createVatState(vatID, source, options); In the future, this will make it easier for `provideVatKeeper` to rely upon the presence of all vat state keys, especially `options`. Previously, vatKeeper.getOptions() would tolerate a missing key by returning empty options. The one place where this was needed (terminating a barely-created vat because startVat() failed, using getOptions().critical) was changed, so this tolerance is no longer needed, and was removed. The tolerance caused bug #9157 (kernel doesn't panic when it should), which continues to be present, but at least won't cause a crash. refs #8980
fix(swingset): use "dirt" to schedule vat reap/bringOutYourDead NOTE: deployed kernels require a new `upgradeSwingset()` call upon (at least) first boot after upgrading to this version of the kernel code. `dispatch.bringOutYourDead()`, aka "reap", triggers garbage collection inside a vat, and gives it a chance to drop imported c-list vrefs that are no longer referenced by anything inside the vat. Previously, each vat has a configurable parameter named `reapInterval`, which defaults to a kernel-wide `defaultReapInterval` (but can be set separately for each vat). This defaults to 1, mainly for unit testing, but real applications set it to something like 1000. This caused BOYD to happen once every 1000 deliveries, plus an extra BOYD just before we save an XS heap-state snapshot. This commit switches to a "dirt"-based BOYD scheduler, wherein we consider the vat to get more and more dirty as it does work, and eventually it reaches a `reapDirtThreshold` that triggers the BOYD (which resets the dirt counter). We continue to track `dirt.deliveries` as before, with the same defaults. But we add a new `dirt.gcKrefs` counter, which is incremented by the krefs we submit to the vat in GC deliveries. For example, calling `dispatch.dropImports([kref1, kref2])` would increase `dirt.gcKrefs` by two. The `reapDirtThreshold.gcKrefs` limit defaults to 20. For normal use patterns, this will trigger a BOYD after ten krefs have been dropped and retired. We choose this value to allow the #8928 slow vat termination process to trigger BOYD frequently enough to keep the BOYD cranks small: since these will be happening constantly (in the "background"), we don't want them to take more than 500ms or so. Given the current size of the large vats that #8928 seeks to terminate, 10 krefs seems like a reasonable limit. And of course we don't want to perform too many BOYDs, so `gcKrefs: 20` is about the smallest threshold we'd want to use. External APIs continue to accept `reapInterval`, and now also accept `reapGCKrefs`, and `neverReap` (a boolean which inhibits all BOYD, even new forms of dirt added in the future). * kernel config record * takes `config.defaultReapInterval` and `defaultReapGCKrefs` * takes `vat.NAME.creationOptions.reapInterval` and `.reapGCKrefs` and `.neverReap` * `controller.changeKernelOptions()` still takes `defaultReapInterval` but now also accepts `defaultReapGCKrefs` The APIs available to userspace code (through `vatAdminSvc`) are unchanged (partially due to upgrade/backwards-compatibility limitations), and continue to only support setting `reapInterval`. Internally, this just modifies `reapDirtThreshold.deliveries`. * `E(vatAdminSvc).createVat(bcap, { reapInterval })` * `E(adminNode).upgrade(bcap, { reapInterval })` * `E(adminNode).changeOptions({ reapInterval })` Internally, the kernel-wide state records `defaultReapDirtThreshold` instead of `defaultReapInterval`, and each vat records `.reapDirtThreshold` in their `vNN.options` key instead of `vNN.reapInterval`. The vat-level records override the kernel-wide values. The current dirt level is recorded in `vNN.reapDirt`. NOTE: deployed kernels require explicit state upgrade, with: ```js import { upgradeSwingset } from '@agoric/swingset-vat'; .. upgradeSwingset(kernelStorage); ``` This must be called after upgrading to the new kernel code/release, and before calling `buildVatController()`. It is safe to call on every reboot (it will only modify the swingstore when the kernel version has changed). If changes are made, the host application is responsible for commiting them, as well as recording any export-data updates (if the host configured the swingstore with an export-data callback). During this upgrade, the old `reapCountdown` value is used to initialize the vat's `reapDirt.deliveries` counter, so the upgrade shouldn't disrupt the existing schedule. Vats which used `reapInterval = 'never'` (eg comms) will get a `reapDirtThreshold.never = true`, so they continue to inhibit BOYD. Any per-vat settings that match the kernel-wide settings are removed, allowing the kernel values to take precedence (as well as changes to the kernel-wide values; i.e. the per-vat settings are not sticky). We do not track dirt when the corresponding threshold is 'never', or if `neverReap` is true, to avoid incrementing the comms dirt counters forever. This design leaves room for adding `.computrons` to the dirt record, as well as tracking a separate `snapshotDirt` counter (to trigger XS heap snapshots, ala #6786). We add `reapDirtThreshold.computrons`, but do not yet expose an API to set it. Future work includes: * upgrade vat-vat-admin to let userspace set `reapDirtThreshold` New tests were added to exercise the upgrade process, and other tests were updated to match the new internal initialization pattern. We now reset the dirt counter upon any BOYD, so this also happens to help with #8665 (doing a `reapAllVats()` resets the delivery counters, so future BOYDs will be delayed, which is what we want). But we should still change `controller.reapAllVats()` to avoid BOYDs on vats which haven't received any deliveries. closes #8980
Previously, vat state initialization (e.g. setting counters to zero) happened lazily, the first time that `provideVatKeeper()` was called. When creating a new vat, the pattern was: vk = kernelKeeper.provideVatKeeper(); vk.setSourceAndOptions(source, options); Now, we initialize both counters and source/options explicitly, in `recordVatOptions`, when the static/dynamic vat is first defined: kernelKeeper.createVatState(vatID, source, options); In the future, this will make it easier for `provideVatKeeper` to rely upon the presence of all vat state keys, especially `options`. Previously, vatKeeper.getOptions() would tolerate a missing key by returning empty options. The one place where this was needed (terminating a barely-created vat because startVat() failed, using getOptions().critical) was changed, so this tolerance is no longer needed, and was removed. The tolerance caused bug #9157 (kernel doesn't panic when it should), which continues to be present, but at least won't cause a crash. refs #8980
NOTE: deployed kernels require a new `upgradeSwingset()` call upon (at least) first boot after upgrading to this version of the kernel code. See below for details. `dispatch.bringOutYourDead()`, aka "reap", triggers garbage collection inside a vat, and gives it a chance to drop imported c-list vrefs that are no longer referenced by anything inside the vat. Previously, each vat has a configurable parameter named `reapInterval`, which defaults to a kernel-wide `defaultReapInterval` (but can be set separately for each vat). This defaults to 1, mainly for unit testing, but real applications set it to something like 1000. This caused BOYD to happen once every 1000 deliveries, plus an extra BOYD just before we save an XS heap-state snapshot. This commit switches to a "dirt"-based BOYD scheduler, wherein we consider the vat to get more and more dirty as it does work, and eventually it reaches a `reapDirtThreshold` that triggers the BOYD (which resets the dirt counter). We continue to track `dirt.deliveries` as before, with the same defaults. But we add a new `dirt.gcKrefs` counter, which is incremented by the krefs we submit to the vat in GC deliveries. For example, calling `dispatch.dropImports([kref1, kref2])` would increase `dirt.gcKrefs` by two. The `reapDirtThreshold.gcKrefs` limit defaults to 20. For normal use patterns, this will trigger a BOYD after ten krefs have been dropped and retired. We choose this value to allow the #8928 slow vat termination process to trigger BOYD frequently enough to keep the BOYD cranks small: since these will be happening constantly (in the "background"), we don't want them to take more than 500ms or so. Given the current size of the large vats that #8928 seeks to terminate, 10 krefs seems like a reasonable limit. And of course we don't want to perform too many BOYDs, so `gcKrefs: 20` is about the smallest threshold we'd want to use. External APIs continue to accept `reapInterval`, and now also accept `reapGCKrefs`, and `neverReap` (a boolean which inhibits all BOYD, even new forms of dirt added in the future). * kernel config record * takes `config.defaultReapInterval` and `defaultReapGCKrefs` * takes `vat.NAME.creationOptions.reapInterval` and `.reapGCKrefs` and `.neverReap` * `controller.changeKernelOptions()` still takes `defaultReapInterval` but now also accepts `defaultReapGCKrefs` The APIs available to userspace code (through `vatAdminSvc`) are unchanged (partially due to upgrade/backwards-compatibility limitations), and continue to only support setting `reapInterval`. Internally, this just modifies `reapDirtThreshold.deliveries`. * `E(vatAdminSvc).createVat(bcap, { reapInterval })` * `E(adminNode).upgrade(bcap, { reapInterval })` * `E(adminNode).changeOptions({ reapInterval })` Internally, the kernel-wide state records `defaultReapDirtThreshold` instead of `defaultReapInterval`, and each vat records `.reapDirtThreshold` in their `vNN.options` key instead of `vNN.reapInterval`. The vat-level records override the kernel-wide values. The current dirt level is recorded in `vNN.reapDirt`. NOTE: deployed kernels require explicit state upgrade, with: ```js import { upgradeSwingset } from '@agoric/swingset-vat'; .. upgradeSwingset(kernelStorage); ``` This must be called after upgrading to the new kernel code/release, and before calling `buildVatController()`. It is safe to call on every reboot (it will only modify the swingstore when the kernel version has changed). If changes are made, the host application is responsible for commiting them, as well as recording any export-data updates (if the host configured the swingstore with an export-data callback). During this upgrade, the old `reapCountdown` value is used to initialize the vat's `reapDirt.deliveries` counter, so the upgrade shouldn't disrupt the existing schedule. Vats which used `reapInterval = 'never'` (eg comms) will get a `reapDirtThreshold.never = true`, so they continue to inhibit BOYD. Any per-vat settings that match the kernel-wide settings are removed, allowing the kernel values to take precedence (as well as changes to the kernel-wide values; i.e. the per-vat settings are not sticky). We do not track dirt when the corresponding threshold is 'never', or if `neverReap` is true, to avoid incrementing the comms dirt counters forever. This design leaves room for adding `.computrons` to the dirt record, as well as tracking a separate `snapshotDirt` counter (to trigger XS heap snapshots, ala #6786). We add `reapDirtThreshold.computrons`, but do not yet expose an API to set it. Future work includes: * upgrade vat-vat-admin to let userspace set `reapDirtThreshold` New tests were added to exercise the upgrade process, and other tests were updated to match the new internal initialization pattern. We now reset the dirt counter upon any BOYD, so this also happens to help with #8665 (doing a `reapAllVats()` resets the delivery counters, so future BOYDs will be delayed, which is what we want). But we should still change `controller.reapAllVats()` to avoid BOYDs on vats which haven't received any deliveries. closes #8980
What is the Problem Being Solved?
Vats accumulate "garbage" over time: JS objects (Presences, Remotables, Representaives) which have been collected but not processed, as well as imported/exported refcounts that have reached zero but not yet processed. We defer this processing until a special
dispatch.bringOutYourDead
delivery, which performs an engine-level forced GC pass, examines all the finalizers that have run since the last BOYD, and checks current refcounts on every vref/baseref that might now be unreachable or unrecognizable.We defer BOYD until specific calls, to reduce the sensitivity of normal execution (especially metering and the syscall transcript) to the details of the garbage collector. We definitely need to prevent userspace from sensing GC activity, but if we can also maintain the transcript's insensitivty, then we have more flexibility to change the engine in the future (upgrading XS or xsnap without requiring a complete vat upgrade).
We also do a BOYD just before we write out an XS heap snapshot, because the heap-writing process does a forced GC anyways, and we want to minimize the size of the snapshot (i.e. "empty your trash cans before you put them in the moving van").
However, this raised the question of how frequently we should make these BOYD calls. Too frequently, and we're spending a lot of time forcing unnecessary GC mark-and-sweep operations. Too infrequently, and a single BOYD might need to examine a large number of vrefs, which requires refcount syscalls (
vatstoreGet
), and which might createsyscall.dropImports/retireImports
calls with a large number of vrefs (which creates spikes in kernel activity that may cause problems).The current kernel scheduler triggers a BOYD after every 2000 deliveries ("
reapInterval
", recorded separately for each vat invatKeeper.updateReapInterval()
, read from a kernel-wide configurable default that is set withchangeKernelOptions({defaultReapInterval: NN})
). However it also triggers a heap snapshot (and thus a BOYD) on a different schedule: 3 deliveries after the start of the incarnation, and then every 200 deliveries (snapshotInitial
andsnapshotInterval
, configurable on the kernel as a whole withchangeKernelOptions({snapshotInterval: NN})
). The host application can also invokecontroller.reapAllVats()
to trigger a BOYD on all vats, although this does not update the counters (#8665), so the usual BOYDs will still happen at their usual time.The worst case, and the motivation for this ticket, is when a large vat is terminated, or deletes a large collection, and drops a lot of references into some other vat. The #8928 strategy limits the rate at which we send
dispatch.dropExports/retireExports
into the upstream vat, but if we don't also do BOYD frequently enough, we'll still be stuck with problematically-large BOYD deliveries. The largest expansion factor I've seen so far is when a terminated contract's ExitObject or SeatHandle is dropped, because these are engaged in reference cycles through v9-zoe. Each dropped object triggers about 50 syscalls in v9-zoe (to unwind the zoe-side portion of the cycle).I want to limit each BOYD to maybe 500 syscalls, to minimize the CPU time and kvStore churn (which causes IAVL churn, which adds to the cosmos-sdk state pruning work that happens sooner or later). Based on my studies so far, 500 syscalls of work can be done in about half a second, which seems safe to do in nearly every block (as a "background" task, only done when no other work was done in that block).
So I want the scheduler to trigger a BOYD after 10 krefs have been dropped and retired.
Description of the Design
We'd replace the current
vNN.reapCountdown
vatKeeper key with one namedvNN.dirty
. The value is a JSON-serialized record with{ deliveries, gcKrefs, computrons }
. Each non-BOYD delivery (and itsdeliveryResults
) gets sent to a new function, maybe namedmeasureDirt
, which:deliveries
computrons
dispatch.dropImports
or other GC delivery, counts the krefs inside, and maybe incrementsgcKrefs
vNN.reapInterval
, and some new kernel-widegcKrefsLimit
)reapQueue
withscheduleReap(vatID)
Then, the next time
kernel.step()
happens, it might see a vatID in the reapQueue, and process it:reapQueue
vNN.dirty
countersIn addition, the BOYD we do as part of writing heap snapshots should zero out the counters and remove it from
reapQueue
. That would fix the #8665 problem, as well as deferring BOYDs to the next correct time (avoiding rapid BOYDs when thesnapshotInterval
counter phases in and out of sync with thereapInterval
counter).We'll need to make the thresholds configurable. We already have
changeKernelOptions({defaultReapInterval})
, but it only sets the default (which gets sampled when a vat is created), and doesn't provide a way to modify a single vat's interval later. And we certainly need a new control on the GC kref threshold, so maybe we addchangeKernelOptions({gcKrefReapThreshold})
(but I don't love the spelling of that).This
vNN.dirty
record could also be used to schedule heap snapshots some day (#6786). Some thoughts:dirty
, and also trigger snapshots after a lot of themrunPolicy
, and probably also to trigger snapshotsdispatch.deliver
increase the number of vrefs being tracked (especially if they include vrefs as arguments)dispatch.notify
(promise resolutions) decrease the number of vrefs, increasing garbage.slots.length
and delivery-type, and use them to compute a score, which might ignoredispatch.deliver
but would trigger GC sooner if a lot ofdispatch.notify
ordispatch.dropImports/etc
deliveries were madevNN.dirty
is a decent place to track them, or to add new things to be trackedSecurity Considerations
None, userspace code cannot observe GC or BOYD. The most significant concern is what kind of userspace behavior could provoke excessive BOYDs (either too frequently, or individual ones that are too large). E.g. creating a large collection and then dereferencing it, so it gets collected in a future BOYD, which then gets too big (would be addressed/prevented by #8417). Changing the scheduler might change the tactics an attacker might take to provoke this, but I think a dirt-driven scheduler should generally work better than our current
reapInterval
approach.Scaling Considerations
The main purpose of this scheduler is to improve the scaling behavior, by doing BOYD more frequently when there is more garbage to be collected, and avoiding infrequent-but-large BOYDs (which trigger bad kernel behavior, #8402).
Test Plan
Swingset unit tests which use fake vats (returning fake metering data), to measure exactly when BOYD happens.
Upgrade Considerations
Transitioning from
vNN.reapCountdown
tovNN.dirty
is a schema upgrade that can be done lazily. We'll deletecountdownToReap()
and replace it withaddDirt()
(etc), and the new function needs to accommodate a missingvNN.dirty
key by initializing it to a new zero record, looking for a legacyvNN.reapCountdown
(and the still-usefulvNN.reapInterval
), and subtracting them to deduce the new.deliveries
. We didn't have agcKrefs
or computron-counter before, so we don't need to initialize them with anything but zero.This is a one-way upgrade, of course, since we delete
vNN.reapCountdown
.The text was updated successfully, but these errors were encountered: