GC syscall sensitivity through reanimate and reanimateDurableKindID #7142
Comments
Do the extra vatstoreGet calls seem compatible with some of the observations reported in #6784? |
I think so. It's hard to tell because we only learn the one diverging syscall, and to be more confident I'd want the replay to continue (instead of being halted) and compare a larger span of calls from both sides of the divergence. But they're touching keys that are plausible. |
So I can think of fixes for two of these that are only moderately invasive.
reanimateDurableKindID
Each durable Kind has a We keep a copy of the descriptor in RAM, in a Map named The KindHandle behaves a lot like a virtual object (although not created with The code which does that regeneration is a function named reanimateDurableKindID:
function reanimateDurableKindID(vobjID) {
  const kindID = `${parseVatSlot(vobjID).subid}`;
  const raw = syscall.vatstoreGet(`vom.dkind.${kindID}`);
  raw || Fail`unknown kind ID ${kindID}`;
  const durableKindDescriptor = JSON.parse(raw);
  const kindHandle = Far('kind', {});
  linkToCohort.set(Object.getPrototypeOf(kindHandle), kindHandle);
  unweakable.add(Object.getPrototypeOf(kindHandle));
  kindHandleToID.set(kindHandle, kindID);
  // we load the descriptor (including .nextInstanceID) every time
  // the vat makes a new DurableKindHandle representative (during
  // deserialization). The handle is held weakly and can be dropped,
  // but the KindID-to-descriptor mapping remains in memory.
  kindIDToDescriptor.set(kindID, durableKindDescriptor);
  return kindHandle;
}
This will get called in two contexts. The first (and probably the only one we thought carefully about) is after a vat upgrade, in a new incarnation, when The second is later, after a KindHandle has been in RAM for a while, but then gets dropped and maybe garbage collected. If userspace then re-fetches the handle from If this is the first time we've called So, given that we're already spending the RAM on each Kind (we decided that Kinds are low cardinality), a simple fix is to change Another approach would be to change The second approach would continue to be testable with the branch I mentioned above, where the
reanimateCollection
When each virtual collection is created, we get a "schemata" (optional Currently, all three values are passed into The virtual collection has a Representative (collections are high-cardinality, although we didn't completely expect that, so we still have some RAM cost for inactive/paged-out collections). These Representatives can be dropped from RAM and then regenerated, just like KindHandles. When this happens, and deserialization fails to find an entry in
function reanimateCollection(vobjID) {
  const { id, subid } = parseVatSlot(vobjID);
  const kindName = storeKindIDToName.get(`${id}`);
  const rawSchemata = JSON.parse(syscall.vatstoreGet(prefixc(subid, '|schemata')));
  const [keyShape, valueShape] = unserialize(rawSchemata);
  const label = syscall.vatstoreGet(prefixc(subid, '|label'));
  return summonCollection(false, label, subid, kindName, keyShape, valueShape);
}
The two As above, the sometimes-happens sometimes-not divergence of calls to One moderately invasive fix would be to defer getting the schemata/label data until userspace performed an actual collection API call ( So e.g.
const invalidKeyTypeMsg = `invalid key type for collection ${q(label)}`;
would become:
const makeInvalidKeyTypeMsg = () => `invalid key type for collection ${q(cache.get(collectionID).label)}`;
and
const serializeValue = value => {
  if (valueShape !== undefined) { ...
becomes
const serializeValue = value => {
  const { valueShape } = cache.get(collectionID);
  if (valueShape !== undefined) { ...
No references to the The cache would be built with a |
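The deferred-lookup idea for collections can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: makeCollectionMetadataCache and summonToyCollection are hypothetical names, the `vc.<id>.|schemata` key layout is a guess, shapes are reduced to `typeof` strings (with null meaning "no shape"), and plain JSON stands in for the real unserialize step. The point is only that building the collection object performs no vatstore reads, and that the reads happen on the first userspace API call instead.

```javascript
// Hypothetical metadata cache, keyed by collectionID (not by Representative),
// so dropping and rebuilding a Representative never changes what it holds.
function makeCollectionMetadataCache(vatstoreGet) {
  const entries = new Map();
  return {
    get(collectionID) {
      let entry = entries.get(collectionID);
      if (entry === undefined) {
        // Assumed key layout and encoding, for illustration only.
        const [keyShape, valueShape] = JSON.parse(
          vatstoreGet(`vc.${collectionID}.|schemata`),
        );
        const label = vatstoreGet(`vc.${collectionID}.|label`);
        entry = { keyShape, valueShape, label };
        entries.set(collectionID, entry);
      }
      return entry;
    },
  };
}

// Toy stand-in for summonCollection: it captures only the collectionID
// and the cache, and performs no vatstore reads when called.
function summonToyCollection(collectionID, cache) {
  const makeInvalidValueMsg = () =>
    `invalid value for collection "${cache.get(collectionID).label}"`;
  const serializeValue = value => {
    const { valueShape } = cache.get(collectionID);
    if (valueShape !== null && typeof value !== valueShape) {
      throw TypeError(makeInvalidValueMsg());
    }
    return JSON.stringify(value);
  };
  return { serializeValue };
}

// Usage: count vatstore reads at each stage.
const log = [];
const store = new Map([
  ['vc.5.|schemata', JSON.stringify([null, 'number'])],
  ['vc.5.|label', 'counters'],
]);
const vatstoreGet = key => {
  log.push(key);
  return store.get(key);
};
const cache = makeCollectionMetadataCache(vatstoreGet);
const collection = summonToyCollection('5', cache);
const readsAfterReanimate = log.length; // no reads yet
const serialized = collection.serializeValue(7);
const readsAfterFirstCall = log.length; // schemata + label
collection.serializeValue(8);
const readsAfterSecondCall = log.length; // unchanged
```

In this sketch the cache never evicts, so the two reads are triggered by the first collection API call in the incarnation, which is a deterministic userspace action. A bounded cache would need its eviction to be driven only by such calls to keep that property.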
reanimate
For the third issue, the real problem is our need to know the list of state property names too early. One (invasive) approach would be to change A differently-invasive approach would retain the "each instance has different properties" quality, but would instead change The last time we considered the If that works out, then the approach would be:
Callers would observe that |
For reference, I had suggested using a Proxy for the state object in #5170 |
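As a rough illustration of what a Proxy-backed state object could look like, here is a minimal sketch. It is not the design from #5170: makeLazyState is a hypothetical name, and storing each property under its own `vom.<baseref>.<prop>` key with JSON encoding is an assumption made to keep the example small. What it shows is that the state object can be created without knowing the property names and without any vatstore read.

```javascript
// Hypothetical Proxy-backed state: property names are not needed up front,
// and each property access maps to one vatstore operation.
function makeLazyState(baseref, syscall) {
  const keyFor = prop => `vom.${baseref}.${prop}`; // assumed key layout
  return new Proxy(
    {},
    {
      get(_target, prop) {
        if (typeof prop !== 'string') return undefined; // ignore symbols
        const raw = syscall.vatstoreGet(keyFor(prop));
        return raw === undefined ? undefined : JSON.parse(raw);
      },
      set(_target, prop, value) {
        if (typeof prop !== 'string') return false;
        syscall.vatstoreSet(keyFor(prop), JSON.stringify(value));
        return true;
      },
    },
  );
}

// Usage with a mock syscall object that records every call.
const log = [];
const store = new Map();
const syscall = {
  vatstoreGet(key) {
    log.push(['vatstoreGet', key]);
    return store.get(key);
  },
  vatstoreSet(key, value) {
    log.push(['vatstoreSet', key]);
    store.set(key, value);
  },
};
const state = makeLazyState('o+10/1', syscall);
const callsAtCreation = log.length; // creating the state object reads nothing
state.count = 3; // one vatstoreSet
const count = state.count; // one vatstoreGet
```

With this shape, vatstore traffic follows userspace property accesses only, so rebuilding the Representative after GC adds no syscalls of its own. Any caching added in front of the syscalls would have to preserve that.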
Ah, yeah, thanks for the pointer, I see you thought through all of the issues I was wondering about above. I walked through the agoric-sdk/packages/swingset-liveslots/src/virtualObjectManager.js Lines 639 to 646 in f2ebdb5
then later we build agoric-sdk/packages/swingset-liveslots/src/virtualObjectManager.js Lines 763 to 765 in f2ebdb5
So the userspace Representative is actually a per-instance object (whose
method(...args) {
  this || Fail`thisful method ${methodTag} called without 'this' object`;
  const context = getContext(this);
  return apply(behaviorMethod, context, args);
}
where
const getContext = self => {
  const context = contextMap.get(self);
  context || Fail`${q(methodTag)} may only be applied to a valid instance: ${self}`;
  return context;
};
So each time someone invokes a method on a Representative, it follows the prototype chain up to the per-Kind object, whose method starts by looking up So.. we change
As a bonus, if someone is only using the Representative for its identity, and doesn't actually invoke any methods, then we don't need |
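The dispatch pattern described above (per-instance Representative, per-Kind prototype whose methods look up a context for `this`) can be sketched in a self-contained way. This is a toy under stated assumptions, not the virtualObjectManager code: defineToyKind and loadState are hypothetical names, and the variation shown, where state is fetched at method-invocation time rather than when the Representative is built, is only one possible reading of the change being discussed.

```javascript
// Toy Kind: methods live on a shared prototype and resolve their context
// from `this` at call time. The Representative itself holds no state.
function defineToyKind(behavior, loadState) {
  const vrefs = new WeakMap(); // Representative -> vref
  const getContext = self => {
    if (!vrefs.has(self)) {
      throw TypeError('method may only be applied to a valid instance');
    }
    // State is fetched per invocation; any caching is assumed to live
    // behind loadState, driven only by these (deterministic) calls.
    return { state: loadState(vrefs.get(self)), self };
  };
  const proto = {};
  for (const [name, behaviorMethod] of Object.entries(behavior)) {
    proto[name] = function method(...args) {
      return Reflect.apply(behaviorMethod, getContext(this), args);
    };
  }
  // Building a Representative does not call loadState.
  return vref => {
    const self = Object.create(proto);
    vrefs.set(self, vref);
    return self;
  };
}

// Usage: count how often state is loaded.
let loads = 0;
const makeCounter = defineToyKind(
  {
    getCount() {
      return this.state.count;
    },
  },
  _vref => {
    loads += 1;
    return { count: 42 };
  },
);
const rep = makeCounter('o+10/1');
const loadsBeforeCall = loads; // building the Representative loads nothing
const value = rep.getCount();
rep.getCount();
const loadsAfterCalls = loads; // one load per method invocation
```

Because the Representative carries no per-instance state cache here, a Representative that is used only for its identity never triggers a load, and a rebuilt Representative behaves the same as the original one.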
This ticket is about sensitivity via three pathways:
|
Oh, in an earlier comment I said:
and apparently I did exactly that in 18f6a1d, which landed as part of PR #7138. So this part is already fixed. |
Describe the bug
I think I found another two ways in which syscalls are sensitive to GC activity. I've got a fast (non-kernel-based) test for them as well.
The first arises when reanimateDurableKindID is used to regenerate a KindHandle (i.e. deserializing one while pulling it out of a virtual or durable MapStore). To build the Representative, we must read the durableKindDescriptor out of the vatstore. If the deserialization is able to re-use an existing Representative, the syscall.vatstoreGet() won't happen:
agoric-sdk/packages/swingset-liveslots/src/virtualObjectManager.js Lines 927 to 931 in 2e33306
The second arises when reanimate is used to regenerate a normal Kind instance's Representative. Same issue, this time reanimate calls makeRepresentative():
agoric-sdk/packages/swingset-liveslots/src/virtualObjectManager.js Lines 795 to 798 in 2e33306
which calls ensureState():
agoric-sdk/packages/swingset-liveslots/src/virtualObjectManager.js Lines 712 to 718 in 2e33306
whose cache.lookup(.., true) will cause the LRU cache to do a vatstoreGet to read the state data if it isn't already in the cache.
The issue I'm looking at is that makeRepresentative is called only when GC has happened and deserialization fails to find a Representative in the slotToVal weakref. Imagine a virtual object that is created and stored in baggage during step 1, but is then dropped from userspace (it becomes UNREACHABLE). Then time passes, and we look at two cases. In case A, GC happens, moving it to COLLECTED and FINALIZED, removing it from slotToVal. In case B, GC does not happen, leaving it in UNREACHABLE. Then, in step 2, something reads the value out of baggage again.
In case A, the deserialization in step 2 fails to find the vref, so reanimate() is called to build a replacement, which calls makeRepresentative. If the state had fallen out of the LRU cache, this will cause a vatstoreGet to fetch the state data. In case B, reanimate() is not called, and no vatstoreGet will happen.
I think our testing might not have managed to evict the state data from the cache, so we missed this case. Also, we don't have a lot of mock-the-GC tests:
test-liveslots.js has the necessary mocks for WeakRef and FinalizationRegistry, but they weren't exported or put in a shared library, so no other tests were using those tools.
Incidentally, the presence or absence of the state data in the cache is a deterministic function of the sequence of calls to the LRU cache, like lookup and refresh, which are driven by (deterministic) calls to the state object's getters and setters, but also by calls to makeRepresentative. If makeRepresentative is called with GC sensitivity, then the LRU cache contents will be GC-sensitive, adding another variable to the "will vatstore reads happen or not" question. I think we convinced ourselves that the determinism of getters and setters made this safe, without realizing that reanimate itself could be a source of trouble.
I've written a new test to exercise this, on the 7142-gc-sensitivity branch, which compares the syscalls made by case A and case B, and asserts that they are equal. Those tests currently fail, with an extra vatstoreGet appearing in case A.