Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cosmic-swingset): Use vat cleanup budget values to allow slow cleanup #10165

Merged
merged 34 commits into from
Nov 9, 2024

Conversation

gibson042
Copy link
Member

@gibson042 gibson042 commented Sep 27, 2024

Fixes #8928

Description

Adds new cosmos parameters for controlling terminated vat cleanup and uses them in cosmic-swingset to allow budget-limited cleanup in otherwise empty blocks. Also includes a fair amount of refactoring in support of the same, so best reviewed as individual commits.

Security Considerations

Too-high budget values are a denial of service risk. All values are currently defaulted to zero for no minimal cleanup, but we may instead want to use small values.

Scaling Considerations

See above.

Documentation Considerations

Hmm, do we have documentation of cosmos-sdk parameters anywhere?

Testing Considerations

Help wanted; the fully-threaded pathway is end-to-end but we may be able to do something more tractable. The policy logic is somewhat intricate and should be covered by tests.

Upgrade Considerations

Upgrade will set the new parameters to their default values as defined in default-params.go. If we set non-zero defaults, then work could commence as soon as this code is live, which means it requires a chain-halting upgrade.

Copy link
Member

@warner warner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cleanup-budget passing code needs to change.. the policy language is funky, I have to admit, and the fact that it's easy to accidentally enable unlimited cleanups is a problem. Let's brainstorm tomorrow to see if there's a way to write a test around this part, maybe a unit test which imports just the policy creator and then pokes it in a couple of ways, without a kernel or cosmos or anything else.

NewQueueSize(QueueInbound, DefaultInboundQueueMax),
}

// FIXME: Settle on default values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'd like our initial values to be { default: 5, kv: 50 } (swingset will use .default for everything else), so deletion happens slowly as soon as upgrade18 is deployed. That will let us trigger deletion of some vat (probably v112-scaledPriceAuthority-stkATOM, being the smallest one) as part of upgrade18, instead of a three-vote sequence of:

  • upgrade18
  • governance proposal to change parameters to { default: 5, kv: 50 }
  • governance proposal to trigger deletion of v112

The important values are that exports and imports use no more than 5 (so the BOYDs remain a reasonable size, and we don't clobber the chain with large cleanup work), and that kv uses more like 50 (so the deletion takes weeks instead of months or years).

The total number of transcripts and snapshots aren't large enough for 5 vs 50 to make a big difference. Deleting both are pretty quick, so it would probably be safe to use 50, but I'd rather measure the load of using "5" before changing that rate.

I don't think I have a strong opinion on whether the default specifies values for all the fields, or just default and kv. My original intention was to allow the governance proposal to be minimal, and only supply the ones that needed to be overridden, but I'm not clear on how much the governance parameter messages are deltas vs absolutes. If you have to supply all the parameters (not just vat cleanup, but also beans-per-computrons and queue sizes) in each message, then it's not a big lift to supply all of default, kv, imports, exports, snapshots, transcripts in each one.

The important thing is legibility: the voters inspecting the change-parameter proposal should be able to tell what exactly is changing. If a proposal that omits kv and really means "kv reverts to default", that might be surprising.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it's important that one of these defaults being set to 0 doesn't override the safely-rate-limited behavior we get from { default: 5 }. I'm thinking that DefaultVatCleanupBudget should have one or two entries in it, and not provide a value for e.g. exports and imports and snapshots and transcripts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • governance proposal to change parameters to { default: 5, kv: 50 }

That seems unnecessary, if we have non-0 defaults already set as part of the software upgrade.

If a proposal that omits kv and really means "kv reverts to default", that might be surprising.

I believe that is almost how the current "fill-in missing values" logic would work. It would revert the value to the "kv default" set below. An alternative is to merge with existing values, but I'm not sure we have a place to stand to do that. Also in that case we now have the problem of how to express "please delete/unset this previously explicitly set value, and return it to default", particularly if we decide to not persist default values.

golang/cosmos/x/swingset/types/params.go Outdated Show resolved Hide resolved
packages/cosmic-swingset/src/launch-chain.js Outdated Show resolved Hide resolved
// https://github.com/Agoric/agoric-sdk/issues/8928#issuecomment-2053357870
const shouldRun = () =>
!cleanupStarted && (ignoreBlockLimit || totalBeans < blockComputeLimit);
const allowCleanup = () => cleanupStarted && !cleanupDone;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't get us the right budget.. allowCleanup() === true means an unlimited budget, and the kernel only calls didCleanup at the end of the crank. So a non-finite budget will let the kernel perform unlimited cleanup, and then it will report a huge didCleanup(cleanups), and our decrementer will stop it, but by then it will be too late.

We need allowCleanup to return the remaining budget each time (if we're in the cleanup window). We also need to let the kernel perform GC actions and BOYD. The kernel will ask about one vat with allowCleanup(), do an amount of cleanup work limited by the returned budget, notify the policy with a didCleanup(), then it returns to the "decide what work to do next" function, which will query the policy again. We need the didCleanup() to return true (meaning "keep going, don't stop controller.run() yet"), then that second call to allowCleanup() must say "no more cleanups", to allow it to move on to the gc-action queue, or the dirty-vat-do-BOYD check. We still count computrons, so if e.g. the cleanup resulted in multiple vats becoming dirty and eligible for BOYD, we might perform one BOYD, decide the computrons it spent were too much, and then end the block before we get to the second BOYD. The next block's crankerPhase = 'leftover' run will finish off those BOYDs.

This only really works because we only do cleanup on empty blocks, so we know the run-queue is empty before clean is allowed to proceed. The run-policy language doesn't have a way to say "do the gcActions and BOYDs triggered by cleanup, but don't allow any normal cranks to be delivered, because we're only doing cleanup-related stuff now".

(I wish there were a good way to unit test this)

So something more like:

let inCleanup = false;
const startCleanup = () => if (totalBeans > 0n) { return false; } else { inCleanup = true; return true; };
const allowCleanup = () => inCleanup && remainingCleanups; // returns false or budget
const didCleanup = details => { .. decrement remainingCleanups; return true; };
const shouldRun = () => ignoreBlockLimit || totalBeans < blockComputeLimit;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof, yeah. Updated to something that actually has the right return type for allowCleanup.

@@ -88,6 +90,24 @@ const parseUpgradePlanInfo = (upgradePlan, prefix = '') => {
* `chainStorageEntries: [ ['c.o', '"top"'], ['c.o.i'], ['c.o.i.n', '42'], ['c.o.w', '"moo"'] ]`).
*/

/**
* @typedef {'leftover' | 'forced' | 'high-priority' | 'intermission' | 'queued' | 'cleanup'} CrankerPhase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot, it makes the flow much easier to think about.

packages/cosmic-swingset/test/scenario2.js Outdated Show resolved Hide resolved
packages/cosmic-swingset/src/sim-params.js Outdated Show resolved Hide resolved
Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Help wanted; the fully-threaded pathway is end-to-end but we may be able to do something more tractable. The policy logic is somewhat intricate and should be covered by tests.

Can the policy (possibly with the cranker) be factored out of launch-chain and exercised in a test, maybe with a mock swingset controller?

Upgrade Considerations

I believe it's worth noting that the upgrade will set the default parameters for the new param.

DefaultVatCleanupKv = sdk.NewUint(0) // 50
DefaultVatCleanupSnapshots = sdk.NewUint(0) // 50
DefaultVatCleanupTranscripts = sdk.NewUint(0) // 50
DefaultVatCleanupBudget = []UintMapEntry{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question keeps coming up of where the defaults should be defined, and whether defaults are persistent, aka what should happen with values if the default changes and the values were not modified. I can't shake the feeling that the way we've implemented params does not allow the flexibility to update unchanged defaults.

This is particularly true when we have a param value whose name is default.

I believe here it might be better to just have a single entry in DefaultVatCleanupBudget: the VatCleanupDefault one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of the latest push, cosmic-swingset has a default budget of { default: Infinity } to preserve current behavior (and requires an explicit default value when any other field is present) and golang/cosmos/x/swingset/types/default-params.go defines DefaultVatCleanupBudget as { default: 5, kv: 50 } per @warner's instruction.

NewQueueSize(QueueInbound, DefaultInboundQueueMax),
}

// FIXME: Settle on default values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • governance proposal to change parameters to { default: 5, kv: 50 }

That seems unnecessary, if we have non-0 defaults already set as part of the software upgrade.

If a proposal that omits kv and really means "kv reverts to default", that might be surprising.

I believe that is almost how the current "fill-in missing values" logic would work. It would revert the value to the "kv default" set below. An alternative is to merge with existing values, but I'm not sure we have a place to stand to do that. Also in that case we now have the problem of how to express "please delete/unset this previously explicitly set value, and return it to default", particularly if we decide to not persist default values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not specific to this PR, but I didn't realize #8931 resulted in a copy of the proto definition files to be checked into the repo. @turadg any reason why these are not excluded from git?

@@ -80,7 +80,7 @@ func TestAddStorageBeanCost(t *testing.T) {
},
} {
t.Run(tt.name, func(t *testing.T) {
got, err := appendMissingDefaultBeansPerUnit(tt.in, []StringBeans{defaultStorageCost})
got, err := appendMissingDefaults(tt.in, []StringBeans{defaultStorageCost})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably to be consistent we should fill in all missing defaults here, not just the beans per unit ones.

packages/cosmic-swingset/test/scenario2.js Outdated Show resolved Hide resolved
*/

/**
* @typedef {(phase: CrankerPhase) => Promise<boolean>} Cranker runs the kernel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how I feel about the Cranker name. It controls a macro "swingset run", which itself is composed of many cranks. Does that make a vat delivery a "turner"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, yeah, the existing term that this relates to is a "run" or a "swingset run", meaning one invocation of controller.run(). Each cosmos block triggers a certain number of runs (one catchup, then maybe some high-priority, then maybe one timer, then maybe some low-priority, then maybe one cleanup).

I think Runner is disqualified (too generic), but I'm ok with Cranker as a second choice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really needed a name, and "Runner" is just too generic. I suppose "Turner" would therefore accurately describe a function that performs a single vat delivery, although if we ever need to name that we'll probably find something better.

packages/cosmic-swingset/src/launch-chain.js Outdated Show resolved Hide resolved
packages/cosmic-swingset/src/launch-chain.js Outdated Show resolved Hide resolved
Comment on lines +514 to +586
const allowCleanup = runPolicy.startCleanup();
if (!allowCleanup) return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep wondering if we have the right layering here. I know @warner would like for controller.run() to automatically include cleanup for "simple hosts", but it feels like we're jumping through hoops just to enable non cosmic-swingset embedder.

An imperative controller.performCleanup() would be so much cleaner. We could simply call it after we've returned from processBlockActions: has the policy done anything? no, cool, then start cleanup with a specific cleanup policy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true, I don't want to add API burden to hosts that don't need rate limiting, but I agree that there are some significant hoops here.

Adding controller.performCleanup(budget) wouldn't be too hard, it might be a wrapper around controller.run() that uses an internally-built runPolicy. If we added that, then the cosmic-swingset runPolicy could have a allowCleanup: () => false, which is a cheap way to say "I'm going to drive cleanup manually, thanks very much".

OTOH, that API assumes that we're ok with arbitrarily large computron usage during the cleanup runs. In practice, that's probably ok: the GC action deliveries are generally trivial (they just change some refcounts), and the BOYD is usually the last thing that runs so it's not like a computron limit would actually stop any deliveries from happening.

But if we wanted to be consistent with normal runs, we'd need a way to provide a limit, which would mean performCleanup() would need to get a host-provided RunPolicy, and merging some automatically-built one with the host-provided one sounds more messy than having the host manage everything in the first place.

Copy link

cloudflare-workers-and-pages bot commented Oct 2, 2024

Deploying agoric-sdk with  Cloudflare Pages  Cloudflare Pages

Latest commit: db8108a
Status: ✅  Deploy successful!
Preview URL: https://dcd7d72b.agoric-sdk.pages.dev
Branch Preview URL: https://gibson-8928-parameterized-al.agoric-sdk.pages.dev

View logs

@gibson042 gibson042 requested review from mhofman and warner October 2, 2024 20:13
@gibson042
Copy link
Member Author

OK, I'm reasonably happy with this but still need to brainstorm on how to verify expected behavior through testing. sim-chain.js looks like the right foundation, since it's already wrapping the parts of launch-chain.js affected by this PR with a mostly-mocked harness.

remainingCleanups[phase] -= count;
if (remainingCleanups[phase] <= 0) cleanupDone = true;
}
return !cleanupDone && shouldRun();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better, but we need didCleanup() to return true when the cleanup budget is exhausted, otherwise the controller.run() will exit and we won't deliver the GC actions or run a BOYD until the the leftover phase next block (which might have more important work to do).

Basically, having allowCleanup return false is how you prevent cleanup from happening. The return value of didCleanup controls whether we do anything after it is called, and GC actions / BOYD don't qualify as cleanup (they happen all the time, even without vats being terminated).

So I think this should be replaced with just return true. We could have it return shouldRun(), which would return false when we've hit the computron limit, but that's kind of of pointless because cleanup actions don't spend any computrons or perform any deliveries, they just delete vatKeeper data and enqueue GC actions. The computron-spending activity (GC actions, BOYD) will get reported to the policy with crankComplete as usual, and that has the opportunity to stop the run because of a computron limit.

I'm sorry this stuff is so opaque. I'll make a note to update SwingSet/docs/run-policy.md.. the section on didCleanup is accurate, and the examples all have it return true, but the docs ought to make that recommendation explicit, and explain why it is usually important to have it return true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense. Updated with an explanatory comment.

@warner warner self-requested a review October 4, 2024 20:38
Copy link
Member

@warner warner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I think that looks good, at least the policy parts. I'll rely upon @mhofman for the cosmos/golang side, but it at least seems consistent with the rest of the code.

I do worry about proposals that accidentally revert parameters to their default values, but I believe the cleanup-budget defaults established by this PR are safe. So I believe the worst case is such a proposal would just slow down termination even more than necessary, which is much better than reverting to "allow full-speed deletion".

if err != nil {
return params, err
}
newVcb, err := appendMissingDefaults(params.VatCleanupBudget, DefaultVatCleanupBudget)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so now my current beliefs are:

  • we get one transaction type to influence all swingset-related parameters (four sets: beansPerUnit, powerFlagFees, queueMax, vatCleanupBudget)
  • if any field of any set is not included in the transaction/proposal, and is included in e.g. DefaultBeansPerUnit or DefaultPowerFlagFees or DefaultQueueMax or DefaultVatCleanupBudget, then that field is populated here
  • so if someone makes a governance parameter proposal to change something in params.QueueMax, and they forget to look at the chain and copy in the current fields from VatCleanupBudget, they'll be accidentally resetting the cleanup budget back to the default
  • also, because both .default and .kv are in DefaultVatCleanupBudget, there is no way to say "please stop overriding kv and just let it use .default like everything else does"
    • the closest you could achieve is by saying { default: 5, kv: 5 }

That works, it's weird but I suspect it's the best we can get from a one-method-per-module parameter-setting scheme.

}
// We return true to allow processing of any BOYD/GC prompted by cleanup,
// even if cleanup as such is now done.
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, looks good

@warner
Copy link
Member

warner commented Oct 12, 2024

BTW, I'm adding a new promises phase to cleanup, to fix #10261. It should use the same default of 5 cleanups per run, like imports and exports (and not like kv). I'll be making a branch on top of this one to add the new fields if this doesn't land by the time the swingset -side changes are done.

@warner
Copy link
Member

warner commented Oct 12, 2024

#10268 is the PR which introduces budget.promises and work.promises on the swingset side. Given your careful work to make the integration code tolerate new fields, we don't strictly need to update anything in cosmic-swingset, but it might be nice to be able to control this budget item like we do the others.

mergify bot added a commit that referenced this pull request Oct 17, 2024
…ke stores (#10290)

(split from #10165)

## Description
chain-main.js passes a [non-map `mailboxStorage`](https://github.com/Agoric/agoric-sdk/blob/94aaebcd255840daf13514ab1050282b985684d2/packages/cosmic-swingset/src/chain-main.js#L351-L359) to `launch`, which [threads it into mailbox code](https://github.com/Agoric/agoric-sdk/blob/94aaebcd255840daf13514ab1050282b985684d2/packages/cosmic-swingset/src/launch-chain.js#L126) that returns [methods expecting a map-like `entries` method and `size` property](https://github.com/Agoric/agoric-sdk/blob/94aaebcd255840daf13514ab1050282b985684d2/packages/SwingSet/src/devices/mailbox/mailbox.js#L121-L146). This PR adds type data to reject such issues and refactors mailbox code accordingly by converting those methods into independent functions that accept appropriate backing maps.

### Security Considerations
None known.

### Scaling Considerations
n/a

### Documentation Considerations
n/a

### Testing Considerations
Existing tests should continue to pass.

### Upgrade Considerations
This code does not affect any vat and is safe to include in the next release.
@gibson042 gibson042 force-pushed the gibson-8928-parameterized-allowcleanup branch from 9fe3b50 to bb28c19 Compare October 22, 2024 01:38
@gibson042
Copy link
Member Author

@warner @mhofman I just pushed my work-in-progress including the framework for testing a mock chain at block level, which works, but this scenario is still not covered because now I need to make the test have its doomed vat generate data to be cleaned up and then actually observe it. I'm open to help if you have the cycles, and also to the possibility of landing this and then finishing up the rest of testing in a followup.

@gibson042 gibson042 force-pushed the gibson-8928-parameterized-allowcleanup branch from 90b9005 to 2e577ee Compare October 28, 2024 10:31
@gibson042
Copy link
Member Author

@warner @mhofman This is now ready... it's not covering transcript/snapshot cleanup, but it does see the expected trickle of kvStore entries—and was good enough to catch a mistake on my part: f6d3cb4

It has also rebased to account for #10268.

The work required for testing entrained quite a bit of sprawl, so reviewing commits in order is recommended.

Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty incredible change to enable testing of cosmic-swingset behavior. Hopefully starts paving the path for more tests.

Some of these changes feel reminiscent of #6931, I should revisit it someday.

Most of the feedback is pretty minor, but I'm concerned about a couple of the changes. Let's discuss.

golang/cosmos/x/swingset/types/params.go Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change to the config processing seems unrelated to the scope of the PR. Was it necessary for testing? It seems fine, but I know this is fairly finicky. cc @michaelfig

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I backed off on the behavioral changes, but kept the refactorings for better debugging.

packages/cosmic-swingset/src/chain-main.js Outdated Show resolved Hide resolved
packages/cosmic-swingset/src/launch-chain.js Show resolved Hide resolved
@@ -10,11 +10,11 @@ Like all `@agoric` packages it follows Semantic Versioning. Unlike the others, i

# Design

It is meant to be a home for modules that have no Agoric-specific dependencies themselves. It does depend on a these other @agoric packages but they are all destined to migrate out of the repo,
It is meant to be a home for modules that have no dependencies on other packages in this repository, except for the following packages that do not theirselves depend upon any other @agoric packages and may be destined for migration elsewhere:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
It is meant to be a home for modules that have no dependencies on other packages in this repository, except for the following packages that do not theirselves depend upon any other @agoric packages and may be destined for migration elsewhere:
It is meant to be a home for modules that have no dependencies on other packages in this repository, except for the following packages that do not themselves depend upon any other @agoric packages and may be destined for migration elsewhere:

packages/cosmic-swingset/src/launch-chain.js Show resolved Hide resolved
packages/cosmic-swingset/test/run-policy.test.js Outdated Show resolved Hide resolved
packages/cosmic-swingset/test/run-policy.test.js Outdated Show resolved Hide resolved
);

// Give the vat a big footprint.
pushCoreEval(async powers => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we can adapt the async-flow eslint rule to prevent these closures from using their surrounding scope.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, I'm all for it!

Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am satisfied that the new test would catch the issue we got lucky to stumble onto in CI, and that the important feedback from my precious review was addressed.

Thanks for the amazing work.

@@ -1140,7 +1143,7 @@ export default function makeKernelKeeper(

// same
remaining = budget.transcripts ?? budget.default;
const dts = vatKeeper.deleteTranscripts(remaining);
const dts = undertaker.deleteTranscripts(remaining);
work.transcripts += dts.cleanups;
remaining -= dts.cleanups;
// last task, so increment cleanups, but dc.done is authoritative
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we evict the undertaker at this point? Small memory leak I assume.

* @param {SnapStore} [snapStore]
* returns an object to hold and access the kernel's state for the given vat
* @param {KVStore} kvStore The keyValue store in which the persistent state will be kept
* @param {VatKeeperPowers} powers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌 thanks for reigning in those positional params

@gibson042 gibson042 added the automerge:rebase Automatically rebase updates, then merge label Oct 29, 2024
@gibson042 gibson042 force-pushed the gibson-8928-parameterized-allowcleanup branch 2 times, most recently from 1543f79 to d5dee6c Compare October 29, 2024 16:04
@gibson042 gibson042 removed the automerge:rebase Automatically rebase updates, then merge label Oct 29, 2024
@gibson042 gibson042 force-pushed the gibson-8928-parameterized-allowcleanup branch from d5dee6c to 51926be Compare October 30, 2024 02:41
@gibson042 gibson042 added automerge:rebase Automatically rebase updates, then merge and removed automerge:rebase Automatically rebase updates, then merge labels Oct 30, 2024
…ty to the run policy test

An unrelated CI test experienced a failure when budgeted vat cleanup had
already removed a strict subset of a slowly-terminating vat's kvStore
entries that included the one with key "${vatID}.o.nextID"...
`provideVatKeeper` uses such keys as a "vat is alive" predicate that
must pass before it spins up any new keeper (whose primary purpose is
providing an interface for a *live* vat).

This is not an issue within a single swingset instantiation, in which
`provideVatKeeper` is memoized. But that cache starts empty with each
process, and the attempt to instantiate a new keeper for a slowly-
terminating vat in this state fails.

run-policy.test.js now covers this scenario.
…down to helpers

This minimizes gas consumption vs. helpers re-reading parameters,
making it less likely for an unestimated transaction to run out.
…ce text input

This reduces the risk of lexical shear between definition in a *.test.js
file and evaluation in a core-eval compartment.
@gibson042 gibson042 force-pushed the gibson-8928-parameterized-allowcleanup branch from 810ad66 to db8108a Compare November 9, 2024 20:42
@gibson042 gibson042 added the automerge:rebase Automatically rebase updates, then merge label Nov 9, 2024
@mergify mergify bot merged commit 2837beb into master Nov 9, 2024
85 checks passed
@mergify mergify bot deleted the gibson-8928-parameterized-allowcleanup branch November 9, 2024 21:17
mergify bot pushed a commit that referenced this pull request Nov 10, 2024
…`make` test fails (#10438)

## Description
Extracted from #10165 to independently resolve a preëxisting TODO.

### Security Considerations
n/a

### Scaling Considerations
n/a

### Documentation Considerations
n/a

### Testing Considerations
Makes it easier to debug failures in `make`-dependent cosmic-swingset tests.

### Upgrade Considerations
n/a
mergify bot pushed a commit that referenced this pull request Nov 10, 2024
## Description
Extracted from #10165.

### Security Considerations
n/a

### Scaling Considerations
n/a

### Documentation Considerations
n/a

### Testing Considerations
n/a

### Upgrade Considerations
n/a
mergify bot added a commit that referenced this pull request Nov 12, 2024
…0447)

## Description
Extracted from #10165.
When the `agoric wallet send` transaction inside [`psmSwap`](https://github.com/Agoric/agoric-sdk/blob/e7af58fe25a802d23283633416d0900aa06a2127/a3p-integration/proposals/z%3Aacceptance/test-lib/psm-lib.js#L391) failed because the default gas limit was insufficient, the following [`waitUntilOfferResult`](https://github.com/Agoric/agoric-sdk/blob/e7af58fe25a802d23283633416d0900aa06a2127/packages/client-utils/src/sync-tools.js#L205) never settled because the `agoric follow -lF :published.wallet.${addr}` inside its `queryWallet` was waiting forever. This PR fixes the root issue by updating `agoric wallet send` to use `--gas=auto` (while still preserving a3p-integration test coverage of the "default gas with automatic wallet provisioning" scenario), and also updates `retryUntilCondition` to [Promise.race](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/race) its operation against the interval timeout rather than just naïvely `await`ing it and possibly getting hung up on permanent unsettlement, while still allowing the original promise to persist (specifically for such `agoric follow` applications).
It also incidentally cleans up `tryISTBalances`, which is used by `checkSwapSucceeded` after the vulnerable `psmSwap` calls.
* fix(client-utils): Retry at least every other interval (even when a backing operation hangs, e.g. `follow` of an unprovisioned wallet)
* chore(a3p-integration): Increase the verbosity of psmSwap
* feat(agoric-cli): Block agoric wallet send on tx inclusion (`-bblock`)
* test(a3p-integration): Update "swap into IST" to force the default gas limit
* feat(agoric-cli): Add agoric wallet send gas limit options (defaulting to `--gas=auto --gas-adjustment=1.2`)

### Security Considerations
None known.

### Scaling Considerations
None known.

### Documentation Considerations
The CLI is self-documenting:
```console
$ agoric wallet send --help
Usage: agoric wallet send [options]

send a prepared offer

Options:
  --home <dir>                      agd application home directory
  --keyring-backend <os|file|test>  keyring's backend (os|file|test) (default "os")
  --from [address]                  address literal or name
  --offer [filename]                path to file with prepared offer
  --dry-run                         spit out the command instead of running it
  --gas                             gas limit; "auto" [default] to calculate automatically
  --gas-adjustment                  factor by which to multiply the limit calculated from
                                    --gas=auto [default 1]
  --verbose                         print command output
  -h, --help                        display help for command
```

### Testing Considerations
The new `--gas=auto` default is used every *except* "swap into IST", which is now explicitly responsible for covering the "default gas with automatic wallet provisioning" scenario.

### Upgrade Considerations
This will now detect any future change that pushes PSM swap gas consumption beyond the default limit.
mergify bot added a commit that referenced this pull request Nov 12, 2024
## Description
Extracted from #10165.
Best reviewed by individual commits, because the introduction of `deepCopyJsonable` is a bit noisy.

### Security Considerations
n/a

### Scaling Considerations
n/a

### Documentation Considerations
n/a

### Testing Considerations
n/a

### Upgrade Considerations
n/a
mergify bot added a commit that referenced this pull request Nov 13, 2024
…es (#10446)

## Description
Extracted from #10165 and dependent upon Agoric/agoric-3-proposals#193 .
This expands `test-docker-build` CI job artifacts from just core eval scripts to their containing a3p-integration directory plus the job's slogfile.

### Security Considerations
n/a

### Scaling Considerations
The final slogfile from #10165 was about 67 MB, which is smaller than the 161 MB deployment-test-results artifact. But regardless, I've limited retention to ~~4 days on success/10 days on failure~~ 10 days, and we could also compress before upload if even that is too much.

### Documentation Considerations
n/a

### Testing Considerations
n/a

### Upgrade Considerations
n/a
mergify bot added a commit that referenced this pull request Nov 14, 2024
…#10466)

## Description
Resolving some confusion I experienced while debugging #10165 and working on #10446.
* refactor build-submission.sh to be working-directory-aware and accept an arbitrary number of additional arguments
* document build-submission.sh w.r.t. package.json "agoricProposal"
* document build-all-submissions.sh w.r.t. `./proposals?:*`
* add explanatory output to build-all-submissions.sh
* explain all of the above in the README

### Security Considerations
None known.

### Scaling Considerations
n/a

### Documentation Considerations
Included in the README and shell scripts.

### Testing Considerations
These scripts are covered by integration testing.

### Upgrade Considerations
None known; a3p-integration should be self-contained.
mergify bot added a commit that referenced this pull request Nov 20, 2024
## Description
Extracted from #10165 and best reviewed by commit.
* Include relevant variable details (e.g., block height).
* Omit output that doesn't convey new information "done", "reading", etc.).
* Provide a common logging prefix for subtasks, inlining functions where relevant.
* Remove endo/init/legacy.js from z:acceptance ava config to eliminate noisy `Object <[Object: null prototype] {}>` output.
* Introduce a `logRecord` helper to concisely log possibly-remotable-bearing records and/or record entries.

### Security Considerations
n/a

### Scaling Considerations
n/a

### Documentation Considerations
n/a

### Testing Considerations
n/a

### Upgrade Considerations
n/a
mergify bot added a commit that referenced this pull request Dec 9, 2024
## Description
Extracted from an error investigation in #10165.

```diff
-anachrophobia strikes ${vatID} on delivery ${deliveryNum}
+anachrophobia strikes ${vatID} syscalls for delivery ${deliveryNum}
```

```diff
-anachrophobia in ${vatID}: {extra,wrong} syscall
+anachrophobia in ${vatID} delivery ${deliveryNum}: {extra,wrong} syscall at index ${idx}
```

```diff
-anachrophobia in ${vatID}: missing syscalls
+anachrophobia in missing ${missing} syscall(s) at index ${syscallsMade.length}
```

Also capitalizes `WRONG`/`MISSING`/`EXTRA` after `sc[${idx}]:`, for increased visual distinction.

### Security Considerations
None known; any consumer aware of vatIDs should also be allowed to know delivery numbers and constituent syscalls.

### Scaling Considerations
n/a

### Documentation Considerations
n/a

### Testing Considerations
n/a

### Upgrade Considerations
None known; this kernel output should be independent of consensus (and will be consistent across nodes using the same kernel code anyway).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:rebase Automatically rebase updates, then merge force:integration Force integration tests to run on PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

terminate vats slowly
3 participants