-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
don't reschedule auction's price notifier if we already have one #10615
Conversation
@@ -464,6 +465,8 @@ export const prepareAuctionBook = (baggage, zcf, makeRecorderKit) => { | |||
AmountMath.make(collateralBrand, QUOTE_SCALE), | |||
bidBrand, | |||
); | |||
baggage.set(QUOTE_NOTIFIER, quoteNotifier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's storing a promise in (durable) baggage, which isn't allowed (also that const quoteNotifier =
should probably be quoteNotifierP =
to make that more obvious). Would it hurt anything to await
and store the actual Presence instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No.
I also notice that we don't seem to make a new baggage for each book. I need to figure out if that's true, and either make new baggages, or make unique names for each notifier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, no, that wouldn't be a trivial change. It would make observeQuoteNotifier
async, but also it wouldn't be setting the flag at the right time: if two callers invoke observeQuoteNotifier
in the same turn, we'd still wind up with multiple outstanding calls to makeQuoteNotifier
, and we'd get multiple observers again.
I think observerQuoteNotifier()
needs to be named/intentioned "maybeObserveQuoteNotifier" or "maybeStartObservingQuoteNotifier". And it needs to be "async idempotent" (not sure if that's a term of art but you get the idea): no matter how many times you call it in a row, you still only wind up with a single outstanding async request, but if that request fails for any reason, it resets back to a state where you can call it again and it won't short-circuit.
The flag needs to be checked on entry to the function, during the synchronous prelude, before sending any messages. Then it needs to send off the makeQuoteNotifier
, and if that fails it needs to clear the flag. If it succeeds then it can build the observer, which clears the flag if/when the observer fails or finishes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated via pairing.
Deploying agoric-sdk with Cloudflare Pages
|
b68ac12
to
ba64933
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
submitting review now so @Chris-Hibbert can consider the map-vs-set thing, will re-review when that's settled
// auctionBook changed to create a sub-baggage for each book (to distinguish | ||
// their quoteNotifier flags) so older auctions will not be upgradable to this | ||
// version. We believe this version is saving all the necesary state to be | ||
// upgraded, but that hasn't been verified, so we don't mark it as `canUpgrade'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of those meta
flags is that the first sentence ("older versions will not be upgradable to this version") is the reason we don't mark this as canUpgrade
. The second sentence ("saving all the necessary state") is what justifies our use of canBeUpgraded
. The fact that we haven't verified it is either good note to add but doesn't change the flags, or a reason to not use canBeUpgraded
.
My opinion is that canBeUpgraded
is appropriate, but I don't know how much of a stickler we are for "there must be tests to prove this claim before we're allowed to make it".
So maybe add ", so we do not mark this as canUpgrade
" to the first sentence, and maybe make the second sentence be like "We believe.., so we mark this as canBeUpgraded
(but this hasn't been verified with an a3p test)".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that we haven't verified it is either good note to add but doesn't change the flags, or a reason to not use canBeUpgraded.
If we had verified it, it would be 'canUpgrade. The only time it should be
canBeUpgraded` is when we hope it's upgraded, but haven't proven it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought canUpgrade
is backwards-looking, while canBeUpgraded
is forward-looking, and that there was no way to express "probably upgradable but not tested".
If I understand it correctly, canUpgrade
means that this code is capable of upgrading a previous version, which we know is false here because the sub-baggage we added.. (actually we just changed that, we're no longer modifying the way baggage is used, so maybe this could upgrade the previous version, but I imagine there are other reasons that might not work).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never did know the difference. Ooh! we have docs now:
* - `canUpgrade` means this code can perform an upgrade
* - `canBeUpgraded` means that the contract stores kinds durably such that the next version can upgrade
So I concur that canBeUpgraded
is correct and that any test for canUpgrade
from the previous version is certain to fail.
@@ -118,6 +122,12 @@ export const makeOfferSpecShape = (bidBrand, collateralBrand) => { | |||
export const prepareAuctionBook = (baggage, zcf, makeRecorderKit) => { | |||
const makeScaledBidBook = prepareScaledBidBook(baggage); | |||
const makePriceBook = preparePriceBook(baggage); | |||
// a map from collateralBrand to true when the quoteNotifier has an observer | |||
// the brand is absent when there's no observer. | |||
const quoteNotifierFlags = provideDurableMapStore( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, hey, should this just be a DurableMapSet
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good to me. Nits on the comments, might pull in @dckc on the meta
flag question (and to do another pass in general).
@@ -645,8 +668,8 @@ export const prepareAuctionBook = (baggage, zcf, makeRecorderKit) => { | |||
|
|||
trace(`capturing oracle price `, state.updatingOracleQuote); | |||
if (!state.updatingOracleQuote) { | |||
// if the price has feed has died, try restarting it. | |||
facets.helper.observeQuoteNotifier(); | |||
// if the price feed has died, restart it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add "(or hasn't been started for this incarnation yet)"
// auctionBook changed to create a sub-baggage for each book (to distinguish | ||
// their quoteNotifier flags) so older auctions will not be upgradable to this | ||
// version. We believe this version is saving all the necesary state to be | ||
// upgraded, but that hasn't been verified, so we don't mark it as `canUpgrade'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought canUpgrade
is backwards-looking, while canBeUpgraded
is forward-looking, and that there was no way to express "probably upgradable but not tested".
If I understand it correctly, canUpgrade
means that this code is capable of upgrading a previous version, which we know is false here because the sub-baggage we added.. (actually we just changed that, we're no longer modifying the way baggage is used, so maybe this could upgrade the previous version, but I imagine there are other reasons that might not work).
The declaration of
I'd be happy to take it out. It's not required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW...
// Ensure that there is an observer monitoring the quoteNotifier. We | ||
// assume that all failure modes for quoteNotifier eventually lead to | ||
// fail or finish. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I read this as "We rely on SOMETHING to see that all failure modes for quoteNotifier eventually lead to fail or finish"? What is the SOMETHING?
If we don't rely on something else, we have to ensure it here, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assumption is baked into the design of the notifiers and observers. There's no way to interrupt it from the consumer end, and no way to verify that it's still alive. Its contract says if it every ceases to continue producing new values it will either call fail
or finish
, or return an exception because the vat died.
// Brands that have or are making active quoteNotifier Observers | ||
const observedBrands = new Set(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm struggling to get the relevant invariant in my head.
Something that combines observedBrands
with state.updatingOracleQuote
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
state.updatingOracleQuote
has a value when we've received an update since the last auction. observedBrands
contains the brand if the priceNotifier observer has been started and hasn't returned an error or called finish
.
In the normal case, observedBrands
doesn't change, but state.updatingOracleQuote
gets reset every auction.
The interesting cases are startup and if the notifier ever breaks. The problem with startup is that it can take an arbitrary timespan for the oracles to start reporting prices after the auction starts. We want to start the observer, but we don't want to do it multiple time (as we were previously doing) because each creates a new observer that persists.
// auctionBook changed to create a sub-baggage for each book (to distinguish | ||
// their quoteNotifier flags) so older auctions will not be upgradable to this | ||
// version. We believe this version is saving all the necesary state to be | ||
// upgraded, but that hasn't been verified, so we don't mark it as `canUpgrade'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never did know the difference. Ooh! we have docs now:
* - `canUpgrade` means this code can perform an upgrade
* - `canBeUpgraded` means that the contract stores kinds durably such that the next version can upgrade
So I concur that canBeUpgraded
is correct and that any test for canUpgrade
from the previous version is certain to fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in one of these cases I wish state with shapes could grow new props.
@@ -118,6 +118,8 @@ export const makeOfferSpecShape = (bidBrand, collateralBrand) => { | |||
export const prepareAuctionBook = (baggage, zcf, makeRecorderKit) => { | |||
const makeScaledBidBook = prepareScaledBidBook(baggage); | |||
const makePriceBook = preparePriceBook(baggage); | |||
// Brands that have or are making active quoteNotifier Observers | |||
const observedBrands = new Set(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we explain why it's ok to have this be a heap Set? Both from a cardinality and upgradability pov.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brian and I talked through this. If we decide to upgrade auctions, then the restart won't have an observer running, but it won't have reset any tracking variable we create, so it's important that the tracking go away on restart.
I think the cardinality question is about the fact that we're sharing a Set across auctoinBooks. We don't have ephemeral per-object state, but we can have ephemeral shared state. The number of brands handled by the auction is small and seldom grows, so the set won't get too big. We're already keeping all the auctionBooks in a single vat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured as much for the upgradability side, but I would prefer this to be written out as comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, we lost the race with the merge. Please let me know if you think it's worth a new PR to add the comment. I'm happy to do it if you ask.
}, | ||
}), | ||
e => { | ||
trace('makeQuoteNotifier failed, resetting', e); | ||
state.updatingOracleQuote = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it necessary to reset the state here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's safe because we'll check for a notifier again at the next auction. It's a good idea because we can't count on the notifier continuing to produce events if there's an error at this level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just don't quite understand in which case this isn't already null
. This is an error making the notifier, so afaiu we would never have updated this state in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We didn't explore it deeply, but I think if a previous version was happily updating the state, then it upgrades, then anything which samples the price at the beginning of the new incarnation (before any new updates happen) will get the old price. If an auction then starts and initiates the observer process, but fails (maybe because the price feed vat is getting upgraded too?), then we'll hit this case, clear the ephemeral flag (so the next auction can initiate a new one), and also clear the durable state.updatingOracleQuote
, which at least will tell clients to not rely on the old quote.
I suspect that's a bit weird. Not sure if it's a problem though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On line 469, we add the brand to the set, and on line 472, we attempt to create the notifier. On line 477, we await the notifier. If the creation promise fails, we end up here, and want to ensure the brand is removed from the set, or we'll never try again to create a notifier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ensure the brand is removed from the set, or we'll never try again to create a notifier.
I have no problem with the ephemeral set being reset on error. My concern is with how the durable updatingOracleQuote
state is managed. Resetting it at this state doesn't seem correct to me, and might be an indication something else is amiss.
Brian mentions a case where this durable state may not be null
: after an upgrade. I suppose the product question is whether we want to maintain or not the old "updating" quote while we attempt to restart the observer.
But beyond that question, the problem is that the ensureQuoteNotifierObserved
call is gated on !updatingOracleQuote
in the first place, so really it seems like restarting the vat wouldn't trigger the observer creation anyway because that state wouldn't be reset by the upgrade, and the contract would think it still has a notifier.
IMO, we need to switch the call to ensureQuoteNotifierObserved
to be unconditional, return true
if the observer is in the Set, and if not immediately reset the updatingOracleQuote
(if we want to avoid pretending we have an updating quote), start the notifier, and return false so that captureOraclePriceForRound
can continue bailing out early. In that case updatingOracleQuote
wouldn't need to be reset if we fail to make the observer since we'd be guaranteed for it to already be null
(again if we don't want to pretend we have an updating quote while re-establishing the observer).
Regarding the early bail out of captureOraclePriceForRound
, I'm a little surprised that we would skip capturing a price for round when the observer isn't yet setup instead of delaying the capture, but that's a different topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah, if the auctionBook has a price (in durable state.updatingOracleQuote
), then gets restarted/upgraded, the new incarnation won't start a new observer at startup (because nothing special happens at restart, it's not like we walk through all the pre-existing brands and do something for each one), and when the auction timer fires and it calls captureOraclePriceForRound()
, that will see the old price and assume it's ok, and won't start a replacement observer then either.
I agree with @mhofman that the captureOraclePriceForRound()
needs to check the ephemeral set instead of the durable state, or simply always call ensureQuoteNotifierObserved()
and rely on its internal check to prevent duplicates.
const { state, facets } = this; | ||
const { collateralBrand, bidBrand, priceAuthority } = state; | ||
|
||
if (observedBrands.has(collateralBrand)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using the collateralBrand
as key instead of the identity of this auctionBook? Are we guaranteed there is only ever a 1:1 relationship between auctionBook and collateralBrand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest of auctionBook
and auctioneer
use the collateralBrand
, so we're already reliant on it being 1:1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there is never multiple bid brands for a given collateral? I'm just wary of assumptions like this that aren't super clear or documented in code comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the bids are in IST, so there's no chance of that.
With Vaults, we explicitly intended that we might want to have multiple vaultManagers for a given currency that could have different minimums, or other requirements, and charge different rates.
With auctions, it's a mistake to split the bidding equity across multiple auctions, so the design was always that each auctionBook would handle all the bids and asks for a particular asset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think having a single bidding denom is the assumption that wasn't clear to me.
) ## Description It was observed, after applying upgrade 18 to EmeryNet, that a whole slew of promises resolved when a price was provided. The current belief is that `observeQuoteNotifier` will be called every auction round until a price is available, and that creates a new observer with each call that wait until a price is published, and then they all continue waiting for each successive update. This change adds an interlock, so if there's already a notifier waiting, we don't add a new one. ### Security Considerations No security implication. ### Scaling Considerations Processing about 19000 actions waiting on a notifier in EmeryNet took several hours. If we're correct that the notifiers will continue to cycle, we expect to see a similar wait for each price update on that currency. That's unsustainable. The only current theory about dropping all those actions waiting for notifiers is to kill the vat. We can't kill the priceAuthority vats that hold the notifiers, but we might be able to cleanly kill the abandoned auctioneers. ### Documentation Considerations Not needed. ### Testing Considerations Tough to test in unit tests. It's conceivable that we could recreate the situation in `a3p-integration`, though it would be hard to observe the results. ### Upgrade Considerations We probably shouldn't ship upgrade 18 with something to address this problem.
### Description Cherry-picks the following commits from master: - #10551 (9e19321) - #10635 (ad4e83e) - #10615 (e596a01 ) - #10634 (a1856f3) Since we plan to verify this rc on devnet rather than emerynet, there is no apparent need for a new upgrade name. Not aware of any deployments on devnet before so can reuse the previous upgrade name. However, skipping emerynet is dependent on comms with the validators so I have added a new upgrade name `agoric-ugprade-18-emerynet-rc3` just in case. Only added to bypass the need for a new rc if for some reason emerynet validation is needed anyways - best case scenario is that it remains unused. commits added using git cherry-pick
refs: #10660 ## Description in testing upgrade 18 candidates, we realized that vaultManager was subject to the same error as auctions fixed in #10615. ### Security Considerations No Security implications. ### Scaling Considerations Not fixing this would add a new notifier to the vaultManager's quote watcher for every hour that elapsed between upgrade and the oracles starting to provide price updates. ### Documentation Considerations Unnecessary. ### Testing Considerations This was detected by trawling slogs. It's not clear that we can do better than that to verify the fix. ### Upgrade Considerations It would probably be a mistake to upgrade vaults without this fix.
Description
It was observed, after applying upgrade 18 to EmeryNet, that a whole slew of promises resolved when a price was provided. The current belief is that
observeQuoteNotifier
will be called every auction round until a price is available, and that creates a new observer with each call that wait until a price is published, and then they all continue waiting for each successive update.This change adds an interlock, so if there's already a notifier waiting, we don't add a new one.
Security Considerations
No security implication.
Scaling Considerations
Processing about 19000 actions waiting on a notifier in EmeryNet took several hours. If we're correct that the notifiers will continue to cycle, we expect to see a similar wait for each price update on that currency. That's unsustainable.
The only current theory about dropping all those actions waiting for notifiers is to kill the vat. We can't kill the priceAuthority vats that hold the notifiers, but we might be able to cleanly kill the abandoned auctioneers.
Documentation Considerations
Not needed.
Testing Considerations
Tough to test in unit tests. It's conceivable that we could recreate the situation in
a3p-integration
, though it would be hard to observe the results.Upgrade Considerations
We probably shouldn't ship upgrade 18 with something to address this problem.