-
Notifications
You must be signed in to change notification settings - Fork 680
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ACP-77: Implement validator state #3388
ACP-77: Implement validator state #3388
Conversation
Co-authored-by: Darioush Jalali <[email protected]> Signed-off-by: Ceyhun Onur <[email protected]>
Signed-off-by: Ceyhun Onur <[email protected]>
Co-authored-by: Darioush Jalali <[email protected]> Signed-off-by: Ceyhun Onur <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
(I left a question for the caching in #3516 (comment))
maybeSOVOverhead = wrappers.BoolLen + sovOverhead | ||
entryOverhead = ids.IDLen + maybeSOVOverhead | ||
) | ||
if maybeSOV.IsNothing() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
under what circumstance is it nothing? Both here and in general (line 196 in subnet_only_validator.go)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cache stores hits as maybe.Some
and misses as maybe.Nothing
. So, if someone performs a delete
or a get
which results in not found, maybe.Nothing
would be cached.
return false, fmt.Errorf("%w: %s", ErrMissingParentState, d.parentID) | ||
} | ||
|
||
return parentState.HasSubnetOnlyValidator(subnetID, nodeID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might seem as a weird question, but - how deep is the expected recursion here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recursion ends at the lastAcceptedState and starts at the most recently verified block, so typically the recursion is very shallow.
return weight, nil | ||
} | ||
|
||
parentState, ok := d.stateVersions.GetState(d.parentID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain the high level idea of doing this backwards traversal in search of the subnet, both here and in the next function?
We basically don't have a full snapshot of the validators in each state? Why is that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performing a full copy of the validator states for each diff would be pretty expensive. The number of changes per block is typically small, but the number total entries being tracked can be quite large.
vms/platformvm/state/state.go
Outdated
// SubnetOnlyValidator with the given validationID. It is guaranteed that any | ||
// returned validator is either active or inactive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(No action needed) Aren't all validators always either active or inactive? Not sure I understand the purpose of the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was supposed to signify that no deleted SoVs will be returned from this function (isDeleted()
will always return false). Specifically, this is to ensure that !isActive()
means inactive
(and not deleted
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// SubnetOnlyValidator with the given validationID. It is guaranteed that any | |
// returned validator is either active or inactive. | |
// returned validator is either active or inactive (not deleted). |
// ensures that a subnetID+nodeID pair that was deleted and then re-added in | ||
// a single diff can't get reordered into the addition happening first; | ||
// which would return an error. | ||
for _, sov := range d.sovDiff.modified { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Playing devil's advocate here, we're iterating here over a map, and calling baseState.PutSubnetOnlyValidator
which may return with different errors. This means that across different nodes, we may fail differently, no?
Is that a concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is generally unexpected for Apply
to ever error.
Apply
is called during Accept
. If Apply
were to return an error, that error would be propagated through Accept
into the consensus engine (causing a FATAL).
if priorSOV.Weight < sov.Weight { | ||
err = s.validators.AddWeight(sov.SubnetID, nodeID, sov.Weight-priorSOV.Weight) | ||
} else if priorSOV.Weight > sov.Weight { | ||
err = s.validators.RemoveWeight(sov.SubnetID, nodeID, priorSOV.Weight-sov.Weight) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is priorSOV.Weight == sov.Weight
legal? what does it mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the priorSOV.Weight
is greater than sov.Weight
, then the weight of the validator is being decreased by priorSOV.Weight - sov.Weight
units
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no but i asked what it means when they're equal, I edited shortly after, probably github didn't refresh my comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. It is possible for them to be equal. In that case the weight does not need to be modified.
vms/platformvm/state/diff.go
Outdated
feeState gas.State | ||
sovExcess gas.Gas | ||
accruedFees uint64 | ||
parentActiveSOVs int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: numParentActiveSOVs is more descriptive IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or parentActiveSOVCount
because a "count" is more descriptive than a "number" 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up going with parentNumActiveSOVs
.
Because it aligns most with the actual source of this: parentState.NumActiveSubnetOnlyValidators()
return parentState.WeightOfSubnetOnlyValidators(subnetID) | ||
} | ||
|
||
func (d *diff) GetSubnetOnlyValidator(validationID ids.ID) (SubnetOnlyValidator, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple high level questions that may be related:
- What triggers a new
diff
to be created? - Did you consider updating
diffs
in place, rather than the linked list style approach here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Diffs are created for a few reasons. But the reason the diffs are designed this way is because each diff (can) represent the changes that are performed by a block. When a block is verified a diff is created. When a block is accepted a diff is applied to the disk state.
Diffs are updated in-place during the execution (verification) of a block - but we wouldn't want a child block to modify the state that will be committed when the parent block is accepted (as the child could be rejected for a different child)
|
||
// GetSubnetOnlyValidator returns the validator with [validationID] if it | ||
// exists. If the validator does not exist, [err] will equal | ||
// [database.ErrNotFound]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit odd to call out the error type here when that's not enforceable by the interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is particularly unusual... This is done in the standard lib as well. For an example, see SetDeadline
here: https://pkg.go.dev/net#Conn
vms/platformvm/state/state.go
Outdated
@@ -2230,6 +2571,16 @@ func (s *state) writeValidatorDiffs(height uint64) error { | |||
return nil | |||
} | |||
|
|||
func getOrDefault[K comparable, V any](m map[K]*V, k K) *V { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment explaining that the default value is written to the map would be helpful. Or consider renaming to getOrAddDefault
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did both
for subnetID, weight := range s.sovDiff.modifiedTotalWeight { | ||
var err error | ||
if weight == 0 { | ||
err = s.weightsDB.Delete(subnetID[:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the expected behavior of WeightOfSubnetOnlyValidators
after the entry is deleted from the DB and the cache entry expires? If it's to error, perhaps we should not write to the cache so that WeightOfSubnetOnlyValidators
deterministically errors for any subnets with weight 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WeightOfSubnetOnlyValidators
calls database.WithDefault(database.GetUInt64, s.weightsDB, subnetID[:], 0)
which will return 0
if the value is not on disk. So, WeightOfSubnetOnlyValidators
deterministically returns 0
for any subnets with weight 0
.
vms/platformvm/state/state.go
Outdated
// SubnetOnlyValidator with the given validationID. It is guaranteed that any | ||
// returned validator is either active or inactive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// SubnetOnlyValidator with the given validationID. It is guaranteed that any | |
// returned validator is either active or inactive. | |
// returned validator is either active or inactive (not deleted). |
Why this should be merged
ACP-77 introduces a new type of validator: the
SubnetOnlyValidator
.This PR introduces this new validator type to the P-chain state.
How this works
SubnetOnlyValidator
For the new
SubnetOnlyValidator
type, the validator can either beactive
(EndAccumulatedFee != 0
) orinactive
(EndAccumulatedFee == 0
).active
validators are handled similarly to other validator types. They can be sampled during consensus, can contribute to block production, can contribute to ICM (Warp) messages, and are held in memory to efficiently perform these operations. To limit the memory pressure introduced byactive
validators, the number ofactive
validators is limited.Unlike
active
validators,inactive
validators are not sampled during consensus, can not contribute to block production, and can not contribute to ICM (Warp) messages. This allowsinactive
validators to be removed from memory and only kept on disk. EachL1
will have an entry in their validator set representing the cumulative amount of inactive stake.Removing a
SubnetOnlyValidator
is performed simply by setting theirWeight
to be0
.State Modifications
The P-chain state is broken into 2 implementations:
state
diff
The typical flow for performing a change is:
diff
on top of either thestate
or anotherdiff
.diff
to the parent, either thestate
or anotherdiff
.state
to disk.Contrary to prior validator implementations within the state package, this PR attempts to provide validation of the requested changes.
Returned errors from
PutSubnetOnlyValidator
should result in no changes to the state.PutSubnetOnlyValidator
succeeding should result in a valid state representation, regardless of the expected API behavior.The only exception to this is documented with:
It is difficult to handle this case within the state without introducing non-deterministic error reporting because validationIDs are deleted from disk when the state is deleted. This means that the state code would need to handle all of the fields of an SoV to change on demand. However, that would require additional changes to the validator manager and other parts of the code (which make assumptions around
ValidationID
implying a staticsubnetID + nodeID
pair).The in-memory validator manager and historical validator set diffs are only updated during the atomic disk commitment changes.
How this was tested
Added unit tests.
Need to be documented in RELEASES.md?
Added P-chain configs:
l1-weights-cache-size
l1-inactive-validators-cache-size
l1-subnet-id-node-id-cache-size