# qip-0012: Qi UTXO Pruning #36

```
QIP: 11
Layer: Consensus (hard fork)
Title: Paying for Account State Usage
Author: wizeguyy <[email protected]>
Comments-Summary: No comments yet.
Comments-URI: https://github.com/quainetwork/qips/wiki/Comments:QIP-0011
Status: Draft
Type: Standards Track
Created: 2024-04-01
License: BSD-2-Clause
```

## Abstract
This QIP proposes a mechanism for paying for the state used by an account in
the state tree. State pricing is dominated by the disk IOPS necessary to
load/store trie nodes to perform state root updates. Therefore, we price
account slots logarithmically according to the size of the account trie, and
let the market drive the price through its ability to buy/sell account space.

## Motivation
As the popularity of a blockchain grows, the data necessary to maintain the
state of each account grows. This state trie growth degrades performance, as
recomputing state root updates becomes the dominant cost of processing a
transaction.

Given the material costs associated with account slots, the protocol needs to
impose that cost on users to prevent unsustainable state usage. Because these
costs are borne by the entire network, and not simply the transactor, it is not
sufficient to impose this cost in the form of transaction fees. A long-term
cost must exist, proportional to each account's impact on the state tree and
the duration of that impact (i.e. not just at the time the transaction is
processed).

## Rationale
The complexity of updating the state root is proportional to the number of
accounts in the state tree. Since we use a Patricia Merkle trie (PMT) with its
extension node optimization, the tree complexity is rarely worst-case, but for
the reader's intuition, the worst-case update cost is bounded by the radix-16
tree update cost: $O(\log_{16}(N))$ for $N$ accounts. In the case of a PMT, the
physical limitation is often the number of disk IOPS necessary to update the
database record for each trie node.

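For intuition, a worked example (illustrative only, not part of the
specification): with $N = 2^{30}$ accounts, a single leaf update touches roughly

$$ \lfloor \log_{16}(2^{30}) \rfloor = \lfloor 7.5 \rfloor = 7 $$

trie levels, each of which typically requires at least one database read and
one write.
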
### Letting Markets Decide the Price
Since node operators may be running any number of hardware configurations, with
different CPU, memory, storage, or network constraints, it is impossible to
pick a concrete price for any of these resources that makes sense for everyone.
In fact, even if we could, the concrete number selected today may be very
different as node operators upgrade hardware or experience failures over time.
The solution is to expose these limited resources to the market so that market
participants can decide the price of these resources based on their own
subjective preferences.

To accomplish this, we build the protocol to enforce account prices according
to the current size of the account trie. Then we make it possible to buy
account slots, and to sell them back, at whatever the current price is. Every
account must pay for its index in the PMT, but it may sell its account space
back if it no longer requires it. This accomplishes two things:

1) The market determines an efficient price for account slots.

2) Users are incentivized to clean up the state trie by destroying their
accounts when they no longer need them.

An important thing to consider is that Quai's design has the ability to scale
and add more capacity if these limits end up being too small. If, for example,
speculators buy up too many account slots to resell them to future buyers, they
run the risk of increasing block processing latency, which through
[QIP-0008](qip-0008.md) will ultimately lead to the addition of more chains
with more resources. If this happens, speculators will have to compete with
these newly available resources, which would harm their speculative investment.
So there is some negative feedback here, which incentivizes speculators to
participate and help determine market pricing, without fully consuming
resources that would trigger a trie expansion. In fact, when the network gets
close to that point, there is a strong incentive for account holders to sell
back any account space they no longer need, which helps balance resource usage
as the network scales.

## Specification
### Account Pricing
Every new account added to the tree must pay for its address space. The cost of
this address space must increase as the number of addresses increases. This
creates negative feedback: speculators will occupy address space with new
accounts when it is cheap, and destroy their accounts if it becomes expensive.
This balance of speculator preference with user demand is how the market
determines the appropriate address space price.

To achieve this, we use the radix-16 computational bound described above to set
the account slot price. The price function is:

$$ P_a = K_a \cdot \lfloor \log_{16}(N) \rfloor $$

where $P_a$ is the price, in Quai per account, to create or sell an account
slot, $N$ is the total number of accounts in the trie, and $K_a$ is a constant
scaling factor chosen to adjust price responsiveness to trie size.

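To make the formula concrete, here is a minimal, non-normative sketch of the
price computation. The `floor_log16` helper and the plain floating-point
representation of a Quai amount are illustrative assumptions, not part of this
specification.

```rust
/// Compute floor(log16(n)) by counting base-16 digits. Illustrative only.
fn floor_log16(n: u64) -> u64 {
    assert!(n > 0, "the trie is never empty when pricing an account");
    let mut depth = 0;
    let mut x = n;
    while x >= 16 {
        x /= 16;
        depth += 1;
    }
    depth
}

/// Price to create (or sell back) an account slot: P_a = K_a * floor(log16(N)).
/// `k_a` is the constant scaling factor, expressed in the same Quai units as
/// the returned price.
fn account_slot_price(num_accounts: u64, k_a: f64) -> f64 {
    k_a * floor_log16(num_accounts) as f64
}
```
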
#### Choosing $K_a$
We choose $K_a$ to set a Schelling point around a reasonable, acceptable trie
size, while understanding that the actual size will again depend on the
subjective demand for account space. We choose 1 billion ($2^{30}$) accounts
per chain as a reasonable upper bound, and we choose 0.1 Quai per account as a
high cost which will lead to reduced demand in the account space market.

We choose:

$$ K_a = \frac{0.1\ \text{Quai/account}}{2^{30}\ \text{accounts}} \approx 9.31 \times 10^{-11}\ \frac{\text{Quai}}{\text{account}^2} $$

### Protocols for Buying and Selling Account Slots
For a new account to be created, enough Quai must be sent to cover the account
creation price $P_a$. The new account will be credited with the balance of the
transaction minus the creation price $P_a$. Any transaction to a new account
which does not satisfy the creation price will fail.

Conversely, to sell account space back, we provide a precompiled contract with
a destruction method. The destruction method will allow the user to provide a
"beneficiary transaction", which may transfer the balance, including the sale
price of the account (along with optional TX data), to any other address in the
trie. Upon successful execution of the transaction, the account will be deleted
from the state trie.

Contract address and ABI TBD.

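As a non-normative illustration of the creation and destruction rules above,
here is a minimal sketch. The `Address` type, the balance representation, and
the way the beneficiary is credited are assumptions made for the example; the
real precompile interface is still TBD.

```rust
use std::collections::HashMap;

type Address = [u8; 20];

/// Illustrative assumption: balances and prices are plain integers in the
/// smallest Quai unit.
struct StateTrie {
    accounts: HashMap<Address, u128>, // address -> balance
}

impl StateTrie {
    /// Credit `value` to `to`. A brand new account must also cover the current
    /// creation price `p_a` and is credited with `value - p_a`; otherwise the
    /// transaction fails.
    fn credit(&mut self, to: Address, value: u128, p_a: u128) -> Result<(), &'static str> {
        match self.accounts.get_mut(&to) {
            Some(balance) => {
                *balance += value;
                Ok(())
            }
            None if value >= p_a => {
                self.accounts.insert(to, value - p_a);
                Ok(())
            }
            None => Err("transaction does not cover the account creation price"),
        }
    }

    /// Destroy `account`, forwarding its balance plus the current sale price
    /// `p_a` to `beneficiary` (modeled here as a simple credit).
    fn destroy(&mut self, account: Address, beneficiary: Address, p_a: u128) -> Result<(), &'static str> {
        let balance = self.accounts.remove(&account).ok_or("no such account")?;
        self.credit(beneficiary, balance + p_a, p_a)
    }
}
```
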
### Dynamic Load/Store Gas
The demand curve defined above only works if the load/store costs for accounts
and account data are appropriately priced. Specifically, there cannot be a
fixed price to load an account, load code/data stored within an account, or to
store it back. Since all of these operations depend on merkleization of the
data, the gas used in each operation depends on the size of the trie.
Accordingly, these operations cannot have a fixed gas cost as implemented in
the vanilla EVM.

#### Account Access Gas
The gas required for processing a transaction will be updated to dynamically
compute the account load/store cost according to merkle trie depth. Every trie
node which is loaded will cost $\lfloor \log_{16}(N) \rfloor$ times the vanilla
EVM gas cost.

#### Account Data Access Gas
The gas required for accessing code and storage data will be scaled similarly:
each such opcode will cost $\lfloor \log_{16}(N) \rfloor$ times its vanilla EVM
gas cost.

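A minimal sketch of the intended scaling, under stated assumptions: the vanilla
EVM base cost is supplied by the caller, and the multiplier is clamped to at
least 1 so that small tries never charge less than the vanilla cost (that clamp
is an assumption of this example, not part of the QIP).

```rust
/// Scale a vanilla EVM gas cost by the trie-depth factor floor(log16(N)).
/// `base_gas` is the fixed cost the vanilla EVM would charge for the access
/// (account load/store, SLOAD/SSTORE, code load, ...).
fn dynamic_access_gas(base_gas: u64, num_accounts: u64) -> u64 {
    // `ilog(16)` computes floor(log16(N)); clamp to 1 for very small tries.
    let depth_factor = u64::from(num_accounts.max(1).ilog(16)).max(1);
    base_gas * depth_factor
}
```
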
## Copyright
This QIP is licensed under the BSD 2-clause license.

```
QIP: 12
Layer: Consensus (hard fork)
Title: Qi UTXO Pruning
Author: wizeguyy <[email protected]>
Comments-Summary: No comments yet.
Comments-URI: https://github.com/quainetwork/qips/wiki/Comments:QIP-0012
Status: Draft
Type: Standards Track
Created: 2024-04-02
License: BSD-2-Clause
```

## Abstract
This QIP proposes a mechanism to prune dust UTXOs from the ledger.

## Motivation
Transaction outputs in the Qi UTXO ledger occupy space and impose a burden on
UTXO transaction processing through set root recomputations. This is not a
problem under normal usage, but can become a problem if users or wallets are
lazy or irresponsible with UTXO reconsolidation, or worse, if the network
undergoes a spam attack. This QIP describes a protocol change to disincentivize
UTXO state pollution without meaningfully impacting normal transactional
behavior.

## Rationale
Normal Qi wallet behavior will involve some amount of UTXO splitting to create
change outputs for payments, as well as UTXO reconsolidation as the wallet
accumulates too many small change outputs. This QIP should not negatively
impact this expected user behavior, but should impose a cost on lazy wallets
that do not reconsolidate change outputs.

## Specification
We achieve this simply by limiting the size of the UTXO set. If a transaction
creates new UTXOs which would exceed the UTXO set capacity, we destroy the
smallest UTXOs in the ledger. This is practical to do thanks to the fixed
denominations in the Qi ledger.

**Comment:** How do I find the smallest and oldest UTXO in the ledger? The UTXO trie is not organized in a FIFO manner (or any organization except for some key prefixing, as far as I can tell).

**Comment:** Yep, it will involve some filter routine on these events. As an optimization, we can consider indexing by denomination or something like that.

**Comment:** Can you add some detail in the QIP regarding how this might be achieved? I was under the impression that indexing would be optional, not in consensus.

**Comment:** Well, this is a different kind of indexing than the indexers used by the RPC. It could be implemented any number of ways, up to each implementation, so I don't want the QIP to say "this is the way it's done". But for context, here are some ways it could be done.

**Just-In-Time Scanning (not performant):**

```rust
let mut denomination = MAX_DENOMINATION;
let mut delete_list = Vec::new();
// First scan and collect the keys of every UTXO to be deleted
for utxo in set.iter() {
    // Found a new smaller denomination. Reset the scanner.
    if utxo.denomination < denomination {
        denomination = utxo.denomination;
        delete_list.clear();
    }
    // If the utxo matches the smallest denomination, add it to the delete list
    if utxo.denomination == denomination {
        delete_list.push(utxo.key);
    }
}
// Now go back and delete each key we found
for key in delete_list {
    set.delete(key);
}
```

**Keeping Denominations By Index:**

```rust
use std::collections::{HashMap, HashSet};

struct UtxoSet {
    utxos: HashMap<UtxoKey, Utxo>,
    denominations: HashMap<Denomination, HashSet<UtxoKey>>,
}

impl UtxoSet {
    // Add a UTXO to the set, and prune the set if it gets too large
    fn add_utxo(&mut self, utxo: Utxo) {
        // ... make sure it's a valid utxo ...
        // Add to the denomination index
        self.denominations
            .entry(utxo.denomination)
            .or_default()
            .insert(utxo.key);
        // Add to the UTXO set
        self.utxos.insert(utxo.key, utxo);
        // Check if the set is too large, and trigger deletions
        if self.utxos.len() > UTXO_SET_CAPACITY {
            // Find the smallest denomination which still has live UTXOs
            let min_denomination = self
                .denominations
                .iter() // iterate through the index lists
                .filter(|(_, list)| !list.is_empty()) // skip denominations with no existing UTXOs
                .map(|(den, _)| *den) // just look at the denominations
                .min(); // get the smallest denomination
            // Delete every UTXO in the smallest denomination list
            if let Some(min_denomination) = min_denomination {
                let keys: Vec<UtxoKey> = self.denominations[&min_denomination].iter().copied().collect();
                for key in keys {
                    self.delete_utxo(key);
                }
            }
        }
    }

    // Delete a UTXO from the set, and if it existed, delete it from the indexed lists
    fn delete_utxo(&mut self, key: UtxoKey) {
        if let Some(utxo) = self.utxos.remove(&key) {
            if let Some(list) = self.denominations.get_mut(&utxo.denomination) {
                list.remove(&utxo.key);
            }
        }
    }
}
```

The second requires more memory (effectively double the UTXO set), but takes very little time to prune the set. There could be trade-off approaches, e.g. one which only indexes the keys of the smallest denomination, but the logic to get that right is beyond the scope of this thread, lol.

**Comment:** The deletion should also take into account when the UTXO was created, right? I.e. the smallest and oldest, dustiest UTXOs are deleted first.

**Comment:** Perhaps an ordered list should be maintained, organized by FIFO and denomination. It would have to be updated for each block, and perhaps even committed to in the header. Hopefully insertions are no worse than O(log n)...

**Comment:** Yeah, I just gave some quick examples, because you asked. There's a million ways to skin this cat.

Since the network cost we are trying to control is related to the construction
of the UTXO set root, it is sensible to choose a capacity that reflects a
particular depth of the state tree. Since we use a Patricia Merkle trie, the
state root recomputation is bounded by the radix-16 update cost bound:
$O(\log_{16}(N))$, where $N$ is the number of UTXOs in the set.

We set a maximum trie depth of 10, which corresponds to a maximum UTXO set size
of $16^{10} \approx 1$ trillion UTXOs. If a transaction mints new UTXOs which
would exceed the $16^{10}$ limit, the node shall prune all of the smallest
UTXOs from the UTXO set, repeating this process until the UTXO set size is back
within the limit.

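For reference, the capacity implied by a depth-10 radix-16 trie is

$$ 16^{10} = 2^{40} \approx 1.1 \times 10^{12}\ \text{UTXOs}. $$
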
**Comment:** How long would it take to recompute the root of the trie with 1 trillion nodes / depth of 10? Average case for, say, an 8-core CPU? There's a max number of UTXOs that can be emitted and destroyed per block based on the block gas limit, which gives some upper bound, I suppose.

**Comment:** Well, the computation isn't the limit. It's usually the disk IOPS to read/write each trie node that limits you. To strictly answer your question, here's some napkin math: an 8c/16t 4 GHz CPU, assuming keccak takes 15 cycles per byte (source: Wikipedia) and each trie node being up to 17*32 bytes:

But, as I mentioned at the start, this is just the compute component. The dominant cost is actually the IOPS the disk can handle. A high-end SSD tends to get around 45K IOPS, which equates to ~23us per disk access. At 10 trie levels, you need 460us just to read 10 original nodes and write 10 new nodes, as well as 2x23us for the leaf node itself. Let's call it 500us to add a single UTXO to the trie. So we could add 2K UTXOs per second to the trie at level 10. That is the naive implementation. A good implementation will amortize some of those costs with batch operations, but that's beyond the scope of my napkin math. There are also some costs not accounted for here, e.g. time to look up and remove spent UTXOs from the set.

**Comment:** Where did

**Comment:** I assume you mean 1 trillion leaves/accounts, not including all the intermediate trie nodes, right? 10 trie nodes is the number of levels in I thought you were asking for the CPU time, so I gave you some napkin math for that, but I realize now you are just asking for total recomputation time, which again is dominated by IOPS, not CPU performance. There are a LOT of factors that could influence disk access performance (disk speed, how busy the disk is with other software, database in-memory caching / paging strategies, etc.), so it's not reasonable to try and "napkin math" it here. You'd have to benchmark a particular implementation to get an idea.

## Copyright
This QIP is licensed under the BSD 2-clause license.

**Comment:** Should be smallest and oldest.