qip-0012: Qi UTXO Pruning #36
base: master
Conversation
We achieve this simply, by limiting the size of the UTXO set. If a transaction
creates new UTXOs which would exceed the UTXO set capacity, we destroy the
smallest UTXOs in the ledger. This is practical to do thanks to the fixed
denominations in the Qi ledger.
How do I find the smallest and oldest UTXO in the ledger? The UTXO trie is not organized in a FIFO manner (or any organization except for some key prefixing, as far as I can tell)
Yep, it will involve some filter routine on these events. As an optimization, we can consider indexing by denomination or something like that
Can you add some detail in the QIP regarding how this might be achieved? I was under the impression that indexing would be optional, not in consensus
Well, this is a different kind of indexing than the indexers used by the RPC. It could be implemented any number of ways, up to each implementation, so I don't want the QIP to say "this is the way it's done". But for context, here are some ways it could be done:
Just-In-Time Scanning (not performant):
let mut denomination = MAX_DENOMINATION;
let mut delete_list = Vec::new();
// First pass: scan the set and collect the keys of every UTXO to be deleted
for utxo in set.iter() {
    // Found a new smallest denomination; restart collection from here
    if utxo.denomination < denomination {
        denomination = utxo.denomination;
        delete_list.clear();
    }
    // If the UTXO matches the smallest denomination seen so far, mark it
    if utxo.denomination == denomination {
        delete_list.push(utxo.key);
    }
}
// Second pass: delete each key you found
for key in delete_list {
    set.delete(key);
}
Keeping Denominations By Index:
use std::collections::{HashMap, HashSet};

// Sketch code: assumes `UtxoKey` and `Denomination` are small Copy/Ord types
struct UtxoSet {
    utxos: HashMap<UtxoKey, Utxo>,
    denominations: HashMap<Denomination, HashSet<UtxoKey>>,
}

impl UtxoSet {
    // Add a UTXO to the set, and prune the set if it grows too large
    fn add_utxo(&mut self, utxo: Utxo) {
        // ... make sure it's a valid UTXO ...
        let key = utxo.key;
        let denomination = utxo.denomination;
        // Add to the UTXO set
        self.utxos.insert(key, utxo);
        // Add to the denomination index
        self.denominations.entry(denomination).or_default().insert(key);
        // Check if the set is too large, and trigger deletions
        if self.utxos.len() > UTXO_SET_CAPACITY {
            // Find the smallest denomination in the set
            let min_denomination = self.denominations
                .iter() // iterate through the index lists
                .filter(|(_, list)| !list.is_empty()) // skip denominations with no existing UTXOs
                .map(|(den, _)| *den) // just look at the denominations
                .min(); // get the smallest denomination
            // Delete every UTXO in the smallest denomination list
            if let Some(den) = min_denomination {
                let keys: Vec<UtxoKey> = self.denominations[&den].iter().copied().collect();
                for key in keys {
                    self.delete_utxo(key);
                }
            }
        }
    }

    // Delete a UTXO from the set
    fn delete_utxo(&mut self, key: UtxoKey) {
        // Delete it from the set, and if it existed, delete it from the index
        if let Some(utxo) = self.utxos.remove(&key) {
            if let Some(list) = self.denominations.get_mut(&utxo.denomination) {
                list.remove(&key);
            }
        }
    }
}
The second requires more memory (effectively double the UTXO set), but takes very little time to prune the set.
There could be trade-off approaches, e.g. one which only indexes the keys of the smallest denomination, but the logic to get that right is beyond the scope of this thread, lol
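Purely for illustration, one rough shape that smallest-denomination-only index could take (reusing the `Denomination` and `UtxoKey` types from the sketch above, with the same Copy/Ord assumptions; the `on_add`/`on_remove` hooks are hypothetical):

use std::collections::HashSet;

// Illustrative sketch only: track just the keys of the current smallest
// denomination, so memory stays proportional to that one bucket
struct MinDenomIndex {
    min_denomination: Denomination,
    keys: HashSet<UtxoKey>,
}

impl MinDenomIndex {
    // Hypothetical hook, called whenever a UTXO enters the set
    fn on_add(&mut self, denomination: Denomination, key: UtxoKey) {
        if denomination < self.min_denomination {
            // A smaller denomination appeared: the old index is obsolete
            self.min_denomination = denomination;
            self.keys.clear();
        }
        if denomination == self.min_denomination {
            self.keys.insert(key);
        }
    }

    // Hypothetical hook, called whenever a UTXO is spent or pruned.
    // Returns true when the bucket drains, meaning the caller must rescan
    // (as in the just-in-time example) to find the next smallest denomination.
    fn on_remove(&mut self, denomination: Denomination, key: UtxoKey) -> bool {
        if denomination == self.min_denomination {
            self.keys.remove(&key);
        }
        self.keys.is_empty()
    }
}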
The deletion should also take into account when the UTXO was created, right? i.e. the smallest and oldest, dustiest UTXOs are deleted first
Perhaps an ordered list should be maintained, organized by FIFO and denomination. It would have to be updated for each block, and perhaps even committed to in the header. Hopefully insertions are no worse than O(log n)...
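For concreteness, a minimal sketch of such an ordered structure, assuming `Denomination` and `UtxoKey` implement `Ord` and using the creation block height as the FIFO tiebreaker (the names here are illustrative, not part of the QIP):

use std::collections::BTreeSet;

// Entries sort by denomination first, then creation height, then key,
// so the first element is always the smallest, oldest UTXO
struct PruneQueue {
    queue: BTreeSet<(Denomination, u64, UtxoKey)>,
}

impl PruneQueue {
    // O(log n) insertion when a UTXO is created at block `height`
    fn insert(&mut self, denomination: Denomination, height: u64, key: UtxoKey) {
        self.queue.insert((denomination, height, key));
    }

    // Pop the next pruning candidate (BTreeSet::pop_first, Rust 1.66+)
    fn pop_dustiest(&mut self) -> Option<(Denomination, u64, UtxoKey)> {
        self.queue.pop_first()
    }
}

Both operations stay within the hoped-for O(log n) bound, and the in-order contents are deterministic, so they could in principle feed a per-block commitment if the structure were committed to in the header.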
Yeah, I just gave some quick examples because you asked. There are a million ways to skin this cat.
We set a max trie depth of 10, which corresponds to a max UTXO set size of
$16^{10} \approx 1$ trillion UTXOs. If a transaction mints new UTXOs which exceed
the $16^{10}$ limit, the node shall prune all of the smallest UTXOs from the UTXO
How long would it take to recompute the root of the trie with 1 trillion nodes/depth of 10? Average case for, say, an 8-core CPU? There's a max number of UTXOs that can be emitted and destroyed per block based on the block gas limit, which gives some upper bound, I suppose.
Well, the computation isn't the limit. It's usually the disk IOPS to read and write each trie node that limits you. To strictly answer your question, here's some napkin math:
Take an 8c/16t 4GHz CPU, assume keccak takes 15 cycles per byte (source: Wikipedia), and each trie node is up to 17*32 bytes:
17 * 32 * 15 = 8160 cycles / trie node hash
4GHz / 8160 cycles ≈ 490K node hashes/s per thread
490K hashes/s * 16 threads ≈ 7.8M node hashes/s
10 node hashes / 7.8M hashes/s ≈ 1.3us / root calculation
But, as I mentioned at the start, this is just the compute component. The dominant cost is actually the IOPS the disk can handle. A high-end SSD tends to get around 45K IOPS, which equates to ~23us per disk access. At 10 trie levels, you need 460us just to read 10 original nodes and write 10 new nodes, plus 2x23us for the leaf node itself. Let's call it 500us to add a single UTXO to the trie. So we could add ~2K UTXOs per second to the trie at depth 10.
That is the naive implementation. A good implementation will amortize some of those costs with batch operations, but that's beyond the scope of my napkin math. There are also some costs not accounted for here, e.g. the time to look up and remove spent UTXOs from the set.
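For anyone who wants to poke at the numbers, here is the same arithmetic as a tiny Rust program (every constant — the 4GHz clock, 15 cycles/byte, 45K IOPS — is just the rough assumption from above):

fn main() {
    // CPU side: keccak over a full branch node (17 children * 32 bytes)
    let cycles_per_node_hash: u64 = 17 * 32 * 15; // 8160 cycles
    let hashes_per_thread = 4_000_000_000u64 / cycles_per_node_hash; // ~490K/s
    let hashes_total = hashes_per_thread * 16; // ~7.8M/s across 16 threads
    let cpu_us_per_root = 10.0 * 1_000_000.0 / hashes_total as f64; // ~1.3us, 10 levels

    // Disk side: ~45K IOPS; 10 reads + 10 writes + 2 accesses for the leaf
    let us_per_io = 1_000_000.0 / 45_000.0; // ~22-23us per access
    let io_us_per_utxo = 22.0 * us_per_io; // ~500us per inserted UTXO

    println!("CPU: ~{cpu_us_per_root:.1}us per root, IO: ~{io_us_per_utxo:.0}us per UTXO");
    println!("=> ~{:.0} UTXOs/s at depth 10", 1_000_000.0 / io_us_per_utxo);
}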
Where did "10 tries" come from? I'm curious about the length of time to recompute the root of a PMT with a trillion elements. Shouldn't the computation use 1 trillion trie nodes?
I assume you mean 1 trillion leaves/accounts, not including all the intermediate trie nodes, right?
10 trie nodes is the number of levels in the trie: recomputing the root only rehashes the one node per level along the changed path, not all trillion nodes.
I thought you were asking for the CPU time, so I gave you some napkin math for that, but I realize now you are just asking for total recomputation time, which again is dominated by IOPS, not CPU performance.
There are a LOT of factors that could influence disk access performance (disk speed, how busy the disk is with other software, database in-memory caching/paging strategies, etc.), so it's not reasonable to try and "napkin math" it here. You'd have to benchmark a particular implementation to get an idea.
## Specification
We achieve this simply, by limiting the size of the UTXO set. If a transaction
creates new UTXOs which would exceed the UTXO set capacity, we destroy the
smallest UTXOs in the ledger. This is practical to do thanks to the fixed
Should be smallest and oldest
Similarly to QIP 11, the gas cost to create a new UTXO should increase as the size of the set grows. Once we've hit the limit, it is sensible to destroy small and old UTXOs, but it should also be more expensive to create new ones. The total cost of a transaction can be offset by destroying UTXOs (by using them as inputs to the transaction).
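As a strawman for how those two pressures could combine (the base cost and the linear curve below are placeholders, not anything specified by this QIP or QIP 11):

// Placeholder constant; a real value would be set by the QIP
const BASE_UTXO_CREATION_GAS: u64 = 1_000;

// Creation gas grows with set fullness, and destroying inputs earns a
// refund, so a transaction consuming as many UTXOs as it creates nets out
fn utxo_gas_cost(set_size: u64, capacity: u64, created: u64, destroyed: u64) -> u64 {
    // e.g. linear scaling: cost doubles as the set approaches capacity
    let scaled = BASE_UTXO_CREATION_GAS + BASE_UTXO_CREATION_GAS * set_size / capacity;
    let cost = created * scaled;
    let refund = destroyed * BASE_UTXO_CREATION_GAS;
    cost.saturating_sub(refund)
}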