I'm reading through the design docs, and I have a couple of questions.
First, I wonder what the ballpark speedup of hammersbald is relative to RocksDB.
Second, I'm not very familiar with linear hashing (I probably need to do my homework), but I wonder: wouldn't an LSM tree be better for maintaining the key-to-offset mappings? It seems like linear hashing involves a mutable data structure that (I'm guessing) would need some synchronization here and there. An LSM tree, by contrast, is essentially append-only (just like the rest of the data in hammersbald) and parallelizes well, and both CPUs and modern fast storage seem to be evolving toward heavy parallelization. See https://itnext.io/modern-storage-is-plenty-fast-it-is-the-apis-that-are-bad-6a68319fbc1a & https://crates.io/crates/glommio .

Also, with an LSM tree one could perhaps avoid dealing with a recovery log entirely when it is coupled to blockchain indexing. The way I understand it, an LSM tree can't really corrupt itself (since it's append-only); a crash can only truncate the most recent data. That data is available in the blockchain anyway, so after a crash it can simply be re-inserted.
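To make the "mutable data structure" concern concrete, here is a rough sketch of textbook (Litwin-style) linear hashing as I understand it. This is not hammersbald's code; `LinearHash`, `bucket_of`, and `split_next` are hypothetical names, and keys are assumed to be pre-hashed `u64`s mapped to file offsets. The part that matters is `split_next`: it rewrites an existing bucket in place, which is exactly the kind of mutation that would need synchronization under concurrent access.

```rust
/// Hypothetical textbook linear hashing; NOT hammersbald's implementation.
/// Maps a pre-hashed key (u64) to a file offset (u64).
struct LinearHash {
    level: u32,                    // completed doubling rounds
    split: u64,                    // next bucket to split in this round
    initial: u64,                  // initial bucket count N
    buckets: Vec<Vec<(u64, u64)>>, // (key hash, offset) pairs
}

impl LinearHash {
    fn new(initial: u64) -> Self {
        LinearHash {
            level: 0,
            split: 0,
            initial,
            buckets: (0..initial).map(|_| Vec::new()).collect(),
        }
    }

    /// Address with h mod N*2^level; buckets already split this round
    /// use the next, doubled modulus.
    fn bucket_of(&self, hash: u64) -> usize {
        let n = self.initial << self.level;
        let mut b = hash % n;
        if b < self.split {
            b = hash % (n * 2);
        }
        b as usize
    }

    fn insert(&mut self, hash: u64, offset: u64) {
        let b = self.bucket_of(hash);
        self.buckets[b].push((hash, offset));
    }

    /// Newest entry for a key wins.
    fn get(&self, hash: u64) -> Option<u64> {
        self.buckets[self.bucket_of(hash)]
            .iter()
            .rev()
            .find(|e| e.0 == hash)
            .map(|e| e.1)
    }

    /// Split the bucket at `split`: its entries are redistributed between
    /// the old bucket and a new one at index split + N*2^level. This
    /// in-place rewrite is what concurrent readers would have to be
    /// protected from.
    fn split_next(&mut self) {
        let n = self.initial << self.level;
        let old = self.split as usize;
        self.buckets.push(Vec::new()); // new bucket lands at index old + n
        let entries = std::mem::take(&mut self.buckets[old]);
        for (h, o) in entries {
            let b = (h % (n * 2)) as usize; // either old or old + n
            self.buckets[b].push((h, o));
        }
        self.split += 1;
        if self.split == n {
            // round complete: table has doubled
            self.level += 1;
            self.split = 0;
        }
    }
}
```

Growth happens one bucket at a time rather than by rehashing the whole table, which is the usual selling point of linear hashing; the cost is that each split mutates existing on-disk or in-memory state.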
Edit: Oh, I guess read amplification is what makes LSM a poor fit here. My mindset was too focused on initial blockchain indexing performance (assuming everything was already validated), and I ignored the lookups.
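A toy illustration of the read amplification I mean, assuming a naive LSM with no compaction and no Bloom filters (real LSM engines such as RocksDB use both to reduce this). `ToyLsm` is hypothetical: writes go to an in-memory memtable that is flushed into an immutable sorted run, and a lookup for an old or missing key has to probe every run, newest first.

```rust
use std::collections::BTreeMap;

/// Hypothetical toy LSM: memtable + immutable sorted runs, no compaction.
struct ToyLsm {
    memtable: BTreeMap<u64, u64>,
    runs: Vec<Vec<(u64, u64)>>, // each run sorted by key; index 0 = newest
    memtable_limit: usize,
}

impl ToyLsm {
    fn new(memtable_limit: usize) -> Self {
        ToyLsm { memtable: BTreeMap::new(), runs: Vec::new(), memtable_limit }
    }

    /// Writes only ever append a new run; existing runs are never modified.
    fn put(&mut self, key: u64, offset: u64) {
        self.memtable.insert(key, offset);
        if self.memtable.len() >= self.memtable_limit {
            // BTreeMap iterates in key order, so the flushed run is sorted.
            let run: Vec<(u64, u64)> =
                std::mem::take(&mut self.memtable).into_iter().collect();
            self.runs.insert(0, run);
        }
    }

    /// Returns the offset (if any) and how many sorted runs were probed.
    fn get(&self, key: u64) -> (Option<u64>, usize) {
        if let Some(v) = self.memtable.get(&key) {
            return (Some(*v), 0);
        }
        for (i, run) in self.runs.iter().enumerate() {
            if let Ok(pos) = run.binary_search_by_key(&key, |e| e.0) {
                return (Some(run[pos].1), i + 1);
            }
        }
        // A miss probes every run: worst-case read amplification.
        (None, self.runs.len())
    }
}
```

With 16 sequential inserts and a memtable limit of 4, a lookup of the oldest key probes 4 runs, and a missing key also probes all 4. A hash index like the sketch of linear hashing earlier in this thread resolves a key to a single bucket instead.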