Store metadata in redb #1954
Conversation
- Also implement scanning from disk to the db.
- Also handle the fallibility properly (no unwrap) in the gc part.
- Migration code now uses a new directory, blobs-v2.
- Integration tests have to sync metadata after changing the file system.
(and maybe later for use from iroh cli)
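The "scanning from disk to db" step could look roughly like the sketch below: walk the blobs directory and rebuild a metadata map from what is actually on disk. This is a hypothetical illustration, not iroh's actual code; the file-name-as-hash layout, the `.obao` suffix, and the `Meta` struct are made up, and a plain `HashMap` stands in for the redb tables.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

// Stand-in for the per-hash metadata that would live in redb.
#[derive(Debug, Default)]
struct Meta {
    data_size: Option<u64>,
    has_outboard: bool,
}

// Rebuild metadata by scanning the blobs directory. Layout is illustrative:
// data files are named by hash, outboards get an ".obao" suffix.
fn scan(dir: &Path) -> io::Result<HashMap<String, Meta>> {
    let mut db: HashMap<String, Meta> = HashMap::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let name = entry.file_name().into_string().unwrap_or_default();
        if let Some(hash) = name.strip_suffix(".obao") {
            db.entry(hash.to_string()).or_default().has_outboard = true;
        } else {
            let size = entry.metadata()?.len();
            db.entry(name).or_default().data_size = Some(size);
        }
    }
    Ok(db)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("blobs-v2-scan-demo");
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("abc123"), b"hello")?;
    fs::write(dir.join("abc123.obao"), b"")?;
    let db = scan(&dir)?;
    assert_eq!(db["abc123"].data_size, Some(5));
    assert!(db["abc123"].has_outboard);
    println!("scanned {} entries", db.len());
    Ok(())
}
```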
Force-pushed from ba335df to d2d816d
Version is now tracked within the database.
Update on this whole trainwreck: there wasn't a bug. I have this helper fn `step` that waits for a certain number of GCs before proceeding. But it did not drain the queue first, so sometimes it would not wait at all because there were old GcCompleted events still sitting in the queue. This is a normal flume queue, not a broadcast queue that forgets messages.
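The fixed helper could be reconstructed roughly as below: drain stale events first, then count fresh GC completions. This is a hypothetical sketch, not the actual test code; the thread mentions flume, but `std::sync::mpsc` has the same `try_recv`/`recv` shape and keeps the example self-contained. The `Event` enum is illustrative.

```rust
use std::sync::mpsc;

#[derive(Debug, PartialEq)]
enum Event {
    GcCompleted,
    Other,
}

// Throw away everything already sitting in the queue, returning the count.
fn drain(events: &mpsc::Receiver<Event>) -> usize {
    let mut n = 0;
    while events.try_recv().is_ok() {
        n += 1;
    }
    n
}

// Block until `n` GcCompleted events arrive (call `drain` first, or stale
// events from earlier GC runs will make this return too early).
fn wait_for_gc(events: &mpsc::Receiver<Event>, n: usize) {
    let mut completed = 0;
    while completed < n {
        if let Ok(Event::GcCompleted) = events.recv() {
            completed += 1;
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(Event::GcCompleted).unwrap(); // stale, from an earlier GC run
    assert_eq!(drain(&rx), 1); // without this, wait_for_gc would count it
    tx.send(Event::Other).unwrap();
    tx.send(Event::GcCompleted).unwrap();
    wait_for_gc(&rx, 1);
    println!("waited for a fresh GC pass");
}
```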
I guess the bug was in the test helper 😅
OK, I have second thoughts about merging this as is. It completely removes the outboard cache, thereby completely changing the performance characteristics of the thing. The former version of the flat store had a consistent concept. It might not have been perfect, but it worked for a large set of use cases. This is some weird hybrid between the former concept and something else.
This outboard cache was a pretty big footgun given it wasn't limited in size, as far as I can tell, so if we bring it back it should at minimum be size-limited.
I don't think so. Outboards are 1/256 of the data size, so for a 1 TiB disk, even if you had all outboards in memory, it would be just 4 GiB. OK, it could become a problem if you have a Raspberry Pi 3 with a giant external hard drive... But for any real-world app, even on mobile, it would be fine. Part of the reason for the whole bao-tree crate and the chosen chunk group size of 16 chunks was that you would be able to hold the outboards in memory even on a small device. With the original bao crate it would have been 1/16, which would have been way too much.
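The ratios above can be sanity-checked with some quick arithmetic, assuming one 64-byte pair of hashes per chunk group: a 16-chunk (16 KiB) group gives a 1/256 overhead, while a single 1 KiB chunk (the original bao layout) gives 1/16. A rough sketch:

```rust
// Back-of-envelope outboard overhead, assuming 64 bytes of hashes per
// chunk group of the given size.
fn outboard_overhead(chunk_group_bytes: u64) -> f64 {
    64.0 / chunk_group_bytes as f64
}

fn main() {
    // 16-chunk groups (bao-tree default mentioned above): 1/256.
    assert_eq!(outboard_overhead(16 * 1024), 1.0 / 256.0);
    // Single-chunk groups (original bao): 1/16.
    assert_eq!(outboard_overhead(1024), 1.0 / 16.0);
    // 1 TiB of data at 1/256 -> 4 GiB of outboards.
    let tib: u64 = 1 << 40;
    println!("{} GiB", (tib / 256) >> 30); // prints "4 GiB"
}
```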
Ha-ha, exactly me. 8 GB RAM Pi with an attached 200 TB HDD :D Seems I'm quite an unlucky customer.
OK, seriously, thank you for being such a demanding customer. I am working on a larger refactoring. Basically I want to always store small files inline in redb, and also store small outboards in redb once they are complete. This should reduce both the number of files and the number of file operations for many use cases. Having the data for small files and the outboards for small-to-medium files in redb would provide a kind of size-limited cache for those outboards, namely the redb cache. And for larger files I would then rely on the file system cache of the operating system. Could you do me a favour and compute some sort of histogram of your file sizes, or just do something like

```
stat -f "%N %z" *  # macOS
stat -c "%n %s" *  # Linux
```

and send me the result? That would help a lot. Here is the new PR, building on this one: #1985
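The inlining policy described above might be sketched as a simple size-based decision: small data and small completed outboards go inline into redb, everything else stays on the file system (backed by the OS page cache). This is a hypothetical illustration; the 16 KiB thresholds and all names are made up, not iroh's actual values.

```rust
// Illustrative thresholds, not iroh's real configuration.
const MAX_INLINE_DATA: u64 = 16 * 1024;
const MAX_INLINE_OUTBOARD: u64 = 16 * 1024;

#[derive(Debug, PartialEq)]
enum Location {
    InlineRedb,
    FileSystem,
}

// Small data is stored inline in redb, large data as a file.
fn data_location(size: u64) -> Location {
    if size <= MAX_INLINE_DATA {
        Location::InlineRedb
    } else {
        Location::FileSystem
    }
}

// Only complete outboards are inlined; partial ones stay on disk
// while they are still being written.
fn outboard_location(outboard_size: u64, complete: bool) -> Location {
    if complete && outboard_size <= MAX_INLINE_OUTBOARD {
        Location::InlineRedb
    } else {
        Location::FileSystem
    }
}

fn main() {
    assert_eq!(data_location(1024), Location::InlineRedb);
    assert_eq!(data_location(10 << 20), Location::FileSystem);
    assert_eq!(outboard_location(512, true), Location::InlineRedb);
    assert_eq!(outboard_location(512, false), Location::FileSystem);
    println!("policy ok");
}
```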
I have done this:
Closing in favor of ongoing PRs refactoring the store in a more step-by-step fashion.
This stores metadata in redb instead of in the file system. Part of the metadata can be inferred from the file system (e.g. partial and complete hashes); another part exists exclusively in the redb database (tags).
The upside is that this reduces startup delay, since the file system no longer has to be scanned. Also, manipulating tags should be much faster now, since it no longer involves IO.
The downside is that now the file system can become inconsistent with the metadata, since there is redundant information.
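One consequence of that redundancy is that stale metadata entries can point at data files that no longer exist. A consistency check could look roughly like the sketch below; this is a hypothetical illustration where a `HashMap` stands in for the redb metadata table, and all names are made up.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::Path;

// Return metadata entries whose data file is missing from disk.
// The map from hash to size stands in for the redb metadata table.
fn stale_entries(meta: &HashMap<String, u64>, dir: &Path) -> Vec<String> {
    meta.keys()
        .filter(|hash| !dir.join(hash.as_str()).exists())
        .cloned()
        .collect()
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("blobs-v2-check-demo");
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("present"), b"data")?;
    let mut meta = HashMap::new();
    meta.insert("present".to_string(), 4u64);
    meta.insert("missing".to_string(), 9u64);
    let stale = stale_entries(&meta, &dir);
    assert_eq!(stale, vec!["missing".to_string()]);
    println!("{} stale entries", stale.len());
    Ok(())
}
```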
Closes #1942