Store metadata in redb #1954
Closed · wants to merge 41 commits

Conversation

@dignifiedquire (Contributor) commented Jan 15, 2024

This stores metadata in redb instead of in the file system. Part of the data can still be inferred from the file system (e.g. which hashes are partial and which are complete), while another part (tags) lives exclusively in the redb database.

The upside is that it reduces the startup delay caused by scanning the file system. Manipulating tags should also be much faster now, since it no longer involves file IO.

The downside is that the file system can now become inconsistent with the metadata, since the information is partly redundant.

Closes #1942 (Slow boot time with a large repo size)
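
For orientation, a minimal sketch of what a tags table in redb can look like (the table name, key/value types, and file name here are assumptions for the example, not the actual schema used in this PR):

```rust
use redb::{Database, ReadableTable, TableDefinition};

// Hypothetical table mapping tag name -> 32-byte blob hash.
const TAGS: TableDefinition<&str, &[u8]> = TableDefinition::new("tags");

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let db = Database::create("meta.redb")?;

    // Creating a tag is a small transactional insert into the database
    // rather than a per-tag file system operation.
    let txn = db.begin_write()?;
    {
        let mut tags = txn.open_table(TAGS)?;
        tags.insert("my-collection", [0u8; 32].as_slice())?;
    }
    txn.commit()?;

    // Reading it back.
    let txn = db.begin_read()?;
    let tags = txn.open_table(TAGS)?;
    if let Some(hash) = tags.get("my-collection")? {
        println!("tag points at hash {:02x?}", hash.value());
    }
    Ok(())
}
```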

@rklaehn self-assigned this Jan 16, 2024
@rklaehn marked this pull request as ready for review January 19, 2024 15:07
@rklaehn changed the title from "[WIP] improve startup time" to "Store metadata in redb" Jan 19, 2024
- Migration code now uses a new directory, blobs-v2
- Integration tests have to sync metadata after changing the file system (and maybe later for use from the iroh CLI)
@rklaehn force-pushed the perf-startup branch 10 times, most recently from ba335df to d2d816d on January 29, 2024 12:45
Version is now tracked within the database.
@rklaehn (Contributor) commented Jan 29, 2024

Update on this whole trainwreck: there wasn't a bug. There is a test helper fn step that waits for a certain number of GC runs before proceeding. But it did not drain the event queue first, so sometimes it would not wait at all because old GcCompleted events were still sitting in the queue. This is a normal flume queue, not a broadcast queue that forgets messages.
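
A minimal sketch of the corrected helper, assuming a flume Receiver of some event enum with a GcCompleted variant (everything besides the names step and GcCompleted is made up for the example):

```rust
use flume::Receiver;

// Hypothetical event type; the real test listens to whatever the store emits.
enum Event {
    GcCompleted,
    Other,
}

/// Wait until `n` garbage collections have completed.
fn step(events: &Receiver<Event>, n: usize) {
    // Drain anything already queued first. flume keeps every message for its
    // receivers, so stale GcCompleted events from earlier runs would
    // otherwise satisfy the wait immediately.
    while events.try_recv().is_ok() {}

    let mut completed = 0;
    while completed < n {
        match events.recv() {
            Ok(Event::GcCompleted) => completed += 1,
            Ok(Event::Other) => {}
            Err(_) => break, // all senders dropped
        }
    }
}
```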

@dignifiedquire (Contributor, Author)

I guess the bug was in the test helper 😅

@rklaehn (Contributor) commented Jan 30, 2024

OK, I have second thoughts about merging this as is. It completely removes the outboard cache, thereby completely changing the performance characteristics of the store.

The former version of the flat store had a consistent concept. It might not have been perfect, but it worked for a large set of use cases. This is some weird hybrid between the former concept and something else.

@dignifiedquire (Contributor, Author)

This outboard cache was a pretty big footgun, given that it wasn't limited in size as far as I can tell, so if we bring it back it should at a minimum be size limited.

@rklaehn (Contributor) commented Jan 30, 2024

> This outboard cache was a pretty big footgun, given that it wasn't limited in size as far as I can tell, so if we bring it back it should at a minimum be size limited.

I don't think so. Outboards are 1/256 of the data size, so for a 1 TiB disk, even if you had all outboards in memory, it would be just 4 GiB. OK, it could become a problem if you have a Raspberry Pi 3 with a giant external hard drive...

But for any real world app even on mobile it would be fine.

Part of the reason for the whole bao-tree crate and the chosen chunk group size of 16 chunks was that you would be able to hold the outboards in memory even on a small device. With the original bao crate the ratio would have been 1/16, which would have been way too much.
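
For reference, the arithmetic behind the 1/256 and 1/16 figures, assuming 1 KiB chunks and 64 bytes (two 32-byte hashes) stored per tree node:

```rust
fn main() {
    const CHUNK: u64 = 1024; // bytes per chunk
    const NODE: u64 = 64;    // two 32-byte hashes per stored node

    // bao-tree with chunk groups of 16 chunks: one node per 16 KiB of data.
    println!("1/{}", (16 * CHUNK) / NODE); // 1/256

    // original bao, one node per 1 KiB chunk: much larger overhead.
    println!("1/{}", CHUNK / NODE); // 1/16

    // outboards for a completely full 1 TiB disk at 1/256 overhead:
    let one_tib: u64 = 1 << 40;
    println!("{} GiB", (one_tib / 256) >> 30); // 4 GiB
}
```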

@ppodolsky (Contributor) commented Jan 31, 2024

> I don't think so. Outboards are 1/256 of the data size, so for a 1 TiB disk, even if you had all outboards in memory, it would be just 4 GiB. OK, it could become a problem if you have a Raspberry Pi 3 with a giant external hard drive...

Ha-ha, that's exactly me: an 8 GB RAM Pi with a 200 TB HDD attached :D Seems I'm quite an unlucky customer.
Might it be worth using some sort of LRU for that cache?
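
For illustration, a rough sketch of such a size-limited LRU, bounded by total bytes held rather than entry count (not code from this PR; all names and the eviction policy are made up):

```rust
use std::collections::{HashMap, VecDeque};

type Hash = [u8; 32];

/// Caches outboards up to a total byte budget, evicting least recently used.
struct OutboardCache {
    max_bytes: usize,
    total_bytes: usize,
    entries: HashMap<Hash, Vec<u8>>,
    order: VecDeque<Hash>, // front = least recently used
}

impl OutboardCache {
    fn new(max_bytes: usize) -> Self {
        Self { max_bytes, total_bytes: 0, entries: HashMap::new(), order: VecDeque::new() }
    }

    fn get(&mut self, hash: &Hash) -> Option<&[u8]> {
        if self.entries.contains_key(hash) {
            // mark as most recently used
            self.order.retain(|h| h != hash);
            self.order.push_back(*hash);
        }
        self.entries.get(hash).map(|v| v.as_slice())
    }

    fn insert(&mut self, hash: Hash, outboard: Vec<u8>) {
        self.total_bytes += outboard.len();
        if let Some(old) = self.entries.insert(hash, outboard) {
            self.total_bytes -= old.len();
            self.order.retain(|h| h != &hash);
        }
        self.order.push_back(hash);
        // evict least recently used entries until we are back under budget
        while self.total_bytes > self.max_bytes {
            let Some(evict) = self.order.pop_front() else { break };
            if let Some(buf) = self.entries.remove(&evict) {
                self.total_bytes -= buf.len();
            }
        }
    }
}
```

Something like OutboardCache::new(64 << 20) would then cap cached outboards at 64 MiB regardless of how large the store is.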

@rklaehn (Contributor) commented Jan 31, 2024

> I don't think so. Outboards are 1/256 of the data size, so for a 1 TiB disk, even if you had all outboards in memory, it would be just 4 GiB. OK, it could become a problem if you have a Raspberry Pi 3 with a giant external hard drive...

> Ha-ha, that's exactly me: an 8 GB RAM Pi with a 200 TB HDD attached :D Seems I'm quite an unlucky customer. Might it be worth using some sort of LRU for that cache?

OK, seriously, thank you for being such a demanding customer.

I am working on a larger refactoring. Basically, I want to always store the data for small files inline in redb, and to also store small outboards in redb once they are complete. This should reduce both the number of files and the number of file operations for many use cases.

Having the data for small files and the outboards for small-to-medium files in redb would provide a kind of size-limited cache for those outboards, namely the redb cache itself. For larger files I would then rely on the operating system's file system cache.
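
A rough sketch of the kind of size-threshold logic described above (the constants and names are placeholders, not what the follow-up PR actually uses):

```rust
/// Where the bytes for a blob (or its outboard) live.
enum Location {
    /// Small enough to be stored inline in the redb metadata database.
    Inline,
    /// Stored as a file on disk; repeated reads go through the OS page cache.
    File,
}

// Placeholder thresholds for the example.
const MAX_INLINE_DATA: u64 = 16 * 1024;
const MAX_INLINE_OUTBOARD: u64 = 16 * 1024;

fn data_location(data_size: u64) -> Location {
    if data_size <= MAX_INLINE_DATA { Location::Inline } else { Location::File }
}

fn outboard_location(data_size: u64) -> Location {
    // With 16-chunk groups the outboard is roughly data_size / 256.
    let outboard_size = data_size / 256;
    if outboard_size <= MAX_INLINE_OUTBOARD { Location::Inline } else { Location::File }
}
```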

Could you do me a favour and compute some sort of histogram of your file sizes, or just do something like

stat -f "%N %z" * # (Mac)
stat -c "%n %s" * # (Linux)

and send me the result?

That would help a lot.

Here is the new PR, building on this one: #1985

@ppodolsky (Contributor) commented Jan 31, 2024

I have done this: find . -type f -print0 | xargs -0 ls -l | awk '{size[int(log($5)/log(2))]++}END{for (i in size) printf("%10d %3d\n", 2^i, size[i])}' | sort -n

         0  11
        32   2
       256   2
       512   1
      2048  78
      4096 983
      8192 2370
     16384 24614
     32768 31862
     65536 61111
    131072 111704
    262144 148639
    524288 160883
   1048576 132321
   2097152 87118
   4194304 51952
   8388608 24728
  16777216 9846
  33554432 4255
  67108864 1883
 134217728 561
 268435456 106
 536870912  45
1073741824  12

@dignifiedquire (Contributor, Author)

Closing in favor of the ongoing PRs that refactor the store in a more step-by-step fashion.

@rklaehn deleted the perf-startup branch April 10, 2024 15:00