I suggest you read the borg internals docs to get more information about how borg works, and also the terminology used. It's quite a bit different from how you think it works. What you think is an "archive file" isn't: it is a so-called segment file, and it can contain all sorts of chunks: file content chunks, archive metadata stream chunks, manifest chunks. If you delete such a file, the damage, and what `borg check --repair` can do about it, depends on what was actually in there.

Also, if an old borg version is involved (< 1.2, always compacting) or if a new borg version ran

Not sure what you mean by "non-synchronous repo backup copies". See our FAQ on why we do not recommend making repo copies (e.g. with rsync). But of course, if you do, you need to do it while no borg process is modifying the repo, to get a consistent state.

Redundancy: a borg repo is a content-addressable key/value store; the key (== address) is H(value) (== H(content)), so it can't store the same value multiple times. That's how the deduplication works, and (among other reasons) why you should have multiple backups at different places. Having multiple repos able to repair each other (assuming a chunk got lost and some other repo still has a copy of that chunk): this will be possible in borg2, where the concept of "related repos" exists (making sure chunks get cut the same way, and H does the same computation). Even there, it is not yet implemented as a command.
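The content-addressed model described above can be sketched roughly like this. This is a toy illustration, not borg's actual code: `sha256` merely stands in for whatever keyed hash H borg uses, and `ChunkStore` is a hypothetical name.

```python
import hashlib

class ChunkStore:
    """Toy content-addressable key/value store: the key IS H(value),
    so storing the same content twice is a no-op (deduplication)."""

    def __init__(self):
        self.chunks = {}  # H(content) -> content

    def put(self, content: bytes) -> bytes:
        key = hashlib.sha256(content).digest()
        self.chunks.setdefault(key, content)  # already present? keep the one copy
        return key

    def get(self, key: bytes) -> bytes:
        return self.chunks[key]

store = ChunkStore()
k1 = store.put(b"same chunk data")
k2 = store.put(b"same chunk data")
assert k1 == k2 and len(store.chunks) == 1  # identical content stored only once
```

Because the key is derived from the content, two repos can only deduplicate or repair against each other if they cut chunks at the same boundaries and compute H the same way, which is what the "related repos" concept in borg2 is about.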
This became quite lengthy/dense to follow, so I've posted it here for discussion rather than as an issue, since I also wasn't sure whether I was misunderstanding something in my conclusions. It involves two sets of tests with a small data set, each of which shows slightly different behavior.
It's unclear to what extent (if any) Borg has redundancy for metadata: the history of valid chunk checksums, and the file-path mapping (that is, which files are associated with the data in which chunks). That raised some concerns, noted at the bottom, for the case where one has made their own non-synchronous repo backup copies but needs to repair a primary repo.
I realize Borg doesn't have any redundancy for the data per se, but that's expected.
Test 1:
Result: only the latest backup contained any files. All other mounted backup directories were entirely empty (not even any nested sub-directory paths from the source data, such as `/home/`, etc.). It would seem that, among the 64 archive files in `data`, all info about those five files was contained solely in that single largest ~460KB archive file. At least, judging from the results.

Then I reverted to an earlier valid copy of the repo, from before I deleted the archive file for the test above, and added 6 extra, different PNG files to the source data (totalling 7.6MB of extra filesize).
Test 2:
Result: when using `mount` to check the backups, I found the snapshots from before adding the extra PNG files were entirely empty (no PNG or TXT files), as in the prior test. However, all the snapshots from after the point I added the new PNG files did contain the complete set of files, despite my having deleted what seemed like the equivalent archive file.

I'm not sure what explains this discrepancy: I would have thought that if the original PNGs were 'healed' in this latter test, they would also have appeared in the prior snapshots; or, if they weren't healed (as in the former test), that all snapshots would lack the files.
Either way, what interested me was whether Borg could have a way of adding redundancy for metadata specifically, so that copies of chunk checksums and file-association mappings could be spread across more archive files, giving `--repair` healing more opportunity to work.

From the tests it's unclear how Borg handles this, since the second test was able to heal the effect of the removed archive file for some backups but not for others (the entirely empty ones), while the first test with 64 archive files was un-healable after just one went corrupt/missing.
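One way to picture the partial healing seen in the second test (again a toy model under my own assumptions, not borg internals): snapshots reference chunks only by hash, so if the content of a lost chunk is backed up again later from the source data, it lands under the same key, and earlier references to it resolve again.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

chunks = {}  # the repo's chunk store: H(content) -> content

def backup(content: bytes) -> bytes:
    """Store a chunk; a snapshot keeps only the hash as a reference."""
    key = H(content)
    chunks.setdefault(key, content)
    return key

png = b"PNG file contents"
ref = backup(png)        # snapshot 1 references the chunk by hash

del chunks[ref]          # simulate losing the segment file holding that chunk
assert ref not in chunks # snapshot 1 is now broken for this file

backup(png)              # a later backup of the same (unchanged) source data
assert chunks[ref] == png  # same content -> same key: the old reference works again
```

This would heal snapshots whose missing chunks happen to be re-added by later backups, but not chunks (or metadata) whose content never reappears in the source data, which might account for some snapshots staying empty.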
To clarify, the data loss per se isn't the concern; it's the inability, in some cases, to heal chunks in snapshots using data restored to the repo via new snapshots, after limited whole-archive-file corruption.
Even if one had a secondary, non-synchronous repo copy (either via periodic syncing, or a separate repo ID based on the same source data, as the docs suggest), the issue I'm picturing is that if it's not a perfect sync, the repo archive files wouldn't be interchangeable should something bad happen to either (or am I mistaken?). That would put the onus on the user to run diff checks across multiple snapshots to determine what was absent, rather than being able to identify missing files automatically via `--repair` and restore them easily.

(Apologies for how verbose this ended up being, btw!)