[Proposal] Change zstd compression levels mapping #643
Conversation
Since background_compression happens, well, in the background, it's not as important for it to be fast, and it makes sense to let the system use spare background cycles getting the maximum possible compression for very infrequently accessed files. As such, being able to access level 22 for this purpose would be nice and I'd appreciate not losing it. If we wanted a piecewise-linear mapping with a slope of 1 at (0,0) and a slope of 2 at (15,22), the two segments would meet at exactly (8,8) (solving 22 - a = 2(15 - a) gives a = 8), giving one extra point to the lower range.
I noticed that piecewise-linear mapping while experimenting with it, and I do think that's a good option. The only issue I have with it is that zstd levels 20-22 might want to be considered separately.
Because of that, I think level 19, the maximum non-ultra level, should be included in the mapping. As such, I would propose using the mapping points (0,0) and (14,19), giving an intersection point of (9,9) (solving 19 - a = 2(14 - a) gives a = 9). The top level 15 would then be mapped to the top ultra level 22, as sketched below. The only possible issue I see is that it adds additional complexity to the mapping.
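A minimal C sketch of that proposed mapping, assuming a hypothetical helper name (bch2_zstd_level is not taken from the bcachefs source):

```c
/*
 * Hypothetical sketch of the piecewise mapping proposed above, not the
 * actual bcachefs code: levels 1-9 map directly to zstd levels 1-9,
 * levels 10-14 advance two zstd levels per step up to the maximum
 * non-ultra level 19, and the top level 15 maps to the top ultra
 * level 22.
 */
static unsigned bch2_zstd_level(unsigned level)
{
	if (level <= 9)
		return level;               /* slope 1: (0,0) to (9,9) */
	if (level <= 14)
		return 9 + (level - 9) * 2; /* slope 2: (9,9) to (14,19) */
	return 22;                          /* 15 maps to the top --ultra level */
}
```

For the intermediate levels this gives 10->11, 11->13, 12->15, 13->17 and 14->19, so every bcachefs level still maps to a distinct zstd level.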
This prevents going emergency read-only when the user has specified replicas_required > replicas. Signed-off-by: Kent Overstreet <[email protected]>
We don't want journal write completions to be blocked behind btree transactions - io_complete_wq is used for btree updates after data and metadata writes. Signed-off-by: Kent Overstreet <[email protected]>
Previously, any time we failed to get a journal reservation we'd retry, with the journal lock held; but this isn't necessary given wait_event()/wake_up() ordering. This avoids performance cliffs when the journal starts to get backed up and lock contention shoots up. Signed-off-by: Kent Overstreet <[email protected]>
For DT_SUBVOL, we now print both parent and child subvol IDs. Signed-off-by: Kent Overstreet <[email protected]>
Most bcachefs workqueues are used for completions, and should be WQ_HIGHPRI - this helps reduce queuing delays, we want to complete quickly once we can no longer signal backpressure by blocking. Signed-off-by: Kent Overstreet <[email protected]>
Minor renaming for clarity, bit of refactoring. Signed-off-by: Kent Overstreet <[email protected]>
Drop an unnecessary bch2_subvolume_get_snapshot() call, and drop the __ from the name - this is a normal interface. Signed-off-by: Kent Overstreet <[email protected]>
Eliminates some error paths - no longer have a hardcoded BCH_REPLICAS_MAX limit. Signed-off-by: Kent Overstreet <[email protected]>
This gives us a way to record the date and time every journal entry was written - useful for debugging. Signed-off-by: Kent Overstreet <[email protected]>
Prep work for having multiple journal writes in flight. Signed-off-by: Kent Overstreet <[email protected]>
Prep work for having multiple journal writes in flight. Signed-off-by: Kent Overstreet <[email protected]>
Recently a severe performance regression was discovered, which bisected to a6548c8 ("bcachefs: Avoid flushing the journal in the discard path").

It turns out the old behaviour, which issued excessive journal flushes, worked around a performance issue where queueing delays would cause the journal to not be able to write quickly enough and stall. The journal flushes masked the issue because they periodically flushed the device write cache, reducing write latency for non-flush writes.

This patch reworks the journalling code to allow more than one (non-flush) write to be in flight at a time. With this patch, doing 4k random writes with an iodepth of 128, we are now able to hit 560k iops to a Samsung 970 EVO Plus - previously, we were stuck in the ~200k range.

Signed-off-by: Kent Overstreet <[email protected]>
We now always have a btree_trans when using a btree_and_journal_iter; prep work for adding prefetching to btree_and_journal_iter. Signed-off-by: Kent Overstreet <[email protected]>
btree_and_journal_iter is old code that we want to get rid of, but we're not ready to yet. Lack of btree node prefetching is, it turns out, a real performance issue for fsck on spinning rust, so add it. Signed-off-by: Kent Overstreet <[email protected]>
Files within a subvolume cannot be renamed into another subvolume, but subvolumes themselves were intended to be. This implements subvolume renaming - we need to ensure that there's only a single dirent that points to a subvolume key (not multiple versions in different snapshots), and we need to ensure that dirent.d_parent_subvol and inode.bi_parent_subvol are updated. Signed-off-by: Kent Overstreet <[email protected]>
Switch the statfs code from something horrible and open-coded to the more standard uuid_to_fsid(). Signed-off-by: Kent Overstreet <[email protected]>
I noticed that bcachefs is mapping its compression levels (1-15) to zstd's compression levels (1-22) using `floor(level * 3 / 2)`. This PR proposes to change this mapping to be direct, and use only zstd compression levels 1-15 (see later comments about the new mapping). My reasoning for this change is as follows:

- zstd also supports negative compression levels (`--fast=N`), which bcachefs already ignores. Having bcachefs ignore some of the higher compression levels is not a significant deal, considering that it's already ignoring some of the lower compression levels.
- In `zstd`, the compression levels 20-22 are not accessible without an additional command-line argument (`--ultra`) to tell it you really want to use compression levels that high. Counting those higher compression levels as being equivalent to the others can be misleading, and excluding them shouldn't be a significant change.

If keeping access to these higher compression levels is a priority, then perhaps a non-linear mapping would be better suited. One option would be to map the lower levels directly to the zstd levels, and then have the higher levels skip every other compression level (clamped to <= 22). This would provide a compromise between providing more low compression levels and maintaining access to the max compression level, all while addressing the above issues; a sketch is included below. Let me know if you'd rather this solution, and I can update the PR with it.
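For concreteness, here is a minimal C sketch of the current mapping alongside the skip-every-other compromise described above; the function names and the standalone driver are hypothetical, not taken from the bcachefs source:

```c
#include <stdio.h>

/* Current behaviour: bcachefs level -> floor(level * 3 / 2), so 15 -> 22. */
static unsigned zstd_level_current(unsigned level)
{
	return level * 3 / 2;
}

/*
 * Compromise sketch: map levels 1-8 directly, then skip every other
 * zstd level, clamped to zstd's maximum of 22. This follows the
 * slope-1/slope-2 piecewise line through (0,0), (8,8) and (15,22)
 * mentioned earlier in the conversation.
 */
static unsigned zstd_level_compromise(unsigned level)
{
	unsigned z = level <= 8 ? level : 8 + (level - 8) * 2;
	return z <= 22 ? z : 22;
}

int main(void)
{
	/* Print the full 1-15 table so the two mappings can be compared. */
	for (unsigned l = 1; l <= 15; l++)
		printf("%2u -> current %2u, compromise %2u\n",
		       l, zstd_level_current(l), zstd_level_compromise(l));
	return 0;
}
```

Under this compromise every bcachefs level above 8 advances by two zstd levels, so the top bcachefs level 15 still lands exactly on zstd's ultra maximum of 22.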