[6.11,6.12] Constant I/O (rebalance) when foreground 2x nvme + background 2x HDD when nvme size >> HDD size #799
Duplicate of #795
@elmystico It shouldn't eat into the TBW rating of your SSD, since only reads are affected (at least in my testing).
Fair enough, @nitinkmr333. I've made a VM just for this: 2x32GiB plus 2x16GiB, with the background target on the smaller pair. I see constant I/O, including writes, for no apparent reason, and the "pending rebalance amount" is not changing at all.
Hm, after upgrading the kernel from v6.11 to v6.12 there is no more write I/O, but I/O is still fully saturated. Perhaps this is a duplicate of #795 (as you mentioned, @nitinkmr333).
Hm, I can see that you've been using v6.11 as well, @nitinkmr333. Perhaps just a reboot made the writes stop? Anyway, until you reboot, perhaps it stays stuck doing read/write I/O rather than read-only I/O.
@elmystico I tested it by creating loopback devices. On kernel 6.11, I noticed that After upgrading to kernel 6.12.2, I noticed that underlying filesystem I also checked real hardware (SD card and hard drive) by creating 2 partitions, a foreground and a background target, on each of them. There were reads but no writes on the filesystem (even on kernel 6.11) after filling the background target partitions. Rebooting or remounting these bcachefs drives does not make any difference in my case. I will try your VM setup.
Yeah, I've seen that with bare-metal bcachefs as well. It looks like the bch-rebalance kthread hides its write I/O inside a different thread or something like that: it cannot be seen directly, but you can see the write I/O being done at the drive level. Anyway, I think it's OK now to wait for a reaction from Kent or another dev; we don't know whether more info is needed for a fix, or what kind.
I tried bcachefs inside a CachyOS VM (kernel 6.12, KVM/QEMU), similar to your setup (2 x SSD, 2 x HDD, with the background_target filled completely). I can only see constant heavy reads, both on the host (NixOS, kernel 6.12) and inside the CachyOS VM, but there are no writes (or only minimal writes from other services). I guess the heavy "writes" issue has been solved in the 6.12 kernel and might only be present in 6.11.
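One way to check the drive-level read and write traffic mentioned above, independent of which kernel thread issues it, is to diff two snapshots of `/proc/diskstats`. The following is a minimal sketch, not a bcachefs tool: the function names are made up for illustration, and it assumes a Linux host where `/proc/diskstats` has the usual layout (device name in the third column, sectors read in the sixth, sectors written in the tenth, always counted in 512-byte units).

```python
import time
from typing import Dict, Tuple

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors regardless of the device


def parse_diskstats(text: str) -> Dict[str, Tuple[int, int]]:
    """Map device name -> (bytes_read, bytes_written) from /proc/diskstats text."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        name = fields[2]
        sectors_read = int(fields[5])
        sectors_written = int(fields[9])
        totals[name] = (sectors_read * SECTOR_BYTES, sectors_written * SECTOR_BYTES)
    return totals


def io_delta(before, after):
    """Bytes read/written per device between two snapshots."""
    return {
        dev: (after[dev][0] - before[dev][0], after[dev][1] - before[dev][1])
        for dev in after
        if dev in before
    }


def sample(interval_seconds: float = 10.0, path: str = "/proc/diskstats"):
    """Take two snapshots interval_seconds apart and return the per-device delta."""
    with open(path) as f:
        first = parse_diskstats(f.read())
    time.sleep(interval_seconds)
    with open(path) as f:
        second = parse_diskstats(f.read())
    return io_delta(first, second)
```

Calling `sample(10)` on an otherwise idle machine and looking at the entries for the member devices (for example `sda` or `nvme0n1`) should show whether the background traffic is reads only or also includes writes.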
Adding my comment from duplicate issue #795 It looks like the issue is related to For example, we can create a filesystem with 2 disks (
show-super-
Now, write some data to a folder having
We have enough free space in the
This causes heavy reads on the filesystem.
I experience a similar issue, where I have 3 x 1GiB foreground/promote/metadata targets + 1x4 TiB background_target, and having |
@aviallon your background_target is only 4TB, but if you write 3TB with 2 replicas you need a minimum of 6 TB on this target, plus the ~8% reserved for rebalance. You should:
I experience the same behaviour, and IMHO bcachefs should handle it wisely. I believe it is another issue.
@alexminder I was writing 3 Gigabytes, not Terabytes.
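The capacity arithmetic in the exchange above (logical data times replicas, compared against the target size minus a reserve) can be sketched as follows. This is a rough model only: the function names are hypothetical, the 8% figure is taken from the comment and from `gc_reserve_percent` in the superblock dump, and how bcachefs actually accounts for the reserve is an assumption here.

```python
TIB = 2**40
GIB = 2**30


def raw_bytes_needed(logical_bytes: float, replicas: int) -> float:
    """Raw space consumed when every extent is stored `replicas` times."""
    return logical_bytes * replicas


def usable_bytes(capacity_bytes: float, reserve_percent: float = 8.0) -> float:
    """Capacity left after setting aside the reserve (assumed to be a flat percentage)."""
    return capacity_bytes * (1 - reserve_percent / 100)


def fits(logical_bytes: float, replicas: int, capacity_bytes: float,
         reserve_percent: float = 8.0) -> bool:
    """True if the replicated data fits on the target under this simple model."""
    return raw_bytes_needed(logical_bytes, replicas) <= usable_bytes(
        capacity_bytes, reserve_percent
    )


# 3 TiB with 2 replicas needs 6 TiB raw, which cannot fit on a 4 TiB target.
print(fits(3 * TIB, 2, 4 * TIB))  # False
# 3 GiB with 2 replicas needs only 6 GiB raw, which fits easily.
print(fits(3 * GIB, 2, 4 * TIB))  # True
```

Under this model the 3 GiB case is nowhere near the limit, which is consistent with the reply that the heavy I/O there is not simply a full background target.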
I/O ATE MY FLASH after two weeks or so. Fortunately those drives were not expensive at all, just somewhat old pieces.
The setup has two 256 GiB NVMe partitions and two 34 GiB HDD partitions, four partitions in total.
(I'm not using this configuration anymore, but I tried it a few times from scratch and it was reproducible each time.)
Kernel v6.11 (Debian testing)
bcachefs format --fs_label=data --replicas=2 --block_size=4k --background_compression=lz4:1 \
    --label=dhdd.tosh4310 /dev/sda3 --label=dhdd.tosh21F0 /dev/sdb3 \
    --discard \
    --label=dnvme.970evo /dev/nvme0n1p4 \
    --label=dnvme.960evo /dev/nvme1n1p4 \
    --foreground_target=dnvme --background_target=dhdd
Fill it with some data, run live processes on it, and then:
`Size: 534 GiB
Used: 124 GiB
Online reserved: 1.96 MiB
Data type Required/total Durability Devices
reserved: 1/2 [] 151 MiB
btree: 1/2 2 [nvme0n1p4 nvme1n1p4] 4.51 GiB
user: 1/2 2 [sda3 sdb3] 63.5 GiB
user: 1/2 2 [sda3 nvme0n1p4] 977 MiB
user: 1/2 2 [sda3 nvme1n1p4] 961 MiB
user: 1/2 2 [sdb3 nvme0n1p4] 968 MiB
user: 1/2 2 [sdb3 nvme1n1p4] 985 MiB
user: 1/2 2 [nvme0n1p4 nvme1n1p4] 52.3 GiB
cached: 1/1 1 [sda3] 440 KiB
cached: 1/1 1 [sdb3] 384 KiB
cached: 1/1 1 [nvme0n1p4] 14.1 GiB
cached: 1/1 1 [nvme1n1p4] 14.1 GiB
Compression:
type compressed uncompressed average extent size
lz4 51.8 GiB 197 GiB 70.5 KiB
incompressible 147 GiB 147 GiB 70.2 KiB
Btree usage:
extents: 1.19 GiB
inodes: 305 MiB
dirents: 107 MiB
xattrs: 389 MiB
alloc: 677 MiB
reflink: 137 MiB
subvolumes: 512 KiB
snapshots: 512 KiB
lru: 22.5 MiB
freespace: 5.00 MiB
need_discard: 1.00 MiB
backpointers: 1.52 GiB
bucket_gens: 11.0 MiB
snapshot_trees: 512 KiB
deleted_inodes: 512 KiB
logged_ops: 1.00 MiB
rebalance_work: 117 MiB
subvolume_children: 512 KiB
accounting: 69.5 MiB
Pending rebalance work:
54.3 GiB
dhdd.tosh21F0 (device 1): sdb3 rw
data buckets fragmented
free: 1.06 GiB 4339
sb: 3.00 MiB 13 252 KiB
journal: 272 MiB 1088
btree: 0 B 0
user: 32.7 GiB 133824 100 KiB
cached: 0 B 0
parity: 0 B 0
stripe: 0 B 0
need_gc_gens: 0 B 0
need_discard: 0 B 0
unstriped: 0 B 0
capacity: 34.0 GiB 139264
dhdd.tosh4310 (device 0): sda3 rw
data buckets fragmented
free: 1.07 GiB 4381
sb: 3.00 MiB 13 252 KiB
journal: 272 MiB 1088
btree: 0 B 0
user: 32.7 GiB 133782 12.0 KiB
cached: 0 B 0
parity: 0 B 0
stripe: 0 B 0
need_gc_gens: 0 B 0
need_discard: 0 B 0
unstriped: 0 B 0
capacity: 34.0 GiB 139264
dnvme.960evo (device 3): nvme1n1p4 rw
data buckets fragmented
free: 196 GiB 802284
sb: 3.00 MiB 13 252 KiB
journal: 2.00 GiB 8192
btree: 2.25 GiB 9237
user: 27.1 GiB 111187 360 KiB
cached: 14.1 GiB 117422 14.5 GiB
parity: 0 B 0
stripe: 0 B 0
need_gc_gens: 0 B 0
need_discard: 60.3 MiB 241
unstriped: 0 B 0
capacity: 256 GiB 1048576
dnvme.970evo (device 2): nvme0n1p4 rw
data buckets fragmented
free: 197 GiB 808055
sb: 3.00 MiB 13 252 KiB
journal: 2.00 GiB 8192
btree: 2.25 GiB 9237
user: 27.1 GiB 111186 92.0 KiB
cached: 14.1 GiB 110583 12.9 GiB
parity: 0 B 0
stripe: 0 B 0
need_gc_gens: 0 B 0
need_discard: 328 MiB 1310
unstriped: 0 B 0
capacity: 256 GiB 1048576`
Look at the pending rebalance amount.
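As a quick sanity check on the numbers in the usage output, the pending rebalance work can be compared with the free space left on the background (`dhdd`) devices. The sketch below just copies the figures from that output; it ignores compression, replica accounting, and the GC reserve, so it is a rough comparison and not a description of how bcachefs decides what to move. The function names are hypothetical.

```python
GIB = 2**30

# Figures copied from the usage output: "Pending rebalance work" and the
# "free" rows of the two dhdd devices.
pending_rebalance = 54.3 * GIB
background_free = {"sda3": 1.07 * GIB, "sdb3": 1.06 * GIB}


def total_free(free_by_device):
    """Sum of free bytes across the given devices."""
    return sum(free_by_device.values())


def can_drain(pending, free_by_device):
    """True if the pending amount is no larger than the free space on the target."""
    return pending <= total_free(free_by_device)


free = total_free(background_free)
print(f"free on dhdd: {free / GIB:.2f} GiB")
print(f"pending rebalance work: {pending_rebalance / GIB:.1f} GiB")
print("can drain:", can_drain(pending_rebalance, background_free))
```

With roughly 2 GiB free against more than 50 GiB pending, the background target has no room for the queued work, which matches the report that the pending amount never goes down.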
`Device: (unknown device)
External UUID: e9807c87-b09b-4cde-8065-4a475de5e2cb
Internal UUID: 16fc0099-7df6-4ea3-9f4e-49cfc10034c9
Magic number: c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index: 1
Label: data
Version: 1.12: rebalance_work_acct_fix
Version upgrade complete: 1.12: rebalance_work_acct_fix
Oldest version on disk: 1.12: rebalance_work_acct_fix
Created: Fri Nov 15 17:10:58 2024
Sequence number: 75
Time of last write: Sun Dec 1 00:31:50 2024
Superblock size: 5.38 KiB/1.00 MiB
Clean: 0
Devices: 4
Sections: members_v1,replicas_v0,disk_groups,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors,ext,downgrade
Features: lz4,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features: alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done
Options:
block_size: 4.00 KiB
btree_node_size: 256 KiB
errors: continue [fix_safe] panic ro
metadata_replicas: 2
data_replicas: 2
metadata_replicas_required: 1
data_replicas_required: 1
encoded_extent_max: 64.0 KiB
metadata_checksum: none [crc32c] crc64 xxhash
data_checksum: none [crc32c] crc64 xxhash
compression: none
background_compression: lz4:1
str_hash: crc32c crc64 [siphash]
metadata_target: none
foreground_target: dnvme
background_target: dhdd
promote_target: none
erasure_code: 0
inodes_32bit: 1
shard_inode_numbers: 1
inodes_use_key_cache: 1
gc_reserve_percent: 8
gc_reserve_bytes: 0 B
root_reserve_percent: 0
wide_macs: 0
promote_whole_extents: 1
acl: 1
usrquota: 0
grpquota: 0
prjquota: 0
journal_flush_delay: 1000
journal_flush_disabled: 0
journal_reclaim_delay: 100
journal_transaction_names: 1
allocator_stuck_timeout: 30
version_upgrade: [compatible] incompatible none
nocow: 0
members_v2 (size 592):
Device: 0
Label: tosh4310 (1)
UUID: a04ae694-690c-49fa-999d-c35db9e55b9f
Size: 34.0 GiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 256 KiB
First bucket: 0
Buckets: 139264
Last mount: Sun Dec 1 00:30:28 2024
Last superblock write: 75
State: rw
Data allowed: journal,btree,user
Has data: journal,user,cached
Btree allocated bitmap blocksize: 1.00 B
Btree allocated bitmap: 0000000000000000000000000000000000000000000000000000000000000000
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 1
Label: tosh21F0 (2)
UUID: 08632210-3ddf-4290-971d-17bb26f979e4
Size: 34.0 GiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 256 KiB
First bucket: 0
Buckets: 139264
Last mount: Sun Dec 1 00:30:28 2024
Last superblock write: 75
State: rw
Data allowed: journal,btree,user
Has data: journal,user,cached
Btree allocated bitmap blocksize: 1.00 B
Btree allocated bitmap: 0000000000000000000000000000000000000000000000000000000000000000
Durability: 1
Discard: 0
Freespace initialized: 1
Device: 2
Label: 970evo (4)
UUID: b94e6dd2-e553-4e03-b6d6-7e39c799267b
Size: 256 GiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 256 KiB
First bucket: 0
Buckets: 1048576
Last mount: Sun Dec 1 00:30:28 2024
Last superblock write: 75
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user,cached
Btree allocated bitmap blocksize: 8.00 MiB
Btree allocated bitmap: 0000000010000001100000000000000000000000000000001110010100000101
Durability: 1
Discard: 1
Freespace initialized: 1
Device: 3
Label: 960evo (5)
UUID: b03e2746-b6f5-4474-b692-f5fb70ac0662
Size: 256 GiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 256 KiB
First bucket: 0
Buckets: 1048576
Last mount: Sun Dec 1 00:30:28 2024
Last superblock write: 75
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user,cached
Btree allocated bitmap blocksize: 8.00 MiB
Btree allocated bitmap: 0000000000000000100000000000000000000000000000000110010100000101
Durability: 1
Discard: 1
Freespace initialized: 1
errors (size 24):
accounting_mismatch 20 Sun Dec 1 00:30:49 2024`