-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tree: fix lba_count size calculation #803
Conversation
src/nvme/tree.c
Outdated
* size is in 512 bytes units and lba_count is in lba_size which are not | ||
* necessarily the same. | ||
*/ | ||
ns->lba_count = size >> (ns->lba_shift - 9); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems better to use SECTOR_SHIFT
instead of the hard coded value 9 and also seems more better to use lba_size and SECTOR_SIZE to calculate lba_count
as below as you mentioned on the issue ticket.
ns->lba_count = size * SECTOR_SIZE / ns->lba_size;
or
ns->lba_count = size / (ns->lba_size / SECTOR_SIZE);
lba_count = size / (logical_block_size/512)
By the way seems also the driver implementation itself better to be changed as correctly use the logical block size for NVMe specification to convert to the size
instead of the sector size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The kernel uses nvme_sec_to_lba
for converting the sector numbers to LBA numbers. I'd rather go with this version and avoid multiplication which can easily overflow.
/*
* Convert a 512B sector number to a device logical block number.
*/
static inline u64 nvme_sect_to_lba(struct nvme_ns_head *head, sector_t sector)
{
return sector >> (head->lba_shift - SECTOR_SHIFT);
}
SECTOR_SIZE
: alright, let me add it, we don't have it defined yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way seems also the driver implementation itself better to be changed as correctly use the logical block size for NVMe specification to convert to the size instead of the sector size.
Not sure if I understand you correctly. Are you suggesting that the sysfs entry size
should report size in bytes and not sector numbers? If so, the sysfs entries are Linux API and thus can't be changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay thanks for the reply to add the SECTOR_SIZE. Sorry about the size
I tried to mention as change the sector size unit itself from 512 bytes to 4,096 bytes as same with the logical_block_size
. By the way is there any API documentation about the sys entry size? Just confirmed some ABI sysfs block desciptions related but seems not described about the block device sector size.
- /sys/block//queue/chunk_sectors
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/stable/sysfs-block#n145 - /sys/block//queue/max_hw_sectors_kb
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/stable/sysfs-block#n395 - /sys/block//queue/physical_block_size
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/stable/sysfs-block#n514
Also just reconfirmed the SECTOR_SHIFT
implementation at first I thought the definition can be changed for the hardware device but the linux kernel should be supported on multiple hardware devices so now I am thinking it is only supported the one size unit as implemented 512 bytes. Is this understanding correct? (But still the block device size seems not described as the 512 bytes unit.)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/blk_types.h#n31
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, all sysfs entries are considered API, but not all of them are documented. The size
entries is not documented yet. Feel free to submit a patch.
The block layer supports larger physical block sizes, e.g. 4k bytes, but the core calculates all in
fixed sector size of 512 bytes. So there is no real limitation at this level. What is not supported is
larger block size I/O. There is an active discussion going on https://lore.kernel.org/linux-block/[email protected]/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for your advice and information. I will do try to understand them more later.
(Added)
Notes:
- If the larger physical block size supported in future the NVMe max segments size also can be increased since in the past tried to do it as below.
https://lore.kernel.org/linux-nvme/[email protected]/ - The NVMe max segments size increased to 8,192 KB from 4,092 KB before as below.
drivers/nvme/host/pci.c:
- #define NVME_MAX_KB_SZ 4096
+ #define NVME_MAX_KB_SZ 8192
#define NVME_MAX_SEGS 128
The kernel reports the size in 512 byte units, which is might not match with lba_size. Signed-off-by: Daniel Wagner <[email protected]>
The kernel reports the size in 512 byte units, which is might not match with lba_size.
Fixes: linux-nvme/nvme-cli#2260