How to see the physical space of a disk? #45

Open
Baimax-123 opened this issue Apr 21, 2022 · 10 comments

@Baimax-123

Baimax-123 commented Apr 21, 2022

I set up VDO on a disk and want to check the actual disk usage with deduplication turned off and on.
What command should I use?
Is it sudo vdostats --hu?
That seems to report only the size as seen within VDO.

@rhawalsh
Member

Hi @Baimax-123, yes. Using the vdostats utility will provide you with a df-style output that shows the physical usage of the volume.

Here is some output to show an example.

[root@localhost ~]# vdo create --name vdo0 --device /dev/sda --vdoLogicalSize 1T
Creating VDO vdo0
      The VDO volume can address 12 GB in 6 data slabs, each 2 GB.
      It can grow to address at most 16 TB of physical storage in 8192 slabs.
      If a larger maximum size might be needed, use bigger slabs.
Starting VDO vdo0
Starting compression on VDO vdo0
VDO instance 0 volume is ready at /dev/mapper/vdo0
[root@localhost ~]# mkfs.xfs -K /dev/mapper/vdo0
meta-data=/dev/mapper/vdo0       isize=512    agcount=4, agsize=67108864 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=0 inobtcount=0
data     =                       bsize=4096   blocks=268435456, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=131072, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@localhost ~]# mkdir /mnt/vdo
[root@localhost ~]# mount /dev/mapper/vdo0 /mnt/vdo
[root@localhost ~]# lsblk
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda      7:0    0  15G  0 disk 
└─vdo0 252:0    0   1T  0 vdo  /mnt/vdo
vda    253:0    0  20G  0 disk 
└─vda1 253:1    0  20G  0 part /



# Note that the starting values show 7.2G used on the filesystem (df) and
# 3.0G physical used (vdostats).

[root@localhost ~]# df -h /mnt/vdo
Filesystem        Size  Used Avail Use% Mounted on
/dev/mapper/vdo0  1.0T  7.2G 1017G   1% /mnt/vdo
[root@localhost ~]# vdostats --human-readable
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdo0         15.0G      3.0G     12.0G  20%           99%


# Write 1G of unique data and see both values increase by 1G.

[root@localhost ~]# dd if=/dev/urandom of=/mnt/vdo/1G-file bs=1M count=1024 oflag=direct status=progress
1072693248 bytes (1.1 GB, 1023 MiB) copied, 24 s, 44.7 MB/s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 24.0207 s, 44.7 MB/s
[root@localhost ~]# df -h /mnt/vdo
Filesystem        Size  Used Avail Use% Mounted on
/dev/mapper/vdo0  1.0T  8.2G 1016G   1% /mnt/vdo
[root@localhost ~]# vdostats --human-readable
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdo0         15.0G      4.0G     11.0G  26%           33%


# Duplicate that data and see only the df (logical used) increase

[root@localhost ~]# cp -a /mnt/vdo/1G-file /mnt/vdo/1G-file-copied
[root@localhost ~]# sync
[root@localhost ~]# df -h /mnt/vdo
Filesystem        Size  Used Avail Use% Mounted on
/dev/mapper/vdo0  1.0T  9.2G 1015G   1% /mnt/vdo
[root@localhost ~]# vdostats --human-readable
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdo0         15.0G      4.0G     11.0G  26%           59%
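
To make the comparison above explicit, here is a minimal Python sketch that takes the rounded "Used" values from the three df and vdostats snapshots in this session and prints how much each one grew between steps. The `snapshots` list and the `usage_deltas` helper are names made up for this illustration; the numbers are copied from the output above and carry its rounding.

```python
# (label, df "Used" in G, vdostats "Used" in G), copied from the session above.
snapshots = [
    ("initial", 7.2, 3.0),
    ("after 1G unique write", 8.2, 4.0),
    ("after copying the file", 9.2, 4.0),
]


def usage_deltas(rows):
    """Return (label, logical_delta, physical_delta) between consecutive snapshots."""
    deltas = []
    for (_, prev_df, prev_vdo), (label, df_used, vdo_used) in zip(rows, rows[1:]):
        # Round to one decimal place, matching the precision of the rounded input.
        logical = round(df_used - prev_df, 1)
        physical = round(vdo_used - prev_vdo, 1)
        deltas.append((label, logical, physical))
    return deltas


for label, logical, physical in usage_deltas(snapshots):
    print(f"{label}: logical +{logical}G, physical +{physical}G")
```

The unique write grows both values by about 1G, while the copy grows only the df (logical) value, which is the deduplication effect the session demonstrates.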

@rhawalsh rhawalsh self-assigned this Apr 22, 2022
@rhawalsh
Member

Hi @Baimax-123, what specifically are you trying to find out? df and vdostats are going to give you different metrics than smartctl. If you're just trying to understand usage (i.e. how full is the device and how much more can I write to it?), then you should be using df and vdostats. If you're instead interested in wear leveling or something along those lines, then smartctl would be the way to get that information. I don't think using smartctl to measure usage is going to tell you much other than that a read or write operation has happened on the device.

@Baimax-123
Author

Hi @rhawalsh, I have an SSD with built-in compression, and it can report the physical usage of the disk.
However, that result is far from what df or vdostats reports.
So maybe there is a gap between the actual physical usage on a standard disk and what the above commands show?

@rhawalsh
Member

Hi @Baimax-123, I did not realize your SSD was doing compression as well. If you want to compare realistic numbers, it's probably better to look at the df and vdostats outputs without the human-readable option, since the human-readable values are rounded to the nearest GiB (or so...).

As you can tell, the state of compression and/or deduplication affects the amount of data that actually gets cycled through the device. It will never be 1:1 because of the need to write out metadata, journal information for recovery, etc. Depending on the workload, the ratio will vary up or down.
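
As a rough illustration of working with the unrounded numbers, the Python sketch below parses df-style vdostats output into integers. It assumes that vdostats without the human-readable option prints sizes in 1K blocks under a header like the one shown; check this against your version. The `parse_vdostats` helper is a name made up here, and the sample row is illustrative only (the 15G/4G/11G values from earlier in this thread converted to 1K blocks), not a real measurement.

```python
def parse_vdostats(text):
    """Parse df-style vdostats output into a list of dicts (sizes in 1K blocks)."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        device, size, used, available = line.split()[:4]
        rows.append(
            {
                "device": device,
                "size": int(size),
                "used": int(used),
                "available": int(available),
            }
        )
    return rows


# Illustrative input only: assumed column layout, made-up exact values.
sample = """\
Device               1K-blocks      Used Available Use% Space saving%
/dev/mapper/vdo0      15728640   4194304  11534336  26%           59%
"""

for row in parse_vdostats(sample):
    pct = 100 * row["used"] / row["size"]
    print(f'{row["device"]}: {row["used"]} of {row["size"]} 1K blocks used ({pct:.2f}%)')
```

Taking the difference of the integer "used" values before and after a test run avoids the rounding that hides small changes in the human-readable output.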

@Baimax-123
Author

Hi @rhawalsh,
Thanks, I will try your suggestion and hope to reach some useful conclusions.
Best.

@rhawalsh
Member

Hi @Baimax-123, Please feel free to ask any questions you might have along the way!

I also intended to mention that inspecting the output of vdostats --verbose may give you some additional clues to the amount of data being written to the underlying storage. Typically you should be able to look at statistics that mention 'write', 'flush', and/or 'fua' to help tie things together.
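
One way to pull out just those statistics is sketched below in Python. It assumes the verbose output is a list of `name : value` lines under a device header, and the field names in the sample are assumptions modeled on that layout; the values are invented for illustration. `parse_verbose` and `io_counters` are hypothetical helper names, not part of any VDO tool.

```python
def parse_verbose(text):
    """Turn 'name : value' lines into a dict, skipping lines without a value."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep and value.strip():
            stats[key.strip()] = value.strip()
    return stats


def io_counters(stats, keywords=("write", "flush", "fua")):
    """Keep only the statistics whose name mentions one of the keywords."""
    return {k: v for k, v in stats.items() if any(word in k for word in keywords)}


# Illustrative input only: assumed field names and made-up values.
sample = """\
/dev/mapper/vdo0 :
  data blocks used                    : 262144
  logical blocks used                 : 524288
  bios in write                       : 524288
  bios out write                      : 262144
  bios meta write                     : 4096
  bios in flush                       : 12
  bios in fua                         : 3
"""

for name, value in io_counters(parse_verbose(sample)).items():
    print(f"{name}: {value}")
```

In practice the text would come from the real command output captured before and after a workload, so the counters can be compared between the two runs.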

@Baimax-123
Author

Hi @rhawalsh,
Where can I find a detailed explanation of the fields in the vdostats --verbose output?
Some of the data may be useful, but I can't work out what it actually means.
For example: BIOS meta completed write

@Baimax-123
Author

Baimax-123 commented May 5, 2022

Hi @rhawalsh, I am using fio for random writes and iostat to monitor the VDO volume and the hard disk at the same time.
The VDO volume shows only write bandwidth, but the hard disk shows both read and write bandwidth. Roughly, the disk's read bandwidth equals the VDO volume's write bandwidth, and the disk's write bandwidth is twice the VDO volume's write bandwidth.
Can you roughly explain where this additional bandwidth comes from?
Thanks.

As mentioned in the following table, VDO volumes are built directly on the hard disk.
The fio write command is:
sudo fio -filename=/dev/mapper/vdo_2 --bs=4k --output write4k.log --direct=1 --iodepth=128 --rw=randwrite --ioengine=libaio --buffer_compress_percentage=54 --buffer_compress_chunk=4096 --offset=0 --size=100% --runtime=50000s --time_based=1 --group_reporting --numjobs=4
vdo_2 is the name of the VDO volume.
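
The ratios described here can be written down as a small Python sketch. The `amplification` helper is a made-up name, and the inputs are placeholder values chosen only to match the stated proportions (disk read about equal to the VDO write bandwidth, disk write about twice it); they are not measured numbers.

```python
def amplification(vdo_write_bw, disk_read_bw, disk_write_bw):
    """Return disk read and write bandwidth as multiples of the VDO write bandwidth."""
    return {
        "read": disk_read_bw / vdo_write_bw,
        "write": disk_write_bw / vdo_write_bw,
    }


# Placeholder values in arbitrary units, matching only the described proportions.
ratios = amplification(vdo_write_bw=100.0, disk_read_bw=100.0, disk_write_bw=200.0)
print(ratios)
```

Feeding in the actual iostat figures for the VDO volume and the backing disk gives the same two ratios for a real run.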

@Baimax-123 Baimax-123 reopened this May 5, 2022
@rhawalsh
Member

rhawalsh commented May 6, 2022

Hi @Baimax-123, I apologize for the delayed response.

To get some information about the output from vdostats --verbose, I'd point you at the RHEL docs, specifically "Table 30.9. vdostats --verbose Output" if the anchor doesn't put you there initially.

The IO for VDO involves doing read-compares when we encounter duplicate data. When a block comes in, VDO hashes it and sends the hash to UDS for advice. If UDS claims that the block is a duplicate and is likely stored at a particular block, the VDO device will then go read that block to make sure that it actually is a duplicate. If it turns out not to be a duplicate, the VDO device writes the new block out as it normally would for unique data. It is for reasons like this that you're seeing a bunch of read traffic, despite a purely write workload.
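
The flow described above can be modeled with a toy Python sketch. This is not VDO's code: the `ToyDedupVolume` class is invented for illustration, a plain dict stands in for the UDS index, and SHA-256 is used as the hash only because it is in the standard library. It counts how many backing-store reads and writes a write-only workload produces.

```python
import hashlib


class ToyDedupVolume:
    """Toy model of a deduplicating write path with read-compare verification."""

    def __init__(self):
        self.blocks = []  # simulated physical blocks
        self.index = {}   # hash -> physical block number (stand-in for index advice)
        self.reads = 0    # reads issued to the backing store
        self.writes = 0   # writes issued to the backing store

    def write(self, data: bytes) -> int:
        digest = hashlib.sha256(data).digest()
        advised = self.index.get(digest)
        if advised is not None:
            # Advice is only a hint, so read the candidate block and compare.
            self.reads += 1
            if self.blocks[advised] == data:
                return advised  # verified duplicate, no new physical write
        # Unique data (or advice that did not verify): write a new block.
        self.blocks.append(data)
        self.writes += 1
        pbn = len(self.blocks) - 1
        self.index[digest] = pbn
        return pbn


vol = ToyDedupVolume()
for payload in (b"A" * 4096, b"A" * 4096, b"B" * 4096):
    vol.write(payload)
print(vol.reads, vol.writes)
```

Three incoming writes lead to one read-compare and two physical writes in this model, which shows in miniature how a purely write workload can still generate read traffic on the underlying device.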

Please keep in mind that my description of the IO pattern is generalized. If you need more detail, just ask and I can try to get someone more knowledgeable than I am to provide better information. Of course you're always free to browse the code yourself as well, but that might be more work than it's worth.

@Baimax-123
Author

Baimax-123 commented May 7, 2022

Thanks, @rhawalsh. I have seen that.
Meanwhile, I have another question:
VDO provides deduplication and compression, and from those names it makes sense that read-compares are needed. But if deduplication and compression are both turned off, will the read-compares still happen?

I will also browse the code and hope to learn more about VDO.
The VDO code base is quite large. If I want to follow the specific operations from the VDO layer down to the physical storage layer, can you point out where to start? I believe it would save a lot of time.
Thanks again!
