used space not correct in list-osds #547

Open
jeroenmaelbrancke opened this issue Jan 5, 2017 · 6 comments

@jeroenmaelbrancke

jeroenmaelbrancke commented Jan 5, 2017

The used size reported by list-osds does not match the used space reported by the OS.

Configuration:
1 ASD per disk.
One ASD was removed yesterday, and the same ASD was then added back to the backend.

output from list-osds: long_id, used space, total space and percentage

ZjsVP7sfvj0Dwv2U5osfpyQDrzomKYWC 5631600669.0 9.99713665024e+11 0.56332136551
zLByrlyezmhmFe4Ho2RgZqLHjzR86R9f 5534102590.0 9.99713665024e+11 0.553568765099
JZiUquESD3AYqAGHgslVhWqZtTX1PHiN 5666999749.0 9.99713665024e+11 0.566862287399
yHEBAetegI73fXNjHCVVJNmtVBnyBpau 5624989129.0 9.99713665024e+11 0.562660022144
8KX4Zw2Q696HJHO29sGAcTtGHSPKQtzP 5571181328.0 9.99713665024e+11 0.557277700897
s0PuO6GyNIv0VPYuK2EYf0WhKAaM2TKj 5562012647.0 9.99713665024e+11 0.556360570191
uQKcnr1dox6JoUm2orCeWXD6Vqv47z9S 5573100399.0 9.99713665024e+11 0.557469662963
BYgu7TpeVDR7kevs6T13E8DbWODbjRHY 5528523509.0 9.99713665024e+11 0.553010697205
RtLc7Nni93W9OBeLV4o7uawz5QfmAUa8 5684668887.0 9.99713665024e+11 0.568629707274
RLwOLtHzoXGcFjJwihXDjejroB99d0wj 5669367816.0 9.99713665024e+11 0.567099161925
A6Rxk2gQbcoEUqwXIjnRQbeGkQJ6BPE5 5677313736.0 9.99713665024e+11 0.56789398151
MLWOhCQTU4Itw2mPXhwRmfX2GWDSEld5 0.0 9.99713665024e+11 0.0

list asd ids on one server:

root@pocwim-ovs03:~# ls /mnt/alba-asd/*
/mnt/alba-asd/HAiOMsDvKwPpzNDD:
A6Rxk2gQbcoEUqwXIjnRQbeGkQJ6BPE5

/mnt/alba-asd/mfQCQXHJq7a2pnAi:
MLWOhCQTU4Itw2mPXhwRmfX2GWDSEld5

/mnt/alba-asd/P47bfokAZDazDrzv:
RtLc7Nni93W9OBeLV4o7uawz5QfmAUa8

/mnt/alba-asd/YdFtwucw4IPoj1kl:
RLwOLtHzoXGcFjJwihXDjejroB99d0wj

diskspace usage on OS:

root@pocwim-ovs03:~# df -h
Filesystem                            Size  Used Avail Use% Mounted on
udev                                   16G     0   16G   0% /dev
tmpfs                                 3.2G  923M  2.3G  29% /run
/dev/sda1                             886G  4.3G  837G   1% /
tmpfs                                  16G   12K   16G   1% /dev/shm
tmpfs                                 5.0M     0  5.0M   0% /run/lock
tmpfs                                  16G     0   16G   0% /sys/fs/cgroup
tmpfs                                 3.2G     0  3.2G   0% /run/user/0
/dev/sdd1                             917G   72M  871G   1% /mnt/hdd1
/dev/sdb1                             110G  432M  104G   1% /mnt/ssd1
/dev/sdc1                             110G  664M  104G   1% /mnt/ssd2
/dev/sdg1                             932G   26G  906G   3% /mnt/alba-asd/HAiOMsDvKwPpzNDD
/dev/sdh1                             932G   30G  902G   4% /mnt/alba-asd/P47bfokAZDazDrzv
/dev/sde1                             932G   25G  907G   3% /mnt/alba-asd/mfQCQXHJq7a2pnAi
/dev/sdf1                             932G   29G  903G   4% /mnt/alba-asd/YdFtwucw4IPoj1kl
70bc02fc-b4de-4435-8b78-3d77ed1ad993   64T     0   64T   0% /mnt/stor3

for example:

MLWOhCQTU4Itw2mPXhwRmfX2GWDSEld5 0.0 9.99713665024e+11 0.0
/dev/sde1                             932G   25G  907G   3% /mnt/alba-asd/mfQCQXHJq7a2pnAi

used space with list_osds = 0.0
used space OS = 25G

Can you please look into why the used space reported by list-osds is not correct (for all the OSDs)?
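For reference, the comparison above can be scripted. This is only a sketch: it assumes the list-osds output has already been reduced to the four whitespace-separated columns shown above (long_id, used space, total space, percentage), and the helper names `parse_osd_line` and `os_used_bytes` are made up for illustration. They are not part of alba.

```python
import os
from typing import NamedTuple


class OsdUsage(NamedTuple):
    long_id: str
    used: float        # bytes, as reported by list-osds
    total: float       # bytes, as reported by list-osds
    percentage: float  # used / total * 100, as reported by list-osds


def parse_osd_line(line: str) -> OsdUsage:
    """Parse one 'long_id used total percentage' line (hypothetical helper)."""
    long_id, used, total, pct = line.split()
    return OsdUsage(long_id, float(used), float(total), float(pct))


def os_used_bytes(mount_point: str) -> int:
    """Used bytes on a filesystem, computed like the 'Used' column of df."""
    st = os.statvfs(mount_point)
    return (st.f_blocks - st.f_bfree) * st.f_frsize


# Two of the lines pasted above, both ASDs live on pocwim-ovs03.
LINES = """\
A6Rxk2gQbcoEUqwXIjnRQbeGkQJ6BPE5 5677313736.0 9.99713665024e+11 0.56789398151
MLWOhCQTU4Itw2mPXhwRmfX2GWDSEld5 0.0 9.99713665024e+11 0.0
"""

osds = [parse_osd_line(line) for line in LINES.splitlines() if line.strip()]

for osd in osds:
    recomputed = 100.0 * osd.used / osd.total
    print(f"{osd.long_id}: {osd.used / 2**30:.2f} GiB used, "
          f"{recomputed:.4f}% (reported {osd.percentage:.4f}%)")
```

On the ASD host itself, `os_used_bytes("/mnt/alba-asd/mfQCQXHJq7a2pnAi")` would give the OS-side figure to compare with the `used` column.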

@jeroenmaelbrancke
Author

After restarting the maintenance agents, the used space is now correct.
I checked the "propagate osd updates" messages in the maintenance agent log (the default interval is every 20 sec).

On one maintenance agent, before the restart, the propagate message appeared every minute instead of every 20 sec.

Jan 05 09:28:32 pocwim-ovs03 alba[52551]: 2017-01-05 09:28:32 190161 +0100 - pocwim-ovs03 - 52551/0 - alba/maintenance - 9453673 - info - propagate 14 osd updates
Jan 05 09:29:31 pocwim-ovs03 alba[52551]: 2017-01-05 09:29:31 739959 +0100 - pocwim-ovs03 - 52551/0 - alba/maintenance - 9461376 - info - propagate 14 osd updates
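The gap between the two log lines above can be computed rather than eyeballed. The sketch below assumes the timestamp layout seen in these two lines (date, time, a six-digit microsecond field, then the UTC offset). The function names are illustrative only.

```python
import re
from datetime import datetime

LOG_LINES = [
    "Jan 05 09:28:32 pocwim-ovs03 alba[52551]: 2017-01-05 09:28:32 190161 +0100 - pocwim-ovs03 - 52551/0 - alba/maintenance - 9453673 - info - propagate 14 osd updates",
    "Jan 05 09:29:31 pocwim-ovs03 alba[52551]: 2017-01-05 09:29:31 739959 +0100 - pocwim-ovs03 - 52551/0 - alba/maintenance - 9461376 - info - propagate 14 osd updates",
]

# Matches e.g. "2017-01-05 09:28:32 190161 +0100"
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\d{6}) ([+-]\d{4})")


def parse_stamp(line: str) -> datetime:
    """Extract the timezone-aware timestamp from one log line."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp found in: {line!r}")
    date, micros, offset = match.groups()
    return datetime.strptime(f"{date}.{micros} {offset}", "%Y-%m-%d %H:%M:%S.%f %z")


def propagate_intervals(lines):
    """Seconds between consecutive 'propagate' messages."""
    stamps = [parse_stamp(line) for line in lines if "propagate" in line]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


intervals = propagate_intervals(LOG_LINES)
print(intervals)  # roughly 59.5 seconds for the two lines above
```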

Trying to verify whether deleting and re-adding the ASD triggers this issue.

@domsj
Contributor

domsj commented Jan 5, 2017

looks like it could be related to / the same issue as #312 & #441

@wimpers wimpers added this to the Gilbert milestone Jan 16, 2017
@toolslive
Member

disk_usage is a value that's maintained inside the ASD.
It's the sum of the sizes of the values that are stored inside that ASD.

As an example, this is an excerpt of the statistics of an ASD that was completely filled and then
completely emptied again.

    ...
      "disk_usage": 0.0,
    "capacity": 2365308928.0
  }

If you look at the state of the file system, you see the following:

[shell 16:48:03.226926] du -h /home/romain/workspace/tmp/alba/asd_mnt
16K	/home/romain/workspace/tmp/alba/asd_mnt/lost+found
4.0K	/home/romain/workspace/tmp/alba/asd_mnt/blobs/00/00/00/00/00/00/09
12K	/home/romain/workspace/tmp/alba/asd_mnt/blobs/00/00/00/00/00/00/08
12K	/home/romain/workspace/tmp/alba/asd_mnt/blobs/00/00/00/00/00/00/06
12K	/home/romain/workspace/tmp/alba/asd_mnt/blobs/00/00/00/00/00/00/07
12K	/home/romain/workspace/tmp/alba/asd_mnt/blobs/00/00/00/00/00/00/02
12K	/home/romain/workspace/tmp/alba/asd_mnt/blobs/00/00/00/00/00/00/00
12K	/home/romain/workspace/tmp/alba/asd_mnt/blobs/00/00/00/00/00/00/03
12K	/home/romain/workspace/tmp/alba/asd_mnt/blobs/00/00/00/00/00/00/05
12K	/home/romain/workspace/tmp/alba/asd_mnt/blobs/00/00/00/00/00/00/01
12K	/home/romain/workspace/tmp/alba/asd_mnt/blobs/00/00/00/00/00/00/04
116K	/home/romain/workspace/tmp/alba/asd_mnt/blobs/00/00/00/00/00/00
120K	/home/romain/workspace/tmp/alba/asd_mnt/blobs/00/00/00/00/00
124K	/home/romain/workspace/tmp/alba/asd_mnt/blobs/00/00/00/00
128K	/home/romain/workspace/tmp/alba/asd_mnt/blobs/00/00/00
132K	/home/romain/workspace/tmp/alba/asd_mnt/blobs/00/00
136K	/home/romain/workspace/tmp/alba/asd_mnt/blobs/00
140K	/home/romain/workspace/tmp/alba/asd_mnt/blobs
75M	/home/romain/workspace/tmp/alba/asd_mnt/db
79M	/home/romain/workspace/tmp/alba/asd_mnt

There are some residual directories, and there's about 75MB of data in the rocksdb dir,
most notably:

 71M -rw-r--r-- 1 romain romain 247K Jan 18 16:47 000006.log

The transaction log file uses a preallocated slab: it occupies 71M on disk while holding only 247K of data.

This explains the difference between list-osds and ls -lRhs.
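The same distinction can be checked from a script. This sketch compares a file's apparent size (`st_size`) with the space allocated for it on disk (`st_blocks`, assumed here to be in 512-byte units, as on Linux). That is the difference between the two size columns in the `ls -s` line above. The function names are illustrative and the temporary directory only stands in for an ASD mount.

```python
import os
import tempfile


def apparent_and_allocated(path: str) -> tuple[int, int]:
    """Return (apparent size, allocated size) in bytes for one file.

    Assumes st_blocks counts 512-byte units, which holds on Linux.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def tree_usage(root: str) -> tuple[int, int]:
    """Sum apparent and allocated sizes of all regular files under root."""
    apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                size, blocks = apparent_and_allocated(full)
                apparent += size
                allocated += blocks
    return apparent, allocated


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.log")
    with open(sample, "wb") as handle:
        handle.write(b"x" * 1000)
        handle.flush()
        os.fsync(handle.fileno())
    demo_apparent, demo_allocated = tree_usage(tmp)

print(f"apparent: {demo_apparent} bytes, allocated: {demo_allocated} bytes")
```

Run against the `db` directory of an ASD, a preallocated log file would show up as an allocated total well above the apparent total.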

I'm absolutely clueless why a restart of a maintenance agent would change this.

@domsj
Contributor

domsj commented Jan 23, 2017

@toolslive the size of the content of the rocksdb directory can indeed explain a small difference between list-osds and ls -lRhs, however that's not what this ticket is about, because it can't explain:

  • the 25GB difference
  • why restarting the maintenance process makes the problem disappear

This ticket is about disk usage information not being properly propagated towards the albamgr.
Restart of the maintenance process (which should do this propagation) made the problem go away.
See also the other linked issues.

@wimpers

wimpers commented May 30, 2017

@toolslive is this one fixed by the above PR? Please close if so.

@toolslive
Member

No, the above PR fixes something else (#696).

@wimpers wimpers removed this from the G milestone May 30, 2017
@wimpers wimpers added this to the I milestone Jun 15, 2017
@wimpers wimpers modified the milestones: I, J Nov 28, 2017
@wimpers wimpers removed this from the J milestone Mar 6, 2018