
100% disk usage without any process on I/O #1287

Closed
brauliobo opened this issue Aug 27, 2023 · 15 comments
Labels
  • enhancement: Extension or improvement to existing feature
  • Linux 🐧: Linux related issues
  • support request: This is not a code issue but merely a support request. Please use the mailing list or IRC instead.

Comments


brauliobo commented Aug 27, 2023

Please see the screenshot:
[screenshot]

Besides htop's I/O tab, I'm also using iotop and fatrace -f r / fatrace -f w to try to find where the I/O is coming from. Both show almost no relevant I/O when idle, partly because I'm using anything-sync-daemon to reduce I/O for a couple of applications.

Is there any other way to find the origin of this persistent 100% I/O?

I'm using Arch Linux, a Seagate 2 TB NVMe SSD, and btrfs with zstd compression and snapshots.

@natoscott (Member) commented

What does the I/O tab show? I'm guessing nothing, but it would still be good to have a look. It's a very strange pattern: the Disk IO meter shows 100% use (from memory, this is calculated using time-based fields from /proc/diskstats) but zero read/write bytes (those two are calculated from different fields, based on I/O counts and bytes read/written). That indicates a very small number of small I/Os taking a long time to complete (queued? slow device?).

Is CPU 11 showing ~100% I/O wait time, or is that gray something else? It might be useful to add the PROCESSOR column (via F2/Setup) and see if that identifies a process running on that CPU. Bit of a long shot, but you never know.

fatrace will only show application-level file I/O; you may have something happening at a lower level - try blktrace too.
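For reference, here is a minimal sketch of the time-based calculation described above, assuming a Linux /proc/diskstats where field 13 is milliseconds spent doing I/O (io_ticks). The device defaults to whichever one is listed first and is purely illustrative:

```shell
# Sample the "time spent doing I/Os" counter twice and convert the
# delta to a utilization percentage over a 1-second interval.
# Field 3 of /proc/diskstats is the device name, field 13 is io_ticks (ms).
DEV=${1:-$(awk 'NR==1 {print $3}' /proc/diskstats)}
t1=$(awk -v d="$DEV" '$3 == d {print $13}' /proc/diskstats)
sleep 1
t2=$(awk -v d="$DEV" '$3 == d {print $13}' /proc/diskstats)
BUSY=$(( (t2 - t1) / 10 ))   # ms busy out of 1000 ms elapsed -> percent
echo "$DEV busy: ${BUSY}%"
```

A device can be "100% busy" by this measure while moving almost no data, which is exactly the pattern described: the counter advances whenever any I/O is in flight, regardless of size.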

@brauliobo (Author) commented Aug 27, 2023

Thanks for the detailed feedback @natoscott.

Below is a screenshot of the I/O tab. The grey is indeed the I/O color. I also added the PROCESSOR column; there might be some overlap between the I/O on processor 11 and kwin_x11's CPU usage, but I'm still uncertain.
[screenshot: I/O tab]

A weird fact I forgot to mention: this never happens when running htop with sudo, only with my personal user. Here is a sample screenshot of sudo htop:
[screenshot: sudo htop]

When running blktrace on the device where the root and home partitions are, I get the following output:

✗  sudo blktrace -d /dev/nvme0n1  
=== nvme0n1 ===
  CPU  0:                    0 events,        0 KiB data
  CPU  1:                   18 events,        1 KiB data
  CPU  2:                    0 events,        0 KiB data
  CPU  3:                    0 events,        0 KiB data
  CPU  4:                    0 events,        0 KiB data
  CPU  5:                    0 events,        0 KiB data
  CPU  6:                    0 events,        0 KiB data
  CPU  7:                  310 events,       15 KiB data
  CPU  8:                    1 events,        1 KiB data
  CPU  9:                    0 events,        0 KiB data
  CPU 10:                    0 events,        0 KiB data
  CPU 11:                    0 events,        0 KiB data
  Total:                   329 events (dropped 0),       16 KiB data

I also have a second NVMe, 3 USB HDDs, and 1 SATA HDD in the same machine, but they are mostly inactive, as they are there for backups and bulk file storage.

@natoscott (Member) commented

Can you run 'sudo htop' with the same htoprc file as your regular user? (There's no Disk IO meter shown there currently.)

And is that blktrace output from the same time htop was showing 100% I/O? (And how long was blktrace running there?)

@brauliobo (Author) commented

With the same htoprc, sudo htop now also shows the same busy-disk behavior. Here is the htoprc config file:
[screenshot: htoprc]

Here is a new, roughly 1-minute run of blktrace while htop was showing 100% disk usage:

✗  sudo blktrace -d /dev/nvme0n1
=== nvme0n1 ===
  CPU  0:                    0 events,        0 KiB data
  CPU  1:                    0 events,        0 KiB data
  CPU  2:                    0 events,        0 KiB data
  CPU  3:                   70 events,        4 KiB data
  CPU  4:                  131 events,        7 KiB data
  CPU  5:                    0 events,        0 KiB data
  CPU  6:                    0 events,        0 KiB data
  CPU  7:                    0 events,        0 KiB data
  CPU  8:                    0 events,        0 KiB data
  CPU  9:                 1021 events,       48 KiB data
  CPU 10:                    0 events,        0 KiB data
  CPU 11:                    1 events,        1 KiB data
  Total:                  1223 events (dropped 0),       58 KiB data

I've run it on all the other block devices as well, and each of them returned only 1 event.

@natoscott
Copy link
Member

Interesting! There are two things that spring to mind then:

  • it may be something the htop systemd meter is doing (talking to systemd) that is causing it (somehow, no idea how)
  • I wonder if NFS may be involved here (hence nfsdcld there)... is it possible there is NFS traffic happening? That may not be visible to blktrace. This is a bit of a stretch though; I'd be more inclined to suspect the first option.

Try switching off the htop systemd meter next and see if the problem goes away.

@brauliobo (Author) commented Aug 28, 2023

I've had this issue for a long time, even before the systemd meter was there. I removed it just now to test, but still no change.

NFS is indeed a suspect. I stopped nfsd and nfsdcld without any change.
Another thing in use is sshfs. I tried unmounting 3 mount points to 2 other PCs from this machine, still no change.

Now I've finally found the option whose disabling partially removes the issue: Detailed CPU time (System/IO-Wait/Hard-IRQ/Soft-IRQ/Steal/Guest). Disabling it removes the grey bar from the CPU meters, but htop still shows 100% disk activity:
[screenshot]

@natoscott (Member) commented

I suspect that's just hiding the issue by not showing I/O wait anymore. See if 'iostat 1 10' shows any discard I/O; that's the only other thing I can think of that might explain it.

@BenBE BenBE added question ❔ Further information is requested support request This is not a code issue but merely a support request. Please use the mailing list or IRC instead. Linux 🐧 Linux related issues labels Aug 28, 2023
@brauliobo (Author) commented

The 100% disk issue has been gone for a while. I'll check iostat once it's back. Thank you @natoscott for your kind support! I'll close the issue, as it is definitely a question rather than an htop bug.

@eworm-de (Contributor) commented

I guess this is a duplicate of #1278... Did you install relevant kernel updates?

@natoscott (Member) commented

@eworm-de the difference here is that the Disk IO meter also shows 100% use (so both the CPU metrics and diskstats are showing I/O time), whereas in #1278 the Disk IO meter shows 0%.

@brauliobo (Author) commented Aug 28, 2023

> I guess this is a duplicate of #1278... Did you install relevant kernel updates?

I think so, here it is:

✗  uname -a
Linux bhavapower 6.4.12-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 24 Aug 2023 00:38:14 +0000 x86_64 GNU/Linux

@brauliobo (Author) commented

Discard is likely the culprit, @natoscott, also because I use btrfs, which is heavy on it. Below is an iostat report:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,84    0,00    0,84   35,28    0,00   63,04

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
nvme0n1          10,00         0,00         0,00      1544,00          0          0       1544
nvme1n1           0,00         0,00         0,00         0,00          0          0          0
sda             131,00      2096,00         0,00         0,00       2096          0          0
sdb               0,00         0,00         0,00         0,00          0          0          0
sdc             250,00      2000,00      2000,00         0,00       2000       2000          0
sdd               0,00         0,00         0,00         0,00          0          0          0

Maybe htop should show discard activity right after reads and writes?
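As a side note, the discard counters such a display could draw on already exist in /proc/diskstats on kernels 4.18 and later (fields 15-18). A rough sketch that samples discarded data per second, with the device name chosen automatically for illustration:

```shell
# Sample "sectors discarded" (field 17 of /proc/diskstats, 512-byte
# sectors, kernel >= 4.18) twice, one second apart.
DEV=${1:-$(awk 'NR==1 {print $3}' /proc/diskstats)}
s1=$(awk -v d="$DEV" '$3 == d {print $17}' /proc/diskstats)
sleep 1
s2=$(awk -v d="$DEV" '$3 == d {print $17}' /proc/diskstats)
DSCD_KB=$(( (s2 - s1) / 2 ))   # 512-byte sectors -> KiB
echo "$DEV: ${DSCD_KB} kB discarded/s"
```

On a drive with slow discard handling, this number can be high while read/write throughput stays near zero, matching the iostat output above.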

@BenBE BenBE added enhancement Extension or improvement to existing feature and removed question ❔ Further information is requested labels Aug 28, 2023
@natoscott (Member) commented

@brauliobo good idea. The pcp-htop build of htop from #1243 adds a new "Disks" tab that shows the iostat output above, so once that's merged it will be another way htop can help solve this class of issue.

@brauliobo (Author) commented Aug 29, 2023

Some general advice for others having this issue: it might be caused by very slow discard handling on Seagate FireCuda NVMe drives. This is one of the reports: https://bbs.archlinux.org/viewtopic.php?id=264119

A workaround is to enable fstrim.timer for weekly discards and mount the btrfs partitions with the nodiscard option.
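A sketch of that workaround; the fstab line is illustrative (adapt the UUID and options to your own system):

```shell
# Enable the systemd timer that runs fstrim weekly on mounted filesystems.
sudo systemctl enable --now fstrim.timer

# Disable online discard for the btrfs root right away by remounting...
sudo mount -o remount,nodiscard /

# ...and make it persistent in /etc/fstab (illustrative line):
# UUID=xxxx-xxxx  /  btrfs  rw,compress=zstd,nodiscard  0  0
```

With nodiscard, freed blocks are no longer trimmed synchronously on every delete; the weekly fstrim batch handles them instead, which sidesteps the drive's slow per-discard latency.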

@brauliobo (Author) commented Aug 29, 2023

And the definitive fix is to update the drive's firmware, which can be downloaded from https://apps1.seagate.com/downloads/certificate.html?action=performDownload&key=203660060052, using the commands below.

Fetch the serial number:

udevadm info --query=all --name=/dev/nvme0n1 | grep ID_SERIAL

Then load and activate the firmware:

sudo command\ line\ tools/SeaChest/ubuntu-20.04_x86_64/SeaChest_Firmware_x86_64-linux-gnu -d /dev/nvme0n1 --downloadFW firmware/FireCuda510-STOSC017.bin
sudo command\ line\ tools/SeaChest/ubuntu-20.04_x86_64/SeaChest_Firmware_x86_64-linux-gnu -d /dev/nvme0n1 --activateFW firmware/FireCuda510-STOSC017.bin

Unfortunately, fwupd didn't provide any updates for it.
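An alternative sketch (not from this thread) using nvme-cli instead of Seagate's SeaChest tool; the device path, firmware filename, and the slot/action values are assumptions, so check `nvme id-ctrl` for your drive's firmware-update capabilities before running anything like this:

```shell
# Download the firmware image to the controller...
sudo nvme fw-download /dev/nvme0 --fw=FireCuda510-STOSC017.bin

# ...then commit it: --action=1 activates the downloaded image on the
# next controller reset; --slot depends on the drive (see nvme id-ctrl).
sudo nvme fw-commit /dev/nvme0 --slot=1 --action=1
```

Both routes ultimately issue the same NVMe Firmware Image Download / Firmware Commit admin commands.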
