
LXCFS high CPU Usage with high number of CTs #655

Open
ahmedshehata101 opened this issue Jul 12, 2024 · 7 comments
Labels: Incomplete (Waiting on more information from reporter)

Comments

@ahmedshehata101

ahmedshehata101 commented Jul 12, 2024

Required information

  • Distribution: Debian GNU/Linux 12 (bookworm)

  • LXCFS version: 5.0.3

  • The output of
    • Kernel version:
      6.5.11-6-pve SMP PREEMPT_DYNAMIC PMX 6.5.11-6 (2023-11-29T08:32Z) x86_64 GNU/Linux

    • cat /proc/1/mounts
      sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
      proc /proc proc rw,relatime 0 0
      udev /dev devtmpfs rw,nosuid,relatime,size=65908900k,nr_inodes=16477225,mode=755,inode64 0 0
      devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
      tmpfs /run tmpfs rw,nosuid,nodev,noexec,relatime,size=13188516k,mode=755,inode64 0 0
      /dev/mapper/pve-root / ext4 rw,relatime,errors=remount-ro 0 0
      securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
      tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
      tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,inode64 0 0
      tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,inode64 0 0
      cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
      cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,name=systemd 0 0
      pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
      efivarfs /sys/firmware/efi/efivars efivarfs rw,nosuid,nodev,noexec,relatime 0 0
      bpf /sys/fs/bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700 0 0
      cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
      cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
      cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
      cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
      cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
      cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
      cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
      cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
      cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
      cgroup /sys/fs/cgroup/rdma cgroup rw,nosuid,nodev,noexec,relatime,rdma 0 0
      cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
      cgroup /sys/fs/cgroup/misc cgroup rw,nosuid,nodev,noexec,relatime,misc 0 0
      systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=62656 0 0
      mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0
      tracefs /sys/kernel/tracing tracefs rw,nosuid,nodev,noexec,relatime 0 0
      hugetlbfs /dev/hugepages hugetlbfs rw,relatime,pagesize=2M 0 0
      debugfs /sys/kernel/debug debugfs rw,nosuid,nodev,noexec,relatime 0 0
      fusectl /sys/fs/fuse/connections fusectl rw,nosuid,nodev,noexec,relatime 0 0
      configfs /sys/kernel/config configfs rw,nosuid,nodev,noexec,relatime 0 0
      ramfs /run/credentials/systemd-sysusers.service ramfs ro,nosuid,nodev,noexec,relatime,mode=700 0 0
      ramfs /run/credentials/systemd-tmpfiles-setup-dev.service ramfs ro,nosuid,nodev,noexec,relatime,mode=700 0 0
      ramfs /run/credentials/systemd-sysctl.service ramfs ro,nosuid,nodev,noexec,relatime,mode=700 0 0
      /dev/sda2 /boot/efi vfat rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro 0 0
      ramfs /run/credentials/systemd-tmpfiles-setup.service ramfs ro,nosuid,nodev,noexec,relatime,mode=700 0 0
      binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,nosuid,nodev,noexec,relatime 0 0
      sunrpc /run/rpc_pipefs rpc_pipefs rw,relatime 0 0
      /dev/fuse /etc/pve fuse rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other 0 0
      tmpfs /run/user/0 tmpfs rw,nosuid,nodev,relatime,size=13188512k,nr_inodes=3297128,mode=700,inode64 0 0
      /dev/sdb1 /containers ext4 rw,relatime,errors=remount-ro 0 0
      tracefs /sys/kernel/debug/tracing tracefs rw,nosuid,nodev,noexec,relatime 0 0
      lxcfs /var/lib/lxcfs fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0

    • ps aux | grep lxcfs
      "root 2475805 630 0.0 757080 20768 ? Ssl Apr07 868518:31 /usr/bin/lxcfs /var/lib/lxcfs"

In case you need more info about the Proxmox version, here is the pveversion output:
proxmox-ve: 8.1.0 (running kernel: 6.5.11-6-pve)
pve-manager: 8.1.3 (running version: 8.1.3/b46aac3b42da5d15)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.5: 6.5.11-7
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.5.11-6-pve-signed: 6.5.11-6
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
ceph-fuse: 17.2.7-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx7
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.2-1
proxmox-backup-file-restore: 3.1.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.3
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-2
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.4
pve-qemu-kvm: 8.1.2-4
pve-xtermjs: 5.3.0-2
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve1

Issue description

I am encountering an issue with my LXC containers: LXCFS uses around 500% of a CPU and the execution of commands inside the containers is very slow.

While digging inside one of my LXC containers, I found that any command that accesses one of the directories mounted by LXCFS takes a long time, which delays the command's execution.

I am running this node under Proxmox 8.1 and LXC 5.0.2 with 100 running CTs.
The host node has 56 CPUs and 128 GB of memory.

If any more info is needed, please let me know.

Thank you

@eebssk1

eebssk1 commented Sep 10, 2024

Looks similar to #659.
Too many /proc accesses hog lxcfs and effectively slow down any command in the containers.

@stgraber
Member

Similar to #659, LXCFS doesn't really go and read files for fun, anything it does is as a response to an action inside a container.

So it may be interesting to try to track down what frequent reads may be coming from within the containers and see if there's a way to reduce those.

For those using Incus, one interesting option may be to run a per-container LXCFS instance (instances.lxcfs.per_instance=true) as that will make it impossible for one container to impact others as well as making it much easier to track down what container may be hitting LXCFS just a bit too hard.
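For reference on an Incus host, enabling that option would look something like the sketch below (the key name comes from the comment above; the exact CLI form and whether affected instances need a restart afterwards are assumptions):

    # Server-wide Incus setting: run a separate LXCFS per instance
    incus config set instances.lxcfs.per_instance true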

stgraber added the Incomplete (Waiting on more information from reporter) label Sep 15, 2024
@ahmedshehata101
Author

Thank you @stgraber for the explanation, but what do you recommend? Is it that LXCFS cannot handle this many LXC containers?
I had seen someone in the community say that LXCFS CPU usage increases as the number of containers grows, but I didn't imagine it would affect the responsiveness of the server when running any command.

In my environment, all my containers come from the same template and run a DB plus other processes that read from and write to that DB.

@stgraber
Member

  1. Figure out what the container is doing. It's not normal for a container to spam /proc enough that LXCFS is unable to handle the requests. In 99% of cases, you take care of the problem by fixing whatever defective software is running.

  2. If you can't solve this with 1), then run one LXCFS instance per container instead of a shared one.
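For step 1, one rough way to see which processes keep re-reading LXCFS-backed files is to trace openat() calls from the host. This is a sketch, assuming bpftrace is available on the node; /proc/stat is just one of the files LXCFS virtualizes (swap in /proc/meminfo, /proc/cpuinfo, /proc/uptime, etc. as needed):

    # Report every process (host or container) that opens /proc/stat
    bpftrace -e 'tracepoint:syscalls:sys_enter_openat
      /str(args->filename) == "/proc/stat"/
      { printf("%s (pid %d) opened %s\n", comm, pid, str(args->filename)); }'

A container process that shows up hundreds of times per second here is the likely culprit.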

@ahmedshehata101
Author

Regarding the first point: I have 57 processes per container, and with 100 containers per node that is 5700 processes whose /proc entries LXCFS would need to serve, if I am understanding this correctly.

For the second point, how can I run one LXCFS instance per container in a Proxmox environment? As you mentioned, that option is in Incus.

Thank you for your responses.

@stgraber
Member

LXCFS doesn't scan or watch your processes; it responds to specific files being read by them. You can easily run 20k+ containers on your system and see 0% CPU usage on LXCFS, just as you can have only 2 containers causing 100% CPU usage on LXCFS.

That's why step 1) is to try to figure out what's going on in your container. A normally behaving process should only read stuff from /proc on startup, then maybe every few seconds if it's a monitoring daemon of some kind. More frequent accesses are usually a bug.
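To see exactly which files are involved: on this host LXCFS is mounted at /var/lib/lxcfs (per the mount output above), and only reads of the virtualized proc files and cgroup tree it exposes there ever reach the lxcfs process. A quick way to list them (a sketch):

    # The virtualized proc files served by LXCFS on this node
    ls /var/lib/lxcfs/proc/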

@ahmedshehata101
Author

ahmedshehata101 commented Sep 26, 2024

Hi @stgraber,
I did some analysis with the perf tool and found the following.
This screenshot is at the node level:
[screenshot: perf output at the node level]
and this screenshot is at the container level:
[screenshot: perf output at the container level]

The overhead from lxc-info at the container level and from lxcfs at the node level is constant; it never stops.

Can you explain why this is happening?

Hint: the main LXCFS PID is 64968.
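For reference, views like the ones above are typically produced with commands along these lines (a sketch, not necessarily the exact invocations used for the screenshots):

    perf top                                           # node level, system-wide
    perf top -p 64968                                  # focus on the lxcfs process
    perf record -g -p 64968 -- sleep 30 && perf report # sampled call-graph profile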

Thank You in advance
