Warm migration fails due to reported available storage being too low #3567

Open

arturshadnik opened this issue Dec 13, 2024 · 1 comment

@arturshadnik

What happened:
When warm migrating a VM with one 10GiB disk and one 1TiB disk from vSphere using Forklift, the migration fails to import the smaller disk during the final disk import stage. The error from the CDI importer pod is:
virtual image size 10737418240 is larger than the reported available storage 9428545536. A larger PVC is required. Unable to resize disk image to requested size.

I have the filesystem overhead set to 15% for this storage class, and CDI respects it: the DataVolume is created with size = 10GiB, and the corresponding PVC is 12048MiB.

Despite the ~2GiB filesystem overhead, the reported available storage is smaller than the disk being imported.

When migrating the same VM using a 25% overhead, the import succeeds.

What you expected to happen:
A 15% filesystem overhead should be sufficient to import the 10GiB disk. More broadly, a fixed percentage overhead should work for all disk sizes.

How to reproduce it (as minimally and precisely as possible):
DataVolume Spec:

spec:
  checkpoints:
  - current: snapshot-143404
    previous: ""
  - current: snapshot-143408
    previous: 52 78 e6 49 93 9e f7 72-af 8a 71 8e 94 80 2f 8e/1
  finalCheckpoint: true
  source:
    vddk:
      backingFile: 'xxx/test-mig-fedora.vmdk'
      initImageURL: docker.io/arturshadnik/vddk:v8.0.3
      secretRef: test-1tb-warm-vm-139311-hfh84
      thumbprint: xxx
      url: https://<vcenter>/sdk
      uuid: xxx
  storage:
    resources:
      requests:
        storage: 10Gi
    storageClassName: spectro-storage-class

Prime PVC for this DV:

spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: "12632256753"
  storageClassName: spectro-storage-class
  volumeMode: Filesystem
  volumeName: pvc-e37d0553-8efc-45f0-ac95-9cd4c2a27f77
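
For reference, the prime PVC request above is consistent with scaling the virtual size by 1 / (1 - overhead). The sketch below is my own back-of-the-envelope check of that assumption, not necessarily CDI's exact formula:

package main

import (
	"fmt"
	"math"
)

func main() {
	// 10Gi requested in the DataVolume, 15% filesystem overhead for the storage class.
	const virtualSize = 10 * 1024 * 1024 * 1024
	const overhead = 0.15

	// Assumed scaling: request enough capacity that (1 - overhead) of it still
	// holds the full virtual disk image.
	required := int64(math.Ceil(virtualSize / (1 - overhead)))
	fmt.Println(required) // 12632256753, matching the prime PVC request above
}

With a 25% overhead the same arithmetic gives roughly 14.3GB, which is presumably the extra headroom that lets that migration succeed.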

StorageClass:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: spectro-storage-class
parameters:
  fstype: ext4
provisioner: csi.vsphere.vmware.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

Additional context:
I understand that a certain percentage of overhead is needed when importing with Filesystem volumeMode, but I've observed that the percentage needs to be increased as the virtual disk size increases. My understanding is that a given percentage should be sufficient across all disk sizes, since the actual overhead in bytes would grow in proportion to the virtual disk.

Some other examples we've observed:

  • 16% overhead is sufficient for an 80GB VM, but not for a 500GB VM.
  • 20% overhead is not sufficient for a 2TB VM.

Perhaps I am missing something: why should the percentage overhead have to increase as the disk size increases? Shouldn't the relative nature of a percentage account for this? Any guidance on how these percentages are used, especially in the context of multi-stage imports, is greatly appreciated 🙏
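
For anyone trying to reproduce or diagnose this, the "reported available storage" figure in the error appears to be the space actually available on the mounted target filesystem. The helper below is my own diagnostic sketch, not CDI code; it assumes the target PVC is mounted at /data in a debug pod and reports the statfs-derived available bytes:

package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	// Assumed mount point of the target PVC inside a debug pod.
	path := "/data"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}

	var st unix.Statfs_t
	if err := unix.Statfs(path, &st); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Bavail * Bsize is what an unprivileged process can actually use; on ext4 it
	// already excludes reserved blocks and filesystem metadata, so it is noticeably
	// smaller than the PVC's nominal capacity.
	fmt.Println(int64(st.Bavail) * st.Bsize)
}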

Environment:

  • CDI version (use kubectl get deployments cdi-deployment -o yaml): 1.58.0
  • Kubernetes version (use kubectl version): 1.29.7
  • DV specification: N/A
  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): N/A
  • Kernel (e.g. uname -a): N/A
  • Install tools: N/A
  • Others: N/A

@akalenyu (Collaborator)

This sounds like something that is fixed in newer versions:
#3473
