Warm migration fails due to reported available storage being too low #3567

Open

arturshadnik opened this issue Dec 13, 2024 · 1 comment

@arturshadnik

What happened:
When warm migrating a VM with one 10GiB disk and one 1TiB disk from vSphere using Forklift, the migration fails to import the smaller disk during the final disk import stage. The error from the CDI importer pod is:
virtual image size 10737418240 is larger than the reported available storage 9428545536. A larger PVC is required. Unable to resize disk image to requested size.

I have the filesystem overhead set to 15% for this storage class, and CDI respects it: the DataVolume is created with size = 10GiB, and the corresponding PVC is 12048MiB.

Despite the ~2GiB filesystem overhead, the reported available storage is smaller than the disk being imported.

When migrating the same VM using a 25% overhead, the import succeeds.

What you expected to happen:
A 15% filesystem overhead should be sufficient to import the 10GiB disk. More broadly, a fixed percentage overhead should work for all disk sizes.

How to reproduce it (as minimally and precisely as possible):
DataVolume Spec:

spec:
  checkpoints:
  - current: snapshot-143404
    previous: ""
  - current: snapshot-143408
    previous: 52 78 e6 49 93 9e f7 72-af 8a 71 8e 94 80 2f 8e/1
  finalCheckpoint: true
  source:
    vddk:
      backingFile: 'xxx/test-mig-fedora.vmdk'
      initImageURL: docker.io/arturshadnik/vddk:v8.0.3
      secretRef: test-1tb-warm-vm-139311-hfh84
      thumbprint: xxx
      url: https://<vcenter>/sdk
      uuid: xxx
  storage:
    resources:
      requests:
        storage: 10Gi
    storageClassName: spectro-storage-class

Prime PVC for this DV:

spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: "12632256753"
  storageClassName: spectro-storage-class
  volumeMode: Filesystem
  volumeName: pvc-e37d0553-8efc-45f0-ac95-9cd4c2a27f77
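
For reference, the prime PVC request above is consistent with scaling the virtual size by 1 / (1 - overhead). The sketch below is my own back-of-the-envelope check of that assumption, not necessarily CDI's exact formula:

package main

import (
	"fmt"
	"math"
)

func main() {
	// 10Gi requested in the DataVolume, 15% filesystem overhead for the storage class.
	const virtualSize = 10 * 1024 * 1024 * 1024
	const overhead = 0.15

	// Assumed scaling: request enough capacity that (1 - overhead) of it still
	// holds the full virtual disk image.
	required := int64(math.Ceil(virtualSize / (1 - overhead)))
	fmt.Println(required) // 12632256753, matching the prime PVC request above
}

With a 25% overhead the same arithmetic gives roughly 14.3GB, which is presumably the extra headroom that lets that migration succeed.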

StorageClass:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: spectro-storage-class
parameters:
  fstype: ext4
provisioner: csi.vsphere.vmware.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

Additional context:
I understand that a certain percentage of overhead is needed when importing with Filesystem volumeMode, but I've observed that the percentage needs to be increased as the virtual disk size increases. My understanding is that a given percentage should be sufficient across all disk sizes, since the actual overhead in bytes would grow in proportion to the virtual disk.

Some other examples we've observed:

  • 16% overhead is sufficient for an 80GB VM, but not for a 500GB VM.
  • 20% overhead is not sufficient for a 2TB VM.

Perhaps I am missing something: why should the percentage overhead have to increase as the disk size increases? Shouldn't the relative nature of a percentage account for this? Any guidance on how these percentages are used, especially in the context of multi-stage imports, is greatly appreciated 🙏
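
For anyone trying to reproduce or diagnose this, the "reported available storage" figure in the error appears to be the space actually available on the mounted target filesystem. The helper below is my own diagnostic sketch, not CDI code; it assumes the target PVC is mounted at /data in a debug pod and reports the statfs-derived available bytes:

package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	// Assumed mount point of the target PVC inside a debug pod.
	path := "/data"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}

	var st unix.Statfs_t
	if err := unix.Statfs(path, &st); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Bavail * Bsize is what an unprivileged process can actually use; on ext4 it
	// already excludes reserved blocks and filesystem metadata, so it is noticeably
	// smaller than the PVC's nominal capacity.
	fmt.Println(int64(st.Bavail) * st.Bsize)
}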

Environment:

  • CDI version (use kubectl get deployments cdi-deployment -o yaml): 1.58.0
  • Kubernetes version (use kubectl version): 1.29.7
  • DV specification: N/A
  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): N/A
  • Kernel (e.g. uname -a): N/A
  • Install tools: N/A
  • Others: N/A

@akalenyu (Collaborator)

This sounds like something that is fixed in newer versions:
#3473
