
clone dv, and tmp-pvc lost #3470

Open
wywself opened this issue Oct 23, 2024 · 11 comments

wywself commented Oct 23, 2024

What happened:

  1. Create a new namespace: wyw-test-dv
  2. Clone a DV from a PVC in another namespace:
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: DV-NAME
  namespace: wyw-test-dv
spec:
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Gi
    storageClassName: ceph-block
    volumeMode: Block
  source:
    pvc:
      name: img-uvt3cv6k-ceph-block
      namespace: default
  3. Clone 50 DVs concurrently:
#!/bin/bash

count=0
while [ "$count" -lt 50 ]
do
  count=$((count + 1))
  length=5
  random_string=$(tr -dc 'a-z0-9' < /dev/urandom | head -c "$length")
  echo "$random_string"

  sed "s/DV-NAME/$random_string/g" /root/batch-create/batch-dv.yaml > /root/batch-create/batch-dv-1.yaml

  kubectl apply -f /root/batch-create/batch-dv-1.yaml
done
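As a side note, the substitution step in that loop can be exercised locally without a cluster; below, an inline template stands in for /root/batch-create/batch-dv.yaml (the `kubectl apply` is the only part that needs the cluster):

```shell
# Generate a 5-character lowercase alphanumeric name and substitute it into
# a minimal inline template, mirroring the loop body above (no cluster needed).
template='metadata:
  name: DV-NAME'
random_string=$(tr -dc 'a-z0-9' < /dev/urandom | head -c 5)
rendered=$(printf '%s\n' "$template" | sed "s/DV-NAME/$random_string/g")
printf '%s\n' "$rendered"
```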
  4. One DV is stuck in RebindInProgress:
# kubectl get dv -n wyw-test-dv
lk4w5   RebindInProgress   N/A                   31m
# kubectl get pvc -A | grep -i lost
default                 tmp-pvc-5ce2f242-2cee-417e-8006-1353c7d1e478                              Lost      pvc-cf2fe6e7-7911-4889-a9e0-887d824e88fc   0                         ceph-block     32m
  5. The tmp PVC is Lost:
# kubectl get pvc tmp-pvc-5ce2f242-2cee-417e-8006-1353c7d1e478  -o yaml 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/clonePhase: Pending
    cdi.kubevirt.io/cloneType: csi-clone
    cdi.kubevirt.io/dataSourceNamespace: default
    cdi.kubevirt.io/storage.clone.token: eyJhbGciOiJQUzI1NiJ9.eyJleHAiOjE3Mjk2NjU0NzYsImlhdCI6MTcyOTY2NTE3NiwiaXNzIjoiY2RpLWFwaXNlcnZlciIsIm5hbWUiOiJpbWctdXZ0M2N2NmstY2VwaC1ibG9jayIsIm5hbWVzcGFjZSI6ImRlZmF1bHQiLCJuYmYiOjE3Mjk2NjUxNzYsIm9wZXJhdGlvbiI6IkNsb25lIiwicGFyYW1zIjp7InRhcmdldE5hbWUiOiJsazR3NSIsInRhcmdldE5hbWVzcGFjZSI6Ind5dy10ZXN0LWR2In0sInJlc291cmNlIjp7Imdyb3VwIjoiIiwicmVzb3VyY2UiOiJwZXJzaXN0ZW50dm9sdW1lY2xhaW1zIiwidmVyc2lvbiI6InYxIn19.gBjwqBJFGX5L3Oua2MSkqMGZCxD9mVHgl91snaegN_b-xOYw2AdcLtRLkBxZRjQltqecC1LG-1nxafetyayzzBOGpoltQblmrUFc9ziQUJZDwbzfVtOxQVq38ouihOnbrpsxqqVXdnwFtEslfxItu2OJx6H5d55AsNoYSHyL-RoA79Z_MYxlCrXSG3fMpYh2gQiS5EV6rocO-2K2Q_ggCjvJVixkwx3DpxLUSEqkHJ556Pk6vVtPihfY3-hwNxznqx9mlhsNr3avzgk7CzY20h4PVp07e7XXHwMQpl90j1077crFZtA_voZ42IvOblbDABA2SJR6AtVPI3pVjUO2gA
    cdi.kubevirt.io/storage.contentType: kubevirt
    cdi.kubevirt.io/storage.pod.restarts: "0"
    cdi.kubevirt.io/storage.populator.kind: VolumeCloneSource
    cdi.kubevirt.io/storage.preallocation.requested: "false"
    cdi.kubevirt.io/storage.usePopulator: "true"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"cdi.kubevirt.io/v1beta1","kind":"DataVolume","metadata":{"annotations":{},"name":"lk4w5","namespace":"wyw-test-dv"},"spec":{"pvc":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"50Gi"}},"storageClassName":"ceph-block","volumeMode":"Block"},"source":{"pvc":{"name":"img-uvt3cv6k-ceph-block","namespace":"default"}}}}
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
  creationTimestamp: "2024-10-23T06:33:02Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
    cdi.kubevirt.io/OwnedByUID: 5ce2f242-2cee-417e-8006-1353c7d1e478
  name: tmp-pvc-5ce2f242-2cee-417e-8006-1353c7d1e478
  namespace: default
  resourceVersion: "34759260"
  uid: cf2fe6e7-7911-4889-a9e0-887d824e88fc
spec:
  accessModes:
  - ReadWriteOnce
  dataSource:
    apiGroup: null
    kind: PersistentVolumeClaim
    name: img-uvt3cv6k-ceph-block
  dataSourceRef:
    apiGroup: null
    kind: PersistentVolumeClaim
    name: img-uvt3cv6k-ceph-block
  resources:
    requests:
      storage: 50Gi
  storageClassName: ceph-block
  volumeMode: Block
  volumeName: pvc-cf2fe6e7-7911-4889-a9e0-887d824e88fc
status:
  phase: Lost
  6. The PV that the tmp PVC requested to bind is already bound to the target DV's claim:
# kubectl get pv pvc-cf2fe6e7-7911-4889-a9e0-887d824e88fc 
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
pvc-cf2fe6e7-7911-4889-a9e0-887d824e88fc   40Gi       RWO            Retain           Bound    wyw-test-dv/lk4w5   ceph-block              34m

# kubectl get pv pvc-cf2fe6e7-7911-4889-a9e0-887d824e88fc -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: rook-ceph.rbd.csi.ceph.com
    volume.kubernetes.io/provisioner-deletion-secret-name: rook-csi-rbd-provisioner
    volume.kubernetes.io/provisioner-deletion-secret-namespace: rook-ceph
  creationTimestamp: "2024-10-23T06:33:19Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-cf2fe6e7-7911-4889-a9e0-887d824e88fc
  resourceVersion: "34759226"
  uid: ffd38b82-01e6-422c-9fd5-ff96333549a8
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 40Gi
  claimRef:
    name: lk4w5
    namespace: wyw-test-dv
    resourceVersion: "34757380"
    uid: 5ce2f242-2cee-417e-8006-1353c7d1e478
  csi:
    controllerExpandSecretRef:
      name: rook-csi-rbd-provisioner
      namespace: rook-ceph
    driver: rook-ceph.rbd.csi.ceph.com
    nodeStageSecretRef:
      name: rook-csi-rbd-node
      namespace: rook-ceph
    volumeAttributes:
      clusterID: rook-ceph
      imageFeatures: layering
      imageFormat: "2"
      imageName: csi-vol-e50f670b-0475-47dc-b606-94fea18bf53b
      journalPool: ceph-blockpool
      pool: ceph-blockpool
      storage.kubernetes.io/csiProvisionerIdentity: 1726927168699-730-rook-ceph.rbd.csi.ceph.com
    volumeHandle: 0001-0009-rook-ceph-0000000000000001-e50f670b-0475-47dc-b606-94fea18bf53b
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ceph-block
  volumeMode: Block
status:
  phase: Bound

What you expected to happen:
All DVs clone successfully.

Environment:

  • CDI version (use kubectl get deployments cdi-deployment -o yaml): 1.58.1
  • Kubernetes version (use kubectl version): 1.27.6

wywself commented Oct 29, 2024

I upgraded CDI to version 1.60.3 and hit the same problem. Could someone take a look? Thank you.

# kubectl get dv -n wyw-test-dv  | grep -v Succeeded
NAME    PHASE                PROGRESS   RESTARTS   AGE
1c7ld   CSICloneInProgress   N/A                   4m39s

# kubectl get pvc | grep tmp
tmp-pvc-1577eabb-fad2-4728-9f0f-bedec895517c   Lost      pvc-55c432cf-30b8-4c68-a3f2-22ef49ef32b3   0                         ceph-block     4m43s

# kubectl describe dv -n wyw-test-dv 1c7ld
Name:         1c7ld
Namespace:    wyw-test-dv
Labels:       <none>
Annotations:  cdi.kubevirt.io/cloneType: csi-clone
              cdi.kubevirt.io/storage.clone.token:
                eyJhbGciOiJQUzI1NiJ9.eyJleHAiOjE3MzAyODE2MzksImlhdCI6MTczMDI4MTMzOSwiaXNzIjoiY2RpLWFwaXNlcnZlciIsIm5hbWUiOiJpbWctZjlqc242ZXMtY2VwaC1ibG9ja...
              cdi.kubevirt.io/storage.extended.clone.token:
                eyJhbGciOiJQUzI1NiJ9.eyJleHAiOjIwNDU2NDEzNDIsImlhdCI6MTczMDI4MTM0MiwiaXNzIjoiY2RpLWRlcGxveW1lbnQiLCJuYW1lIjoiaW1nLWY5anNuNmVzLWNlcGgtYmxvY...
              cdi.kubevirt.io/storage.usePopulator: true
API Version:  cdi.kubevirt.io/v1beta1
Kind:         DataVolume
Metadata:
  Creation Timestamp:  2024-10-30T09:42:19Z
  Finalizers:
    cdi.kubevirt.io/dataVolumeFinalizer
  Generation:        1
  Resource Version:  3764339
  UID:               f05e31f0-8fee-487b-930d-036fdda35778
Spec:
  Pvc:
    Access Modes:
      ReadWriteOnce
    Resources:
      Requests:
        Storage:         50Gi
    Storage Class Name:  ceph-block
    Volume Mode:         Block
  Source:
    Pvc:
      Name:       img-f9jsn6es-ceph-block
      Namespace:  default
Status:
  Claim Name:  1c7ld
  Conditions:
    Last Heartbeat Time:   2024-10-30T09:42:27Z
    Last Transition Time:  2024-10-30T09:42:27Z
    Message:               PVC 1c7ld Pending
    Reason:                Pending
    Status:                False
    Type:                  Bound
    Last Heartbeat Time:   2024-10-30T09:42:53Z
    Last Transition Time:  2024-10-30T09:42:27Z
    Reason:                TransferRunning
    Status:                False
    Type:                  Ready
    Last Heartbeat Time:   2024-10-30T09:42:29Z
    Last Transition Time:  2024-10-30T09:42:29Z
    Reason:                Populator is running
    Status:                True
    Type:                  Running
  Phase:                   CSICloneInProgress
  Progress:                N/A
Events:
  Type    Reason               Age                    From                             Message
  ----    ------               ----                   ----                             -------
  Normal  CloneScheduled       5m                     datavolume-pvc-clone-controller  Cloning from default/img-f9jsn6es-ceph-block into wyw-test-dv/1c7ld scheduled
  Normal  Pending              5m                     datavolume-pvc-clone-controller  PVC 1c7ld Pending
  Normal  PrepClaimInProgress  4m54s                  datavolume-pvc-clone-controller  Prepping PersistentVolumeClaim for DataVolume wyw-test-dv/1c7ld
  Normal  RebindInProgress     4m42s                  datavolume-pvc-clone-controller  Rebinding PersistentVolumeClaim for DataVolume wyw-test-dv/1c7ld
  Normal  CSICloneInProgress   4m34s (x2 over 4m58s)  datavolume-pvc-clone-controller  CSI Volume clone in progress (for pvc default/img-f9jsn6es-ceph-block)

# kubectl get pvc -n wyw-test-dv 1c7ld -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/clonePhase: CSIClone
    cdi.kubevirt.io/cloneType: csi-clone
    cdi.kubevirt.io/createdForDataVolume: f05e31f0-8fee-487b-930d-036fdda35778
    cdi.kubevirt.io/dataSourceNamespace: default
    cdi.kubevirt.io/storage.clone.token: eyJhbGciOiJQUzI1NiJ9.eyJleHAiOjE3MzAyODE2MzksImlhdCI6MTczMDI4MTMzOSwiaXNzIjoiY2RpLWFwaXNlcnZlciIsIm5hbWUiOiJpbWctZjlqc242ZXMtY2VwaC1ibG9jayIsIm5hbWVzcGFjZSI6ImRlZmF1bHQiLCJuYmYiOjE3MzAyODEzMzksIm9wZXJhdGlvbiI6IkNsb25lIiwicGFyYW1zIjp7InRhcmdldE5hbWUiOiIxYzdsZCIsInRhcmdldE5hbWVzcGFjZSI6Ind5dy10ZXN0LWR2In0sInJlc291cmNlIjp7Imdyb3VwIjoiIiwicmVzb3VyY2UiOiJwZXJzaXN0ZW50dm9sdW1lY2xhaW1zIiwidmVyc2lvbiI6InYxIn19.G8ed3fmjdTnUJG2jAoKabaZ6DuQ1fumBeTo7zcZRn7xqpxoljYwxkGmTHEoqzrqb0qTih1YmE9rfIipx9TFDs09KtpSISkcCg_KL6is844r7XSmC49QQYsz0RYKjwi0tw2fwGRyfK3eAZeWGSKnYf3omRS2gFpL16BotR6QV3olKExDIg--qKCjsWVWNVoLqzuUPvOx9yFGTo8Qp6ICtlE7FAkN7NKhs5hbTMEVaapKfFLK2niw7AgGxTbAKhlkhwJwyl4dIwwateKP_lTtaaa5F44DYxvNHhu99WCuShwObRhGk6sMPOrU7WKMhcessdHyGe_G_L11sJnwIaLJY7g
    cdi.kubevirt.io/storage.condition.running: "true"
    cdi.kubevirt.io/storage.condition.running.message: ""
    cdi.kubevirt.io/storage.condition.running.reason: Populator is running
    cdi.kubevirt.io/storage.contentType: kubevirt
    cdi.kubevirt.io/storage.extended.clone.token: eyJhbGciOiJQUzI1NiJ9.eyJleHAiOjIwNDU2NDEzNDgsImlhdCI6MTczMDI4MTM0OCwiaXNzIjoiY2RpLWRlcGxveW1lbnQiLCJuYW1lIjoiaW1nLWY5anNuNmVzLWNlcGgtYmxvY2siLCJuYW1lc3BhY2UiOiJkZWZhdWx0IiwibmJmIjoxNzMwMjgxMzQ4LCJvcGVyYXRpb24iOiJDbG9uZSIsInBhcmFtcyI6eyJ0YXJnZXROYW1lIjoiMWM3bGQiLCJ0YXJnZXROYW1lc3BhY2UiOiJ3eXctdGVzdC1kdiIsInVpZCI6IjE1NzdlYWJiLWZhZDItNDcyOC05ZjBmLWJlZGVjODk1NTE3YyJ9LCJyZXNvdXJjZSI6eyJncm91cCI6IiIsInJlc291cmNlIjoicGVyc2lzdGVudHZvbHVtZWNsYWltcyIsInZlcnNpb24iOiJ2MSJ9fQ.D6eq2K1PkvFIZV4VQecnEwojiGu9E8PUa74_K4mq7smrTet4zuwDXHMXhsH4Y2eycfv_xzIpXo5wNB8vJw0uMhL9N6kNqb1NmGNB0vVmIjMCMDLHCSYbmXlHg8dWiBZp4V9suhui_3Wp8D7ZSdv0bim1U_usPAP853wT9DR5wlJXa9XpKtduKSTvaAR9lVFybaIG9Aa0OTiLHNKzIaaReOlBotSMbfSUdbppIbotooczzlUmbM-YZIJGeJ_eg4ZsAic6qD2r5EIhOSaw9lRl0iwPRU2vUjSGTXNC3NGqSPfbAf89UxPhHK-v90wM8rTmlFaNUM8h1fjZXP_EqzfX_g
    cdi.kubevirt.io/storage.pod.restarts: "0"
    cdi.kubevirt.io/storage.preallocation.requested: "false"
    cdi.kubevirt.io/storage.usePopulator: "true"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"cdi.kubevirt.io/v1beta1","kind":"DataVolume","metadata":{"annotations":{},"name":"1c7ld","namespace":"wyw-test-dv"},"spec":{"pvc":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"50Gi"}},"storageClassName":"ceph-block","volumeMode":"Block"},"source":{"pvc":{"name":"img-f9jsn6es-ceph-block","namespace":"default"}}}}
    volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
  creationTimestamp: "2024-10-30T09:42:27Z"
  finalizers:
  - kubernetes.io/pvc-protection
  - cdi.kubevirt.io/clonePopulator
  labels:
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
  name: 1c7ld
  namespace: wyw-test-dv
  ownerReferences:
  - apiVersion: cdi.kubevirt.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: DataVolume
    name: 1c7ld
    uid: f05e31f0-8fee-487b-930d-036fdda35778
  resourceVersion: "3764335"
  uid: 1577eabb-fad2-4728-9f0f-bedec895517c
spec:
  accessModes:
  - ReadWriteOnce
  dataSource:
    apiGroup: cdi.kubevirt.io
    kind: VolumeCloneSource
    name: volume-clone-source-f05e31f0-8fee-487b-930d-036fdda35778
  dataSourceRef:
    apiGroup: cdi.kubevirt.io
    kind: VolumeCloneSource
    name: volume-clone-source-f05e31f0-8fee-487b-930d-036fdda35778
  resources:
    requests:
      storage: 50Gi
  storageClassName: ceph-block
  volumeMode: Block
status:
  phase: Pending

# kubectl get pvc tmp-pvc-1577eabb-fad2-4728-9f0f-bedec895517c
NAME                                           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
tmp-pvc-1577eabb-fad2-4728-9f0f-bedec895517c   Lost     pvc-55c432cf-30b8-4c68-a3f2-22ef49ef32b3   0                         ceph-block     7m46s
# kubectl get pv pvc-55c432cf-30b8-4c68-a3f2-22ef49ef32b3
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
pvc-55c432cf-30b8-4c68-a3f2-22ef49ef32b3   40Gi       RWO            Retain           Bound    wyw-test-dv/1c7ld   ceph-block              44m


awels commented Oct 29, 2024

@mhenriks Can you take a look?

mhenriks (Member) commented:

Getting similar vibes as this: #3259

Please confirm that the Ceph configuration is correct (StorageClass, CephBlockPools, etc.).


wywself commented Oct 29, 2024

@mhenriks thank you.
Here is my configuration. Before deleting a DV, I change the PV's persistentVolumeReclaimPolicy to Delete. Is there something wrong with these settings?

# kubectl get sc ceph-block -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    meta.helm.sh/release-name: os-rook-ceph-cluster
    meta.helm.sh/release-namespace: default
    storageclass.kubernetes.io/is-default-class: "false"
  creationTimestamp: "2024-10-25T16:28:07Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: ceph-block
  resourceVersion: "2306"
  uid: e59a9040-8ec1-402b-bd84-10e4694cb48b
parameters:
  clusterID: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  imageFeatures: layering
  imageFormat: "2"
  pool: ceph-blockpool
provisioner: rook-ceph.rbd.csi.ceph.com
reclaimPolicy: Retain
volumeBindingMode: Immediate

# kubectl get  cephblockpools -n rook-ceph
NAME             PHASE
ceph-blockpool   Ready

mhenriks (Member) commented:

I would make sure you don't have a bunch of retained PVs that are unneeded and taking up space; that could be causing issues on the backend. A Delete reclaimPolicy may keep that from happening.
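For reference, leftover retained volumes show up as Released in the PV listing. A minimal filter, with a hypothetical two-line sample standing in for the live `kubectl get pv --no-headers` output:

```shell
# Column 5 of `kubectl get pv` output is STATUS; Released PVs are the
# retained, no-longer-claimed volumes (the sample data here is hypothetical).
pv_list='pv-old-one   40Gi   RWO   Retain   Released   default/old-dv      ceph-block
pv-in-use    40Gi   RWO   Retain   Bound      wyw-test-dv/1c7ld   ceph-block'
released=$(printf '%s\n' "$pv_list" | awk '$5 == "Released" {print $1}')
printf '%s\n' "$released"
```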


wywself commented Oct 30, 2024


@mhenriks Thank you. Before creating DVs, I delete all the DVs in the wyw-test-dv namespace along with their associated PVs.

Looking at DV 1c7ld, which has been stuck in CSICloneInProgress: the PV that tmp-pvc-<targetClaim.UID> (created during the DV clone) requested to bind has already been bound to 1c7ld, yet 1c7ld is still Pending and tmp-pvc-<targetClaim.UID> is Lost. Is this expected?

Another point: in the source code the clone phases are CSIClonePhase, PrepClaimPhase, and RebindPhase, but describing the stuck DV 1c7ld shows the event order CloneScheduled -> Pending -> PrepClaimInProgress -> RebindInProgress -> CSICloneInProgress.

For DVs that clone successfully, the event order is CloneScheduled -> Pending -> CSICloneInProgress -> PrepClaimInProgress -> RebindInProgress -> Bound -> CloneSucceeded.

mhenriks (Member) commented:

@wywself it appears that the PV was updated to refer to the target PVC (1c7ld). The kube-controller-manager should then set spec.volumeName of the target PVC to pvc-55c432cf-30b8-4c68-a3f2-22ef49ef32b3, and I'm not sure why that isn't happening. Maybe check the kube-controller-manager log and the events on the PV. CDI waits for the target to be bound before deleting the "Lost" PVC.
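A quick way to see this half-bound state is to compare the PV's claimRef with the target PVC's spec.volumeName; the live commands would be `kubectl get pv pvc-55c432cf-30b8-4c68-a3f2-22ef49ef32b3 -o jsonpath='{.spec.claimRef.namespace}/{.spec.claimRef.name}'` and `kubectl get pvc -n wyw-test-dv 1c7ld -o jsonpath='{.spec.volumeName}'`. Below, the values observed in this issue stand in for the live output:

```shell
# Values from the outputs pasted in this issue: the PV already claims the
# target PVC, but the PVC has no spec.volumeName yet, so it stays Pending.
pv_claim_ref='wyw-test-dv/1c7ld'
pvc_volume_name=''   # empty: kube-controller-manager has not filled it in
if [ -z "$pvc_volume_name" ]; then
  echo "half-bound: PV claims ${pv_claim_ref} but the PVC has no volumeName"
fi
```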


wywself commented Nov 5, 2024

@mhenriks Thank you.

func Rebind(ctx context.Context, c client.Client, source, target *corev1.PersistentVolumeClaim) error {
	pv := &corev1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{
			Name: source.Spec.VolumeName,
		},
	}

	if err := c.Get(ctx, client.ObjectKeyFromObject(pv), pv); err != nil {
		return err
	}

	// Examine the claimref for the PV and see if it's still bound to PVC'
	if pv.Spec.ClaimRef == nil {
		return fmt.Errorf("PV %s claimRef is nil", pv.Name)
	}

	if !IsPVBoundToPVC(pv, source) {
		// Something is not right if the PV is neither bound to PVC' nor target PVC
		if !IsPVBoundToPVC(pv, target) {
			klog.Errorf("PV bound to unexpected PVC: Could not rebind to target PVC '%s'", target.Name)
			return fmt.Errorf("PV %s bound to unexpected claim %s", pv.Name, pv.Spec.ClaimRef.Name)
		}
		// our work is done
		return nil
	}

	// Rebind PVC to target PVC
	pv.Spec.ClaimRef = &corev1.ObjectReference{
		Namespace:       target.Namespace,
		Name:            target.Name,
		UID:             target.UID,
		ResourceVersion: target.ResourceVersion,
	}
	klog.V(3).Info("Rebinding PV to target PVC", "PVC", target.Name)
	if err := c.Update(context.TODO(), pv); err != nil {
		return err
	}

	return nil
}

So in the RebindPhase, after the claimRef of the PV bound to the tmp PVC is updated to point at the target PVC, the tmp PVC becomes Lost; kube-controller-manager then automatically updates the target PVC's volumeName to that PV, and CDI waits for the target to be bound before deleting the "Lost" PVC. Is this description correct?

I reproduced the problem again and found that after the rebind there were error logs such as "Waited for 140.220212ms due to client-side throttling, not priority and fairness, request: PUT:https://10.96.0.1:443/apis/cdi.kubevirt.io/v1beta1/namespaces/wyw-test-dv/datavolumes/oen1301v0f/status". Is this related to the problem, and how can I resolve it? Thank you.

2024-11-04T18:05:49.579932729+08:00 stderr F I1104 10:05:49.579885       1 util.go:1767] Rebinding PV to target PVCPVCoen1301v0f
2024-11-04T18:05:49.584582011+08:00 stderr F 2024-11-04T10:05:49Z	DEBUG	controller.clone-populator	currently in phase, returning	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "name": "Rebind", "progress": ""}
2024-11-04T18:05:49.871078171+08:00 stderr F I1104 10:05:49.870980       1 request.go:629] Waited for 286.34232ms due to client-side throttling, not priority and fairness, request: PUT:https://10.96.0.1:443/api/v1/namespaces/wyw-test-dv/persistentvolumeclaims/oen1301v0f
2024-11-04T18:05:49.878471209+08:00 stderr F 2024-11-04T10:05:49Z	DEBUG	controller.import-controller	reconciling Import PVCs	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}}
2024-11-04T18:05:49.878475211+08:00 stderr F 2024-11-04T10:05:49Z	DEBUG	controller.clone-controller	reconciling Clone PVCs	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}}
2024-11-04T18:05:49.878479272+08:00 stderr F 2024-11-04T10:05:49Z	DEBUG	controller.clone-controller	PVC annotation not found, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "annotation": "k8s.io/CloneRequest"}
2024-11-04T18:05:49.87848791+08:00 stderr F 2024-11-04T10:05:49Z	DEBUG	controller.clone-controller	PVC annotation not found, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "annotation": "k8s.io/CloneRequest"}
2024-11-04T18:05:49.878526453+08:00 stderr F 2024-11-04T10:05:49Z	DEBUG	controller.clone-controller	PVC not bound, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "Phase": "Pending"}
2024-11-04T18:05:49.878542156+08:00 stderr F 2024-11-04T10:05:49Z	DEBUG	controller.clone-controller	Should not reconcile this PVC	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "checkPVC(AnnCloneRequest)": false, "NOT has annotation(AnnCloneOf)": true, "isBound": false, "has finalizer?": false}
2024-11-04T18:05:49.878555357+08:00 stderr F 2024-11-04T10:05:49Z	DEBUG	controller.import-controller	PVC annotation not found, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "annotation": "cdi.kubevirt.io/storage.import.endpoint"}
2024-11-04T18:05:49.878568848+08:00 stderr F 2024-11-04T10:05:49Z	DEBUG	controller.upload-controller	reconciling Upload PVCs	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}}
2024-11-04T18:05:49.87859603+08:00 stderr F 2024-11-04T10:05:49Z	DEBUG	controller.upload-controller	PVC not bound, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "Phase": "Pending"}
2024-11-04T18:05:49.878602483+08:00 stderr F 2024-11-04T10:05:49Z	DEBUG	controller.upload-controller	not doing anything with PVC	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "isUpload": false, "isCloneTarget": false, "isBound": false, "podSucceededFromPVC": false, "deletionTimeStamp set?": false}
2024-11-04T18:05:49.878647172+08:00 stderr F 2024-11-04T10:05:49Z	DEBUG	controller.import-controller	PVC annotation not found, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "annotation": "cdi.kubevirt.io/storage.import.source"}
2024-11-04T18:05:49.878660093+08:00 stderr F 2024-11-04T10:05:49Z	DEBUG	controller.import-controller	PVC annotation not found, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "annotation": "cdi.kubevirt.io/storage.import.endpoint"}
2024-11-04T18:05:49.878664462+08:00 stderr F 2024-11-04T10:05:49Z	DEBUG	controller.import-controller	PVC annotation not found, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "annotation": "cdi.kubevirt.io/storage.import.source"}
2024-11-04T18:05:49.878668931+08:00 stderr F 2024-11-04T10:05:49Z	DEBUG	controller.import-controller	PVC not bound, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "Phase": "Pending"}
2024-11-04T18:05:49.878696161+08:00 stderr F 2024-11-04T10:05:49Z	LEVEL(-3)	controller.import-controller	Should not reconcile this PVC	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "pvc.annotation.phase.complete": false, "pvc.annotations.endpoint": false, "pvc.annotations.source": false, "isBound": false, "isMultistage": false, "multiStageDone": false}
2024-11-04T18:05:53.178892061+08:00 stderr F 2024-11-04T10:05:53Z	DEBUG	controller.clone-populator	reconciling Clone Source PVC	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}}
2024-11-04T18:05:53.178996272+08:00 stderr F 2024-11-04T10:05:53Z	LEVEL(-3)	controller.clone-populator	pvc state	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "hasFinalizer": true, "isBound": false, "isDeleted": false, "isSucceeded": false}
2024-11-04T18:05:53.179015483+08:00 stderr F 2024-11-04T10:05:53Z	LEVEL(-3)	controller.clone-populator	found datasource namespace annotation	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "namespace": "wyw-test-dv"}
2024-11-04T18:05:53.179855931+08:00 stderr F 2024-11-04T10:05:53Z	LEVEL(-3)	controller.clone-populator	Planning csi clone from PVC	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}}
2024-11-04T18:05:53.17988151+08:00 stderr F 2024-11-04T10:05:53Z	LEVEL(-3)	controller.clone-populator	created phases	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "num": 3}
2024-11-04T18:05:53.179962713+08:00 stderr F 2024-11-04T10:05:53Z	LEVEL(-3)	controller.clone-populator	Expand sizes	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "req": "50Gi", "cur": "50Gi", "act": "50Gi"}
2024-11-04T18:05:53.179974072+08:00 stderr F 2024-11-04T10:05:53Z	LEVEL(-3)	controller.clone-populator	Prep status	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "podRequired": false, "podExists": false}
2024-11-04T18:05:53.180009339+08:00 stderr F 2024-11-04T10:05:53Z	DEBUG	controller.clone-populator	currently in phase, returning	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "name": "Rebind", "progress": ""}
2024-11-04T18:05:55.497450724+08:00 stderr F 2024-11-04T10:05:55Z	DEBUG	events	Rebinding PersistentVolumeClaim for DataVolume wyw-test-dv/oen1301v0f	{"type": "Normal", "object": {"kind":"DataVolume","namespace":"wyw-test-dv","name":"oen1301v0f","uid":"98454c97-1b82-435e-9a31-97791e894610","apiVersion":"cdi.kubevirt.io/v1beta1","resourceVersion":"9537260"}, "reason": "RebindInProgress"}
2024-11-04T18:06:00.877702102+08:00 stderr F 2024-11-04T10:06:00Z	DEBUG	controller.clone-populator	reconciling Clone Source PVC	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}}
2024-11-04T18:06:00.877708123+08:00 stderr F 2024-11-04T10:06:00Z	LEVEL(-3)	controller.clone-populator	pvc state	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "hasFinalizer": true, "isBound": false, "isDeleted": false, "isSucceeded": false}
2024-11-04T18:06:00.877727528+08:00 stderr F 2024-11-04T10:06:00Z	LEVEL(-3)	controller.clone-populator	found datasource namespace annotation	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "namespace": "wyw-test-dv"}
2024-11-04T18:06:00.878557211+08:00 stderr F 2024-11-04T10:06:00Z	LEVEL(-3)	controller.clone-populator	Planning csi clone from PVC	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}}
2024-11-04T18:06:00.878572061+08:00 stderr F 2024-11-04T10:06:00Z	LEVEL(-3)	controller.clone-populator	created phases	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "num": 3}
2024-11-04T18:06:00.878588632+08:00 stderr F 2024-11-04T10:06:00Z	DEBUG	controller.clone-populator	currently in phase, returning	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "name": "CSIClone", "progress": ""}
2024-11-04T18:06:01.021854563+08:00 stderr F I1104 10:06:01.021736       1 request.go:629] Waited for 142.974742ms due to client-side throttling, not priority and fairness, request: PUT:https://10.96.0.1:443/api/v1/namespaces/wyw-test-dv/persistentvolumeclaims/oen1301v0f
2024-11-04T18:06:01.030260158+08:00 stderr F 2024-11-04T10:06:01Z	DEBUG	controller.upload-controller	reconciling Upload PVCs	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}}
2024-11-04T18:06:01.030281461+08:00 stderr F 2024-11-04T10:06:01Z	DEBUG	controller.import-controller	reconciling Import PVCs	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}}
2024-11-04T18:06:01.030287218+08:00 stderr F 2024-11-04T10:06:01Z	DEBUG	controller.clone-controller	reconciling Clone PVCs	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}}
2024-11-04T18:06:01.030297671+08:00 stderr F 2024-11-04T10:06:01Z	DEBUG	controller.clone-controller	PVC annotation not found, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "annotation": "k8s.io/CloneRequest"}
2024-11-04T18:06:01.030305227+08:00 stderr F 2024-11-04T10:06:01Z	DEBUG	controller.upload-controller	PVC not bound, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "Phase": "Pending"}
2024-11-04T18:06:01.03032422+08:00 stderr F 2024-11-04T10:06:01Z	DEBUG	controller.upload-controller	not doing anything with PVC	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "isUpload": false, "isCloneTarget": false, "isBound": false, "podSucceededFromPVC": false, "deletionTimeStamp set?": false}
2024-11-04T18:06:01.03032911+08:00 stderr F 2024-11-04T10:06:01Z	DEBUG	controller.import-controller	PVC annotation not found, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "annotation": "cdi.kubevirt.io/storage.import.endpoint"}
2024-11-04T18:06:01.030332931+08:00 stderr F 2024-11-04T10:06:01Z	DEBUG	controller.import-controller	PVC annotation not found, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "annotation": "cdi.kubevirt.io/storage.import.source"}
2024-11-04T18:06:01.030336864+08:00 stderr F 2024-11-04T10:06:01Z	DEBUG	controller.import-controller	PVC annotation not found, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "annotation": "cdi.kubevirt.io/storage.import.endpoint"}
2024-11-04T18:06:01.030340785+08:00 stderr F 2024-11-04T10:06:01Z	DEBUG	controller.clone-populator	reconciling Clone Source PVC	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}}
2024-11-04T18:06:01.030344637+08:00 stderr F 2024-11-04T10:06:01Z	DEBUG	controller.import-controller	PVC annotation not found, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "annotation": "cdi.kubevirt.io/storage.import.source"}
2024-11-04T18:06:01.030366976+08:00 stderr F 2024-11-04T10:06:01Z	DEBUG	controller.import-controller	PVC not bound, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "Phase": "Pending"}
2024-11-04T18:06:01.030375902+08:00 stderr F 2024-11-04T10:06:01Z	LEVEL(-3)	controller.clone-populator	pvc state	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "hasFinalizer": true, "isBound": false, "isDeleted": false, "isSucceeded": false}
2024-11-04T18:06:01.030380719+08:00 stderr F 2024-11-04T10:06:01Z	LEVEL(-3)	controller.import-controller	Should not reconcile this PVC	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "pvc.annotation.phase.complete": false, "pvc.annotations.endpoint": false, "pvc.annotations.source": false, "isBound": false, "isMultistage": false, "multiStageDone": false}
2024-11-04T18:06:01.030384747+08:00 stderr F 2024-11-04T10:06:01Z	DEBUG	controller.clone-controller	PVC annotation not found, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "annotation": "k8s.io/CloneRequest"}
2024-11-04T18:06:01.030388598+08:00 stderr F 2024-11-04T10:06:01Z	DEBUG	controller.clone-controller	PVC not bound, skipping pvc	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "Phase": "Pending"}
2024-11-04T18:06:01.030392773+08:00 stderr F 2024-11-04T10:06:01Z	LEVEL(-3)	controller.clone-populator	found datasource namespace annotation	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "namespace": "wyw-test-dv"}
2024-11-04T18:06:01.030406793+08:00 stderr F 2024-11-04T10:06:01Z	DEBUG	controller.clone-controller	Should not reconcile this PVC	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "checkPVC(AnnCloneRequest)": false, "NOT has annotation(AnnCloneOf)": true, "isBound": false, "has finalizer?": false}
2024-11-04T18:06:01.031179914+08:00 stderr F 2024-11-04T10:06:01Z	LEVEL(-3)	controller.clone-populator	Planning csi clone from PVC	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}}
2024-11-04T18:06:01.031190926+08:00 stderr F 2024-11-04T10:06:01Z	LEVEL(-3)	controller.clone-populator	created phases	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "num": 3}
2024-11-04T18:06:01.031207329+08:00 stderr F 2024-11-04T10:06:01Z	DEBUG	controller.clone-populator	currently in phase, returning	{"PVC": {"name":"oen1301v0f","namespace":"wyw-test-dv"}, "name": "CSIClone", "progress": ""}
2024-11-04T18:06:02.310413724+08:00 stderr F 2024-11-04T10:06:02Z	DEBUG	events	CSI Volume clone in progress (for pvc default/img-f9jsn6es-ceph-block)	{"type": "Normal", "object": {"kind":"DataVolume","namespace":"wyw-test-dv","name":"oen1301v0f","uid":"98454c97-1b82-435e-9a31-97791e894610","apiVersion":"cdi.kubevirt.io/v1beta1","resourceVersion":"9537923"}, "reason": "CSICloneInProgress"}
2024-11-04T18:06:05.100232924+08:00 stderr F I1104 10:06:05.100145       1 request.go:629] Waited for 140.220212ms due to client-side throttling, not priority and fairness, request: PUT:https://10.96.0.1:443/apis/cdi.kubevirt.io/v1beta1/namespaces/wyw-test-dv/datavolumes/oen1301v0f/status
2024-11-04T18:06:07.899763716+08:00 stderr F I1104 10:06:07.899657       1 request.go:629] Waited for 138.36537ms due to client-side throttling, not priority and fairness, request: PUT:https://10.96.0.1:443/apis/cdi.kubevirt.io/v1beta1/namespaces/wyw-test-dv/datavolumes/oen1301v0f/status
2024-11-04T18:06:10.650055699+08:00 stderr F I1104 10:06:10.649962       1 request.go:629] Waited for 140.082944ms due to client-side throttling, not priority and fairness, request: PUT:https://10.96.0.1:443/apis/cdi.kubevirt.io/v1beta1/namespaces/wyw-test-dv/datavolumes/oen1301v0f/status
2024-11-04T18:06:13.40024881+08:00 stderr F I1104 10:06:13.400153       1 request.go:629] Waited for 138.940681ms due to client-side throttling, not priority and fairness, request: PUT:https://10.96.0.1:443/apis/cdi.kubevirt.io/v1beta1/namespaces/wyw-test-dv/datavolumes/oen1301v0f/status
2024-11-04T18:06:16.150119028+08:00 stderr F I1104 10:06:16.150025       1 request.go:629] Waited for 139.623981ms due to client-side throttling, not priority and fairness, request: PUT:https://10.96.0.1:443/apis/cdi.kubevirt.io/v1beta1/namespaces/wyw-test-dv/datavolumes/oen1301v0f/status

@mhenriks
Member

mhenriks commented Nov 6, 2024

In the Rebind phase, after the ClaimRef of pv-xx (the PV bound to tmp-pvc) is updated to the target PVC, tmp-pvc goes Lost; kube-controller-manager then automatically updates the target PVC's volumeName to pv-xx, and CDI waits for the target to be bound before deleting the "lost" PVC. Is this description correct?

That is correct
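
The hand-off described above can be sketched as a simplified, hypothetical state model (editor's illustration, not actual CDI or kube-controller-manager code): the PV's claimRef is flipped from the temporary PVC to the target, the temporary claim goes Lost, and it is only deleted once the target is Bound.

```python
# Simplified, hypothetical model of the Rebind phase (not CDI code).
# A PV is bound to tmp-pvc; CDI flips the PV's claimRef to the target PVC,
# which leaves tmp-pvc Lost until kube-controller-manager binds the target
# and CDI deletes the temporary claim.

def rebind(pv, tmp_pvc, target_pvc):
    # CDI updates the PV's claimRef from tmp-pvc to the target PVC.
    pv["claimRef"] = target_pvc["name"]
    # tmp-pvc still points at the PV, but the PV no longer claims it back.
    tmp_pvc["phase"] = "Lost"
    # kube-controller-manager notices the claimRef and binds the target.
    target_pvc["volumeName"] = pv["name"]
    target_pvc["phase"] = "Bound"
    # Only once the target is Bound does CDI delete the lost tmp-pvc.
    if target_pvc["phase"] == "Bound":
        tmp_pvc["deleted"] = True
    return pv, tmp_pvc, target_pvc

pv = {"name": "pv-xx", "claimRef": "tmp-pvc"}
tmp = {"name": "tmp-pvc", "phase": "Bound", "deleted": False}
target = {"name": "oen1301v0f", "phase": "Pending", "volumeName": ""}
rebind(pv, tmp, target)
print(target["phase"], tmp["phase"], tmp["deleted"])  # Bound Lost True
```

The reported bug is effectively this sequence stalling between the first and last steps: the target never reaches Bound, so the Lost tmp-pvc is never cleaned up.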

I reproduced the problem again and, after the rebind, found client-side throttling messages such as `Waited for 140.220212ms due to client-side throttling, not priority and fairness, request: PUT:https://10.96.0.1:443/apis/cdi.kubevirt.io/v1beta1/namespaces/wyw-test-dv/datavolumes/oen1301v0f/status`. Is this related to the problem? How can I solve it? Thank you.

Interesting find. I'm not sure how it would be related. But clearly CDI is updating the status very frequently. Is DV oen1301v0f the one with lost PVC?

@wywself
Author

wywself commented Nov 6, 2024

Interesting find. I'm not sure how it would be related. But clearly CDI is updating the status very frequently. Is DV oen1301v0f the one with lost PVC?

@mhenriks Yes. The log above is from cdi-deployment. Which component's rate limiter causes this? Are the defaults QPS 5 and burst 10? Is there a way to adjust these two parameters? Thank you.
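
For context (editor's note, assuming the client-go defaults of QPS 5 / burst 10 mentioned above): a token-bucket limiter admits the first `burst` requests immediately and then paces the rest at `qps`, so a burst of 50 simultaneous DataVolume status PUTs queues the last one for roughly (50 − 10) / 5 = 8 s. A minimal sketch of that arithmetic:

```python
def throttle_delay(n, qps=5.0, burst=10):
    """Approximate wait for the n-th (1-based) request in a simultaneous
    burst against a token bucket with the given qps and burst."""
    if n <= burst:
        return 0.0
    return (n - burst) / qps

# 50 DataVolume status updates arriving at once:
print(throttle_delay(50))  # 8.0 seconds for the last update
```

This matches the repeated `Waited for ...ms due to client-side throttling` lines in the log: the waits are expected queueing delay, not errors, though they do slow reconciliation down under a 50-clone burst.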

@akalenyu
Collaborator

akalenyu commented Nov 18, 2024

This really reminds me of a hotloop we've seen before:
#2985

You could watch the resourceVersion on the PVC to see if that's the case
(Or really, just watch the PVC with `kubectl get pvc -w` and you'll probably see some field/annotation being rapidly added/removed)
If not, the next step would be kube-controller-manager logs, preferably with increased verbosity:
https://nakamasato.medium.com/how-to-change-log-level-of-kubernetes-components-3cd0e15b86ea

# k logs -n kube-system kube-controller-manager-node01 | grep pv_controller
...
I1114 09:34:29.977446       1 pv_controller_base.go:197] "Enqueued for sync" objName="local-pv-3fb65ef2"
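
To quantify the churn described above, one option (hypothetical helper, not part of any tool) is to sample the PVC's `resourceVersion` over time and compute the update rate; a hotloop shows up as a sustained high rate:

```python
def update_rate(samples):
    """samples: list of (unix_seconds, resource_version) tuples in time
    order. Returns observed resourceVersion changes per second."""
    if len(samples) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(samples, samples[1:]) if a[1] != b[1])
    span = samples[-1][0] - samples[0][0]
    return changes / span if span > 0 else 0.0

# 3 version bumps over a 4-second window -> 0.75 updates/s
samples = [(0, "100"), (1, "105"), (2, "111"), (3, "111"), (4, "120")]
print(update_rate(samples))  # 0.75
```

A healthy idle PVC should sit near zero; anything persistently above a few updates per second suggests two controllers fighting over the object, as in the linked issue.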
