
MTV-1536 | Use CDI for disk transfer in cold migration #1109

Open

mnecas wants to merge 1 commit into main from cdi_cold_migration

Conversation


@mnecas mnecas commented Oct 16, 2024

Issues:

  • Allow migration of "unknown" guests [1]
    Right now we can't migrate guests whose operating system is unknown or unsupported by virt-v2v [2].

  • Unifying the process and potential speedup [3]
    Right now we use two different methods for the disk transfer: virt-v2v for cold local migrations and CDI for warm and remote migrations. Maintaining two paths means additional engineering effort, and two different flows are harder to debug.
    virt-v2v transfers the disks sequentially, whereas with CDI we can start multiple disk imports in parallel. This can improve the migration speed.

Fix:

Start using the CNV CDI for the disk transfer instead of virt-v2v.
MTV already uses the CNV CDI for warm and remote migrations. We just need to adjust the code to remove the virt-v2v transfer and rely on the CNV CDI to do it for us.
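For illustration, the CDI-based cold transfer comes down to creating one DataVolume per source disk with a VDDK source, the same mechanism MTV already uses for warm and remote migrations. The manifest below is a minimal sketch, not code from this PR; every name, URL, UUID and size is a placeholder:

```
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: example-vm-disk-0                    # placeholder; one DataVolume per source disk
spec:
  source:
    vddk:
      url: "https://vcenter.example.com"                      # placeholder vCenter URL
      uuid: "42000000-0000-0000-0000-000000000000"            # placeholder source VM UUID
      backingFile: "[datastore1] example-vm/example-vm.vmdk"  # placeholder source disk path
      thumbprint: "AA:BB:CC:DD:EE:FF:00:11:22:33:44:55:66:77:88:99:AA:BB:CC:DD"  # placeholder TLS thumbprint
      secretRef: "vcenter-credentials"                        # placeholder secret with vCenter user/password
      initImageURL: "quay.io/example/vddk:latest"             # VDDK init image, hence the VDDK requirement below
  storage:
    resources:
      requests:
        storage: 20Gi
```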

Drawbacks:

  • CNV CDI requires the VDDK, which until now was only highly recommended.
  • CNV CDI is not maintained inside MTV, and escalating and backporting patches might be problematic because CNV has a different release cycle.
  • Because we will be migrating all disks in parallel, we need to optimise our migration scheduler so we don't consume too much of the hosts' and network's resources. I have already done some optimisations in [4,5,6]; see also the sketch after this list.
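As an illustration of the throttling mentioned in the last point: assuming the usual ForkliftController CR layout, the controller's existing max-in-flight setting bounds how many VM migrations (and therefore parallel disk imports) run at once; the namespace and value below are placeholders, not values from this PR:

```
apiVersion: forklift.konveyor.io/v1beta1
kind: ForkliftController
metadata:
  name: forklift-controller
  namespace: konveyor-forklift        # illustrative namespace
spec:
  # Assumption for this sketch: cap concurrent VM migrations so the parallel
  # CDI disk imports do not saturate the hosts or the network.
  controller_max_vm_inflight: 5
```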

Notes:

This change removes the usage of virt-v2v for the disk transfer; we will only use virt-v2v-in-place.

Ref:

[1] https://issues.redhat.com/browse/MTV-1536
[2] https://access.redhat.com/articles/1351473
[3] https://issues.redhat.com/browse/MTV-1581
[4] #1088
[5] #1087
[6] #1086

@mnecas mnecas requested a review from yaacov as a code owner October 16, 2024 11:59
@mnecas mnecas force-pushed the cdi_cold_migration branch from 40b3f51 to 419605d on October 16, 2024 12:07
@codecov-commenter

Codecov Report

Attention: Patch coverage is 0% with 22 lines in your changes missing coverage. Please review.

Project coverage is 16.33%. Comparing base (bc87852) to head (419605d).
Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
pkg/controller/plan/adapter/vsphere/builder.go 0.00% 9 Missing ⚠️
pkg/controller/plan/scheduler/vsphere/scheduler.go 0.00% 5 Missing ⚠️
pkg/controller/plan/kubevirt.go 0.00% 4 Missing ⚠️
pkg/controller/plan/migration.go 0.00% 3 Missing ⚠️
virt-v2v/cmd/entrypoint.go 0.00% 1 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1109      +/-   ##
==========================================
+ Coverage   16.20%   16.33%   +0.13%     
==========================================
  Files         112      112              
  Lines       19882    19719     -163     
==========================================
  Hits         3222     3222              
+ Misses      16375    16212     -163     
  Partials      285      285              
Flag Coverage Δ
unittests 16.33% <0.00%> (+0.13%) ⬆️




mnecas commented Oct 16, 2024

@rwmjones This change removes the usage of virt-v2v for the disk transfer; we will only use virt-v2v-in-place. Not sure when this will land; I'm aiming for the next Y release (2.8.0) since it includes some changes for the users. Just wanted to let you know.

@rwmjones
Contributor

We won't be able to support this. We only support conversions using virt-v2v. We don't support the use of virt-v2v-in-place at all, nor do we recommend using it. Basically, if you do this, you're on your own with supporting v2v.

@mnecas mnecas force-pushed the cdi_cold_migration branch 2 times, most recently from 514634a to 23e30da on October 17, 2024 12:34
Issues:
[1] Allow migration of "unknown" guests
Right now we can't migrate guests whose operating system is unknown or
unsupported by virt-v2v [3].

[2] Unifying the process and potential speedup
Right now we are using two different methods for the disk transfer.
Maintaining two paths means additional engineering effort, and two
different flows are harder to debug.
virt-v2v transfers the disks sequentially, whereas with CDI we can
start multiple disk imports in parallel. This can improve the
migration speed.

Fix:
MTV already uses the CNV CDI for warm and remote migrations. We just
need to adjust the code to remove the virt-v2v transfer and rely on
the CNV CDI to do it for us.

Drawbacks:
- CNV CDI *requires* the VDDK, which until now was only highly
  recommended.
- CNV CDI is not maintained inside MTV, and escalating and backporting
  patches might be problematic because CNV has a different release
  cycle.
- Because we will be migrating all disks in parallel, we need to
  optimise our migration scheduler so we don't consume too much of the
  hosts' and network's resources. I have already done some
  optimisations in [4,5,6].

Notes:
This change removes the usage of virt-v2v for the disk transfer; we
will only use virt-v2v-in-place.

Ref:
[1] https://issues.redhat.com/browse/MTV-1536
[2] https://issues.redhat.com/browse/MTV-1581
[3] https://access.redhat.com/articles/1351473
[4] kubev2v#1088
[5] kubev2v#1087
[6] kubev2v#1086

Signed-off-by: Martin Necas <[email protected]>
@mnecas mnecas force-pushed the cdi_cold_migration branch from 23e30da to 75ac8a7 on October 18, 2024 13:12
mnecas added a commit to mnecas/forklift that referenced this pull request Oct 18, 2024
Issue:
When migrating VMs, the VM disks need to be transferred over the
network. This is inefficient, takes additional storage and is slow. We
need an alternative way to do the disk transfer.

Design:
This PR is a PoC of adding an offload plugin for the disk transfer.
The offload plugin is a specific tool that needs to be implemented
by the CSI provider. To use a storage offload plugin, the user
specifies it in the StorageMap destination. This allows users to
migrate a VM even when its disks span multiple datastore types.
For example, some disks could be managed by the offloadPlugin and some
would still go over the network if needed.

Example of the storage map with offload plugin:
```
spec:
  map:
    - destination:
        offloadPlugin: 'quay.io/kubev2v/offload:latest'
        storageClass: CSI
      source:
        id: datastore-30
...
```
The offload plugin is started right after the CreateDataVolumes step;
this way the Kubernetes CSI will have created empty PVs into which the
disks can be transferred.

The OffloadPlugin step creates a job on the destination cluster. The
job runs the provided offload plugin image. The job is started with
the following parameters:

`HOST` = URL of the ESXi host
`PLAN_NAME` = plan name
`NAMESPACE` = namespace where the migration is running
In addition to these variables, it also mounts the secrets for the
vCenter.

Note:
This change additionally requires kubev2v#1109, because right now the
cold migration transfer is managed by virt-v2v. kubev2v#1109 removes
this dependency and moves the transfer to the CNV CDI. This allows us
to split the transfer and conversion steps, which from the Forklift
perspective used to be a single step. Once the disks are transferred,
Forklift runs `virt-v2v-in-place` on the disks and starts the VM.
The offload plugin path works the same way: we transfer the disks
using the offload plugin and then run `virt-v2v-in-place` on the
disks.

TODO:
- Check if the OffloadPlugin exists
- Allow combining storage map entries with and without the OffloadPlugin
- Add checking of the offload plugin disk transfer status

Signed-off-by: Martin Necas <[email protected]>

mnecas added a commit to mnecas/forklift that referenced this pull request Oct 18, 2024
Issue:
When migrating VMs, the VM disks need to be transferred over the
network. This is inefficient, takes additional storage and is slow. We
need an alternative way to do the disk transfer.

Design:
This PR is a PoC of adding an offload plugin for the disk transfer.
The offload plugin is a specific tool that needs to be implemented
by the CSI provider. To use a storage offload plugin, the user
specifies it in the StorageMap destination. This allows users to
migrate a VM even when its disks span multiple datastore types.
For example, some disks could be managed by the offloadPlugin and some
would still go over the network if needed.

Example of the storage map with offload plugin:
```
spec:
  map:
    - destination:
        offloadPlugin: 'quay.io/kubev2v/offload:latest'
        storageClass: CSI
      source:
        id: datastore-30
...
```
The offload plugin is started right after the CreateDataVolumes step;
this way the Kubernetes CSI will have created empty PVs into which the
disks can be transferred.

The OffloadPlugin step creates a job on the destination cluster. The
job runs the provided offload plugin image. The job is started with
the following parameters:
- `HOST` = URL of the ESXi host
- `PLAN_NAME` = plan name
- `NAMESPACE` = namespace where the migration is running

In addition to these variables, it also mounts the secrets for the
vCenter. The secrets are mounted at the path `/etc/secret` and the
files are:
- `accessKeyId` with the username
- `secretKey` with the password

Note:
This change additionally requires kubev2v#1109, because right now the
cold migration transfer is managed by virt-v2v. kubev2v#1109 removes
this dependency and moves the transfer to the CNV CDI. This allows us
to split the transfer and conversion steps, which from the Forklift
perspective used to be a single step. Once the disks are transferred,
Forklift runs `virt-v2v-in-place` on the disks and starts the VM.
The offload plugin path works the same way: we transfer the disks
using the offload plugin and then run `virt-v2v-in-place` on the
disks.

TODO:
[ ] Add a design doc with more details; this is just a PoC
[ ] Add a check that the OffloadPlugin image exists
[ ] Add a check of the offload plugin disk transfer status
[ ] Allow combining storage map entries with and without the OffloadPlugin
[ ] Improve the name of the offload plugin job; right now it's the VM ID

Signed-off-by: Martin Necas <[email protected]>
@mnecas mnecas mentioned this pull request Oct 18, 2024
mnecas added a commit to mnecas/forklift that referenced this pull request Oct 19, 2024
Issue:
When migrating VMs, the VM disks need to be transferred over the
network. This is inefficient, takes additional storage and is slow. We
need an alternative way to do the disk transfer.

Design:
This PR is a PoC of adding an offload plugin for the disk transfer.
The offload plugin is a specific tool that needs to be implemented
by the CSI provider. To use a storage offload plugin, the user
specifies it in the StorageMap destination. This allows users to
migrate a VM even when its disks span multiple datastore types.
For example, some disks could be managed by the offloadPlugin and some
would still go over the network if needed.

Example of the storage map with offload plugin:
```
spec:
  map:
    - destination:
        offloadPlugin:
          image: 'quay.io/mnecas0/offload:latest'
          vars:
            test1: test1
            test2: test2
        storageClass: CSI
      source:
        id: datastore-30
...
```
The offload plugin is started right after the CreateDataVolumes step;
this way the Kubernetes CSI will have created empty PVs into which the
disks can be transferred.

The OffloadPlugin step creates a job on the destination cluster. The
job runs the provided offload plugin image. The job is started with
the following parameters:
- `HOST` = URL of the ESXi host
- `PLAN_NAME` = plan name
- `NAMESPACE` = namespace where the migration is running

In addition to these variables, it also mounts the secrets for the
vCenter. The secrets are mounted at the path `/etc/secret` and the
files are:
- `accessKeyId` with the username
- `secretKey` with the password

Note:
This change additionally requires kubev2v#1109, because right now the
cold migration transfer is managed by virt-v2v. kubev2v#1109 removes
this dependency and moves the transfer to the CNV CDI. This allows us
to split the transfer and conversion steps, which from the Forklift
perspective used to be a single step. Once the disks are transferred,
Forklift runs `virt-v2v-in-place` on the disks and starts the VM.
The offload plugin path works the same way: we transfer the disks
using the offload plugin and then run `virt-v2v-in-place` on the
disks.

TODO:
[ ] Add a design doc with more details; this is just a PoC
[ ] Add a check that the OffloadPlugin image exists
[ ] Add a check of the offload plugin disk transfer status
[ ] Allow combining storage map entries with and without the OffloadPlugin
[ ] Improve the name of the offload plugin job; right now it's the VM ID

Signed-off-by: Martin Necas <[email protected]>
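For illustration only: the Job that the OffloadPlugin step described above would create could look roughly like the sketch below. This manifest is not part of the PR; the Job name, namespace, ESXi URL, plan name and secret name are placeholders inferred from the description (env vars HOST/PLAN_NAME/NAMESPACE plus the user-defined vars, and the vCenter credentials mounted at /etc/secret):

```
apiVersion: batch/v1
kind: Job
metadata:
  name: offload-vm-1234                 # placeholder; the PoC currently names the job after the VM ID
  namespace: demo-migration             # placeholder namespace
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: offload-plugin
          image: quay.io/mnecas0/offload:latest     # image taken from the StorageMap destination
          env:
            - name: HOST
              value: "https://esxi01.example.com"   # URL of the ESXi host (placeholder)
            - name: PLAN_NAME
              value: "demo-plan"                    # plan name (placeholder)
            - name: NAMESPACE
              value: "demo-migration"               # namespace where the migration runs (placeholder)
            - name: test1                           # user-defined vars from the StorageMap
              value: "test1"
            - name: test2
              value: "test2"
          volumeMounts:
            - name: vcenter-secret
              mountPath: /etc/secret                # contains the accessKeyId and secretKey files
              readOnly: true
      volumes:
        - name: vcenter-secret
          secret:
            secretName: vcenter-credentials         # placeholder secret name
```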
mnecas added a commit to mnecas/forklift that referenced this pull request Nov 5, 2024
Issue:
This is an alternative to kubev2v#1109,
which removes the virt-v2v disk transfer and replaces it with the CDI.
The main disadvantage of that solution is that if a customer has
problems with the new CDI method, there won't be any easy way to go
back to virt-v2v.

Fix:
This change adds a new feature toggle `feature_vsphere_cold_cdi` to
the forklift controller. By default the feature will be enabled, and
in case of problems such as disk corruption the user can disable it.

Signed-off-by: Martin Necas <[email protected]>
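For illustration only, and assuming the toggle is exposed on the ForkliftController CR like other `feature_*` settings (an assumption, not something shown in this PR), disabling the CDI path could look like:

```
apiVersion: forklift.konveyor.io/v1beta1
kind: ForkliftController
metadata:
  name: forklift-controller
  namespace: konveyor-forklift        # illustrative namespace
spec:
  # Assumption for this sketch: the toggle is surfaced as a spec field;
  # "false" falls back to the virt-v2v disk transfer for vSphere cold migrations.
  feature_vsphere_cold_cdi: "false"
```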