
MTV-1543 | Fix warm migration scheduler #1086

Merged: 1 commit into kubev2v:main on Oct 7, 2024
Conversation

mnecas (Member) commented Oct 7, 2024

Issue:
When running a warm migration with more VM disks than MaxInFlight allows, the VMs over this limit do not start migrating until the cutover. Once the cutover starts, the VMs that had not yet started migrate the same way as in a cold migration, so the downtime is larger.

Fix:
Ignore a VM's cost when it is in CopyingPaused. This allows other migrations to start, since the cost is reduced once the VM's disk transfer is finished.

Note:
This patch also improves cold migration: the count is ignored while the VM is being created, so other VM migrations can get started. It also dramatically improves warm migration time, because we already have the disks and can start the guest conversion without waiting on it and halting the scheduler.

Fixes: https://issues.redhat.com/browse/MTV-1543
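The scheduling change described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the real logic lives in pkg/controller/plan/scheduler/vsphere/scheduler.go, and the phase names, `vmStatus` type, and `cost`/`inFlight` helpers here are assumptions chosen for the example.

```go
package main

import "fmt"

// Hypothetical pipeline phases, loosely mirroring the migration states
// discussed in this PR.
const (
	PhaseCopying       = "Copying"
	PhaseCopyingPaused = "CopyingPaused"
	PhaseCreateVM      = "CreateVM"
)

type vmStatus struct {
	Phase     string
	DiskCount int
}

// cost returns how many in-flight disks a VM contributes toward the
// scheduler's MaxInFlight budget. With the fix, a VM whose transfer is
// paused between precopies (warm) or that is already being created
// (cold) no longer consumes capacity, so queued VMs can start.
func cost(vm vmStatus) int {
	switch vm.Phase {
	case PhaseCopyingPaused, PhaseCreateVM:
		return 0 // transfer paused or finished: free the slot
	default:
		return vm.DiskCount
	}
}

// inFlight sums the cost of all VMs; the scheduler would compare this
// against MaxInFlight before admitting the next VM.
func inFlight(vms []vmStatus) int {
	total := 0
	for _, vm := range vms {
		total += cost(vm)
	}
	return total
}

func main() {
	vms := []vmStatus{
		{Phase: PhaseCopying, DiskCount: 2},
		{Phase: PhaseCopyingPaused, DiskCount: 3},
	}
	// Only the actively copying VM's 2 disks count; the paused VM's
	// 3 disks are ignored, leaving room for more migrations.
	fmt.Println(inFlight(vms))
}
```

Before the fix, the paused VM's disks would still be counted, keeping the total at 5 and blocking queued VMs until cutover.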

@mnecas mnecas requested a review from yaacov as a code owner October 7, 2024 14:13
@codecov-commenter
Copy link

codecov-commenter commented Oct 7, 2024

Codecov Report

Attention: Patch coverage is 0% with 16 lines in your changes missing coverage. Please review.

Project coverage is 16.26%. Comparing base (c2f7d64) to head (1c2eea1).

Files with missing lines Patch % Lines
pkg/controller/plan/scheduler/vsphere/scheduler.go 0.00% 16 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1086      +/-   ##
==========================================
- Coverage   16.32%   16.26%   -0.06%     
==========================================
  Files         112      112              
  Lines       19785    19794       +9     
==========================================
- Hits         3229     3220       -9     
- Misses      16269    16289      +20     
+ Partials      287      285       -2     
Flag      | Coverage Δ
unittests | 16.26% <0.00%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown.


@mnecas mnecas merged commit 0a30857 into kubev2v:main Oct 7, 2024
23 of 29 checks passed
@mnecas mnecas mentioned this pull request Oct 8, 2024
mnecas added a commit to mnecas/forklift that referenced this pull request Oct 16, 2024
Issues:
[1] Allow migration of "unknown" guests
Right now we cannot migrate guests whose operating system is unknown or
unsupported by virt-v2v [3].

[2] Unify the process and potentially speed it up
Right now we use two different methods for the disk transfer. This means
additional engineering effort to maintain two paths, and two different
flows are harder to debug.
virt-v2v transfers the disks sequentially, whereas with the CDI we can
start multiple disk imports in parallel, which can improve migration
speeds.

Fix:
MTV already uses the CNV CDI for warm and remote migrations. We just
need to adjust the code to remove the virt-v2v transfer and rely on the
CNV CDI to do it for us.

Drawbacks:
- CNV CDI *requires* the VDDK, which until now was only highly recommended.
- CNV CDI is not maintained inside MTV, and there might be problems
  escalating and backporting patches, as CNV has a different release
  cycle.
- Because we will be migrating all disks in parallel, we need to
  optimise our migration scheduler so that we don't take up too many
  host/network resources. I have already done some optimisations
  in [4,5,6].

Notes:
This change removes the usage of virt-v2v; we will only use
virt-v2v-in-place.

Ref:
[1] https://issues.redhat.com/browse/MTV-1536
[2] https://issues.redhat.com/browse/MTV-1581
[3] https://access.redhat.com/articles/1351473
[4] kubev2v#1088
[5] kubev2v#1087
[6] kubev2v#1086

Signed-off-by: Martin Necas <[email protected]>
mnecas added a commit to mnecas/forklift that referenced this pull request Oct 16, 2024
mnecas added a commit to mnecas/forklift that referenced this pull request Oct 17, 2024
mnecas added a commit to mnecas/forklift that referenced this pull request Oct 17, 2024
mnecas added a commit to mnecas/forklift that referenced this pull request Oct 17, 2024
mnecas added a commit to mnecas/forklift that referenced this pull request Oct 17, 2024
mnecas added a commit to mnecas/forklift that referenced this pull request Oct 18, 2024