-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#4547: Refactor vmm builder code to simplify logic that creates the microVM to boot #4910
base: main
Are you sure you want to change the base?
Conversation
pass vm_config to eliminate two extra arguments derived from it Signed-off-by: tommady <[email protected]>
remove cfg_attr and extract create_vcpus from create_vmm_and_vcpus Signed-off-by: tommady <[email protected]>
fce4afd
to
38e9537
Compare
extract codes into two architecture specific modes Signed-off-by: tommady <[email protected]>
eliminate the unnecessary usage of the event_manager argument Signed-off-by: tommady <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @tommady, thx for doing the PR. Overall changes seems solid, but I left some small notes here and there. Also can you change names of the commits to refactor(builder): ...
. This is usually how we treat refactor commits.
Add the SVE CPU template as a valid template in 5.10 since it works. Signed-off-by: Pablo Barbáchano <[email protected]>
Update release policy to v1.10.1 patch Signed-off-by: Jack Thomson <[email protected]>
`TcpIPv4Handler` for MMDS network stack preallocates several buffers whose sizes are saved into a snapshot as `max_connections` and `max_pending_resets` in `MmdsNetworkStackState`. But they are always the same constant hardcoded values (`DEFAULT_MAX_CONNECTIONS` and `DEFAULT_MAX_PENDING_RESETS`) as of today, which means there is no need to save them into a snapshot. Even if we change the hardcoded sizes across Firecracker versions, that should not be a problem. This is because the snapshot feature does not support migration of network connections and those buffers are initialized with empty on snapshot restoration. When migrating from a Firecracker version with larger buffers to another version with smaller ones, guest workloads that worked previously might start to fail due to the less buffer spaces. However, the issue is not a problem of the snapshot feature and it should also occur even on a purely booted microVM (not restored from a snapshot). Thus, it is fine to remove those fields from a snapshot. Since this is a breaking change of the snapshot format, bumps the major version. Signed-off-by: Takahiro Itazuri <[email protected]>
There is no need to use MmdsNetworkStack::new() instead of MmdsNetworkStack::new_with_defaults() in tests that pass the same default values. Signed-off-by: Takahiro Itazuri <[email protected]>
We bumped the snapshot version up twice recently, requiring users to regenerate their snapshot, but the user action isn't clearly stated. Signed-off-by: Takahiro Itazuri <[email protected]>
eliminate the unnecessary usage of the event_manager argument and fix up aarch64 attach_legacy_devices_aarch64 fn Signed-off-by: tommady <[email protected]>
remove the aarch64 suffix from the attach_legacy_devices_aarch64 function and ensure that aarch64 smt is always set to false int the configure_system_for_boot function Signed-off-by: tommady <[email protected]>
Hi @ShadowCurse I’ve addressed your comments, with one note regarding the set_stdout_nonblocking function. Please review and provide your guidance when you have a moment. Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now you need to make sure the changes you made pass our CI. You can look at the buildkite/firecracker-pr
line at the bottom of the PR and click details
to look at the CI status. Start with making sure changes do compile. I can see that on x86_86 only errors are absence of doc comments for public functions. For aarch64, well, it does not even compile. If you don't have access to arm system, you can try to use cross to cross compile.
After everything compiles, please move all changes to corresponding commits.
Also please rebase the PR so it only includes your commits.
`TcpIPv4Handler` for MMDS network stack preallocates several buffers whose sizes are saved into a snapshot as `max_connections` and `max_pending_resets` in `MmdsNetworkStackState`. But they are always the same constant hardcoded values (`DEFAULT_MAX_CONNECTIONS` and `DEFAULT_MAX_PENDING_RESETS`) as of today, which means there is no need to save them into a snapshot. Even if we change the hardcoded sizes across Firecracker versions, that should not be a problem. This is because the snapshot feature does not support migration of network connections and those buffers are initialized with empty on snapshot restoration. When migrating from a Firecracker version with larger buffers to another version with smaller ones, guest workloads that worked previously might start to fail due to the less buffer spaces. However, the issue is not a problem of the snapshot feature and it should also occur even on a purely booted microVM (not restored from a snapshot). Thus, it is fine to remove those fields from a snapshot. Since this is a breaking change of the snapshot format, bumps the major version. Signed-off-by: Takahiro Itazuri <[email protected]>
We bumped the snapshot version up twice recently, requiring users to regenerate their snapshot, but the user action isn't clearly stated. Signed-off-by: Takahiro Itazuri <[email protected]>
let the x86_64 and aarch64 architectures code can compile and without warnning Signed-off-by: tommady <[email protected]>
Hi @ShadowCurse Thank you for your patience and thorough review—it has been an incredible learning experience for me! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that it at least compiles, you need to make sure it also does function as expected. If you look at the Buildkite pipelines output: https://buildkite.com/firecracker/firecracker-pr/builds/11856#0193640b-6e7c-4f49-bbcf-cb60c1c946ef/62-392 you will see that there are some unit tests that fail in the same place src/vmm/src/builder.rs:519:51
. Quick look around and it seems the issue is that when pio_device_manager
registers itself, it registers interrupts for a VM, but the interrupt controller is not yet created at that point. So you need to either move creation of interrupt_controller before pio_device_manager
creation, or move this .register_device(..)
call.
You can reproduce all of this errors if you run CI locally. You can find exact commands CI is running looking at the very beginning of the logs like this: https://buildkite.com/firecracker/firecracker-pr/builds/11856#0193640b-6e7c-4f49-bbcf-cb60c1c946ef/62-63
Start by running unit tests. There are a couple of ways to do this:
./tools/devtool test -- integration_tests/build/test_unittests.py
- this will use the docker container to compile and run tests- Compile test with
cargo test -p vmm --no-run
and then run withsudo ./build/cargo_target/debug/deps/vmm-....
(there will be a full path at the end ofcargo test
) With this method some network related unit tests will fail most likely, but you can skip those.
Also you can make sure your changes are sound by to simply trying to launch a VM with them. We have a doc for this: https://github.com/firecracker-microvm/firecracker/blob/main/docs/getting-started.md (in the section about getting a rootfs you can skip the ssh key creation if you don't want to use/test network)
make the test all passed Signed-off-by: tommady <[email protected]>
Hi @ShadowCurse, ./tools/devtool test -- integration_tests/build/test_unittests.py
...
integration_tests/build/test_unittests.py::test_unittests PASSED [ 50%]
integration_tests/build/test_unittests.py::test_benchmarks_compile PASSED [100%] All tests passed successfully! 🎉 When you have a moment, please review it again. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4910 +/- ##
==========================================
+ Coverage 84.08% 84.76% +0.67%
==========================================
Files 251 223 -28
Lines 28058 25666 -2392
==========================================
- Hits 23592 21755 -1837
+ Misses 4466 3911 -555
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
With x86_64 version done, now you need to make sure aarhc64 also compiles/works. If you look at the BK run again: https://buildkite.com/firecracker/firecracker-pr/builds/11886#01936ce9-9b52-4459-ac0a-f8cfbe7a3690/62-624, https://buildkite.com/firecracker/firecracker-pr/builds/11886#01936ce9-9b62-4713-b1eb-33926eacc8b0/62-882 you will see some compilation errors in unit tests and CI test for serial port in a restored VM failing. As I mentioned before, you can use cross to cross compile FC to aarch64. It might even allow some subset of unit tests to be run locally. Or you can use qemu to create an emulated aarch64 VM and test your code there. |
Oh, I see… I assumed the devtool test would handle all platform testing in one sweep. Let me double-check again. |
fix(4547): Refactor vmm builder code to simplify logic
Changes
Refactoring Vmm builder code
Reason
close issue #4547
...
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md
.PR Checklist
tools/devtool checkstyle
to verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md
.Runbook for Firecracker API changes.
integration tests.
TODO
.rust-vmm
.