-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge master into feature/update-backwards-compat #5887
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Prior to this commit, the xapi on the coordinator host records the 'Unix.gettimeofday' as the timestamps of the heartbeat with other pool supporter hosts. When the system clock is updated with a huge jump forward, the timestamps would be far back into the past. This would cause the xapi assumes that the hosts are offline as long time no heartbeats. In this commit, the timestamps are changed to get from a monotonic clock. In this way, the system clock changes will not impact the heartbeats' timestamps any more. Additionally, Host_metrics.last_updated is only set when the object is created. It's useless in check_host_liveness at all. Signed-off-by: Ming Lu <[email protected]>
Xapi will send alert when the user is running on a Corosync 2 cluster and prompting them to upgrade. This should only happen on XS 9 and is behind the corosync3 feature flag. XenCenter can take advantage of this alert message, but should have its own warning message to tell the user why/how to perform the upgrade. Signed-off-by: Vincent Liu <[email protected]>
By default message-switch calls are serialized for backwards compatibility reasons in the Lwt and Async modes. (We tried enabling parallel actions by default but got some hard to debug failures in the CI). This causes very long VM start times when multiple VBDs are plugged/unplugged concurrently: the operations are seen concurrently by message-switch, but xapi-storage-script only retrieves and dispatches them sequentially, so any opportunity for parallel execution is lost. Even though the actions themselves only take seconds, due to serialization, a VM start may take minutes. Enable parallel processing explicitly here (instead of for all message-switch clients). SMAPIv3 should expect to be called concurrently (on different hosts even), so in theory this change should be safe and improve performance, but there are some known bugs in SMAPIv3 plugins currently. So introduce a config file flag 'concurrent' for now, that defaults to false, but that can be turned to 'true' for testing purposes. When all SMAPIv3 concurrency bugs are believed to be fixed we can flip the default, and eventually remove this flag once no more bugs are reported. The configuration value is done as a global to simplify integrating intot he Lwt port, instead of changing a lot of functions to thread through an argument. This doesn't change the behaviour of xapi-storage-script in its default configuration. Signed-off-by: Edwin Török <[email protected]>
CP-49634: Add alerting for Corosync upgrade
Signed-off-by: Andrii Sultanov <[email protected]>
Signed-off-by: Andrii Sultanov <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
If we want to reproduce a failure we need to know the exact commandline that was used. Longer than 80 chars is not a problem, this is a logfile, and a truncated line is worse than a long line. Signed-off-by: Edwin Török <[email protected]>
Non-running VMs' metrics are stored in the coordinator. When the coordinator is asked about the metrics try to unarchive them instead of failing while trying to fetch the coordinator's IP address. This needs to force the HTTP method of the query to be POST Also returns a Service Unavailable when the host is marked as Broken. Signed-off-by: Pau Ruiz Safont <[email protected]>
Forces users to use an address, instead of being implicit, this avoid the underlying cause for the issue fixed in the previous commit: it allowed a coordinator to call Pool_role.get_master_address, which always fails. Signed-off-by: Pau Ruiz Safont <[email protected]>
This makes the selection of the action obvious, previously the two booleans made it hazy to understand the decision, and was part of the error why the coordinator tried to get the coordinator address from the pool_role file (and failed badly) Signed-off-by: Pau Ruiz Safont <[email protected]>
Signed-off-by: Pau Ruiz Safont <[email protected]>
Currently a List.assoc is used, which raises an unhandled exception. Signed-off-by: Pau Ruiz Safont <[email protected]>
Shell quoting changes in xen-api.git 65f152d broke the dry-run functionality, as by quoting parameters in the way it was done it meant the space separation was not properly handled to separate parameters etc. Signed-off-by: Alex Brett <[email protected]>
The xe-[backup,restore]-metadata scripts have cleanup logic designed to ensure we do not leave any vbd objects etc behind. This logic calls `vbd-unplug` with a 20s timeout, and then (regardless of the result) allows up to 10s for any device specified in the VBD to disappear - if it does not, it does not trigger a `vbd-destroy`. This logic fails in the case where a VDI is attached to a VM running on the same host, as the `device` field in the new VBD will be populated with the backend device for the running VM. In this case, the `vbd-unplug` fails immediately (as the vbd is not plugged because the original `vbd-plug` attempt fails as the VDI is in use), but then we sit waiting for 10s for the device to disappear (which is obviously does not), and then fail to trigger a `vbd-destroy`, leaving the VBD behind. Fix this by simply removing the wait for the device to disappear and always attempting a `vbd-destroy`, as I am not aware of any situation where this additional 10s wait will give any benefit given current behaviours. Signed-off-by: Alex Brett <[email protected]>
This patch makes the feature use the debugfs utility, part of e2fsprogs. This makes the system as a whole a heck of a lot better, if only because it won't be able to parse XFS, ReiserFS or any of the many plugins of libfsimage. Signed-off-by: Alejandro Vallejo <[email protected]>
Add a new option `-o` to xe-restore-metadata, which is used to distinguish whether to allow use of legacy backup VDIs, or enforce only use of the new format VDIs with known UUIDs. Also modify xe-restore-metadata such that it no longer stops searching the candidate list if only one VDI is found, but instead identifies all possible backup VDIs. If more than one is found, and you are doing anything other than listing the VDIs, the script will abort. This is to cover the case where a malicious legacy format VDI is present - we will detect it and the expected 'real' backup VDI. Modify xe-backup-metadata to always expect to use the deterministic UUID to identify the VDI to add backups to, do not rely on the `other-config:ctxs-pool-backup` property for identification in any way. This is XSA-459 / CVE-2024-31144 Signed-off-by: Alex Brett <[email protected]>
- Quote a parameter - Adjust how we check the returncode of some function calls to satifsy shellcheck - Disable the warnings where we are explicitly relying on string splitting Signed-off-by: Alex Brett <[email protected]>
This parameter was added in 7f1d315, but the changes to always use the new metadata VDIs with known UUIDs means it is no longer required, so remove it. Signed-off-by: Alex Brett <[email protected]>
* Remove redundant parameter wiping Removes ineffectual parameter wiping introduced by 6e24ca4. Signed-off-by: Colin James <[email protected]>
…Unix/Lwt It was a mix of Lwt and Unix code, which means that if the Unix call blocks the entire Lwt code blocks too. This was only installed by the specfile in a -devel package. `message-cli tail --follow` can be used to debug the IDL protocol instead. Signed-off-by: Edwin Török <[email protected]>
Also add missing dependencies added recently Signed-off-by: Pau Ruiz Safont <[email protected]>
Its implementation was identical, except for the use of time_limited_read in Threadext, but the semantics is identical. Use one well tested implementation instead of duplicating code. One less function to convert to epoll. Signed-off-by: Edwin Török <[email protected]>
…t_timed_read Signed-off-by: Edwin Török <[email protected]>
[epoll] CP-47536: Drop posix_channel and channel_helper: unused and a mix of …
CA-395512: process SMAPIv3 API calls concurrently
Signed-off-by: Edwin Török <[email protected]>
Reduces the amount of work the build has to do if we don't need to preprocess everything, but only the few modules that actually using [@@deriving]. Signed-off-by: Edwin Török <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
This signature is completely unused. We could instead generate a client.mli, but that is more complicated, currently the client.mli it'd generate wouldn't be polymorphic enough. Signed-off-by: Edwin Török <[email protected]>
Server.ml takes a while to compile, but most unit tests don't actually need it. Reorganize Api_server into Api_server+Api_server_common, where the latter suffices for unit tests. 'dune runtest' times are improved: ``` hyperfine --min-runs 2 'dune clean; dune runtest --cache=disabled' 'cd ../scm-prev; dune clean; dune runtest --cache=disabled' Benchmark 1: dune clean; dune runtest --cache=disabled Time (mean ± σ): 103.491 s ± 1.596 s [User: 374.464 s, System: 125.957 s] Range (min … max): 102.363 s … 104.620 s 2 runs Benchmark 2: cd ../scm-prev; dune clean; dune runtest --cache=disabled Time (mean ± σ): 114.158 s ± 2.980 s [User: 380.638 s, System: 134.558 s] Range (min … max): 112.051 s … 116.266 s 2 runs Summary dune clean; dune runtest --cache=disabled ran 1.10 ± 0.03 times faster than cd ../scm-prev; dune clean; dune runtest --cache=disabled ``` Signed-off-by: Edwin Török <[email protected]>
`sexpr` is now fully thread safe without having to use locks, doesn't need to depend on threadext. `gen_api_main` can use the external `uuidm` module directly, without waiting for internal one to be built. `dune-build-info` is only needed by xapi_version. `xapi-stdext-unix` is not needed in `xapi-idl` The sexplib ppx runtime also doesn't need to be linked in some libraries that do not use it anymore, and where it is used it'll be automatically linked. Signed-off-by: Edwin Török <[email protected]>
Xapi_version depends on Build_info which can change on every commit. It is better to remove it from the dependencies of gen_api_main, especially that gen_api_main is on the critical path for discovering more dependencies. The 'xapi_user_agent' constant got moved to Xapi_version. Signed-off-by: Edwin Török <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
It'll be slower, but it can run a lot earlier in the build process. Compiling the datamodels takes time, but compiling them for bytecode is faster. Signed-off-by: Edwin Török <[email protected]>
Introduce a _minimal library, so we can start compiling server.ml earlier. Build time reduced from 80s to: ``` Benchmark 1: dune clean; dune build --cache=disabled Time (mean ± σ): 67.081 s ± 0.190 s [User: 326.847 s, System: 103.668 s Range (min … max): 66.946 s … 67.215 s 2 runs ``` Signed-off-by: Edwin Török <[email protected]>
Remove most sleeps, and reduce test duration from 5 seconds to 1. (If we do want to run a performance test we can increase these again) Run just a basic test for 0.1 seconds instead of a performance test for 5s by default. (can still be tweaked by overriding SECS) Signed-off-by: Edwin Török <[email protected]>
Instead of every 5s. Speeds up testing, and may speed up startup somewhat. And a connection try once every 0.5s won't create a lot of load on the system. (If needed we could implement some form of exponential backoff here). ``` Benchmark 1: dune clean; dune runtest --cache=disabled Time (mean ± σ): 97.642 s ± 0.933 s [User: 354.132 s, System: 113.436 s] Range (min … max): 96.982 s … 98.302 s 2 runsi ``` Signed-off-by: Edwin Török <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
Reduce build/test times
These are errors in dune 3.15 and don't seem to be problematic Signed-off-by: Pau Ruiz Safont <[email protected]>
Signed-off-by: Pau Ruiz Safont <[email protected]>
Signed-off-by: Pau Ruiz Safont <[email protected]>
Also make dune generate the opam metadata Signed-off-by: Pau Ruiz Safont <[email protected]>
Only tests need it to generate crypto keys, but it's needed to create the serial when signing certificates. Signed-off-by: Pau Ruiz Safont <[email protected]>
Signed-off-by: Pau Ruiz Safont <[email protected]>
psafont
approved these changes
Jul 29, 2024
We want to run these from 'quicktest', so make them available as libraries, and add a _run.ml that would run them separately, just as before. (running separately in the CI is better, because it can be parallelize) No functional change. Signed-off-by: Edwin Török <[email protected]>
Quicktest runs in Dom0, and the existing quickcheck tests run in the CI. Some of these test as much the OCaml code, as its interaction with the system (e.g. behaviour of system calls). So it is better to run these tests both in the CI and in Dom0. We run these in long mode, to explore more randomly generated scenarios. The seed can be controlled with QCHECK_SEED environment variable. Similar to @StressTest it uses a random seed, instead of a fixed seed. Signed-off-by: Edwin Török <[email protected]>
CP-50448: run quickcheck tests in XenRT
lindig
approved these changes
Jul 29, 2024
gangj
merged commit Jul 30, 2024
e93b2a5
into
feature/update-backwards-compat
28 of 29 checks passed
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.