Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update feature branch #6120

Merged
merged 20 commits into from
Nov 18, 2024
Merged

Update feature branch #6120

merged 20 commits into from
Nov 18, 2024

Conversation

edwintorok
Copy link
Contributor

No description provided.

Vincent-lau and others added 20 commits October 25, 2024 14:10
There is no remote invocation of this function, so should be safe.

Signed-off-by: Vincent Liu <[email protected]>
Rather than copy and copy', rename them into `copy_into_sr` and
`copy_into_vdi`, which is what they actually do.

Signed-off-by: Vincent Liu <[email protected]>
Also make the logging information more verbose.

Signed-off-by: Vincent Liu <[email protected]>
- Introduce new `MigrateLocal` and `MigrateRemote` modules which contains
  the main implementations of the migration logic.
- Move the actual exposed SMAPIv2 functions in one place towards the end
  of the file, rather than spreading across the entire file.

These refactoring should all be statically verifiable by the compiler.

Signed-off-by: Vincent Liu <[email protected]>
Previously a task was constructed based on the log and tracing of a dbg
of the type Debug_info.t, and then later on a dbg is constructed based
on the previously constructued task. Instead of that, just convert the
first dbg into a string and thread it through the call.

Signed-off-by: Vincent Liu <[email protected]>
Now we got winbind as the Active Direcotry backend, the joined
netbios name is persisted in smb.conf, thus can be compatible with
hostname change

This commit just remove the set_hostname_live limitation

Signed-off-by: Lin Liu <[email protected]>
Reraise with the original stacktrace, this requires using the raw backtrace instead of the string one.

Signed-off-by: Edwin Török <[email protected]>
…6116)

Now we got winbind as the Active Direcotry backend, the joined netbios
name is persisted in smb.conf, thus can be compatible with hostname
change

This commit just remove the set_hostname_live limitation
Reraise with the original stacktrace, this requires using the raw
backtrace instead of the string one.

I had this change sitting in a branch since around Apr 25 2024, although
the branch itself might need more testing, this is a simple fix that we
should probably get in sooner.
Flame graphs indicate that, under load created by parallel "xe vm-list"
commands, the DB action get_record is hit often. This function
constructs an API-level record by marshalling an association list that
maps field names to unmarshalled string values. To do this, it serially
queries all the field names using List.assoc. This has rather large cost
in doing lexicographical string comparisons (caml_compare on string keys).

To avoid this, regardless of record size, we preprocess the association
lists __regular_fields and __set_refs into a (string, string) Hashtbl.t
and query that to construct each record field.

Signed-off-by: Colin James <[email protected]>
Rewrite the API.assocer function to internally construct a hash table
from an association list. Then, shadow its usage in relevant places.

Signed-off-by: Colin James <[email protected]>
Speed up "find" operation on Schema Tables and Database.

Previously, these would use List.find which involves costly
comparisons on string keys. Now, we use a hash table, keyed by names,
and use that to identify columns and tables.

Signed-off-by: Colin James <[email protected]>
Sync the metadata with upstream, which is not updated

Signed-off-by: Pau Ruiz Safont <[email protected]>
The function copy is being dropped because it's to_string, without labeled
arguments

Signed-off-by: Pau Ruiz Safont <[email protected]>
Sync the metadata with upstream, which is not updated

This fixes the CI issue due to cstruct version in xs-opam being
incompatible with the one requested by the opam metadata in this
repository
Flame graphs indicate that, under load created by parallel "xe vm-list"
commands, the DB action get_record is hit often. This function
constructs an API-level record by marshalling an association list that
maps field names to unmarshalled string values. To do this, it serially
queries all the field names using `List.assoc`. This has rather large
cost in doing lexicographical string comparisons (`caml_compare` on
string keys).

To avoid this, regardless of record size, we preprocess the association
lists `__regular_fields` and `__set_refs` into a (string, string)
Hashtbl.t and query that to construct each record field.

---

This benefit of this change is most notable for large records (such as
`VM` and `pool`). The cost of the previously generated code, which does
a bunch of serial `List.assoc` calls, incurs the quadratic cost of list
traversal (compounded by the costly lexicographical comparison of
strings during the search).

To produce measurements, I sampled xapi under a load of 500 consecutive
`xe vm-list` invocations (using the same sampling rate with `perf`) on a
host with a single VM.

Without the change, the `get_record` done internally by `xe vm-list`
makes up for ~33.59% of the samples (33.59% = 1,592,782,255 samples).
With the change, `get_record` accounts for ~7.56 of the samples (of
which there are substantially fewer collected: 7.56% = 264,948,239). So
the number of samples for `get_record` has dropped from 1,592,782,255 to
264,948,239 (assuming `perf`'s sampling is reliable).

You can see the visual difference in the flame graphs:

Before:

![{B1FEFF3A-AD91-478B-A828-89DCD19C2BEA}](https://github.com/user-attachments/assets/b7a4d504-3894-4f34-8dbe-b63de0e1d88c)

After:

![{CB28FFEC-944D-4F26-918A-FFB57E4875A3}](https://github.com/user-attachments/assets/fa517307-deb6-4a49-9a18-8217deb38734)

Of course, this is benefit as measured in aggregate (`perf` sampling),
so quite a fast and loose comparison. In practice, the `xe vm-list`
stress test goes from 7.4s to 6.2s (as `get_record` makes up only a
small part of the work done for a single `xe vm-list`).
The Storage migration code is too messy to be maintainable, making the
developing experience quite bad. This is the first attempt to make it
less spaghetti.

I am intending to do this in several stages. This PR is just for those
that are mostly "safe" operations that can probably be checked
statically by the compiler. More risky changes on the way...

Best reviewed by commit.
@edwintorok edwintorok merged commit b6952d6 into feature/perf Nov 18, 2024
53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants