Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update feature/perf from master #5993

Closed
wants to merge 489 commits into from
Closed

Update feature/perf from master #5993

wants to merge 489 commits into from

Conversation

edwintorok
Copy link
Contributor

@edwintorok edwintorok commented Sep 16, 2024

bernhardkaindl and others added 30 commits July 9, 2024 15:06
…n-python3

Improve spelling and fix typos in comments of Python scripts
- Removed templates debian and debug from Makefile

Signed-off-by: Ashwinh <[email protected]>
…rt-tabs-to-spaces

CP-49931 - Convert tabs to spaces and fix whitespace for scripts/xe-reset-networking:
…and-warningfixes

CP-50100: `backup-sr-metadata,restore-sr-metadata`: `2to3`, fix `pytype`, `pyright`
In `XenAPIPlugin.py`'s `dispatch()` function, SystemExit does not need to be
caught and raised because both other exceptions are subclasses of Exception:

By design, SystemExit is a subclass of BaseException and because we
are not catching BaseException and also not use a bare `except:`
here, we can cleanup catching and re-raising `SystemExit()` here.

Reference: https://docs.python.org/3/library/exceptions.html#SystemExit

Signed-off-by: Bernhard Kaindl <[email protected]>
…c directory

- Modified python3/Makefile to include this change.
- Removed backup-sr-metadata.py from scripts/Makefile

Signed-off-by: Ashwinh <[email protected]>
…ec directory

- Modified python3 Makefile to include this change
- Removed restore-sr-metadata.py from scripts/Makefile
- Fixed bare-except exception pylint issue

Signed-off-by: Ashwinh <[email protected]>
…sions

Original (code supplied) by Bernhard Kaindl:
- Declare missing methods to python3/stubs/XenAPI.py for pyright
- Initialize variables to fix pyright:reportPossiblyUnboundVariable
- Applied isort

Signed-off-by: Ashwinh <[email protected]>
Signed-off-by: Bernhard Kaindl <[email protected]>
CP-49919: mv scripts/extensions/pool_update.precheck to python3/extensions
mv restore-sr-metadata.py & backup-sr-metadata.py to python3/
- Modified python3/Makefile to include this change
- Removed static-vdis from scripts/Makefile
- Modified test_static_vdis.py to include new location of the static-vdis

Signed-off-by: Ashwinh <[email protected]>
These example scripts were imported in 2009 and are obsoleted by samples like:
https://github.com/xapi-project/xen-api-sdk/blob/master/python/samples/powercycle.py

- The sample xen-api-sdk/python/samples/powercycle.py is much better.
- They are not installed and not otherwise mentioned in the repository.

Signed-off-by: Bernhard Kaindl <[email protected]>
Fix pyright:
    - Rework unclean exception handling using a contextmanager.
    - Declare address and master in case of an Exception as well.
    - Fix warning on unsupported escape sequence using a raw string.
    - Removed python2 script check from pyproject.toml

Signed-off-by: Bernhard Kaindl <[email protected]>

Signed-off-by: Ashwinh <[email protected]>
CP-49931: mv `scripts/xe-reset-networking` to `python3/bin`
…l/ocaml_backend/python

CP-47869: Remove ocaml/idl/ocaml_backend/python: remove obsolete example scripts:
- Could not find any use of installation of them
CP-49900: Removed templates folder from python3/
…rm-py3

Py3: Cleanup `test_mail-alarm.py` to use python3 test helpers and move it
…si-iqn

`scripts/generate-iscsi-iqn`: Update the inline Python script for Python3
Instead of fetching a list of enabled tracers from the locked hashtable everytime,
look it up directly.
Change the 'observer' global boolean to an Atomic.t that stores the currently enabled TracerProvider
(if any, a [no_op] tracer that is disabled otherwise).
Now 'Tracer.get_tracer' can run without allocating memory, and without having to take any locks.

When the tracer gets created/destroyed/enabled/disabled we walk the entire list and find the current tracer, if any,
and set the current tracer appropriately.
This is O(tracerproviders), but we only support tracerproviders=0, or tracerproviders=1, and this is a rare  operation,
that happens on startup or the first time the tracer gets enabled.

tracing/overhead(on, create span) with workloads:
```
before: 14940.958498 ns/run (confidence: 16363.887949 to 13856.836570)
after: 11975.740184 ns/run (confidence: 12710.262010 to 11463.052415)
```

Signed-off-by: Edwin Török <[email protected]>
We only need to traverse this once, in a direct order.

without workload:
before: 25172.588238 (confidence: 38289.717741 to 14519.713302);
after: 15774.959559 ns/run (confidence: 23866.336958 to 9640.244697);
with workload:
before: 14940.958498 ns/run (confidence: 16363.887949 to 13856.836570)
after: 11028.540184 ns/run (confidence: 11687.173016 to 10454.879814);

Signed-off-by: Edwin Török <[email protected]>
Instead of constructing intermediate seq elements, directly fold the new attribute list into the old set.

Signed-off-by: Edwin Török <[email protected]>
Tracing shouldn't use locks, because we use it to find hot spots in the application.
And introducing a single global lock might introduce lock contention that wouldn't otherwise be present.

Use atomic operations on immutable data structures instead.
Although a Map is O(log n), and not O(1) like a hashtable, it doesn't require holding any locks to traverse it,
or update it.
We only need to do an atomic compare-and-set operation once we've finished updating it, and if we raced with anyone
 (unlikely on OCaml4, unless you got interrupted by the tick thread), then try again with a backoff.

We shouldn't hand roll atomic data structures like this, but instead use Saturn (Skip lists), or Kcas (which has a
generic [update] function that implements the above method and also works on OCaml 5).

before: 3827.092437 ns/run (confidence: 4275.705106 to 3550.511099);
after:  2727.247326 ns/run (confidence: 3019.854167 to 2582.316754);

Note: when benchmarking ensure that /sys/devices/system/clocksource/clocksource0/current_clocksource is set to TSC.
If set to Xen, then reading timestamps is very slow.

Signed-off-by: Edwin Török <[email protected]>
A small improvement, but could be more on deep span trees.

before:2727.247326 ns/run (confidence: 3019.854167 to 2582.316754)
after: 2590.000000 ns/run(confidence: 3362.115942 to 2068.393072);

Signed-off-by: Edwin Török <[email protected]>
And print the failed ID.

Signed-off-by: Edwin Török <[email protected]>
The following environment variables can now be used by tests:
* QCHECK_LONG_FACTOR (default 10): multiplies the default iteration count of 100
* QCHECK_SEED: an integer seed (the tests print their current seed on startup otherwise, can be used to reproduce a failure)

These environment variables are already supported by the QCheck framework, we just need to use them.

Signed-off-by: Edwin Török <[email protected]>
psafont and others added 18 commits September 16, 2024 13:36
On behalf of Gang Ji <[email protected]>, I, Pau Ruiz Safont <[email protected]>, hereby add my Signed-off-by to this commit: e40fe60

Signed-off-by: Pau Ruiz Safont <[email protected]>
There's a commit on master that makes the DCO check to fail on all
master commits. Since the issue seems to be an email mismatch, we can
fix this easily.
The following environment variables can now be used by tests:
* QCHECK_LONG_FACTOR (default 10): multiplies the default iteration
count of 100
* QCHECK_SEED: an integer seed (the tests print their current seed on
startup otherwise, can be used to reproduce a failure)

These environment variables are already supported by the QCheck
framework, we just need to use them.
Last year, an argument to determine the group id of the certificates
created
was added. This broke test clients, and there's no need no make it
compulsory
since there's a default value: -1.

Modify the binary so it can accept only 3 arguments like before, and
change the
help output to mention the group id.

This makes quite a few internal certificate tests to pass compare the
previous suite run 204647 to the new 204840
This allows to maintain sub-second precision, and only convert to datetimes
when converting to strings.

Now timezone conversion is more explicit. Datetimes without timezone are
assumed to be UTC. While rare in practice, this is not safe in general, so
print a warning to stdout whenever this happens.

Signed-off-by: Pau Ruiz Safont <[email protected]>
Datetimes received as call parameters can lack timezone. In that case the host
assumed the datetime is in UTC. This is not always safe to do, but returning an
error will break clients. Instead keep doing the assumption and enforce proper
documentation of the parameters in the datamodel.

Signed-off-by: Pau Ruiz Safont <[email protected]>
Previously the dates without timezone were assumed to use UTC when being
converted to unix time, but the datatype maintained the lack of timezone when
being printed. This means that the problematic timezoneless timestamps were
kept and could be produced by the hosts.

Since the dates are assumed to be UTC, add the timezone when they are parsed
for the first time. Now their timezone is displayed at all times.

The output of Localtime is kept without a timezone because clients are not
prepared to accept arbitrary timezones in dates. (Both the SDK and XO(Lite))

Signed-off-by: Pau Ruiz Safont <[email protected]>
OCamlformat will correctly format new features only when it's aware the OCaml
version it's outputting provides them. This allows to improve formatting for
let-punning, punned labelled arguments, and more:
https://github.com/ocaml-ppx/ocamlformat/blob/main/CHANGES.md#added-8

Signed-off-by: Andrii Sultanov <[email protected]>
…ion (#5950)

The Date module has a very convoluted way to deal with timezones because
of its historic clients. While we can't remove all the issues, we can
remove most of them.
- Dates (timestamps, really) without timezones do not contain a frame of
reference, and hence placing them in the UTC timeline can't be done
accurately in the general case. This means that comparing timestamps
gives incorrect results.
- XMLRPC enforces dates to be encoded in ISO8601format. This means
timestamps can lack a timezone.
- Xapi uses xmlrpc's dates, adding this unfortunate limitation.
- While Date allows these timestamps, it assumes that these are in the
UTC timezone. On top of that, it refuses to process any timestamps that
are in any timezone other than UTC (`Z`)
- The Date module tries really hard to hide the timezone of timestamps
that lack it when printing them. This means that timezoneless timestamps
can persist in the database, for no good reason, as they are treated as
UTC timestamps.
- Host.localtime is the only call that returns timezoneless timestamps,
all the other calls correctly return timestamps in the UTC timezone.
Because the call on purpose does not want to return the current time in
UTC, changing this might break clients not ready to process any
timezone, mainly SDK-built ones
#5802
- Dates are stored as a tuple of date, hour, minutes, seconds. This has
very limited precision, which might be unexpected.

This PR does the following mitigation / fixes:
- Document all calls with Datetimes as parameters that the timestamps
will be interpreted as UTC ones if they miss the timezone.
- Remove the limitation to process any valid timezone
- Timezoneless timstamps are immediately processed as UTC timestamps, as
refusing these timestamps can break clients.

Issues that the PR does not fix
- Host.localtime produces timestamps without timezones, this is needed
as adding non-zero timestamps breaks the SDK clients.
- The server does not reject timezoneless timestamp, this might break
migrations, RPUs, or normal clients, so I've held back on this change.
- Printed timestamp do _not_ retain sub-second precision

Drafting as tests are ongoing
OCamlformat will correctly format new features only when it's aware the
OCaml version it's outputting provides them. This allows to improve
formatting for let-punning, punned labelled arguments, and more:
https://github.com/ocaml-ppx/ocamlformat/blob/main/CHANGES.md#added-8
max-spans=1000 is small for a VM.start, now that we have more instrumentation.
With the other optimizations in this series of commits it should now be safe to increase the number of spans,
we no longer perform O(n^2) operations on them.

Signed-off-by: Edwin Török <[email protected]>
clock now uses xapi-log, so it needs to have the dependency declared in opam

Signed-off-by: Pau Ruiz Safont <[email protected]>
clock now uses xapi-log, so it needs to have the dependency declared in
opam
We notice that required CI checks cancel themselves even when attempting to merge a single PR at a time.
That is probably because there are CI jobs run on both 'push' and 'merge_group'.

Try to make the concurrency group more unique by adding the github event name to the group key.

Signed-off-by: Edwin Török <[email protected]>
We notice that required CI checks cancel themselves even when attempting
to merge a single PR at a time. That is probably because there are CI
jobs run on both 'push' and 'merge_group'.

Try to make the concurrency group more unique by adding the github event
name to the group key.
We've noticed some delays when tracing is enabled, and suspected it is
due to lock contention or GC activity created by the tracing module
itself.

I've added some benchmarks to measure the overhead of tracing, and made
some improvements (more improvements are possible).
The benchmarks test both a situation with no workload, and a situation
with 2 other threads: one running the same workload as the function
under test, and another that runs a very GC intensive workload.

Improved:
* when max spans limit is hit: don't spam syslog. Time goes down from
202 *milli*seconds to 14 *micro*seconds per span
* reduce lock contention: eliminate one global lock, replace with atomic
ops
* reduce allocation

We are now at ~9µs/nested span on this particular machine, but further
improvements are possible by delaying work to the exporter thread and
finishing really quickly otherwise (e.g. upstream ocaml-trace claims
~60ns/span).

When tracing is disabled the overhead is very small, just like before
(on the order of 20ns without a workload).

Before:
```
Running benchmarks (no workloads)
╭─────────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮
│name                                 │  major-allocated          │  minor-allocated          │  monotonic-clock          │
├─────────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤
│  tracing/max span overflow          │             0.0000 mjw/run│          1187.0000 mnw/run│       96214066.4652 ns/run│
│  tracing/overhead(off)              │             0.0000 mjw/run│            14.0000 mnw/run│             19.0061 ns/run│
│  tracing/overhead(on, create span)  │             2.9250 mjw/run│           496.3888 mnw/run│          25431.5210 ns/run│
│  tracing/overhead(on, no span)      │             0.0000 mjw/run│            14.0000 mnw/run│             18.8063 ns/run│
╰─────────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯

tracing/overhead(on, no span) (ns):
 { monotonic-clock per run = 18.806282 (confidence: 20.597860 to 16.986898);
   r² = Some 0.63699 }
tracing/max span overflow (ns):
 { monotonic-clock per run = 96214066.465201 (confidence: 125336685.666667 to 70136316.380165);
   r² = Some 0.518256 }
tracing/overhead(off) (ns):
 { monotonic-clock per run = 19.006133 (confidence: 20.619355 to 17.390492);
   r² = Some 0.667542 }
tracing/overhead(on, create span) (ns):
 { monotonic-clock per run = 25431.521030 (confidence: 37607.254799 to 15151.227932);
   r² = Some 0.0329005 }

Running benchmarks (workloads)
╭─────────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮
│name                                 │  major-allocated          │  minor-allocated          │  monotonic-clock          │
├─────────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤
│  tracing/max span overflow          │         80934.7363 mjw/run│      10580588.4725 mnw/run│      202271848.5714 ns/run│
│  tracing/overhead(off)              │             0.0000 mjw/run│            16.5479 mnw/run│            212.0958 ns/run│
│  tracing/overhead(on, create span)  │             0.0000 mjw/run│           503.4400 mnw/run│          15633.2400 ns/run│
│  tracing/overhead(on, no span)      │             0.0000 mjw/run│            18.3256 mnw/run│            326.9568 ns/run│
╰─────────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯

tracing/overhead(on, no span) (ns):
 { monotonic-clock per run = 326.956811; r² = Some -9.40099 }
tracing/max span overflow (ns):
 { monotonic-clock per run = 202271848.571429; r² = Some 0.065489 }
tracing/overhead(off) (ns):
 { monotonic-clock per run = 212.095829; r² = Some -4.46794 }
tracing/overhead(on, create span) (ns):
 { monotonic-clock per run = 15633.240000; r² = Some 0.805726 }
```

After:
```
Running benchmarks (no workloads)
╭─────────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮
│name                                 │  major-allocated          │  minor-allocated          │  monotonic-clock          │
├─────────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤
│  tracing/max span overflow          │             0.0964 mjw/run│           379.0196 mnw/run│          14479.0371 ns/run│
│  tracing/overhead(off)              │             0.0000 mjw/run│            14.0000 mnw/run│             18.2174 ns/run│
│  tracing/overhead(on, create span)  │             0.0000 mjw/run│           397.3440 mnw/run│          14712.1307 ns/run│
│  tracing/overhead(on, no span)      │             0.0000 mjw/run│            14.0000 mnw/run│             18.1771 ns/run│
╰─────────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯

tracing/overhead(on, no span) (ns):
 { monotonic-clock per run = 18.177093 (confidence: 19.990180 to 16.374104);
   r² = Some 0.600249 }
tracing/max span overflow (ns):
 { monotonic-clock per run = 14479.037101 (confidence: 15625.053708 to 13293.667720);
   r² = Some 0.635766 }
tracing/overhead(off) (ns):
 { monotonic-clock per run = 18.217390 (confidence: 19.845841 to 16.442527);
   r² = Some 0.642373 }
tracing/overhead(on, create span) (ns):
 { monotonic-clock per run = 14712.130670 (confidence: 23416.444172 to 7426.891454);
   r² = Some 0.0476263 }

Running benchmarks (workloads)
╭─────────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮
│name                                 │  major-allocated          │  minor-allocated          │  monotonic-clock          │
├─────────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤
│  tracing/max span overflow          │             0.0000 mjw/run│           380.4429 mnw/run│           7658.2207 ns/run│
│  tracing/overhead(off)              │             0.0000 mjw/run│            15.3007 mnw/run│            102.2479 ns/run│
│  tracing/overhead(on, create span)  │             0.0000 mjw/run│           400.5094 mnw/run│           8657.5585 ns/run│
│  tracing/overhead(on, no span)      │             0.0000 mjw/run│            18.1333 mnw/run│            373.9794 ns/run│
╰─────────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯

tracing/overhead(on, no span) (ns):
 { monotonic-clock per run = 373.979447 (confidence: 435.400802 to 336.056569);
   r² = Some -1.84338 }
tracing/max span overflow (ns):
 { monotonic-clock per run = 7658.220695 (confidence: 7952.878804 to 7396.117711);
   r² = Some 0.950954 }
tracing/overhead(off) (ns):
 { monotonic-clock per run = 102.247932 (confidence: 119.364768 to 89.830417);
   r² = Some -0.607146 }
tracing/overhead(on, create span) (ns):
 { monotonic-clock per run = 8657.558458 (confidence: 8904.596470 to 8435.216348);
   r² = Some 0.956299 }
```

Draft PR, this was only unit tested so far.
@edwintorok
Copy link
Contributor Author

Continued in #6000 to fix merge conflict, github doesn't allow me to edit the origin repo, just the branch.

@edwintorok edwintorok closed this Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.