CP-52821: use Mtime in Xapi_periodic_scheduler #6161

Open
wants to merge 18 commits into base: feature/perf

Conversation

edwintorok
Contributor

Target: feature/perf

Note that this conflicts with 7e9310f?diff=unified&w=1, but is a prerequisite of #6126

@robhoes
Member

robhoes commented Dec 5, 2024

Isn't it better to rebase on master already and build on top of the scheduler changes there? Those rebase changes would have to be done later anyway, but would then end up in a complicated, hard-to-review merge commit.

@psafont
Member

psafont commented Dec 5, 2024

There's a further improvement here: replace all Mtime_clock.now () with Mtime_clock.elapsed () to remove the overflow handling, because the latter returns a span instead of a timestamp.
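A minimal sketch of the two styles under discussion, assuming the Mtime 2.x API (the helper names are illustrative, not scheduler code):

```ocaml
(* Timestamp style: Mtime.add_span can overflow Mtime.t, so it returns
   an option that every caller has to unwrap. *)
let deadline_now delay =
  match Mtime.add_span (Mtime_clock.now ()) delay with
  | Some t -> t
  | None -> invalid_arg "deadline does not fit in Mtime.t"

let expired_now deadline =
  Mtime.is_later (Mtime_clock.now ()) ~than:deadline

(* Span style: Mtime_clock.elapsed returns a span since the clock's
   epoch, and Mtime.Span.add is total, so there is no option. *)
let deadline_elapsed delay =
  Mtime.Span.add (Mtime_clock.elapsed ()) delay

let expired_elapsed deadline =
  Mtime.Span.is_longer (Mtime_clock.elapsed ()) ~than:deadline
```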

@edwintorok
Contributor Author

edwintorok commented Dec 6, 2024

> Isn't it better to rebase on master already and build on top of the scheduler changes there? Those rebase changes would have to be done later anyway, but would then end up in a complicated, hard-to-review merge commit.

I'll rebase this PR on top of master once Frediano's PR is merged.

@freddy77
Collaborator

freddy77 commented Dec 6, 2024

> There's a further improvement here: replace all Mtime_clock.now () with Mtime_clock.elapsed () to remove the overflow handling, because the latter returns a span instead of a timestamp.

It's not clear which overflows you are talking about. Internally the timer is a counter since boot, which will overflow an int64 (although the numbers should be treated as unsigned) in about 290 years. I personally find the code with Mtime_clock.elapsed () less readable: it has to deal with differences of differences (is this span shorter or longer than that span?) instead of just comparing times (has the deadline crossed the current time?).

PS: @psafont are you referring to floating point? Maybe it would be sensible to treat a too-large timeout (like 10 years) as an error?
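As a quick sanity check of the 290-year figure, assuming a nanosecond counter that overflows a signed int64 at 2^63:

```ocaml
(* 2^63 - 1 nanoseconds expressed in years: *)
let years = Int64.to_float Int64.max_int /. 1e9 /. 86_400. /. 365.25
(* years = 292.27..., consistent with "about 290 years" *)
```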

@psafont
Member

psafont commented Dec 6, 2024

> It's not clear which overflows you are talking about.

One function returns an option in case of overflow, the other does not. Because, as you point out, the overflow won't happen in practice, I prefer to use the latter rather than the former.

> I personally find the code with Mtime_clock.elapsed () less readable: it has to deal with differences of differences

I fundamentally disagree: timestamps are actually differences as well (between an epoch and a point in time), and the only practical difference is replacing is_{earlier,later} with is_{shorter,longer}. I'm fine with this improvement not being in this PR; it was only an observation.

Test that the event is correctly executed.

Signed-off-by: Frediano Ziglio <[email protected]>
Do not use ">" or other polymorphic operators to compare Mtime.t values;
the value is intended to be unsigned and should be compared with
Int64.unsigned_compare, as the Mtime functions do.

Signed-off-by: Frediano Ziglio <[email protected]>
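To illustrate the pitfall this commit describes (the values below are made up, not taken from the scheduler): once the raw counter passes the sign bit, the polymorphic operators order it as negative, while Int64.unsigned_compare orders it correctly:

```ocaml
let a = 1L                       (* a small counter value *)
let b = 0x8000_0000_0000_0000L   (* Int64.min_int when read as signed *)

let () =
  (* Polymorphic/signed comparison: b looks negative, so a > b. *)
  assert (a > b);
  (* Unsigned comparison, as the Mtime functions use: a < b. *)
  assert (Int64.unsigned_compare a b < 0)
```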
The parameter was false only to support an internal usage; external users
should always alert the thread loop.

Signed-off-by: Frediano Ziglio <[email protected]>
- Do not wait a huge amount of time if the queue is empty; use
  Delay.wait if possible.
- Fix deletion of periodic events. When a periodic event is being
  processed it is removed from the queue; previously
  remove_from_queue was not able to mark such an event as removed.
- Do not race between checking the first event and removing it.
  These two actions were done in two separate critical sections;
  now they are done in a single one (see the sketch after this
  message).

Signed-off-by: Frediano Ziglio <[email protected]>
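A minimal sketch of the single-critical-section shape described above; the event type, queue, and lock are hypothetical stand-ins for the scheduler's internals:

```ocaml
type event = { name : string; deadline : float }

let lock = Mutex.create ()
let queue : event list ref = ref []

(* Check the first event and remove it under the same lock, so no other
   thread can remove or reorder events between the two steps. *)
let pop_expired now =
  Mutex.lock lock;
  Fun.protect
    ~finally:(fun () -> Mutex.unlock lock)
    (fun () ->
      match !queue with
      | e :: rest when e.deadline <= now ->
          queue := rest;
          Some e
      | _ -> None)
```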
A periodic event can potentially be cancelled while it is being processed.
Simulate this behaviour by removing the event inside the handler.
This was fixed by the previous commit.

Signed-off-by: Frediano Ziglio <[email protected]>
Previously, if the queue was empty and the loop thread was active,
the scheduler took quite some time to pick up a new event.
Check that this is done in a timely fashion, to avoid regressions in the code.

Signed-off-by: Frediano Ziglio <[email protected]>
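A minimal sketch of such a timeliness test; the scheduler entry point is taken as a parameter because its exact signature is an assumption of this sketch:

```ocaml
(* [add_to_queue name delay fn] is assumed to schedule [fn] to run
   [delay] seconds from now; requires the threads library. *)
let test_pickup_when_idle add_to_queue =
  let fired = ref false in
  let give_up = Unix.gettimeofday () +. 1.0 in
  add_to_queue "one-shot" 0.1 (fun () -> fired := true);
  (* The event should fire well within a second, not after the long
     sleep the idle loop used to take. *)
  while (not !fired) && Unix.gettimeofday () < give_up do
    Thread.yield ()
  done;
  assert !fired
```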
edwintorok marked this pull request as draft on December 9, 2024 at 11:04
@edwintorok
Contributor Author

Moved this to draft until the referenced PR (#6155) is merged to master, as this will need a rebase.

freddy77 and others added 2 commits December 9, 2024 14:50
Similar to the test_empty test; however, the state of the loop should be different.

Signed-off-by: Frediano Ziglio <[email protected]>
For the scan retry, previously we were comparing the entire vdi data
structure from the database using the (<>) operator. This is a bit
wasteful and not very stable. Instead, let us just compare the vdi refs,
since the race here comes from `VDI.db_{introduce,forget}`, which would
only add/remove vdis from the db, but not change their actual data
structure.

Also add a bit more logging when retrying, since this should not happen
very often.

Signed-off-by: Vincent Liu <[email protected]>
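A minimal sketch of the comparison change (the vdi record here is hypothetical; only the ref field matters for the retry check):

```ocaml
type vdi = { ref_ : string; name_label : string; size : int64 }

(* Before: structural (<>) over whole records, which also fires when any
   field of an existing VDI changes between the two reads. *)
let changed_before (a : vdi list) (b : vdi list) = a <> b

(* After: compare just the refs, since the VDI.db_{introduce,forget}
   race only adds or removes entries. *)
let refs vdis = List.map (fun v -> v.ref_) vdis
let changed_after a b = refs a <> refs b
```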
edwintorok marked this pull request as ready for review on December 10, 2024 at 17:30
@edwintorok
Contributor Author

Rebased on latest feature/perf (which in turn has been updated to include latest master).
Target is still feature/perf because there are more changes coming, and this needs a bit wider testing.

For example, previously there was a +0.5 in the periodic scheduler, but now we just use Clock.Timer.remaining, so there could be some subtle differences in behaviour.

changlei-li and others added 4 commits December 11, 2024 17:41
When adding a leaked datapath:
1. add the leaked datapath to Sr.vdis
2. write to the db file
3. enhance logging

If a storage exception is raised when destroying a datapath,
the procedure fails and the state of the VDI becomes incorrect,
which leads to various abnormal results in subsequent operations.
To handle this, the leaked datapath is designed to re-destroy the
datapath and refresh the state before the next storage operation,
via the function remove_datapaths_andthen_nolock. But this
mechanism doesn't take effect in the current code.
This commit fixes that bug: the leaked datapath should be added
to Sr.vdis so that the mechanism actually works, and written to
the db file so that the leaked datapath survives a xapi restart.

Signed-off-by: Changlei Li <[email protected]>
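A minimal sketch of the flow this commit describes, with hypothetical types standing in for xapi's storage-layer structures (the real fix touches Sr.vdis and the db-file persistence):

```ocaml
type vdi_state = { mutable leaked_dps : string list }

(* [persist] stands in for writing the table to the db file. *)
let mark_leaked ~(sr_vdis : (string, vdi_state) Hashtbl.t) ~vdi ~dp
    ~(persist : unit -> unit) =
  let state =
    match Hashtbl.find_opt sr_vdis vdi with
    | Some s -> s
    | None ->
        (* The fix: the VDI must be present in Sr.vdis, otherwise the
           leaked datapath is invisible to the next storage operation. *)
        let s = { leaked_dps = [] } in
        Hashtbl.add sr_vdis vdi s;
        s
  in
  state.leaked_dps <- dp :: state.leaked_dps;
  persist () (* survive a xapi restart *)
```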
There were some issues in the current code, ranging from structure
corruption to thread safety.
Add some tests and fix the discovered issues.

More details in the individual commit messages above.
Signed-off-by: Edwin Török <[email protected]>