Freeze fixes and v1 kludges #2545

Open
wants to merge 2 commits into
base: criu-dev

kolyshkin

1. freeze_processes: fix nr_attempts calculation

Commit 9fae23f grossly (by 1000x) miscalculated the number
of attempts required. As a result, we are seeing something like this:

(00.000340) freezing processes: 100000 attempts with 100 ms steps
(00.000351) freezer.state=THAWED
(00.000358) freezer.state=FREEZING
(00.100446) freezer.state=FREEZING
...close to 100 lines skipped...
(09.915110) freezer.state=FREEZING
(10.000432) Error (criu/cr-dump.c:1467): Timeout reached. Try to interrupt: 0
(10.000563) freezer.state=FREEZING

For 10s with 100ms steps we only need 100 attempts, not 100000.
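
To make the arithmetic concrete, here is a minimal sketch of the intended calculation. The helper name nr_attempts_for and its parameters are hypothetical and not taken from the criu sources; the only point is that the timeout (seconds) has to be converted to the same unit as the step (milliseconds) before dividing.

```c
#include <assert.h>

/*
 * Hypothetical helper, not the actual criu code: how many polls of
 * step_ms milliseconds fit into a timeout of timeout_s seconds.
 */
static unsigned long nr_attempts_for(unsigned long timeout_s,
				     unsigned long step_ms)
{
	assert(step_ms > 0);
	/* Convert seconds to milliseconds so both operands share a unit. */
	return (timeout_s * 1000UL) / step_ms;
}
```

With a 10 s timeout and 100 ms steps this gives 100. The logged value of 100000 is what you would get if the timeout were converted to microseconds while the step stayed in milliseconds; whether that is exactly what the earlier commit did is a guess.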

While at it, print an error if we hit the timeout before
i reaches nr_attempts.

2. freeze_processes: implement kludges for cgroup v1

The cgroup v1 freezer has always been problematic, sometimes failing
to freeze a cgroup.

In runc, we have implemented a few kludges to increase the chance of
succeeding, but those are used when runc freezes a cgroup for its own
purposes (for "runc pause" and to modify device properties for cgroup
v1).

When criu is used, it fails to freeze a cgroup from time to time
(see [1], [2]). Let's try adding kludges similar to the ones in runc.

Alas, I have absolutely no way to test this, so please review carefully.
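
For reviewers who have not seen this kind of workaround, below is a rough sketch of a retry loop in that spirit. Everything in it is an assumption for illustration: the freezer_ops interface, the function names, the retry counts and the simulated freezer are all hypothetical, and the specific steps (rewriting FROZEN on every try, an occasional brief thaw) are a guess at what such kludges look like, not a transcription of the runc or criu code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical interface over freezer.state, so the loop can be
 * exercised without a real cgroup. */
struct freezer_ops {
	int (*write_state)(void *ctx, const char *state);
	int (*read_state)(void *ctx, char *buf, size_t len);
	void (*pause_us)(void *ctx, unsigned int usec);
};

/* Returns the number of tries used on success, -1 on failure. */
static int freeze_with_retries(const struct freezer_ops *ops, void *ctx,
			       int max_tries, int thaw_every)
{
	char state[32];
	int i;

	for (i = 0; i < max_tries; i++) {
		if (thaw_every > 0 && i > 0 && i % thaw_every == 0) {
			/* Briefly thaw so that stuck tasks can make progress. */
			if (ops->write_state(ctx, "THAWED") < 0)
				return -1;
			ops->pause_us(ctx, 10000);
		}
		/* Re-request the freeze on every try. */
		if (ops->write_state(ctx, "FROZEN") < 0)
			return -1;
		if (ops->read_state(ctx, state, sizeof(state)) < 0)
			return -1;
		if (strcmp(state, "FROZEN") == 0)
			return i + 1;
	}
	/* Give up: thaw rather than leave the cgroup in FREEZING. */
	ops->write_state(ctx, "THAWED");
	return -1;
}

/* Simulated freezer: reports FROZEN once it has seen `needed`
 * FROZEN writes, FREEZING before that. */
struct fake_freezer {
	int needed;
	int frozen_writes;
	int thaws;
};

static int fake_write(void *ctx, const char *state)
{
	struct fake_freezer *f = ctx;

	if (strcmp(state, "FROZEN") == 0)
		f->frozen_writes++;
	else
		f->thaws++;
	return 0;
}

static int fake_read(void *ctx, char *buf, size_t len)
{
	struct fake_freezer *f = ctx;
	const char *s = f->frozen_writes >= f->needed ? "FROZEN" : "FREEZING";

	if (strlen(s) >= len)
		return -1;
	strcpy(buf, s);
	return 0;
}

static void fake_pause(void *ctx, unsigned int usec)
{
	(void)ctx;
	(void)usec; /* the simulation does not actually sleep */
}

static const struct freezer_ops fake_ops = { fake_write, fake_read, fake_pause };
```

Calling freeze_with_retries(&fake_ops, &f, 1000, 50) with a struct fake_freezer f whose needed field is 3 returns after three tries. A real implementation would read and write freezer.state in the cgroup directory instead of using the fake.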

[1]: opencontainers/runc#4273
[2]: opencontainers/runc#4457

Signed-off-by: Kir Kolyshkin <[email protected]>