Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GSOC] udp: Add support for checkpoint/restore UDP socket's queues #1571

Draft
wants to merge 322 commits into
base: criu-dev
Choose a base branch
from

Conversation

minhbq-99
Copy link
Member

In these patches, I implement

  • A new protobuf image for storing dumped UDP packets. The image has the
    following format: UdpQueueEntry at the beginning for storing queues'
    information, each packet has UdpPacketEntry at the beginning for storing
    address information and packet's length followed by packet's data.

  • Checkpoint/restore support for UDP send queue via the newly proposed
    UDP_REPAIR in the Linux kernel. Here is the UDP_REPAIR patch
    https://gist.github.com/minhbq-99/256be103c96aa939c75e05a009e6fc1c

  • Checkpoint/restore support for UDP receive queue. When checkpointing,
    packets are peeked from the socket. When restoring, a temporary socket is
    used to send these dumped packet. The source IP addresses of these are
    modified to matched the IP addresses when checkpointing by using
    IP_PKTINFO and IP_TRANSPARENT. However, the source port cannot be
    restored in some cases and the packet will be dropped.

rst0git and others added 30 commits January 12, 2021 20:47
Signed-off-by: Radostin Stoyanov <[email protected]>
musl defines 'loff_t' in fcntl.h as 'off_t'.
This patch resolves the following error when running the compel tests
on Alpine Linux:

gcc -O2 -g -Wall -Werror -c -Wstrict-prototypes -fno-stack-protector -nostdlib -fomit-frame-pointer -ffreestanding -fpie -I ../../../compel/include/uapi -o parasite.o parasite.c
In file included from ../../../compel/include/uapi/compel/plugins/std/syscall.h:8,
                 from ../../../compel/include/uapi/compel/plugins/std.h:5,
                 from parasite.c:3:
../../../compel/include/uapi/compel/plugins/std/syscall-64.h:19:66: error: unknown type name 'loff_t'; did you mean 'off_t'?
   19 | extern long sys_pread (unsigned int fd, char *buf, size_t count, loff_t pos) ;
      |                                                                  ^~~~~~
      |                                                                  off_t
../../../compel/include/uapi/compel/plugins/std/syscall-64.h:96:46: error: unknown type name 'loff_t'; did you mean 'off_t'?
   96 | extern long sys_fallocate (int fd, int mode, loff_t offset, loff_t len) ;
      |                                              ^~~~~~
      |                                              off_t
../../../compel/include/uapi/compel/plugins/std/syscall-64.h:96:61: error: unknown type name 'loff_t'; did you mean 'off_t'?
   96 | extern long sys_fallocate (int fd, int mode, loff_t offset, loff_t len) ;
      |                                                             ^~~~~~
      |                                                             off_t
make[1]: *** [Makefile:32: parasite.o] Error 1

Signed-off-by: Radostin Stoyanov <[email protected]>
gcc -O2 -g -Wall -Werror -I ../../../compel/include/uapi -o spy spy.c ../../../compel/libcompel.a
spy.c: In function ‘check_pipe_ends’:
spy.c:107:2: error: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Werror=unused-result]
  107 |  write(wfd, "1", 2);
      |  ^~~~~~~~~~~~~~~~~~
spy.c:108:2: error: ignoring return value of ‘read’, declared with attribute warn_unused_result [-Werror=unused-result]
  108 |  read(rfd, aux, sizeof(aux));
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Signed-off-by: Radostin Stoyanov <[email protected]>
Improve the check to skip moving service fds in clone_service_fd because
we don't need to move anything if old service fds resulting offset is
the same as new service fds resulting offset. It saves us from excess
calls to dup/fcntl(F_DUPFD).

Currently we check that base is the same and shared fdt ids are the
same, but there also can be situations where different bases with
different shared fdt ids still give the same offset sum (running zdtm in
Virtuozzo CT we've seen such a case where service_fd_base=512,
new_base=128, service_fd_id=24, id=0, SERVICE_FD_MAX=16).

Signed-off-by: Pavel Tikhomirov <[email protected]>
We need this to avoid conflicts with file descriptors which has to be
restored. Currently open_proc PROC_SELF is not used during restoring
file descriptors, but we are going to use it for memfd restore.

Note: in get_proc_fd let's not close service fd if we detect service fd
is not ours, it will be replaced in open_pid_proc anyway, and e.g. in
protected sfd context we can't close or open sfd, but can replace it
without any problems.

While on it also add FIXME because the check in get_proc_fd is error
prone in some rare cases (nested pidns is not supported yet).

We need to populate this new SELF service fd in populate_pid_proc, so
that it is later available in protected context. Also don't close
/proc/self service fd in prep_unix_sk_cwd as it can be called from
protected context and there is no much point of closing it anyway.

Close proc self servicefd in close_old_fds, because first we don't wan't
to reuse it from some ancestor, second there can be some junk fd as we
are yet only in the beginning of close_old_fds. This junk fd can come
from service fds of other tasks from parent's shared fdtable, and this
fd would not allow us to do opendir_proc(PROC_SELF).

Signed-off-by: Pavel Tikhomirov <[email protected]>
This looks better for me, should be no functional change.

Another implication of this is nested pid namespaces, when we will
support them "__open_proc(getpid()...)" will try to open file of the
process which has the same pid but in NS_ROOT pidns, which is bad.

See also aa2d920 ("files: use PROC_SELF when a process accesses its
/proc/PID") for a similar change.

Signed-off-by: Pavel Tikhomirov <[email protected]>
 criu/cr-restore.c:3230:3: note: Value stored to 'ret' is never read
                 ret = false;
                 ^     ~~~~~
  3228|   	if (n == -1) {
  3229|   		pr_perror("Failed to get number of supplementary groups");
  3230|-> 		ret = false;
  3231|   	}
  3232|   	if (n != n_groups)

Signed-off-by: Adrian Reber <[email protected]>
…gative

criu/fdstore.c:110: negative_return_fn: Function "get_service_fd(FDSTORE_SK_OFF)" returns a negative number.
criu/fdstore.c:110: assign: Assigning: "sk" = "get_service_fd(FDSTORE_SK_OFF)".
criu/fdstore.c:114: negative_returns: "sk" is passed to a parameter that cannot be negative.

criu/namespaces.c:1366: negative_return_fn: Function "get_service_fd(USERNSD_SK)" returns a negative number.
criu/namespaces.c:1366: assign: Assigning: "sk" = "get_service_fd(USERNSD_SK)".
criu/namespaces.c:1389: negative_returns: "sk" is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Stats often call 'get_timing' function which has a BUG() assertion to
catch cases when stats failed to initialize correctly.
If stats haven't initialized yet assertion will also be triggered.
We dont want the trigger to happen in a case when criu fails at early
steps before initializing stats, but this can happen in the following
case:
  - at cr_dump_tasks criu can catch error before the call to init_stats.
  - it then decides to gracefully quit with error and
    calls cr_dump_finish.
  - cr_dump_finish will call timing_stop -> get_timing
    and BUG() gets triggered

But because criu is already quitting gracefully, BUG() is not needed.
In this code path we can call timing_stop under proper condition.

[avagin: rebase on top of criu-dev and a few minor changes]

Signed-off-by: Andrei Vagin <[email protected]>
In case get_clean_mnt fails open_mountpoint is still able to resolve mounts
by helper process or print error in the worst case. Using pr_warn instead of
pr_perror.

Signed-off-by: Andrey Zhadchenko <[email protected]>
mnt_is_dir is used when looking up for suitable mount point. In some cases
that function may fail several times. Error level seems to strict for this
cases.
Added error message to lookup_mnt_sdev in case all mnt_is_dir failed.
As for open_handle and alloc_openable which are calling mnt_is_dir, they are
used in check_open_handle, which will call error afterwards.

Adjusted log level for __open_mountpoint result in open_handle since it is
allowed to fail several times. open_handle caller get_mark_path expect
possible failure and will print error in case.

Signed-off-by: Andrey Zhadchenko <[email protected]>
open_handle and first part of alloc_openable do the same work. Both these
function are called from check_open_handle. Rework check_open_handle to call
only alloc_openable.

Signed-off-by: Andrey Zhadchenko <[email protected]>
With Travis dramatically reducing the minutes available for CI, CRIU
needs a new place to run tests. This moves the Vagrant based Fedora 32
no VDSO test cases to Cirrus CI. Cirrus CI seems to be one of the very
few free CI services allowing access to /dev/kvm.

Signed-off-by: Adrian Reber <[email protected]>
1) Let's do test_init earlier so that max_nr test_msg is now visible in
thread-bomb.out.

2) Also lets check errors from malloc and pthread_...  functions, print
messages about their errors and do proper deallocation at least.

Signed-off-by: Pavel Tikhomirov <[email protected]>
Besides Travis CI Drone CI seems to be only service providing ARM based
builds. This switches the aarch64 and arm32 builds to drone.io.

Because Drone CI is running in a Docker container we cannot use 'setarch
linux32' as it requires the blocked syscall 'personality(2)'.

But Drone CI provides an 'arch: arm' which gives the same architecture
as 'setarch linux32' on Travis aarch64: armv8l

Signed-off-by: Adrian Reber <[email protected]>
Running in an environment with clang and without gcc even installed
does not work as compel-host-bin uses HOSTCC which defaults to gcc.

If CLANG=1 is set this also sets HOSTCC to clang to actually build
compel-host-bin with clang and not with gcc.

Signed-off-by: Adrian Reber <[email protected]>
The commas need to be inside of the quotes. Not on the outside.

Signed-off-by: Adrian Reber <[email protected]>
The updates to the latest Vagrant version and from Fedora 32 to 33.

Also using --no-tty instead of > /dev/null for vagrant up.

Also run 'dnf upgrade -y' in out vagrant VM to get the latest kernel.

Signed-off-by: Adrian Reber <[email protected]>
To run Fedora Rawhide based aarch64 containers on Drone CI our current
Dockerfile setup does not work.

This moves the package installation out of the Dockerfile into
scripts/ci/prepare-for-fedora-rawhide.sh to be usable in the Dockerfile
environment and in the Drone CI environment.

Signed-off-by: Adrian Reber <[email protected]>
This moves Fedora Rawhide based tests away from Travis. To Github
Actions for x86_64 and to Drone for aarch64.

Signed-off-by: Adrian Reber <[email protected]>
When I compile criu with "make DEBUG=1" and run it to restore my
program, it produces a segmentation fault.

In aarch64, with compile flag "-O0", when criu executes the code in pie,
it is unable to visit the content of ARCH_VDSO_SYMBOLS. So I put these
variables into the stack.

Signed-off-by: anatasluo <[email protected]>
@minhbq-99 minhbq-99 force-pushed the udp_dump branch 2 times, most recently from 48a012e to d82680e Compare August 9, 2021 14:32
adrianreber and others added 3 commits August 9, 2021 09:24
Jenkins test runs are failing with:

 ./test/jenkins/run_ct ./test/jenkins/crit.sh
 ./test/jenkins/crit.sh: 3: source: not found

Switch to bash which has 'source'.

Signed-off-by: Adrian Reber <[email protected]>
On Virtuozzo7 jenkins we see a fail of criu-dev zdtm:

  ===================== Run zdtm/static/pthread_timers in ns =====================
  Start test
  ./pthread_timers --pidfile=pthread_timers.pid --outfile=pthread_timers.out
  Run criu dump
  =[log]=> dump/zdtm/static/pthread_timers/112/1/dump.log
  ------------------------ grep Error ------------------------
  (00.004817) netlink: Collect netlink sock 0x1cad6e21
  (00.004821) netlink: Collect netlink sock 0x1cad6e22
  (00.004831) Collecting pidns 9/112
  (00.004886) No parent images directory provided
  (00.004903) Warn  (criu/lsm.c:328): don't know how to suspend LSM 0
  ------------------------ ERROR OVER ------------------------
  Run criu restore
  4: Old maps lost: set([])
  4: New maps appeared: set([u'7fe4c54ca000-7fe4c54cb000 ---p', u'7fe4c0000000-7fe4c0021000 rw-p', u'7fe4c0021000-7fe4c4000000 ---p', u'7fe4c54cb000-7fe4c5ccb000 rw-p'])
  ############# Test zdtm/static/pthread_timers FAIL at maps compare #############

https://ci.openvz.org/job/CRIU/job/CRIU-virtuozzo/job/criu-dev/8032/consoleFull

First thing to mention is that this is not related to criu. I can manage
to reproduce it with "--nocr", problem is that some mapping appears a
bit later when we do pre-cr get_visible_state().

By debugging SIGEV_THREAD thread with gdb I can see that addresses from
this unexpectedly appearing mapping are used by glibc here as "struct
pthread *pd":

 clone()
  start_thread()
   timer_helper_thread()
    __pthread_create_2_1()

So the mapping looks allocated by allocate_stack(), and it is only
gets done after first timer trigger (we have glibc-2.17 on vz7):

https://github.com/bminor/glibc/blob/release/2.17/master/nptl/sysdeps/unix/sysv/linux/timer_routines.c#L92

So let's wait at least 1 timer trigger so that memory outfit of the test
become permanent and our check_visible_state zdtm check would not be
false negative.

Signed-off-by: Pavel Tikhomirov <[email protected]>
On systems where there is no build_id we get a warning for each run

 Warn  (criu/files-reg.c:1458): Couldn't find the build-id note for file with fd 15

Change the log level to debug for this message as the file just does not have
a build_id and printing a warning clutters the CI logs.

Signed-off-by: Adrian Reber <[email protected]>
criu/include/magic.h Outdated Show resolved Hide resolved
criu/sk-udp.c Outdated Show resolved Hide resolved
@avagin
Copy link
Member

avagin commented Aug 10, 2021

  • Checkpoint/restore support for UDP send queue via the newly proposed UDP_REPAIR in the Linux kernel.

Have you sent these patches to lkml? Could you update https://criu.org/Upstream_kernel_commits?

@avagin avagin requested a review from mihalicyn August 10, 2021 07:27
criu/sk-udp.c Outdated Show resolved Hide resolved
@minhbq-99
Copy link
Member Author

  • Checkpoint/restore support for UDP send queue via the newly proposed UDP_REPAIR in the Linux kernel.

Have you sent these patches to lkml? Could you update https://criu.org/Upstream_kernel_commits?

I have just sent it, I will update the wiki when I get the requested account

criu/cr-check.c Outdated Show resolved Hide resolved
criu/sk-udp.c Outdated Show resolved Hide resolved
criu/sk-udp.c Outdated Show resolved Hide resolved
criu/sk-udp.c Outdated Show resolved Hide resolved
In this patch, I implement

- A new protobuf image for storing dumped UDP packets. The image has the
following format: UdpQueueEntry at the beginning for storing queues'
information, each packet has UdpPacketEntry at the beginning for storing
address information and packet's length followed by packet's data.

- Checkpoint/restore support for UDP send queue via the newly proposed
UDP_REPAIR in the Linux kernel.

- Checkpoint/restore support for UDP receive queue. When checkpointing,
packets are peeked from the socket. When restoring, a temporary socket is
used to send these dumped packet. The source IP addresses of these are
modified to matched the IP addresses when checkpointing by using
IP_PKTINFO and IP_TRANSPARENT. However, the source port cannot be
restored in some cases and the packet will be dropped.

Signed-off-by: Bui Quang Minh <[email protected]>
@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

@avagin avagin added no-auto-close Don't auto-close as a stale issue and removed stale-pr labels Sep 21, 2021
@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-auto-close Don't auto-close as a stale issue stale-pr
Projects
None yet
Development

Successfully merging this pull request may close these issues.