
Issue with TCP Connection Restoration in Container Dump/Restore #2456

Closed
wjstk16 opened this issue Jul 30, 2024 · 6 comments

Comments


wjstk16 commented Jul 30, 2024

Description
I am attempting to dump/restore containers with active TCP connections. I have two containers, one running a Python-based TCP server and the other a TCP client, both placed on the same node. I performed the dump using the Kubernetes kubelet API for both containers. The checkpoint files (tar files) were then converted to images using buildah and pushed to a repository. The restore was attempted with the --tcp-established option on a different server, not the original one, and no errors occurred during the dump/restore process (logs and runc configuration files are attached).

Issue
The restored server/client processes resume from their state at the time of the dump, but the TCP socket connections are not restored, causing the processes to hang while waiting for data from the socket. Specifically, they remain stuck at the position reached at the moment of the dump, waiting to receive data that never arrives. This issue occurs even if the containers are restored on the same node where the dump was performed.

Details:

  • Environment:
    • Kubernetes & CRI-O v1.30.2
    • CRIU v3.19
    • TCP test container image
      • TCP Server: docker.io/wjstk16/base-image:2.0.1.2
      • TCP Client: docker.io/wjstk16/base-image:2.0.1.2
      • Restore Server: docker.io/wjstk16/checkpoint-tcpserver:2.0.5
      • Restore Client: docker.io/wjstk16/checkpoint-tcpclient:2.0.5
    • Dump and restore performed using kubelet API (https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/)
  • Observed Behavior:
    • TCP connections are not restored
    • Processes hang, waiting for data on the socket
  • TCP Socket Information:
    • The only difference observed is in the inode values before and after restoration
    • Other values remain consistent
    • Below is the TCP information from the TCP server container (left) and client container (right) before restoration. [screenshot]
    • Below is the TCP information from the restored TCP server container (left) and client container (right). [screenshot]

Actions Taken:

  • Ensured the TCP server and client are binding to specific IPs
  • Used ipvlan type interfaces to ensure containers retain the same fixed IPs before and after restoration
  • Verified that the IP and port bound to the socket remain unchanged
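The socket check described above can be done with something along these lines; the container PID and port 8080 are assumptions for illustration, not the exact commands used:

```shell
#!/bin/sh
# Sketch: inspect the established TCP socket of a container before and
# after restore. If restore worked, only the socket inode should differ.
show_sock() {
    pid="$1"; port="$2"
    # enter only the container's network namespace and dump the connection,
    # including the socket inode (-e) and internal TCP state (-i)
    nsenter -t "$pid" -n ss -tnie "sport = :$port"
}

# usage (as root): show_sock <container-pid> 8080
```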

@avagin Despite reviewing all relevant issues related to TCP, I am unable to successfully restore container-based TCP connections.
Is there any additional configuration required in my environment? Could you point me to any references or guidance that could help resolve this issue? Any assistance or pointers would be greatly appreciated.

Attachments:
dump.log
restore.log
runc.conf.txt
tcpserver.py & tcpclient.py Gist
tcp & restored yaml Gist

Thank you for your help.

@adrianreber
Member

This was never tested or thought about during the implementation in CRI-O and Kubernetes. You are trying something completely unsupported and untested. Can you make it work correctly without containers? That would be the first step. Then maybe with just runc, and maybe Podman. We know that it should work there.


h2wonS commented Jul 31, 2024

This was never tested or thought about during the implementation in CRI-O and Kubernetes. You are trying something completely unsupported and untested. Can you make it work correctly without containers? That would be the first step. Then maybe with just runc, and maybe Podman. We know that it should work there.

@adrianreber Hi Adrian, I've just opened a new issue about TCP connections between remote host machines (#2457). I don't know why the restored client doesn't work normally on the remote host machine.

@wjstk16
Author

wjstk16 commented Aug 1, 2024

@adrianreber, thank you for your response. As you suggested, I verified the TCP restoration at the host process level without containers and also checked the functionality using runc/podman. The host process dump/restore worked well within the same host, and I am currently testing it for cross-node migration. For container tests with podman, I changed the runtime from crun to runc, and by executing the commands below, I successfully tested both single-node and multi-node scenarios.

During my testing, I identified that the probable reason for the failure of TCP restoration in Kubernetes is that the pod is not stopped at the moment of checkpointing. When using podman for dumping/restoring, the container is stopped by default at the moment of the dump. For example, when dumping the client, the server's TCP connection is not terminated, and the TCP restoration works well. However, if I use the --leave-running option to dump the client and then manually stop the client container before attempting the restoration, a "connection reset by peer" error occurs. This happens because the TCP socket is closed when I manually stop the client container. Therefore, to achieve successful TCP restoration in Kubernetes, it seems necessary to stop the pod at the moment of the checkpoint, similar to the behavior in runc.

Currently, the Kubernetes checkpoint API does not seem to support specifying CRIU options, and it appears to operate with the default --leave-running option. Is there a way to configure this? Even trying to delete the pod immediately with the following command results in the socket being closed:

kubectl delete po simple-tcpclient --grace-period=0 --force

Since the Kubernetes API does not currently allow specifying CRIU options, I have been configuring these options in runc.conf as shown below. However, I have not found a way to stop the pod immediately during checkpointing.

runc.conf:
tcp-established
manage-cgroups=ignore
log-file=/var/log/criu.log

podman checkpoint/restore command:

podman run -d --name tcpserver --network ipvlan0 --ip 10.10.10.201 -p 8080:8080 docker.io/wjstk16/base-image:2.0.1.2 python3 tcpserver.py 10.10.10.201 8080
podman run -d --name tcpclient --network ipvlan0 --ip 10.10.10.202 docker.io/wjstk16/base-image:2.0.1.2 python3 tcpclient.py 10.10.10.201 10.10.10.202 8080
podman container checkpoint -k --tcp-established -e ckp.tar.gz tcpclient
podman container restore -i ckp.tar.gz

@adrianreber
Member

Currently, the Kubernetes checkpoint API does not seem to support specifying CRIU options, and it appears to operate with the default --leave-running option.

Correct. The official Kubernetes use case is described in KEP-2008 ("Forensic Container Checkpointing"). Anything outside of the use case described in the KEP is unfortunately not supported.

Is there a way to configure this?

No. My recommendation would be to get involved in Kubernetes and help extend the checkpoint/restore use case by contributing a new KEP and the corresponding code changes. What you need is just the passing of two additional parameters:

  • stop after checkpointing
  • support established TCP connections

Both things should be really simple code changes.

You could try to use action scripts to install firewall rules that block any traffic to/from your destination port. Or install the firewall rules before checkpointing. This way you can make sure no FIN or RST packets can be transmitted.
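For illustration, a minimal CRIU action script along those lines might look as follows; the file path, port 8080, and the exact rules are assumptions, not a tested recipe. CRIU passes the hook name to the script in the CRTOOLS_SCRIPT_ACTION environment variable, and the network-lock/network-unlock hooks bracket the dump:

```shell
#!/bin/sh
# Hypothetical CRIU action script, e.g. saved as /etc/criu/net-lock.sh.
# CRIU exports CRTOOLS_SCRIPT_ACTION with the hook name on every call.
PORT="${APP_PORT:-8080}"   # assumed application port

net_lock() {
    # drop all traffic on the application port so no FIN or RST can
    # reach the peer while the dump is in progress
    iptables -I INPUT  -p tcp --dport "$PORT" -j DROP
    iptables -I OUTPUT -p tcp --sport "$PORT" -j DROP
}

net_unlock() {
    iptables -D INPUT  -p tcp --dport "$PORT" -j DROP
    iptables -D OUTPUT -p tcp --sport "$PORT" -j DROP
}

case "$CRTOOLS_SCRIPT_ACTION" in
    network-lock)   net_lock ;;
    network-unlock) net_unlock ;;
esac
```

The script would be handed to CRIU with `--action-script /etc/criu/net-lock.sh` on the dump side.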

You have to work around your unsupported use case somehow.

@wjstk16
Author

wjstk16 commented Aug 6, 2024

I have successfully restored TCP connectivity in a Kubernetes environment after several experiments. The actions required for this were as follows:

  1. Add tcp-established and manage-cgroups=ignore to /etc/criu/runc.conf (create the file if it does not exist).
  2. Assign static IPs to the containers (pods) and bind both the TCP server and client to those static IPs (e.g., using macvlan, ipvlan, etc.).
  3. Immediately before checkpointing the container, block outgoing packets from that container:
  • Add iptables rules in the network namespace used by the container.
  • For example, enter the namespace used by the container from the host using nsenter and add the iptables rules there.
  4. Increase the interval at which the TCP server and client exchange packets from 1 second to 10 seconds:
  • This is necessary because I am blocking the packets manually.
  • If the server/client sends packets too frequently, the TCP state on the two sides may become out of sync, causing the restoration to fail.
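The blocking step above (entering the pod's network namespace and adding iptables rules) can be sketched roughly as follows; the use of crictl, the `.info.pid` template field, and port 8080 are assumptions for illustration, not the exact commands used:

```shell
#!/bin/sh
# Sketch: block traffic inside a pod's network namespace immediately
# before calling the kubelet checkpoint API.
block_pod_port() {
    pod_id="$1"; port="$2"
    # resolve the PID of the pod's sandbox process
    pid="$(crictl inspectp -o go-template --template '{{.info.pid}}' "$pod_id")"
    # enter only the network namespace (-n) and drop traffic on the port
    nsenter -t "$pid" -n iptables -I OUTPUT -p tcp --sport "$port" -j DROP
    nsenter -t "$pid" -n iptables -I INPUT -p tcp --dport "$port" -j DROP
}

# usage (as root): block_pod_port <pod-sandbox-id> 8080
```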

With these attempts, I was able to successfully restore TCP connectivity.

@adrianreber I am currently in the testing phase but will attempt to make code changes and contributions in the near future. I appreciate your help.

@wjstk16 wjstk16 closed this as completed Aug 6, 2024
@adrianreber
Member

@wjstk16 Thanks for the detailed description of how you were able to make it work. Good to have it documented.
