
[Misc] Add CRaC support #868

Closed
wants to merge 17 commits into from


lingjun-cg
Collaborator

AntonKozlov and others added 17 commits October 10, 2024 11:06
Summary: The CRaC Project researches coordination of Java programs with mechanisms to checkpoint (make an image of, snapshot) a Java instance while it is executing. Restoring from the image could be a solution to some of the problems with the start-up and warm-up times. The project website is at https://openjdk.org/projects/crac/.

Testing: All testcases in jdk/jdk/crac. These test cases require root privileges and a kernel version of 4.19 or later.

Reviewers: lei.yul, denghui.ddh

Issue: dragonwell-project#867
Summary:
1. RMI TCPSocket does not support C/R (Checkpoint/Restore).
2. The cppath file is written after the criu process is forked, but criu kills the JVM, so sometimes there is no chance to write cppath, even though cppath is mandatory for restore.
3. Comparing System.nanoTime() values is meaningful only within the same process.

Testing: All testcases in jdk/jdk/crac.

Reviewers: lei.yul, denghui.ddh

Issue: dragonwell-project#867
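Point 3 above follows from the System.nanoTime() contract: its origin is fixed only within one JVM instance, and other instances are likely to use a different origin, so a value captured before checkpoint cannot be compared with one taken in the restored process. A minimal sketch, assuming a test only needs a coarse elapsed time that spans C/R: record a wall-clock Instant instead. The class name CrossRestoreTimer is hypothetical, and this is not necessarily how the tests were changed.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: measures elapsed time with the wall clock, whose origin
// (the epoch) is shared by all processes, unlike System.nanoTime().
public class CrossRestoreTimer {
    private final Instant start = Instant.now();

    // The wall clock can be adjusted (e.g. by NTP), so the result is approximate.
    public Duration elapsed() {
        return Duration.between(start, Instant.now());
    }

    public static void main(String[] args) throws InterruptedException {
        CrossRestoreTimer timer = new CrossRestoreTimer();
        Thread.sleep(20);
        System.out.println("elapsed ms: " + timer.elapsed().toMillis());
    }
}
```

The trade-off is precision: nanoTime() is monotonic within a process, while the wall clock is only approximately comparable across processes.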
…ses lazily.

Summary: Most of the CRaC test cases failed when run on the latest criu and Linux kernel 5.10. The first fix is to write output to a file instead of stdout and stderr, because stdout and stderr depend on a pipe or tty, and pipes and ttys cannot be checkpointed and restored correctly with criu. The second fix is to use 'docker run' instead of 'docker exec' to run the Java application in Docker. If an application launched by 'docker exec' is checkpointed, criu reports the error "Can't lookup mount=24 for fd=0 path=/dev/null".

Testing: All testcases in jdk/jdk/crac.

Reviewers: lei.yul, denghui.ddh

Issue: dragonwell-project#867
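The first fix, writing output to files rather than to stdout/stderr, can be sketched with the standard ProcessBuilder API: the child JVM's fds 1 and 2 are regular files, so no pipe or tty is attached to them at checkpoint time. This is a minimal sketch; the class and method names are hypothetical, and the actual tests may launch processes through ProcessTools instead.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LaunchWithFileOutput {
    // Start a child process whose stdout and stderr are regular files, not pipes.
    static Process start(File out, File err, String... command) throws IOException {
        return new ProcessBuilder(command)
                .redirectOutput(out)   // fd 1 -> file (created or truncated)
                .redirectError(err)    // fd 2 -> file (created or truncated)
                .start();
    }

    public static void main(String[] args) throws Exception {
        File out = File.createTempFile("child", ".out");
        File err = File.createTempFile("child", ".err");
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        // "java -version" prints its banner to stderr, so it lands in the .err file.
        Process p = start(out, err, javaBin, "-version");
        int exit = p.waitFor();
        System.out.println("exit=" + exit);
        System.out.println(new String(Files.readAllBytes(err.toPath())));
    }
}
```

Only stdout and stderr are redirected here; stdin keeps ProcessBuilder's default (a pipe), so a real test may need to handle fd 0 as well.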
… before restore

Summary: If validation fails before restore, the JVM falls back to normal startup.

Testing: All testcases in jdk/jdk/crac.

Reviewers: lei.yul, denghui.ddh

Issue: dragonwell-project#867
Summary: The interfaces in ProcessTools.java have changed.

Testing: test/jdk/jdk/crac/MinimizeLoadedClass.java

Reviewers: lingjun.cg, yulei.lx

Issue: dragonwell-project#867
…hread to blocked when write data to client.

Summary: AttachListener writes data to the client while the current thread's state is in VM, so other threads cannot reach a safepoint.

Testing: runtime/SharedArchiveFile/DumpSymbolAndStringTable.java

Reviewers: denghui.ddh,lei.yul

Issue: dragonwell-project#867
Summary: To run CRaC successfully with Flink, add these features: pseudo-persistent files, running in unprivileged mode, and appending to the app class loader classpath. Also remove the RMI Transport CRaC callback implementation, which is not solid.

Testing: All crac testcases.

Reviewers: lei.yul,denghui.ddh

Issue: dragonwell-project#867
Summary: Add a new option, CRaCRestoreInheritPipeFds, to specify the pipe fds that should be restored. Restoring the stdout and stderr pipes is important when running in a container, because the container runtime reads logs from these pipes.

Testing: All crac testcases.

Reviewers: lei.yul,denghui.ddh

Issue: dragonwell-project#867
Summary:
1. Add the tests to the set of tests that cannot run concurrently.
2. Remove the unused TCPTransportTest.java test case.

Testing: All crac testcases.

Reviewers: lei.yul,denghui.ddh

Issue: dragonwell-project#867
Summary: The root cause of the restore failure is a difference in the base OS, not the kernel version.

Testing: All crac testcases.

Reviewers: lei.yul,denghui.ddh

Issue: dragonwell-project#867
…e/Inflator/Deflator

Summary: The finalize methods of ZipFile, Inflater and Deflater have been deprecated for removal since JDK 9. They should be removed in JDK 12, as planned.

Testing: test/jdk/java/util/zip/ZipFile/TestCleaner.java

Reviewers: lei.yul, denghui.ddh

Issue: dragonwell-project#867
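The standard replacement for finalize() is java.lang.ref.Cleaner, available since JDK 9. A minimal sketch of the pattern, with a hypothetical NativeBuffer standing in for a class that owns native memory: the cleanup action runs at most once, either from an explicit close() or from the Cleaner's daemon thread after the owner becomes phantom reachable. This illustrates the general technique only, not the JDK's ZipFile/Inflater/Deflater code.

```java
import java.lang.ref.Cleaner;

// Hypothetical resource owner illustrating the Cleaner pattern that replaces finalize().
public class NativeBuffer implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup state must not refer back to NativeBuffer; otherwise the
    // owner would stay reachable and the action would never run.
    private static final class State implements Runnable {
        volatile boolean released;

        @Override
        public void run() {
            // Stand-in for freeing native memory (e.g. a zlib stream).
            released = true;
        }
    }

    private final State state = new State();
    private final Cleaner.Cleanable cleanable = CLEANER.register(this, state);

    public boolean isReleased() {
        return state.released;
    }

    @Override
    public void close() {
        // clean() unregisters the action and runs it at most once.
        cleanable.clean();
    }

    public static void main(String[] args) {
        NativeBuffer buf = new NativeBuffer();
        try (NativeBuffer b = buf) {
            System.out.println("released before close: " + b.isReleased());
        }
        System.out.println("released after close: " + buf.isReleased());
    }
}
```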
Summary: JMX caches the localhost name in sun.rmi.transport.tcp.TCPEndpoint; it should be resampled in afterRestore.

Testing: All crac testcases.

Reviewers: lei.yul,denghui.ddh

Issue: dragonwell-project#867
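The idea of resampling a cached host name after restore can be sketched independently of the JDK internals. LocalHostNameCache is hypothetical: it assumes the name comes from InetAddress.getLocalHost(), which may differ from what TCPEndpoint actually does, and its afterRestore() method only mirrors the name of the CRaC callback; in a real program it would be invoked from a registered CRaC Resource.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical cache of the local host name that can be refreshed after restore.
public class LocalHostNameCache {
    private volatile String hostName = resolve();

    public String get() {
        return hostName;
    }

    // Mirrors a CRaC afterRestore callback: the restored process may run on a
    // different host or container, so the cached value is sampled again.
    public void afterRestore() {
        hostName = resolve();
    }

    private static String resolve() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "localhost"; // assumption: fall back when the name cannot be resolved
        }
    }

    public static void main(String[] args) {
        LocalHostNameCache cache = new LocalHostNameCache();
        System.out.println("before: " + cache.get());
        cache.afterRestore();
        System.out.println("after restore: " + cache.get());
    }
}
```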
Summary: The CRaC debug log format has changed.

Testing: jdk/crac/LazyProps.java

Reviewers: yansendao.ysd,denghui.ddh

Issue: dragonwell-project#867
Summary: There is no guarantee that the CRaC image dir exists before the WatchService is registered, so check that the image dir exists, with a proper timeout.

Testing: jdk/crac/recursiveCheckpoint/Test.java

Reviewers: yansendao.ysd, yueshi.zwj

Issue: dragonwell-project#867
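A minimal sketch of "wait for the image dir, then watch it" using java.nio.file. The helper name, the polling interval, and the timeout values are assumptions; the commit does not say which timeout the test uses.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchService;
import java.time.Duration;

public class ImageDirWait {
    // Hypothetical helper: poll until dir is a directory or the timeout elapses.
    static boolean waitForDirectory(Path dir, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!Files.isDirectory(dir)) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the directory never appeared
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return true;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path imageDir = Files.createTempDirectory("cr-image");
        if (!waitForDirectory(imageDir, Duration.ofSeconds(10), Duration.ofMillis(100))) {
            throw new IllegalStateException("image dir not created: " + imageDir);
        }
        // Register only once the directory exists; register() fails on a missing path.
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            imageDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            System.out.println("watching " + imageDir);
        }
    }
}
```

Using nanoTime() for the deadline is safe here because both readings come from the same process.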
…ed docker.

Summary: The stdout and stderr are pipe files when running in Docker, and restoring these pipe files is rather tricky. The fds are written to a file named pipefds; after criu checkpoints successfully, criu executes criuengine as a post-dump callback. In the callback, the pipe info of the Java process is appended to the pipefds file. On restore, pipefds is read, and the pipe info is passed to criu as --inherit-fd. The problem is that criuengine cannot get the pipefds file path when running in unprivileged mode. To fix this, set the environment variable CRAC_IMAGE_DIR explicitly when checkpointing.

Testing: jdk/jdk/crac/stdoutInDocker/TestStdoutInDocker.sh

Reviewers: lei.yul,denghui.ddh

Issue: dragonwell-project#867
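The restore side of this scheme can be sketched as turning pipefds entries into CRIU --inherit-fd arguments, which CRIU accepts in the form fd[N]:resource (for example fd[1]:pipe:[12345]). The one-entry-per-line "<fd> <resource>" layout of pipefds is an assumption made for illustration; the commit does not describe the real file format, and the helper names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipeFdsArgs {
    // Assumed pipefds layout (hypothetical): one "<fd> <resource>" entry per line,
    // e.g. "1 pipe:[12345]".
    static List<String> inheritFdArgs(List<String> lines) {
        List<String> args = new ArrayList<>();
        for (String line : lines) {
            String entry = line.trim();
            if (entry.isEmpty()) {
                continue;
            }
            String[] parts = entry.split("\\s+", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("malformed pipefds entry: " + line);
            }
            int fd = Integer.parseInt(parts[0]);
            args.add("--inherit-fd");
            args.add("fd[" + fd + "]:" + parts[1]);
        }
        return args;
    }

    // Reads <imageDir>/pipefds; imageDir would come from CRAC_IMAGE_DIR.
    static List<String> readPipeFds(Path imageDir) throws IOException {
        return Files.readAllLines(imageDir.resolve("pipefds"));
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("1 pipe:[12345]", "2 pipe:[12346]");
        // prints [--inherit-fd, fd[1]:pipe:[12345], --inherit-fd, fd[2]:pipe:[12346]]
        System.out.println(inheritFdArgs(sample));
    }
}
```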
Summary: If the image dir is a relative path, the criuengine process cannot write to the pipefds file in the image dir, so convert it to a real path before checkpointing.

Testing: jdk/jdk/crac/AppendAppClassLoaderTest.java,jdk/jdk/crac/RestorePipeFdTest.java

Reviewers: yansendao.ysd,lvfei.lv

Issue: dragonwell-project#867
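In Java terms, this conversion corresponds to Path.toRealPath(), which resolves a relative path against the JVM's working directory and follows symbolic links. The JVM-side fix may well be native (e.g. realpath(3)); this is only an analogous sketch, the helper name is hypothetical, and creating the directory first is an assumption of the sketch, needed because toRealPath() fails on a missing path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ImageDirPath {
    // Turn a possibly relative image dir into an absolute, symlink-free path so
    // that a helper process with a different working directory finds the same files.
    static Path canonicalImageDir(String imageDir) throws IOException {
        Path dir = Paths.get(imageDir);
        if (!Files.isDirectory(dir)) {
            Files.createDirectories(dir); // toRealPath() requires an existing path
        }
        return dir.toRealPath();
    }

    public static void main(String[] args) throws IOException {
        Path real = canonicalImageDir(args.length > 0 ? args[0] : "cr");
        System.out.println("pipefds file: " + real.resolve("pipefds"));
    }
}
```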
…d mode when do checkpointing.

Summary: Add a new option, CRaCAppendOnlyLogFiles, to configure files that can be ignored when checkpointing; on restore, an empty file is created for each one that does not exist.

Testing: jdk/crac/AppendOnlyFileTest.java

Reviewers: lei.yul,denghui.ddh

Issue: dragonwell-project#867
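The restore-time half of this option (create an empty file if it does not exist) can be sketched as follows. How CRaCAppendOnlyLogFiles is spelled on the command line is not stated in the commit; treating its value as a comma-separated list of paths is purely an assumption, as are the helper names.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class AppendOnlyFiles {
    // Assumption: the option value is a comma-separated list of file paths.
    static List<Path> parse(String optionValue) {
        List<Path> files = new ArrayList<>();
        for (String item : optionValue.split(",")) {
            String trimmed = item.trim();
            if (!trimmed.isEmpty()) {
                files.add(Paths.get(trimmed));
            }
        }
        return files;
    }

    // On restore: recreate each missing file as an empty file; existing files are left untouched.
    static void ensureExist(List<Path> files) throws IOException {
        for (Path file : files) {
            try {
                Files.createFile(file); // atomic check-and-create
            } catch (FileAlreadyExistsException e) {
                // keep existing content; the application keeps appending to it
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("logs");
        List<Path> files = parse(dir.resolve("gc.log") + "," + dir.resolve("app.log"));
        ensureExist(files);
        for (Path f : files) {
            System.out.println(f + " exists=" + Files.exists(f) + " size=" + Files.size(f));
        }
    }
}
```

Files.createFile is used instead of an exists-then-create pair so that a file appearing concurrently is never truncated.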
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ lingjun-cg
✅ jia-wei-tang
❌ AntonKozlov

lingjun-cg requested review from yuleil and D-D-H October 10, 2024 11:22
lingjun-cg changed the title from "Add CRaC support" to "[Misc] Add CRaC support" Oct 11, 2024
lingjun-cg closed this Oct 11, 2024

4 participants