Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Znapzend hangs on 2.2.5 in Arch Linux #672

Open
sempervictus opened this issue Aug 24, 2024 · 3 comments
Open

Znapzend hangs on 2.2.5 in Arch Linux #672

sempervictus opened this issue Aug 24, 2024 · 3 comments

Comments

@sempervictus
Copy link

Trying to runonce a freshly configured backup set on a new system with 2.2.5 results in znapzend hanging forever without creating or sending anything.

Ran an strace of the execution and looks like it errors out then just clock_nanosleeps forever:

clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x6f320fc20310) = 2575019
close(9)                                = 0
close(7)                                = 0
read(8, "", 4)                          = 0
close(8)                                = 0
fcntl(6, F_SETFD, FD_CLOEXEC)           = 0
ioctl(6, TCGETS, 0x77196e686f90)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(6, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(5, "", 8192)                       = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2575018, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
read(6, "", 8192)                       = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2575019, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
pipe2([7, 8], O_CLOEXEC)                = 0
pipe2([9, 10], O_CLOEXEC)               = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x6f320fc20310) = 2575020
close(10)                               = 0
close(8)                                = 0
read(9, "", 4)                          = 0
close(9)                                = 0
fcntl(7, F_SETFD, FD_CLOEXEC)           = 0
ioctl(7, TCGETS, 0x77196e686f90)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(7, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(7, "on\n", 8192)                   = 3
close(7)                                = 0
wait4(2575020, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2575020
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2575020, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
close(6)                                = 0
wait4(2575019, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2575019
close(5)                                = 0
wait4(2575018, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2575018
getpid()                                = 2574975
sendto(4, "<30>Aug 24 16:56:19 znapzend[257"..., 82, MSG_NOSIGNAL, NULL, 0) = 82
getpid()                                = 2574975
sendto(4, "<30>Aug 24 16:56:19 znapzend[257"..., 78, MSG_NOSIGNAL, NULL, 0) = 78
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=114, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=114, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=114, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=114, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=114, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=114, ...}, 0) = 0
openat(AT_FDCWD, "/dev/urandom", O_RDONLY|O_CLOEXEC) = 5
read(5, "O,\1\376", 4)                  = 4
close(5)                                = 0
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=114, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=114, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=114, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=114, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=114, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=114, ...}, 0) = 0
getpid()                                = 2574975
getpid()                                = 2574975
sendto(4, "<30>Aug 24 16:56:19 znapzend[257"..., 105, MSG_NOSIGNAL, NULL, 0) = 105
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x77196e687290) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x77196e687290) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x77196e687290) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x77196e687290) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x77196e687290) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x77196e687290) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x77196e687290) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, ^C{tv_sec=0, tv_nsec=8052287}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
strace: Process 2574975 detached

System is 6.6.46 with ZFS & SPL built into the kernel binary (vs being a module).

@jimklimov
Copy link
Contributor

Just in case, are you sure strace attached to the znapzend (perl) process? Maybe the nanosleeps are about it waiting for I/O from a launched zfs program, or mbuffer etc.?

With --debug it should show what it runs, so you can copy-paste the command into shell and run (or here strace) it.

It may help to run iostat -z 1 and/or zpool iostat 1 along the way - perhaps the first runonce tries to send gigabytes of your backed-up dataset to a remote destination, so while zfs send runs, there is nothing for the script to report?

Copy link

stale bot commented Dec 20, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 20, 2024
@georgh-
Copy link

georgh- commented Dec 27, 2024

Hi, I have been having the same issue as OP, with the same results on the strace.
My system:

  • Debian testing
  • znapzend 0.23.2
  • ZFS 2.2.6
  • Perl 5.40.0
  • Linux 6.11.5

Znapzend, when run with debug mode, detects the configuration and reads the datasets correctly, and then hangs after this line:
znapzend (PID=788631) Creating workers for the backup plans processing...
And then strace produces the same output as OP.

Debug output (removed lists of datasets, everything is recursive from tank):


$ znapzend --debug
znapzend (PID=788631) Refreshing backup plans...
# zfs list -H -o name -t filesystem,volume
# zfs get -H -s local -o name,property,value,source all tank
[... zfs get -H list of datasets ...]
# zfs list -H -o name -t filesystem,volume tank/photos
# test -x /usr/bin/mbuffer || command -v /usr/bin/mbuffer
# ssh -o batchMode=yes -o ConnectTimeout=30 [email protected] zfs list -H -o name -t 'filesystem,volume' tank-bk/photos
# ssh -o batchMode=yes -o ConnectTimeout=30 [email protected] test -x /usr/bin/mbuffer || ssh -o batchMode=yes -o ConnectTimeout=30 [email protected] command -v /usr/bin/mbuffer
[... other 2 datasets ...]
=== getBackupSet() : got 3 dataset(s) with a local backup plan
=== getBackupSet() : got 3 enabled-only dataset(s) with a local backup plan
# zfs list -H -o name -t filesystem,volume -r tank/photos
# zfs get -H -s local -o value org.znapzend:enabled tank/photos
# zfs get -H -s local -o value org.znapzend:recursive tank/photos
[... list of datasets ...]
znapzend (PID=788631) Creating workers for the backup plans processing...


@stale stale bot removed the stale label Dec 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants