
failure to reconnect with -persist #93

Open
mxork opened this issue Dec 17, 2018 · 28 comments

@mxork commented Dec 17, 2018

I still need to do a proper job getting logs together, but maybe someone can tell me I'm being dumb before I put too much time into this.

Currently, filesystems I have mounted from nbd devices panic on I/O failure if I restart the corresponding server. I would like nbd to renegotiate the connection if the server drops, and -persist seems to do the right thing. However, if I set up a test environment on the local machine and restart the server after connecting:

$ nbd-client -N default localhost /dev/nbd0 -persist -nonetlink -nofork             
Negotiation: ..size = 1024MB
bs=1024, sz=1073741824 bytes
timeout=5
<restart nbd-server here>
Kernel call returned.
sock, done

The call simply returns and does not attempt to reconnect. From the log message, it does not take the branch at nbd/nbd-client.c line 1292 (commit 128fd55):

    if (ioctl(nbd, NBD_DO_IT) < 0) {

but instead seems to take the else branch at line 1329:

    } else {

I have no idea why the ioctl call would return >= 0, but it seems to.

I realize that the filesystem may also need some love to get the desired behavior, but that's moot if nbd does not renegotiate.

@nand11 commented Mar 29, 2020

Hi,

TL;DR: Here is my experience with the -persist option of nbd-client; can you help me?

I want to use NBD over an unreliable network connection, so I was trying to use nbd-client with the -persist option. A while ago someone on IRC (#nbd on oftc.net) told me that -persist is broken on recent kernels and that I should try -nonetlink (thanks!). So I did that (using Debian buster). I also started nbd-client with -nofork to get debug output.

Test 1:

I connected nbd, interrupted the network connection and nbd-client exited with

Kernel call returned.
sock, done

Okay, that's not what I wanted, so I had a look at the source code of nbd-client (Debian source package nbd-3.19). Near the end of main(), there is this section that gets executed:

    if (ioctl(nbd, NBD_DO_IT) < 0) {
            [...]
    } else {
            /* We're on 2.4. It's not clearly defined what exactly
             * happened at this point. Probably best to quit, now
             */
            fprintf(stderr, "Kernel call returned.\n");
            cont=0;
    }

Okay, we are not on kernel 2.4; my kernel version is 4.19.

Test 2:

I found out that ioctl returned 0 in case of a network disconnect. So I changed

  if (ioctl(nbd, NBD_DO_IT) < 0) {

to

  if (ioctl(nbd, NBD_DO_IT) <= 0) {

and things got better. After a network disconnect, nbd-client tried to reconnect:

nbd,16756: Kernel call returned: Success
Reconnecting
Error: Socket failed: Connection refused
Reconnecting
Error: Socket failed: Connection refused
Reconnecting
Error: Socket failed: Connection refused
Reconnecting

A connection retry was done about once per second. Yay! I restored my network connection and it worked:

Negotiation: ..size = 2844688MB
bs=512, sz=2982871564288 bytes
timeout=120
nbd,31830: Kernel call returned: Success
Reconnecting
Error: Socket failed: Connection refused
Reconnecting
Error: Socket failed: Connection refused
Reconnecting
Negotiation: ..size = 2844688MB
bs=512, sz=2982871564288 bytes
timeout=120

Test 3:

The test above was done without a mounted filesystem. Now I mounted a filesystem (read-only, via LUKS crypto) and tried again. Unfortunately, the result was a little different: the kernel said it was busy, and retries were attempted in quick succession. nbd-client printed:

Negotiation: ..size = 2844688MB
bs=512, sz=2982871564288 bytes
timeout=120
nbd,16756: Kernel call returned: Device or resource busy
Reconnecting
Negotiation: ..size = 2844688MB
bs=512, sz=2982871564288 bytes
timeout=120
nbd,16756: Kernel call returned: Device or resource busy
Reconnecting

The reconnect in nbd-client does not print an error, but the next iteration of

ioctl(nbd, NBD_DO_IT)

returns "Device or resource busy".

Now I am at a point where I would need to debug or understand the kernel nbd driver, which I have not attempted yet. (__nbd_ioctl calls nbd_start_device_ioctl for NBD_DO_IT, which calls nbd_start_device, and that returns -EBUSY if nbd->task_recv is set.
https://github.com/torvalds/linux/blob/e595dd94515ed6bc5ba38fce0f9598db8c0ee9a9/drivers/block/nbd.c#L1232
Is this where it fails?)
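
For what it's worth, here is a minimal userspace probe (my own sketch, not part of nbd-client) that makes the symptom visible: with /dev/nbd0 already being driven by a running nbd-client over the ioctl interface, issuing NBD_DO_IT a second time should come back immediately with EBUSY, which would match the nbd->task_recv check linked above.

    /* Sketch: probe an nbd device that is already served by another process.
     * Assumes /dev/nbd0 was set up with nbd-client -nonetlink; run as root.
     * Expected output: NBD_DO_IT failed: Device or resource busy */
    #include <fcntl.h>
    #include <linux/nbd.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
            int nbd = open("/dev/nbd0", O_RDWR);
            if (nbd < 0) {
                    perror("open /dev/nbd0");
                    return 1;
            }
            if (ioctl(nbd, NBD_DO_IT) < 0)
                    perror("NBD_DO_IT failed");
            else
                    printf("NBD_DO_IT returned without error\n");
            close(nbd);
            return 0;
    }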

best regards,
nand11

@axos88 commented Jan 31, 2021

This is still an issue with version 3.18

@0rtz commented Feb 20, 2021

I see the same behaviour as nand11 with version 3.21

@fff7d1bc commented Apr 4, 2021

I can confirm that there's no attempt to reconnect on 3.21.

@yoe (Member) commented Jun 25, 2021

So.

-persist relies on an old quirk in the ioctl configuration interface that used to work at some point, but that behaviour seems to have been lost over a few maintainer changes in the kernel nbd module.

The netlink interface does not have anything to support -persist; when using the netlink interface, nbd-client has exited long before the connection is dropped, and so there is no way for it to discover that this has happened.

In order for -persist to work again, I think we need to go back to the drawing board. Meanwhile, a possible workaround could be to use the multiple-connections feature; if one connection drops and you still have another open, you can keep working (but obviously that doesn't help if you only have one server and that one is being restarted).

For now, I think I'll just disable -persist (better not to have a feature at all than one that does nothing), and talk to the kernel maintainer to see how we can fix this.

@fff7d1bc

In that case, would you also consider enabling a timeout by default? From what I see, even if I don't use -persist, e.g.

nbd-client  -nofork  -N test1 HOSTNAME /dev/nbd0

and the server reboots, I get the block device and nbd-client stuck in the D state forever, until I reboot. Seems like a timeout is essential here.

@yoe (Member) commented Jun 25, 2021

It's actually in the D state until the TCP timeout, which is a per-system setting that defaults to 2 hours and 12 minutes (IIRC).

The timeout thing is another of those bad ideas that I should probably get rid of; it triggers even if the device is perfectly happily connected but idle. That may be a good idea in some cases, but not in most.

@chabad360 commented Jun 25, 2021

Well, that would explain why my kiosks randomly fail (well, clearly not as randomly as I thought) if they sit idle for too long. How would I go about disabling the timeout?

@yoe (Member) commented Jun 25, 2021

If you don't explicitly pass the -t or -timeout parameter to nbd-client, it shouldn't be set. If you still see things going wrong there, please file a (separate) bug.

@chabad360

Wait, now I'm confused. Is the timeout set by default, or does it never time out by default?

Because at this point this has become quite a problem for me, but I've never been able (or had enough time and patience) to really track it down. I'm not sure if my kiosks are failing because of this timeout (it seems to happen only when idle) or if it's a lucky network failure.

I'm thinking of switching to iSCSI, but the CoW feature of NBD is very useful to me.

(Does the timeout in nbd-server also close idle connections?)

@yoe (Member) commented Jul 7, 2021

Neither the client nor the server timeout should be set by default (which means neither should time out by default).

The TCP keepalive probes are set, and it's not possible to switch them off. As long as the remote end is still functioning properly, these shouldn't interrupt your connection, however.

@bauen1 commented Feb 18, 2022

Hi, is there any chance of this getting fixed in the near future, or a way to work around this issue?
Otherwise I'll also have to investigate replacing nbd with iSCSI or something else.

I have set up backups for my laptop on a remote server; for this, a WireGuard VPN is set up between the hosts, and the server runs nbd-server.
When connecting with -persist, eventually (after a day or so) attempting to access the mounted filesystem results in I/O errors because nbd has dropped the connection, forcing me to unmount everything uncleanly, reconnect, and remount everything again.

@yoe (Member) commented Mar 9, 2022

A simple workaround is to make sure the connection never remains idle for too long. Just touching a file in the mounted NBD file system every once in a while should do that.

@wtarreau commented Oct 2, 2022

That's a very sad situation. I was testing NBD as a really appealing candidate for remote backups, but ran into this non-working persist behaviour, as well as the D state when no timeout is set. Yes, I think we should rework all of this in a few ways:

  • always set TCP_USER_TIMEOUT on Linux >= 2.6.37. This one is very clean, as it only counts failures to ACK sent packets;
  • also enable SO_KEEPALIVE with a short TCP_KEEPIDLE value (e.g. the configured timeout divided by 3 or so, to allow a few retries); see the sketch below.
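
As an illustration, here is a minimal sketch of what those two options could look like on the client socket before it is handed to the kernel (the helper name and the timeout values are placeholders of mine, not anything nbd-client currently does):

    /* Sketch only: apply TCP_USER_TIMEOUT plus keepalive to an already-connected
     * TCP socket before passing it to the kernel (NBD_SET_SOCK or netlink).
     * timeout_sec is a placeholder; nbd has no such default today. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    static int harden_nbd_socket(int fd, unsigned int timeout_sec)
    {
            unsigned int user_timeout_ms = timeout_sec * 1000;    /* give up on un-ACKed data */
            int keepidle = timeout_sec / 3 ? timeout_sec / 3 : 1; /* start probing idle links early */
            int keepintvl = keepidle;
            int keepcnt = 3;
            int on = 1;

            /* Linux >= 2.6.37: fail writes whose data is not ACKed within the window. */
            if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                           &user_timeout_ms, sizeof(user_timeout_ms)) < 0)
                    return -1;

            /* Keepalive probes so an idle-but-dead peer is detected too. */
            if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0 ||
                setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &keepidle, sizeof(keepidle)) < 0 ||
                setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &keepintvl, sizeof(keepintvl)) < 0 ||
                setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &keepcnt, sizeof(keepcnt)) < 0)
                    return -1;

            return 0;
    }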

But this would only be used to make sure the timeout doesn't kill idle connections and only kills dead ones (killing idle connections didn't happen in my tests). The fact that the daemon cannot automatically reconnect by default with -p is a problem that clearly indicates a logic error in the code, but if it fails on EBUSY once the block device is in use, we have a much bigger problem: NBD is basically unusable for any real-world purpose, since TCP connections eventually fail. I can easily reproduce this here by trying to restart nbd-client after a network error:

19:02:11.161069 ioctl(4, NBD_SET_SOCK, 3) = 0
19:02:11.176188 rt_sigprocmask(SIG_SETMASK, ~[KILL PIPE TERM RTMIN RT_1], ~[KILL PIPE TERM STOP RTMIN RT_1], 8) = 0
19:02:11.176416 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 7755 attached
, child_tidptr=0xffff8e2290f0) = 7755
[pid  7447] 19:02:11.177218 ioctl(4, NBD_DO_IT <unfinished ...>
[pid  7755] 19:02:11.177284 set_robust_list(0xffff8e229100, 24 <unfinished ...>
[pid  7447] 19:02:11.177338 <... ioctl resumed>) = -1 EBUSY (Device or resource busy)
[pid  7755] 19:02:11.177398 <... set_robust_list resumed>) = 0
[pid  7447] 19:02:11.177481 write(2, "nbd,7447: Kernel call returned: Device or resource busy\n", 56nbd,7447: Kernel call returned: Device or resource busy
) = 56
[pid  7447] 19:02:11.177651 close(3 <unfinished ...>
[pid  7755] 19:02:11.177698 openat(AT_FDCWD, "/sys/block/nbd0/pid", O_RDONLY <unfinished ...>
[pid  7447] 19:02:11.177783 <... close resumed>) = 0
[pid  7447] 19:02:11.177869 close(4 <unfinished ...>

Worse, it loops like crazy, eating the CPU while retrying.

I agree that it may be important to get back to the drawing board. It seems to me we're dealing with a bunch of chicken-and-egg problems here. Maybe we're just missing a "reconnect" operation to communicate with the kernel instead of the "connect" one, I don't know.

@nand11 commented Oct 3, 2022

Oh, this thread is still active? Then it may make sense to share my experience.

I tried iSCSI over an unreliable network connection and it worked. It is slow over this kind of network, as expected, but it can handle reconnects. I am using the standard Debian packages targetcli-fb (server) and open-iscsi (client).

For iSCSI, the client daemon iscsid does connection-level error processing. But I have not looked at the source code to see how things are handled differently between nbd and iSCSI.

@wtarreau commented Oct 3, 2022

Very interesting, thanks a lot for sharing your experience. That's definitely something I should have a look at!

@wtarreau commented Oct 4, 2022

Many thanks @nand11 for your insights. I've followed some howtos (there are different server implementations, so it may look confusing at first, but "tgtd" did work fine). It worked very well, and in addition it's particularly robust to connection outages. I've unplugged links as well as removed/restored/changed the IP address on the interface. There's a 5s timeout after which the connection is declared dead and destroyed, then a new one is attempted via the regular paths, so it should survive rebooting firewalls and triple-play boxes silently changing IPs. I'll go that way now; even if the configuration is less trivial, it looks much more robust. Thanks again!

@fathyb commented Oct 22, 2022

This issue was affecting me pretty badly so I built an alternative nbd-client: https://github.com/fathyb/node-nbd-client. It also resolves other issues I was having related to performance and Docker.

@wtarreau

> This issue was affecting me pretty badly so I built an alternative nbd-client: https://github.com/fathyb/node-nbd-client.

Interesting to see some work still being done around this. However, the choice of Node.js makes it a showstopper for many of us using embedded devices (typically where the full OS + config fits in a 16 MB NOR flash). But it likely has use cases in other environments. Now that I've got iSCSI working (using much more complex components and configs), I haven't yet figured out whether nbd still has some benefits (beyond its significant simplicity).

@felixonmars (Contributor)

Just in case anyone's interested: I have migrated my setup to NVMe/TCP, using nvmetcli for the server and nvme-cli for the client. Reconnects work flawlessly as long as the outage doesn't last too long (it stops retrying after 1 h).

@wtarreau commented Feb 2, 2024

Thanks for the info. I personally migrated to iSCSI instead, which is amazingly complicated but rock solid and has never failed me once in one year despite multiple short and long network outages. Why does nvme-cli stop retrying after one hour? Is it a config setting or something else?

@AndySchroder

What are the practical use cases for nbd if it can't handle a simple server restart? It seems like a lot of coordination is required to use nbd with this constraint.

@xujihui1985

You can use the netlink interface to reconfigure the device: establish a new socket with the server and pass the socket fd to the device with NBD_CMD_RECONFIGURE.

But the question is how to check whether the socket the device holds is broken. I can't find a good way to do that; maybe use a thread to periodically ping the server over the socket?
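
For illustration, here is a rough sketch (my own assumption of how the netlink path could be driven with libnl-genl-3; the attribute names come from <linux/nbd-netlink.h>, but I have not verified this end to end) of handing a freshly negotiated socket to an already-configured device with NBD_CMD_RECONFIGURE:

    /* Rough sketch, not verified: pass a freshly connected socket (new_fd,
     * already past NBD negotiation) to an existing device (index 0 = /dev/nbd0)
     * via the "nbd" generic netlink family. Error handling is omitted. */
    #include <linux/nbd-netlink.h>
    #include <netlink/genl/ctrl.h>
    #include <netlink/genl/genl.h>
    #include <netlink/netlink.h>

    static int nbd_reconfigure(int index, int new_fd)
    {
            struct nl_sock *sk = nl_socket_alloc();
            struct nl_msg *msg;
            struct nlattr *socks, *item;
            int driver_id, err;

            genl_connect(sk);
            driver_id = genl_ctrl_resolve(sk, "nbd");

            msg = nlmsg_alloc();
            genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, driver_id, 0, 0,
                        NBD_CMD_RECONFIGURE, 0);
            nla_put_u32(msg, NBD_ATTR_INDEX, index);

            /* Nested list of sockets; each item carries one fd. */
            socks = nla_nest_start(msg, NBD_ATTR_SOCKETS);
            item = nla_nest_start(msg, NBD_SOCK_ITEM);
            nla_put_u32(msg, NBD_SOCK_FD, new_fd);
            nla_nest_end(msg, item);
            nla_nest_end(msg, socks);

            err = nl_send_sync(sk, msg); /* sends, waits for ACK, frees msg */
            nl_socket_free(sk);
            return err;
    }

As for detecting the broken socket: the kernel driver appears to send an NBD_CMD_LINK_DEAD notification on the nbd netlink multicast group when a connection dies, which might be a cleaner trigger than polling, but I have not tested that.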

@michael-newsrx

What is the fix for this? I am investigating using this for XFS on top of S3. But it keeps erroring out in the nbd-client part and everything stops working. Is there a way to run iSCSI with S3 backing storage?

@yoe (Member) commented Aug 6, 2024 via email

@corbolais commented Aug 22, 2024

@yoe, thanks for your work on nbd.

I got excited, as all the others in this thread have, mainly because of nbd's simplicity. It was very appealing for a scenario similar to the remote backup/LUKS thing someone else tried. I'm flatlining now, as this persist option is still not working and reconnect is also still failing. So sad.

A simple test setup with a vanishing nbd-server was enough. After nbd-server restarts, nbd-client reports EBUSY errors and a cp process blocks, with the ugly kernel hung-task message and all. FWIW, the ZFS pool seemed like overkill in hindsight. "But it ought to be a realistic scenario!", someone may have heard me thinking...

[Thu Aug 22 01:25:52 2024] zio pool=nbdpool vdev=/dev/nbd0 error=5 type=1 offset=524034048 size=8192 flags=721089
[Thu Aug 22 01:25:52 2024] WARNING: Pool 'nbdpool' has encountered an uncorrectable I/O failure and has been suspended.

[Thu Aug 22 01:26:06 2024] nbd: nbd0 already in use
[Thu Aug 22 01:26:17 2024] block nbd0: NBD_DISCONNECT
[Thu Aug 22 01:26:17 2024] block nbd0: Send disconnect failed -32
[Thu Aug 22 01:26:19 2024] nbd: nbd0 already in use
[Thu Aug 22 01:27:24 2024] block nbd0: NBD_DISCONNECT
[Thu Aug 22 01:27:24 2024] block nbd0: Send disconnect failed -32
[Thu Aug 22 01:27:29 2024] INFO: task txg_sync:2020404 blocked for more than 122 seconds.
[Thu Aug 22 01:27:29 2024]       Tainted: P           OE      6.5.0-28-lowlatency #29.1-Ubuntu
[Thu Aug 22 01:27:29 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Aug 22 01:27:29 2024] task:txg_sync        state:D

So iSCSI it is. For now.

Edit: version under test (VUT): server 1:3.23-3ubuntu1.22.04.1, client 1:3.26.1-1ubuntu0.1

@myyddngyer03932

This is still an issue with version 3.24

@myyddngyer03932

Multipath seems to be a good choice; I will try to use this tool to solve the problem.
