routers apparently prefer fastd #175

AiyionPrime · 2021-02-25T19:14:37Z

@CodeFetch and @bschelm observed that routers tend to prefer connections via fastd rather than WireGuard.

@CodeFetch further found this to be connected to packet loss in WireGuard.

We need statistics to back up these hypotheses.

bschelm · 2021-02-25T19:41:03Z

A router that has a WG connection and several wifi mesh partners seemed to have lost its WG connection, although the router's status page still showed it as connected to the WG supernode. However, the router did not or could not use that WG connection and routed via the wifi mesh instead.

What I tried: disabling wifi for 5 minutes via "wifi down ; sleep 300 ; wifi" to force the router to use the WG connection instead of the wifi mesh path. That didn't work; the router was offline for 5 minutes.

What helped was a restart of WG with "ifdown vpn ; sleep 5 ; ifup vpn".
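Because the status page still reported the tunnel as connected, it may be worth recording the age of the last WireGuard handshake before restarting the interface. The following is only a minimal sketch and was not used in this thread. It assumes the output format of "wg show vpn latest-handshakes" (one public key and one Unix timestamp per line). The function name stale_peers and the 180-second threshold are illustrative assumptions.

```shell
#!/bin/sh
# Print the public key of every peer whose latest handshake is older than
# max_age seconds. Reads "<public-key> <unix-timestamp>" lines on stdin,
# the format printed by `wg show <interface> latest-handshakes`.
# A timestamp of 0 means that no handshake has happened yet.
# stale_peers is a hypothetical helper name, not part of any tool.
stale_peers() {
    max_age="$1"   # threshold in seconds (assumption: pick what fits your setup)
    now="$2"       # current time as Unix timestamp
    while read -r key ts; do
        [ -n "$key" ] || continue
        age=$((now - ts))
        if [ "$ts" -eq 0 ] || [ "$age" -gt "$max_age" ]; then
            echo "$key"
        fi
    done
}

# Possible use on a router (interface name "vpn" as in the thread):
#   wg show vpn latest-handshakes | stale_peers 180 "$(date +%s)"
```

If this prints a peer while the status page still shows the tunnel as connected, that would be a useful data point to attach to this issue.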

lemoer · 2021-02-26T09:06:19Z

Hi Bernd, thanks for the description. I would like to collect some more information:

- When did this happen?
- How often do you observe this?
- Does wg connect to a different supernode when you run "ifdown vpn ; sleep 5 ; ifup vpn"? (The supernode should be chosen randomly here.)
- If it (randomly) picks the same supernode as before, does the problem persist?
- Does this happen with all supernodes?
- Could you please check with "batctl n" whether the node still sees a batman neighbor on the "vx_vpn_wired" interface while the error is present?
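The "batctl n" check can be scripted so that it is easy to repeat or log. This is a rough sketch under assumptions: it expects the neighbor table to name the interface as a separate column (possibly wrapped in brackets, depending on the batctl version) and it skips the version banner line. The function name count_neighbors_on is made up for illustration.

```shell
#!/bin/sh
# Count the neighbors that `batctl n` lists on one interface.
# Reads the `batctl n` output on stdin and prints the number of matching rows.
# count_neighbors_on is a hypothetical helper name.
count_neighbors_on() {
    iface="$1"
    awk -v iface="$iface" '
        /^\[/ { next }                 # skip the "[B.A.T.M.A.N. adv ...]" banner
        {
            for (i = 1; i <= NF; i++) {
                f = $i
                gsub(/[][]/, "", f)    # tolerate "[iface]" style columns
                if (f == iface) { n++; break }
            }
        }
        END { print n + 0 }
    '
}

# Possible use on a router:
#   batctl n | count_neighbors_on vx_vpn_wired
```

A result of 0 while the status page still shows the tunnel as connected would match the behavior described above.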

bschelm · 2021-02-26T09:12:20Z

I would have to wait for another occasion.
It happened twice already.
I can't tell when it happened because the router, in that case, is still online via mesh.
You see it only when you click on the router.
After restarting WG, it connected to a different SN.

lemoer · 2021-02-26T10:01:21Z

I added a graph at the very bottom of the router dashboard in Grafana, which shows the VPN neighbors: https://stats.ffh.zone/d/000000021/router-fur-meshviewer?orgId=1 @bschelm: Can you check whether the outages are visible there?

bschelm · 2021-02-26T12:58:58Z

Nope.
VPN-Neighbours is always zero.
Same on my router.

lemoer · 2021-02-27T13:20:18Z

@bschelm I added another graph to the dashboard. It's quite messy, so I selected some traces and posted a screenshot above. The selected traces contain rx TQ from and tx TQ to the supernodes. Are your outages correlated to the gaps in the graph?

lemoer · 2021-02-27T13:25:42Z

Well, the time range is kinda long. Here is a more detailed screenshot of the recent history:

lemoer · 2021-02-28T22:22:22Z

From all I have heard, this doesn't happen very often. So let's start with our Infrastructure Freeze Week and see whether it occurs again in that week. If it happens again, please do not "fix" it directly, but collect as much data as possible:

output of batctl n from the router
output of batctl meshif bat14 n from the connected supernode
output of wg show from the router
output of wg show from the connected supernode
screenshot of the status page of the router
ip -6 route from the router
ip -6 route from the supernode
20 seconds of tcpdump -n -i vpn inbound -w /tmp/test1.pcap from the router (collect it via scp)
20 seconds of tcpdump -n -i vpn outbound -w /tmp/test2.pcap from the router (collect it via scp)
20 seconds of tcpdump -n -i vx_vpn_wired inbound -w /tmp/test3.pcap from the router (collect it via scp)
20 seconds of tcpdump -n -i vx_vpn_wired outbound -w /tmp/test4.pcap from the router (collect it via scp)
20 seconds of tcpdump -n -i br-wan inbound -w /tmp/test5.pcap from the router (collect it via scp)
20 seconds of tcpdump -n -i br-wan outbound -w /tmp/test6.pcap from the router (collect it via scp)
20 seconds of tcpdump -n -i vx-14 inbound -w /root/test7.pcap from the supernode (collect it via scp)
20 seconds of tcpdump -n -i vx-14 outbound -w /root/test8.pcap from the supernode (collect it via scp)
20 seconds of tcpdump -n -i wg-14 inbound -w /root/test9.pcap from the supernode (collect it via scp)
20 seconds of tcpdump -n -i wg-14 outbound -w /root/test10.pcap from the supernode (collect it via scp)
output of bridge fdb show | grep vx from the connected supernode
output of logread from the router
output of uci export from the router
output of ip addr show from the router
Find the exact time, when the problem has started.

Hopefully this data will be enough to find the issue.
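The router-side text outputs from the list above could be gathered in one go, so that nothing is forgotten while the error is present. The following is a sketch under assumptions, not a script used in this thread: the helper names capture and collect_router_diag and the output file names are made up, and the packet captures and all supernode-side commands are deliberately left out.

```shell
#!/bin/sh
# Run one command and store its stdout and stderr in "$outdir/$name.txt".
# A failing or missing command is noted in the file instead of aborting,
# so the remaining items are still collected.
# capture and collect_router_diag are hypothetical helper names.
capture() {
    outdir="$1"; name="$2"; shift 2
    {
        echo "# $*"
        "$@" 2>&1 || echo "# command failed or is not available (exit status $?)"
    } > "$outdir/$name.txt"
}

# Collect the router-side text outputs named in the list above.
collect_router_diag() {
    outdir="$1"
    mkdir -p "$outdir" || return 1
    # Record when the data was taken (UTC).
    date -u '+%Y-%m-%dT%H:%M:%SZ' > "$outdir/collected_at.txt"
    capture "$outdir" batctl_n   batctl n
    capture "$outdir" wg_show    wg show
    capture "$outdir" ip6_route  ip -6 route
    capture "$outdir" ip_addr    ip addr show
    capture "$outdir" logread    logread
    capture "$outdir" uci_export uci export
}

# Possible use on a router, then fetch the directory via scp:
#   collect_router_diag /tmp/diag-175
```

The exported configuration and the log can contain secrets such as keys or passwords, so review the files before attaching them anywhere public.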

lemoer · 2021-02-28T22:23:12Z

I think this is the same issue as #147.

lemoer · 2021-02-28T22:25:33Z

It does not make sense to have either #175 (this issue) or #147 as blocker for the infrastructure freeze week, so I'll remove the milestone here.

AiyionPrime · 2021-03-01T08:35:45Z

> I think, this is the same issue as #147 .

I don't remember exactly why, but we came to the conclusion that it wasn't.
Maybe @1977er remembers this better,
but I think it was because some fixes applied on sn09 did not coincide with this issue being resolved.

lemoer · 2023-04-16T17:16:37Z

Is this still an issue?

AiyionPrime · 2023-04-16T22:00:19Z

We still have both WireGuard and fastd nodes and have not yet resolved the issue.

lemoer · 2023-04-17T18:14:26Z

Is there any setup where we saw this recently? CC: @bschelm?

AiyionPrime added the bug label Feb 25, 2021

lemoer added this to the Bis zum Beginn der stabilen Phase ("Until the start of the stable phase") milestone Feb 25, 2021

lemoer added the effort:huge label Feb 25, 2021

lemoer mentioned this issue Feb 28, 2021

wireguard: bridge fdb entry sometimes missing? #147

Open

lemoer removed this from the wireguard infrastructure freeze milestone Feb 28, 2021
