
Changing TX-Queuelength to enhance Bandwidth #100

Open
CodeFetch opened this issue Jan 6, 2021 · 20 comments

Comments

@CodeFetch
Contributor

Currently we use Debian's default values for e.g. the txqueuelen. These values are not optimized for routing. Increasing the buffer sizes and queue lengths will likely reduce the number of context switches. When setting them too high, we might starve userspace. Therefore we need to find reasonable values depending on the interface type (e.g. a userspace VPN like fastd vs. kernel-space WireGuard).
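
For reference, the defaults currently in effect can be inspected like this (assuming eth0 is the uplink interface):

ip link show eth0                      # "qlen ..." shows the current txqueuelen
ethtool -g eth0                        # current and hardware-maximum NIC ring sizes
sysctl net.core.netdev_max_backlog     # current input-queue backlog limit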

@CodeFetch
Contributor Author

Buffer sizes should be symmetric for servers:
ethtool -G eth0 rx 4096 tx 4096
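
Before applying this, it is worth checking what the NIC actually supports; the usable maximum is hardware-dependent (eth0 as the example interface):

ethtool -g eth0    # "Pre-set maximums" shows the largest rx/tx rings the NIC allows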

@CodeFetch
Contributor Author

Increase input queue lengths:
echo 16384 > /proc/sys/net/core/netdev_max_backlog
ip link set eth0 txqueuelen 16384
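
If we decide to roll this out, a minimal sketch of making both settings persistent on a Debian machine using ifupdown (the paths and values are illustrative; in practice this would live in the Ansible role):

# /etc/sysctl.d/90-netdev-backlog.conf
net.core.netdev_max_backlog = 16384

# /etc/network/interfaces, inside the eth0 stanza
post-up ip link set $IFACE txqueuelen 16384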

@1977er
Member

1977er commented Jan 10, 2021

I guess this goes into the ffh.supernode role.

@lemoer
Contributor

lemoer commented Jan 10, 2021

Please make sure that this does not introduce additional bufferbloat.

@CodeFetch
Contributor Author

@lemoer I did, but I didn't measure any improvement, so it seems this is not our bottleneck. A queue length of 16384 is enough to flatten spikes on a 2 Gbit/s link for about 5 ms. So it is not critical for bufferbloat, but is what a 2 Gbit/s link needs to avoid excessive drops under high load. As for the ring buffer sizes: they should be symmetric on a server with symmetric up and down rates. Debian's defaults are for "client devices" that download more than they upload.
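
Rough arithmetic behind the ~5 ms figure (assuming minimum-size frames, about 84 bytes on the wire including preamble and inter-frame gap):

2 Gbit/s ÷ (84 B × 8 bit/B) ≈ 3.0 Mpps
16384 packets ÷ 3.0 Mpps ≈ 5.5 ms

With full-size 1500-byte frames the same queue would hold on the order of 100 ms of traffic, so the figure depends strongly on packet size.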

@1977er
Member

1977er commented Jan 25, 2021

@CodeFetch If you did not measure any improvement, can this issue be closed?

@CodeFetch
Contributor Author

We will sooner or later run into these issues when we have higher bandwidth demands, e.g. with WireGuard, because Debian's default configuration is not meant for routers, or rather for machines that receive and transmit packets symmetrically.

@1977er
Member

1977er commented Jan 25, 2021

So, what did you mean by "but I didn't measure any improvement"?

@CodeFetch
Contributor Author

@1977er That it's not the bottleneck at the moment. But I think it becomes relevant with more than 1 Gbit/s of forwarded traffic from different connections.

@1977er
Member

1977er commented Jan 26, 2021

This will never happen with our current hardware and our resources.

@AiyionPrime
Member

@CodeFetch @1977er So this is stalled/blocked, at least until we have rolled out WireGuard for the broad masses?

@1977er
Member

1977er commented Feb 20, 2021

If these settings do no harm, we can introduce them for future use.

@AiyionPrime
Member

AiyionPrime commented Feb 20, 2021

@CodeFetch if they don't, do you want to implement this?

@lemoer lemoer changed the title from "Improve network interface performance" to "Changing TX-Queuelength to enhance Bandwidth" Feb 20, 2021
@lemoer
Contributor

lemoer commented Feb 20, 2021

@1977er Settings like txqueuelen can drastically reduce network performance due to bufferbloat. Therefore they can cause harm.

@1977er
Member

1977er commented Feb 20, 2021

I guess then it's settled. As long as it's not needed, don't change it.

@CodeFetch if you disagree, please reopen it.

@1977er 1977er closed this as completed Feb 20, 2021
@CodeFetch
Contributor Author

That's not how CoDel etc. work; these algorithms look at latency when deciding whether to drop packets. Therefore we should give it a try, as the current buffer sizes are not optimized for routing, and this could partially explain our current problems.
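
If we try it, one way to keep a large txqueuelen from turning into latency is to make sure a latency-aware qdisc such as fq_codel sits on the interface (a sketch, assuming eth0; fq_codel uses its own packet limit rather than txqueuelen):

tc qdisc replace dev eth0 root fq_codel
tc -s qdisc show dev eth0    # verify and watch the drop/ECN counters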

@CodeFetch CodeFetch reopened this Aug 18, 2022
@1977er
Member

1977er commented Aug 19, 2022

Tested the 16k setting again on a supernode with quite decent WireGuard usage (one that already has the problem of dropping packets due to overload). Looking at the "% of neighbours with TQ < 95%" chart, I can see no effect after introducing the setting.

[Screenshot from 2022-08-19 09:53: "% of neighbours with TQ < 95%" chart]

(injected at 9:38)

I suggest closing this issue until we know that we need this.

@AiyionPrime
Member

Maybe we can re-test this once we have WireGuard rolled out, just to settle this once and for all.

@CodeFetch
Contributor Author

@1977er @AiyionPrime We should at least make this setting symmetric, as we are using the servers as routers and not as desktop machines. The current asymmetric values are definitely wrong.

@CodeFetch
Contributor Author

CodeFetch commented Nov 14, 2022

ip link set eth0 txqueuelen 16384

Tuning txqueuelen on pfifo_fast

The default txqueuelen was set to 1000 in Linux, in 2006. This is arguably wrong, even for gigE. Most network simulations at 100Mbit and below use 100, or even 50 as their default. Setting it below 50 is generally wrong, too, at any speed, except on wireless.

Source: https://www.bufferbloat.net/projects/codel/wiki/Best_practices_for_benchmarking_Codel_and_FQ_Codel/

Whatever that means...

Edit: "Time Squeeze" constantly increases so we might want to try tuning NAPI budgets
https://levelup.gitconnected.com/linux-kernel-tuning-for-high-performance-networking-receive-backlog-5b3f54fb82a7
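
For reference, the time-squeeze counter is the third column of /proc/net/softnet_stat (one hex-encoded row per CPU), and the NAPI budget can be raised via sysctl (the values below are illustrative, not tested recommendations):

cat /proc/net/softnet_stat                   # 3rd column = time_squeeze per CPU
sysctl -w net.core.netdev_budget=600         # packets processed per softirq round
sysctl -w net.core.netdev_budget_usecs=4000  # time limit per softirq round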

Edit2: txqueuelen and netdev_max_backlog need to be set high enough to avoid packet drops. At the same time, the NAPI weight and budget need to be high enough for the softirqs to handle all packets in time. Our issue is indeed softirq backpressure, but it seems the reason for it is that the CPU cores are actually saturated. Maybe it makes sense to dedicate cores to softirq handling, as recommended here (to reduce rescheduling and thereby increase cache warmth?), but that needs to be tested (see the sketch below the link):
https://www.linkedin.com/pulse/navigating-linux-kernel-network-stack-receive-path-mark-price
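
A minimal sketch of what dedicating cores could look like (the IRQ numbers and core choices are purely examples; irqbalance would have to be stopped or told to leave these IRQs alone):

systemctl stop irqbalance
grep eth0 /proc/interrupts                  # find the NIC's IRQ numbers
echo 0 > /proc/irq/128/smp_affinity_list    # pin example IRQ 128 to core 0
echo 1 > /proc/irq/129/smp_affinity_list    # pin example IRQ 129 to core 1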

Edit3: I like this guide more than the previous ones, as it is more compact while still covering all the tips I've read about so far:
https://access.redhat.com/sites/default/files/attachments/20150325_network_performance_tuning.pdf
