
Support for one-network-card card configuration, single card configuration, one NIC only #48

Closed
interduo opened this issue Aug 29, 2022 · 20 comments

interduo (Collaborator) commented Aug 29, 2022

Is it possible to use LibreQoS on one network card using different VLANs, or to use just one interface for network connectivity?

I don't think so; see xdp-project/xdp-cpumap-tc#7. xdp-cpumap-tc is what we use to redirect packets to the appropriate CPU quickly. If you follow this network design, you can have any VLANs desired on the core router at least, and as they pass, routed, over the LibreQoS box, they'll be shaped correctly.

Now I have a 1x40G QSFP+ card here on the test machine, and I thought that I could use the VLAN interfaces

ens1np0.100 
ens1np0.101

for networking.

The target for production is to:

  • connect the LibreQoS VM router to two different core switches,
  • with two passthrough network interfaces, and set up some teaming (bridge, bond, LACP...),
  • do the same on a backup router, but with higher path costs,

So I need six 1x40G cards or three 2x40G cards (two interfaces for the BGP VM, four interfaces for LibreQoS).
Is that a problem? It depends.

This would result in:

  • better resistance to unpredictable events
    (switch, QSFP cable, server QSFP module, switch QSFP module, and other failures),
  • easier maintenance (and no fear of upgrades),

I am just asking because I would like to validate the design in design.png.

If we use OSPF (L3) and src/dst addresses (L3), can we use only one network card as long as the combined network throughput is less than 40% of the card's total maximum (upload + download) throughput?
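
A rough sanity check of that figure, assuming the port is full duplex and that in a one-armed setup every shaped packet crosses it twice (once inbound to LibreQoS, once outbound):

40G port  =>  about 20G down + 20G up of shaped traffic, i.e. download + upload <= ~50% of line rate

so a 40% combined load target leaves some headroom below that ceiling.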

Is this sentence from README.txt:
"NIC must have two or more interfaces for traffic shaping."
a requirement or just a recommendation?

There is no NAT on the LibreQoS box, so I think this is possible. Am I right?

Maybe we could use https://man7.org/linux/man-pages/man8/tc-vlan.8.html ?
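
For reference, a minimal sketch of what tc's vlan actions can do (the interface name and VLAN IDs here are made up for illustration):

tc qdisc add dev ens1np0 clsact
# strip the 802.1Q tag from traffic arriving tagged as VLAN 100
tc filter add dev ens1np0 ingress protocol 802.1Q flower vlan_id 100 action vlan pop
# re-tag everything leaving the box as VLAN 101
tc filter add dev ens1np0 egress matchall action vlan push id 101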

interduo (Collaborator, Author) commented Aug 30, 2022

I would like to remove all SPOFs... so I would end up with something like this:
https://raw.githubusercontent.com/interduo/LibreQoS/354141a766e19c313301b3f0824cc010330cc650/diagram-test.drawio

[diagram image]

interduo (Collaborator, Author) commented:

@rchac how big is the performance drop when using virtio instead of passthrough NICs? Did you try SR-IOV instead of PCI passthrough?

rchac (Member) commented Sep 16, 2022

> @rchac how big is the performance drop when using virtio instead of passthrough NICs? Did you try SR-IOV instead of PCI passthrough?

Virtio / generic XDP is about 1/4 the performance of a pass-through NIC. I'm testing out a bare metal server and it's working great so far.
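
For anyone reproducing this, a quick way to check which mode you ended up with (interface name is just an example):

ip link show dev ens16np0
# an attached program shows up as 'xdp' (native/driver mode)
# or 'xdpgeneric' (SKB mode - the ~1/4-performance path described above)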

interduo (Collaborator, Author) commented:

Did you test SR-IOV?

rchac (Member) commented Sep 16, 2022

> Did you test SR-IOV?

I don't think SR-IOV works with XDP. But it may be possible.

interduo (Collaborator, Author) commented Sep 16, 2022

If it does, then we could make LibreQoS an OSPF neighbour and add two /32 networks between the edge router and the core switch on virtualized NICs. SR-IOV overhead is about 2% or less.
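
A minimal sketch of carving out SR-IOV virtual functions on the host, assuming the NIC and driver support it (device name hypothetical):

# create two VFs on the physical function; each can be passed through to a VM
echo 2 > /sys/class/net/ens16np0/device/sriov_numvfs
ip link show ens16np0    # the PF output now lists 'vf 0' and 'vf 1'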

dtaht (Collaborator) commented Nov 13, 2022

We haven't been in a position to document or discuss virtualization all that much, and it makes my head hurt. We did just acquire some serious resources from Equinix (#151) to be able to test this more fully, but I kind of consider virtualization testing to be a whole other "project", after we get a solid test suite in #153. @interduo, though, since you have the resources, go for it with v1.3 and get back to us?

Secondly, I like the idea of 1-card operation a LOT, but I don't know if that works in 1.3?

interduo (Collaborator, Author) commented:

Now I use version 1.2 as a VM. It does its job.

I will give you feedback, but my local roadmap needs a few hardware changes, so it will take a few weeks. I will test the one-interface configuration first.

interduo changed the title from "Support for one-network-card card configuration" to "Support for one-network-card card configuration, single card configuration, one NIC only" on Nov 23, 2022
interduo (Collaborator, Author) commented Nov 23, 2022

I am trying to test this config:

My netplan is:

root@rtx-libreqos-new:~/LibreQoS/v1.3# cat /etc/netplan/00-installer-config.yaml

# This is the network config written by 'subiquity'
network:
  ethernets:
    ens18:
      addresses:
      - 10.100.0.6/24
      routes:
      - to: default
        via: 10.100.0.254
      nameservers:
        addresses:
        - 46.151.191.151
        - 46.151.191.5
        search: []
    ens16np0:
      dhcp4: no
  version: 2
  vlans:
      v800@ens16np0:
          id: 800
          link: ens16np0
      v900@ens16np0:
          id: 900
          link: ens16np0
  bridges:
    br0:
      interfaces:
        - v800@ens16np0
        - v900@ens16np0

root@rtx-libreqos-new:~/LibreQoS/v1.3# cat ispConfig.py | grep interface

interfaceA = 'v800@ens16np0'
interfaceB = 'v900@ens16np0'
root@rtx-libreqos-new:~/LibreQoS/v1.3# ip a s | grep @ens16np0
5: v900@ens16np0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master br0 state LOWERLAYERDOWN group default qlen 1000
6: v800@ens16np0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master br0 state LOWERLAYERDOWN group default qlen 1000

I got an error:

root@rtx-libreqos-new:~/LibreQoS/v1.3# ./LibreQoS.py 
refreshShapers starting at 23/11/2022 10:11:26
First time run since system boot.
Validating input files 'ShapedDevices.csv' and 'network.json'
network.json passed validation
ShapedDevices.csv passed validation
Backed up good config as lastGoodConfig.csv and lastGoodConfig.json
Traceback (most recent call last):
  File "/root/LibreQoS/v1.3/./LibreQoS.py", line 1216, in <module>
    refreshShapers()
  File "/root/LibreQoS/v1.3/./LibreQoS.py", line 442, in refreshShapers
    queuesAvailable = findQueuesAvailable()
  File "/root/LibreQoS/v1.3/./LibreQoS.py", line 94, in findQueuesAvailable
    directory_contents = os.listdir(path)
FileNotFoundError: [Errno 2] No such file or directory: '/sys/class/net/v800@ens16np0/queues/'

Whereas:
root@rtx-libreqos-new:~/LibreQoS/v1.3# ethtool -l ens16np0

gives:

Channel parameters for ens16np0:
Pre-set maximums:
RX:		n/a
TX:		n/a
Other:		n/a
Combined:	16
Current hardware settings:
RX:		n/a
TX:		n/a
Other:		n/a
Combined:	16

If a VLAN interface is used, LibreQoS should look up the queue count using the underlying physical interface.
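
A sketch of the kind of lookup I mean, via sysfs (this assumes the VLAN device is really named v800, as discussed further down):

# a VLAN device points at its parent through a lower_* symlink
parent=$(basename /sys/class/net/v800/lower_*)        # -> lower_ens16np0
parent=${parent#lower_}
ls "/sys/class/net/$parent/queues" | grep -c '^rx-'   # RX queue count of the physical NIC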

root@rtx-libreqos-new:~/LibreQoS/v1.3# brctl show

bridge name	bridge id		STP enabled	interfaces
br0		8000.7e4911d668ca	no		v800@ens16np0
							v900@ens16np0

If I set:
cat /root/LibreQoS/v1.3/ispConfig.py | grep Over
queuesAvailableOverride = 16

then there is no error, but:

root@rtx-libreqos-new:~/LibreQoS/v1.3# tc class show dev v800@ens16np0
root@rtx-libreqos-new:~/LibreQoS/v1.3# tc class show dev v900@ens16np0

both give empty output.

Full running log: http://kłopotek.pl/running_libre.txt

rchac (Member) commented Nov 23, 2022

I noticed "LOWERLAYERDOWN" in your ip output. Is the physical interface not properly connected?

thebracket (Collaborator) commented:

I did a little digging (I ran into some Hyper-V issues with tagged VLANs between VMs; I need to get a better test environment locally!). So there's good news and bad news:

The good news is that binding XDP programs to VLAN interfaces works. I can bind an XDP program to a VLAN I've created (in this case eth1.8), and it functions.

The bad news is that performance is going to be really, really bad:

  • There's only 1 queue on a VLAN interface. ls /sys/class/net/eth1.8/queues/ shows rx-0 and tx-0. So CPU mapping won't do anything, because there's only one queue to map - it's going to sit on a single core, no matter what you do.
  • VLAN interfaces appear as "XDP generic" (which means "SKB bind mode" internally); if your interface had any hardware acceleration for XDP, it just went away.

So my advice: don't do this.

thebracket (Collaborator) commented:

Just to expand, in case anyone wants to test.

(as root)

  1. Enable VLAN tagging support in your kernel with modprobe 8021q
  2. Add a VLAN with ip link add link eth1 eth1.8 type vlan id 8
  3. Turn the VLAN interface on with ip link set dev eth1.8 up

Now have a look at /sys/class/net/eth1.8/queues and you should see 1 queue for each direction. 1 queue will work, and the CPU map can still spray IP matches onto different cores - but with only 1 queue in and 1 out, performance is going to be very limited at the interrupt level.

If you see more than 1 queue in each direction, you obviously have a nicer NIC than me and I'm thrilled to be proven wrong.

Using my test XDP program, the packets I see are untagged on that interface (as they should be), and I didn't see signs of leakage from the parent interface. So isolation is working.
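
The same steps as a copy-paste block (same hypothetical device and VLAN ID as above):

modprobe 8021q                                 # VLAN tagging support
ip link add link eth1 eth1.8 type vlan id 8    # create eth1.8 carrying VLAN 8
ip link set dev eth1.8 up
ls /sys/class/net/eth1.8/queues                # expect only rx-0 and tx-0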

interduo (Collaborator, Author) commented:

> I noticed "LOWERLAYERDOWN" in your ip output. Is the physical interface not properly connected?

Yes, it's OK - it's a test instance, and the switch port is administratively down.

thebracket (Collaborator) commented:

Is there a way to unlink this from issue #26? They aren't really related. #26 is talking about VLANs inside the bridge - which works really well; I'm pushing 2 Gbit/s through VLANs in a shaping bridge right now. This issue is about the top-level bridge members being VLANs. (We should probably close #26.)

interduo (Collaborator, Author) commented Nov 23, 2022

VLAN interfaces use the NIC's queues.

thebracket (Collaborator) commented:

Looking at the config, I see a couple of things:

interfaceA = 'v800@ens16np0'
interfaceB = 'v900@ens16np0'

This isn't going to work, because the devices are named v800 and v900. You should see entries in /sys/class/net/v800/queues and /sys/class/net/v900/queues.

Since there's almost certainly only 1 queue (per direction) on the VLAN devices, you're going to want to look in /sys/class/net/ens16np0/queues and see how many there actually are on the parent. Add queuesAvailableOverride = (the number of queues) to your config to bypass the test.
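
Putting that together, the checks would look something like this (the 16 matches the ethtool -l output earlier in the thread; treat it as an example, not a recommendation):

ls /sys/class/net/v800/queues                        # use the real device name, without '@parent'
ls /sys/class/net/ens16np0/queues | grep -c '^rx-'   # parent NIC RX queue count -> 16 here
grep -E 'interface|Override' ispConfig.py            # expect interfaceA = 'v800', interfaceB = 'v900', queuesAvailableOverride = 16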

interduo (Collaborator, Author) commented Nov 23, 2022

Sorry for not editing my earlier post. I have done it now - the interfaces are named correctly in ispConfig.py and in the netplan YAML, and I did use queuesAvailableOverride = 16.

That is really bad news, like you said.

Maybe there is another (non-VLAN) idea/technology/design model that would allow us to use just one link?
Thinking... maybe we could tag inbound/outbound packets with some mark/tag on the edge router and then change the tag on the LibreQoS box after queueing?
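
The re-tagging itself is expressible with tc's vlan modify action - a hypothetical sketch (whether this coexists with the XDP/queueing path is exactly the open question):

# rewrite tag 800 -> 900 on egress, after shaping (IDs made up)
tc qdisc add dev ens16np0 clsact
tc filter add dev ens16np0 egress protocol 802.1Q flower vlan_id 800 action vlan modify id 900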

interduo (Collaborator, Author) commented Nov 23, 2022

So having two core switches and no SPOF now comes with a port requirement:

LibreQoS - 2
Second LibreQoS - 2
BGP - 1
Second BGP - 1
Internet upstreams - >= 2
Another core switch - 2 (one switch-to-switch link)

So we need a minimum of 10 QSFP+ ports. WRRRRR

That is not an issue for a 10G network, but the trouble starts once we go above that line. Just look at the market: all reasonably priced switches come with only 4 QSFP+ ports - or change my mind.

interduo (Collaborator, Author) commented Jan 5, 2023

I saw be2e2fb.
@thebracket do you know how big the performance penalty is (if there is any) when using only one interface instead of two?

thebracket (Collaborator) commented Jan 5, 2023 via email

dtaht closed this as completed Jan 12, 2023