Skip to content

Commit

Permalink
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Browse files Browse the repository at this point in the history
Pull networking fixes from David Miller:

 1) Several hash table refcount fixes in batman-adv, from Sven
    Eckelmann.

 2) Use after free in bpf_evict_inode(), from Daniel Borkmann.

 3) Fix mdio bus registration in ixgbe, from Ivan Vecera.

 4) Unbounded loop in __skb_try_recv_datagram(), from Paolo Abeni.

 5) ila rhashtable corruption fix from Herbert Xu.

 6) Don't allow upper-devices to be added to vrf devices, from Sabrina
    Dubroca.

 7) Add qmi_wwan device ID for Olicard 600, from Bjørn Mork.

 8) Don't leave skb->next poisoned in __netif_receive_skb_list_ptype,
    from Alexander Lobakin.

 9) Missing IDR checks in mlx5 driver, from Aditya Pakki.

10) Fix false connection termination in ktls, from Jakub Kicinski.

11) Work around some ASPM issues with r8169 by disabling rx interrupt
    coalescing on certain chips. From Heiner Kallweit.

12) Properly use per-cpu qstat values on NOLOCK qdiscs, from Paolo
    Abeni.

13) Fully initialize sockaddr_in structures in SCTP, from Xin Long.

14) Various BPF flow dissector fixes from Stanislav Fomichev.

15) Divide by zero in act_sample, from Davide Caratti.

16) Fix bridging multicast regression introduced by rhashtable
    conversion, from Nikolay Aleksandrov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  ibmvnic: Fix completion structure initialization
  ipv6: sit: reset ip header pointer in ipip6_rcv
  net: bridge: always clear mcast matching struct on reports and leaves
  libcxgb: fix incorrect ppmax calculation
  vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x
  sch_cake: Make sure we can write the IP header before changing DSCP bits
  sch_cake: Use tc_skb_protocol() helper for getting packet protocol
  tcp: Ensure DCTCP reacts to losses
  net/sched: act_sample: fix divide by zero in the traffic path
  net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop
  net: hns: Fix sparse: some warnings in HNS drivers
  net: hns: Fix WARNING when remove HNS driver with SMMU enabled
  net: hns: fix ICMP6 neighbor solicitation messages discard problem
  net: hns: Fix probabilistic memory overwrite when HNS driver initialized
  net: hns: Use NAPI_POLL_WEIGHT for hns driver
  net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()
  flow_dissector: rst'ify documentation
  ipv6: Fix dangling pointer when ipv6 fragment
  net-gro: Fix GRO flush when receiving a GSO packet.
  flow_dissector: document BPF flow dissector environment
  ...
  • Loading branch information
torvalds committed Apr 5, 2019
2 parents 8e22ba9 + bbd669a commit 0548740
Show file tree
Hide file tree
Showing 122 changed files with 1,096 additions and 563 deletions.
8 changes: 4 additions & 4 deletions Documentation/bpf/btf.rst
Original file line number Diff line number Diff line change
Expand Up @@ -148,16 +148,16 @@ The ``btf_type.size * 8`` must be equal to or greater than ``BTF_INT_BITS()``
for the type. The maximum value of ``BTF_INT_BITS()`` is 128.

The ``BTF_INT_OFFSET()`` specifies the starting bit offset to calculate values
for this int. For example, a bitfield struct member has: * btf member bit
offset 100 from the start of the structure, * btf member pointing to an int
type, * the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4``
for this int. For example, a bitfield struct member has:
* btf member bit offset 100 from the start of the structure,
* btf member pointing to an int type,
* the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4``

Then in the struct memory layout, this member will occupy ``4`` bits starting
from bits ``100 + 2 = 102``.

Alternatively, the bitfield struct member can be the following to access the
same bits as the above:

* btf member bit offset 102,
* btf member pointing to an int type,
* the int type has ``BTF_INT_OFFSET() = 0`` and ``BTF_INT_BITS() = 4``
Expand Down
126 changes: 126 additions & 0 deletions Documentation/networking/bpf_flow_dissector.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
.. SPDX-License-Identifier: GPL-2.0
==================
BPF Flow Dissector
==================

Overview
========

Flow dissector is a routine that parses metadata out of the packets. It's
used in the various places in the networking subsystem (RFS, flow hash, etc).

BPF flow dissector is an attempt to reimplement C-based flow dissector logic
in BPF to gain all the benefits of BPF verifier (namely, limits on the
number of instructions and tail calls).

API
===

BPF flow dissector programs operate on an ``__sk_buff``. However, only the
limited set of fields is allowed: ``data``, ``data_end`` and ``flow_keys``.
``flow_keys`` is ``struct bpf_flow_keys`` and contains flow dissector input
and output arguments.

The inputs are:
* ``nhoff`` - initial offset of the networking header
* ``thoff`` - initial offset of the transport header, initialized to nhoff
* ``n_proto`` - L3 protocol type, parsed out of L2 header

Flow dissector BPF program should fill out the rest of the ``struct
bpf_flow_keys`` fields. Input arguments ``nhoff/thoff/n_proto`` should be
also adjusted accordingly.

The return code of the BPF program is either BPF_OK to indicate successful
dissection, or BPF_DROP to indicate parsing error.

__sk_buff->data
===============

In the VLAN-less case, this is what the initial state of the BPF flow
dissector looks like::

+------+------+------------+-----------+
| DMAC | SMAC | ETHER_TYPE | L3_HEADER |
+------+------+------------+-----------+
^
|
+-- flow dissector starts here


.. code:: c
skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
flow_keys->thoff = nhoff
flow_keys->n_proto = ETHER_TYPE
In case of VLAN, flow dissector can be called with the two different states.

Pre-VLAN parsing::

+------+------+------+-----+-----------+-----------+
| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
+------+------+------+-----+-----------+-----------+
^
|
+-- flow dissector starts here

.. code:: c
skb->data + flow_keys->nhoff point the to first byte of TCI
flow_keys->thoff = nhoff
flow_keys->n_proto = TPID
Please note that TPID can be 802.1AD and, hence, BPF program would
have to parse VLAN information twice for double tagged packets.


Post-VLAN parsing::

+------+------+------+-----+-----------+-----------+
| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
+------+------+------+-----+-----------+-----------+
^
|
+-- flow dissector starts here

.. code:: c
skb->data + flow_keys->nhoff point the to first byte of L3_HEADER
flow_keys->thoff = nhoff
flow_keys->n_proto = ETHER_TYPE
In this case VLAN information has been processed before the flow dissector
and BPF flow dissector is not required to handle it.


The takeaway here is as follows: BPF flow dissector program can be called with
the optional VLAN header and should gracefully handle both cases: when single
or double VLAN is present and when it is not present. The same program
can be called for both cases and would have to be written carefully to
handle both cases.


Reference Implementation
========================

See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
for the loader. bpftool can be used to load BPF flow dissector program as well.

The reference implementation is organized as follows:
* ``jmp_table`` map that contains sub-programs for each supported L3 protocol
* ``_dissect`` routine - entry point; it does input ``n_proto`` parsing and
does ``bpf_tail_call`` to the appropriate L3 handler

Since BPF at this point doesn't support looping (or any jumping back),
jmp_table is used instead to handle multiple levels of encapsulation (and
IPv6 options).


Current Limitations
===================
BPF flow dissector doesn't support exporting all the metadata that in-kernel
C-based implementation can export. Notable example is single VLAN (802.1Q)
and double VLAN (802.1AD) tags. Please refer to the ``struct bpf_flow_keys``
for a set of information that's currently can be exported from the BPF context.
1 change: 1 addition & 0 deletions Documentation/networking/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ Contents:
netdev-FAQ
af_xdp
batman-adv
bpf_flow_dissector
can
can_ucan_protocol
device_drivers/freescale/dpaa2/index
Expand Down
4 changes: 2 additions & 2 deletions MAINTAINERS
Original file line number Diff line number Diff line change
Expand Up @@ -5833,7 +5833,7 @@ L: [email protected]
S: Maintained
F: Documentation/ABI/testing/sysfs-bus-mdio
F: Documentation/devicetree/bindings/net/mdio*
F: Documentation/networking/phy.txt
F: Documentation/networking/phy.rst
F: drivers/net/phy/
F: drivers/of/of_mdio.c
F: drivers/of/of_net.c
Expand Down Expand Up @@ -13981,7 +13981,7 @@ F: drivers/media/rc/serial_ir.c
SFC NETWORK DRIVER
M: Solarflare linux maintainers <[email protected]>
M: Edward Cree <[email protected]>
M: Bert Kenward <bkenward@solarflare.com>
M: Martin Habets <mhabets@solarflare.com>
L: [email protected]
S: Supported
F: drivers/net/ethernet/sfc/
Expand Down
4 changes: 3 additions & 1 deletion drivers/net/bonding/bond_sysfs_slave.c
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,9 @@ static SLAVE_ATTR_RO(link_failure_count);

static ssize_t perm_hwaddr_show(struct slave *slave, char *buf)
{
return sprintf(buf, "%pM\n", slave->perm_hwaddr);
return sprintf(buf, "%*phC\n",
slave->dev->addr_len,
slave->perm_hwaddr);
}
static SLAVE_ATTR_RO(perm_hwaddr);

Expand Down
24 changes: 16 additions & 8 deletions drivers/net/dsa/mv88e6xxx/port.c
Original file line number Diff line number Diff line change
Expand Up @@ -427,18 +427,22 @@ int mv88e6390x_port_set_cmode(struct mv88e6xxx_chip *chip, int port,
return 0;

lane = mv88e6390x_serdes_get_lane(chip, port);
if (lane < 0)
if (lane < 0 && lane != -ENODEV)
return lane;

if (chip->ports[port].serdes_irq) {
err = mv88e6390_serdes_irq_disable(chip, port, lane);
if (lane >= 0) {
if (chip->ports[port].serdes_irq) {
err = mv88e6390_serdes_irq_disable(chip, port, lane);
if (err)
return err;
}

err = mv88e6390x_serdes_power(chip, port, false);
if (err)
return err;
}

err = mv88e6390x_serdes_power(chip, port, false);
if (err)
return err;
chip->ports[port].cmode = 0;

if (cmode) {
err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_STS, &reg);
Expand All @@ -452,6 +456,12 @@ int mv88e6390x_port_set_cmode(struct mv88e6xxx_chip *chip, int port,
if (err)
return err;

chip->ports[port].cmode = cmode;

lane = mv88e6390x_serdes_get_lane(chip, port);
if (lane < 0)
return lane;

err = mv88e6390x_serdes_power(chip, port, true);
if (err)
return err;
Expand All @@ -463,8 +473,6 @@ int mv88e6390x_port_set_cmode(struct mv88e6xxx_chip *chip, int port,
}
}

chip->ports[port].cmode = cmode;

return 0;
}

Expand Down
20 changes: 12 additions & 8 deletions drivers/net/ethernet/cavium/thunder/nicvf_main.c
Original file line number Diff line number Diff line change
Expand Up @@ -1328,10 +1328,11 @@ int nicvf_stop(struct net_device *netdev)
struct nicvf_cq_poll *cq_poll = NULL;
union nic_mbx mbx = {};

cancel_delayed_work_sync(&nic->link_change_work);

/* wait till all queued set_rx_mode tasks completes */
drain_workqueue(nic->nicvf_rx_mode_wq);
if (nic->nicvf_rx_mode_wq) {
cancel_delayed_work_sync(&nic->link_change_work);
drain_workqueue(nic->nicvf_rx_mode_wq);
}

mbx.msg.msg = NIC_MBOX_MSG_SHUTDOWN;
nicvf_send_msg_to_pf(nic, &mbx);
Expand Down Expand Up @@ -1452,7 +1453,8 @@ int nicvf_open(struct net_device *netdev)
struct nicvf_cq_poll *cq_poll = NULL;

/* wait till all queued set_rx_mode tasks completes if any */
drain_workqueue(nic->nicvf_rx_mode_wq);
if (nic->nicvf_rx_mode_wq)
drain_workqueue(nic->nicvf_rx_mode_wq);

netif_carrier_off(netdev);

Expand Down Expand Up @@ -1550,10 +1552,12 @@ int nicvf_open(struct net_device *netdev)
/* Send VF config done msg to PF */
nicvf_send_cfg_done(nic);

INIT_DELAYED_WORK(&nic->link_change_work,
nicvf_link_status_check_task);
queue_delayed_work(nic->nicvf_rx_mode_wq,
&nic->link_change_work, 0);
if (nic->nicvf_rx_mode_wq) {
INIT_DELAYED_WORK(&nic->link_change_work,
nicvf_link_status_check_task);
queue_delayed_work(nic->nicvf_rx_mode_wq,
&nic->link_change_work, 0);
}

return 0;
cleanup:
Expand Down
30 changes: 14 additions & 16 deletions drivers/net/ethernet/cavium/thunder/nicvf_queues.c
Original file line number Diff line number Diff line change
Expand Up @@ -105,20 +105,19 @@ static inline struct pgcache *nicvf_alloc_page(struct nicvf *nic,
/* Check if page can be recycled */
if (page) {
ref_count = page_ref_count(page);
/* Check if this page has been used once i.e 'put_page'
* called after packet transmission i.e internal ref_count
* and page's ref_count are equal i.e page can be recycled.
/* This page can be recycled if internal ref_count and page's
* ref_count are equal, indicating that the page has been used
* once for packet transmission. For non-XDP mode, internal
* ref_count is always '1'.
*/
if (rbdr->is_xdp && (ref_count == pgcache->ref_count))
pgcache->ref_count--;
else
page = NULL;

/* In non-XDP mode, page's ref_count needs to be '1' for it
* to be recycled.
*/
if (!rbdr->is_xdp && (ref_count != 1))
if (rbdr->is_xdp) {
if (ref_count == pgcache->ref_count)
pgcache->ref_count--;
else
page = NULL;
} else if (ref_count != 1) {
page = NULL;
}
}

if (!page) {
Expand Down Expand Up @@ -365,11 +364,10 @@ static void nicvf_free_rbdr(struct nicvf *nic, struct rbdr *rbdr)
while (head < rbdr->pgcnt) {
pgcache = &rbdr->pgcache[head];
if (pgcache->page && page_ref_count(pgcache->page) != 0) {
if (!rbdr->is_xdp) {
put_page(pgcache->page);
continue;
if (rbdr->is_xdp) {
page_ref_sub(pgcache->page,
pgcache->ref_count - 1);
}
page_ref_sub(pgcache->page, pgcache->ref_count - 1);
put_page(pgcache->page);
}
head++;
Expand Down
9 changes: 8 additions & 1 deletion drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c
Original file line number Diff line number Diff line change
Expand Up @@ -354,7 +354,10 @@ static struct cxgbi_ppm_pool *ppm_alloc_cpu_pool(unsigned int *total,
ppmax = max;

/* pool size must be multiple of unsigned long */
bmap = BITS_TO_LONGS(ppmax);
bmap = ppmax / BITS_PER_TYPE(unsigned long);
if (!bmap)
return NULL;

ppmax = (bmap * sizeof(unsigned long)) << 3;

alloc_sz = sizeof(*pools) + sizeof(unsigned long) * bmap;
Expand Down Expand Up @@ -402,6 +405,10 @@ int cxgbi_ppm_init(void **ppm_pp, struct net_device *ndev,
if (reserve_factor) {
ppmax_pool = ppmax / reserve_factor;
pool = ppm_alloc_cpu_pool(&ppmax_pool, &pool_index_max);
if (!pool) {
ppmax_pool = 0;
reserve_factor = 0;
}

pr_debug("%s: ppmax %u, cpu total %u, per cpu %u.\n",
ndev->name, ppmax, ppmax_pool, pool_index_max);
Expand Down
4 changes: 3 additions & 1 deletion drivers/net/ethernet/hisilicon/hns/hnae.c
Original file line number Diff line number Diff line change
Expand Up @@ -150,7 +150,6 @@ static int hnae_alloc_buffers(struct hnae_ring *ring)
/* free desc along with its attached buffer */
static void hnae_free_desc(struct hnae_ring *ring)
{
hnae_free_buffers(ring);
dma_unmap_single(ring_to_dev(ring), ring->desc_dma_addr,
ring->desc_num * sizeof(ring->desc[0]),
ring_to_dma_dir(ring));
Expand Down Expand Up @@ -183,6 +182,9 @@ static int hnae_alloc_desc(struct hnae_ring *ring)
/* fini ring, also free the buffer for the ring */
static void hnae_fini_ring(struct hnae_ring *ring)
{
if (is_rx_ring(ring))
hnae_free_buffers(ring);

hnae_free_desc(ring);
kfree(ring->desc_cb);
ring->desc_cb = NULL;
Expand Down
2 changes: 1 addition & 1 deletion drivers/net/ethernet/hisilicon/hns/hnae.h
Original file line number Diff line number Diff line change
Expand Up @@ -357,7 +357,7 @@ struct hnae_buf_ops {
};

struct hnae_queue {
void __iomem *io_base;
u8 __iomem *io_base;
phys_addr_t phy_base;
struct hnae_ae_dev *dev; /* the device who use this queue */
struct hnae_ring rx_ring ____cacheline_internodealigned_in_smp;
Expand Down
2 changes: 1 addition & 1 deletion drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
Original file line number Diff line number Diff line change
Expand Up @@ -370,7 +370,7 @@ int hns_mac_clr_multicast(struct hns_mac_cb *mac_cb, int vfn)
static void hns_mac_param_get(struct mac_params *param,
struct hns_mac_cb *mac_cb)
{
param->vaddr = (void *)mac_cb->vaddr;
param->vaddr = mac_cb->vaddr;
param->mac_mode = hns_get_enet_interface(mac_cb);
ether_addr_copy(param->addr, mac_cb->addr_entry_idx[0].addr);
param->mac_id = mac_cb->mac_id;
Expand Down
Loading

0 comments on commit 0548740

Please sign in to comment.