Commits · 2d0f0ca2c7b56c1df29429dd5a768fc49e79ffae · Kirill Smelkov / linux

18 Oct, 2018 40 commits

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 2d0f0ca2

David S. Miller authored Oct 18, 2018

Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2018-10-17

This series adds support for the new igc driver.

The igc driver is the new client driver supporting the Intel I225
Ethernet Controller, which supports 2.5GbE speeds.  The reason for
creating a new client driver, instead of adding support for the new
device in e1000e, is that the silicon behaves more like devices
supported in the igb driver.  It also did not make sense to add a client
part to the igb driver, which supports only 1GbE server parts.

This initial set of patches is designed for basic support (i.e. link and
pass traffic).  Follow-on patch series will add more advanced support
like VLAN, Wake-on-LAN, etc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-updates-2018-10-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 99e9acd8

David S. Miller authored Oct 18, 2018

mlx5-updates-2018-10-17

========================================================================

From Or Gerlitz <ogerlitz@mellanox.com>:

This series from Paul adds support to mlx5 e-switch tc offloading of multiple priorities and chains.

This is made of four building blocks (along with a few minor driver refactors):

[1] Split FDB fast path prio to multiple namespaces

Currently the FDB name-space contains two priorities, fast path (p0) and slow path (p1).
The slow path contains the per representor SQ send-to-vport TX rule and the match-all
RX miss rule. As a pre-step to support multi-chains and priorities, we split the FDB fast path
to multiple namespaces (sub-namespaces), each with multiple priorities.

[2] E-Switch chains and priorities

A chain is a group of priorities. We use the fdb parallel sub-namespaces to implement chains,
and a flow table for each priority in them.

Because these namespaces are parallel and in series to the slow path
fdb, the chains aren't connected to each other (but to the slow path),
and one must use an explicit goto action to reach a different chain.

Flow tables for the priorities are created on demand and destroyed
once not used.

[3] Add a no-append flow insertion mode, use it for TC offloads

Enhance the driver fs core, such that if a no-append flag is set by the caller,
we add a new FTE, instead of appending the actions of the inserted rule when
the same match already exists.

For encap rules, we defer the HW offloading till we have a valid neighbor. This can
result in the packet hitting a lower priority rule in the HW DP. Use the no-append API
to push these packets to the slow path FDB table, so they go to the TC kernel DP as done
before priorities were supported.

[4] Offloading tc priorities and chains for eswitch flows

Using [1], [2] and [3] above we add the support for offloading both chains
and priorities. To get to a new chain, use the tc goto action. We support
a fixed prio range 1-16, and chains 0-3.
=============================================================================
Signed-off-by: David S. Miller <davem@davemloft.net>
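
The supported ranges stated above (prio 1-16, chains 0-3) can be sketched as a small range check. This is only an illustration: every name below is hypothetical, and the table index formula is an assumption made for the example, not the mlx5 driver's layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Ranges taken from the series description; names are hypothetical. */
#define DEMO_CHAIN_MAX  3U
#define DEMO_PRIO_MIN   1U
#define DEMO_PRIO_MAX   16U
#define DEMO_PRIO_COUNT (DEMO_PRIO_MAX - DEMO_PRIO_MIN + 1U)

/* True if the (chain, prio) pair falls inside the fixed supported range. */
static bool demo_chain_prio_supported(unsigned int chain, unsigned int prio)
{
	return chain <= DEMO_CHAIN_MAX &&
	       prio >= DEMO_PRIO_MIN && prio <= DEMO_PRIO_MAX;
}

/* Assumed flat numbering: one flow table per priority within each chain. */
static unsigned int demo_table_index(unsigned int chain, unsigned int prio)
{
	assert(demo_chain_prio_supported(chain, prio));
	return chain * DEMO_PRIO_COUNT + (prio - DEMO_PRIO_MIN);
}
```

A request outside these ranges would simply be rejected rather than offloaded in this sketch.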

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next · 8f18da47

David S. Miller authored Oct 18, 2018

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2018-10-18

1) Remove an unnecessary dev->tstats check in xfrmi_get_stats64.
   From Li RongQing.

2) We currently do a sizeof(element) instead of a sizeof(array)
   check when initializing the ovec array of the secpath.
   Currently this array can have only one element, so code is
   OK but error-prone. Change this to do a sizeof(array)
   check so that we can add more elements in future.
   From Li RongQing.

3) Improve xfrm IPv6 address hashing by using the complete IPv6
   addresses for a hash. From Michal Kubecek.

Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
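
Item 2 above is easier to see with a small example. The sketch below assumes the initialization is a memset over the array; the struct and function names are hypothetical stand-ins, not the kernel's secpath definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the secpath and its offload vector. */
struct demo_offload {
	int flags;
	int status;
};

struct demo_secpath {
	int len;
	struct demo_offload ovec[1];	/* one element today, maybe more later */
};

/* Clears the whole ovec array regardless of how many elements it has. */
static void demo_secpath_init(struct demo_secpath *sp)
{
	assert(sp != NULL);
	sp->len = 0;
	/* sizeof(sp->ovec) covers the array; sizeof(sp->ovec[0]) would
	 * cover only the first element once the array grows. */
	memset(sp->ovec, 0, sizeof(sp->ovec));
}
```

With a one-element array both sizes are equal, which is why the original code happened to work.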

net: skbuff.h: Mark expected switch fall-throughs · 82385b0d

Gustavo A. R. Silva authored Oct 17, 2018

In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
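
As an illustration of what such a marking looks like, here is a minimal sketch using a comment annotation, which GCC's -Wimplicit-fallthrough accepts at its default level. The function is a made-up example and is not taken from skbuff.h.

```c
#include <assert.h>

/* Hypothetical example: count the steps that run from a given stage on. */
static int demo_steps_from(int stage)
{
	int steps = 0;

	assert(stage >= 0);
	switch (stage) {
	case 0:
		steps++;
		/* fall through */
	case 1:
		steps++;
		/* fall through */
	case 2:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```

Without the comments, each case that runs into the next one would be reported once the warning is enabled.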

net: ena: enable Low Latency Queues · 9fd25592

Arthur Kiyanovski authored Oct 17, 2018

Use the new API to enable usage of LLQ.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: Fix Kconfig dependency on X86 · 8c590f97

Netanel Belgazal authored Oct 17, 2018

The Kconfig limitation of X86 is too wide.
The ENA driver only requires a little endian dependency.

Change the dependency to be on little endian CPU.
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp_bbr-TCP-BBR-changes-for-EDT-pacing-model' · a58598a4

David S. Miller authored Oct 17, 2018

Neal Cardwell says:

====================
tcp_bbr: TCP BBR changes for EDT pacing model

Two small patches for TCP BBR to follow up with Eric's recent work to change
the TCP and fq pacing machinery to an "earliest departure time" (EDT) model:

- The first patch adjusts the TCP BBR logic to work with the new
  "earliest departure time" (EDT) pacing model.

- The second patch adjusts the TCP BBR logic to centralize the setting
  of gain values, to simplify the code and prepare for future changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp_bbr: centralize code to set gains · cf33e25c

Neal Cardwell authored Oct 16, 2018

Centralize the code that sets gains used for computing cwnd and pacing
rate. This simplifies the code and makes it easier to change the state
machine or (in the future) dynamically change the gain values and
ensure that the correct gain values are always used.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp_bbr: adjust TCP BBR for departure time pacing · a87c83d5

Neal Cardwell authored Oct 16, 2018

Adjust TCP BBR for the new departure time pacing model in the recent
commit ab408b6d ("tcp: switch tcp and sch_fq to new earliest
departure time model").

With TSQ and pacing at lower layers, there are often several skbs
queued in the pacing layer, and thus there is less data "in the
network" than "in flight".

With departure time pacing at lower layers (e.g. fq or potential
future NICs), the data in the pacing layer now has a pre-scheduled
("baked-in") departure time that cannot be changed, even if the
congestion control algorithm decides to use a new pacing rate.

This means that there can be a non-trivial lag between when BBR makes
a pacing rate change and when the inter-skb pacing delays
change. After a pacing rate change, the number of packets in the
network can gradually evolve to be higher or lower, depending on
whether the sending rate is higher or lower than the delivery
rate. Thus ignoring this lag can cause significant overshoot, with the
flow ending up with too many or too few packets in the network.

This commit changes BBR to adapt its pacing rate based on the amount
of data in the network that it estimates has already been "baked in"
by previous departure time decisions. We estimate the number of our
packets that will be in the network at the earliest departure time
(EDT) for the next skb scheduled as:

   in_network_at_edt = inflight_at_edt - (EDT - now) * bw

If we're increasing the amount of data in the network ("in_network"),
then we want to know if the transmit of the EDT skb will push
in_network above the target, so our answer includes
bbr_tso_segs_goal() from the skb departing at EDT. If we're decreasing
in_network, then we want to know if in_network will sink too low just
before the EDT transmit, so our answer does not include the segments
from the skb departing at EDT.

Why do we treat the pacing_gain > 1.0 case and the pacing_gain < 1.0
case differently? The in_network curve is a step function: in_network
goes up on transmits, and down on ACKs. in_network will cross our
target value on different events, depending on whether we're concerned
about in_network potentially going too high or too low, so an accurate
prediction has to look at the right kind of event:

 o if pushing in_network up (pacing_gain > 1.0),
   then in_network goes above target upon a transmit event

 o if pushing in_network down (pacing_gain < 1.0),
   then in_network goes below target upon an ACK event

This commit changes the BBR state machine to use this estimated
"packets in network" value to make its decisions.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
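
The estimate described above can be sketched in plain C. This is a hedged illustration, not the kernel's implementation: the function name, the parameter names, and the units (nanoseconds, packets per second) are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: estimate packets in the network at the earliest
 * departure time (EDT) of the next skb.
 *   inflight_now     packets currently in flight
 *   tso_segs_goal    segments in the skb departing at EDT
 *   pacing_up        nonzero if pacing_gain > 1.0
 *   bw_pkts_per_sec  estimated delivery rate
 */
static uint64_t demo_in_network_at_edt(uint64_t inflight_now,
				       uint64_t tso_segs_goal,
				       int pacing_up,
				       uint64_t edt_ns, uint64_t now_ns,
				       uint64_t bw_pkts_per_sec)
{
	uint64_t interval_ns = edt_ns > now_ns ? edt_ns - now_ns : 0;
	uint64_t inflight_at_edt = inflight_now;
	uint64_t delivered;

	/* Sketch only: require that the product below fits in 64 bits. */
	assert(interval_ns == 0 || bw_pkts_per_sec <= UINT64_MAX / interval_ns);
	delivered = bw_pkts_per_sec * interval_ns / DEMO_NSEC_PER_SEC;

	/* When pushing in_network up, count the skb that departs at EDT. */
	if (pacing_up)
		inflight_at_edt += tso_segs_goal;

	return delivered >= inflight_at_edt ? 0 : inflight_at_edt - delivered;
}
```

For example, with 10 packets in flight, a rate of 1000 packets per second and an EDT 2 ms away, about 2 packets are expected to be delivered before EDT, so the estimate is 8 when decreasing and 10 when increasing with a 2-segment skb.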

net/ncsi: Add NCSI Broadcom OEM command · cb10c7c0

Vijay Khemka authored Oct 16, 2018

This patch adds OEM Broadcom commands and response handling. It also
defines OEM Get MAC Address handler to get and configure the device.

ncsi_oem_gma_handler_bcm: This handler sends the NCSI Broadcom command
for getting the MAC address.
ncsi_rsp_handler_oem_bcm: This handles the response received for all
Broadcom OEM commands.
ncsi_rsp_handler_oem_bcm_gma: This handles the get MAC address response
and sets the address on the device.
Signed-off-by: Vijay Khemka <vijaykhemka@fb.com>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mscc-fixes' · 1010c17e

David S. Miller authored Oct 17, 2018

Gustavo A. R. Silva says:

====================
fix signedness bug and memory leak in mscc driver

This patchset aims to fix a signedness bug in function
vsc85xx_downshift_get() and a memory leak in function
vsc8574_config_pre_init().

Changes in v3:
 - Add Quentin's Reviewed-by to commit log in patch 2/2.
 - Post the series to netdev.

Changes in v2:
 - Add Quentin's Reviewed-by to commit log in patch 1/2.
 - Jump to out label so all functions in the driver exit with the PHY
   set to access the standard page. Thanks to Quentin Schulz for
   pointing this out.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: fix memory leak in vsc8574_config_pre_init · 47d20212

Gustavo A. R. Silva authored Oct 16, 2018

In case memory resources for *fw* were successfully allocated,
release them before return.

Addresses-Coverity-ID: 1473968 ("Resource leak")
Fixes: 00d70d8e ("net: phy: mscc: add support for VSC8574 PHY")
Reviewed-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
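
The general shape of such a fix is a single exit path that always releases the buffer. The sketch below is a userspace analogy with hypothetical names; it uses malloc/free in place of the firmware request and release calls and is not the mscc driver's code.

```c
#include <assert.h>
#include <stdlib.h>

static int demo_live_buffers;	/* buffers currently allocated */

static void *demo_request_fw(size_t size)
{
	void *fw = malloc(size);

	if (fw)
		demo_live_buffers++;
	return fw;
}

static void demo_release_fw(void *fw)
{
	assert(fw != NULL && demo_live_buffers > 0);
	free(fw);
	demo_live_buffers--;
}

/* Returns 0 on success, negative on failure; fw is released on every path. */
static int demo_config_pre_init(int fail_midway)
{
	void *fw = demo_request_fw(64);
	int ret = 0;

	if (!fw)
		return -1;
	if (fail_midway) {
		ret = -2;
		goto out;	/* a plain "return ret" here would leak fw */
	}
	/* ... use fw ... */
out:
	demo_release_fw(fw);
	return ret;
}
```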

net: phy: mscc: fix signedness bug in vsc85xx_downshift_get · e519869a

Gustavo A. R. Silva authored Oct 16, 2018

Currently, the error handling for the call to function
phy_read_paged() doesn't work because *reg_val* is of
type u16 (16 bits, unsigned), which makes it impossible
for it to hold a value less than 0.

Fix this by changing the type of variable *reg_val* to int.

Addresses-Coverity-ID: 1473970 ("Unsigned compared against 0")
Fixes: 6a0bfbbe ("net: phy: mscc: migrate to phy_select/restore_page functions")
Reviewed-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix warning in af_unix · 33c4368e

Kyeongdon Kim authored Oct 16, 2018

This fixes the "'hash' may be used uninitialized in this function" warning:

net/unix/af_unix.c:1041:20: warning: 'hash' may be used uninitialized in this function [-Wmaybe-uninitialized]
addr->hash = hash ^ sk->sk_type;
Signed-off-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Fix 88E6141/6341 2500mbps SERDES speed · 26422340

Marek Behún authored Oct 13, 2018

This is a fix for the port_set_speed method for the Topaz family.
Currently the same method is used as for the Peridot family, but
this is wrong for the SERDES port.

On Topaz, the SERDES port is port 5, not 9 and 10 as in Peridot.
Moreover, setting alt_bit on Topaz only makes sense for port 0 (for
differentiating 100mbps vs 200mbps). The SERDES port does not
support more than 2500mbps, so alt_bit does not make any difference.
Signed-off-by: Marek Behún <marek.behun@nic.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'octeontx2-af-NPA-and-NIX-blocks-initialization' · e943d94e

David S. Miller authored Oct 17, 2018

Sunil Goutham says:

====================
octeontx2-af: NPA and NIX blocks initialization

This patchset is a continuation of the earlier submitted patch series
that adds a new driver for the Resource Virtualization Unit (RVU)
admin function of Marvell's OcteonTX2 SoC.

octeontx2-af: Add RVU Admin Function driver
https://www.spinics.net/lists/netdev/msg528272.html

This patch series adds logic for the following.
- Modified register polling loop to use time_before(jiffies, timeout),
  as suggested by Arnd Bergmann.
- Support to forward interface link status notifications sent by
  firmware to registered PFs mapped to a CGX::LMAC.
- Support to set CGX LMAC in loopback mode, retrieve stats,
  configure DMAC filters at CGX level etc.
- Network pool allocator (NPA) functional block initialization,
  admin queue support, NPALF aura/pool contexts memory allocation, init
  and deinit.
- Network interface controller (NIX) functional block basic init,
  admin queue support, NIXLF RQ/CQ/SQ HW contexts memory allocation,
  init and deinit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Support for disabling NIX RQ/SQ/CQ contexts · 557dd485

Geetha sowjanya authored Oct 16, 2018

This patch adds support for a RVU PF/VF to disable all RQ/SQ/CQ
contexts of a NIX LF via mbox. This will be used by PF/VF drivers
upon teardown or while freeing up HW resources.

A HW context which is not INIT'ed cannot be modified and a
RVU PF/VF driver may or may not INIT all the RQ/SQ/CQ contexts.
So a bitmap is introduced to keep track of enabled NIX RQ/SQ/CQ
contexts, so that only enabled hw contexts are disabled upon LF
teardown.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: NIX AQ instruction enqueue support · ffb0abd7

Sunil Goutham authored Oct 16, 2018

Add support for a RVU PF/VF to submit instructions to NIX AQ
via mbox. Instructions can be to init/write/read RQ/SQ/CQ/RSS
contexts. In case of read, context will be returned as part of
response to the mbox msg received.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Alloc bitmaps for NIX Tx scheduler queues · 709a4f0c

Sunil Goutham authored Oct 16, 2018

Allocate bitmaps and memory for PFVF mapping info for
maintaining NIX transmit scheduler queues.
PF/VF drivers will request alloc, free, etc. of
Tx schedulers via mailbox.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: NIX LSO config for TSOv4/v6 offload · 59360e98

Sunil Goutham authored Oct 16, 2018

Config LSO formats for TSOv4 and TSOv6 offloads.
These formats tell HW which fields in the TCP packet's
headers have to be updated while performing segmentation
offload.

Also report the LSO format indices to PF/VF drivers as part
of the response to the NIX_LF_ALLOC mbox msg. These indices are
used in SQE extension headers while framing SQE for pkt
transmission with TSO offload.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: NIX block LF initialization · cb30711a

Sunil Goutham authored Oct 16, 2018

Upon receiving NIX_LF_ALLOC mbox message allocate memory for
NIXLF's CQ, SQ, RQ, CINT, QINT and RSS HW contexts and configure
respective base iova HW. Enable caching of contexts into NIX NDC.

Return info such as the SQ buffer (SQB) size and this PF/VF's MAC
address to the mbox msg sender.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: NIX block admin queue init · aba53d5d

Sunil Goutham authored Oct 16, 2018

Initialize NIX admin queue (AQ), i.e. alloc memory for
AQ instructions and for the results. All NIX LFs will submit
instructions to AQ to init/write/read RQ/SQ/CQ/RSS contexts
and in case of read, get context from result memory.

Also before configuring/using NIX block calibrate X2P bus
and check if NIX interfaces like CGX and LBK are in active
and working state.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Support for disabling NPA Aura/Pool contexts · 57856dde

Geetha sowjanya authored Oct 16, 2018

This patch adds support for a RVU PF/VF to disable all Aura/Pool
contexts of a NPA LF via mbox. This will be used by PF/VF drivers
upon teardown or while freeing up HW resources.

A HW context which is not INIT'ed cannot be modified and a
RVU PF/VF driver may or may not INIT all the Aura/Pool contexts.
So a bitmap is introduced to keep track of enabled NPA Aura/Pool
contexts, so that only enabled hw contexts are disabled upon LF
teardown.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: NPA AQ instruction enqueue support · 4a3581cd

Sunil Goutham authored Oct 16, 2018

Add support for a RVU PF/VF to submit instructions to NPA AQ
via mbox. Instructions can be to init/write/read Aura/Pool/Qint
contexts. In case of read, context will be returned as part of
response to the mbox msg received.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: NPA block LF initialization · 3fa4c323

Sunil Goutham authored Oct 16, 2018

Upon receiving NPA_LF_ALLOC mbox message allocate memory for
NPALF's aura, pool and qint contexts and configure the same
to HW. Enable caching of contexts into NPA NDC.

Return pool related info like stack size, num pointers per
stack page, etc. to the mbox msg sender.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: NPA block admin queue init · 7a37245e

Sunil Goutham authored Oct 16, 2018

Initialize NPA admin queue (AQ), i.e. alloc memory for
AQ instructions and for the results. All NPA LFs will submit
instructions to AQ to init/write/read Aura/Pool contexts
and in case of read, get context from result memory.

Added some common APIs for allocating memory for a queue
and get IOVA in return, these APIs will be used by
NIX AQ and for other purposes.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Enable or disable CGX internal loopback · 23999b30

Geetha sowjanya authored Oct 16, 2018

Add support to enable or disable internal loopback mode in CGX.
New mbox IDs CGX_INTLBK_ENABLE/DISABLE added for this.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Forward CGX link notifications to PFs · 61071a87

Linu Cherian authored Oct 16, 2018

Upon receiving notification from firmware the CGX event handler
in the AF driver gets the current link info such as status, speed,
duplex etc from CGX driver and sends it across to PFs who have
registered to receive such notifications.

To support the above:
 - Mbox messaging support for sending msgs from AF to PF has been added.
 - Added mbox msgs so that PFs can register/unregister for link events.
 - Link notifications are sent to a PF under two scenarios:
   1. When an asynchronous link change notification is received from
      firmware with the notification flag turned on for that PF.
   2. Upon a notification turn-on request, the current link status is
      sent to the PF.

Also added a new mailbox msg using which RVU PF/VF can retrieve
their mapped CGX LMAC's current link info. Link info includes
status, speed, duplex and lmac type.
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Support for MAC address filters in CGX · 96be2e0d

Vidhya Raman authored Oct 16, 2018

This patch adds support for setting MAC address filters in CGX
for PF interfaces. Also PF interfaces can be put in promiscuous
mode. Dataplane PFs access this functionality using mailbox
messages to the AF driver.
Signed-off-by: Vidhya Raman <vraman@marvell.com>
Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Support to retrieve CGX LMAC stats · 66208910

Christina Jacob authored Oct 16, 2018

This patch adds support for a RVU PF/VF driver to retrieve
its mapped CGX LMAC Rx and Tx stats from AF via mbox.
A new mailbox msg is added for this.
Signed-off-by: Christina Jacob <cjacob@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: CGX Rx/Tx enable/disable mbox handlers · 1435f66a

Sunil Goutham authored Oct 16, 2018

Added new mailbox msgs for RVU PF/VFs to request AF
to enable/disable their mapped CGX::LMAC Rx & Tx.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Improve register polling loop · 6ca3ee2f

Sunil Goutham authored Oct 16, 2018

Instead of looping on an integer timeout, use time_before(jiffies, timeout),
so that maximum poll time is capped.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
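
The polling pattern can be sketched in userspace with a simulated tick counter. demo_time_before() below mirrors the wraparound-safe comparison that the kernel's time_before() performs; demo_jiffies, demo_poll() and the one-tick-per-iteration behaviour are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* True if tick a is earlier than tick b, also across counter wraparound. */
static bool demo_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

static unsigned long demo_jiffies;	/* simulated tick counter */

/*
 * Poll until the simulated hardware becomes ready at tick ready_at, or
 * until timeout_ticks have elapsed. Each loop iteration costs one tick.
 * Returns 0 when ready, -1 on timeout.
 */
static int demo_poll(unsigned long ready_at, unsigned long timeout_ticks)
{
	unsigned long timeout = demo_jiffies + timeout_ticks;

	assert((long)timeout_ticks > 0);
	while (demo_time_before(demo_jiffies, timeout)) {
		if (!demo_time_before(demo_jiffies, ready_at))
			return 0;
		demo_jiffies++;
	}
	return -1;
}
```

Because the deadline is expressed in time rather than in loop iterations, the maximum wait does not depend on how long each iteration takes.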

Merge branch 'mlxsw-Add-VxLAN-support' · 53e50a6e

David S. Miller authored Oct 17, 2018

Ido Schimmel says:

====================
mlxsw: Add VxLAN support

This patchset adds support for VxLAN offload in the mlxsw driver.

With regard to the forwarding plane, VxLAN support is composed of two
main parts: Encapsulation and decapsulation.

In the device, NVE encapsulation (and VxLAN in particular) takes place
in the bridge. A packet can be encapsulated using VxLAN either because
it hit an FDB entry that forwards it to the router with the IP of the
remote VTEP or because it was flooded, in which case it is sent to a
list of remote VTEPs (in addition to local ports). In either case, the
VNI is derived from the filtering identifier (FID) the packet was
classified to at ingress and the underlay source IP is taken from a
device global configuration.

VxLAN decapsulation takes place in the underlay router, where packets
that hit a local route that corresponds to the source IP of the local
VTEP are decapsulated and injected to the bridge. The packets are
classified to a FID based on the VNI they came with.

The first six patches export the required APIs in the VxLAN and mlxsw
drivers in order to allow for the introduction of the NVE core in the
next two patches. The NVE core is designed to support a variety of NVE
encapsulations (e.g., VxLAN, NVGRE) and different ASICs, but currently
only VxLAN and Spectrum are supported. Spectrum-2 support will be added
in the future.

The last 10 patches add support for VxLAN decapsulation and
encapsulation and include the addition of the required switchdev APIs in
the VxLAN driver. These APIs allow capable drivers to get a notification
about the addition / deletion of FDB entries to / from the VxLAN's FDB.

A subsequent patchset will add selftests (generic and mlxsw-specific),
data plane learning, FDB extack and vetoing and support for VLAN-aware
bridges (one VNI per VxLAN device model).

v2:
* Implement netif_is_vxlan() using rtnl_link_ops->kind (Jakub & Stephen)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
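
The v2 note refers to identifying a VxLAN netdev by the kind string of its link ops. A minimal userspace sketch of that idea follows; the two structs are simplified, hypothetical stand-ins for the kernel's net_device and rtnl_link_ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins; only the fields needed for the check. */
struct demo_link_ops {
	const char *kind;
};

struct demo_net_device {
	const struct demo_link_ops *rtnl_link_ops;	/* may be NULL */
};

/* A device counts as VxLAN if its link ops report the kind "vxlan". */
static bool demo_netif_is_vxlan(const struct demo_net_device *dev)
{
	assert(dev != NULL);
	return dev->rtnl_link_ops &&
	       !strcmp(dev->rtnl_link_ops->kind, "vxlan");
}
```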

mlxsw: spectrum_switchdev: Add support for VxLAN encapsulation · 1231e04f

Ido Schimmel authored Oct 17, 2018

In the device, VxLAN encapsulation takes place in the FDB table where
certain {MAC, FID} entries are programmed with an underlay unicast IP.
MAC addresses that are not programmed in the FDB are flooded to the
relevant local ports and also to a list of underlay unicast IPs that are
programmed using the all zeros MAC address in the VxLAN driver.

One difference between the hardware and software data paths is the fact
that in the software data path there are two FDB lookups prior to the
encapsulation of the packet. First in the bridge's FDB table using {MAC,
VID} and another in the VxLAN's FDB table using {MAC, VNI}.

Therefore, when a new VxLAN FDB entry is notified, it is only programmed
to the device if there is a corresponding entry in the bridge's FDB
table. Similarly, when a new bridge FDB entry pointing to the VxLAN
device is notified, it is only programmed to the device if there is a
corresponding entry in the VxLAN's FDB table.

Note that the above scheme will result in a discrepancy between both
data paths if only one FDB table is populated in the software data path.
For example, if only the bridge's FDB is populated with an entry
pointing to a VxLAN device, then a packet hitting the entry will only be
flooded by the kernel to remote VTEPs whereas the device will also flood
the packets to other local ports that are members of the VLAN.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
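
The programming rule described above (an entry reaches the device only while both software FDB tables contain it) can be sketched as a small state tracker. The names are hypothetical and the sketch ignores keys, locking and the actual device programming.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State of one logical entry across the two software FDB tables. */
struct demo_fdb_state {
	bool in_bridge_fdb;	/* bridge FDB entry pointing to the VxLAN device */
	bool in_vxlan_fdb;	/* matching entry in the VxLAN FDB */
	bool offloaded;		/* programmed to the device */
};

/* The device holds the entry only when both tables have it. */
static void demo_fdb_resync(struct demo_fdb_state *s)
{
	assert(s != NULL);
	s->offloaded = s->in_bridge_fdb && s->in_vxlan_fdb;
}

static void demo_bridge_fdb_event(struct demo_fdb_state *s, bool present)
{
	s->in_bridge_fdb = present;
	demo_fdb_resync(s);
}

static void demo_vxlan_fdb_event(struct demo_fdb_state *s, bool present)
{
	s->in_vxlan_fdb = present;
	demo_fdb_resync(s);
}
```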

mlxsw: spectrum: Enable VxLAN enslavement to bridges · 1c30d183

Ido Schimmel authored Oct 17, 2018

Enslavement of VxLAN devices to offloaded bridges was never forbidden by
mlxsw, but this patch makes sure the required configuration is performed
in order to allow VxLAN encapsulation and decapsulation to take place in
the device.

The patch handles both the case where a VxLAN device is enslaved to an
already offloaded bridge and the case where the first mlxsw port is
enslaved to a bridge that already has a VxLAN device configured.

Invalid configurations are sanitized and an error string is returned via
extack.

Since encapsulation and decapsulation do not occur when the VxLAN device
is down, the driver makes sure to enable / disable these functionalities
based on NETDEV_PRE_UP and NETDEV_DOWN events.

Note that NETDEV_PRE_UP is used in favor of NETDEV_UP, as the former
allows vetoing the operation, if necessary.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: switchdev: Allow clearing FDB entry offload indication · e9ba0fbc

Ido Schimmel authored Oct 17, 2018

Currently, an FDB entry only ceases being offloaded when it is deleted.
This changes with VxLAN encapsulation.

Devices capable of performing VxLAN encapsulation usually have only one
FDB table, unlike the software data path which has two - one in the
bridge driver and another in the VxLAN driver.

Therefore, bridge FDB entries pointing to a VxLAN device are only
offloaded if there is a corresponding entry in the VxLAN FDB.

Allow clearing the offload indication in case the corresponding entry
was deleted from the VxLAN FDB.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Notify for each remote of a removed FDB entry · 045a5a99

Petr Machata authored Oct 17, 2018

When notifications are sent about FDB activity, and an FDB entry with
several remotes is removed, the notification is sent only for the first
destination. That makes it impossible to distinguish between the case
where only this first remote is removed, and the one where the FDB entry
is removed as a whole.

Therefore send one notification for each remote of a removed FDB entry.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
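
The change amounts to walking the list of remotes when an FDB entry is removed and notifying once per remote. The sketch below assumes a singly linked list and records notifications in an output array; the types and names are hypothetical, not the vxlan driver's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical remote destination and FDB entry. */
struct demo_rdst {
	int remote_id;
	struct demo_rdst *next;
};

struct demo_fdb {
	struct demo_rdst *remotes;
};

/*
 * Record one removal notification per remote of the entry, up to cap.
 * Returns how many notifications were recorded.
 */
static size_t demo_fdb_notify_removed(const struct demo_fdb *f,
				      int *notified, size_t cap)
{
	const struct demo_rdst *rd;
	size_t n = 0;

	assert(f != NULL && notified != NULL);
	for (rd = f->remotes; rd != NULL && n < cap; rd = rd->next)
		notified[n++] = rd->remote_id;
	return n;
}
```

A listener that sees every remote in the removal notifications can tell a whole-entry removal apart from the removal of only the first remote.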

vxlan: Support marking RDSTs as offloaded · 0efe1173

Petr Machata authored Oct 17, 2018

Offloaded bridge FDB entries are marked with NTF_OFFLOADED. Implement a
similar mechanism for VXLAN, where a given remote destination can be
marked as offloaded.

To that end, introduce a new event, SWITCHDEV_VXLAN_FDB_OFFLOADED,
through which the marking is communicated to the vxlan driver. To
identify which RDST should be marked as offloaded, a
switchdev_notifier_vxlan_fdb_info is passed to the listeners. The
"offloaded" flag in that object determines whether the offloaded mark
should be set or cleared.

When sending offloaded FDB entries over netlink, mark them with
NTF_OFFLOADED.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Add vxlan_fdb_find_uc() for FDB querying · 1941f1d6

Petr Machata authored Oct 17, 2018

A switchdev-capable driver that is aware of VXLAN may need to query
VXLAN FDB. In the particular case of mlxsw, this functionality is
limited to querying UC FDBs. Those being easier to deal with than the
general case of RDST chain traversal, introduce an interface to query
specifically UC FDBs: vxlan_fdb_find_uc().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Add switchdev notifications · 9a997353

Petr Machata authored Oct 17, 2018

When offloading VXLAN devices, drivers need to know about events in
the VXLAN FDB. Since VXLAN models a bridge, it is natural to
distribute the VXLAN FDB notifications using the pre-existing switchdev
notification mechanism.

To that end, introduce two new notification types:
SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE and SWITCHDEV_VXLAN_FDB_DEL_TO_DEVICE.
Introduce a new function, vxlan_fdb_switchdev_call_notifiers() to send
the new notifier types, and a struct switchdev_notifier_vxlan_fdb_info
to communicate the details of the FDB entry under consideration.

Invoke the new function from vxlan_fdb_notify().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
