Commits · 0c235113b3c42197dba66baf76697359b03a5046 · nexedi / linux

25 May, 2018 40 commits

sfc: stop the TX queue before pushing new buffers · 0c235113

Martin Habets authored May 24, 2018

efx_enqueue_skb() can push new buffers for the xmit_more functionality.
We must stops the TX queue before this or else the TX queue does not get
restarted and we get a netdev watchdog.

In the error handling we may now need to unwind more than 1 packet, and
we may need to push the new buffers onto the partner queue.

v2: In the error leg also push this queue if xmit_more is set

Fixes: e9117e50 ("sfc: Firmware-Assisted TSO version 2")
Reported-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Martin Habets <mhabets@solarflare.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c235113

net: bridge: add support for port isolation · 7d850abd

Nikolay Aleksandrov authored May 24, 2018

This patch adds support for a new port flag - BR_ISOLATED. If it is set
then isolated ports cannot communicate between each other, but they can
still communicate with non-isolated ports. The same can be achieved via
ACLs but they can't scale with large number of ports and also the
complexity of the rules grows. This feature can be used to achieve
isolated vlan functionality (similar to pvlan) as well, though currently
it will be port-wide (for all vlans on the port). The new test in
should_deliver uses data that is already cache hot and the new boolean
is used to avoid an additional source port test in should_deliver.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

7d850abd

Merge branch 'nfp-offload-LAG-for-tc-flower-egress' · 9c590490

David S. Miller authored May 24, 2018

Jakub Kicinski says:

====================
nfp: offload LAG for tc flower egress

This series from John adds bond offload to the nfp driver.  Patch 5
exposes the hash type for NETDEV_LAG_TX_TYPE_HASH to make sure nfp
hashing matches that of the software LAG.  This may be unnecessarily
conservative, let's see what LAG maintainers think :)

John says:

This patchset sets up the infrastructure and offloads output actions for
when a TC flower rule attempts to egress a packet to a LAG port.

Firstly it adds some of the infrastructure required to the flower app and
to the nfp core. This includes the ability to change the MAC address of a
repr, a function for combining lookup and write to a FW symbol, and the
addition of private data to a repr on a per app basis.

Patch 6 continues by implementing notifiers that track Linux bonds and
communicates to the FW those which enslave reprs, along with the current
state of reprs within the bond.

Patch 7 ensures bonds are synchronised with FW by receiving and acting
upon cmsgs sent to the kernel. These may request that a bond message is
retransmitted when FW can process it, or may request a full sync of the
bonds defined in the kernel.

Patch 8 offloads a flower action when that action requires egressing to a
pre-defined Linux bond.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9c590490

nfp: flower: compute link aggregation action · 7e24a593

John Hurley authored May 23, 2018

If the egress device of an offloaded rule is a LAG port, then encode the
output port to the NFP with a LAG identifier and the offloaded group ID.

A prelag action is also offloaded which must be the first action of the
series (although may appear after other pre-actions - e.g. tunnels). This
causes the FW to check that it has the necessary information to output to
the requested LAG port. If it does not, the packet is sent to the kernel
before any other actions are applied to it.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e24a593

nfp: flower: implement host cmsg handler for LAG · 2e1cc522

John Hurley authored May 23, 2018

Adds the control message handler to synchronize offloaded group config
with that of the kernel. Such messages are sent from fw to driver and
feature the following 3 flags:

- Data: an attached cmsg could not be processed - store for retransmission
- Xon: FW can accept new messages - retransmit any stored cmsgs
- Sync: full sync requested so retransmit all kernel LAG group info
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2e1cc522

nfp: flower: monitor and offload LAG groups · bb9a8d03

John Hurley authored May 23, 2018

Monitor LAG events via the NETDEV_CHANGEUPPER/NETDEV_CHANGELOWERSTATE
notifiers to maintain a list of offloadable groups. Sync these groups with
HW via a delayed workqueue to prevent excessive re-configuration. When the
workqueue is triggered it may generate multiple control messages for
different groups. These messages are linked via a batch ID and flags to
indicate a new batch and the end of a batch.

Update private data in each repr to track their LAG lower state flags. The
state of a repr is used to determine the active netdevs that can be
offloaded. For example, in active-backup mode, we only offload the netdev
currently active.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bb9a8d03

net: include hash policy in LAG changeupper info · f44aa9ef

John Hurley authored May 23, 2018

LAG upper event notifiers contain the tx type used by the LAG device.
Extend this to also include the hash policy used for tx types that
utilize hashing.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f44aa9ef

nfp: flower: add per repr private data for LAG offload · b9452452

John Hurley authored May 23, 2018

Add a bitmap to each flower repr to track its state if it is enslaved by a
bond. This LAG state may be different to the port state - for example, the
port may be up but LAG state may be down due to the selection in an
active/backup bond.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b9452452

nfp: flower: check for/turn on LAG support in firmware · 898bc7d6

John Hurley authored May 23, 2018

Check if the fw contains the _abi_flower_balance_sync_enable symbol. If it
does then write a 1 to this indicating that the driver is willing to
receive NIC to kernel LAG related control messages.

If the write is successful, update the list of extra features supported by
the fw and add a stub to accept LAG cmsgs.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

898bc7d6

nfp: nfpcore: add rtsym writing function · 1945ca7a

John Hurley authored May 23, 2018

Add an rtsym API function that combines the lookup of a symbol and the
writing of a value to it. Values can be written as unsigned 32 or 64 bits.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1945ca7a

nfp: add ndo_set_mac_address for representors · 24f132e2

John Hurley authored May 23, 2018

Adding a netdev to a bond requires that its mac address can be modified.
The default eth_mac_addr is sufficient to satisfy this requirement.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24f132e2

hv_netvsc: fix bogus ifalias on network device · d97cde6a

Stephen Hemminger authored May 23, 2018

If the guest network adapter is not configured with DeviceNaming
enabled on the host, then the query for friendly name will return
success but with a zero length name. Which then leads to a garbage value
(stack contents) for ifalias.

Fix is simple, just don't set name if  host doesn't return it.

Fixes: 0fe554a4 ("hv_netvsc: propogate Hyper-V friendly name into interface alias")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d97cde6a

Merge branch 'net-Update-fib_table_lookup-tracepoints' · de0a8267

David S. Miller authored May 24, 2018

David Ahern says:

====================
net: Update fib_table_lookup tracepoints

Update the FIB lookup tracepoints to include ip proto and port fields
from the flow struct. In the process make the IPv4 tracepoint inline
with IPv6 which is much easier to use and follow the lookup and result.

Remove the tracepoint in fib_validate_source which does not provide
value above the fib_table_lookup which immediately follows it.

v2
- move CREATE_TRACE_POINTS for the v6 tracepoint to route.c to handle
  its need for an internal function to convert route type to error and
  handle IPv6 as a module or builtin. Reported by kbuild robot.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

de0a8267

net/ipv4: Remove tracepoint in fib_validate_source · c949cbbb

David Ahern authored May 23, 2018

Tracepoint does not add value and the call to fib_lookup follows
it which shows the same information and the fib lookup result.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c949cbbb

net/ipv6: Udate fib6_table_lookup tracepoint · 30d444d3

David Ahern authored May 23, 2018

Commit bb0ad198 ("ipv6: fib6_rules: support for match on sport, dport
and ip proto") added support for protocol and ports to FIB rules.
Update the FIB lookup tracepoint to dump the parameters.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30d444d3

net/ipv4: Udate fib_table_lookup tracepoint · 9f323973

David Ahern authored May 23, 2018

Commit 4a2d73a4 ("ipv4: fib_rules: support match on sport, dport
and ip proto") added support for protocol and ports to FIB rules.
Update the FIB lookup tracepoint to dump the parameters.

In addition, make the IPv4 tracepoint similar to the IPv6 one where
the lookup parameters and result are dumped in 1 event. It is much
easier to use and understand the outcome of the lookup.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f323973

net_sched: switch to rcu_work · aaa908ff

Cong Wang authored May 23, 2018

Commit 05f0fe6b ("RCU, workqueue: Implement rcu_work") introduces
new API's for dispatching work in a RCU callback. Now we can just
switch to the new API's for tc filters. This could get rid of a lot
of code.

Cc: Tejun Heo <tj@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aaa908ff

Merge branch 'Mirroring-tests-involving-VLAN' · 1bb58d2d

David S. Miller authored May 24, 2018

Petr Machata says:

====================
Mirroring tests involving VLAN

This patchset tests mirror-to-gretap with various underlay
configurations involving VLAN netdevice in particular. Some of the tests
involve bridges as well, but tests aimed specifically at testing bridges
(i.e. FDB, STP) are not part of this patchset.

In patches #1-#6, the codebase is adapted to support the new tests.

In patch #7, a test for mirroring to VLAN is introduced.

Patches #8-#10 add three tests where VLAN is part of underlay path after
gretap encapsulation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1bb58d2d

selftests: forwarding: Test mirror-to-gre w/ UL 802.1d+VLAN · 181d95f8

Petr Machata authored May 24, 2018

Test for "tc action mirred egress mirror" that mirrors to GRE when the
underlay route points at an 802.1d bridge and packet egresses through a
VLAN device.

Besides testing basic connectivity, this also tests that the traffic is
properly tagged.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

181d95f8

selftests: forwarding: Test mirror-to-gre w/ UL VLAN · a08fb9f1

Petr Machata authored May 24, 2018

Test for "tc action mirred egress mirror" that mirrors to a gretap
netdevice whose underlay route points at a vlan device.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a08fb9f1

selftests: forwarding: Test mirror-to-gre w/ UL VLAN+802.1q · 0056042f

Petr Machata authored May 24, 2018

Test for "tc action mirred egress mirror" that mirrors to GRE when the
underlay route points at a vlan device on top of a bridge device with
vlan filtering (802.1q).
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0056042f

selftests: forwarding: Test mirror-to-vlan · 35388a6a

Petr Machata authored May 24, 2018

Test for "tc action mirred egress mirror" that mirrors to a vlan device.
- test_vlan() tests that the packets get mirrored
- test_tagged_vlan() tests that the mirrored packets have correct inner
  VLAN tag.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

35388a6a

selftests: forwarding: lib: Extract trap_{, un}install() · 87c0c046

Petr Machata authored May 24, 2018

A mirror-to-vlan test that's coming next needs to install the trap
unconditionally. Therefore extract from slow_path_trap_{,un}install()
a more generic functions trap_install() and trap_uninstall(), and covert
the former two to conditional wrappers around these.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

87c0c046

selftests: forwarding: mirror_gre_lib: Support VLAN · 1893150f

Petr Machata authored May 24, 2018

Add full_test_span_gre_dir_vlan_ips() and full_test_span_gre_dir_vlan()
to support mirror-to-gre tests that involve VLAN.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1893150f

selftests: forwarding: lib: Support VLAN devices · 0e7a504c

Petr Machata authored May 24, 2018

Add vlan_create() and vlan_destroy() to manage VLAN netdevices.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0e7a504c

selftests: forwarding: Add $h3's clsact to mirror_topo_lib.sh · 91bac7f9

Petr Machata authored May 24, 2018

Having a clsact qdisc on $h3 is useful in several tests, and will be
useful in more tests to come. Move the registration from all the tests
that need it into the topology file itself.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

91bac7f9

selftests: forwarding: mirror_gre_lib: Extract generic functions · d5ea2bfc

Petr Machata authored May 24, 2018

For non-GRE mirroring tests, a functions along the lines of
do_test_span_gre_dir_ips() and test_span_gre_dir_ips() are necessary,
but such that they don't assume tunnels are involved. Extract the code
from mirror_gre_lib.sh to mirror_lib.sh and convert to just use a given
device without assuming it's named "h3-$tundev". Convert the two
above-mentioned functions to wrappers that pass along the correct device
name.

Add test_span_dir() and fail_test_span_dir() to round up the API for use
by following patches.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d5ea2bfc

selftests: forwarding: Split mirror_gre_topo_lib.sh · 74ed089d

Petr Machata authored May 24, 2018

Move generic parts of mirror_gre_topo_lib.sh into a new file
mirror_topo_lib.sh. Reuse the functions in GRE topo, adding the tunnel
devices as necessary.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

74ed089d

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 90fed9c9

David S. Miller authored May 24, 2018

Alexei Starovoitov says:

====================
pull-request: bpf-next 2018-05-24

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Björn Töpel cleans up AF_XDP (removes rebind, explicit cache alignment from uapi, etc).

2) David Ahern adds mtu checks to bpf_ipv{4,6}_fib_lookup() helpers.

3) Jesper Dangaard Brouer adds bulking support to ndo_xdp_xmit.

4) Jiong Wang adds support for indirect and arithmetic shifts to NFP

5) Martin KaFai Lau cleans up BTF uapi and makes the btf_header extensible.

6) Mathieu Xhonneux adds an End.BPF action to seg6local with BPF helpers allowing
   to edit/grow/shrink a SRH and apply on a packet generic SRv6 actions.

7) Sandipan Das adds support for bpf2bpf function calls in ppc64 JIT.

8) Yonghong Song adds BPF_TASK_FD_QUERY command for introspection of tracing events.

9) other misc fixes from Gustavo A. R. Silva, Sirio Balmelli, John Fastabend, and Magnus Karlsson
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

90fed9c9

Merge branch 'ibmvnic-Failover-hardening' · 49a473f5

David S. Miller authored May 24, 2018

Thomas Falcon says:

====================
ibmvnic: Failover hardening

Introduce additional transport event hardening to handle
events during device reset. In the driver's current state,
if a transport event is received during device reset, it can
cause the device to become unresponsive as invalid operations
are processed as the backing device context changes. After
a transport event, the device expects a request to begin the
initialization process. If the driver is still processing
a previously queued device reset in this state, it is likely
to fail as firmware will reject any commands other than the
one to initialize the client driver's Command-Response Queue.

Instead of failing and becoming dormant, the driver will make
one more attempt to recover and continue operation. This is
achieved by setting a state flag, which if true will direct
the driver to clean up all allocated resources and perform
a hard reset in an attempt to bring the driver back to an
operational state.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

49a473f5

ibmvnic: Introduce hard reset recovery · 2770a798

Thomas Falcon authored May 23, 2018

Introduce a recovery hard reset to handle reset failure as a result of
change of device context following a transport event, such as a
backing device failover or partition migration. These operations reset
the device context to its initial state. If this occurs during a reset,
any initialization commands are likely to fail with an invalid state
error as backing device firmware requests reinitialization.

When this happens, make one more attempt by performing a hard reset,
which frees any resources currently allocated and performs device
initialization. If a transport event occurs during a device reset, a
flag is set which will trigger a new hard reset following the
completionof the current reset event.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2770a798

ibmvnic: Set resetting state at earliest possible point · 06e43d7f

Thomas Falcon authored May 23, 2018

Set device resetting state at the earliest possible point: as soon as a
reset is successfully scheduled. The reset state is toggled off when
all resets have been processed to completion.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06e43d7f

ibmvnic: Create separate initialization routine for resets · 8a348450

Thomas Falcon authored May 23, 2018

Instead of having one initialization routine for all cases, create
a separate, simpler function for standard initialization, such as during
device probe. Use the original initialization function to handle
device reset scenarios. The goal of this patch is to avoid having
a single, cluttered init function to handle all possible
scenarios.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a348450

ibmvnic: Handle error case when setting link state · ab5ec33b

Thomas Falcon authored May 23, 2018

If setting the link state is not successful, print a warning
with the resulting return code and return it to be handled
by the caller.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab5ec33b

ibmvnic: Return error code if init interrupted by transport event · 17c87058

Thomas Falcon authored May 23, 2018

If device init is interrupted by a failover, set the init return
code so that it can be checked and handled appropriately by the
init routine.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17c87058

ibmvnic: Check CRQ command return codes · 9c4eaabd

Thomas Falcon authored May 23, 2018

Check whether CRQ command is successful before awaiting a response
from the management partition. If the command was not successful, the
driver may hang waiting for a response that will never come.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9c4eaabd

ibmvnic: Introduce active CRQ state · 5153698e

Thomas Falcon authored May 23, 2018

Introduce an "active" state for a IBM vNIC Command-Response Queue. A CRQ
is considered active once it has initialized or linked with its partner by
sending an initialization request and getting a successful response back
from the management partition. Until this has happened, do not allow CRQ
commands to be sent other than the initialization request.

This change will avoid a protocol error in case of a device transport
event occurring during a initialization. When the driver receives a
transport event notification indicating that the backing hardware
has changed and needs reinitialization, any further commands other
than the initialization handshake with the VIOS management partition
will result in an invalid state error. Instead of sending a command
that will be returned with an error, print a warning and return an
error that will be handled by the caller.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5153698e

ibmvnic: Mark NAPI flag as disabled when released · c3f22415

Thomas Falcon authored May 23, 2018

Set adapter NAPI state as disabled if they are removed. This will allow
them to be enabled again if reallocated in case of a hard reset.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3f22415

Merge branch 'gretap-mirroring-selftests' · 180f848b

David S. Miller authored May 24, 2018

Petr Machata says:

====================
selftests: forwarding: Additions to mirror-to-gretap tests

This patchset is for a handful of edge cases in mirror-to-gretap
scenarios: removal of mirrored-to netdevice (#1), removal of underlay
route for tunnel remote endpoint (#2) and cessation of mirroring upon
removal of flower mirroring rule (#3).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

180f848b

selftests: forwarding: Test removal of mirroring · a96d81a2

Petr Machata authored May 23, 2018

Test that when flower-based mirror action is removed, mirroring stops.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a96d81a2