Commits · 47b02f7294a483387772a46931da942b2ca9d845 · Kirill Smelkov / linux

10 Sep, 2016 2 commits

dwc_eth_qos: do not register semi-initialized device · 47b02f72

Lars Persson authored Sep 08, 2016

We move register_netdev() to the end of dwceqos_probe() to close any
races where the netdev callbacks are called before the initialization
has finished.
Reported-by: Pavel Andrianov <andrianov@ispras.ru>
Signed-off-by: Lars Persson <larper@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: identify chunks that need to be fragmented at IP level · 7303a147

Marcelo Ricardo Leitner authored Sep 08, 2016

Previously, without GSO, it was easy to identify it: if the chunk didn't
fit and there was no data chunk in the packet yet, we could fragment at
IP level. So if there was an auth chunk and we were bundling a big data
chunk, it would fragment regardless of the size of the auth chunk. This
also works for the context of PMTU reductions.

But with GSO, we cannot distinguish such PMTU events anymore, as the
packet is allowed to exceed PMTU.

So we need another check to ensure that the chunk we are adding
actually fits the current PMTU. If it doesn't, trigger a flush and let
it be fragmented at IP level in the next round.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
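
The extra check described above can be sketched as a small standalone C
function. This is an illustration under assumptions, not the kernel code:
`chunk_fit_check`, `enum fit_result`, and the flat `overhead` parameter are
hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum fit_result {
	FIT_OK,		/* chunk can be appended to the current packet */
	FIT_PMTU_FULL,	/* flush first; chunk is fragmented at IP level later */
};

/*
 * Decide whether a single chunk still fits the current path MTU once the
 * per-packet header overhead is accounted for. With GSO the packet as a
 * whole may exceed the PMTU, so the chunk itself has to be checked.
 */
static enum fit_result chunk_fit_check(size_t chunk_len, size_t overhead,
				       size_t pmtu)
{
	assert(overhead <= pmtu);
	if (chunk_len > pmtu - overhead)
		return FIT_PMTU_FULL;
	return FIT_OK;
}
```

A caller would flush the pending packet when `FIT_PMTU_FULL` is returned and
retry the chunk in the next round.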

09 Sep, 2016 8 commits

Merge branch 'mlxsw-fixes' · 1b672f5f

David S. Miller authored Sep 09, 2016

Jiri Pirko says:

====================
mlxsw: couple of fixes

Couple of fixes from Ido and myself.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Set port type before setting its address · 3247ff2b

Ido Schimmel authored Sep 08, 2016

During port init, we currently set the port's type to Ethernet after
setting its MAC address. However, the hardware documentation states this
should be the other way around.

Align the driver with the hardware documentation and set the port's MAC
address after setting its type.

Fixes: 56ade8fe ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Fix error path in mlxsw_sp_router_init · 40d25904

Jiri Pirko authored Sep 08, 2016

When neigh_init fails, we have to do proper cleanup including
router_fini call.

Fixes: 6cf3c971 ("mlxsw: spectrum_router: Add private neigh table")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nfp-fixes' · 2c2c8e33

David S. Miller authored Sep 08, 2016

Jakub Kicinski says:

====================
nfp: fixes and trivial cleanup

First patch drops unnecessary version.h includes.  Second one
drops support for pre-release versions of FW ABI.  Removing
FW ABI 0.0 from supported set is particularly good since 0
could just be uninitialized memory.  Last but not least I drop
unnecessary padding of frames on RX which makes us count bytes
incorrectly for the VF2VF traffic.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: don't pad frames on receive · ebecefc8

Jakub Kicinski authored Sep 07, 2016

There is no need to pad frames to ETH_ZLEN on RX.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: drop support for old firmware ABIs · 313b345c

Jakub Kicinski authored Sep 07, 2016

Be more strict about FW versions.  Drop support for old
transitional revisions which were never used in production.
Dropping support for FW ABI version 0.0.0.0 is particularly
useful because 0 could just be uninitialized memory.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: remove linux/version.h includes · 312fada1

Jakub Kicinski authored Sep 07, 2016

Remove unnecessary version.h includes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: cwnd does not increase in TCP YeAH · db7196a0

Artem Germanov authored Sep 07, 2016

Commit 76174004
("tcp: do not slow start when cwnd equals ssthresh")
introduced a regression in TCP YeAH. On a virtual ethernet link with
100ms delay and 1% loss, kernel 4.2 shows ~500KB/s of bandwidth for a
single TCP connection, while kernel 4.3 and above (including 4.8-rc4)
show ~100KB/s.

That is caused by cwnd stalling when cwnd equals ssthresh. This patch
fixes it by properly increasing cwnd in this case.
Signed-off-by: Artem Germanov <agermanov@anchorfree.com>
Acked-by: Dmitry Adamushko <d.adamushko@anchorfree.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Sep, 2016 9 commits

Merge branch 'mlx5-fixes' · 81d1a366

David S. Miller authored Sep 08, 2016

Saeed Mahameed says:

====================
Mellanox 100G mlx5 fixes 2016-09-07

The following series contains bug fixes for the mlx5e driver.

From Gal,
	- Static code checker cleanup (casting overflow)
	- Fix global PFC counter statistics reading
	- Fix HW LRO when vlan stripping is off

From Bodong,
	- Deprecate old autoneg capability bit and use new one.

From Tariq,
	- Fix xmit more counter race condition
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Fix parsing of vlan packets when updating lro header · cd17d230

Gal Pressman authored Sep 07, 2016

Currently, VLAN-tagged packets are not parsed correctly
and are assumed to be regular IPv4/IPv6 packets.
We should check for 802.1Q/802.1ad tags and update the LRO header
accordingly.
This fixes the use case where LRO is on and rxvlan is off
(VLAN stripping is off).

Fixes: e586b3b0 ('net/mlx5: Ethernet Datapath files')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Fix global PFC counters replication · 4e39883d

Gal Pressman authored Sep 07, 2016

Currently when reading global PFC statistics we left the counter
iterator out of the equation and we ended up reading the same counter
over and over again.

Instead of reading the counter at index 0 on every iteration we now read
the counter at index (i).

Fixes: e989d5a5 ('net/mlx5e: Expose flow control counters to ethtool')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
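
The bug pattern described above can be shown in plain userspace C. This is a
minimal sketch, not the driver code: `read_counters_buggy`,
`read_counters_fixed`, and the `hw` array are hypothetical names standing in
for the real counter access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Buggy pattern: the loop index is never used on the source side, so
 * every output slot receives a copy of counter 0.
 */
static void read_counters_buggy(const uint64_t *hw, uint64_t *out, size_t n)
{
	assert(hw && out);
	for (size_t i = 0; i < n; i++)
		out[i] = hw[0];
}

/* Fixed pattern: read the counter at index i on each iteration. */
static void read_counters_fixed(const uint64_t *hw, uint64_t *out, size_t n)
{
	assert(hw && out);
	for (size_t i = 0; i < n; i++)
		out[i] = hw[i];
}
```

With three distinct source counters, the buggy variant reports the first
value three times, while the fixed variant reports each counter once.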

net/mlx5e: Prevent casting overflow · 7abc2110

Gal Pressman authored Sep 07, 2016

On 64-bit architectures unsigned long is longer than u32, so
casting to unsigned long will result in overflow.
We need to first declare an unsigned long variable, then assign the
wanted value to it.

Fixes: 665bc539 ('net/mlx5e: Use new ethtool get/set link ksettings API')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
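
The pattern behind this fix can be sketched in userspace C. The sketch assumes
the problem is a 32-bit value being handed to a helper that walks
`unsigned long` words; `count_bits` and `count_caps` are hypothetical helpers,
not mlx5e functions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Count the set bits among the first nbits of an unsigned long bitmap. */
static unsigned int count_bits(const unsigned long *map, unsigned int nbits)
{
	unsigned int n = 0;

	assert(map != NULL);
	for (unsigned int bit = 0; bit < nbits; bit++)
		if (map[bit / BITS_PER_LONG] & (1UL << (bit % BITS_PER_LONG)))
			n++;
	return n;
}

static unsigned int count_caps(uint32_t caps)
{
	/*
	 * Unsafe alternative: count_bits((unsigned long *)&caps, 32) makes
	 * the helper read sizeof(unsigned long) bytes from a 4-byte object.
	 * Widening by value into a real unsigned long avoids that.
	 */
	unsigned long tmp = caps;

	return count_bits(&tmp, 32);
}
```

The copy costs one assignment and keeps the helper reading only memory that
belongs to the variable it was given.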

net/mlx5e: Move an_disable_cap bit to a new position · e7e31ca4

Bodong Wang authored Sep 07, 2016

The previous an_disable_cap position, bit31, is deprecated for use in the
driver with newer firmware.  New firmware will advertise the same capability
in bit29.

Old capability didn't allow setting more than one protocol for a
specific speed when autoneg is off, while newer firmware will allow
this and it is indicated in the new capability location.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Fix xmit_more counter race issue · 0dbf657c

Tariq Toukan authored Sep 07, 2016

Update the xmit_more counter before notifying the HW,
to prevent a possible use-after-free of the skb.

Fixes: c8cf78fe ("net/mlx5e: Add ethtool counter for TX xmit_more")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fastopen: avoid negative sk_forward_alloc · 76061f63

Eric Dumazet authored Sep 07, 2016

When DATA and/or FIN are carried in a SYN/ACK message or SYN message,
we append an skb in socket receive queue, but we forget to call
sk_forced_mem_schedule().

The effect is that the socket has a negative sk->sk_forward_alloc as long
as the message is not read by the application.

Josh Hunt fixed a similar issue in commit d22e1537 ("tcp: fix tcp
fin memory accounting")

Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 40e3012e

David S. Miller authored Sep 08, 2016

Steffen Klassert says:

====================
ipsec 2016-09-08

1) Fix a crash when xfrm_dump_sa returns an error.
   From Vegard Nossum.

2) Remove some incorrect WARN() on normal error handling.
   From Vegard Nossum.

3) Ignore socket policies when rebuilding hash tables,
   socket policies are not inserted into the hash tables.
   From Tobias Brunner.

4) Initialize and check tunnel pointers properly before
   we use it. From Alexey Kodanev.

5) Fix l3mdev oif setting on xfrm dst lookups.
   From David Ahern.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: Update CPMAC email address · 9dd4aaef

Florian Fainelli authored Sep 06, 2016

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Sep, 2016 6 commits

ipv6: addrconf: fix dev refcont leak when DAD failed · 751eb6b6

Wei Yongjun authored Sep 05, 2016

In general, when DAD detects a duplicate IPv6 address, ifp->state
is set to INET6_IFADDR_STATE_ERRDAD and DAD is stopped by a
delayed work. The call tree looks like this:

ndisc_recv_ns
  -> addrconf_dad_failure        <- missing ifp put
     -> addrconf_mod_dad_work
       -> schedule addrconf_dad_work()
         -> addrconf_dad_stop()  <- missing ifp hold before call it

addrconf_dad_failure() is called with an ifp refcount held but never puts it.
addrconf_dad_work() calls addrconf_dad_stop() without holding an extra
refcount. This does not cause any issue normally.

But the race between addrconf_dad_failure() and addrconf_dad_work()
may cause an ifp refcount leak, and the netdevice then cannot be
unregistered. dmesg shows the following messages:

IPv6: eth0: IPv6 duplicate address fe80::XX:XXXX:XXXX:XX detected!
...
unregister_netdevice: waiting for eth0 to become free. Usage count = 1

Cc: stable@vger.kernel.org
Fixes: c15b1cca ("ipv6: move DAD and addrconf_verify processing to workqueue")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
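
The hold/put pairing marked in the call tree above can be modelled with a
plain counter. This is a simplified, single-threaded sketch under assumptions:
`struct ifaddr_ref` and all functions below are hypothetical stand-ins, and the
assumption that the stop step drops one reference is part of the model, not a
statement about the exact kernel code.

```c
#include <assert.h>
#include <stdbool.h>

struct ifaddr_ref {
	int refcnt;
	bool errdad;		/* duplicate address was detected */
	bool work_pending;
	bool stopped;
};

static void ref_hold(struct ifaddr_ref *r) { r->refcnt++; }

static void ref_put(struct ifaddr_ref *r)
{
	assert(r->refcnt > 0);
	r->refcnt--;
}

/* Model assumption: stopping DAD consumes one reference. */
static void dad_stop(struct ifaddr_ref *r)
{
	r->stopped = true;
	ref_put(r);
}

/* The scheduled work owns its own reference until it has run. */
static void schedule_dad_work(struct ifaddr_ref *r)
{
	ref_hold(r);
	r->work_pending = true;
}

/* Entered with the caller's reference held; drops it before returning. */
static void dad_failure(struct ifaddr_ref *r)
{
	r->errdad = true;
	schedule_dad_work(r);
	ref_put(r);		/* the put that was missing */
}

static void dad_work(struct ifaddr_ref *r)
{
	r->work_pending = false;
	if (r->errdad) {
		ref_hold(r);	/* the hold that was missing before the stop */
		dad_stop(r);
	}
	ref_put(r);		/* drop the work's own reference */
}
```

In this model every hold has a matching put, so the count returns to its base
value after the work has run and the device can be released.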

bnxt_en: Fix TX push operation on ARM64. · 9d13744b

Michael Chan authored Sep 05, 2016

There is a code path where we are calling __iowrite64_copy() on
an address that is not 64-bit aligned.  This causes an exception on
some architectures such as arm64.  Fix that code path by using
__iowrite32_copy().
Reported-by: JD Zheng <jiandong.zheng@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Don't delete routes in different VRFs · 5a56a0b3

Mark Tomlinson authored Sep 05, 2016

When deleting an IP address from an interface, there is a clean-up of
routes which refer to this local address. However, there was no check to
see that the VRF matched. This meant that deletion wasn't confined to
the VRF it should have been.

To solve this, a new field has been added to fib_info to hold a table
id. When removing fib entries corresponding to a local ip address, this
table id is also used in the comparison.

The table id is populated when the fib_info is created. This was already
done in some places, but not in ip_rt_ioctl(). This has now been fixed.

Fixes: 021dd3b8 ("net: Add routes to the table associated with the device")
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
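
The added comparison can be sketched in userspace C. This is an illustration
under assumptions: `struct route_info` is a hypothetical stand-in for
fib_info, and `mark_routes_dead` is an invented helper, not the kernel
function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for fib_info in this sketch. */
struct route_info {
	uint32_t tb_id;		/* table id, recorded when the entry is created */
	uint32_t prefsrc;	/* local address the route refers to */
	bool dead;
};

/*
 * Mark routes that refer to a removed local address as dead, but only
 * inside the table (VRF) the address belonged to. Returns the number of
 * routes marked.
 */
static int mark_routes_dead(struct route_info *routes, size_t n,
			    uint32_t tb_id, uint32_t local)
{
	int marked = 0;

	assert(routes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (routes[i].tb_id != tb_id)	/* the check that was missing */
			continue;
		if (routes[i].prefsrc != local)
			continue;
		routes[i].dead = true;
		marked++;
	}
	return marked;
}
```

Two routes with the same local address in different tables are now treated
independently: removing the address in one table leaves the other untouched.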

net: smsc: remove build warning of duplicate definition · daa7ee8d

Sudip Mukherjee authored Sep 04, 2016

The m32r build was giving these warnings:

In file included from drivers/net/ethernet/smsc/smc91x.c:92:0:
drivers/net/ethernet/smsc/smc91x.h:448:0: warning: "SMC_inb" redefined
 #define SMC_inb(ioaddr, reg)  ({ BUG(); 0; })

drivers/net/ethernet/smsc/smc91x.h:106:0:
	note: this is the location of the previous definition
 #define SMC_inb(a, r)  inb(((u32)a) + (r))

drivers/net/ethernet/smsc/smc91x.h:449:0: warning: "SMC_outb" redefined
 #define SMC_outb(x, ioaddr, reg) BUG()

drivers/net/ethernet/smsc/smc91x.h:108:0:
	note: this is the location of the previous definition
 #define SMC_outb(v, a, r) outb(v, ((u32)a) + (r))
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: initialize checksum when using checksum offloading · 007e4ba3

Helmut Buchsbaum authored Sep 04, 2016

I'm still struggling to get this fix right..

Changes since v2:
 - do not blindly modify SKB contents according to Dave's legitimate
   objection

Changes since v1:
 - dropped disabling HW checksum offload for Zynq
 - initialize checksum similar to net/ethernet/freescale/fec_main.c

-- >8 --
MACB/GEM needs the checksum field initialized to 0 to get correct
results on transmit in all cases. For example, on Zynq, UDP packets with
payload <= 2 otherwise contain a wrong checksum.
Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: release dst in ping_v6_sendmsg · 03c2778a

Dave Jones authored Sep 02, 2016

Neither the failure nor the success path of ping_v6_sendmsg releases
the dst it acquires.  This leads to a flood of warnings from
"net/core/dst.c:288 dst_release" on older kernels that
don't have 8bf4ada2 backported.

That patch optimistically hoped this had been fixed post 3.10, but
it seems at least one case wasn't, where I've seen this triggered
a lot from machines doing unprivileged icmp sockets.

Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Sep, 2016 6 commits

af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock' · 6e1ce3c3

Linus Torvalds authored Sep 01, 2016

Right now we use the 'readlock' both for protecting some of the af_unix
IO path and for making the bind be single-threaded.

The two are independent, but using the same lock makes for a nasty
deadlock due to ordering with regard to filesystem locking. The bind
locking would want to nest outside the VFS pathname locking, but the IO
locking wants to nest inside some of those same locks.

We tried to fix this earlier with commit c845acb3 ("af_unix: Fix
splice-bind deadlock") which moved the readlock inside the vfs locks,
but that caused problems with overlayfs that will then call back into
filesystem routines that take the lock in the wrong order anyway.

Splitting the locks means that we can go back to having the bind lock be
the outermost lock, and we don't have any deadlocks with lock ordering.
Acked-by: Rainer Weikusat <rweikusat@cyberadapt.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
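
The ordering argument above can be checked mechanically with a tiny lock-order
graph. This is a userspace sketch, not kernel code: the lock names and the
helpers `record_nesting` and `has_deadlock_risk` are hypothetical, and the
graph only models "outer lock is taken before inner lock".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { NLOCKS = 4 };
enum lock_id { READLOCK, VFSLOCK, BINDLOCK, IOLOCK };

/* before[a][b] is true when some path takes lock a and then lock b. */
struct order_graph {
	bool before[NLOCKS][NLOCKS];
};

static void graph_init(struct order_graph *g)
{
	memset(g, 0, sizeof(*g));
}

static void record_nesting(struct order_graph *g, enum lock_id outer,
			   enum lock_id inner)
{
	assert(outer != inner);
	g->before[outer][inner] = true;
}

/* Depth-first search: is target reachable from from? */
static bool reaches(const struct order_graph *g, int from, int target,
		    bool *seen)
{
	if (from == target)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;
	for (int next = 0; next < NLOCKS; next++)
		if (g->before[from][next] && reaches(g, next, target, seen))
			return true;
	return false;
}

/* A cycle in the ordering graph means two paths can deadlock. */
static bool has_deadlock_risk(const struct order_graph *g)
{
	for (int a = 0; a < NLOCKS; a++)
		for (int b = 0; b < NLOCKS; b++) {
			bool seen[NLOCKS] = { false };

			if (g->before[a][b] && reaches(g, b, a, seen))
				return true;
		}
	return false;
}
```

With a single readlock, bind records readlock before VFS and IO records VFS
before readlock, which forms a cycle. With the split locks the edges are
bindlock before VFS and VFS before iolock, and no cycle remains.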

Revert "af_unix: Fix splice-bind deadlock" · 38f7bd94

Linus Torvalds authored Sep 01, 2016

This reverts commit c845acb3.

It turns out that it just replaces one deadlock with another one: we can
still get the wrong lock ordering with the readlock due to overlayfs
calling back into the filesystem layer and still taking the vfs locks
after the readlock.

The proper solution ends up being to just split the readlock into two
pieces: the bind lock (taken *outside* the vfs locks) and the IO lock
(taken *inside* the filesystem locks).  The two locks are independent
anyway.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'vxlan-fixes' · 2f83a53a

David S. Miller authored Sep 04, 2016

Jiri Benc says:

====================
vxlan: fix error reporting

This patchset improves checking for invalid configuration in VXLAN and
fixes problems with duplicated and inappropriate error messages.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: fix duplicated and wrong error messages · 3555621d

Jiri Benc authored Sep 02, 2016

vxlan_dev_configure outputs error messages before returning, so there is no
need to print the same messages again in vxlan_newlink. Also,
vxlan_dev_configure may return a particular error code for a different
reason than vxlan_newlink thinks.

Move the remaining error messages into vxlan_dev_configure and let
vxlan_newlink just pass on the error code.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: reject multicast destination without an interface · 9b4cdd51

Jiri Benc authored Sep 02, 2016

Currently, kernel accepts configurations such as:

ip l a type vxlan dstport 4789 id 1 group 239.192.0.1
ip l a type vxlan dstport 4789 id 1 group ff0e::110

However, neither of those really works. In the IPv4 case, the interface
cannot be brought up ("RTNETLINK answers: No such device"). This is because
multicast join will be rejected without the interface being specified.

In the IPv6 case, multicast will be joined on the first interface found. This
is not what the user wants as it depends on random factors (order of
interfaces).

Note that it's possible to add a local address but it doesn't solve
anything. For IPv4, it's not considered in the multicast join (thus the same
error as above is returned on ifup). This could be added but it wouldn't
help for IPv6 anyway. For IPv6, we do need the interface.

Just reject a configuration that sets multicast address and does not provide
an interface. Nobody can depend on the previous behavior as it never worked.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
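
The validation rule can be sketched as standalone C. This is a minimal sketch
under assumptions: the helper names are hypothetical, addresses are passed in
host byte order for simplicity, and the `-EINVAL` return value is an
assumption about which error code is used.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* IPv4 multicast is 224.0.0.0/4 (address given in host byte order). */
static bool ipv4_is_mcast(uint32_t addr)
{
	return (addr & 0xf0000000u) == 0xe0000000u;
}

/* IPv6 multicast is ff00::/8. */
static bool ipv6_is_mcast(const uint8_t addr[16])
{
	return addr[0] == 0xff;
}

/*
 * Reject a multicast destination when no interface was given.
 * ifindex == 0 means "no interface specified" in this sketch.
 */
static int check_remote(bool remote_is_mcast, int ifindex)
{
	assert(ifindex >= 0);
	if (remote_is_mcast && ifindex == 0)
		return -EINVAL;
	return 0;
}
```

Both example commands above (group 239.192.0.1 and group ff0e::110 without a
`dev` argument) would be rejected by this check, while a unicast remote
without an interface is still accepted.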

bonding: Fix bonding crash · 24b27fc4

Mahesh Bandewar authored Sep 01, 2016

The following steps will crash the kernel:

  (a) Create bonding master
      > modprobe bonding miimon=50
  (b) Create macvlan bridge on eth2
      > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
	   type macvlan
  (c) Now try adding eth2 into the bond
      > echo +eth2 > /sys/class/net/bond0/bonding/slaves
      <crash>

Bonding does lots of things before checking whether the device being
enslaved is busy or not.

In this case, when the notifier call-chain sends notifications,
bond_netdev_event() assumes that the rx_handler/rx_handler_data is
registered, while bond_enslave() hasn't progressed far enough to
register the rx_handler for the new slave.

This patch adds a rx_handler check that can be performed right at the
beginning of the enslave code to avoid getting into this situation.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Sep, 2016 5 commits

Merge branch 'smsc911x-fixes' · 312565a0

David S. Miller authored Sep 02, 2016

Jeremy Linton says:

====================
net: smsc911x: Move phy and interrupt config

v2-v3: Move error handling into a separate patch, replace a couple of cases
 of fixed errors with the errors being returned from the failing functions.
 Hoist irq handler.

The smsc911x driver is doing a number of things in its probe routine that
should be delayed until the interface is started. Because of this, the module
cannot be unloaded, the phy states are incorrect/stale if the interface isn't
running, opens fail unnecessarily, causing network configuration problems, and
the /proc/irq nodes are incorrectly named.

Clean up a number of these problems by moving the mdio and interrupt
configuration into the smsc911x_open routine.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: smsc911x: Move interrupt allocation to open/stop · f252974e

Jeremy Linton authored Sep 01, 2016

The /proc/irq/xx information is incorrect for smsc911x because
request_irq is called before register_netdev has set the
proper device name. Moving it to open also fixes the case
where the device is renamed.
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: smsc911x: Move interrupt handler before open · a85f00c3

Jeremy Linton authored Sep 01, 2016

In preparation for allocating/enabling interrupts
in the ndo_open routine, move the irq handler before it.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: smsc911x: Fix register_netdev, phy startup, driver unload ordering · aea95dd5

Jeremy Linton authored Sep 01, 2016

Move phy startup/shutdown into the smsc911x_open/stop routines. This
allows the module to be unloaded because phy_connect_direct is no longer
always holding the module use count. This one change also resolves a
number of other problems.

The link status of a downed interface no longer reflects a stale state.
Errors caused by the net device being opened before the mdio/phy was
configured no longer occur. There is also a potential power saving, as the
phys don't remain powered when the interface isn't running.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: smsc911x: Remove multiple exit points from smsc911x_open · 1358bd5a

Jeremy Linton authored Sep 01, 2016

Rework the error handling in smsc911x open in preparation
for the mdio startup being moved here.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Sep, 2016 4 commits

l2tp: fix use-after-free during module unload · 2f86953e

Sabrina Dubroca authored Sep 02, 2016

Tunnel deletion is delayed by both a workqueue (l2tp_tunnel_delete -> wq
 -> l2tp_tunnel_del_work) and RCU (sk_destruct -> RCU ->
l2tp_tunnel_destruct).

By the time l2tp_tunnel_destruct() runs to destroy the tunnel and finish
destroying the socket, the private data reserved via the net_generic
mechanism has already been freed, but l2tp_tunnel_destruct() actually
uses this data.

Make sure tunnel deletion for the netns has completed before returning
from l2tp_exit_net() by first flushing the tunnel removal workqueue, and
then waiting for RCU callbacks to complete.

Fixes: 167eb17e ("l2tp: create tunnel sockets in the right namespace")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Don't unset flowi6_proto in ipxip6_tnl_xmit() · ab343801

Eli Cooper authored Aug 26, 2016

Commit 8eb30be0 ("ipv6: Create ip6_tnl_xmit") unsets
flowi6_proto in ip4ip6_tnl_xmit() and ip6ip6_tnl_xmit().
Since xfrm_selector_match() relies on this info, IPv6 packets
sent by an ip6tunnel cannot be properly selected by their
protocols after removing it. This patch puts flowi6_proto back.

Cc: stable@vger.kernel.org
Fixes: 8eb30be0 ("ipv6: Create ip6_tnl_xmit")
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: don't reset chip on cleanup if PCI function is offline · b44e108b

Guilherme G. Piccoli authored Aug 31, 2016

When a PCI error is detected, on some architectures (like PowerPC) a slot
reset is performed. The driver's error handlers are in charge of disabling
the device before the reset and re-enabling it after a successful slot reset.

There are two cases, though, in which another code path is taken: if the
slot reset is not successful, or if too many errors have already happened on
the specific adapter (meaning that the device is possibly experiencing a HW
failure that a slot reset cannot solve), the core PCI error mechanism
(called EEH on PowerPC) will remove the adapter from the system, since it
considers this a permanent failure of the device. In this case, a path
is taken that leads to bnx2x_chip_cleanup() calling bnx2x_reset_hw(), which
then tries to perform a HW reset on the chip. This reset won't succeed since
the HW is in a fault state, which can be seen from multiple messages in the
kernel log like the ones below:

bnx2x: [bnx2x_issue_dmae_with_comp:552(eth1)]DMAE timeout!
bnx2x: [bnx2x_write_dmae:600(eth1)]DMAE returned failure -1

After some time, the PCI error mechanism gives up waiting for the driver's
correct removal procedure and forcibly removes the adapter from the system.
We can see a soft lockup while the core PCI error mechanism is waiting for
the driver to complete the removal process.

This patch adds a check to avoid a chip reset whenever the function
is in a PCI error state. Since this case is only reached when a
device is being removed because of a permanent failure, the HW chip reset is
neither expected to work nor necessary.

Also, as a minor improvement to the error path, we avoid the MCP information
dump in case of a non-recoverable PCI error (when the adapter is about to be
removed), since it will certainly fail.
Reported-by: Harsha Thyagaraja <hathyaga@in.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Acked-By: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rps: flow_dissector: Fix uninitialized flow_keys used in __skb_get_hash possibly · 635c223c

Gao Feng authored Aug 31, 2016

The original code depends on the function arguments being evaluated from
left to right. But the evaluation order of function arguments is not
specified by the C standard.

When flow_keys_have_l4(&keys) is invoked before ___skb_get_hash(skb, &keys,
hashrnd) with some compilers or environments, the keys passed to
flow_keys_have_l4 are not initialized.

Fixes: 6db61d79 ("flow_dissector: Ignore flow dissector return value from ___skb_get_hash")
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
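
The hazard can be reproduced in miniature in userspace C. This is a sketch of
the pattern, not the kernel code: `struct keys`, `fill_keys_and_hash`,
`keys_have_l4`, and `make_result` are hypothetical stand-ins for the real
functions named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct keys {
	bool has_l4;
};

struct result {
	uint32_t hash;
	bool l4;
};

/* Initializes *k as a side effect and returns a hash. */
static uint32_t fill_keys_and_hash(struct keys *k, uint16_t sport,
				   uint16_t dport)
{
	assert(k != NULL);
	k->has_l4 = (sport != 0 || dport != 0);
	return ((uint32_t)sport << 16) | dport;
}

/* Only meaningful after fill_keys_and_hash() has run on *k. */
static bool keys_have_l4(const struct keys *k)
{
	return k->has_l4;
}

static struct result make_result(uint32_t hash, bool l4)
{
	struct result r = { hash, l4 };

	return r;
}

static struct result hash_packet(uint16_t sport, uint16_t dport)
{
	struct keys k;

	/*
	 * Unsafe alternative:
	 *   make_result(fill_keys_and_hash(&k, sport, dport),
	 *               keys_have_l4(&k));
	 * The two arguments may be evaluated in either order, so
	 * keys_have_l4() could read k before it is initialized.
	 * A separate statement sequences the initializing call first.
	 */
	uint32_t hash = fill_keys_and_hash(&k, sport, dport);

	return make_result(hash, keys_have_l4(&k));
}
```

Storing the first call's result in a local variable puts a sequence point
between the two calls, so the outcome no longer depends on the compiler.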
