Commits · 9a0b1e8ba4061778897b544afc898de2163382f7 · nexedi / linux

17 Oct, 2016 2 commits

net: pktgen: remove rcu locking in pktgen_change_name() · 9a0b1e8b

Eric Dumazet authored Oct 15, 2016

After Jesper commit back in linux-3.18, we trigger a lockdep
splat in proc_create_data() while allocating memory from
pktgen_change_name().

This patch converts t->if_lock to a mutex, since it is now only
used from control path, and adds proper locking to pktgen_change_name()

1) pktgen_thread_lock to protect the outer loop (iterating threads)
2) t->if_lock to protect the inner loop (iterating devices)

Note that before Jesper patch, pktgen_change_name() was lacking proper
protection, but lockdep was not able to detect the problem.

Fixes: 8788370a ("pktgen: RCU-ify "if_list" to remove lock in next_to_run()")
Reported-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9a0b1e8b

net: Require exact match for TCP socket lookups if dif is l3mdev · a04a480d

David Ahern authored Oct 16, 2016

Currently, socket lookups for l3mdev (vrf) use cases can match a socket
that is bound to a port but not a device (ie., a global socket). If the
sysctl tcp_l3mdev_accept is not set this leads to ack packets going out
based on the main table even though the packet came in from an L3 domain.
The end result is that the connection does not establish creating
confusion for users since the service is running and a socket shows in
ss output. Fix by requiring an exact dif to sk_bound_dev_if match if the
skb came through an interface enslaved to an l3mdev device and the
tcp_l3mdev_accept is not set.

skb's through an l3mdev interface are marked by setting a flag in
inet{6}_skb_parm. The IPv6 variant is already set; this patch adds the
flag for IPv4. Using an skb flag avoids a device lookup on the dif. The
flag is set in the VRF driver using the IP{6}CB macros. For IPv4, the
inet_skb_parm struct is moved in the cb per commit 971f10ec, so the
match function in the TCP stack needs to use TCP_SKB_CB. For IPv6, the
move is done after the socket lookup, so IP6CB is used.

The flags field in inet_skb_parm struct needs to be increased to add
another flag. There is currently a 1-byte hole following the flags,
so it can be expanded to u16 without increasing the size of the struct.

Fixes: 193125db ("net: Introduce VRF device driver")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a04a480d

15 Oct, 2016 4 commits

vmxnet3: avoid assumption about invalid dma_pa in vmxnet3_set_mc() · fb5c6cfa

Alexey Khoroshilov authored Oct 15, 2016

vmxnet3_set_mc() checks new_table_pa returned by dma_map_single()
with dma_mapping_error(), but even there it assumes zero is invalid pa
(it assumes dma_mapping_error(...,0) returns true if new_table is NULL).

The patch adds an explicit variable to track status of new_table_pa.

Found by Linux Driver Verification project (linuxtesting.org).

v2: use "bool" and "true"/"false" for boolean variables.
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

fb5c6cfa

stmmac: fix an error code in stmmac_ptp_register() · 50756ebe

Dan Carpenter authored Oct 14, 2016

PTR_ERR(NULL) is success.  We have to preserve the error code earlier.

Fixes: 7086605a ("stmmac: fix error check when init ptp")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

50756ebe

net: qcom/emac: disable interrupts before calling phy_disconnect · 93966b71

Timur Tabi authored Oct 14, 2016

There is a race condition that can occur if EMAC interrupts are
enabled when phy_disconnect() is called.  phy_disconnect() sets
adjust_link to NULL.  When an interrupt occurs, the ISR might
call phy_mac_interrupt(), which wakes up the workqueue function
phy_state_machine().  This function might reference adjust_link,
thereby causing a null pointer exception.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

93966b71

r8169: set coherent DMA mask as well as streaming DMA mask · f0076436

Ard Biesheuvel authored Oct 14, 2016

PCI devices that are 64-bit DMA capable should set the coherent
DMA mask as well as the streaming DMA mask. On some architectures,
these are managed separately, and so the coherent DMA mask will be
left at its default value of 32 if it is not set explicitly. This
results in errors such as

     r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
     hwdev DMA mask = 0x00000000ffffffff, dev_addr = 0x00000080fbfff000
     swiotlb: coherent allocation failed for device 0000:02:00.0 size=4096
     CPU: 0 PID: 1062 Comm: systemd-udevd Not tainted 4.8.0+ #35
     Hardware name: AMD Seattle/Seattle, BIOS 10:53:24 Oct 13 2016

on systems without memory that is 32-bit addressable by PCI devices.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f0076436

14 Oct, 2016 23 commits

Merge tag 'wireless-drivers-for-davem-2016-10-14' of... · 9e55d0f9

David S. Miller authored Oct 14, 2016

Merge tag 'wireless-drivers-for-davem-2016-10-14' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.9

wlcore

* fix a double free regression causing hard to track crashes

rtl8xxxu

* fix driver reload issues, a memory leak and an endian bug

rtlwifi

* fix a major regression introduced in 4.9 with firmware loading on
  certain hardware

ath10k

* fix regression about broken cal_data debugfs file (since 4.7)

ath9k

* revert temperature compensation for AR9003+ devices, it was causing
  too much problems

ath6kl

* add Dell OEM SDIO I/O for the Venue 8 Pro
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9e55d0f9

net: asix: Avoid looping when the device does not respond · 610df1d2

Guenter Roeck authored Oct 13, 2016

Check answers from USB stack and avoid re-sending the request
multiple times if the device does not respond.

This fixes the following problem, observed with a probably flaky adapter.

[62108.732707] usb 1-3: new high-speed USB device number 5 using xhci_hcd
[62108.914421] usb 1-3: New USB device found, idVendor=0b95, idProduct=7720
[62108.914463] usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[62108.914476] usb 1-3: Product: AX88x72A
[62108.914486] usb 1-3: Manufacturer: ASIX Elec. Corp.
[62108.914495] usb 1-3: SerialNumber: 000001
[62114.109109] asix 1-3:1.0 (unnamed net_device) (uninitialized):
	Failed to write reg index 0x0000: -110
[62114.109139] asix 1-3:1.0 (unnamed net_device) (uninitialized):
	Failed to send software reset: ffffff92
[62119.109048] asix 1-3:1.0 (unnamed net_device) (uninitialized):
	Failed to write reg index 0x0000: -110
...

Since the USB timeout is 5 seconds, and the operation is retried 30 times,
this results in

[62278.180353] INFO: task mtpd:1725 blocked for more than 120 seconds.
[62278.180373]       Tainted: G        W      3.18.0-13298-g94ace9e #1
[62278.180383] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
...
[62278.180957] kworker/2:0     D 0000000000000000     0  5744      2 0x00000000
[62278.180978] Workqueue: usb_hub_wq hub_event
[62278.181029]  ffff880177f833b8 0000000000000046 ffff88017fd00000 ffff88017b126d80
[62278.181048]  ffff880177f83fd8 ffff880065a71b60 0000000000013340 ffff880065a71b60
[62278.181065]  0000000000000286 0000000103b1c199 0000000000001388 0000000000000002
[62278.181081] Call Trace:
[62278.181092]  [<ffffffff8e0971fd>] ? console_conditional_schedule+0x2c/0x2c
[62278.181105]  [<ffffffff8e094f7b>] schedule+0x69/0x6b
[62278.181117]  [<ffffffff8e0972e0>] schedule_timeout+0xe3/0x11d
[62278.181133]  [<ffffffff8daadb1b>] ? trace_timer_start+0x51/0x51
[62278.181146]  [<ffffffff8e095a05>] do_wait_for_common+0x12f/0x16c
[62278.181162]  [<ffffffff8da856a7>] ? wake_up_process+0x39/0x39
[62278.181174]  [<ffffffff8e095aee>] wait_for_common+0x52/0x6d
[62278.181187]  [<ffffffff8e095b3b>] wait_for_completion_timeout+0x13/0x15
[62278.181201]  [<ffffffff8de676ce>] usb_start_wait_urb+0x93/0xf1
[62278.181214]  [<ffffffff8de6780d>] usb_control_msg+0xe1/0x11d
[62278.181230]  [<ffffffffc037d629>] usbnet_write_cmd+0x9c/0xc6 [usbnet]
[62278.181286]  [<ffffffffc03af793>] asix_write_cmd+0x4e/0x7e [asix]
[62278.181300]  [<ffffffffc03afb41>] asix_set_sw_mii+0x25/0x4e [asix]
[62278.181314]  [<ffffffffc03b001d>] asix_mdio_read+0x51/0x109 [asix]
...
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

610df1d2

ethtool: silence warning on bit loss · 85a62440

Jesse Brandeburg authored Oct 13, 2016

Sparse was complaining when we went to prototype some code
using ethtool_cmd_speed_set and SPEED_100000, which uses
the upper 16 bits of __u32 speed for the first time.

CHECK
...
.../uapi/linux/ethtool.h:123:28: warning:
  cast truncates bits from constant value (186a0 becomes 86a0)

The warning is actually bogus, as no bits are really lost, but
we can get rid of the sparse warning with this one small change.
Reported-by: Preethi Banala <preethi.banala@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

85a62440

net/mlx4_en: fixup xdp tx irq to match rx · 958b3d39

Brenden Blanco authored Oct 13, 2016

In cases where the number of tx rings is not a multiple of the number of
rx rings, the tx completion event will be handled on a different core
from the transmit and population of the ring. Races on the ring will
lead to a double-free of the page, and possibly other corruption.

The rings are initialized by default with a valid multiple of rings,
based on the number of cpus, therefore an invalid configuration requires
ethtool to change the ring layout. For instance 'ethtool -L eth0 rx 9 tx
8' will cause packets received on rx0, and XDP_TX'd to tx48, to be
completed on cpu3 (48 % 9 == 3).

Resolve this discrepancy by shifting the irq for the xdp tx queues to
start again from 0, modulo rx_ring_num.

Fixes: 9ecc2d86 ("net/mlx4_en: add xdp forwarding and data write support")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

958b3d39

Merge branch 'qed-fixes' · fbbfa34c

David S. Miller authored Oct 14, 2016

Yuval Mintz says:

====================
qed: Fix dependencies and warnings series

The first patch in this series follows Dan Carpenter's reports about
Smatch warnings for recent qed additions and fixes those.

The second patch is the most significant one [and the reason this is
ntended for 'net'] - it's based on Arnd Bermann's suggestion for fixing
compilation issues that were introduced with the roce addition as a result
of certain combinations of qed, qede and qedr Kconfig options.

The third follows the discussion with Arnd and clears a lot of the warnings
that arise when compiling the drivers with "C=1".

Please consider applying this series to 'net'.
====================
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

fbbfa34c

qed: Additional work toward cleaning C=1 · 8c93beaf

Yuval Mintz authored Oct 13, 2016

This cleans many of the warnings that would arise in qed as a
result of compilations with C=1; Most of those are the addition
of missing 'static' to functions, although there are several other
fixes as well.
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8c93beaf

qed*: Fix Kconfig dependencies with INFINIBAND_QEDR · 0189efb8

Yuval Mintz authored Oct 13, 2016

The qedr driver would require a tristate Kconfig option [to allow
it to compile as a module], and toward that end we've added the
INFINIBAND_QEDR option. But as we've made the compilation of the
qed/qede infrastructure required for RoCE dependent on the option
we'd be facing linking difficulties in case that QED=y or QEDE=y,
and INFINIBAND_QEDR=m.

To resolve this, we seperate between the INFINIBAND_QEDR option
and the infrastructure support in qed/qede by introducing a new
QED_RDMA option which would be selected by INFINIBAND_QEDR but would
be a boolean instead of a tristate; Following that, the qed/qede is
fixed based on this new option so that all config combinations would
be supported.

Fixes: cee9fbd8 ("qede: add qedr framework")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0189efb8

qed: Fix static checker warning. · ce6b04ee

Yuval Mintz authored Oct 13, 2016

Smatch compains about qed_roce_ll2_tx() dereference
of the 'cdev' variable while testing its validity later.
As the validation checking is an over-kill [variable would always
be set], simply remove it.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: abd49676 ("qed: Add RoCE ll2 & GSI support")
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce6b04ee

IPv6: fix DESYNC_FACTOR · 76506a98

Jiri Bohac authored Oct 13, 2016

The IPv6 temporary address generation uses a variable called DESYNC_FACTOR
to prevent hosts updating the addresses at the same time. Quoting RFC 4941:

   ... The value DESYNC_FACTOR is a random value (different for each
   client) that ensures that clients don't synchronize with each other and
   generate new addresses at exactly the same time ...

DESYNC_FACTOR is defined as:

   DESYNC_FACTOR -- A random value within the range 0 - MAX_DESYNC_FACTOR.
   It is computed once at system start (rather than each time it is used)
   and must never be greater than (TEMP_VALID_LIFETIME - REGEN_ADVANCE).

First, I believe the RFC has a typo in it and meant to say: "and must
never be greater than (TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE)"

The reason is that at various places in the RFC, DESYNC_FACTOR is used in
a calculation like (TEMP_PREFERRED_LIFETIME - DESYNC_FACTOR) or
(TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR). It needs to be
smaller than (TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE) for the result of
these calculations to be larger than zero. It's never used in a
calculation together with TEMP_VALID_LIFETIME.

I already submitted an errata to the rfc-editor:
https://www.rfc-editor.org/errata_search.php?rfc=4941

The Linux implementation of DESYNC_FACTOR is very wrong:
max_desync_factor is used in places DESYNC_FACTOR should be used.
max_desync_factor is initialized to the RFC-recommended value for
MAX_DESYNC_FACTOR (600) but the whole point is to get a _random_ value.

And nothing ensures that the value used is not greater than
(TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE), which leads to underflows.  The
effect can easily be observed when setting the temp_prefered_lft sysctl
e.g. to 60. The preferred lifetime of the temporary addresses will be
bogus.

TEMP_PREFERRED_LIFETIME and REGEN_ADVANCE are not constants and can be
influenced by these three sysctls: regen_max_retry, dad_transmits and
temp_prefered_lft. Thus, the upper bound for desync_factor needs to be
re-calculated each time a new address is generated and if desync_factor is
larger than the new upper bound, a new random value needs to be
re-generated.

And since we already have max_desync_factor configurable per interface, we
also need to calculate and store desync_factor per interface.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

76506a98

IPv6: Drop the temporary address regen_timer · 9d6280da

Jiri Bohac authored Oct 13, 2016

The randomized interface identifier (rndid) was periodically updated from
the regen_timer timer. Simplify the code by updating the rndid only when
needed by ipv6_try_regen_rndid().

This makes the follow-up DESYNC_FACTOR fix much simpler.  Also it fixes a
reference counting error in this error path, where an in6_dev_put was
missing:
		err = addrconf_sysctl_register(ndev);
		if (err) {
			ipv6_mc_destroy_dev(ndev);
	-               del_timer(&ndev->regen_timer);
			snmp6_unregister_dev(ndev);
			goto err_release;
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

9d6280da

IB/ipoib: move back IB LL address into the hard header · fc791b63

Paolo Abeni authored Oct 13, 2016

After the commit 9207f9d4 ("net: preserve IP control block
during GSO segmentation"), the GSO CB and the IPoIB CB conflict.
That destroy the IPoIB address information cached there,
causing a severe performance regression, as better described here:

http://marc.info/?l=linux-kernel&m=146787279825501&w=2

This change moves the data cached by the IPoIB driver from the
skb control lock into the IPoIB hard header, as done before
the commit 936d7de3 ("IPoIB: Stop lying about hard_header_len
and use skb->cb to stash LL addresses").
In order to avoid GRO issue, on packet reception, the IPoIB driver
stash into the skb a dummy pseudo header, so that the received
packets have actually a hard header matching the declared length.
To avoid changing the connected mode maximum mtu, the allocated
head buffer size is increased by the pseudo header length.

After this commit, IPoIB performances are back to pre-regression
value.

v2 -> v3: rebased
v1 -> v2: avoid changing the max mtu, increasing the head buf size

Fixes: 9207f9d4 ("net: preserve IP control block during GSO segmentation")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fc791b63

Merge tag 'rxrpc-rewrite-20161013' of... · f1f081ce

David S. Miller authored Oct 14, 2016

Merge tag 'rxrpc-rewrite-20161013' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Fixes

This set of patches contains a bunch of fixes:

 (1) Fix use of kunmap() after change from kunmap_atomic() within AFS.

 (2) Don't use of ERR_PTR() with an always zero value.

 (3) Check the right error when using ip6_route_output().

 (4) Be consistent about whether call->operation_ID is BE or CPU-E within
     AFS.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f1f081ce

Documentation/networking: update git urls to use https over http · f56f7d2e

Alexander Alemayhu authored Oct 13, 2016

This fixes the following errors when trying to clone the urls:

Cloning into 'net'...
fatal: repository 'http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/' not found
Cloning into 'net-next'...
fatal: repository 'http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/' not found
Cloning into 'linux'...
fatal: repository 'http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/' not found
Cloning into 'stable-queue'...
fatal: repository 'http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/' not found
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f56f7d2e

net: wan: slic_ds26522: Allow driver to built if COMPILE_TEST is enabled · 059f0141

Javier Martinez Canillas authored Oct 12, 2016

The driver only has runtime but no build time dependency with FSL_SOC ||
ARCH_MXC || ARCH_LAYERSCAPE. So it can be built for testing purposes if
the COMPILE_TEST option is enabled.

This is useful to have more build coverage and make sure that the driver
is not affected by changes that could cause build regressions.
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

059f0141

net: wan: slic_ds26522: Export OF module alias information · 485c9d43

Javier Martinez Canillas authored Oct 12, 2016

When the device is registered via OF, the OF table is used to match the
driver instead of the SPI device ID table, but the entries in the later
are used as aliasses to load the module if the driver was not built-in.

This is because the SPI core always reports an SPI module alias instead
of an OF one, but that could change so it's better to always export it.
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

485c9d43

net: wan: slic_ds26522: add SPI device ID table to fix module autoload · 558c5eb5

Javier Martinez Canillas authored Oct 12, 2016

If the driver is built as a module, module alias information isn't filled
so the module won't be autoloaded. Add a SPI device ID table and use the
MODULE_DEVICE_TABLE() macro so the information is exported in the module.

Before this patch:

$ modinfo drivers/net/wan/slic_ds26522.ko | grep alias
$

After this patch:

$ modinfo drivers/net/wan/slic_ds26522.ko | grep alias
alias:          spi:ds26522
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

558c5eb5

ipv6: correctly add local routes when lo goes up · a220445f

Nicolas Dichtel authored Oct 12, 2016

The goal of the patch is to fix this scenario:
 ip link add dummy1 type dummy
 ip link set dummy1 up
 ip link set lo down ; ip link set lo up

After that sequence, the local route to the link layer address of dummy1 is
not there anymore.

When the loopback is set down, all local routes are deleted by
addrconf_ifdown()/rt6_ifdown(). At this time, the rt6_info entry still
exists, because the corresponding idev has a reference on it. After the rcu
grace period, dst_rcu_free() is called, and thus ___dst_free(), which will
set obsolete to DST_OBSOLETE_DEAD.

In this case, init_loopback() is called before dst_rcu_free(), thus
obsolete is still sets to something <= 0. So, the function doesn't add the
route again. To avoid that race, let's check the rt6 refcnt instead.

Fixes: 25fb6ca4 ("net IPv6 : Fix broken IPv6 routing table after loopback down-up")
Fixes: a881ae1f ("ipv6: don't call addrconf_dst_alloc again when enable lo")
Fixes: 33d99113 ("ipv6: reallocate addrconf router for ipv6 address when lo device up")
Reported-by: Francesco Santoro <francesco.santoro@6wind.com>
Reported-by: Samuel Gauthier <samuel.gauthier@6wind.com>
CC: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
CC: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
CC: Sabrina Dubroca <sd@queasysnail.net>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Weilong Chen <chenweilong@huawei.com>
CC: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a220445f

ip6_tunnel: fix ip6_tnl_lookup · 68d00f33

Vadim Fedorenko authored Oct 11, 2016

The commit ea3dc960 ("ip6_tunnel: Add support for wildcard tunnel
endpoints.") introduces support for wildcards in tunnels endpoints,
but in some rare circumstances ip6_tnl_lookup selects wrong tunnel
interface relying only on source or destination address of the packet
and not checking presence of wildcard in tunnels endpoints. Later in
ip6_tnl_rcv this packets can be dicarded because of difference in
ipproto even if fallback device have proper ipproto configuration.

This patch adds checks of wildcard endpoint in tunnel avoiding such
behavior

Fixes: ea3dc960 ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
Signed-off-by: Vadim Fedorenko <junk@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

68d00f33

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 29fbff86

Linus Torvalds authored Oct 13, 2016

Pull networking fixes from David Miller:

 1) Fix various build warnings in tlan/qed/xen-netback drivers, from
    Arnd Bergmann.

 2) Propagate proper error code in strparser's strp_recv(), from Geert
    Uytterhoeven.

 3) Fix accidental broadcast of RTM_GETTFILTER responses, from Eric
    Dumazret.

 4) Need to use list_for_each_entry_safe() in qed driver, from Wei
    Yongjun.

 5) Openvswitch 802.1AD bug fixes from Jiri Benc.

 6) Cure BUILD_BUG_ON() in mlx5 driver, from Tom Herbert.

 7) Fix UDP ipv6 checksumming in netvsc driver, from Stephen Hemminger.

 8) stmmac driver fixes from Giuseppe CAVALLARO.

 9) Fix access to mangled IP6CB in tcp, from Eric Dumazet.

10) Fix info leaks in tipc and rtnetlink, from Dan Carpenter.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
  net: bridge: add the multicast_flood flag attribute to brport_attrs
  net: axienet: Remove unused parameter from __axienet_device_reset
  liquidio: CN23XX: fix a loop timeout
  net: rtnl: info leak in rtnl_fill_vfinfo()
  tipc: info leak in __tipc_nl_add_udp_addr()
  net: ipv4: Do not drop to make_route if oif is l3mdev
  net: phy: Trigger state machine on state change and not polling.
  ipv6: tcp: restore IP6CB for pktoptions skbs
  netvsc: Remove mistaken udp.h inclusion.
  xen-netback: fix type mismatch warning
  stmmac: fix error check when init ptp
  stmmac: fix ptp init for gmac4
  qed: fix old-style function definition
  netvsc: fix checksum on UDP IPV6
  net_sched: reorder pernet ops and act ops registrations
  xen-netback: fix guest Rx stall detection (after guest Rx refactor)
  drivers/ptp: Fix kernel memory disclosure
  net/mlx5: Add MLX5_ARRAY_SET64 to fix BUILD_BUG_ON
  qmi_wwan: add support for Quectel EC21 and EC25
  openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev
  ...

29fbff86

Merge tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs · c4a86165

Linus Torvalds authored Oct 13, 2016

Pull NFS client updates from Anna Schumaker:
 "Highlights include:

  Stable bugfixes:
   - sunrpc: fix writ espace race causing stalls
   - NFS: Fix inode corruption in nfs_prime_dcache()
   - NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
   - NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
   - NFSv4: Open state recovery must account for file permission changes
   - NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic

  Features:
   - Add support for tracking multiple layout types with an ordered list
   - Add support for using multiple backchannel threads on the client
   - Add support for pNFS file layout session trunking
   - Delay xprtrdma use of DMA API (for device driver removal)
   - Add support for xprtrdma remote invalidation
   - Add support for larger xprtrdma inline thresholds
   - Use a scatter/gather list for sending xprtrdma RPC calls
   - Add support for the CB_NOTIFY_LOCK callback
   - Improve hashing sunrpc auth_creds by using both uid and gid

  Bugfixes:
   - Fix xprtrdma use of DMA API
   - Validate filenames before adding to the dcache
   - Fix corruption of xdr->nwords in xdr_copy_to_scratch
   - Fix setting buffer length in xdr_set_next_buffer()
   - Don't deadlock the state manager on the SEQUENCE status flags
   - Various delegation and stateid related fixes
   - Retry operations if an interrupted slot receives EREMOTEIO
   - Make nfs boot time y2038 safe"

* tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
  NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
  fs: nfs: Make nfs boot time y2038 safe
  sunrpc: replace generic auth_cred hash with auth-specific function
  sunrpc: add RPCSEC_GSS hash_cred() function
  sunrpc: add auth_unix hash_cred() function
  sunrpc: add generic_auth hash_cred() function
  sunrpc: add hash_cred() function to rpc_authops struct
  Retry operation on EREMOTEIO on an interrupted slot
  pNFS: Fix atime updates on pNFS clients
  sunrpc: queue work on system_power_efficient_wq
  NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
  NFSv4: If recovery failed for a specific open stateid, then don't retry
  NFSv4: Fix retry issues with nfs41_test/free_stateid
  NFSv4: Open state recovery must account for file permission changes
  NFSv4: Mark the lock and open stateids as invalid after freeing them
  NFSv4: Don't test open_stateid unless it is set
  NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
  NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
  NFSv4: Fix a race when updating an open_stateid
  NFSv4: Fix a race in nfs_inode_reclaim_delegation()
  ...

c4a86165

Merge tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux · 27785564

Linus Torvalds authored Oct 13, 2016

Pull nfsd updates from Bruce Fields:
 "Some RDMA work and some good bugfixes, and two new features that could
  benefit from user testing:

   - Anna Schumacker contributed a simple NFSv4.2 COPY implementation.
     COPY is already supported on the client side, so a call to
     copy_file_range() on a recent client should now result in a
     server-side copy that doesn't require all the data to make a round
     trip to the client and back.

   - Jeff Layton implemented callbacks to notify clients when contended
     locks become available, which should reduce latency on workloads
     with contended locks"

* tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux:
  NFSD: Implement the COPY call
  nfsd: handle EUCLEAN
  nfsd: only WARN once on unmapped errors
  exportfs: be careful to only return expected errors.
  nfsd4: setclientid_confirm with unmatched verifier should fail
  nfsd: randomize SETCLIENTID reply to help distinguish servers
  nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
  nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant
  nfsd: add a LRU list for blocked locks
  nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
  nfsd: plumb in a CB_NOTIFY_LOCK operation
  NFSD: fix corruption in notifier registration
  svcrdma: support Remote Invalidation
  svcrdma: Server-side support for rpcrdma_connect_private
  rpcrdma: RDMA/CM private message data structure
  svcrdma: Skip put_page() when send_reply() fails
  svcrdma: Tail iovec leaves an orphaned DMA mapping
  nfsd: fix dprintk in nfsd4_encode_getdeviceinfo
  nfsd: eliminate cb_minorversion field
  nfsd: don't set a FL_LAYOUT lease for flexfiles layouts

27785564

Merge tag 'xfs-reflink-for-linus-4.9-rc1' of... · 35a891be

Linus Torvalds authored Oct 13, 2016

Merge tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

    < XFS has gained super CoW powers! >
     ----------------------------------
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||

Pull XFS support for shared data extents from Dave Chinner:
 "This is the second part of the XFS updates for this merge cycle.  This
  pullreq contains the new shared data extents feature for XFS.

  Given the complexity and size of this change I am expecting - like the
  addition of reverse mapping last cycle - that there will be some
  follow-up bug fixes and cleanups around the -rc3 stage for issues that
  I'm sure will show up once the code hits a wider userbase.

  What it is:

  At the most basic level we are simply adding shared data extents to
  XFS - i.e. a single extent on disk can now have multiple owners. To do
  this we have to add new on-disk features to both track the shared
  extents and the number of times they've been shared. This is done by
  the new "refcount" btree that sits in every allocation group. When we
  share or unshare an extent, this tree gets updated.

  Along with this new tree, the reverse mapping tree needs to be updated
  to track each owner or a shared extent. This also needs to be updated
  ever share/unshare operation. These interactions at extent allocation
  and freeing time have complex ordering and recovery constraints, so
  there's a significant amount of new intent-based transaction code to
  ensure that operations are performed atomically from both the runtime
  and integrity/crash recovery perspectives.

  We also need to break sharing when writes hit a shared extent - this
  is where the new copy-on-write implementation comes in. We allocate
  new storage and copy the original data along with the overwrite data
  into the new location. We only do this for data as we don't share
  metadata at all - each inode has it's own metadata that tracks the
  shared data extents, the extents undergoing CoW and it's own private
  extents.

  Of course, being XFS, nothing is simple - we use delayed allocation
  for CoW similar to how we use it for normal writes. ENOSPC is a
  significant issue here - we build on the reservation code added in
  4.8-rc1 with the reverse mapping feature to ensure we don't get
  spurious ENOSPC issues part way through a CoW operation. These
  mechanisms also help minimise fragmentation due to repeated CoW
  operations. To further reduce fragmentation overhead, we've also
  introduced a CoW extent size hint, which indicates how large a region
  we should allocate when we execute a CoW operation.

  With all this functionality in place, we can hook up .copy_file_range,
  .clone_file_range and .dedupe_file_range and we gain all the
  capabilities of reflink and other vfs provided functionality that
  enable manipulation to shared extents. We also added a fallocate mode
  that explicitly unshares a range of a file, which we implemented as an
  explicit CoW of all the shared extents in a file.

  As such, it's a huge chunk of new functionality with new on-disk
  format features and internal infrastructure. It warns at mount time as
  an experimental feature and that it may eat data (as we do with all
  new on-disk features until they stabilise). We have not released
  userspace suport for it yet - userspace support currently requires
  download from Darrick's xfsprogs repo and build from source, so the
  access to this feature is really developer/tester only at this point.
  Initial userspace support will be released at the same time the kernel
  with this code in it is released.

  The new code causes 5-6 new failures with xfstests - these aren't
  serious functional failures but things the output of tests changing
  slightly due to perturbations in layouts, space usage, etc. OTOH,
  we've added 150+ new tests to xfstests that specifically exercise this
  new functionality so it's got far better test coverage than any
  functionality we've previously added to XFS.

  Darrick has done a pretty amazing job getting us to this stage, and
  special mention also needs to go to Christoph (review, testing,
  improvements and bug fixes) and Brian (caught several intricate bugs
  during review) for the effort they've also put in.

  Summary:

   - unshare range (FALLOC_FL_UNSHARE) support for fallocate

   - copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr
     interface

   - shared extent support for XFS

   - copy-on-write support for shared extents

   - copy_file_range support

   - clone_file_range support (implements reflink)

   - dedupe_file_range support

   - defrag support for reverse mapping enabled filesystems"

* tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits)
  xfs: convert COW blocks to real blocks before unwritten extent conversion
  xfs: rework refcount cow recovery error handling
  xfs: clear reflink flag if setting realtime flag
  xfs: fix error initialization
  xfs: fix label inaccuracies
  xfs: remove isize check from unshare operation
  xfs: reduce stack usage of _reflink_clear_inode_flag
  xfs: check inode reflink flag before calling reflink functions
  xfs: implement swapext for rmap filesystems
  xfs: refactor swapext code
  xfs: various swapext cleanups
  xfs: recognize the reflink feature bit
  xfs: simulate per-AG reservations being critically low
  xfs: don't mix reflink and DAX mode for now
  xfs: check for invalid inode reflink flags
  xfs: set a default CoW extent size of 32 blocks
  xfs: convert unwritten status of reverse mappings for shared files
  xfs: use interval query for rmap alloc operations on shared files
  xfs: add shared rmap map/unmap/convert log item types
  xfs: increase log reservations for reflink
  ...

35a891be

Merge tag 'pci-v4.9-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 40bd3a5f

Linus Torvalds authored Oct 13, 2016

PCI changes for the v4.9 merge window:
 "Here are some more changes I'd like to have in v4.9.  There's one
  small Tegra bug fix in the PHY poweroff path, which is only used in
  failure paths.

  The rest is all strictly cleanup that should make host bridge drivers
  more readable, but shouldn't actually change any behavior.

  Summary:

   - use local struct device pointers in many host bridge drivers for
     clarity

   - remove unused platform data

   - use generic DesignWare accessors

   - misc cleanups: remove redundant structure entries and re-order
     structure members to put comon generic fields first etc"

* tag 'pci-v4.9-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (108 commits)
  MAINTAINERS: Add maintainer for the PCIe Marvell Armada 8K driver
  MAINTAINERS: Add DT binding to the Aardvark PCIe driver maintainer
  PCI: rockchip: Indent "if" statement body
  PCI: hisi: Reorder struct hisi_pcie
  PCI: hisi: Pass device-specific struct to internal functions
  PCI: hisi: Include register block base in PCIE_SYS_STATE4 address
  PCI: dra7xx: Reorder struct dra7xx_pcie
  PCI: xilinx-nwl: Remove unused platform data
  PCI: xilinx-nwl: Add local struct device pointers
  PCI: xilinx: Removed unused xilinx_pcie_assign_msi() argument
  PCI: xilinx: Remove unused platform data
  PCI: xilinx: Add local struct device pointers
  PCI: xgene: Add register accessors
  PCI: xgene: Pass struct xgene_pcie_port to setup functions
  PCI: xgene: Remove unused platform data
  PCI: tegra: Remove unused platform data
  PCI: tegra: Add local struct device pointers
  PCI: tegra: Fix argument order in tegra_pcie_phy_disable()
  PCI: rockchip: Remove unused platform data
  PCI: rcar-gen2: Add local struct device pointers
  ...

40bd3a5f

13 Oct, 2016 11 commits

Merge tag 'platform-drivers-x86-v4.9-1' of... · 44dc8c9d

Linus Torvalds authored Oct 13, 2016

Merge tag 'platform-drivers-x86-v4.9-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86

Pull x86 platform drivers updates from Darren Hart:
 "Cleanups, refactoring, and a couple bug fixes.

  intel_pmc_core:
   - avoid boot time warning for !CONFIG_DEBUGFS_FS

  intel_pmc_ipc:
   - Convert to use platform_device_register_full

  asus-wmi:
   - Filter buggy scan codes on ASUS Q500A

  toshiba_bluetooth:
   - Decouple an error checking status code

  toshiba_haps:
   - Change logging level from info to debug
   - Split ACPI and HDD protection error handling

  asus-laptop:
   - get rid of parse_arg()

  asus-wmi:
   - fix asus ux303ub brightness issue

  toshiba_acpi:
   - Fix typo in *_cooling_method_set function
   - Change error checking logic from TCI functions
   - Clean up variable declaration"

* tag 'platform-drivers-x86-v4.9-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
  platform/x86: intel_pmc_core: avoid boot time warning for !CONFIG_DEBUGFS_FS
  platform/x86: intel_pmc_ipc: Convert to use platform_device_register_full
  platform/x86: asus-wmi: Filter buggy scan codes on ASUS Q500A
  platform/x86: toshiba_bluetooth: Decouple an error checking status code
  platform/x86: toshiba_haps: Change logging level from info to debug
  platform/x86: toshiba_haps: Split ACPI and HDD protection error handling
  platform/x86: asus-laptop: get rid of parse_arg()
  platform/x86: asus-wmi: fix asus ux303ub brightness issue
  platform/x86: toshiba_acpi: Fix typo in *_cooling_method_set function
  platform/x86: toshiba_acpi: Change error checking logic from TCI functions
  platform/x86: toshiba_acpi: Clean up variable declaration

44dc8c9d

Merge git://www.linux-watchdog.org/linux-watchdog · e3799a21

Linus Torvalds authored Oct 13, 2016

Pull watchdog updates from Wim Van Sebroeck:

 - a new watchdog pretimeout governor framework

 - support to upload the firmware on the ziirave_wdt

 - several fixes and cleanups

* git://www.linux-watchdog.org/linux-watchdog: (26 commits)
  watchdog: imx2_wdt: add pretimeout function support
  watchdog: softdog: implement pretimeout support
  watchdog: pretimeout: add pretimeout_available_governors attribute
  watchdog: pretimeout: add option to select a pretimeout governor in runtime
  watchdog: pretimeout: add panic pretimeout governor
  watchdog: pretimeout: add noop pretimeout governor
  watchdog: add watchdog pretimeout governor framework
  watchdog: hpwdt: add support for iLO5
  fs: compat_ioctl: add pretimeout functions for watchdogs
  watchdog: add pretimeout support to the core
  watchdog: imx2_wdt: use preferred BIT macro instead of open coded values
  watchdog: st_wdt: Remove support for obsolete platforms
  watchdog: bindings: Remove obsolete platforms from dt doc.
  watchdog: mt7621_wdt: Remove assignment of dev pointer
  watchdog: rt2880_wdt: Remove assignment of dev pointer
  watchdog: constify watchdog_ops structures
  watchdog: tegra: constify watchdog_ops structures
  watchdog: iTCO_wdt: constify iTCO_wdt_pm structure
  watchdog: cadence_wdt: Fix the suspend resume
  watchdog: txx9wdt: Add missing clock (un)prepare calls for CCF
  ...

e3799a21

net: bridge: add the multicast_flood flag attribute to brport_attrs · 4eb6753c

Nikolay Aleksandrov authored Oct 13, 2016

When I added the multicast flood control flag, I also added an attribute
for it for sysfs similar to other flags, but I forgot to add it to
brport_attrs.

Fixes: b6cb5ac8 ("net: bridge: add per-port multicast flood flag")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4eb6753c

net: axienet: Remove unused parameter from __axienet_device_reset · 5852e93d

Tobias Klauser authored Oct 13, 2016

The dev parameter passed to __axienet_device_reset() is not used inside
the function, so remove it.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5852e93d

liquidio: CN23XX: fix a loop timeout · 10f6c4d6

Dan Carpenter authored Oct 13, 2016

This is supposed to loop 1000 times and then give up.  The problem is
it's a post-op and after the loop we test if "loop" is zero when really
it would be -1.  Fix this by making it a pre-op.

Fixes: 1b7c55c4 ("liquidio: CN23XX queue manipulation")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10f6c4d6

net: rtnl: info leak in rtnl_fill_vfinfo() · 775f4f05

Dan Carpenter authored Oct 13, 2016

The "vf_vlan_info" struct ends with a 2 byte struct hole so we have to
memset it to ensure that no stack information is revealed to user space.

Fixes: 79aab093 ('net: Update API for VF vlan protocol 802.1ad support')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

775f4f05

tipc: info leak in __tipc_nl_add_udp_addr() · 73076162

Dan Carpenter authored Oct 13, 2016

We should clear out the padding and unused struct members so that we
don't expose stack information to userspace.

Fixes: fdb3accc ('tipc: add the ability to get UDP options via netlink')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

73076162

net: ipv4: Do not drop to make_route if oif is l3mdev · 6104e112

David Ahern authored Oct 12, 2016

Commit e0d56fdd was a bit aggressive removing l3mdev calls in
the IPv4 stack. If the fib_lookup fails we do not want to drop to
make_route if the oif is an l3mdev device.

Also reverts 19664c6a ("net: l3mdev: Remove netif_index_is_l3_master")
which removed netif_index_is_l3_master.

Fixes: e0d56fdd ("net: l3mdev: remove redundant calls")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6104e112

afs: call->operation_ID sometimes used as __be32 sometimes as u32 · 50a2c953

David Howells authored Oct 13, 2016

call->operation_ID is sometimes being used as __be32 sometimes is being
used as u32.  Be consistent and settle on using as u32.

Signed-off-by: David Howells <dhowells@redhat.com.

50a2c953

net: phy: Trigger state machine on state change and not polling. · 3c293f4e

Andrew Lunn authored Oct 12, 2016

The phy_start() is used to indicate the PHY is now ready to do its
work. The state is changed, normally to PHY_UP which means that both
the MAC and the PHY are ready.

If the phy driver is using polling, when the next poll happens, the
state machine notices the PHY is now in PHY_UP, and kicks off
auto-negotiation, if needed.

If however, the PHY is using interrupts, there is no polling. The phy
is stuck in PHY_UP until the next interrupt comes along. And there is
no reason for the PHY to interrupt.

Have phy_start() schedule the state machine to run, which both speeds
up the polling use case, and makes the interrupt use case actually
work.

This problems exists whenever there is a state change which will not
cause an interrupt. Trigger the state machine in these cases,
e.g. phy_error().
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Cc: Kyle Roeschley <kyle.roeschley@ni.com>
Tested-by: Kyle Roeschley <kyle.roeschley@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c293f4e

ipv6: tcp: restore IP6CB for pktoptions skbs · 8ce48623

Eric Dumazet authored Oct 12, 2016

Baozeng Ding reported following KASAN splat :

BUG: KASAN: use-after-free in ip6_datagram_recv_specific_ctl+0x13f1/0x15c0 at addr ffff880029c84ec8
Read of size 1 by task poc/25548
Call Trace:
 [<ffffffff82cf43c9>] dump_stack+0x12e/0x185 /lib/dump_stack.c:15
 [<     inline     >] print_address_description /mm/kasan/report.c:204
 [<ffffffff817ced3b>] kasan_report_error+0x48b/0x4b0 /mm/kasan/report.c:283
 [<     inline     >] kasan_report /mm/kasan/report.c:303
 [<ffffffff817ced9e>] __asan_report_load1_noabort+0x3e/0x40 /mm/kasan/report.c:321
 [<ffffffff85c71da1>] ip6_datagram_recv_specific_ctl+0x13f1/0x15c0 /net/ipv6/datagram.c:687
 [<ffffffff85c734c3>] ip6_datagram_recv_ctl+0x33/0x40
 [<ffffffff85c0b07c>] do_ipv6_getsockopt.isra.4+0xaec/0x2150
 [<ffffffff85c0c7f6>] ipv6_getsockopt+0x116/0x230
 [<ffffffff859b5a12>] tcp_getsockopt+0x82/0xd0 /net/ipv4/tcp.c:3035
 [<ffffffff855fb385>] sock_common_getsockopt+0x95/0xd0 /net/core/sock.c:2647
 [<     inline     >] SYSC_getsockopt /net/socket.c:1776
 [<ffffffff855f8ba2>] SyS_getsockopt+0x142/0x230 /net/socket.c:1758
 [<ffffffff8685cdc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
Memory state around the buggy address:
 ffff880029c84d80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff880029c84e00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ffff880029c84e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                              ^
 ffff880029c84f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff880029c84f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

He also provided a syzkaller reproducer.

Issue is that ip6_datagram_recv_specific_ctl() expects to find IP6CB
data that was moved at a different place in tcp_v6_rcv()

This patch moves tcp_v6_restore_cb() up and calls it from
tcp_v6_do_rcv() when np->pktoptions is set.

Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8ce48623