Commits · 8a4465368964b4fbaf084760c94c7aabf61059fb · Kirill Smelkov / linux

13 Jan, 2021 13 commits

net/smc: use memcpy instead of snprintf to avoid out of bounds read · 8a446536

Guvenc Gulce authored Jan 12, 2021

Using snprintf() to convert not null-terminated strings to null
terminated strings may cause out of bounds read in the source string.
Therefore use memcpy() and terminate the target string with a null
afterwards.

Fixes: a3db10ef ("net/smc: Add support for obtaining SMCR device list")
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8a446536

smc: fix out of bound access in smc_nl_get_sys_info() · 25fe2c9c

Jakub Kicinski authored Jan 12, 2021

smc_clc_get_hostname() sets the host pointer to a buffer
which is not NULL-terminated (see smc_clc_init()).

Reported-by: syzbot+f4708c391121cfc58396@syzkaller.appspotmail.com
Fixes: 099b990b ("net/smc: Add support for obtaining system information")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

25fe2c9c

Merge branch 'mptcp-a-couple-of-fixes' · 584c19f9

Jakub Kicinski authored Jan 12, 2021

Paolo Abeni says:

====================
mptcp: a couple of fixes

This series includes two related fixes addressing potential divide by 0
bugs in the MPTCP datapath.
====================

Link: https://lore.kernel.org/r/cover.1610471474.git.pabeni@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

584c19f9

mptcp: better msk-level shutdown. · 76e2a55d

Paolo Abeni authored Jan 12, 2021

Instead of re-implementing most of inet_shutdown, re-use
such helper, and implement the MPTCP-specific bits at the
'proto' level.

The msk-level disconnect() can now be invoked, lets provide a
suitable implementation.

As a side effect, this fixes bad state management for listener
sockets. The latter could lead to division by 0 oops since
commit ea4ca586 ("mptcp: refine MPTCP-level ack scheduling").

Fixes: 43b54c6e ("mptcp: Use full MPTCP-level disconnect state machine")
Fixes: ea4ca586 ("mptcp: refine MPTCP-level ack scheduling")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

76e2a55d

mptcp: more strict state checking for acks · 20bc80b6

Paolo Abeni authored Jan 12, 2021

Syzkaller found a way to trigger division by zero
in mptcp_subflow_cleanup_rbuf().

The current checks implemented into tcp_can_send_ack()
are too week, let's be more accurate.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Fixes: ea4ca586 ("mptcp: refine MPTCP-level ack scheduling")
Fixes: fd897679 ("mptcp: be careful on MPTCP-level ack.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

20bc80b6

Merge branch 'bnxt_en-bug-fixes' · ece9ab2a

Jakub Kicinski authored Jan 12, 2021

Michael Chan says:

====================
bnxt_en: Bug fixes.

This series has 2 fixes.  The first one fixes a resource accounting error
with the RDMA driver loaded and the second one fixes the firmware
flashing sequence after defragmentation.
====================

Link: https://lore.kernel.org/r/1610357200-30755-1-git-send-email-michael.chan@broadcom.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ece9ab2a

bnxt_en: Clear DEFRAG flag in firmware message when retry flashing. · 68748775

Pavan Chebbi authored Jan 11, 2021

When the FW tells the driver to retry the INSTALL_UPDATE command after
it has cleared the NVM area, the driver is not clearing the previously
used ALLOWED_TO_DEFRAG flag. As a result the FW tries to defrag the NVM
area a second time in a loop and can fail the request.

Fixes: 1432c3f6 ("bnxt_en: Retry installing FW package under NO_SPACE error condition.")
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

68748775

bnxt_en: Improve stats context resource accounting with RDMA driver loaded. · 869c4d5e

Michael Chan authored Jan 11, 2021

The function bnxt_get_ulp_stat_ctxs() does not count the stats contexts
used by the RDMA driver correctly when the RDMA driver is freeing the
MSIX vectors.  It assumes that if the RDMA driver is registered, the
additional stats contexts will be needed.  This is not true when the
RDMA driver is about to unregister and frees the MSIX vectors.

This slight error leads to over accouting of the stats contexts needed
after the RDMA driver has unloaded.  This will cause some firmware
warning and error messages in dmesg during subsequent config. changes
or ifdown/ifup.

Fix it by properly accouting for extra stats contexts only if the
RDMA driver is registered and MSIX vectors have been successfully
requested.

Fixes: c027c6b4 ("bnxt_en: get rid of num_stat_ctxs variable")
Reviewed-by: Yongping Zhang <yongping.zhang@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

869c4d5e

r8153_ecm: Add Lenovo Powered USB-C Hub as a fallback of r8152 · 2284bbd0

Leon Schuermann authored Jan 11, 2021

This commit enables the use of the r8153_ecm driver, introduced with
commit c1aedf01 ("net/usb/r8153_ecm: support ECM mode for
RTL8153") for the Lenovo Powered USB-C Hub (17ef:721e) based on the
Realtek RTL8153B chip.

This results in the following driver preference:

- if r8152 is available, use the r8152 driver
- if r8152 is not available, use the r8153_ecm driver

This is done to prevent the NIC from constantly sending pause frames
when the host system enters standby (fixed by using the r8152 driver
in "r8152: Add Lenovo Powered USB-C Travel Hub"), while still allowing
the device to work with the r8153_ecm driver as a fallback.
Signed-off-by: Leon Schuermann <leon@is.currently.online>
Tested-by: Leon Schuermann <leon@is.currently.online>
Link: https://lore.kernel.org/r/20210111190312.12589-3-leon@is.currently.onlineSigned-off-by: Jakub Kicinski <kuba@kernel.org>

2284bbd0

r8152: Add Lenovo Powered USB-C Travel Hub · cb82a549

Leon Schuermann authored Jan 11, 2021

This USB-C Hub (17ef:721e) based on the Realtek RTL8153B chip used to
use the cdc_ether driver. However, using this driver, with the system
suspended the device constantly sends pause-frames as soon as the
receive buffer fills up. This causes issues with other devices, where
some Ethernet switches stop forwarding packets altogether.

Using the Realtek driver (r8152) fixes this issue. Pause frames are no
longer sent while the host system is suspended.
Signed-off-by: Leon Schuermann <leon@is.currently.online>
Tested-by: Leon Schuermann <leon@is.currently.online>
Link: https://lore.kernel.org/r/20210111190312.12589-2-leon@is.currently.onlineSigned-off-by: Jakub Kicinski <kuba@kernel.org>

cb82a549

net: dsa: clear devlink port type before unregistering slave netdevs · 91158e16

Vladimir Oltean authored Jan 12, 2021

Florian reported a use-after-free bug in devlink_nl_port_fill found with
KASAN:

(devlink_nl_port_fill)
(devlink_port_notify)
(devlink_port_unregister)
(dsa_switch_teardown.part.3)
(dsa_tree_teardown_switches)
(dsa_unregister_switch)
(bcm_sf2_sw_remove)
(platform_remove)
(device_release_driver_internal)
(device_links_unbind_consumers)
(device_release_driver_internal)
(device_driver_detach)
(unbind_store)

Allocated by task 31:
 alloc_netdev_mqs+0x5c/0x50c
 dsa_slave_create+0x110/0x9c8
 dsa_register_switch+0xdb0/0x13a4
 b53_switch_register+0x47c/0x6dc
 bcm_sf2_sw_probe+0xaa4/0xc98
 platform_probe+0x90/0xf4
 really_probe+0x184/0x728
 driver_probe_device+0xa4/0x278
 __device_attach_driver+0xe8/0x148
 bus_for_each_drv+0x108/0x158

Freed by task 249:
 free_netdev+0x170/0x194
 dsa_slave_destroy+0xac/0xb0
 dsa_port_teardown.part.2+0xa0/0xb4
 dsa_tree_teardown_switches+0x50/0xc4
 dsa_unregister_switch+0x124/0x250
 bcm_sf2_sw_remove+0x98/0x13c
 platform_remove+0x44/0x5c
 device_release_driver_internal+0x150/0x254
 device_links_unbind_consumers+0xf8/0x12c
 device_release_driver_internal+0x84/0x254
 device_driver_detach+0x30/0x34
 unbind_store+0x90/0x134

What happens is that devlink_port_unregister emits a netlink
DEVLINK_CMD_PORT_DEL message which associates the devlink port that is
getting unregistered with the ifindex of its corresponding net_device.
Only trouble is, the net_device has already been unregistered.

It looks like we can stub out the search for a corresponding net_device
if we clear the devlink_port's type. This looks like a bit of a hack,
but also seems to be the reason why the devlink_port_type_clear function
exists in the first place.

Fixes: 3122433e ("net: dsa: Register devlink ports before calling DSA driver setup()")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian fainelli <f.fainelli@gmail.com>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210112004831.3778323-1-olteanv@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

91158e16

net: dsa: unbind all switches from tree when DSA master unbinds · 07b90056

Vladimir Oltean authored Jan 12, 2021

Currently the following happens when a DSA master driver unbinds while
there are DSA switches attached to it:

$ echo 0000:00:00.5 > /sys/bus/pci/drivers/mscc_felix/unbind
------------[ cut here ]------------
WARNING: CPU: 0 PID: 392 at net/core/dev.c:9507
Call trace:
 rollback_registered_many+0x5fc/0x688
 unregister_netdevice_queue+0x98/0x120
 dsa_slave_destroy+0x4c/0x88
 dsa_port_teardown.part.16+0x78/0xb0
 dsa_tree_teardown_switches+0x58/0xc0
 dsa_unregister_switch+0x104/0x1b8
 felix_pci_remove+0x24/0x48
 pci_device_remove+0x48/0xf0
 device_release_driver_internal+0x118/0x1e8
 device_driver_detach+0x28/0x38
 unbind_store+0xd0/0x100

Located at the above location is this WARN_ON:

	/* Notifier chain MUST detach us all upper devices. */
	WARN_ON(netdev_has_any_upper_dev(dev));

Other stacked interfaces, like VLAN, do indeed listen for
NETDEV_UNREGISTER on the real_dev and also unregister themselves at that
time, which is clearly the behavior that rollback_registered_many
expects. But DSA interfaces are not VLAN. They have backing hardware
(platform devices, PCI devices, MDIO, SPI etc) which have a life cycle
of their own and we can't just trigger an unregister from the DSA
framework when we receive a netdev notifier that the master unregisters.

Luckily, there is something we can do, and that is to inform the driver
core that we have a runtime dependency to the DSA master interface's
device, and create a device link where that is the supplier and we are
the consumer. Having this device link will make the DSA switch unbind
before the DSA master unbinds, which is enough to avoid the WARN_ON from
rollback_registered_many.

Note that even before the blamed commit, DSA did nothing intelligent
when the master interface got unregistered either. See the discussion
here:
https://lore.kernel.org/netdev/20200505210253.20311-1-f.fainelli@gmail.com/
But this time, at least the WARN_ON is loud enough that the
upper_dev_link commit can be blamed.

The advantage with this approach vs dev_hold(master) in the attached
link is that the latter is not meant for long term reference counting.
With dev_hold, the only thing that will happen is that when the user
attempts an unbind of the DSA master, netdev_wait_allrefs will keep
waiting and waiting, due to DSA keeping the refcount forever. DSA would
not access freed memory corresponding to the master interface, but the
unbind would still result in a freeze. Whereas with device links,
graceful teardown is ensured. It even works with cascaded DSA trees.

$ echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind
[ 1818.797546] device swp0 left promiscuous mode
[ 1819.301112] sja1105 spi2.0: Link is Down
[ 1819.307981] DSA: tree 1 torn down
[ 1819.312408] device eno2 left promiscuous mode
[ 1819.656803] mscc_felix 0000:00:00.5: Link is Down
[ 1819.667194] DSA: tree 0 torn down
[ 1819.711557] fsl_enetc 0000:00:00.2 eno2: Link is Down

This approach allows us to keep the DSA framework absolutely unchanged,
and the driver core will just know to unbind us first when the master
goes away - as opposed to the large (and probably impossible) rework
required if attempting to listen for NETDEV_UNREGISTER.

As per the documentation at Documentation/driver-api/device_link.rst,
specifying the DL_FLAG_AUTOREMOVE_CONSUMER flag causes the device link
to be automatically purged when the consumer fails to probe or later
unbinds. So we don't need to keep the consumer_link variable in struct
dsa_switch.

Fixes: 2f1e8ea7 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210111230943.3701806-1-olteanv@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

07b90056

net: phy: smsc: fix clk error handling · a18caa97

Marco Felsch authored Jan 11, 2021

Commit bedd8d78 ("net: phy: smsc: LAN8710/20: add phy refclk in
support") added the phy clk support. The commit already checks if
clk_get_optional() throw an error but instead of returning the error it
ignores it.

Fixes: bedd8d78 ("net: phy: smsc: LAN8710/20: add phy refclk in support")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210111085932.28680-1-m.felsch@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a18caa97

12 Jan, 2021 7 commits

net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands · df85bc14

Petr Machata authored Jan 11, 2021

In commit 826f328e ("net: dcb: Validate netlink message in DCB
handler"), Linux started rejecting RTM_GETDCB netlink messages if they
contained a set-like DCB_CMD_ command.

The reason was that privileges were only verified for RTM_SETDCB messages,
but the value that determined the action to be taken is the command, not
the message type. And validation of message type against the DCB command
was the obvious missing piece.

Unfortunately it turns out that mlnx_qos, a somewhat widely deployed tool
for configuration of DCB, accesses the DCB set-like APIs through
RTM_GETDCB.

Therefore do not bounce the discrepancy between message type and command.
Instead, in addition to validating privileges based on the actual message
type, validate them also based on the expected message type. This closes
the loophole of allowing DCB configuration on non-admin accounts, while
maintaining backward compatibility.

Fixes: 2f90b865 ("ixgbe: this patch adds support for DCB to the kernel and ixgbe driver")
Fixes: 826f328e ("net: dcb: Validate netlink message in DCB handler")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/a3edcfda0825f2aa2591801c5232f2bbf2d8a554.1610384801.git.me@pmachata.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

df85bc14

Merge branch 'skb-frag-kmap_atomic-fixes' · 1ee527a7

Jakub Kicinski authored Jan 11, 2021

Willem de Bruijn says:

====================
skb frag: kmap_atomic fixes

skb frags may be backed by highmem and/or compound pages. Various
code calls kmap_atomic to safely access highmem pages. But this
needs additional care for compound pages. Fix a few issues:

patch 1 expect kmap mappings with CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
patch 2 fixes kmap_atomic + compound page support in skb_seq_read
patch 3 fixes kmap_atomic + compound page support in esp
====================

Link: https://lore.kernel.org/r/20210109221834.3459768-1-willemdebruijn.kernel@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

1ee527a7

esp: avoid unneeded kmap_atomic call · 9bd6b629

Willem de Bruijn authored Jan 09, 2021

esp(6)_output_head uses skb_page_frag_refill to allocate a buffer for
the esp trailer.

It accesses the page with kmap_atomic to handle highmem. But
skb_page_frag_refill can return compound pages, of which
kmap_atomic only maps the first underlying page.

skb_page_frag_refill does not return highmem, because flag
__GFP_HIGHMEM is not set. ESP uses it in the same manner as TCP.
That also does not call kmap_atomic, but directly uses page_address,
in skb_copy_to_page_nocache. Do the same for ESP.

This issue has become easier to trigger with recent kmap local
debugging feature CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP.

Fixes: cac2661c ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f ("esp6: Avoid skb_cow_data whenever possible")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

9bd6b629

net: compound page support in skb_seq_read · 97550f6f

Willem de Bruijn authored Jan 09, 2021

skb_seq_read iterates over an skb, returning pointer and length of
the next data range with each call.

It relies on kmap_atomic to access highmem pages when needed.

An skb frag may be backed by a compound page, but kmap_atomic maps
only a single page. There are not enough kmap slots to always map all
pages concurrently.

Instead, if kmap_atomic is needed, iterate over each page.

As this increases the number of calls, avoid this unless needed.
The necessary condition is captured in skb_frag_must_loop.

I tried to make the change as obvious as possible. It should be easy
to verify that nothing changes if skb_frag_must_loop returns false.

Tested:
  On an x86 platform with
    CONFIG_HIGHMEM=y
    CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP=y
    CONFIG_NETFILTER_XT_MATCH_STRING=y

  Run
    ip link set dev lo mtu 1500
    iptables -A OUTPUT -m string --string 'badstring' -algo bm -j ACCEPT
    dd if=/dev/urandom of=in bs=1M count=20
    nc -l -p 8000 > /dev/null &
    nc -w 1 -q 0 localhost 8000 < in
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

97550f6f

net: support kmap_local forced debugging in skb_frag_foreach · 29766bcf

Willem de Bruijn authored Jan 09, 2021

Skb frags may be backed by highmem and/or compound pages. Highmem
pages need kmap_atomic mappings to access. But kmap_atomic maps a
single page, not the entire compound page.

skb_foreach_page iterates over an skb frag, in one step in the common
case, page by page only if kmap_atomic must be called for each page.
The decision logic is captured in skb_frag_must_loop.

CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP extends kmap from highmem to all
pages, to increase code coverage.

Extend skb_frag_must_loop to this new condition.

Link: https://lore.kernel.org/linux-mm/20210106180132.41dc249d@gandalf.local.home/
Fixes: 0e91a0c6 ("mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP")
Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

29766bcf

rndis_host: set proper input size for OID_GEN_PHYSICAL_MEDIUM request · e56b3d94

Andrey Zhizhikin authored Jan 08, 2021

MSFT ActiveSync implementation requires that the size of the response for
incoming query is to be provided in the request input length. Failure to
set the input size proper results in failed request transfer, where the
ActiveSync counterpart reports the NDIS_STATUS_INVALID_LENGTH (0xC0010014L)
error.

Set the input size for OID_GEN_PHYSICAL_MEDIUM query to the expected size
of the response in order for the ActiveSync to properly respond to the
request.

Fixes: 039ee17d ("rndis_host: Add RNDIS physical medium checking into generic_rndis_bind()")
Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
Link: https://lore.kernel.org/r/20210108095839.3335-1-andrey.zhizhikin@leica-geosystems.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e56b3d94

net: mvpp2: Remove Pause and Asym_Pause support · 6f83802a

Stefan Chulski authored Jan 10, 2021

Packet Processor hardware not connected to MAC flow control unit and
cannot support TX flow control.
This patch disable flow control support.

Fixes: 3f518509 ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/r/1610306582-16641-1-git-send-email-stefanc@marvell.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

6f83802a

11 Jan, 2021 1 commit

dt-bindings: net: dwmac: fix queue priority documentation · 93828834

Seb Laveze authored Jan 11, 2021

The priority field is not the queue priority (queue priority is fixed)
but a bitmask of priorities assigned to this queue.

In receive, priorities relate to tagged frames priorities.

In transmit, priorities relate to PFC frames.
Signed-off-by: Seb Laveze <sebastien.laveze@nxp.com>
Link: https://lore.kernel.org/r/20210111081406.1348622-1-sebastien.laveze@oss.nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

93828834

10 Jan, 2021 4 commits

dt-bindings: net: renesas,etheravb: RZ/G2H needs tx-internal-delay-ps · f97844f9

Geert Uytterhoeven authored Jan 05, 2021

The merge resolution of the interaction of commits 307eea32
("dt-bindings: net: renesas,ravb: Add support for r8a774e1 SoC") and
d7adf633 ("dt-bindings: net: renesas,etheravb: Convert to
json-schema") missed that "tx-internal-delay-ps" should be a required
property on RZ/G2H.

Fixes: 8b0308fe ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20210105151516.1540653-1-geert+renesas@glider.beSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f97844f9

Merge branch 'mlxsw-core-thermal-control-fixes' · 26c49f0d

Jakub Kicinski authored Jan 09, 2021

Ido Schimmel says:

====================
mlxsw: core: Thermal control fixes

This series includes two fixes for thermal control in mlxsw.

Patch #1 validates that the alarm temperature threshold read from a
transceiver is above the warning temperature threshold. If not, the
current thresholds are maintained. It was observed that some transceiver
might be unreliable and sometimes report a too low alarm temperature
threshold which would result in thermal shutdown of the system.

Patch #2 increases the temperature threshold above which thermal
shutdown is triggered for the ASIC thermal zone. It is currently too low
and might result in thermal shutdown under perfectly fine operational
conditions.
====================

Link: https://lore.kernel.org/r/20210108145210.1229820-1-idosch@idosch.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

26c49f0d

mlxsw: core: Increase critical threshold for ASIC thermal zone · b06ca3d5

Vadim Pasternak authored Jan 08, 2021

Increase critical threshold for ASIC thermal zone from 110C to 140C
according to the system hardware requirements. All the supported ASICs
(Spectrum-1, Spectrum-2, Spectrum-3) could be still operational with ASIC
temperature below 140C. With the old critical threshold value system
can perform unjustified shutdown.

All the systems equipped with the above ASICs implement thermal
protection mechanism at firmware level and firmware could decide to
perform system thermal shutdown in case the temperature is below 140C.
So with the new threshold system will not meltdown, while thermal
operating range will be aligned with hardware abilities.

Fixes: 41e76084 ("mlxsw: core: Replace thermal temperature trips with defines")
Fixes: a50c1e35 ("mlxsw: core: Implement thermal zone")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b06ca3d5

mlxsw: core: Add validation of transceiver temperature thresholds · 57726ebe

Vadim Pasternak authored Jan 08, 2021

Validate thresholds to avoid a single failure due to some transceiver
unreliability. Ignore the last readouts in case warning temperature is
above alarm temperature, since it can cause unexpected thermal
shutdown. Stay with the previous values and refresh threshold within
the next iteration.

This is a rare scenario, but it was observed at a customer site.

Fixes: 6a79507c ("mlxsw: core: Extend thermal module with per QSFP module thermal zones")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

57726ebe

09 Jan, 2021 12 commits

tipc: fix NULL deref in tipc_link_xmit() · b7741344

Hoang Le authored Jan 08, 2021

The buffer list can have zero skb as following path:
tipc_named_node_up()->tipc_node_xmit()->tipc_link_xmit(), so
we need to check the list before casting an &sk_buff.

Fault report:
 [] tipc: Bulk publication failure
 [] general protection fault, probably for non-canonical [#1] PREEMPT [...]
 [] KASAN: null-ptr-deref in range [0x00000000000000c8-0x00000000000000cf]
 [] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 5.10.0-rc4+ #2
 [] Hardware name: Bochs ..., BIOS Bochs 01/01/2011
 [] RIP: 0010:tipc_link_xmit+0xc1/0x2180
 [] Code: 24 b8 00 00 00 00 4d 39 ec 4c 0f 44 e8 e8 d7 0a 10 f9 48 [...]
 [] RSP: 0018:ffffc90000006ea0 EFLAGS: 00010202
 [] RAX: dffffc0000000000 RBX: ffff8880224da000 RCX: 1ffff11003d3cc0d
 [] RDX: 0000000000000019 RSI: ffffffff886007b9 RDI: 00000000000000c8
 [] RBP: ffffc90000007018 R08: 0000000000000001 R09: fffff52000000ded
 [] R10: 0000000000000003 R11: fffff52000000dec R12: ffffc90000007148
 [] R13: 0000000000000000 R14: 0000000000000000 R15: ffffc90000007018
 [] FS:  0000000000000000(0000) GS:ffff888037400000(0000) knlGS:000[...]
 [] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [] CR2: 00007fffd2db5000 CR3: 000000002b08f000 CR4: 00000000000006f0

Fixes: af9b028e ("tipc: make media xmit call outside node spinlock context")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Link: https://lore.kernel.org/r/20210108071337.3598-1-hoang.h.le@dektech.com.auSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b7741344

selftests/tls: fix selftests after adding ChaCha20-Poly1305 · 3502bd9b

Vadim Fedorenko authored Jan 09, 2021

TLS selftests where broken because of wrong variable types used.
Fix it by changing u16 -> uint16_t

Fixes: 4f336e88 ("selftests/tls: add CHACHA20-POLY1305 to tls selftests")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Link: https://lore.kernel.org/r/1610141865-7142-1-git-send-email-vfedorenko@novek.ruSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3502bd9b

net: ipv6: Validate GSO SKB before finish IPv6 processing · b210de4f

Aya Levin authored Jan 07, 2021

There are cases where GSO segment's length exceeds the egress MTU:
 - Forwarding of a TCP GRO skb, when DF flag is not set.
 - Forwarding of an skb that arrived on a virtualisation interface
   (virtio-net/vhost/tap) with TSO/GSO size set by other network
   stack.
 - Local GSO skb transmitted on an NETIF_F_TSO tunnel stacked over an
   interface with a smaller MTU.
 - Arriving GRO skb (or GSO skb in a virtualised environment) that is
   bridged to a NETIF_F_TSO tunnel stacked over an interface with an
   insufficient MTU.

If so:
 - Consume the SKB and its segments.
 - Issue an ICMP packet with 'Packet Too Big' message containing the
   MTU, allowing the source host to reduce its Path MTU appropriately.

Note: These cases are handled in the same manner in IPv4 output finish.
This patch aligns the behavior of IPv6 and the one of IPv4.

Fixes: 9e508490 ("netfilter: ipv6: move POSTROUTING invocation before fragmentation")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/1610027418-30438-1-git-send-email-ayal@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b210de4f

netxen_nic: fix MSI/MSI-x interrupts · a2bc221b

Manish Chopra authored Jan 07, 2021

For all PCI functions on the netxen_nic adapter, interrupt
mode (INTx or MSI) configuration is dependent on what has
been configured by the PCI function zero in the shared
interrupt register, as these adapters do not support mixed
mode interrupts among the functions of a given adapter.

Logic for setting MSI/MSI-x interrupt mode in the shared interrupt
register based on PCI function id zero check is not appropriate for
all family of netxen adapters, as for some of the netxen family
adapters PCI function zero is not really meant to be probed/loaded
in the host but rather just act as a management function on the device,
which caused all the other PCI functions on the adapter to always use
legacy interrupt (INTx) mode instead of choosing MSI/MSI-x interrupt mode.

This patch replaces that check with port number so that for all
type of adapters driver attempts for MSI/MSI-x interrupt modes.

Fixes: b37eb210 ("netxen_nic: Avoid mixed mode interrupts")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://lore.kernel.org/r/20210107101520.6735-1-manishc@marvell.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a2bc221b

Merge branch 'net-fix-issues-around-register_netdevice-failures' · c49243e8

Jakub Kicinski authored Jan 08, 2021

Jakub Kicinski says:

====================
net: fix issues around register_netdevice() failures

This series attempts to clean up the life cycle of struct
net_device. Dave has added dev->needs_free_netdev in the
past to fix double frees, we can lean on that mechanism
a little more to fix remaining issues with register_netdevice().

This is the next chapter of the saga which already includes:
commit 0e0eee24 ("net: correct error path in rtnl_newlink()")
commit e51fb152 ("rtnetlink: fix a memory leak when ->newlink fails")
commit cf124db5 ("net: Fix inconsistent teardown and release of private netdev state.")
commit 93ee31f1 ("[NET]: Fix free_netdev on register_netdev failure.")
commit 814152a8 ("net: fix memleak in register_netdevice()")
commit 10cc514f ("net: Fix null de-reference of device refcount")

The immediate problem which gets fixed here is that calling
free_netdev() right after unregister_netdevice() is illegal
because we need to release rtnl_lock first, to let the
unregistration finish. Note that unregister_netdevice() is
just a wrapper of unregister_netdevice_queue(), it only
does half of the job.

Where this limitation becomes most problematic is in failure
modes of register_netdevice(). There is a notifier call right
at the end of it, which lets other subsystems veto the entire
thing. At which point we should really go through a full
unregister_netdevice(), but we can't because callers may
go straight to free_netdev() after the failure, and that's
no bueno (see the previous paragraph).

This set makes free_netdev() more lenient, when device
is still being unregistered free_netdev() will simply set
dev->needs_free_netdev and let the unregister process do
the freeing.

With the free_netdev() problem out of the way failures in
register_netdevice() can make use of net_todo, again.
Users are still expected to call free_netdev() right after
failure but that will only set dev->needs_free_netdev.

To prevent the pathological case of:

 dev->needs_free_netdev = true;
 if (register_netdevice(dev)) {
   rtnl_unlock();
   free_netdev(dev);
 }

make register_netdevice()'s failure clear dev->needs_free_netdev.

Problems described above are only present with register_netdevice() /
unregister_netdevice(). We have two parallel APIs for registration
of devices:
 - those called outside rtnl_lock (register_netdev(), and
   unregister_netdev());
 - and those to be used under rtnl_lock - register_netdevice()
   and unregister_netdevice().
The former is trivial and has no problems. The alternative
approach to fix the latter would be to also separate the
freeing functions - i.e. add free_netdevice(). This has been
implemented (incl. converting all relevant calls in the tree)
but it feels a little unnecessary to put the burden of choosing
the right free_netdev{,ice}() call on the programmer when we
can "just do the right thing" by default.
====================

Link: https://lore.kernel.org/r/20210106184007.1821480-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c49243e8

net: make sure devices go through netdev_wait_all_refs · 766b0515

Jakub Kicinski authored Jan 06, 2021

If register_netdevice() fails at the very last stage - the
notifier call - some subsystems may have already seen it and
grabbed a reference. struct net_device can't be freed right
away without calling netdev_wait_all_refs().

Now that we have a clean interface in form of dev->needs_free_netdev
and lenient free_netdev() we can undo what commit 93ee31f1 ("[NET]:
Fix free_netdev on register_netdev failure.") has done and complete
the unregistration path by bringing the net_set_todo() call back.

After registration fails user is still expected to explicitly
free the net_device, so make sure ->needs_free_netdev is cleared,
otherwise rolling back the registration will cause the old double
free for callers who release rtnl_lock before the free.

This also solves the problem of priv_destructor not being called
on notifier error.

net_set_todo() will be moved back into unregister_netdevice_queue()
in a follow up.
Reported-by: Hulk Robot <hulkci@huawei.com>
Reported-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

766b0515

net: make free_netdev() more lenient with unregistering devices · c269a24c

Jakub Kicinski authored Jan 06, 2021

There are two flavors of handling netdev registration:
 - ones called without holding rtnl_lock: register_netdev() and
   unregister_netdev(); and
 - those called with rtnl_lock held: register_netdevice() and
   unregister_netdevice().

While the semantics of the former are pretty clear, the same can't
be said about the latter. The netdev_todo mechanism is utilized to
perform some of the device unregistering tasks and it hooks into
rtnl_unlock() so the locked variants can't actually finish the work.
In general free_netdev() does not mix well with locked calls. Most
drivers operating under rtnl_lock set dev->needs_free_netdev to true
and expect core to make the free_netdev() call some time later.

The part where this becomes most problematic is error paths. There is
no way to unwind the state cleanly after a call to register_netdevice(),
since unreg can't be performed fully without dropping locks.

Make free_netdev() more lenient, and defer the freeing if device
is being unregistered. This allows error paths to simply call
free_netdev() both after register_netdevice() failed, and after
a call to unregister_netdevice() but before dropping rtnl_lock.

Simplify the error paths which are currently doing gymnastics
around free_netdev() handling.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c269a24c

docs: net: explain struct net_device lifetime · 2b446e65

Jakub Kicinski authored Jan 06, 2021

Explain the two basic flows of struct net_device's operation.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2b446e65

ppp: fix refcount underflow on channel unbridge · c1787ffd

Tom Parkin authored Jan 07, 2021

When setting up a channel bridge, ppp_bridge_channels sets the
pch->bridge field before taking the associated reference on the bridge
file instance.

This opens up a refcount underflow bug if ppp_bridge_channels called
via. iotcl runs concurrently with ppp_unbridge_channels executing via.
file release.

The bug is triggered by ppp_bridge_channels taking the error path
through the 'err_unset' label. In this scenario, pch->bridge is set,
but the reference on the bridged channel will not be taken because
the function errors out. If ppp_unbridge_channels observes pch->bridge
before it is unset by the error path, it will erroneously drop the
reference on the bridged channel and cause a refcount underflow.

To avoid this, ensure that ppp_bridge_channels holds a reference on
each channel in advance of setting the bridge pointers.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Fixes: 4cf476ce ("ppp: add PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN ioctls")
Acked-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/20210107181315.3128-1-tparkin@katalix.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c1787ffd

udp: Prevent reuseport_select_sock from reading uninitialized socks · fd2ddef0

Baptiste Lepers authored Jan 07, 2021

reuse->socks[] is modified concurrently by reuseport_add_sock. To
prevent reading values that have not been fully initialized, only read
the array up until the last known safe index instead of incorrectly
re-reading the last index of the array.

Fixes: acdcecc6 ("udp: correct reuseport selection with connected sockets")
Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20210107051110.12247-1-baptiste.lepers@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

fd2ddef0

net: fix use-after-free when UDP GRO with shared fraglist · 53475c5d

Dongseok Yi authored Jan 08, 2021

skbs in fraglist could be shared by a BPF filter loaded at TC. If TC
writes, it will call skb_ensure_writable -> pskb_expand_head to create
a private linear section for the head_skb. And then call
skb_clone_fraglist -> skb_get on each skb in the fraglist.

skb_segment_list overwrites part of the skb linear section of each
fragment itself. Even after skb_clone, the frag_skbs share their
linear section with their clone in PF_PACKET.

Both sk_receive_queue of PF_PACKET and PF_INET (or PF_INET6) can have
a link for the same frag_skbs chain. If a new skb (not frags) is
queued to one of the sk_receive_queue, multiple ptypes can see and
release this. It causes use-after-free.

[ 4443.426215] ------------[ cut here ]------------
[ 4443.426222] refcount_t: underflow; use-after-free.
[ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190
refcount_dec_and_test_checked+0xa4/0xc8
[ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO)
[ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8
[ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8
[ 4443.426808] Call trace:
[ 4443.426813]  refcount_dec_and_test_checked+0xa4/0xc8
[ 4443.426823]  skb_release_data+0x144/0x264
[ 4443.426828]  kfree_skb+0x58/0xc4
[ 4443.426832]  skb_queue_purge+0x64/0x9c
[ 4443.426844]  packet_set_ring+0x5f0/0x820
[ 4443.426849]  packet_setsockopt+0x5a4/0xcd0
[ 4443.426853]  __sys_setsockopt+0x188/0x278
[ 4443.426858]  __arm64_sys_setsockopt+0x28/0x38
[ 4443.426869]  el0_svc_common+0xf0/0x1d0
[ 4443.426873]  el0_svc_handler+0x74/0x98
[ 4443.426880]  el0_svc+0x8/0xc

Fixes: 3a1296a3 (net: Support GRO/GSO fraglist chaining.)
Signed-off-by: Dongseok Yi <dseok.yi@samsung.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/1610072918-174177-1-git-send-email-dseok.yi@samsung.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

53475c5d

net: ipa: modem: add missing SET_NETDEV_DEV() for proper sysfs links · afba9dc1

Stephan Gerhold authored Jan 06, 2021

At the moment it is quite hard to identify the network interface
provided by IPA in userspace components: The network interface is
created as virtual device, without any link to the IPA device.
The interface name ("rmnet_ipa%d") is the only indication that the
network interface belongs to IPA, but this is not very reliable.

Add SET_NETDEV_DEV() to associate the network interface with the
IPA parent device. This allows userspace services like ModemManager
to properly identify that this network interface is provided by IPA
and belongs to the modem.

Cc: Alex Elder <elder@kernel.org>
Fixes: a646d6ec ("soc: qcom: ipa: modem and microcontroller")
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Link: https://lore.kernel.org/r/20210106100755.56800-1-stephan@gerhold.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

afba9dc1

08 Jan, 2021 3 commits

Merge tag 'net-5.11-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 6279d812

Linus Torvalds authored Jan 08, 2021

Pull more networking fixes from Jakub Kicinski:
 "Slightly lighter pull request to get back into the Thursday cadence.

  Current release - always broken:

   - can: mcp251xfd: fix Tx/Rx ring buffer driver race conditions

   - dsa: hellcreek: fix led_classdev build errors

  Previous releases - regressions:

   - ipv6: fib: flush exceptions when purging route to avoid netdev
     reference leak

   - ip_tunnels: fix pmtu check in nopmtudisc mode

   - ip: always refragment ip defragmented packets to avoid MTU issues
     when forwarding through tunnels, correct "packet too big" message
     is prohibitively tricky to generate

   - s390/qeth: fix locking for discipline setup / removal and during
     recovery to prevent both deadlocks and races

   - mlx5: Use port_num 1 instead of 0 when delete a RoCE address

  Previous releases - always broken:

   - cdc_ncm: correct overhead calculation in delayed_ndp_size to
     prevent out of bound accesses with Huawei 909s-120 LTE module

   - fix stmmac dwmac-sun8i suspend/resume:
           - PHY being left powered off
           - MAC syscon configuration being reset
           - reference to the reset controller being improperly dropped

   - qrtr: fix null-ptr-deref in qrtr_ns_remove

   - can: tcan4x5x: fix bittiming const, use common bittiming from m_can
     driver

   - mlx5e: CT: Use per flow counter when CT flow accounting is enabled

   - mlx5e: Fix SWP offsets when vlan inserted by driver

  Misc:

   - bpf: Fix a task_iter bug caused by a bpf -> net merge conflict
     resolution

  And the usual many fixes to various error paths"

* tag 'net-5.11-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
  net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE
  s390/qeth: fix L2 header access in qeth_l3_osa_features_check()
  s390/qeth: fix locking for discipline setup / removal
  s390/qeth: fix deadlock during recovery
  selftests: fib_nexthops: Fix wrong mausezahn invocation
  nexthop: Bounce NHA_GATEWAY in FDB nexthop groups
  nexthop: Unlink nexthop group entry in error path
  nexthop: Fix off-by-one error in error path
  octeontx2-af: fix memory leak of lmac and lmac->name
  chtls: Fix chtls resources release sequence
  chtls: Added a check to avoid NULL pointer dereference
  chtls: Replace skb_dequeue with skb_peek
  chtls: Avoid unnecessary freeing of oreq pointer
  chtls: Fix panic when route to peer not configured
  chtls: Remove invalid set_tcb call
  chtls: Fix hardware tid leak
  net: ip: always refragment ip defragmented packets
  net: fix pmtu check in nopmtudisc mode
  selftests: netfilter: add selftest for ipip pmtu discovery with enabled connection tracking
  docs: octeontx2: tune rst markup
  ...

6279d812

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · ea1c87c1

Linus Torvalds authored Jan 08, 2021

Pull crypto fixes from Herbert Xu:
 "This fixes a functional bug in arm/chacha-neon as well as a potential
  buffer overflow in ecdh"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: ecdh - avoid buffer overflow in ecdh_set_secret()
  crypto: arm/chacha-neon - add missing counter increment

ea1c87c1

poll: fix performance regression due to out-of-line __put_user() · ef0ba055

Linus Torvalds authored Jan 07, 2021

The kernel test robot reported a -5.8% performance regression on the
"poll2" test of will-it-scale, and bisected it to commit d55564cf
("x86: Make __put_user() generate an out-of-line call").

I didn't expect an out-of-line __put_user() to matter, because no normal
core code should use that non-checking legacy version of user access any
more.  But I had overlooked the very odd poll() usage, which does a
__put_user() to update the 'revents' values of the poll array.

Now, Al Viro correctly points out that instead of updating just the
'revents' field, it would be much simpler to just copy the _whole_
pollfd entry, and then we could just use "copy_to_user()" on the whole
array of entries, the same way we use "copy_from_user()" a few lines
earlier to get the original values.

But that is not what we've traditionally done, and I worry that threaded
applications might be concurrently modifying the other fields of the
pollfd array.  So while Al's suggestion is simpler - and perhaps worth
trying in the future - this instead keeps the "just update revents"
model.

To fix the performance regression, use the modern "unsafe_put_user()"
instead of __put_user(), with the proper "user_write_access_begin()"
guarding in place. This improves code generation enormously.

Link: https://lore.kernel.org/lkml/20210107134723.GA28532@xsang-OptiPlex-9020/Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Laight <David.Laight@aculab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ef0ba055