1. 17 Mar, 2023 15 commits
  2. 16 Mar, 2023 15 commits
    • Merge branch 'virtio_net-xdp-bugs' · 04504793
      David S. Miller authored
      Xuan Zhuo says:
      
      ====================
      virtio_net: fix two bugs related to XDP
      
      This patch set fixes two bugs related to XDP.
      The two patches are independent of each other.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: free xdp shinfo frags when build_skb_from_xdp_buff() fails · 1a3bd6ea
      Xuan Zhuo authored
      build_skb_from_xdp_buff() may return NULL; in that case we need to
      free the frags of the xdp shinfo.
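The error-path rule above can be sketched in plain C. This is a hypothetical userspace model, not the virtio_net code: `struct fake_shinfo`, `fake_build_skb()` and `receive()` are illustrative names, and malloc'ed blocks stand in for frag pages. It shows that when the builder returns NULL, every frag still owned by the shinfo has to be released before bailing out.

```c
#include <stdlib.h>

/* Hypothetical stand-ins: MAX_FRAGS, struct fake_shinfo and
 * fake_build_skb() are illustrations, not virtio_net definitions. */
#define MAX_FRAGS 4

struct fake_shinfo {
    int nr_frags;
    void *frags[MAX_FRAGS];
};

/* Models build_skb_from_xdp_buff(): may fail and return NULL. */
static void *fake_build_skb(int should_fail)
{
    return should_fail ? NULL : malloc(64);
}

/* Release every fragment still owned by the shinfo. */
static void free_shinfo_frags(struct fake_shinfo *sh)
{
    for (int i = 0; i < sh->nr_frags; i++) {
        free(sh->frags[i]);
        sh->frags[i] = NULL;
    }
    sh->nr_frags = 0;
}

/* Returns the new "skb", or NULL after cleaning up the frags. */
static void *receive(struct fake_shinfo *sh, int should_fail)
{
    void *skb = fake_build_skb(should_fail);

    if (!skb) {
        /* Without this call the frag buffers would leak. */
        free_shinfo_frags(sh);
        return NULL;
    }
    return skb; /* on success the skb takes over the frags */
}
```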
      
      Fixes: fab89baf ("virtio-net: support multi-buffer xdp")
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: fix page_to_skb() miss headroom · fa0f1ba7
      Xuan Zhuo authored
      Because headroom is not passed to page_to_skb(), the shinfo ends up
      beyond the end of the buffer, and its frags are then changed by other
      users of that memory.
      
      [  157.724634] stack segment: 0000 [#1] PREEMPT SMP NOPTI
      [  157.725358] CPU: 3 PID: 679 Comm: xdp_pass_user_f Tainted: G            E      6.2.0+ #150
      [  157.726401] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/4
      [  157.727820] RIP: 0010:skb_release_data+0x11b/0x180
      [  157.728449] Code: 44 24 02 48 83 c3 01 39 d8 7e be 48 89 d8 48 c1 e0 04 41 80 7d 7e 00 49 8b 6c 04 30 79 0c 48 89 ef e8 89 b
      [  157.730751] RSP: 0018:ffffc90000178b48 EFLAGS: 00010202
      [  157.731383] RAX: 0000000000000010 RBX: 0000000000000001 RCX: 0000000000000000
      [  157.732270] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff888100dd0b00
      [  157.733117] RBP: 5d5d76010f6e2408 R08: ffff888100dd0b2c R09: 0000000000000000
      [  157.734013] R10: ffffffff82effd30 R11: 000000000000a14e R12: ffff88810981ffc0
      [  157.734904] R13: ffff888100dd0b00 R14: 0000000000000002 R15: 0000000000002310
      [  157.735793] FS:  00007f06121d9740(0000) GS:ffff88842fcc0000(0000) knlGS:0000000000000000
      [  157.736794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  157.737522] CR2: 00007ffd9a56c084 CR3: 0000000104bda001 CR4: 0000000000770ee0
      [  157.738420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  157.739283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  157.740146] PKRU: 55555554
      [  157.740502] Call Trace:
      [  157.740843]  <IRQ>
      [  157.741117]  kfree_skb_reason+0x50/0x120
      [  157.741613]  __udp4_lib_rcv+0x52b/0x5e0
      [  157.742132]  ip_protocol_deliver_rcu+0xaf/0x190
      [  157.742715]  ip_local_deliver_finish+0x77/0xa0
      [  157.743280]  ip_sublist_rcv_finish+0x80/0x90
      [  157.743834]  ip_list_rcv_finish.constprop.0+0x16f/0x190
      [  157.744493]  ip_list_rcv+0x126/0x140
      [  157.744952]  __netif_receive_skb_list_core+0x29b/0x2c0
      [  157.745602]  __netif_receive_skb_list+0xed/0x160
      [  157.746190]  ? udp4_gro_receive+0x275/0x350
      [  157.746732]  netif_receive_skb_list_internal+0xf2/0x1b0
      [  157.747398]  napi_gro_receive+0xd1/0x210
      [  157.747911]  virtnet_receive+0x75/0x1c0
      [  157.748422]  virtnet_poll+0x48/0x1b0
      [  157.748878]  __napi_poll+0x29/0x1b0
      [  157.749330]  net_rx_action+0x27a/0x340
      [  157.749812]  __do_softirq+0xf3/0x2fb
      [  157.750298]  do_softirq+0xa2/0xd0
      [  157.750745]  </IRQ>
      [  157.751563]  <TASK>
      [  157.752329]  __local_bh_enable_ip+0x6d/0x80
      [  157.753178]  virtnet_xdp_set+0x482/0x860
      [  157.754159]  ? __pfx_virtnet_xdp+0x10/0x10
      [  157.755129]  dev_xdp_install+0xa4/0xe0
      [  157.756033]  dev_xdp_attach+0x20b/0x5e0
      [  157.756933]  do_setlink+0x82e/0xc90
      [  157.757777]  ? __nla_validate_parse+0x12b/0x1e0
      [  157.758744]  rtnl_setlink+0xd8/0x170
      [  157.759549]  ? mod_objcg_state+0xcb/0x320
      [  157.760328]  ? security_capable+0x37/0x60
      [  157.761209]  ? security_capable+0x37/0x60
      [  157.762072]  rtnetlink_rcv_msg+0x145/0x3d0
      [  157.762929]  ? ___slab_alloc+0x327/0x610
      [  157.763754]  ? __alloc_skb+0x141/0x170
      [  157.764533]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
      [  157.765422]  netlink_rcv_skb+0x58/0x110
      [  157.766229]  netlink_unicast+0x21f/0x330
      [  157.766951]  netlink_sendmsg+0x240/0x4a0
      [  157.767654]  sock_sendmsg+0x93/0xa0
      [  157.768434]  ? sockfd_lookup_light+0x12/0x70
      [  157.769245]  __sys_sendto+0xfe/0x170
      [  157.770079]  ? handle_mm_fault+0xe9/0x2d0
      [  157.770859]  ? preempt_count_add+0x51/0xa0
      [  157.771645]  ? up_read+0x3c/0x80
      [  157.772340]  ? do_user_addr_fault+0x1e9/0x710
      [  157.773166]  ? kvm_read_and_reset_apf_flags+0x49/0x60
      [  157.774087]  __x64_sys_sendto+0x29/0x30
      [  157.774856]  do_syscall_64+0x3c/0x90
      [  157.775518]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [  157.776382] RIP: 0033:0x7f06122def70
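A rough arithmetic model of the overrun, under stated assumptions: this is a hypothetical sketch, not the driver code. It assumes the packet data starts `real_headroom` bytes into a `truesize`-byte buffer, and that a build_skb()-style helper treats `data - passed_headroom` as the buffer start and places the shared info in the last bytes of a `truesize`-byte region from there. The helper name `shinfo_overrun()` is illustrative.

```c
/* Hypothetical model: returns how many bytes the region that ends
 * with the shinfo runs past the real end of the buffer. */
static long shinfo_overrun(long real_headroom, long passed_headroom,
                           long truesize)
{
    /* Assumed buffer start, as an offset from the real start. */
    long assumed_start = real_headroom - passed_headroom;
    /* The shinfo occupies the tail of [assumed_start, region_end). */
    long region_end = assumed_start + truesize;

    return region_end > truesize ? region_end - truesize : 0;
}
```

With the headroom passed in, the overrun is 0; with a passed headroom of 0, the shinfo lands exactly `real_headroom` bytes past the buffer, in memory that other code may own.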
      
      Fixes: 18117a84 ("virtio-net: remove xdp related info from page_to_skb()")
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Use of_property_read_bool() for boolean properties · 1a87e641
      Rob Herring authored
      It is preferred to use typed property access functions (i.e.
      of_property_read_<type> functions) rather than low-level
      of_get_property/of_find_property functions for reading properties.
      Convert reading boolean properties to of_property_read_bool().
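A boolean device-tree property carries no value; it is true when present. of_property_read_bool() reports whether the property exists, which is what open-coded of_find_property()/of_get_property() checks compute. The sketch below is a hypothetical userspace mock of that idea: `struct mock_property`, `struct mock_node`, `mock_find_property()` and `mock_read_bool()` are illustrations, not the kernel's `<linux/of.h>` types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mock of a device-tree node, not the kernel's types. */
struct mock_property {
    const char *name;
    const void *value;
    int length;          /* boolean properties have no value: 0 */
};

struct mock_node {
    const struct mock_property *props;
    size_t nprops;
};

/* Low-level lookup: returns the raw property or NULL. */
static const struct mock_property *
mock_find_property(const struct mock_node *np, const char *name)
{
    for (size_t i = 0; i < np->nprops; i++)
        if (strcmp(np->props[i].name, name) == 0)
            return &np->props[i];
    return NULL;
}

/* Typed accessor: a boolean property is true iff it is present. */
static bool mock_read_bool(const struct mock_node *np, const char *name)
{
    return mock_find_property(np, name) != NULL;
}
```

The typed accessor states intent at the call site and hides the pointer-returning lookup, which is the reason the commit gives for preferring it.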
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can
      Acked-by: Kalle Valo <kvalo@kernel.org>
      Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
      Acked-by: Francois Romieu <romieu@fr.zoreil.com>
      Reviewed-by: Wei Fang <wei.fang@nxp.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-dsa-marvell-mtu-reporting' · 65d63e82
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Fix MTU reporting for Marvell DSA switches where we can't change it
      
      As explained in patch 2, the driver doesn't know how to change the MTU
      on MV88E6165, MV88E6191, MV88E6220, MV88E6250 and MV88E6290, and there
      is a regression where it actually reports an MTU value below the
      Ethernet standard (1500).
      
      Fixing that shows another issue where DSA is unprepared to be told that
      a switch supports an MTU of only 1500, and still errors out. That is
      addressed by patch 1.
      
      Testing was not done on "real" hardware, but on a different Marvell DSA
      switch, with code modified such that the driver doesn't know how to
      change the MTU on that, either.
      
      A key assumption is that these switches don't need any MTU configuration
      to pass full MTU-sized, DSA-tagged packets, which seems like a
      reasonable assumption to make. My 6390 and 6190 switches, with
      .port_set_jumbo_size commented out, certainly don't seem to have any
      problem passing MTU-sized traffic, as can be seen in this iperf3 session
      captured with tcpdump on the DSA master:
      
      $MAC > $MAC, Marvell DSA mode Forward, dev 2, port 8, untagged, VID 1000,
      	FPri 0, ethertype IPv4 (0x0800), length 1518:
      	10.0.0.69.49590 > 10.0.0.1.5201: Flags [.], seq 81088:82536,
      	ack 1, win 502, options [nop,nop,TS val 2221498829 ecr 3012859850],
      	length 1448
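The capture above can be checked with a little arithmetic. This sketch assumes a 20-byte IPv4 header without options, a 20-byte TCP header plus 12 bytes of options (nop, nop, timestamp), a 14-byte Ethernet header, a 4-byte Marvell DSA tag, and no FCS in the capture. Under those assumptions the IP packet is exactly 1500 bytes, i.e. MTU-sized, and the tagged frame is the 1518 bytes tcpdump reports.

```c
/* Worked arithmetic for the tcpdump line above (assumed header sizes). */
enum {
    TCP_PAYLOAD = 1448,      /* "length 1448" */
    TCP_HDR     = 20 + 12,   /* base header + nop,nop,TS options */
    IPV4_HDR    = 20,
    ETH_HDR     = 14,        /* dst MAC + src MAC + ethertype */
    DSA_TAG     = 4,         /* Marvell DSA (not EDSA) tag */
};

/* Bytes counted against the interface MTU. */
static int ip_packet_len(void)
{
    return TCP_PAYLOAD + TCP_HDR + IPV4_HDR;
}

/* Bytes on the wire as seen on the DSA master, without FCS. */
static int tagged_frame_len(void)
{
    return ip_packet_len() + ETH_HDR + DSA_TAG;
}
```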
      
      I don't want to go all the way and say that the adjustment made by
      commit b9c587fe ("dsa: mv88e6xxx: Include tagger overhead when
      setting MTU for DSA and CPU ports") is completely unnecessary, just that
      there's an equally good chance that the switches with unknown MTU
      configuration procedure "just work".
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: fix max_mtu of 1492 on 6165, 6191, 6220, 6250, 6290 · 7e951737
      Vladimir Oltean authored
      There are 3 classes of switch families that the driver is aware of, as
      far as mv88e6xxx_change_mtu() is concerned:
      
      - MTU configuration is available per port. Here, the
        chip->info->ops->port_set_jumbo_size() method will be present.
      
      - MTU configuration is global to the switch. Here, the
        chip->info->ops->set_max_frame_size() method will be present.
      
      - We don't know how to change the MTU. Here, none of the above methods
        will be present.
      
      Switch families MV88E6165, MV88E6191, MV88E6220, MV88E6250 and MV88E6290
      fall in category 3.
      
      The blamed commit has adjusted the MTU for all 3 categories by EDSA_HLEN
      (8 bytes), resulting in a new maximum MTU of 1492 being reported by the
      driver for these switches.
      
      I don't have the hardware to test, but I do have a MV88E6390 switch on
      which I can simulate this by commenting out its .port_set_jumbo_size
      definition from mv88e6390_ops. The result is this set of messages at
      probe time:
      
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 1
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 2
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 3
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 4
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 5
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 6
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 7
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 8
      
      It is highly implausible that there exist Ethernet switches which don't
      support the standard MTU of 1500 octets, and this is what the DSA
      framework says as well - the error comes from dsa_slave_create() ->
      dsa_slave_change_mtu(slave_dev, ETH_DATA_LEN).
      
      But the error messages are alarming, and it would be good to suppress
      them.
      
      Since such switches are so unlikely to exist, we reimplement
      mv88e6xxx_get_max_mtu() and mv88e6xxx_change_mtu() on switches from the
      3rd category as follows: the maximum supported MTU is 1500, and any
      request to set the MTU to a value larger than that fails in
      dev_validate_mtu().
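The three categories and the reported maximum can be modeled as follows. This is a hypothetical sketch, not mv88e6xxx code: `struct switch_caps`, `max_mtu()` and `validate_mtu()` are illustrative, and `hw_max_mtu` is a placeholder for whatever limit categories 1 and 2 report (the fix does not change them). For category 3, the blamed commit's EDSA_HLEN adjustment gives 1500 - 8 = 1492, while the fix reports ETH_DATA_LEN; `validate_mtu()` only mimics the "larger than max_mtu is rejected" rule attributed to dev_validate_mtu().

```c
#include <errno.h>
#include <stdbool.h>

#define ETH_DATA_LEN 1500  /* standard Ethernet MTU */
#define EDSA_HLEN    8     /* EDSA tag overhead, as stated above */

/* Hypothetical capability flags mirroring the ops named above. */
struct switch_caps {
    bool has_port_set_jumbo_size;  /* category 1: per-port MTU */
    bool has_set_max_frame_size;   /* category 2: global MTU */
    int  hw_max_mtu;               /* placeholder for categories 1, 2 */
};

static int max_mtu(const struct switch_caps *c, bool fixed)
{
    if (c->has_port_set_jumbo_size || c->has_set_max_frame_size)
        return c->hw_max_mtu;      /* unchanged by the fix */
    /* Category 3: we don't know how to change the MTU. */
    return fixed ? ETH_DATA_LEN : ETH_DATA_LEN - EDSA_HLEN;
}

/* Core-style validation: reject anything above the reported maximum. */
static int validate_mtu(const struct switch_caps *c, int new_mtu)
{
    return new_mtu > max_mtu(c, true) ? -EINVAL : 0;
}
```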
      
      Fixes: b9c587fe ("dsa: mv88e6xxx: Include tagger overhead when setting MTU for DSA and CPU ports")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: don't error out when drivers return ETH_DATA_LEN in .port_max_mtu() · 636e8adf
      Vladimir Oltean authored
      Currently, when dsa_slave_change_mtu() is called on a user port where
      dev->max_mtu is 1500 (as returned by ds->ops->port_max_mtu()), the code
      will stumble upon this check:
      
      	if (new_master_mtu > mtu_limit)
      		return -ERANGE;
      
      because new_master_mtu is adjusted for the tagger overhead but mtu_limit
      is not.
      
      But it would be good if the logic went through, for example if the DSA
      master really depends on an MTU adjustment to accept DSA-tagged frames.
      
      To make the code pass through the check, we need to adjust mtu_limit for
      the overhead as well, if the minimum restriction was caused by the DSA
      user port's MTU (dev->max_mtu). A DSA user port MTU and a DSA master MTU
      are always offset by the protocol overhead.
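The check and its fix can be sketched as a small function. This is a hypothetical model, not dsa_slave_change_mtu(): `check_master_mtu()` and its parameters are illustrative, and the master's max_mtu of 1536 in the usage below is an assumed value. Before the fix, a user port capped at 1500 with a 4-byte tag needs a master MTU of 1504, which exceeds the unadjusted limit of 1500 and yields -ERANGE (errno 34 on Linux, matching the "-34" messages quoted for mv88e6xxx). Offsetting the user-port limit by the same overhead lets the request through.

```c
#include <errno.h>
#include <stdbool.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Hypothetical model of the quoted check. overhead is the tagging
 * protocol overhead in bytes; adjust_limit selects the fixed logic. */
static int check_master_mtu(int master_max_mtu, int user_max_mtu,
                            int new_mtu, int overhead, bool adjust_limit)
{
    int new_master_mtu = new_mtu + overhead;
    /* A limit that comes from the user port must be offset by the
     * same overhead as new_master_mtu to be comparable with it. */
    int user_limit = adjust_limit ? user_max_mtu + overhead
                                  : user_max_mtu;
    int mtu_limit = min_int(master_max_mtu, user_limit);

    if (new_master_mtu > mtu_limit)
        return -ERANGE;
    return 0;
}
```

For example, `check_master_mtu(1536, 1500, 1500, 4, false)` fails with -ERANGE, while the same call with `adjust_limit` set succeeds; a master that itself cannot exceed 1500 still fails, as it should.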
      
      Currently, no driver returns 1500 from .port_max_mtu(), but this is
      only temporary and a bug in itself: mv88e6xxx should have done that, but
      since commit b9c587fe ("dsa: mv88e6xxx: Include tagger overhead when
      setting MTU for DSA and CPU ports") it no longer does. This patch is a
      preparation for fixing that.
      
      Fixes: bfcb8132 ("net: dsa: configure the MTU for switch ports")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ice: xsk: disable txq irq before flushing hw · b830c964
      Maciej Fijalkowski authored
      ice_qp_dis() intends to stop a given queue pair that is a target of xsk
      pool attach/detach. One of the steps is to disable interrupts on these
      queues. This is currently broken: the txq irq is turned off *after* the
      HW flush, which in turn has no effect.
      
      ice_qp_dis():
      -> ice_qvec_dis_irq()
      --> disable rxq irq
      --> flush hw
      -> ice_vsi_stop_tx_ring()
      --> disable txq irq
      
      The splat below can be triggered by the following steps:
      - start xdpsock WITHOUT loading xdp prog
      - run xdp_rxq_info with XDP_TX action on this interface
      - start traffic
      - terminate xdpsock
      
      [  256.312485] BUG: kernel NULL pointer dereference, address: 0000000000000018
      [  256.319560] #PF: supervisor read access in kernel mode
      [  256.324775] #PF: error_code(0x0000) - not-present page
      [  256.329994] PGD 0 P4D 0
      [  256.332574] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [  256.337006] CPU: 3 PID: 32 Comm: ksoftirqd/3 Tainted: G           OE      6.2.0-rc5+ #51
      [  256.345218] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
      [  256.355807] RIP: 0010:ice_clean_rx_irq_zc+0x9c/0x7d0 [ice]
      [  256.361423] Code: b7 8f 8a 00 00 00 66 39 ca 0f 84 f1 04 00 00 49 8b 47 40 4c 8b 24 d0 41 0f b7 45 04 66 25 ff 3f 66 89 04 24 0f 84 85 02 00 00 <49> 8b 44 24 18 0f b7 14 24 48 05 00 01 00 00 49 89 04 24 49 89 44
      [  256.380463] RSP: 0018:ffffc900088bfd20 EFLAGS: 00010206
      [  256.385765] RAX: 000000000000003c RBX: 0000000000000035 RCX: 000000000000067f
      [  256.393012] RDX: 0000000000000775 RSI: 0000000000000000 RDI: ffff8881deb3ac80
      [  256.400256] RBP: 000000000000003c R08: ffff889847982710 R09: 0000000000010000
      [  256.407500] R10: ffffffff82c060c0 R11: 0000000000000004 R12: 0000000000000000
      [  256.414746] R13: ffff88811165eea0 R14: ffffc9000d255000 R15: ffff888119b37600
      [  256.421990] FS:  0000000000000000(0000) GS:ffff8897e0cc0000(0000) knlGS:0000000000000000
      [  256.430207] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  256.436036] CR2: 0000000000000018 CR3: 0000000005c0a006 CR4: 00000000007706e0
      [  256.443283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  256.450527] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  256.457770] PKRU: 55555554
      [  256.460529] Call Trace:
      [  256.463015]  <TASK>
      [  256.465157]  ? ice_xmit_zc+0x6e/0x150 [ice]
      [  256.469437]  ice_napi_poll+0x46d/0x680 [ice]
      [  256.473815]  ? _raw_spin_unlock_irqrestore+0x1b/0x40
      [  256.478863]  __napi_poll+0x29/0x160
      [  256.482409]  net_rx_action+0x136/0x260
      [  256.486222]  __do_softirq+0xe8/0x2e5
      [  256.489853]  ? smpboot_thread_fn+0x2c/0x270
      [  256.494108]  run_ksoftirqd+0x2a/0x50
      [  256.497747]  smpboot_thread_fn+0x1c1/0x270
      [  256.501907]  ? __pfx_smpboot_thread_fn+0x10/0x10
      [  256.506594]  kthread+0xea/0x120
      [  256.509785]  ? __pfx_kthread+0x10/0x10
      [  256.513597]  ret_from_fork+0x29/0x50
      [  256.517238]  </TASK>
      
      In fact, irqs were not disabled, and napi managed to be scheduled and
      run while the xsk_pool pointer was still valid but the SW ring of
      xdp_buff pointers had already been freed.
      
      To fix this, call ice_qvec_dis_irq() after ice_vsi_stop_tx_ring(). While
      at it, also remove the redundant ice_clean_rx_ring() call, since this is
      handled in ice_qp_clean_rings().
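The ordering problem can be modeled with two flags. This is a hypothetical sketch, not ice driver code: `struct qp_state`, `dis_rx_irq_and_flush()`, `stop_tx_ring()` and `teardown()` only echo the steps listed above. The flush records whether any queue interrupt was still enabled when it ran; in the old order the txq irq is still on at flush time, and in the fixed order both are off.

```c
#include <stdbool.h>

/* Hypothetical queue-pair state for the teardown sequence. */
struct qp_state {
    bool rx_irq_on;
    bool tx_irq_on;
    bool irq_on_at_flush;  /* set by the flush step */
};

/* Models ice_qvec_dis_irq(): disable rxq irq, then flush HW. */
static void dis_rx_irq_and_flush(struct qp_state *s)
{
    s->rx_irq_on = false;
    s->irq_on_at_flush = s->rx_irq_on || s->tx_irq_on;
}

/* Models ice_vsi_stop_tx_ring(): disables the txq irq. */
static void stop_tx_ring(struct qp_state *s)
{
    s->tx_irq_on = false;
}

/* Returns true if an irq was still enabled when the flush ran. */
static bool teardown(bool fixed_order)
{
    struct qp_state s = { true, true, false };

    if (fixed_order) {
        stop_tx_ring(&s);
        dis_rx_irq_and_flush(&s);
    } else {
        dis_rx_irq_and_flush(&s);
        stop_tx_ring(&s);
    }
    return s.irq_on_at_flush;
}
```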
      
      Fixes: 2d4238f5 ("ice: Add support for AF_XDP")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-virtio-vsock' · 1a200b51
      David S. Miller authored
    • test/vsock: copy to user failure test · 7e699d2a
      Arseniy Krasnov authored
      This adds SOCK_STREAM and SOCK_SEQPACKET tests for the invalid buffer
      case. The test first tries to read data into a NULL buffer (while data
      is already present in the socket's queue), then uses a valid buffer. For
      SOCK_STREAM, the second read must return data, because the skbuff is not
      dropped; for SOCK_SEQPACKET, the skbuff is dropped by the kernel, and
      'recv()' returns EAGAIN.
      Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio/vsock: don't drop skbuff on copy failure · 8daaf39f
      Arseniy Krasnov authored
      This restores the SOCK_STREAM read behaviour from before skbuffs were
      used. When copying to user space fails, the current skbuff is not
      dropped but remains in the socket's queue. Technically, 'skb_peek()' is
      called instead of 'skb_dequeue()', and once the skbuff becomes empty, it
      is removed from the queue by '__skb_unlink()'.
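The peek-then-unlink pattern can be sketched with a single queued buffer. This is a hypothetical model, not the vsock transport: `struct buf` and `stream_read()` are illustrative, `copy_ok` simulates whether the copy to user space succeeds, and clearing `queued` plays the role of '__skb_unlink()'. A failed copy consumes nothing, and the buffer leaves the queue only after its last byte has been read.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical queued buffer standing in for an skbuff. */
struct buf {
    char data[32];
    size_t len;   /* total bytes in the buffer */
    size_t off;   /* bytes already read */
    int queued;   /* 1 while still on the socket queue */
};

/* Peek-style stream read into dst (at most n bytes). */
static long stream_read(struct buf *b, char *dst, size_t n, int copy_ok)
{
    if (!b->queued)
        return 0;

    size_t avail = b->len - b->off;
    size_t chunk = n < avail ? n : avail;

    if (!copy_ok)
        return -EFAULT;   /* buffer stays queued, nothing consumed */

    memcpy(dst, b->data + b->off, chunk);
    b->off += chunk;
    if (b->off == b->len)
        b->queued = 0;    /* analogous to __skb_unlink() */
    return (long)chunk;
}
```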
      
      Fixes: 71dc9ec9 ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
      Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Acked-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio/vsock: remove redundant 'skb_pull()' call · 6825e6b4
      Arseniy Krasnov authored
      Since 'skb->len' is no longer used to update credit, there is no point
      in updating the skbuff state: the skbuff is used only once after dequeue,
      to copy data, and is then released.
      
      Fixes: 71dc9ec9 ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
      Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Acked-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio/vsock: don't use skbuff state to account credit · 07770616
      Arseniy Krasnov authored
      'skb->len' can vary when the data is read partially, which complicates
      the calculation of the credit to be updated in
      'virtio_transport_inc_rx_pkt()'/'virtio_transport_dec_rx_pkt()'.
      
      Also, 'virtio_transport_dec_rx_pkt()' miscalculated the credit, since
      'skb->len' was redundant.
      
      For these reasons, let's calculate the new 'rx_bytes'/'fwd_cnt' values
      from an explicit length passed as an input argument instead of from the
      skbuff state. This simplifies the code, because the skbuff state no
      longer has to be changed before each call that updates
      'rx_bytes'/'fwd_cnt'.
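Passing the length explicitly can be sketched as below. This is a hypothetical model, not the vsock code: `struct credit`, `inc_rx_pkt()` and `dec_rx_pkt()` are illustrative stand-ins, and the refusal on exceeding `buf_alloc` is an assumption about how received bytes are bounded. Because the consumed length is an argument, a buffer read in two partial chunks accounts exactly the same credit as one full read, with no skbuff state involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-socket credit state (the real one has more fields). */
struct credit {
    uint32_t buf_alloc;  /* receive buffer space */
    uint32_t rx_bytes;   /* bytes queued but not yet read */
    uint32_t fwd_cnt;    /* bytes consumed so far */
};

/* Account len freshly received bytes; refuse if over buffer space. */
static bool inc_rx_pkt(struct credit *c, uint32_t len)
{
    if (c->rx_bytes + len > c->buf_alloc)
        return false;
    c->rx_bytes += len;
    return true;
}

/* Account len bytes consumed by the reader, passed explicitly. */
static void dec_rx_pkt(struct credit *c, uint32_t len)
{
    c->rx_bytes -= len;
    c->fwd_cnt += len;
}
```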
      
      Fixes: 71dc9ec9 ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
      Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Acked-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol() · cd356010
      Vladimir Oltean authored
      Since the blamed commit, phy_ethtool_get_wol() and phy_ethtool_set_wol()
      acquire phydev->lock, but the mscc phy driver implementations,
      vsc85xx_wol_get() and vsc85xx_wol_set(), acquire the same lock as well,
      resulting in a deadlock.
      
      $ ip link set swp3 down
      ============================================
      WARNING: possible recursive locking detected
      mscc_felix 0000:00:00.5 swp3: Link is Down
      --------------------------------------------
      ip/375 is trying to acquire lock:
      ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: vsc85xx_wol_get+0x2c/0xf4
      
      but task is already holding lock:
      ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&dev->lock);
        lock(&dev->lock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by ip/375:
       #0: ffffd43b2a955788 (rtnl_mutex){+.+.}-{4:4}, at: rtnetlink_rcv_msg+0x144/0x58c
       #1: ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c
      
      Call trace:
       __mutex_lock+0x98/0x454
       mutex_lock_nested+0x2c/0x38
       vsc85xx_wol_get+0x2c/0xf4
       phy_ethtool_get_wol+0x50/0x6c
       phy_suspend+0x84/0xcc
       phy_state_machine+0x1b8/0x27c
       phy_stop+0x70/0x154
       phylink_stop+0x34/0xc0
       dsa_port_disable_rt+0x2c/0xa4
       dsa_slave_close+0x38/0xec
       __dev_close_many+0xc8/0x16c
       __dev_change_flags+0xdc/0x218
       dev_change_flags+0x24/0x6c
       do_setlink+0x234/0xea4
       __rtnl_newlink+0x46c/0x878
       rtnl_newlink+0x50/0x7c
       rtnetlink_rcv_msg+0x16c/0x58c
      
      Removing the mutex_lock(&phydev->lock) calls from the driver restores
      the functionality.
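The recursive-locking bug can be reproduced in userspace with a POSIX error-checking mutex, which returns EDEADLK instead of hanging when its owner locks it again. This is a hypothetical model: `phy_lock` stands in for phydev->lock, and `ethtool_get_wol()`, `wol_get_buggy()` and `wol_get_fixed()` only mirror the roles of the core wrapper and the driver callback before and after the fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Stands in for phydev->lock. */
static pthread_mutex_t phy_lock;

static void init_lock(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&phy_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Driver callback before the fix: takes the lock itself. */
static int wol_get_buggy(void)
{
    int err = pthread_mutex_lock(&phy_lock); /* EDEADLK: already held */

    if (err)
        return err;
    pthread_mutex_unlock(&phy_lock);
    return 0;
}

/* Driver callback after the fix: relies on the caller's lock. */
static int wol_get_fixed(void)
{
    return 0;
}

/* Core wrapper: holds the lock around the driver callback. */
static int ethtool_get_wol(int (*cb)(void))
{
    pthread_mutex_lock(&phy_lock);
    int ret = cb();
    pthread_mutex_unlock(&phy_lock);
    return ret;
}
```

With a default (non-error-checking) mutex the buggy path would simply block forever, which is the deadlock lockdep warns about.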
      
      Fixes: 2f987d48 ("net: phy: Add locks to ethtool functions")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20230314153025.2372970-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • veth: Fix use after free in XDP_REDIRECT · 7c101318
      Shawn Bohrer authored
      Commit 718a18a0 ("veth: Rework veth_xdp_rcv_skb in order
      to accept non-linear skb") introduced a bug where it tried to
      use pskb_expand_head() if the headroom was less than
      XDP_PACKET_HEADROOM. This, however, uses kmalloc to expand the head,
      which later allows consume_skb() to free the skb while it is still
      in use by AF_XDP.
      
      Previously, if the headroom was less than XDP_PACKET_HEADROOM, we
      went on to allocate a new skb from pages; this patch restores that
      behavior.
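The restored decision can be reduced to a headroom test. This is a hypothetical, simplified sketch, not veth_xdp_rcv_skb(): `enum head_strategy` and `pick_strategy()` are illustrative, and the real function is assumed to check further conditions as well. Only XDP_PACKET_HEADROOM (256) is the real kernel constant. The point is the branch taken when headroom is short: copy into a new page-backed skb, rather than expanding the kmalloc'ed head in place as the buggy version did.

```c
#define XDP_PACKET_HEADROOM 256  /* value of the kernel constant */

enum head_strategy {
    USE_SKB_AS_IS,     /* headroom already sufficient */
    COPY_TO_PAGE_SKB,  /* allocate a new page-backed skb and copy */
};

/* Simplified model: too little headroom means a page-backed copy. */
static enum head_strategy pick_strategy(unsigned int headroom)
{
    if (headroom < XDP_PACKET_HEADROOM)
        return COPY_TO_PAGE_SKB;
    return USE_SKB_AS_IS;
}
```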
      
      BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
      Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640
      
      CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G           O       6.1.4-cloudflare-kasan-2023.1.2 #1
      Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
      Call Trace:
        <TASK>
        dump_stack_lvl+0x34/0x48
        print_report+0x170/0x473
        ? __xsk_rcv+0x18d/0x2c0
        kasan_report+0xad/0x130
        ? __xsk_rcv+0x18d/0x2c0
        kasan_check_range+0x149/0x1a0
        memcpy+0x20/0x60
        __xsk_rcv+0x18d/0x2c0
        __xsk_map_redirect+0x1f3/0x490
        ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
        xdp_do_redirect+0x5ca/0xd60
        veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
        ? __netif_receive_skb_list_core+0x671/0x920
        ? veth_xdp+0x670/0x670 [veth]
        veth_xdp_rcv+0x304/0xa20 [veth]
        ? do_xdp_generic+0x150/0x150
        ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
        ? _raw_spin_lock_bh+0xe0/0xe0
        ? newidle_balance+0x887/0xe30
        ? __perf_event_task_sched_in+0xdb/0x800
        veth_poll+0x139/0x571 [veth]
        ? veth_xdp_rcv+0xa20/0xa20 [veth]
        ? _raw_spin_unlock+0x39/0x70
        ? finish_task_switch.isra.0+0x17e/0x7d0
        ? __switch_to+0x5cf/0x1070
        ? __schedule+0x95b/0x2640
        ? io_schedule_timeout+0x160/0x160
        __napi_poll+0xa1/0x440
        napi_threaded_poll+0x3d1/0x460
        ? __napi_poll+0x440/0x440
        ? __kthread_parkme+0xc6/0x1f0
        ? __napi_poll+0x440/0x440
        kthread+0x2a2/0x340
        ? kthread_complete_and_exit+0x20/0x20
        ret_from_fork+0x22/0x30
        </TASK>
      
      Freed by task 148640:
        kasan_save_stack+0x23/0x50
        kasan_set_track+0x21/0x30
        kasan_save_free_info+0x2a/0x40
        ____kasan_slab_free+0x169/0x1d0
        slab_free_freelist_hook+0xd2/0x190
        __kmem_cache_free+0x1a1/0x2f0
        skb_release_data+0x449/0x600
        consume_skb+0x9f/0x1c0
        veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
        veth_xdp_rcv+0x304/0xa20 [veth]
        veth_poll+0x139/0x571 [veth]
        __napi_poll+0xa1/0x440
        napi_threaded_poll+0x3d1/0x460
        kthread+0x2a2/0x340
        ret_from_fork+0x22/0x30
      
      The buggy address belongs to the object at ffff888976250000
        which belongs to the cache kmalloc-2k of size 2048
      The buggy address is located 340 bytes inside of
        2048-byte region [ffff888976250000, ffff888976250800)
      
      The buggy address belongs to the physical page:
      page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
      head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
      flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
      raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
      raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
        ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      > ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                        ^
        ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 718a18a0 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb")
      Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com>
      Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
      Acked-by: Toke Høiland-Jørgensen <toke@kernel.org>
      Link: https://lore.kernel.org/r/20230314153351.2201328-1-sbohrer@cloudflare.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 15 Mar, 2023 10 commits