1. 17 Mar, 2023 28 commits
  2. 16 Mar, 2023 12 commits
    • Merge branch 'virtio_net-xdp-bugs' · 04504793
      David S. Miller authored
      Xuan Zhuo says:
      
      ====================
      virtio_net: fix two bugs related to XDP
      
      This patch set fixes two bugs related to XDP.
      The two patches are independent of each other.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: free xdp shinfo frags when build_skb_from_xdp_buff() fails · 1a3bd6ea
      Xuan Zhuo authored
      build_skb_from_xdp_buff() may return NULL. In that case
      we need to free the frags of the xdp shinfo.
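The error path described here can be modelled in user space. The sketch below is not the driver code: `toy_xdp_buff`, `toy_skb`, `toy_build_skb()`, `toy_put_frags()` and `toy_receive()` are hypothetical names, and `malloc()`/`free()` stand in for page handling. It only illustrates the pattern of releasing the frags when the builder returns NULL.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_FRAGS 4

/* Hypothetical stand-in for an xdp_buff whose shinfo carries frags. */
struct toy_xdp_buff {
	void *frags[TOY_MAX_FRAGS];
	int nr_frags;
};

/* Hypothetical stand-in for the skb built from the buffer. */
struct toy_skb {
	int nr_frags;
};

/* Models a builder that may fail; `fail` forces the NULL return. */
static struct toy_skb *toy_build_skb(const struct toy_xdp_buff *xdp, int fail)
{
	struct toy_skb *skb;

	if (fail)
		return NULL;
	skb = malloc(sizeof(*skb));
	if (skb)
		skb->nr_frags = xdp->nr_frags;
	return skb;
}

/* Releases every frag still attached to the buffer; returns the count. */
static int toy_put_frags(struct toy_xdp_buff *xdp)
{
	int freed = 0;

	for (int i = 0; i < xdp->nr_frags; i++) {
		free(xdp->frags[i]);
		xdp->frags[i] = NULL;
		freed++;
	}
	xdp->nr_frags = 0;
	return freed;
}

/* On builder failure the frags are released instead of being leaked. */
static struct toy_skb *toy_receive(struct toy_xdp_buff *xdp, int fail, int *freed)
{
	struct toy_skb *skb = toy_build_skb(xdp, fail);

	*freed = skb ? 0 : toy_put_frags(xdp);
	return skb;
}
```

Calling `toy_receive()` with `fail` set returns NULL and reports how many frags were released; with `fail` clear, the caller keeps ownership of both the skb and the frags.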
      
      Fixes: fab89baf ("virtio-net: support multi-buffer xdp")
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: fix page_to_skb() miss headroom · fa0f1ba7
      Xuan Zhuo authored
      Because headroom is not passed to page_to_skb(), the shinfo is placed
      beyond the valid range, and its frags are then modified by another process.
      
      [  157.724634] stack segment: 0000 [#1] PREEMPT SMP NOPTI
      [  157.725358] CPU: 3 PID: 679 Comm: xdp_pass_user_f Tainted: G            E      6.2.0+ #150
      [  157.726401] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/4
      [  157.727820] RIP: 0010:skb_release_data+0x11b/0x180
      [  157.728449] Code: 44 24 02 48 83 c3 01 39 d8 7e be 48 89 d8 48 c1 e0 04 41 80 7d 7e 00 49 8b 6c 04 30 79 0c 48 89 ef e8 89 b
      [  157.730751] RSP: 0018:ffffc90000178b48 EFLAGS: 00010202
      [  157.731383] RAX: 0000000000000010 RBX: 0000000000000001 RCX: 0000000000000000
      [  157.732270] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff888100dd0b00
      [  157.733117] RBP: 5d5d76010f6e2408 R08: ffff888100dd0b2c R09: 0000000000000000
      [  157.734013] R10: ffffffff82effd30 R11: 000000000000a14e R12: ffff88810981ffc0
      [  157.734904] R13: ffff888100dd0b00 R14: 0000000000000002 R15: 0000000000002310
      [  157.735793] FS:  00007f06121d9740(0000) GS:ffff88842fcc0000(0000) knlGS:0000000000000000
      [  157.736794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  157.737522] CR2: 00007ffd9a56c084 CR3: 0000000104bda001 CR4: 0000000000770ee0
      [  157.738420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  157.739283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  157.740146] PKRU: 55555554
      [  157.740502] Call Trace:
      [  157.740843]  <IRQ>
      [  157.741117]  kfree_skb_reason+0x50/0x120
      [  157.741613]  __udp4_lib_rcv+0x52b/0x5e0
      [  157.742132]  ip_protocol_deliver_rcu+0xaf/0x190
      [  157.742715]  ip_local_deliver_finish+0x77/0xa0
      [  157.743280]  ip_sublist_rcv_finish+0x80/0x90
      [  157.743834]  ip_list_rcv_finish.constprop.0+0x16f/0x190
      [  157.744493]  ip_list_rcv+0x126/0x140
      [  157.744952]  __netif_receive_skb_list_core+0x29b/0x2c0
      [  157.745602]  __netif_receive_skb_list+0xed/0x160
      [  157.746190]  ? udp4_gro_receive+0x275/0x350
      [  157.746732]  netif_receive_skb_list_internal+0xf2/0x1b0
      [  157.747398]  napi_gro_receive+0xd1/0x210
      [  157.747911]  virtnet_receive+0x75/0x1c0
      [  157.748422]  virtnet_poll+0x48/0x1b0
      [  157.748878]  __napi_poll+0x29/0x1b0
      [  157.749330]  net_rx_action+0x27a/0x340
      [  157.749812]  __do_softirq+0xf3/0x2fb
      [  157.750298]  do_softirq+0xa2/0xd0
      [  157.750745]  </IRQ>
      [  157.751563]  <TASK>
      [  157.752329]  __local_bh_enable_ip+0x6d/0x80
      [  157.753178]  virtnet_xdp_set+0x482/0x860
      [  157.754159]  ? __pfx_virtnet_xdp+0x10/0x10
      [  157.755129]  dev_xdp_install+0xa4/0xe0
      [  157.756033]  dev_xdp_attach+0x20b/0x5e0
      [  157.756933]  do_setlink+0x82e/0xc90
      [  157.757777]  ? __nla_validate_parse+0x12b/0x1e0
      [  157.758744]  rtnl_setlink+0xd8/0x170
      [  157.759549]  ? mod_objcg_state+0xcb/0x320
      [  157.760328]  ? security_capable+0x37/0x60
      [  157.761209]  ? security_capable+0x37/0x60
      [  157.762072]  rtnetlink_rcv_msg+0x145/0x3d0
      [  157.762929]  ? ___slab_alloc+0x327/0x610
      [  157.763754]  ? __alloc_skb+0x141/0x170
      [  157.764533]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
      [  157.765422]  netlink_rcv_skb+0x58/0x110
      [  157.766229]  netlink_unicast+0x21f/0x330
      [  157.766951]  netlink_sendmsg+0x240/0x4a0
      [  157.767654]  sock_sendmsg+0x93/0xa0
      [  157.768434]  ? sockfd_lookup_light+0x12/0x70
      [  157.769245]  __sys_sendto+0xfe/0x170
      [  157.770079]  ? handle_mm_fault+0xe9/0x2d0
      [  157.770859]  ? preempt_count_add+0x51/0xa0
      [  157.771645]  ? up_read+0x3c/0x80
      [  157.772340]  ? do_user_addr_fault+0x1e9/0x710
      [  157.773166]  ? kvm_read_and_reset_apf_flags+0x49/0x60
      [  157.774087]  __x64_sys_sendto+0x29/0x30
      [  157.774856]  do_syscall_64+0x3c/0x90
      [  157.775518]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [  157.776382] RIP: 0033:0x7f06122def70
      
      Fixes: 18117a84 ("virtio-net: remove xdp related info from page_to_skb()")
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Use of_property_read_bool() for boolean properties · 1a87e641
      Rob Herring authored
      It is preferred to use typed property access functions (i.e.
      of_property_read_<type> functions) rather than low-level
      of_get_property/of_find_property functions for reading properties.
      Convert reading boolean properties to of_property_read_bool().
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can
      Acked-by: Kalle Valo <kvalo@kernel.org>
      Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
      Acked-by: Francois Romieu <romieu@fr.zoreil.com>
      Reviewed-by: Wei Fang <wei.fang@nxp.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-dsa-marvell-mtu-reporting' · 65d63e82
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Fix MTU reporting for Marvell DSA switches where we can't change it
      
      As explained in patch 2, the driver doesn't know how to change the MTU
      on MV88E6165, MV88E6191, MV88E6220, MV88E6250 and MV88E6290, and there
      is a regression where it actually reports an MTU value below the
      Ethernet standard (1500).
      
      Fixing that shows another issue where DSA is unprepared to be told that
      a switch supports an MTU of only 1500, and still errors out. That is
      addressed by patch 1.
      
      Testing was not done on "real" hardware, but on a different Marvell DSA
      switch, with code modified such that the driver doesn't know how to
      change the MTU on that, either.
      
      A key assumption is that these switches don't need any MTU configuration
      to pass full MTU-sized, DSA-tagged packets, which seems like a
      reasonable assumption to make. My 6390 and 6190 switches, with
      .port_set_jumbo_size commented out, certainly don't seem to have any
      problem passing MTU-sized traffic, as can be seen in this iperf3 session
      captured with tcpdump on the DSA master:
      
      $MAC > $MAC, Marvell DSA mode Forward, dev 2, port 8, untagged, VID 1000,
      	FPri 0, ethertype IPv4 (0x0800), length 1518:
      	10.0.0.69.49590 > 10.0.0.1.5201: Flags [.], seq 81088:82536,
      	ack 1, win 502, options [nop,nop,TS val 2221498829 ecr 3012859850],
      	length 1448
      
      I don't want to go all the way and say that the adjustment made by
      commit b9c587fe ("dsa: mv88e6xxx: Include tagger overhead when
      setting MTU for DSA and CPU ports") is completely unnecessary, just that
      there's an equally good chance that the switches with unknown MTU
      configuration procedure "just work".
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: fix max_mtu of 1492 on 6165, 6191, 6220, 6250, 6290 · 7e951737
      Vladimir Oltean authored
      There are 3 classes of switch families that the driver is aware of, as
      far as mv88e6xxx_change_mtu() is concerned:
      
      - MTU configuration is available per port. Here, the
        chip->info->ops->port_set_jumbo_size() method will be present.
      
      - MTU configuration is global to the switch. Here, the
        chip->info->ops->set_max_frame_size() method will be present.
      
      - We don't know how to change the MTU. Here, none of the above methods
        will be present.
      
      Switch families MV88E6165, MV88E6191, MV88E6220, MV88E6250 and MV88E6290
      fall in category 3.
      
      The blamed commit has adjusted the MTU for all 3 categories by EDSA_HLEN
      (8 bytes), resulting in a new maximum MTU of 1492 being reported by the
      driver for these switches.
      
      I don't have the hardware to test, but I do have a MV88E6390 switch on
      which I can simulate this by commenting out its .port_set_jumbo_size
      definition from mv88e6390_ops. The result is this set of messages at
      probe time:
      
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 1
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 2
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 3
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 4
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 5
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 6
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 7
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 8
      
      It is highly implausible that there exist Ethernet switches which don't
      support the standard MTU of 1500 octets, and this is what the DSA
      framework says as well - the error comes from dsa_slave_create() ->
      dsa_slave_change_mtu(slave_dev, ETH_DATA_LEN).
      
      But the error messages are alarming, and it would be good to suppress
      them.
      
      Since such switches are unlikely to exist, we reimplement
      mv88e6xxx_get_max_mtu() and mv88e6xxx_change_mtu() on switches from the
      3rd category as follows: the maximum supported MTU is 1500, and any
      request to set the MTU to a value larger than that fails in
      dev_validate_mtu().
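The arithmetic for the 3rd category can be worked through in a small user-space sketch. It is only a model of the numbers quoted in this message, not the driver: the `toy_*` functions are hypothetical, `ETH_DATA_LEN` and `EDSA_HLEN` are redefined locally with the values 1500 and 8, and `toy_validate_mtu()` is an assumed stand-in for an upper-bound check in the style of dev_validate_mtu().

```c
#include <errno.h>

#define ETH_DATA_LEN 1500 /* standard Ethernet payload size */
#define EDSA_HLEN    8    /* tagger overhead quoted in the commit message */

/* Category 3 before the fix: the tagger overhead was subtracted. */
static int toy_cat3_max_mtu_before(void)
{
	return ETH_DATA_LEN - EDSA_HLEN;
}

/* Category 3 after the fix: the standard MTU is reported as the maximum. */
static int toy_cat3_max_mtu_after(void)
{
	return ETH_DATA_LEN;
}

/* Hypothetical upper-bound check: rejects anything above max_mtu. */
static int toy_validate_mtu(int new_mtu, int max_mtu)
{
	return new_mtu > max_mtu ? -EINVAL : 0;
}
```

With the old value, a request for the standard 1500 exceeds the reported maximum of 1492 and is rejected; with the new value, 1500 is accepted and only larger requests are refused.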
      
      Fixes: b9c587fe ("dsa: mv88e6xxx: Include tagger overhead when setting MTU for DSA and CPU ports")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: don't error out when drivers return ETH_DATA_LEN in .port_max_mtu() · 636e8adf
      Vladimir Oltean authored
      Currently, when dsa_slave_change_mtu() is called on a user port where
      dev->max_mtu is 1500 (as returned by ds->ops->port_max_mtu()), the code
      will stumble upon this check:
      
      	if (new_master_mtu > mtu_limit)
      		return -ERANGE;
      
      because new_master_mtu is adjusted for the tagger overhead but mtu_limit
      is not.
      
      But it would be good if the logic went through, for example if the DSA
      master really depends on an MTU adjustment to accept DSA-tagged frames.
      
      To make the code pass through the check, we need to adjust mtu_limit for
      the overhead as well, if the minimum restriction was caused by the DSA
      user port's MTU (dev->max_mtu). A DSA user port MTU and a DSA master MTU
      are always offset by the protocol overhead.
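The effect of adjusting mtu_limit can be illustrated with a simplified model of the check. This is a sketch under assumptions, not the DSA code: the `toy_*` helpers and their parameters are hypothetical, a single user port is considered, and the tagger overhead is passed in as a plain integer.

```c
#include <errno.h>

static int toy_min(int a, int b)
{
	return a < b ? a : b;
}

/* Before: the limit ignores the overhead, the requested master MTU does not. */
static int toy_check_before(int master_max_mtu, int user_max_mtu,
			    int new_mtu, int overhead)
{
	int new_master_mtu = new_mtu + overhead;
	int mtu_limit = toy_min(master_max_mtu, user_max_mtu);

	return new_master_mtu > mtu_limit ? -ERANGE : 0;
}

/* After: the user port's limit is offset by the overhead as well. */
static int toy_check_after(int master_max_mtu, int user_max_mtu,
			   int new_mtu, int overhead)
{
	int new_master_mtu = new_mtu + overhead;
	int mtu_limit = toy_min(master_max_mtu, user_max_mtu + overhead);

	return new_master_mtu > mtu_limit ? -ERANGE : 0;
}
```

For a user port limited to 1500 behind a master that accepts 9000 bytes, an 8-byte overhead makes the old check fail for an MTU of 1500, while the adjusted check passes. If the master itself is limited to 1500, the adjusted check still returns -ERANGE, because the restriction then comes from the master rather than from the user port.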
      
      Currently no drivers return 1500 from .port_max_mtu(), but this is only
      temporary and a bug in itself - mv88e6xxx should have done that, but
      since commit b9c587fe ("dsa: mv88e6xxx: Include tagger overhead when
      setting MTU for DSA and CPU ports") it no longer does. This is a
      preparation for fixing that.
      
      Fixes: bfcb8132 ("net: dsa: configure the MTU for switch ports")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ice: xsk: disable txq irq before flushing hw · b830c964
      Maciej Fijalkowski authored
      ice_qp_dis() intends to stop a given queue pair that is a target of xsk
      pool attach/detach. One of the steps is to disable interrupts on these
      queues. This is currently broken: the txq irq is turned off *after* the
      HW flush, so the disable takes no effect.
      
      ice_qp_dis():
      -> ice_qvec_dis_irq()
      --> disable rxq irq
      --> flush hw
      -> ice_vsi_stop_tx_ring()
      --> disable txq irq
      
      The splat below can be triggered by the following steps:
      - start xdpsock WITHOUT loading xdp prog
      - run xdp_rxq_info with XDP_TX action on this interface
      - start traffic
      - terminate xdpsock
      
      [  256.312485] BUG: kernel NULL pointer dereference, address: 0000000000000018
      [  256.319560] #PF: supervisor read access in kernel mode
      [  256.324775] #PF: error_code(0x0000) - not-present page
      [  256.329994] PGD 0 P4D 0
      [  256.332574] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [  256.337006] CPU: 3 PID: 32 Comm: ksoftirqd/3 Tainted: G           OE      6.2.0-rc5+ #51
      [  256.345218] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
      [  256.355807] RIP: 0010:ice_clean_rx_irq_zc+0x9c/0x7d0 [ice]
      [  256.361423] Code: b7 8f 8a 00 00 00 66 39 ca 0f 84 f1 04 00 00 49 8b 47 40 4c 8b 24 d0 41 0f b7 45 04 66 25 ff 3f 66 89 04 24 0f 84 85 02 00 00 <49> 8b 44 24 18 0f b7 14 24 48 05 00 01 00 00 49 89 04 24 49 89 44
      [  256.380463] RSP: 0018:ffffc900088bfd20 EFLAGS: 00010206
      [  256.385765] RAX: 000000000000003c RBX: 0000000000000035 RCX: 000000000000067f
      [  256.393012] RDX: 0000000000000775 RSI: 0000000000000000 RDI: ffff8881deb3ac80
      [  256.400256] RBP: 000000000000003c R08: ffff889847982710 R09: 0000000000010000
      [  256.407500] R10: ffffffff82c060c0 R11: 0000000000000004 R12: 0000000000000000
      [  256.414746] R13: ffff88811165eea0 R14: ffffc9000d255000 R15: ffff888119b37600
      [  256.421990] FS:  0000000000000000(0000) GS:ffff8897e0cc0000(0000) knlGS:0000000000000000
      [  256.430207] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  256.436036] CR2: 0000000000000018 CR3: 0000000005c0a006 CR4: 00000000007706e0
      [  256.443283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  256.450527] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  256.457770] PKRU: 55555554
      [  256.460529] Call Trace:
      [  256.463015]  <TASK>
      [  256.465157]  ? ice_xmit_zc+0x6e/0x150 [ice]
      [  256.469437]  ice_napi_poll+0x46d/0x680 [ice]
      [  256.473815]  ? _raw_spin_unlock_irqrestore+0x1b/0x40
      [  256.478863]  __napi_poll+0x29/0x160
      [  256.482409]  net_rx_action+0x136/0x260
      [  256.486222]  __do_softirq+0xe8/0x2e5
      [  256.489853]  ? smpboot_thread_fn+0x2c/0x270
      [  256.494108]  run_ksoftirqd+0x2a/0x50
      [  256.497747]  smpboot_thread_fn+0x1c1/0x270
      [  256.501907]  ? __pfx_smpboot_thread_fn+0x10/0x10
      [  256.506594]  kthread+0xea/0x120
      [  256.509785]  ? __pfx_kthread+0x10/0x10
      [  256.513597]  ret_from_fork+0x29/0x50
      [  256.517238]  </TASK>
      
      In fact, irqs were not disabled and napi managed to be scheduled and run
      while xsk_pool pointer was still valid, but SW ring of xdp_buff pointers
      was already freed.
      
      To fix this, call ice_qvec_dis_irq() after ice_vsi_stop_tx_ring(). Also
      while at it, remove redundant ice_clean_rx_ring() call - this is handled
      in ice_qp_clean_rings().
      
      Fixes: 2d4238f5 ("ice: Add support for AF_XDP")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-virtio-vsock' · 1a200b51
      David S. Miller authored
    • test/vsock: copy to user failure test · 7e699d2a
      Arseniy Krasnov authored
      This adds SOCK_STREAM and SOCK_SEQPACKET tests for the invalid buffer
      case. The test first tries to read data into a NULL buffer (data is
      already present in the socket's queue), then uses a valid buffer. For
      SOCK_STREAM the second read must return the data, because the skbuff is
      not dropped; for SOCK_SEQPACKET the skbuff is dropped by the kernel, and
      'recv()' returns EAGAIN.
      Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio/vsock: don't drop skbuff on copy failure · 8daaf39f
      Arseniy Krasnov authored
      This restores the behaviour of SOCK_STREAM reads to what it was before
      skbuffs were used. When copying to user fails, the current skbuff is not
      dropped but stays in the socket's queue. Technically, 'skb_peek()' is
      called instead of 'skb_dequeue()', and when the skbuff becomes empty it
      is removed from the queue by '__skb_unlink()'.
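The peek-then-unlink idea can be sketched with a toy queue in user space. Everything here is an assumption made for illustration: `toy_queue`, `toy_buf`, `toy_push()`, `toy_copy()` and `toy_stream_read()` are hypothetical names, and a NULL destination plays the role of a failing copy to user.

```c
#include <stddef.h>
#include <string.h>

#define TOY_QLEN 4

struct toy_buf {
	const char *data;
	size_t len;
};

struct toy_queue {
	struct toy_buf bufs[TOY_QLEN];
	size_t head;
	size_t count;
};

/* Appends a NUL-terminated string as one buffer; -1 if the queue is full. */
static int toy_push(struct toy_queue *q, const char *data)
{
	size_t tail;

	if (q->count == TOY_QLEN)
		return -1;
	tail = (q->head + q->count) % TOY_QLEN;
	q->bufs[tail].data = data;
	q->bufs[tail].len = strlen(data);
	q->count++;
	return 0;
}

/* Stand-in for a copy to user: fails when the destination is NULL. */
static int toy_copy(char *dst, const char *src, size_t n)
{
	if (!dst)
		return -1;
	memcpy(dst, src, n);
	return 0;
}

/* Peeks at the head buffer and consumes bytes only after a good copy. */
static long toy_stream_read(struct toy_queue *q, char *dst, size_t n)
{
	size_t done = 0;

	while (done < n && q->count > 0) {
		struct toy_buf *b = &q->bufs[q->head]; /* peek, do not dequeue */
		size_t chunk = n - done < b->len ? n - done : b->len;

		if (toy_copy(dst ? dst + done : NULL, b->data, chunk) != 0)
			return done ? (long)done : -1; /* buffer stays queued */
		b->data += chunk;
		b->len -= chunk;
		done += chunk;
		if (b->len == 0) { /* unlink only once the buffer is empty */
			q->head = (q->head + 1) % TOY_QLEN;
			q->count--;
		}
	}
	return (long)done;
}
```

A read into a NULL destination fails and leaves the buffer queued, so a following read into a valid destination still returns the data; a dequeue-first design would have lost it.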
      
      Fixes: 71dc9ec9 ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
      Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Acked-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio/vsock: remove redundant 'skb_pull()' call · 6825e6b4
      Arseniy Krasnov authored
      Since we no longer use 'skb->len' to update credit, there is no point
      in updating the skbuff state: the skbuff is used only once after dequeue
      to copy data and is then released.
      
      Fixes: 71dc9ec9 ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
      Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Acked-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>