1. 27 Jun, 2022 5 commits
    • Florian Westphal's avatar
      netfilter: br_netfilter: do not skip all hooks with 0 priority · c2577862
      Florian Westphal authored
      When br_netfilter module is loaded, skbs may be diverted to the
      ipv4/ipv6 hooks, just like as if we were routing.
      
      Unfortunately, bridge filter hooks with priority 0 may be skipped
      in this case.
      
      Example:
      1. an nftables bridge ruleset is loaded, with a prerouting
         hook that has priority 0.
      2. interface is added to the bridge.
      3. no tcp packet is ever seen by the bridge prerouting hook.
      4. flush the ruleset
      5. load the bridge ruleset again.
      6. tcp packets are processed as expected.
      
      After 1) the only registered hook is the bridge prerouting hook, but its
      not called yet because the bridge hasn't been brought up yet.
      
      After 2), hook order is:
         0 br_nf_pre_routing // br_netfilter internal hook
         0 chain bridge f prerouting // nftables bridge ruleset
      
      The packet is diverted to br_nf_pre_routing.
      If call-iptables is off, the nftables bridge ruleset is called as expected.
      
      But if its enabled, br_nf_hook_thresh() will skip it because it assumes
      that all 0-priority hooks had been called previously in bridge context.
      
      To avoid this, check for the br_nf_pre_routing hook itself, we need to
      resume directly after it, even if this hook has a priority of 0.
      
      Unfortunately, this still results in different packet flow.
      With this fix, the eval order after in 3) is:
      1. br_nf_pre_routing
      2. ip(6)tables (if enabled)
      3. nftables bridge
      
      but after 5 its the much saner:
      1. nftables bridge
      2. br_nf_pre_routing
      3. ip(6)tables (if enabled)
      
      Unfortunately I don't see a solution here:
      It would be possible to move br_nf_pre_routing to a higher priority
      so that it will be called later in the pipeline, but this also impacts
      ebtables evaluation order, and would still result in this very ordering
      problem for all nftables-bridge hooks with the same priority as the
      br_nf_pre_routing one.
      
      Searching back through the git history I don't think this has
      ever behaved in any other way, hence, no fixes-tag.
      Reported-by: default avatarRadim Hrazdil <rhrazdil@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      c2577862
    • Florian Westphal's avatar
      netfilter: nf_tables: avoid skb access on nf_stolen · e34b9ed9
      Florian Westphal authored
      When verdict is NF_STOLEN, the skb might have been freed.
      
      When tracing is enabled, this can result in a use-after-free:
      1. access to skb->nf_trace
      2. access to skb->mark
      3. computation of trace id
      4. dump of packet payload
      
      To avoid 1, keep a cached copy of skb->nf_trace in the
      trace state struct.
      Refresh this copy whenever verdict is != STOLEN.
      
      Avoid 2 by skipping skb->mark access if verdict is STOLEN.
      
      3 is avoided by precomputing the trace id.
      
      Only dump the packet when verdict is not "STOLEN".
      Reported-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      e34b9ed9
    • Pablo Neira Ayuso's avatar
      netfilter: nft_dynset: restore set element counter when failing to update · 05907f10
      Pablo Neira Ayuso authored
      This patch fixes a race condition.
      
      nft_rhash_update() might fail for two reasons:
      
      - Element already exists in the hashtable.
      - Another packet won race to insert an entry in the hashtable.
      
      In both cases, new() has already bumped the counter via atomic_add_unless(),
      therefore, decrement the set element counter.
      
      Fixes: 22fe54d5 ("netfilter: nf_tables: add support for dynamic set updates")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      05907f10
    • Xin Long's avatar
      tipc: move bc link creation back to tipc_node_create · cb8092d7
      Xin Long authored
      Shuang Li reported a NULL pointer dereference crash:
      
        [] BUG: kernel NULL pointer dereference, address: 0000000000000068
        [] RIP: 0010:tipc_link_is_up+0x5/0x10 [tipc]
        [] Call Trace:
        []  <IRQ>
        []  tipc_bcast_rcv+0xa2/0x190 [tipc]
        []  tipc_node_bc_rcv+0x8b/0x200 [tipc]
        []  tipc_rcv+0x3af/0x5b0 [tipc]
        []  tipc_udp_recv+0xc7/0x1e0 [tipc]
      
      It was caused by the 'l' passed into tipc_bcast_rcv() is NULL. When it
      creates a node in tipc_node_check_dest(), after inserting the new node
      into hashtable in tipc_node_create(), it creates the bc link. However,
      there is a gap between this insert and bc link creation, a bc packet
      may come in and get the node from the hashtable then try to dereference
      its bc link, which is NULL.
      
      This patch is to fix it by moving the bc link creation before inserting
      into the hashtable.
      
      Note that for a preliminary node becoming "real", the bc link creation
      should also be called before it's rehashed, as we don't create it for
      preliminary nodes.
      
      Fixes: 4cbf8ac2 ("tipc: enable creating a "preliminary" node")
      Reported-by: default avatarShuang Li <shuali@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarJon Maloy <jmaloy@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cb8092d7
    • Eric Dumazet's avatar
      tunnels: do not assume mac header is set in skb_tunnel_check_pmtu() · 853a7614
      Eric Dumazet authored
      Recently added debug in commit f9aefd6b ("net: warn if mac header
      was not set") caught a bug in skb_tunnel_check_pmtu(), as shown
      in this syzbot report [1].
      
      In ndo_start_xmit() paths, there is really no need to use skb->mac_header,
      because skb->data is supposed to point at it.
      
      [1] WARNING: CPU: 1 PID: 8604 at include/linux/skbuff.h:2784 skb_mac_header_len include/linux/skbuff.h:2784 [inline]
      WARNING: CPU: 1 PID: 8604 at include/linux/skbuff.h:2784 skb_tunnel_check_pmtu+0x5de/0x2f90 net/ipv4/ip_tunnel_core.c:413
      Modules linked in:
      CPU: 1 PID: 8604 Comm: syz-executor.3 Not tainted 5.19.0-rc2-syzkaller-00443-g8720bd95 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:skb_mac_header_len include/linux/skbuff.h:2784 [inline]
      RIP: 0010:skb_tunnel_check_pmtu+0x5de/0x2f90 net/ipv4/ip_tunnel_core.c:413
      Code: 00 00 00 00 fc ff df 4c 89 fa 48 c1 ea 03 80 3c 02 00 0f 84 b9 fe ff ff 4c 89 ff e8 7c 0f d7 f9 e9 ac fe ff ff e8 c2 13 8a f9 <0f> 0b e9 28 fc ff ff e8 b6 13 8a f9 48 8b 54 24 70 48 b8 00 00 00
      RSP: 0018:ffffc90002e4f520 EFLAGS: 00010212
      RAX: 0000000000000324 RBX: ffff88804d5fd500 RCX: ffffc90005b52000
      RDX: 0000000000040000 RSI: ffffffff87f05e3e RDI: 0000000000000003
      RBP: ffffc90002e4f650 R08: 0000000000000003 R09: 000000000000ffff
      R10: 000000000000ffff R11: 0000000000000000 R12: 000000000000ffff
      R13: 0000000000000000 R14: 000000000000ffcd R15: 000000000000001f
      FS: 00007f3babba9700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000080 CR3: 0000000075319000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <TASK>
      geneve_xmit_skb drivers/net/geneve.c:927 [inline]
      geneve_xmit+0xcf8/0x35d0 drivers/net/geneve.c:1107
      __netdev_start_xmit include/linux/netdevice.h:4805 [inline]
      netdev_start_xmit include/linux/netdevice.h:4819 [inline]
      __dev_direct_xmit+0x500/0x730 net/core/dev.c:4309
      dev_direct_xmit include/linux/netdevice.h:3007 [inline]
      packet_direct_xmit+0x1b8/0x2c0 net/packet/af_packet.c:282
      packet_snd net/packet/af_packet.c:3073 [inline]
      packet_sendmsg+0x21f4/0x55d0 net/packet/af_packet.c:3104
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xcf/0x120 net/socket.c:734
      ____sys_sendmsg+0x6eb/0x810 net/socket.c:2489
      ___sys_sendmsg+0xf3/0x170 net/socket.c:2543
      __sys_sendmsg net/socket.c:2572 [inline]
      __do_sys_sendmsg net/socket.c:2581 [inline]
      __se_sys_sendmsg net/socket.c:2579 [inline]
      __x64_sys_sendmsg+0x132/0x220 net/socket.c:2579
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f3baaa89109
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f3babba9168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f3baab9bf60 RCX: 00007f3baaa89109
      RDX: 0000000000000000 RSI: 0000000020000a00 RDI: 0000000000000003
      RBP: 00007f3baaae305d R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007ffe74f2543f R14: 00007f3babba9300 R15: 0000000000022000
      </TASK>
      
      Fixes: 4cb47a86 ("tunnels: PMTU discovery support for directly bridged IP packets")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      853a7614
  2. 24 Jun, 2022 12 commits
  3. 23 Jun, 2022 12 commits
    • Linus Torvalds's avatar
      Merge tag 'net-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 399bd66e
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from bpf and netfilter.
      
        Current release - regressions:
      
         - netfilter: cttimeout: fix slab-out-of-bounds read in
           cttimeout_net_exit
      
      Current release - new code bugs:
      
         - bpf: ftrace: keep address offset in ftrace_lookup_symbols
      
         - bpf: force cookies array to follow symbols sorting
      
        Previous releases - regressions:
      
         - ipv4: ping: fix bind address validity check
      
         - tipc: fix use-after-free read in tipc_named_reinit
      
         - eth: veth: add updating of trans_start
      
        Previous releases - always broken:
      
         - sock: redo the psock vs ULP protection check
      
         - netfilter: nf_dup_netdev: fix skb_under_panic
      
         - bpf: fix request_sock leak in sk lookup helpers
      
         - eth: igb: fix a use-after-free issue in igb_clean_tx_ring
      
         - eth: ice: prohibit improper channel config for DCB
      
         - eth: at803x: fix null pointer dereference on AR9331 phy
      
         - eth: virtio_net: fix xdp_rxq_info bug after suspend/resume
      
        Misc:
      
         - eth: hinic: replace memcpy() with direct assignment"
      
      * tag 'net-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
        net: openvswitch: fix parsing of nw_proto for IPv6 fragments
        sock: redo the psock vs ULP protection check
        Revert "net/tls: fix tls_sk_proto_close executed repeatedly"
        virtio_net: fix xdp_rxq_info bug after suspend/resume
        igb: Make DMA faster when CPU is active on the PCIe link
        net: dsa: qca8k: reduce mgmt ethernet timeout
        net: dsa: qca8k: reset cpu port on MTU change
        MAINTAINERS: Add a maintainer for OCP Time Card
        hinic: Replace memcpy() with direct assignment
        Revert "drivers/net/ethernet/neterion/vxge: Fix a use-after-free bug in vxge-main.c"
        net: phy: smsc: Disable Energy Detect Power-Down in interrupt mode
        ice: ethtool: Prohibit improper channel config for DCB
        ice: ethtool: advertise 1000M speeds properly
        ice: Fix switchdev rules book keeping
        ice: ignore protocol field in GTP offload
        netfilter: nf_dup_netdev: add and use recursion counter
        netfilter: nf_dup_netdev: do not push mac header a second time
        selftests: netfilter: correct PKTGEN_SCRIPT_PATHS in nft_concat_range.sh
        net/tls: fix tls_sk_proto_close executed repeatedly
        erspan: do not assume transport header is always set
        ...
      399bd66e
    • Linus Torvalds's avatar
      Merge tag 'mmc-v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · f410c3e0
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
      
       - mtk-sd: Fix dma hang issues
      
       - sdhci-pci-o2micro: Fix card detect by dealing with debouncing
      
      * tag 'mmc-v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: mediatek: wait dma stop bit reset to 0
        mmc: sdhci-pci-o2micro: Fix card detect by dealing with debouncing
      f410c3e0
    • Linus Torvalds's avatar
      Merge tag 'sound-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · ddfe8031
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "All small changes, mostly device-specific:
      
         - A regression fix for PCM WC-page allocation on x86
      
         - A regression fix for i915 audio component binding
      
         - Fixes for (longstanding) beep handling bug
      
         - Runtime PM fixes for Intel LPE HDMI audio
      
         - A couple of pending FireWire fixes
      
         - Usual HD-audio and USB-audio quirks, new Intel dspconf entries"
      
      * tag 'sound-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda/realtek: Add quirk for Clevo NS50PU
        ALSA: hda: Fix discovery of i915 graphics PCI device
        ALSA: hda/via: Fix missing beep setup
        ALSA: hda/conexant: Fix missing beep setup
        ALSA: memalloc: Drop x86-specific hack for WC allocations
        ALSA: hda/realtek: Add quirk for Clevo PD70PNT
        ALSA: x86: intel_hdmi_audio: use pm_runtime_resume_and_get()
        ALSA: x86: intel_hdmi_audio: enable pm_runtime and set autosuspend delay
        ALSA: hda: intel-nhlt: remove use of __func__ in dev_dbg
        ALSA: hda: intel-dspcfg: use SOF for UpExtreme and UpExtreme11 boards
        firewire: convert sysfs sprintf/snprintf family to sysfs_emit
        firewire: cdev: fix potential leak of kernel stack due to uninitialized value
        ALSA: hda/realtek: Apply fixup for Lenovo Yoga Duet 7 properly
        ALSA: hda/realtek - ALC897 headset MIC no sound
        ALSA: usb-audio: US16x08: Move overflow check before array access
        ALSA: hda/realtek: Add mute LED quirk for HP Omen laptop
      ddfe8031
    • Rosemarie O'Riorden's avatar
      net: openvswitch: fix parsing of nw_proto for IPv6 fragments · 12378a5a
      Rosemarie O'Riorden authored
      When a packet enters the OVS datapath and does not match any existing
      flows installed in the kernel flow cache, the packet will be sent to
      userspace to be parsed, and a new flow will be created. The kernel and
      OVS rely on each other to parse packet fields in the same way so that
      packets will be handled properly.
      
      As per the design document linked below, OVS expects all later IPv6
      fragments to have nw_proto=44 in the flow key, so they can be correctly
      matched on OpenFlow rules. OpenFlow controllers create pipelines based
      on this design.
      
      This behavior was changed by the commit in the Fixes tag so that
      nw_proto equals the next_header field of the last extension header.
      However, there is no counterpart for this change in OVS userspace,
      meaning that this field is parsed differently between OVS and the
      kernel. This is a problem because OVS creates actions based on what is
      parsed in userspace, but the kernel-provided flow key is used as a match
      criteria, as described in Documentation/networking/openvswitch.rst. This
      leads to issues such as packets incorrectly matching on a flow and thus
      the wrong list of actions being applied to the packet. Such changes in
      packet parsing cannot be implemented without breaking the userspace.
      
      The offending commit is partially reverted to restore the expected
      behavior.
      
      The change technically made sense and there is a good reason that it was
      implemented, but it does not comply with the original design of OVS.
      If in the future someone wants to implement such a change, then it must
      be user-configurable and disabled by default to preserve backwards
      compatibility with existing OVS versions.
      
      Cc: stable@vger.kernel.org
      Fixes: fa642f08 ("openvswitch: Derive IP protocol number for IPv6 later frags")
      Link: https://docs.openvswitch.org/en/latest/topics/design/#fragmentsSigned-off-by: default avatarRosemarie O'Riorden <roriorden@redhat.com>
      Acked-by: default avatarEelco Chaudron <echaudro@redhat.com>
      Link: https://lore.kernel.org/r/20220621204845.9721-1-roriorden@redhat.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      12378a5a
    • Jakub Kicinski's avatar
      sock: redo the psock vs ULP protection check · e34a07c0
      Jakub Kicinski authored
      Commit 8a59f9d1 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()")
      has moved the inet_csk_has_ulp(sk) check from sk_psock_init() to
      the new tcp_bpf_update_proto() function. I'm guessing that this
      was done to allow creating psocks for non-inet sockets.
      
      Unfortunately the destruction path for psock includes the ULP
      unwind, so we need to fail the sk_psock_init() itself.
      Otherwise if ULP is already present we'll notice that later,
      and call tcp_update_ulp() with the sk_proto of the ULP
      itself, which will most likely result in the ULP looping
      its callbacks.
      
      Fixes: 8a59f9d1 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Reviewed-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Reviewed-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Tested-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/r/20220620191353.1184629-2-kuba@kernel.orgSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      e34a07c0
    • Jakub Kicinski's avatar
      Revert "net/tls: fix tls_sk_proto_close executed repeatedly" · 1b205d94
      Jakub Kicinski authored
      This reverts commit 69135c57.
      
      This commit was just papering over the issue, ULP should not
      get ->update() called with its own sk_prot. Each ULP would
      need to add this check.
      
      Fixes: 69135c57 ("net/tls: fix tls_sk_proto_close executed repeatedly")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Reviewed-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20220620191353.1184629-1-kuba@kernel.orgSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      1b205d94
    • Stephan Gerhold's avatar
      virtio_net: fix xdp_rxq_info bug after suspend/resume · 8af52fe9
      Stephan Gerhold authored
      The following sequence currently causes a driver bug warning
      when using virtio_net:
      
        # ip link set eth0 up
        # echo mem > /sys/power/state (or e.g. # rtcwake -s 10 -m mem)
        <resume>
        # ip link set eth0 down
      
        Missing register, driver bug
        WARNING: CPU: 0 PID: 375 at net/core/xdp.c:138 xdp_rxq_info_unreg+0x58/0x60
        Call trace:
         xdp_rxq_info_unreg+0x58/0x60
         virtnet_close+0x58/0xac
         __dev_close_many+0xac/0x140
         __dev_change_flags+0xd8/0x210
         dev_change_flags+0x24/0x64
         do_setlink+0x230/0xdd0
         ...
      
      This happens because virtnet_freeze() frees the receive_queue
      completely (including struct xdp_rxq_info) but does not call
      xdp_rxq_info_unreg(). Similarly, virtnet_restore() sets up the
      receive_queue again but does not call xdp_rxq_info_reg().
      
      Actually, parts of virtnet_freeze_down() and virtnet_restore_up()
      are almost identical to virtnet_close() and virtnet_open(): only
      the calls to xdp_rxq_info_(un)reg() are missing. This means that
      we can fix this easily and avoid such problems in the future by
      just calling virtnet_close()/open() from the freeze/restore handlers.
      
      Aside from adding the missing xdp_rxq_info calls the only difference
      is that the refill work is only cancelled if netif_running(). However,
      this should not make any functional difference since the refill work
      should only be active if the network interface is actually up.
      
      Fixes: 754b8a21 ("virtio_net: setup xdp_rxq_info")
      Signed-off-by: default avatarStephan Gerhold <stephan.gerhold@kernkonzept.com>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/r/20220621114845.3650258-1-stephan.gerhold@kernkonzept.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      8af52fe9
    • Jakub Kicinski's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 448ad88f
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-06-21
      
      This series contains updates to ice driver only.
      
      Marcin fixes GTP filters by allowing ignoring of the inner ethertype field.
      
      Wojciech adds VSI handle tracking in order to properly distinguish similar
      filters for removal.
      
      Anatolii removes ability to set 1000baseT and 1000baseX fields
      concurrently which caused link issues. He also disallows setting
      channels to less than the number of Traffic Classes which would cause
      NULL pointer dereference.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        ice: ethtool: Prohibit improper channel config for DCB
        ice: ethtool: advertise 1000M speeds properly
        ice: Fix switchdev rules book keeping
        ice: ignore protocol field in GTP offload
      ====================
      
      Link: https://lore.kernel.org/r/20220621224756.631765-1-anthony.l.nguyen@intel.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      448ad88f
    • Kai-Heng Feng's avatar
      igb: Make DMA faster when CPU is active on the PCIe link · 4e0effd9
      Kai-Heng Feng authored
      Intel I210 on some Intel Alder Lake platforms can only achieve ~750Mbps
      Tx speed via iperf. The RR2DCDELAY shows around 0x2xxx DMA delay, which
      will be significantly lower when 1) ASPM is disabled or 2) SoC package
      c-state stays above PC3. When the RR2DCDELAY is around 0x1xxx the Tx
      speed can reach to ~950Mbps.
      
      According to the I210 datasheet "8.26.1 PCIe Misc. Register - PCIEMISC",
      "DMA Idle Indication" doesn't seem to tie to DMA coalesce anymore, so
      set it to 1b for "DMA is considered idle when there is no Rx or Tx AND
      when there are no TLPs indicating that CPU is active detected on the
      PCIe link (such as the host executes CSR or Configuration register read
      or write operation)" and performing Tx should also fall under "active
      CPU on PCIe link" case.
      
      In addition to that, commit b6e0c419 ("igb: Move DMA Coalescing init
      code to separate function.") seems to wrongly changed from enabling
      E1000_PCIEMISC_LX_DECISION to disabling it, also fix that.
      
      Fixes: b6e0c419 ("igb: Move DMA Coalescing init code to separate function.")
      Signed-off-by: default avatarKai-Heng Feng <kai.heng.feng@canonical.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20220621221056.604304-1-anthony.l.nguyen@intel.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      4e0effd9
    • Christian Marangi's avatar
      net: dsa: qca8k: reduce mgmt ethernet timeout · 85467f7d
      Christian Marangi authored
      The current mgmt ethernet timeout is set to 100ms. This value is too
      big and would slow down any mdio command in case the mgmt ethernet
      packet have some problems on the receiving part.
      Reduce it to just 5ms to handle case when some operation are done on the
      master port that would cause the mgmt ethernet to not work temporarily.
      
      Fixes: 5950c7c0 ("net: dsa: qca8k: add support for mgmt read/write in Ethernet packet")
      Signed-off-by: default avatarChristian Marangi <ansuelsmth@gmail.com>
      Link: https://lore.kernel.org/r/20220621151633.11741-1-ansuelsmth@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      85467f7d
    • Christian Marangi's avatar
      net: dsa: qca8k: reset cpu port on MTU change · 386228c6
      Christian Marangi authored
      It was discovered that the Documentation lacks of a fundamental detail
      on how to correctly change the MAX_FRAME_SIZE of the switch.
      
      In fact if the MAX_FRAME_SIZE is changed while the cpu port is on, the
      switch panics and cease to send any packet. This cause the mgmt ethernet
      system to not receive any packet (the slow fallback still works) and
      makes the device not reachable. To recover from this a switch reset is
      required.
      
      To correctly handle this, turn off the cpu ports before changing the
      MAX_FRAME_SIZE and turn on again after the value is applied.
      
      Fixes: f58d2598 ("net: dsa: qca8k: implement the port MTU callbacks")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChristian Marangi <ansuelsmth@gmail.com>
      Link: https://lore.kernel.org/r/20220621151122.10220-1-ansuelsmth@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      386228c6
    • Vadim Fedorenko's avatar
      MAINTAINERS: Add a maintainer for OCP Time Card · 13f28c2c
      Vadim Fedorenko authored
      I've been contributing and reviewing patches for ptp_ocp driver for
      some time and I'm taking care of it's github mirror. On Jakub's
      suggestion, I would like to step forward and become a maintainer for
      this driver. This patch adds a dedicated entry to MAINTAINERS.
      Signed-off-by: default avatarVadim Fedorenko <vadfed@fb.com>
      Acked-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Link: https://lore.kernel.org/r/20220621233131.21240-1-vfedorenko@novek.ruSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      13f28c2c
  4. 22 Jun, 2022 7 commits
  5. 21 Jun, 2022 4 commits
    • Anatolii Gerasymenko's avatar
      ice: ethtool: Prohibit improper channel config for DCB · a632b2a4
      Anatolii Gerasymenko authored
      Do not allow setting less channels, than Traffic Classes there are
      via ethtool. There must be at least one channel per Traffic Class.
      
      If you set less channels, than Traffic Classes there are, then during
      ice_vsi_rebuild there would be allocated only the requested amount
      of tx/rx rings in ice_vsi_alloc_arrays. But later in ice_vsi_setup_q_map
      there would be requested at least one channel per Traffic Class. This
      results in setting num_rxq > alloc_rxq and num_txq > alloc_txq.
      Later, there would be a NULL pointer dereference in
      ice_vsi_map_rings_to_vectors, because we go beyond of rx_rings or
      tx_rings arrays.
      
      Change ice_set_channels() to return error if you try to allocate less
      channels, than Traffic Classes there are.
      Change ice_vsi_setup_q_map() and ice_vsi_setup_q_map_mqprio() to return
      status code instead of void.
      Add error handling for ice_vsi_setup_q_map() and
      ice_vsi_setup_q_map_mqprio() in ice_vsi_init() and ice_vsi_cfg_tc().
      
      [53753.889983] INFO: Flow control is disabled for this traffic class (0) on this vsi.
      [53763.984862] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
      [53763.992915] PGD 14b45f5067 P4D 0
      [53763.996444] Oops: 0002 [#1] SMP NOPTI
      [53764.000312] CPU: 12 PID: 30661 Comm: ethtool Kdump: loaded Tainted: GOE    --------- -  - 4.18.0-240.el8.x86_64 #1
      [53764.011825] Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0020.P21.2012150710 12/15/2020
      [53764.022584] RIP: 0010:ice_vsi_map_rings_to_vectors+0x7e/0x120 [ice]
      [53764.029089] Code: 41 0d 0f b7 b7 12 05 00 00 0f b6 d0 44 29 de 44 0f b7 c6 44 01 c2 41 39 d0 7d 2d 4c 8b 47 28 44 0f b7 ce 83 c6 01 4f 8b 04 c8 <49> 89 48 28 4                           c 8b 89 b8 01 00 00 4d 89 08 4c 89 81 b8 01 00 00 44
      [53764.048379] RSP: 0018:ff550dd88ea47b20 EFLAGS: 00010206
      [53764.053884] RAX: 0000000000000002 RBX: 0000000000000004 RCX: ff385ea42fa4a018
      [53764.061301] RDX: 0000000000000006 RSI: 0000000000000005 RDI: ff385e9baeedd018
      [53764.068717] RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000004
      [53764.076133] R10: 0000000000000002 R11: 0000000000000004 R12: 0000000000000000
      [53764.083553] R13: 0000000000000000 R14: ff385e658fdd9000 R15: ff385e9baeedd018
      [53764.090976] FS:  000014872c5b5740(0000) GS:ff385e847f100000(0000) knlGS:0000000000000000
      [53764.099362] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [53764.105409] CR2: 0000000000000028 CR3: 0000000a820fa002 CR4: 0000000000761ee0
      [53764.112851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [53764.120301] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [53764.127747] PKRU: 55555554
      [53764.130781] Call Trace:
      [53764.133564]  ice_vsi_rebuild+0x611/0x870 [ice]
      [53764.138341]  ice_vsi_recfg_qs+0x94/0x100 [ice]
      [53764.143116]  ice_set_channels+0x1a8/0x3e0 [ice]
      [53764.147975]  ethtool_set_channels+0x14e/0x240
      [53764.152667]  dev_ethtool+0xd74/0x2a10
      [53764.156665]  ? __mod_lruvec_state+0x44/0x110
      [53764.161280]  ? __mod_lruvec_state+0x44/0x110
      [53764.165893]  ? page_add_file_rmap+0x15/0x170
      [53764.170518]  ? inet_ioctl+0xd1/0x220
      [53764.174445]  ? netdev_run_todo+0x5e/0x290
      [53764.178808]  dev_ioctl+0xb5/0x550
      [53764.182485]  sock_do_ioctl+0xa0/0x140
      [53764.186512]  sock_ioctl+0x1a8/0x300
      [53764.190367]  ? selinux_file_ioctl+0x161/0x200
      [53764.195090]  do_vfs_ioctl+0xa4/0x640
      [53764.199035]  ksys_ioctl+0x60/0x90
      [53764.202722]  __x64_sys_ioctl+0x16/0x20
      [53764.206845]  do_syscall_64+0x5b/0x1a0
      [53764.210887]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      
      Fixes: 87324e74 ("ice: Implement ethtool ops for channels")
      Signed-off-by: default avatarAnatolii Gerasymenko <anatolii.gerasymenko@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      a632b2a4
    • Anatolii Gerasymenko's avatar
      ice: ethtool: advertise 1000M speeds properly · c3d184c8
      Anatolii Gerasymenko authored
      In current implementation ice_update_phy_type enables all link modes
      for selected speed. This approach doesn't work for 1000M speeds,
      because both copper (1000baseT) and optical (1000baseX) standards
      cannot be enabled at once.
      
      Fix this, by adding the function `ice_set_phy_type_from_speed()`
      for 1000M speeds.
      
      Fixes: 48cb27f2 ("ice: Implement handlers for ethtool PHY/link operations")
      Signed-off-by: default avatarAnatolii Gerasymenko <anatolii.gerasymenko@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      c3d184c8
    • Wojciech Drewek's avatar
      ice: Fix switchdev rules book keeping · 3578dc90
      Wojciech Drewek authored
      Adding two filters with same matching criteria ends up with
      one rule in hardware with act = ICE_FWD_TO_VSI_LIST.
      In order to remove them properly we have to keep the
      information about vsi handle which is used in VSI bitmap
      (ice_adv_fltr_mgmt_list_entry::vsi_list_info::vsi_map).
      
      Fixes: 0d08a441 ("ice: ndo_setup_tc implementation for PF")
      Reported-by: default avatarSridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: default avatarWojciech Drewek <wojciech.drewek@intel.com>
      Tested-by: default avatarSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      3578dc90
    • Marcin Szycik's avatar
      ice: ignore protocol field in GTP offload · d4ea6f63
      Marcin Szycik authored
      Commit 34a89775 ("ice: Add support for inner etype in switchdev")
      added the ability to match on inner ethertype. A side effect of that change
      is that it is now impossible to add some filters for protocols which do not
      contain inner ethtype field. tc requires the protocol field to be specified
      when providing certain other options, e.g. src_ip. This is a problem in
      case of GTP - when user wants to specify e.g. src_ip, they also need to
      specify protocol in tc command (otherwise tc fails with: Illegal "src_ip").
      Because GTP is a tunnel, the protocol field is treated as inner protocol.
      GTP does not contain inner ethtype field and the filter cannot be added.
      
      To fix this, ignore the ethertype field in case of GTP filters.
      
      Fixes: 9a225f81 ("ice: Support GTP-U and GTP-C offload in switchdev")
      Signed-off-by: default avatarMarcin Szycik <marcin.szycik@linux.intel.com>
      Tested-by: default avatarSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      d4ea6f63