1. 27 Jan, 2020 40 commits
    • bnxt_en: Do not accept fragments for aRFS flow steering. · f47d0e19
      Michael Chan authored
      In bnxt_rx_flow_steer(), if the dissected packet is a fragment, do not
      proceed to create the ntuple filter and return error instead.  Otherwise
      we would create a filter with 0 source and destination ports because
      the dissected ports would not be available for fragments.
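The check described above can be modelled in user space. This is a minimal sketch, not the driver code: `FlowKeys`, the `filters` list and the choice of `EPROTONOSUPPORT` as the error are assumptions made for illustration.

```python
import errno
from dataclasses import dataclass


@dataclass
class FlowKeys:
    """Hypothetical stand-in for the dissected flow keys."""
    src_port: int
    dst_port: int
    is_fragment: bool


def rx_flow_steer(keys: FlowKeys, filters: list) -> int:
    """Install an ntuple filter, refusing fragments (model only)."""
    if keys.is_fragment:
        # Ports are not available for fragments, so a filter built from
        # them would match on ports 0/0. The error code is an assumption.
        return -errno.EPROTONOSUPPORT
    filters.append((keys.src_port, keys.dst_port))
    return 0
```

Returning early keeps the filter table free of entries that could never match a real flow.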
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Support UDP RSS hashing on 575XX chips. · c66c06c5
      Michael Chan authored
      575XX (P5) chips have the same UDP RSS hashing capability as P4 chips,
      so we can enable it on P5 chips.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Remove the setting of dev_port. · 1d86859f
      Michael Chan authored
      The dev_port is meant to distinguish the network ports belonging to
      the same PCI function.  Our devices only have one network port
      associated with each PCI function and so we should not set it for
      correctness.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Improve bnxt_probe_phy(). · 43a5107d
      Michael Chan authored
      If the 2nd parameter fw_dflt is not set, we are calling bnxt_probe_phy()
      after the firmware has reset.  There is no need to query the current
      PHY settings from firmware as these settings may be different from
      the ethtool settings that the driver will re-establish later.  So
      return earlier in bnxt_probe_phy() to save one firmware call.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Improve link up detection. · 83d8f5e9
      Michael Chan authored
      In bnxt_update_phy_setting(), ethtool_get_link_ksettings() and
      bnxt_disable_an_for_lpbk(), we inconsistently use netif_carrier_ok()
      to determine link.  Instead, we should use bp->link_info.link_up
      which has the true link state.  The netif_carrier state may be off
      during self-test and while the device is being reset and may not always
      reflect the true link state.
      
      By always using bp->link_info.link_up, the code is now more
      consistent and more correct.  Some unnecessary link toggles are
      now prevented with this patch.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ethtool-netlink-interface-part-2' · db038b1b
      David S. Miller authored
      Michal Kubecek says:
      
      ====================
      ethtool netlink interface, part 2
      
      This shorter series adds support for getting and setting of wake-on-lan
      settings and message mask (originally message level). Together with the
      code already in net-next, this will allow full implementation of
      "ethtool <dev>" and "ethtool -s <dev> ...".
      
      Older versions of the ethtool netlink series allowed getting WoL settings
      by unprivileged users and only filtered out the password but this was
      a source of controversy so for now, ETHTOOL_MSG_WOL_GET request always
      requires CAP_NET_ADMIN as ETHTOOL_GWOL ioctl request does.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: add WOL_NTF notification · 67bffa79
      Michal Kubecek authored
      Send ETHTOOL_MSG_WOL_NTF notification whenever wake-on-lan settings of
      a device are modified using ETHTOOL_MSG_WOL_SET netlink message or
      ETHTOOL_SWOL ioctl request.
      
      As notifications can be received by anyone, do not include SecureOn(tm)
      password in notification messages.
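A small model of the rule above: the notification carries the wake-on-lan settings but never the password. The dictionary layout and the `sopass` key name are hypothetical and chosen only for this sketch.

```python
def wol_notification(settings: dict) -> dict:
    """Return a copy of the WoL settings that is safe to broadcast."""
    # "sopass" is a hypothetical key holding the SecureOn password.
    return {key: value for key, value in settings.items() if key != "sopass"}
```

Filtering when the message is built, rather than at the receiver, matters because anyone may subscribe to notifications.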
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: set wake-on-lan settings with WOL_SET request · 8d425b19
      Michal Kubecek authored
      Implement WOL_SET netlink request to set wake-on-lan settings. This is
      equivalent to ETHTOOL_SWOL ioctl request.
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: provide WoL settings with WOL_GET request · 51ea22b0
      Michal Kubecek authored
      Implement WOL_GET request to get wake-on-lan settings for a device,
      traditionally available via ETHTOOL_GWOL ioctl request.
      
      As part of the implementation, provide symbolic names for wake-on-lan
      modes as ETH_SS_WOL_MODES string set.
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: add DEBUG_NTF notification · 0bda7af3
      Michal Kubecek authored
      Send ETHTOOL_MSG_DEBUG_NTF notification message whenever debugging message
      mask for a device is modified using ETHTOOL_MSG_DEBUG_SET netlink message
      or ETHTOOL_SMSGLVL ioctl request.
      
      The notification message has the same format as reply to DEBUG_GET request.
      As with other ethtool notifications, netlink requests only trigger the
      notification if the mask is actually changed while ioctl requests trigger it
      whenever the request results in calling the ethtool_ops handler.
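The difference between the two trigger rules can be sketched as follows. `Device`, `netlink_debug_set` and `ioctl_smsglvl` are hypothetical names for this model; only the behaviour (netlink notifies on change, ioctl notifies on every handler call) comes from the description above.

```python
class Device:
    """Hypothetical device holding a message mask and a notification log."""
    def __init__(self, msg_mask=0):
        self.msg_mask = msg_mask
        self.notifications = []


def _notify(dev):
    dev.notifications.append(("DEBUG_NTF", dev.msg_mask))


def netlink_debug_set(dev, new_mask):
    # Netlink path: stay silent when the request changes nothing.
    if new_mask == dev.msg_mask:
        return
    dev.msg_mask = new_mask
    _notify(dev)


def ioctl_smsglvl(dev, new_mask):
    # ioctl path: the handler is called, so a notification is always sent.
    dev.msg_mask = new_mask
    _notify(dev)
```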
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: set message mask with DEBUG_SET request · e54d04e3
      Michal Kubecek authored
      Implement DEBUG_SET netlink request to set debugging settings for a device.
      At the moment, only message mask corresponding to message level as set by
      ETHTOOL_SMSGLVL ioctl request can be set. (It is called message level in
      ioctl interface but almost all drivers interpret it as a bit mask.)
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: provide message mask with DEBUG_GET request · 6a94b8cc
      Michal Kubecek authored
      Implement DEBUG_GET request to get debugging settings for a device. At the
      moment, only message mask corresponding to message level as reported by
      ETHTOOL_GMSGLVL ioctl request is provided. (It is called message level in
      ioctl interface but almost all drivers interpret it as a bit mask.)
      
      As part of the implementation, provide symbolic names for message mask bits
      as ETH_SS_MSG_CLASSES string set.
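To illustrate what "interpreted as a bit mask" means, here is a sketch that maps a mask to symbolic names and back. The name list covers only the lowest eight bits and is assumed to mirror the kernel's NETIF_MSG_* ordering; treat both the names and their positions as assumptions of this example.

```python
# Assumed names for bits 0..7, following the NETIF_MSG_* ordering.
MSG_CLASSES = ["drv", "probe", "link", "timer",
               "ifdown", "ifup", "rx_err", "tx_err"]


def mask_to_names(mask):
    """List the symbolic names of all bits set in mask."""
    return [name for bit, name in enumerate(MSG_CLASSES) if mask & (1 << bit)]


def names_to_mask(names):
    """Build a mask from symbolic names; repeated names are harmless."""
    mask = 0
    for name in names:
        mask |= 1 << MSG_CLASSES.index(name)
    return mask
```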
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: fix kernel-doc descriptions · d2c4b444
      Michal Kubecek authored
      Fix missing or incorrect function argument and struct member descriptions.
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'wireless-drivers-next-2020-01-26' of... · 82bc2e4a
      David S. Miller authored
      Merge tag 'wireless-drivers-next-2020-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
      
      Kalle Valo says:
      
      ====================
      wireless-drivers-next patches for v5.6
      
      Second set of patches for v5.6. Nothing special standing out, smaller
      new features and fixes all over.
      
      Major changes:
      
      ar5523
      
      * add support for SMCWUSBT-G2 USB device
      
      iwlwifi
      
      * support new versions of the FTM FW APIs
      
      * support new version of the beacon template FW API
      
      * print some extra information when the driver is loaded
      
      rtw88
      
      * support wowlan feature for 8822c
      
      * add support for WIPHY_WOWLAN_NET_DETECT
      
      brcmfmac
      
      * add initial support for monitor mode
      
      qtnfmac
      
      * add module parameter to enable DFS offloading in firmware
      
      * add support for STA HE rates
      
      * add support for TWT responder and spatial reuse
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'for-upstream' of... · c4c57b97
      David S. Miller authored
      Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
      
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth-next 2020-01-26
      
      Here's (probably) the last bluetooth-next pull request for the 5.6 kernel.
      
       - Initial pieces of Bluetooth 5.2 Isochronous Channels support
       - mgmt: Various cleanups and a new Set Blocked Keys command
       - btusb: Added support for 04ca:3021 QCA_ROME device
       - hci_qca: Multiple fixes & cleanups
       - hci_bcm: Fixes & improved device tree support
       - Fixed attempts to create duplicate debugfs entries
      
      Please let me know if there are any issues pulling. Thanks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers: net: xgene: Fix the order of the arguments of 'alloc_etherdev_mqs()' · 5a44c71c
      Christophe JAILLET authored
      'alloc_etherdev_mqs()' expects first 'tx', then 'rx'. The order used here
      looks reversed.
      
      Reorder the arguments passed to 'alloc_etherdev_mqs()' to match the
      expected order.
      
      In fact, this is a no-op because both XGENE_NUM_[RT]X_RING are 8.
      
      Fixes: 107dec27 ("drivers: net: xgene: Add support for multiple queues")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169: don't set min_mtu/max_mtu if not needed · a8ec173a
      Heiner Kallweit authored
      Defaults for min_mtu and max_mtu are set by ether_setup(), which is
      called from devm_alloc_etherdev(). Let rtl_jumbo_max() return a positive
      value only if jumbo packets are actually supported. This also allows
      removing the constant Jumbo_1K, which is a little misleading anyway.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: minimal: Fix an error handling path in 'mlxsw_m_port_create()' · 6dd4b4f3
      Christophe JAILLET authored
      An 'alloc_etherdev()' call is not balanced by a corresponding
      'free_netdev()' call in one error handling path.
      
      Slightly reorder the error handling code to catch the missed case.
      
      Fixes: c100e47c ("mlxsw: minimal: Add ethtool support")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Fix use-after-free in probing of DSA switch tree · 6dc43cd3
      Vladimir Oltean authored
      DSA sets up a switch tree little by little. Every switch of the N
      members of the tree calls dsa_register_switch, and (N - 1) will just
      touch the dst->ports list with their ports and quickly exit. Only the
      last switch that calls dsa_register_switch will find all DSA links
      complete in dsa_tree_setup_routing_table, and not return zero as a
      result but instead go ahead and set up the entire DSA switch tree
      (practically on behalf of the other switches too).
      
      The trouble is that the (N - 1) switches don't clean up after themselves
      after they get an error such as EPROBE_DEFER. Their footprint left in
      dst->ports by dsa_switch_touch_ports is still there. And switch N, the
      one responsible for actually setting up the tree, is going to work with
      those stale dp, dp->ds and dp->ds->dev pointers. In particular ds and
      ds->dev might get freed by the device driver.
      
      Consider a 2-switch tree and the following calling order:
      - Switch 1 calls dsa_register_switch
        - Calls dsa_switch_touch_ports, populates dst->ports
        - Calls dsa_port_parse_cpu, gets -EPROBE_DEFER, exits.
      - Switch 2 calls dsa_register_switch
        - Calls dsa_switch_touch_ports, populates dst->ports
        - Probe doesn't get deferred, so it goes ahead.
        - Calls dsa_tree_setup_routing_table, which returns "complete == true"
          due to Switch 1 having called dsa_switch_touch_ports before.
        - Because the DSA links are complete, it calls dsa_tree_setup_switches
          now.
        - dsa_tree_setup_switches iterates through dst->ports, initializing
          the Switch 1 ds structure (invalid) and the Switch 2 ds structure
          (valid).
        - Undefined behavior (use after free, sometimes NULL pointers, etc).
      
      Real example below (debugging prints added by me, as well as guards
      against NULL pointers):
      
      [    5.477947] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.313002] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.319932] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.329693] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.339458] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.349226] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.358991] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.368758] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.378524] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.388291] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.398057] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.407912] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.417682] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.427446] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.437212] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.446979] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.456744] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.466512] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.476277] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.486043] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.495810] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.505577] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.515433] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.354120] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.361045] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.370805] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.380571] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.390337] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.400104] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.409872] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.419637] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.429403] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.439169] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      
      The solution is to recognize that the functions that call
      dsa_switch_touch_ports (dsa_switch_parse_of, dsa_switch_parse) have side
      effects, and therefore one should clean up their side effects on error
      path. The cleanup of dst->ports was taken from dsa_switch_remove and
      moved into a dedicated dsa_switch_release_ports function, which should
      really be per-switch (free only the members of dst->ports that are also
      members of ds, instead of all switch ports).
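The cleanup rule can be illustrated with a small user-space model. `Tree`, `Port` and the functions below are hypothetical stand-ins, not the DSA code; the sketch only shows a deferred switch removing its own entries from the shared port list so that a later switch never walks stale ports.

```python
from dataclasses import dataclass, field

EPROBE_DEFER = 517  # kernel-internal errno value, not exposed by Python's errno


@dataclass
class Port:
    switch: str
    index: int


@dataclass
class Tree:
    ports: list = field(default_factory=list)  # models dst->ports


def touch_ports(tree, switch, nports):
    """Side effect: add this switch's ports to the shared list."""
    tree.ports.extend(Port(switch, i) for i in range(nports))


def release_ports(tree, switch):
    """Per-switch cleanup: drop only the ports owned by this switch."""
    tree.ports = [p for p in tree.ports if p.switch != switch]


def register_switch(tree, switch, nports, probe_deferred):
    touch_ports(tree, switch, nports)
    if probe_deferred:
        # Undo the side effect before returning the error.
        release_ports(tree, switch)
        return -EPROBE_DEFER
    return 0
```

With the cleanup in place, a deferred switch 1 followed by a successful switch 2 leaves only switch 2 ports in the list.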
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: remove eth_change_mtu · a85dd3a5
      Heiner Kallweit authored
      All usage of this function was removed three years ago, and the
      function was marked as deprecated:
      a52ad514 ("net: deprecate eth_change_mtu, remove usage")
      So I think we can remove it now.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'XDP-fixes-for-socionext-driver' · 0e6223ea
      David S. Miller authored
      Lorenzo Bianconi says:
      
      ====================
      XDP fixes for socionext driver
      
      Fix possible use-after-free in XDP rx path
      Fix rx statistics accounting if no bpf program is attached
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: socionext: fix xdp_result initialization in netsec_process_rx · 02758cb6
      Lorenzo Bianconi authored
      Fix xdp_result initialization in netsec_process_rx in order to not
      increase rx counters if there is no bpf program attached to the xdp hook
      and napi_gro_receive returns GRO_DROP
      
      Fixes: ba2b2321 ("net: netsec: add XDP support")
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: socionext: fix possible user-after-free in netsec_process_rx · b5e82e3c
      Lorenzo Bianconi authored
      Fix a possible use-after-free in netsec_process_rx that can occur if
      the first packet is sent to the normal networking stack and the
      following one is dropped by the bpf program attached to the xdp hook.
      Fix the issue defining the skb pointer in the 'budget' loop
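The class of bug is a variable that outlives the loop iteration it belongs to. The sketch below is a simplified model, not the driver code: the verdict strings, the `skb{i}` labels and the `per_iteration_skb` switch are all hypothetical and exist only to contrast the two scopings.

```python
def process_rx(verdicts, per_iteration_skb=True):
    """Return the packets handed to the stack for a list of XDP verdicts."""
    delivered = []
    skb = None
    for i, verdict in enumerate(verdicts):
        if per_iteration_skb:
            # Fixed behaviour: the pointer starts out empty on every pass.
            skb = None
        if verdict == "pass":
            skb = f"skb{i}"
        if skb is not None:
            delivered.append(skb)
    return delivered
```

With the pointer scoped outside the loop, a "pass" followed by a "drop" hands the first packet to the stack a second time, which in the kernel would mean touching memory that is no longer owned.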
      
      Fixes: ba2b2321 ("net: netsec: add XDP support")
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-allow-per-net-notifier-to-follow-netdev-into-namespace' · 09917a12
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      net: allow per-net notifier to follow netdev into namespace
      
      Currently we have a per-net notifier, which delivers only the
      notifications relevant to a particular network namespace. That is enough
      for drivers that have netdevs local in a particular namespace (cannot
      move elsewhere).
      
      However if netdev can change namespace, per-net notifier cannot be used.
      Introduce dev_net variant that is basically per-net notifier with an
      extension that re-registers the per-net notifier upon netdev namespace
      change. Basically the per-net notifier follows the netdev into
      namespace.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlx5: Use dev_net netdevice notifier registrations · d48834f9
      Jiri Pirko authored
      Register the dev_net notifier and allow the per-net notifier to follow
      the device into different namespace.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: introduce dev_net notifier register/unregister variants · 93642e14
      Jiri Pirko authored
      Introduce dev_net variants of netdev notifier register/unregister functions
      and allow per-net notifier to follow the netdevice into the namespace it is
      moved to.
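A rough model of "the notifier follows the netdevice": on a namespace change, every notifier registered through the dev_net variant is removed from the old namespace's chain and added to the new one. The classes and function names are hypothetical and only mimic the described behaviour.

```python
class Net:
    """Hypothetical stand-in for a network namespace."""
    def __init__(self, name):
        self.name = name
        self.notifiers = []  # per-net notifier chain


class NetDev:
    def __init__(self, net):
        self.net = net
        self.dev_net_notifiers = []  # notifiers that must follow this device


def register_dev_net_notifier(dev, nb):
    dev.net.notifiers.append(nb)
    dev.dev_net_notifiers.append(nb)


def change_namespace(dev, new_net):
    # Re-register each following notifier in the destination namespace.
    for nb in dev.dev_net_notifiers:
        dev.net.notifiers.remove(nb)
        new_net.notifiers.append(nb)
    dev.net = new_net
```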
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: push code from net notifier reg/unreg into helpers · 1f637703
      Jiri Pirko authored
      Push the code which is done under rtnl lock in net notifier register and
      unregister function into separate helpers.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: call call_netdevice_unregister_net_notifiers from unregister · 48b3a137
      Jiri Pirko authored
      The function does the same thing as the existing code, so rather call
      call_netdevice_unregister_net_notifiers() instead of code duplication.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soreuseport: Cleanup duplicate initialization of more_reuse->max_socks. · cd94ef06
      Kuniyuki Iwashima authored
      reuseport_grow() does not need to initialize the more_reuse->max_socks
      again. It is already initialized in __reuseport_alloc().
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Support-fraglist-GRO-GSO' · 4d434705
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      Support fraglist GRO/GSO
      
      This patchset adds support to do GRO/GSO by chaining packets
      of the same flow at the SKB frag_list pointer. This avoids
      the overhead to merge payloads into one big packet, and
      on the other end, if GSO is needed it avoids the overhead
      of splitting the big packet back to the native form.
      
      Patch 1 adds netdev feature flags to enable fraglist GRO,
      this implements one of the configuration options discussed
      at netconf 2019.
      
      Patch 2 adds a netdev software feature set that defaults to off
      and assigns the new fraglist GRO feature flag to it.
      
      Patch 3 adds the core infrastructure to do fraglist GRO/GSO.
      
      Patch 4 enables UDP to use fraglist GRO/GSO if configured.
      
      I have only meaningful forwarding performance measurements.
      I did some tests for the local receive path with netperf and iperf,
      but in this case the sender that generates the packets is the
      bottleneck. So the benchmarks are not that meaningful for the
      receive path.
      
      Paolo Abeni did some benchmarks of the local receive path for the
      RFC v2 version of this patchset, results can be found here:
      
      https://www.spinics.net/lists/netdev/msg551158.html
      
      I used my IPsec forwarding test setup for the performance measurements:
      
                 ------------         ------------
              -->| router 1 |-------->| router 2 |--
              |  ------------         ------------  |
              |                                     |
              |       --------------------          |
              --------|Spirent Testcenter|<----------
                      --------------------
      
      net-next (September 7th 2019):
      
      Single stream UDP frame size 1460 Bytes: 1.161.000 fps (13.5 Gbps).
      
      ----------------------------------------------------------------------
      
      net-next (September 7th 2019) + standard UDP GRO/GSO (not implemented
      in this patchset):
      
      Single stream UDP frame size 1460 Bytes: 1.801.000 fps (21 Gbps).
      
      ----------------------------------------------------------------------
      
      net-next (September 7th 2019) + fraglist UDP GRO/GSO:
      
      Single stream UDP frame size 1460 Bytes: 2.860.000 fps (33.4 Gbps).
      
      =======================================================================
      
      net-next (January 23rd 2020):
      
      Single stream UDP frame size 1460 Bytes: 919.000 fps (10.73 Gbps).
      
      ----------------------------------------------------------------------
      
      net-next (January 23rd 2020) + fraglist UDP GRO/GSO:
      
      Single stream UDP frame size 1460 Bytes: 2.430.000 fps (28.38 Gbps).
      
      -----------------------------------------------------------------------
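The Gbps figures quoted above are consistent with frames per second times 1460 bytes times 8 bits, that is, counting only the stated frame size. That reading is inferred from the numbers and is not stated in the cover letter. A short check of the arithmetic and of the relative gain on the January tree:

```python
FRAME_BYTES = 1460  # frame size quoted in the measurements


def gbps(fps, frame_bytes=FRAME_BYTES):
    """Throughput in Gbit/s for a given frame rate."""
    return fps * frame_bytes * 8 / 1e9


def speedup(base_fps, new_fps):
    """Relative gain of new_fps over base_fps."""
    return new_fps / base_fps


baseline = gbps(919_000)         # plain net-next, January measurement
fraglist = gbps(2_430_000)       # with fraglist UDP GRO/GSO
gain = speedup(919_000, 2_430_000)
```

On these numbers the fraglist path forwards roughly 2.6 times as many frames per second as the January baseline.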
      
      Changes from RFC v1:
      
      - Add IPv6 support.
      - Split patchset to enable UDP GRO by default before adding
        fraglist GRO support.
      - Mark fraglist GRO packets as CHECKSUM_NONE.
      - Take a refcount on the first segment skb when doing fraglist
        segmentation. With this we can use the same error handling
        path as with standard segmentation.
      
      Changes from RFC v2:
      
      - Add a netdev feature flag to configure listified GRO.
      - Fix UDP GRO enabling for IPv6.
      - Fix a rcu_read_lock() imbalance.
      - Fix error path in skb_segment_list().
      
      Changes from RFC v3:
      
      - Rename NETIF_F_GRO_LIST to NETIF_F_GRO_FRAGLIST and add
        NETIF_F_GSO_FRAGLIST.
      - Move introduction of SKB_GSO_FRAGLIST to patch 2.
      - Use udpv6_encap_needed_key instead of udp_encap_needed_key in IPv6.
      - Move some misplaced code from patch 5 to patch 1 where it belongs.
      
      Changes from RFC v4:
      
      - Drop the 'UDP: enable GRO by default' patch for now. Standard UDP GRO
        is not changed with this patchset.
      - Rebase to net-next current.
      
      Changes from v1 (December 18th):
      
      - Do a full __copy_skb_header instead of trying to find the really
        needed subset of header fields. This can be done later.
      - Mark all fraglist GRO packets with CHECKSUM_UNNECESSARY.
      - Rebase to net-next current.
      
      Changes from v2 (January 24th):
      
      - Do the CHECKSUM_UNNECESSARY setting from IPv4 for IPv6 too.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: Support UDP fraglist GRO/GSO. · 9fd1ff5d
      Steffen Klassert authored
      This patch extends UDP GRO to support fraglist GRO/GSO
      by using the previously introduced infrastructure.
      If the feature is enabled, all UDP packets are going to
      fraglist GRO (local input and forward).
      
      After validating the csum,  we mark ip_summed as
      CHECKSUM_UNNECESSARY for fraglist GRO packets to
      make sure that the csum is not touched.
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Support GRO/GSO fraglist chaining. · 3a1296a3
      Steffen Klassert authored
      This patch adds the core functions to chain/unchain
      GSO skbs at the frag_list pointer. This also adds
      a new GSO type SKB_GSO_FRAGLIST and a is_flist
      flag to napi_gro_cb which indicates that this
      flow will be GROed by fraglist chaining.
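The idea of chaining instead of merging can be shown with a toy model. `Skb`, `gro_chain` and `gso_unchain` are hypothetical names; the point is only that aggregation links whole packets at the head's frag_list without copying payloads, so segmentation reduces to unlinking them again.

```python
class Skb:
    """Hypothetical packet buffer with a frag_list of chained packets."""
    def __init__(self, payload):
        self.payload = payload
        self.frag_list = []


def gro_chain(head, skb):
    # Aggregate by reference: no payload is copied into the head.
    head.frag_list.append(skb)


def gso_unchain(head):
    # Segmentation gives back the original packets in arrival order.
    segments = [head] + head.frag_list
    head.frag_list = []
    return segments
```

Because every segment keeps its native form, the forwarding path avoids both the merge on receive and the split on transmit.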
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add a netdev software feature set that defaults to off. · 1a3c998f
      Steffen Klassert authored
      The previous patch added the NETIF_F_GRO_FRAGLIST feature.
      This is a software feature that should default to off.
      Current software features default to on, so add a new
      feature set that defaults to off.
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add fraglist GRO/GSO feature flags · 3b335832
      Steffen Klassert authored
      This adds new fraglist GRO/GSO feature flags. They will be used
      to configure fraglist GRO/GSO, which will be implemented in some
      follow-up patches.
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mvneta driver disallow XDP program on hardware buffer management · 79572c98
      Sven Auhagen authored
      Recently XDP Support was added to the mvneta driver
      for software buffer management only.
      It is still possible to attach an XDP program if
      hardware buffer management is used.
      It is not doing anything at that point.
      
      The patch disallows attaching XDP programs to mvneta
      if hardware buffer management is used.
      
      I am sorry about that. It is my first submission and I am having
      some troubles with the format of my emails.
      
      v4 -> v5:
      - Remove extra tabs
      
      v3 -> v4:
      - Please ignore v3; I accidentally submitted
        my other patch with git-send-email. v4 is correct
      
      v2 -> v3:
      - My mailserver corrupted the patch
        resubmission with git-send-email
      
      v1 -> v2:
      - Fixing the patches indentation
      Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      79572c98
    • David Howells's avatar
      rxrpc: Fix use-after-free in rxrpc_receive_data() · 122d74fa
      David Howells authored
      The subpacket scanning loop in rxrpc_receive_data() references the
      subpacket count in the private data part of the sk_buff in the loop
      termination condition.  However, when the final subpacket is pasted into
      the ring buffer, the function no longer has a ref on the sk_buff and
      should not be looking at sp->* any more.  This point is actually marked in
      the code when skb is cleared (but sp is not - which is an error).
      
      Fix this by caching sp->nr_subpackets in a local variable and using that
      instead.
      
      Also clear 'sp' to catch accesses after that point.
      
      This can show up as an oops in rxrpc_get_skb() if sp->nr_subpackets gets
      trashed by the sk_buff getting freed and reused in the meantime.
      
      Fixes: e2de6c40 ("rxrpc: Use info in skbuff instead of reparsing a jumbo packet")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      122d74fa
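The pattern behind the fix above (cache the count in a local, then drop the pointer once ownership has moved) can be modelled in Python. This is only an analogy: Python has no use-after-free, so the hypothetical `recycle()` method stands in for the buffer being freed and reused, and `Skb` and `receive_data` are toy names rather than the rxrpc code.

```python
class Skb:
    """Toy buffer; recycle() models the memory being freed and reused."""

    def __init__(self, nr_subpackets):
        self.nr_subpackets = nr_subpackets

    def recycle(self):
        # After a free and reuse, the field holds unrelated data.
        self.nr_subpackets = 255


def receive_data(skb, ring):
    """Queue every subpacket of skb into ring; return how many were queued."""
    nr_subpackets = skb.nr_subpackets  # cached while we still hold a ref
    j = 0
    while j < nr_subpackets:  # the loop condition never consults skb
        ring.append(j)
        if j == nr_subpackets - 1:
            # The ring owns the buffer from here on: model a free + reuse,
            # and clear our reference so any later access fails loudly.
            skb.recycle()
            skb = None
        j += 1
    return j


ring = []
queued = receive_data(Skb(3), ring)
```

Had the loop condition read `skb.nr_subpackets` instead of the local, the final check would have hit the cleared reference, which mirrors why clearing `sp` helps catch stray accesses.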
    • Eric Dumazet's avatar
      net_sched: ematch: reject invalid TCF_EM_SIMPLE · 55cd9f67
      Eric Dumazet authored
      It is possible for malicious userspace to set TCF_EM_SIMPLE bit
      even for matches that should not have this bit set.
      
      This can fool two places using tcf_em_is_simple()
      
      1) tcf_em_tree_destroy() -> memory leak of em->data
         if ops->destroy() is NULL
      
      2) tcf_em_tree_dump() wrongly reports/leaks 4 low-order bytes
         of a kernel pointer.
      
      BUG: memory leak
      unreferenced object 0xffff888121850a40 (size 32):
        comm "syz-executor927", pid 7193, jiffies 4294941655 (age 19.840s)
        hex dump (first 32 bytes):
          00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f67036ea>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
          [<00000000f67036ea>] slab_post_alloc_hook mm/slab.h:586 [inline]
          [<00000000f67036ea>] slab_alloc mm/slab.c:3320 [inline]
          [<00000000f67036ea>] __do_kmalloc mm/slab.c:3654 [inline]
          [<00000000f67036ea>] __kmalloc_track_caller+0x165/0x300 mm/slab.c:3671
          [<00000000fab0cc8e>] kmemdup+0x27/0x60 mm/util.c:127
          [<00000000d9992e0a>] kmemdup include/linux/string.h:453 [inline]
          [<00000000d9992e0a>] em_nbyte_change+0x5b/0x90 net/sched/em_nbyte.c:32
          [<000000007e04f711>] tcf_em_validate net/sched/ematch.c:241 [inline]
          [<000000007e04f711>] tcf_em_tree_validate net/sched/ematch.c:359 [inline]
          [<000000007e04f711>] tcf_em_tree_validate+0x332/0x46f net/sched/ematch.c:300
          [<000000007a769204>] basic_set_parms net/sched/cls_basic.c:157 [inline]
          [<000000007a769204>] basic_change+0x1d7/0x5f0 net/sched/cls_basic.c:219
          [<00000000e57a5997>] tc_new_tfilter+0x566/0xf70 net/sched/cls_api.c:2104
          [<0000000074b68559>] rtnetlink_rcv_msg+0x3b2/0x4b0 net/core/rtnetlink.c:5415
          [<00000000b7fe53fb>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477
          [<00000000e83a40d0>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
          [<00000000d62ba933>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
          [<00000000d62ba933>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328
          [<0000000088070f72>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917
          [<00000000f70b15ea>] sock_sendmsg_nosec net/socket.c:639 [inline]
          [<00000000f70b15ea>] sock_sendmsg+0x54/0x70 net/socket.c:659
          [<00000000ef95a9be>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330
          [<00000000b650f1ab>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384
          [<0000000055bfa74a>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417
          [<000000002abac183>] __do_sys_sendmsg net/socket.c:2426 [inline]
          [<000000002abac183>] __se_sys_sendmsg net/socket.c:2424 [inline]
          [<000000002abac183>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot+03c4738ed29d5d366ddf@syzkaller.appspotmail.com
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      55cd9f67
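To make the validation idea above concrete, here is a small Python model. The rule it applies (a match type that supplies its own `change` callback manages its own data and therefore may not be flagged as simple) is an assumption made for this sketch, not something taken from the patch itself. `EmatchOps`, `validate`, `is_accepted`, and the flag value are likewise illustrative.

```python
TCF_EM_SIMPLE = 1 << 3  # flag bit; the value is assumed for illustration


class EmatchOps:
    """Toy ops table: change is an optional per-type setup callback."""

    def __init__(self, change=None):
        self.change = change


def validate(ops, flags, data):
    """Return the stored match data, or raise ValueError on bad input."""
    if ops.change is not None:
        # Assumed rule: types with their own setup cannot be "simple".
        if flags & TCF_EM_SIMPLE:
            raise ValueError("TCF_EM_SIMPLE not valid for this match type")
        return ops.change(data)
    if flags & TCF_EM_SIMPLE:
        if len(data) < 4:
            raise ValueError("simple payload must hold a 32-bit value")
        return int.from_bytes(data[:4], "little")  # stored inline, no allocation
    return bytes(data)  # generic copy of the payload


def is_accepted(ops, flags, data):
    try:
        validate(ops, flags, data)
    except ValueError:
        return False
    return True


nbyte_like = EmatchOps(change=lambda data: bytearray(data))
plain = EmatchOps()
```

Rejecting the flag up front means later code that branches on the simple flag never sees an allocated buffer mislabelled as an inline value.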
    • Stephen Worley's avatar
      net: include struct nhmsg size in nh nlmsg size · f9e95555
      Stephen Worley authored
      Include the size of struct nhmsg when calculating
      how much of a payload to allocate in a new netlink nexthop
      notification message.
      
      Without this, we will fail to fill the skbuff at certain nexthop
      group sizes.
      
      You can reproduce the failure with the following iproute2 commands:
      
      ip link add dummy1 type dummy
      ip link add dummy2 type dummy
      ip link add dummy3 type dummy
      ip link add dummy4 type dummy
      ip link add dummy5 type dummy
      ip link add dummy6 type dummy
      ip link add dummy7 type dummy
      ip link add dummy8 type dummy
      ip link add dummy9 type dummy
      ip link add dummy10 type dummy
      ip link add dummy11 type dummy
      ip link add dummy12 type dummy
      ip link add dummy13 type dummy
      ip link add dummy14 type dummy
      ip link add dummy15 type dummy
      ip link add dummy16 type dummy
      ip link add dummy17 type dummy
      ip link add dummy18 type dummy
      ip link add dummy19 type dummy
      
      ip ro add 1.1.1.1/32 dev dummy1
      ip ro add 1.1.1.2/32 dev dummy2
      ip ro add 1.1.1.3/32 dev dummy3
      ip ro add 1.1.1.4/32 dev dummy4
      ip ro add 1.1.1.5/32 dev dummy5
      ip ro add 1.1.1.6/32 dev dummy6
      ip ro add 1.1.1.7/32 dev dummy7
      ip ro add 1.1.1.8/32 dev dummy8
      ip ro add 1.1.1.9/32 dev dummy9
      ip ro add 1.1.1.10/32 dev dummy10
      ip ro add 1.1.1.11/32 dev dummy11
      ip ro add 1.1.1.12/32 dev dummy12
      ip ro add 1.1.1.13/32 dev dummy13
      ip ro add 1.1.1.14/32 dev dummy14
      ip ro add 1.1.1.15/32 dev dummy15
      ip ro add 1.1.1.16/32 dev dummy16
      ip ro add 1.1.1.17/32 dev dummy17
      ip ro add 1.1.1.18/32 dev dummy18
      ip ro add 1.1.1.19/32 dev dummy19
      
      ip next add id 1 via 1.1.1.1 dev dummy1
      ip next add id 2 via 1.1.1.2 dev dummy2
      ip next add id 3 via 1.1.1.3 dev dummy3
      ip next add id 4 via 1.1.1.4 dev dummy4
      ip next add id 5 via 1.1.1.5 dev dummy5
      ip next add id 6 via 1.1.1.6 dev dummy6
      ip next add id 7 via 1.1.1.7 dev dummy7
      ip next add id 8 via 1.1.1.8 dev dummy8
      ip next add id 9 via 1.1.1.9 dev dummy9
      ip next add id 10 via 1.1.1.10 dev dummy10
      ip next add id 11 via 1.1.1.11 dev dummy11
      ip next add id 12 via 1.1.1.12 dev dummy12
      ip next add id 13 via 1.1.1.13 dev dummy13
      ip next add id 14 via 1.1.1.14 dev dummy14
      ip next add id 15 via 1.1.1.15 dev dummy15
      ip next add id 16 via 1.1.1.16 dev dummy16
      ip next add id 17 via 1.1.1.17 dev dummy17
      ip next add id 18 via 1.1.1.18 dev dummy18
      ip next add id 19 via 1.1.1.19 dev dummy19
      
      ip next add id 1111 group 1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19
      ip next del id 1111
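
The command sequence above is highly regular, so it can also be generated. The following Python sketch only builds the command strings and does not run them; `repro_commands` is a hypothetical helper name, and the default count of 19 matches the listing.

```python
def repro_commands(count=19):
    """Build the iproute2 commands from the listing as a list of strings."""
    ids = range(1, count + 1)
    cmds = [f"ip link add dummy{i} type dummy" for i in ids]
    cmds += [f"ip ro add 1.1.1.{i}/32 dev dummy{i}" for i in ids]
    cmds += [f"ip next add id {i} via 1.1.1.{i} dev dummy{i}" for i in ids]
    group = "/".join(str(i) for i in ids)
    cmds.append(f"ip next add id 1111 group {group}")
    cmds.append("ip next del id 1111")
    return cmds


commands = repro_commands()
```

Printing `"\n".join(commands)` yields a script that can be reviewed before being run with root privileges.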
      
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f9e95555
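The size arithmetic behind the fix above can be sketched as follows. The struct layouts (an 8-byte `struct nhmsg`, an 8-byte `struct nexthop_grp`) and the set of attributes counted for a group notification are assumptions made for this example; the helper names are hypothetical and only mirror the usual netlink alignment rules.

```python
import struct

NLA_HDRLEN = 4  # netlink attribute header, already 4-byte aligned


def align4(n):
    """Round n up to the 4-byte boundary used by netlink."""
    return (n + 3) & ~3


# Assumed layouts: four u8 fields plus a u32, and a u32, two u8, one u16.
NHMSG_LEN = struct.calcsize("=BBBBI")
NEXTHOP_GRP_LEN = struct.calcsize("=IBBH")


def nla_total_size(payload):
    """Size of one attribute: header plus payload, padded to 4 bytes."""
    return align4(NLA_HDRLEN + payload)


def group_attrs_size(n_entries):
    """Assumed attributes: id (u32), group array, group type (u16)."""
    return (nla_total_size(4)
            + nla_total_size(n_entries * NEXTHOP_GRP_LEN)
            + nla_total_size(2))


def nlmsg_payload_size(n_entries, include_header=True):
    size = group_attrs_size(n_entries)
    if include_header:
        size += align4(NHMSG_LEN)  # the term the fix adds
    return size


shortfall = nlmsg_payload_size(19) - nlmsg_payload_size(19, include_header=False)
```

Under these assumptions the estimate without the header is short by a fixed number of bytes, which is consistent with the failure appearing only at certain group sizes: whether the shortfall matters would depend on how much slack the allocation happens to leave.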
    • Cong Wang's avatar
      net_sched: walk through all child classes in tc_bind_tclass() · 760d228e
      Cong Wang authored
      In a complex TC class hierarchy like this:
      
      tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 100Mbit         \
        avpkt 1000 cell 8
      tc class add dev eth0 parent 1:0 classid 1:1 cbq bandwidth 100Mbit  \
        rate 6Mbit weight 0.6Mbit prio 8 allot 1514 cell 8 maxburst 20      \
        avpkt 1000 bounded
      
      tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \
        sport 80 0xffff flowid 1:3
      tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \
        sport 25 0xffff flowid 1:4
      
      tc class add dev eth0 parent 1:1 classid 1:3 cbq bandwidth 100Mbit  \
        rate 5Mbit weight 0.5Mbit prio 5 allot 1514 cell 8 maxburst 20      \
        avpkt 1000
      tc class add dev eth0 parent 1:1 classid 1:4 cbq bandwidth 100Mbit  \
        rate 3Mbit weight 0.3Mbit prio 5 allot 1514 cell 8 maxburst 20      \
        avpkt 1000
      
      the filters are installed on qdisc 1:0, so we can't merely
      search from class 1:1 when creating class 1:3 and class 1:4. We have
      to walk through all the child classes of the direct parent qdisc.
      Otherwise we would miss filters that need reverse binding.
      
      Fixes: 07d79fc7 ("net_sched: add reverse binding for tc class")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      760d228e
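The search problem described above can be illustrated with a toy model of the hierarchy from the example: two filters sit on qdisc 1:0 and point at classes 1:3 and 1:4, which are created later under class 1:1. The data structures and function names here are hypothetical and much simpler than the kernel's filter blocks.

```python
class Filter:
    def __init__(self, flowid):
        self.flowid = flowid
        self.bound_class = None  # set once reverse binding finds the class


class Qdisc:
    def __init__(self):
        self.root_filters = []   # filters attached to the qdisc itself
        self.class_filters = {}  # classid -> filters attached to that class

    def add_class(self, classid):
        self.class_filters.setdefault(classid, [])


def bind_block(block, classid):
    """Bind every filter in block that targets classid; return the count."""
    hits = 0
    for f in block:
        if f.flowid == classid:
            f.bound_class = classid
            hits += 1
    return hits


def bind_from_parent_only(q, parent, classid):
    """Narrow search: look only at filters attached to the parent class."""
    return bind_block(q.class_filters.get(parent, []), classid)


def bind_walk_all(q, classid):
    """Wide search: the qdisc's own filters plus those of every class."""
    blocks = [q.root_filters] + list(q.class_filters.values())
    return sum(bind_block(block, classid) for block in blocks)


q = Qdisc()
q.add_class("1:1")
q.root_filters = [Filter("1:3"), Filter("1:4")]
q.add_class("1:3")
missed = bind_from_parent_only(q, "1:1", "1:3")
found = bind_walk_all(q, "1:3")
```

In this model the narrow search finds nothing because the filters are not attached to class 1:1, while the wide search reaches them through the qdisc.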
    • Cong Wang's avatar
      net_sched: fix ops->bind_class() implementations · 2e24cd75
      Cong Wang authored
      The current implementations of ops->bind_class() are merely
      searching for classid and updating class in the struct tcf_result,
      without invoking either of cl_ops->bind_tcf() or
      cl_ops->unbind_tcf(). This breaks their design, as qdiscs
      like cbq use them to count filters too. This is why syzbot triggered
      the warning in cbq_destroy_class().
      
      In order to fix this, we have to call cl_ops->bind_tcf() and
      cl_ops->unbind_tcf() like the filter binding path. This patch does
      so by refactoring out two helper functions __tcf_bind_filter()
      and __tcf_unbind_filter(), which are lockless and accept a Qdisc
      pointer, then teaching each implementation to call them correctly.
      
      Note, we merely pass the Qdisc pointer as an opaque pointer to
      each filter; they only need to pass it down to the helper
      functions without understanding it at all.
      
      Fixes: 07d79fc7 ("net_sched: add reverse binding for tc class")
      Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2e24cd75
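The accounting problem described above can be shown with a toy class that counts its bound filters. This is a sketch under stated assumptions: `TcClass`, `TcfResult`, and the two `bind_class_*` functions are hypothetical simplifications, with `bind_tcf()` and `unbind_tcf()` standing in for the class callbacks named in the commit.

```python
class TcClass:
    """Toy class that, like cbq, counts the filters bound to it."""

    def __init__(self, classid):
        self.classid = classid
        self.filters = 0

    def bind_tcf(self):
        self.filters += 1

    def unbind_tcf(self):
        self.filters -= 1


class TcfResult:
    def __init__(self, classid):
        self.classid = classid
        self.cls = None


def bind_class_broken(res, classid, cl):
    """Only updates the pointer; the class never hears about the binding."""
    if res.classid == classid:
        res.cls = cl


def bind_class_fixed(res, classid, cl):
    """Updates the pointer and keeps the class's filter count in step."""
    if res.classid != classid:
        return
    if res.cls is not None:
        res.cls.unbind_tcf()
    if cl is not None:
        cl.bind_tcf()
    res.cls = cl


def unbind_filter(res):
    """Normal filter teardown: drop the class reference, if any."""
    if res.cls is not None:
        res.cls.unbind_tcf()
        res.cls = None


def scenario(bind_class):
    cl = TcClass("1:3")
    res = TcfResult("1:3")      # filter exists before its class does
    bind_class(res, "1:3", cl)  # reverse binding when the class appears
    unbind_filter(res)          # the filter is removed later
    return cl.filters


broken_count = scenario(bind_class_broken)
fixed_count = scenario(bind_class_fixed)
```

A count that ends up unbalanced at teardown is the kind of inconsistency a sanity check on class destruction would flag.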