1. 22 Jan, 2021 4 commits
    • devlink: Support get and set state of port function · a556dded
      Parav Pandit authored
      devlink port function can be in active or inactive state.
      Allow users to get and set port function's state.
      
      When the port function is activated, its operational state may change
      only after a while, once the device is created and a driver binds to it.
      The same applies to the deactivation flow.
      
      To clearly describe the state of the port function and its device's
      operational state in the host system, define state and opstate
      attributes.
      
      Example of a PCI SF port which supports a port function:
      
      $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
      
      $ devlink port show
      pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false
      
      $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
      pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
        function:
          hw_addr 00:00:00:00:00:00 state inactive opstate detached
      
      $ devlink port show pci/0000:06:00.0/32768
      pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
        function:
          hw_addr 00:00:00:00:88:88 state inactive opstate detached
      
      $ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active
      
      $ devlink port show pci/0000:06:00.0/32768 -jp
      {
          "port": {
              "pci/0000:06:00.0/32768": {
                  "type": "eth",
                  "netdev": "ens2f0npf0sf88",
                  "flavour": "pcisf",
                  "controller": 0,
                  "pfnum": 0,
                  "sfnum": 88,
                  "external": false,
                  "splittable": false,
                  "function": {
                      "hw_addr": "00:00:00:00:88:88",
                      "state": "active",
                      "opstate": "attached"
                  }
              }
          }
      }
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Vu Pham <vuhuong@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
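For the state and opstate attributes introduced in a556dded, a script that activates a port function will typically want to read both values back before using the SF device. The following is a minimal sketch that reads them from the `-jp` output quoted in that commit message. The helper name `function_state` is hypothetical, and the JSON layout is assumed to match the example in the commit message (trimmed here to the fields that are used).

```python
import json

# JSON as printed by `devlink port show pci/0000:06:00.0/32768 -jp` in the
# commit message, trimmed to the fields used in this sketch.
SAMPLE = """
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "flavour": "pcisf",
            "sfnum": 88,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}
"""


def function_state(devlink_json, port):
    """Return (state, opstate) of a port function (hypothetical helper)."""
    function = json.loads(devlink_json)["port"][port]["function"]
    return function["state"], function["opstate"]


state, opstate = function_state(SAMPLE, "pci/0000:06:00.0/32768")
print(state, opstate)
```

Keeping state (what the user requested) apart from opstate (what the host device currently reports) lets a caller tell "activated but not yet bound" from "ready to use".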
    • devlink: Support add and delete devlink port · cd76dcd6
      Parav Pandit authored
      Extend the devlink interface so that the user can add and delete a
      port. devlink passes each user request to the driver, which adds or
      deletes the port in the device.
      
      Driver routines are invoked without holding the devlink instance lock.
      This lets the driver register and unregister several devlink objects
      (port, health reporter, resource, etc.) using the existing devlink APIs.
      It also lets the same port unregistration code serve both driver
      unload and user-initiated port deletion.
      
      Examples of add, show and delete commands:
      $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
      
      $ devlink port show
      pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false
      
      $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
      pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
        function:
          hw_addr 00:00:00:00:00:00 state inactive opstate detached
      
      $ devlink port show pci/0000:06:00.0/32768
      pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
        function:
          hw_addr 00:00:00:00:00:00 state inactive opstate detached
      
      $ udevadm test-builtin net_id /sys/class/net/eth6
      Load module index
      Parsed configuration file /usr/lib/systemd/network/99-default.link
      Created link configuration context.
      Using default interface naming scheme 'v245'.
      ID_NET_NAMING_SCHEME=v245
      ID_NET_NAME_PATH=enp6s0f0npf0sf88
      ID_NET_NAME_SLOT=ens2f0npf0sf88
      Unload module index
      Unloaded link configuration context.
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Vu Pham <vuhuong@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • devlink: Introduce PCI SF port flavour and port attribute · b8288837
      Parav Pandit authored
      A PCI sub-function (SF) represents a portion of the device, similar
      to a PCI VF.
      
      In an eswitch, a PCI SF may have a port, which is normally represented
      by a representor netdevice.
      To give better visibility of the eswitch port, its association with the
      SF, and its representor netdevice, introduce a PCI SF port flavour.
      
      When the devlink port flavour is PCI SF, fill in the PCI SF attributes
      of the port.
      
      Extend port name creation with a PCI PF and SF number scheme on a
      best-effort basis, so that vendor drivers can skip defining their own
      scheme.
      The name takes the form cApfNsfM, where A, N and M are the controller,
      PCI PF and PCI SF numbers respectively.
      This is similar to the existing naming for PCI PF and PCI VF ports.
      
      An example view of a PCI SF port:
      
      $ devlink port show pci/0000:06:00.0/32768
      pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
        function:
          hw_addr 00:00:00:00:88:88 state active opstate attached
      
      $ devlink port show pci/0000:06:00.0/32768 -jp
      {
          "port": {
              "pci/0000:06:00.0/32768": {
                  "type": "eth",
                  "netdev": "ens2f0npf0sf88",
                  "flavour": "pcisf",
                  "controller": 0,
                  "pfnum": 0,
                  "sfnum": 88,
                  "splittable": false,
                  "function": {
                      "hw_addr": "00:00:00:00:88:88",
                      "state": "active",
                      "opstate": "attached"
                  }
              }
          }
      }
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Vu Pham <vuhuong@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
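The port naming scheme described in b8288837 can be sketched as a small string builder. This is an illustrative user-space model, not the kernel code: the helper name `pci_sf_port_name` and its `with_controller` parameter are hypothetical. The commit message does not say when the controller part is emitted, and the example netdev name `ens2f0npf0sf88` carries no controller part, so the sketch makes that an explicit choice of the caller and uses the lowercase `sf` seen in the example name.

```python
def pci_sf_port_name(controller, pf, sf, with_controller=False):
    """Build a port name from controller, PF and SF numbers.

    Hypothetical helper: `with_controller` is an assumption of this sketch,
    since the commit message does not state when the "cA" part is included.
    """
    prefix = f"c{controller}" if with_controller else ""
    return f"{prefix}pf{pf}sf{sf}"


# Matches the suffix of the netdev name in the example output.
print(pci_sf_port_name(0, 0, 88))
# Full form with the controller number included.
print(pci_sf_port_name(1, 0, 88, with_controller=True))
```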
    • devlink: Prepare code to fill multiple port function attributes · 1230d948
      Parav Pandit authored
      Prepare code to fill zero or more optional port function attributes.
      A subsequent patch makes use of this to fill more port function
      attributes.
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Vu Pham <vuhuong@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  2. 20 Jan, 2021 14 commits
  3. 19 Jan, 2021 22 commits
    • Merge branch 'net-support-sctp-crc-csum-offload-for-tunneling-packets-in-some-drivers' · 9f23de41
      Jakub Kicinski authored
      Xin Long says:
      
      ====================
      net: support SCTP CRC csum offload for tunneling packets in some drivers
      
      This patchset introduces the inline function skb_csum_is_sctp() and uses
      it to check whether a packet is an SCTP CRC csum offload packet, so that
      SCTP CRC csum offload for tunneling packets is supported in some HW drivers.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1610777159.git.lucien.xin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ixgbevf: use skb_csum_is_sctp instead of protocol check · fc186d0a
      Xin Long authored
      Using skb_csum_is_sctp is an easier way to check that a packet is an
      SCTP CRC checksum offload packet, and it also makes ixgbevf support
      SCTP CRC checksum offload for UDP and GRE encapped packets, just as
      the igb driver does.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ixgbe: use skb_csum_is_sctp instead of protocol check · f8c4b01d
      Xin Long authored
      Using skb_csum_is_sctp is an easier way to check that a packet is an
      SCTP CRC checksum offload packet, and it also makes ixgbe support
      SCTP CRC checksum offload for UDP and GRE encapped packets, just as
      the igb driver does.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: igc: use skb_csum_is_sctp instead of protocol check · 609d29a9
      Xin Long authored
      Using skb_csum_is_sctp is an easier way to check that a packet is an
      SCTP CRC checksum offload packet, and it also makes igc support
      SCTP CRC checksum offload for UDP and GRE encapped packets, just as
      the igb driver does.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: igbvf: use skb_csum_is_sctp instead of protocol check · d2de4444
      Xin Long authored
      Using skb_csum_is_sctp is an easier way to check that a packet is an
      SCTP CRC checksum offload packet, and it also makes igbvf support
      SCTP CRC checksum offload for UDP and GRE encapped packets, just as
      the igb driver does.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: igb: use skb_csum_is_sctp instead of protocol check · 8bcf0203
      Xin Long authored
      Using skb_csum_is_sctp is an easier way to check that a packet is an
      SCTP CRC checksum offload packet: there is no need to parse the
      packet to check its proto field, especially when it is a UDP or
      GRE encapped packet.
      
      As a result, this patch also makes igb support SCTP CRC checksum
      offload for UDP and GRE encapped packets.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
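The reasoning in 8bcf0203, that a per-packet flag is more robust than parsing the protocol field once tunnels are involved, can be illustrated with a toy model. This is not kernel code: the `Packet` class and both predicate names are invented for the sketch, and the flag is assumed to be set by the stack whenever SCTP CRC offload is requested.

```python
from dataclasses import dataclass

IPPROTO_UDP = 17
IPPROTO_SCTP = 132


@dataclass
class Packet:
    """Toy stand-in for an skb; the field names are invented."""
    outer_l4_proto: int    # protocol seen when parsing the outermost headers
    wants_sctp_crc: bool   # assumed to be set by the stack for SCTP CRC offload


def is_sctp_by_parsing(pkt):
    # Protocol check: only looks at the outer L4 protocol field.
    return pkt.outer_l4_proto == IPPROTO_SCTP


def is_sctp_by_flag(pkt):
    # Flag check: relies on what the stack recorded, whatever the encapsulation.
    return pkt.wants_sctp_crc


plain = Packet(outer_l4_proto=IPPROTO_SCTP, wants_sctp_crc=True)
udp_encap = Packet(outer_l4_proto=IPPROTO_UDP, wants_sctp_crc=True)

print(is_sctp_by_parsing(plain), is_sctp_by_flag(plain))
print(is_sctp_by_parsing(udp_encap), is_sctp_by_flag(udp_encap))
```

For the UDP-encapsulated packet the protocol check misses the inner SCTP payload while the flag check still identifies it, which is the case the series enables in the drivers.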
    • net: add inline function skb_csum_is_sctp · fa821170
      Xin Long authored
      This patch defines an inline function, skb_csum_is_sctp(), and uses it
      in every place that checks whether an skb is an SCTP CSUM skb.
      The following patches use this function in many networking
      drivers.
      Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mdio, phy: fix -Wshadow warnings triggered by nested container_of() · 7eab14de
      Alexander Lobakin authored
      The container_of() macro hides a local variable '__mptr' inside. This
      becomes a problem when several container_of() calls are nested in each
      other within a single line or in plain macros.
      As the C preprocessor doesn't support generating random variable names,
      the sole solution is to avoid defining macros that consist only of
      container_of() calls, or they will self-shadow '__mptr' each time:
      
      In file included from ./include/linux/bitmap.h:10,
                       from drivers/net/phy/phy_device.c:12:
      drivers/net/phy/phy_device.c: In function ‘phy_device_release’:
      ./include/linux/kernel.h:693:8: warning: declaration of ‘__mptr’ shadows a previous local [-Wshadow]
        693 |  void *__mptr = (void *)(ptr);     \
            |        ^~~~~~
      ./include/linux/phy.h:647:26: note: in expansion of macro ‘container_of’
        647 | #define to_phy_device(d) container_of(to_mdio_device(d), \
            |                          ^~~~~~~~~~~~
      ./include/linux/mdio.h:52:27: note: in expansion of macro ‘container_of’
         52 | #define to_mdio_device(d) container_of(d, struct mdio_device, dev)
            |                           ^~~~~~~~~~~~
      ./include/linux/phy.h:647:39: note: in expansion of macro ‘to_mdio_device’
        647 | #define to_phy_device(d) container_of(to_mdio_device(d), \
            |                                       ^~~~~~~~~~~~~~
      drivers/net/phy/phy_device.c:217:8: note: in expansion of macro ‘to_phy_device’
        217 |  kfree(to_phy_device(dev));
            |        ^~~~~~~~~~~~~
      ./include/linux/kernel.h:693:8: note: shadowed declaration is here
        693 |  void *__mptr = (void *)(ptr);     \
            |        ^~~~~~
      ./include/linux/phy.h:647:26: note: in expansion of macro ‘container_of’
        647 | #define to_phy_device(d) container_of(to_mdio_device(d), \
            |                          ^~~~~~~~~~~~
      drivers/net/phy/phy_device.c:217:8: note: in expansion of macro ‘to_phy_device’
        217 |  kfree(to_phy_device(dev));
            |        ^~~~~~~~~~~~~
      
      As they are declared in header files, these warnings are highly
      repetitive and very annoying (along with the one from linux/pci.h).
      
      Convert the related macros from linux/{mdio,phy}.h to static inlines
      to avoid self-shadowing and potentially improve bug-catching.
      No functional changes implied.
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20210116161246.67075-1-alobakin@pm.me
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • vhost_net: avoid tx queue stuck when sendmsg fails · dc9c9e72
      Yunjian Wang authored
      Currently the driver doesn't drop a packet that can't be sent by tun
      (e.g. a bad packet). In this case the driver keeps processing the
      same packet, which leaves the tx queue stuck.
      
      To fix this issue:
      1. in the case of persistent failure (e.g. a bad packet), the driver
         skips the descriptor by ignoring the error.
      2. in the case of transient failure (e.g. -ENOBUFS, -EAGAIN and -ENOMEM),
         the driver schedules the worker to try again.
      Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Link: https://lore.kernel.org/r/1610685980-38608-1-git-send-email-wangyunjian@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
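The two-way split that dc9c9e72 describes, retry on transient errors and skip the descriptor on persistent ones, can be sketched as a small decision function. This is a user-space model under stated assumptions, not the vhost_net code: the helper name `tx_action` and its string results are hypothetical, and the transient set contains only the three errors the commit message names.

```python
import errno

# Errors the commit message lists as transient.
TRANSIENT = {errno.ENOBUFS, errno.EAGAIN, errno.ENOMEM}


def tx_action(err):
    """Decide what to do after a send attempt (hypothetical helper).

    `err` follows the kernel convention: zero or a positive byte count on
    success, a negative errno value on failure.
    """
    if err >= 0:
        return "done"
    if -err in TRANSIENT:
        return "retry"   # keep the descriptor and schedule the worker again
    return "skip"        # persistent failure: consume the descriptor, move on


print(tx_action(-errno.EAGAIN))
print(tx_action(-errno.EINVAL))
```

Treating every unlisted error as persistent is what keeps the queue moving: a bad packet is consumed once instead of being retried forever.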
    • net: hns: fix variable used when DEBUG is defined · 99d51897
      Tom Rix authored
      When DEBUG is defined, this error occurs:
      
      drivers/net/ethernet/hisilicon/hns/hns_enet.c:1505:36: error:
        ‘struct net_device’ has no member named ‘ae_handle’;
        did you mean ‘rx_handler’?
        assert(skb->queue_mapping < ndev->ae_handle->q_num);
                                          ^~~~~~~~~
      
      ae_handle is an element of struct hns_nic_priv, so change
      ndev to priv.
      Signed-off-by: Tom Rix <trix@redhat.com>
      Link: https://lore.kernel.org/r/20210117191044.533725-1-trix@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • arcnet: fix macro name when DEBUG is defined · 7cfabe4f
      Tom Rix authored
      When DEBUG is defined, this error occurs:
      
      drivers/net/arcnet/com20020_cs.c:70:15: error: ‘com20020_REG_W_ADDR_HI’
        undeclared (first use in this function);
        did you mean ‘COM20020_REG_W_ADDR_HI’?
             ioaddr, com20020_REG_W_ADDR_HI);
                     ^~~~~~~~~~~~~~~~~~~~~~
      
      From reviewing the context, the suggestion is what is meant.
      Signed-off-by: Tom Rix <trix@redhat.com>
      Acked-by: Joe Perches <joe@perches.com>
      Link: https://lore.kernel.org/r/20210117181519.527625-1-trix@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'tls-device-offload-for-bond' · be7f4578
      Jakub Kicinski authored
      Tariq Toukan says:
      
      ====================
      TLS device offload for Bond
      
      This series opens TX and RX TLS device offload for bond interfaces.
      This allows bond interfaces to benefit from capable lower devices.
      
      We add a new ndo_sk_get_lower_dev() to be used to get the lower dev that
      corresponds to a given socket.
      The TLS module uses it to interact directly with the lowest device in
      the chain, and invoke the control operations in tlsdev_ops. This means that
      the bond interface doesn't have its own struct tlsdev_ops instance and
      derived logic/callbacks.
      
      To keep simple track of the HW and SW TLS contexts, we bind each socket to
      a specific lower device for the socket's whole lifetime. This is logically
      valid (and similar to the SW kTLS behavior) in the following bond configuration,
      so we restrict the offload support to it:
      
      ((mode == balance-xor) or (mode == 802.3ad))
      and xmit_hash_policy == layer3+4.
      
      In this design, the TLS TX/RX offload feature flags of the bond device are
      independent of the lower devices. They reflect the current feature state,
      but are not directly controllable.
      This is because the bond driver is bypassed by the call to
      ndo_sk_get_lower_dev(), without knowing who the caller is.
      The bond TLS feature flags are set/cleared only according to the configuration
      of the mode and xmit_hash_policy.
      
      Bypass is true only for the control flow. Packets in fast path still go through
      the bond logic.
      
      The design here differs from the xfrm/ipsec offload, where the bond driver
      has its own copy of struct xfrmdev_ops and callbacks.
      ====================
      
      Link: https://lore.kernel.org/r/20210117145949.8632-1-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
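The configuration restriction stated in the be7f4578 cover letter reduces to a single predicate. The sketch below is a user-space model using the mode and policy names as they appear in that cover letter; the function name `bond_tls_offload_allowed` is hypothetical and is not the kernel's bond_sk_check().

```python
# Bond modes for which the cover letter allows TLS device offload.
SUPPORTED_MODES = {"balance-xor", "802.3ad"}


def bond_tls_offload_allowed(mode, xmit_hash_policy):
    """Return True when the bond configuration permits TLS device offload.

    Hypothetical helper mirroring the rule:
    ((mode == balance-xor) or (mode == 802.3ad)) and xmit_hash_policy == layer3+4
    """
    return mode in SUPPORTED_MODES and xmit_hash_policy == "layer3+4"


print(bond_tls_offload_allowed("802.3ad", "layer3+4"))
print(bond_tls_offload_allowed("active-backup", "layer3+4"))
print(bond_tls_offload_allowed("balance-xor", "layer2"))
```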
    • net/tls: Except bond interface from some TLS checks · 4e5a7332
      Tariq Toukan authored
      In the tls_dev_event handler, ignore the tlsdev_ops requirement for bond
      interfaces. Bond interfaces have no tlsdev_ops, as the interaction is done
      directly with the lower device.
      
      Also, make the validate function pass when it's called with the upper
      bond interface.
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Boris Pismenny <borisp@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/tls: Device offload to use lowest netdevice in chain · 153cbd13
      Tariq Toukan authored
      Do not call the tls_dev_ops of upper devices. Instead, ask them
      for the proper lowest device and communicate with it directly.
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Boris Pismenny <borisp@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/bonding: Declare TLS RX device offload support · dc5809f9
      Tariq Toukan authored
      Following the description in previous patch (for TX):
      As the bond interface is being bypassed by the TLS module, interacting
      directly against the lower devs, there is no way for the bond interface
      to disable its device offload capabilities, as long as the mode/policy
      config allows it.
      Hence, the feature flag is not directly controllable, but just reflects
      the offload status based on the logic under bond_sk_check().
      
      Here we just declare RX device offload support, and expose it via the
      NETIF_F_HW_TLS_RX flag.
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Boris Pismenny <borisp@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/bonding: Implement TLS TX device offload · 89df6a81
      Tariq Toukan authored
      Implement TLS TX device offload for bonding interfaces.
      This allows kTLS sockets running on a bond to benefit from the
      device offload on capable lower devices.
      
      To keep maintenance of the TLS context in SW and in the lower devices
      simple and fast, we bind the TLS socket to a specific lower dev.
      To achieve behavior similar to SW kTLS, we support only balance-xor
      and 802.3ad modes, with xmit_hash_policy=layer3+4. This is enforced
      in bond_sk_check(), done in a previous patch.
      
      For the above configuration, the SW implementation keeps picking the
      same exact lower dev for all the socket's SKBs. The device offload
      behaves similarly, making the decision once at the connection creation.
      
      Per socket, the TLS module should work directly with the lowest netdev
      in chain, to call the tls_dev_ops operations.
      
      As the bond interface is being bypassed by the TLS module, interacting
      directly against the lower devs, there is no way for the bond interface
      to disable its device offload capabilities, as long as the mode/policy
      config allows it.
      Hence, the feature flag is not directly controllable, but just reflects
      the current offload status based on the logic under bond_sk_check().
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Boris Pismenny <borisp@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/bonding: Take update_features call out of XFRM funciton · f45583de
      Tariq Toukan authored
      In preparation for more cases that call netdev_update_features().
      
      While here, move the features logic to the stage where struct bond
      is already updated, and pass it as the only parameter to function
      bond_set_xfrm_features().
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Boris Pismenny <borisp@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/bonding: Implement ndo_sk_get_lower_dev · 007feb87
      Tariq Toukan authored
      Add ndo_sk_get_lower_dev() implementation for bond interfaces.
      
      Support only the cases where the socket's and the SKBs' hashes
      yield an identical value for the whole connection lifetime.
      
      Here we restrict it to L3+4 sockets only, with
      xmit_hash_policy==LAYER34 and bond modes xor/802.3ad.
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Boris Pismenny <borisp@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/bonding: Take IP hash logic into a helper · 5b998545
      Tariq Toukan authored
      The L3 hash logic will be used by a later patch for one more use
      case.
      Move it into a helper function for better code reuse.
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Boris Pismenny <borisp@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: netdevice: Add operation ndo_sk_get_lower_dev · 719a402c
      Tariq Toukan authored
      ndo_sk_get_lower_dev returns the lower netdev that corresponds to
      a given socket.
      Additionally, we implement a helper netdev_sk_get_lowest_dev() to get
      the lowest one in chain.
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Boris Pismenny <borisp@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
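The relationship between the per-device operation and the "lowest in chain" helper from 719a402c can be modelled as a simple walk down the device stack. Everything in this sketch is hypothetical user-space scaffolding: the `Dev` class, the `sk_get_lower` attribute, and the use of an integer as a stand-in for a socket whose hash selects the lower device.

```python
class Dev:
    """Toy network device; `sk_get_lower` plays the role of the new ndo."""

    def __init__(self, name, sk_get_lower=None):
        self.name = name
        self.sk_get_lower = sk_get_lower   # None: device has no lower-dev lookup


def sk_get_lowest_dev(dev, sk):
    """Follow the lower-dev lookup until a device without one is reached."""
    while dev.sk_get_lower is not None:
        lower = dev.sk_get_lower(sk)
        if lower is None:
            break                          # lookup declined for this socket
        dev = lower
    return dev


eth0, eth1 = Dev("eth0"), Dev("eth1")
# The bond picks a lower device from the (integer) socket hash stand-in.
bond0 = Dev("bond0", lambda sk: (eth0, eth1)[sk % 2])

print(sk_get_lowest_dev(bond0, 4).name)
print(sk_get_lowest_dev(bond0, 5).name)
```

Because the same socket always maps to the same lower device in this model, a caller can bind per-socket state to that device for the lifetime of the connection.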
    • net/qla3xxx: switch from 'pci_' to 'dma_' API · 41fb4c1b
      Christophe JAILLET authored
      The wrappers in include/linux/pci-dma-compat.h should go away.
      
      The patch has been generated with the coccinelle script below and has been
      hand modified to replace GFP_ with a correct flag.
      It has been compile tested.
      
      When memory is allocated in 'ql_alloc_net_req_rsp_queues()' GFP_KERNEL can
      be used because it is only called from 'ql_alloc_mem_resources()' which
      already calls 'ql_alloc_buffer_queues()' which uses GFP_KERNEL. (see below)
      
      When memory is allocated in 'ql_alloc_buffer_queues()' GFP_KERNEL can be
      used because this flag is already used just a few lines above.
      
      When memory is allocated in 'ql_alloc_small_buffers()' GFP_KERNEL can
      be used because it is only called from 'ql_alloc_mem_resources()' which
      already calls 'ql_alloc_buffer_queues()' which uses GFP_KERNEL. (see above)
      
      When memory is allocated in 'ql_alloc_mem_resources()' GFP_KERNEL can be
      used because this function already calls 'ql_alloc_buffer_queues()' which
      uses GFP_KERNEL. (see above)
      
      While at it, use 'dma_set_mask_and_coherent()' instead of 'dma_set_mask()/
      dma_set_coherent_mask()' in order to slightly simplify code.
      
      @@
      @@
      -    PCI_DMA_BIDIRECTIONAL
      +    DMA_BIDIRECTIONAL
      
      @@
      @@
      -    PCI_DMA_TODEVICE
      +    DMA_TO_DEVICE
      
      @@
      @@
      -    PCI_DMA_FROMDEVICE
      +    DMA_FROM_DEVICE
      
      @@
      @@
      -    PCI_DMA_NONE
      +    DMA_NONE
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_alloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_zalloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_free_consistent(e1, e2, e3, e4)
      +    dma_free_coherent(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_single(e1, e2, e3, e4)
      +    dma_map_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_single(e1, e2, e3, e4)
      +    dma_unmap_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4, e5;
      @@
      -    pci_map_page(e1, e2, e3, e4, e5)
      +    dma_map_page(&e1->dev, e2, e3, e4, e5)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_page(e1, e2, e3, e4)
      +    dma_unmap_page(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_sg(e1, e2, e3, e4)
      +    dma_map_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_sg(e1, e2, e3, e4)
      +    dma_unmap_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
      +    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_device(e1, e2, e3, e4)
      +    dma_sync_single_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
      +    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
      +    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2;
      @@
      -    pci_dma_mapping_error(e1, e2)
      +    dma_mapping_error(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_dma_mask(e1, e2)
      +    dma_set_mask(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_consistent_dma_mask(e1, e2)
      +    dma_set_coherent_mask(&e1->dev, e2)
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Link: https://lore.kernel.org/r/20210117081542.560021-1-christophe.jaillet@wanadoo.fr
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
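The first four rules of the coccinelle script in 41fb4c1b are plain constant renames, which can be illustrated with a short text substitution. This sketch covers only those four direction constants. It does not attempt the function-call rewrites (which need the `&e1->dev` argument change and a hand-picked GFP flag) and is not a substitute for running coccinelle. The helper name `rename_directions` is hypothetical.

```python
import re

# Direction constants exactly as mapped by the semantic patch.
DMA_DIRECTION = {
    "PCI_DMA_BIDIRECTIONAL": "DMA_BIDIRECTIONAL",
    "PCI_DMA_TODEVICE": "DMA_TO_DEVICE",
    "PCI_DMA_FROMDEVICE": "DMA_FROM_DEVICE",
    "PCI_DMA_NONE": "DMA_NONE",
}

# Word boundaries keep longer identifiers that merely contain a name intact.
_PATTERN = re.compile(r"\b(" + "|".join(DMA_DIRECTION) + r")\b")


def rename_directions(source):
    """Replace the legacy PCI DMA direction constants in a source string."""
    return _PATTERN.sub(lambda m: DMA_DIRECTION[m.group(1)], source)


line = "pci_map_single(pdev, buf, len, PCI_DMA_TODEVICE);"
print(rename_directions(line))
```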
    • net_sched: fix RTNL deadlock again caused by request_module() · d349f997
      Cong Wang authored
      tcf_action_init_1() loads tc action modules automatically with
      request_module() after parsing the tc action names, and it drops the
      RTNL lock before request_module() and re-takes it afterwards. This
      causes a lot of trouble, as discovered by syzbot, because we can be in
      the middle of batch initializations when we create an array of tc actions.
      
      One of the problems is a deadlock:
      
      CPU 0					CPU 1
      rtnl_lock();
      for (...) {
        tcf_action_init_1();
          -> rtnl_unlock();
          -> request_module();
      				rtnl_lock();
      				for (...) {
      				  tcf_action_init_1();
      				    -> tcf_idr_check_alloc();
      				   // Insert one action into idr,
      				   // but it is not committed until
      				   // tcf_idr_insert_many(), then drop
      				   // the RTNL lock in the _next_
      				   // iteration
      				   -> rtnl_unlock();
          -> rtnl_lock();
          -> a_o->init();
            -> tcf_idr_check_alloc();
            // Now waiting for the same index
            // to be committed
      				    -> request_module();
      				    -> rtnl_lock()
      				    // Now waiting for RTNL lock
      				}
      				rtnl_unlock();
      }
      rtnl_unlock();
      
      This is not easy to solve. We can move the request_module() call before
      this loop, pre-load all the modules we need for this netlink
      message, and then do the rest of the initialization. So the loop is now
      split into two:
      
              for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
                      struct tc_action_ops *a_o;
      
                      a_o = tc_action_load_ops(name, tb[i]...);
                      ops[i - 1] = a_o;
              }
      
              for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
                      act = tcf_action_init_1(ops[i - 1]...);
              }
      
      Although this looks serious, it has only been reported by syzbot, so it
      seems hard for humans to trigger. Given the size of this patch,
      I'd suggest applying it to net-next and not backporting it to stable.
      
      This patch has been tested by syzbot and tested with tdc.py by me.
      
      Fixes: 0fedc63f ("net_sched: commit action insertions together")
      Reported-and-tested-by: syzbot+82752bc5331601cf4899@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+b3b63b6bff456bd95294@syzkaller.appspotmail.com
      Reported-by: syzbot+ba67b12b1ca729912834@syzkaller.appspotmail.com
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/20210117005657.14810-1-xiyou.wangcong@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
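The restructuring in d349f997, where one loop that interleaves module loading and initialisation is split into a load-everything pass followed by an init-everything pass, can be modelled in user space. The sketch below only demonstrates the ordering property. It has no locking, and `init_actions`, `load_ops` and `init_one` are hypothetical stand-ins for the kernel functions named in the commit message.

```python
def init_actions(names, load_ops, init_one):
    """Two-pass initialisation: resolve all ops first, then initialise.

    In the kernel, only the first pass may drop the RTNL lock to call
    request_module(); the second pass then runs without module loading.
    """
    ops = [load_ops(name) for name in names]   # pass 1: may load modules
    return [init_one(o) for o in ops]          # pass 2: no loading here


events = []


def load_ops(name):
    events.append(("load", name))
    return name + "_ops"


def init_one(ops):
    events.append(("init", ops))
    return ops + ":ready"


actions = init_actions(["gact", "mirred"], load_ops, init_one)
print(events)
```

The recorded events show every load preceding every init, so no action sits half-inserted while another iteration goes off to load a module.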