1. 08 Jan, 2022 12 commits
    • net: ena: Change ENI stats support check to use capabilities field · 394c48e0
      Arthur Kiyanovski authored
      Use the capabilities field to query the device for ENI stats
      support.
      
      This replaces the previous method that tried to get the ENI stats
      during ena_probe() and used the success or failure as an indication
      for support by the device.
      
      Remove the eni_stats_supported field from struct ena_adapter. This field
      was used by the previous method of querying for ENI stats support.
      
      Change the severity level of the print in case of
      ena_com_get_eni_stats() failure from info to error.
      With the previous method of querying for ENI stats support, failure
      to get ENI stats was normal for devices that don't support it.
      With the capabilities field, such a failure is unexpected, since
      ena_com_get_eni_stats() is called only if the device reported that it
      supports ENI stats.
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
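A minimal C sketch of the new control flow, assuming simplified stand-ins for the driver structures: ENI stats are fetched only when the capability bit is set, and a failed query on a device that advertised support is logged as an error. The names `sketch_dev`, `sketch_get_cap()`, `SKETCH_CAP_ENI_STATS` and `sketch_update_eni_stats()` are hypothetical and do not come from the ena driver; `sketch_get_cap()` only plays the role that ena_com_get_cap() plays in the commit.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical capability ID; bit 0 stands for ENI stats in this sketch. */
enum sketch_cap { SKETCH_CAP_ENI_STATS = 0 };

/* Hypothetical, heavily simplified device state. */
struct sketch_dev {
	uint32_t capabilities; /* bitmask reported by the device */
	int eni_stats_rc;      /* simulated result of the stats query */
};

static bool sketch_get_cap(const struct sketch_dev *dev, enum sketch_cap cap)
{
	return dev->capabilities & (1U << cap);
}

/* Returns 1 if stats were read, 0 if the device does not support them,
 * or the (negative) error code if a supported query failed. */
static int sketch_update_eni_stats(const struct sketch_dev *dev)
{
	if (!sketch_get_cap(dev, SKETCH_CAP_ENI_STATS))
		return 0; /* unsupported: skip quietly, nothing went wrong */

	if (dev->eni_stats_rc) {
		/* Support was advertised, so a failure is now unexpected. */
		fprintf(stderr, "error: failed to get ENI stats (%d)\n",
			dev->eni_stats_rc);
		return dev->eni_stats_rc;
	}
	return 1;
}
```

Because support is decided from the advertised bitmask rather than from a trial query at probe time, a device without the capability never reaches the error path.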
    • net: ena: Add capabilities field with support for ENI stats capability · a2d5d6a7
      Arthur Kiyanovski authored
      This bitmask field indicates what capabilities are supported by the
      device.
      
      The capabilities field differs from the 'supported_features' field which
      indicates what sub-commands for the set/get feature commands are
      supported. The sub-commands are specified in the 'feature_id' field of
      the 'ena_admin_set_feat_cmd' struct in the following way:
      
              struct ena_admin_set_feat_cmd cmd;
      
              cmd.aq_common_descriptor.opcode = ENA_ADMIN_SET_FEATURE;
              cmd.feat_common.feature_
      
      The 'capabilities' field, on the other hand, specifies different
      capabilities of the device, for example whether the device supports
      querying ENI stats.
      
      Also add an enum that lists all the capabilities. The first
      capability added is for the ENI stats feature.
      
      Capabilities are queried along with the other device attributes (in
      ena_com_get_dev_attr_feat()) during device initialization and are stored
      in the ena_com_dev struct. They can be later queried using the
      ena_com_get_cap() helper function.
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
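The difference between the two bitmasks can be illustrated with a small C sketch. It assumes, for illustration only, that both masks are plain 32-bit words indexed by ID and cached once at initialization; the structure names and the feature IDs are hypothetical, and only the ENI stats capability corresponds to something named in the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature IDs: sub-commands of the get/set feature commands. */
enum sketch_feature_id { SKETCH_FEAT_MTU = 1, SKETCH_FEAT_RSS = 2 };
/* Hypothetical capability IDs; the first one is ENI stats. */
enum sketch_cap_id { SKETCH_CAP_ENI_STATS = 0 };

/* Simplified attributes as returned by the device during initialization. */
struct sketch_dev_attr {
	uint32_t supported_features; /* bit N set: feature_id N is usable */
	uint32_t capabilities;       /* bit N set: capability N is present */
};

/* Simplified per-device state where both masks are cached. */
struct sketch_com_dev {
	uint32_t supported_features;
	uint32_t capabilities;
};

/* Store both bitmasks once, at init time. */
static void sketch_store_dev_attr(struct sketch_com_dev *dev,
				  const struct sketch_dev_attr *attr)
{
	dev->supported_features = attr->supported_features;
	dev->capabilities = attr->capabilities;
}

static bool sketch_feature_supported(const struct sketch_com_dev *dev,
				     enum sketch_feature_id id)
{
	return (dev->supported_features >> id) & 1U;
}

/* Later lookups read the cached mask, similar in role to ena_com_get_cap(). */
static bool sketch_get_cap(const struct sketch_com_dev *dev,
			   enum sketch_cap_id id)
{
	return (dev->capabilities >> id) & 1U;
}
```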
    • net: ena: Change return value of ena_calc_io_queue_size() to void · 7dcf9221
      Arthur Kiyanovski authored
      ena_calc_io_queue_size() always returns 0, therefore make it a
      void function and update the calling function to stop checking
      the return value.
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • af_packet: fix tracking issues in packet_do_bind() · bf44077c
      Eric Dumazet authored
      It appears that my changes in packet_do_bind() were
      slightly wrong.
      
      syzbot found that calling bind() twice would trigger
      a false positive.
      
      Remove the proto_curr/dev_curr variables and rewrite things
      to be less confusing (for example, no longer needing
      netdev_tracker_alloc() and using the standard dev_hold_track() instead).
      
      Fixes: f1d9268e ("net: add net device refcount tracker to struct packet_type")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220107183953.3886647-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
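The invariant the fix restores can be sketched in C with a plain counter standing in for the tracked references taken by dev_hold_track(): each bind takes exactly one reference on the new device and drops the one held on the previous device, so binding twice to the same device leaves a single reference. The types and helpers are hypothetical simplifications, not the af_packet code, and the tracker's per-reference bookkeeping is not modeled.

```c
#include <stddef.h>

/* Hypothetical device with a plain reference counter. */
struct sketch_netdev {
	int refcnt;
};

/* Hypothetical socket state: the device it is currently bound to. */
struct sketch_packet_sock {
	struct sketch_netdev *dev;
};

static void sketch_dev_hold(struct sketch_netdev *dev)
{
	if (dev)
		dev->refcnt++;
}

static void sketch_dev_put(struct sketch_netdev *dev)
{
	if (dev)
		dev->refcnt--;
}

/* Take the new reference before dropping the old one, so rebinding to the
 * same device never leaks a reference or drops one it does not own. */
static void sketch_do_bind(struct sketch_packet_sock *po,
			   struct sketch_netdev *new_dev)
{
	struct sketch_netdev *old_dev = po->dev;

	sketch_dev_hold(new_dev);
	po->dev = new_dev;
	sketch_dev_put(old_dev);
}
```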
    • Merge branch 'mptcp-refactoring-for-one-selftest-and-csum-validation' · d8caa2ed
      Jakub Kicinski authored
      Mat Martineau says:
      
      ====================
      mptcp: Refactoring for one selftest and csum validation
      
      Patch 1 changes the MPTCP join self tests to depend more on events
      rather than delays, so the script runs faster and has more consistent
      results.
      
      Patches 2 and 3 get rid of some duplicate code in MPTCP's checksum
      validation by modifying and leveraging an existing helper function.
      ====================
      
      Link: https://lore.kernel.org/r/20220107192524.445137-1-mathew.j.martineau@linux.intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: reuse __mptcp_make_csum in validate_data_csum · 8401e87f
      Geliang Tang authored
      This patch reuses __mptcp_make_csum() in validate_data_csum() instead of
      open-coding the checksum computation.
      Signed-off-by: Geliang Tang <geliang.tang@suse.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: change the parameter of __mptcp_make_csum · c312ee21
      Geliang Tang authored
      This patch changes the type of the last parameter of __mptcp_make_csum()
      from __sum16 to __wsum, and exports the function in protocol.h.
      Signed-off-by: Geliang Tang <geliang.tang@suse.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
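For context, __wsum holds a 32-bit partial ones' complement sum that has not been folded yet, while __sum16 holds the final folded 16-bit checksum. Taking a __wsum lets a caller pass a partial sum computed elsewhere and have the helper add its own fields before one final fold. The C sketch below shows that idea with hypothetical helpers: `sketch_csum_fold()` approximates what the kernel's csum_fold() computes, and the fields passed to `sketch_make_csum()` are illustrative, not the exact MPTCP DSS checksum layout.

```c
#include <stdint.h>

/* Add into a 32-bit ones' complement accumulator with end-around carry. */
static uint32_t sketch_csum_add(uint32_t sum, uint32_t addend)
{
	uint32_t res = sum + addend;

	return res + (res < addend); /* wrap the carry back in */
}

/* Fold a 32-bit partial sum to 16 bits and complement it. */
static uint16_t sketch_csum_fold(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hypothetical helper: 'partial' is an unfolded sum (the __wsum role), so
 * the helper can keep accumulating its own fields before folding once. */
static uint16_t sketch_make_csum(uint32_t seq_hi, uint32_t seq_lo,
				 uint16_t len, uint32_t partial)
{
	uint32_t sum = partial;

	sum = sketch_csum_add(sum, seq_hi);
	sum = sketch_csum_add(sum, seq_lo);
	sum = sketch_csum_add(sum, len);
	return sketch_csum_fold(sum);
}
```

Passing a word through `partial` gives the same checksum as passing it as one of the explicit fields, which is what lets another call site reuse the helper with a payload sum it already holds.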
    • selftests: mptcp: more stable join tests-cases · 327b9a94
      Paolo Abeni authored
      MPTCP join self-tests are a bit fragile as they rely on
      delays instead of events to catch up with the expected
      socket states.
      
      Replace the delay with state checking where possible and
      reduce the number of sleeps in the most complex scenarios.
      
      This both reduces the tests' run time and improves their
      stability.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
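The selftest itself is a shell script, but the pattern of replacing a fixed delay with state polling can be sketched in C: check a condition at a short interval and return as soon as it holds, failing only after a timeout. The condition and sleep callbacks are hypothetical and injected so the loop can be exercised without real delays; a real caller would check the socket state and sleep with something like nanosleep().

```c
#include <stdbool.h>

/* Returns true once the awaited state (e.g. a socket is established) holds. */
typedef bool (*sketch_cond_fn)(void *ctx);
/* Sleeps for the given number of milliseconds; injected for testability. */
typedef void (*sketch_sleep_fn)(long ms);

/* Poll 'cond' every 'step_ms' until it holds or 'timeout_ms' has elapsed.
 * Returns the number of milliseconds waited, or -1 on timeout. */
static long sketch_wait_for(sketch_cond_fn cond, void *ctx,
			    sketch_sleep_fn sleep_ms,
			    long timeout_ms, long step_ms)
{
	long waited = 0;

	while (!cond(ctx)) {
		if (waited >= timeout_ms)
			return -1;
		sleep_ms(step_ms);
		waited += step_ms;
	}
	return waited;
}

/* Example condition: becomes true after *ctx unsuccessful polls. */
static bool sketch_ready_after(void *ctx)
{
	int *remaining = ctx;

	return (*remaining)-- <= 0;
}

/* Example sleep that does nothing, so tests run instantly. */
static void sketch_fake_sleep(long ms)
{
	(void)ms; /* a real caller would actually sleep here */
}
```

A fixed sleep must be sized for the slowest run, while the polling loop finishes as soon as the state is reached and still bounds the worst case with the timeout.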
    • net: dsa: felix: add port fast age support · 5cad43a5
      Vladimir Oltean authored
      Add support for flushing the MAC table on a given port in the ocelot
      switch library, and use this functionality in the felix DSA driver.
      
      This operation is needed when a port leaves a bridge to become
      standalone, when learning is disabled, and when the STP state
      changes to a state where no FDB entry should be present.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220107144229.244584-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
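A minimal C model of what fast-ageing a port means for a MAC table: every dynamically learned entry pointing at that port is dropped, while entries on other ports are unaffected. The table layout and function are hypothetical and do not reflect the ocelot hardware or library; keeping static (software-installed) entries is an assumption about the intended semantics.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one MAC table entry. */
struct sketch_fdb_entry {
	uint8_t mac[6];
	uint16_t vid;
	int port;
	bool is_static; /* installed by software rather than learned */
	bool valid;
};

/* Forget every learned entry that points at 'port'.
 * Returns the number of entries removed. */
static size_t sketch_fdb_fast_age(struct sketch_fdb_entry *table, size_t n,
				  int port)
{
	size_t removed = 0;

	for (size_t i = 0; i < n; i++) {
		if (table[i].valid && !table[i].is_static &&
		    table[i].port == port) {
			table[i].valid = false;
			removed++;
		}
	}
	return removed;
}
```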
    • net: mscc: ocelot: fix incorrect balancing with down LAG ports · a14e6b69
      Vladimir Oltean authored
      Assuming the test setup described here:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210205130240.4072854-1-vladimir.oltean@nxp.com/
      (swp1 and swp2 are in bond0, and bond0 is in a bridge with swp0)
      
      it can be seen that when swp1 goes down (on either board A or B),
      traffic that should go through that port isn't forwarded anywhere.
      
      A dump of the PGID table shows the following:
      
      PGID_DST[0] = ports 0
      PGID_DST[1] = ports 1
      PGID_DST[2] = ports 2
      PGID_DST[3] = ports 3
      PGID_DST[4] = ports 4
      PGID_DST[5] = ports 5
      PGID_DST[6] = no ports
      PGID_AGGR[0] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[1] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[2] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[3] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[4] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[5] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[6] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[7] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[8] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[9] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[10] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[11] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[12] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[13] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[14] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[15] = ports 0, 1, 2, 3, 4, 5
      PGID_SRC[0] = ports 1, 2
      PGID_SRC[1] = ports 0
      PGID_SRC[2] = ports 0
      PGID_SRC[3] = no ports
      PGID_SRC[4] = no ports
      PGID_SRC[5] = no ports
      PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5
      
      Whereas a "good" PGID configuration for that setup should have looked
      like this:
      
      PGID_DST[0] = ports 0
      PGID_DST[1] = ports 1, 2
      PGID_DST[2] = ports 1, 2
      PGID_DST[3] = ports 3
      PGID_DST[4] = ports 4
      PGID_DST[5] = ports 5
      PGID_DST[6] = no ports
      PGID_AGGR[0] = ports 0, 2, 3, 4, 5
      PGID_AGGR[1] = ports 0, 2, 3, 4, 5
      PGID_AGGR[2] = ports 0, 2, 3, 4, 5
      PGID_AGGR[3] = ports 0, 2, 3, 4, 5
      PGID_AGGR[4] = ports 0, 2, 3, 4, 5
      PGID_AGGR[5] = ports 0, 2, 3, 4, 5
      PGID_AGGR[6] = ports 0, 2, 3, 4, 5
      PGID_AGGR[7] = ports 0, 2, 3, 4, 5
      PGID_AGGR[8] = ports 0, 2, 3, 4, 5
      PGID_AGGR[9] = ports 0, 2, 3, 4, 5
      PGID_AGGR[10] = ports 0, 2, 3, 4, 5
      PGID_AGGR[11] = ports 0, 2, 3, 4, 5
      PGID_AGGR[12] = ports 0, 2, 3, 4, 5
      PGID_AGGR[13] = ports 0, 2, 3, 4, 5
      PGID_AGGR[14] = ports 0, 2, 3, 4, 5
      PGID_AGGR[15] = ports 0, 2, 3, 4, 5
      PGID_SRC[0] = ports 1, 2
      PGID_SRC[1] = ports 0
      PGID_SRC[2] = ports 0
      PGID_SRC[3] = no ports
      PGID_SRC[4] = no ports
      PGID_SRC[5] = no ports
      PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5
      
      In other words, in the "bad" configuration, the attempt is to remove the
      inactive swp1 from the destination ports via PGID_DST. But when a MAC
      table entry is learned, it is learned towards PGID_DST 1, because that
      is the logical port id of the LAG itself (it is equal to the lowest
      numbered member port). So when swp1 becomes inactive, if we set
      PGID_DST[1] to contain just swp1 and not swp2, the packet will not have
      any chance to reach the destination via swp2.
      
      The "correct" way to remove swp1 as a destination is via PGID_AGGR
      (remove swp1 from the aggregation port groups for all aggregation
      codes). This means that PGID_DST[1] and PGID_DST[2] must still contain
      both swp1 and swp2. This makes the MAC table still treat packets
      destined towards the single-port LAG as "multicast", and the inactive
      ports are removed via the aggregation code tables.
      
      The change presented here is a design one: the ocelot_get_bond_mask()
      function used to take an "only_active_ports" argument. We don't need
      that. The only call site that specifies only_active_ports=true,
      ocelot_set_aggr_pgids(), must retrieve the entire bonding mask, because
      it must program that into PGID_DST. Additionally, it must also clear the
      inactive ports from the bond mask here, which it can't do if bond_mask
      just contains the active ports:
      
      	ac = ocelot_read_rix(ocelot, ANA_PGID_PGID, i);
      	ac &= ~bond_mask;  <---- here
      	/* Don't do division by zero if there was no active
      	 * port. Just make all aggregation codes zero.
      	 */
      	if (num_active_ports)
      		ac |= BIT(aggr_idx[i % num_active_ports]);
      	ocelot_write_rix(ocelot, ac, ANA_PGID_PGID, i);
      
      So it becomes the responsibility of ocelot_set_aggr_pgids() to take
      ocelot_port->lag_tx_active into consideration when populating the
      aggr_idx array.
      
      Fixes: 23ca3b72 ("net: mscc: ocelot: rebalance LAGs on link up/down events")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220107164332.402133-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
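The corrected programming can be modeled in C with port bitmasks. This is a simplified sketch assuming six front ports, a single bond and 16 aggregation codes; `sketch_set_aggr_pgids()` is hypothetical and only mirrors the logic described in the commit, starting PGID_AGGR from "all ports" instead of reading the hardware register. Every bond member's PGID_DST entry keeps the full bond mask, and each PGID_AGGR entry clears all members and then adds back one active member chosen round-robin from `aggr_idx`.

```c
#include <stdint.h>

#define SKETCH_NUM_PORTS 6
#define SKETCH_NUM_AGGR  16

/* Simplified copies of the PGID_DST and PGID_AGGR tables (port bitmasks). */
struct sketch_pgids {
	uint32_t dst[SKETCH_NUM_PORTS];
	uint32_t aggr[SKETCH_NUM_AGGR];
};

static void sketch_set_aggr_pgids(struct sketch_pgids *p, uint32_t bond_mask,
				  uint32_t active_mask)
{
	int aggr_idx[SKETCH_NUM_PORTS];
	int num_active_ports = 0;
	uint32_t all_ports = (1U << SKETCH_NUM_PORTS) - 1;

	for (int port = 0; port < SKETCH_NUM_PORTS; port++) {
		/* Bond members keep the entire bond mask, active or not. */
		p->dst[port] = (bond_mask & (1U << port)) ? bond_mask
							  : (1U << port);
		/* Only active members may be picked for an aggregation code. */
		if (active_mask & bond_mask & (1U << port))
			aggr_idx[num_active_ports++] = port;
	}

	for (int i = 0; i < SKETCH_NUM_AGGR; i++) {
		uint32_t ac = all_ports;

		ac &= ~bond_mask; /* clear all members, inactive ones included */
		if (num_active_ports) /* avoid division by zero */
			ac |= 1U << aggr_idx[i % num_active_ports];
		p->aggr[i] = ac;
	}
}
```

With `bond_mask = 0x6` (swp1 and swp2) and `active_mask = 0x4` (only swp2 up), this yields PGID_DST[1] = PGID_DST[2] = ports 1, 2 and every PGID_AGGR entry = ports 0, 2, 3, 4, 5, matching the "good" table in the commit message.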
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · a5e7d9bb
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2022-01-07
      
      This series contains updates to i40e and iavf drivers.
      
      Karen limits the number of MAC filters per VF so that a single VF
      cannot consume all filters for i40e.
      
      Jedrzej reduces busy wait time for admin queue calls for i40e.
      
      Mateusz updates firmware versions to reflect new supported NVM images
      and renames an error to remove non-inclusive language for i40e.
      
      Yang Li fixes a set-but-not-used warning for i40e.
      
      Jason Wang removes an unneeded variable for iavf.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
        iavf: remove an unneeded variable
        i40e: remove variables set but not used
        i40e: Remove non-inclusive language
        i40e: Update FW API version
        i40e: Minimize amount of busy-waiting during AQ send
        i40e: Add ensurance of MacVlan resources for every trusted VF
      ====================
      
      Link: https://lore.kernel.org/r/20220107175704.438387-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/tls: Fix skb memory leak when running kTLS traffic · ffef737f
      Gal Pressman authored
      The cited Fixes commit introduced a memory leak when running kTLS
      traffic (with or without hardware offloads).
      Running nginx on the server side and wrk on the client side, I get
      the following:
      
        unreferenced object 0xffff8881935e9b80 (size 224):
        comm "softirq", pid 0, jiffies 4294903611 (age 43.204s)
        hex dump (first 32 bytes):
          80 9b d0 36 81 88 ff ff 00 00 00 00 00 00 00 00  ...6............
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000efe2a999>] build_skb+0x1f/0x170
          [<00000000ef521785>] mlx5e_skb_from_cqe_mpwrq_linear+0x2bc/0x610 [mlx5_core]
          [<00000000945d0ffe>] mlx5e_handle_rx_cqe_mpwrq+0x264/0x9e0 [mlx5_core]
          [<00000000cb675b06>] mlx5e_poll_rx_cq+0x3ad/0x17a0 [mlx5_core]
          [<0000000018aac6a9>] mlx5e_napi_poll+0x28c/0x1b60 [mlx5_core]
          [<000000001f3369d1>] __napi_poll+0x9f/0x560
          [<00000000cfa11f72>] net_rx_action+0x357/0xa60
          [<000000008653b8d7>] __do_softirq+0x282/0x94e
          [<00000000644923c6>] __irq_exit_rcu+0x11f/0x170
          [<00000000d4085f8f>] irq_exit_rcu+0xa/0x20
          [<00000000d412fef4>] common_interrupt+0x7d/0xa0
          [<00000000bfb0cebc>] asm_common_interrupt+0x1e/0x40
          [<00000000d80d0890>] default_idle+0x53/0x70
          [<00000000f2b9780e>] default_idle_call+0x8c/0xd0
          [<00000000c7659e15>] do_idle+0x394/0x450
      
      I'm not familiar with these areas of the code, but I added a
      sk_defer_free_flush() call to tls_sw_recvmsg() based on a hunch, and it
      resolved the issue.
      
      Fixes: f35f8219 ("tcp: defer skb freeing after socket lock is released")
      Signed-off-by: Gal Pressman <gal@nvidia.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20220102081253.9123-1-gal@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
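As its title says, the cited commit defers skb freeing until after the socket lock is released: freed skbs are parked on a per-socket list and later released by sk_defer_free_flush(), so a receive path that defers frees but never reaches the flush leaks them. A minimal C model of that pattern, with hypothetical types and helpers standing in for the skb and socket structures:

```c
#include <stdlib.h>

/* Hypothetical stand-in for an skb waiting to be freed. */
struct sketch_buf {
	struct sketch_buf *next;
};

/* Hypothetical stand-in for the socket's deferred-free list. */
struct sketch_sock {
	struct sketch_buf *defer_list;
};

/* Park a buffer instead of freeing it while the lock is held. */
static void sketch_defer_free(struct sketch_sock *sk, struct sketch_buf *buf)
{
	buf->next = sk->defer_list;
	sk->defer_list = buf;
}

/* Free everything parked on the list; returns how many buffers were freed.
 * Skipping this call after deferring frees is exactly the leak pattern. */
static int sketch_defer_free_flush(struct sketch_sock *sk)
{
	struct sketch_buf *buf = sk->defer_list;
	int freed = 0;

	sk->defer_list = NULL;
	while (buf) {
		struct sketch_buf *next = buf->next;

		free(buf);
		freed++;
		buf = next;
	}
	return freed;
}
```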
  2. 07 Jan, 2022 28 commits