1. 22 Apr, 2023 7 commits
  2. 21 Apr, 2023 12 commits
    • Jakub Kicinski's avatar
      Merge tag 'nf-23-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · f9bcdcec
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      1) Set IPS_CONFIRMED before calling change_status(); otherwise EBUSY is
         hit spuriously. This bug was introduced in the 6.3 release cycle.
      
      2) Fix nfnetlink_queue conntrack support: set and dump the timeout
         correctly for unconfirmed conntrack entries, and make sure this
         is done after IPS_CONFIRMED is set. This is an old bug that has
         existed since the feature was introduced.
      
      * tag 'nf-23-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: conntrack: fix wrong ct->timeout value
        netfilter: conntrack: restore IPS_CONFIRMED out of nf_conntrack_hash_check_insert()
      ====================
      
      Link: https://lore.kernel.org/r/20230421105700.325438-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
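Item 1 is an ordering problem. Below is a minimal userspace model, assuming (as the fix implies) that change_status() rejects any request that would flip an unchangeable bit such as IPS_CONFIRMED. `struct conn`, `change_status()`, `insert_buggy()`/`insert_fixed()` and the `ST_*` bit values are hypothetical stand-ins, not the conntrack code.

```c
#include <errno.h>

/* Hypothetical stand-ins for conntrack status bits (not the real uapi values). */
#define ST_CONFIRMED (1UL << 0)
#define ST_ASSURED   (1UL << 1)
#define UNCHANGEABLE_MASK ST_CONFIRMED

struct conn {
	unsigned long status;
};

/* Simplified model: refuse any request that would flip an unchangeable bit. */
static int change_status(struct conn *ct, unsigned long requested)
{
	unsigned long diff = ct->status ^ requested;

	if (diff & UNCHANGEABLE_MASK)
		return -EBUSY;
	ct->status = requested;
	return 0;
}

/* Buggy order: the request carries CONFIRMED but the entry does not have it yet. */
static int insert_buggy(struct conn *ct, unsigned long requested)
{
	return change_status(ct, requested);
}

/* Fixed order: set CONFIRMED first, then apply the requested status. */
static int insert_fixed(struct conn *ct, unsigned long requested)
{
	ct->status |= ST_CONFIRMED;
	return change_status(ct, requested);
}
```

Setting the bit first makes the entry agree with the request on the unchangeable bits, so the same request succeeds.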
    • Jeroen de Borst's avatar
      e375b503
    • Vlad Buslov's avatar
      Revert "net/mlx5e: Don't use termination table when redundant" · 081abcac
      Vlad Buslov authored
      This reverts commit 14624d72.
      
      Using the termination table is required in DMFS steering mode, as the
      firmware doesn't support a mixed table destinations list, which causes the
      following syndrome with hairpin rules:
      
      [81922.283225] mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 25977): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xaca205), err(-22)
      
      Fixes: 14624d72 ("net/mlx5e: Don't use termination table when redundant")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • Aya Levin's avatar
      net/mlx5e: Nullify table pointer when failing to create · 1b540dec
      Aya Levin authored
      When creating the promisc flow steering table fails, the pointer is
      left holding an error value. Nullify it so that unloading the driver
      won't try to destroy a non-existent table.

      Failing to create the promisc table may happen on BF devices when the
      ARM side is going through a firmware teardown and the host side starts
      a reload flow. While the driver unloads, it tries to remove the promisc
      table. Remove the WARN in this state, as it is a valid error flow.
      
      Fixes: 1c46d740 ("net/mlx5e: Optimize promiscuous mode")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
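The idea behind this fix, storing NULL instead of an error-encoded pointer so that teardown can tell "never created" apart from a live table, can be sketched in userspace C. `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` below are simplified re-implementations of the kernel helpers of the same name. `table_create()`, `promisc_create()`, `promisc_destroy()` and `struct priv` are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Simplified userspace versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct table { int id; };
struct priv { struct table *promisc; int destroyed; };

/* Hypothetical create: fails with -EIO while firmware is unavailable. */
static struct table *table_create(int fw_ready)
{
	if (!fw_ready)
		return ERR_PTR(-EIO);
	return calloc(1, sizeof(struct table));
}

static int promisc_create(struct priv *p, int fw_ready)
{
	p->promisc = table_create(fw_ready);
	if (IS_ERR(p->promisc)) {
		long err = PTR_ERR(p->promisc);

		p->promisc = NULL;	/* never leave an error pointer behind */
		return (int)err;
	}
	return 0;
}

/* Teardown only touches a table that actually exists. */
static void promisc_destroy(struct priv *p)
{
	if (!p->promisc)
		return;
	free(p->promisc);
	p->promisc = NULL;
	p->destroyed++;
}
```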
    • Moshe Shemesh's avatar
      net/mlx5: Use recovery timeout on sync reset flow · dfad9975
      Moshe Shemesh authored
      Use the same timeout for the sync reset flow and the health recovery flow,
      since the former involves the driver recovering from a firmware reset,
      which is similar to health recovery. Otherwise, in some cases, such as a
      firmware upgrade on the DPU, the firmware pre-init bit may not be ready
      within the current timeout and the driver will abort loading back after
      the reset.

      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Fixes: 37ca95e6 ("net/mlx5: Increase FW pre-init timeout for health recovery")
      Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
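A sketch of why the restored `recovery` argument matters: the load path picks its firmware pre-init wait based on whether it runs as part of a recovery-like flow. `struct timeouts`, `wait_fw_preinit()` and `load_one()` are hypothetical (the last only mirrors the shape of mlx5_load_one(dev, recovery)), and the wait is simulated rather than polling real hardware.

```c
#include <errno.h>
#include <stdbool.h>

struct timeouts {
	unsigned int preinit_ms;		/* normal driver load */
	unsigned int preinit_recovery_ms;	/* health recovery, and now sync reset */
};

/* Simulated wait: succeeds only if firmware sets pre-init within the budget. */
static int wait_fw_preinit(unsigned int fw_ready_after_ms, unsigned int timeout_ms)
{
	return fw_ready_after_ms <= timeout_ms ? 0 : -ETIMEDOUT;
}

/* Hypothetical stand-in for mlx5_load_one(dev, recovery). */
static int load_one(const struct timeouts *t, bool recovery,
		    unsigned int fw_ready_after_ms)
{
	unsigned int timeout = recovery ? t->preinit_recovery_ms : t->preinit_ms;

	return wait_fw_preinit(fw_ready_after_ms, timeout);
}
```

A sync reset that passes `recovery = true` gets the longer budget, so slow firmware (for example during a DPU upgrade) no longer aborts the reload.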
    • Moshe Shemesh's avatar
      Revert "net/mlx5: Remove "recovery" arg from mlx5_load_one() function" · 21608a2c
      Moshe Shemesh authored
      This reverts commit 5977ac39.
      
      Revert this patch as we need the "recovery" arg back in mlx5_load_one()
      function. This arg will be used in the next patch for using recovery
      timeout during sync reset flow.
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • Roi Dayan's avatar
      net/mlx5e: Fix error flow in representor failing to add vport rx rule · 0a6b069c
      Roi Dayan authored
      On representor init rx error flow the flow steering pointer is being
      released so mlx5e_attach_netdev() doesn't have a valid fs pointer
      in its error flow. Make sure the pointer is nullified when released
      and add a check in mlx5e_fs_cleanup() to verify fs is not null
      as representor cleanup callback would be called anyway.
      
      Fixes: af8bbf73 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
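The two halves of this fix, nullifying the pointer when it is released on the error path and making the cleanup callback tolerate NULL because it runs anyway, look roughly like this in a self-contained sketch. `struct priv`, `struct flow_steering`, `rep_init_rx_fail()` and `fs_cleanup()` are hypothetical names, not the mlx5e functions.

```c
#include <stdlib.h>

struct flow_steering { int tables; };
struct priv { struct flow_steering *fs; };

/* Hypothetical representor rx-init error path: release and nullify. */
static void rep_init_rx_fail(struct priv *priv)
{
	free(priv->fs);
	priv->fs = NULL;	/* later cleanup must see "already gone" */
}

/* The cleanup callback runs unconditionally, so it must tolerate fs == NULL.
 * Returns 1 if it released something, 0 otherwise. */
static int fs_cleanup(struct priv *priv)
{
	if (!priv->fs)
		return 0;
	free(priv->fs);
	priv->fs = NULL;
	return 1;
}
```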
    • Chris Mi's avatar
      net/mlx5: Release tunnel device after tc update skb · 4fbef0f8
      Chris Mi authored
      The cited commit causes a regression: the tunnel device is not released
      after the tc skb update if the skb needs to be freed, and the following
      error message is printed:

        unregister_netdevice: waiting for vxlan1 to become free. Usage count = 11

      Fix it by releasing the tunnel device when the skb needs to be freed.
      
      Fixes: 93a1ab2c ("net/mlx5: Refactor tc miss handling to a single function")
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
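The leak is a reference taken on one path and not dropped on another. A minimal model using a plain counter: `struct netdev`, `dev_ref()`/`dev_unref()`, `tc_update_skb()` and its `fixed` switch are hypothetical, and the real driver uses the kernel's netdev reference helpers rather than a bare int.

```c
#include <stdbool.h>

struct netdev { int refcnt; };

static void dev_ref(struct netdev *d)   { d->refcnt++; }
static void dev_unref(struct netdev *d) { d->refcnt--; }

/* When the skb is consumed normally, the consumer drops the reference. */
static void consume_skb_on_dev(struct netdev *tun) { dev_unref(tun); }

static void tc_update_skb(struct netdev *tun, bool free_skb, bool fixed)
{
	dev_ref(tun);			/* reference taken while resolving the tunnel */
	if (free_skb) {
		if (fixed)
			dev_unref(tun);	/* the fix: drop it on this path too */
		return;			/* skb dropped; nobody else releases tun */
	}
	consume_skb_on_dev(tun);
}
```

Without the fix, every freed skb leaves one extra reference, which is what keeps `unregister_netdevice` waiting.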
    • Chris Mi's avatar
      net/mlx5: E-switch, Don't destroy indirect table in split rule · 4c818930
      Chris Mi authored
      Source port rewrite (forwarding to an OVS internal port or a stack device)
      isn't supported in rules with a split action, so a split rule has no
      indirect table. The cited commit nevertheless destroys the indirect table
      for split rules, which wrongly destroys the indirect table belonging to
      other rules and causes traffic loss.

      Fix it by removing the destroy call for split rules, and also from the
      error flow.
      
      Fixes: 10742efc ("net/mlx5e: VF tunnel TX traffic offloading")
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • Chris Mi's avatar
      net/mlx5: E-switch, Create per vport table based on devlink encap mode · fd745f4c
      Chris Mi authored
      Currently, when creating the per-vport table, the create flags are
      hardcoded, even though the devlink encap mode is set based on user input
      and HW capability. Create the per-vport table based on the devlink encap
      mode instead.
      
      Fixes: c796bb7c ("net/mlx5: E-switch, Generalize per vport table API")
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
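A sketch of deriving the table create flags from the encap mode instead of hardcoding them. The enum mirrors the ordering of the devlink encap modes (none, basic). The `TBL_TUNNEL_EN_*` bits and `per_vport_tbl_flags()` are hypothetical stand-ins for the driver's tunnel reformat/decap table flags.

```c
/* Local mirror of the devlink encap modes: none, then basic. */
enum encap_mode { ENCAP_MODE_NONE = 0, ENCAP_MODE_BASIC = 1 };

/* Hypothetical flag bits standing in for tunnel reformat/decap table flags. */
#define TBL_TUNNEL_EN_REFORMAT (1u << 0)
#define TBL_TUNNEL_EN_DECAP    (1u << 1)

/* Enable tunnel features only when encap offload is actually configured. */
static unsigned int per_vport_tbl_flags(enum encap_mode mode)
{
	unsigned int flags = 0;

	if (mode != ENCAP_MODE_NONE)
		flags |= TBL_TUNNEL_EN_REFORMAT | TBL_TUNNEL_EN_DECAP;
	return flags;
}
```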
    • Vlad Buslov's avatar
      net/mlx5e: Release the label when replacing existing ct entry · 8ac04a28
      Vlad Buslov authored
      The cited commit doesn't release the label mapping when replacing an
      existing ct entry, which leads to the following memleak report:
      
      unreferenced object 0xffff8881854cf280 (size 96):
        comm "kworker/u48:74", pid 23093, jiffies 4296664564 (age 175.944s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000002722d368>] __kmalloc+0x4b/0x1c0
          [<00000000cc44e18f>] mapping_add+0x6e8/0xc90 [mlx5_core]
          [<000000003ad942a7>] mlx5_get_label_mapping+0x66/0xe0 [mlx5_core]
          [<00000000266308ac>] mlx5_tc_ct_entry_create_mod_hdr+0x1c4/0xf50 [mlx5_core]
          [<000000009a768b4f>] mlx5_tc_ct_entry_add_rule+0x16f/0xaf0 [mlx5_core]
          [<00000000a178f3e5>] mlx5_tc_ct_block_flow_offload_add+0x10cb/0x1f90 [mlx5_core]
          [<000000007b46c496>] mlx5_tc_ct_block_flow_offload+0x14a/0x630 [mlx5_core]
          [<00000000a9a18ac5>] nf_flow_offload_tuple+0x1a3/0x390 [nf_flow_table]
          [<00000000d0881951>] flow_offload_work_handler+0x257/0xd30 [nf_flow_table]
          [<000000009e4935a4>] process_one_work+0x7c2/0x13e0
          [<00000000f5cd36a7>] worker_thread+0x59d/0xec0
          [<00000000baed1daf>] kthread+0x28f/0x330
          [<0000000063d282a4>] ret_from_fork+0x1f/0x30
      
      Fix the issue by correctly releasing the label mapping.
      
      Fixes: 94ceffb4 ("net/mlx5e: Implement CT entry update")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
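The leak pattern, an update that acquires a new mapping without releasing the old one, in a minimal counting model. `struct mapping_pool`, `label_get()`/`label_put()` and `ct_entry_update()` are hypothetical and only track how many mappings are outstanding.

```c
/* Hypothetical pool: counts label mappings that are still allocated. */
struct mapping_pool { int live; };

struct ct_entry { int has_label; };

static void label_get(struct mapping_pool *p, struct ct_entry *e)
{
	p->live++;
	e->has_label = 1;
}

static void label_put(struct mapping_pool *p, struct ct_entry *e)
{
	if (e->has_label) {
		p->live--;
		e->has_label = 0;
	}
}

/* Replacing an entry builds a new rule that takes its own label mapping. */
static void ct_entry_update(struct mapping_pool *p, struct ct_entry *e, int fixed)
{
	if (fixed)
		label_put(p, e);	/* release the mapping held by the old rule */
	label_get(p, e);
}
```

With `fixed` set, repeated updates of one entry leave exactly one mapping outstanding; without it, every update leaks one.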
    • Vlad Buslov's avatar
      net/mlx5e: Don't clone flow post action attributes second time · e9fce818
      Vlad Buslov authored
      The code already clones post action attributes in
      mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
      mlx5e_tc_post_act_add() is an erroneous leftover from the original
      implementation. Instead, set handle->attribute to the post_attr provided
      by the caller. Note that cloning the attribute a second time is not just
      wasteful but also causes issues, such as the second copy not being
      properly updated by the neigh update code, which leads to the following
      use-after-free:
      
      Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_report+0xbb/0x1a0
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_save_stack+0x1e/0x40
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_set_track+0x21/0x30
      Feb 21 09:02:00 c-237-177-40-045 kernel:  __kasan_kmalloc+0x7a/0x90
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_save_stack+0x1e/0x40
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_set_track+0x21/0x30
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_save_free_info+0x2a/0x40
      Feb 21 09:02:00 c-237-177-40-045 kernel:  ____kasan_slab_free+0x11a/0x1b0
      Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
      Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
      Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
      Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
      Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
      Feb 21 09:02:00 c-237-177-40-045 kernel:  <TASK>
      Feb 21 09:02:00 c-237-177-40-045 kernel:  dump_stack_lvl+0x57/0x7d
      Feb 21 09:02:00 c-237-177-40-045 kernel:  print_report+0x170/0x471
      Feb 21 09:02:00 c-237-177-40-045 kernel:  ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_report+0xbb/0x1a0
      Feb 21 09:02:00 c-237-177-40-045 kernel:  ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  ? __module_address.part.0+0x62/0x200
      Feb 21 09:02:00 c-237-177-40-045 kernel:  ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  ? __raw_spin_lock_init+0x3b/0x110
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  add_rule_fg+0xe80/0x19c0 [mlx5_core]
      --
      Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_save_stack+0x1e/0x40
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_set_track+0x21/0x30
      Feb 21 09:02:00 c-237-177-40-045 kernel:  __kasan_kmalloc+0x7a/0x90
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  post_process_attr+0x305/0xa30 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
      --
      Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_save_stack+0x1e/0x40
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_set_track+0x21/0x30
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_save_free_info+0x2a/0x40
      Feb 21 09:02:00 c-237-177-40-045 kernel:  ____kasan_slab_free+0x11a/0x1b0
      Feb 21 09:02:00 c-237-177-40-045 kernel:  __kmem_cache_free+0x1de/0x400
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  process_one_work+0x7c2/0x1310
      Feb 21 09:02:00 c-237-177-40-045 kernel:  worker_thread+0x59d/0xec0
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kthread+0x28f/0x330
      
      Fixes: 8300f225 ("net/mlx5e: Create new flow attr for multi table actions")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
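The difference between referencing the caller's attribute and cloning it again, sketched with hypothetical types (`struct flow_attr`, `struct post_act_handle`) and functions. The real code paths, mlx5e_clone_flow_attr_for_post_act() and mlx5e_tc_post_act_add(), are not reproduced here. Once the handle holds its own copy, an update applied to the caller's attribute (as the neigh update code does) never reaches it.

```c
#include <stdlib.h>

struct flow_attr { int reformat_id; };

struct post_act_handle { struct flow_attr *attr; };

/* Buggy variant: clone again, so later updates to the caller's attr are missed. */
static struct post_act_handle *post_act_add_clone(struct flow_attr *post_attr)
{
	struct post_act_handle *h = calloc(1, sizeof(*h));

	if (!h)
		return NULL;
	h->attr = malloc(sizeof(*h->attr));
	if (!h->attr) {
		free(h);
		return NULL;
	}
	*h->attr = *post_attr;	/* stale snapshot from here on */
	return h;
}

/* Fixed variant: the caller already cloned the attr, so just reference it. */
static struct post_act_handle *post_act_add(struct flow_attr *post_attr)
{
	struct post_act_handle *h = calloc(1, sizeof(*h));

	if (!h)
		return NULL;
	h->attr = post_attr;
	return h;
}
```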
  3. 20 Apr, 2023 14 commits
    • Joe Damato's avatar
      ixgbe: Enable setting RSS table to default values · e85d3d55
      Joe Damato authored
      ethtool uses `ETHTOOL_GRXRINGS` to compute how many queues are supported
      by RSS. The driver should return the smaller of either:
        - The maximum number of RSS queues the device supports, OR
        - The number of RX queues configured
      
      Prior to this change, running `ethtool -X $iface default` fails if the
      number of queues configured is larger than the number supported by RSS,
      even though changing the queue count correctly resets the flowhash to
      use all supported queues.
      
      Other drivers (for example, i40e) will succeed but the flow hash will
      reset to support the maximum number of queues supported by RSS, even if
      that amount is smaller than the configured amount.
      
      Prior to this change:
      
      $ sudo ethtool -L eth1 combined 20
      $ sudo ethtool -x eth1
      RX flow hash indirection table for eth1 with 20 RX ring(s):
          0:      0     1     2     3     4     5     6     7
          8:      8     9    10    11    12    13    14    15
         16:      0     1     2     3     4     5     6     7
         24:      8     9    10    11    12    13    14    15
         32:      0     1     2     3     4     5     6     7
      ...
      
      You can see that the flowhash was correctly set to use the maximum
      number of queues supported by the driver (16).
      
      However, asking the NIC to reset to "default" fails:
      
      $ sudo ethtool -X eth1 default
      Cannot set RX flow hash configuration: Invalid argument
      
      After this change, the flowhash can be reset to default which will use
      all of the available RSS queues (16) or the configured queue count,
      whichever is smaller.
      
      Starting with eth1 which has 10 queues and a flowhash distributing to
      all 10 queues:
      
      $ sudo ethtool -x eth1
      RX flow hash indirection table for eth1 with 10 RX ring(s):
          0:      0     1     2     3     4     5     6     7
          8:      8     9     0     1     2     3     4     5
         16:      6     7     8     9     0     1     2     3
      ...
      
      Increasing the queue count to 48 resets the flowhash to distribute to 16
      queues, as it did before this patch:
      
      $ sudo ethtool -L eth1 combined 48
      $ sudo ethtool -x eth1
      RX flow hash indirection table for eth1 with 16 RX ring(s):
          0:      0     1     2     3     4     5     6     7
          8:      8     9    10    11    12    13    14    15
         16:      0     1     2     3     4     5     6     7
      ...
      
      Due to the other bugfix in this series, the flowhash can be set to use
      queues 0-4:
      
      $ sudo ethtool -X eth1 equal 5
      $ sudo ethtool -x eth1
      RX flow hash indirection table for eth1 with 16 RX ring(s):
          0:      0     1     2     3     4     0     1     2
          8:      3     4     0     1     2     3     4     0
         16:      1     2     3     4     0     1     2     3
      ...
      
      Due to this bugfix, the flowhash can be reset to default and use 16
      queues:
      
      $ sudo ethtool -X eth1 default
      $ sudo ethtool -x eth1
      RX flow hash indirection table for eth1 with 16 RX ring(s):
          0:      0     1     2     3     4     5     6     7
          8:      8     9    10    11    12    13    14    15
         16:      0     1     2     3     4     5     6     7
      ...
      
      Fixes: 91cd94bf ("ixgbe: add basic support for setting and getting nfc controls")
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
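A rough model of the behaviour described above, not the ixgbe code. The ring count reported via ETHTOOL_GRXRINGS is the smaller of the device's RSS queue limit and the configured RX queue count. A "default" indirection table then spreads entries round-robin over that many queues, in the same way as the kernel's ethtool_rxfh_indir_default() helper (index % n_rx_rings). The function names are hypothetical.

```c
/* Hypothetical helper: how many queues RSS may use (what GRXRINGS should report). */
static unsigned int rss_queue_count(unsigned int max_rss_queues,
				    unsigned int rx_queues)
{
	return rx_queues < max_rss_queues ? rx_queues : max_rss_queues;
}

/* Fill a default indirection table round-robin over n_queues queues. */
static void fill_default_indir(unsigned int *table, unsigned int size,
			       unsigned int n_queues)
{
	if (n_queues == 0)
		return;	/* avoid division by zero; leave the table untouched */
	for (unsigned int i = 0; i < size; i++)
		table[i] = i % n_queues;
}
```

With 10 queues, entry 8 maps to queue 8 and entry 10 wraps to queue 0, matching the second row of the 10-ring table shown above.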
    • Joe Damato's avatar
      ixgbe: Allow flow hash to be set via ethtool · 4f3ed129
      Joe Damato authored
      ixgbe currently returns `EINVAL` whenever the flowhash is set by ethtool,
      because the kernel's ethtool code passes a non-zero value for hfunc that
      ixgbe should accept.
      
      When ethtool is called with `ETHTOOL_SRXFHINDIR`,
      `ethtool_set_rxfh_indir` will call ixgbe's set_rxfh function
      with `ETH_RSS_HASH_NO_CHANGE`. This value should be accepted.
      
      When ethtool is called with `ETHTOOL_SRSSH`, `ethtool_set_rxfh` will
      call ixgbe's set_rxfh function with `rxfh.hfunc`, which appears to be
      hardcoded in ixgbe to always be `ETH_RSS_HASH_TOP`. This value should
      also be accepted.
      
      Before this patch:
      
      $ sudo ethtool -L eth1 combined 10
      $ sudo ethtool -X eth1 default
      Cannot set RX flow hash configuration: Invalid argument
      
      After this patch:
      
      $ sudo ethtool -L eth1 combined 10
      $ sudo ethtool -X eth1 default
      $ sudo ethtool -x eth1
      RX flow hash indirection table for eth1 with 10 RX ring(s):
          0:      0     1     2     3     4     5     6     7
          8:      8     9     0     1     2     3     4     5
         16:      6     7     8     9     0     1     2     3
         24:      4     5     6     7     8     9     0     1
         ...
      
      Fixes: 1c7cf078 ("ixgbe: support for ethtool set_rxfh")
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
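A sketch of the relaxed hfunc check. The two constants are local copies of the uapi values in <linux/ethtool.h> (ETH_RSS_HASH_NO_CHANGE is 0 and ETH_RSS_HASH_TOP is bit 0, i.e. 1). The function names and the choice of -EINVAL for unsupported hash functions are assumptions, not necessarily what the patch returns.

```c
#include <errno.h>

/* Local copies of the uapi values from <linux/ethtool.h>. */
#define RSS_HASH_NO_CHANGE 0u
#define RSS_HASH_TOP       (1u << 0)

/* Old behaviour: any non-zero hfunc was rejected, including Toeplitz. */
static int set_rxfh_hfunc_old(unsigned char hfunc)
{
	return hfunc ? -EINVAL : 0;
}

/* Relaxed check: accept "no change" and Toeplitz, reject everything else. */
static int set_rxfh_hfunc_new(unsigned char hfunc)
{
	if (hfunc != RSS_HASH_NO_CHANGE && hfunc != RSS_HASH_TOP)
		return -EINVAL;
	return 0;
}
```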
    • Toke Høiland-Jørgensen's avatar
      wifi: ath9k: Don't mark channelmap stack variable read-only in ath9k_mci_update_wlan_channels() · 0f2a4af2
      Toke Høiland-Jørgensen authored
      This partially reverts commit e161d4b6.
      
      Turns out the channelmap variable is not actually read-only, it's modified
      through the MCI_GPM_CLR_CHANNEL_BIT() macro further down in the function,
      so making it read-only causes page faults when that code is hit.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=217183
      Link: https://lore.kernel.org/r/20230413214118.153781-1-toke@toke.dk
      Fixes: e161d4b6 ("wifi: ath9k: Make arrays prof_prio and channelmap static const")
      Cc: stable@vger.kernel.org
      Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
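Why the const qualifier broke things: the function builds the map from a constant template and then clears bits in it, so the array must live in writable storage. In the sketch below, `clr_channel_bit()` is a hypothetical stand-in for MCI_GPM_CLR_CHANNEL_BIT(), and the all-ones template and four-word size are illustrative, not the real ath9k values. Declaring `channelmap` as `static const` would typically place it in read-only memory, and the write in `clr_channel_bit()` would then fault, which is the failure the commit describes.

```c
#include <stdint.h>

#define CHANNELMAP_WORDS 4

/* Hypothetical stand-in for MCI_GPM_CLR_CHANNEL_BIT(): clear bit `ch`. */
static void clr_channel_bit(uint32_t *map, unsigned int ch)
{
	map[ch / 32] &= ~(UINT32_C(1) << (ch % 32));
}

/* The map is an automatic (writable) array because it is modified below. */
static uint32_t word_after_clearing(unsigned int ch)
{
	/* Illustrative all-ones template, not the real ath9k initial values. */
	uint32_t channelmap[CHANNELMAP_WORDS] = {
		0xffffffffu, 0xffffffffu, 0xffffffffu, 0xffffffffu,
	};

	if (ch >= 32 * CHANNELMAP_WORDS)
		return 0;	/* out of range: nothing to clear */
	clr_channel_bit(channelmap, ch);
	return channelmap[ch / 32];
}
```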
    • Linus Torvalds's avatar
      Merge tag 'rust-fixes-6.3' of https://github.com/Rust-for-Linux/linux · 6a66fdd2
      Linus Torvalds authored
      Pull Rust fixes from Miguel Ojeda:
       "Most of these are straightforward.
      
        The last one is more complex, but it only touches Rust + GCC builds
        which are for the moment best-effort.
      
         - Code: Missing 'extern "C"' fix.
      
         - Scripts: 'is_rust_module.sh' and 'generate_rust_analyzer.py' fixes.
      
         - A couple of trivial fixes
      
         - Build: Rust + GCC build fix and 'grep' warning fix"
      
      * tag 'rust-fixes-6.3' of https://github.com/Rust-for-Linux/linux:
        rust: allow to use INIT_STACK_ALL_ZERO
        rust: fix regexp in scripts/is_rust_module.sh
        rust: build: Fix grep warning
        scripts: generate_rust_analyzer: Handle sub-modules with no Makefile
        rust: kernel: Mark rust_fmt_argument as extern "C"
        rust: sort uml documentation arch support table
        rust: str: fix requierments->requirements typo
    • Linus Torvalds's avatar
      Merge tag 'net-6.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 23309d60
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from netfilter and bpf.
      
        There are a few fixes for new code bugs, including the Mellanox one
        noted in the last networking pull. No known regressions outstanding.
      
        Current release - regressions:
      
         - sched: clear actions pointer in miss cookie init fail
      
         - mptcp: fix accept vs worker race
      
         - bpf: fix bpf_arch_text_poke() with new_addr == NULL on s390
      
         - eth: bnxt_en: fix a possible NULL pointer dereference in unload
           path
      
         - eth: veth: take into account peer device for
           NETDEV_XDP_ACT_NDO_XMIT xdp_features flag
      
        Current release - new code bugs:
      
         - eth: revert "net/mlx5: Enable management PF initialization"
      
        Previous releases - regressions:
      
         - netfilter: fix recent physdev match breakage
      
         - bpf: fix incorrect verifier pruning due to missing register
           precision taints
      
         - eth: virtio_net: fix overflow inside xdp_linearize_page()
      
         - eth: cxgb4: fix use after free bugs caused by circular dependency
           problem
      
         - eth: mlxsw: pci: fix possible crash during initialization
      
        Previous releases - always broken:
      
         - sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg
      
         - netfilter: validate catch-all set elements
      
         - bridge: don't notify FDB entries with "master dynamic"
      
         - eth: bonding: fix memory leak when changing bond type to ethernet
      
         - eth: i40e: fix accessing vsi->active_filters without holding lock
      
        Misc:
      
         - Mat is back as MPTCP co-maintainer"
      
      * tag 'net-6.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits)
        net: bridge: switchdev: don't notify FDB entries with "master dynamic"
        Revert "net/mlx5: Enable management PF initialization"
        MAINTAINERS: Resume MPTCP co-maintainer role
        mailmap: add entries for Mat Martineau
        e1000e: Disable TSO on i219-LM card to increase speed
        bnxt_en: fix free-runnig PHC mode
        net: dsa: microchip: ksz8795: Correctly handle huge frame configuration
        bpf: Fix incorrect verifier pruning due to missing register precision taints
        hamradio: drop ISA_DMA_API dependency
        mlxsw: pci: Fix possible crash during initialization
        mptcp: fix accept vs worker race
        mptcp: stops worker on unaccepted sockets at listener close
        net: rpl: fix rpl header size calculation
        net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete()
        bonding: Fix memory leak when changing bond type to Ethernet
        veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features flag
        mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next()
        bnxt_en: Fix a possible NULL pointer dereference in unload path
        bnxt_en: Do not initialize PTP on older P3/P4 chips
        netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements
        ...
    • Vladimir Oltean's avatar
      net: bridge: switchdev: don't notify FDB entries with "master dynamic" · 927cdea5
      Vladimir Oltean authored
      There is a structural problem in switchdev, where the flag bits in
      struct switchdev_notifier_fdb_info (added_by_user, is_local etc) only
      represent a simplified / denatured view of what's in struct
      net_bridge_fdb_entry :: flags (BR_FDB_ADDED_BY_USER, BR_FDB_LOCAL etc).
      Each time we want to pass more information about struct
      net_bridge_fdb_entry :: flags to struct switchdev_notifier_fdb_info
      (here, BR_FDB_STATIC), we find that FDB entries were already notified to
      switchdev with no regard to this flag, and thus, switchdev drivers had
      no indication whether the notified entries were static or not.
      
      For example, this command:
      
      ip link add br0 type bridge && ip link set swp0 master br0
      bridge fdb add dev swp0 00:01:02:03:04:05 master dynamic
      
      has never worked as intended with switchdev. It causes a struct
      net_bridge_fdb_entry to be passed to br_switchdev_fdb_notify() which has
      a single flag set: BR_FDB_ADDED_BY_USER.
      
      This is further passed to the switchdev notifier chain, where interested
      drivers have no choice but to assume this is a static (does not age) and
      sticky (does not migrate) FDB entry. So currently, all drivers offload
      it to hardware as such, as can be seen below ("offload" is set).
      
      bridge fdb get 00:01:02:03:04:05 dev swp0 master
      00:01:02:03:04:05 dev swp0 offload master br0
      
      The software FDB entry expires $ageing_time centiseconds after the
      kernel last sees a packet with this MAC SA, and the bridge notifies its
      deletion as well, so it eventually disappears from hardware too.
      
      This is a problem, because it is actually desirable to start offloading
      "master dynamic" FDB entries correctly - they should expire $ageing_time
      centiseconds after the *hardware* port last sees a packet with this
      MAC SA - and this is how the current incorrect behavior was discovered.
      With an offloaded data plane, it can be expected that software only sees
      exception path packets, so an otherwise active dynamic FDB entry would
      be aged out by software sooner than it should.
      
      With the change in place, these FDB entries are no longer offloaded:
      
      bridge fdb get 00:01:02:03:04:05 dev swp0 master
      00:01:02:03:04:05 dev swp0 master br0
      
      and this also constitutes a better way (assuming a backport to stable
      kernels) for user space to determine whether the kernel has the
      capability of doing something sane with these or not.
      
      As opposed to "master dynamic" FDB entries, whose current behavior no one
      depends on (as can be deduced from the lack of kselftests), Ido Schimmel
      explains that entries with the "extern_learn" flag
      (BR_FDB_ADDED_BY_EXT_LEARN) should still be notified to switchdev, since
      the spectrum driver listens to them (and this is kind of okay, because
      although they are treated identically to "static", they are expected
      not to age, and to roam).
      
      Fixes: 6b26b51b ("net: bridge: Add support for notifying devices about FDB add/del")
      Link: https://lore.kernel.org/netdev/20230327115206.jk5q5l753aoelwus@skbuf/
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Tested-by: Ido Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/20230418155902.898627-1-vladimir.oltean@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
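The filtering rule described above fits in a single predicate. A sketch with hypothetical bit positions standing in for the BR_FDB_* flags: user-added entries that are neither static nor extern_learn (that is, "master dynamic") are not notified, and everything else is passed through unchanged in this sketch. It mirrors the commit's description rather than reproducing br_switchdev_fdb_notify().

```c
#include <stdbool.h>

/* Hypothetical bit positions standing in for the bridge's BR_FDB_* flags. */
enum {
	FDB_LOCAL,
	FDB_STATIC,
	FDB_ADDED_BY_USER,
	FDB_ADDED_BY_EXT_LEARN,
};

static bool fdb_test(unsigned long flags, int bit)
{
	return flags & (1UL << bit);
}

/* Skip "master dynamic" entries so drivers never mistake them for static. */
static bool fdb_should_notify_switchdev(unsigned long flags)
{
	if (fdb_test(flags, FDB_ADDED_BY_USER) &&
	    !fdb_test(flags, FDB_STATIC) &&
	    !fdb_test(flags, FDB_ADDED_BY_EXT_LEARN))
		return false;
	return true;
}
```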
    • Jakub Kicinski's avatar
      Revert "net/mlx5: Enable management PF initialization" · f52cc627
      Jakub Kicinski authored
      This reverts commit fe998a3c.
      
      Paul reports that it causes a regression with IB on CX4
      and FW 12.18.1000. In addition I think that the concept
      of "management PF" is not fully accepted and requires
      a discussion.
      
      Fixes: fe998a3c ("net/mlx5: Enable management PF initialization")
      Reported-by: Paul Moore <paul@paul-moore.com>
      Link: https://lore.kernel.org/all/CAHC9VhQ7A4+msL38WpbOMYjAqLp0EtOjeLh4Dc6SQtD6OUvCQg@mail.gmail.com/
      Link: https://lore.kernel.org/r/20230413222547.56901-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Jakub Kicinski's avatar
      Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 9d947690
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      bpf 2023-04-19
      
      We've added 3 non-merge commits during the last 6 day(s) which contain
      a total of 3 files changed, 34 insertions(+), 9 deletions(-).
      
      The main changes are:
      
      1) Fix a crash on s390's bpf_arch_text_poke() under a NULL new_addr,
         from Ilya Leoshkevich.
      
      2) Fix a bug in BPF verifier's precision tracker, from Daniel Borkmann
         and Andrii Nakryiko.
      
      3) Fix a regression in veth's xdp_features which led to a broken BPF CI
         selftest, from Lorenzo Bianconi.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf: Fix incorrect verifier pruning due to missing register precision taints
        veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features flag
        s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL
      ====================
      
      Link: https://lore.kernel.org/r/20230419195847.27060-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9d947690
    • Mat Martineau
      MAINTAINERS: Resume MPTCP co-maintainer role · 52b37ae8
      Mat Martineau authored
      I'm returning to the MPTCP maintainer role I held for most of the
      subsystem's history. This time I'm using my kernel.org email address.

      Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Link: https://lore.kernel.org/mptcp/af85e467-8d0a-4eba-b5f8-e2f2c5d24984@tessares.net/
      Signed-off-by: Mat Martineau <martineau@kernel.org>
      Link: https://lore.kernel.org/r/20230418231318.115331-1-martineau@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      52b37ae8
    • Matthieu Baerts
    • Jakub Kicinski
      Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 7b97174d
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-04-17 (i40e)
      
      This series contains updates to i40e only.
      
      Alex moves the setting of active filters so that it happens under the
      lock, and makes rebuild check whether re-initializing the misc interrupt
      vector failed and, if so, take the error path.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        i40e: fix i40e_setup_misc_vector() error handling
        i40e: fix accessing vsi->active_filters without holding lock
      ====================
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230417205245.1030733-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7b97174d
    • Linus Torvalds
      Merge tag 'mm-hotfixes-stable-2023-04-19-16-36' of... · cb085634
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2023-04-19-16-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "22 hotfixes.
      
        19 are cc:stable and the remainder address issues which were
        introduced during this merge cycle, or aren't considered suitable for
        -stable backporting.
      
        19 are for MM and the remainder are for other subsystems"
      
      * tag 'mm-hotfixes-stable-2023-04-19-16-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
        nilfs2: initialize unused bytes in segment summary blocks
        mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages
        mm/mmap: regression fix for unmapped_area{_topdown}
        maple_tree: fix mas_empty_area() search
        maple_tree: make maple state reusable after mas_empty_area_rev()
        mm: kmsan: handle alloc failures in kmsan_ioremap_page_range()
        mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush()
        tools/Makefile: do missed s/vm/mm/
        mm: fix memory leak on mm_init error handling
        mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
        kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()
        Revert "userfaultfd: don't fail on unrecognized features"
        writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs
        maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug
        tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used
        mailmap: update jtoppins' entry to reference correct email
        mm/mempolicy: fix use-after-free of VMA iterator
        mm/huge_memory.c: warn with pr_warn_ratelimited instead of VM_WARN_ON_ONCE_FOLIO
        mm/mprotect: fix do_mprotect_pkey() return on error
        mm/khugepaged: check again on anon uffd-wp during isolation
        ...
      cb085634
    • Sebastian Basierski
      e1000e: Disable TSO on i219-LM card to increase speed · 67d47b95
      Sebastian Basierski authored
      When using the i219-LM card, it was only possible to achieve about 60%
      of the maximum speed due to a regression introduced in Linux 5.8.
      This was caused by TSO not being disabled by default, despite commit
      f2980103 ("e1000e: Disable TSO for buffer overrun workaround").
      Fix that by disabling TSO during driver probe.
      
      Fixes: f2980103 ("e1000e: Disable TSO for buffer overrun workaround")
      Signed-off-by: Sebastian Basierski <sebastianx.basierski@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230417205345.1030801-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      67d47b95
    • Vadim Fedorenko
      bnxt_en: fix free-runnig PHC mode · 8c154d27
      Vadim Fedorenko authored
      The patch referenced in the Fixes tag changed the way real-time mode is
      chosen for the PHC on the NIC. Apparently, there is one more use of the
      check outside of the PTP part of the driver, which was not converted to
      the new macro and is making a lot of noise in free-running mode.
      
      Fixes: 131db499 ("bnxt_en: reset PHC frequency in free-running mode")
      Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Link: https://lore.kernel.org/r/20230418202511.1544735-1-vadfed@meta.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8c154d27
  4. 19 Apr, 2023 7 commits
    • Linus Torvalds
      Merge tag 'spi-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 23990b1a
      Linus Torvalds authored
      Pull spi fix from Mark Brown:
       "A small fix in the error handling for the rockchip driver, ensuring we
        don't leak clock enables if we fail to request the interrupt for the
        device"
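
      A minimal sketch of the goto-based unwind pattern this kind of fix restores. It assumes a probe routine that enables two clocks before requesting its interrupt. The helpers fake_clk_enable(), fake_clk_disable() and fake_request_irq() and the counter clk_enabled are hypothetical stand-ins, not the rockchip_sfc driver's code:

```c
#include <stdbool.h>

/* Hypothetical stand-ins that only count clock enables. */
static int clk_enabled;

static int fake_clk_enable(void)
{
	clk_enabled++;
	return 0;
}

static void fake_clk_disable(void)
{
	clk_enabled--;
}

/* Returns a negative error when asked to simulate a failed IRQ request. */
static int fake_request_irq(bool fail)
{
	return fail ? -1 : 0;
}

static int probe_sketch(bool irq_fails)
{
	int ret;

	ret = fake_clk_enable();	/* first clock */
	if (ret)
		return ret;

	ret = fake_clk_enable();	/* second clock */
	if (ret)
		goto err_disable_first;

	ret = fake_request_irq(irq_fails);
	if (ret)
		goto err_disable_both;	/* without this jump, both enables leak */

	return 0;

err_disable_both:
	fake_clk_disable();
err_disable_first:
	fake_clk_disable();
	return ret;
}
```

      Each label undoes only the steps that succeeded before the failing call, in reverse order. A failed IRQ request therefore leaves the enable count where it started.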
      
      * tag 'spi-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: spi-rockchip: Fix missing unwind goto in rockchip_sfc_probe()
      23990b1a
    • Linus Torvalds
      Merge tag 'regulator-fix-v6.3-rc7' of... · 72b4fb4c
      Linus Torvalds authored
      Merge tag 'regulator-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Pull regulator fixes from Mark Brown:
       "A few driver specific fixes, one build coverage issue and a couple of
        'someone typed in the wrong number' style errors in describing devices
        to the subsystem"
      
      * tag 'regulator-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: sm5703: Fix missing n_voltages for fixed regulators
        regulator: fan53555: Fix wrong TCS_SLEW_MASK
        regulator: fan53555: Explicitly include bits header
      72b4fb4c
    • Christophe JAILLET
      net: dsa: microchip: ksz8795: Correctly handle huge frame configuration · 3d2f8f1f
      Christophe JAILLET authored
      Because of the logic in place, SW_HUGE_PACKET can never be set: the
      second condition can only be true when the first one is true as well,
      and in that case the second branch is not executed.

      Change the logic and update each bit individually.
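
      A minimal sketch of the bug and the fix. It assumes two size thresholds where exceeding the larger one implies exceeding the smaller one. The constant names and values below are hypothetical placeholders, not the ksz8795 register definitions:

```c
#include <stdint.h>

/* Hypothetical thresholds: only HUGE_THRESHOLD > LEGAL_PACKET_SIZE matters. */
#define LEGAL_PACKET_SIZE	1518
#define HUGE_THRESHOLD		1536

/* Hypothetical register bits. */
#define BIT_LEGAL_DISABLE	0x01
#define BIT_HUGE_PACKET		0x02

/* Buggy shape: the else-if branch is only reached when frame_size <= 1518,
 * where frame_size > 1536 can never hold. */
static uint8_t mtu_bits_buggy(int frame_size)
{
	uint8_t bits = 0;

	if (frame_size > LEGAL_PACKET_SIZE)
		bits |= BIT_LEGAL_DISABLE;
	else if (frame_size > HUGE_THRESHOLD)
		bits |= BIT_HUGE_PACKET;
	return bits;
}

/* Fixed shape: each bit is decided independently. */
static uint8_t mtu_bits_fixed(int frame_size)
{
	uint8_t bits = 0;

	if (frame_size > LEGAL_PACKET_SIZE)
		bits |= BIT_LEGAL_DISABLE;
	if (frame_size > HUGE_THRESHOLD)
		bits |= BIT_HUGE_PACKET;
	return bits;
}
```

      For a 9000-byte frame, the buggy version sets only BIT_LEGAL_DISABLE, while the fixed version sets both bits.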
      
      Fixes: 29d1e85f ("net: dsa: microchip: ksz8: add MTU configuration support")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/43107d9e8b5b8b05f0cbd4e1f47a2bb88c8747b2.1681755535.git.christophe.jaillet@wanadoo.fr
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3d2f8f1f
    • Andrea Righi
      rust: allow to use INIT_STACK_ALL_ZERO · d966c3ca
      Andrea Righi authored
      With CONFIG_INIT_STACK_ALL_ZERO enabled, bindgen passes
      -ftrivial-auto-var-init=zero to clang, which triggers the following
      error:
      
       error: '-ftrivial-auto-var-init=zero' hasn't been enabled; enable it at your own peril for benchmarking purpose only with '-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang'
      
      However, this additional option, currently required by clang, has been
      deprecated since clang-16 and is going to be removed in the future,
      likely in clang-18.

      So, make bindgen pass this extra option only if the major version of
      the libclang used by bindgen is < 16.
      
      In this way we can enable CONFIG_INIT_STACK_ALL_ZERO with CONFIG_RUST
      without triggering any build error.
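
      The version gate can be sketched in C as below. The actual change lives in the kernel's Rust build rules, not in C code. The function name and its parameters are hypothetical, and only the flag string comes from the clang error message above:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper mirroring the gate: returns the extra clang flag that
 * bindgen must pass alongside -ftrivial-auto-var-init=zero, or NULL if none. */
static const char *extra_zero_init_flag(bool init_stack_all_zero, int libclang_major)
{
	if (!init_stack_all_zero)
		return NULL;	/* -ftrivial-auto-var-init=zero is not passed at all */
	if (libclang_major < 16)
		return "-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang";
	return NULL;		/* per the gate above, clang >= 16 needs no extra flag */
}
```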
      
      Link: https://github.com/llvm/llvm-project/issues/44842
      Link: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.0-rc2/clang/docs/ReleaseNotes.rst#deprecated-compiler-flags
      Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      [Changed to < 16, added link and reworded]
      Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
      d966c3ca
    • Andrea Righi
      rust: fix regexp in scripts/is_rust_module.sh · ccc45054
      Andrea Righi authored
      nm can report symbols in read-only data sections with either "R" or
      "r", but scripts/is_rust_module.sh only recognizes "r", so with some
      versions of binutils it can fail to detect whether a module is a Rust module.
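
      A minimal C sketch of a type check that accepts both letters. It assumes nm's default "<address> <type> <name>" line format. In GNU nm output, a lowercase type letter usually denotes a local symbol and an uppercase letter a global one. The function name is hypothetical, and this is not the script's actual regular expression:

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if an nm output line describes a symbol in a read-only data
 * section, accepting both 'r' (local) and 'R' (global). */
static bool nm_line_is_rodata(const char *line)
{
	const char *sp = strchr(line, ' ');

	/* Expect a single type letter between two spaces. */
	if (sp == NULL || sp[1] == '\0' || sp[2] != ' ')
		return false;
	return sp[1] == 'r' || sp[1] == 'R';
}
```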
      
      Right now we're using this script only to determine if we need to skip
      BTF generation (that is disabled globally if CONFIG_RUST is enabled),
      but it's still nice to fix this script to do the proper job.
      
      Moreover, with this patch applied I can also relax the constraint of
      "RUST depends on !DEBUG_INFO_BTF" and build a kernel with Rust and BTF
      enabled at the same time (of course BTF generation is still skipped for
      Rust modules).
      
      [ Miguel: The actual reason is likely to be a change on the Rust
        compiler between 1.61.0 and 1.62.0:
      
          echo '#[used] static S: () = ();' |
              rustup run 1.61.0 rustc --emit=obj --crate-type=lib - &&
              nm rust_out.o
      
          echo '#[used] static S: () = ();' |
              rustup run 1.62.0 rustc --emit=obj --crate-type=lib - &&
              nm rust_out.o
      
        Gives:
      
          0000000000000000 r _ZN8rust_out1S17h48027ce0da975467E
          0000000000000000 R _ZN8rust_out1S17h58e1f3d9c0e97cefE
      
        See https://godbolt.org/z/KE6jneoo4. ]
      Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
      Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
      Reviewed-by: Eric Curtin <ecurtin@redhat.com>
      Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
      Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
      ccc45054
    • Daniel Borkmann
      bpf: Fix incorrect verifier pruning due to missing register precision taints · 71b547f5
      Daniel Borkmann authored
      Juan Jose et al reported an issue found via fuzzing where the verifier's
      pruning logic prematurely marks a program path as safe.
      
      Consider the following program:
      
         0: (b7) r6 = 1024
         1: (b7) r7 = 0
         2: (b7) r8 = 0
         3: (b7) r9 = -2147483648
         4: (97) r6 %= 1025
         5: (05) goto pc+0
         6: (bd) if r6 <= r9 goto pc+2
         7: (97) r6 %= 1
         8: (b7) r9 = 0
         9: (bd) if r6 <= r9 goto pc+1
        10: (b7) r6 = 0
        11: (b7) r0 = 0
        12: (63) *(u32 *)(r10 -4) = r0
        13: (18) r4 = 0xffff888103693400 // map_ptr(ks=4,vs=48)
        15: (bf) r1 = r4
        16: (bf) r2 = r10
        17: (07) r2 += -4
        18: (85) call bpf_map_lookup_elem#1
        19: (55) if r0 != 0x0 goto pc+1
        20: (95) exit
        21: (77) r6 >>= 10
        22: (27) r6 *= 8192
        23: (bf) r1 = r0
        24: (0f) r0 += r6
        25: (79) r3 = *(u64 *)(r0 +0)
        26: (7b) *(u64 *)(r1 +0) = r3
        27: (95) exit
      
      The verifier incorrectly accepts this program as safe, which allows
      out-of-bounds (OOB) read/write access to the map:
      
        func#0 @0
        0: R1=ctx(off=0,imm=0) R10=fp0
        0: (b7) r6 = 1024                     ; R6_w=1024
        1: (b7) r7 = 0                        ; R7_w=0
        2: (b7) r8 = 0                        ; R8_w=0
        3: (b7) r9 = -2147483648              ; R9_w=-2147483648
        4: (97) r6 %= 1025                    ; R6_w=scalar()
        5: (05) goto pc+0
        6: (bd) if r6 <= r9 goto pc+2         ; R6_w=scalar(umin=18446744071562067969,var_off=(0xffffffff00000000; 0xffffffff)) R9_w=-2147483648
        7: (97) r6 %= 1                       ; R6_w=scalar()
        8: (b7) r9 = 0                        ; R9=0
        9: (bd) if r6 <= r9 goto pc+1         ; R6=scalar(umin=1) R9=0
        10: (b7) r6 = 0                       ; R6_w=0
        11: (b7) r0 = 0                       ; R0_w=0
        12: (63) *(u32 *)(r10 -4) = r0
        last_idx 12 first_idx 9
        regs=1 stack=0 before 11: (b7) r0 = 0
        13: R0_w=0 R10=fp0 fp-8=0000????
        13: (18) r4 = 0xffff8ad3886c2a00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
        17: (07) r2 += -4                     ; R2_w=fp-4
        18: (85) call bpf_map_lookup_elem#1   ; R0=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0)
        19: (55) if r0 != 0x0 goto pc+1       ; R0=0
        20: (95) exit
      
        from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
        21: (77) r6 >>= 10                    ; R6_w=0
        22: (27) r6 *= 8192                   ; R6_w=0
        23: (bf) r1 = r0                      ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0)
        24: (0f) r0 += r6
        last_idx 24 first_idx 19
        regs=40 stack=0 before 23: (bf) r1 = r0
        regs=40 stack=0 before 22: (27) r6 *= 8192
        regs=40 stack=0 before 21: (77) r6 >>= 10
        regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1
        parent didn't have regs=40 stack=0 marks: R0_rw=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) R6_rw=P0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
        last_idx 18 first_idx 9
        regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
        regs=40 stack=0 before 17: (07) r2 += -4
        regs=40 stack=0 before 16: (bf) r2 = r10
        regs=40 stack=0 before 15: (bf) r1 = r4
        regs=40 stack=0 before 13: (18) r4 = 0xffff8ad3886c2a00
        regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
        regs=40 stack=0 before 11: (b7) r0 = 0
        regs=40 stack=0 before 10: (b7) r6 = 0
        25: (79) r3 = *(u64 *)(r0 +0)         ; R0_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
        26: (7b) *(u64 *)(r1 +0) = r3         ; R1_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
        27: (95) exit
      
        from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0
        11: (b7) r0 = 0                       ; R0_w=0
        12: (63) *(u32 *)(r10 -4) = r0
        last_idx 12 first_idx 11
        regs=1 stack=0 before 11: (b7) r0 = 0
        13: R0_w=0 R10=fp0 fp-8=0000????
        13: (18) r4 = 0xffff8ad3886c2a00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
        17: (07) r2 += -4                     ; R2_w=fp-4
        18: (85) call bpf_map_lookup_elem#1
        frame 0: propagating r6
        last_idx 19 first_idx 11
        regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
        regs=40 stack=0 before 17: (07) r2 += -4
        regs=40 stack=0 before 16: (bf) r2 = r10
        regs=40 stack=0 before 15: (bf) r1 = r4
        regs=40 stack=0 before 13: (18) r4 = 0xffff8ad3886c2a00
        regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
        regs=40 stack=0 before 11: (b7) r0 = 0
        parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_r=P0 R7=0 R8=0 R9=0 R10=fp0
        last_idx 9 first_idx 9
        regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1
        parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar() R7_w=0 R8_w=0 R9_rw=0 R10=fp0
        last_idx 8 first_idx 0
        regs=40 stack=0 before 8: (b7) r9 = 0
        regs=40 stack=0 before 7: (97) r6 %= 1
        regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
        regs=40 stack=0 before 5: (05) goto pc+0
        regs=40 stack=0 before 4: (97) r6 %= 1025
        regs=40 stack=0 before 3: (b7) r9 = -2147483648
        regs=40 stack=0 before 2: (b7) r8 = 0
        regs=40 stack=0 before 1: (b7) r7 = 0
        regs=40 stack=0 before 0: (b7) r6 = 1024
        19: safe
        frame 0: propagating r6
        last_idx 9 first_idx 0
        regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
        regs=40 stack=0 before 5: (05) goto pc+0
        regs=40 stack=0 before 4: (97) r6 %= 1025
        regs=40 stack=0 before 3: (b7) r9 = -2147483648
        regs=40 stack=0 before 2: (b7) r8 = 0
        regs=40 stack=0 before 1: (b7) r7 = 0
        regs=40 stack=0 before 0: (b7) r6 = 1024
      
        from 6 to 9: safe
        verification time 110 usec
        stack depth 4
        processed 36 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 2
      
      The verifier considers this program safe because it mistakenly prunes
      unsafe code paths. In the above func#0, code lines 0-10 are of interest.
      In lines 0-3, registers r6 to r9 are initialized with known scalar
      values. In line 4, register r6 is reset to an unknown scalar, given that
      the verifier does not track modulo operations. Because of this, the
      verifier also cannot determine precisely which branches in lines 6 and 9
      are taken, so it needs to explore both of them.
      
      As can be seen, the verifier explores the false/fall-through paths
      first. The 'from 19 to 21' path has both r6=0 and r9=0, so the pointer
      arithmetic r0 += r6 is considered safe. Because of this arithmetic, r6
      is correctly marked for precision tracking, and backtracking walks the
      current path back all the way to where r6 was set to 0 in the
      fall-through branch.
      
      Next, the pruning logic pops the path 'from 9 to 11' from the stack.
      Here too, the register state is the same, that is, r6=0 and r9=0, so
      at line 19 the path can be pruned as safe. It is interesting to note
      that the conditional in line 9 turned r6 into a more precise state: in
      the fall-through path at the beginning of line 10 it is
      R6=scalar(umin=1), and in the branch-taken path (which is analyzed
      here) at the beginning of line 11, r6 turned into a known constant r6=0,
      because r9=0 prior to that and therefore (unsigned) r6 <= 0 implies
      that r6 must be 0 (**):
      
        [...]                                 ; R6_w=scalar()
        9: (bd) if r6 <= r9 goto pc+1         ; R6=scalar(umin=1) R9=0
        [...]
      
        from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0
        [...]
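
      The refinement marked (**) can be sketched under a heavily simplified model. The model tracks only an unsigned [umin, umax] range and compares it against a known constant. The struct and function names are hypothetical, and the real verifier also refines signed bounds, 32-bit bounds and var_off:

```c
#include <stdint.h>

/* Hypothetical, simplified unsigned range of a scalar register. */
struct urange {
	uint64_t umin, umax;
};

/* Refine dst for "if dst <= k goto ..." (unsigned compare, k known). */
static void refine_jle(struct urange dst, uint64_t k,
		       struct urange *taken, struct urange *fallthrough)
{
	*taken = dst;
	*fallthrough = dst;

	/* Branch taken: dst <= k, so the upper bound shrinks to k. */
	if (taken->umax > k)
		taken->umax = k;

	/* Fall-through: dst > k, so the lower bound grows to k + 1. When
	 * k == UINT64_MAX this path is impossible; it is left unrefined here. */
	if (k != UINT64_MAX && fallthrough->umin < k + 1)
		fallthrough->umin = k + 1;
}
```

      With dst fully unknown and k = 0, the taken branch collapses to the constant 0 and the fall-through gets umin=1, matching the log lines above. With k = -2147483648 viewed as a 64-bit unsigned value, the fall-through at line 6 gets umin=18446744071562067969, matching the verifier log.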
      
      The next path is 'from 6 to 9'. The verifier considers the old and current
      states equivalent and therefore incorrectly prunes the search. Looking at
      the two states compared by the pruning logic at line 9, the
      old state consists of R6_rwD=Pscalar() R9_rwD=0 R10=fp0 and the new state
      consists of R1=ctx(off=0,imm=0) R6_w=scalar(umax=18446744071562067968)
      R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0. While r6 had the reg->precise flag
      correctly set in the old state, r9 did not. The two r6 states are considered
      equivalent, given that the old one is a superset of the current, more precise
      one; however, r9's actual values (0 vs. -2147483648) mismatch. Since the old
      r9 did not have the reg->precise flag set, the verifier does not consider the
      register as contributing to the precision state of r6, and therefore it
      considers both r9 states equivalent. However, for this specific pruned
      path (which is also the path actually taken at runtime), r6 will be 0x400
      and r9 will be -2147483648 (0xffffffff80000000) when reaching line 21, so
      the map is accessed out of bounds.
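
      Tracing the concrete execution of this pruned path makes the out-of-bounds offset explicit. The sketch assumes standard BPF semantics: the (b7) 64-bit move sign-extends its 32-bit immediate, and (bd) is an unsigned <= comparison. The function name is hypothetical:

```c
#include <stdint.h>

/* Follows the concrete path 0-6, 9, 11-24 of the program above and returns
 * the offset added to the map value pointer at line 24. */
static uint64_t pruned_path_offset(void)
{
	uint64_t r6 = 1024;				/* 0: r6 = 1024 */
	uint64_t r9 = (uint64_t)(int64_t)INT32_MIN;	/* 3: sign-extended, 0xffffffff80000000 */

	r6 %= 1025;					/* 4: r6 stays 1024 */
	if (!(r6 <= r9)) {				/* 6: condition true, jump to 9 */
		r6 %= 1;				/* 7: skipped at runtime */
		r9 = 0;					/* 8: skipped at runtime */
	}
	if (!(r6 <= r9))				/* 9: true again, jump to 11 */
		r6 = 0;					/* 10: skipped at runtime */

	r6 >>= 10;					/* 21: 1024 >> 10 == 1 */
	r6 *= 8192;					/* 22: 1 * 8192 == 8192 */
	return r6;					/* 24: r0 += r6 */
}
```

      The resulting offset, 8192, lies far beyond the 48-byte map value (vs=48). The pruned analysis, by contrast, assumed r6=0 at this point.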
      
      The purpose of precision tracking is to initially mark registers (including
      spilled ones) as imprecise. This helps the verifier's pruning logic find
      equivalent states that it can prune when those registers don't contribute
      to the program's safety. For example, if registers are used for pointer
      arithmetic or to pass a constant length to a helper, then the verifier sets
      the reg->precise flag. It then backtracks through the BPF instruction
      sequence and the chain of verifier states to ensure that the given register
      or stack slot, including its dependencies, is marked as a precisely tracked
      scalar. This also includes any other registers and slots that contribute to
      the tracked state of the given register or stack slot. This backtracking
      relies on the recorded jmp_history and can traverse the entire chain of
      parent states. The process ends only when all the necessary registers/slots
      and their transitive dependencies are marked as precise.
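
      Here is a heavily simplified model of how pruning compares a scalar register from an already-verified (old) state with the current one, assuming only unsigned bounds are tracked. The struct and function names are hypothetical. The kernel's real comparison also considers signed and 32-bit bounds, var_off and register IDs:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified scalar register state. */
struct scalar_reg {
	uint64_t umin, umax;	/* known unsigned range */
	bool precise;		/* set by precision backtracking */
};

/* Can the current register be treated as covered by the old one? */
static bool scalar_covered(const struct scalar_reg *old, const struct scalar_reg *cur)
{
	/* An imprecise old register was deemed irrelevant to safety,
	 * so any current value is accepted. */
	if (!old->precise)
		return true;
	/* A precise one must contain the current range. */
	return old->umin <= cur->umin && cur->umax <= old->umax;
}
```

      With r9 imprecise in the old state, the mismatch between 0 and -2147483648 goes unnoticed. Once r9 is marked precise, the comparison fails and the path is explored instead of pruned.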
      
      backtrack_insn() is called for each instruction, walking back from the
      current instruction towards the first one. Its purpose is to compute a
      bitmask of registers and stack slots that need precision tracking in the
      parent's verifier state. For example, if the current instruction is
      r6 = r7, then r6 needs precision after this instruction and r7 needs
      precision before it, that is, in the parent state. Hence, in the mask for
      the parent state, r7 is marked and r6 is unmarked.
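
      The r6 = r7 rule can be sketched as a bitmask update, where bit i stands for register ri (so the log's "regs=40" is r6). The function name is hypothetical, and the sketch covers only a register-to-register move:

```c
#include <stdint.h>

/* Backtrack "dst = src" (register move): if dst needs precision after the
 * instruction, src needs it before; dst itself no longer does. */
static uint32_t backtrack_mov(uint32_t mask, int dst, int src)
{
	if (mask & (1u << dst)) {
		mask &= ~(1u << dst);
		mask |= 1u << src;
	}
	return mask;
}
```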
      
      For the class of jmp/jmp32 instructions, backtrack_insn() today only looks
      at call and exit instructions; for all other conditionals the masks
      remain as-is. However, in the given situation register r6 has a dependency
      on r9 (as described above in **), so r9 also needs to be marked for
      precision tracking. In other words, if an imprecise register influences a
      precise one, then the imprecise register should also be marked precise.
      This means that in the parent state both the dst and src registers need
      to be tracked for precision, so the marking must be more conservative and
      set the reg->precise flag for both. Precision propagation for the
      conditional needs to cover both cases: the src register was marked but
      not the dst register, and vice versa.
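
      Treating the mask as one bit per register (bit i for ri, so regs=40 is r6 and regs=240 is r6 plus r9), the conservative rule for a register-register conditional jump can be sketched as below. This is an illustration of the described rule, not the kernel's backtrack_insn(). Jumps against an immediate have no source register and are not modeled, and the function name is hypothetical:

```c
#include <stdint.h>

/* Backtrack "if dst <op> src goto ...": if either register needs precision,
 * both need it in the parent state. */
static uint32_t backtrack_cond_jmp(uint32_t mask, int dst, int src)
{
	uint32_t both = (1u << dst) | (1u << src);

	if (mask & both)
		mask |= both;
	return mask;
}
```

      For r6 (mask 0x40) at "if r6 <= r9", the result is 0x240, the regs=240 mask that appears in the post-fix log below.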
      
      After the fix the program is correctly rejected:
      
        func#0 @0
        0: R1=ctx(off=0,imm=0) R10=fp0
        0: (b7) r6 = 1024                     ; R6_w=1024
        1: (b7) r7 = 0                        ; R7_w=0
        2: (b7) r8 = 0                        ; R8_w=0
        3: (b7) r9 = -2147483648              ; R9_w=-2147483648
        4: (97) r6 %= 1025                    ; R6_w=scalar()
        5: (05) goto pc+0
        6: (bd) if r6 <= r9 goto pc+2         ; R6_w=scalar(umin=18446744071562067969,var_off=(0xffffffff80000000; 0x7fffffff),u32_min=-2147483648) R9_w=-2147483648
        7: (97) r6 %= 1                       ; R6_w=scalar()
        8: (b7) r9 = 0                        ; R9=0
        9: (bd) if r6 <= r9 goto pc+1         ; R6=scalar(umin=1) R9=0
        10: (b7) r6 = 0                       ; R6_w=0
        11: (b7) r0 = 0                       ; R0_w=0
        12: (63) *(u32 *)(r10 -4) = r0
        last_idx 12 first_idx 9
        regs=1 stack=0 before 11: (b7) r0 = 0
        13: R0_w=0 R10=fp0 fp-8=0000????
        13: (18) r4 = 0xffff9290dc5bfe00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
        17: (07) r2 += -4                     ; R2_w=fp-4
        18: (85) call bpf_map_lookup_elem#1   ; R0=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0)
        19: (55) if r0 != 0x0 goto pc+1       ; R0=0
        20: (95) exit
      
        from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
        21: (77) r6 >>= 10                    ; R6_w=0
        22: (27) r6 *= 8192                   ; R6_w=0
        23: (bf) r1 = r0                      ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0)
        24: (0f) r0 += r6
        last_idx 24 first_idx 19
        regs=40 stack=0 before 23: (bf) r1 = r0
        regs=40 stack=0 before 22: (27) r6 *= 8192
        regs=40 stack=0 before 21: (77) r6 >>= 10
        regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1
        parent didn't have regs=40 stack=0 marks: R0_rw=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) R6_rw=P0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
        last_idx 18 first_idx 9
        regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
        regs=40 stack=0 before 17: (07) r2 += -4
        regs=40 stack=0 before 16: (bf) r2 = r10
        regs=40 stack=0 before 15: (bf) r1 = r4
        regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00
        regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
        regs=40 stack=0 before 11: (b7) r0 = 0
        regs=40 stack=0 before 10: (b7) r6 = 0
        25: (79) r3 = *(u64 *)(r0 +0)         ; R0_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
        26: (7b) *(u64 *)(r1 +0) = r3         ; R1_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
        27: (95) exit
      
        from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0
        11: (b7) r0 = 0                       ; R0_w=0
        12: (63) *(u32 *)(r10 -4) = r0
        last_idx 12 first_idx 11
        regs=1 stack=0 before 11: (b7) r0 = 0
        13: R0_w=0 R10=fp0 fp-8=0000????
        13: (18) r4 = 0xffff9290dc5bfe00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
        17: (07) r2 += -4                     ; R2_w=fp-4
        18: (85) call bpf_map_lookup_elem#1
        frame 0: propagating r6
        last_idx 19 first_idx 11
        regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
        regs=40 stack=0 before 17: (07) r2 += -4
        regs=40 stack=0 before 16: (bf) r2 = r10
        regs=40 stack=0 before 15: (bf) r1 = r4
        regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00
        regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
        regs=40 stack=0 before 11: (b7) r0 = 0
        parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_r=P0 R7=0 R8=0 R9=0 R10=fp0
        last_idx 9 first_idx 9
        regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1
        parent didn't have regs=240 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar() R7_w=0 R8_w=0 R9_rw=P0 R10=fp0
        last_idx 8 first_idx 0
        regs=240 stack=0 before 8: (b7) r9 = 0
        regs=40 stack=0 before 7: (97) r6 %= 1
        regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
        regs=240 stack=0 before 5: (05) goto pc+0
        regs=240 stack=0 before 4: (97) r6 %= 1025
        regs=240 stack=0 before 3: (b7) r9 = -2147483648
        regs=40 stack=0 before 2: (b7) r8 = 0
        regs=40 stack=0 before 1: (b7) r7 = 0
        regs=40 stack=0 before 0: (b7) r6 = 1024
        19: safe
      
        from 6 to 9: R1=ctx(off=0,imm=0) R6_w=scalar(umax=18446744071562067968) R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0
        9: (bd) if r6 <= r9 goto pc+1
        last_idx 9 first_idx 0
        regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
        regs=240 stack=0 before 5: (05) goto pc+0
        regs=240 stack=0 before 4: (97) r6 %= 1025
        regs=240 stack=0 before 3: (b7) r9 = -2147483648
        regs=40 stack=0 before 2: (b7) r8 = 0
        regs=40 stack=0 before 1: (b7) r7 = 0
        regs=40 stack=0 before 0: (b7) r6 = 1024
        last_idx 9 first_idx 0
        regs=200 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
        regs=240 stack=0 before 5: (05) goto pc+0
        regs=240 stack=0 before 4: (97) r6 %= 1025
        regs=240 stack=0 before 3: (b7) r9 = -2147483648
        regs=40 stack=0 before 2: (b7) r8 = 0
        regs=40 stack=0 before 1: (b7) r7 = 0
        regs=40 stack=0 before 0: (b7) r6 = 1024
        11: R6=scalar(umax=18446744071562067968) R9=-2147483648
        11: (b7) r0 = 0                       ; R0_w=0
        12: (63) *(u32 *)(r10 -4) = r0
        last_idx 12 first_idx 11
        regs=1 stack=0 before 11: (b7) r0 = 0
        13: R0_w=0 R10=fp0 fp-8=0000????
        13: (18) r4 = 0xffff9290dc5bfe00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
        17: (07) r2 += -4                     ; R2_w=fp-4
        18: (85) call bpf_map_lookup_elem#1   ; R0_w=map_value_or_null(id=3,off=0,ks=4,vs=48,imm=0)
        19: (55) if r0 != 0x0 goto pc+1       ; R0_w=0
        20: (95) exit
      
        from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=scalar(umax=18446744071562067968) R7=0 R8=0 R9=-2147483648 R10=fp0 fp-8=mmmm????
        21: (77) r6 >>= 10                    ; R6_w=scalar(umax=18014398507384832,var_off=(0x0; 0x3fffffffffffff))
        22: (27) r6 *= 8192                   ; R6_w=scalar(smax=9223372036854767616,umax=18446744073709543424,var_off=(0x0; 0xffffffffffffe000),s32_max=2147475456,u32_max=-8192)
        23: (bf) r1 = r0                      ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0)
        24: (0f) r0 += r6
        last_idx 24 first_idx 21
        regs=40 stack=0 before 23: (bf) r1 = r0
        regs=40 stack=0 before 22: (27) r6 *= 8192
        regs=40 stack=0 before 21: (77) r6 >>= 10
        parent didn't have regs=40 stack=0 marks: R0_rw=map_value(off=0,ks=4,vs=48,imm=0) R6_r=Pscalar(umax=18446744071562067968) R7=0 R8=0 R9=-2147483648 R10=fp0 fp-8=mmmm????
        last_idx 19 first_idx 11
        regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1
        regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
        regs=40 stack=0 before 17: (07) r2 += -4
        regs=40 stack=0 before 16: (bf) r2 = r10
        regs=40 stack=0 before 15: (bf) r1 = r4
        regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00
        regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
        regs=40 stack=0 before 11: (b7) r0 = 0
        parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar(umax=18446744071562067968) R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0
        last_idx 9 first_idx 0
        regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1
        regs=240 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
        regs=240 stack=0 before 5: (05) goto pc+0
        regs=240 stack=0 before 4: (97) r6 %= 1025
        regs=240 stack=0 before 3: (b7) r9 = -2147483648
        regs=40 stack=0 before 2: (b7) r8 = 0
        regs=40 stack=0 before 1: (b7) r7 = 0
        regs=40 stack=0 before 0: (b7) r6 = 1024
        math between map_value pointer and register with unbounded min value is not allowed
        verification time 886 usec
        stack depth 4
        processed 49 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 2
      
      Fixes: b5dc0163 ("bpf: precise scalar_value tracking")
      Reported-by: Juan Jose Lopez Jaimez <jjlopezjaimez@google.com>
      Reported-by: Meador Inge <meadori@google.com>
      Reported-by: Simon Scannell <simonscannell@google.com>
      Reported-by: Nenad Stojanovski <thenenadx@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Juan Jose Lopez Jaimez <jjlopezjaimez@google.com>
      Reviewed-by: Meador Inge <meadori@google.com>
      Reviewed-by: Simon Scannell <simonscannell@google.com>
    • Linus Torvalds
      Merge tag 'nfsd-6.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · 789b4a41
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
      
       - Address two issues with the new GSS krb5 Kunit tests
      
      * tag 'nfsd-6.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        SUNRPC: Fix failures of checksum Kunit tests
        sunrpc: Fix RFC6803 encryption test