1. 05 May, 2022 6 commits
  2. 04 May, 2022 20 commits
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2022-05-03' of git://git.kernel.org/pub/scm/linux/kernel/g · ad0724b9
      David S. Miller authored
      it/saeed/linux
      
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2022-05-03
      
      This series provides bug fixes to mlx5 driver.
      Please pull and let me know if there is any problem.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ad0724b9
    • Mark Bloch's avatar
      net/mlx5: Fix matching on inner TTC · a042d7f5
      Mark Bloch authored
      The cited commits didn't use proper matching on inner TTC
      as a result distribution of encapsulated packets wasn't symmetric
      between the physical ports.
      
      Fixes: 4c71ce50 ("net/mlx5: Support partial TTC rules")
      Fixes: 8e25a2bc ("net/mlx5: Lag, add support to create TTC tables for LAG port selection")
      Signed-off-by: default avatarMark Bloch <mbloch@nvidia.com>
      Reviewed-by: default avatarMaor Gottlieb <maorg@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      a042d7f5
    • Moshe Shemesh's avatar
      net/mlx5: Avoid double clear or set of sync reset requested · fc3d3db0
      Moshe Shemesh authored
      Double clear of reset requested state can lead to NULL pointer as it
      will try to delete the timer twice. This can happen for example on a
      race between abort from FW and pci error or reset. Avoid such case using
      test_and_clear_bit() to verify only one time reset requested state clear
      flow. Similarly use test_and_set_bit() to verify only one time reset
      requested state set flow.
      
      Fixes: 7dd6df32 ("net/mlx5: Handle sync reset abort event")
      Signed-off-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: default avatarMaher Sanalla <msanalla@nvidia.com>
      Reviewed-by: default avatarShay Drory <shayd@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      fc3d3db0
    • Moshe Shemesh's avatar
      net/mlx5: Fix deadlock in sync reset flow · cb7786a7
      Moshe Shemesh authored
      The sync reset flow can lead to the following deadlock when
      poll_sync_reset() is called by timer softirq and waiting on
      del_timer_sync() for the same timer. Fix that by moving the part of the
      flow that waits for the timer to reset_reload_work.
      
      It fixes the following kernel Trace:
      RIP: 0010:del_timer_sync+0x32/0x40
      ...
      Call Trace:
       <IRQ>
       mlx5_sync_reset_clear_reset_requested+0x26/0x50 [mlx5_core]
       poll_sync_reset.cold+0x36/0x52 [mlx5_core]
       call_timer_fn+0x32/0x130
       __run_timers.part.0+0x180/0x280
       ? tick_sched_handle+0x33/0x60
       ? tick_sched_timer+0x3d/0x80
       ? ktime_get+0x3e/0xa0
       run_timer_softirq+0x2a/0x50
       __do_softirq+0xe1/0x2d6
       ? hrtimer_interrupt+0x136/0x220
       irq_exit+0xae/0xb0
       smp_apic_timer_interrupt+0x7b/0x140
       apic_timer_interrupt+0xf/0x20
       </IRQ>
      
      Fixes: 3c5193a8 ("net/mlx5: Use del_timer_sync in fw reset flow of halting poll")
      Signed-off-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: default avatarMaher Sanalla <msanalla@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      cb7786a7
    • Moshe Tal's avatar
      net/mlx5e: Fix trust state reset in reload · b781bff8
      Moshe Tal authored
      Setting dscp2prio during the driver reload can cause dcb ieee app list to
      be not empty after the reload finish and as a result to a conflict between
      the priority trust state reported by the app and the state in the device
      register.
      
      Reset the dcb ieee app list on initialization in case this is
      conflicting with the register status.
      
      Fixes: 2a5e7a13 ("net/mlx5e: Add dcbnl dscp to priority support")
      Signed-off-by: default avatarMoshe Tal <moshet@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      b781bff8
    • Ariel Levkovich's avatar
      net/mlx5e: Avoid checking offload capability in post_parse action · 0e322efd
      Ariel Levkovich authored
      During TC action parsing, the can_offload callback is called
      before calling the action's main parsing callback.
      
      Later on, the can_offload callback is called again before handling
      the action's post_parse callback if exists.
      
      Since the main parsing callback might have changed and set parsing
      params for the rule, following can_offload checks might fail because
      some parsing params were already set.
      
      Specifically, the ct action main parsing sets the ct param in the
      parsing status structure and when the second can_offload for ct action
      is called, before handling the ct post parsing, it will return an error
      since it checks this ct param to indicate multiple ct actions which are
      not supported.
      
      Therefore, the can_offload call is removed from the post parsing
      handling to prevent such cases.
      This is allowed since the first can_offload call will ensure that the
      action can be offloaded and the fact the code reached the post parsing
      handling already means that the action can be offloaded.
      
      Fixes: 8300f225 ("net/mlx5e: Create new flow attr for multi table actions")
      Signed-off-by: default avatarAriel Levkovich <lariel@nvidia.com>
      Reviewed-by: default avatarPaul Blakey <paulb@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      0e322efd
    • Paul Blakey's avatar
      net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release · b069e14f
      Paul Blakey authored
      __mlx5_tc_ct_entry_put() queues release of tuple related to some ct FT,
      if that is the last reference to that tuple, the actual deletion of
      the tuple can happen after the FT is already destroyed and freed.
      
      Flush the used workqueue before destroying the ct FT.
      
      Fixes: a2173131 ("net/mlx5e: CT: manage the lifetime of the ct entry object")
      Reviewed-by: default avatarOz Shlomo <ozsh@nvidia.com>
      Signed-off-by: default avatarPaul Blakey <paulb@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      b069e14f
    • Ariel Levkovich's avatar
      net/mlx5e: TC, fix decap fallback to uplink when int port not supported · e3fdc71b
      Ariel Levkovich authored
      When resolving the decap route device for a tunnel decap rule,
      the result may be an OVS internal port device.
      
      Prior to adding the support for internal port offload, such case
      would result in using the uplink as the default decap route device
      which allowed devices that can't support internal port offload
      to offload this decap rule.
      
      This behavior got broken by adding the internal port offload which
      will fail in case the device can't support internal port offload.
      
      To restore the old behavior, use the uplink device as the decap
      route as before when internal port offload is not supported.
      
      Fixes: b16eb3c8 ("net/mlx5: Support internal port as decap route device")
      Signed-off-by: default avatarAriel Levkovich <lariel@nvidia.com>
      Reviewed-by: default avatarMaor Dickman <maord@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      e3fdc71b
    • Ariel Levkovich's avatar
      net/mlx5e: TC, Fix ct_clear overwriting ct action metadata · 087032ee
      Ariel Levkovich authored
      ct_clear action is translated to clearing reg_c metadata
      which holds ct state and zone information using mod header
      actions.
      These actions are allocated during the actions parsing, as
      part of the flow attributes main mod header action list.
      
      If ct action exists in the rule, the flow's main mod header
      is used only in the post action table rule, after the ct tables
      which set the ct info in the reg_c as part of the ct actions.
      
      Therefore, if the original rule has a ct_clear action followed
      by a ct action, the ct action reg_c setting will be done first and
      will be followed by the ct_clear resetting reg_c and overwriting
      the ct info.
      
      Fix this by moving the ct_clear mod header actions allocation from
      the ct action parsing stage to the ct action post parsing stage where
      it is already known if ct_clear is followed by a ct action.
      In such case, we skip the mod header actions allocation for the ct
      clear since the ct action will write to reg_c anyway after clearing it.
      
      Fixes: 806401c2 ("net/mlx5e: CT, Fix multiple allocations and memleak of mod acts")
      Signed-off-by: default avatarAriel Levkovich <lariel@nvidia.com>
      Reviewed-by: default avatarPaul Blakey <paulb@nvidia.com>
      Reviewed-by: default avatarRoi Dayan <roid@nvidia.com>
      Reviewed-by: default avatarMaor Dickman <maord@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      087032ee
    • Vlad Buslov's avatar
      net/mlx5e: Lag, Don't skip fib events on current dst · 4a2a664e
      Vlad Buslov authored
      Referenced change added check to skip updating fib when new fib instance
      has same or lower priority. However, new fib instance can be an update on
      same dst address as existing one even though the structure is another
      instance that has different address. Ignoring events on such instances
      causes multipath LAG state to not be correctly updated.
      
      Track 'dst' and 'dst_len' fields of fib event fib_entry_notifier_info
      structure and don't skip events that have the same value of that fields.
      
      Fixes: ad11c4f1 ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
      Signed-off-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: default avatarMaor Dickman <maord@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      4a2a664e
    • Vlad Buslov's avatar
      net/mlx5e: Lag, Fix fib_info pointer assignment · a6589155
      Vlad Buslov authored
      Referenced change incorrectly sets single path fib_info even when LAG is
      not active. Fix it by moving call to mlx5_lag_fib_set() into conditional
      that verifies LAG state.
      
      Fixes: ad11c4f1 ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
      Signed-off-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: default avatarMaor Dickman <maord@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      a6589155
    • Vlad Buslov's avatar
      net/mlx5e: Lag, Fix use-after-free in fib event handler · 27b0420f
      Vlad Buslov authored
      Recent commit that modified fib route event handler to handle events
      according to their priority introduced use-after-free[0] in mp->mfi pointer
      usage. The pointer now is not just cached in order to be compared to
      following fib_info instances, but is also dereferenced to obtain
      fib_priority. However, since mlx5 lag code doesn't hold the reference to
      fin_info during whole mp->mfi lifetime, it could be used after fib_info
      instance has already been freed be kernel infrastructure code.
      
      Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
      type and cache fib_info priority in dedicated integer. Group
      fib_info-related data into dedicated 'fib' structure that will be further
      extended by following patches in the series.
      
      [0]:
      
      [  203.588029] ==================================================================
      [  203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138
      
      [  203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G    B             5.17.0-rc7+ #6
      [  203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [  203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
      [  203.600053] Call Trace:
      [  203.600608]  <TASK>
      [  203.601110]  dump_stack_lvl+0x48/0x5e
      [  203.601860]  print_address_description.constprop.0+0x1f/0x160
      [  203.602950]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.604073]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.605177]  kasan_report.cold+0x83/0xdf
      [  203.605969]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.607102]  mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.608199]  ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
      [  203.609382]  ? read_word_at_a_time+0xe/0x20
      [  203.610463]  ? strscpy+0xa0/0x2a0
      [  203.611463]  process_one_work+0x722/0x1270
      [  203.612344]  worker_thread+0x540/0x11e0
      [  203.613136]  ? rescuer_thread+0xd50/0xd50
      [  203.613949]  kthread+0x26e/0x300
      [  203.614627]  ? kthread_complete_and_exit+0x20/0x20
      [  203.615542]  ret_from_fork+0x1f/0x30
      [  203.616273]  </TASK>
      
      [  203.617174] Allocated by task 3746:
      [  203.617874]  kasan_save_stack+0x1e/0x40
      [  203.618644]  __kasan_kmalloc+0x81/0xa0
      [  203.619394]  fib_create_info+0xb41/0x3c50
      [  203.620213]  fib_table_insert+0x190/0x1ff0
      [  203.621020]  fib_magic.isra.0+0x246/0x2e0
      [  203.621803]  fib_add_ifaddr+0x19f/0x670
      [  203.622563]  fib_inetaddr_event+0x13f/0x270
      [  203.623377]  blocking_notifier_call_chain+0xd4/0x130
      [  203.624355]  __inet_insert_ifa+0x641/0xb20
      [  203.625185]  inet_rtm_newaddr+0xc3d/0x16a0
      [  203.626009]  rtnetlink_rcv_msg+0x309/0x880
      [  203.626826]  netlink_rcv_skb+0x11d/0x340
      [  203.627626]  netlink_unicast+0x4cc/0x790
      [  203.628430]  netlink_sendmsg+0x762/0xc00
      [  203.629230]  sock_sendmsg+0xb2/0xe0
      [  203.629955]  ____sys_sendmsg+0x58a/0x770
      [  203.630756]  ___sys_sendmsg+0xd8/0x160
      [  203.631523]  __sys_sendmsg+0xb7/0x140
      [  203.632294]  do_syscall_64+0x35/0x80
      [  203.633045]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [  203.634427] Freed by task 0:
      [  203.635063]  kasan_save_stack+0x1e/0x40
      [  203.635844]  kasan_set_track+0x21/0x30
      [  203.636618]  kasan_set_free_info+0x20/0x30
      [  203.637450]  __kasan_slab_free+0xfc/0x140
      [  203.638271]  kfree+0x94/0x3b0
      [  203.638903]  rcu_core+0x5e4/0x1990
      [  203.639640]  __do_softirq+0x1ba/0x5d3
      
      [  203.640828] Last potentially related work creation:
      [  203.641785]  kasan_save_stack+0x1e/0x40
      [  203.642571]  __kasan_record_aux_stack+0x9f/0xb0
      [  203.643478]  call_rcu+0x88/0x9c0
      [  203.644178]  fib_release_info+0x539/0x750
      [  203.644997]  fib_table_delete+0x659/0xb80
      [  203.645809]  fib_magic.isra.0+0x1a3/0x2e0
      [  203.646617]  fib_del_ifaddr+0x93f/0x1300
      [  203.647415]  fib_inetaddr_event+0x9f/0x270
      [  203.648251]  blocking_notifier_call_chain+0xd4/0x130
      [  203.649225]  __inet_del_ifa+0x474/0xc10
      [  203.650016]  devinet_ioctl+0x781/0x17f0
      [  203.650788]  inet_ioctl+0x1ad/0x290
      [  203.651533]  sock_do_ioctl+0xce/0x1c0
      [  203.652315]  sock_ioctl+0x27b/0x4f0
      [  203.653058]  __x64_sys_ioctl+0x124/0x190
      [  203.653850]  do_syscall_64+0x35/0x80
      [  203.654608]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [  203.666952] The buggy address belongs to the object at ffff888144df2000
                      which belongs to the cache kmalloc-256 of size 256
      [  203.669250] The buggy address is located 80 bytes inside of
                      256-byte region [ffff888144df2000, ffff888144df2100)
      [  203.671332] The buggy address belongs to the page:
      [  203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
      [  203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
      [  203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
      [  203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
      [  203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
      [  203.679928] page dumped because: kasan: bad access detected
      
      [  203.681455] Memory state around the buggy address:
      [  203.682421]  ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  203.683863]  ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  203.686701]                                                  ^
      [  203.687820]  ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  203.689226]  ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  203.690620] ==================================================================
      
      Fixes: ad11c4f1 ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
      Signed-off-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: default avatarMaor Dickman <maord@nvidia.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      27b0420f
    • Mark Zhang's avatar
      net/mlx5e: Fix the calling of update_buffer_lossy() API · c4d963a5
      Mark Zhang authored
      The arguments of update_buffer_lossy() is in a wrong order. Fix it.
      
      Fixes: 88b3d5c9 ("net/mlx5e: Fix port buffers cell size value")
      Signed-off-by: default avatarMark Zhang <markzhang@nvidia.com>
      Reviewed-by: default avatarMaor Gottlieb <maorg@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      c4d963a5
    • Vlad Buslov's avatar
      net/mlx5e: Don't match double-vlan packets if cvlan is not set · ada09af9
      Vlad Buslov authored
      Currently, match VLAN rule also matches packets that have multiple VLAN
      headers. This behavior is similar to buggy flower classifier behavior that
      has recently been fixed. Fix the issue by matching on
      outer_second_cvlan_tag with value 0 which will cause the HW to verify the
      packet doesn't contain second vlan header.
      
      Fixes: 699e96dd ("net/mlx5e: Support offloading tc double vlan headers match")
      Signed-off-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: default avatarMaor Dickman <maord@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      ada09af9
    • Aya Levin's avatar
      net/mlx5: Fix slab-out-of-bounds while reading resource dump menu · 7ba2d9d8
      Aya Levin authored
      Resource dump menu may span over more than a single page, support it.
      Otherwise, menu read may result in a memory access violation: reading
      outside of the allocated page.
      Note that page format of the first menu page contains menu headers while
      the proceeding menu pages contain only records.
      
      The KASAN logs are as follows:
      BUG: KASAN: slab-out-of-bounds in strcmp+0x9b/0xb0
      Read of size 1 at addr ffff88812b2e1fd0 by task systemd-udevd/496
      
      CPU: 5 PID: 496 Comm: systemd-udevd Tainted: G    B  5.16.0_for_upstream_debug_2022_01_10_23_12 #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x57/0x7d
       print_address_description.constprop.0+0x1f/0x140
       ? strcmp+0x9b/0xb0
       ? strcmp+0x9b/0xb0
       kasan_report.cold+0x83/0xdf
       ? strcmp+0x9b/0xb0
       strcmp+0x9b/0xb0
       mlx5_rsc_dump_init+0x4ab/0x780 [mlx5_core]
       ? mlx5_rsc_dump_destroy+0x80/0x80 [mlx5_core]
       ? lockdep_hardirqs_on_prepare+0x286/0x400
       ? raw_spin_unlock_irqrestore+0x47/0x50
       ? aomic_notifier_chain_register+0x32/0x40
       mlx5_load+0x104/0x2e0 [mlx5_core]
       mlx5_init_one+0x41b/0x610 [mlx5_core]
       ....
      The buggy address belongs to the object at ffff88812b2e0000
       which belongs to the cache kmalloc-4k of size 4096
      The buggy address is located 4048 bytes to the right of
       4096-byte region [ffff88812b2e0000, ffff88812b2e1000)
      The buggy address belongs to the page:
      page:000000009d69807a refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88812b2e6000 pfn:0x12b2e0
      head:000000009d69807a order:3 compound_mapcount:0 compound_pincount:0
      flags: 0x8000000000010200(slab|head|zone=2)
      raw: 8000000000010200 0000000000000000 dead000000000001 ffff888100043040
      raw: ffff88812b2e6000 0000000080040000 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88812b2e1e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88812b2e1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff88812b2e1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                       ^
       ffff88812b2e2000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88812b2e2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      
      Fixes: 12206b17 ("net/mlx5: Add support for resource dump")
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      7ba2d9d8
    • Ariel Levkovich's avatar
      net/mlx5e: Fix wrong source vport matching on tunnel rule · cb0d54cb
      Ariel Levkovich authored
      When OVS internal port is the vtep device, the first decap
      rule is matching on the internal port's vport metadata value
      and then changes the metadata to be the uplink's value.
      
      Therefore, following rules on the tunnel, in chain > 0, should
      avoid matching on internal port metadata and use the uplink
      vport metadata instead.
      
      Select the uplink's metadata value for the source vport match
      in case the rule is in chain greater than zero, even if the tunnel
      route device is internal port.
      
      Fixes: 166f431e ("net/mlx5e: Add indirect tc offload of ovs internal port")
      Signed-off-by: default avatarAriel Levkovich <lariel@nvidia.com>
      Reviewed-by: default avatarMaor Dickman <maord@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      cb0d54cb
    • Jakub Kicinski's avatar
      Merge branch 'bnxt_en-bug-fixes' · 0a806ecc
      Jakub Kicinski authored
      Michael Chan says:
      
      ====================
      bnxt_en: Bug fixes
      
      This patch series includes 3 fixes:
       - Fix an occasional VF open failure.
       - Fix a PTP spinlock usage before initialization
       - Fix unnecesary RX packet drops under high TX traffic load.
      ====================
      
      Link: https://lore.kernel.org/r/1651540392-2260-1-git-send-email-michael.chan@broadcom.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      0a806ecc
    • Michael Chan's avatar
      bnxt_en: Fix unnecessary dropping of RX packets · 195af579
      Michael Chan authored
      In bnxt_poll_p5(), we first check cpr->has_more_work.  If it is true,
      we are in NAPI polling mode and we will call __bnxt_poll_cqs() to
      continue polling.  It is possible to exhanust the budget again when
      __bnxt_poll_cqs() returns.
      
      We then enter the main while loop to check for new entries in the NQ.
      If we had previously exhausted the NAPI budget, we may call
      __bnxt_poll_work() to process an RX entry with zero budget.  This will
      cause packets to be dropped unnecessarily, thinking that we are in the
      netpoll path.  Fix it by breaking out of the while loop if we need
      to process an RX NQ entry with no budget left.  We will then exit
      NAPI and stay in polling mode.
      
      Fixes: 389a877a ("bnxt_en: Process the NQ under NAPI continuous polling.")
      Reviewed-by: default avatarAndy Gospodarek <andrew.gospodarek@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      195af579
    • Michael Chan's avatar
      bnxt_en: Initiallize bp->ptp_lock first before using it · 2b156fb5
      Michael Chan authored
      bnxt_ptp_init() calls bnxt_ptp_init_rtc() which will acquire the ptp_lock
      spinlock.  The spinlock is not initialized until later.  Move the
      bnxt_ptp_init_rtc() call after the spinlock is initialized.
      
      Fixes: 24ac1ecd ("bnxt_en: Add driver support to use Real Time Counter for PTP")
      Reviewed-by: default avatarPavan Chebbi <pavan.chebbi@broadcom.com>
      Reviewed-by: default avatarSaravanan Vajravel <saravanan.vajravel@broadcom.com>
      Reviewed-by: default avatarAndy Gospodarek <andrew.gospodarek@broadcom.com>
      Reviewed-by: default avatarSomnath Kotur <somnath.kotur@broadcom.com>
      Reviewed-by: default avatarDamodharam Ammepalli <damodharam.ammepalli@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      2b156fb5
    • Somnath Kotur's avatar
      bnxt_en: Fix possible bnxt_open() failure caused by wrong RFS flag · 13ba7943
      Somnath Kotur authored
      bnxt_open() can fail in this code path, especially on a VF when
      it fails to reserve default rings:
      
      bnxt_open()
        __bnxt_open_nic()
          bnxt_clear_int_mode()
          bnxt_init_dflt_ring_mode()
      
      RX rings would be set to 0 when we hit this error path.
      
      It is possible for a subsequent bnxt_open() call to potentially succeed
      with a code path like this:
      
      bnxt_open()
        bnxt_hwrm_if_change()
          bnxt_fw_init_one()
            bnxt_fw_init_one_p3()
              bnxt_set_dflt_rfs()
                bnxt_rfs_capable()
                  bnxt_hwrm_reserve_rings()
      
      On older chips, RFS is capable if we can reserve the number of vnics that
      is equal to RX rings + 1.  But since RX rings is still set to 0 in this
      code path, we may mistakenly think that RFS is supported for 0 RX rings.
      
      Later, when the default RX rings are reserved and we try to enable
      RFS, it would fail and cause bnxt_open() to fail unnecessarily.
      
      We fix this in 2 places.  bnxt_rfs_capable() will always return false if
      RX rings is not yet set.  bnxt_init_dflt_ring_mode() will call
      bnxt_set_dflt_rfs() which will always clear the RFS flags if RFS is not
      supported.
      
      Fixes: 20d7d1c5 ("bnxt_en: reliably allocate IRQ table on reset to avoid crash")
      Signed-off-by: default avatarSomnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      13ba7943
  3. 03 May, 2022 9 commits
  4. 01 May, 2022 5 commits
    • Russell King (Oracle)'s avatar
      net: dsa: b53: convert to phylink_pcs · 79396934
      Russell King (Oracle) authored
      Convert B53 to use phylink_pcs for the serdes rather than hooking it
      into the MAC-layer callbacks.
      
      Fixes: 81c1681c ("net: dsa: b53: mark as non-legacy")
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Tested-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      79396934
    • Thomas Gleixner's avatar
      pci_irq_vector() can't be used in atomic context any longer. This conflicts · 6b292a04
      Thomas Gleixner authored
      with the usage of this function in nic_mbx_intr_handler().
      
      Cache the Linux interrupt numbers in struct nicpf and use that cache in the
      interrupt handler to select the mailbox.
      
      Fixes: 495c66ac ("genirq/msi: Convert to new functions")
      Reported-by: default avatarOndrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Sunil Goutham <sgoutham@marvell.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=2041772Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6b292a04
    • David S. Miller's avatar
      Merge branch 'nfc-fixes' · b6693611
      David S. Miller authored
      Duoming Zhou says:
      
      ====================
      Replace improper checks and fix bugs in nfc subsystem
      
      The first patch is used to replace improper checks in netlink related
      functions of nfc core, the second patch is used to fix bugs in
      nfcmrvl driver.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b6693611
    • Duoming Zhou's avatar
      nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs · d270453a
      Duoming Zhou authored
      There are destructive operations such as nfcmrvl_fw_dnld_abort and
      gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
      gpio and so on could be destructed while the upper layer functions such as
      nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
      to double-free, use-after-free and null-ptr-deref bugs.
      
      There are three situations that could lead to double-free bugs.
      
      The first situation is shown below:
      
         (Thread 1)                 |      (Thread 2)
      nfcmrvl_fw_dnld_start         |
       ...                          |  nfcmrvl_nci_unregister_dev
       release_firmware()           |   nfcmrvl_fw_dnld_abort
        kfree(fw) //(1)             |    fw_dnld_over
                                    |     release_firmware
        ...                         |      kfree(fw) //(2)
                                    |     ...
      
      The second situation is shown below:
      
         (Thread 1)                 |      (Thread 2)
      nfcmrvl_fw_dnld_start         |
       ...                          |
       mod_timer                    |
       (wait a time)                |
       fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
         fw_dnld_over               |   nfcmrvl_fw_dnld_abort
          release_firmware          |    fw_dnld_over
           kfree(fw) //(1)          |     release_firmware
           ...                      |      kfree(fw) //(2)
      
      The third situation is shown below:
      
             (Thread 1)               |       (Thread 2)
      nfcmrvl_nci_recv_frame          |
       if(..->fw_download_in_progress)|
        nfcmrvl_fw_dnld_recv_frame    |
         queue_work                   |
                                      |
      fw_dnld_rx_work                 | nfcmrvl_nci_unregister_dev
       fw_dnld_over                   |  nfcmrvl_fw_dnld_abort
        release_firmware              |   fw_dnld_over
         kfree(fw) //(1)              |    release_firmware
                                      |     kfree(fw) //(2)
      
      The firmware struct is deallocated in position (1) and deallocated
      in position (2) again.
      
      The crash trace triggered by POC is like below:
      
      BUG: KASAN: double-free or invalid-free in fw_dnld_over
      Call Trace:
        kfree
        fw_dnld_over
        nfcmrvl_nci_unregister_dev
        nci_uart_tty_close
        tty_ldisc_kill
        tty_ldisc_hangup
        __tty_hangup.part.0
        tty_release
        ...
      
      What's more, there are also use-after-free and null-ptr-deref bugs
      in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or
      set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev,
      then, we dereference firmware, gpio or the members of priv->fw_dnld in
      nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.
      
      This patch reorders destructive operations after nci_unregister_device
      in order to synchronize between cleanup routine and firmware download
      routine.
      
      The nci_unregister_device is well synchronized. If the device is
      detaching, the firmware download routine will goto error. If firmware
      download routine is executing, nci_unregister_device will wait until
      firmware download routine is finished.
      
      Fixes: 3194c687 ("NFC: nfcmrvl: add firmware download support")
      Signed-off-by: default avatarDuoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d270453a
    • Duoming Zhou's avatar
      nfc: replace improper check device_is_registered() in netlink related functions · da5c0f11
      Duoming Zhou authored
      The device_is_registered() in nfc core is used to check whether
      nfc device is registered in netlink related functions such as
      nfc_fw_download(), nfc_dev_up() and so on. Although device_is_registered()
      is protected by device_lock, there is still a race condition between
      device_del() and device_is_registered(). The root cause is that
      kobject_del() in device_del() is not protected by device_lock.
      
         (cleanup task)         |     (netlink task)
                                |
      nfc_unregister_device     | nfc_fw_download
       device_del               |  device_lock
        ...                     |   if (!device_is_registered)//(1)
        kobject_del//(2)        |   ...
       ...                      |  device_unlock
      
      The device_is_registered() returns the value of state_in_sysfs and
      the state_in_sysfs is set to zero in kobject_del(). If we pass check in
      position (1), then set zero in position (2). As a result, the check
      in position (1) is useless.
      
      This patch uses bool variable instead of device_is_registered() to judge
      whether the nfc device is registered, which is well synchronized.
      
      Fixes: 3e256b8f ("NFC: add nfc subsystem core")
      Signed-off-by: default avatarDuoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      da5c0f11