1. 05 May, 2022 3 commits
  2. 04 May, 2022 20 commits
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2022-05-03' of git://git.kernel.org/pub/scm/linux/kernel/g · ad0724b9
      David S. Miller authored
      it/saeed/linux
      
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2022-05-03
      
      This series provides bug fixes to mlx5 driver.
      Please pull and let me know if there is any problem.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ad0724b9
    • Mark Bloch's avatar
      net/mlx5: Fix matching on inner TTC · a042d7f5
      Mark Bloch authored
      The cited commits didn't use proper matching on inner TTC
      as a result distribution of encapsulated packets wasn't symmetric
      between the physical ports.
      
      Fixes: 4c71ce50 ("net/mlx5: Support partial TTC rules")
      Fixes: 8e25a2bc ("net/mlx5: Lag, add support to create TTC tables for LAG port selection")
      Signed-off-by: default avatarMark Bloch <mbloch@nvidia.com>
      Reviewed-by: default avatarMaor Gottlieb <maorg@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      a042d7f5
    • Moshe Shemesh's avatar
      net/mlx5: Avoid double clear or set of sync reset requested · fc3d3db0
      Moshe Shemesh authored
      Double clear of reset requested state can lead to NULL pointer as it
      will try to delete the timer twice. This can happen for example on a
      race between abort from FW and pci error or reset. Avoid such case using
      test_and_clear_bit() to verify only one time reset requested state clear
      flow. Similarly use test_and_set_bit() to verify only one time reset
      requested state set flow.
      
      Fixes: 7dd6df32 ("net/mlx5: Handle sync reset abort event")
      Signed-off-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: default avatarMaher Sanalla <msanalla@nvidia.com>
      Reviewed-by: default avatarShay Drory <shayd@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      fc3d3db0
    • Moshe Shemesh's avatar
      net/mlx5: Fix deadlock in sync reset flow · cb7786a7
      Moshe Shemesh authored
      The sync reset flow can lead to the following deadlock when
      poll_sync_reset() is called by timer softirq and waiting on
      del_timer_sync() for the same timer. Fix that by moving the part of the
      flow that waits for the timer to reset_reload_work.
      
      It fixes the following kernel Trace:
      RIP: 0010:del_timer_sync+0x32/0x40
      ...
      Call Trace:
       <IRQ>
       mlx5_sync_reset_clear_reset_requested+0x26/0x50 [mlx5_core]
       poll_sync_reset.cold+0x36/0x52 [mlx5_core]
       call_timer_fn+0x32/0x130
       __run_timers.part.0+0x180/0x280
       ? tick_sched_handle+0x33/0x60
       ? tick_sched_timer+0x3d/0x80
       ? ktime_get+0x3e/0xa0
       run_timer_softirq+0x2a/0x50
       __do_softirq+0xe1/0x2d6
       ? hrtimer_interrupt+0x136/0x220
       irq_exit+0xae/0xb0
       smp_apic_timer_interrupt+0x7b/0x140
       apic_timer_interrupt+0xf/0x20
       </IRQ>
      
      Fixes: 3c5193a8 ("net/mlx5: Use del_timer_sync in fw reset flow of halting poll")
      Signed-off-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: default avatarMaher Sanalla <msanalla@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      cb7786a7
    • Moshe Tal's avatar
      net/mlx5e: Fix trust state reset in reload · b781bff8
      Moshe Tal authored
      Setting dscp2prio during the driver reload can cause dcb ieee app list to
      be not empty after the reload finish and as a result to a conflict between
      the priority trust state reported by the app and the state in the device
      register.
      
      Reset the dcb ieee app list on initialization in case this is
      conflicting with the register status.
      
      Fixes: 2a5e7a13 ("net/mlx5e: Add dcbnl dscp to priority support")
      Signed-off-by: default avatarMoshe Tal <moshet@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      b781bff8
    • Ariel Levkovich's avatar
      net/mlx5e: Avoid checking offload capability in post_parse action · 0e322efd
      Ariel Levkovich authored
      During TC action parsing, the can_offload callback is called
      before calling the action's main parsing callback.
      
      Later on, the can_offload callback is called again before handling
      the action's post_parse callback if exists.
      
      Since the main parsing callback might have changed and set parsing
      params for the rule, following can_offload checks might fail because
      some parsing params were already set.
      
      Specifically, the ct action main parsing sets the ct param in the
      parsing status structure and when the second can_offload for ct action
      is called, before handling the ct post parsing, it will return an error
      since it checks this ct param to indicate multiple ct actions which are
      not supported.
      
      Therefore, the can_offload call is removed from the post parsing
      handling to prevent such cases.
      This is allowed since the first can_offload call will ensure that the
      action can be offloaded and the fact the code reached the post parsing
      handling already means that the action can be offloaded.
      
      Fixes: 8300f225 ("net/mlx5e: Create new flow attr for multi table actions")
      Signed-off-by: default avatarAriel Levkovich <lariel@nvidia.com>
      Reviewed-by: default avatarPaul Blakey <paulb@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      0e322efd
    • Paul Blakey's avatar
      net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release · b069e14f
      Paul Blakey authored
      __mlx5_tc_ct_entry_put() queues release of tuple related to some ct FT,
      if that is the last reference to that tuple, the actual deletion of
      the tuple can happen after the FT is already destroyed and freed.
      
      Flush the used workqueue before destroying the ct FT.
      
      Fixes: a2173131 ("net/mlx5e: CT: manage the lifetime of the ct entry object")
      Reviewed-by: default avatarOz Shlomo <ozsh@nvidia.com>
      Signed-off-by: default avatarPaul Blakey <paulb@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      b069e14f
    • Ariel Levkovich's avatar
      net/mlx5e: TC, fix decap fallback to uplink when int port not supported · e3fdc71b
      Ariel Levkovich authored
      When resolving the decap route device for a tunnel decap rule,
      the result may be an OVS internal port device.
      
      Prior to adding the support for internal port offload, such case
      would result in using the uplink as the default decap route device
      which allowed devices that can't support internal port offload
      to offload this decap rule.
      
      This behavior got broken by adding the internal port offload which
      will fail in case the device can't support internal port offload.
      
      To restore the old behavior, use the uplink device as the decap
      route as before when internal port offload is not supported.
      
      Fixes: b16eb3c8 ("net/mlx5: Support internal port as decap route device")
      Signed-off-by: default avatarAriel Levkovich <lariel@nvidia.com>
      Reviewed-by: default avatarMaor Dickman <maord@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      e3fdc71b
    • Ariel Levkovich's avatar
      net/mlx5e: TC, Fix ct_clear overwriting ct action metadata · 087032ee
      Ariel Levkovich authored
      ct_clear action is translated to clearing reg_c metadata
      which holds ct state and zone information using mod header
      actions.
      These actions are allocated during the actions parsing, as
      part of the flow attributes main mod header action list.
      
      If ct action exists in the rule, the flow's main mod header
      is used only in the post action table rule, after the ct tables
      which set the ct info in the reg_c as part of the ct actions.
      
      Therefore, if the original rule has a ct_clear action followed
      by a ct action, the ct action reg_c setting will be done first and
      will be followed by the ct_clear resetting reg_c and overwriting
      the ct info.
      
      Fix this by moving the ct_clear mod header actions allocation from
      the ct action parsing stage to the ct action post parsing stage where
      it is already known if ct_clear is followed by a ct action.
      In such case, we skip the mod header actions allocation for the ct
      clear since the ct action will write to reg_c anyway after clearing it.
      
      Fixes: 806401c2 ("net/mlx5e: CT, Fix multiple allocations and memleak of mod acts")
      Signed-off-by: default avatarAriel Levkovich <lariel@nvidia.com>
      Reviewed-by: default avatarPaul Blakey <paulb@nvidia.com>
      Reviewed-by: default avatarRoi Dayan <roid@nvidia.com>
      Reviewed-by: default avatarMaor Dickman <maord@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      087032ee
    • Vlad Buslov's avatar
      net/mlx5e: Lag, Don't skip fib events on current dst · 4a2a664e
      Vlad Buslov authored
      Referenced change added check to skip updating fib when new fib instance
      has same or lower priority. However, new fib instance can be an update on
      same dst address as existing one even though the structure is another
      instance that has different address. Ignoring events on such instances
      causes multipath LAG state to not be correctly updated.
      
      Track 'dst' and 'dst_len' fields of fib event fib_entry_notifier_info
      structure and don't skip events that have the same value of that fields.
      
      Fixes: ad11c4f1 ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
      Signed-off-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: default avatarMaor Dickman <maord@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      4a2a664e
    • Vlad Buslov's avatar
      net/mlx5e: Lag, Fix fib_info pointer assignment · a6589155
      Vlad Buslov authored
      Referenced change incorrectly sets single path fib_info even when LAG is
      not active. Fix it by moving call to mlx5_lag_fib_set() into conditional
      that verifies LAG state.
      
      Fixes: ad11c4f1 ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
      Signed-off-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: default avatarMaor Dickman <maord@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      a6589155
    • Vlad Buslov's avatar
      net/mlx5e: Lag, Fix use-after-free in fib event handler · 27b0420f
      Vlad Buslov authored
      Recent commit that modified fib route event handler to handle events
      according to their priority introduced use-after-free[0] in mp->mfi pointer
      usage. The pointer now is not just cached in order to be compared to
      following fib_info instances, but is also dereferenced to obtain
      fib_priority. However, since mlx5 lag code doesn't hold the reference to
      fin_info during whole mp->mfi lifetime, it could be used after fib_info
      instance has already been freed be kernel infrastructure code.
      
      Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
      type and cache fib_info priority in dedicated integer. Group
      fib_info-related data into dedicated 'fib' structure that will be further
      extended by following patches in the series.
      
      [0]:
      
      [  203.588029] ==================================================================
      [  203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138
      
      [  203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G    B             5.17.0-rc7+ #6
      [  203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [  203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
      [  203.600053] Call Trace:
      [  203.600608]  <TASK>
      [  203.601110]  dump_stack_lvl+0x48/0x5e
      [  203.601860]  print_address_description.constprop.0+0x1f/0x160
      [  203.602950]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.604073]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.605177]  kasan_report.cold+0x83/0xdf
      [  203.605969]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.607102]  mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.608199]  ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
      [  203.609382]  ? read_word_at_a_time+0xe/0x20
      [  203.610463]  ? strscpy+0xa0/0x2a0
      [  203.611463]  process_one_work+0x722/0x1270
      [  203.612344]  worker_thread+0x540/0x11e0
      [  203.613136]  ? rescuer_thread+0xd50/0xd50
      [  203.613949]  kthread+0x26e/0x300
      [  203.614627]  ? kthread_complete_and_exit+0x20/0x20
      [  203.615542]  ret_from_fork+0x1f/0x30
      [  203.616273]  </TASK>
      
      [  203.617174] Allocated by task 3746:
      [  203.617874]  kasan_save_stack+0x1e/0x40
      [  203.618644]  __kasan_kmalloc+0x81/0xa0
      [  203.619394]  fib_create_info+0xb41/0x3c50
      [  203.620213]  fib_table_insert+0x190/0x1ff0
      [  203.621020]  fib_magic.isra.0+0x246/0x2e0
      [  203.621803]  fib_add_ifaddr+0x19f/0x670
      [  203.622563]  fib_inetaddr_event+0x13f/0x270
      [  203.623377]  blocking_notifier_call_chain+0xd4/0x130
      [  203.624355]  __inet_insert_ifa+0x641/0xb20
      [  203.625185]  inet_rtm_newaddr+0xc3d/0x16a0
      [  203.626009]  rtnetlink_rcv_msg+0x309/0x880
      [  203.626826]  netlink_rcv_skb+0x11d/0x340
      [  203.627626]  netlink_unicast+0x4cc/0x790
      [  203.628430]  netlink_sendmsg+0x762/0xc00
      [  203.629230]  sock_sendmsg+0xb2/0xe0
      [  203.629955]  ____sys_sendmsg+0x58a/0x770
      [  203.630756]  ___sys_sendmsg+0xd8/0x160
      [  203.631523]  __sys_sendmsg+0xb7/0x140
      [  203.632294]  do_syscall_64+0x35/0x80
      [  203.633045]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [  203.634427] Freed by task 0:
      [  203.635063]  kasan_save_stack+0x1e/0x40
      [  203.635844]  kasan_set_track+0x21/0x30
      [  203.636618]  kasan_set_free_info+0x20/0x30
      [  203.637450]  __kasan_slab_free+0xfc/0x140
      [  203.638271]  kfree+0x94/0x3b0
      [  203.638903]  rcu_core+0x5e4/0x1990
      [  203.639640]  __do_softirq+0x1ba/0x5d3
      
      [  203.640828] Last potentially related work creation:
      [  203.641785]  kasan_save_stack+0x1e/0x40
      [  203.642571]  __kasan_record_aux_stack+0x9f/0xb0
      [  203.643478]  call_rcu+0x88/0x9c0
      [  203.644178]  fib_release_info+0x539/0x750
      [  203.644997]  fib_table_delete+0x659/0xb80
      [  203.645809]  fib_magic.isra.0+0x1a3/0x2e0
      [  203.646617]  fib_del_ifaddr+0x93f/0x1300
      [  203.647415]  fib_inetaddr_event+0x9f/0x270
      [  203.648251]  blocking_notifier_call_chain+0xd4/0x130
      [  203.649225]  __inet_del_ifa+0x474/0xc10
      [  203.650016]  devinet_ioctl+0x781/0x17f0
      [  203.650788]  inet_ioctl+0x1ad/0x290
      [  203.651533]  sock_do_ioctl+0xce/0x1c0
      [  203.652315]  sock_ioctl+0x27b/0x4f0
      [  203.653058]  __x64_sys_ioctl+0x124/0x190
      [  203.653850]  do_syscall_64+0x35/0x80
      [  203.654608]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [  203.666952] The buggy address belongs to the object at ffff888144df2000
                      which belongs to the cache kmalloc-256 of size 256
      [  203.669250] The buggy address is located 80 bytes inside of
                      256-byte region [ffff888144df2000, ffff888144df2100)
      [  203.671332] The buggy address belongs to the page:
      [  203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
      [  203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
      [  203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
      [  203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
      [  203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
      [  203.679928] page dumped because: kasan: bad access detected
      
      [  203.681455] Memory state around the buggy address:
      [  203.682421]  ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  203.683863]  ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  203.686701]                                                  ^
      [  203.687820]  ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  203.689226]  ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  203.690620] ==================================================================
      
      Fixes: ad11c4f1 ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
      Signed-off-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: default avatarMaor Dickman <maord@nvidia.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      27b0420f
    • Mark Zhang's avatar
      net/mlx5e: Fix the calling of update_buffer_lossy() API · c4d963a5
      Mark Zhang authored
      The arguments of update_buffer_lossy() is in a wrong order. Fix it.
      
      Fixes: 88b3d5c9 ("net/mlx5e: Fix port buffers cell size value")
      Signed-off-by: default avatarMark Zhang <markzhang@nvidia.com>
      Reviewed-by: default avatarMaor Gottlieb <maorg@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      c4d963a5
    • Vlad Buslov's avatar
      net/mlx5e: Don't match double-vlan packets if cvlan is not set · ada09af9
      Vlad Buslov authored
      Currently, match VLAN rule also matches packets that have multiple VLAN
      headers. This behavior is similar to buggy flower classifier behavior that
      has recently been fixed. Fix the issue by matching on
      outer_second_cvlan_tag with value 0 which will cause the HW to verify the
      packet doesn't contain second vlan header.
      
      Fixes: 699e96dd ("net/mlx5e: Support offloading tc double vlan headers match")
      Signed-off-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: default avatarMaor Dickman <maord@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      ada09af9
    • Aya Levin's avatar
      net/mlx5: Fix slab-out-of-bounds while reading resource dump menu · 7ba2d9d8
      Aya Levin authored
      Resource dump menu may span over more than a single page, support it.
      Otherwise, menu read may result in a memory access violation: reading
      outside of the allocated page.
      Note that page format of the first menu page contains menu headers while
      the proceeding menu pages contain only records.
      
      The KASAN logs are as follows:
      BUG: KASAN: slab-out-of-bounds in strcmp+0x9b/0xb0
      Read of size 1 at addr ffff88812b2e1fd0 by task systemd-udevd/496
      
      CPU: 5 PID: 496 Comm: systemd-udevd Tainted: G    B  5.16.0_for_upstream_debug_2022_01_10_23_12 #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x57/0x7d
       print_address_description.constprop.0+0x1f/0x140
       ? strcmp+0x9b/0xb0
       ? strcmp+0x9b/0xb0
       kasan_report.cold+0x83/0xdf
       ? strcmp+0x9b/0xb0
       strcmp+0x9b/0xb0
       mlx5_rsc_dump_init+0x4ab/0x780 [mlx5_core]
       ? mlx5_rsc_dump_destroy+0x80/0x80 [mlx5_core]
       ? lockdep_hardirqs_on_prepare+0x286/0x400
       ? raw_spin_unlock_irqrestore+0x47/0x50
       ? aomic_notifier_chain_register+0x32/0x40
       mlx5_load+0x104/0x2e0 [mlx5_core]
       mlx5_init_one+0x41b/0x610 [mlx5_core]
       ....
      The buggy address belongs to the object at ffff88812b2e0000
       which belongs to the cache kmalloc-4k of size 4096
      The buggy address is located 4048 bytes to the right of
       4096-byte region [ffff88812b2e0000, ffff88812b2e1000)
      The buggy address belongs to the page:
      page:000000009d69807a refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88812b2e6000 pfn:0x12b2e0
      head:000000009d69807a order:3 compound_mapcount:0 compound_pincount:0
      flags: 0x8000000000010200(slab|head|zone=2)
      raw: 8000000000010200 0000000000000000 dead000000000001 ffff888100043040
      raw: ffff88812b2e6000 0000000080040000 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88812b2e1e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88812b2e1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff88812b2e1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                       ^
       ffff88812b2e2000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88812b2e2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      
      Fixes: 12206b17 ("net/mlx5: Add support for resource dump")
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      7ba2d9d8
    • Ariel Levkovich's avatar
      net/mlx5e: Fix wrong source vport matching on tunnel rule · cb0d54cb
      Ariel Levkovich authored
      When OVS internal port is the vtep device, the first decap
      rule is matching on the internal port's vport metadata value
      and then changes the metadata to be the uplink's value.
      
      Therefore, following rules on the tunnel, in chain > 0, should
      avoid matching on internal port metadata and use the uplink
      vport metadata instead.
      
      Select the uplink's metadata value for the source vport match
      in case the rule is in chain greater than zero, even if the tunnel
      route device is internal port.
      
      Fixes: 166f431e ("net/mlx5e: Add indirect tc offload of ovs internal port")
      Signed-off-by: default avatarAriel Levkovich <lariel@nvidia.com>
      Reviewed-by: default avatarMaor Dickman <maord@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      cb0d54cb
    • Jakub Kicinski's avatar
      Merge branch 'bnxt_en-bug-fixes' · 0a806ecc
      Jakub Kicinski authored
      Michael Chan says:
      
      ====================
      bnxt_en: Bug fixes
      
      This patch series includes 3 fixes:
       - Fix an occasional VF open failure.
       - Fix a PTP spinlock usage before initialization
       - Fix unnecesary RX packet drops under high TX traffic load.
      ====================
      
      Link: https://lore.kernel.org/r/1651540392-2260-1-git-send-email-michael.chan@broadcom.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      0a806ecc
    • Michael Chan's avatar
      bnxt_en: Fix unnecessary dropping of RX packets · 195af579
      Michael Chan authored
      In bnxt_poll_p5(), we first check cpr->has_more_work.  If it is true,
      we are in NAPI polling mode and we will call __bnxt_poll_cqs() to
      continue polling.  It is possible to exhanust the budget again when
      __bnxt_poll_cqs() returns.
      
      We then enter the main while loop to check for new entries in the NQ.
      If we had previously exhausted the NAPI budget, we may call
      __bnxt_poll_work() to process an RX entry with zero budget.  This will
      cause packets to be dropped unnecessarily, thinking that we are in the
      netpoll path.  Fix it by breaking out of the while loop if we need
      to process an RX NQ entry with no budget left.  We will then exit
      NAPI and stay in polling mode.
      
      Fixes: 389a877a ("bnxt_en: Process the NQ under NAPI continuous polling.")
      Reviewed-by: default avatarAndy Gospodarek <andrew.gospodarek@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      195af579
    • Michael Chan's avatar
      bnxt_en: Initiallize bp->ptp_lock first before using it · 2b156fb5
      Michael Chan authored
      bnxt_ptp_init() calls bnxt_ptp_init_rtc() which will acquire the ptp_lock
      spinlock.  The spinlock is not initialized until later.  Move the
      bnxt_ptp_init_rtc() call after the spinlock is initialized.
      
      Fixes: 24ac1ecd ("bnxt_en: Add driver support to use Real Time Counter for PTP")
      Reviewed-by: default avatarPavan Chebbi <pavan.chebbi@broadcom.com>
      Reviewed-by: default avatarSaravanan Vajravel <saravanan.vajravel@broadcom.com>
      Reviewed-by: default avatarAndy Gospodarek <andrew.gospodarek@broadcom.com>
      Reviewed-by: default avatarSomnath Kotur <somnath.kotur@broadcom.com>
      Reviewed-by: default avatarDamodharam Ammepalli <damodharam.ammepalli@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      2b156fb5
    • Somnath Kotur's avatar
      bnxt_en: Fix possible bnxt_open() failure caused by wrong RFS flag · 13ba7943
      Somnath Kotur authored
      bnxt_open() can fail in this code path, especially on a VF when
      it fails to reserve default rings:
      
      bnxt_open()
        __bnxt_open_nic()
          bnxt_clear_int_mode()
          bnxt_init_dflt_ring_mode()
      
      RX rings would be set to 0 when we hit this error path.
      
      It is possible for a subsequent bnxt_open() call to potentially succeed
      with a code path like this:
      
      bnxt_open()
        bnxt_hwrm_if_change()
          bnxt_fw_init_one()
            bnxt_fw_init_one_p3()
              bnxt_set_dflt_rfs()
                bnxt_rfs_capable()
                  bnxt_hwrm_reserve_rings()
      
      On older chips, RFS is capable if we can reserve the number of vnics that
      is equal to RX rings + 1.  But since RX rings is still set to 0 in this
      code path, we may mistakenly think that RFS is supported for 0 RX rings.
      
      Later, when the default RX rings are reserved and we try to enable
      RFS, it would fail and cause bnxt_open() to fail unnecessarily.
      
      We fix this in 2 places.  bnxt_rfs_capable() will always return false if
      RX rings is not yet set.  bnxt_init_dflt_ring_mode() will call
      bnxt_set_dflt_rfs() which will always clear the RFS flags if RFS is not
      supported.
      
      Fixes: 20d7d1c5 ("bnxt_en: reliably allocate IRQ table on reset to avoid crash")
      Signed-off-by: default avatarSomnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      13ba7943
  3. 03 May, 2022 9 commits
  4. 01 May, 2022 6 commits
    • Russell King (Oracle)'s avatar
      net: dsa: b53: convert to phylink_pcs · 79396934
      Russell King (Oracle) authored
      Convert B53 to use phylink_pcs for the serdes rather than hooking it
      into the MAC-layer callbacks.
      
      Fixes: 81c1681c ("net: dsa: b53: mark as non-legacy")
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Tested-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      79396934
    • Thomas Gleixner's avatar
      pci_irq_vector() can't be used in atomic context any longer. This conflicts · 6b292a04
      Thomas Gleixner authored
      with the usage of this function in nic_mbx_intr_handler().
      
      Cache the Linux interrupt numbers in struct nicpf and use that cache in the
      interrupt handler to select the mailbox.
      
      Fixes: 495c66ac ("genirq/msi: Convert to new functions")
      Reported-by: default avatarOndrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Sunil Goutham <sgoutham@marvell.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=2041772Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6b292a04
    • David S. Miller's avatar
      Merge branch 'nfc-fixes' · b6693611
      David S. Miller authored
      Duoming Zhou says:
      
      ====================
      Replace improper checks and fix bugs in nfc subsystem
      
      The first patch is used to replace improper checks in netlink related
      functions of nfc core, the second patch is used to fix bugs in
      nfcmrvl driver.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b6693611
    • Duoming Zhou's avatar
      nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs · d270453a
      Duoming Zhou authored
      There are destructive operations such as nfcmrvl_fw_dnld_abort and
      gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
      gpio and so on could be destructed while the upper layer functions such as
      nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
      to double-free, use-after-free and null-ptr-deref bugs.
      
      There are three situations that could lead to double-free bugs.
      
      The first situation is shown below:
      
         (Thread 1)                 |      (Thread 2)
      nfcmrvl_fw_dnld_start         |
       ...                          |  nfcmrvl_nci_unregister_dev
       release_firmware()           |   nfcmrvl_fw_dnld_abort
        kfree(fw) //(1)             |    fw_dnld_over
                                    |     release_firmware
        ...                         |      kfree(fw) //(2)
                                    |     ...
      
      The second situation is shown below:
      
         (Thread 1)                 |      (Thread 2)
      nfcmrvl_fw_dnld_start         |
       ...                          |
       mod_timer                    |
       (wait a time)                |
       fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
         fw_dnld_over               |   nfcmrvl_fw_dnld_abort
          release_firmware          |    fw_dnld_over
           kfree(fw) //(1)          |     release_firmware
           ...                      |      kfree(fw) //(2)
      
      The third situation is shown below:
      
             (Thread 1)               |       (Thread 2)
      nfcmrvl_nci_recv_frame          |
       if(..->fw_download_in_progress)|
        nfcmrvl_fw_dnld_recv_frame    |
         queue_work                   |
                                      |
      fw_dnld_rx_work                 | nfcmrvl_nci_unregister_dev
       fw_dnld_over                   |  nfcmrvl_fw_dnld_abort
        release_firmware              |   fw_dnld_over
         kfree(fw) //(1)              |    release_firmware
                                      |     kfree(fw) //(2)
      
      The firmware struct is deallocated in position (1) and deallocated
      in position (2) again.
      
      The crash trace triggered by POC is like below:
      
      BUG: KASAN: double-free or invalid-free in fw_dnld_over
      Call Trace:
        kfree
        fw_dnld_over
        nfcmrvl_nci_unregister_dev
        nci_uart_tty_close
        tty_ldisc_kill
        tty_ldisc_hangup
        __tty_hangup.part.0
        tty_release
        ...
      
      What's more, there are also use-after-free and null-ptr-deref bugs
      in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or
      set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev,
      then, we dereference firmware, gpio or the members of priv->fw_dnld in
      nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.
      
      This patch reorders destructive operations after nci_unregister_device
      in order to synchronize between cleanup routine and firmware download
      routine.
      
      The nci_unregister_device is well synchronized. If the device is
      detaching, the firmware download routine will goto error. If firmware
      download routine is executing, nci_unregister_device will wait until
      firmware download routine is finished.
      
      Fixes: 3194c687 ("NFC: nfcmrvl: add firmware download support")
      Signed-off-by: default avatarDuoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d270453a
    • Duoming Zhou's avatar
      nfc: replace improper check device_is_registered() in netlink related functions · da5c0f11
      Duoming Zhou authored
      The device_is_registered() in nfc core is used to check whether
      nfc device is registered in netlink related functions such as
      nfc_fw_download(), nfc_dev_up() and so on. Although device_is_registered()
      is protected by device_lock, there is still a race condition between
      device_del() and device_is_registered(). The root cause is that
      kobject_del() in device_del() is not protected by device_lock.
      
         (cleanup task)         |     (netlink task)
                                |
      nfc_unregister_device     | nfc_fw_download
       device_del               |  device_lock
        ...                     |   if (!device_is_registered)//(1)
        kobject_del//(2)        |   ...
       ...                      |  device_unlock
      
      The device_is_registered() returns the value of state_in_sysfs and
      the state_in_sysfs is set to zero in kobject_del(). If we pass check in
      position (1), then set zero in position (2). As a result, the check
      in position (1) is useless.
      
      This patch uses bool variable instead of device_is_registered() to judge
      whether the nfc device is registered, which is well synchronized.
      
      Fixes: 3e256b8f ("NFC: add nfc subsystem core")
      Signed-off-by: default avatarDuoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      da5c0f11
    • Tan Tee Min's avatar
      net: stmmac: disable Split Header (SPH) for Intel platforms · 47f753c1
      Tan Tee Min authored
      Based on DesignWare Ethernet QoS datasheet, we are seeing the limitation
      of Split Header (SPH) feature is not supported for Ipv4 fragmented packet.
      This SPH limitation will cause ping failure when the packets size exceed
      the MTU size. For example, the issue happens once the basic ping packet
      size is larger than the configured MTU size and the data is lost inside
      the fragmented packet, replaced by zeros/corrupted values, and leads to
      ping fail.
      
      So, disable the Split Header for Intel platforms.
      
      v2: Add fixes tag in commit message.
      
      Fixes: 67afd6d1("net: stmmac: Add Split Header support and enable it in XGMAC cores")
      Cc: <stable@vger.kernel.org> # 5.10.x
      Suggested-by: default avatarOng, Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: default avatarMohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
      Signed-off-by: default avatarWong Vee Khee <vee.khee.wong@linux.intel.com>
      Signed-off-by: default avatarTan Tee Min <tee.min.tan@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      47f753c1
  5. 30 Apr, 2022 2 commits
    • Eric Dumazet's avatar
      mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter() · a9384a4c
      Eric Dumazet authored
      Whenever RCU protected list replaces an object,
      the pointer to the new object needs to be updated
      _before_ the call to kfree_rcu() or call_rcu()
      
      Also ip6_mc_msfilter() needs to update the pointer
      before releasing the mc_lock mutex.
      
      Note that linux-5.13 was supporting kfree_rcu(NULL, rcu),
      so this fix does not need the conditional test I was
      forced to use in the equivalent patch for IPv4.
      
      Fixes: 882ba1f7 ("mld: convert ipv6_mc_socklist->sflist to RCU")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a9384a4c
    • Eric Dumazet's avatar
      net: igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter() · dba5bdd5
      Eric Dumazet authored
      syzbot reported an UAF in ip_mc_sf_allow() [1]
      
      Whenever RCU protected list replaces an object,
      the pointer to the new object needs to be updated
      _before_ the call to kfree_rcu() or call_rcu()
      
      Because kfree_rcu(ptr, rcu) got support for NULL ptr
      only recently in commit 12edff04 ("rcu: Make kfree_rcu()
      ignore NULL pointers"), I chose to use the conditional
      to make sure stable backports won't miss this detail.
      
      if (psl)
          kfree_rcu(psl, rcu);
      
      net/ipv6/mcast.c has similar issues, addressed in a separate patch.
      
      [1]
      BUG: KASAN: use-after-free in ip_mc_sf_allow+0x6bb/0x6d0 net/ipv4/igmp.c:2655
      Read of size 4 at addr ffff88807d37b904 by task syz-executor.5/908
      
      CPU: 0 PID: 908 Comm: syz-executor.5 Not tainted 5.18.0-rc4-syzkaller-00064-g8f4dd166 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0xeb/0x467 mm/kasan/report.c:313
       print_report mm/kasan/report.c:429 [inline]
       kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
       ip_mc_sf_allow+0x6bb/0x6d0 net/ipv4/igmp.c:2655
       raw_v4_input net/ipv4/raw.c:190 [inline]
       raw_local_deliver+0x4d1/0xbe0 net/ipv4/raw.c:218
       ip_protocol_deliver_rcu+0xcf/0xb30 net/ipv4/ip_input.c:193
       ip_local_deliver_finish+0x2ee/0x4c0 net/ipv4/ip_input.c:233
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ip_local_deliver+0x1b3/0x200 net/ipv4/ip_input.c:254
       dst_input include/net/dst.h:461 [inline]
       ip_rcv_finish+0x1cb/0x2f0 net/ipv4/ip_input.c:437
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ip_rcv+0xaa/0xd0 net/ipv4/ip_input.c:556
       __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5405
       __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5519
       netif_receive_skb_internal net/core/dev.c:5605 [inline]
       netif_receive_skb+0x13e/0x8e0 net/core/dev.c:5664
       tun_rx_batched.isra.0+0x460/0x720 drivers/net/tun.c:1534
       tun_get_user+0x28b7/0x3e30 drivers/net/tun.c:1985
       tun_chr_write_iter+0xdb/0x200 drivers/net/tun.c:2015
       call_write_iter include/linux/fs.h:2050 [inline]
       new_sync_write+0x38a/0x560 fs/read_write.c:504
       vfs_write+0x7c0/0xac0 fs/read_write.c:591
       ksys_write+0x127/0x250 fs/read_write.c:644
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f3f12c3bbff
      Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 99 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 cc fd ff ff 48
      RSP: 002b:00007f3f13ea9130 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007f3f12d9bf60 RCX: 00007f3f12c3bbff
      RDX: 0000000000000036 RSI: 0000000020002ac0 RDI: 00000000000000c8
      RBP: 00007f3f12ce308d R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000036 R11: 0000000000000293 R12: 0000000000000000
      R13: 00007fffb68dd79f R14: 00007f3f13ea9300 R15: 0000000000022000
       </TASK>
      
      Allocated by task 908:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:45 [inline]
       set_alloc_info mm/kasan/common.c:436 [inline]
       ____kasan_kmalloc mm/kasan/common.c:515 [inline]
       ____kasan_kmalloc mm/kasan/common.c:474 [inline]
       __kasan_kmalloc+0xa6/0xd0 mm/kasan/common.c:524
       kasan_kmalloc include/linux/kasan.h:234 [inline]
       __do_kmalloc mm/slab.c:3710 [inline]
       __kmalloc+0x209/0x4d0 mm/slab.c:3719
       kmalloc include/linux/slab.h:586 [inline]
       sock_kmalloc net/core/sock.c:2501 [inline]
       sock_kmalloc+0xb5/0x100 net/core/sock.c:2492
       ip_mc_source+0xba2/0x1100 net/ipv4/igmp.c:2392
       do_ip_setsockopt net/ipv4/ip_sockglue.c:1296 [inline]
       ip_setsockopt+0x2312/0x3ab0 net/ipv4/ip_sockglue.c:1432
       raw_setsockopt+0x274/0x2c0 net/ipv4/raw.c:861
       __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Freed by task 753:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       kasan_set_track+0x21/0x30 mm/kasan/common.c:45
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free+0x13d/0x180 mm/kasan/common.c:328
       kasan_slab_free include/linux/kasan.h:200 [inline]
       __cache_free mm/slab.c:3439 [inline]
       kmem_cache_free_bulk+0x69/0x460 mm/slab.c:3774
       kfree_bulk include/linux/slab.h:437 [inline]
       kfree_rcu_work+0x51c/0xa10 kernel/rcu/tree.c:3318
       process_one_work+0x996/0x1610 kernel/workqueue.c:2289
       worker_thread+0x665/0x1080 kernel/workqueue.c:2436
       kthread+0x2e9/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
      
      Last potentially related work creation:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       __kasan_record_aux_stack+0x7e/0x90 mm/kasan/generic.c:348
       kvfree_call_rcu+0x74/0x990 kernel/rcu/tree.c:3595
       ip_mc_msfilter+0x712/0xb60 net/ipv4/igmp.c:2510
       do_ip_setsockopt net/ipv4/ip_sockglue.c:1257 [inline]
       ip_setsockopt+0x32e1/0x3ab0 net/ipv4/ip_sockglue.c:1432
       raw_setsockopt+0x274/0x2c0 net/ipv4/raw.c:861
       __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Second to last potentially related work creation:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       __kasan_record_aux_stack+0x7e/0x90 mm/kasan/generic.c:348
       call_rcu+0x99/0x790 kernel/rcu/tree.c:3074
       mpls_dev_notify+0x552/0x8a0 net/mpls/af_mpls.c:1656
       notifier_call_chain+0xb5/0x200 kernel/notifier.c:84
       call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1938
       call_netdevice_notifiers_extack net/core/dev.c:1976 [inline]
       call_netdevice_notifiers net/core/dev.c:1990 [inline]
       unregister_netdevice_many+0x92e/0x1890 net/core/dev.c:10751
       default_device_exit_batch+0x449/0x590 net/core/dev.c:11245
       ops_exit_list+0x125/0x170 net/core/net_namespace.c:167
       cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:594
       process_one_work+0x996/0x1610 kernel/workqueue.c:2289
       worker_thread+0x665/0x1080 kernel/workqueue.c:2436
       kthread+0x2e9/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
      
      The buggy address belongs to the object at ffff88807d37b900
       which belongs to the cache kmalloc-64 of size 64
      The buggy address is located 4 bytes inside of
       64-byte region [ffff88807d37b900, ffff88807d37b940)
      
      The buggy address belongs to the physical page:
      page:ffffea0001f4dec0 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88807d37b180 pfn:0x7d37b
      flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000000200 ffff888010c41340 ffffea0001c795c8 ffff888010c40200
      raw: ffff88807d37b180 ffff88807d37b000 000000010000001f 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x342040(__GFP_IO|__GFP_NOWARN|__GFP_COMP|__GFP_HARDWALL|__GFP_THISNODE), pid 2963, tgid 2963 (udevd), ts 139732238007, free_ts 139730893262
       prep_new_page mm/page_alloc.c:2441 [inline]
       get_page_from_freelist+0xba2/0x3e00 mm/page_alloc.c:4182
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5408
       __alloc_pages_node include/linux/gfp.h:587 [inline]
       kmem_getpages mm/slab.c:1378 [inline]
       cache_grow_begin+0x75/0x350 mm/slab.c:2584
       cache_alloc_refill+0x27f/0x380 mm/slab.c:2957
       ____cache_alloc mm/slab.c:3040 [inline]
       ____cache_alloc mm/slab.c:3023 [inline]
       __do_cache_alloc mm/slab.c:3267 [inline]
       slab_alloc mm/slab.c:3309 [inline]
       __do_kmalloc mm/slab.c:3708 [inline]
       __kmalloc+0x3b3/0x4d0 mm/slab.c:3719
       kmalloc include/linux/slab.h:586 [inline]
       kzalloc include/linux/slab.h:714 [inline]
       tomoyo_encode2.part.0+0xe9/0x3a0 security/tomoyo/realpath.c:45
       tomoyo_encode2 security/tomoyo/realpath.c:31 [inline]
       tomoyo_encode+0x28/0x50 security/tomoyo/realpath.c:80
       tomoyo_realpath_from_path+0x186/0x620 security/tomoyo/realpath.c:288
       tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
       tomoyo_path_perm+0x21b/0x400 security/tomoyo/file.c:822
       security_inode_getattr+0xcf/0x140 security/security.c:1350
       vfs_getattr fs/stat.c:157 [inline]
       vfs_statx+0x16a/0x390 fs/stat.c:232
       vfs_fstatat+0x8c/0xb0 fs/stat.c:255
       __do_sys_newfstatat+0x91/0x110 fs/stat.c:425
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1356 [inline]
       free_pcp_prepare+0x549/0xd20 mm/page_alloc.c:1406
       free_unref_page_prepare mm/page_alloc.c:3328 [inline]
       free_unref_page+0x19/0x6a0 mm/page_alloc.c:3423
       __vunmap+0x85d/0xd30 mm/vmalloc.c:2667
       __vfree+0x3c/0xd0 mm/vmalloc.c:2715
       vfree+0x5a/0x90 mm/vmalloc.c:2746
       __do_replace+0x16b/0x890 net/ipv6/netfilter/ip6_tables.c:1117
       do_replace net/ipv6/netfilter/ip6_tables.c:1157 [inline]
       do_ip6t_set_ctl+0x90d/0xb90 net/ipv6/netfilter/ip6_tables.c:1639
       nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101
       ipv6_setsockopt+0x122/0x180 net/ipv6/ipv6_sockglue.c:1026
       tcp_setsockopt+0x136/0x2520 net/ipv4/tcp.c:3696
       __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Memory state around the buggy address:
       ffff88807d37b800: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
       ffff88807d37b880: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
      >ffff88807d37b900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                         ^
       ffff88807d37b980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff88807d37ba00: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
      
      Fixes: c85bb41e ("igmp: fix ip_mc_sf_allow race [v5]")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Flavio Leitner <fbl@sysclose.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dba5bdd5