1. 17 Jul, 2023 15 commits
    • iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies · d1639a17
      Ahmed Zaki authored
      A driver's lock (crit_lock) is used to serialize all the driver's tasks.
      Lockdep, however, shows a circular dependency between rtnl and
      crit_lock. This happens when an ndo that already holds the rtnl requests
      the driver to reset, since the reset task (in some paths) tries to grab
      rtnl to either change the real number of queues or update netdev features.
      
        [566.241851] ======================================================
        [566.241893] WARNING: possible circular locking dependency detected
        [566.241936] 6.2.14-100.fc36.x86_64+debug #1 Tainted: G           OE
        [566.241984] ------------------------------------------------------
        [566.242025] repro.sh/2604 is trying to acquire lock:
        [566.242061] ffff9280fc5ceee8 (&adapter->crit_lock){+.+.}-{3:3}, at: iavf_close+0x3c/0x240 [iavf]
        [566.242167]
                     but task is already holding lock:
        [566.242209] ffffffff9976d350 (rtnl_mutex){+.+.}-{3:3}, at: iavf_remove+0x6b5/0x730 [iavf]
        [566.242300]
                     which lock already depends on the new lock.
      
        [566.242353]
                     the existing dependency chain (in reverse order) is:
        [566.242401]
                     -> #1 (rtnl_mutex){+.+.}-{3:3}:
        [566.242451]        __mutex_lock+0xc1/0xbb0
        [566.242489]        iavf_init_interrupt_scheme+0x179/0x440 [iavf]
        [566.242560]        iavf_watchdog_task+0x80b/0x1400 [iavf]
        [566.242627]        process_one_work+0x2b3/0x560
        [566.242663]        worker_thread+0x4f/0x3a0
        [566.242696]        kthread+0xf2/0x120
        [566.242730]        ret_from_fork+0x29/0x50
        [566.242763]
                     -> #0 (&adapter->crit_lock){+.+.}-{3:3}:
        [566.242815]        __lock_acquire+0x15ff/0x22b0
        [566.242869]        lock_acquire+0xd2/0x2c0
        [566.242901]        __mutex_lock+0xc1/0xbb0
        [566.242934]        iavf_close+0x3c/0x240 [iavf]
        [566.242997]        __dev_close_many+0xac/0x120
        [566.243036]        dev_close_many+0x8b/0x140
        [566.243071]        unregister_netdevice_many_notify+0x165/0x7c0
        [566.243116]        unregister_netdevice_queue+0xd3/0x110
        [566.243157]        iavf_remove+0x6c1/0x730 [iavf]
        [566.243217]        pci_device_remove+0x33/0xa0
        [566.243257]        device_release_driver_internal+0x1bc/0x240
        [566.243299]        pci_stop_bus_device+0x6c/0x90
        [566.243338]        pci_stop_and_remove_bus_device+0xe/0x20
        [566.243380]        pci_iov_remove_virtfn+0xd1/0x130
        [566.243417]        sriov_disable+0x34/0xe0
        [566.243448]        ice_free_vfs+0x2da/0x330 [ice]
        [566.244383]        ice_sriov_configure+0x88/0xad0 [ice]
        [566.245353]        sriov_numvfs_store+0xde/0x1d0
        [566.246156]        kernfs_fop_write_iter+0x15e/0x210
        [566.246921]        vfs_write+0x288/0x530
        [566.247671]        ksys_write+0x74/0xf0
        [566.248408]        do_syscall_64+0x58/0x80
        [566.249145]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
        [566.249886]
                       other info that might help us debug this:
      
        [566.252014]  Possible unsafe locking scenario:
      
        [566.253432]        CPU0                    CPU1
        [566.254118]        ----                    ----
        [566.254800]   lock(rtnl_mutex);
        [566.255514]                                lock(&adapter->crit_lock);
        [566.256233]                                lock(rtnl_mutex);
        [566.256897]   lock(&adapter->crit_lock);
        [566.257388]
                        *** DEADLOCK ***
      
      The deadlock can be triggered by a script that is continuously resetting
      the VF adapter while doing other operations requiring RTNL, e.g.:
      
      	while :; do
      		ip link set $VF up
      		ethtool --set-channels $VF combined 2
      		ip link set $VF down
      		ip link set $VF up
      		ethtool --set-channels $VF combined 4
      		ip link set $VF down
      	done
      
      Any operation that triggers a reset can substitute "ethtool --set-channels".
      
      As a fix, add a new task "finish_config" that does all the work that
      needs the rtnl lock. With the exception of iavf_remove(), all work that
      requires rtnl should be called from this task.
      
      As for iavf_remove(), at the point where we need to call
      unregister_netdevice() (and grab rtnl_lock), we make sure the finish_config
      task is not running (cancel_work_sync()) to safely grab rtnl. Subsequent
      finish_config work cannot restart after that since the task is guarded
      by the __IAVF_IN_REMOVE_TASK bit in iavf_schedule_finish_config().
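
      The guard described above can be sketched as a small user-space model.
      This is only an illustration under stated assumptions: the struct, the
      field names and the model_* functions are hypothetical, the ordering
      (set the bit, then cancel) is assumed, and no real workqueue or rtnl
      lock is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; these are not the driver's real data structures. */
struct remove_model {
    bool in_remove;              /* stands in for __IAVF_IN_REMOVE_TASK */
    bool finish_config_pending;  /* stands in for queued finish_config work */
};

/* Queue the deferred rtnl work unless removal has already started. */
static bool model_schedule_finish_config(struct remove_model *a)
{
    assert(a != NULL);
    if (a->in_remove)
        return false;
    a->finish_config_pending = true;
    return true;
}

/* Removal path: raise the guard, then drop whatever is still queued. */
static void model_begin_remove(struct remove_model *a)
{
    assert(a != NULL);
    a->in_remove = true;
    a->finish_config_pending = false;  /* models cancel_work_sync() */
}
```

      Once model_begin_remove() has run, later scheduling attempts are
      refused, which is the property that lets the removal path take rtnl
      without racing against the deferred task.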
      
      Fixes: 5ac49f3c ("iavf: use mutexes for locking of critical sections")
      Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • Revert "iavf: Do not restart Tx queues after reset task failure" · d916d273
      Marcin Szycik authored
      This reverts commit 08f1c147.
      
      The netdev is no longer detached during reset, so this fix can be
      reverted. We keep the removal of the "hacky" IFF_UP flag update.
      Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • Revert "iavf: Detach device during reset task" · d2806d96
      Marcin Szycik authored
      This reverts commit aa626da9.
      
      Detaching the device during reset did not fully fix the rtnl locking
      issue, as a callback could already be in progress before the netdev
      was detached.

      Furthermore, detaching the netdevice causes TX timeouts if traffic is running.
      To reproduce:
      
      ip netns exec ns1 iperf3 -c $PEER_IP -t 600 --logfile /dev/null &
      while :; do
              for i in 200 7000 400 5000 300 3000 ; do
                      ip netns exec ns1 ip link set $VF1 mtu $i
                      sleep 2
              done
              sleep 10
      done
      
      Currently, callbacks such as iavf_change_mtu() wait for the reset.
      If the reset fails to acquire the rtnl_lock, they schedule the netdev
      update for later while continuing the reset flow. Operations like MTU
      changes are performed under the rtnl_lock. Therefore, when the operation
      finishes, another callback that uses rtnl_lock can start.
      Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com>
      Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • iavf: Wait for reset in callbacks which trigger it · c2ed2403
      Marcin Szycik authored
      Adding the interface to bonding right after changing its MTU
      failed. The bonding driver was unable to open the interface
      because the interface was still in the __RESETTING state
      caused by the MTU change.
      
      Add new reset_waitqueue to indicate that reset has finished.
      
      Add waiting for reset to finish in callbacks which trigger hw reset:
      iavf_set_priv_flags(), iavf_change_mtu() and iavf_set_ringparam().
      We use a 5000ms timeout period because on Hyper-V based systems,
      this operation takes around 3000-4000ms. In normal circumstances,
      it doesn't take more than 500ms to complete.
      
      Add a function iavf_wait_for_reset() to reuse waiting for reset code and
      use it also in iavf_set_channels(), which already waits for reset.
      We don't use error handling in iavf_set_channels() as this could
      leave the device in an incorrect state if the reset was scheduled
      but hit the timeout or the waiting function was interrupted by a signal.
      
      Fixes: 4e5e6b5d ("iavf: Fix return of set the new channel count")
      Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
      Co-developed-by: Dawid Wesierski <dawidx.wesierski@intel.com>
      Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com>
      Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
      Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • iavf: use internal state to free traffic IRQs · a77ed5c5
      Ahmed Zaki authored
      If the system tries to close the netdev while iavf_reset_task() is
      running, __LINK_STATE_START will be cleared and netif_running() will
      return false in iavf_reinit_interrupt_scheme(). This will result in
      iavf_free_traffic_irqs() not being called and a leak as follows:
      
          [7632.489326] remove_proc_entry: removing non-empty directory 'irq/999', leaking at least 'iavf-enp24s0f0v0-TxRx-0'
          [7632.490214] WARNING: CPU: 0 PID: 10 at fs/proc/generic.c:718 remove_proc_entry+0x19b/0x1b0
      
      is shown when pci_disable_msix() is later called. Fix by using the
      internal adapter state. The traffic IRQs will always exist if
      state == __IAVF_RUNNING.
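
      A minimal sketch of the difference between the two checks, as a
      user-space model. The struct, enum and model_* functions are
      hypothetical names chosen for illustration; the model only captures
      that a concurrent close can clear the netdev flag while the driver's
      own state still says the traffic IRQs exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_DOWN, MODEL_RUNNING };

/* Hypothetical model of the fields involved in the decision. */
struct irq_model {
    enum model_state state;   /* driver-internal adapter state */
    bool netif_running;       /* what netif_running() would report */
    bool traffic_irqs_held;   /* true while traffic IRQs are requested */
};

/* Old check: trusts the netdev flag, which a concurrent close may clear. */
static void model_reinit_by_netdev_flag(struct irq_model *a)
{
    assert(a != NULL);
    if (a->netif_running)
        a->traffic_irqs_held = false;  /* models iavf_free_traffic_irqs() */
}

/* Fixed check: relies on the internal state only. */
static void model_reinit_by_internal_state(struct irq_model *a)
{
    assert(a != NULL);
    if (a->state == MODEL_RUNNING)
        a->traffic_irqs_held = false;  /* models iavf_free_traffic_irqs() */
}
```

      With state RUNNING and the netdev flag already cleared, the first
      function leaves the IRQs requested (the leak), while the second frees
      them.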
      
      Fixes: 5b36e8d0 ("i40evf: Enable VF to request an alternate queue allocation")
      Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • iavf: Fix out-of-bounds when setting channels on remove · 7c4bced3
      Ding Hui authored
      If we increase the number of channels during iavf_remove() and waiting
      for the reset times out, the call returns an error but has already
      changed num_active_queues directly. This leads to an OOB access like
      the one in the following logs, because num_active_queues is greater
      than the number of tx/rx_rings[] actually allocated.
      
      Reproducer:
      
        [root@host ~]# cat repro.sh
        #!/bin/bash
      
        pf_dbsf="0000:41:00.0"
        vf0_dbsf="0000:41:02.0"
        g_pids=()
      
        function do_set_numvf()
        {
            echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
            sleep $((RANDOM%3+1))
            echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
            sleep $((RANDOM%3+1))
        }
      
        function do_set_channel()
        {
            local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/)
            [ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; }
            ifconfig $nic 192.168.18.5 netmask 255.255.255.0
            ifconfig $nic up
            ethtool -L $nic combined 1
            ethtool -L $nic combined 4
            sleep $((RANDOM%3))
        }
      
        function on_exit()
        {
            local pid
            for pid in "${g_pids[@]}"; do
                kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null
            done
            g_pids=()
        }
      
        trap "on_exit; exit" EXIT
      
        while :; do do_set_numvf ; done &
        g_pids+=($!)
        while :; do do_set_channel ; done &
        g_pids+=($!)
      
        wait
      
      Result:
      
      [ 3506.152887] iavf 0000:41:02.0: Removing device
      [ 3510.400799] ==================================================================
      [ 3510.400820] BUG: KASAN: slab-out-of-bounds in iavf_free_all_tx_resources+0x156/0x160 [iavf]
      [ 3510.400823] Read of size 8 at addr ffff88b6f9311008 by task repro.sh/55536
      [ 3510.400823]
      [ 3510.400830] CPU: 101 PID: 55536 Comm: repro.sh Kdump: loaded Tainted: G           O     --------- -t - 4.18.0 #1
      [ 3510.400832] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021
      [ 3510.400835] Call Trace:
      [ 3510.400851]  dump_stack+0x71/0xab
      [ 3510.400860]  print_address_description+0x6b/0x290
      [ 3510.400865]  ? iavf_free_all_tx_resources+0x156/0x160 [iavf]
      [ 3510.400868]  kasan_report+0x14a/0x2b0
      [ 3510.400873]  iavf_free_all_tx_resources+0x156/0x160 [iavf]
      [ 3510.400880]  iavf_remove+0x2b6/0xc70 [iavf]
      [ 3510.400884]  ? iavf_free_all_rx_resources+0x160/0x160 [iavf]
      [ 3510.400891]  ? wait_woken+0x1d0/0x1d0
      [ 3510.400895]  ? notifier_call_chain+0xc1/0x130
      [ 3510.400903]  pci_device_remove+0xa8/0x1f0
      [ 3510.400910]  device_release_driver_internal+0x1c6/0x460
      [ 3510.400916]  pci_stop_bus_device+0x101/0x150
      [ 3510.400919]  pci_stop_and_remove_bus_device+0xe/0x20
      [ 3510.400924]  pci_iov_remove_virtfn+0x187/0x420
      [ 3510.400927]  ? pci_iov_add_virtfn+0xe10/0xe10
      [ 3510.400929]  ? pci_get_subsys+0x90/0x90
      [ 3510.400932]  sriov_disable+0xed/0x3e0
      [ 3510.400936]  ? bus_find_device+0x12d/0x1a0
      [ 3510.400953]  i40e_free_vfs+0x754/0x1210 [i40e]
      [ 3510.400966]  ? i40e_reset_all_vfs+0x880/0x880 [i40e]
      [ 3510.400968]  ? pci_get_device+0x7c/0x90
      [ 3510.400970]  ? pci_get_subsys+0x90/0x90
      [ 3510.400982]  ? pci_vfs_assigned.part.7+0x144/0x210
      [ 3510.400987]  ? __mutex_lock_slowpath+0x10/0x10
      [ 3510.400996]  i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
      [ 3510.401001]  sriov_numvfs_store+0x214/0x290
      [ 3510.401005]  ? sriov_totalvfs_show+0x30/0x30
      [ 3510.401007]  ? __mutex_lock_slowpath+0x10/0x10
      [ 3510.401011]  ? __check_object_size+0x15a/0x350
      [ 3510.401018]  kernfs_fop_write+0x280/0x3f0
      [ 3510.401022]  vfs_write+0x145/0x440
      [ 3510.401025]  ksys_write+0xab/0x160
      [ 3510.401028]  ? __ia32_sys_read+0xb0/0xb0
      [ 3510.401031]  ? fput_many+0x1a/0x120
      [ 3510.401032]  ? filp_close+0xf0/0x130
      [ 3510.401038]  do_syscall_64+0xa0/0x370
      [ 3510.401041]  ? page_fault+0x8/0x30
      [ 3510.401043]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      [ 3510.401073] RIP: 0033:0x7f3a9bb842c0
      [ 3510.401079] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24
      [ 3510.401080] RSP: 002b:00007ffc05f1fe18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [ 3510.401083] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f3a9bb842c0
      [ 3510.401085] RDX: 0000000000000002 RSI: 0000000002327408 RDI: 0000000000000001
      [ 3510.401086] RBP: 0000000002327408 R08: 00007f3a9be53780 R09: 00007f3a9c8a4700
      [ 3510.401086] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002
      [ 3510.401087] R13: 0000000000000001 R14: 00007f3a9be52620 R15: 0000000000000001
      [ 3510.401090]
      [ 3510.401093] Allocated by task 76795:
      [ 3510.401098]  kasan_kmalloc+0xa6/0xd0
      [ 3510.401099]  __kmalloc+0xfb/0x200
      [ 3510.401104]  iavf_init_interrupt_scheme+0x26f/0x1310 [iavf]
      [ 3510.401108]  iavf_watchdog_task+0x1d58/0x4050 [iavf]
      [ 3510.401114]  process_one_work+0x56a/0x11f0
      [ 3510.401115]  worker_thread+0x8f/0xf40
      [ 3510.401117]  kthread+0x2a0/0x390
      [ 3510.401119]  ret_from_fork+0x1f/0x40
      [ 3510.401122]  0xffffffffffffffff
      [ 3510.401123]
      
      In timeout handling, we should keep the original num_active_queues
      and reset num_req_queues to 0.
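
      The timeout handling described above can be sketched as a user-space
      model. The struct, the function name and the -ETIMEDOUT return value
      are assumptions made for illustration; the text does not state which
      error code the driver returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the queue bookkeeping. */
struct queue_model {
    unsigned int num_active_queues;  /* queues the driver iterates over */
    unsigned int num_req_queues;     /* count requested for the next reset */
    unsigned int allocated_rings;    /* rings that really exist */
};

/* Request a new channel count; reset_timed_out simulates the wait failing. */
static int model_set_channels(struct queue_model *q, unsigned int new_count,
                              bool reset_timed_out)
{
    assert(q != NULL);
    q->num_req_queues = new_count;

    if (reset_timed_out) {
        /* Keep num_active_queues untouched and drop the request. */
        q->num_req_queues = 0;
        return -ETIMEDOUT;  /* error code chosen for this model only */
    }

    /* Reset completed: rings were reallocated for the new count. */
    q->allocated_rings = new_count;
    q->num_active_queues = new_count;
    return 0;
}
```

      The point of the sketch is the invariant num_active_queues <=
      allocated_rings, which holds on both the timeout and the success path.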
      
      Fixes: 4e5e6b5d ("iavf: Fix return of set the new channel count")
      Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
      Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
      Cc: Huang Cun <huangcun@sangfor.com.cn>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • iavf: Fix use-after-free in free_netdev · 5f4fa167
      Ding Hui authored
      We do netif_napi_add() for all allocated q_vectors[], but potentially
      do netif_napi_del() for part of them, then kfree q_vectors and leave
      invalid pointers at dev->napi_list.
      
      Reproducer:
      
        [root@host ~]# cat repro.sh
        #!/bin/bash
      
        pf_dbsf="0000:41:00.0"
        vf0_dbsf="0000:41:02.0"
        g_pids=()
      
        function do_set_numvf()
        {
            echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
            sleep $((RANDOM%3+1))
            echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
            sleep $((RANDOM%3+1))
        }
      
        function do_set_channel()
        {
            local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/)
            [ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; }
            ifconfig $nic 192.168.18.5 netmask 255.255.255.0
            ifconfig $nic up
            ethtool -L $nic combined 1
            ethtool -L $nic combined 4
            sleep $((RANDOM%3))
        }
      
        function on_exit()
        {
            local pid
            for pid in "${g_pids[@]}"; do
                kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null
            done
            g_pids=()
        }
      
        trap "on_exit; exit" EXIT
      
        while :; do do_set_numvf ; done &
        g_pids+=($!)
        while :; do do_set_channel ; done &
        g_pids+=($!)
      
        wait
      
      Result:
      
      [ 4093.900222] ==================================================================
      [ 4093.900230] BUG: KASAN: use-after-free in free_netdev+0x308/0x390
      [ 4093.900232] Read of size 8 at addr ffff88b4dc145640 by task repro.sh/6699
      [ 4093.900233]
      [ 4093.900236] CPU: 10 PID: 6699 Comm: repro.sh Kdump: loaded Tainted: G           O     --------- -t - 4.18.0 #1
      [ 4093.900238] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021
      [ 4093.900239] Call Trace:
      [ 4093.900244]  dump_stack+0x71/0xab
      [ 4093.900249]  print_address_description+0x6b/0x290
      [ 4093.900251]  ? free_netdev+0x308/0x390
      [ 4093.900252]  kasan_report+0x14a/0x2b0
      [ 4093.900254]  free_netdev+0x308/0x390
      [ 4093.900261]  iavf_remove+0x825/0xd20 [iavf]
      [ 4093.900265]  pci_device_remove+0xa8/0x1f0
      [ 4093.900268]  device_release_driver_internal+0x1c6/0x460
      [ 4093.900271]  pci_stop_bus_device+0x101/0x150
      [ 4093.900273]  pci_stop_and_remove_bus_device+0xe/0x20
      [ 4093.900275]  pci_iov_remove_virtfn+0x187/0x420
      [ 4093.900277]  ? pci_iov_add_virtfn+0xe10/0xe10
      [ 4093.900278]  ? pci_get_subsys+0x90/0x90
      [ 4093.900280]  sriov_disable+0xed/0x3e0
      [ 4093.900282]  ? bus_find_device+0x12d/0x1a0
      [ 4093.900290]  i40e_free_vfs+0x754/0x1210 [i40e]
      [ 4093.900298]  ? i40e_reset_all_vfs+0x880/0x880 [i40e]
      [ 4093.900299]  ? pci_get_device+0x7c/0x90
      [ 4093.900300]  ? pci_get_subsys+0x90/0x90
      [ 4093.900306]  ? pci_vfs_assigned.part.7+0x144/0x210
      [ 4093.900309]  ? __mutex_lock_slowpath+0x10/0x10
      [ 4093.900315]  i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
      [ 4093.900318]  sriov_numvfs_store+0x214/0x290
      [ 4093.900320]  ? sriov_totalvfs_show+0x30/0x30
      [ 4093.900321]  ? __mutex_lock_slowpath+0x10/0x10
      [ 4093.900323]  ? __check_object_size+0x15a/0x350
      [ 4093.900326]  kernfs_fop_write+0x280/0x3f0
      [ 4093.900329]  vfs_write+0x145/0x440
      [ 4093.900330]  ksys_write+0xab/0x160
      [ 4093.900332]  ? __ia32_sys_read+0xb0/0xb0
      [ 4093.900334]  ? fput_many+0x1a/0x120
      [ 4093.900335]  ? filp_close+0xf0/0x130
      [ 4093.900338]  do_syscall_64+0xa0/0x370
      [ 4093.900339]  ? page_fault+0x8/0x30
      [ 4093.900341]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      [ 4093.900357] RIP: 0033:0x7f16ad4d22c0
      [ 4093.900359] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24
      [ 4093.900360] RSP: 002b:00007ffd6491b7f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [ 4093.900362] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f16ad4d22c0
      [ 4093.900363] RDX: 0000000000000002 RSI: 0000000001a41408 RDI: 0000000000000001
      [ 4093.900364] RBP: 0000000001a41408 R08: 00007f16ad7a1780 R09: 00007f16ae1f2700
      [ 4093.900364] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002
      [ 4093.900365] R13: 0000000000000001 R14: 00007f16ad7a0620 R15: 0000000000000001
      [ 4093.900367]
      [ 4093.900368] Allocated by task 820:
      [ 4093.900371]  kasan_kmalloc+0xa6/0xd0
      [ 4093.900373]  __kmalloc+0xfb/0x200
      [ 4093.900376]  iavf_init_interrupt_scheme+0x63b/0x1320 [iavf]
      [ 4093.900380]  iavf_watchdog_task+0x3d51/0x52c0 [iavf]
      [ 4093.900382]  process_one_work+0x56a/0x11f0
      [ 4093.900383]  worker_thread+0x8f/0xf40
      [ 4093.900384]  kthread+0x2a0/0x390
      [ 4093.900385]  ret_from_fork+0x1f/0x40
      [ 4093.900387]  0xffffffffffffffff
      [ 4093.900387]
      [ 4093.900388] Freed by task 6699:
      [ 4093.900390]  __kasan_slab_free+0x137/0x190
      [ 4093.900391]  kfree+0x8b/0x1b0
      [ 4093.900394]  iavf_free_q_vectors+0x11d/0x1a0 [iavf]
      [ 4093.900397]  iavf_remove+0x35a/0xd20 [iavf]
      [ 4093.900399]  pci_device_remove+0xa8/0x1f0
      [ 4093.900400]  device_release_driver_internal+0x1c6/0x460
      [ 4093.900401]  pci_stop_bus_device+0x101/0x150
      [ 4093.900402]  pci_stop_and_remove_bus_device+0xe/0x20
      [ 4093.900403]  pci_iov_remove_virtfn+0x187/0x420
      [ 4093.900404]  sriov_disable+0xed/0x3e0
      [ 4093.900409]  i40e_free_vfs+0x754/0x1210 [i40e]
      [ 4093.900415]  i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
      [ 4093.900416]  sriov_numvfs_store+0x214/0x290
      [ 4093.900417]  kernfs_fop_write+0x280/0x3f0
      [ 4093.900418]  vfs_write+0x145/0x440
      [ 4093.900419]  ksys_write+0xab/0x160
      [ 4093.900420]  do_syscall_64+0xa0/0x370
      [ 4093.900421]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      [ 4093.900422]  0xffffffffffffffff
      [ 4093.900422]
      [ 4093.900424] The buggy address belongs to the object at ffff88b4dc144200
                      which belongs to the cache kmalloc-8k of size 8192
      [ 4093.900425] The buggy address is located 5184 bytes inside of
                      8192-byte region [ffff88b4dc144200, ffff88b4dc146200)
      [ 4093.900425] The buggy address belongs to the page:
      [ 4093.900427] page:ffffea00d3705000 refcount:1 mapcount:0 mapping:ffff88bf04415c80 index:0x0 compound_mapcount: 0
      [ 4093.900430] flags: 0x10000000008100(slab|head)
      [ 4093.900433] raw: 0010000000008100 dead000000000100 dead000000000200 ffff88bf04415c80
      [ 4093.900434] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000
      [ 4093.900434] page dumped because: kasan: bad access detected
      [ 4093.900435]
      [ 4093.900435] Memory state around the buggy address:
      [ 4093.900436]  ffff88b4dc145500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 4093.900437]  ffff88b4dc145580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 4093.900438] >ffff88b4dc145600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 4093.900438]                                            ^
      [ 4093.900439]  ffff88b4dc145680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 4093.900440]  ffff88b4dc145700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 4093.900440] ==================================================================
      
      Although patch #2 (of 2) avoids the issue triggered by this repro.sh,
      other risks remain: if num_active_queues unexpectedly becomes smaller
      than the number of allocated q_vectors[], the mismatched
      netif_napi_add/del() calls can also cause a UAF.

      Since we call netif_napi_add() for all allocated q_vectors
      unconditionally in iavf_alloc_q_vectors(), fix it by making
      netif_napi_del() match netif_napi_add().
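
      The mismatch can be illustrated with a small user-space model. All
      names here (vec_model, model_alloc_q_vectors, model_free_q_vectors,
      MODEL_MAX_VECTORS) are hypothetical; the boolean array only stands in
      for entries on dev->napi_list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_VECTORS 8

/* Hypothetical model: one flag per q_vector registered with NAPI. */
struct vec_model {
    size_t num_allocated;       /* q_vectors[] entries allocated */
    size_t num_active_queues;   /* may be smaller than num_allocated */
    bool   napi_registered[MODEL_MAX_VECTORS];
};

/* Allocation registers NAPI for every vector, unconditionally. */
static void model_alloc_q_vectors(struct vec_model *v, size_t n)
{
    size_t i;

    assert(v != NULL && n <= MODEL_MAX_VECTORS);
    v->num_allocated = n;
    for (i = 0; i < n; i++)
        v->napi_registered[i] = true;   /* models netif_napi_add() */
}

/*
 * Teardown. With match_alloc == false only the active vectors are
 * unregistered (the buggy behaviour); with true, every allocated one is.
 * Returns how many registrations still point at freed vectors.
 */
static size_t model_free_q_vectors(struct vec_model *v, bool match_alloc)
{
    size_t napi_vectors, i, dangling = 0;

    assert(v != NULL);
    napi_vectors = match_alloc ? v->num_allocated : v->num_active_queues;
    for (i = 0; i < v->num_allocated; i++) {
        if (i < napi_vectors)
            v->napi_registered[i] = false;  /* models netif_napi_del() */
        if (v->napi_registered[i])
            dangling++;
    }
    v->num_allocated = 0;                   /* models kfree(q_vectors) */
    return dangling;
}
```

      With four allocated vectors and one active queue, the partial teardown
      leaves three dangling registrations, while the matching teardown
      leaves none.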
      
      Fixes: 5eae00c5 ("i40evf: main driver core")
      Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
      Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
      Cc: Huang Cun <huangcun@sangfor.com.cn>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • r8169: fix ASPM-related problem for chip version 42 and 43 · 162d626f
      Heiner Kallweit authored
      Referenced commit missed that for chip versions 42 and 43 ASPM
      remained disabled in the respective rtl_hw_start_...() routines.
      This resulted in problems as described in the referenced bug
      ticket. Therefore reinstate the previous logic.
      
      Fixes: 5fc3f6c9 ("r8169: consolidate disabling ASPM before EPHY access")
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217635
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: correct KSZ8795 static MAC table access · 4bdf79d6
      Tristram Ha authored
      The KSZ8795 driver code was modified for use on KSZ8863/73, which has
      different register definitions.  Some of the new KSZ8795 register
      information is wrong compared to the previous code.

      KSZ8795 also behaves differently in that the STATIC_MAC_TABLE_USE_FID
      and STATIC_MAC_TABLE_FID bits are off by 1 when reading the MAC table
      compared to writing it.  To compensate, special code was added to shift
      the register value by 1 before applying those bits.  This is wrong when
      the code is running on KSZ8863, so this special code is only executed
      when KSZ8795 is detected.
      
      Fixes: 4b20a07e ("net: dsa: microchip: ksz8795: add support for ksz88xx chips")
      Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
      Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sched-fixes' · 6e8778f8
      David S. Miller authored
      Victor Nogueira says:
      
      ====================
      net: sched: Fixes for classifiers
      
      Four different classifiers (bpf, u32, matchall, and flower) are
      calling tcf_bind_filter in their callbacks, but aren't undoing it by
      calling tcf_unbind_filter if there was an error after binding.
      
      This patch set fixes all this by calling tcf_unbind_filter in such
      cases.
      
      This set also undoes a refcount decrement in cls_u32 when an update
      fails under specific conditions which are described in patch #3.
      
      v1 -> v2:
      * Remove blank line after fixes tag
      * Fix reverse xmas tree issues pointed out by Simon
      
      v2 -> v3:
      * Inline functions cls_bpf_set_parms and fl_set_parms to avoid adding
        yet another parameter (and a return value at it) to them.
      * Remove similar fixes for u32 and matchall, which will be sent soon,
        once we find a way to do the fixes without adding a return parameter
        to their set_parms functions.
      
      v3 -> v4:
      * Inline mall_set_parms to avoid adding yet another parameter.
      * Remove set_flags parameter from u32_set_parms and create a separate
        function for calling tcf_bind_filter and tcf_unbind_filter in case of
        failure.
      * Change cover letter title to also encompass refcnt fix for u32
      
      v4 -> v5:
      * Change back tag to net
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: cls_flower: Undo tcf_bind_filter in case of an error · ac177a33
      Victor Nogueira authored
      If TCA_FLOWER_CLASSID is specified in the netlink message, the code will
      call tcf_bind_filter. However, if any error occurs after that, the code
      should undo this by calling tcf_unbind_filter.
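
      The bind-then-undo pattern can be sketched as a user-space model. The
      struct and the model_* functions are hypothetical stand-ins for
      tcf_bind_filter/tcf_unbind_filter, and -EINVAL is just an example
      error; the sketch shows only the unwinding shape, not the classifier
      code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: counts how many filters are bound to a class. */
struct class_model {
    int bound_filters;
};

static void model_bind_filter(struct class_model *c)   { c->bound_filters++; }
static void model_unbind_filter(struct class_model *c) { c->bound_filters--; }

/*
 * Models a filter change: bind when a classid is given, then run a later
 * step that may fail. On failure the bind is undone before returning.
 */
static int model_change_filter(struct class_model *c, bool has_classid,
                               bool later_step_fails)
{
    bool bound = false;
    int err;

    assert(c != NULL);
    if (has_classid) {
        model_bind_filter(c);
        bound = true;
    }

    if (later_step_fails) {
        err = -EINVAL;  /* example error for this model */
        goto errout;
    }
    return 0;

errout:
    if (bound)
        model_unbind_filter(c);
    return err;
}
```

      A failed change leaves bound_filters where it started, whereas a
      successful one with a classid leaves exactly one additional binding.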
      
      Fixes: 77b9900e ("tc: introduce Flower classifier")
      Signed-off-by: Victor Nogueira <victor@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: cls_bpf: Undo tcf_bind_filter in case of an error · 26a22194
      Victor Nogueira authored
      If cls_bpf_offload errors out, we must also undo tcf_bind_filter that
      was done before the error.
      
      Fix that by calling tcf_unbind_filter in errout_parms.
      
      Fixes: eadb4148 ("net: cls_bpf: add support for marking filters as hardware-only")
      Signed-off-by: Victor Nogueira <victor@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: cls_u32: Undo refcount decrement in case update failed · e8d3d78c
      Victor Nogueira authored
      In the case of an update, when TCA_U32_LINK is set, u32_set_parms will
      decrement the refcount of the ht_down (struct tc_u_hnode) pointer
      present in the older u32 filter which we are replacing. However, if
      u32_replace_hw_knode errors out, the update command fails and that
      ht_down pointer's refcount stays decremented. To fix that, when
      u32_replace_hw_knode fails, check if ht_down's refcount was decremented
      and undo the decrement.
      
      Fixes: d34e3e18 ("net: cls_u32: Add support for skip-sw flag to tc u32 classifier.")
      Signed-off-by: default avatarVictor Nogueira <victor@mojatatu.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: default avatarPedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e8d3d78c
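A minimal sketch of the "check whether the refcount was decremented and undo it" idea from this commit, assuming a simplified model: `demo_hnode` and `demo_update` are hypothetical names, and the two boolean parameters merely simulate whether the link changed and whether the hardware replace step succeeded. It is not the u32 classifier's real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted hash node. */
struct demo_hnode {
    int refcnt;
};

/*
 * Simulated update: if the link changes, the old node loses one reference.
 * If the later hardware step fails, restore that reference, but only when
 * it was actually dropped.
 */
static int demo_update(struct demo_hnode *old_ht, bool link_changed, bool hw_ok)
{
    bool dropped = false;

    if (link_changed && old_ht != NULL) {
        old_ht->refcnt--;
        dropped = true;         /* remember the decrement for the error path */
    }

    if (!hw_ok) {
        if (dropped)
            old_ht->refcnt++;   /* undo: the update failed, old filter stays */
        return -1;
    }

    return 0;
}
```

Tracking the decrement in a local flag keeps the error path from incrementing a refcount that was never dropped.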
    • Victor Nogueira's avatar
      net: sched: cls_u32: Undo tcf_bind_filter if u32_replace_hw_knode · 9cb36fae
      Victor Nogueira authored
      When u32_replace_hw_knode fails, we need to undo the tcf_bind_filter
      operation done at u32_set_parms.
      
      Fixes: d34e3e18 ("net: cls_u32: Add support for skip-sw flag to tc u32 classifier.")
      Signed-off-by: Victor Nogueira <victor@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9cb36fae
    • Victor Nogueira's avatar
      net: sched: cls_matchall: Undo tcf_bind_filter in case of failure after mall_set_parms · b3d0e048
      Victor Nogueira authored
      In case an error occurred after mall_set_parms executed successfully, we
      must undo the tcf_bind_filter call it issues.
      
      Fix that by calling tcf_unbind_filter in err_replace_hw_filter label.
      
      Fixes: ec2507d2 ("net/sched: cls_matchall: Fix error path")
      Signed-off-by: Victor Nogueira <victor@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b3d0e048
  2. 15 Jul, 2023 10 commits
  3. 14 Jul, 2023 9 commits
  4. 13 Jul, 2023 6 commits
    • Linus Torvalds's avatar
      Merge tag 'net-6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · b1983d42
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from netfilter, wireless and ebpf.
      
        Current release - regressions:
      
         - netfilter: conntrack: gre: don't set assured flag for clash entries
      
         - wifi: iwlwifi: remove 'use_tfh' config to fix crash
      
        Previous releases - regressions:
      
         - ipv6: fix a potential refcount underflow for idev
      
         - icmp6: fix null-ptr-deref of ip6_null_entry->rt6i_idev in
           icmp6_dev()
      
         - bpf: fix max stack depth check for async callbacks
      
         - eth: mlx5e:
            - check for NOT_READY flag state after locking
            - fix page_pool page fragment tracking for XDP
      
         - eth: igc:
            - fix tx hang issue when QBV gate is closed
            - fix corner cases for TSN offload
      
         - eth: octeontx2-af: Move validation of ptp pointer before its usage
      
         - eth: ena: fix shift-out-of-bounds in exponential backoff
      
        Previous releases - always broken:
      
         - core: prevent skb corruption on frag list segmentation
      
         - sched:
            - cls_fw: fix improper refcount update leads to use-after-free
            - sch_qfq: account for stab overhead in qfq_enqueue
      
         - netfilter:
            - report use refcount overflow
            - prevent OOB access in nft_byteorder_eval
      
         - wifi: mt7921e: fix init command fail with enabled device
      
         - eth: ocelot: fix oversize frame dropping for preemptible TCs
      
         - eth: fec: recycle pages for transmitted XDP frames"
      
      * tag 'net-6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
        selftests: tc-testing: add test for qfq with stab overhead
        net/sched: sch_qfq: account for stab overhead in qfq_enqueue
        selftests: tc-testing: add tests for qfq mtu sanity check
        net/sched: sch_qfq: reintroduce lmax bound check for MTU
        wifi: cfg80211: fix receiving mesh packets without RFC1042 header
        wifi: rtw89: debug: fix error code in rtw89_debug_priv_send_h2c_set()
        net: txgbe: fix eeprom calculation error
        net/sched: make psched_mtu() RTNL-less safe
        net: ena: fix shift-out-of-bounds in exponential backoff
        netdevsim: fix uninitialized data in nsim_dev_trap_fa_cookie_write()
        net/sched: flower: Ensure both minimum and maximum ports are specified
        MAINTAINERS: Add another mailing list for QUALCOMM ETHQOS ETHERNET DRIVER
        docs: netdev: update the URL of the status page
        wifi: iwlwifi: remove 'use_tfh' config to fix crash
        xdp: use trusted arguments in XDP hints kfuncs
        bpf: cpumap: Fix memory leak in cpu_map_update_elem
        wifi: airo: avoid uninitialized warning in airo_get_rate()
        octeontx2-pf: Add additional check for MCAM rules
        net: dsa: Removed unneeded of_node_put in felix_parse_ports_node
        net: fec: use netdev_err_once() instead of netdev_err()
        ...
      b1983d42
    • Linus Torvalds's avatar
      Merge tag 'trace-v6.5-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · ebc27aac
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix some missing-prototype warnings
      
       - Fix user events struct args (did not include size of struct)
      
         When creating a user event, the "struct" keyword is to denote that
         the size of the field will be passed in. But the parsing failed to
         handle this case.
      
       - Add selftest to struct sizes for user events
      
       - Fix sample code for direct trampolines.
      
         The sample code for direct trampolines attached to handle_mm_fault().
         But the prototype changed and the direct trampoline sample code was
         not updated. Direct trampolines need to have the correct arguments,
         otherwise they can fail or crash the system.
      
       - Remove unused ftrace_regs_caller_ret() prototype.
      
       - Quiet false positive of FORTIFY_SOURCE
      
         Due to backward compatibility, the structure used to save stack
         traces in the kernel had a fixed size of 8. This structure is
         exported to user space via the tracing format file. A change was made
         to allow more than 8 functions to be recorded, and user space now
         uses the size field to know how many functions are actually in the
         stack.
      
         But the structure still has size of 8 (even though it points into the
         ring buffer that has the required amount allocated to hold a full
         stack).
      
         This was fine until the fortifier noticed that the
         memcpy(&entry->caller, stack, size) was greater than the 8 functions
         and would complain at runtime about it.
      
         Hide this by using a pointer to the stack location on the ring buffer
         instead of using the address of the entry structure caller field.
      
       - Fix a deadloop in reading trace_pipe that was caused by a mismatch
         between ring_buffer_empty() returning false which then asked to read
         the data, but the read code uses rb_num_of_entries() that returned
         zero, causing an infinite "retry".
      
       - Fix a warning caused by not using all pages allocated to store ftrace
         functions, where this can happen if the linker inserts a bunch of
         "NULL" entries, causing the accounting of how many pages are needed
         to be off.
      
       - Fix histogram synthetic event crashing when the start event is
         removed and the end event is still using a variable from it
      
       - Fix memory leak in freeing iter->temp in tracing_release_pipe()
      
      * tag 'trace-v6.5-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing: Fix memory leak of iter->temp when reading trace_pipe
        tracing/histograms: Add histograms to hist_vars if they have referenced variables
        tracing: Stop FORTIFY_SOURCE complaining about stack trace caller
        ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()
        ring-buffer: Fix deadloop issue on reading trace_pipe
        tracing: arm64: Avoid missing-prototype warnings
        selftests/user_events: Test struct size match cases
        tracing/user_events: Fix struct arg size match check
        x86/ftrace: Remove unsued extern declaration ftrace_regs_caller_ret()
        arm64: ftrace: Add direct call trampoline samples support
        samples: ftrace: Save required argument registers in sample trampolines
      ebc27aac
    • Linus Torvalds's avatar
      Merge tag 'for-linus-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 15999328
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
      
       - a cleanup of the Xen related ELF-notes
      
       - a fix for virtio handling in Xen dom0 when running Xen in a VM
      
      * tag 'for-linus-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/virtio: Fix NULL deref when a bridge of PCI root bus has no parent
        x86/Xen: tidy xen-head.S
      15999328
    • Linus Torvalds's avatar
      Merge tag 'sh-for-v6.5-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux · 9350cd01
      Linus Torvalds authored
      Pull sh fixes from John Paul Adrian Glaubitz:
       "The sh updates introduced multiple regressions.
      
        In particular, the change a8ac2961 ("sh: Avoid using IRQ0 on SH3
        and SH4") causes several boards to hang during boot due to incorrect
        IRQ numbers.
      
        Geert Uytterhoeven has contributed patches that handle the virq offset
        in the IRQ code for the dreamcast, highlander and r2d boards while
        Artur Rojek has contributed a patch which handles the virq offset for
        the hd64461 companion chip"
      
      * tag 'sh-for-v6.5-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
        sh: hd64461: Handle virq offset for offchip IRQ base and HD64461 IRQ
        sh: mach-dreamcast: Handle virq offset in cascaded IRQ demux
        sh: mach-highlander: Handle virq offset in cascaded IRL demux
        sh: mach-r2d: Handle virq offset in cascaded IRL demux
      9350cd01
    • Zheng Yejian's avatar
      tracing: Fix memory leak of iter->temp when reading trace_pipe · d5a82189
      Zheng Yejian authored
      kmemleak reports:
        unreferenced object 0xffff88814d14e200 (size 256):
          comm "cat", pid 336, jiffies 4294871818 (age 779.490s)
          hex dump (first 32 bytes):
            04 00 01 03 00 00 00 00 08 00 00 00 00 00 00 00  ................
            0c d8 c8 9b ff ff ff ff 04 5a ca 9b ff ff ff ff  .........Z......
          backtrace:
            [<ffffffff9bdff18f>] __kmalloc+0x4f/0x140
            [<ffffffff9bc9238b>] trace_find_next_entry+0xbb/0x1d0
            [<ffffffff9bc9caef>] trace_print_lat_context+0xaf/0x4e0
            [<ffffffff9bc94490>] print_trace_line+0x3e0/0x950
            [<ffffffff9bc95499>] tracing_read_pipe+0x2d9/0x5a0
            [<ffffffff9bf03a43>] vfs_read+0x143/0x520
            [<ffffffff9bf04c2d>] ksys_read+0xbd/0x160
            [<ffffffff9d0f0edf>] do_syscall_64+0x3f/0x90
            [<ffffffff9d2000aa>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      
      When reading file 'trace_pipe', 'iter->temp' is allocated or reallocated
      in trace_find_next_entry() but not freed before 'trace_pipe' is closed.
      
      To fix it, free 'iter->temp' in tracing_release_pipe().
      
      Link: https://lore.kernel.org/linux-trace-kernel/20230713141435.1133021-1-zhengyejian1@huawei.com
      
      Cc: stable@vger.kernel.org
      Fixes: ff895103 ("tracing: Save off entry when peeking at next entry")
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      d5a82189
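The leak described above follows a common shape: a scratch buffer is grown lazily on the read path, so the release path has to free it. The sketch below illustrates that shape in userspace C. `demo_iter`, `demo_save_entry` and `demo_release` are hypothetical names, and `realloc()` is a simplification chosen for the example, not what the tracing code does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical iterator holding a lazily allocated scratch buffer. */
struct demo_iter {
    void *temp;
    size_t temp_size;
};

/* Read path: grow the scratch buffer if needed, then copy the entry in. */
static int demo_save_entry(struct demo_iter *it, const void *entry, size_t size)
{
    if (size > it->temp_size) {
        void *p = realloc(it->temp, size);

        if (p == NULL)
            return -1;          /* old buffer is still valid and still owned */
        it->temp = p;
        it->temp_size = size;
    }
    memcpy(it->temp, entry, size);
    return 0;
}

/* Release path: without this free, every open/read/close cycle leaks. */
static void demo_release(struct demo_iter *it)
{
    free(it->temp);
    it->temp = NULL;
    it->temp_size = 0;
}
```

Resetting `temp` and `temp_size` after the free keeps a released iterator safe to release again, since `free(NULL)` is a no-op.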
    • Paolo Abeni's avatar
      Merge branch 'net-sched-fixes-for-sch_qfq' · 9d23aac8
      Paolo Abeni authored
      Pedro Tammela says:
      
      ====================
      net/sched: fixes for sch_qfq
      
      Patch 1 fixes a regression introduced in 6.4 where the MTU size could be
      bigger than 'lmax'.
      
      Patch 3 fixes an issue where the code doesn't account for qdisc_pkt_len()
      returning a size bigger than 'lmax'.
      
      Patches 2 and 4 are selftests for the issues above.
      ====================
      
      Link: https://lore.kernel.org/r/20230711210103.597831-1-pctammela@mojatatu.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      9d23aac8