1. 11 Sep, 2021 1 commit
  2. 03 Sep, 2021 1 commit
    • Drivers: hv: vmbus: Fix kernel crash upon unbinding a device from uio_hv_generic driver · f1940d4e
      Vitaly Kuznetsov authored
      The following crash happens when a never-used device is unbound from
      uio_hv_generic driver:
      
       kernel BUG at mm/slub.c:321!
       invalid opcode: 0000 [#1] SMP PTI
       CPU: 0 PID: 4001 Comm: bash Kdump: loaded Tainted: G               X --------- ---  5.14.0-0.rc2.23.el9.x86_64 #1
       Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
       RIP: 0010:__slab_free+0x1d5/0x3d0
      ...
       Call Trace:
        ? pick_next_task_fair+0x18e/0x3b0
        ? __cond_resched+0x16/0x40
        ? vunmap_pmd_range.isra.0+0x154/0x1c0
        ? __vunmap+0x22d/0x290
        ? hv_ringbuffer_cleanup+0x36/0x40 [hv_vmbus]
        kfree+0x331/0x380
        ? hv_uio_remove+0x43/0x60 [uio_hv_generic]
        hv_ringbuffer_cleanup+0x36/0x40 [hv_vmbus]
        vmbus_free_ring+0x21/0x60 [hv_vmbus]
        hv_uio_remove+0x4f/0x60 [uio_hv_generic]
        vmbus_remove+0x23/0x30 [hv_vmbus]
        __device_release_driver+0x17a/0x230
        device_driver_detach+0x3c/0xa0
        unbind_store+0x113/0x130
      ...
      
      The problem appears to be that we free 'ring_info->pkt_buffer' twice:
      first, when the device is unbound from in-kernel driver (netvsc in this
      case) and second from hv_uio_remove(). Normally, ring buffer is supposed
      to be re-initialized from hv_uio_open() but this happens when UIO device
      is being opened and this is not guaranteed to happen.
      
      Generally, it is OK to call hv_ringbuffer_cleanup() twice for the same
      channel (which is being handed over between in-kernel drivers and UIO) even
      if we didn't call hv_ringbuffer_init() in between. We, however, need to
      avoid kfree() call for an already freed pointer.
      
      Fixes: adae1e93 ("Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Link: https://lore.kernel.org/r/20210831143916.144983-1-vkuznets@redhat.com
      Signed-off-by: Wei Liu <wei.liu@kernel.org>
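      The double-free hazard described in this commit can be illustrated with a
      minimal userspace sketch. It is not the kernel patch: the struct and both
      function names are hypothetical, and malloc()/free() stand in for the
      kernel allocator. The idea shown is only that a cleanup routine becomes
      safe to call twice once it clears the pointer it has just freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the ring buffer state; not the kernel struct. */
struct ring_info_sketch {
    void *pkt_buffer;
    size_t pkt_buffer_size;
};

/* Allocate the packet buffer; returns 0 on success, -1 on failure. */
int ring_init_sketch(struct ring_info_sketch *ri, size_t size)
{
    ri->pkt_buffer = malloc(size);
    if (!ri->pkt_buffer)
        return -1;
    ri->pkt_buffer_size = size;
    return 0;
}

/*
 * Release the packet buffer. Clearing the pointer afterwards means a
 * second call passes NULL to free(), which is defined to do nothing,
 * instead of freeing the same allocation twice.
 */
void ring_cleanup_sketch(struct ring_info_sketch *ri)
{
    free(ri->pkt_buffer);
    ri->pkt_buffer = NULL;
    ri->pkt_buffer_size = 0;
}
```

      Calling ring_cleanup_sketch() twice in a row without an intervening
      ring_init_sketch() mirrors the hand-over case in the commit message,
      where the UIO device is never opened between the two cleanups.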
  3. 29 Aug, 2021 8 commits
  4. 28 Aug, 2021 3 commits
  5. 27 Aug, 2021 18 commits
  6. 26 Aug, 2021 9 commits
    • PCI/MSI: Skip masking MSI-X on Xen PV · 1a519dc7
      Marek Marczykowski-Górecki authored
      When running as Xen PV guest, masking MSI-X is a responsibility of the
      hypervisor. The guest has no write access to the relevant BAR at all - when
      it tries to, it results in a crash like this:
      
          BUG: unable to handle page fault for address: ffffc9004069100c
          #PF: supervisor write access in kernel mode
          #PF: error_code(0x0003) - permissions violation
          RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
           e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
           e1000_probe+0x41f/0xdb0 [e1000e]
           local_pci_probe+0x42/0x80
          (...)
      
      The recently introduced function msix_mask_all() does not check the global
      variable pci_msi_ignore_mask which is set by XEN PV to bypass the masking
      of MSI[-X] interrupts.
      
      Add the check to make this function XEN PV compatible.
      
      Fixes: 7d5ec3d3 ("PCI/MSI: Mask all unused MSI-X entries")
      Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210826170342.135172-1-marmarek@invisiblethingslab.com
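      The shape of the check described in this commit can be sketched in
      userspace C. This is an illustration under assumptions, not the kernel
      code: ignore_mask_sketch stands in for pci_msi_ignore_mask, a plain array
      stands in for the memory-mapped MSI-X table, and the function name and
      mask-bit constant are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask bit for this sketch. */
#define VECTOR_CTRL_MASKBIT_SKETCH 0x1u

/* Stand-in for pci_msi_ignore_mask: true when the hypervisor owns masking. */
bool ignore_mask_sketch;

/*
 * Set the mask bit in every entry's control word, unless masking is the
 * hypervisor's responsibility, in which case the table is left untouched.
 * Returns the number of entries written.
 */
int mask_all_sketch(uint32_t *vector_ctrl, int nr_entries)
{
    if (ignore_mask_sketch)
        return 0;

    for (int i = 0; i < nr_entries; i++)
        vector_ctrl[i] |= VECTOR_CTRL_MASKBIT_SKETCH;

    return nr_entries;
}
```

      With the flag set, the function returns before any write, which is the
      property the Xen PV guest needs because it has no write access to the
      table.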
    • Merge tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux · 73367f05
      Linus Torvalds authored
      Pull nfsd fix from Bruce Fields:
       "This is a one-liner fix for a serious bug that can cause the server to
        become unresponsive to a client, so I think it's worth the last-minute
        inclusion for 5.14"
      
      * tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux:
        SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()...
    • Merge tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8a2cb8bd
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes, including fixes from can and bpf.
      
        Closing three hw-dependent regressions. Any fixes of note are in the
        'old code' category. Nothing blocking release from our perspective.
      
        Current release - regressions:
      
         - stmmac: revert "stmmac: align RX buffers"
      
         - usb: asix: ax88772: move embedded PHY detection as early as
           possible
      
         - usb: asix: do not call phy_disconnect() for ax88178
      
         - Revert "net: really fix the build...", from Kalle to fix QCA6390
      
        Current release - new code bugs:
      
         - phy: mediatek: add the missing suspend/resume callbacks
      
        Previous releases - regressions:
      
         - qrtr: fix another OOB Read in qrtr_endpoint_post
      
         - stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings
      
        Previous releases - always broken:
      
         - inet: use siphash in exception handling
      
         - ip_gre: add validation for csum_start
      
         - bpf: fix ringbuf helper function compatibility
      
         - rtnetlink: return correct error on changing device netns
      
         - e1000e: do not try to recover the NVM checksum on Tiger Lake"
      
      * tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
        Revert "net: really fix the build..."
        net: hns3: fix get wrong pfc_en when query PFC configuration
        net: hns3: fix GRO configuration error after reset
        net: hns3: change the method of getting cmd index in debugfs
        net: hns3: fix duplicate node in VLAN list
        net: hns3: fix speed unknown issue in bond 4
        net: hns3: add waiting time before cmdq memory is released
        net: hns3: clear hardware resource when loading driver
        net: fix NULL pointer reference in cipso_v4_doi_free
        rtnetlink: Return correct error on changing device netns
        net: dsa: hellcreek: Adjust schedule look ahead window
        net: dsa: hellcreek: Fix incorrect setting of GCL
        cxgb4: dont touch blocked freelist bitmap after free
        ipv4: use siphash instead of Jenkins in fnhe_hashfun()
        ipv6: use siphash in rt6_exception_hash()
        can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
        net: usb: asix: ax88772: fix boolconv.cocci warnings
        net/sched: ets: fix crash when flipping from 'strict' to 'quantum'
        qede: Fix memset corruption
        net: stmmac: fix kernel panic due to NULL pointer dereference of buf->xdp
        ...
    • Revert "block/mq-deadline: Prioritize high-priority requests" · 7b05bf77
      Jens Axboe authored
      This reverts commit fb926032.
      
      Zhen reports that this commit slows down mq-deadline on a 128 thread
      box, going from 258K IOPS to 170-180K. My testing shows that Optane
      gen2 IOPS goes from 2.3M IOPS to 1.2M IOPS on a 64 thread box.
      
      Looking in detail at the code, the main culprit here is needing to sum
      percpu counters in the dispatch hot path, leading to very high CPU
      utilization there. To make matters worse, the code currently needs to
      sum 2 percpu counters, and it does so in the most naive way of iterating
      possible CPUs _twice_.
      
      Since we're close to release, revert this commit and we can re-do it
      with regular per-priority counters instead for the 5.15 kernel.
      
      Link: https://lore.kernel.org/linux-block/20210826144039.2143-1-thunder.leizhen@huawei.com/
      Reported-by: Zhen Lei <thunder.leizhen@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
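      The cost described in this commit, summing two per-CPU counters by
      walking the CPUs twice, can be sketched in userspace C. Everything here
      is an assumption made for illustration: the counter names, the fixed CPU
      count, and the plain arrays standing in for kernel per-CPU data. Neither
      function is the reverted code or its planned replacement.

```c
#include <assert.h>

/* Hypothetical CPU count for this sketch. */
#define NR_CPUS_SKETCH 4

/* Two per-CPU counters, modelled as plain arrays indexed by CPU. */
struct percpu_pair_sketch {
    long inserted[NR_CPUS_SKETCH];
    long completed[NR_CPUS_SKETCH];
};

/* Naive form: one full pass over the CPUs for each counter. */
long queued_two_passes_sketch(const struct percpu_pair_sketch *c)
{
    long inserted = 0, completed = 0;

    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        inserted += c->inserted[cpu];
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        completed += c->completed[cpu];

    return inserted - completed;
}

/* Same result from a single pass that folds both counters per CPU. */
long queued_one_pass_sketch(const struct percpu_pair_sketch *c)
{
    long queued = 0;

    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        queued += c->inserted[cpu] - c->completed[cpu];

    return queued;
}
```

      Either form still scales with the number of CPUs on every call, which is
      why the commit message proposes regular per-priority counters rather than
      a cheaper summation.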
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 1a6d80ff
      Linus Torvalds authored
      Pull arm64 fix from Will Deacon:
       "We received a report this week that the generic version of
        pfn_valid(), which we switched to this merge window in 16c9afc7
        ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), interacts badly with
        dma_map_resource() due to the following check:
      
              /* Don't allow RAM to be mapped */
              if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
                      return DMA_MAPPING_ERROR;
      
        Since the ongoing saga to determine the semantics of pfn_valid() is
        unlikely to be resolved this week (does it indicate valid memory, or
        just the presence of a struct page, or whether that struct page has
        been initialised?), just revert back to our old version of pfn_valid()
        for 5.14.
      
        Summary:
      
         - Fix dma_map_resource() by reverting back to old pfn_valid() code"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID"
    • Merge tag 'ceph-for-5.14-rc8' of git://github.com/ceph/ceph-client · 97d8cc20
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "Two memory management fixes for the filesystem"
      
      * tag 'ceph-for-5.14-rc8' of git://github.com/ceph/ceph-client:
        ceph: fix possible null-pointer dereference in ceph_mdsmap_decode()
        ceph: correctly handle releasing an embedded cap flush
    • Revert "net: really fix the build..." · 9ebc2758
      Kalle Valo authored
      This reverts commit ce78ffa3.
      
      Wren and Nicolas reported that ath11k was failing to initialise QCA6390
      Wi-Fi 6 device with error:
      
      qcom_mhi_qrtr: probe of mhi0_IPCR failed with error -22
      
      Commit ce78ffa3 ("net: really fix the build..."), introduced in
      v5.14-rc5, caused this regression in qrtr. Most likely all ath11k
      devices are broken, but I only tested QCA6390. Let's revert the broken
      commit so that ath11k works again.
      
      Reported-by: Wren Turkal <wt@penguintechs.org>
      Reported-by: Nicolas Schichan <nschichan@freebox.fr>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20210826172816.24478-1-kvalo@codeaurora.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'for-5.14-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 9b49ceb8
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "One more fix that I think qualifies for a late merge. It's a revert of
        a one-liner fix that meanwhile got backported to stable kernels and we
        got reports from users.
      
        The broken fix prevents creating compressed inline extents, which
        could be noticeable on space consumption.
      
        Technically it's a regression as the patch was merged in 5.14-rc1 but
        got propagated to several stable kernels and has higher exposure than
        a 'typical' development cycle bug"
      
      * tag 'for-5.14-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Revert "btrfs: compression: don't try to compress if we don't have enough pages"
    • sched: Fix get_push_task() vs migrate_disable() · e681dcba
      Sebastian Andrzej Siewior authored
      push_rt_task() attempts to move the currently running task away if the
      next runnable task has migration disabled and therefore is pinned on the
      current CPU.
      
      The current task is retrieved via get_push_task() which only checks for
      nr_cpus_allowed == 1, but does not check whether the task has migration
      disabled and therefore cannot be moved either. The consequence is a
      pointless invocation of the migration thread which correctly observes
      that the task cannot be moved.
      
      Return NULL if the task has migration disabled and cannot be moved to
      another CPU.
      
      Fixes: a7c81556 ("sched: Fix migrate_disable() vs rt/dl balancing")
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210826133738.yiotqbtdaxzjsnfj@linutronix.de
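      The two conditions named in this commit can be sketched as a small
      userspace function. The struct and function name are hypothetical
      stand-ins, and the real scheduler code does more than this (locking and
      reference counting are omitted). The sketch only shows the decision: a
      task pinned by its affinity mask, or one with migration disabled, is not
      a candidate for being pushed away.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of a task that matter here. */
struct task_sketch {
    int nr_cpus_allowed;     /* number of CPUs the task may run on */
    int migration_disabled;  /* non-zero while migration is disabled */
};

/*
 * Return the current task if it may be moved to another CPU, or NULL
 * if it is pinned either by affinity or by disabled migration.
 */
struct task_sketch *get_push_task_sketch(struct task_sketch *curr)
{
    if (curr->nr_cpus_allowed == 1)
        return NULL;

    if (curr->migration_disabled)
        return NULL;

    return curr;
}
```

      Returning NULL in the second case is what avoids the pointless wake-up of
      the migration thread described in the commit message.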