  1. 20 Oct, 2021 1 commit
    • ucounts: Move get_ucounts from cred_alloc_blank to key_change_session_keyring · 5ebcbe34
      Eric W. Biederman authored
      Setting cred->ucounts in cred_alloc_blank does not make sense.  The
      uid and user_ns are deliberately not set in cred_alloc_blank but
      instead the setting is delayed until key_change_session_keyring.
      
      So move dealing with ucounts into key_change_session_keyring as well.
      
      Unfortunately that movement of get_ucounts adds a new failure mode to
      key_change_session_keyring.  I do not see anything stopping the parent
      process from calling setuid and changing the relevant part of its
      cred while keyctl_session_to_parent is running, making it fundamentally
      necessary to call get_ucounts in key_change_session_keyring.  Which
      means that the new failure mode cannot be avoided.
      
      A failure of key_change_session_keyring results in a single threaded
      parent keeping its existing credentials.  Which results in the parent
      process not being able to access the session keyring and whichever
      keys are in the new keyring.
      
      Further, get_ucounts is only expected to fail if the number of bits in
      the reference count for the structure is too few.
      
      Since the code has no other way to report the failure of get_ucounts
      and because such failures are not expected to be common, add a WARN_ONCE
      to report this problem to userspace.
      
      Between the WARN_ONCE and the parent process not having access to
      the keys in the new session keyring I expect any failure of get_ucounts
      will be noticed and reported and we can find another way to handle this
      condition.  (Possibly by just making ucounts->count an atomic_long_t).
      
      Cc: stable@vger.kernel.org
      Fixes: 905ae01c ("Add a reference to ucounts for each cred")
      Link: https://lkml.kernel.org/r/7k0ias0uf.fsf_-_@disp2133
      Tested-by: Yu Zhao <yuzhao@google.com>
      Reviewed-by: Alexey Gladkov <legion@kernel.org>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
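The failure path described in this commit can be sketched in userspace C. This is only a model under stated assumptions: `ucounts_model`, `cred_model`, `get_ucounts_model`, `warn_once_model` and `change_session_keyring_model` are hypothetical names, the reference count is a plain `int` that refuses to go past `INT_MAX` (the kernel uses an atomic counter), and the keyring is reduced to an integer id.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct ucounts. */
struct ucounts_model {
	int count;              /* reference count with a limited number of bits */
};

/* Hypothetical stand-in for the parent's credentials. */
struct cred_model {
	struct ucounts_model *ucounts;
	int session_keyring;    /* integer id standing in for the keyring */
};

/* Take a reference; return NULL when the counter has no room left. */
static struct ucounts_model *get_ucounts_model(struct ucounts_model *uc)
{
	if (uc->count == INT_MAX)
		return NULL;
	uc->count++;
	return uc;
}

/* Print a warning the first time only, loosely mimicking WARN_ONCE. */
static void warn_once_model(const char *msg)
{
	static bool warned;

	if (!warned) {
		warned = true;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
}

/*
 * Install a new session keyring in the parent's credentials.
 * Returns false, leaving the old credentials untouched, when the
 * reference for the new credentials cannot be taken.
 */
static bool change_session_keyring_model(struct cred_model *parent,
					 int new_keyring)
{
	assert(parent != NULL && parent->ucounts != NULL);

	if (!get_ucounts_model(parent->ucounts)) {
		warn_once_model("get_ucounts failed");
		return false;           /* parent keeps its existing creds */
	}

	/* The new creds now hold a reference; the old creds drop theirs. */
	parent->ucounts->count--;
	parent->session_keyring = new_keyring;
	return true;
}
```

In this model a failed call leaves `session_keyring` unchanged, which mirrors the commit's point that the parent simply does not gain access to the new keyring.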
  2. 19 Oct, 2021 2 commits
  3. 18 Oct, 2021 1 commit
    • ucounts: Fix signal ucount refcounting · 15bc01ef
      Eric W. Biederman authored
      In commit fda31c50 ("signal: avoid double atomic counter
      increments for user accounting") Linus made a clever optimization to
      how rlimits interact with struct user_struct.  Unfortunately that
      optimization does not work in the obvious way when moved to nested
      rlimits.  The problem is that the last decrement of the per user
      namespace per user sigpending counter might also be the last decrement
      of the sigpending counter in the parent user namespace as well.  Which
      means that simply freeing the leaf ucount in __free_sigqueue is not
      enough.
      
      Maintain the optimization and handle the tricky cases by introducing
      inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
      
      By moving the entire optimization into functions that perform all of
      the work it becomes possible to ensure that every level is handled
      properly.
      
      The new function inc_rlimit_get_ucounts returns 0 on failure to
      increment the ucount.  This is different from inc_rlimit_ucounts, which
      increments the ucounts and returns LONG_MAX if the ucount counter has
      exceeded its maximum or it wrapped (to indicate the counter needs to
      be decremented).
      
      I wish we had a single user to account all pending signals to across
      all of the threads of a process so this complexity was not necessary.
      
      Cc: stable@vger.kernel.org
      Fixes: d6469690 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
      v1: https://lkml.kernel.org/r/87mtnavszx.fsf_-_@disp2133
      Link: https://lkml.kernel.org/r/87fssytizw.fsf_-_@disp2133
      Reviewed-by: Alexey Gladkov <legion@kernel.org>
      Tested-by: Rune Kleveland <rune.kleveland@infomedia.dk>
      Tested-by: Yu Zhao <yuzhao@google.com>
      Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
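The per-level handling this commit describes can be illustrated with a small userspace model. It is a sketch, not the kernel code: `uc_node`, `inc_get_model`, `dec_put_model` and `dec_put_range_model` are hypothetical names, counters are plain `long` values rather than atomics, and "dropping a reference" only decrements a number instead of freeing anything. The idea it tries to show is that each level takes a reference when its counter goes from 0 to 1 and drops it when the counter returns to 0, so the last decrement is handled at every level and not only at the leaf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one ucounts level in a user namespace chain. */
struct uc_node {
	struct uc_node *parent; /* level in the parent namespace, or NULL */
	long pending;           /* sigpending-style counter at this level */
	long max;               /* limit at this level */
	long refs;              /* reference count on this level */
};

/* Decrement every level from first up to (not including) last. */
static void dec_put_range_model(struct uc_node *first, struct uc_node *last)
{
	struct uc_node *it, *next;

	for (it = first; it != last; it = next) {
		long now = --it->pending;

		assert(now >= 0);
		next = it->parent;  /* read before the level could go away */
		if (now == 0)
			it->refs--;     /* drop the reference taken on 0 -> 1 */
	}
}

/*
 * Increment every level; returns the new leaf count, or 0 on failure
 * with all increments undone.
 */
static long inc_get_model(struct uc_node *leaf)
{
	struct uc_node *it;
	long ret = 0;

	for (it = leaf; it != NULL; it = it->parent) {
		long now = ++it->pending;

		if (now > it->max) {
			it->pending--;                  /* undo this level */
			dec_put_range_model(leaf, it);  /* undo the levels below */
			return 0;
		}
		if (it == leaf)
			ret = now;
		if (now == 1)
			it->refs++;     /* first pending item pins this level */
	}
	return ret;
}

/* Decrement every level, dropping references wherever a count hits 0. */
static void dec_put_model(struct uc_node *leaf)
{
	dec_put_range_model(leaf, NULL);
}
```

With a two-level chain, the final `dec_put_model` call brings both the leaf and its parent to zero in one pass, which is the case the commit message says freeing only the leaf does not cover.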
  4. 29 Aug, 2021 8 commits
  5. 28 Aug, 2021 3 commits
  6. 27 Aug, 2021 18 commits
  7. 26 Aug, 2021 7 commits
    • PCI/MSI: Skip masking MSI-X on Xen PV · 1a519dc7
      Marek Marczykowski-Górecki authored
      When running as a Xen PV guest, masking MSI-X is the responsibility of the
      hypervisor. The guest has no write access to the relevant BAR at all; when
      it tries to write, the result is a crash like this:
      
          BUG: unable to handle page fault for address: ffffc9004069100c
          #PF: supervisor write access in kernel mode
          #PF: error_code(0x0003) - permissions violation
          RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
           e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
           e1000_probe+0x41f/0xdb0 [e1000e]
           local_pci_probe+0x42/0x80
          (...)
      
      The recently introduced function msix_mask_all() does not check the global
      variable pci_msi_ignore_mask, which is set by Xen PV to bypass the masking
      of MSI[-X] interrupts.
      
      Add the check to make this function Xen PV compatible.
      
      Fixes: 7d5ec3d3 ("PCI/MSI: Mask all unused MSI-X entries")
      Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210826170342.135172-1-marmarek@invisiblethingslab.com
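The shape of the added check can be sketched in userspace C. This is an illustration under assumptions, not the patched kernel function: `msix_mask_all_model` and `ignore_mask_model` are hypothetical names, the MSI-X table is an ordinary byte buffer instead of a mapped BAR, and the function returns the number of entries written purely so the behaviour can be observed. The constants follow the MSI-X table layout (16-byte entries, vector control at offset 12, mask in bit 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MSIX_ENTRY_SIZE         16  /* bytes per MSI-X table entry */
#define MSIX_ENTRY_VECTOR_CTRL  12  /* offset of the vector control word */
#define MSIX_ENTRY_CTRL_MASKBIT 1u  /* mask bit in the vector control word */

/* Hypothetical stand-in for the global pci_msi_ignore_mask. */
static bool ignore_mask_model;

/*
 * Set the mask bit in every table entry unless masking is owned by
 * someone else (the hypervisor, in the Xen PV case). Returns the
 * number of entries written.
 */
static int msix_mask_all_model(uint8_t *table, int tsize)
{
	uint32_t ctrl = MSIX_ENTRY_CTRL_MASKBIT;
	int written = 0;
	int i;

	assert(table != NULL);

	/* The check this commit adds: never touch the table in this mode. */
	if (ignore_mask_model)
		return 0;

	for (i = 0; i < tsize; i++, table += MSIX_ENTRY_SIZE) {
		memcpy(table + MSIX_ENTRY_VECTOR_CTRL, &ctrl, sizeof(ctrl));
		written++;
	}
	return written;
}
```

The faulting address in the trace above ends in 0x00c, which is consistent with a write to the vector control word of the first table entry.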
    • Merge tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux · 73367f05
      Linus Torvalds authored
      Pull nfsd fix from Bruce Fields:
       "This is a one-liner fix for a serious bug that can cause the server to
        become unresponsive to a client, so I think it's worth the last-minute
        inclusion for 5.14"
      
      * tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux:
        SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()...
    • Merge tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8a2cb8bd
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes, including fixes from can and bpf.
      
        Closing three hw-dependent regressions. Any fixes of note are in the
        'old code' category. Nothing blocking release from our perspective.
      
        Current release - regressions:
      
         - stmmac: revert "stmmac: align RX buffers"
      
         - usb: asix: ax88772: move embedded PHY detection as early as
           possible
      
         - usb: asix: do not call phy_disconnect() for ax88178
      
         - Revert "net: really fix the build...", from Kalle to fix QCA6390
      
        Current release - new code bugs:
      
         - phy: mediatek: add the missing suspend/resume callbacks
      
        Previous releases - regressions:
      
         - qrtr: fix another OOB Read in qrtr_endpoint_post
      
         - stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings
      
        Previous releases - always broken:
      
         - inet: use siphash in exception handling
      
         - ip_gre: add validation for csum_start
      
         - bpf: fix ringbuf helper function compatibility
      
         - rtnetlink: return correct error on changing device netns
      
         - e1000e: do not try to recover the NVM checksum on Tiger Lake"
      
      * tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
        Revert "net: really fix the build..."
        net: hns3: fix get wrong pfc_en when query PFC configuration
        net: hns3: fix GRO configuration error after reset
        net: hns3: change the method of getting cmd index in debugfs
        net: hns3: fix duplicate node in VLAN list
        net: hns3: fix speed unknown issue in bond 4
        net: hns3: add waiting time before cmdq memory is released
        net: hns3: clear hardware resource when loading driver
        net: fix NULL pointer reference in cipso_v4_doi_free
        rtnetlink: Return correct error on changing device netns
        net: dsa: hellcreek: Adjust schedule look ahead window
        net: dsa: hellcreek: Fix incorrect setting of GCL
        cxgb4: dont touch blocked freelist bitmap after free
        ipv4: use siphash instead of Jenkins in fnhe_hashfun()
        ipv6: use siphash in rt6_exception_hash()
        can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
        net: usb: asix: ax88772: fix boolconv.cocci warnings
        net/sched: ets: fix crash when flipping from 'strict' to 'quantum'
        qede: Fix memset corruption
        net: stmmac: fix kernel panic due to NULL pointer dereference of buf->xdp
        ...
    • Revert "block/mq-deadline: Prioritize high-priority requests" · 7b05bf77
      Jens Axboe authored
      This reverts commit fb926032.
      
      Zhen reports that this commit slows down mq-deadline on a 128 thread
      box, going from 258K IOPS to 170-180K. My testing shows that Optane
      gen2 IOPS goes from 2.3M IOPS to 1.2M IOPS on a 64 thread box.
      
      Looking in detail at the code, the main culprit here is needing to sum
      percpu counters in the dispatch hot path, leading to very high CPU
      utilization there. To make matters worse, the code currently needs to
      sum 2 percpu counters, and it does so in the most naive way of iterating
      possible CPUs _twice_.
      
      Since we're close to release, revert this commit and we can re-do it
      with regular per-priority counters instead for the 5.15 kernel.
      
      Link: https://lore.kernel.org/linux-block/20210826144039.2143-1-thunder.leizhen@huawei.com/
      Reported-by: Zhen Lei <thunder.leizhen@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
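The cost described in this revert can be made concrete with a small userspace model. Everything here is an assumption for illustration: the names (`percpu_stats_model`, `queued_percpu_model`, `prio_stats_model`, `queued_simple_model`), the fixed CPU count of 128, and the `cpus_visited` counter that exists only to show how many per-CPU slots a single query touches. It is not the mq-deadline code, and the second struct is only one possible reading of "regular per-priority counters".

```c
#include <assert.h>
#include <string.h>

#define NR_CPUS_MODEL 128   /* assumed CPU count for the model */

/* Model of two per-CPU counters kept for one priority level. */
struct percpu_stats_model {
	long inserted[NR_CPUS_MODEL];
	long completed[NR_CPUS_MODEL];
};

static void percpu_stats_init_model(struct percpu_stats_model *s)
{
	memset(s, 0, sizeof(*s));
}

/* Sum one per-CPU counter, recording how many slots were read. */
static long percpu_sum_model(const long *counter, long *cpus_visited)
{
	long sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		sum += counter[cpu];
		(*cpus_visited)++;
	}
	return sum;
}

/* Queued requests = inserted - completed: two full passes over all CPUs. */
static long queued_percpu_model(const struct percpu_stats_model *s,
				long *cpus_visited)
{
	assert(s != NULL && cpus_visited != NULL);
	return percpu_sum_model(s->inserted, cpus_visited) -
	       percpu_sum_model(s->completed, cpus_visited);
}

/* Alternative model: one plain pair of counters per priority level. */
struct prio_stats_model {
	long inserted;
	long completed;
};

/* The same query becomes a single subtraction. */
static long queued_simple_model(const struct prio_stats_model *s)
{
	return s->inserted - s->completed;
}
```

In the model, one query against the per-CPU layout reads 2 * NR_CPUS_MODEL slots, while the plain layout reads two values; in real code the plain counters would presumably need a lock or atomics for updates, which the model leaves out.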
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 1a6d80ff
      Linus Torvalds authored
      Pull arm64 fix from Will Deacon:
       "We received a report this week that the generic version of
        pfn_valid(), which we switched to this merge window in 16c9afc7
        ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), interacts badly with
        dma_map_resource() due to the following check:
      
              /* Don't allow RAM to be mapped */
              if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
                      return DMA_MAPPING_ERROR;
      
        Since the ongoing saga to determine the semantics of pfn_valid() is
        unlikely to be resolved this week (does it indicate valid memory, or
        just the presence of a struct page, or whether that struct page has
        been initialised?), just revert back to our old version of pfn_valid()
        for 5.14.
      
        Summary:
      
         - Fix dma_map_resource() by reverting back to old pfn_valid() code"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID"
    • Merge tag 'ceph-for-5.14-rc8' of git://github.com/ceph/ceph-client · 97d8cc20
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "Two memory management fixes for the filesystem"
      
      * tag 'ceph-for-5.14-rc8' of git://github.com/ceph/ceph-client:
        ceph: fix possible null-pointer dereference in ceph_mdsmap_decode()
        ceph: correctly handle releasing an embedded cap flush
    • Revert "net: really fix the build..." · 9ebc2758
      Kalle Valo authored
      This reverts commit ce78ffa3.
      
      Wren and Nicolas reported that ath11k was failing to initialise the
      QCA6390 Wi-Fi 6 device with the error:
      
      qcom_mhi_qrtr: probe of mhi0_IPCR failed with error -22
      
      Commit ce78ffa3 ("net: really fix the build..."), introduced in
      v5.14-rc5, caused this regression in qrtr. Most likely all ath11k
      devices are broken, but I only tested QCA6390. Let's revert the broken
      commit so that ath11k works again.
      
      Reported-by: Wren Turkal <wt@penguintechs.org>
      Reported-by: Nicolas Schichan <nschichan@freebox.fr>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20210826172816.24478-1-kvalo@codeaurora.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>