1. 31 Dec, 2021 7 commits
    • Mel Gorman's avatar
      mm: vmscan: Reduce throttling due to a failure to make progress · 1b4e3f26
      Mel Gorman authored
      Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar
      problems due to reclaim throttling for excessive lengths of time.  In
      Alexey's case, a memory hog that should go OOM quickly stalls for
      several minutes before stalling.  In Mike and Darrick's cases, a small
      memcg environment stalled excessively even though the system had enough
      memory overall.
      
      Commit 69392a40 ("mm/vmscan: throttle reclaim when no progress is
      being made") introduced the problem although commit a19594ca
      ("mm/vmscan: increase the timeout if page reclaim is not making
      progress") made it worse.  Systems at or near an OOM state that cannot
      be recovered must reach OOM quickly and memcg should kill tasks if a
      memcg is near OOM.
      
      To address this, only stall for the first zone in the zonelist, reduce
      the timeout to 1 tick for VMSCAN_THROTTLE_NOPROGRESS and only stall if
      the scan control nr_reclaimed is 0, kswapd is still active and there
      were excessive pages pending for writeback.  If kswapd has stopped
      reclaiming due to excessive failures, do not stall at all so that OOM
      triggers relatively quickly.  Similarly, if an LRU is simply congested,
      only lightly throttle similar to NOPROGRESS.
      
      Alexey's original case was the most straight forward
      
      	for i in {1..3}; do tail /dev/zero; done
      
      On vanilla 5.16-rc1, this test stalled heavily, after the patch the test
      completes in a few seconds similar to 5.15.
      
      Alexey's second test case added watching a youtube video while tail runs
      10 times.  On 5.15, playback only jitters slightly, 5.16-rc1 stalls a
      lot with lots of frames missing and numerous audio glitches.  With this
      patch applies, the video plays similarly to 5.15.
      
      [lkp@intel.com: Fix W=1 build warning]
      
      Link: https://lore.kernel.org/r/99e779783d6c7fce96448a3402061b9dc1b3b602.camel@gmx.de
      Link: https://lore.kernel.org/r/20211124011954.7cab9bb4@mail.inbox.lv
      Link: https://lore.kernel.org/r/20211022144651.19914-1-mgorman@techsingularity.net
      Link: https://lore.kernel.org/r/20211202150614.22440-1-mgorman@techsingularity.net
      Link: https://linux-regtracking.leemhuis.info/regzbot/regression/20211124011954.7cab9bb4@mail.inbox.lv/Reported-and-tested-by: default avatarAlexey Avramov <hakavlad@inbox.lv>
      Reported-and-tested-by: default avatarMike Galbraith <efault@gmx.de>
      Reported-and-tested-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Tracked-by: default avatarThorsten Leemhuis <regressions@leemhuis.info>
      Fixes: 69392a40 ("mm/vmscan: throttle reclaim when no progress is being made")
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1b4e3f26
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · f87bcc88
      Linus Torvalds authored
      Merge misc mm fixes from Andrew Morton:
       "2 patches.
      
        Subsystems affected by this patch series: mm (userfaultfd and damon)"
      
      * akpm:
        mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()'
        userfaultfd/selftests: fix hugetlb area allocations
      f87bcc88
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · e46227bf
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Three fixes, all in drivers. The lpfc one doesn't look exploitable,
        but nasty things could happen in string operations if mybuf ends up
        with an on stack unterminated string"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: vmw_pvscsi: Set residual data length conditionally
        scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown()
        scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write()
      e46227bf
    • SeongJae Park's avatar
      mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()' · ebb3f994
      SeongJae Park authored
      DAMON debugfs interface increases the reference counts of 'struct pid's
      for targets from the 'target_ids' file write callback
      ('dbgfs_target_ids_write()'), but decreases the counts only in DAMON
      monitoring termination callback ('dbgfs_before_terminate()').
      
      Therefore, when 'target_ids' file is repeatedly written without DAMON
      monitoring start/termination, the reference count is not decreased and
      therefore memory for the 'struct pid' cannot be freed.  This commit
      fixes this issue by decreasing the reference counts when 'target_ids' is
      written.
      
      Link: https://lkml.kernel.org/r/20211229124029.23348-1-sj@kernel.org
      Fixes: 4bc05954 ("mm/damon: implement a debugfs-based user space interface")
      Signed-off-by: default avatarSeongJae Park <sj@kernel.org>
      Cc: <stable@vger.kernel.org>	[5.15+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ebb3f994
    • Mike Kravetz's avatar
      userfaultfd/selftests: fix hugetlb area allocations · f5c73297
      Mike Kravetz authored
      Currently, userfaultfd selftest for hugetlb as run from run_vmtests.sh
      or any environment where there are 'just enough' hugetlb pages will
      always fail with:
      
        testing events (fork, remap, remove):
      		ERROR: UFFDIO_COPY error: -12 (errno=12, line=616)
      
      The ENOMEM error code implies there are not enough hugetlb pages.
      However, there are free hugetlb pages but they are all reserved.  There
      is a basic problem with the way the test allocates hugetlb pages which
      has existed since the test was originally written.
      
      Due to the way 'cleanup' was done between different phases of the test,
      this issue was masked until recently.  The issue was uncovered by commit
      8ba6e864 ("userfaultfd/selftests: reinitialize test context in each
      test").
      
      For the hugetlb test, src and dst areas are allocated as PRIVATE
      mappings of a hugetlb file.  This means that at mmap time, pages are
      reserved for the src and dst areas.  At the start of event testing (and
      other tests) the src area is populated which results in allocation of
      huge pages to fill the area and consumption of reserves associated with
      the area.  Then, a child is forked to fault in the dst area.  Note that
      the dst area was allocated in the parent and hence the parent owns the
      reserves associated with the mapping.  The child has normal access to
      the dst area, but can not use the reserves created/owned by the parent.
      Thus, if there are no other huge pages available allocation of a page
      for the dst by the child will fail.
      
      Fix by not creating reserves for the dst area.  In this way the child
      can use free (non-reserved) pages.
      
      Also, MAP_PRIVATE of a file only makes sense if you are interested in
      the contents of the file before making a COW copy.  The test does not do
      this.  So, just use MAP_ANONYMOUS | MAP_HUGETLB to create an anonymous
      hugetlb mapping.  There is no need to create a hugetlb file in the
      non-shared case.
      
      Link: https://lkml.kernel.org/r/20211217172919.7861-1-mike.kravetz@oracle.comSigned-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f5c73297
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-2021-12-31' of git://anongit.freedesktop.org/drm/drm · 4f3d93c6
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "This is a bit bigger than I'd like, however it has two weeks of amdgpu
        fixes in it, since they missed last week, which was very small.
      
        The nouveau regression is probably the biggest fix in here, and it
        needs to go into 5.15 as well, two i915 fixes, and then a scattering
        of amdgpu fixes. The biggest fix in there is for a fencing NULL
        pointer dereference, the rest are pretty minor.
      
        For the misc team, I've pulled the two misc fixes manually since I'm
        not sure what is happening at this time of year!
      
        The amdgpu maintainers have the outstanding runpm regression to fix
        still, they are just working through the last bits of it now.
      
        Summary:
      
        nouveau:
         - fencing regression fix
      
        i915:
         - Fix possible uninitialized variable
         - Fix composite fence seqno icrement on each fence creation
      
        amdgpu:
         - Fencing fix
         - XGMI fix
         - VCN regression fix
         - IP discovery regression fixes
         - Fix runpm documentation
         - Suspend/resume fixes
         - Yellow Carp display fixes
         - MCLK power management fix
         - dma-buf fix"
      
      * tag 'drm-fixes-2021-12-31' of git://anongit.freedesktop.org/drm/drm:
        drm/amd/display: Changed pipe split policy to allow for multi-display pipe split
        drm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config
        drm/amd/display: Set optimize_pwr_state for DCN31
        drm/amd/display: Send s0i2_rdy in stream_count == 0 optimization
        drm/amd/display: Added power down for DCN10
        drm/amd/display: fix B0 TMDS deepcolor no dislay issue
        drm/amdgpu: no DC support for headless chips
        drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
        drm/amdgpu: always reset the asic in suspend (v2)
        drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume
        drm/i915: Increment composite fence seqno
        drm/i915: Fix possible uninitialized variable in parallel extension
        drm/amdgpu: fix runpm documentation
        drm/nouveau: wait for the exclusive fence after the shared ones v2
        drm/amdgpu: add support for IP discovery gc_info table v2
        drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
        drm/amd/pm: Fix xgmi link control on aldebaran
        drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fence
        drm/amdgpu: fix dropped backing store handling in amdgpu_dma_buf_move_notify
      4f3d93c6
    • Dave Airlie's avatar
      Merge branch 'drm-misc-fixes' of ssh://git.freedesktop.org/git/drm/drm-misc into drm-fixes · ce9b333c
      Dave Airlie authored
      This merges two fixes that haven't been sent to me yet, but I wanted to get in.
      
      One amdgpu fix, but one nouveau regression fixer.
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      ce9b333c
  2. 30 Dec, 2021 15 commits
  3. 29 Dec, 2021 10 commits
  4. 28 Dec, 2021 8 commits