1. 21 Mar, 2022 20 commits
  2. 03 Mar, 2022 15 commits
  3. 17 Feb, 2022 5 commits
      mm/thp: shrink_page_list() avoid splitting VM_LOCKED THP · 47d4f3ee
      Hugh Dickins authored
      4.8 commit 7751b2da ("vmscan: split file huge pages before paging
      them out") inserted a split_huge_page_to_list() into shrink_page_list()
      without considering the mlock case: no problem if the page has already
      been marked as Mlocked (the !page_evictable check much higher up will
      have skipped all this), but it has always been the case that races or
      omissions in setting Mlocked can rely on page reclaim to detect this
      and correct it before actually reclaiming - and that remains so, but
      what a shame if a hugepage is needlessly split before discovering it.
      
      It is surprising that page_check_references() returns PAGEREF_RECLAIM
      when VM_LOCKED, but there was a good reason for that: try_to_unmap_one()
      is where the condition is detected and corrected; and until now it could
      not be done in page_referenced_one(), because that does not always have
      the page locked.  Now that mlock's requirement for page lock has gone,
      copy try_to_unmap_one()'s mlock restoration into page_referenced_one(),
      and let page_check_references() return PAGEREF_ACTIVATE in this case.
      
      But page_referenced_one() may find a pte mapping one part of a hugepage:
      what hold should a pte mapped in a VM_LOCKED area exert over the entire
      huge page?  That's debatable.  The approach taken here is to treat that
      pte mapping in page_referenced_one() as if not VM_LOCKED, and if no
      VM_LOCKED pmd mapping is found later in the walk, and lack of reference
      permits, then PAGEREF_RECLAIM takes it on to attempted splitting as before.
      
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
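The decision described in this commit message can be sketched as a small userspace model. This is not the kernel code: `struct mapping_model`, `check_references_model()` and the reduced handling of referenced bits are assumptions made for illustration, and only the VM_LOCKED rule (a pte mapping of a huge page is treated as if not VM_LOCKED) is taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's PAGEREF_* results. */
enum pageref { PAGEREF_RECLAIM, PAGEREF_KEEP, PAGEREF_ACTIVATE };

/* Hypothetical model of one mapping visited by the rmap walk. */
struct mapping_model {
    bool vm_locked;   /* the vma has VM_LOCKED set */
    bool pte_mapped;  /* true: a pte maps part of the page; false: a pmd maps it whole */
    bool referenced;  /* the accessed bit was found set */
};

static enum pageref check_references_model(bool page_is_huge,
                                           const struct mapping_model *maps,
                                           size_t nr_maps)
{
    bool referenced = false;

    for (size_t i = 0; i < nr_maps; i++) {
        const struct mapping_model *m = &maps[i];

        /*
         * A VM_LOCKED mapping activates the page, except that a pte
         * mapping of a huge page is treated as if it were not VM_LOCKED.
         */
        if (m->vm_locked && (!page_is_huge || !m->pte_mapped))
            return PAGEREF_ACTIVATE;
        if (m->referenced)
            referenced = true;
    }

    /* Assumption: any reference keeps the page; the kernel's rules are subtler. */
    return referenced ? PAGEREF_KEEP : PAGEREF_RECLAIM;
}
```

In this model a huge page mapped only by ptes in a VM_LOCKED area still reaches PAGEREF_RECLAIM (and so the attempted split), while a single VM_LOCKED pmd mapping anywhere in the walk is enough to activate it.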
      mm/thp: collapse_file() do try_to_unmap(TTU_BATCH_FLUSH) · 6d9df8a5
      Hugh Dickins authored
      collapse_file() is using unmap_mapping_pages(1) on each small page found
      mapped, unlike others (reclaim, migration, splitting, memory-failure) who
      use try_to_unmap().  There are four advantages to try_to_unmap(): first,
      its TTU_IGNORE_MLOCK option now avoids leaving mlocked page in pagevec;
      second, its vma lookup uses i_mmap_lock_read() not i_mmap_lock_write();
      third, it breaks out early if page is not mapped everywhere it might be;
      fourth, its TTU_BATCH_FLUSH option can be used, as in page reclaim, to
      save up all the TLB flushing until all of the pages have been unmapped.
      
      Wild guess: perhaps it was originally written to use try_to_unmap(),
      but hit the VM_BUG_ON_PAGE(page_mapped) after unmapping, because without
      TTU_SYNC it may skip page table locks; but unmap_mapping_pages() never
      skips them, so fixed the issue.  I did once hit that VM_BUG_ON_PAGE()
      since making this change: we could pass TTU_SYNC here, but I think just
      delete the check - the race is very rare, this is an ordinary small page
      so we don't need to be so paranoid about mapcount surprises, and the
      page_ref_freeze() just below already handles the case adequately.
      
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
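The fourth advantage listed in this commit message, deferring TLB flushes until every page has been unmapped, can be illustrated with a userspace counting model. Everything here (`struct tlb_model`, `unmap_pages_model()`) is hypothetical and only counts flush operations; it does not reflect how the kernel implements TTU_BATCH_FLUSH.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: counts how many TLB flushes an unmap pass costs. */
struct tlb_model {
    unsigned long flushes;  /* number of flush operations issued */
    bool flush_owed;        /* a deferred flush is still outstanding */
};

static void tlb_flush_now(struct tlb_model *tlb)
{
    tlb->flushes++;
    tlb->flush_owed = false;
}

/* Unmap one page: flush immediately, or just record that a flush is owed. */
static void unmap_one_model(struct tlb_model *tlb, bool batch_flush)
{
    /* ... the page table entry would be cleared here ... */
    if (batch_flush)
        tlb->flush_owed = true;
    else
        tlb_flush_now(tlb);
}

/* Unmap nr_pages pages and return the total number of flushes issued. */
static unsigned long unmap_pages_model(size_t nr_pages, bool batch_flush)
{
    struct tlb_model tlb = { 0, false };

    for (size_t i = 0; i < nr_pages; i++)
        unmap_one_model(&tlb, batch_flush);

    /* With batching, a single flush at the end covers every page. */
    if (tlb.flush_owed)
        tlb_flush_now(&tlb);

    return tlb.flushes;
}
```

Under these assumptions, unmapping eight pages costs eight flushes without batching and one flush with it.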
      mm/munlock: page migration needs mlock pagevec drained · b7435507
      Hugh Dickins authored
      Page migration of a VM_LOCKED page tends to fail, because when the old
      page is unmapped, it is put on the mlock pagevec with raised refcount,
      which then fails the freeze.
      
      At first I thought this would be fixed by a local mlock_page_drain() at
      the upper rmap_walk() level - which would have nicely batched all the
      munlocks of that page; but tests show that the task can too easily move
      to another cpu, leaving pagevec residue behind which fails the migration.
      
      So have try_to_migrate_one() drain the local pagevec after page_remove_rmap()
      from a VM_LOCKED vma; and do the same in try_to_unmap_one(), whose
      TTU_IGNORE_MLOCK users would want the same treatment; and do the same
      in remove_migration_pte() - not important when successfully inserting
      a new page, but necessary when hoping to retry after failure.
      
      Any new pagevec runs the risk of adding a new way of stranding, and we
      might discover other corners where mlock_page_drain() or lru_add_drain()
      would now help.
      
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
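The failure mode in this commit message (a reference held by the pagevec defeating the refcount freeze) can be reproduced in a userspace sketch. The types and helpers below are hypothetical models, not kernel APIs; the freeze is modelled as a compare-and-exchange that succeeds only when the refcount equals the expected value, which is an assumption about the general shape of such a check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a page with a refcount and pagevec-held references. */
struct page_model {
    atomic_int refcount;
    int pagevec_refs;  /* references held on behalf of a pending pagevec */
};

/* Freeze succeeds only if refcount == expected; it then drops to zero. */
static bool ref_freeze_model(struct page_model *page, int expected)
{
    int old = expected;

    return atomic_compare_exchange_strong(&page->refcount, &old, 0);
}

/* Queueing the page on a pagevec takes an extra reference. */
static void pagevec_add_model(struct page_model *page)
{
    atomic_fetch_add(&page->refcount, 1);
    page->pagevec_refs++;
}

/* Draining the pagevec releases the references it was holding. */
static void pagevec_drain_model(struct page_model *page)
{
    atomic_fetch_sub(&page->refcount, page->pagevec_refs);
    page->pagevec_refs = 0;
}
```

In this model, a page whose only expected reference is the caller's cannot be frozen while it sits on the pagevec; draining first lets the same freeze succeed.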
      mm/munlock: mlock_page() munlock_page() batch by pagevec · 2fbb0c10
      Hugh Dickins authored
      A weakness of the page->mlock_count approach is the need for lruvec lock
      while holding page table lock.  That is not an overhead we would allow on
      normal pages, but I think acceptable just for pages in an mlocked area.
      But let's try to amortize the extra cost by gathering on per-cpu pagevec
      before acquiring the lruvec lock.
      
      I have an unverified conjecture that the mlock pagevec might work out
      well for delaying the mlock processing of new file pages until they have
      got off lru_cache_add()'s pagevec and on to LRU.
      
      The initialization of page->mlock_count is subject to races and awkward:
      0 or !!PageMlocked or 1?  Was it wrong even in the implementation before
      this commit, which just widens the window?  I haven't gone back to think
      it through.  Maybe someone can point out a better way to initialize it.
      
      Bringing lru_cache_add_inactive_or_unevictable()'s mlock initialization
      into mm/mlock.c has helped: mlock_new_page(), using the mlock pagevec,
      rather than lru_cache_add()'s pagevec.
      
      Experimented with various orderings: the right thing seems to be for
      mlock_page() and mlock_new_page() to TestSetPageMlocked before adding to
      pagevec, but munlock_page() to leave TestClearPageMlocked to the later
      pagevec processing.
      
      Dropped the VM_BUG_ON_PAGE(PageTail)s this time around: they have made
      their point, and the thp_nr_page()s already contain a VM_BUG_ON_PGFLAGS()
      for that.
      
      This still leaves acquiring lruvec locks under page table lock each time
      the pagevec fills (or a THP is added): which I suppose is rather silly,
      since they sit on pagevec waiting to be processed long after page table
      lock has been dropped; but I'm disinclined to uglify the calling sequence
      until some load shows an actual problem with it (nothing wrong with
      taking lruvec lock under page table lock, just "nicer" to do it less).
      
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
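The batching idea and the flag ordering described in this commit message can be sketched in userspace. This is a model under stated assumptions, not the kernel implementation: `BATCH_SIZE`, `struct batch_model` and the `lock_acquisitions` counter are invented for illustration, and the munlock rule (decrement the count, clear the flag once it reaches zero) is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_SIZE 4  /* arbitrary size chosen for this sketch */

/* Hypothetical model of the per-page mlock state. */
struct page_model {
    bool mlocked;     /* stands in for the Mlocked page flag */
    int mlock_count;  /* stands in for page->mlock_count */
};

enum batch_op { OP_MLOCK, OP_MUNLOCK };

struct batch_model {
    struct page_model *pages[BATCH_SIZE];
    enum batch_op ops[BATCH_SIZE];
    size_t nr;
    unsigned long lock_acquisitions;  /* how often the shared lock was taken */
};

/* Process every queued page under a single lock acquisition. */
static void batch_drain(struct batch_model *b)
{
    if (b->nr == 0)
        return;

    b->lock_acquisitions++;  /* one lock round trip covers the whole batch */
    for (size_t i = 0; i < b->nr; i++) {
        struct page_model *page = b->pages[i];

        if (b->ops[i] == OP_MLOCK) {
            page->mlock_count++;
        } else {
            /* munlock: the flag is cleared only here, at processing time */
            if (page->mlock_count > 0)
                page->mlock_count--;
            if (page->mlock_count == 0)
                page->mlocked = false;
        }
    }
    b->nr = 0;
}

/* Queue an operation; mlock sets the flag before queueing, munlock defers. */
static void batch_add(struct batch_model *b, struct page_model *page,
                      enum batch_op op)
{
    if (op == OP_MLOCK)
        page->mlocked = true;

    b->pages[b->nr] = page;
    b->ops[b->nr] = op;
    b->nr++;

    if (b->nr == BATCH_SIZE)
        batch_drain(b);
}
```

With a batch size of four, queueing eight mlock operations takes the modelled lock twice instead of eight times, and a queued munlock leaves the flag set until the batch is drained.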
      mm/munlock: delete smp_mb() from __pagevec_lru_add_fn() · 2262ace6
      Hugh Dickins authored
      My reading of comment on smp_mb__after_atomic() in __pagevec_lru_add_fn()
      says that it can now be deleted; and that remains so when the next patch
      is added.
      
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>