    mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT · d74943a2
    David Hildenbrand authored
    Unfortunately commit 474098ed ("mm/gup: replace FOLL_NUMA by
    gup_can_follow_protnone()") missed that follow_page() and
    follow_trans_huge_pmd() never implicitly set FOLL_NUMA because they really
    don't want to fail on PROT_NONE-mapped pages -- either due to NUMA hinting
    or due to inaccessible (PROT_NONE) VMAs.
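
    For context, the implicit setting that got lost looked roughly like
    this in __get_user_pages() before commit 474098ed (a paraphrased
    sketch, not the verbatim pre-commit code):

	/*
	 * Ordinary GUP implicitly honored NUMA hinting faults, but
	 * FOLL_FORCE did not: FOLL_FORCE has to make progress even on
	 * inaccessible (PROT_NONE) VMAs. follow_page() and
	 * follow_trans_huge_pmd() never set FOLL_NUMA at all.
	 */
	if (!(gup_flags & FOLL_FORCE))
		gup_flags |= FOLL_NUMA;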
    
    As spelled out in commit 0b9d7052 ("mm: numa: Support NUMA hinting
    page faults from gup/gup_fast"): "Other follow_page callers like KSM
    should not use FOLL_NUMA, or they would fail to get the pages if they use
    follow_page instead of get_user_pages."
    
    liubo reported [1] that smaps_rollup results are imprecise, because they
    miss accounting of pages that are mapped PROT_NONE.  Further, it's easy to
    reproduce that KSM no longer works on inaccessible VMAs on x86-64, because
    pte_protnone()/pmd_protnone() also indicate "true" in inaccessible VMAs,
    and follow_page() refuses to return such pages right now.
    
    As KVM really depends on these NUMA hinting faults, removing the
    pte_protnone()/pmd_protnone() handling in GUP code completely is not
    really an option.
    
    To fix the issues at hand, let's revive FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
    to restore the original behavior for now and add better comments.
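
    The revived flag, including its documentation, looks roughly like
    this in the FOLL_* enum (the exact bit value here is illustrative):

	/*
	 * FOLL_HONOR_NUMA_FAULT: don't follow PROT_NONE-mapped pages
	 * that would require a NUMA hinting fault in an accessible VMA;
	 * make the caller trigger the fault instead. Evaluated in
	 * gup_can_follow_protnone().
	 */
	FOLL_HONOR_NUMA_FAULT = 1 << 12,	/* bit value illustrative */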
    
    Set FOLL_HONOR_NUMA_FAULT independent of FOLL_FORCE in
    is_valid_gup_args(), to add that flag for all external GUP users.
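
    In sketch form (simplified -- the real helper performs additional
    flag validation and merging):

	static bool is_valid_gup_args(struct page **pages, int *locked,
				      unsigned int *gup_flags_p,
				      unsigned int to_set)
	{
		unsigned int gup_flags = *gup_flags_p | to_set;

		/*
		 * Honor NUMA hinting faults for all external GUP users,
		 * independent of FOLL_FORCE. Whether a fault actually
		 * has to be triggered is decided later, per VMA, in
		 * gup_can_follow_protnone().
		 */
		gup_flags |= FOLL_HONOR_NUMA_FAULT;

		/* ... further flag sanity checks ... */

		*gup_flags_p = gup_flags;
		return true;
	}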
    
    Note that there are three GUP-internal __get_user_pages() users that don't
    end up calling is_valid_gup_args() and consequently won't get
    FOLL_HONOR_NUMA_FAULT set.
    
    1) get_dump_page(): we really don't want to handle NUMA hinting
       faults. It specifies FOLL_FORCE and wouldn't have honored NUMA
       hinting faults already.
    2) populate_vma_page_range(): we really don't want to handle NUMA hinting
       faults. It specifies FOLL_FORCE on accessible VMAs, so it wouldn't have
       honored NUMA hinting faults already.
    3) faultin_vma_page_range(): we similarly don't want to handle NUMA
       hinting faults.
    
    To make the combination of FOLL_FORCE and FOLL_HONOR_NUMA_FAULT work in
    inaccessible VMAs properly, we have to perform VMA accessibility checks in
    gup_can_follow_protnone().
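
    Conceptually, the reworked helper then reads (a sketch of
    gup_can_follow_protnone(), not the exact diff):

	static inline bool gup_can_follow_protnone(struct vm_area_struct *vma,
						   unsigned int flags)
	{
		/*
		 * Callers that don't honor NUMA hinting faults may
		 * follow PROT_NONE-mapped pages unconditionally.
		 */
		if (!(flags & FOLL_HONOR_NUMA_FAULT))
			return true;

		/*
		 * NUMA hinting faults only apply in accessible VMAs. In
		 * inaccessible (PROT_NONE) VMAs, pte_protnone() is not a
		 * NUMA hint, and handle_mm_fault() would refuse to
		 * process a hinting fault anyway -- FOLL_FORCE must be
		 * able to make progress here.
		 */
		return !vma_is_accessible(vma);
	}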
    
    As GUP-fast should reject such pages either way in
    pte_access_permitted()/pmd_access_permitted() -- for example on x86-64 and
    arm64, which both implement pte_protnone() -- let's just always fall back
    to
    ordinary GUP when stumbling over pte_protnone()/pmd_protnone().
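
    In the GUP-fast PTE path that amounts to (sketched; the PMD path is
    analogous):

	/*
	 * GUP-fast walks page tables without the VMA, so it cannot
	 * tell whether pte_protnone() means "NUMA hint" or
	 * "inaccessible VMA". Always fall back to ordinary GUP;
	 * pte_access_permitted() should reject such PTEs anyway on
	 * architectures that implement pte_protnone().
	 */
	if (pte_protnone(pte))
		goto pte_unmap;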
    
    As Linus notes [2], honoring NUMA faults might only make sense for
    selected GUP users.
    
    So we should really see if we can instead let relevant GUP callers specify
    it manually, and not trigger NUMA hinting faults from GUP by default.
    Prepare for that by making FOLL_HONOR_NUMA_FAULT an external GUP flag and
    adding appropriate documentation.
    
    While at it, remove a stale comment from follow_trans_huge_pmd(): That
    comment for pmd_protnone() was added in commit 2b4847e7 ("mm: numa:
    serialise parallel get_user_page against THP migration"), which noted:
    
    	THP does not unmap pages due to a lack of support for migration
    	entries at a PMD level.  This allows races with get_user_pages
    
    Nowadays, we do have PMD migration entries, so the comment no longer
    applies.  Let's drop it.
    
    [1] https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com
    [2] https://lore.kernel.org/r/CAHk-=wgRiP_9X0rRdZKT8nhemZGNateMtb366t37d8-x7VRs=g@mail.gmail.com
    
    Link: https://lkml.kernel.org/r/20230803143208.383663-2-david@redhat.com
    Fixes: 474098ed ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reported-by: liubo <liubo254@huawei.com>
    Closes: https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com
    Reported-by: Peter Xu <peterx@redhat.com>
    Closes: https://lore.kernel.org/all/ZMKJjDaqZ7FW0jfe@x1n/
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Peter Xu <peterx@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>