    mm: gup: fix the fast GUP race against THP collapse · 70cbc3cc
    Yang Shi authored
    Since general RCU GUP fast was introduced in commit 2667f50e ("mm:
    introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer
    sufficient to handle concurrent GUP-fast in all cases; it only handles
    traditional IPI-based GUP-fast correctly.  On architectures that send an
    IPI broadcast on TLB flush, it works as expected.  But on architectures
    that do not use an IPI to broadcast the TLB flush, the following race is
    possible:
    
       CPU A                                          CPU B
    THP collapse                                     fast GUP
                                                  gup_pmd_range() <-- see valid pmd
                                                      gup_pte_range() <-- work on pte
    pmdp_collapse_flush() <-- clear pmd and flush
    __collapse_huge_page_isolate()
        check page pinned <-- before GUP bump refcount
                                                          pin the page
                                                          check PTE <-- no change
    __collapse_huge_page_copy()
        copy data to huge page
        ptep_clear()
    install huge pmd for the huge page
                                                          return the stale page
    discard the stale page
    
    The race can be fixed by checking whether the PMD has changed after
    taking the page pin in fast GUP, just as is already done for the PTE.  If
    the PMD has changed, a parallel THP collapse may be in progress, so GUP
    should back off.
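
    The pin-then-revalidate idea can be illustrated with a small user-space
    model.  This is only a sketch under stated assumptions, not the kernel
    change itself: every toy_* name is hypothetical, the "page tables" are
    plain C11 atomics, and locking, RCU and TLB flushing are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a page and its page-table entries. */
struct toy_page {
	atomic_int refcount;
};

struct toy_mm {
	_Atomic uint64_t pmd;	/* models the value behind pmdp */
	_Atomic uint64_t pte;	/* models the value behind ptep */
	struct toy_page page;
};

/* Drop one reference taken by toy_gup_fast_one(). */
static void toy_unpin(struct toy_page *page)
{
	int old = atomic_fetch_sub(&page->refcount, 1);

	assert(old > 0);
	(void)old;	/* silence unused warning when NDEBUG is set */
}

/*
 * Models the collapse side clearing the PMD (the role played by
 * pmdp_collapse_flush() in the diagram above).
 */
static void toy_collapse_clear_pmd(struct toy_mm *mm)
{
	atomic_store(&mm->pmd, 0);
}

/*
 * Models the fast-GUP side.  pmd_snap and pte_snap are the values the
 * walker read before pinning.  The pin is taken first, then both entries
 * are re-read; if either changed, the pin is dropped and the caller is
 * told to back off.
 */
static bool toy_gup_fast_one(struct toy_mm *mm, uint64_t pmd_snap,
			     uint64_t pte_snap)
{
	atomic_fetch_add(&mm->page.refcount, 1);	/* pin the page */

	if (atomic_load(&mm->pmd) != pmd_snap ||
	    atomic_load(&mm->pte) != pte_snap) {
		toy_unpin(&mm->page);			/* back off */
		return false;
	}
	return true;
}
```

    In this model, a walker that snapshots both entries, then loses the race
    to toy_collapse_clear_pmd(), gets false back and leaves the refcount
    unchanged, even though the PTE value it saw is still in place.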
    
    Also update the stale comment about serializing against fast GUP in
    khugepaged.
    
    Link: https://lkml.kernel.org/r/20220907180144.555485-1-shy828301@gmail.com
    Fixes: 2667f50e ("mm: introduce a general RCU get_user_pages_fast()")
    Acked-by: David Hildenbrand <david@redhat.com>
    Acked-by: Peter Xu <peterx@redhat.com>
    Signed-off-by: Yang Shi <shy828301@gmail.com>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>