• David Hildenbrand's avatar
    mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly · 631426ba
    David Hildenbrand authored
    Darrick reports that in some cases where pread() would fail with -EIO and
    mmap()+access would generate a SIGBUS signal, MADV_POPULATE_READ /
    MADV_POPULATE_WRITE will keep retrying forever and not fail with -EFAULT.
    
    While the madvise() call can be interrupted by a signal, this is not the
    desired behavior.  MADV_POPULATE_READ / MADV_POPULATE_WRITE should behave
    like page faults in that case: fail and not retry forever.
    
    A reproducer can be found at [1].
    
    The reason is that __get_user_pages(), as called by
    faultin_vma_page_range(), will not handle VM_FAULT_RETRY in a proper way:
    it will simply return 0 when VM_FAULT_RETRY happened, making
    madvise_populate()->faultin_vma_page_range() retry again and again, never
    setting FOLL_TRIED->FAULT_FLAG_TRIED for __get_user_pages().
    
    __get_user_pages_locked() does what we want, but duplicating that logic in
    faultin_vma_page_range() feels wrong.
    
    So let's use __get_user_pages_locked() instead, that will detect
    VM_FAULT_RETRY and set FOLL_TRIED when retrying, making the fault handler
    return VM_FAULT_SIGBUS (VM_FAULT_ERROR) at some point, propagating -EFAULT
    from faultin_page() to __get_user_pages(), all the way to
    madvise_populate().
    
    But, there is an issue: __get_user_pages_locked() will end up re-taking
    the MM lock and then __get_user_pages() will do another VMA lookup.  In
    the meantime, the VMA layout could have changed and we'd fail with
    different error codes than we'd want to.
    
    As __get_user_pages() will currently do a new VMA lookup either way, let
    it do the VMA handling in a different way, controlled by a new
    FOLL_MADV_POPULATE flag, effectively moving these checks from
    madvise_populate() + faultin_page_range() in there.
    
    With this change, Darricks reproducer properly fails with -EFAULT, as
    documented for MADV_POPULATE_READ / MADV_POPULATE_WRITE.
    
    [1] https://lore.kernel.org/all/20240313171936.GN1927156@frogsfrogsfrogs/
    
    Link: https://lkml.kernel.org/r/20240314161300.382526-1-david@redhat.com
    Link: https://lkml.kernel.org/r/20240314161300.382526-2-david@redhat.com
    Fixes: 4ca9b385 ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables")
    Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
    Reported-by: default avatarDarrick J. Wong <djwong@kernel.org>
    Closes: https://lore.kernel.org/all/20240311223815.GW1927156@frogsfrogsfrogs/
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    631426ba
internal.h 42.1 KB