    hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race
    Commit 87bf91d3, authored by Mike Kravetz

    hugetlbfs page faults can race with truncate and hole punch operations.
    Current code in the page fault path attempts to handle this by 'backing
    out' operations if we encounter the race.  One obvious omission in the
    current code is that it does not remove a page newly added to the page
    cache.  This is straightforward to address, but there is a more subtle
    and difficult issue of backing out hugetlb reservations.  To handle this
    correctly, the 'reservation state' before page allocation needs to be
    noted so that it can be properly backed out.  There are four distinct
    possibilities for reservation state: shared/reserved, shared/no-resv,
    private/reserved and private/no-resv.  Backing out a reservation may
    require memory allocation, which can itself fail, so that failure must
    be handled as well.
    
    Instead of writing the required complicated code for this rare
    occurrence, just eliminate the race.  i_mmap_rwsem is now held in read
    mode for the duration of page fault processing, and in write mode
    whenever i_size is modified.  As a result, truncation cannot proceed
    while page faults are being processed.  In addition, i_size cannot
    change during fault processing, so a single check suffices to ensure a
    fault is not beyond the (proposed) end of file.  Faults can
    still race with hole punch, but that race is handled by existing code
    and the use of hugetlb_fault_mutex.
    
    With this modification, checks for races with truncation in the page
    fault path can be simplified or removed.  remove_inode_hugepages no
    longer needs to take hugetlb_fault_mutex in the case of truncation.
    Comments are expanded to explain the reasoning behind the locking.
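
    The hole punch case still relies on hugetlb_fault_mutex.  In mm/hugetlb.c
    this is implemented as a table of mutexes (hugetlb_fault_mutex_table)
    selected by hugetlb_fault_mutex_hash() of the mapping and page index.  The
    sketch below is only loosely modeled on that idea: the table size, the
    multiplicative hash and every model_* name are hypothetical.  A fault and
    a hole punch on the same page hash to the same mutex and therefore
    serialize, while the truncation path skips the mutex because its caller
    is assumed to already hold the i_mmap_rwsem stand-in in write mode.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NUM_FAULT_MUTEXES 64 /* hypothetical table size */

static pthread_mutex_t model_fault_mutex_table[MODEL_NUM_FAULT_MUTEXES];

static void model_fault_mutex_init(void)
{
	for (size_t i = 0; i < MODEL_NUM_FAULT_MUTEXES; i++)
		pthread_mutex_init(&model_fault_mutex_table[i], NULL);
}

/* Hypothetical hash: the same (mapping, idx) pair always picks the same mutex. */
static size_t model_fault_mutex_hash(const void *mapping, size_t idx)
{
	uint64_t key = (uint64_t)(uintptr_t)mapping ^
		       ((uint64_t)idx * 0x9e3779b97f4a7c15ULL);

	return (size_t)(key % MODEL_NUM_FAULT_MUTEXES);
}

/* Fault side: locks the same hashed mutex a hole punch on this page would use. */
static void model_fault_page(const void *mapping, size_t idx)
{
	pthread_mutex_t *m =
		&model_fault_mutex_table[model_fault_mutex_hash(mapping, idx)];

	pthread_mutex_lock(m);
	/* ... look up or allocate the page at idx ... */
	pthread_mutex_unlock(m);
}

/*
 * Per-page step of a hypothetical remove_inode_hugepages() model.  Only hole
 * punch takes the fault mutex; for truncation the caller already holds the
 * i_mmap_rwsem stand-in in write mode, so no fault can be running.
 */
static void model_remove_page(const void *mapping, size_t idx, bool truncating)
{
	pthread_mutex_t *m = NULL;

	if (!truncating) {
		m = &model_fault_mutex_table[model_fault_mutex_hash(mapping, idx)];
		pthread_mutex_lock(m);
	}
	/* ... unmap and free the page at idx ... */
	if (m)
		pthread_mutex_unlock(m);
}
```

    Hashing into a fixed table, rather than allocating a mutex per page, keeps
    memory bounded at the cost of occasional contention between unrelated
    pages whose keys collide.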

    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
    Link: http://lkml.kernel.org/r/20200316205756.146666-3-mike.kravetz@oracle.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>