    x86: fix pmd_bad and pud_bad to support huge pages · cded932b
    Hans Rosenfeld authored
    I recently stumbled upon a problem in the huge page support. If a
    program using huge pages does not explicitly unmap them, they remain
    mapped (and are therefore lost) after the program exits.
    
    I observed that the free huge page count in /proc/meminfo decreased
    while my program was running and did not increase again after it
    exited. After running the program a few times, no more huge pages could
    be allocated.
    
    The reason for this seems to be that the x86 pmd_bad() and pud_bad()
    consider pmd/pud entries with the PSE bit set to be invalid. I think
    there is nothing wrong with this bit being set; it just indicates that
    the lowest level of translation has been reached. This bit has to be
    (and is) checked after the basic validity of the entry has been
    checked, as in this fragment from follow_page() in mm/memory.c:
    
      if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
              goto no_page_table;
    
      if (pmd_huge(*pmd)) {
              BUG_ON(flags & FOLL_GET);
              page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
              goto out;
      }
    
    Note that this code currently does not work as intended if the pmd
    refers to a huge page: because pmd_bad() returns true for such an
    entry, the code jumps to no_page_table and the pmd_huge() check is
    never reached.
    
    Extending pmd_bad() (and, for future 1GB page support, pud_bad()) to
    allow the PSE bit to be set fixes this. For similar reasons, allowing
    the NX bit to be set is necessary too: I have seen huge pages with the
    NX bit set in their pmd entry, which would cause the same problem.

    Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>