    [PATCH] Infrastructure for correct hugepage refcounting · eefb08ee
    Andrew Morton authored
    We currently have a problem when things like ptrace, futexes and direct-io
    try to pin user pages.  If the user's address is in a huge page we're
    elevating the refcount of a constituent 4k page, not the head page of the
    high-order allocation unit.
    
    To solve this, a generic way of handling higher-order pages has been
    implemented:
    
    - A higher-order page is called a "compound page".  Chose this because
      "huge page", "large page", "super page", etc all seem to mean different
      things to different people.
    
    - The first (controlling) 4k page of a compound page is referred to as the
      "head" page.
    
    - The remaining pages are tail pages.
    
    All pages of a compound page have PG_compound set.  All of them have their
    lru.next pointing at the head page (even the head page has this).
    
    The head page's lru.prev, if non-zero, holds the address of the compound
    page's put_page() function.
    
    The order of the allocation is stored in the first tail page's lru.prev.
    This is only for debug at present.  This usage means that zero-order pages
    may not be compound.
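
The layout described above can be sketched as a small userspace model.  This
is an illustration only, not the code from this patch: struct page_model,
prep_compound(), compound_head(), get_page_model(), put_page_model() and
note_free() are hypothetical names, and a union stands in for the way lru.prev
is overloaded so that the model stays within standard C.

```c
#include <assert.h>
#include <stddef.h>

#define PG_COMPOUND 0x1UL   /* stand-in for the PG_compound flag bit */

struct page_model;
typedef void (*compound_dtor_t)(struct page_model *head);

/* Hypothetical model of the fields the commit message talks about. */
struct page_model {
    unsigned long flags;
    int count;                       /* refcount; only the head's is used */
    struct {
        struct page_model *next;     /* always points at the head page */
        union {                      /* models the overloaded lru.prev */
            compound_dtor_t dtor;    /* head page: put_page() function */
            unsigned long order;     /* first tail page: allocation order */
        } prev;
    } lru;
};

/* Set up 2^order contiguous pages as one compound page with one reference. */
static void prep_compound(struct page_model *pages, unsigned int order,
                          compound_dtor_t dtor)
{
    size_t i, nr = (size_t)1 << order;

    /* The order lives in the first tail page, so order 0 has no room. */
    assert(order > 0);

    for (i = 0; i < nr; i++) {
        pages[i].flags |= PG_COMPOUND;
        pages[i].count = 0;
        pages[i].lru.next = &pages[0];
        pages[i].lru.prev.dtor = NULL;
    }
    pages[0].lru.prev.dtor = dtor;
    pages[1].lru.prev.order = order;
    pages[0].count = 1;
}

/* Any constituent page resolves to the head page. */
static struct page_model *compound_head(struct page_model *page)
{
    return (page->flags & PG_COMPOUND) ? page->lru.next : page;
}

/* Pinning a tail page elevates the head page's refcount. */
static void get_page_model(struct page_model *page)
{
    compound_head(page)->count++;
}

/* Dropping the last reference calls the head's put_page() function, if set. */
static void put_page_model(struct page_model *page)
{
    struct page_model *head = compound_head(page);

    if (--head->count == 0 && (head->flags & PG_COMPOUND) &&
        head->lru.prev.dtor)
        head->lru.prev.dtor(head);
}

/* Example destructor for the model: just counts invocations. */
static int freed_count;
static void note_free(struct page_model *head)
{
    (void)head;
    freed_count++;
}
```

In this model, calling get_page_model() on any tail page of an order-2
compound page changes only the head page's count, which is the behaviour that
ptrace, futexes and direct-io need when they pin a user address inside a huge
page.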
    
    The above relationships are established for _all_ higher-order pages in the
    page allocator.  This has some cost, but not much: mainly another atomic op
    during fork().
    
    This functionality is only enabled if CONFIG_HUGETLB_PAGE is set, although
    it could be turned on permanently.  There's a little extra cost in
    get_page/put_page.
    
    These changes do not preclude adding compound pages to the LRU in the future
    - we can add a new page flag to the head page and then move all the
    additional data to the first tail page's lru.next, lru.prev, list.next,
    list.prev, index, private, etc.