    mm: fix use-after free of page_ext after race with memory-offline · b1d5488a
    Charan Teja Kalla authored
    Below is one path where a race between page_ext access and offlining of
    the respective memory block causes a use-after-free when the page_ext
    structure is accessed.
    
    process1                              process2
    ---------                             ---------
    a) doing /proc/page_owner             doing memory offline
                                          through offline_pages().

    b) PageBuddy check fails,
       thus proceed to get the
       page_owner information
       through page_ext access.
       page_ext = lookup_page_ext(page);

                                          migrate_pages();
                                          .................
                                          Since all pages are successfully
                                          migrated as part of the offline
                                          operation, send the MEM_OFFLINE
                                          notification, where for page_ext
                                          it calls:
                                          offline_page_ext()-->
                                          __free_page_ext()-->
                                             free_page_ext()-->
                                               vfree(ms->page_ext)
                                             mem_section->page_ext = NULL

    c) Checking the PAGE_EXT flags
       in page_ext->flags results
       in a use-after-free (leading
       to translation faults).
    
    As shown above, there is no synchronization at all between page_ext
    access and its freeing during memory offline.
    
    The memory offline steps on a memory block are, roughly, as follows:

    1) Isolate all the pages.

    2) while (1)
         try to free the pages to buddy (->free_list[MIGRATE_ISOLATE]).

    3) Delete the pages from this buddy list.

    4) Then free page_ext. (Note: the struct page is still alive, as it is
       freed only during hot-remove of the memory, which frees the memmap;
       the user might never perform that step.)
    
    This design leads to a state where the struct page is alive but its
    struct page_ext is freed, even though the latter is ideally part of the
    former, as it just represents additional page_flags (see [3] for why
    this design was chosen).
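    To make the lifetime mismatch concrete, here is a minimal userspace C
    sketch under a heavily simplified model: struct section_model and the
    section_hot_add(), section_online(), section_offline() and
    section_hot_remove() helpers are hypothetical and are not kernel APIs.
    It only shows that after offline the struct page array (memmap) is still
    allocated while page_ext is already gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified model of one memory section. */
struct page { unsigned long flags; };
struct page_ext { unsigned long flags; };

struct section_model {
	struct page *memmap;       /* struct pages: live from hot-add to hot-remove */
	struct page_ext *page_ext; /* page_ext: live only while the section is online */
	bool online;
};

static void section_hot_add(struct section_model *s, size_t nr_pages)
{
	s->memmap = calloc(nr_pages, sizeof(*s->memmap));
	assert(s->memmap);
	s->page_ext = NULL;
	s->online = false;
}

static void section_online(struct section_model *s, size_t nr_pages)
{
	s->page_ext = calloc(nr_pages, sizeof(*s->page_ext));
	assert(s->page_ext);
	s->online = true;
}

/* Steps 1-3 (isolate, free to buddy, delete from buddy list) are elided. */
static void section_offline(struct section_model *s)
{
	free(s->page_ext);	/* step 4: page_ext is freed ... */
	s->page_ext = NULL;
	s->online = false;	/* ... but s->memmap stays allocated */
}

static void section_hot_remove(struct section_model *s)
{
	assert(!s->online);	/* hot-remove requires an offline section */
	free(s->memmap);
	s->memmap = NULL;
}
```

    Between section_offline() and an optional section_hot_remove(), any
    code that still reaches page_ext through a live struct page would touch
    freed memory, which is exactly the window the race above hits.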
    
    The race above is just one example, __but the problem persists in other
    paths that access page_ext->flags too (e.g. page_is_idle())__.
    
    Fix all the paths where offline races with page_ext access by
    synchronizing them with the RCU lock. This is achieved in 3 steps:
    
    1) Invalidate all the page_exts of the sections of a memory block by
       storing a flag in the LSB of mem_section->page_ext.

    2) Wait, with synchronize_rcu(), until all the existing readers finish
       working with the ->page_exts. Any parallel process that starts after
       this call will not get a page_ext, through lookup_page_ext(), for the
       block on which the offline operation is being performed.

    3) Now safely free the ->page_ext of every section of the block on
       which the offline operation is being performed.
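    The three steps can be modelled in userspace C roughly as follows. This
    is a sketch under stated assumptions, not the patch itself:
    PAGE_EXT_INVALID, struct mem_section_model, section_init(),
    page_ext_begin(), page_ext_end() and offline_section() are hypothetical
    names, and a counter of active readers stands in for RCU. Unlike real
    RCU, readers here write shared state, and a steady stream of readers
    could stall the writer's wait.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names; these are not the kernel's definitions. */
#define PAGE_EXT_INVALID ((uintptr_t)0x1)

struct page_ext { unsigned long flags; };

struct mem_section_model {
	_Atomic uintptr_t page_ext;	/* pointer bits, LSB = "invalid" flag */
};

/* Stand-in for RCU read-side tracking: a plain count of active readers. */
static atomic_int active_readers;

static void section_init(struct mem_section_model *ms, struct page_ext *pe)
{
	/* malloc()ed memory is at least 2-byte aligned, so the LSB is free. */
	assert(((uintptr_t)pe & PAGE_EXT_INVALID) == 0);
	atomic_init(&ms->page_ext, (uintptr_t)pe);
}

/* Reader side: returns NULL if the section is being (or was) offlined. */
static struct page_ext *page_ext_begin(struct mem_section_model *ms)
{
	atomic_fetch_add(&active_readers, 1);	/* plays the rcu_read_lock() role */
	uintptr_t v = atomic_load(&ms->page_ext);
	if (v == 0 || (v & PAGE_EXT_INVALID)) {
		atomic_fetch_sub(&active_readers, 1);
		return NULL;
	}
	return (struct page_ext *)v;
}

/* Call only after page_ext_begin() returned non-NULL. */
static void page_ext_end(void)
{
	atomic_fetch_sub(&active_readers, 1);	/* plays the rcu_read_unlock() role */
}

static void offline_section(struct mem_section_model *ms)
{
	/* 1) Invalidate: readers arriving from now on see the flag and back off. */
	atomic_fetch_or(&ms->page_ext, PAGE_EXT_INVALID);

	/* 2) Wait for readers that got in before the flag was set
	 *    (stand-in for synchronize_rcu()). */
	while (atomic_load(&active_readers) != 0)
		;

	/* 3) No reader can still hold the pointer, so it is safe to free it. */
	uintptr_t v = atomic_exchange(&ms->page_ext, 0);
	free((void *)(v & ~PAGE_EXT_INVALID));
}
```

    All atomics use the default sequentially consistent ordering, so either
    a reader's increment is seen by the writer's wait in step 2, or the
    reader's load sees the flag set in step 1; the pointer is never freed
    under a reader that is still using it.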
    
    Note: if synchronize_rcu() takes too long, this path can be optimized
    with call_rcu() [2].
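    A call_rcu()-style optimization would queue the free instead of blocking
    the offline path. The sketch below is a hypothetical single-threaded
    model of that idea only: struct deferred_head, defer_call(),
    grace_period_elapsed() and struct page_ext_block are illustrative
    stand-ins, not the kernel's call_rcu() interface, and the end of a grace
    period is simulated by calling grace_period_elapsed() directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback head, loosely modelled on the kernel's struct rcu_head. */
struct deferred_head {
	struct deferred_head *next;
	void (*func)(struct deferred_head *head);
};

/* Callbacks waiting for the current grace period to end. */
static struct deferred_head *pending;

/* Queue the callback and return immediately (the role call_rcu() plays). */
static void defer_call(struct deferred_head *head,
		       void (*func)(struct deferred_head *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* In a real system this runs once all pre-existing readers are done. */
static void grace_period_elapsed(void)
{
	struct deferred_head *h = pending;

	pending = NULL;
	while (h) {
		struct deferred_head *next = h->next;	/* read before func() frees h */

		h->func(h);
		h = next;
	}
}

/* Hypothetical page_ext allocation with an embedded callback head. */
struct page_ext_block {
	unsigned long flags[4];
	struct deferred_head rcu;
};

static int freed_blocks;

static void free_page_ext_block(struct deferred_head *head)
{
	/* container_of() equivalent: step back from the member to the struct. */
	struct page_ext_block *b = (struct page_ext_block *)
		((char *)head - offsetof(struct page_ext_block, rcu));

	free(b);
	freed_blocks++;
}
```

    The offline path would then call defer_call(&b->rcu, free_page_ext_block)
    after invalidating, and the memory is released later without the caller
    waiting for the grace period.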
    
    Thanks to David Hildenbrand for his views/suggestions on the initial
    discussion [1] and to Pavan Kondeti for various inputs on this patch.
    
    [1] https://lore.kernel.org/linux-mm/59edde13-4167-8550-86f0-11fc67882107@quicinc.com/
    [2] https://lore.kernel.org/all/a26ce299-aed1-b8ad-711e-a49e82bdd180@quicinc.com/T/#u
    [3] https://lore.kernel.org/all/6fa6b7aa-731e-891c-3efb-a03d6a700efa@redhat.com/
    
    [quic_charante@quicinc.com: rename label `loop' to `ext_put_continue' per David]
      Link: https://lkml.kernel.org/r/1661496993-11473-1-git-send-email-quic_charante@quicinc.com
    Link: https://lkml.kernel.org/r/1660830600-9068-1-git-send-email-quic_charante@quicinc.com
    Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
    Suggested-by: David Hildenbrand <david@redhat.com>
    Suggested-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Fernand Sieber <sieberf@amazon.com>
    Cc: Minchan Kim <minchan@google.com>
    Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
    Cc: Pavan Kondeti <quic_pkondeti@quicinc.com>
    Cc: SeongJae Park <sjpark@amazon.de>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: William Kucharski <william.kucharski@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>