    x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all() · 9a62d200
    Joerg Roedel authored
    The job of vmalloc_sync_all() is to help the lazy freeing of vmalloc()
    ranges: before such vmap ranges are reused we make sure that they are
    unmapped from every task's page tables.
    
    This is really easy on pagetable setups where the kernel page tables
    are shared between all tasks - this is the case on 32-bit kernels
    with SHARED_KERNEL_PMD = 1.
    
    But on !SHARED_KERNEL_PMD 32-bit kernels this involves iterating
    over the pgd_list and clearing all pmd entries in the pgds that
    are cleared in the init_mm.pgd, which is the reference pagetable
    that the vmalloc() code uses.
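
The per-pgd synchronization described above can be pictured with a small user-space toy model. This is only a sketch, not the kernel code: `toy_pgd`, `toy_sync_range()` and `TOY_NR_PMDS` are names invented for illustration, and a plain nonzero integer stands in for a present pmd entry.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_PMDS 8	/* hypothetical: pmd slots per toy page table */

/* Toy page table: a nonzero entry stands for a present pmd. */
struct toy_pgd {
	unsigned long pmd[TOY_NR_PMDS];
};

/*
 * Propagate the reference table's entries for slots [start, end) into
 * every task table, including entries that are cleared (zero) in the
 * reference. Returns the number of task entries that were changed.
 */
static int toy_sync_range(const struct toy_pgd *ref, struct toy_pgd *tasks,
			  size_t nr_tasks, size_t start, size_t end)
{
	int changed = 0;

	assert(start <= end);

	for (size_t t = 0; t < nr_tasks; t++) {
		for (size_t i = start; i < end && i < TOY_NR_PMDS; i++) {
			if (tasks[t].pmd[i] != ref->pmd[i]) {
				tasks[t].pmd[i] = ref->pmd[i];
				changed++;
			}
		}
	}
	return changed;
}
```

In this model the reference table plays the role of init_mm.pgd and the `tasks` array plays the role of the pgd_list. Any slot inside the synced range ends up identical in every task, which is only safe for ranges that are meant to be identical between all processes.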
    
    In that context the current practice of vmalloc_sync_all() iterating
    until FIXADDR_TOP is buggy:
    
            for (address = VMALLOC_START & PMD_MASK;
                 address >= TASK_SIZE_MAX && address < FIXADDR_TOP;
                 address += PMD_SIZE) {
                    struct page *page;
    
    Iterating up to FIXADDR_TOP covers a lot of non-vmalloc address
    ranges:
    
    	VMALLOC -> PKMAP -> LDT -> CPU_ENTRY_AREA -> FIX_ADDR
    
    This is mostly harmless for the FIX_ADDR and CPU_ENTRY_AREA ranges
    that don't clear their pmds, but it's lethal for the LDT range,
    which relies on having different mappings in different processes,
    and 'synchronizing' them in the vmalloc sense corrupts those
    pagetable entries (clearing them).
    
    This got particularly prominent with PTI, which turns SHARED_KERNEL_PMD
    off and makes this the dominant mapping mode on 32-bit.
    
    To make the LDT work again, vmalloc_sync_all() must only iterate over
    the volatile parts of the kernel address range that are identical
    between all processes.
    
    So the correct check in vmalloc_sync_all() is "address < VMALLOC_END"
    to make sure the VMALLOC areas are synchronized and the LDT
    mapping is not falsely overwritten.
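
The effect of the two loop bounds can be checked with a small user-space sketch. The address constants below are made up for illustration (real values depend on the kernel configuration), and `toy_ldt_pmds_visited()` is a hypothetical helper, not a kernel function. It walks PMD-sized steps from the start of the toy vmalloc area up to a given limit and counts how many of them land inside the toy LDT range.

```c
#include <assert.h>

/* Hypothetical 32-bit layout, for illustration only. */
#define TOY_PMD_SIZE		0x00200000ULL	/* 2 MiB, as with PAE */
#define TOY_PMD_MASK		(~(TOY_PMD_SIZE - 1))
#define TOY_VMALLOC_START	0xf0000000ULL
#define TOY_VMALLOC_END		0xfe000000ULL
#define TOY_LDT_BASE		0xff000000ULL
#define TOY_LDT_END		0xff400000ULL
#define TOY_FIXADDR_TOP		0xfffff000ULL

/*
 * Count the PMD-sized steps in [TOY_VMALLOC_START, limit) that fall
 * inside the toy LDT range. 64-bit arithmetic is used so the walk
 * cannot wrap around at the top of the 32-bit address space.
 */
static unsigned int toy_ldt_pmds_visited(unsigned long long limit)
{
	unsigned long long address;
	unsigned int hits = 0;

	assert(limit > TOY_VMALLOC_START);

	for (address = TOY_VMALLOC_START & TOY_PMD_MASK;
	     address < limit;
	     address += TOY_PMD_SIZE) {
		if (address >= TOY_LDT_BASE && address < TOY_LDT_END)
			hits++;
	}
	return hits;
}
```

With these assumed constants, `toy_ldt_pmds_visited(TOY_FIXADDR_TOP)` walks through both PMD slots of the toy LDT range, while `toy_ldt_pmds_visited(TOY_VMALLOC_END)` touches none of them.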
    
    The CPU_ENTRY_AREA and the FIXMAP area are no longer synced either,
    but this is not really a problem since their PMDs get established
    during bootup and never change.
    
    This change fixes the ldt_gdt selftest in my setup.
    
    [ mingo: Fixed up the changelog to explain the logic and modified the
             copying to only happen up until VMALLOC_END. ]
    Reported-by: Borislav Petkov <bp@suse.de>
    Tested-by: Borislav Petkov <bp@suse.de>
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
    Cc: <stable@vger.kernel.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Joerg Roedel <joro@8bytes.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: hpa@zytor.com
    Fixes: 7757d607 ("x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32")
    Link: https://lkml.kernel.org/r/20191126111119.GA110513@gmail.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
fault.c 39.7 KB