• Daniel Borkmann's avatar
    x86/tlb: Fix tlb flushing when lguest clears PGE · 459bc506
    Daniel Borkmann authored
    commit 2c4ea6e2 upstream.
    
    Fengguang reported random corruptions from various locations on x86-32
    after commits d2852a22 ("arch: add ARCH_HAS_SET_MEMORY config") and
    9d876e79 ("bpf: fix unlocking of jited image when module ronx not set")
    that uses the former. While x86-32 doesn't have a JIT like x86_64, the
    bpf_prog_lock_ro() and bpf_prog_unlock_ro() got enabled due to
    ARCH_HAS_SET_MEMORY, whereas Fengguang's test kernel doesn't have module
    support built in and therefore never had the DEBUG_SET_MODULE_RONX setting
    enabled.
    
    After investigating the crashes further, it turned out that using
    set_memory_ro() and set_memory_rw() didn't have the desired effect, for
    example, setting the pages as read-only on x86-32 would still let
    probe_kernel_write() succeed without error. This behavior would manifest
    itself in situations where the vmalloc'ed buffer was accessed prior to
    set_memory_*() such as in case of bpf_prog_alloc(). In cases where it
    wasn't, the page attribute changes seemed to have taken effect, leading to
    the conclusion that a TLB invalidate didn't happen. Moreover, it turned out
    that this issue reproduced with qemu in "-cpu kvm64" mode, but not for
    "-cpu host". When the issue occurs, change_page_attr_set_clr() did trigger
    a TLB flush as expected via __flush_tlb_all() through cpa_flush_range(),
    though.
    
    There are 3 variants for issuing a TLB flush: invpcid_flush_all() (depends
    on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE), cr4 based flush
    (depends on X86_FEATURE_PGE), and cr3 based flush.  For "-cpu host" case in
    my setup, the flush used invpcid_flush_all() variant, whereas for "-cpu
    kvm64", the flush was cr4 based. Switching the kvm64 case to cr3 manually
    worked fine, and further investigating the cr4 one turned out that
    X86_CR4_PGE bit was not set in cr4 register, meaning the
    __native_flush_tlb_global_irq_disabled() wrote cr4 twice with the same
    value instead of clearing X86_CR4_PGE in the first write to trigger the
    flush.
    
    It turned out that X86_CR4_PGE was cleared from cr4 during init from
    lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE bit is also
    cleared from there due to concerns of using PGE in guest kernel that can
    lead to hard to trace bugs (see bff672e6 ("lguest: documentation V:
    Host") in init()). The CPU feature bits are cleared in dynamic
    boot_cpu_data, but they never propagated to __flush_tlb_all() as it uses
    static_cpu_has() instead of boot_cpu_has() for testing which variant of TLB
    flushing to use, meaning they still used the old setting of the host
    kernel.
    
    Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would propagate
    to static_cpu_has() checks is too late at this point as sections have been
    patched already, so for now, it seems reasonable to switch back to
    boot_cpu_has(X86_FEATURE_PGE) as it was prior to commit c109bf95
    ("x86/cpufeature: Remove cpu_has_pge"). This lets the TLB flush trigger via
    cr3 as originally intended, properly makes the new page attributes visible
    and thus fixes the crashes seen by Fengguang.
    
    Fixes: c109bf95 ("x86/cpufeature: Remove cpu_has_pge")
    Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
    Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
    Cc: bp@suse.de
    Cc: Kees Cook <keescook@chromium.org>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: netdev@vger.kernel.org
    Cc: Rusty Russell <rusty@rustcorp.com.au>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: lkp@01.org
    Cc: Laura Abbott <labbott@redhat.com>
    Link: http://lkml.kernrl.org/r/20170301125426.l4nf65rx4wahohyl@wfg-t540p.sh.intel.com
    Link: http://lkml.kernel.org/r/25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel@iogearbox.netSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    459bc506
tlbflush.h 8.09 KB