    x86/mm: Introduce _PAGE_SAVED_DIRTY · fca4d413
    Rick Edgecombe authored
    Some OSes have a greater dependence on software available bits in PTEs than
    Linux. That left the hardware architects looking for a way to represent a
    new memory type (shadow stack) within the existing bits. They chose to
    repurpose a lightly-used state: Write=0,Dirty=1. So in order to support
    shadow stack memory, Linux should avoid creating memory with this PTE bit
    combination unless it intends for it to be shadow stack.
    
    The reason it's lightly used is that Dirty=1 is normally set by HW
    _before_ a write. A write with a Write=0 PTE would typically only generate
    a fault, not set Dirty=1. Hardware can (rarely) both set Dirty=1 *and*
    generate the fault, resulting in a Write=0,Dirty=1 PTE. Hardware which
    supports shadow stacks will no longer exhibit this oddity.
    
    So that leaves Write=0,Dirty=1 PTEs created in software. To avoid
    inadvertently creating shadow stack memory, in places where Linux normally
    creates Write=0,Dirty=1, it can use the software-defined _PAGE_SAVED_DIRTY
    in place of the hardware _PAGE_DIRTY. In other words, whenever Linux needs
    to create Write=0,Dirty=1, it instead creates Write=0,SavedDirty=1 except
    for shadow stack, which is Write=0,Dirty=1.
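
    The substitution described above can be sketched in standalone C. This
    is only an illustration, not the kernel's code: the helper names are
    hypothetical, and the SavedDirty bit position (58) is an assumption made
    for the example. Only the Write (bit 1) and Dirty (bit 6) positions are
    architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* RW and DIRTY positions are architectural on x86. The SavedDirty
 * position is an assumption chosen for this illustration. */
#define PTE_BIT_RW          1
#define PTE_BIT_DIRTY       6
#define PTE_BIT_SAVED_DIRTY 58 /* assumed software-available bit */

#define PTE_RW          (UINT64_C(1) << PTE_BIT_RW)
#define PTE_DIRTY       (UINT64_C(1) << PTE_BIT_DIRTY)
#define PTE_SAVED_DIRTY (UINT64_C(1) << PTE_BIT_SAVED_DIRTY)

/* Write=0,Dirty=1 is the combination hardware treats as shadow stack. */
static bool pte_val_is_shadow_stack(uint64_t v)
{
	return (v & (PTE_RW | PTE_DIRTY)) == PTE_DIRTY;
}

/* Software's view of "dirty": the hardware bit or the saved bit. */
static bool pte_val_is_dirty(uint64_t v)
{
	return (v & (PTE_DIRTY | PTE_SAVED_DIRTY)) != 0;
}

/* Write-protect a PTE value without producing Write=0,Dirty=1:
 * the dirty state is kept in SavedDirty instead. */
static uint64_t pte_val_wrprotect(uint64_t v)
{
	v &= ~PTE_RW;
	if (v & PTE_DIRTY) {
		v &= ~PTE_DIRTY;
		v |= PTE_SAVED_DIRTY;
	}
	return v;
}
```

    In this sketch a write-protected dirty page still reads as dirty to
    software, but never matches the shadow stack encoding.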
    
    There are six bits left available to software in the 64-bit PTE after
    consuming a bit for _PAGE_SAVED_DIRTY. For 32-bit, the same bit as
    _PAGE_BIT_UFFD_WP is used, since userfaultfd is not supported on 32-bit.
    This leaves one unused software bit on 32-bit (_PAGE_BIT_SOFT_DIRTY,
    since soft dirty is also not supported on 32-bit).
    
    Implement only the infrastructure for _PAGE_SAVED_DIRTY. Changes to
    actually begin creating _PAGE_SAVED_DIRTY PTEs will follow once other
    pieces are in place.
    
    Since this SavedDirty shifting is done for all x86 CPUs, CPUs without
    shadow stack support can still, in rare cases, create Write=0,Dirty=1
    PTEs through the hardware oddity. Because those CPUs don't support shadow
    stack, this is harmless, just as it was before the introduction of
    SavedDirty.
    
    Implement the shifting logic to be branchless. Embed the logic of whether
    to do the shifting (including checking the Write bits) so that it can be
    called by future callers that would otherwise need additional branching
    logic. This efficiency allows the logic of when to do the shifting to be
    centralized, making the code easier to reason about.
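
    A minimal sketch of what branchless shifting with the Write check
    embedded can look like, written as standalone C. The function and macro
    names are hypothetical and the SavedDirty bit position (58) is assumed
    for illustration. The idea is to derive a 0/1 condition from the Write
    bit and use it as a mask, so callers need no conditional of their own.

```c
#include <stdint.h>

#define X86_BIT_RW         1  /* architectural Write bit */
#define X86_BIT_DIRTY      6  /* architectural Dirty bit */
#define SW_BIT_SAVED_DIRTY 58 /* assumed software bit, illustration only */

/* If Write=0, move Dirty into SavedDirty. If Write=1, return v unchanged. */
static uint64_t saveddirty_shift_in(uint64_t v)
{
	uint64_t cond = (~v >> X86_BIT_RW) & 1; /* 1 when Write=0 */

	/* Copy Dirty to SavedDirty only when cond is 1. */
	v |= ((v >> X86_BIT_DIRTY) & cond) << SW_BIT_SAVED_DIRTY;
	/* Clear Dirty only when cond is 1. */
	v &= ~(cond << X86_BIT_DIRTY);
	return v;
}

/* If Write=1, move SavedDirty back into Dirty. If Write=0, return v unchanged. */
static uint64_t saveddirty_shift_out(uint64_t v)
{
	uint64_t cond = (v >> X86_BIT_RW) & 1; /* 1 when Write=1 */

	/* Copy SavedDirty to Dirty only when cond is 1. */
	v |= ((v >> SW_BIT_SAVED_DIRTY) & cond) << X86_BIT_DIRTY;
	/* Clear SavedDirty only when cond is 1. */
	v &= ~(cond << SW_BIT_SAVED_DIRTY);
	return v;
}
```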
    
    Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
    Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
    Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Tested-by: Pengfei Xu <pengfei.xu@intel.com>
    Tested-by: John Allen <john.allen@amd.com>
    Tested-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/all/20230613001108.3040476-11-rick.p.edgecombe%40intel.com
pgtable.h 36.8 KB