    stackleak: rework stack high bound handling · 0cfa2ccd
    Mark Rutland authored
    Prior to returning to userspace, we reset current->lowest_stack to a
    reasonable high bound. Currently we do this by subtracting the arbitrary
    value `THREAD_SIZE/64` from the top of the stack, for reasons lost to
    history.
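
    For reference, the pre-patch reset amounts to the following (a
    paraphrased sketch of the logic, not a verbatim quote of
    kernel/stackleak.c):

        /*
         * Reset the high bound for the next exit to userspace, leaving
         * an arbitrary THREAD_SIZE/64 bytes below the top of the stack.
         */
        current->lowest_stack = current_top_of_stack() - THREAD_SIZE / 64;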
    
    Looking at configurations today:
    
    * On i386 where THREAD_SIZE is 8K, the bound will be 128 bytes. The
      pt_regs at the top of the stack is 68 bytes (with 0 to 16 bytes of
      padding above), and so this covers an additional portion of 44 to 60
      bytes.
    
    * On x86_64 where THREAD_SIZE is at least 16K (up to 32K with KASAN) the
      bound will be at least 256 bytes (up to 512 with KASAN). The pt_regs
      at the top of the stack is 168 bytes, and so this covers an additional
      88 bytes of stack (up to 344 with KASAN).
    
    * On arm64 where THREAD_SIZE is at least 16K (up to 64K with 64K pages
      and VMAP_STACK), the bound will be at least 256 bytes (up to 1024 with
      KASAN). The pt_regs at the top of the stack is 336 bytes, so this can
      fall within the pt_regs, or can cover an additional 688 bytes of
      stack.
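
    The per-configuration bounds above follow directly from the
    THREAD_SIZE/64 arithmetic. As a sanity check, this throwaway
    userspace program (not part of the patch; the THREAD_SIZE values
    are hard-coded from the list above) reproduces the numbers:

        #include <stdio.h>

        int main(void)
        {
                /* THREAD_SIZE values taken from the configurations above */
                const struct {
                        const char *cfg;
                        unsigned long thread_size;
                } cfgs[] = {
                        { "i386 (8K)",           8 * 1024 },
                        { "x86_64 (16K)",       16 * 1024 },
                        { "x86_64+KASAN (32K)", 32 * 1024 },
                        { "arm64 (16K)",        16 * 1024 },
                        { "arm64+KASAN (64K)",  64 * 1024 },
                };

                for (unsigned int i = 0; i < sizeof(cfgs) / sizeof(cfgs[0]); i++)
                        printf("%-20s bound = THREAD_SIZE/64 = %4lu bytes\n",
                               cfgs[i].cfg, cfgs[i].thread_size / 64);
                return 0;
        }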
    
    Clearly the `THREAD_SIZE/64` value doesn't make much sense -- in the
    worst case, this will cause more than 600 bytes of stack to be erased
    for every syscall, even if actual stack usage were substantially
    smaller.
    
    This patch makes this slightly less nonsensical by consistently
    resetting current->lowest_stack to the base of the task pt_regs. For
    clarity and for consistency with the handling of the low bound, the
    generation of the high bound is split into a helper with commentary
    explaining why.
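
    A minimal sketch of such a helper, assuming it is built around the
    existing task_pt_regs() accessor (the exact name and commentary in
    the patch may differ):

        /*
         * The address immediately above the highest stack address that
         * erasing needs to consider, i.e. the base of the task's pt_regs.
         */
        static __always_inline unsigned long
        stackleak_task_high_bound(const struct task_struct *tsk)
        {
                /*
                 * The pt_regs at the top of the task stack will be
                 * clobbered by the next exception entry, so there is no
                 * need to erase it.
                 */
                return (unsigned long)task_pt_regs(tsk);
        }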
    
    Since the pt_regs at the top of the stack will be clobbered upon the
    next exception entry, we don't need to poison these at exception exit.
    By using task_pt_regs() as the high stack boundary instead of
    current_top_of_stack() we avoid some redundant poisoning, and the
    compiler can share the address generation between the poisoning and
    resetting of `current->lowest_stack`, making the generated code more
    efficient.
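
    To illustrate the sharing, here is a simplified sketch of the erase
    path under the assumptions above (the poison scan and write-out are
    elided; this is not the exact merged code):

        static __always_inline void stackleak_erase(void)
        {
                const unsigned long task_stack_high =
                        stackleak_task_high_bound(current);

                /*
                 * ... scan down from current->lowest_stack for the poison
                 * value, then poison the stack up to (but not including)
                 * task_stack_high ...
                 */

                /*
                 * Reset the high bound for the next exit to userspace;
                 * reusing task_stack_high here lets the compiler share
                 * the address generation with the poisoning above.
                 */
                current->lowest_stack = task_stack_high;
        }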
    
    It's not clear to me whether the existing `THREAD_SIZE/64` offset was a
    dodgy heuristic to skip the pt_regs, or whether it was attempting to
    minimize the number of times stackleak_check_stack() would have to
    update `current->lowest_stack` when stack usage was shallow at the cost
    of unconditionally poisoning a small portion of the stack for every exit
    to userspace.
    
    For now I've simply removed the offset, and if we need/want to minimize
    updates for shallow stack usage it should be easy to add a better
    heuristic atop, with appropriate commentary so we know what's going on.

    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Cc: Alexander Popov <alex.popov@linux.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Kees Cook <keescook@chromium.org>
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/20220427173128.2603085-7-mark.rutland@arm.com