• Linus Torvalds's avatar
    i387: fix x86-64 preemption-unsafe user stack save/restore · 15d8791c
    Linus Torvalds authored
    Commit 5b1cbac3 ("i387: make irq_fpu_usable() tests more robust")
    added a sanity check to the #NM handler to verify that we never cause
    the "Device Not Available" exception in kernel mode.
    
    However, that check actually pinpointed a (fundamental) race where we do
    cause that exception as part of the signal stack FPU state save/restore
    code.
    
    Because we use the floating point instructions themselves to save and
    restore state directly from user mode, we cannot do that atomically with
    testing the TS_USEDFPU bit: the user mode access itself may cause a page
    fault, which causes a task switch, which saves and restores the FP/MMX
    state from the kernel buffers.
    
    This kind of "recursive" FP state save is fine per se, but it means that
    when the signal stack save/restore gets restarted, it will now take the
    '#NM' exception we originally tried to avoid.  With preemption this can
    happen even without the page fault - but because of the user access, we
    cannot just disable preemption around the save/restore instruction.
    
    There are various ways to solve this, including using the
    "enable/disable_page_fault()" helpers to not allow page faults at all
    during the sequence, and fall back to copying things by hand without the
    use of the native FP state save/restore instructions.
    
    However, the simplest thing to do is to just allow the #NM from kernel
    space, but fix the race in setting and clearing CR0.TS that this all
    exposed: the TS bit changes and the TS_USEDFPU bit absolutely have to be
    atomic wrt scheduling, so while the actual state save/restore can be
    interrupted and restarted, the act of actually clearing/setting CR0.TS
    and the TS_USEDFPU bit together must not.
    
    Instead of just adding random "preempt_disable/enable()" calls to what
    is already excessively ugly code, this introduces some helper functions
    that mostly mirror the "kernel_fpu_begin/end()" functionality, just for
    the user state instead.
    
    Those helper functions should probably eventually replace the other
    ad-hoc CR0.TS and TS_USEDFPU tests too, but I'll need to think about it
    some more: the task switching functionality in particular needs to
    expose the difference between the 'prev' and 'next' threads, while the
    new helper functions intentionally were written to only work with
    'current'.
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    15d8791c
xsave.c 11 KB