• Rick Edgecombe's avatar
    x86/fpu: Invalidate FPU state correctly on exec() · 1f69383b
    Rick Edgecombe authored
    The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is
    valid and should be reloaded when returning to userspace. However, the
    kernel will skip doing this if the FPU registers are already valid as
    determined by fpregs_state_valid(). The logic embedded there considers
    the state valid if two cases are both true:
    
      1: fpu_fpregs_owner_ctx points to the current tasks FPU state
      2: the last CPU the registers were live in was the current CPU.
    
    This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to
    the current FPU during the fpregs_restore_userregs() operation, so it
    indicates that the registers have been restored on this CPU. But this
    alone doesn’t preclude that the task hasn’t been rescheduled to a
    different CPU, where the registers were modified, and then back to the
    current CPU. To verify that this was not the case the logic relies on the
    second condition. So the assumption is that if the registers have been
    restored, AND they haven’t had the chance to be modified (by being
    loaded on another CPU), then they MUST be valid on the current CPU.
    
    Besides the lazy FPU optimizations, the other cases where the FPU
    registers might not be valid are when the kernel modifies the FPU register
    state or the FPU saved buffer. In this case the operation modifying the
    FPU state needs to let the kernel know the correspondence has been
    broken. The comment in “arch/x86/kernel/fpu/context.h” has:
    /*
    ...
     * If the FPU register state is valid, the kernel can skip restoring the
     * FPU state from memory.
     *
     * Any code that clobbers the FPU registers or updates the in-memory
     * FPU state for a task MUST let the rest of the kernel know that the
     * FPU registers are no longer valid for this task.
     *
     * Either one of these invalidation functions is enough. Invalidate
     * a resource you control: CPU if using the CPU for something else
     * (with preemption disabled), FPU for the current task, or a task that
     * is prevented from running by the current task.
     */
    
    However, this is not completely true. When the kernel modifies the
    registers or saved FPU state, it can only rely on
    __fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu
    tracking. The exec path instead relies on fpregs_deactivate(), which sets
    the CPU’s FPU context to NULL. This was observed to fail to restore the
    reset FPU state to the registers when returning to userspace in the
    following scenario:
    
    1. A task is executing in userspace on CPU0
    	- CPU0’s FPU context points to tasks
    	- fpu->last_cpu=CPU0
    
    2. The task exec()’s
    
    3. While in the kernel the task is preempted
    	- CPU0 gets a thread executing in the kernel (such that no other
    		FPU context is activated)
    	- Scheduler sets task’s fpu->last_cpu=CPU0 when scheduling out
    
    4. Task is migrated to CPU1
    
    5. Continuing the exec(), the task gets to
       fpu_flush_thread()->fpu_reset_fpregs()
    	- Sets CPU1’s fpu context to NULL
    	- Copies the init state to the task’s FPU buffer
    	- Sets TIF_NEED_FPU_LOAD on the task
    
    6. The task reschedules back to CPU0 before completing the exec() and
       returning to userspace
    	- During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
    	- Skips saving the registers and updating task’s fpu→last_cpu,
    	  because TIF_NEED_FPU_LOAD is the canonical source.
    
    7. Now CPU0’s FPU context is still pointing to the task’s, and
       fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even
       though the reset FPU state has not been restored.
    
    So the root cause is that exec() is doing the wrong kind of invalidate. It
    should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
    fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
    going away, they are just getting reset as part of an exec. So switch to
    __fpu_invalidate_fpregs_state().
    
    Also, delete the misleading comment that says that either kind of
    invalidate will be enough, because it’s not always the case.
    
    Fixes: 33344368 ("x86/fpu: Clean up the fpu__clear() variants")
    Reported-by: default avatarLei Wang <lei4.wang@intel.com>
    Signed-off-by: default avatarRick Edgecombe <rick.p.edgecombe@intel.com>
    Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Tested-by: default avatarLijun Pan <lijun.pan@intel.com>
    Reviewed-by: default avatarSohil Mehta <sohil.mehta@intel.com>
    Acked-by: default avatarLijun Pan <lijun.pan@intel.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20230818170305.502891-1-rick.p.edgecombe@intel.com
    1f69383b
core.c 23.5 KB