1. 19 May, 2015 40 commits
    • Ingo Molnar's avatar
      x86/fpu: Enumerate xfeature bits · 677b98bd
      Ingo Molnar authored
      Transform the xstate masks into an enumerated list of xfeature bits.
      
      This removes the hard coding of XFEATURES_NR_MAX.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      677b98bd
    • Ingo Molnar's avatar
      x86/fpu: Simplify print_xstate_features() · 33588b52
      Ingo Molnar authored
      We do a boot time printout of xfeatures in print_xstate_features(),
      simplify this code to make use of the recently introduced cpu_has_xfeature()
      method.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      33588b52
    • Ingo Molnar's avatar
      x86/fpu: Introduce cpu_has_xfeatures(xfeatures_mask, feature_name) · 5b073430
      Ingo Molnar authored
      A lot of FPU using driver code is querying complex CPU features to be
      able to figure out whether a given set of xstate features is supported
      by the CPU or not.
      
      Introduce a simplified API function that can be used on any CPU type
      to get this information. Also add an error string return pointer,
      so that the driver can print a meaningful error message with a
      standardized feature name.
      
      Also mark xfeatures_mask as __read_only.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5b073430
    • Ingo Molnar's avatar
      x86/fpu: Rename fpu/xsave.c to fpu/xstate.c · 62784854
      Ingo Molnar authored
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      62784854
    • Ingo Molnar's avatar
      x86/fpu: Rename fpu/xsave.h to fpu/xstate.h · 669ebabb
      Ingo Molnar authored
      'xsave' is an x86 instruction name to most people - but xsave.h is
      about a lot more than just the XSAVE instruction: it includes
      definitions and support, both internal and external, related to
      xstate and xfeatures support.
      
      As a first step in cleaning up the various xstate uses rename this
      header to 'fpu/xstate.h' to better reflect what this header file
      is about.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      669ebabb
    • Ingo Molnar's avatar
      x86/fpu: Optimize fpu_copy() some more on lazy switching systems · b1652900
      Ingo Molnar authored
      The current fpu_copy() code on lazy switching CPUs always saves
      into the current fpstate and then copies it over into the child
      context:
      
      		preempt_disable();
      		if (!copy_fpregs_to_fpstate(src_fpu))
      			fpregs_deactivate(src_fpu);
      		preempt_enable();
      		memcpy(&dst_fpu->state, &src_fpu->state, xstate_size);
      
      That memcpy() can be avoided on all lazy switching setups except
      really old FNSAVE-only systems: change fpu_copy() to directly save
      into the child context, for both the lazy and the eager context
      switching case.
      
      Note that we still have to do a memcpy() back into the parent
      context in the FNSAVE case, but this won't be executed on the
      majority of x86 systems that got built in the last 10 years or so.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b1652900
    • Ingo Molnar's avatar
      x86/fpu: Optimize fpu_copy() · 68271c6a
      Ingo Molnar authored
      Optimize fpu_copy() a bit by expanding the ->fpstate_active == 1
      portion of fpu__save() into it.
      
      ( The main purpose of this change is to enable another, larger
        optimization that will be done in the next patch. )
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      68271c6a
    • Ingo Molnar's avatar
      x86/fpu: Optimize fpu__save() · 48c4717f
      Ingo Molnar authored
      So fpu__save() does this currently:
      
      		copy_fpregs_to_fpstate(fpu);
      		if (!use_eager_fpu())
      			fpregs_deactivate(fpu);
      
      ... which deactivates the FPU on lazy switching systems unconditionally.
      
      Both usecases of fpu__save() use this function to save the
      FPU state into a fpstate: fork()/clone() and math error signal handling.
      
      The unconditional disabling of FPU registers in the lazy switching
      case is probably a mistaken conversion of old FNSAVE code (that had
      to disable FPU registers).
      
      So speed up this code by only disabling FPU registers when absolutely
      necessary: when indicated by the copy_fpregs_to_fpstate() return
      code:
      
      		if (!copy_fpregs_to_fpstate(fpu))
      			fpregs_deactivate(fpu);
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      48c4717f
    • Ingo Molnar's avatar
      x86/fpu: Simplify fpu__save() · fea435a2
      Ingo Molnar authored
      Factor out a common call.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      fea435a2
    • Ingo Molnar's avatar
      x86/fpu: Eliminate __save_fpu() · 9f876d67
      Ingo Molnar authored
      The current implementation of __save_fpu():
      
      	if (use_xsave()) {
      		xsave_state(&fpu->state.xsave);
      	} else {
      		fpu_fxsave(fpu);
      	}
      
      Is actually a simplified version of copy_fpregs_to_fpstate(),
      if use_eager_fpu() is true.
      
      But all call sites of __save_fpu() call it only it when use_eager_fpu()
      is true.
      
      So we can eliminate __save_fpu() altogether and use the standard
      copy_fpregs_to_fpstate() function. This cleans up the code
      by making it use fewer variants of FPU register saving.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9f876d67
    • Ingo Molnar's avatar
      x86/fpu: Simplify __save_fpu() · 72ee6f87
      Ingo Molnar authored
      __save_fpu() has this pattern:
      
      		if (unlikely(system_state == SYSTEM_BOOTING))
      			xsave_state_booting(&fpu->state.xsave);
      		else
      			xsave_state(&fpu->state.xsave);
      
      ... but it does not actually get called during system bootup.
      
      So remove the complication and always call xsave_state().
      
      To make sure this assumption is correct, add a WARN_ONCE()
      debug check to xsave_state().
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      72ee6f87
    • Ingo Molnar's avatar
      x86/fpu: Factor out FPU hw activation/deactivation · 32b49b3c
      Ingo Molnar authored
      We have repeat patterns of:
      
      	if (!use_eager_fpu())
      		clts();
      
      ... to activate FPU registers, and:
      
      	if (!use_eager_fpu())
      		stts();
      
      ... to deactivate them.
      
      Encapsulate these in:
      
      	__fpregs_activate_hw();
      	__fpregs_activate_hw();
      
      and use them accordingly.
      
      Doing this synchronizes the idiom with the fpu->fpregs_active
      software-flag's handling functions, creating clear patterns of:
      
      	__fpregs_activate_hw();
      	__fpregs_activate(fpu);
      
      etc., which improves readability.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      32b49b3c
    • Ingo Molnar's avatar
      x86/fpu: Rename fpu__unlazy_stopped() to fpu__activate_stopped() · 67ee658e
      Ingo Molnar authored
      In line with the fpstate_activate() change, name
      fpu__unlazy_stopped() in a similar fashion as well: its purpose
      is to make the fpstate of a stopped task the current and active FPU
      context, which may require unlazying and initialization.
      
      The unlazying is just part of the job, the main concept is to make
      the fpstate active.
      
      Also clarify the function's description to clarify its exact
      usage and the background behind it all.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      67ee658e
    • Ingo Molnar's avatar
      x86/fpu: Simplify fpstate_init_curr() usage · c4d72e2d
      Ingo Molnar authored
      Now that fpstate_init_curr() is not doing implicit allocations
      anymore, almost all uses of it involve a very simple pattern:
      
      	if (!fpu->fpstate_active)
      		fpstate_init_curr(fpu);
      
      which is basically activating the FPU fpstate if it was not active
      before.
      
      So propagate the check into the function itself, and rename the
      function according to its new purpose:
      
      	fpu__activate_curr(fpu);
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c4d72e2d
    • Ingo Molnar's avatar
      x86/fpu, kvm: Simplify fx_init() · 0ee6a517
      Ingo Molnar authored
      Now that fpstate_init() cannot fail the error return of fx_init()
      has lost its purpose. Eliminate the error return and propagate this
      change to all callers.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0ee6a517
    • Ingo Molnar's avatar
      x86/fpu: Simplify fpu__unlazy_stopped() error handling · 2fb29fc7
      Ingo Molnar authored
      Now that FPU contexts are always allocated, fpu__unlazy_stopped()
      cannot fail. Remove its error return and propagate the changes to
      the callers.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2fb29fc7
    • Ingo Molnar's avatar
      x86/fpu: Rename fpstate_alloc_init() to fpstate_init_curr() · e62bb3d8
      Ingo Molnar authored
      Now that there are no FPU context allocations, rename fpstate_alloc_init()
      to fpstate_init_curr(), to signal that it initializes the fpstate and
      marks it active, for the current task.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e62bb3d8
    • Ingo Molnar's avatar
      x86/fpu: Remove failure return from fpstate_alloc_init() · 91d93d0e
      Ingo Molnar authored
      Remove the failure code and propagate this down to callers.
      
      Note that this function still has an 'init' aspect, which must be
      called.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      91d93d0e
    • Ingo Molnar's avatar
      x86/fpu: Remove failure paths from fpstate-alloc low level functions · c4d6ee6e
      Ingo Molnar authored
      Now that we always allocate the FPU context as part of task_struct there's
      no need for separate allocations - remove them and their primary failure
      handling code.
      
      ( Note that there's still secondary error codes that have become superfluous,
        those will be removed in separate patches. )
      
      Move the somewhat misplaced setup_xstate_comp() call to the core.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c4d6ee6e
    • Ingo Molnar's avatar
      x86/fpu: Simplify FPU handling by embedding the fpstate in task_struct (again) · 7366ed77
      Ingo Molnar authored
      So 6 years ago we made the FPU fpstate dynamically allocated:
      
        aa283f49 ("x86, fpu: lazy allocation of FPU area - v5")
        61c4628b ("x86, fpu: split FPU state from task struct - v5")
      
      In hindsight this was a mistake:
      
         - it complicated context allocation failure handling, such as:
      
      		/* kthread execs. TODO: cleanup this horror. */
      		if (WARN_ON(fpstate_alloc_init(fpu)))
      			force_sig(SIGKILL, tsk);
      
         - it caused us to enable irqs in fpu__restore():
      
                      local_irq_enable();
                      /*
                       * does a slab alloc which can sleep
                       */
                      if (fpstate_alloc_init(fpu)) {
                              /*
                               * ran out of memory!
                               */
                              do_group_exit(SIGKILL);
                              return;
                      }
                      local_irq_disable();
      
         - it (slightly) slowed down task creation/destruction by adding
           slab allocation/free pattens.
      
         - it made access to context contents (slightly) slower by adding
           one more pointer dereference.
      
      The motivation for the dynamic allocation was two-fold:
      
         - reduce memory consumption by non-FPU tasks
      
         - allocate and handle only the necessary amount of context for
           various XSAVE processors that have varying hardware frame
           sizes.
      
      These days, with glibc using SSE memcpy by default and GCC optimizing
      for SSE/AVX by default, the scope of FPU using apps on an x86 system is
      much larger than it was 6 years ago.
      
      For example on a freshly installed Fedora 21 desktop system, with a
      recent kernel, all non-kthread tasks have used the FPU shortly after
      bootup.
      
      Also, even modern embedded x86 CPUs try to support the latest vector
      instruction set - so they'll too often use the larger xstate frame
      sizes.
      
      So remove the dynamic allocation complication by embedding the FPU
      fpstate in task_struct again. This should make the FPU a lot more
      accessible to all sorts of atomic contexts.
      
      We could still optimize for the xstate frame size in the future,
      by moving the state structure to the last element of task_struct,
      and allocating only a part of that.
      
      This change is kept minimal by still keeping the ctx_alloc()/free()
      routines (that now do nothing substantial) - we'll remove them in
      the following patches.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      7366ed77
    • Ingo Molnar's avatar
      x86/fpu: Optimize copy_fpregs_to_fpstate() by removing the FNCLEX... · 1bc6b056
      Ingo Molnar authored
      x86/fpu: Optimize copy_fpregs_to_fpstate() by removing the FNCLEX synchronization with FP exceptions
      
      So we have the following ancient code in copy_fpregs_to_fpstate():
      
      	if (unlikely(fpu->state->fxsave.swd & X87_FSW_ES)) {
      		asm volatile("fnclex");
      		goto drop_fpregs;
      	}
      
      which clears pending FPU exceptions and then drops registers, which
      causes the next FP instruction of the saved context to re-load the
      saved FPU state, with all pending exceptions marked properly, and
      will re-start the exception handling mechanism in the hardware.
      
      Since FPU exceptions are always issued on instruction boundaries,
      in particular on the next FP instruction following the exception
      generating instruction, there's no fear of getting an FP exception
      asynchronously.
      
      They were truly asynchronous back in the IRQ13 days, when the FPU was
      a weird and expensive co-processor that did its own processing, and we
      had to synchronize with them, but that code is not working anymore:
      we don't have IRQ13 mapped in the IDT anymore.
      
      With the introduction of optimized XSAVE support there's a new
      complication: if the xstate features bit indicates that a particular
      state component is unused (in 'init state'), then the hardware does
      not guarantee that the XSAVE (et al) instruction keeps the underlying
      FPU state image in memory valid and current. In practice this means
      that the hardware won't write it, and the exceptions flag in the
      state might be an older version, with it still being set. This
      meant that we had to check the xfeatures flag as well, adding
      another memory load and branch to a critical hot path of the scheduler.
      
      So optimize all this by removing both the old quirk and the new check,
      and straight-line optimizing the most common cases with likely()
      hints. Quite a bit of code gets removed this way:
      
        arch/x86/kernel/process_64.o:
      
          text    data     bss     dec     filename
          5484       8       0    5492     process_64.o.before
          5416       8       0    5424     process_64.o.after
      
      Now there's also a chance that some weird behavior or erratum was
      masked by our IRQ13 handling quirk (or that I misunderstood the
      nature of the quirk), and that this change triggers some badness.
      
      There's no real good way to protect against that possibility other
      than keeping this change well isolated, well commented and well
      bisectable. If you bisect a weird (or not so weird) breakage to
      this commit then please let us know!
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1bc6b056
    • Ingo Molnar's avatar
      x86/fpu: Rename fpu_save_init() to copy_fpregs_to_fpstate() · 4f836347
      Ingo Molnar authored
      So fpu_save_init() is a historic name that got its name when the only
      way the FPU state was FNSAVE, which cleared (well, destroyed) the FPU
      state after saving it.
      
      Nowadays the name is misleading, because ever since the introduction of
      FXSAVE (and more modern FPU saving instructions) the 'we need to reload
      the FPU state' part is only true if there's a pending FPU exception [*],
      which is almost never the case.
      
      So rename it to copy_fpregs_to_fpstate() to make it clear what's
      happening. Also add a few comments about why we cannot keep registers
      in certain cases.
      
      Also clean up the control flow a bit, to make it more apparent when
      we are dropping/keeping FP registers, and to optimize the common
      case (of keeping fpregs) some more.
      
      [*] Probably not true anymore, modern instructions always leave the FPU
          state intact, even if exceptions are pending: because pending FP
          exceptions are posted on the next FP instruction, not asynchronously.
      
          They were truly asynchronous back in the IRQ13 case, and we had to
          synchronize with them, but that code is not working anymore: we don't
          have IRQ13 mapped in the IDT anymore.
      
          But a cleanup patch is obviously not the place to change subtle behavior.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      4f836347
    • Ingo Molnar's avatar
      x86/fpu: Uninline the irq_ts_save()/restore() functions · 91066588
      Ingo Molnar authored
      Especially the irq_ts_save() function is pretty bloaty, generating
      over a dozen instructions, so uninline them.
      
      Even though the API is used rarely, the space savings are measurable:
      
         text    data     bss     dec     hex filename
         13331995        2572920 1634304 17539219        10ba093 vmlinux.before
         13331739        2572920 1634304 17538963        10b9f93 vmlinux.after
      
      ( This also allows the removal of an include file inclusion from fpu/api.h,
        speeding up the kernel build slightly. )
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      91066588
    • Ingo Molnar's avatar
      x86/fpu: Move various internal function prototypes to fpu/internal.h · 952f07ec
      Ingo Molnar authored
      There are a number of FPU internal function prototypes and an inline function
      in fpu/api.h, mostly placed so historically as the code grew over the years.
      
      Move them over into fpu/internal.h where they belong. (Add sched.h include
      to stackprotector.h which incorrectly relied on getting it from fpu/api.h.)
      
      fpu/api.h is now a pure file that only contains FPU APIs intended for driver
      use.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      952f07ec
    • Ingo Molnar's avatar
      x86/fpu: Uninline kernel_fpu_begin()/end() · d63e79b1
      Ingo Molnar authored
      Both inline functions call an inline function unconditionally, so we
      already pay the function call based clobbering cost. Uninline them.
      
      This saves quite a bit of code in various performance sensitive
      code paths:
      
         text            data    bss     dec             hex     filename
         13321334        2569888 1634304 17525526        10b6b16 vmlinux.before
         13320246        2569888 1634304 17524438        10b66d6 vmlinux.after
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d63e79b1
    • Ingo Molnar's avatar
      x86/fpu: Move fpu__save() to fpu/internals.h · e2295375
      Ingo Molnar authored
      It's an internal method, not a driver API, so move it from fpu/api.h
      to fpu/internal.h.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e2295375
    • Ingo Molnar's avatar
      x86/fpu: Add more comments to the FPU init code · ae02679c
      Ingo Molnar authored
      Extend the comments of the FPU init code, and fix old ones.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ae02679c
    • Ingo Molnar's avatar
      x86/fpu: Reorder init methods · 41e78410
      Ingo Molnar authored
      Reorder init methods in order of their relationship and usage, to
      form coherent blocks throughout the whole file.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      41e78410
    • Ingo Molnar's avatar
      x86/fpu: Rename fpstate_xstate_init_size() to fpu__init_system_xstate_size_legacy() · 7638b74b
      Ingo Molnar authored
      To bring it in line with the other init_system*() methods.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      7638b74b
    • Ingo Molnar's avatar
      x86/fpu: Remove the extra fpu__detect() layer · c66e3f28
      Ingo Molnar authored
      Now that fpu__detect() has become an empty layer around
      fpu__init_system(), eliminate it and make fpu__init_system()
      the main system initialization routine.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c66e3f28
    • Ingo Molnar's avatar
      x86/fpu: Move fpu__init_system_early_generic() out of fpu__detect() · dd863880
      Ingo Molnar authored
      Move the fpu__init_system_early_generic() call into fpu__init_system(),
      which hosts all the system init calls.
      
      Expose fpu__init_system() to other modules - this will be our main and only
      system init function.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      dd863880
    • Ingo Molnar's avatar
      x86/fpu: Make check_fpu() init ordering independent · 71eb3c6d
      Ingo Molnar authored
      check_fpu() currently relies on being called early in the init sequence,
      when CR0::TS has not been set up yet.
      
      Save/restore CR0::TS across this function, to make it invariant to
      init ordering. This way we'll be able to move the generic FPU setup
      routines earlier in the init sequence.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      71eb3c6d
    • Ingo Molnar's avatar
      x86/fpu: Factor out FPU bug checks into fpu/bugs.c · 0bf23f3d
      Ingo Molnar authored
      Create separate fpu/bugs.c code so that if we read generic FPU code
      we don't have to wade through all the bugcheck related code first.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0bf23f3d
    • Ingo Molnar's avatar
      x86/fpu: Move !FPU check ingo fpu__init_system_early_generic() · e83ab9ad
      Ingo Molnar authored
      There's a !FPU related sanity check in fpu__init_cpu_generic(),
      which is executed on every CPU onlining - even though we should do
      this only once, and during system init.
      
      Move this check to fpu__init_system_early_generic().
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e83ab9ad
    • Ingo Molnar's avatar
      x86/fpu: Factor out fpu__init_system_early_generic() · 2e2f3da7
      Ingo Molnar authored
      Move the generic bits of fpu__detect() into fpu__init_system_early_generic().
      
      We'll move some other code here too in a followup patch.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2e2f3da7
    • Ingo Molnar's avatar
      x86/fpu: Factor out fpu__init_system_generic() · 7218e8b7
      Ingo Molnar authored
      Factor out the generic bits from fpu__init_system().
      
      Rename mxcsr_feature_mask_init() to fpu__init_system_mxcsr()
      to bring it in line with the rest of the nomenclature.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      7218e8b7
    • Ingo Molnar's avatar
      x86/fpu: Factor out fpu__init_cpu_generic() · b11316ed
      Ingo Molnar authored
      Factor out the generic bits from fpu__init_cpu(), to create
      a flat sequence of per CPU initialization function calls:
      
      	fpu__init_cpu_generic();
      	fpu__init_cpu_xstate();
      	fpu__init_cpu_ctx_switch();
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b11316ed
    • Ingo Molnar's avatar
      x86/fpu: Simplify fpu__cpu_init() · 21c4cd10
      Ingo Molnar authored
      After the latest round of cleanups, fpu__cpu_init() has become
      a simple call to fpu__init_cpu().
      
      Rename fpu__init_cpu() to fpu__cpu_init() and remove the
      extra layer.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      21c4cd10
    • Ingo Molnar's avatar
      x86/fpu: Remove fpu__init_cpu_ctx_switch() call from fpu__init_system() · 7202ab46
      Ingo Molnar authored
      We are now doing the fpu__init_cpu_ctx_switch() call from fpu__init_cpu(),
      so there's no need to call it from fpu__init_system() anymore.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      7202ab46
    • Ingo Molnar's avatar
      x86/fpu: Do system-wide setup from fpu__detect() · 067051cc
      Ingo Molnar authored
      fpu__cpu_init() is called on every CPU, so it is the wrong place
      to call fpu__init_system() from. Call it from fpu__detect():
      this is early CPU init code, but we already have CPU features detected,
      so we can call the system-wide FPU init code from here.
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      067051cc