1. 09 Jun, 2015 19 commits
    • x86/mpx: Allow 32-bit binaries on 64-bit kernels again · 97ac46a5
      Dave Hansen authored
      Now that the bugs in mixed mode MPX handling are fixed, re-allow
      32-bit binaries on 64-bit kernels again.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183706.70277DAD@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Do not count MPX VMAs as neighbors when unmapping · bea03c50
      Dave Hansen authored
      The comment pretty much says it all.
      
      I wrote a test program that does lots of random allocations
      and forces bounds tables to be created.  It came up with a
      layout like this:
      
        ....   | BOUNDS DIRECTORY ENTRY COVERS |  ....
               |    BOUNDS TABLE COVERS        |
      |  BOUNDS TABLE |  REAL ALLOC | BOUNDS TABLE |
      
      Unmapping "REAL ALLOC" should have been able to free the
      bounds table "covering" the "REAL ALLOC" because it was the
      last real user.  But, the neighboring VMA bounds tables were
      found, considered as real neighbors, and we declined to free
      the bounds table covering the area.
      
      Doing this over and over left a small but significant number
      of these orphans.  Handling them is fairly straightforward.
      All we have to do is walk the VMAs and skip all of the MPX
      ones when looking for neighbors.
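
The neighbor check described above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct toy_vma`, its `is_mpx` flag, and `range_has_real_neighbor()` are hypothetical stand-ins, not the kernel's `vm_area_struct` or the code this commit touches.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a VMA (not vm_area_struct). */
struct toy_vma {
	unsigned long start;   /* inclusive */
	unsigned long end;     /* exclusive */
	bool is_mpx;           /* true if this mapping holds a bounds table */
	struct toy_vma *next;  /* next mapping in the list */
};

/*
 * Return true if any non-MPX mapping overlaps [start, end).
 * Bounds-table mappings are skipped: they are bookkeeping,
 * not real users of the covered address range.
 */
static bool range_has_real_neighbor(const struct toy_vma *head,
				    unsigned long start, unsigned long end)
{
	for (const struct toy_vma *v = head; v != NULL; v = v->next) {
		if (v->is_mpx)
			continue;
		if (v->start < end && v->end > start)
			return true;
	}
	return false;
}
```

With only bounds-table mappings left in the covered range, the function reports no neighbors, which is the condition under which the covering table could be freed.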
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183706.A6BD90BF@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Rewrite the unmap code · 3ceaccdf
      Dave Hansen authored
      The MPX code needs to clear out bounds tables for memory which
      is no longer in use.  We do this when a userspace mapping is
      torn down (unmapped).
      
      There are two modes:
      
        1. An entire bounds table becomes unused, and can be freed
           and its pointer removed from the bounds directory.  This
           happens either when a large mapping is torn down, or when
           a small mapping is torn down and it is the last mapping
           "covered" by a bounds table.
      
        2. Only part of a bounds table becomes unused, in which case
           we free the backing memory as if MADV_DONTNEED was called.
      
      The old code was a spaghetti mess of "edge" bounds tables
      where the edges were handled specially, even if we were
      unmapping an entire one.  Non-edge bounds tables are always
      fully unmapped, but share a different code path from the edge
      ones.  The old code had a bug where it was unmapping too much
      memory.  I worked on fixing it for two days and gave up.
      
      I didn't write the original code.  I didn't particularly like
      it, but it worked, so I left it.  After my debug session, I
      realized it was undebuggable *and* buggy, so out it went.
      
      I also wrote a new unmapping test program which uncovers bugs
      pretty nicely.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183706.DCAEC67D@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Support 32-bit binaries on 64-bit kernels · 613fcb7d
      Dave Hansen authored
      Right now, the kernel can only switch between 64-bit and 32-bit
      binaries at compile time. This patch adds support for 32-bit
      binaries on 64-bit kernels when we support ia32 emulation.
      
      We essentially choose which set of table sizes to use when doing
      arithmetic for the bounds table calculations.
      
      This also uses a different approach for calculating the table
      indexes than before.  I think the new one makes it much more
      clear what is going on, and allows us to share more code between
      the 32-bit and 64-bit cases.
      Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183705.E01F21E2@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Use 32-bit-only cmpxchg() for 32-bit apps · 6ac52bb4
      Dave Hansen authored
      user_atomic_cmpxchg_inatomic() actually looks at sizeof(*ptr) to
      figure out how many bytes to copy.  If we run it on a 64-bit
      kernel with a 64-bit pointer, it will copy a 64-bit bounds
      directory entry.  That's fine, except when we have 32-bit
      programs with 32-bit bounds directory entries and we only *want*
      32 bits.
      
      This patch breaks the cmpxchg() operation out into its own
      function and performs the 32-bit type swizzling in there.
      
      Note, the "64-bit" version of this code _would_ work on a
      32-bit-only kernel.  The issue this patch addresses is only for
      when the kernel's 'long' is mismatched from the size of the
      bounds directory entry of the process we are working on.
      
      The new helper modifies 'actual_old_val' or returns an error.
      But gcc doesn't know this, so it warns that 'actual_old_val'
      may be used uninitialized.  Shut it up with an uninitialized_var().
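
The width problem can be illustrated with a user-space sketch. It assumes a GCC or Clang toolchain (it relies on the `__atomic_compare_exchange_n` builtin), and `toy_bd_entry_cmpxchg()` is a hypothetical helper, not the function added by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compare-and-swap a bounds directory entry whose
 * width is chosen by the process type, not by the width of 'long'.
 * The value actually found in memory is stored in *actual_old.
 * Returns true if the swap happened.
 */
static bool toy_bd_entry_cmpxchg(void *entry, bool is_64bit,
				 uint64_t expected, uint64_t new_val,
				 uint64_t *actual_old)
{
	bool swapped;

	if (is_64bit) {
		uint64_t old = expected;

		swapped = __atomic_compare_exchange_n((uint64_t *)entry, &old,
						      new_val, false,
						      __ATOMIC_SEQ_CST,
						      __ATOMIC_SEQ_CST);
		*actual_old = old;
	} else {
		/* Touch only 4 bytes so the neighboring entry is untouched. */
		uint32_t old = (uint32_t)expected;

		swapped = __atomic_compare_exchange_n((uint32_t *)entry, &old,
						      (uint32_t)new_val, false,
						      __ATOMIC_SEQ_CST,
						      __ATOMIC_SEQ_CST);
		*actual_old = old;
	}
	return swapped;
}
```

Running the 32-bit path on the first of two adjacent 32-bit entries leaves the second entry unchanged, which a 64-bit-wide operation on the same address would not guarantee.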
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183705.672B115E@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Introduce new 'directory entry' to 'addr' helper function · 54587653
      Dave Hansen authored
      Currently, to get from a bounds directory entry to the virtual
      address of a bounds table, we simply mask off a few low bits.
      However, the set of bits we mask off is different for 32-bit and
      64-bit binaries.
      
      This breaks the operation out into a helper function and also
      adds a temporary variable to store the result until we are
      sure we are returning one.
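
A minimal sketch of such a helper follows. The specific values are assumptions made for illustration and are not stated in this commit message: bit 0 as the valid flag, and table alignment of 8 bytes for 64-bit and 4 bytes for 32-bit processes. The names are hypothetical as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit 0 of a directory entry marks it as valid. */
#define TOY_BD_ENTRY_VALID_FLAG 0x1ULL

/*
 * Hypothetical helper: strip the low flag bits from a bounds directory
 * entry to recover the bounds table address.  The number of low bits
 * masked off depends on whether the process is 64-bit or 32-bit.
 */
static uint64_t toy_bd_entry_to_bt_addr(uint64_t bd_entry, bool is_64bit)
{
	uint64_t bt_addr = bd_entry;
	/* Assumed table alignment: 8 bytes (64-bit), 4 bytes (32-bit). */
	uint64_t align_to_bytes = is_64bit ? 8 : 4;

	bt_addr &= ~TOY_BD_ENTRY_VALID_FLAG;
	bt_addr &= ~(align_to_bytes - 1);
	return bt_addr;
}
```

Keeping the result in a local variable until the end mirrors the commit's point about holding a temporary until a value is definitely being returned.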
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183704.007686CE@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Add temporary variable to reduce masking · a1149fc8
      Dave Hansen authored
      When we allocate a bounds table, we call mmap(), then add a
      "valid" bit to the value before storing it into the bounds
      directory.
      
      If we fail along the way, we go and mask that valid bit
      _back_ out.  That seems a little silly, and this makes it
      much more clear when we have a plain address versus an
      actual table _entry_.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183704.3D69D5F4@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86: Make is_64bit_mm() widely available · b0e9b09b
      Dave Hansen authored
      The uprobes code has a nice helper, is_64bit_mm(), that consults
      both the runtime and compile-time flags for 32-bit support.
      Instead of reinventing the wheel, pull it into an x86 header so
      we can use it for MPX.
      
      I prefer passing the 'mm' around to test_thread_flag(TIF_IA32)
      because it makes it explicit where the context is coming from.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183704.F0209999@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Trace allocation of new bounds tables · cd4996dc
      Dave Hansen authored
      Bounds tables are a significant consumer of memory.  It is
      important to know when they are being allocated.  Add a trace
      point to trace whenever an allocation occurs and also its
      virtual address.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183704.EC23A93E@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Trace the attempts to find bounds tables · 2a1dcb1f
      Dave Hansen authored
      There are two different events being traced here.  They are
      doing similar things so share a trace "EVENT_CLASS" and are
      presented together.
      
      1. Trace when MPX is zapping pages "mpx_unmap_zap":
      
      	When MPX can not free an entire bounds table, it will
      	instead try to zap unused parts of a bounds table to free
      	the backing memory.  This decreases RSS (resident set
      	size) without decreasing the virtual space allocated
      	for bounds tables.
      
      2. Trace attempts to find bounds tables "mpx_unmap_search":
      
      	This event traces any time we go looking to unmap a
      	bounds table for a given virtual address range.  This is
      	useful for comparing how often the kernel "tried" to free
      	a bounds table against how often it actually found one.
      
      	It might try and fail if it realized that a table was
      	shared with an adjacent VMA which is not being unmapped.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183703.B9D2468B@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Trace entry to bounds exception paths · 97efebf1
      Dave Hansen authored
      There are two basic things that can happen as the result of
      a bounds exception (#BR):
      
      	1. We allocate a new bounds table
      	2. We pass up a bounds exception to userspace.
      
      This patch adds a trace point for the case where we are
      passing the exception up to userspace with a signal.
      
      We are also explicit that we're printing out the inverse of
      the 'upper' that we encounter.  If you want to filter, for
      instance, you need to ~ the value first.  The reason we do
      this is because of how 'upper' is stored in the bounds table.
      
      If a pointer's range is:
      
      	0x1000 -> 0x2000
      
      it is stored in the bounds table as (32-bits here for brevity):
      
      	lower: 0x00001000
      	upper: 0xffffdfff
      
      That is so that an all 0's entry:
      
      	lower: 0x00000000
      	upper: 0x00000000
      
      corresponds to the "init" bounds which store a *range* of:
      
      	0x00000000 -> 0xffffffff
      
      That is, by far, the common case, and that lets us use the
      zero page, or deduplicate the memory, etc... The 'upper'
      stored in the table is gibberish to print by itself, so we
      print ~upper to get the *actual*, logical, human-readable
      value printed out.
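
The inverted storage of 'upper' can be checked with a short sketch. The struct and function names are hypothetical and the layout is reduced to the two 32-bit fields used in the example above; it is not the real bounds table entry layout.

```c
#include <stdint.h>

/* Hypothetical, reduced bounds table entry (32-bit fields for brevity). */
struct toy_bt_entry {
	uint32_t lower;  /* stored as-is */
	uint32_t upper;  /* stored as the one's complement of the real upper */
};

/* Encode a logical [lower, upper] range the way the table stores it. */
static struct toy_bt_entry toy_encode_bounds(uint32_t lower, uint32_t upper)
{
	struct toy_bt_entry e = { lower, (uint32_t)~upper };
	return e;
}

/* Recover the human-readable upper bound, as the trace point prints it. */
static uint32_t toy_logical_upper(struct toy_bt_entry e)
{
	return (uint32_t)~e.upper;
}
```

Encoding 0x1000 -> 0x2000 yields a stored upper of 0xffffdfff, and an all-zero entry decodes to an upper bound of 0xffffffff, matching the "init" bounds described above.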
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183703.027BB9B0@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Trace #BR exceptions · e7126cf5
      Dave Hansen authored
      This is the first in a series of MPX tracing patches.
      I've found these extremely useful in the process of
      debugging applications and the kernel code itself.
      
      This trace point hooks into the bounds (#BR) exception
      very early and allows capturing the key registers which
      would influence how the exception is handled.
      
      Note that bndcfgu/bndstatus are technically still
      64-bit registers even in 32-bit mode.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183703.5FE2619A@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Introduce a boot-time disable flag · 8c3641e9
      Dave Hansen authored
      MPX has the _potential_ to cause some issues.  Say part of your
      init system tried to protect one of its components from buffer
      overflows with MPX.  If there were a false positive, it's
      possible that MPX could keep a system from booting.
      
      MPX could also potentially cause performance issues since it is
      present in hot paths like the unmap path.
      
      Allow it to be disabled at boot time.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150607183702.2E8B77AB@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Restrict the mmap() size check to bounds tables · eb099e5b
      Dave Hansen authored
      The comment and code here are confusing.  We do not currently
      allocate the bounds directory in the kernel.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183702.222CEC2A@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Remove redundant MPX_BNDCFG_ADDR_MASK · 3c1d3230
      Qiaowei Ren authored
      MPX_BNDCFG_ADDR_MASK is defined twice, so this patch removes
      the redundant definition.
      Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183702.5F129376@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Clean up the code by not passing a task pointer around when unnecessary · 46a6e0cf
      Dave Hansen authored
      The MPX code can only work on the current task.  You can not,
      for instance, enable MPX management in another process or
      thread. You can also not handle a fault for another process or
      thread.
      
      Despite this, we pass a task_struct around prolifically.  This
      patch removes all of the task struct passing for code paths
      where the code can not deal with another task (which turns out
      to be all of them).
      
      This has no functional changes.  It's just a cleanup.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: bp@alien8.de
      Link: http://lkml.kernel.org/r/20150607183702.6A81DA2C@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mpx: Use the new get_xsave_field_ptr() API · a84eeaa9
      Dave Hansen authored
      The MPX registers (bndcsr/bndcfgu/bndstatus) are not directly
      accessible via normal instructions.  They essentially act as
      if they were floating point registers and are saved/restored
      along with those registers.
      
      There are two main paths in the MPX code where we care about
      the contents of these registers:
      
      	1. #BR (bounds) faults
      	2. the prctl() code where we are setting MPX up
      
      Both of those paths _might_ be called without the FPU having
      been used.  That means that 'tsk->thread.fpu.state' might
      never be allocated.
      
      Also, fpu_save_init() is not preempt-safe.  It was a bug to
      call it without disabling preemption.  The new
      get_xsave_addr() calls unlazy_fpu() instead and properly
      disables preemption.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suresh Siddha <sbsiddha@gmail.com>
      Cc: bp@alien8.de
      Link: http://lkml.kernel.org/r/20150607183701.BC0D37CF@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fpu/xstate: Wrap get_xsave_addr() to make it safer · 04cd027b
      Dave Hansen authored
      The MPX code is calling a low-level FPU function
      (copy_fpregs_to_fpstate()).  This function is not able to
      be called in all contexts, although it is safe to call
      directly in some cases.
      
      Although probably correct, the current code is ugly and
      potentially error-prone.  So, add a wrapper that calls
      the (slightly) higher-level fpu__save() (which is preempt-
      safe) and also ensures that we even *have* an FPU context
      (in the case that this was called when in lazy FPU mode).
      
      Ingo had this to say about the details of when we need
      preemption disabled:
      
      > it's indeed generally unsafe to access/copy FPU registers with preemption enabled,
      > for two reasons:
      >
      >   - on older systems that use FSAVE the instruction destroys FPU register
      >     contents, which has to be handled carefully
      >
      >   - even on newer systems if we copy to FPU registers (which this code doesn't)
      >     then we don't want a context switch to occur in the middle of it, because a
      >     context switch will write to the fpstate, potentially overwriting our new data
      >     with old FPU state.
      >
      > But it's safe to access FPU registers with preemption enabled in a couple of
      > special cases:
      >
      >   - potentially destructively saving FPU registers: the signal handling code does
      >     this in copy_fpstate_to_sigframe(), because it can rely on the signal restore
      >     side to restore the original FPU state.
      >
      >   - reading FPU registers on modern systems: we don't do this anywhere at the
      >     moment, mostly to keep symmetry with older systems where FSAVE is
      >     destructive.
      >
      >   - initializing FPU registers on modern systems: fpu__clear() does this. Here
      >     it's safe because we don't copy from the fpstate.
      >
      >   - directly writing FPU registers from user-space memory (!). We do this in
      >     fpu__restore_sig(), and it's safe because neither context switches nor
      >     irq-handler FPU use can corrupt the source context of the copy (which is
      >     user-space memory).
      >
      > Note that the MPX code's current use of copy_fpregs_to_fpstate() was safe I think,
      > because:
      >
      >  - MPX is predicated on eagerfpu, so the destructive F[N]SAVE instruction won't be
      >    used.
      >
      >  - the code was only reading FPU registers, and was doing it only in places that
      >    guaranteed that an FPU state was already active (i.e. didn't do it in
      >    kthreads)
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suresh Siddha <sbsiddha@gmail.com>
      Cc: bp@alien8.de
      Link: http://lkml.kernel.org/r/20150607183700.AA881696@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fpu/xstate: Fix up bad get_xsave_addr() assumptions · 0c4109be
      Dave Hansen authored
      get_xsave_addr() assumes that if an xsave bit is present in the
      hardware (pcntxt_mask) that it is present in a given xsave
      buffer.  Due to a bug in the xsave code on all of the systems
      that have MPX (and thus all the users of this code), that has
      been a true assumption.
      
      But, the bug is getting fixed, so our assumption is not going
      to hold any more.
      
      It's quite possible (and normal) for an enabled state to be
      present on 'pcntxt_mask', but *not* in 'xstate_bv'.  We need
      to consult 'xstate_bv'.
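
The corrected lookup can be sketched as follows. Everything here is a hypothetical simplification: `struct toy_xsave`, the `hw_mask` parameter (standing in for 'pcntxt_mask'), and the caller-supplied offset are illustrative and do not reflect the real xsave buffer layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified xsave buffer. */
struct toy_xsave {
	uint64_t xstate_bv;       /* features actually present in this buffer */
	unsigned char data[64];   /* placeholder for the saved state */
};

/*
 * Return a pointer to a feature's saved state, or NULL when the feature
 * is either not enabled in hardware or not present in this buffer.
 * The second check is the one the old code skipped.
 */
static void *toy_get_xsave_addr(struct toy_xsave *xs, uint64_t hw_mask,
				uint64_t feature_bit, size_t offset)
{
	if (!(hw_mask & feature_bit))
		return NULL;
	if (!(xs->xstate_bv & feature_bit))
		return NULL;
	return xs->data + offset;
}
```

A caller then has to handle NULL as a normal outcome rather than as an error, since an enabled feature may simply not have been saved into the buffer.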
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183700.1E739B34@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 27 May, 2015 12 commits
    • x86/fpu: Make WARN_ON_FPU() more robust in the !CONFIG_X86_DEBUG_FPU case · 83242c51
      Ingo Molnar authored
      Make sure the WARN_ON_FPU() macro consumes the macro argument,
      to avoid 'unused variable' build warnings if the only use of
      a variable is in debugging code.
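
The idea of a debug macro that still consumes its argument can be sketched like this. The names are hypothetical: `TOY_DEBUG_FPU` stands in for CONFIG_X86_DEBUG_FPU and `TOY_WARN_ON_FPU()` for the real macro, whose exact definition is not shown in this commit message.

```c
#include <stdio.h>

/* Hypothetical names; TOY_DEBUG_FPU stands in for CONFIG_X86_DEBUG_FPU. */
#ifdef TOY_DEBUG_FPU
# define TOY_WARN_ON_FPU(x) \
	do { if (x) fprintf(stderr, "FPU warning: %s\n", #x); } while (0)
#else
/*
 * Evaluate and discard the argument, so a variable that is only used
 * in the debugging check still counts as used by the compiler.
 */
# define TOY_WARN_ON_FPU(x) ((void)(x))
#endif

/* 'err' is referenced only by the debugging check. */
static int toy_restore(int simulated_err)
{
	int err = simulated_err;

	TOY_WARN_ON_FPU(err);
	return 0;
}
```

Had the non-debug variant expanded to nothing, `err` would be an unused variable and would trigger the build warning the commit describes.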
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fpu: Simplify copy_kernel_to_xregs_booting() · d65fcd60
      Ingo Molnar authored
      copy_kernel_to_xregs_booting() has a second parameter that is the mask
      of xfeatures that should be copied - but this parameter is always -1.
      
      Simplify the call site of this function; this also makes it more
      similar to the function call signature of other copy_kernel_to*regs()
      functions.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fpu: Standardize the parameter type of copy_kernel_to_fpregs() · 003e2e8b
      Ingo Molnar authored
      Bring the __copy_fpstate_to_fpregs() and copy_fpstate_to_fpregs() functions
      in line with the parameter passing convention of other kernel-to-FPU-registers
      copying functions: pass around an in-memory FPU register state pointer,
      instead of struct fpu *.
      
      NOTE: This patch also changes the assembly constraint of the FXSAVE-leak
            workaround from 'fpu->fpregs_active' to 'fpstate' - but that is fine,
            as we only need a valid memory address there for the FILDL instruction.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fpu: Remove error return values from copy_kernel_to_*regs() functions · 9ccc27a5
      Ingo Molnar authored
      None of the copy_kernel_to_*regs() FPU register copying functions are
      supposed to fail, and all of them have debugging checks that enforce
      this.
      
      Remove their return values and simplify their call sites, which have
      redundant error checks and error handling code paths.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fpu: Rename copy_fpstate_to_fpregs() to copy_kernel_to_fpregs() · 3e1bf47e
      Ingo Molnar authored
      Bring the __copy_fpstate_to_fpregs() and copy_fpstate_to_fpregs() functions
      in line with the naming of other kernel-to-FPU-registers copying functions.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fpu: Add debugging checks to all copy_kernel_to_*() functions · 43b287b3
      Ingo Molnar authored
      Copies from in-kernel FPU context buffers to FPU registers are
      never supposed to fault.
      
      Add debugging checks to copy_kernel_to_fxregs() and copy_kernel_to_fregs()
      to double check this assumption.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      43b287b3
    • Ingo Molnar's avatar
      x86/fpu: Add debugging check to fpu__restore() · ce2a1e67
      Ingo Molnar authored
      The copy_fpstate_to_fpregs() function is never supposed to fail,
      so add a debugging check to its call site in fpu__restore().
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ce2a1e67
    • Ingo Molnar's avatar
      x86/fpu: Optimize fpu__activate_fpstate_write() · 343763c3
      Ingo Molnar authored
      fpu__activate_fpstate_write() is used before ptrace writes to the fpstate
      context. Because it expects the modified registers to be reloaded on the
      next context switch, it's only valid to call this function for stopped
      child tasks.
      
        - add a debugging check for this assumption
      
        - remove code that only runs if the current task's FPU state needs
          to be saved, which cannot occur here
      
        - update comments to match the implementation
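The write-side rule described above can be illustrated with a toy user-space model. This is only a sketch under stated assumptions: `struct wr_task`, `wr_current`, `wr_warnings` and `wr_activate_fpstate_write()` are hypothetical names, and a plain counter stands in for the kernel's debugging check.

```c
#include <stddef.h>

/* Hypothetical stand-in for a task's FPU bookkeeping. */
struct wr_task {
    int fpstate;   /* in-memory copy of the FPU registers */
    int last_cpu;  /* CPU whose registers still hold this state, or -1 */
};

static struct wr_task *wr_current; /* task running on this CPU (assumed) */
static int wr_warnings;            /* counts debugging-check hits */

/*
 * Prepare a stopped child task for a write to its in-memory fpstate.
 * Calling this for the current task violates the assumption, so the
 * model only records a warning and leaves the task untouched.
 */
static void wr_activate_fpstate_write(struct wr_task *t)
{
    if (t == wr_current) {
        wr_warnings++;
        return;
    }
    /* Invalidate any cached register copy so the modified
     * fpstate is reloaded on the next context switch. */
    t->last_cpu = -1;
}
```

In this model nothing is ever saved on the write path, because a stopped child's registers are already in memory; only the invalidation remains.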
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      343763c3
    • Ingo Molnar's avatar
      x86/fpu: Rename fpu__activate_fpstate() to fpu__activate_fpstate_write() · 6a81d7eb
      Ingo Molnar authored
      The remaining users of fpu__activate_fpstate() are all places that want to
      modify FPU registers, so rename the function to fpu__activate_fpstate_write()
      according to this usage.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6a81d7eb
    • Ingo Molnar's avatar
      x86/fpu: Optimize fpu__activate_fpstate_read() · 9ba6b791
      Ingo Molnar authored
      fpu__activate_fpstate_read() is used before FPU registers are
      read from the fpstate by ptrace and core dumping.
      
      It's not necessary to unlazy non-current child tasks in this case,
      since the reading of registers is non-destructive.
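A toy user-space model of the read side might look like the following. It is a sketch, not the kernel code: `struct rd_task`, `rd_current`, `rd_cpu_regs`, `rd_saves` and `rd_activate_fpstate_read()` are hypothetical names, and a single integer stands in for the whole register file.

```c
#include <stddef.h>

/* Hypothetical stand-in for a task's in-memory FPU state. */
struct rd_task {
    int fpstate;
};

static struct rd_task *rd_current; /* task running on this CPU (assumed) */
static int rd_cpu_regs;            /* simulated live FPU registers */
static int rd_saves;               /* how many times registers were saved */

/*
 * Make a task's fpstate safe to read. Only the current task can have
 * newer state in the registers than in memory, so only it needs a save.
 * A non-current task is left alone: reading does not modify anything.
 */
static void rd_activate_fpstate_read(struct rd_task *t)
{
    if (t == rd_current) {
        t->fpstate = rd_cpu_regs;
        rd_saves++;
    }
}
```

The non-current branch is intentionally empty in this model, which is the optimization the message describes.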
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9ba6b791
    • Ingo Molnar's avatar
      x86/fpu: Split out the fpu__activate_fpstate_read() method · 05602812
      Ingo Molnar authored
      Currently fpu__activate_fpstate() is used for two distinct purposes:
      
        - read access by ptrace and core dumping, where in the core dumping
          case the current task's FPU state may be examined as well.
      
        - write access by ptrace, which modifies FPU registers and expects
          the modified registers to be reloaded on the next context switch.
      
      Split out the reading side into fpu__activate_fpstate_read().
      
      ( Note that this is just a pure duplication of fpu__activate_fpstate()
        for the time being, we'll optimize the new function in the next patch. )
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      05602812
    • Ingo Molnar's avatar
      x86/fpu: Fix FPU register read access to the current task · 47f01e8c
      Ingo Molnar authored
      Bobby Powers reported the following FPU warning during ELF coredumping:
      
         WARNING: CPU: 0 PID: 27452 at arch/x86/kernel/fpu/core.c:324 fpu__activate_stopped+0x8a/0xa0()
      
      This warning unearthed an invalid assumption about fpu__activate_stopped()
      that I added in:
      
        67e97fc2 ("x86/fpu: Rename init_fpu() to fpu__unlazy_stopped() and add debugging check")
      
      The old init_fpu() function had an (intentional but obscure) side effect:
      when FPU registers are accessed for the current task, for reading, then
      it synchronized live in-register FPU state with the fpstate by saving it.
      
      So fix this bug by saving the FPU if we are the current task. We'll
      still warn in fpu__save() if this is called for not yet stopped
      child tasks, so the debugging check is still preserved.
      
      Also rename the function to fpu__activate_fpstate(), because it's not
      exclusively used for stopped tasks, but for the current task as well.
      
      ( Note that this bug calls for a cleaner separation of access-for-read
        and access-for-modification FPU methods, but we'll do that in separate
        patches. )
      Reported-by: Bobby Powers <bobbypowers@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      47f01e8c
  3. 25 May, 2015 9 commits
    • Ingo Molnar's avatar
      x86/fpu: Micro-optimize the copy_xregs_to_kernel*() and copy_kernel_to_xregs*() functions · 8c05f05e
      Ingo Molnar authored
      The copy_xregs_to_kernel*() and copy_kernel_to_xregs*() functions are used
      to copy FPU registers to kernel memory and vice versa.
      
      They are never expected to fail, yet they have a return code, mostly because
      that way they can share the assembly macros with the copy*user*() functions.
      
      This error code is then silently ignored by the context switching
      and other code - which made the bug in:
      
        b8c1b8ea ("x86/fpu: Fix FPU state save area alignment bug")
      
      harder to fix than necessary.
      
      So remove the return values and check for no faults when FPU debugging
      is enabled in the .config.
      
      This improves the eagerfpu context switching fast path by a couple of
      instructions, when FPU debugging is disabled:
      
         ffffffff810407fa:      89 c2                   mov    %eax,%edx
         ffffffff810407fc:      48 0f ae 2f             xrstor64 (%rdi)
         ffffffff81040800:      31 c0                   xor    %eax,%eax
        -ffffffff81040802:      eb 0a                   jmp    ffffffff8104080e <__switch_to+0x321>
        +ffffffff81040802:      eb 16                   jmp    ffffffff8104081a <__switch_to+0x32d>
         ffffffff81040804:      31 c0                   xor    %eax,%eax
         ffffffff81040806:      48 0f ae 8b c0 05 00    fxrstor64 0x5c0(%rbx)
         ffffffff8104080d:      00
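The pattern of dropping the return value and checking for faults only in debug builds can be sketched in user space as follows. Everything here is hypothetical: `DBG_FPU_DEBUG`, `DBG_WARN_ON()`, `dbg_raw_copy()` and `dbg_copy_kernel_to_regs()` are illustrative names, and a NULL source pointer stands in for a faulting restore.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical build-time switch; set to 0 to compile the check away. */
#define DBG_FPU_DEBUG 1

static int dbg_warn_count;

#if DBG_FPU_DEBUG
#define DBG_WARN_ON(cond)                                   \
    do {                                                    \
        if (cond) {                                         \
            dbg_warn_count++;                               \
            fprintf(stderr, "warning: %s\n", #cond);        \
        }                                                   \
    } while (0)
#else
/* Type-check the expression without evaluating it. */
#define DBG_WARN_ON(cond) do { (void)sizeof(cond); } while (0)
#endif

struct dbg_regs {
    unsigned char bytes[64];
};

static struct dbg_regs dbg_live_regs; /* simulated register file */

/* Low-level primitive that still reports an error code. */
static int dbg_raw_copy(struct dbg_regs *dst, const struct dbg_regs *src)
{
    if (!src)
        return -1; /* stands in for a fault */
    memcpy(dst, src, sizeof(*dst));
    return 0;
}

/* Public helper: no return value, failure is only a debug warning. */
static void dbg_copy_kernel_to_regs(const struct dbg_regs *src)
{
    int err = dbg_raw_copy(&dbg_live_regs, src);

    DBG_WARN_ON(err);
}
```

Callers of `dbg_copy_kernel_to_regs()` have no error code to ignore, so an unexpected failure shows up as a warning in debug builds instead of being dropped silently.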
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8c05f05e
    • Ingo Molnar's avatar
      x86/fpu: Improve the initialization logic of 'err' around xstate_fault() constraints · 685c9616
      Ingo Molnar authored
      There's a confusing aspect of how xstate_fault() constraints are
      handled by the FPU register/memory copying functions in
      fpu/internal.h: they use "0" (0) to signal that the asm code
      will not always set 'err' to a valid value.
      
      But 'err' is already initialized to 0 in C code, which is duplicated
      by the asm() constraint. Should the initialization value ever be
      changed, it might become subtly inconsistent with the not too clear
      asm() constraint.
      
      Use 'err' as the value of the input variable instead, to clarify
      this all.
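The difference between the two constraint forms can be shown with GCC/Clang extended asm in a small sketch. The function names are hypothetical, and the empty asm template is an assumption standing in for instructions that leave 'err' untouched on the no-fault path.

```c
/*
 * Both functions use an asm() that never writes its output operand.
 * The matching constraint "0" places an input value in the same
 * register as output operand 0, so that value is what 'err' holds
 * after the asm().
 */

/* Old style: the register is preloaded with the literal 0. */
static int asm_err_tied_to_constant(int init)
{
    int err = init;

    __asm__ volatile("" : "=r" (err) : "0" (0));
    return err; /* the preloaded 0, whatever 'init' was */
}

/* New style: the register is preloaded with 'err' itself. */
static int asm_err_tied_to_variable(int init)
{
    int err = init;

    __asm__ volatile("" : "=r" (err) : "0" (err));
    return err; /* follows the C initialization */
}
```

With an initial value of 0 the two forms behave the same. They diverge as soon as the C initializer changes, which is the inconsistency the message warns about.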
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      685c9616
    • Ingo Molnar's avatar
      x86/fpu: Improve xstate_fault() handling · 87b6559d
      Ingo Molnar authored
      There are two problems with xstate_fault handling:
      
       - The xstate_fault() macro takes an argument, but that's
         propagated into the assembly named label as well. This
         is technically correct currently but might result in
         failures if a more complex argument is ever used.
         So use a separate '_err' name instead for the label.
      
       - All the xstate_fault() using functions have an error
         variable named 'err', which is an output variable to
         the asm() they are using. The problem is, it's not always
         set by the asm(), in which case the compiler might
         optimize out its initialization, so that the C variable
         'err' might become corrupted after the asm() - confusing
         anyone who tries to take advantage of this variable
         after the asm(). Mark it an input variable as well.
      
         This is a latent bug currently, but an upcoming debug
         patch will make use of 'err'.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      87b6559d
    • Ingo Molnar's avatar
      x86/fpu: Rename xstate related 'fx' references to 'xstate' · 87dafd41
      Ingo Molnar authored
      So the xstate code was probably first copied from the fxregs code,
      hence it carried over the 'fx' naming for the state pointer variable.
      
      But this is slightly confusing, as we usually only call the (legacy)
      MMX/SSE state 'fx', both in data structures and in the functions
      built around FXSAVE/FXRSTOR.
      
      So rename it to 'xstate' to make it more apparent what it is related to.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      87dafd41
    • Ingo Molnar's avatar
      x86/fpu: Fix fpu__init_system_xstate() comments · 6e553594
      Ingo Molnar authored
      Remove obsolete comment about __init limitations: in the new code there aren't any.
      
      Also standardize the comment style in the function while at it.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6e553594
    • Ingo Molnar's avatar
      x86/fpu: Move the xstate copying functions into fpu/internal.h · fd169b05
      Ingo Molnar authored
      All the other register <-> memory copying functions are defined
      in fpu/internal.h, so move the xstate variants there too.
      
      Beyond being more consistent, this also allows FPU debugging
      checks to be added to them. (Because they can now use the
      macros defined in fpu/internal.h.)
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fd169b05
    • Ingo Molnar's avatar
      Merge branch 'linus' into x86/fpu · 3152657f
      Ingo Molnar authored
      Resolve semantic conflict in arch/x86/kvm/cpuid.c with:
      
        c447e76b ("kvm/fpu: Enable eager restore kvm FPU for MPX")
      
      By removing the FPU internal include files.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3152657f
    • Ingo Molnar's avatar
      x86/fpu: Fix FPU state save area alignment bug · b8c1b8ea
      Ingo Molnar authored
      On most configs task_struct is cache line aligned, which makes
      the XSAVE area's 64-byte required alignment work out fine.
      
      But on some .configs task_struct is aligned only to 16 bytes
      (enforced by ARCH_MIN_TASKALIGN), which makes things like
      fpu__copy() (which uses XSAVEOPT) not work so well.
      
      I broke this in:
      
        7366ed77 ("x86/fpu: Simplify FPU handling by embedding the fpstate in task_struct (again)")
      
      which embedded the fpstate in the task_struct.
      
      The alignment requirements of the FPU code were originally present
      in ARCH_MIN_TASKALIGN, which still has a value of 16, which was the
      alignment requirement of the FPU state area prior to XSAVE. But this
      link was not documented (and not required) and the link got lost
      when the FPU state area was made dynamic years ago.
      
      With XSAVEOPT the minimum alignment requirement went up to 64 bytes,
      and the embedding of the FPU state area in task_struct exposed it
      again - and '16' was not increased to '64'.
      
      So fix this bug, but also try to address the underlying lost link
      of information that made it easier to happen:
      
        - document ARCH_MIN_TASKALIGN a bit better
      
        - use alignof() to recover the current alignment requirements.
          This would work in the future as well, should the alignment
          requirements go up to 128 bytes with things like AVX512.
      
      ( We should probably also use the vSMP alignment rules for all
        of x86, but that's for another patch. )
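The idea of recovering the alignment with alignof() rather than hard-coding a number can be sketched in C11 as below. The types and the macro are hypothetical stand-ins (`struct al_fpstate`, `struct al_task`, `AL_MIN_TASKALIGN`), and the 64-byte figure is simply the requirement quoted above.

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical FPU save area carrying its own alignment requirement. */
struct al_fpstate {
    alignas(64) unsigned char area[512];
};

/* Hypothetical task structure that embeds the save area. */
struct al_task {
    long flags;
    struct al_fpstate fpu;
};

/*
 * Derive the minimum task alignment from the embedded type instead of
 * writing '16' or '64' by hand. If the save area ever needs stricter
 * alignment, this value follows automatically.
 */
#define AL_MIN_TASKALIGN alignof(struct al_fpstate)

static size_t al_min_task_align(void)
{
    return AL_MIN_TASKALIGN;
}

/* Offset of the save area inside the task structure. */
static size_t al_fpu_offset(void)
{
    return offsetof(struct al_task, fpu);
}
```

Because the member offset is a multiple of the member's alignment, an allocator that honors `AL_MIN_TASKALIGN` for the whole structure also places the embedded save area on a 64-byte boundary.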
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b8c1b8ea
    • Linus Torvalds's avatar
      Linux 4.1-rc5 · ba155e2d
      Linus Torvalds authored
      ba155e2d