1. 17 Nov, 2014 11 commits
    • x86, mpx: On-demand kernel allocation of bounds tables · fe3d197f
      Dave Hansen authored
      This is really the meat of the MPX patch set.  If there is one patch to
      review in the entire series, this is the one.  There is a new ABI here
      and this kernel code also interacts with userspace memory in a
      relatively unusual manner.  (small FAQ below).
      
      Long Description:
      
      This patch adds two prctl() commands to enable or disable the
      management of bounds tables in kernel, including on-demand kernel
      allocation (See the patch "on-demand kernel allocation of bounds tables")
      and cleanup (See the patch "cleanup unused bound tables"). Applications
      do not strictly need the kernel to manage bounds tables and we expect
      some applications to use MPX without taking advantage of this kernel
      support. This means the kernel can not simply infer whether an application
      needs bounds table management from the MPX registers.  The prctl() is an
      explicit signal from userspace.
      
      PR_MPX_ENABLE_MANAGEMENT is a signal from userspace that it requires
      the kernel's help in managing bounds tables.
      
      PR_MPX_DISABLE_MANAGEMENT is the opposite: userspace doesn't want the
      kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel
      won't allocate and free bounds tables even if the CPU supports MPX.
      
      PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds
      directory out of a userspace register (bndcfgu) and then cache it into
      a new field (->bd_addr) in the 'mm_struct'.  PR_MPX_DISABLE_MANAGEMENT
      will set "bd_addr" to an invalid address.  Using this scheme, we can
      use "bd_addr" to determine whether the management of bounds tables in
      kernel is enabled.
      
      Also, the only way to access that bndcfgu register is via an xsave,
      which can be expensive.  Caching "bd_addr" like this also helps reduce
      the cost of those xsaves when doing table cleanup at munmap() time.
      Unfortunately, we can not apply this optimization to #BR fault time
      because we need an xsave to get the value of BNDSTATUS.
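      
      The caching scheme above can be sketched in plain C.  This is only a
      model, not the kernel code: 'struct demo_mm', the sentinel
      DEMO_INVALID_BOUNDS_DIR and the demo_*() helpers are hypothetical
      names standing in for the 'mm_struct' field and the two prctl()
      commands.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel: an address no real bounds directory can have. */
#define DEMO_INVALID_BOUNDS_DIR ((uintptr_t)-1)

/* Stand-in for the part of 'mm_struct' that gains the new field. */
struct demo_mm {
	uintptr_t bd_addr;	/* cached bounds directory base, or the sentinel */
};

/* Models PR_MPX_ENABLE_MANAGEMENT: cache the base that was read from bndcfgu. */
static int demo_enable_management(struct demo_mm *mm, uintptr_t bndcfgu_base)
{
	if (bndcfgu_base == DEMO_INVALID_BOUNDS_DIR)
		return -1;	/* nothing usable to cache */
	mm->bd_addr = bndcfgu_base;
	return 0;
}

/* Models PR_MPX_DISABLE_MANAGEMENT: forget the cached base. */
static void demo_disable_management(struct demo_mm *mm)
{
	mm->bd_addr = DEMO_INVALID_BOUNDS_DIR;
}

/* The cached value doubles as the "is management enabled?" flag. */
static bool demo_management_enabled(const struct demo_mm *mm)
{
	return mm->bd_addr != DEMO_INVALID_BOUNDS_DIR;
}
```

      The point of the model is that a later munmap()-time check only has
      to look at the cached field and never needs to re-read the register.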
      
      ==== Why does the hardware even have these Bounds Tables? ====
      
      MPX only has 4 hardware registers for storing bounds information.
      If MPX-enabled code needs more than these 4 registers, it needs to
      spill them somewhere. It has two special instructions for this
      which allow the bounds to be moved between the bounds registers
      and some new "bounds tables".
      
      #BR exceptions are similar conceptually to a page fault and are raised
      by the MPX hardware both on bounds violations and when the tables
      are not present. This patch handles those #BR exceptions for
      not-present tables by carving the space out of the normal process's
      address space (essentially calling the new mmap() interface introduced
      earlier in this patch set) and then pointing the bounds-directory
      over to it.
      
      The tables *need* to be accessed and controlled by userspace because
      the instructions for moving bounds in and out of them are extremely
      frequent. They potentially happen every time a register pointing to
      memory is dereferenced. Any direct kernel involvement (like a syscall)
      to access the tables would obviously destroy performance.
      
      ==== Why not do this in userspace? ====
      
      This patch is obviously doing this allocation in the kernel.
      However, MPX does not strictly *require* anything in the kernel.
      It can theoretically be done completely from userspace. Here are
      a few ways this *could* be done. I don't think any of them are
      practical in the real world, but here they are.
      
      Q: Can virtual space simply be reserved for the bounds tables so
         that we never have to allocate them?
      A: As noted earlier, these tables are *HUGE*. An X-GB virtual
         area needs 4*X GB of virtual space, plus 2GB for the bounds
         directory. If we were to preallocate them for the 128TB of
         user virtual address space, we would need to reserve 512TB+2GB,
         which is larger than the entire virtual address space today.
         This means they can not be reserved ahead of time. Also, a
         single process's pre-populated bounds directory consumes 2GB
         of virtual *AND* physical memory. IOW, it's completely
         infeasible to prepopulate bounds directories.
      
      Q: Can we preallocate bounds table space at the same time memory
         is allocated which might contain pointers that might eventually
         need bounds tables?
      A: This would work if we could hook the site of each and every
         memory allocation syscall. This can be done for small,
         constrained applications. But, it isn't practical at a larger
         scale since a given app has no way of controlling how all the
         parts of the app might allocate memory (think libraries). The
         kernel is really the only place to intercept these calls.
      
      Q: Could a bounds fault be handed to userspace and the tables
         allocated there in a signal handler instead of in the kernel?
      A: (thanks to tglx) mmap() is not on the list of safe async
         handler functions and even if mmap() would work it still
         requires locking or nasty tricks to keep track of the
         allocation state there.
      
      Having ruled out all of the userspace-only approaches for managing
      bounds tables that we could think of, we create them on demand in
      the kernel.
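      
      The reservation arithmetic from the first answer above can be
      checked with a few lines of C.  This is a sketch that assumes only
      the figures quoted in the text (4x the covered area plus a 2GB
      bounds directory); the demo_*() names are hypothetical.

```c
#include <stdint.h>

#define DEMO_GB (1ULL << 30)
#define DEMO_TB (1ULL << 40)

/*
 * Worst-case virtual space to reserve up front: 4 bytes of bounds
 * table per byte of covered address space, plus a 2GB bounds directory.
 */
static uint64_t demo_mpx_reserve_bytes(uint64_t covered_bytes)
{
	return 4 * covered_bytes + 2 * DEMO_GB;
}

/* Would the reservation fit into a user address space of the given size? */
static int demo_reservation_fits(uint64_t covered_bytes, uint64_t user_va_bytes)
{
	return demo_mpx_reserve_bytes(covered_bytes) <= user_va_bytes;
}
```

      For the 128TB case, demo_mpx_reserve_bytes(128 * DEMO_TB) yields
      512TB + 2GB, which cannot fit in a 128TB user address space.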
      Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-mips@linux-mips.org
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, mpx: Decode MPX instruction to get bound violation information · fcc7ffd6
      Dave Hansen authored
      This patch sets bound violation fields of siginfo struct in #BR
      exception handler by decoding the user instruction and constructing
      the faulting pointer.
      
      We have to be very careful when decoding these instructions.  They
      are completely controlled by userspace and may be changed at any
      time up to and including the point where we try to copy them in to
      the kernel.  They may or may not be MPX instructions and could be
      completely invalid for all we know.
      
      Note: This code is based on Qiaowei Ren's specialized MPX
      decoder, but uses the generic decoder whenever possible.  It was
      tested for robustness by generating a completely random data
      stream and trying to decode that stream.  I also unmapped random
      pages inside the stream to test the "partial instruction" short
      read code.
      
      We kzalloc() the siginfo instead of stack allocating it because
      we need to memset() it anyway, and doing this makes it much more
      clear when it got initialized by the MPX instruction decoder.
      
      Changes from the old decoder:
       * Use the generic decoder instead of custom functions.  Saved
         ~70 lines of code overall.
       * Remove insn->addr_bytes code (never used??)
       * Make sure never to possibly overflow the regoff[] array, plus
         check the register range correctly in 32 and 64-bit modes.
       * Allow get_reg() to return an error and have mpx_get_addr_ref()
         handle when it sees errors.
       * Only call insn_get_*() near where we actually use the values
         instead of trying to call them all at once.
       * Handle short reads from copy_from_user() and check the actual
         number of read bytes against what we expect from
         insn_get_length().  If a read stops in the middle of an
         instruction, we error out.
       * Actually check the opcodes instead of ignoring them.
       * Dynamically kzalloc() siginfo_t so we don't leak any stack
         data.
       * Detect and handle decoder failures instead of ignoring them.
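      
      The register-range item in the list above can be illustrated with
      a small sketch.  It is not the decoder's code: 'struct demo_regs',
      demo_reg_offset() and demo_get_reg() are hypothetical stand-ins,
      and the 8/16 register counts are the usual x86 general-purpose
      register counts for 32-bit and 64-bit mode.

```c
#include <stddef.h>

#define DEMO_EINVAL 22

/* Hypothetical register file standing in for the saved user registers. */
struct demo_regs {
	unsigned long gpr[16];
};

#define DEMO_NR_REGS_32 8	/* registers addressable in 32-bit mode */
#define DEMO_NR_REGS_64 16	/* registers addressable in 64-bit mode */

/*
 * Return the byte offset of register 'regno' inside struct demo_regs,
 * or -DEMO_EINVAL if the userspace-controlled number is out of range.
 */
static int demo_reg_offset(int regno, int is_64bit)
{
	int nr_regs = is_64bit ? DEMO_NR_REGS_64 : DEMO_NR_REGS_32;

	if (regno < 0 || regno >= nr_regs)
		return -DEMO_EINVAL;
	return (int)(offsetof(struct demo_regs, gpr) +
		     (size_t)regno * sizeof(unsigned long));
}

/* The caller sees the error instead of a read past the array. */
static int demo_get_reg(const struct demo_regs *regs, int regno,
			int is_64bit, unsigned long *val)
{
	int off = demo_reg_offset(regno, is_64bit);

	if (off < 0)
		return off;
	*val = *(const unsigned long *)((const char *)regs + off);
	return 0;
}
```

      Returning an error code rather than a value is what lets the caller
      give up on a malformed instruction instead of using garbage.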
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-mips@linux-mips.org
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20141114151828.5BDD0915@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, mpx: Add MPX-specific mmap interface · 57319d80
      Qiaowei Ren authored
      We have chosen to perform the allocation of bounds tables in
      kernel (See the patch "on-demand kernel allocation of bounds
      tables") and to mark these VMAs with VM_MPX.
      
      However, there is currently no suitable interface to actually do
      this.  Existing interfaces, like do_mmap_pgoff(), have no way to
      set a modified ->vm_ops or ->vm_flags and don't hold mmap_sem
      long enough to let a caller do it.
      
      This patch wraps mmap_region() and holds mmap_sem long enough to
      make the modifications to the VMA which we need.
      
      Also note the 32/64-bit #ifdef in the header.  We actually need
      to do this at runtime eventually.  But, for now, we don't support
      running 32-bit binaries on 64-bit kernels.  Support for this will
      come in later patches.
      Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-mips@linux-mips.org
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20141114151827.CE440F67@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific · 4aae7e43
      Qiaowei Ren authored
      MPX-enabled applications using large swaths of memory can
      potentially have large numbers of bounds tables in process
      address space to save bounds information. These tables can take
      up huge swaths of memory (as much as 80% of the memory on the
      system) even if we clean them up aggressively. In the worst-case
      scenario, the tables can be 4x the size of the data structure
      being tracked. IOW, a 1-page structure can require 4 bounds-table
      pages.
      
      Being this huge, our expectation is that folks using MPX are
      going to be keen on figuring out how much memory is being
      dedicated to it. So we need a way to track memory use for MPX.
      
      If we want to specifically track MPX VMAs we need to be able to
      distinguish them from normal VMAs, and keep them from getting
      merged with normal VMAs. A new VM_ flag set only on MPX VMAs does
      both of those things. With this flag, MPX bounds-table VMAs can
      be distinguished from other VMAs, and userspace can also walk
      /proc/$pid/smaps to get memory usage for MPX.
      
      In addition to this flag, we also introduce a special ->vm_ops
      specific to MPX VMAs (see the patch "add MPX specific mmap
      interface"), but currently different ->vm_ops do not by
      themselves prevent VMA merging, so we still need this flag.
      
      We understand that VM_ flags are scarce and are open to other
      options.
      Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-mips@linux-mips.org
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20141114151825.565625B3@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, mpx: Add MPX to disabled features · 95290cf1
      Dave Hansen authored
      This allows us to use cpu_feature_enabled(X86_FEATURE_MPX) as
      both a runtime and compile-time check.
      
      When CONFIG_X86_INTEL_MPX is disabled,
      cpu_feature_enabled(X86_FEATURE_MPX) will evaluate at
      compile-time to 0. If CONFIG_X86_INTEL_MPX=y, then the cpuid
      flag will be checked at runtime.
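      
      The compile-time/runtime split can be modelled in standalone C.
      This sketch only imitates the idea; DEMO_CONFIG_MPX,
      DEMO_FEATURE_MPX, DEMO_DISABLED_MASK and the demo_*() helpers are
      hypothetical names, not the kernel's macros.

```c
/* Hypothetical feature bit number, for illustration only. */
#define DEMO_FEATURE_MPX 14

/* With the config option off, the feature's bit lands in the disabled mask. */
#ifdef DEMO_CONFIG_MPX
# define DEMO_DISABLE_MPX 0u
#else
# define DEMO_DISABLE_MPX (1u << DEMO_FEATURE_MPX)
#endif

#define DEMO_DISABLED_MASK (DEMO_DISABLE_MPX)

/* Stand-in for the capability word filled in from cpuid at boot. */
static unsigned int demo_cpu_caps;

static inline int demo_cpu_has(unsigned int bit)
{
	return (demo_cpu_caps >> bit) & 1u;
}

/*
 * For a constant 'bit' in the disabled mask the condition is a constant
 * expression, so the compiler can fold the whole check to 0 and drop
 * the guarded code.  Otherwise the runtime capability word is consulted.
 */
#define demo_feature_enabled(bit) \
	((DEMO_DISABLED_MASK & (1u << (bit))) ? 0 : demo_cpu_has(bit))
```

      Built without DEMO_CONFIG_MPX, demo_feature_enabled(DEMO_FEATURE_MPX)
      is 0 no matter what the capability word says.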
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-mips@linux-mips.org
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20141114151823.B358EAD2@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • ia64: Sync struct siginfo with general version · 53f037b0
      Qiaowei Ren authored
      New fields about bound violation are added into general struct
      siginfo. This will impact MIPS and IA64, which extend general
      struct siginfo. This patch syncs this struct for IA64 with
      general version.
      Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-mips@linux-mips.org
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20141114151822.82B3B486@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • mips: Sync struct siginfo with general version · 232b5fff
      Qiaowei Ren authored
      New fields about bound violation are added into general struct
      siginfo. This will impact MIPS and IA64, which extend general
      struct siginfo. This patch syncs this struct for MIPS with
      general version.
      Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-mips@linux-mips.org
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20141114151820.F7EDC3CC@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • mpx: Extend siginfo structure to include bound violation information · ee1b58d3
      Qiaowei Ren authored
      This patch adds new bound violation fields to the siginfo
      structure. si_lower and si_upper hold the lower bound and the
      upper bound, respectively, when a bound violation occurs.
      Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-mips@linux-mips.org
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20141114151819.1908C900@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, mpx: Rename cfg_reg_u and status_reg · 62e7759b
      Dave Hansen authored
      According to Intel SDM extension, MPX configuration and status registers
      should be BNDCFGU and BNDSTATUS. This patch renames cfg_reg_u and
      status_reg to bndcfgu and bndstatus.
      
      [ tglx: Renamed 'struct bndscr_struct' to 'struct bndscr' ]
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-mips@linux-mips.org
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Link: http://lkml.kernel.org/r/20141114151817.031762AC@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: mpx: Give bndX registers actual names · c04e051c
      Dave Hansen authored
      Consider the bndX MPX registers.  There are 4 registers each
      containing a 64-bit lower and a 64-bit upper bound.  That's 8*64
      bits and we declare it thusly:
      
      	struct bndregs_struct {
      		u64 bndregs[8];
      	};
          
      Let's say you want to read the upper bound from the MPX register
      bnd2 out of the xsave buf.  You do:
      
      	bndregno = 2;
      	upper_bound = xsave_buf->bndregs.bndregs[2*bndregno+1];
      
      That kinda sucks.  Every time you access it, you need to know:
      1. Each bndX register is two entries wide in "bndregs"
      2. The lower comes first followed by upper.  We do the +1 to get
         upper vs. lower.
      
      This replaces the old definition.  You can now access them
      indexed by the register number directly, and with a meaningful
      name for the lower and upper bound:
      
      	bndregno = 2;
      	xsave_buf->bndreg[bndregno].upper_bound;
      
      It's now *VERY* clear that there are 4 registers.  The programmer
      now doesn't have to care what order the lower and upper bounds
      are in, and it's harder to get it wrong.
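      
      To see that both layouts describe the same bytes, here is a
      standalone sketch.  The demo_* type and function names are
      hypothetical and uint64_t stands in for the kernel's u64; only the
      field names lower_bound/upper_bound come from this patch.

```c
#include <stdint.h>

/* Old layout: a flat array, two entries per bndX register. */
struct demo_bndregs_old {
	uint64_t bndregs[8];
};

/* New layout: one named pair per bndX register. */
struct demo_bndreg {
	uint64_t lower_bound;
	uint64_t upper_bound;
};

struct demo_xsave_new {
	struct demo_bndreg bndreg[4];
};

/* Both views cover the same 8*64 bits. */
_Static_assert(sizeof(struct demo_bndregs_old) ==
	       sizeof(struct demo_xsave_new), "layouts must match in size");

/* Old access: the caller has to know about the 2x stride and the +1. */
static uint64_t demo_upper_old(const struct demo_bndregs_old *r, int regno)
{
	return r->bndregs[2 * regno + 1];
}

/* New access: index by register number, pick the bound by name. */
static uint64_t demo_upper_new(const struct demo_xsave_new *r, int regno)
{
	return r->bndreg[regno].upper_bound;
}
```

      Overlaying the two structs on the same memory, both accessors
      return the same value for every register number from 0 to 3.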
      
      [ tglx: Changed ub/lb to upper_bound/lower_bound and renamed struct
      bndreg_struct to struct bndreg ]
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: x86@kernel.org
      Cc: "H. Peter Anvin" <hpa@linux.intel.com>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: "Yu, Fenghua" <fenghua.yu@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20141031215820.5EA5E0EC@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: Remove arbitrary instruction size limit in instruction decoder · 6ba48ff4
      Dave Hansen authored
      The current x86 instruction decoder steps along through the
      instruction stream but always ensures that it never steps farther
      than the largest possible instruction size (MAX_INSN_SIZE).
      
      The MPX code is now going to be doing some decoding of userspace
      instructions.  We copy those from userspace in to the kernel and
      they're obviously completely untrusted coming from userspace.  In
      addition to the constraint that instructions can only be so long,
      we also have to be aware of how long the buffer is that came in
      from userspace.  This _looks_ to be similar to what perf and
      kprobes are doing, but it's unclear to me whether they are
      affected.
      
      The whole reason we need this is that it is perfectly valid to be
      executing an instruction within MAX_INSN_SIZE bytes of an
      unreadable page. We should be able to gracefully handle short
      reads in those cases.
      
      This adds support to the decoder to record how long the buffer
      being decoded is and to refuse to "validate" the instruction if
      we would have gone over the end of the buffer to decode it.
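      
      The idea of recording the buffer length and refusing to step past
      it can be sketched as follows.  This is a simplified model, not the
      decoder itself: 'struct demo_insn', demo_insn_init() and
      demo_get_next() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal decoder cursor: where we are, and where the buffer ends. */
struct demo_insn {
	const uint8_t *next_byte;
	const uint8_t *end;	/* one past the last valid byte */
};

static void demo_insn_init(struct demo_insn *insn, const uint8_t *buf,
			   size_t buf_len)
{
	insn->next_byte = buf;
	insn->end = buf + buf_len;
}

/*
 * Fetch the next n bytes (n <= 8) as a little-endian value.
 * Returns 0 on success, or -1 without advancing the cursor if the
 * fetch would run past the end of the buffer.
 */
static int demo_get_next(struct demo_insn *insn, size_t n, uint64_t *out)
{
	uint64_t v = 0;

	if (n > sizeof(v) || (size_t)(insn->end - insn->next_byte) < n)
		return -1;
	for (size_t i = 0; i < n; i++)
		v |= (uint64_t)insn->next_byte[i] << (8 * i);
	insn->next_byte += n;
	*out = v;
	return 0;
}
```

      A short read from userspace then simply means a smaller buf_len, and
      every fetch that would cross that limit fails cleanly.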
      
      The kprobes code probably needs to be looked at here a bit more
      carefully.  This patch still respects the MAX_INSN_SIZE limit
      there but the kprobes code does look like it might be able to
      be a bit more strict than it currently is.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Jim Keniston <jkenisto@us.ibm.com>
      Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: x86@kernel.org
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Link: http://lkml.kernel.org/r/20141114153957.E6B01535@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  2. 09 Nov, 2014 10 commits
  3. 08 Nov, 2014 4 commits
    • Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · fe606dff
      Linus Torvalds authored
      Pull i2c bugfixes from Wolfram Sang:
       "One bigger cleanup (FSF address removal) and two bugfixes for I2C"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: core: Dispose OF IRQ mapping at client removal time
        i2c: at91: don't account as iowait
        i2c: remove FSF address
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a50d7156
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "Two fixlets for the armada SoC interrupt controller"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip: armada-370-xp: Fix MPIC interrupt handling
        irqchip: armada-370-xp: Fix MSI interrupt handling
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · ae04e1ca
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "For:
         - some regression fixes at the Remote Controller core and imon driver
         - a build fix for certain randconfigs with ir-hix5hd2
         - don't feed power to satellite system at ds3000 driver init
      
        It also contains some fixes for drivers added for Kernel 3.18:
         - some fixes at the new ISDB-S driver, and the corresponding bits to
           fix some descriptors for this Japanese TV standard at the DVB core
         - two warning cleanups for sp2 driver if PM is disabled
         - change the default mode for the new vivid driver"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        [media] sp2: sp2_init() can be static
        [media] dvb:tc90522: fix always-false expression
        [media] dvb-core: set default properties of ISDB-S
        [media] dvb:tc90522: fix stats report
        [media] vivid: default to single planar device instances
        [media] imon: fix other RC type protocol support
        [media] ir-hix5hd2 fix build warning
        [media] ds3000: fix LNB supply voltage on Tevii S480 on initialization
        [media] rc5-decoder: BZ#85721: Fix RC5-SZ decoding
        [media] rc-core: fix protocol_change regression in ir_raw_event_register
    • Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 6ac94d3a
      Linus Torvalds authored
      Pull MIPS updates from Ralf Baechle:
       "This week's round of MIPS bug fixes for 3.18:
      
         - wire up the bpf syscall
         - fix TLB dump output for R3000 class TLBs
         - fix strnlen_user return value if no NUL character was found.
         - fix build with binutils 2.24.51+.  While there is no binutils 2.25
           release yet, toolchains derived from binutils 2.24.51+ are already
           in common use.
         - the Octeon GPIO code forgot to offline GPIO IRQs.
         - fix build error for XLP.
         - fix possible BUG assertion with EVA for CMA"
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        MIPS: Fix build with binutils 2.24.51+
        MIPS: R3000: Fix debug output for Virtual page number
        MIPS: Fix strnlen_user() return value in case of overlong strings.
        MIPS: CMA: Do not reserve memory if not required
        MIPS: Wire up bpf syscall.
        MIPS/Xlp: Remove the dead function destroy_irq() to fix build error
        MIPS: Octeon: Make Octeon GPIO IRQ chip CPU hotplug-aware
  4. 07 Nov, 2014 11 commits
    • Merge tag 'xfs-for-linus-3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs · 661b99e9
      Linus Torvalds authored
      Pull xfs fixes from Dave Chinner:
       "This update fixes a warning in the new pagecache_isize_extended() and
        updates some related comments, another fix for zero-range
        misbehaviour, and an unfortunately large set of fixes for regressions
        in the bulkstat code.
      
        The bulkstat fixes are large but necessary.  I wouldn't normally push
        such a rework for a -rcX update, but right now xfsdump can silently
        create incomplete dumps on 3.17 and it's possible that even xfsrestore
        won't notice that the dumps were incomplete.  Hence we need to get
        this update into 3.17-stable kernels ASAP.
      
        In more detail, the refactoring work I committed in 3.17 has exposed a
        major hole in our QA coverage.  With both xfsdump (the major user of
        bulkstat) and xfsrestore silently ignoring missing files in the
        dump/restore process, incomplete dumps were going unnoticed if they
        were being triggered.  Many of the dump/restore filesets were so small
        that they didn't even have a chance of triggering the loop iteration
        bugs we introduced in 3.17, so we didn't exercise the code
        sufficiently, either.
      
        We have already taken steps to improve QA coverage in xfstests to
        avoid this happening again, and I've done a lot of manual verification
        of dump/restore on very large data sets (tens of millions of inodes)
        over the past week to verify this patch set results in bulkstat behaving
        the same way as it does on 3.16.
      
        Unfortunately, the fixes are not exactly simple - in tracking down the
        problem historic API warts were discovered (e.g. xfsdump has been
        working around a 20 year old bug in the bulkstat API for the past 10
        years) and so that complicated the process of diagnosing and fixing
        the problems.  i.e. we had to fix bugs in the code as well as
        discover and re-introduce the userspace visible API bugs that we
        unwittingly "fixed" in 3.17 that xfsdump relied on to work correctly.
      
        Summary:
      
         - incorrect warnings about i_mutex locking in pagecache_isize_extended()
           and updates comments to match expected locking
         - another zero-range bug fix for stray file size updates
         - a bunch of fixes for regression in the bulkstat code introduced in
           3.17"
      
      * tag 'xfs-for-linus-3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
        xfs: track bulkstat progress by agino
        xfs: bulkstat error handling is broken
        xfs: bulkstat main loop logic is a mess
        xfs: bulkstat chunk-formatter has issues
        xfs: bulkstat chunk formatting cursor is broken
        xfs: bulkstat btree walk doesn't terminate
        mm: Fix comment before truncate_setsize()
        xfs: rework zero range to prevent invalid i_size updates
        mm: Remove false WARN_ON from pagecache_isize_extended()
        xfs: Check error during inode btree iteration in xfs_bulkstat()
        xfs: bulkstat doesn't release AGI buffer on error
    • Merge tag 'regulator-v3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator · 51f83ef0
      Linus Torvalds authored
      Pull regulator fixes from Mark Brown:
       "More changes than I'd like here, most of them for a single bug
        repeated in a bunch of drivers with data not being initialized
        correctly, plus a fix to lower the severity of a warning introduced in
        the last merge window which can legitimately go off so we don't want
        to alarm users excessively"
      
      * tag 'regulator-v3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: s2mpa01: zero-initialize regulator match table array
        regulator: max8660: zero-initialize regulator match table array
        regulator: max77802: zero-initialize regulator match table
        regulator: max77686: zero-initialize regulator match table
        regulator: max1586: zero-initialize regulator match table array
        regulator: max77693: Fix use of uninitialized regulator config
        regulator: of: Lower the severity of the error with no container
    • Merge tag 'spi-v3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 1395b9cf
      Linus Torvalds authored
      Pull spi bugfixes from Mark Brown:
       "A couple of small driver fixes for v3.18, both quite problematic if
        you hit a use case that's affected"
      
      * tag 'spi-v3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: pxa2xx: toggle clocks on suspend if not disabled by runtime PM
        spi: fsl-dspi: Fix CTAR selection
    • tiny: rename ENABLE_DEV_COREDUMP to ALLOW_DEV_COREDUMP · cd3d9ea1
      Johannes Berg authored
      The ENABLE_DEV_COREDUMP option is misleading, as it implies that
      it enables the framework. This isn't true: it only allows the
      framework to be enabled if a driver needs it.
      
      Rename it to ALLOW_DEV_COREDUMP to better capture its semantics.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tiny: reverse logic for DISABLE_DEV_COREDUMP · 9c602699
      Aristeu Rozanski authored
      It's desirable for allnoconfig and tinyconfig targets to result in the
      least amount of code possible. DISABLE_DEV_COREDUMP exists as a way to
      switch off DEV_COREDUMP regardless of whether any drivers select
      WANT_DEV_COREDUMP.
      
      This patch renames the option to ENABLE_DEV_COREDUMP; setting it to
      'n' (as in allnoconfig or tinyconfig) will effectively disable device
      coredump.
      
      Cc: Josh Triplett <josh@joshtriplett.org>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
      Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
      Acked-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • i2c: core: Dispose OF IRQ mapping at client removal time · e4df3a0b
      Laurent Pinchart authored
      Clients instantiated from OF get an IRQ mapping created at device
      registration time. Dispose the mapping when the client is removed.
      Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Cc: stable@kernel.org
    • i2c: at91: don't account as iowait · 11cfbfb0
      Wolfram Sang authored
      iowait is for blkio [1]. I2C shouldn't use it.
      
      [1] https://lkml.org/lkml/2014/11/3/317
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: stable@kernel.org
    • i2c: remove FSF address · ca1f8da9
      Wolfram Sang authored
      We have a central copy of the GPL for that. Some addresses were already
      outdated.
      Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
    • Mark Knibbs's avatar
      USB: Update default usb-storage delay_use value in kernel-parameters.txt · 19101954
      Mark Knibbs authored
      Back in 2010 the default usb-storage delay_use time was reduced from 5 to 1
      second (commit a4a47bc0), but kernel-parameters.txt wasn't updated to
      reflect that.
      Signed-off-by: Mark Knibbs <markk@clara.co.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      19101954
    • Yijing Wang's avatar
      sysfs: driver core: Fix glue dir race condition by gdp_mutex · e4a60d13
      Yijing Wang authored
      There is a race condition when removing a glue directory.
      It can be reproduced with the following test:
      
      path 1: Add first child device
      device_add()
          get_device_parent()
                  /*find parent from glue_dirs.list*/
                  list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
                          if (k->parent == parent_kobj) {
                                  kobj = kobject_get(k);
                                  break;
                          }
                  ....
                  class_dir_create_and_add()
      
      path 2: Remove last child device under glue dir
      device_del()
          cleanup_device_parent()
                  cleanup_glue_dir()
                          kobject_put(glue_dir);
      
      If path 2 has entered cleanup_glue_dir() but has not yet
      called kobject_put(glue_dir), the glue dir is still
      in the parent's kset list. Meanwhile, path 1 finds the glue
      dir in glue_dirs.list. Path 2 may then release the glue dir
      before path 1 calls kobject_get(), so the kernel reports
      a warning and then hits a BUG_ON.
      
      This is a "classic" problem we have of a kref in a list
      that can be found while the last instance could be removed
      at the same time.
      
      This patch reuses gdp_mutex to fix this race condition.
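
The locking rule described above can be illustrated with a small userspace model. This is only a sketch, not the driver core code: `struct glue`, `glue_add()`, `glue_find_get()`, `glue_put()` and `list_lock` are hypothetical names, and a pthread mutex stands in for gdp_mutex. The point it shows is that the lookup that takes a reference and the put that may drop the last reference are serialized by the same lock, so a lookup can never find an object whose count has already reached zero.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a refcounted object kept on a list. */
struct glue {
	int refcount;        /* protected by list_lock in this model */
	const void *parent;  /* lookup key */
	struct glue *next;   /* singly linked list of live objects */
};

/* Plays the role of gdp_mutex in this model. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct glue *glue_list;

/* Put an object on the list holding one initial reference. */
static void glue_add(struct glue *g, const void *parent)
{
	pthread_mutex_lock(&list_lock);
	g->refcount = 1;
	g->parent = parent;
	g->next = glue_list;
	glue_list = g;
	pthread_mutex_unlock(&list_lock);
}

/* Find an object by parent and take a reference, all under the lock. */
static struct glue *glue_find_get(const void *parent)
{
	struct glue *g;

	pthread_mutex_lock(&list_lock);
	for (g = glue_list; g != NULL; g = g->next) {
		if (g->parent == parent) {
			g->refcount++;
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);
	return g;
}

/* Drop a reference; the final put unlinks the object under the same lock. */
static void glue_put(struct glue *g)
{
	struct glue **pp;

	pthread_mutex_lock(&list_lock);
	assert(g->refcount > 0);
	if (--g->refcount == 0) {
		for (pp = &glue_list; *pp != NULL; pp = &(*pp)->next) {
			if (*pp == g) {
				*pp = g->next;
				break;
			}
		}
	}
	pthread_mutex_unlock(&list_lock);
}
```

In this model a caller that gets a non-NULL pointer from `glue_find_get()` always holds a valid reference, because the final `glue_put()` cannot run between the list walk and the increment.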
      
      The following calltrace is captured in kernel 3.4, but
      the latest kernel still has this bug.
      
      -----------------------------------------------------
      <4>[ 3965.441471] WARNING: at ...include/linux/kref.h:41 kobject_get+0x33/0x40()
      <4>[ 3965.441474] Hardware name: Romley
      <4>[ 3965.441475] Modules linked in: isd_iop(O) isd_xda(O)...
      ...
      <4>[ 3965.441605] Call Trace:
      <4>[ 3965.441611]  [<ffffffff8103717a>] warn_slowpath_common+0x7a/0xb0
      <4>[ 3965.441615]  [<ffffffff810371c5>] warn_slowpath_null+0x15/0x20
      <4>[ 3965.441618]  [<ffffffff81215963>] kobject_get+0x33/0x40
      <4>[ 3965.441624]  [<ffffffff812d1e45>] get_device_parent.isra.11+0x135/0x1f0
      <4>[ 3965.441627]  [<ffffffff812d22d4>] device_add+0xd4/0x6d0
      <4>[ 3965.441631]  [<ffffffff812d0dbc>] ? dev_set_name+0x3c/0x40
      ....
      <2>[ 3965.441912] kernel BUG at ..../fs/sysfs/group.c:65!
      <4>[ 3965.441915] invalid opcode: 0000 [#1] SMP
      ...
      <4>[ 3965.686743]  [<ffffffff811a677e>] sysfs_create_group+0xe/0x10
      <4>[ 3965.686748]  [<ffffffff810cfb04>] blk_trace_init_sysfs+0x14/0x20
      <4>[ 3965.686753]  [<ffffffff811fcabb>] blk_register_queue+0x3b/0x120
      <4>[ 3965.686756]  [<ffffffff812030bc>] add_disk+0x1cc/0x490
      ....
      -------------------------------------------------------
      Signed-off-by: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com>
      Cc: <stable@vger.kernel.org> #3.4+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4a60d13
    • Manuel Lauss's avatar
      MIPS: Fix build with binutils 2.24.51+ · 842dfc11
      Manuel Lauss authored
      Starting with version 2.24.51.20140728 MIPS binutils complain loudly
      about mixing soft-float and hard-float object files, leading to this
      build failure since GCC is invoked with "-msoft-float" on MIPS:
      
      {standard input}: Warning: .gnu_attribute 4,3 requires `softfloat'
        LD      arch/mips/alchemy/common/built-in.o
      mipsel-softfloat-linux-gnu-ld: Warning: arch/mips/alchemy/common/built-in.o
       uses -msoft-float (set by arch/mips/alchemy/common/prom.o),
       arch/mips/alchemy/common/sleeper.o uses -mhard-float
      
      To fix this, we detect if GAS is new enough to support the "-msoft-float"
      command-line option, and if it is, we can let GCC pass it to GAS; but then
      we also need to sprinkle the files which make use of floating point
      registers with the necessary ".set hardfloat" directives.
      Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
      Cc: Linux-MIPS <linux-mips@linux-mips.org>
      Cc: Matthew Fortune <Matthew.Fortune@imgtec.com>
      Cc: Markos Chandras <Markos.Chandras@imgtec.com>
      Cc: Maciej W. Rozycki <macro@linux-mips.org>
      Patchwork: https://patchwork.linux-mips.org/patch/8355/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      842dfc11
  5. 06 Nov, 2014 4 commits
    • Dave Chinner's avatar
      xfs: track bulkstat progress by agino · 00275899
      Dave Chinner authored
      The bulkstat main loop progress is tracked by the "lastino"
      variable, which is a full 64-bit inode number. However, the loop actually
      works on agno/agino pairs, and so there's a significant disconnect
      between the rest of the loop and the main cursor. Convert this to
      use the agino, and pass the agino into the chunk formatting function
      and convert it too.
      
      This gets rid of the inconsistency in the loop processing, and
      finally makes it simple for us to skip inodes at any point in the
      loop simply by incrementing the agino cursor.
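
As a rough illustration of the agno/agino split, the sketch below models a 64-bit inode number as an allocation group number in the high bits and an AG-relative inode number in the low bits. It is not the XFS code: `struct ag_cursor`, the helper names and the fixed `AGINO_BITS` width are assumptions made for the example (the real width comes from the filesystem geometry). It shows why a cursor kept as an agno/agino pair can skip inodes by simply advancing agino.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed width of the AG-relative part; real code derives it from the
 * superblock geometry rather than using a constant. */
#define AGINO_BITS 32

/* Hypothetical cursor: which AG we are in, and where inside that AG. */
struct ag_cursor {
	uint32_t agno;
	uint32_t agino;
};

/* Split a full inode number into its AG number and AG-relative part. */
static struct ag_cursor ino_to_cursor(uint64_t ino)
{
	struct ag_cursor c;

	c.agno = (uint32_t)(ino >> AGINO_BITS);
	c.agino = (uint32_t)(ino & ((UINT64_C(1) << AGINO_BITS) - 1));
	return c;
}

/* Rebuild the full inode number from the pair. */
static uint64_t cursor_to_ino(struct ag_cursor c)
{
	return ((uint64_t)c.agno << AGINO_BITS) | c.agino;
}

/* Skip n inodes within the current AG by advancing the agino cursor. */
static void cursor_skip(struct ag_cursor *c, uint32_t n)
{
	assert(c->agino <= UINT32_MAX - n); /* no wrap in this model */
	c->agino += n;
}
```

With the cursor held in this form, the full inode number only needs to be rebuilt at the point where it is handed back to the caller.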
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      00275899
    • Dave Chinner's avatar
      xfs: bulkstat error handling is broken · febe3cbe
      Dave Chinner authored
      The error propagation is a horror - xfs_bulkstat() returns
      a rval variable which is only set if there are formatter errors. Any
      sort of btree walk error or corruption will cause the bulkstat walk
      to terminate but will not pass an error back to userspace. Worse
      is the fact that formatter errors will also be ignored if any inodes
      were correctly formatted into the user buffer.
      
      Hence bulkstat can fail badly yet still report success to userspace.
      This causes significant issues with xfsdump not dumping everything
      in the filesystem yet reporting success. It's not until a restore
      fails that there is any indication that the dump was bad and that
      bulkstat failed. This patch now triggers xfsdump to fail with
      bulkstat errors rather than silently missing files in the dump.
      
      This now causes bulkstat to fail when the lastino cookie does not
      fall inside an existing inode chunk. The pre-3.17 code tolerated
      that error by allowing the code to move to the next inode chunk
      as the agino target is guaranteed to fall into the next btree
      record.
      
      With the fixes up to this point in the series, xfsdump now passes on
      the troublesome filesystem image that exposes all these bugs.
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      febe3cbe
    • Dave Chinner's avatar
      xfs: bulkstat main loop logic is a mess · 6e57c542
      Dave Chinner authored
      There are a bunch of variables that are more widely scoped than they
      need to be, obfuscated user buffer checks and tortured "next inode"
      tracking. This all needs cleaning up to expose the real issues that
      need fixing.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      6e57c542
    • Dave Chinner's avatar
      xfs: bulkstat chunk-formatter has issues · 2b831ac6
      Dave Chinner authored
      The loop construct has issues:
      	- clustidx is completely unused, so remove it.
      	- the loop tries to be smart by terminating when the
      	  "freecount" tells it that all inodes are free. Just drop
      	  it as in most cases we have to scan all inodes in the
      	  chunk anyway.
      	- move the "user buffer left" condition check to the only
      	  point where we consume space in the user buffer.
      	- move the initialisation of agino out of the loop, leaving
      	  just a simple loop control logic using the clusteridx.
      
      Also, double handling of the user buffer variables leads to problems
      tracking the current state - use the cursor variables directly
      rather than keeping local copies and then having to update the
      cursor before returning.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      2b831ac6