1. 31 Jan, 2017 11 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Use ASDR for HPT guests on POWER9 · ef8c640c
      Paul Mackerras authored
      POWER9 adds a register called ASDR (Access Segment Descriptor
      Register), which is set by hypervisor data/instruction storage
      interrupts to contain the segment descriptor for the address
      being accessed, assuming the guest is using HPT translation.
      (For radix guests, it contains the guest real address of the
      access.)
      
      Thus, for HPT guests on POWER9, we can use this register rather
      than looking up the SLB with the slbfee. instruction.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      ef8c640c
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9 · 468808bd
      Paul Mackerras authored
      This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl
      for HPT guests on POWER9.  With this, we can return 1 for the
      KVM_CAP_PPC_MMU_HASH_V3 capability.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      468808bd
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU · c9270132
      Paul Mackerras authored
      This adds two capabilities and two ioctls to allow userspace to
      find out about and configure the POWER9 MMU in a guest.  The two
      capabilities tell userspace whether KVM can support a guest using
      the radix MMU, or using the hashed page table (HPT) MMU with a
      process table and segment tables.  (Note that the MMUs in the
      POWER9 processor cores do not use the process and segment tables
      when in HPT mode, but the nest MMU does).
      
      The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
      whether a guest will use the radix MMU or the HPT MMU, and to
      specify the size and location (in guest space) of the process
      table.
      
      The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
      the radix MMU.  It returns a list of supported radix tree geometries
      (base page size and number of bits indexed at each level of the
      radix tree) and the encoding used to specify the various page
      sizes for the TLB invalidate entry instruction.
      
      Initially, both capabilities return 0 and the ioctls return -EINVAL,
      until the necessary infrastructure for them to operate correctly
      is added.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      c9270132
    • Paul Mackerras's avatar
      powerpc/64: Allow for relocation-on interrupts from guest to host · bc355125
      Paul Mackerras authored
      With host and guest both using radix translation, it is feasible
      for the host to take interrupts that come from the guest with
      relocation on, and that is in fact what the POWER9 hardware will
      do when LPCR[AIL] = 3.  All such interrupts use HSRR0/1 not SRR0/1
      except for system call with LEV=1 (hcall).
      
      Therefore this adds the KVM tests to the _HV variants of the
      relocation-on interrupt handlers, and adds the KVM test to the
      relocation-on system call entry point.
      
      We also instantiate the relocation-on versions of the hypervisor
      data storage and instruction interrupt handlers, since these can
      occur with relocation on in radix guests.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      bc355125
    • Paul Mackerras's avatar
      powerpc/64: Make type of partition table flush depend on partition type · 16ed1416
      Paul Mackerras authored
      When changing a partition table entry on POWER9, we do a particular
      form of the tlbie instruction which flushes all TLBs and caches of
      the partition table for a given logical partition ID (LPID).
      This instruction has a field in the instruction word, labelled R
      (radix), which should be 1 if the partition was previously a radix
      partition and 0 if it was a HPT partition.  This implements that
      logic.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      16ed1416
    • Paul Mackerras's avatar
      powerpc/64: Export pgtable_cache and pgtable_cache_add for KVM · ba9b399a
      Paul Mackerras authored
      This exports the pgtable_cache array and the pgtable_cache_add
      function so that HV KVM can use them for allocating radix page
      tables for guests.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      ba9b399a
    • Paul Mackerras's avatar
      powerpc/64: More definitions for POWER9 · dbcbfee0
      Paul Mackerras authored
      This adds definitions for bits in the DSISR register which are used
      by POWER9 for various translation-related exception conditions, and
      for some more bits in the partition table entry that will be needed
      by KVM.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      dbcbfee0
    • Paul Mackerras's avatar
      powerpc/64: Enable use of radix MMU under hypervisor on POWER9 · cc3d2940
      Paul Mackerras authored
      To use radix as a guest, we first need to tell the hypervisor via
      the ibm,client-architecture call first that we support POWER9 and
      architecture v3.00, and that we can do either radix or hash and
      that we would like to choose later using an hcall (the
      H_REGISTER_PROC_TBL hcall).
      
      Then we need to check whether the hypervisor agreed to us using
      radix.  We need to do this very early on in the kernel boot process
      before any of the MMU initialization is done.  If the hypervisor
      doesn't agree, we can't use radix and therefore clear the radix
      MMU feature bit.
      
      Later, when we have set up our process table, which points to the
      radix tree for each process, we need to install that using the
      H_REGISTER_PROC_TBL hcall.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      cc3d2940
    • Paul Mackerras's avatar
      powerpc/pseries: Fixes for the "ibm,architecture-vec-5" options · 3f4ab2f8
      Paul Mackerras authored
      This fixes the byte index values for some of the option bits in
      the "ibm,architectur-vec-5" property. The "platform facilities options"
      bits are in byte 17 not byte 14, so the upper 8 bits of their
      definitions need to be 0x11 not 0x0E. The "sub processor support" option
      is in byte 21 not byte 15.
      
      Note none of these options are actually looked up in
      "ibm,architecture-vec-5" at this time, so there is no bug.
      
      When checking whether option bits are set, we should check that
      the offset of the byte being checked is less than the vector
      length that we got from the hypervisor.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      3f4ab2f8
    • Paul Mackerras's avatar
      powerpc/64: Don't try to use radix MMU under a hypervisor · 18569c1f
      Paul Mackerras authored
      Currently, if the kernel is running on a POWER9 processor under a
      hypervisor, it will try to use the radix MMU even though it doesn't have
      the necessary code to use radix under a hypervisor (it doesn't negotiate
      use of radix, and it doesn't do the H_REGISTER_PROC_TBL hcall). The
      result is that the guest kernel will crash when it tries to turn on the
      MMU.
      
      This fixes it by looking for the /chosen/ibm,architecture-vec-5
      property, and if it exists, clears the radix MMU feature bit, before we
      decide whether to initialize for radix or HPT. This property is created
      by the hypervisor as a result of the guest calling the
      ibm,client-architecture-support method to indicate its capabilities, so
      it will indicate whether the hypervisor agreed to us using radix.
      
      Systems without a hypervisor may have this property also (for example,
      skiboot creates it), so we check the HV bit in the MSR to see whether we
      are running as a guest or not. If we are in hypervisor mode, then we can
      do whatever we like including using the radix MMU.
      
      The reason for using this property is that in future, when we have
      support for using radix under a hypervisor, we will need to check this
      property to see whether the hypervisor agreed to us using radix.
      
      Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      18569c1f
    • Nicholas Piggin's avatar
      KVM: PPC: Book3S: 64-bit CONFIG_RELOCATABLE support for interrupts · a97a65d5
      Nicholas Piggin authored
      64-bit Book3S exception handlers must find the dynamic kernel base
      to add to the target address when branching beyond __end_interrupts,
      in order to support kernel running at non-0 physical address.
      
      Support this in KVM by branching with CTR, similarly to regular
      interrupt handlers. The guest CTR saved in HSTATE_SCRATCH1 and
      restored after the branch.
      
      Without this, the host kernel hangs and crashes randomly when it is
      running at a non-0 address and a KVM guest is started.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Acked-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      a97a65d5
  2. 27 Jan, 2017 2 commits
  3. 16 Jan, 2017 2 commits
    • Linus Torvalds's avatar
      Linux 4.10-rc4 · 49def185
      Linus Torvalds authored
      49def185
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · 99421c1c
      Linus Torvalds authored
      Pull namespace fixes from Eric Biederman:
       "This tree contains 4 fixes.
      
        The first is a fix for a race that can causes oopses under the right
        circumstances, and that someone just recently encountered.
      
        Past that are several small trivial correct fixes. A real issue that
        was blocking development of an out of tree driver, but does not appear
        to have caused any actual problems for in-tree code. A potential
        deadlock that was reported by lockdep. And a deadlock people have
        experienced and took the time to track down caused by a cleanup that
        removed the code to drop a reference count"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        sysctl: Drop reference added by grab_header in proc_sys_readdir
        pid: fix lockdep deadlock warning due to ucount_lock
        libfs: Modify mount_pseudo_xattr to be clear it is not a userspace mount
        mnt: Protect the mountpoint hashtable with mount_lock
      99421c1c
  4. 15 Jan, 2017 14 commits
    • Linus Torvalds's avatar
      Merge tag 'char-misc-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · c9281627
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are some small char/misc driver fixes for 4.10-rc4 that resolve
        some reported issues.
      
        The MEI driver issue resolves a lot of problems that people have been
        having, as does the mem driver fix. The other minor fixes resolve
        other reported issues.
      
        All of these have been in linux-next for a while"
      
      * tag 'char-misc-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        vme: Fix wrong pointer utilization in ca91cx42_slave_get
        auxdisplay: fix new ht16k33 build errors
        ppdev: don't print a free'd string
        extcon: return error code on failure
        drivers: char: mem: Fix thinkos in kmem address checks
        mei: bus: enable OS version only for SPT and newer
      c9281627
    • Linus Torvalds's avatar
      Merge tag 'driver-core-4.10-rc4' of... · 2d5a7101
      Linus Torvalds authored
      Merge tag 'driver-core-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core fix from Greg KH:
       "Here is a single patch being reverted to remove a feature that was
        added in 4.10-rc1 that isn't quite ready for release.
      
        It will be redone as a debugfs file instead of a sysfs file in the
        future"
      
      * tag 'driver-core-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        Revert "driver core: Add deferred_probe attribute to devices in sysfs"
      2d5a7101
    • Linus Torvalds's avatar
      Merge tag 'tty-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 7f138b97
      Linus Torvalds authored
      Pull tty/serial fixes from Greg KH:
       "Here are some small tty/serial driver fixes for 4.10-rc4 to resolve a
        number of reported issues.
      
        Nothing major here at all, one revert of a problematic patch, and some
        other tiny bugfixes. Full details are in the shortlog below.
      
        All have been in linux-next with no reported issues"
      
      * tag 'tty-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        sysrq: attach sysrq handler correctly for 32-bit kernel
        Revert "tty: serial: 8250: add CON_CONSDEV to flags"
        Clearing FIFOs in RS485 emulation mode causes subsequent transmits to break
        8250_pci: Fix potential use-after-free in error path
        tty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done
        tty/serial: atmel_serial: BUG: stop DMA from transmitting in stop_tx
      7f138b97
    • Linus Torvalds's avatar
      Merge tag 'usb-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 793e039e
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are a few small USB driver fixes for 4.10-rc4 to resolve some
        reported issues.
      
        The "largest" here is a number of bugs being fixed in the ch341
        usb-serial driver, to hopefully resolve the mess of different devices
        floating around that use this driver that have been having problems
        with the 4.10-rc1 release.
      
        There's also a tiny musb fix that I missed in the last pull request,
        as well as the traditional xhci fix rounding out the batch.
      
        All have been in linux-next with no reported issues"
      
      * tag 'usb-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        xhci: fix deadlock at host remove by running watchdog correctly
        USB: serial: ch341: fix control-message error handling
        usb: musb: fix runtime PM in debugfs
        wusbcore: Fix one more crypto-on-the-stack bug
        USB: serial: kl5kusb105: fix line-state error handling
        USB: serial: ch341: fix baud rate and line-control handling
        USB: serial: ch341: fix line settings after reset-resume
        USB: serial: ch341: fix resume after reset
        USB: serial: ch341: fix open error handling
        USB: serial: ch341: fix modem-control and B0 handling
        USB: serial: ch341: fix open and resume after B0
        USB: serial: ch341: fix initial modem-control state
      793e039e
    • Linus Torvalds's avatar
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · aa2797b3
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Bugfixes for I2C. Mostly core this time which is a bit unusual but
        nothing really scary in there"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: piix4: Avoid race conditions with IMC
        i2c: fix spelling mistake: "insufficent" -> "insufficient"
        i2c: print correct device invalid address
        i2c: do not enable fall back to Host Notify by default
        i2c: fix kernel memory disclosure in dev interface
      aa2797b3
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 83346fbc
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Misc fixes:
      
         - unwinder fixes
         - AMD CPU topology enumeration fixes
         - microcode loader fixes
         - x86 embedded platform fixes
         - fix for a bootup crash that may trigger when clearcpuid= is used
           with invalid values"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mpx: Use compatible types in comparison to fix sparse error
        x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc()
        x86/entry: Fix the end of the stack for newly forked tasks
        x86/unwind: Include __schedule() in stack traces
        x86/unwind: Disable KASAN checks for non-current tasks
        x86/unwind: Silence warnings for non-current tasks
        x86/microcode/intel: Use correct buffer size for saving microcode data
        x86/microcode/intel: Fix allocation size of struct ucode_patch
        x86/microcode/intel: Add a helper which gives the microcode revision
        x86/microcode: Use native CPUID to tickle out microcode revision
        x86/CPU: Add native CPUID variants returning a single datum
        x86/boot: Add missing declaration of string functions
        x86/CPU/AMD: Fix Bulldozer topology
        x86/platform/intel-mid: Rename 'spidev' to 'mrfld_spidev'
        x86/cpu: Fix typo in the comment for Anniedale
        x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option
      83346fbc
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a11ce3a4
      Linus Torvalds authored
      Pull NOHZ fix from Ingo Molnar:
       "This fixes an old NOHZ race where we incorrectly calculate the next
        timer interrupt in certain circumstances where hrtimers are pending,
        that can cause hard to reproduce stalled-values artifacts in
        /proc/stat"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        nohz: Fix collision between tick and other hrtimers
      a11ce3a4
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 79078c53
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Misc race fixes uncovered by fuzzing efforts, a Sparse fix, two PMU
        driver fixes, plus miscellanous tooling fixes"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86: Reject non sampling events with precise_ip
        perf/x86/intel: Account interrupts for PEBS errors
        perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
        perf/core: Fix sys_perf_event_open() vs. hotplug
        perf/x86/intel: Use ULL constant to prevent undefined shift behaviour
        perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code
        perf/x86: Set pmu->module in Intel PMU modules
        perf probe: Fix to probe on gcc generated symbols for offline kernel
        perf probe: Fix --funcs to show correct symbols for offline module
        perf symbols: Robustify reading of build-id from sysfs
        perf tools: Install tools/lib/traceevent plugins with install-bin
        tools lib traceevent: Fix prev/next_prio for deadline tasks
        perf record: Fix --switch-output documentation and comment
        perf record: Make __record_options static
        tools lib subcmd: Add OPT_STRING_OPTARG_SET option
        perf probe: Fix to get correct modname from elf header
        samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include
        samples/bpf sock_example: Avoid getting ethhdr from two includes
        perf sched timehist: Show total scheduling time
      79078c53
    • Linus Torvalds's avatar
      Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 255e6140
      Linus Torvalds authored
      Pull EFI fixes from Ingo Molnar:
       "A number of regression fixes:
      
         - Fix a boot hang on machines that have somewhat unusual memory map
           entries of phys_addr=0x0 num_pages=0, which broke due to a recent
           commit. This commit got cherry-picked from the v4.11 queue because
           the bug is affecting real machines.
      
         - Fix a boot hang also reported by KASAN, caused by incorrect init
           ordering introduced by a recent optimization.
      
         - Fix a recent robustification fix to allocate_new_fdt_and_exit_boot()
           that introduced an invalid assumption. Neither bugs were seen in
           the wild AFAIK"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/x86: Prune invalid memory map entries and fix boot regression
        x86/efi: Don't allocate memmap through memblock after mm_init()
        efi/libstub/arm*: Pass latest memory map to the kernel
      255e6140
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · f4d3935e
      Linus Torvalds authored
      Pull vfs fixes from Al Viro.
      
      The most notable fix here is probably the fix for a splice regression
      ("fix a fencepost error in pipe_advance()") noticed by Alan Wylie.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fix a fencepost error in pipe_advance()
        coredump: Ensure proper size of sparse core files
        aio: fix lock dep warning
        tmpfs: clear S_ISGID when setting posix ACLs
      f4d3935e
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 34241af7
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - the virtio_blk stack DMA corruption fix from Christoph, fixing and
         issue with VMAP stacks.
      
       - O_DIRECT blkbits calculation fix from Chandan.
      
       - discard regression fix from Christoph.
      
       - queue init error handling fixes for nbd and virtio_blk, from Omar and
         Jeff.
      
       - two small nvme fixes, from Christoph and Guilherme.
      
       - rename of blk_queue_zone_size and bdev_zone_size to _sectors instead,
         to more closely follow what we do in other places in the block layer.
         This interface is new for this series, so let's get the naming right
         before releasing a kernel with this feature. From Damien.
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        block: don't try to discard from __blkdev_issue_zeroout
        sd: remove __data_len hack for WRITE SAME
        nvme: use blk_rq_payload_bytes
        scsi: use blk_rq_payload_bytes
        block: add blk_rq_payload_bytes
        block: Rename blk_queue_zone_size and bdev_zone_size
        nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too
        nvme-rdma: fix nvme_rdma_queue_is_ready
        virtio_blk: fix panic in initialization error path
        nbd: blk_mq_init_queue returns an error code on failure, not NULL
        virtio_blk: avoid DMA to stack for the sense buffer
        do_direct_IO: Use inode->i_blkbits to compute block count to be cleaned
      34241af7
    • Al Viro's avatar
      fix a fencepost error in pipe_advance() · b9dc6f65
      Al Viro authored
      The logics in pipe_advance() used to release all buffers past the new
      position failed in cases when the number of buffers to release was equal
      to pipe->buffers.  If that happened, none of them had been released,
      leaving pipe full.  Worse, it was trivial to trigger and we end up with
      pipe full of uninitialized pages.  IOW, it's an infoleak.
      
      Cc: stable@vger.kernel.org # v4.9
      Reported-by: default avatar"Alan J. Wylie" <alan@wylie.me.uk>
      Tested-by: default avatar"Alan J. Wylie" <alan@wylie.me.uk>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      b9dc6f65
    • Dave Kleikamp's avatar
      coredump: Ensure proper size of sparse core files · 4d22c75d
      Dave Kleikamp authored
      If the last section of a core file ends with an unmapped or zero page,
      the size of the file does not correspond with the last dump_skip() call.
      gdb complains that the file is truncated and can be confusing to users.
      
      After all of the vma sections are written, make sure that the file size
      is no smaller than the current file position.
      
      This problem can be demonstrated with gdb's bigcore testcase on the
      sparc architecture.
      Signed-off-by: default avatarDave Kleikamp <dave.kleikamp@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      4d22c75d
    • Shaohua Li's avatar
      aio: fix lock dep warning · a12f1ae6
      Shaohua Li authored
      lockdep reports a warnning. file_start_write/file_end_write only
      acquire/release the lock for regular files. So checking the files in aio
      side too.
      
      [  453.532141] ------------[ cut here ]------------
      [  453.533011] WARNING: CPU: 1 PID: 1298 at ../kernel/locking/lockdep.c:3514 lock_release+0x434/0x670
      [  453.533011] DEBUG_LOCKS_WARN_ON(depth <= 0)
      [  453.533011] Modules linked in:
      [  453.533011] CPU: 1 PID: 1298 Comm: fio Not tainted 4.9.0+ #964
      [  453.533011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014
      [  453.533011]  ffff8803a24b7a70 ffffffff8196cffb ffff8803a24b7ae8 0000000000000000
      [  453.533011]  ffff8803a24b7ab8 ffffffff81091ee1 ffff8803a5dba700 00000dba00000008
      [  453.533011]  ffffed0074496f59 ffff8803a5dbaf54 ffff8803ae0f8488 fffffffffffffdef
      [  453.533011] Call Trace:
      [  453.533011]  [<ffffffff8196cffb>] dump_stack+0x67/0x9c
      [  453.533011]  [<ffffffff81091ee1>] __warn+0x111/0x130
      [  453.533011]  [<ffffffff81091f97>] warn_slowpath_fmt+0x97/0xb0
      [  453.533011]  [<ffffffff81091f00>] ? __warn+0x130/0x130
      [  453.533011]  [<ffffffff8191b789>] ? blk_finish_plug+0x29/0x60
      [  453.533011]  [<ffffffff811205d4>] lock_release+0x434/0x670
      [  453.533011]  [<ffffffff8198af94>] ? import_single_range+0xd4/0x110
      [  453.533011]  [<ffffffff81322195>] ? rw_verify_area+0x65/0x140
      [  453.533011]  [<ffffffff813aa696>] ? aio_write+0x1f6/0x280
      [  453.533011]  [<ffffffff813aa6c9>] aio_write+0x229/0x280
      [  453.533011]  [<ffffffff813aa4a0>] ? aio_complete+0x640/0x640
      [  453.533011]  [<ffffffff8111df20>] ? debug_check_no_locks_freed+0x1a0/0x1a0
      [  453.533011]  [<ffffffff8114793a>] ? debug_lockdep_rcu_enabled.part.2+0x1a/0x30
      [  453.533011]  [<ffffffff81147985>] ? debug_lockdep_rcu_enabled+0x35/0x40
      [  453.533011]  [<ffffffff812a92be>] ? __might_fault+0x7e/0xf0
      [  453.533011]  [<ffffffff813ac9bc>] do_io_submit+0x94c/0xb10
      [  453.533011]  [<ffffffff813ac2ae>] ? do_io_submit+0x23e/0xb10
      [  453.533011]  [<ffffffff813ac070>] ? SyS_io_destroy+0x270/0x270
      [  453.533011]  [<ffffffff8111d7b3>] ? mark_held_locks+0x23/0xc0
      [  453.533011]  [<ffffffff8100201a>] ? trace_hardirqs_on_thunk+0x1a/0x1c
      [  453.533011]  [<ffffffff813acb90>] SyS_io_submit+0x10/0x20
      [  453.533011]  [<ffffffff824f96aa>] entry_SYSCALL_64_fastpath+0x18/0xad
      [  453.533011]  [<ffffffff81119190>] ? trace_hardirqs_off_caller+0xc0/0x110
      [  453.533011] ---[ end trace b2fbe664d1cc0082 ]---
      
      Cc: Dmitry Monakhov <dmonakhov@openvz.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      a12f1ae6
  5. 14 Jan, 2017 11 commits
    • Linus Torvalds's avatar
      Merge tag 'dmaengine-fix-4.10-rc4' of git://git.infradead.org/users/vkoul/slave-dma · f0ad1771
      Linus Torvalds authored
      Pull dmaengine fixes from Vinod Koul:
       "The fixes this time around are spread over drivers, pretty normal
        update:
      
         - PCI ID for SKL ioatdma, workaround for SKX and
           ioat_alloc_chan_resources sleepy allocation fix
      
         - dw kconfig typo fix
      
         - null pointer deref for stm32
      
         - MAINTAINERS Update for at_hdmac
      
         - pl330 runtime pm fixes
      
         - omap-dma port window fix
      
         - rcar-dmac unmap slave resource fix"
      
      * tag 'dmaengine-fix-4.10-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: rcar-dmac: unmap slave resource when channel is freed
        dmaengine: omap-dma: Fix the port_window support
        dmaengine: iota: ioat_alloc_chan_resources should not perform sleeping allocations.
        dmaengine: pl330: Fix runtime PM support for terminated transfers
        MAINTAINERS: dmaengine: Update + Hand over the at_hdmac driver to Ludovic
        dmaengine: omap-dma: Fix dynamic lch_map allocation
        dmaengine: ti-dma-crossbar: Add some 'of_node_put()' in error path.
        dmaengine: stm32-dma: Fix null pointer dereference in stm32_dma_tx_status
        dmaengine: stm32-dma: Set correct args number for DMA request from DT
        dmaengine: dw: fix typo in Kconfig
        dmaengine: ioatdma: workaround SKX ioatdma version
        dmaengine: ioatdma: Add Skylake PCI Dev ID
      f0ad1771
    • Peter Jones's avatar
      efi/x86: Prune invalid memory map entries and fix boot regression · 0100a3e6
      Peter Jones authored
      Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW
      (2.28), include memory map entries with phys_addr=0x0 and num_pages=0.
      
      These machines fail to boot after the following commit,
      
        commit 8e80632f ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
      
      Fix this by removing such bogus entries from the memory map.
      
      Furthermore, currently the log output for this case (with efi=debug)
      looks like:
      
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |  |  |  |  |  ] range=[0x0000000000000000-0xffffffffffffffff] (0MB)
      
      This is clearly wrong, and also not as informative as it could be.  This
      patch changes it so that if we find obviously invalid memory map
      entries, we print an error and skip those entries.  It also detects the
      display of the address range calculation overflow, so the new output is:
      
       [    0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |   |  |  |  |  ] range=[0x0000000000000000-0x0000000000000000] (invalid)
      
      It also detects memory map sizes that would overflow the physical
      address, for example phys_addr=0xfffffffffffff000 and
      num_pages=0x0200000000000001, and prints:
      
       [    0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |   |  |  |  |  ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid)
      
      It then removes these entries from the memory map.
      Signed-off-by: default avatarPeter Jones <pjones@redhat.com>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      [ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT]
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      [Matt: Include bugzilla info in commit log]
      Cc: <stable@vger.kernel.org> # v4.9+
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0100a3e6
    • Greg Kroah-Hartman's avatar
      Revert "driver core: Add deferred_probe attribute to devices in sysfs" · c7334ce8
      Greg Kroah-Hartman authored
      This reverts commit 6751667a.
      
      Rob Herring objected to it, and a replacement for it will be added using
      debugfs in the future.
      
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Reported-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c7334ce8
    • Jiri Olsa's avatar
      perf/x86: Reject non sampling events with precise_ip · 18e7a45a
      Jiri Olsa authored
      As Peter suggested [1] rejecting non sampling PEBS events,
      because they dont make any sense and could cause bugs
      in the NMI handler [2].
      
        [1] http://lkml.kernel.org/r/20170103094059.GC3093@worktop
        [2] http://lkml.kernel.org/r/1482931866-6018-3-git-send-email-jolsa@kernel.orgSigned-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20170103142454.GA26251@kravaSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      18e7a45a
    • Jiri Olsa's avatar
      perf/x86/intel: Account interrupts for PEBS errors · 475113d9
      Jiri Olsa authored
      It's possible to set up PEBS events to get only errors and not
      any data, like on SNB-X (model 45) and IVB-EP (model 62)
      via 2 perf commands running simultaneously:
      
          taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10
      
      This leads to a soft lock up, because the error path of the
      intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt
      for error PEBS interrupts, so in case you're getting ONLY
      errors you don't have a way to stop the event when it's over
      the max_samples_per_tick limit:
      
        NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816]
        ...
        RIP: 0010:[<ffffffff81159232>]  [<ffffffff81159232>] smp_call_function_single+0xe2/0x140
        ...
        Call Trace:
         ? trace_hardirqs_on_caller+0xf5/0x1b0
         ? perf_cgroup_attach+0x70/0x70
         perf_install_in_context+0x199/0x1b0
         ? ctx_resched+0x90/0x90
         SYSC_perf_event_open+0x641/0xf90
         SyS_perf_event_open+0x9/0x10
         do_syscall_64+0x6c/0x1f0
         entry_SYSCALL64_slow_path+0x25/0x25
      
      Add perf_event_account_interrupt() which does the interrupt
      and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s
      error path.
      
      We keep the pending_kill and pending_wakeup logic only in the
      __perf_event_overflow() path, because they make sense only if
      there's any data to deliver.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      475113d9
    • Peter Zijlstra's avatar
      perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race · 321027c1
      Peter Zijlstra authored
      Di Shen reported a race between two concurrent sys_perf_event_open()
      calls where both try and move the same pre-existing software group
      into a hardware context.
      
      The problem is exactly that described in commit:
      
        f63a8daa ("perf: Fix event->ctx locking")
      
      ... where, while we wait for a ctx->mutex acquisition, the event->ctx
      relation can have changed under us.
      
      That very same commit failed to recognise sys_perf_event_context() as an
      external access vector to the events and thereby didn't apply the
      established locking rules correctly.
      
      So while one sys_perf_event_open() call is stuck waiting on
      mutex_lock_double(), the other (which owns said locks) moves the group
      about. So by the time the former sys_perf_event_open() acquires the
      locks, the context we've acquired is stale (and possibly dead).
      
      Apply the established locking rules as per perf_event_ctx_lock_nested()
      to the mutex_lock_double() for the 'move_group' case. This obviously means
      we need to validate state after we acquire the locks.
      
      Reported-by: Di Shen (Keen Lab)
      Tested-by: default avatarJohn Dias <joaodias@google.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Min Chong <mchong@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: f63a8daa ("perf: Fix event->ctx locking")
      Link: http://lkml.kernel.org/r/20170106131444.GZ3174@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      321027c1
    • Peter Zijlstra's avatar
      perf/core: Fix sys_perf_event_open() vs. hotplug · 63cae12b
      Peter Zijlstra authored
      There is problem with installing an event in a task that is 'stuck' on
      an offline CPU.
      
      Blocked tasks are not dis-assosciated from offlined CPUs, after all, a
      blocked task doesn't run and doesn't require a CPU etc.. Only on
      wakeup do we ammend the situation and place the task on a available
      CPU.
      
      If we hit such a task with perf_install_in_context() we'll loop until
      either that task wakes up or the CPU comes back online, if the task
      waking depends on the event being installed, we're stuck.
      
      While looking into this issue, I also spotted another problem, if we
      hit a task with perf_install_in_context() that is in the middle of
      being migrated, that is we observe the old CPU before sending the IPI,
      but run the IPI (on the old CPU) while the task is already running on
      the new CPU, things also go sideways.
      
      Rework things to rely on task_curr() -- outside of rq->lock -- which
      is rather tricky. Imagine the following scenario where we're trying to
      install the first event into our task 't':
      
      CPU0            CPU1            CPU2
      
                      (current == t)
      
      t->perf_event_ctxp[] = ctx;
      smp_mb();
      cpu = task_cpu(t);
      
                      switch(t, n);
                                      migrate(t, 2);
                                      switch(p, t);
      
                                      ctx = t->perf_event_ctxp[]; // must not be NULL
      
      smp_function_call(cpu, ..);
      
                      generic_exec_single()
                        func();
                          spin_lock(ctx->lock);
                          if (task_curr(t)) // false
      
                          add_event_to_ctx();
                          spin_unlock(ctx->lock);
      
                                      perf_event_context_sched_in();
                                        spin_lock(ctx->lock);
                                        // sees event
      
      So its CPU0's store of t->perf_event_ctxp[] that must not go 'missing'.
      Because if CPU2's load of that variable were to observe NULL, it would
      not try to schedule the ctx and we'd have a task running without its
      counter, which would be 'bad'.
      
      As long as we observe !NULL, we'll acquire ctx->lock. If we acquire it
      first and not see the event yet, then CPU0 must observe task_curr()
      and retry. If the install happens first, then we must see the event on
      sched-in and all is well.
      
      I think we can translate the first part (until the 'must not be NULL')
      of the scenario to a litmus test like:
      
        C C-peterz
      
        {
        }
      
        P0(int *x, int *y)
        {
                int r1;
      
                WRITE_ONCE(*x, 1);
                smp_mb();
                r1 = READ_ONCE(*y);
        }
      
        P1(int *y, int *z)
        {
                WRITE_ONCE(*y, 1);
                smp_store_release(z, 1);
        }
      
        P2(int *x, int *z)
        {
                int r1;
                int r2;
      
                r1 = smp_load_acquire(z);
      	  smp_mb();
                r2 = READ_ONCE(*x);
        }
      
        exists
        (0:r1=0 /\ 2:r1=1 /\ 2:r2=0)
      
      Where:
        x is perf_event_ctxp[],
        y is our tasks's CPU, and
        z is our task being placed on the rq of CPU2.
      
      The P0 smp_mb() is the one added by this patch, ordering the store to
      perf_event_ctxp[] from find_get_context() and the load of task_cpu()
      in task_function_call().
      
      The smp_store_release/smp_load_acquire model the RCpc locking of the
      rq->lock and the smp_mb() of P2 is the context switch switching from
      whatever CPU2 was running to our task 't'.
      
      This litmus test evaluates into:
      
        Test C-peterz Allowed
        States 7
        0:r1=0; 2:r1=0; 2:r2=0;
        0:r1=0; 2:r1=0; 2:r2=1;
        0:r1=0; 2:r1=1; 2:r2=1;
        0:r1=1; 2:r1=0; 2:r2=0;
        0:r1=1; 2:r1=0; 2:r2=1;
        0:r1=1; 2:r1=1; 2:r2=0;
        0:r1=1; 2:r1=1; 2:r2=1;
        No
        Witnesses
        Positive: 0 Negative: 7
        Condition exists (0:r1=0 /\ 2:r1=1 /\ 2:r2=0)
        Observation C-peterz Never 0 7
        Hash=e427f41d9146b2a5445101d3e2fcaa34
      
      And the strong and weak model agree.
      Reported-by: default avatarMark Rutland <mark.rutland@arm.com>
      Tested-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: jeremy.linton@arm.com
      Link: http://lkml.kernel.org/r/20161209135900.GU3174@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      63cae12b
    • Tobias Klauser's avatar
      x86/mpx: Use compatible types in comparison to fix sparse error · 45382862
      Tobias Klauser authored
      info->si_addr is of type void __user *, so it should be compared against
      something from the same address space.
      
      This fixes the following sparse error:
      
        arch/x86/mm/mpx.c:296:27: error: incompatible types in comparison expression (different address spaces)
      Signed-off-by: default avatarTobias Klauser <tklauser@distanz.ch>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      45382862
    • Len Brown's avatar
      x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc() · 695085b4
      Len Brown authored
      The Intel Denverton microserver uses a 25 MHz TSC crystal,
      so we can derive its exact [*] TSC frequency
      using CPUID and some arithmetic, eg.:
      
        TSC: 1800 MHz (25000000 Hz * 216 / 3 / 1000000)
      
      [*] 'exact' is only as good as the crystal, which should be +/- 20ppm
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/306899f94804aece6d8fa8b4223ede3b48dbb59c.1484287748.git.len.brown@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      695085b4
    • Linus Torvalds's avatar
      Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · e96f8f18
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "These are all over the place.
      
        The tracepoint part of the pull fixes a crash and adds a little more
        information to two tracepoints, while the rest are good old fashioned
        fixes"
      
      * 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        btrfs: make tracepoint format strings more compact
        Btrfs: add truncated_len for ordered extent tracepoints
        Btrfs: add 'inode' for extent map tracepoint
        btrfs: fix crash when tracepoint arguments are freed by wq callbacks
        Btrfs: adjust outstanding_extents counter properly when dio write is split
        Btrfs: fix lockdep warning about log_mutex
        Btrfs: use down_read_nested to make lockdep silent
        btrfs: fix locking when we put back a delayed ref that's too new
        btrfs: fix error handling when run_delayed_extent_op fails
        btrfs: return the actual error value from  from btrfs_uuid_tree_iterate
      e96f8f18
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client · 04e39627
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "Two small fixups for the filesystem changes that went into this merge
        window"
      
      * tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client:
        ceph: fix get_oldest_context()
        ceph: fix mds cluster availability check
      04e39627