1. 01 Jul, 2017 2 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Close race with testing for signals on guest entry · 8b24e69f
      Paul Mackerras authored
      At present, interrupts are hard-disabled fairly late in the guest
      entry path, in the assembly code.  Since we check for pending signals
      for the vCPU(s) task(s) earlier in the guest entry path, it is
      possible for a signal to be delivered before we enter the guest but
      not be noticed until after we exit the guest for some other reason.
      
      Similarly, it is possible for the scheduler to request a reschedule
      while we are in the guest entry path, and we won't notice until after
      we have run the guest, potentially for a whole timeslice.
      
      Furthermore, with a radix guest on POWER9, we can take the interrupt
      with the MMU on.  In this case we end up leaving interrupts
      hard-disabled after the guest exit, and they are likely to stay
      hard-disabled until we exit to userspace or context-switch to
      another process.  This was masking the fact that we were also not
      setting the RI (recoverable interrupt) bit in the MSR, meaning
      that if we had taken an interrupt, it would have crashed the host
      kernel with an unrecoverable interrupt message.
      
      To close these races, we need to check for signals and reschedule
      requests after hard-disabling interrupts, and then keep interrupts
      hard-disabled until we enter the guest.  If there is a signal or a
      reschedule request from another CPU, it will send an IPI, which will
      cause a guest exit.
      
      This puts the interrupt disabling before we call kvmppc_start_thread()
      for all the secondary threads of this core that are going to run vCPUs.
      The reason for that is that once we have started the secondary threads
      there is no easy way to back out without going through at least part
      of the guest entry path.  However, kvmppc_start_thread() includes some
      code for radix guests which needs to call smp_call_function(), which
      must be called with interrupts enabled.  To solve this problem, this
      patch moves that code into a separate function that is called earlier.
      
      When the guest exit is caused by an external interrupt, a hypervisor
      doorbell or a hypervisor maintenance interrupt, we now handle these
      using the replay facility.  __kvmppc_vcore_entry() now returns the
      trap number that caused the exit on this thread, and instead of the
      assembly code jumping to the handler entry, we return to C code with
      interrupts still hard-disabled and set the irq_happened flag in the
      PACA, so that when we do local_irq_enable() the appropriate handler
      gets called.
      
      With all this, we now have the interrupt soft-enable flag clear while
      we are in the guest.  This is useful because code in the real-mode
      hypercall handlers that checks whether interrupts are enabled will
      now see that they are disabled, which is correct, since interrupts
      are hard-disabled in the real-mode code.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      8b24e69f
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Simplify dynamic micro-threading code · 898b25b2
      Paul Mackerras authored
      Since commit b009031f ("KVM: PPC: Book3S HV: Take out virtual
      core piggybacking code", 2016-09-15), we only have at most one
      vcore per subcore.  Previously, the fact that there might be more
      than one vcore per subcore meant that we had the notion of a
      "master vcore", which was the vcore that controlled thread 0 of
      the subcore.  We also needed a list per subcore in the core_info
      struct to record which vcores belonged to each subcore.  Now that
      there can only be one vcore in the subcore, we can replace the
      list with a simple pointer and get rid of the notion of the
      master vcore (and in fact treat every vcore as a master vcore).
      
      We can also get rid of the subcore_vm[] field in the core_info
      struct since it is never read.
      Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      898b25b2
  2. 22 Jun, 2017 2 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Add capability to report possible virtual SMT modes · 2ed4f9dd
      Paul Mackerras authored
      Now that userspace can set the virtual SMT mode by enabling the
      KVM_CAP_PPC_SMT capability, it is useful for userspace to be able
      to query the set of possible virtual SMT modes.  This provides a
      new capability, KVM_CAP_PPC_SMT_POSSIBLE, to provide this
      information.  The return value is a bitmap of possible modes, with
      bit N set if virtual SMT mode 2^N is available.  That is, 1 indicates
      SMT1 is available, 2 indicates that SMT2 is available, 3 indicates
      that both SMT1 and SMT2 are available, and so on.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      2ed4f9dd
    • Aravinda Prasad's avatar
      KVM: PPC: Book3S HV: Exit guest upon MCE when FWNMI capability is enabled · e20bbd3d
      Aravinda Prasad authored
      Enhance KVM to cause a guest exit with KVM_EXIT_NMI
      exit reason upon a machine check exception (MCE) in
      the guest address space if the KVM_CAP_PPC_FWNMI
      capability is enabled (instead of delivering a 0x200
      interrupt to guest). This enables QEMU to build error
      log and deliver machine check exception to guest via
      guest registered machine check handler.
      
      This approach simplifies the delivery of machine
      check exception to guest OS compared to the earlier
      approach of KVM directly invoking 0x200 guest interrupt
      vector.
      
      This design/approach is based on the feedback for the
      QEMU patches to handle machine check exception. Details
      of earlier approach of handling machine check exception
      in QEMU and related discussions can be found at:
      
      https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
      
      Note:
      
      This patch now directly invokes machine_check_print_event_info()
      from kvmppc_handle_exit_hv() to print the event to host console
      at the time of guest exit before the exception is passed on to the
      guest. Hence, the host-side handling which was performed earlier
      via machine_check_fwnmi is removed.
      
      The reasons for this approach is (i) it is not possible
      to distinguish whether the exception occurred in the
      guest or the host from the pt_regs passed on the
      machine_check_exception(). Hence machine_check_exception()
      calls panic, instead of passing on the exception to
      the guest, if the machine check exception is not
      recoverable. (ii) the approach introduced in this
      patch gives opportunity to the host kernel to perform
      actions in virtual mode before passing on the exception
      to the guest. This approach does not require complex
      tweaks to machine_check_fwnmi and friends.
      Signed-off-by: default avatarAravinda Prasad <aravinda@linux.vnet.ibm.com>
      Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: default avatarMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      e20bbd3d
  3. 21 Jun, 2017 2 commits
  4. 20 Jun, 2017 1 commit
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Don't sleep if XIVE interrupt pending on POWER9 · ee3308a2
      Paul Mackerras authored
      On a POWER9 system, it is possible for an interrupt to become pending
      for a VCPU when that VCPU is about to cede (execute a H_CEDE hypercall)
      and has already disabled interrupts, or in the H_CEDE processing up
      to the point where the XIVE context is pulled from the hardware.  In
      such a case, the H_CEDE should not sleep, but should return immediately
      to the guest.  However, the conditions tested in kvmppc_vcpu_woken()
      don't include the condition that a XIVE interrupt is pending, so the
      VCPU could sleep until the next decrementer interrupt.
      
      To fix this, we add a new xive_interrupt_pending() helper which looks
      in the XIVE context that was pulled from the hardware to see if the
      priority of any pending interrupt is higher (numerically lower than)
      the CPU priority.  If so then kvmppc_vcpu_woken() will return true.
      If the XIVE context has never been used, then both the pipr and the
      cppr fields will be zero and the test will indicate that no interrupt
      is pending.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      ee3308a2
  5. 19 Jun, 2017 5 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9 · 57900694
      Paul Mackerras authored
      On POWER9, we no longer have the restriction that we had on POWER8
      where all threads in a core have to be in the same partition, so
      the CPU threads are now independent.  However, we still want to be
      able to run guests with a virtual SMT topology, if only to allow
      migration of guests from POWER8 systems to POWER9.
      
      A guest that has a virtual SMT mode greater than 1 will expect to
      be able to use the doorbell facility; it will expect the msgsndp
      and msgclrp instructions to work appropriately and to be able to read
      sensible values from the TIR (thread identification register) and
      DPDES (directed privileged doorbell exception status) special-purpose
      registers.  However, since each CPU thread is a separate sub-processor
      in POWER9, these instructions and registers can only be used within
      a single CPU thread.
      
      In order for these instructions to appear to act correctly according
      to the guest's virtual SMT mode, we have to trap and emulate them.
      We cause them to trap by clearing the HFSCR_MSGP bit in the HFSCR
      register.  The emulation is triggered by the hypervisor facility
      unavailable interrupt that occurs when the guest uses them.
      
      To cause a doorbell interrupt to occur within the guest, we set the
      DPDES register to 1.  If the guest has interrupts enabled, the CPU
      will generate a doorbell interrupt and clear the DPDES register in
      hardware.  The DPDES hardware register for the guest is saved in the
      vcpu->arch.vcore->dpdes field.  Since this gets written by the guest
      exit code, other VCPUs wishing to cause a doorbell interrupt don't
      write that field directly, but instead set a vcpu->arch.doorbell_request
      flag.  This is consumed and set to 0 by the guest entry code, which
      then sets DPDES to 1.
      
      Emulating reads of the DPDES register is somewhat involved, because
      it requires reading the doorbell pending interrupt status of all of the
      VCPU threads in the virtual core, and if any of those VCPUs are
      running, their doorbell status is only up-to-date in the hardware
      DPDES registers of the CPUs where they are running.  In order to get
      a reasonable approximation of the current doorbell status, we send
      those CPUs an IPI, causing an exit from the guest which will update
      the vcpu->arch.vcore->dpdes field.  We then use that value in
      constructing the emulated DPDES register value.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      57900694
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Allow userspace to set the desired SMT mode · 3c313524
      Paul Mackerras authored
      This allows userspace to set the desired virtual SMT (simultaneous
      multithreading) mode for a VM, that is, the number of VCPUs that
      get assigned to each virtual core.  Previously, the virtual SMT mode
      was fixed to the number of threads per subcore, and if userspace
      wanted to have fewer vcpus per vcore, then it would achieve that by
      using a sparse CPU numbering.  This had the disadvantage that the
      vcpu numbers can get quite large, particularly for SMT1 guests on
      a POWER8 with 8 threads per core.  With this patch, userspace can
      set its desired virtual SMT mode and then use contiguous vcpu
      numbering.
      
      On POWER8, where the threading mode is "strict", the virtual SMT mode
      must be less than or equal to the number of threads per subcore.  On
      POWER9, which implements a "loose" threading mode, the virtual SMT
      mode can be any power of 2 between 1 and 8, even though there is
      effectively one thread per subcore, since the threads are independent
      and can all be in different partitions.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      3c313524
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Context-switch HFSCR between host and guest on POWER9 · 769377f7
      Paul Mackerras authored
      This adds code to allow us to use a different value for the HFSCR
      (Hypervisor Facilities Status and Control Register) when running the
      guest from that which applies in the host.  The reason for doing this
      is to allow us to trap the msgsndp instruction and related operations
      in future so that they can be virtualized.  We also save the value of
      HFSCR when a hypervisor facility unavailable interrupt occurs, because
      the high byte of HFSCR indicates which facility the guest attempted to
      access.
      
      We save and restore the host value on guest entry/exit because some
      bits of it affect host userspace execution.
      
      We only do all this on POWER9, not on POWER8, because we are not
      intending to virtualize any of the facilities controlled by HFSCR on
      POWER8.  In particular, the HFSCR bit that controls execution of
      msgsndp and related operations does not exist on POWER8.  The HFSCR
      doesn't exist at all on POWER7.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      769377f7
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Don't let VCPU sleep if it has a doorbell pending · 1da4e2f4
      Paul Mackerras authored
      It is possible, through a narrow race condition, for a VCPU to exit
      the guest with a H_CEDE hypercall while it has a doorbell interrupt
      pending.  In this case, the H_CEDE should return immediately, but in
      fact it puts the VCPU to sleep until some other interrupt becomes
      pending or a prod is received (via another VCPU doing H_PROD).
      
      This fixes it by checking the DPDES (Directed Privileged Doorbell
      Exception Status) bit for the thread along with the other interrupt
      pending bits.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      1da4e2f4
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Enable guests to use large decrementer mode on POWER9 · 1bc3fe81
      Paul Mackerras authored
      This allows userspace (e.g. QEMU) to enable large decrementer mode for
      the guest when running on a POWER9 host, by setting the LPCR_LD bit in
      the guest LPCR value.  With this, the guest exit code saves 64 bits of
      the guest DEC value on exit.  Other places that use the guest DEC
      value check the LPCR_LD bit in the guest LPCR value, and if it is set,
      omit the 32-bit sign extension that would otherwise be done.
      
      This doesn't change the DEC emulation used by PR KVM because PR KVM
      is not supported on POWER9 yet.
      
      This is partly based on an earlier patch by Oliver O'Halloran.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      1bc3fe81
  6. 16 Jun, 2017 2 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Ignore timebase offset on POWER9 DD1 · 3d3efb68
      Paul Mackerras authored
      POWER9 DD1 has an erratum where writing to the TBU40 register, which
      is used to apply an offset to the timebase, can cause the timebase to
      lose counts.  This results in the timebase on some CPUs getting out of
      sync with other CPUs, which then results in misbehaviour of the
      timekeeping code.
      
      To work around the problem, we make KVM ignore the timebase offset for
      all guests on POWER9 DD1 machines.  This means that live migration
      cannot be supported on POWER9 DD1 machines.
      
      Cc: stable@vger.kernel.org # v4.10+
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      3d3efb68
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Save/restore host values of debug registers · 7ceaa6dc
      Paul Mackerras authored
      At present, HV KVM on POWER8 and POWER9 machines loses any instruction
      or data breakpoint set in the host whenever a guest is run.
      Instruction breakpoints are currently only used by xmon, but ptrace
      and the perf_event subsystem can set data breakpoints as well as xmon.
      
      To fix this, we save the host values of the debug registers (CIABR,
      DAWR and DAWRX) before entering the guest and restore them on exit.
      To provide space to save them in the stack frame, we expand the stack
      frame allocated by kvmppc_hv_entry() from 112 to 144 bytes.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      7ceaa6dc
  7. 15 Jun, 2017 2 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Preserve userspace HTM state properly · 46a704f8
      Paul Mackerras authored
      If userspace attempts to call the KVM_RUN ioctl when it has hardware
      transactional memory (HTM) enabled, the values that it has put in the
      HTM-related SPRs TFHAR, TFIAR and TEXASR will get overwritten by
      guest values.  To fix this, we detect this condition and save those
      SPR values in the thread struct, and disable HTM for the task.  If
      userspace goes to access those SPRs or the HTM facility in future,
      a TM-unavailable interrupt will occur and the handler will reload
      those SPRs and re-enable HTM.
      
      If userspace has started a transaction and suspended it, we would
      currently lose the transactional state in the guest entry path and
      would almost certainly get a "TM Bad Thing" interrupt, which would
      cause the host to crash.  To avoid this, we detect this case and
      return from the KVM_RUN ioctl with an EINVAL error, with the KVM
      exit reason set to KVM_EXIT_FAIL_ENTRY.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      46a704f8
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit · 4c3bb4cc
      Paul Mackerras authored
      This restores several special-purpose registers (SPRs) to sane values
      on guest exit that were missed before.
      
      TAR and VRSAVE are readable and writable by userspace, and we need to
      save and restore them to prevent the guest from potentially affecting
      userspace execution (not that TAR or VRSAVE are used by any known
      program that run uses the KVM_RUN ioctl).  We save/restore these
      in kvmppc_vcpu_run_hv() rather than on every guest entry/exit.
      
      FSCR affects userspace execution in that it can prohibit access to
      certain facilities by userspace.  We restore it to the normal value
      for the task on exit from the KVM_RUN ioctl.
      
      IAMR is normally 0, and is restored to 0 on guest exit.  However,
      with a radix host on POWER9, it is set to a value that prevents the
      kernel from executing user-accessible memory.  On POWER9, we save
      IAMR on guest entry and restore it on guest exit to the saved value
      rather than 0.  On POWER8 we continue to set it to 0 on guest exit.
      
      PSPB is normally 0.  We restore it to 0 on guest exit to prevent
      userspace taking advantage of the guest having set it non-zero
      (which would allow userspace to set its SMT priority to high).
      
      UAMOR is normally 0.  We restore it to 0 on guest exit to prevent
      the AMR from being used as a covert channel between userspace
      processes, since the AMR is not context-switched at present.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      4c3bb4cc
  8. 13 Jun, 2017 1 commit
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Context-switch EBB registers properly · ca8efa1d
      Paul Mackerras authored
      This adds code to save the values of three SPRs (special-purpose
      registers) used by userspace to control event-based branches (EBBs),
      which are essentially interrupts that get delivered directly to
      userspace.  These registers are loaded up with guest values when
      entering the guest, and their values are saved when exiting the
      guest, but we were not saving the host values and restoring them
      before going back to userspace.
      
      On POWER8 this would only affect userspace programs which explicitly
      request the use of EBBs and also use the KVM_RUN ioctl, since the
      only source of EBBs on POWER8 is the PMU, and there is an explicit
      enable bit in the PMU registers (and those PMU registers do get
      properly context-switched between host and guest).  On POWER9 there
      is provision for externally-generated EBBs, and these are not subject
      to the control in the PMU registers.
      
      Since these registers only affect userspace, we can save them when
      we first come in from userspace and restore them before returning to
      userspace, rather than saving/restoring the host values on every
      guest entry/exit.  Similarly, we don't need to worry about their
      values on offline secondary threads since they execute in the context
      of the idle task, which never executes in userspace.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      ca8efa1d
  9. 29 May, 2017 1 commit
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Cope with host using large decrementer mode · 2f272463
      Paul Mackerras authored
      POWER9 introduces a new mode for the decrementer register, called
      large decrementer mode, in which the decrementer counter is 56 bits
      wide rather than 32, and reads are sign-extended rather than
      zero-extended.  For the decrementer, this new mode is optional and
      controlled by a bit in the LPCR.  The hypervisor decrementer (HDEC)
      is 56 bits wide on POWER9 and has no mode control.
      
      Since KVM code reads and writes the decrementer and hypervisor
      decrementer registers in a few places, it needs to be aware of the
      need to treat the decrementer value as a 64-bit quantity, and only do
      a 32-bit sign extension when large decrementer mode is not in effect.
      Similarly, the HDEC should always be treated as a 64-bit quantity on
      POWER9.  We define a new EXTEND_HDEC macro to encapsulate the feature
      test for POWER9 and the sign extension.
      
      To enable the sign extension to be removed in large decrementer mode,
      we test the LPCR_LD bit in the host LPCR image stored in the struct
      kvm for the guest.  If is set then large decrementer mode is enabled
      and the sign extension should be skipped.
      
      This is partly based on an earlier patch by Oliver O'Halloran.
      
      Cc: stable@vger.kernel.org # v4.10+
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      2f272463
  10. 22 May, 2017 2 commits
    • Linus Torvalds's avatar
      Linux 4.12-rc2 · 08332893
      Linus Torvalds authored
      08332893
    • Linus Torvalds's avatar
      x86: fix 32-bit case of __get_user_asm_u64() · 33c9e972
      Linus Torvalds authored
      The code to fetch a 64-bit value from user space was entirely buggered,
      and has been since the code was merged in early 2016 in commit
      b2f68038 ("x86/mm/32: Add support for 64-bit __get_user() on 32-bit
      kernels").
      
      Happily the buggered routine is almost certainly entirely unused, since
      the normal way to access user space memory is just with the non-inlined
      "get_user()", and the inlined version didn't even historically exist.
      
      The normal "get_user()" case is handled by external hand-written asm in
      arch/x86/lib/getuser.S that doesn't have either of these issues.
      
      There were two independent bugs in __get_user_asm_u64():
      
       - it still did the STAC/CLAC user space access marking, even though
         that is now done by the wrapper macros, see commit 11f1a4b9
         ("x86: reorganize SMAP handling in user space accesses").
      
         This didn't result in a semantic error, it just means that the
         inlined optimized version was hugely less efficient than the
         allegedly slower standard version, since the CLAC/STAC overhead is
         quite high on modern Intel CPU's.
      
       - the double register %eax/%edx was marked as an output, but the %eax
         part of it was touched early in the asm, and could thus clobber other
         inputs to the asm that gcc didn't expect it to touch.
      
         In particular, that meant that the generated code could look like
         this:
      
              mov    (%eax),%eax
              mov    0x4(%eax),%edx
      
         where the load of %edx obviously was _supposed_ to be from the 32-bit
         word that followed the source of %eax, but because %eax was
         overwritten by the first instruction, the source of %edx was
         basically random garbage.
      
      The fixes are trivial: remove the extraneous STAC/CLAC entries, and mark
      the 64-bit output as early-clobber to let gcc know that no inputs should
      alias with the output register.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable@kernel.org   # v4.8+
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      33c9e972
  11. 21 May, 2017 7 commits
    • Linus Torvalds's avatar
      Clean up x86 unsafe_get/put_user() type handling · 334a023e
      Linus Torvalds authored
      Al noticed that unsafe_put_user() had type problems, and fixed them in
      commit a7cc722f ("fix unsafe_put_user()"), which made me look more
      at those functions.
      
      It turns out that unsafe_get_user() had a type issue too: it limited the
      largest size of the type it could handle to "unsigned long".  Which is
      fine with the current users, but doesn't match our existing normal
      get_user() semantics, which can also handle "u64" even when that does
      not fit in a long.
      
      While at it, also clean up the type cast in unsafe_put_user().  We
      actually want to just make it an assignment to the expected type of the
      pointer, because we actually do want warnings from types that don't
      convert silently.  And it makes the code more readable by not having
      that one very long and complex line.
      
      [ This patch might become stable material if we ever end up back-porting
        any new users of the unsafe uaccess code, but as things stand now this
        doesn't matter for any current existing uses. ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      334a023e
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · f3926e4c
      Linus Torvalds authored
      Pull misc uaccess fixes from Al Viro:
       "Fix for unsafe_put_user() (no callers currently in mainline, but
        anyone starting to use it will step into that) + alpha osf_wait4()
        infoleak fix"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        osf_wait4(): fix infoleak
        fix unsafe_put_user()
      f3926e4c
    • Linus Torvalds's avatar
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 970c305a
      Linus Torvalds authored
      Pull scheduler fix from Thomas Gleixner:
       "A single scheduler fix:
      
        Prevent idle task from ever being preempted. That makes sure that
        synchronize_rcu_tasks() which is ignoring idle task does not pretend
        that no task is stuck in preempted state. If that happens and idle was
        preempted on a ftrace trampoline the machine crashes due to
        inconsistent state"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/core: Call __schedule() from do_idle() without enabling preemption
      970c305a
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e7a3d627
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "A set of small fixes for the irq subsystem:
      
         - Cure a data ordering problem with chained interrupts
      
         - Three small fixlets for the mbigen irq chip"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Fix chained interrupt data ordering
        irqchip/mbigen: Fix the clear register offset calculation
        irqchip/mbigen: Fix potential NULL dereferencing
        irqchip/mbigen: Fix memory mapping code
      e7a3d627
    • Al Viro's avatar
      osf_wait4(): fix infoleak · a8c39544
      Al Viro authored
      failing sys_wait4() won't fill struct rusage...
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      a8c39544
    • Al Viro's avatar
      fix unsafe_put_user() · a7cc722f
      Al Viro authored
      __put_user_size() relies upon its first argument having the same type as what
      the second one points to; the only other user makes sure of that and
      unsafe_put_user() should do the same.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      a7cc722f
    • Linus Torvalds's avatar
      Merge tag 'trace-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 56f410cf
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix a bug caused by not cleaning up the new instance unique triggers
         when deleting an instance. It also creates a selftest that triggers
         that bug.
      
       - Fix the delayed optimization happening after kprobes boot up self
         tests being removed by freeing of init memory.
      
       - Comment kprobes on why the delay optimization is not a problem for
         removal of modules, to keep other developers from searching that
         riddle.
      
       - Fix another case of rcu not watching in stack trace tracing.
      
      * tag 'trace-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Make sure RCU is watching before calling a stack trace
        kprobes: Document how optimized kprobes are removed from module unload
        selftests/ftrace: Add test to remove instance with active event triggers
        selftests/ftrace: Fix bashisms
        ftrace: Remove #ifdef from code and add clear_ftrace_function_probes() stub
        ftrace/instances: Clear function triggers when removing instances
        ftrace: Simplify glob handling in unregister_ftrace_function_probe_func()
        tracing/kprobes: Enforce kprobes teardown after testing
        tracing: Move postpone selftests to core from early_initcall
      56f410cf
  12. 20 May, 2017 13 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 894e2164
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A small collection of fixes that should go into this cycle.
      
         - a pull request from Christoph for NVMe, which ended up being
           manually applied to avoid pulling in newer bits in master. Mostly
           fibre channel fixes from James, but also a few fixes from Jon and
           Vijay
      
         - a pull request from Konrad, with just a single fix for xen-blkback
           from Gustavo.
      
         - a fuseblk bdi fix from Jan, fixing a regression in this series with
           the dynamic backing devices.
      
         - a blktrace fix from Shaohua, replacing sscanf() with kstrtoull().
      
         - a request leak fix for drbd from Lars, fixing a regression in the
           last series with the kref changes. This will go to stable as well"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        nvmet: release the sq ref on rdma read errors
        nvmet-fc: remove target cpu scheduling flag
        nvme-fc: stop queues on error detection
        nvme-fc: require target or discovery role for fc-nvme targets
        nvme-fc: correct port role bits
        nvme: unmap CMB and remove sysfs file in reset path
        blktrace: fix integer parse
        fuseblk: Fix warning in super_setup_bdi_name()
        block: xen-blkback: add null check to avoid null pointer dereference
        drbd: fix request leak introduced by locking/atomic, kref: Kill kref_sub()
      894e2164
    • Vijay Immanuel's avatar
      nvmet: release the sq ref on rdma read errors · 549f01ae
      Vijay Immanuel authored
      On rdma read errors, release the sq ref that was taken
      when the req was initialized. This avoids a hang in
      nvmet_sq_destroy() when the queue is being freed.
      Signed-off-by: default avatarVijay Immanuel <vijayi@attalasystems.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      549f01ae
    • James Smart's avatar
      nvmet-fc: remove target cpu scheduling flag · 4b8ba5fa
      James Smart authored
      Remove NVMET_FCTGTFEAT_NEEDS_CMD_CPUSCHED. It's unnecessary.
      Signed-off-by: default avatarJames Smart <james.smart@broadcom.com>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      4b8ba5fa
    • James Smart's avatar
      nvme-fc: stop queues on error detection · 2952a879
      James Smart authored
      Per the recommendation by Sagi on:
      http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html
      
      Rather than waiting for reset work thread to stop queues and abort the ios,
      immediately stop the queues on error detection. Reset thread will restop
      the queues (as it's called on other paths), but it does not appear to have
      a side effect.
      Signed-off-by: default avatarJames Smart <james.smart@broadcom.com>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      2952a879
    • James Smart's avatar
      nvme-fc: require target or discovery role for fc-nvme targets · 85e6a6ad
      James Smart authored
      In order to create an association, the remoteport must be
      serving either a target role or a discovery role.
      Signed-off-by: default avatarJames Smart <james.smart@broadcom.com>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      85e6a6ad
    • James Smart's avatar
      nvme-fc: correct port role bits · 41231090
      James Smart authored
      FC Port roles is a bit mask, not individual values.
      Correct nvme definitions to unique bits.
      Signed-off-by: default avatarJames Smart <james.smart@broadcom.com>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      41231090
    • Jon Derrick's avatar
      nvme: unmap CMB and remove sysfs file in reset path · f63572df
      Jon Derrick authored
      CMB doesn't get unmapped until removal while getting remapped on every
      reset. Add the unmapping and sysfs file removal to the reset path in
      nvme_pci_disable to match the mapping path in nvme_pci_enable.
      
      Fixes: 202021c1 ("nvme : Add sysfs entry for NVMe CMBs when appropriate")
      Signed-off-by: default avatarJon Derrick <jonathan.derrick@intel.com>
      Acked-by: default avatarKeith Busch <keith.busch@intel.com>
      Reviewed-By: default avatarStephen Bates <sbates@raithlin.com>
      Cc: <stable@vger.kernel.org> # 4.9+
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      f63572df
    • Linus Torvalds's avatar
      Merge tag 'staging-4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · ef82f1ad
      Linus Torvalds authored
      Pull staging driver fixes from Greg KH:
       "Here are a number of staging driver fixes for 4.12-rc2
      
        Most of them are typec driver fixes found by reviewers and users of
        the code. There are also some removals of files no longer needed in
        the tree due to the ion driver rewrite in 4.12-rc1, as well as some
        wifi driver fixes. And to round it out, a MAINTAINERS file update.
      
        All have been in linux-next with no reported issues"
      
      * tag 'staging-4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (22 commits)
        MAINTAINERS: greybus-dev list is members-only
        staging: fsl-dpaa2/eth: add ETHERNET dependency
        staging: typec: fusb302: refactor resume retry mechanism
        staging: typec: fusb302: reset i2c_busy state in error
        staging: rtl8723bs: remove re-positioned call to kfree in os_dep/ioctl_cfg80211.c
        staging: rtl8192e: GetTs Fix invalid TID 7 warning.
        staging: rtl8192e: rtl92e_get_eeprom_size Fix read size of EPROM_CMD.
        staging: rtl8192e: fix 2 byte alignment of register BSSIDR.
        staging: rtl8192e: rtl92e_fill_tx_desc fix write to mapped out memory.
        staging: vc04_services: Fix bulk cache maintenance
        staging: ccree: remove extraneous spin_unlock_bh() in error handler
        staging: typec: Fix sparse warnings about incorrect types
        staging: typec: fusb302: do not free gpio from managed resource
        staging: typec: tcpm: Fix Port Power Role field in PS_RDY messages
        staging: typec: tcpm: Respond to Discover Identity commands
        staging: typec: tcpm: Set correct flags in PD request messages
        staging: typec: tcpm: Drop duplicate PD messages
        staging: typec: fusb302: Fix chip->vbus_present init value
        staging: typec: fusb302: Fix module autoload
        staging: typec: tcpci: declare private structure as static
        ...
      ef82f1ad
    • Linus Torvalds's avatar
      Merge tag 'usb-4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 32026293
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are a number of small USB fixes for 4.12-rc2
      
        Most of them come from Johan, in his valiant quest to fix up all
        drivers that could be affected by "malicious" USB devices. There's
        also some fixes for more "obscure" drivers to handle some of the
        vmalloc stack fallout (which for USB drivers, was always the case, but
        very few people actually ran those systems...)
      
        Other than that, the normal set of xhci and gadget and musb driver
        fixes as well.
      
        All have been in linux-next with no reported issues"
      
      * tag 'usb-4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (42 commits)
        usb: musb: tusb6010_omap: Do not reset the other direction's packet size
        usb: musb: Fix trying to suspend while active for OTG configurations
        usb: host: xhci-plat: propagate return value of platform_get_irq()
        xhci: Fix command ring stop regression in 4.11
        xhci: remove GFP_DMA flag from allocation
        USB: xhci: fix lock-inversion problem
        usb: host: xhci-ring: don't need to clear interrupt pending for MSI enabled hcd
        usb: host: xhci-mem: allocate zeroed Scratchpad Buffer
        xhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton
        usb: xhci: trace URB before giving it back instead of after
        USB: serial: qcserial: add more Lenovo EM74xx device IDs
        USB: host: xhci: use max-port define
        USB: hub: fix SS max number of ports
        USB: hub: fix non-SS hub-descriptor handling
        USB: hub: fix SS hub-descriptor handling
        USB: usbip: fix nonconforming hub descriptor
        USB: gadget: dummy_hcd: fix hub-descriptor removable fields
        doc-rst: fixed kernel-doc directives in usb/typec.rst
        USB: core: of: document reference taken by companion helper
        USB: ehci-platform: fix companion-device leak
        ...
      32026293
    • Linus Torvalds's avatar
      Merge tag 'char-misc-4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 331da109
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are five small bugfixes for reported issues with 4.12-rc1 and
        earlier kernels. Nothing huge here, just a lp, mem, vpd, and uio
        driver fix, along with a Kconfig fixup for one of the misc drivers.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'char-misc-4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        firmware: Google VPD: Fix memory allocation error handling
        drivers: char: mem: Check for address space wraparound with mmap()
        uio: fix incorrect memory leak cleanup
        misc: pci_endpoint_test: select CRC32
        char: lp: fix possible integer overflow in lp_setup()
      331da109
    • Linus Torvalds's avatar
      Merge git://www.linux-watchdog.org/linux-watchdog · ec53c027
      Linus Torvalds authored
      Pull watchdog fixes from Wim Van Sebroeck:
       - orion_wdt compile-test dependencies
       - sama5d4_wdt: WDDIS handling and a race confition
       - pcwd_usb: fix NULL-deref at probe
       - cadence_wdt: fix timeout setting
       - wdt_pci: fix build error if SOFTWARE_REBOOT is defined
       - iTCO_wdt: all versions count down twice
       - zx2967: remove redundant dev_err call in zx2967_wdt_probe()
       - bcm281xx: Fix use of uninitialized spinlock
      
      * git://www.linux-watchdog.org/linux-watchdog:
        watchdog: bcm281xx: Fix use of uninitialized spinlock.
        watchdog: zx2967: remove redundant dev_err call in zx2967_wdt_probe()
        iTCO_wdt: all versions count down twice
        watchdog: wdt_pci: fix build error if define SOFTWARE_REBOOT
        watchdog: cadence_wdt: fix timeout setting
        watchdog: pcwd_usb: fix NULL-deref at probe
        watchdog: sama5d4: fix race condition
        watchdog: sama5d4: fix WDDIS handling
        watchdog: orion: fix compile-test dependencies
      ec53c027
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-for-v4.12-rc2' of git://people.freedesktop.org/~airlied/linux · cf80a6fb
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Mostly nouveau and i915, fairly quiet as usual for rc2"
      
      * tag 'drm-fixes-for-v4.12-rc2' of git://people.freedesktop.org/~airlied/linux:
        drm/atmel-hlcdc: Fix output initialization
        gpu: host1x: select IOMMU_IOVA
        drm/nouveau/fifo/gk104-: Silence a locking warning
        drm/nouveau/secboot: plug memory leak in ls_ucode_img_load_gr() error path
        drm/nouveau: Fix drm poll_helper handling
        drm/i915: don't do allocate_va_range again on PIN_UPDATE
        drm/i915: Fix rawclk readout for g4x
        drm/i915: Fix runtime PM for LPE audio
        drm/i915/glk: Fix DSI "*ERROR* ULPS is still active" messages
        drm/i915/gvt: avoid unnecessary vgpu switch
        drm/i915/gvt: not to restore in-context mmio
        drm/etnaviv: don't put fence in case of submit failure
        drm/i915/gvt: fix typo: "supporte" -> "support"
        drm: hdlcd: Fix the calculation of the scanout start address
      cf80a6fb
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 6fe1de43
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This is the first sweep of mostly minor fixes. There's one security
        one: the read past the end of a buffer in qedf, and a panic fix for
        lpfc SLI-3 adapters, but the rest are a set of include and build
        dependency tidy ups and assorted other small fixes and updates"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: pmcraid: remove redundant check to see if request_size is less than zero
        scsi: lpfc: ensure els_wq is being checked before destroying it
        scsi: cxlflash: Select IRQ_POLL
        scsi: qedf: Avoid reading past end of buffer
        scsi: qedf: Cleanup the type of io_log->op
        scsi: lpfc: double lock typo in lpfc_ns_rsp()
        scsi: qedf: properly update arguments position in function call
        scsi: scsi_lib: Add #include <scsi/scsi_transport.h>
        scsi: MAINTAINERS: update OSD entries
        scsi: Skip deleted devices in __scsi_device_lookup
        scsi: lpfc: Fix panic on BFS configuration
        scsi: libfc: do not flood console with messages 'libfc: queue full ...'
      6fe1de43