1. 06 Feb, 2017 40 commits
    • Al Viro's avatar
      mn10300: failing __get_user() and get_user() should zero · 7d84a5d5
      Al Viro authored
      commit 43403eab upstream.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      7d84a5d5
    • Al Viro's avatar
      microblaze: fix copy_from_user() · 22e232d6
      Al Viro authored
      commit d0cf3851 upstream.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      [wt: s/might_fault/might_sleep]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      22e232d6
    • Al Viro's avatar
      microblaze: fix __get_user() · 90f2278f
      Al Viro authored
      commit e98b9e37 upstream.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      90f2278f
    • John David Anglin's avatar
      parisc: Ensure consistent state when switching to kernel stack at syscall entry · 61f2d84a
      John David Anglin authored
      commit 6ed51832 upstream.
      
      We have one critical section in the syscall entry path in which we switch from
      the userspace stack to kernel stack. In the event of an external interrupt, the
      interrupt code distinguishes between those two states by analyzing the value of
      sr7. If sr7 is zero, it uses the kernel stack. Therefore it's important, that
      the value of sr7 is in sync with the currently enabled stack.
      
      This patch now disables interrupts while executing the critical section.  This
      prevents the interrupt handler to possibly see an inconsistent state which in
      the worst case can lead to crashes.
      
      Interestingly, in the syscall exit path interrupts were already disabled in the
      critical section which switches back to the userspace stack.
      Signed-off-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      61f2d84a
    • Stefan Haberland's avatar
      s390/dasd: fix hanging device after clear subchannel · 2fe6f38f
      Stefan Haberland authored
      commit 9ba333dc upstream.
      
      When a device is in a status where CIO has killed all I/O by itself the
      interrupt for a clear request may not contain an irb to determine the
      clear function. Instead it contains an error pointer -EIO.
      This was ignored by the DASD int_handler leading to a hanging device
      waiting for a clear interrupt.
      
      Handle -EIO error pointer correctly for requests that are clear pending and
      treat the clear as successful.
      Signed-off-by: default avatarStefan Haberland <sth@linux.vnet.ibm.com>
      Reviewed-by: default avatarSebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      2fe6f38f
    • Dan Carpenter's avatar
      avr32: off by one in at32_init_pio() · cba31cb9
      Dan Carpenter authored
      commit 55f1cf83 upstream.
      
      The pio_dev[] array has MAX_NR_PIO_DEVICES elements so the > should be
      >=.
      
      Fixes: 5f97f7f9 ('[PATCH] avr32 architecture')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      cba31cb9
    • Guenter Roeck's avatar
      avr32: fix 'undefined reference to `___copy_from_user' · da189d04
      Guenter Roeck authored
      commit 65c0044c upstream.
      
      avr32 builds fail with:
      
      arch/avr32/kernel/built-in.o: In function `arch_ptrace':
      (.text+0x650): undefined reference to `___copy_from_user'
      arch/avr32/kernel/built-in.o:(___ksymtab+___copy_from_user+0x0): undefined
      reference to `___copy_from_user'
      kernel/built-in.o: In function `proc_doulongvec_ms_jiffies_minmax':
      (.text+0x5dd8): undefined reference to `___copy_from_user'
      kernel/built-in.o: In function `proc_dointvec_minmax_sysadmin':
      sysctl.c:(.text+0x6174): undefined reference to `___copy_from_user'
      kernel/built-in.o: In function `ptrace_has_cap':
      ptrace.c:(.text+0x69c0): undefined reference to `___copy_from_user'
      kernel/built-in.o:ptrace.c:(.text+0x6b90): more undefined references to
      `___copy_from_user' follow
      
      Fixes: 8630c322 ("avr32: fix copy_from_user()")
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: default avatarHavard Skinnemoen <hskinnemoen@gmail.com>
      Acked-by: default avatarHans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      da189d04
    • Al Viro's avatar
      avr32: fix copy_from_user() · 91be3ab4
      Al Viro authored
      commit 8630c322 upstream.
      
      really ugly, but apparently avr32 compilers turns access_ok() into
      something so bad that they want it in assembler.  Left that way,
      zeroing added in inline wrapper.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      91be3ab4
    • Pan Xinhui's avatar
      powerpc/nvram: Fix an incorrect partition merge · ac78e19e
      Pan Xinhui authored
      commit 11b7e154 upstream.
      
      When we merge two contiguous partitions whose signatures are marked
      NVRAM_SIG_FREE, We need update prev's length and checksum, then write it
      to nvram, not cur's. So lets fix this mistake now.
      
      Also use memset instead of strncpy to set the partition's name. It's
      more readable if we want to fill up with duplicate chars .
      
      Fixes: fa2b4e54 ("powerpc/nvram: Improve partition removal")
      Signed-off-by: default avatarPan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      ac78e19e
    • Paul Mackerras's avatar
      powerpc/64: Fix incorrect return value from __copy_tofrom_user · 5daad2cc
      Paul Mackerras authored
      commit 1a34439e upstream.
      
      Debugging a data corruption issue with virtio-net/vhost-net led to
      the observation that __copy_tofrom_user was occasionally returning
      a value 16 larger than it should.  Since the return value from
      __copy_tofrom_user is the number of bytes not copied, this means
      that __copy_tofrom_user can occasionally return a value larger
      than the number of bytes it was asked to copy.  In turn this can
      cause higher-level copy functions such as copy_page_to_iter_iovec
      to corrupt memory by copying data into the wrong memory locations.
      
      It turns out that the failing case involves a fault on the store
      at label 79, and at that point the first unmodified byte of the
      destination is at R3 + 16.  Consequently the exception handler
      for that store needs to add 16 to R3 before using it to work out
      how many bytes were not copied, but in this one case it was not
      adding the offset to R3.  To fix it, this moves the label 179 to
      the point where we add 16 to R3.  I have checked manually all the
      exception handlers for the loads and stores in this code and the
      rest of them are correct (it would be excellent to have an
      automated test of all the exception cases).
      
      This bug has been present since this code was initially
      committed in May 2002 to Linux version 2.5.20.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      5daad2cc
    • Gavin Shan's avatar
      powerpc/powernv: Use CPU-endian PEST in pnv_pci_dump_p7ioc_diag_data() · 6d13a7b0
      Gavin Shan authored
      commit 5adaf862 upstream.
      
      This fixes the warnings reported from sparse:
      
        pci.c:312:33: warning: restricted __be64 degrades to integer
        pci.c:313:33: warning: restricted __be64 degrades to integer
      
      Fixes: cee72d5b ("powerpc/powernv: Display diag data on p7ioc EEH errors")
      Signed-off-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      6d13a7b0
    • Anton Blanchard's avatar
      powerpc/vdso64: Use double word compare on pointers · 546731aa
      Anton Blanchard authored
      commit 5045ea37 upstream.
      
      __kernel_get_syscall_map() and __kernel_clock_getres() use cmpli to
      check if the passed in pointer is non zero. cmpli maps to a 32 bit
      compare on binutils, so we ignore the top 32 bits.
      
      A simple test case can be created by passing in a bogus pointer with
      the bottom 32 bits clear. Using a clk_id that is handled by the VDSO,
      then one that is handled by the kernel shows the problem:
      
        printf("%d\n", clock_getres(CLOCK_REALTIME, (void *)0x100000000));
        printf("%d\n", clock_getres(CLOCK_BOOTTIME, (void *)0x100000000));
      
      And we get:
      
        0
        -1
      
      The bigger issue is if we pass a valid pointer with the bottom 32 bits
      clear, in this case we will return success but won't write any data
      to the pointer.
      
      I stumbled across this issue because the LLVM integrated assembler
      doesn't accept cmpli with 3 arguments. Fix this by converting them to
      cmpldi.
      
      Fixes: a7f290da ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel")
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      546731aa
    • Paul Mackerras's avatar
      powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET · b0a4c167
      Paul Mackerras authored
      commit f077aaf0 upstream.
      
      In commit c60ac569 ("powerpc: Update kernel VSID range", 2013-03-13)
      we lost a check on the region number (the top four bits of the effective
      address) for addresses below PAGE_OFFSET.  That commit replaced a check
      that the top 18 bits were all zero with a check that bits 46 - 59 were
      zero (performed for all addresses, not just user addresses).
      
      This means that userspace can access an address like 0x1000_0xxx_xxxx_xxxx
      and we will insert a valid SLB entry for it.  The VSID used will be the
      same as if the top 4 bits were 0, but the page size will be some random
      value obtained by indexing beyond the end of the mm_ctx_high_slices_psize
      array in the paca.  If that page size is the same as would be used for
      region 0, then userspace just has an alias of the region 0 space.  If the
      page size is different, then no HPTE will be found for the access, and
      the process will get a SIGSEGV (since hash_page_mm() will refuse to create
      a HPTE for the bogus address).
      
      The access beyond the end of the mm_ctx_high_slices_psize can be at most
      5.5MB past the array, and so will be in RAM somewhere.  Since the access
      is a load performed in real mode, it won't fault or crash the kernel.
      At most this bug could perhaps leak a little bit of information about
      blocks of 32 bytes of memory located at offsets of i * 512kB past the
      paca->mm_ctx_high_slices_psize array, for 1 <= i <= 11.
      
      Fixes: c60ac569 ("powerpc: Update kernel VSID range")
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      b0a4c167
    • Marcin Nowakowski's avatar
      MIPS: ptrace: Fix regs_return_value for kernel context · 4787d839
      Marcin Nowakowski authored
      commit 74f1077b upstream.
      
      Currently regs_return_value always negates reg[2] if it determines
      the syscall has failed, but when called in kernel context this check is
      invalid and may result in returning a wrong value.
      
      This fixes errors reported by CONFIG_KPROBES_SANITY_TEST
      
      Fixes: d7e7528b ("Audit: push audit success and retcode into arch ptrace.h")
      Signed-off-by: default avatarMarcin Nowakowski <marcin.nowakowski@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/14381/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      4787d839
    • Paul Burton's avatar
      MIPS: Malta: Fix IOCU disable switch read for MIPS64 · bc9f83ea
      Paul Burton authored
      commit 305723ab upstream.
      
      Malta boards used with CPU emulators feature a switch to disable use of
      an IOCU. Software has to check this switch & ignore any present IOCU if
      the switch is closed. The read used to do this was unsafe for 64 bit
      kernels, as it simply casted the address 0xbf403000 to a pointer &
      dereferenced it. Whilst in a 32 bit kernel this would access kseg1, in a
      64 bit kernel this attempts to access xuseg & results in an address
      error exception.
      
      Fix by accessing a correctly formed ckseg1 address generated using the
      CKSEG1ADDR macro.
      
      Whilst modifying this code, define the name of the register and the bit
      we care about within it, which indicates whether PCI DMA is routed to
      the IOCU or straight to DRAM. The code previously checked that bit 0 was
      also set, but the least significant 7 bits of the CONFIG_GEN0 register
      contain the value of the MReqInfo signal provided to the IOCU OCP bus,
      so singling out bit 0 makes little sense & that part of the check is
      dropped.
      Signed-off-by: default avatarPaul Burton <paul.burton@imgtec.com>
      Fixes: b6d92b4a ("MIPS: Add option to disable software I/O coherency.")
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/14187/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      bc9f83ea
    • Will Deacon's avatar
      arm64: debug: avoid resetting stepping state machine when TIF_SINGLESTEP · 83099f39
      Will Deacon authored
      commit 3a402a70 upstream.
      
      When TIF_SINGLESTEP is set for a task, the single-step state machine is
      enabled and we must take care not to reset it to the active-not-pending
      state if it is already in the active-pending state.
      
      Unfortunately, that's exactly what user_enable_single_step does, by
      unconditionally setting the SS bit in the SPSR for the current task.
      This causes failures in the GDB testsuite, where GDB ends up missing
      expected step traps if the instruction being stepped generates another
      trap, e.g. PTRACE_EVENT_FORK from an SVC instruction.
      
      This patch fixes the problem by preserving the current state of the
      stepping state machine when TIF_SINGLESTEP is set on the current thread.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: default avatarYao Qi <yao.qi@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      83099f39
    • Will Deacon's avatar
      arm64: spinlocks: implement smp_mb__before_spinlock() as smp_mb() · d65df517
      Will Deacon authored
      commit 872c63fb upstream.
      
      smp_mb__before_spinlock() is intended to upgrade a spin_lock() operation
      to a full barrier, such that prior stores are ordered with respect to
      loads and stores occuring inside the critical section.
      
      Unfortunately, the core code defines the barrier as smp_wmb(), which
      is insufficient to provide the required ordering guarantees when used in
      conjunction with our load-acquire-based spinlock implementation.
      
      This patch overrides the arm64 definition of smp_mb__before_spinlock()
      to map to a full smp_mb().
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reported-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      d65df517
    • James Hogan's avatar
      arm64: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO · 1fd5c7b6
      James Hogan authored
      commit 3146bc64 upstream.
      
      AT_VECTOR_SIZE_ARCH should be defined with the maximum number of
      NEW_AUX_ENT entries that ARCH_DLINFO can contain, but it wasn't defined
      for arm64 at all even though ARCH_DLINFO will contain one NEW_AUX_ENT
      for the VDSO address.
      
      This shouldn't be a problem as AT_VECTOR_SIZE_BASE includes space for
      AT_BASE_PLATFORM which arm64 doesn't use, but lets define it now and add
      the comment above ARCH_DLINFO as found in several other architectures to
      remind future modifiers of ARCH_DLINFO to keep AT_VECTOR_SIZE_ARCH up to
      date.
      
      Fixes: f668cd16 ("arm64: ELF definitions")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      1fd5c7b6
    • Mark Rutland's avatar
      arm64: avoid returning from bad_mode · e5471def
      Mark Rutland authored
      commit 7d9e8f71 upstream.
      
      Generally, taking an unexpected exception should be a fatal event, and
      bad_mode is intended to cater for this. However, it should be possible
      to contain unexpected synchronous exceptions from EL0 without bringing
      the kernel down, by sending a SIGILL to the task.
      
      We tried to apply this approach in commit 9955ac47 ("arm64:
      don't kill the kernel on a bad esr from el0"), by sending a signal for
      any bad_mode call resulting from an EL0 exception.
      
      However, this also applies to other unexpected exceptions, such as
      SError and FIQ. The entry paths for these exceptions branch to bad_mode
      without configuring the link register, and have no kernel_exit. Thus, if
      we take one of these exceptions from EL0, bad_mode will eventually
      return to the original user link register value.
      
      This patch fixes this by introducing a new bad_el0_sync handler to cater
      for the recoverable case, and restoring bad_mode to its original state,
      whereby it calls panic() and never returns. The recoverable case
      branches to bad_el0_sync with a bl, and returns to userspace via the
      usual ret_to_user mechanism.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Fixes: 9955ac47 ("arm64: don't kill the kernel on a bad esr from el0")
      Reported-by: default avatarMark Salter <msalter@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      e5471def
    • Russell King's avatar
      ARM: sa1111: fix pcmcia suspend/resume · eee1bdb5
      Russell King authored
      commit 06dfe5cc upstream.
      
      SA1111 PCMCIA was broken when PCMCIA switched to using dev_pm_ops for
      the PCMCIA socket class.  PCMCIA used to handle suspend/resume via the
      socket hosting device, which happened at normal device suspend/resume
      time.
      
      However, the referenced commit changed this: much of the resume now
      happens much earlier, in the noirq resume handler of dev_pm_ops.
      
      However, on SA1111, the PCMCIA device is not accessible as the SA1111
      has not been resumed at _noirq time.  It's slightly worse than that,
      because the SA1111 has already been put to sleep at _noirq time, so
      suspend doesn't work properly.
      
      Fix this by converting the core SA1111 code to use dev_pm_ops as well,
      and performing its own suspend/resume at noirq time.
      
      This fixes these errors in the kernel log:
      
      pcmcia_socket pcmcia_socket0: time out after reset
      pcmcia_socket pcmcia_socket1: time out after reset
      
      and the resulting lack of PCMCIA cards after a S2RAM cycle.
      
      Fixes: d7646f76 ("pcmcia: use dev_pm_ops for class pcmcia_socket_class")
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      eee1bdb5
    • Russell King's avatar
      ARM: sa1100: clear reset status prior to reboot · 5b4918cc
      Russell King authored
      commit da60626e upstream.
      
      Clear the current reset status prior to rebooting the platform.  This
      adds the bit missing from 04fef228 ("[ARM] pxa: introduce
      reset_status and clear_reset_status for driver's usage").
      
      Fixes: 04fef228 ("[ARM] pxa: introduce reset_status and clear_reset_status for driver's usage")
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      5b4918cc
    • Srinivas Ramana's avatar
      ARM: 8618/1: decompressor: reset ttbcr fields to use TTBR0 on ARMv7 · 1774ca81
      Srinivas Ramana authored
      commit 117e5e9c upstream.
      
      If the bootloader uses the long descriptor format and jumps to
      kernel decompressor code, TTBCR may not be in a right state.
      Before enabling the MMU, it is required to clear the TTBCR.PD0
      field to use TTBR0 for translation table walks.
      
      The commit dbece458 ("ARM: 7501/1: decompressor:
      reset ttbcr for VMSA ARMv7 cores") does the reset of TTBCR.N, but
      doesn't consider all the bits for the size of TTBCR.N.
      
      Clear TTBCR.PD0 field and reset all the three bits of TTBCR.N to
      indicate the use of TTBR0 and the correct base address width.
      
      Fixes: dbece458 ("ARM: 7501/1: decompressor: reset ttbcr for VMSA ARMv7 cores")
      Acked-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: default avatarSrinivas Ramana <sramana@codeaurora.org>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      1774ca81
    • Robin Murphy's avatar
      ARM: 8616/1: dt: Respect property size when parsing CPUs · 88654a15
      Robin Murphy authored
      commit ba6dea4f upstream.
      
      Whilst MPIDR values themselves are less than 32 bits, it is still
      perfectly valid for a DT to have #address-cells > 1 in the CPUs node,
      resulting in the "reg" property having leading zero cell(s). In that
      situation, the big-endian nature of the data conspires with the current
      behaviour of only reading the first cell to cause the kernel to think
      all CPUs have ID 0, and become resoundingly unhappy as a consequence.
      
      Take the full property length into account when parsing CPUs so as to
      be correct under any circumstances.
      
      Cc: Russell King <linux@armlinux.org.uk>
      Signed-off-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      88654a15
    • Baoquan He's avatar
      iommu/amd: Free domain id when free a domain of struct dma_ops_domain · 7a6111b8
      Baoquan He authored
      commit c3db901c upstream.
      
      The current code missed freeing domain id when free a domain of
      struct dma_ops_domain.
      Signed-off-by: default avatarBaoquan He <bhe@redhat.com>
      Fixes: ec487d1a ('x86, AMD IOMMU: add domain allocation and deallocation functions')
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      7a6111b8
    • Joerg Roedel's avatar
      iommu/amd: Update Alias-DTE in update_device_table() · 56eb0df4
      Joerg Roedel authored
      commit 3254de6b upstream.
      
      Not doing so might cause IO-Page-Faults when a device uses
      an alias request-id and the alias-dte is left in a lower
      page-mode which does not cover the address allocated from
      the iova-allocator.
      
      Fixes: 492667da ('x86/amd-iommu: Remove amd_iommu_pd_table')
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      56eb0df4
    • Michael S. Tsirkin's avatar
      x86/um: reuse asm-generic/barrier.h · 69c373d8
      Michael S. Tsirkin authored
      commit 577f183a upstream.
      
      On x86/um CONFIG_SMP is never defined.  As a result, several macros
      match the asm-generic variant exactly. Drop the local definitions and
      pull in asm-generic/barrier.h instead.
      
      This is in preparation to refactoring this code area.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarRichard Weinberger <richard@nod.at>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      69c373d8
    • H.J. Lu's avatar
      x86/build: Build compressed x86 kernels as PIE · 186c5f34
      H.J. Lu authored
      commit 6d92bc9d upstream.
      
      The 32-bit x86 assembler in binutils 2.26 will generate R_386_GOT32X
      relocation to get the symbol address in PIC.  When the compressed x86
      kernel isn't built as PIC, the linker optimizes R_386_GOT32X relocations
      to their fixed symbol addresses.  However, when the compressed x86
      kernel is loaded at a different address, it leads to the following
      load failure:
      
        Failed to allocate space for phdrs
      
      during the decompression stage.
      
      If the compressed x86 kernel is relocatable at run-time, it should be
      compiled with -fPIE, instead of -fPIC, if possible and should be built as
      Position Independent Executable (PIE) so that linker won't optimize
      R_386_GOT32X relocation to its fixed symbol address.
      
      Older linkers generate R_386_32 relocations against locally defined
      symbols, _bss, _ebss, _got and _egot, in PIE.  It isn't wrong, just less
      optimal than R_386_RELATIVE.  But the x86 kernel fails to properly handle
      R_386_32 relocations when relocating the kernel.  To generate
      R_386_RELATIVE relocations, we mark _bss, _ebss, _got and _egot as
      hidden in both 32-bit and 64-bit x86 kernels.
      
      To build a 64-bit compressed x86 kernel as PIE, we need to disable the
      relocation overflow check to avoid relocation overflow errors. We do
      this with a new linker command-line option, -z noreloc-overflow, which
      got added recently:
      
       commit 4c10bbaa0912742322f10d9d5bb630ba4e15dfa7
       Author: H.J. Lu <hjl.tools@gmail.com>
       Date:   Tue Mar 15 11:07:06 2016 -0700
      
          Add -z noreloc-overflow option to x86-64 ld
      
          Add -z noreloc-overflow command-line option to the x86-64 ELF linker to
          disable relocation overflow check.  This can be used to avoid relocation
          overflow check if there will be no dynamic relocation overflow at
          run-time.
      
      The 64-bit compressed x86 kernel is built as PIE only if the linker supports
      -z noreloc-overflow.  So far 64-bit relocatable compressed x86 kernel
      boots fine even when it is built as a normal executable.
      Signed-off-by: default avatarH.J. Lu <hjl.tools@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      [ Edited the changelog and comments. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      186c5f34
    • Steven Rostedt's avatar
      x86/paravirt: Do not trace _paravirt_ident_*() functions · 6523fa8c
      Steven Rostedt authored
      commit 15301a57 upstream.
      
      Łukasz Daniluk reported that on a RHEL kernel that his machine would lock up
      after enabling function tracer. I asked him to bisect the functions within
      available_filter_functions, which he did and it came down to three:
      
        _paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64()
      
      It was found that this is only an issue when noreplace-paravirt is added
      to the kernel command line.
      
      This means that those functions are most likely called within critical
      sections of the funtion tracer, and must not be traced.
      
      In newer kenels _paravirt_nop() is defined within gcc asm(), and is no
      longer an issue.  But both _paravirt_ident_{32,64}() causes the
      following splat when they are traced:
      
       mm/pgtable-generic.c:33: bad pmd ffff8800d2435150(0000000001d00054)
       mm/pgtable-generic.c:33: bad pmd ffff8800d3624190(0000000001d00070)
       mm/pgtable-generic.c:33: bad pmd ffff8800d36a5110(0000000001d00054)
       mm/pgtable-generic.c:33: bad pmd ffff880118eb1450(0000000001d00054)
       NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469]
       Modules linked in: e1000e
       CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
       task: ffff880118f740c0 ti: ffff8800d4aec000 task.ti: ffff8800d4aec000
       RIP: 0010:[<ffffffff81134148>]  [<ffffffff81134148>] queued_spin_lock_slowpath+0x118/0x1a0
       RSP: 0018:ffff8800d4aefb90  EFLAGS: 00000246
       RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011eb16d40
       RDX: ffffffff82485760 RSI: 000000001f288820 RDI: ffffea0000008030
       RBP: ffff8800d4aefb90 R08: 00000000000c0000 R09: 0000000000000000
       R10: ffffffff821c8e0e R11: 0000000000000000 R12: ffff880000200fb8
       R13: 00007f7a4e3f7000 R14: ffffea000303f600 R15: ffff8800d4b562e0
       FS:  00007f7a4e3d7840(0000) GS:ffff88011eb00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f7a4e3f7000 CR3: 00000000d3e71000 CR4: 00000000001406e0
       Call Trace:
         _raw_spin_lock+0x27/0x30
         handle_pte_fault+0x13db/0x16b0
         handle_mm_fault+0x312/0x670
         __do_page_fault+0x1b1/0x4e0
         do_page_fault+0x22/0x30
         page_fault+0x28/0x30
         __vfs_read+0x28/0xe0
         vfs_read+0x86/0x130
         SyS_read+0x46/0xa0
         entry_SYSCALL_64_fastpath+0x1e/0xa8
       Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b
      Reported-by: default avatarŁukasz Daniluk <lukasz.daniluk@intel.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      6523fa8c
    • Jiri Kosina's avatar
      x86/mm/pat, /dev/mem: Remove superfluous error message · 1eae225e
      Jiri Kosina authored
      commit 39380b80 upstream.
      
      Currently it's possible for broken (or malicious) userspace to flood a
      kernel log indefinitely with messages a-la
      
      	Program dmidecode tried to access /dev/mem between f0000->100000
      
      because range_is_allowed() is case of CONFIG_STRICT_DEVMEM being turned on
      dumps this information each and every time devmem_is_allowed() fails.
      
      Reportedly userspace that is able to trigger contignuous flow of these
      messages exists.
      
      It would be possible to rate limit this message, but that'd have a
      questionable value; the administrator wouldn't get information about all
      the failing accessess, so then the information would be both superfluous
      and incomplete at the same time :)
      
      Returning EPERM (which is what is actually happening) is enough indication
      for userspace what has happened; no need to log this particular error as
      some sort of special condition.
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1607081137020.24757@cbobk.fhfr.pmSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      1eae225e
    • Wanpeng Li's avatar
      x86/apic: Do not init irq remapping if ioapic is disabled · 928a2775
      Wanpeng Li authored
      commit 2e63ad4b upstream.
      
      native_smp_prepare_cpus
        -> default_setup_apic_routing
          -> enable_IR_x2apic
            -> irq_remapping_prepare
              -> intel_prepare_irq_remapping
                -> intel_setup_irq_remapping
      
      So IR table is setup even if "noapic" boot parameter is added. As a result we
      crash later when the interrupt affinity is set due to a half initialized
      remapping infrastructure.
      
      Prevent remap initialization when IOAPIC is disabled.
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Link: http://lkml.kernel.org/r/1471954039-3942-1-git-send-email-wanpeng.li@hotmail.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      928a2775
    • Sebastian Andrzej Siewior's avatar
      x86/mm: Disable preemption during CR3 read+write · b591901e
      Sebastian Andrzej Siewior authored
      commit 5cf0791d upstream.
      
      There's a subtle preemption race on UP kernels:
      
      Usually current->mm (and therefore mm->pgd) stays the same during the
      lifetime of a task so it does not matter if a task gets preempted during
      the read and write of the CR3.
      
      But then, there is this scenario on x86-UP:
      
      TaskA is in do_exit() and exit_mm() sets current->mm = NULL followed by:
      
       -> mmput()
       -> exit_mmap()
       -> tlb_finish_mmu()
       -> tlb_flush_mmu()
       -> tlb_flush_mmu_tlbonly()
       -> tlb_flush()
       -> flush_tlb_mm_range()
       -> __flush_tlb_up()
       -> __flush_tlb()
       ->  __native_flush_tlb()
      
      At this point current->mm is NULL but current->active_mm still points to
      the "old" mm.
      
      Let's preempt taskA _after_ native_read_cr3() by taskB. TaskB has its
      own mm so CR3 has changed.
      
      Now preempt back to taskA. TaskA has no ->mm set so it borrows taskB's
      mm and so CR3 remains unchanged. Once taskA gets active it continues
      where it was interrupted and that means it writes its old CR3 value
      back. Everything is fine because userland won't need its memory
      anymore.
      
      Now the fun part:
      
      Let's preempt taskA one more time and get back to taskB. This
      time switch_mm() won't do a thing because oldmm (->active_mm)
      is the same as mm (as per context_switch()). So we remain
      with a bad CR3 / PGD and return to userland.
      
      The next thing that happens is handle_mm_fault() with an address for
      the execution of its code in userland. handle_mm_fault() realizes that
      it has a PTE with proper rights so it returns doing nothing. But the
      CPU looks at the wrong PGD and insists that something is wrong and
      faults again. And again. And one more time…
      
      This pagefault circle continues until the scheduler gets tired of it and
      puts another task on the CPU. It gets little difficult if the task is a
      RT task with a high priority. The system will either freeze or it gets
      fixed by the software watchdog thread which usually runs at RT-max prio.
      But waiting for the watchdog will increase the latency of the RT task
      which is no good.
      
      Fix this by disabling preemption across the critical code section.
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1470404259-26290-1-git-send-email-bigeasy@linutronix.de
      [ Prettified the changelog. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      b591901e
    • Andy Lutomirski's avatar
      x86/traps: Ignore high word of regs->cs in early_idt_handler_common · 213060d9
      Andy Lutomirski authored
      This is a backport of:
      commit fc0e81b2 upstream
      
      On the 80486 DX, it seems that some exceptions may leave garbage in
      the high bits of CS.  This causes sporadic failures in which
      early_fixup_exception() refuses to fix up an exception.
      
      As far as I can tell, this has been buggy for a long time, but the
      problem seems to have been exacerbated by commits:
      
        1e02ce4c ("x86: Store a per-cpu shadow copy of CR4")
        e1bfc11c ("x86/init: Fix cr4_init_shadow() on CR4-less machines")
      
      This appears to have broken for as long as we've had early
      exception handling.
      
      [ This backport should apply to kernels from 3.4 - 4.5. ]
      
      Fixes: 4c5023a3 ("x86-32: Handle exception table entries during early boot")
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: stable@vger.kernel.org
      Reported-by: default avatarMatthew Whitehead <tedheadster@gmail.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      213060d9
    • Juergen Gross's avatar
      x86/xen: fix upper bound of pmd loop in xen_cleanhighmap() · 506750b7
      Juergen Gross authored
      commit 1cf38741 upstream.
      
      xen_cleanhighmap() is operating on level2_kernel_pgt only. The upper
      bound of the loop setting non-kernel-image entries to zero should not
      exceed the size of level2_kernel_pgt.
      Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      506750b7
    • Ben Hutchings's avatar
      xen-pciback: Add name prefix to global 'permissive' variable · f014225a
      Ben Hutchings authored
      commit 8014bcc8 upstream.
      
      The variable for the 'permissive' module parameter used to be static
      but was recently changed to be extern.  This puts it in the kernel
      global namespace if the driver is built-in, so its name should begin
      with a prefix identifying the driver.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Fixes: af6fc858 ("xen-pciback: limit guest control of command register")
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      f014225a
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set. · 1d9d642c
      Konrad Rzeszutek Wilk authored
      commit 408fb0e5 upstream.
      
      commit f598282f ("PCI: Fix the NIU MSI-X problem in a better way")
      teaches us that dealing with MSI-X can be troublesome.
      
      Further checks in the MSI-X architecture shows that if the
      PCI_COMMAND_MEMORY bit is turned of in the PCI_COMMAND we
      may not be able to access the BAR (since they are memory regions).
      
      Since the MSI-X tables are located in there.. that can lead
      to us causing PCIe errors. Inhibit us performing any
      operation on the MSI-X unless the MEMORY bit is set.
      
      Note that Xen hypervisor with:
      "x86/MSI-X: access MSI-X table only after having enabled MSI-X"
      will return:
      xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3!
      
      When the generic MSI code tries to setup the PIRQ without
      MEMORY bit set. Which means with later versions of Xen
      (4.6) this patch is not neccessary.
      
      This is part of XSA-157
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      1d9d642c
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled. · 4c2e2fd2
      Konrad Rzeszutek Wilk authored
      commit 7cfb905b upstream.
      
      Otherwise just continue on, returning the same values as
      previously (return of 0, and op->result has the PIRQ value).
      
      This does not change the behavior of XEN_PCI_OP_disable_msi[|x].
      
      The pci_disable_msi or pci_disable_msix have the checks for
      msi_enabled or msix_enabled so they will error out immediately.
      
      However the guest can still call these operations and cause
      us to disable the 'ack_intr'. That means the backend IRQ handler
      for the legacy interrupt will not respond to interrupts anymore.
      
      This will lead to (if the device is causing an interrupt storm)
      for the Linux generic code to disable the interrupt line.
      
      Naturally this will only happen if the device in question
      is plugged in on the motherboard on shared level interrupt GSI.
      
      This is part of XSA-157
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      4c2e2fd2
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Do not install an IRQ handler for MSI interrupts. · 5701f8a1
      Konrad Rzeszutek Wilk authored
      commit a396f3a2 upstream.
      
      Otherwise an guest can subvert the generic MSI code to trigger
      an BUG_ON condition during MSI interrupt freeing:
      
       for (i = 0; i < entry->nvec_used; i++)
              BUG_ON(irq_has_action(entry->irq + i));
      
      Xen PCI backed installs an IRQ handler (request_irq) for
      the dev->irq whenever the guest writes PCI_COMMAND_MEMORY
      (or PCI_COMMAND_IO) to the PCI_COMMAND register. This is
      done in case the device has legacy interrupts the GSI line
      is shared by the backend devices.
      
      To subvert the backend the guest needs to make the backend
      to change the dev->irq from the GSI to the MSI interrupt line,
      make the backend allocate an interrupt handler, and then command
      the backend to free the MSI interrupt and hit the BUG_ON.
      
      Since the backend only calls 'request_irq' when the guest
      writes to the PCI_COMMAND register the guest needs to call
      XEN_PCI_OP_enable_msi before any other operation. This will
      cause the generic MSI code to setup an MSI entry and
      populate dev->irq with the new PIRQ value.
      
      Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY
      and cause the backend to setup an IRQ handler for dev->irq
      (which instead of the GSI value has the MSI pirq). See
      'xen_pcibk_control_isr'.
      
      Then the guest disables the MSI: XEN_PCI_OP_disable_msi
      which ends up triggering the BUG_ON condition in 'free_msi_irqs'
      as there is an IRQ handler for the entry->irq (dev->irq).
      
      Note that this cannot be done using MSI-X as the generic
      code does not over-write dev->irq with the MSI-X PIRQ values.
      
      The patch inhibits setting up the IRQ handler if MSI or
      MSI-X (for symmetry reasons) code had been called successfully.
      
      P.S.
      Xen PCIBack when it sets up the device for the guest consumption
      ends up writting 0 to the PCI_COMMAND (see xen_pcibk_reset_device).
      XSA-120 addendum patch removed that - however when upstreaming said
      addendum we found that it caused issues with qemu upstream. That
      has now been fixed in qemu upstream.
      
      This is part of XSA-157
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      5701f8a1
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled · d2dfdbab
      Konrad Rzeszutek Wilk authored
      commit 5e0ce145 upstream.
      
      The guest sequence of:
      
        a) XEN_PCI_OP_enable_msix
        b) XEN_PCI_OP_enable_msix
      
      results in hitting an NULL pointer due to using freed pointers.
      
      The device passed in the guest MUST have MSI-X capability.
      
      The a) constructs and SysFS representation of MSI and MSI groups.
      The b) adds a second set of them but adding in to SysFS fails (duplicate entry).
      'populate_msi_sysfs' frees the newly allocated msi_irq_groups (note that
      in a) pdev->msi_irq_groups is still set) and also free's ALL of the
      MSI-X entries of the device (the ones allocated in step a) and b)).
      
      The unwind code: 'free_msi_irqs' deletes all the entries and tries to
      delete the pdev->msi_irq_groups (which hasn't been set to NULL).
      However the pointers in the SysFS are already freed and we hit an
      NULL pointer further on when 'strlen' is attempted on a freed pointer.
      
      The patch adds a simple check in the XEN_PCI_OP_enable_msix to guard
      against that. The check for msi_enabled is not stricly neccessary.
      
      This is part of XSA-157
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      d2dfdbab
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled · 871e37be
      Konrad Rzeszutek Wilk authored
      commit 56441f3c upstream.
      
      The guest sequence of:
      
       a) XEN_PCI_OP_enable_msi
       b) XEN_PCI_OP_enable_msi
       c) XEN_PCI_OP_disable_msi
      
      results in hitting an BUG_ON condition in the msi.c code.
      
      The MSI code uses an dev->msi_list to which it adds MSI entries.
      Under the above conditions an BUG_ON() can be hit. The device
      passed in the guest MUST have MSI capability.
      
      The a) adds the entry to the dev->msi_list and sets msi_enabled.
      The b) adds a second entry but adding in to SysFS fails (duplicate entry)
      and deletes all of the entries from msi_list and returns (with msi_enabled
      is still set).  c) pci_disable_msi passes the msi_enabled checks and hits:
      
      BUG_ON(list_empty(dev_to_msi_list(&dev->dev)));
      
      and blows up.
      
      The patch adds a simple check in the XEN_PCI_OP_enable_msi to guard
      against that. The check for msix_enabled is not stricly neccessary.
      
      This is part of XSA-157.
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      871e37be
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Save the number of MSI-X entries to be copied later. · 28018142
      Konrad Rzeszutek Wilk authored
      commit d159457b upstream.
      
      Commit 8135cf8b (xen/pciback: Save
      xen_pci_op commands before processing it) broke enabling MSI-X because
      it would never copy the resulting vectors into the response.  The
      number of vectors requested was being overwritten by the return value
      (typically zero for success).
      
      Save the number of vectors before processing the op, so the correct
      number of vectors are copied afterwards.
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: <stable@vger.kernel.org>
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      28018142