1. 21 Jul, 2017 40 commits
    • Eric W. Biederman's avatar
      mnt: Make propagate_umount less slow for overlapping mount propagation trees · 32582402
      Eric W. Biederman authored
      commit 296990de upstream.
      
      Andrei Vagin pointed out that time to executue propagate_umount can go
      non-linear (and take a ludicrious amount of time) when the mount
      propogation trees of the mounts to be unmunted by a lazy unmount
      overlap.
      
      Make the walk of the mount propagation trees nearly linear by
      remembering which mounts have already been visited, allowing
      subsequent walks to detect when walking a mount propgation tree or a
      subtree of a mount propgation tree would be duplicate work and to skip
      them entirely.
      
      Walk the list of mounts whose propgatation trees need to be traversed
      from the mount highest in the mount tree to mounts lower in the mount
      tree so that odds are higher that the code will walk the largest trees
      first, allowing later tree walks to be skipped entirely.
      
      Add cleanup_umount_visitation to remover the code's memory of which
      mounts have been visited.
      
      Add the functions last_slave and skip_propagation_subtree to allow
      skipping appropriate parts of the mount propagation tree without
      needing to change the logic of the rest of the code.
      
      A script to generate overlapping mount propagation trees:
      
      $ cat runs.h
      set -e
      mount -t tmpfs zdtm /mnt
      mkdir -p /mnt/1 /mnt/2
      mount -t tmpfs zdtm /mnt/1
      mount --make-shared /mnt/1
      mkdir /mnt/1/1
      
      iteration=10
      if [ -n "$1" ] ; then
      	iteration=$1
      fi
      
      for i in $(seq $iteration); do
      	mount --bind /mnt/1/1 /mnt/1/1
      done
      
      mount --rbind /mnt/1 /mnt/2
      
      TIMEFORMAT='%Rs'
      nr=$(( ( 2 ** ( $iteration + 1 ) ) + 1 ))
      echo -n "umount -l /mnt/1 -> $nr        "
      time umount -l /mnt/1
      
      nr=$(cat /proc/self/mountinfo | grep zdtm | wc -l )
      time umount -l /mnt/2
      
      $ for i in $(seq 9 19); do echo $i; unshare -Urm bash ./run.sh $i; done
      
      Here are the performance numbers with and without the patch:
      
           mhash |  8192   |  8192  | 1048576 | 1048576
          mounts | before  | after  |  before | after
          ------------------------------------------------
            1025 |  0.040s | 0.016s |  0.038s | 0.019s
            2049 |  0.094s | 0.017s |  0.080s | 0.018s
            4097 |  0.243s | 0.019s |  0.206s | 0.023s
            8193 |  1.202s | 0.028s |  1.562s | 0.032s
           16385 |  9.635s | 0.036s |  9.952s | 0.041s
           32769 | 60.928s | 0.063s | 44.321s | 0.064s
           65537 |         | 0.097s |         | 0.097s
          131073 |         | 0.233s |         | 0.176s
          262145 |         | 0.653s |         | 0.344s
          524289 |         | 2.305s |         | 0.735s
         1048577 |         | 7.107s |         | 2.603s
      
      Andrei Vagin reports fixing the performance problem is part of the
      work to fix CVE-2016-6213.
      
      Fixes: a05964f3 ("[PATCH] shared mounts handling: umount")
      Reported-by: default avatarAndrei Vagin <avagin@openvz.org>
      Reviewed-by: default avatarAndrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      32582402
    • Eric W. Biederman's avatar
      mnt: In propgate_umount handle visiting mounts in any order · 81311d16
      Eric W. Biederman authored
      commit 99b19d16 upstream.
      
      While investigating some poor umount performance I realized that in
      the case of overlapping mount trees where some of the mounts are locked
      the code has been failing to unmount all of the mounts it should
      have been unmounting.
      
      This failure to unmount all of the necessary
      mounts can be reproduced with:
      
      $ cat locked_mounts_test.sh
      
      mount -t tmpfs test-base /mnt
      mount --make-shared /mnt
      mkdir -p /mnt/b
      
      mount -t tmpfs test1 /mnt/b
      mount --make-shared /mnt/b
      mkdir -p /mnt/b/10
      
      mount -t tmpfs test2 /mnt/b/10
      mount --make-shared /mnt/b/10
      mkdir -p /mnt/b/10/20
      
      mount --rbind /mnt/b /mnt/b/10/20
      
      unshare -Urm --propagation unchaged /bin/sh -c 'sleep 5; if [ $(grep test /proc/self/mountinfo | wc -l) -eq 1 ] ; then echo SUCCESS ; else echo FAILURE ; fi'
      sleep 1
      umount -l /mnt/b
      wait %%
      
      $ unshare -Urm ./locked_mounts_test.sh
      
      This failure is corrected by removing the prepass that marks mounts
      that may be umounted.
      
      A first pass is added that umounts mounts if possible and if not sets
      mount mark if they could be unmounted if they weren't locked and adds
      them to a list to umount possibilities.  This first pass reconsiders
      the mounts parent if it is on the list of umount possibilities, ensuring
      that information of umoutability will pass from child to mount parent.
      
      A second pass then walks through all mounts that are umounted and processes
      their children unmounting them or marking them for reparenting.
      
      A last pass cleans up the state on the mounts that could not be umounted
      and if applicable reparents them to their first parent that remained
      mounted.
      
      While a bit longer than the old code this code is much more robust
      as it allows information to flow up from the leaves and down
      from the trunk making the order in which mounts are encountered
      in the umount propgation tree irrelevant.
      
      Fixes: 0c56fe31 ("mnt: Don't propagate unmounts to locked mounts")
      Reviewed-by: default avatarAndrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      81311d16
    • Eric W. Biederman's avatar
      mnt: In umount propagation reparent in a separate pass · 26502fb8
      Eric W. Biederman authored
      commit 570487d3 upstream.
      
      It was observed that in some pathlogical cases that the current code
      does not unmount everything it should.  After investigation it
      was determined that the issue is that mnt_change_mntpoint can
      can change which mounts are available to be unmounted during mount
      propagation which is wrong.
      
      The trivial reproducer is:
      $ cat ./pathological.sh
      
      mount -t tmpfs test-base /mnt
      cd /mnt
      mkdir 1 2 1/1
      mount --bind 1 1
      mount --make-shared 1
      mount --bind 1 2
      mount --bind 1/1 1/1
      mount --bind 1/1 1/1
      echo
      grep test-base /proc/self/mountinfo
      umount 1/1
      echo
      grep test-base /proc/self/mountinfo
      
      $ unshare -Urm ./pathological.sh
      
      The expected output looks like:
      46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000
      47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      
      46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000
      47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      
      The output without the fix looks like:
      46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000
      47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      
      46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000
      47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      52 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
      
      That last mount in the output was in the propgation tree to be unmounted but
      was missed because the mnt_change_mountpoint changed it's parent before the walk
      through the mount propagation tree observed it.
      
      Fixes: 1064f874 ("mnt: Tuck mounts under others instead of creating shadow/side mounts.")
      Acked-by: default avatarAndrei Vagin <avagin@virtuozzo.com>
      Reviewed-by: default avatarRam Pai <linuxram@us.ibm.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      26502fb8
    • Michael Kelley's avatar
      Drivers: hv: vmbus: Close timing hole that can corrupt per-cpu page · 2b71d293
      Michael Kelley authored
      commit 13b9abfc upstream.
      
      Extend the disabling of preemption to include the hypercall so that
      another thread can't get the CPU and corrupt the per-cpu page used
      for hypercall arguments.
      Signed-off-by: default avatarMichael Kelley <mikelley@microsoft.com>
      Signed-off-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b71d293
    • Johan Hovold's avatar
      nvmem: core: fix leaks on registration errors · 6e559d09
      Johan Hovold authored
      commit 3360acdf upstream.
      
      Make sure to deregister and release the nvmem device and underlying
      memory on registration errors.
      
      Note that the private data must be freed using put_device() once the
      struct device has been initialised.
      
      Also note that there's a related reference leak in the deregistration
      function as reported by Mika Westerberg which is being fixed separately.
      
      Fixes: b6c217ab ("nvmem: Add backwards compatibility support for older EEPROM drivers.")
      Fixes: eace75cf ("nvmem: Add a simple NVMEM framework for nvmem providers")
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Acked-by: default avatarAndrey Smirnov <andrew.smirnov@gmail.com>
      Signed-off-by: default avatarSrinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6e559d09
    • Paul E. McKenney's avatar
      rcu: Add memory barriers for NOCB leader wakeup · c5e9bfe6
      Paul E. McKenney authored
      commit 6b5fc3a1 upstream.
      
      Wait/wakeup operations do not guarantee ordering on their own.  Instead,
      either locking or memory barriers are required.  This commit therefore
      adds memory barriers to wake_nocb_leader() and nocb_leader_wait().
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: default avatarKrister Johansen <kjlx@templeofstupid.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c5e9bfe6
    • Adam Borowski's avatar
      vt: fix unchecked __put_user() in tioclinux ioctls · 101cbdea
      Adam Borowski authored
      commit 6987dc8a upstream.
      
      Only read access is checked before this call.
      
      Actually, at the moment this is not an issue, as every in-tree arch does
      the same manual checks for VERIFY_READ vs VERIFY_WRITE, relying on the MMU
      to tell them apart, but this wasn't the case in the past and may happen
      again on some odd arch in the future.
      
      If anyone cares about 3.7 and earlier, this is a security hole (untested)
      on real 80386 CPUs.
      Signed-off-by: default avatarAdam Borowski <kilobyte@angband.pl>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      101cbdea
    • Dong Bo's avatar
      arm64: Preventing READ_IMPLIES_EXEC propagation · e6eed40f
      Dong Bo authored
      commit 48f99c8e upstream.
      
      Like arch/arm/, we inherit the READ_IMPLIES_EXEC personality flag across
      fork(). This is undesirable for a number of reasons:
      
        * ELF files that don't require executable stack can end up with it
          anyway
      
        * We end up performing un-necessary I-cache maintenance when mapping
          what should be non-executable pages
      
        * Restricting what is executable is generally desirable when defending
          against overflow attacks
      
      This patch clears the personality flag when setting up the personality for
      newly spwaned native tasks. Given that semi-recent AArch64 toolchains emit
      a non-executable PT_GNU_STACK header, userspace applications can already
      not rely on READ_IMPLIES_EXEC so shouldn't be adversely affected by this
      change.
      Reported-by: default avatarPeter Maydell <peter.maydell@linaro.org>
      Signed-off-by: default avatarDong Bo <dongbo4@huawei.com>
      [will: added comment to compat code, rewrote commit message]
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6eed40f
    • Marc Zyngier's avatar
      ARM64: dts: marvell: armada37xx: Fix timer interrupt specifiers · dc1974ee
      Marc Zyngier authored
      commit 88cda007 upstream.
      
      Contrary to popular belief, PPIs connected to a GICv3 to not have
      an affinity field similar to that of GICv2. That is consistent
      with the fact that GICv3 is designed to accomodate thousands of
      CPUs, and fitting them as a bitmap in a byte is... difficult.
      
      Fixes: adbc3695 ("arm64: dts: add the Marvell Armada 3700 family and a development board")
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dc1974ee
    • Balbir Singh's avatar
      powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR · 4f43bc71
      Balbir Singh authored
      commit 1e2a516e upstream.
      
      This patch fixes a crash seen while doing a kexec from radix mode to
      hash mode. Key 0 is special in hash and used in the RPN by default, we
      set the key values to 0 today. In radix mode key 0 is used to control
      supervisor<->user access. In hash key 0 is used by default, so the
      first instruction after the switch causes a crash on kexec.
      
      Commit 3b10d009 ("powerpc/mm/radix: Prevent kernel execution of
      user space") introduced the setting of IAMR and AMOR values to prevent
      execution of user mode instructions from supervisor mode. We need to
      clean up these SPR's on kexec.
      
      Fixes: 3b10d009 ("powerpc/mm/radix: Prevent kernel execution of user space")
      Reported-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f43bc71
    • Kees Cook's avatar
      exec: Limit arg stack to at most 75% of _STK_LIM · c1152f16
      Kees Cook authored
      commit da029c11 upstream.
      
      To avoid pathological stack usage or the need to special-case setuid
      execs, just limit all arg stack usage to at most 75% of _STK_LIM (6MB).
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c1152f16
    • Kees Cook's avatar
      s390: reduce ELF_ET_DYN_BASE · a2ea4661
      Kees Cook authored
      commit a73dc537 upstream.
      
      Now that explicitly executed loaders are loaded in the mmap region, we
      have more freedom to decide where we position PIE binaries in the
      address space to avoid possible collisions with mmap or stack regions.
      
      For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit
      address space for 32-bit pointers.  On 32-bit use 4MB, which is the
      traditional x86 minimum load location, likely to avoid historically
      requiring a 4MB page table entry when only a portion of the first 4MB
      would be used (since the NULL address is avoided).  For s390 the
      position could be 0x10000, but that is needlessly close to the NULL
      address.
      
      Link: http://lkml.kernel.org/r/1498154792-49952-5-git-send-email-keescook@chromium.orgSigned-off-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a2ea4661
    • Kees Cook's avatar
      powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB · 2aac396b
      Kees Cook authored
      commit 47ebb09d upstream.
      
      Now that explicitly executed loaders are loaded in the mmap region, we
      have more freedom to decide where we position PIE binaries in the
      address space to avoid possible collisions with mmap or stack regions.
      
      For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit
      address space for 32-bit pointers.  On 32-bit use 4MB, which is the
      traditional x86 minimum load location, likely to avoid historically
      requiring a 4MB page table entry when only a portion of the first 4MB
      would be used (since the NULL address is avoided).
      
      Link: http://lkml.kernel.org/r/1498154792-49952-4-git-send-email-keescook@chromium.orgSigned-off-by: default avatarKees Cook <keescook@chromium.org>
      Tested-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Acked-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2aac396b
    • Kees Cook's avatar
      arm64: move ELF_ET_DYN_BASE to 4GB / 4MB · 465104c1
      Kees Cook authored
      commit 02445990 upstream.
      
      Now that explicitly executed loaders are loaded in the mmap region, we
      have more freedom to decide where we position PIE binaries in the
      address space to avoid possible collisions with mmap or stack regions.
      
      For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit
      address space for 32-bit pointers.  On 32-bit use 4MB, to match ARM.
      This could be 0x8000, the standard ET_EXEC load address, but that is
      needlessly close to the NULL address, and anyone running arm compat PIE
      will have an MMU, so the tight mapping is not needed.
      
      Link: http://lkml.kernel.org/r/1498251600-132458-4-git-send-email-keescook@chromium.orgSigned-off-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      465104c1
    • Kees Cook's avatar
      arm: move ELF_ET_DYN_BASE to 4MB · 83f266e8
      Kees Cook authored
      commit 6a9af90a upstream.
      
      Now that explicitly executed loaders are loaded in the mmap region, we
      have more freedom to decide where we position PIE binaries in the
      address space to avoid possible collisions with mmap or stack regions.
      
      4MB is chosen here mainly to have parity with x86, where this is the
      traditional minimum load location, likely to avoid historically
      requiring a 4MB page table entry when only a portion of the first 4MB
      would be used (since the NULL address is avoided).
      
      For ARM the position could be 0x8000, the standard ET_EXEC load address,
      but that is needlessly close to the NULL address, and anyone running PIE
      on 32-bit ARM will have an MMU, so the tight mapping is not needed.
      
      Link: http://lkml.kernel.org/r/1498154792-49952-2-git-send-email-keescook@chromium.orgSigned-off-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      83f266e8
    • Kees Cook's avatar
      binfmt_elf: use ELF_ET_DYN_BASE only for PIE · 0c9fd20c
      Kees Cook authored
      commit eab09532 upstream.
      
      The ELF_ET_DYN_BASE position was originally intended to keep loaders
      away from ET_EXEC binaries.  (For example, running "/lib/ld-linux.so.2
      /bin/cat" might cause the subsequent load of /bin/cat into where the
      loader had been loaded.)
      
      With the advent of PIE (ET_DYN binaries with an INTERP Program Header),
      ELF_ET_DYN_BASE continued to be used since the kernel was only looking
      at ET_DYN.  However, since ELF_ET_DYN_BASE is traditionally set at the
      top 1/3rd of the TASK_SIZE, a substantial portion of the address space
      is unused.
      
      For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs are
      loaded above the mmap region.  This means they can be made to collide
      (CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with
      pathological stack regions.
      
      Lowering ELF_ET_DYN_BASE solves both by moving programs below the mmap
      region in all cases, and will now additionally avoid programs falling
      back to the mmap region by enforcing MAP_FIXED for program loads (i.e.
      if it would have collided with the stack, now it will fail to load
      instead of falling back to the mmap region).
      
      To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP)
      are loaded into the mmap region, leaving space available for either an
      ET_EXEC binary with a fixed location or PIE being loaded into mmap by
      the loader.  Only PIE programs are loaded offset from ELF_ET_DYN_BASE,
      which means architectures can now safely lower their values without risk
      of loaders colliding with their subsequently loaded programs.
      
      For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to use
      the entire 32-bit address space for 32-bit pointers.
      
      Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and
      suggestions on how to implement this solution.
      
      Fixes: d1fd836d ("mm: split ET_DYN ASLR from mmap ASLR")
      Link: http://lkml.kernel.org/r/20170621173201.GA114489@beastSigned-off-by: default avatarKees Cook <keescook@chromium.org>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c9fd20c
    • Cyril Bur's avatar
      checkpatch: silence perl 5.26.0 unescaped left brace warnings · 93b3ced9
      Cyril Bur authored
      commit 8d81ae05 upstream.
      
      As of perl 5, version 26, subversion 0 (v5.26.0) some new warnings have
      occurred when running checkpatch.
      
      Unescaped left brace in regex is deprecated here (and will be fatal in
      Perl 5.30), passed through in regex; marked by <-- HERE in m/^(.\s*){
      <-- HERE \s*/ at scripts/checkpatch.pl line 3544.
      
      Unescaped left brace in regex is deprecated here (and will be fatal in
      Perl 5.30), passed through in regex; marked by <-- HERE in m/^(.\s*){
      <-- HERE \s*/ at scripts/checkpatch.pl line 3885.
      
      Unescaped left brace in regex is deprecated here (and will be fatal in
      Perl 5.30), passed through in regex; marked by <-- HERE in
      m/^(\+.*(?:do|\))){ <-- HERE / at scripts/checkpatch.pl line 4374.
      
      It seems perfectly reasonable to do as the warning suggests and simply
      escape the left brace in these three locations.
      
      Link: http://lkml.kernel.org/r/20170607060135.17384-1-cyrilbur@gmail.comSigned-off-by: default avatarCyril Bur <cyrilbur@gmail.com>
      Acked-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      93b3ced9
    • Sahitya Tummala's avatar
      fs/dcache.c: fix spin lockup issue on nlru->lock · 2ac2a118
      Sahitya Tummala authored
      commit b17c070f upstream.
      
      __list_lru_walk_one() acquires nlru spin lock (nlru->lock) for longer
      duration if there are more number of items in the lru list.  As per the
      current code, it can hold the spin lock for upto maximum UINT_MAX
      entries at a time.  So if there are more number of items in the lru
      list, then "BUG: spinlock lockup suspected" is observed in the below
      path:
      
        spin_bug+0x90
        do_raw_spin_lock+0xfc
        _raw_spin_lock+0x28
        list_lru_add+0x28
        dput+0x1c8
        path_put+0x20
        terminate_walk+0x3c
        path_lookupat+0x100
        filename_lookup+0x6c
        user_path_at_empty+0x54
        SyS_faccessat+0xd0
        el0_svc_naked+0x24
      
      This nlru->lock is acquired by another CPU in this path -
      
        d_lru_shrink_move+0x34
        dentry_lru_isolate_shrink+0x48
        __list_lru_walk_one.isra.10+0x94
        list_lru_walk_node+0x40
        shrink_dcache_sb+0x60
        do_remount_sb+0xbc
        do_emergency_remount+0xb0
        process_one_work+0x228
        worker_thread+0x2e0
        kthread+0xf4
        ret_from_fork+0x10
      
      Fix this lockup by reducing the number of entries to be shrinked from
      the lru list to 1024 at once.  Also, add cond_resched() before
      processing the lru list again.
      
      Link: http://marc.info/?t=149722864900001&r=1&w=2
      Link: http://lkml.kernel.org/r/1498707575-2472-1-git-send-email-stummala@codeaurora.orgSigned-off-by: default avatarSahitya Tummala <stummala@codeaurora.org>
      Suggested-by: default avatarJan Kara <jack@suse.cz>
      Suggested-by: default avatarVladimir Davydov <vdavydov.dev@gmail.com>
      Acked-by: default avatarVladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Alexander Polakov <apolyakov@beget.ru>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ac2a118
    • Sahitya Tummala's avatar
      mm/list_lru.c: fix list_lru_count_node() to be race free · 9fa881d9
      Sahitya Tummala authored
      commit 2c80cd57 upstream.
      
      list_lru_count_node() iterates over all memcgs to get the total number of
      entries on the node but it can race with memcg_drain_all_list_lrus(),
      which migrates the entries from a dead cgroup to another.  This can return
      incorrect number of entries from list_lru_count_node().
      
      Fix this by keeping track of entries per node and simply return it in
      list_lru_count_node().
      
      Link: http://lkml.kernel.org/r/1498707555-30525-1-git-send-email-stummala@codeaurora.orgSigned-off-by: default avatarSahitya Tummala <stummala@codeaurora.org>
      Acked-by: default avatarVladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Alexander Polakov <apolyakov@beget.ru>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9fa881d9
    • Marcin Nowakowski's avatar
      kernel/extable.c: mark core_kernel_text notrace · e1a6709a
      Marcin Nowakowski authored
      commit c0d80dda upstream.
      
      core_kernel_text is used by MIPS in its function graph trace processing,
      so having this method traced leads to an infinite set of recursive calls
      such as:
      
        Call Trace:
           ftrace_return_to_handler+0x50/0x128
           core_kernel_text+0x10/0x1b8
           prepare_ftrace_return+0x6c/0x114
           ftrace_graph_caller+0x20/0x44
           return_to_handler+0x10/0x30
           return_to_handler+0x0/0x30
           return_to_handler+0x0/0x30
           ftrace_ops_no_ops+0x114/0x1bc
           core_kernel_text+0x10/0x1b8
           core_kernel_text+0x10/0x1b8
           core_kernel_text+0x10/0x1b8
           ftrace_ops_no_ops+0x114/0x1bc
           core_kernel_text+0x10/0x1b8
           prepare_ftrace_return+0x6c/0x114
           ftrace_graph_caller+0x20/0x44
           (...)
      
      Mark the function notrace to avoid it being traced.
      
      Link: http://lkml.kernel.org/r/1498028607-6765-1-git-send-email-marcin.nowakowski@imgtec.comSigned-off-by: default avatarMarcin Nowakowski <marcin.nowakowski@imgtec.com>
      Reviewed-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Meyer <thomas@m3y3r.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e1a6709a
    • Kirill A. Shutemov's avatar
      thp, mm: fix crash due race in MADV_FREE handling · 8a6fcf5f
      Kirill A. Shutemov authored
      commit bbf29ffc upstream.
      
      Reinette reported the following crash:
      
        BUG: Bad page state in process log2exe  pfn:57600
        page:ffffea00015d8000 count:0 mapcount:0 mapping:          (null) index:0x20200
        flags: 0x4000000000040019(locked|uptodate|dirty|swapbacked)
        raw: 4000000000040019 0000000000000000 0000000000020200 00000000ffffffff
        raw: ffffea00015d8020 ffffea00015d8020 0000000000000000 0000000000000000
        page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
        bad because of flags: 0x1(locked)
        Modules linked in: rfcomm 8021q bnep intel_rapl x86_pkg_temp_thermal coretemp efivars btusb btrtl btbcm pwm_lpss_pci snd_hda_codec_hdmi btintel pwm_lpss snd_hda_codec_realtek snd_soc_skl snd_hda_codec_generic snd_soc_skl_ipc spi_pxa2xx_platform snd_soc_sst_ipc snd_soc_sst_dsp i2c_designware_platform i2c_designware_core snd_hda_ext_core snd_soc_sst_match snd_hda_intel snd_hda_codec mei_me snd_hda_core mei snd_soc_rt286 snd_soc_rl6347a snd_soc_core efivarfs
        CPU: 1 PID: 354 Comm: log2exe Not tainted 4.12.0-rc7-test-test #19
        Hardware name: Intel corporation NUC6CAYS/NUC6CAYB, BIOS AYAPLCEL.86A.0027.2016.1108.1529 11/08/2016
        Call Trace:
         bad_page+0x16a/0x1f0
         free_pages_check_bad+0x117/0x190
         free_hot_cold_page+0x7b1/0xad0
         __put_page+0x70/0xa0
         madvise_free_huge_pmd+0x627/0x7b0
         madvise_free_pte_range+0x6f8/0x1150
         __walk_page_range+0x6b5/0xe30
         walk_page_range+0x13b/0x310
         madvise_free_page_range.isra.16+0xad/0xd0
         madvise_free_single_vma+0x2e4/0x470
         SyS_madvise+0x8ce/0x1450
      
      If somebody frees the page under us and we hold the last reference to
      it, put_page() would attempt to free the page before unlocking it.
      
      The fix is trivial reorder of operations.
      
      Dave said:
       "I came up with the exact same patch.  For posterity, here's the test
        case, generated by syzkaller and trimmed down by Reinette:
      
        	https://www.sr71.net/~dave/intel/log2.c
      
        And the config that helps detect this:
      
        	https://www.sr71.net/~dave/intel/config-log2"
      
      Fixes: b8d3c4c3 ("mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called")
      Link: http://lkml.kernel.org/r/20170628101249.17879-1-kirill.shutemov@linux.intel.comSigned-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: default avatarReinette Chatre <reinette.chatre@intel.com>
      Acked-by: default avatarDave Hansen <dave.hansen@intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a6fcf5f
    • David Rientjes's avatar
      compiler, clang: always inline when CONFIG_OPTIMIZE_INLINING is disabled · c53c2a4d
      David Rientjes authored
      commit 9a04dbcf upstream.
      
      The motivation for commit abb2ea7d ("compiler, clang: suppress
      warning for unused static inline functions") was to suppress clang's
      warnings about unused static inline functions.
      
      For configs without CONFIG_OPTIMIZE_INLINING enabled, such as any non-x86
      architecture, `inline' in the kernel implies that
      __attribute__((always_inline)) is used.
      
      Some code depends on that behavior, see
        https://lkml.org/lkml/2017/6/13/918:
      
        net/built-in.o: In function `__xchg_mb':
        arch/arm64/include/asm/cmpxchg.h:99: undefined reference to `__compiletime_assert_99'
        arch/arm64/include/asm/cmpxchg.h:99: undefined reference to `__compiletime_assert_99
      
      The full fix would be to identify these breakages and annotate the
      functions with __always_inline instead of `inline'.  But since we are
      late in the 4.12-rc cycle, simply carry forward the forced inlining
      behavior and work toward moving arm64, and other architectures, toward
      CONFIG_OPTIMIZE_INLINING behavior.
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1706261552200.1075@chino.kir.corp.google.comSigned-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Reported-by: default avatarSodagudi Prasad <psodagud@codeaurora.org>
      Tested-by: default avatarSodagudi Prasad <psodagud@codeaurora.org>
      Tested-by: default avatarMatthias Kaehlcke <mka@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c53c2a4d
    • Ben Hutchings's avatar
      tools/lib/lockdep: Reduce MAX_LOCK_DEPTH to avoid overflowing lock_chain/: Depth · 921f4e22
      Ben Hutchings authored
      commit 98dcea0c upstream.
      
      liblockdep has been broken since commit 75dd602a ("lockdep: Fix
      lock_chain::base size"), as that adds a check that MAX_LOCK_DEPTH is
      within the range of lock_chain::depth and in liblockdep it is much
      too large.
      
      That should have resulted in a compiler error, but didn't because:
      
      - the check uses ARRAY_SIZE(), which isn't yet defined in liblockdep
        so is assumed to be an (undeclared) function
      - putting a function call inside a BUILD_BUG_ON() expression quietly
        turns it into some nonsense involving a variable-length array
      
      It did produce a compiler warning, but I didn't notice because
      liblockdep already produces too many warnings if -Wall is enabled
      (which I'll fix shortly).
      
      Even before that commit, which reduced lock_chain::depth from 8 bits
      to 6, MAX_LOCK_DEPTH was too large.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: a.p.zijlstra@chello.nl
      Link: http://lkml.kernel.org/r/20170525130005.5947-3-alexander.levin@verizon.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      921f4e22
    • Helge Deller's avatar
      parisc/mm: Ensure IRQs are off in switch_mm() · 56a13c4d
      Helge Deller authored
      commit 649aa242 upstream.
      
      This is because of commit f98db601 ("sched/core: Add switch_mm_irqs_off()
      and use it in the scheduler") in which switch_mm_irqs_off() is called by the
      scheduler, vs switch_mm() which is used by use_mm().
      
      This patch lets the parisc code mirror the x86 and powerpc code, ie. it
      disables interrupts in switch_mm(), and optimises the scheduler case by
      defining switch_mm_irqs_off().
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      56a13c4d
    • Thomas Bogendoerfer's avatar
      parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs · 57b605a6
      Thomas Bogendoerfer authored
      commit 33f9e024 upstream.
      
      Enabling parport pc driver on a B2600 (and probably other 64bit PARISC
      systems) produced following BUG:
      
      CPU: 0 PID: 1 Comm: swapper Not tainted 4.12.0-rc5-30198-g1132d5e7 #156
      task: 000000009e050000 task.stack: 000000009e04c000
      
           YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
      PSW: 00001000000001101111111100001111 Not tainted
      r00-03  000000ff0806ff0f 000000009e04c990 0000000040871b78 000000009e04cac0
      r04-07  0000000040c14de0 ffffffffffffffff 000000009e07f098 000000009d82d200
      r08-11  000000009d82d210 0000000000000378 0000000000000000 0000000040c345e0
      r12-15  0000000000000005 0000000040c345e0 0000000000000000 0000000040c9d5e0
      r16-19  0000000040c345e0 00000000f00001c4 00000000f00001bc 0000000000000061
      r20-23  000000009e04ce28 0000000000000010 0000000000000010 0000000040b89e40
      r24-27  0000000000000003 0000000000ffffff 000000009d82d210 0000000040c14de0
      r28-31  0000000000000000 000000009e04ca90 000000009e04cb40 0000000000000000
      sr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      
      IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000404aece0 00000000404aece4
       IIR: 03ffe01f    ISR: 0000000010340000  IOR: 000001781304cac8
       CPU:        0   CR30: 000000009e04c000 CR31: 00000000e2976de2
       ORIG_R28: 0000000000000200
       IAOQ[0]: sba_dma_supported+0x80/0xd0
       IAOQ[1]: sba_dma_supported+0x84/0xd0
       RP(r2): parport_pc_probe_port+0x178/0x1200
      
      Cause is a call to dma_coerce_mask_and_coherenet in parport_pc_probe_port,
      which PARISC DMA API doesn't handle very nicely. This commit gives back
      DMA_ERROR_CODE for DMA API calls, if device isn't capable of DMA
      transaction.
      Signed-off-by: default avatarThomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      57b605a6
    • Eric Biggers's avatar
      parisc: use compat_sys_keyctl() · fbc8f9a7
      Eric Biggers authored
      commit b0f94efd upstream.
      
      Architectures with a compat syscall table must put compat_sys_keyctl()
      in it, not sys_keyctl().  The parisc architecture was not doing this;
      fix it.
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Acked-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fbc8f9a7
    • Helge Deller's avatar
      parisc: Report SIGSEGV instead of SIGBUS when running out of stack · 3bc74245
      Helge Deller authored
      commit 24746231 upstream.
      
      When a process runs out of stack the parisc kernel wrongly faults with SIGBUS
      instead of the expected SIGSEGV signal.
      
      This example shows how the kernel faults:
      do_page_fault() command='a.out' type=15 address=0xfaac2000 in libc-2.24.so[f8308000+16c000]
      trap #15: Data TLB miss fault, vm_start = 0xfa2c2000, vm_end = 0xfaac2000
      
      The vma->vm_end value is the first address which does not belong to the vma, so
      adjust the check to include vma->vm_end to the range for which to send the
      SIGSEGV signal.
      
      This patch unbreaks building the debian libsigsegv package.
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3bc74245
    • Suzuki K Poulose's avatar
      irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity · 5b2c6af5
      Suzuki K Poulose authored
      commit 866d7c1b upstream.
      
      The GICv3 driver doesn't check if the target CPU for gic_set_affinity
      is valid before going ahead and making the changes. This triggers the
      following splat with KASAN:
      
      [  141.189434] BUG: KASAN: global-out-of-bounds in gic_set_affinity+0x8c/0x140
      [  141.189704] Read of size 8 at addr ffff200009741d20 by task swapper/1/0
      [  141.189958]
      [  141.190158] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.12.0-rc7
      [  141.190458] Hardware name: Foundation-v8A (DT)
      [  141.190658] Call trace:
      [  141.190908] [<ffff200008089d70>] dump_backtrace+0x0/0x328
      [  141.191224] [<ffff20000808a1b4>] show_stack+0x14/0x20
      [  141.191507] [<ffff200008504c3c>] dump_stack+0xa4/0xc8
      [  141.191858] [<ffff20000826c19c>] print_address_description+0x13c/0x250
      [  141.192219] [<ffff20000826c5c8>] kasan_report+0x210/0x300
      [  141.192547] [<ffff20000826ad54>] __asan_load8+0x84/0x98
      [  141.192874] [<ffff20000854eeec>] gic_set_affinity+0x8c/0x140
      [  141.193158] [<ffff200008148b14>] irq_do_set_affinity+0x54/0xb8
      [  141.193473] [<ffff200008148d2c>] irq_set_affinity_locked+0x64/0xf0
      [  141.193828] [<ffff200008148e00>] __irq_set_affinity+0x48/0x78
      [  141.194158] [<ffff200008bc48a4>] arm_perf_starting_cpu+0x104/0x150
      [  141.194513] [<ffff2000080d73bc>] cpuhp_invoke_callback+0x17c/0x1f8
      [  141.194783] [<ffff2000080d94ec>] notify_cpu_starting+0x8c/0xb8
      [  141.195130] [<ffff2000080911ec>] secondary_start_kernel+0x15c/0x200
      [  141.195390] [<0000000080db81b4>] 0x80db81b4
      [  141.195603]
      [  141.195685] The buggy address belongs to the variable:
      [  141.196012]  __cpu_logical_map+0x200/0x220
      [  141.196176]
      [  141.196315] Memory state around the buggy address:
      [  141.196586]  ffff200009741c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  141.196913]  ffff200009741c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  141.197158] >ffff200009741d00: 00 00 00 00 fa fa fa fa 00 00 00 00 00 00 00 00
      [  141.197487]                                ^
      [  141.197758]  ffff200009741d80: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00
      [  141.198060]  ffff200009741e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  141.198358] ==================================================================
      [  141.198609] Disabling lock debugging due to kernel taint
      [  141.198961] CPU1: Booted secondary processor [410fd051]
      
      This patch adds the check to make sure the cpu is valid.
      
      Fixes: commit 021f6537 ("irqchip: gic-v3: Initial support for GICv3")
      Signed-off-by: default avatarSuzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5b2c6af5
    • Alex Williamson's avatar
      kvm-vfio: Decouple only when we match a group · 0aed5da5
      Alex Williamson authored
      commit e323369b upstream.
      
      Unset-KVM and decrement-assignment only when we find the group in our
      list.  Otherwise we can get out of sync if the user triggers this for
      groups that aren't currently on our list.
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Reviewed-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: default avatarEric Auger <eric.auger@redhat.com>
      Tested-by: default avatarEric Auger <eric.auger@redhat.com>
      Acked-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0aed5da5
    • Paul Mackerras's avatar
      KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code · e840db55
      Paul Mackerras authored
      commit 00c14757 upstream.
      
      This fixes a typo where the wrong loop index was used to index
      the kvmppc_xive_vcpu.queues[] array in xive_pre_save_scan().
      The variable i contains the vcpu number; we need to index queues[]
      using j, which iterates from 0 to KVMPPC_XIVE_Q_COUNT-1.
      
      The effect of this bug is that things that save the interrupt
      controller state, such as "virsh dump", on a VM with more than
      8 vCPUs, result in xive_pre_save_queue() getting called on a
      bogus queue structure, usually resulting in a crash like this:
      
      [  501.821107] Unable to handle kernel paging request for data at address 0x00000084
      [  501.821212] Faulting instruction address: 0xc008000004c7c6f8
      [  501.821234] Oops: Kernel access of bad area, sig: 11 [#1]
      [  501.821305] SMP NR_CPUS=1024
      [  501.821307] NUMA
      [  501.821376] PowerNV
      [  501.821470] Modules linked in: vhost_net vhost tap xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables ses enclosure scsi_transport_sas ipmi_powernv ipmi_devintf ipmi_msghandler powernv_op_panel kvm_hv nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc kvm tg3 ptp pps_core
      [  501.822477] CPU: 3 PID: 3934 Comm: live_migration Not tainted 4.11.0-4.git8caa70f.el7.centos.ppc64le #1
      [  501.822633] task: c0000003f9e3ae80 task.stack: c0000003f9ed4000
      [  501.822745] NIP: c008000004c7c6f8 LR: c008000004c7c628 CTR: 0000000030058018
      [  501.822877] REGS: c0000003f9ed7980 TRAP: 0300   Not tainted  (4.11.0-4.git8caa70f.el7.centos.ppc64le)
      [  501.823030] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
      [  501.823047]   CR: 28022244  XER: 00000000
      [  501.823203] CFAR: c008000004c7c77c DAR: 0000000000000084 DSISR: 40000000 SOFTE: 1
      [  501.823203] GPR00: c008000004c7c628 c0000003f9ed7c00 c008000004c91450 00000000000000ff
      [  501.823203] GPR04: c0000003f5580000 c0000003f559bf98 9000000000009033 0000000000000000
      [  501.823203] GPR08: 0000000000000084 0000000000000000 00000000000001e0 9000000000001003
      [  501.823203] GPR12: c00000000008a7d0 c00000000fdc1b00 000000000a9a0000 0000000000000000
      [  501.823203] GPR16: 00000000402954e8 000000000a9a0000 0000000000000004 0000000000000000
      [  501.823203] GPR20: 0000000000000008 c000000002e8f180 c000000002e8f1e0 0000000000000001
      [  501.823203] GPR24: 0000000000000008 c0000003f5580008 c0000003f4564018 c000000002e8f1e8
      [  501.823203] GPR28: 00003ff6e58bdc28 c0000003f4564000 0000000000000000 0000000000000000
      [  501.825441] NIP [c008000004c7c6f8] xive_get_attr+0x3b8/0x5b0 [kvm]
      [  501.825671] LR [c008000004c7c628] xive_get_attr+0x2e8/0x5b0 [kvm]
      [  501.825887] Call Trace:
      [  501.825991] [c0000003f9ed7c00] [c008000004c7c628] xive_get_attr+0x2e8/0x5b0 [kvm] (unreliable)
      [  501.826312] [c0000003f9ed7cd0] [c008000004c62ec4] kvm_device_ioctl_attr+0x64/0xa0 [kvm]
      [  501.826581] [c0000003f9ed7d20] [c008000004c62fcc] kvm_device_ioctl+0xcc/0xf0 [kvm]
      [  501.826843] [c0000003f9ed7d40] [c000000000350c70] do_vfs_ioctl+0xd0/0x8c0
      [  501.827060] [c0000003f9ed7de0] [c000000000351534] SyS_ioctl+0xd4/0xf0
      [  501.827282] [c0000003f9ed7e30] [c00000000000b8e0] system_call+0x38/0xfc
      [  501.827496] Instruction dump:
      [  501.827632] 419e0078 3b760008 e9160008 83fb000c 83db0010 80fb0008 2f280000 60000000
      [  501.827901] 60000000 60420000 419a0050 7be91764 <7d284c2c> 552a0ffe 7f8af040 419e003c
      [  501.828176] ---[ end trace 2d0529a5bbbbafed ]---
      
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Acked-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e840db55
    • Hu Huajun's avatar
      KVM: ARM64: fix phy counter access failure in guest. · 233b62de
      Hu Huajun authored
      commit 02d50cda upstream.
      
      When reading the cntpct_el0 in guest with VHE (Virtual Host Extension)
      enabled in host, the "Unsupported guest sys_reg access" error reported.
      The reason is cnthctl_el2.EL1PCTEN is not enabled, which is expected
      to be done in kvm_timer_init_vhe(). The problem is kvm_timer_init_vhe
      is called by cpu_init_hyp_mode, and which is called when VHE is disabled.
      This patch remove the incorrect call to kvm_timer_init_vhe() from
      cpu_init_hyp_mode(), and calls kvm_timer_init_vhe() to enable
      cnthctl_el2.EL1PCTEN in cpu_hyp_reinit().
      
      Fixes: 488f94d7 ("KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems")
      Signed-off-by: default avatarHu Huajun <huhuajun@huawei.com>
      Reviewed-by: default avatarChristoffer Dall <cdall@linaro.org>
      Acked-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <cdall@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      233b62de
    • Alex Deucher's avatar
      drm/amdgpu/gfx6: properly cache mc_arb_ramcfg · 20df3657
      Alex Deucher authored
      commit 6653ebd4 upstream.
      
      This was missing for gfx6.
      Acked-by: default avatarHuang Rui <ray.huang@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      20df3657
    • Srinivas Dasari's avatar
      cfg80211: Check if NAN service ID is of expected size · 227ea2fd
      Srinivas Dasari authored
      commit 0a27844c upstream.
      
      nla policy checks for only maximum length of the attribute data when the
      attribute type is NLA_BINARY. If userspace sends less data than
      specified, cfg80211 may access illegal memory. When type is NLA_UNSPEC,
      nla policy check ensures that userspace sends minimum specified length
      number of bytes.
      
      Remove type assignment to NLA_BINARY from nla_policy of
      NL80211_NAN_FUNC_SERVICE_ID to make these NLA_UNSPEC and to make sure
      minimum NL80211_NAN_FUNC_SERVICE_ID_LEN bytes are received from
      userspace with NL80211_NAN_FUNC_SERVICE_ID.
      
      Fixes: a442b761 ("cfg80211: add add_nan_func / del_nan_func")
      Signed-off-by: default avatarSrinivas Dasari <dasaris@qti.qualcomm.com>
      Signed-off-by: default avatarJouni Malinen <jouni@qca.qualcomm.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      227ea2fd
    • Srinivas Dasari's avatar
      cfg80211: Check if PMKID attribute is of expected size · 3853bb15
      Srinivas Dasari authored
      commit 9361df14 upstream.
      
      nla policy checks for only maximum length of the attribute data
      when the attribute type is NLA_BINARY. If userspace sends less
      data than specified, the wireless drivers may access illegal
      memory. When type is NLA_UNSPEC, nla policy check ensures that
      userspace sends minimum specified length number of bytes.
      
      Remove type assignment to NLA_BINARY from nla_policy of
      NL80211_ATTR_PMKID to make this NLA_UNSPEC and to make sure minimum
      WLAN_PMKID_LEN bytes are received from userspace with
      NL80211_ATTR_PMKID.
      
      Fixes: 67fbb16b ("nl80211: PMKSA caching support")
      Signed-off-by: default avatarSrinivas Dasari <dasaris@qti.qualcomm.com>
      Signed-off-by: default avatarJouni Malinen <jouni@qca.qualcomm.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3853bb15
    • Srinivas Dasari's avatar
      cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES · be35aac0
      Srinivas Dasari authored
      commit d7f13f74 upstream.
      
      validate_scan_freqs() retrieves frequencies from attributes
      nested in the attribute NL80211_ATTR_SCAN_FREQUENCIES with
      nla_get_u32(), which reads 4 bytes from each attribute
      without validating the size of data received. Attributes
      nested in NL80211_ATTR_SCAN_FREQUENCIES don't have an nla policy.
      
      Validate size of each attribute before parsing to avoid potential buffer
      overread.
      
      Fixes: 2a519311 ("cfg80211/nl80211: scanning (and mac80211 update to use it)")
      Signed-off-by: default avatarSrinivas Dasari <dasaris@qti.qualcomm.com>
      Signed-off-by: default avatarJouni Malinen <jouni@qca.qualcomm.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      be35aac0
    • Srinivas Dasari's avatar
      cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE · b9582dbe
      Srinivas Dasari authored
      commit 8feb69c7 upstream.
      
      Buffer overread may happen as nl80211_set_station() reads 4 bytes
      from the attribute NL80211_ATTR_LOCAL_MESH_POWER_MODE without
      validating the size of data received when userspace sends less
      than 4 bytes of data with NL80211_ATTR_LOCAL_MESH_POWER_MODE.
      Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE to avoid
      the buffer overread.
      
      Fixes: 3b1c5a53 ("{cfg,nl}80211: mesh power mode primitives and userspace access")
      Signed-off-by: default avatarSrinivas Dasari <dasaris@qti.qualcomm.com>
      Signed-off-by: default avatarJouni Malinen <jouni@qca.qualcomm.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9582dbe
    • Daniel Kiper's avatar
      efi: Process the MEMATTR table only if EFI_MEMMAP is enabled · c43499cd
      Daniel Kiper authored
      commit 457ea3f7 upstream.
      
      Otherwise e.g. Xen dom0 on x86_64 EFI platforms crashes.
      
      In theory we can check EFI_PARAVIRT too, however,
      EFI_MEMMAP looks more targeted and covers more cases.
      Signed-off-by: default avatarDaniel Kiper <daniel.kiper@oracle.com>
      Reviewed-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: andrew.cooper3@citrix.com
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: linux-efi@vger.kernel.org
      Cc: matt@codeblueprint.co.uk
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1498128697-12943-2-git-send-email-daniel.kiper@oracle.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c43499cd
    • Peter S. Housel's avatar
      brcmfmac: Fix glom_skb leak in brcmf_sdiod_recv_chain · c9e07e2f
      Peter S. Housel authored
      commit 5ea59db8 upstream.
      
      An earlier change to this function (3bdae810) fixed a leak in the
      case of an unsuccessful call to brcmf_sdiod_buffrw(). However, the
      glom_skb buffer, used for emulating a scattering read, is never used
      or referenced after its contents are copied into the destination
      buffers, and therefore always needs to be freed by the end of the
      function.
      
      Fixes: 3bdae810 ("brcmfmac: Fix glob_skb leak in brcmf_sdiod_recv_chain")
      Fixes: a413e39a ("brcmfmac: fix brcmf_sdcard_recv_chain() for host without sg support")
      Signed-off-by: default avatarPeter S. Housel <housel@acm.org>
      Signed-off-by: default avatarArend van Spriel <arend.vanspriel@broadcom.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9e07e2f
    • Christophe Jaillet's avatar
      brcmfmac: Fix a memory leak in error handling path in 'brcmf_cfg80211_attach' · c3439585
      Christophe Jaillet authored
      commit 57c00f2f upstream.
      
      If 'wiphy_new()' fails, we leak 'ops'. Add a new label in the error
      handling path to free it in such a case.
      
      Fixes: 5c22fb85 ("brcmfmac: add wowl gtk rekeying offload support")
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c3439585
    • Nitin Gupta's avatar
      sparc64: Fix gup_huge_pmd · 14a86f75
      Nitin Gupta authored
      
      [ Upstream commit dbd2667a ]
      
      The function assumes that each PMD points to head of a
      huge page. This is not correct as a PMD can point to
      start of any 8M region with a, say 256M, hugepage. The
      fix ensures that it points to the correct head of any PMD
      huge page.
      
      Cc: Julian Calaby <julian.calaby@gmail.com>
      Signed-off-by: default avatarNitin Gupta <nitin.m.gupta@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      14a86f75