- 27 Jul, 2017 8 commits
-
-
Johan Hovold authored
commit 15e0c59f upstream. Make sure to check the tty-device pointer before trying to access the parent device to avoid dereferencing a NULL-pointer when the tty is one end of a Unix98 pty. Fixes: e097dc62 ("NFC: nfcmrvl: add UART driver") Cc: Vincent Cuissard <cuissard@marvell.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Johan Hovold authored
commit 20777bc5 upstream. Commit 7eda8b8e ("NFC: Use IDR library to assing NFC devices IDs") moved device-id allocation and struct-device initialisation from nfc_allocate_device() to nfc_register_device(). This broke just about every nfc-device-registration error path, which continue to call nfc_free_device() that tries to put the device reference of the now uninitialised (but zeroed) struct device: kobject: '(null)' (ce316420): is not initialized, yet kobject_put() is being called. The late struct-device initialisation also meant that various work queues whose names are derived from the nfc device name were also misnamed: 421 root 0 SW< [(null)_nci_cmd_] 422 root 0 SW< [(null)_nci_rx_w] 423 root 0 SW< [(null)_nci_tx_w] Move the id-allocation and struct-device initialisation back to nfc_allocate_device() and fix up the single call site which did not use nfc_free_device() in its error path. Fixes: 7eda8b8e ("NFC: Use IDR library to assing NFC devices IDs") Cc: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Miaoqing Pan authored
commit bde717ab upstream. The hard coded register 0x9864 and 0x9924 are invalid for ar9300 chips. Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Miaoqing Pan authored
commit cf8ce1ea upstream. One scenario that could lead to UAF is two threads writing simultaneously to the "tx99" debug file. One of them would set the "start" value to true and follow to ath9k_tx99_init(). Inside the function it would set the sc->tx99_state to true after allocating sc->tx99skb. Then, the other thread would execute write_file_tx99() and call ath9k_tx99_deinit(). sc->tx99_state would be freed. After that, the first thread would continue inside ath9k_tx99_init() and call r = ath9k_tx99_send(sc, sc->tx99_skb, &txctl); that would make use of the freed sc->tx99_skb memory. Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Viresh Kumar authored
commit 289d72af upstream. After the lock is dropped, it is possible that the cpufreq_dev gets freed before we call get_level() and that can cause kernel to crash. Drop the lock after we are done using the structure. Fixes: 02373d7c ("thermal: cpu_cooling: fix lockdep problems in cpu_cooling") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Tested-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mauro Carvalho Chehab authored
commit a16e3772 upstream. Gcc 7.1 complains about: drivers/media/platform/s5p-jpeg/jpeg-core.c: In function 's5p_jpeg_parse_hdr.isra.9': drivers/media/platform/s5p-jpeg/jpeg-core.c:1207:12: warning: 'width' may be used uninitialized in this function [-Wmaybe-uninitialized] result->w = width; ~~~~~~~~~~^~~~~~~ drivers/media/platform/s5p-jpeg/jpeg-core.c:1208:12: warning: 'height' may be used uninitialized in this function [-Wmaybe-uninitialized] result->h = height; ~~~~~~~~~~^~~~~~~~ Indeed the code would allow it to return a random value (although it shouldn't happen, in practice). So, explicitly set both to zero, just in case. Acked-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Arnd Bergmann authored
commit bd7e31bb upstream. gcc-7 suggests that an expression using a bitwise not and a bitmask on a 'bool' variable is better written using boolean logic: drivers/media/rc/imon.c: In function 'imon_incoming_scancode': drivers/media/rc/imon.c:1725:22: error: '~' on a boolean expression [-Werror=bool-operation] ictx->pad_mouse = ~(ictx->pad_mouse) & 0x1; ^ drivers/media/rc/imon.c:1725:22: note: did you mean to use logical not? I agree. Fixes: 21677cfc ("V4L/DVB: ir-core: add imon driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Torvalds authored
commit bd664f6b upstream. I made the mistake of upgrading my desktop to the new Fedora 26 that comes with gcc-7.1.1. There's nothing wrong per se that I've noticed, but I now have 1500 lines of warnings, mostly from the new format-truncation warning triggering all over the tree. We use 'snprintf()' and friends in a lot of places, and often know that the numbers are fairly small (ie a controller index or similar), but gcc doesn't know that, and sees an 'int', and thinks that it could be some huge number. And then complains when our buffers are not able to fit the name for the ten millionth controller. These warnings aren't necessarily bad per se, and we probably want to look through them subsystem by subsystem, but at least during the merge window they just mean that I can't even see if somebody is introducing any *real* problems when I pull. So warnings disabled for now. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 21 Jul, 2017 32 commits
-
-
Greg Kroah-Hartman authored
-
Haozhong Zhang authored
commit 691bd434 upstream. It's easier for host applications, such as QEMU, if they can always access guest MSR_IA32_BNDCFGS in VMCS, even though MPX is disabled in guest cpuid. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jim Mattson authored
commit 4531662d upstream. Bits 11:2 must be zero and the linear addess in bits 63:12 must be canonical. Otherwise, WRMSR(BNDCFGS) should raise #GP. Fixes: 0dd376e7 ("KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save") Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jim Mattson authored
commit 4439af9f upstream. The BNDCFGS MSR should only be exposed to the guest if the guest supports MPX. (cf. the TSC_AUX MSR and RDTSCP.) Fixes: 0dd376e7 ("KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save") Change-Id: I3ad7c01bda616715137ceac878f3fa7e66b6b387 Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jim Mattson authored
commit a8b6fda3 upstream. The MSR permission bitmaps are shared by all VMs. However, some VMs may not be configured to support MPX, even when the host does. If the host supports VMX and the guest does not, we should intercept accesses to the BNDCFGS MSR, so that we can synthesize a #GP fault. Furthermore, if the host does not support MPX and the "ignore_msrs" kvm kernel parameter is set, then we should intercept accesses to the BNDCFGS MSR, so that we can skip over the rdmsr/wrmsr without raising a #GP fault. Fixes: da8999d3 ("KVM: x86: Intel MPX vmx and msr handle") Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paolo Bonzini authored
commit a87036ad upstream. When eager FPU is disabled, KVM will still see the MPX bit in CPUID and presumably the MPX vmentry and vmexit controls. However, it will not be able to expose the MPX XSAVE features to the guest, because the guest's accessible XSAVE features are always a subset of host_xcr0. In this case, we should disable the MPX CPUID bit, the BNDCFGS MSR, and the MPX vmentry and vmexit controls for nested virtualization. It is then unnecessary to enable guest eager FPU if the guest has the MPX CPUID bit set. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavankumar Kondeti authored
commit c59f29cb upstream. The 's' flag is supposed to indicate that a softirq is running. This can be detected by testing the preempt_count with SOFTIRQ_OFFSET. The current code tests the preempt_count with SOFTIRQ_MASK, which would be true even when softirqs are disabled but not serving a softirq. Link: http://lkml.kernel.org/r/1481300417-3564-1-git-send-email-pkondeti@codeaurora.orgSigned-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
commit 2ca30331 upstream. In the current code, if the user accidentally writes a bogus command to this sysfs file, then we set the latency tolerance to an uninitialized variable. Fixes: 2d984ad1 (PM / QoS: Introcuce latency tolerance device PM QoS type) Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Gleixner authored
commit ea0212f4 upstream. The wakeirq infrastructure uses RCU to protect the list of wakeirqs. That breaks the irq bus locking infrastructure, which is allows sleeping functions to be called so interrupt controllers behind slow busses, e.g. i2c, can be handled. The wakeirq functions hold rcu_read_lock and call into irq functions, which in case of interrupts using the irq bus locking will trigger a might_sleep() splat. Convert the wakeirq infrastructure to Sleepable RCU and unbreak it. Fixes: 4990d4fe (PM / Wakeirq: Add automated device wake IRQ handling) Reported-by: Brian Norris <briannorris@chromium.org> Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lauro Ramos Venancio authored
commit f32d782e upstream. The group mask is always used in intersection with the group CPUs. So, when building the group mask, we don't have to care about CPUs that are not part of the group. Signed-off-by: Lauro Ramos Venancio <lvenanci@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: lwang@redhat.com Cc: riel@redhat.com Link: http://lkml.kernel.org/r/1492717903-5195-2-git-send-email-lvenanci@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Zijlstra authored
commit 73bb059f upstream. The point of sched_group_mask is to select those CPUs from sched_group_cpus that can actually arrive at this balance domain. The current code gets it wrong, as can be readily demonstrated with a topology like: node 0 1 2 3 0: 10 20 30 20 1: 20 10 20 30 2: 30 20 10 20 3: 20 30 20 10 Where (for example) domain 1 on CPU1 ends up with a mask that includes CPU0: [] CPU1 attaching sched-domain: [] domain 0: span 0-2 level NUMA [] groups: 1 (mask: 1), 2, 0 [] domain 1: span 0-3 level NUMA [] groups: 0-2 (mask: 0-2) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072) This causes sched_balance_cpu() to compute the wrong CPU and consequently should_we_balance() will terminate early resulting in missed load-balance opportunities. The fixed topology looks like: [] CPU1 attaching sched-domain: [] domain 0: span 0-2 level NUMA [] groups: 1 (mask: 1), 2, 0 [] domain 1: span 0-3 level NUMA [] groups: 0-2 (mask: 1) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072) (note: this relies on OVERLAP domains to always have children, this is true because the regular topology domains are still here -- this is before degenerate trimming) Debugged-by: Lauro Ramos Venancio <lvenanci@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: e3589f6c ("sched: Allow for overlapping sched_domain spans") Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Horia Geantă authored
commit 7459e1d2 upstream. Driver does not properly handle the case when signals interrupt wait_for_completion_interruptible(): -it does not check for return value -completion structure is allocated on stack; in case a signal interrupts the sleep, it will go out of scope, causing the worker thread (caam_jr_dequeue) to fail when it accesses it wait_for_completion_interruptible() is replaced with uninterruptable wait_for_completion(). We choose to block all signals while waiting for I/O (device executing the split key generation job descriptor) since the alternative - in order to have a deterministic device state - would be to flush the job ring (aborting *all* in-progress jobs). Fixes: 045e3678 ("crypto: caam - ahash hmac support") Fixes: 4c1ec1f9 ("crypto: caam - refactor key_gen, sg") Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Herbert Xu authored
commit b82ce244 upstream. It has been reported that sha1-avx2 can cause page faults by reading beyond the end of the input. This patch disables it until it can be fixed. Fixes: 7c1da8d0 ("crypto: sha - SHA1 transform x86_64 AVX2") Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gilad Ben-Yossef authored
commit 1606043f upstream. The Atmel SHA driver was treating -EBUSY as indication of queueing to backlog without checking that backlog is enabled for the request. Fix it by checking request flags. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Martin Hicks authored
commit 03d2c511 upstream. An updated patch that also handles the additional key length requirements for the AEAD algorithms. The max keysize is not 96. For SHA384/512 it's 128, and for the AEAD algorithms it's longer still. Extend the max keysize for the AEAD size for AES256 + HMAC(SHA512). Fixes: 357fb605 ("crypto: talitos - add sha224, sha384 and sha512 to existing AEAD algorithms") Signed-off-by: Martin Hicks <mort@bork.org> Acked-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Helge Deller authored
commit 37511fb5 upstream. Jörn Engel noticed that the expand_upwards() function might not return -ENOMEM in case the requested address is (unsigned long)-PAGE_SIZE and if the architecture didn't defined TASK_SIZE as multiple of PAGE_SIZE. Affected architectures are arm, frv, m68k, blackfin, h8300 and xtensa which all define TASK_SIZE as 0xffffffff, but since none of those have an upwards-growing stack we currently have no actual issue. Nevertheless let's fix this just in case any of the architectures with an upward-growing stack (currently parisc, metag and partly ia64) define TASK_SIZE similar. Link: http://lkml.kernel.org/r/20170702192452.GA11868@p100.box Fixes: bd726c90 ("Allow stack to grow up to address space limit") Signed-off-by: Helge Deller <deller@gmx.de> Reported-by: Jörn Engel <joern@purestorage.com> Cc: Hugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Josh Zimmerman authored
commit d1bd4a79 upstream. If a TPM2 loses power without a TPM2_Shutdown command being issued (a "disorderly reboot"), it may lose some state that has yet to be persisted to NVRam, and will increment the DA counter. After the DA counter gets sufficiently large, the TPM will lock the user out. NOTE: This only changes behavior on TPM2 devices. Since TPM1 uses sysfs, and sysfs relies on implicit locking on chip->ops, it is not safe to allow this code to run in TPM1, or to add sysfs support to TPM2, until that locking is made explicit. Signed-off-by: Josh Zimmerman <joshz@google.com> Cc: stable@vger.kernel.org Fixes: 74d6b3ce ("tpm: fix suspend/resume paths for TPM 2.0") Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Josh Zimmerman authored
commit f77af151 upstream. The TPM class has some common shutdown code that must be executed for all drivers. This adds some needed functionality for that. Signed-off-by: Josh Zimmerman <joshz@google.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: 74d6b3ce ("tpm: fix suspend/resume paths for TPM 2.0") Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jason Gunthorpe authored
commit 4e26195f upstream. Add a read/write semaphore around the ops function pointers so ops can be set to null when the driver un-registers. Previously the tpm core expected module locking to be enough to ensure that tpm_unregister could not be called during certain times, however that hasn't been sufficient for a long time. Introduce a read/write semaphore around 'ops' so the core can set it to null when unregistering. This provides a strong fence around the driver callbacks, guaranteeing to the driver that no callbacks are running or will run again. For now the ops_lock is placed very high in the call stack, it could be pushed down and made more granular in future if necessary. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jason Gunthorpe authored
commit 8cfffc9d upstream. This is a hold over from before the struct device conversion. - All prints should be using &chip->dev, which is the Linux standard. This changes prints to use tpm0 as the device name, not the PnP/etc ID. - The few places involving sysfs/modules that really do need the parent just use chip->dev.parent instead - We no longer need to get_device(pdev) in any places since it is no longer used by any of the code. The kref on the parent is held by the device core during device_add and dropped in device_del Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Tested-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andy Lutomirski authored
commit 796a3bae upstream. test_execve does rather odd mount manipulations to safely create temporary setuid and setgid executables that aren't visible to the rest of the system. Those executables end up in the test's cwd, but that cwd is MNT_DETACHed. The core namespace code considers MNT_DETACHed trees to belong to no mount namespace at all and, in general, MNT_DETACHed trees are only barely function. This interacted with commit 380cf5ba ("fs: Treat foreign mounts as nosuid") to cause all MNT_DETACHed trees to act as though they're nosuid, breaking the test. Fix it by just not detaching the tree. It's still in a private mount namespace and is therefore still invisible to the rest of the system (except via /proc, and the same nosuid logic will protect all other programs on the system from believing in test_execve's setuid bits). While we're at it, fix some blatant whitespace problems. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Fixes: 380cf5ba ("fs: Treat foreign mounts as nosuid") Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Greg KH <greg@kroah.com> Cc: linux-kselftest@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric W. Biederman authored
commit 296990de upstream. Andrei Vagin pointed out that time to executue propagate_umount can go non-linear (and take a ludicrious amount of time) when the mount propogation trees of the mounts to be unmunted by a lazy unmount overlap. Make the walk of the mount propagation trees nearly linear by remembering which mounts have already been visited, allowing subsequent walks to detect when walking a mount propgation tree or a subtree of a mount propgation tree would be duplicate work and to skip them entirely. Walk the list of mounts whose propgatation trees need to be traversed from the mount highest in the mount tree to mounts lower in the mount tree so that odds are higher that the code will walk the largest trees first, allowing later tree walks to be skipped entirely. Add cleanup_umount_visitation to remover the code's memory of which mounts have been visited. Add the functions last_slave and skip_propagation_subtree to allow skipping appropriate parts of the mount propagation tree without needing to change the logic of the rest of the code. A script to generate overlapping mount propagation trees: $ cat runs.h set -e mount -t tmpfs zdtm /mnt mkdir -p /mnt/1 /mnt/2 mount -t tmpfs zdtm /mnt/1 mount --make-shared /mnt/1 mkdir /mnt/1/1 iteration=10 if [ -n "$1" ] ; then iteration=$1 fi for i in $(seq $iteration); do mount --bind /mnt/1/1 /mnt/1/1 done mount --rbind /mnt/1 /mnt/2 TIMEFORMAT='%Rs' nr=$(( ( 2 ** ( $iteration + 1 ) ) + 1 )) echo -n "umount -l /mnt/1 -> $nr " time umount -l /mnt/1 nr=$(cat /proc/self/mountinfo | grep zdtm | wc -l ) time umount -l /mnt/2 $ for i in $(seq 9 19); do echo $i; unshare -Urm bash ./run.sh $i; done Here are the performance numbers with and without the patch: mhash | 8192 | 8192 | 1048576 | 1048576 mounts | before | after | before | after ------------------------------------------------ 1025 | 0.040s | 0.016s | 0.038s | 0.019s 2049 | 0.094s | 0.017s | 0.080s | 0.018s 4097 | 0.243s | 0.019s | 0.206s | 0.023s 8193 | 1.202s | 0.028s | 1.562s | 0.032s 16385 | 9.635s | 0.036s | 9.952s | 0.041s 32769 | 60.928s | 0.063s | 44.321s | 0.064s 65537 | | 0.097s | | 0.097s 131073 | | 0.233s | | 0.176s 262145 | | 0.653s | | 0.344s 524289 | | 2.305s | | 0.735s 1048577 | | 7.107s | | 2.603s Andrei Vagin reports fixing the performance problem is part of the work to fix CVE-2016-6213. Fixes: a05964f3 ("[PATCH] shared mounts handling: umount") Reported-by: Andrei Vagin <avagin@openvz.org> Reviewed-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric W. Biederman authored
commit 99b19d16 upstream. While investigating some poor umount performance I realized that in the case of overlapping mount trees where some of the mounts are locked the code has been failing to unmount all of the mounts it should have been unmounting. This failure to unmount all of the necessary mounts can be reproduced with: $ cat locked_mounts_test.sh mount -t tmpfs test-base /mnt mount --make-shared /mnt mkdir -p /mnt/b mount -t tmpfs test1 /mnt/b mount --make-shared /mnt/b mkdir -p /mnt/b/10 mount -t tmpfs test2 /mnt/b/10 mount --make-shared /mnt/b/10 mkdir -p /mnt/b/10/20 mount --rbind /mnt/b /mnt/b/10/20 unshare -Urm --propagation unchaged /bin/sh -c 'sleep 5; if [ $(grep test /proc/self/mountinfo | wc -l) -eq 1 ] ; then echo SUCCESS ; else echo FAILURE ; fi' sleep 1 umount -l /mnt/b wait %% $ unshare -Urm ./locked_mounts_test.sh This failure is corrected by removing the prepass that marks mounts that may be umounted. A first pass is added that umounts mounts if possible and if not sets mount mark if they could be unmounted if they weren't locked and adds them to a list to umount possibilities. This first pass reconsiders the mounts parent if it is on the list of umount possibilities, ensuring that information of umoutability will pass from child to mount parent. A second pass then walks through all mounts that are umounted and processes their children unmounting them or marking them for reparenting. A last pass cleans up the state on the mounts that could not be umounted and if applicable reparents them to their first parent that remained mounted. While a bit longer than the old code this code is much more robust as it allows information to flow up from the leaves and down from the trunk making the order in which mounts are encountered in the umount propgation tree irrelevant. Fixes: 0c56fe31 ("mnt: Don't propagate unmounts to locked mounts") Reviewed-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric W. Biederman authored
commit 570487d3 upstream. It was observed that in some pathlogical cases that the current code does not unmount everything it should. After investigation it was determined that the issue is that mnt_change_mntpoint can can change which mounts are available to be unmounted during mount propagation which is wrong. The trivial reproducer is: $ cat ./pathological.sh mount -t tmpfs test-base /mnt cd /mnt mkdir 1 2 1/1 mount --bind 1 1 mount --make-shared 1 mount --bind 1 2 mount --bind 1/1 1/1 mount --bind 1/1 1/1 echo grep test-base /proc/self/mountinfo umount 1/1 echo grep test-base /proc/self/mountinfo $ unshare -Urm ./pathological.sh The expected output looks like: 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 The output without the fix looks like: 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 52 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 That last mount in the output was in the propgation tree to be unmounted but was missed because the mnt_change_mountpoint changed it's parent before the walk through the mount propagation tree observed it. Fixes: 1064f874 ("mnt: Tuck mounts under others instead of creating shadow/side mounts.") Acked-by: Andrei Vagin <avagin@virtuozzo.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Adam Borowski authored
commit 6987dc8a upstream. Only read access is checked before this call. Actually, at the moment this is not an issue, as every in-tree arch does the same manual checks for VERIFY_READ vs VERIFY_WRITE, relying on the MMU to tell them apart, but this wasn't the case in the past and may happen again on some odd arch in the future. If anyone cares about 3.7 and earlier, this is a security hole (untested) on real 80386 CPUs. Signed-off-by: Adam Borowski <kilobyte@angband.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kees Cook authored
commit da029c11 upstream. To avoid pathological stack usage or the need to special-case setuid execs, just limit all arg stack usage to at most 75% of _STK_LIM (6MB). Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kees Cook authored
commit a73dc537 upstream. Now that explicitly executed loaders are loaded in the mmap region, we have more freedom to decide where we position PIE binaries in the address space to avoid possible collisions with mmap or stack regions. For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit address space for 32-bit pointers. On 32-bit use 4MB, which is the traditional x86 minimum load location, likely to avoid historically requiring a 4MB page table entry when only a portion of the first 4MB would be used (since the NULL address is avoided). For s390 the position could be 0x10000, but that is needlessly close to the NULL address. Link: http://lkml.kernel.org/r/1498154792-49952-5-git-send-email-keescook@chromium.orgSigned-off-by: Kees Cook <keescook@chromium.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kees Cook authored
commit 47ebb09d upstream. Now that explicitly executed loaders are loaded in the mmap region, we have more freedom to decide where we position PIE binaries in the address space to avoid possible collisions with mmap or stack regions. For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit address space for 32-bit pointers. On 32-bit use 4MB, which is the traditional x86 minimum load location, likely to avoid historically requiring a 4MB page table entry when only a portion of the first 4MB would be used (since the NULL address is avoided). Link: http://lkml.kernel.org/r/1498154792-49952-4-git-send-email-keescook@chromium.orgSigned-off-by: Kees Cook <keescook@chromium.org> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kees Cook authored
commit 02445990 upstream. Now that explicitly executed loaders are loaded in the mmap region, we have more freedom to decide where we position PIE binaries in the address space to avoid possible collisions with mmap or stack regions. For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit address space for 32-bit pointers. On 32-bit use 4MB, to match ARM. This could be 0x8000, the standard ET_EXEC load address, but that is needlessly close to the NULL address, and anyone running arm compat PIE will have an MMU, so the tight mapping is not needed. Link: http://lkml.kernel.org/r/1498251600-132458-4-git-send-email-keescook@chromium.orgSigned-off-by: Kees Cook <keescook@chromium.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kees Cook authored
commit 6a9af90a upstream. Now that explicitly executed loaders are loaded in the mmap region, we have more freedom to decide where we position PIE binaries in the address space to avoid possible collisions with mmap or stack regions. 4MB is chosen here mainly to have parity with x86, where this is the traditional minimum load location, likely to avoid historically requiring a 4MB page table entry when only a portion of the first 4MB would be used (since the NULL address is avoided). For ARM the position could be 0x8000, the standard ET_EXEC load address, but that is needlessly close to the NULL address, and anyone running PIE on 32-bit ARM will have an MMU, so the tight mapping is not needed. Link: http://lkml.kernel.org/r/1498154792-49952-2-git-send-email-keescook@chromium.orgSigned-off-by: Kees Cook <keescook@chromium.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Qualys Security Advisory <qsa@qualys.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kees Cook authored
commit eab09532 upstream. The ELF_ET_DYN_BASE position was originally intended to keep loaders away from ET_EXEC binaries. (For example, running "/lib/ld-linux.so.2 /bin/cat" might cause the subsequent load of /bin/cat into where the loader had been loaded.) With the advent of PIE (ET_DYN binaries with an INTERP Program Header), ELF_ET_DYN_BASE continued to be used since the kernel was only looking at ET_DYN. However, since ELF_ET_DYN_BASE is traditionally set at the top 1/3rd of the TASK_SIZE, a substantial portion of the address space is unused. For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs are loaded above the mmap region. This means they can be made to collide (CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with pathological stack regions. Lowering ELF_ET_DYN_BASE solves both by moving programs below the mmap region in all cases, and will now additionally avoid programs falling back to the mmap region by enforcing MAP_FIXED for program loads (i.e. if it would have collided with the stack, now it will fail to load instead of falling back to the mmap region). To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP) are loaded into the mmap region, leaving space available for either an ET_EXEC binary with a fixed location or PIE being loaded into mmap by the loader. Only PIE programs are loaded offset from ELF_ET_DYN_BASE, which means architectures can now safely lower their values without risk of loaders colliding with their subsequently loaded programs. For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to use the entire 32-bit address space for 32-bit pointers. Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and suggestions on how to implement this solution. Fixes: d1fd836d ("mm: split ET_DYN ASLR from mmap ASLR") Link: http://lkml.kernel.org/r/20170621173201.GA114489@beastSigned-off-by: Kees Cook <keescook@chromium.org> Acked-by: Rik van Riel <riel@redhat.com> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Qualys Security Advisory <qsa@qualys.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Pratyush Anand <panand@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Cyril Bur authored
commit 8d81ae05 upstream. As of perl 5, version 26, subversion 0 (v5.26.0) some new warnings have occurred when running checkpatch. Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.30), passed through in regex; marked by <-- HERE in m/^(.\s*){ <-- HERE \s*/ at scripts/checkpatch.pl line 3544. Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.30), passed through in regex; marked by <-- HERE in m/^(.\s*){ <-- HERE \s*/ at scripts/checkpatch.pl line 3885. Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.30), passed through in regex; marked by <-- HERE in m/^(\+.*(?:do|\))){ <-- HERE / at scripts/checkpatch.pl line 4374. It seems perfectly reasonable to do as the warning suggests and simply escape the left brace in these three locations. Link: http://lkml.kernel.org/r/20170607060135.17384-1-cyrilbur@gmail.comSigned-off-by: Cyril Bur <cyrilbur@gmail.com> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-