- 06 Feb, 2017 37 commits
-
-
Konrad Rzeszutek Wilk authored
commit 8135cf8b upstream. Double fetch vulnerabilities happen when a variable is fetched twice from shared memory but a security check is only performed the first time. The xen_pcibk_do_op function performs a switch statement on the op->cmd value, which is stored in shared memory. Depending on the compiler optimizations applied, this can result in a double fetch vulnerability. This patch fixes it by saving the xen_pci_op command before processing it. We also use 'barrier' to make sure that the compiler does not perform any optimization. This is part of XSA155. Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: "Jan Beulich" <JBeulich@suse.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
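The single-fetch pattern described in this commit can be sketched in plain C as follows. This is a minimal userspace illustration, not the xen-pciback code: `struct shared_op`, the `OP_*` values, `compiler_barrier()` and `dispatch_op()` are hypothetical names, and the barrier uses GCC/Clang inline assembly as an assumption about the toolchain.

```c
#include <stdint.h>

/* Hypothetical request layout living in memory that another party can modify. */
struct shared_op {
    uint32_t cmd;
    uint32_t arg;
};

enum { OP_READ = 0, OP_WRITE = 1, OP_MAX = 2 };

/* Compiler barrier (GCC/Clang assumed): memory may not be cached across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Returns 0 for a read, 1 for a write, -1 for an unknown command. */
int dispatch_op(const struct shared_op *shared)
{
    /* Fetch the command exactly once through a volatile access. */
    uint32_t cmd = *(const volatile uint32_t *)&shared->cmd;
    compiler_barrier();

    /* From here on only the private copy is validated and used. */
    if (cmd >= OP_MAX)
        return -1;

    switch (cmd) {
    case OP_READ:
        return 0;
    case OP_WRITE:
        return 1;
    default:
        return -1;
    }
}
```

Because the range check and the switch both operate on the local `cmd`, a concurrent change to `shared->cmd` after the fetch cannot make the two disagree.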
-
Roger Pau Monné authored
commit 1f13d75c upstream. A compiler may load a switch statement value multiple times, which could be bad when the value is in memory shared with the frontend. When converting a non-native request to a native one, ensure that src->operation is only loaded once by using READ_ONCE(). This is part of XSA155. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: "Jan Beulich" <JBeulich@suse.com> [wt: s/READ_ONCE/ACCESS_ONCE for 3.10] Signed-off-by: Willy Tarreau <w@1wt.eu>
-
David Vrabel authored
commit 68a33bfd upstream. Instead of open-coding memcpy()s and directly accessing Tx and Rx requests, use the new RING_COPY_REQUEST() that ensures the local copy is correct. This is more than is strictly necessary for guest Rx requests since only the id and gref fields are used and it is harmless if the frontend modifies these. This is part of XSA155. Reviewed-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [wt: adjustments for 3.10 : netbk_rx_meta instead of struct xenvif_rx_meta] Signed-off-by: Willy Tarreau <w@1wt.eu>
-
David Vrabel authored
commit 0f589967 upstream. The last request transmitted by the guest gives no indication of the minimum amount of credit that the guest might need to send a packet, since the last packet might have been a small one. Instead allow for the worst case 128 KiB packet. This is part of XSA155. Reviewed-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
David Vrabel authored
commit 454d5d88 upstream. Using RING_GET_REQUEST() on a shared ring is easy to use incorrectly (i.e., by not considering that the other end may alter the data in the shared ring while it is being inspected). Safe usage of a request generally requires taking a local copy. Provide a RING_COPY_REQUEST() macro to use instead of RING_GET_REQUEST() and an open-coded memcpy(). This takes care of ensuring that the copy is done correctly regardless of any possible compiler optimizations. Use a volatile source to prevent the compiler from reordering or omitting the copy. This is part of XSA155. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
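The copy-then-validate idea behind RING_COPY_REQUEST() can be sketched as below. This is an illustrative userspace model only: `struct tx_request`, `struct shared_ring`, `RING_SIZE`, `ring_copy_request()` and `take_request()` are hypothetical and do not reproduce the actual Xen ring macros.

```c
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical ring size, must be a power of two */

/* Hypothetical request as it would sit in the shared ring. */
struct tx_request {
    uint16_t id;
    uint16_t size;
    uint32_t gref;
};

struct shared_ring {
    struct tx_request slots[RING_SIZE];
};

/* Copy one slot into private memory; the volatile source keeps the
 * compiler from omitting the copy or re-reading the shared slot later. */
static void ring_copy_request(struct shared_ring *ring, unsigned int idx,
                              struct tx_request *dst)
{
    *dst = *(volatile struct tx_request *)&ring->slots[idx & (RING_SIZE - 1)];
}

/* Returns 0 and fills *out if the request passes validation, -1 otherwise. */
int take_request(struct shared_ring *ring, unsigned int idx,
                 uint16_t max_size, struct tx_request *out)
{
    ring_copy_request(ring, idx, out);

    /* Validate the private copy, never the shared slot. */
    if (out->size > max_size)
        return -1;
    return 0;
}
```

All later processing would use `*out`, so changes the other end makes to the ring slot after the copy have no effect on an already validated request.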
-
Jan Beulich authored
commit 103f6112 upstream. Huge pages are not normally available to PV guests. Not suppressing hugetlbfs use results in an endless loop of page faults when user mode code tries to access a hugetlbfs mapped area (since the hypervisor denies such PTEs to be created, but error indications can't be propagated out of xen_set_pte_at(), just like for various of its siblings), and - once killed in an oops like this: kernel BUG at .../fs/hugetlbfs/inode.c:428! invalid opcode: 0000 [#1] SMP ... RIP: e030:[<ffffffff811c333b>] [<ffffffff811c333b>] remove_inode_hugepages+0x25b/0x320 ... Call Trace: [<ffffffff811c3415>] hugetlbfs_evict_inode+0x15/0x40 [<ffffffff81167b3d>] evict+0xbd/0x1b0 [<ffffffff8116514a>] __dentry_kill+0x19a/0x1f0 [<ffffffff81165b0e>] dput+0x1fe/0x220 [<ffffffff81150535>] __fput+0x155/0x200 [<ffffffff81079fc0>] task_work_run+0x60/0xa0 [<ffffffff81063510>] do_exit+0x160/0x400 [<ffffffff810637eb>] do_group_exit+0x3b/0xa0 [<ffffffff8106e8bd>] get_signal+0x1ed/0x470 [<ffffffff8100f854>] do_signal+0x14/0x110 [<ffffffff810030e9>] prepare_exit_to_usermode+0xe9/0xf0 [<ffffffff814178a5>] retint_user+0x8/0x13 This is CVE-2016-3961 / XSA-174. Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <JGross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. 
Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: xen-devel <xen-devel@lists.xenproject.org> Link: http://lkml.kernel.org/r/57188ED802000078000E431C@prv-mh.provo.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
WANG Cong authored
commit 205e1e25 upstream. Matt reported that we have a NULL pointer dereference in ppp_pernet() from ppp_connect_channel(), i.e. pch->chan_net is NULL. This happens because a parallel ppp_unregister_channel() can run while we are in ppp_connect_channel(), during which pch->chan_net is set to NULL. Since we need a reference to net per channel, it makes sense to sync the refcnt with the lifetime of the channel, therefore we should release this reference when we destroy it. Fixes: 1f461dcd ("ppp: take reference on channels netns") Reported-by: Matt Bennett <Matt.Bennett@alliedtelesis.co.nz> Cc: Paul Mackerras <paulus@samba.org> Cc: linux-ppp@vger.kernel.org Cc: Guillaume Nault <g.nault@alphalink.fr> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Xiaolong Ye authored
commit 5f25f066 upstream. time_in_state in struct devfreq is defined as unsigned long, so devm_kzalloc should use sizeof(unsigned long) as argument instead of sizeof(unsigned int), otherwise it will cause unexpected results on 64-bit systems. Signed-off-by: Xiaolong Ye <yexl@marvell.com> Signed-off-by: Kevin Liu <kliu5@marvell.com> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
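The class of bug fixed here (element size not matching the pointed-to type) can be avoided by deriving the size from the pointer itself. The following is a small userspace sketch using `calloc()` instead of devm_kzalloc(); `struct freq_stats` and its helpers are hypothetical names, not the devfreq structures.

```c
#include <stdlib.h>

/* Hypothetical statistics holder with one counter per frequency state. */
struct freq_stats {
    unsigned int max_state;
    unsigned long *time_in_state;
};

/* Returns 0 on success, -1 if the allocation fails. */
int freq_stats_init(struct freq_stats *s, unsigned int max_state)
{
    /* sizeof(*s->time_in_state) follows the element type automatically,
     * so the allocation stays correct if the type ever changes. */
    s->time_in_state = calloc(max_state, sizeof(*s->time_in_state));
    if (!s->time_in_state)
        return -1;
    s->max_state = max_state;
    return 0;
}

void freq_stats_free(struct freq_stats *s)
{
    free(s->time_in_state);
    s->time_in_state = NULL;
    s->max_state = 0;
}
```

On an LP64 system `unsigned long` is 8 bytes and `unsigned int` is 4, so sizing the array with the wrong type would allocate only half of what the last elements need.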
-
Ignacio Alvarado authored
commit 1650b4eb upstream. Function user_notifier_unregister should be called only once for each registered user notifier. Function kvm_arch_hardware_disable can be executed from an IPI context which could cause a race condition with a VCPU returning to user mode and attempting to unregister the notifier. Signed-off-by: Ignacio Alvarado <ikalvarado@google.com> Fixes: 18863bdd ("KVM: x86 shared msr infrastructure") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Paolo Bonzini authored
commit 7301d6ab upstream. Reported by syzkaller: [ INFO: suspicious RCU usage. ] 4.9.0-rc4+ #47 Not tainted ------------------------------- ./include/linux/kvm_host.h:536 suspicious rcu_dereference_check() usage! stack backtrace: CPU: 1 PID: 6679 Comm: syz-executor Not tainted 4.9.0-rc4+ #47 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffff880039e2f6d0 ffffffff81c2e46b ffff88003e3a5b40 0000000000000000 0000000000000001 ffffffff83215600 ffff880039e2f700 ffffffff81334ea9 ffffc9000730b000 0000000000000004 ffff88003c4f8420 ffff88003d3f8000 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff81c2e46b>] dump_stack+0xb3/0x118 lib/dump_stack.c:51 [<ffffffff81334ea9>] lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4445 [< inline >] __kvm_memslots include/linux/kvm_host.h:534 [< inline >] kvm_memslots include/linux/kvm_host.h:541 [<ffffffff8105d6ae>] kvm_gfn_to_hva_cache_init+0xa1e/0xce0 virt/kvm/kvm_main.c:1941 [<ffffffff8112685d>] kvm_lapic_set_vapic_addr+0xed/0x140 arch/x86/kvm/lapic.c:2217 Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: fda4e2e8 Cc: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Ido Yariv authored
commit bd768e14 upstream. vcpu->arch.wbinvd_dirty_mask may still be used after freeing it, corrupting memory. For example, the following call trace may set a bit in an already freed cpu mask: kvm_arch_vcpu_load vcpu_load vmx_free_vcpu_nested vmx_free_vcpu kvm_arch_vcpu_free Fix this by deferring freeing of wbinvd_dirty_mask. Signed-off-by: Ido Yariv <ido@wizery.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
James Hogan authored
commit ede5f3e7 upstream. The ERET instruction to return from exception is used for returning from exception level (Status.EXL) and error level (Status.ERL). If both bits are set however we should be returning from ERL first, as ERL can interrupt EXL, for example when an NMI is taken. KVM however checks EXL first. Fix the order of the checks to match the pseudocode in the instruction set manual. Fixes: e685c689 ("KVM/MIPS32: Privileged instruction/target branch emulation.") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
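The check order described above (ERL before EXL) can be sketched as a small emulation routine. This is a simplified model under stated assumptions, not the KVM/MIPS code: `struct guest_cp0` and `emulate_eret()` are hypothetical, and only the Status.EXL (bit 1) and Status.ERL (bit 2) handling is modelled.

```c
#include <stdint.h>

#define ST0_EXL 0x00000002u /* Status.EXL: exception level */
#define ST0_ERL 0x00000004u /* Status.ERL: error level */

/* Hypothetical subset of guest coprocessor 0 state. */
struct guest_cp0 {
    uint32_t status;
    uint32_t epc;
    uint32_t error_epc;
};

/* Emulate ERET: returns 0 and sets *pc on success, -1 if neither
 * level bit is set (ERET outside of an exception or error level). */
int emulate_eret(struct guest_cp0 *cp0, uint32_t *pc)
{
    /* ERL can interrupt EXL, so it has to be unwound first. */
    if (cp0->status & ST0_ERL) {
        cp0->status &= ~ST0_ERL;
        *pc = cp0->error_epc;
        return 0;
    }
    if (cp0->status & ST0_EXL) {
        cp0->status &= ~ST0_EXL;
        *pc = cp0->epc;
        return 0;
    }
    return -1;
}
```

With both bits set, the first ERET resumes at ErrorEPC and leaves EXL set; a second ERET then resumes at EPC.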
-
Radim Krčmář authored
commit dccbfcf5 upstream. If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the write with vmcs02 as the current VMCS. This will incorrectly apply modifications intended for vmcs01 to vmcs02 and L2 can use it to gain access to L0's x2APIC registers by disabling virtualized x2APIC while using msr bitmap that assumes enabled. Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the current VMCS. An alternative solution would temporarily make vmcs01 the current VMCS, but it requires more care. Fixes: 8d14695f ("x86, apicv: add virtual x2apic support") Reported-by: Jim Mattson <jmattson@google.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
James Hogan authored
commit 91e4f1b6 upstream. When a guest TLB entry is replaced by TLBWI or TLBWR, we only invalidate TLB entries on the local CPU. This doesn't work correctly on an SMP host when the guest is migrated to a different physical CPU, as it could pick up stale TLB mappings from the last time the vCPU ran on that physical CPU. Therefore invalidate both user and kernel host ASIDs on other CPUs, which will cause new ASIDs to be generated when it next runs on those CPUs. We're careful only to do this if the TLB entry was already valid, and only for the kernel ASID where the virtual address it mapped is outside of the guest user address range. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: <stable@vger.kernel.org> # 3.10.x- Cc: Jiri Slaby <jslaby@suse.cz> [james.hogan@imgtec.com: Backport to 3.10..3.16] Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
James Hogan authored
commit e1e575f6 upstream. The advancing of the PC when completing an MMIO load is done before re-entering the guest, i.e. before restoring the guest ASID. However if the load is in a branch delay slot it may need to access guest code to read the prior branch instruction. This isn't safe in TLB mapped code at the moment, nor in the future when we'll access unmapped guest segments using direct user accessors too, as it could read the branch from host user memory instead. Therefore calculate the resume PC in advance while we're still in the right context and save it in the new vcpu->arch.io_pc (replacing the no longer needed vcpu->arch.pending_load_cause), and restore it on MMIO completion. Fixes: e685c689 ("KVM/MIPS32: Privileged instruction/target branch emulation.") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: <stable@vger.kernel.org> # 3.10.x-3.16.x: 5f508c43: MIPS: KVM: Fix unused variable build warning Cc: <stable@vger.kernel.org> # 3.10.x-3.16.x Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [james.hogan@imgtec.com: Backport to 3.10..3.16] Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Nicholas Mc Guire authored
commit 5f508c43 upstream. As kvm_mips_complete_mmio_load() did not yet modify PC at this point, as James Hogan <james.hogan@imgtec.com> explained, the curr_pc variable and the comments along with it can be dropped. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Link: http://lkml.org/lkml/2015/5/8/422 Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: kvm@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/9993/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> [james.hogan@imgtec.com: Backport to 3.10..3.16] Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Ondrej Mosnáček authored
commit 50d2e6dc upstream. The cipher block size for GCM is 16 bytes, and thus the CTR transform used in crypto_gcm_setkey() will also expect a 16-byte IV. However, the code currently reserves only 8 bytes for the IV, causing an out-of-bounds access in the CTR transform. This patch fixes the issue by setting the size of the IV buffer to 16 bytes. Fixes: 84c91152 ("[CRYPTO] gcm: Add support for async ciphers") Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Herbert Xu authored
commit acdb04d0 upstream. When we need to allocate a temporary blkcipher_walk_next and it fails, the code is supposed to take the slow path of processing the data block by block. However, due to an unrelated change we instead end up dereferencing the NULL pointer. This patch fixes it by moving the unrelated bsize setting out of the way so that we enter the slow path as intended. Fixes: 7607bd8f ("[CRYPTO] blkcipher: Added blkcipher_walk_virt_block") Cc: stable@vger.kernel.org Reported-by: xiakaixu <xiakaixu@huawei.com> Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Ard Biesheuvel authored
commit 0bd22235 upstream. When calling .import() on a cryptd ahash_request, the structure members that describe the child transform in the shash_desc need to be initialized like they are when calling .init() Cc: stable@vger.kernel.org Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Herbert Xu authored
commit 4f0414e5 upstream. We need to load the TX SG list in sendmsg(2) after waiting for incoming data, not before. Cc: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Herbert Xu authored
commit 1822793a upstream. We need to lock the child socket in skcipher_check_key as otherwise two simultaneous calls can cause the parent socket to be freed. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Herbert Xu authored
commit ad46d7e3 upstream. We need to lock the child socket in hash_check_key as otherwise two simultaneous calls can cause the parent socket to be freed. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Herbert Xu authored
commit a6a48c56 upstream. This patch forbids the calling of bind(2) when there are child sockets created by accept(2) in existence, even if they are created on the nokey path. This is needed as those child sockets have references to the tfm object which bind(2) will destroy. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Herbert Xu authored
commit d7b65aee upstream. This patch removes the custom release parent function as the generic af_alg_release_parent now works for nokey sockets too. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Herbert Xu authored
commit f1d84af1 upstream. This patch removes the custom release parent function as the generic af_alg_release_parent now works for nokey sockets too. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Herbert Xu authored
commit 6a935170 upstream. This patch allows af_alg_release_parent to be called even for nokey sockets. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Herbert Xu authored
commit 6e8d8ecf upstream. This patch adds an exception to the key check so that cipher_null users may continue to use algif_skcipher without setting a key. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Herbert Xu authored
commit a1383cd8 upstream. This patch adds a way for skcipher users to determine whether a key is required by a transform. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Herbert Xu authored
commit 6de62f15 upstream. Hash implementations that require a key may crash if you use them without setting a key. This patch adds the necessary checks so that if you do attempt to use them without a key, we return -ENOKEY instead of proceeding. This patch also adds a compatibility path to support old applications that do accept(2) before setkey. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Herbert Xu authored
commit 00420a65 upstream. The has_key logic is wrong for shash algorithms as they always have a setkey function. So we should instead be testing against shash_no_setkey. Fixes: a5596d63 ("crypto: hash - Add crypto_ahash_has_setkey") Cc: stable@vger.kernel.org Reported-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Herbert Xu authored
commit a5596d63 upstream. This patch adds a way for ahash users to determine whether a key is required by a crypto_ahash transform. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Herbert Xu authored
commit a0fa2d03 upstream. This patch adds a compatibility path to support old applications that do accept(2) before setkey. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Herbert Xu authored
commit 37766586 upstream. This patch adds a compatibility path to support old applications that do accept(2) before setkey. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Herbert Xu authored
commit c840ac6a upstream. Each af_alg parent socket obtained by socket(2) corresponds to a tfm object once bind(2) has succeeded. An accept(2) call on that parent socket creates a context which then uses the tfm object. Therefore as long as any child sockets created by accept(2) exist the parent socket must not be modified or freed. This patch guarantees this by using locks and a reference count on the parent socket. Any attempt to modify the parent socket will fail with EBUSY. Cc: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
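The rule stated in this commit (a parent with live children must not be modified) can be modelled with a plain reference count. The sketch below is single-threaded and deliberately omits the locking the commit also relies on; `struct parent_sock` and the three helpers are hypothetical names and not the af_alg API.

```c
#include <errno.h>

/* Hypothetical parent socket state: number of live children plus a
 * flag standing in for the tfm configuration. */
struct parent_sock {
    unsigned int refcnt;
    int key_set;
};

/* accept(2) analogue: each child pins the parent. */
void parent_accept(struct parent_sock *p)
{
    p->refcnt++;
}

/* Closing a child drops its pin on the parent. */
void child_release(struct parent_sock *p)
{
    if (p->refcnt > 0)
        p->refcnt--;
}

/* setkey/bind analogue: refuse to modify a parent that is in use. */
int parent_setkey(struct parent_sock *p)
{
    if (p->refcnt > 0)
        return -EBUSY;
    p->key_set = 1;
    return 0;
}
```

In real concurrent code the count check and the modification would have to happen under the same lock, otherwise a child could be created between the two steps.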
-
Herbert Xu authored
commit dd504589 upstream. Some cipher implementations will crash if you try to use them without calling setkey first. This patch adds a check so that the accept(2) call will fail with -ENOKEY if setkey hasn't been done on the socket yet. Cc: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
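The key check described above amounts to refusing to create a usable context until a key has been set. A minimal sketch, assuming hypothetical `struct tfm`, `struct ctx` and `alg_accept()` names (the `ENOKEY` fallback definition is only there for platforms whose `<errno.h>` lacks it; 126 is the Linux value):

```c
#include <errno.h>

#ifndef ENOKEY
#define ENOKEY 126 /* Linux value, used only if <errno.h> lacks it */
#endif

/* Hypothetical transform and per-child context. */
struct tfm {
    int has_key;
};

struct ctx {
    const struct tfm *tfm;
};

/* accept(2) analogue: only hand out a context once a key is set. */
int alg_accept(const struct tfm *tfm, struct ctx *child)
{
    if (!tfm->has_key)
        return -ENOKEY;
    child->tfm = tfm;
    return 0;
}
```

Failing early at accept time keeps cipher implementations from ever running with an uninitialized key schedule.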
-
Peter Zijlstra authored
commit ecf7d01c upstream. Oleg noticed that it's possible to falsely observe p->on_cpu == 0 such that we'll prematurely continue with the wakeup and effectively run p on two CPUs at the same time. Even though the overlap is very limited; the task is in the middle of being scheduled out; it could still result in corruption of the scheduler data structures. CPU0 CPU1 set_current_state(...) <preempt_schedule> context_switch(X, Y) prepare_lock_switch(Y) Y->on_cpu = 1; finish_lock_switch(X) store_release(X->on_cpu, 0); try_to_wake_up(X) LOCK(p->pi_lock); t = X->on_cpu; // 0 context_switch(Y, X) prepare_lock_switch(X) X->on_cpu = 1; finish_lock_switch(Y) store_release(Y->on_cpu, 0); </preempt_schedule> schedule(); deactivate_task(X); X->on_rq = 0; if (X->on_rq) // false if (t) while (X->on_cpu) cpu_relax(); context_switch(X, ..) finish_lock_switch(X) store_release(X->on_cpu, 0); Avoid the load of X->on_cpu being hoisted over the X->on_rq load. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
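The ordering requirement (load on_rq first, then on_cpu) can be illustrated with C11 atomics. This is only a userspace analogue under stated assumptions: `struct task` and `waker_must_wait()` are hypothetical, and an acquire fence stands in for the kernel's read barrier; it is not the scheduler code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical task state with the two flags from the race above. */
struct task {
    atomic_int on_rq;
    atomic_int on_cpu;
};

/* Returns true when the waker still has to wait for the task to leave
 * its CPU before continuing with the wakeup. */
bool waker_must_wait(struct task *t)
{
    /* First load: is the task still queued? */
    if (atomic_load_explicit(&t->on_rq, memory_order_relaxed))
        return false; /* queued: handled by the simple wakeup path */

    /* The fence keeps the on_cpu load below from being satisfied
     * before the on_rq load above. */
    atomic_thread_fence(memory_order_acquire);

    /* Second load: only now is on_cpu meaningful. */
    return atomic_load_explicit(&t->on_cpu, memory_order_relaxed) != 0;
}
```

Without the fence, a stale `on_cpu == 0` could be combined with a fresh `on_rq == 0`, which is the false observation the commit message describes.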
-
Balbir Singh authored
commit 135e8c92 upstream. The origin of the issue I've seen is related to a missing memory barrier between check for task->state and the check for task->on_rq. The task being woken up is already awake from a schedule() and is doing the following: do { schedule() set_current_state(TASK_(UN)INTERRUPTIBLE); } while (!cond); The waker actually gets stuck doing the following in try_to_wake_up(): while (p->on_cpu) cpu_relax(); Analysis: The instance I've seen involves the following race: CPU1 CPU2 while () { if (cond) break; do { schedule(); set_current_state(TASK_UN..) } while (!cond); wakeup_routine() spin_lock_irqsave(wait_lock) raw_spin_lock_irqsave(wait_lock) wake_up_process() } try_to_wake_up() set_current_state(TASK_RUNNING); .. list_del(&waiter.list); CPU2 wakes up CPU1, but before it can get the wait_lock and set current state to TASK_RUNNING the following occurs: CPU3 wakeup_routine() raw_spin_lock_irqsave(wait_lock) if (!list_empty) wake_up_process() try_to_wake_up() raw_spin_lock_irqsave(p->pi_lock) .. if (p->on_rq && ttwu_wakeup()) .. while (p->on_cpu) cpu_relax() .. CPU3 tries to wake up the task on CPU1 again since it finds it on the wait_queue, CPU1 is spinning on wait_lock, but immediately after CPU2, CPU3 got it. CPU3 checks the state of p on CPU1, it is TASK_UNINTERRUPTIBLE and the task is spinning on the wait_lock. Interestingly since p->on_rq is checked under pi_lock, I've noticed that try_to_wake_up() finds p->on_rq to be 0. This was the most confusing bit of the analysis, but p->on_rq is changed under runqueue lock, rq_lock, the p->on_rq check is not reliable without this fix IMHO. The race is visible (based on the analysis) only when ttwu_queue() does a remote wakeup via ttwu_queue_remote. In which case the p->on_rq change is not done under the pi_lock.
The result is that after a while the entire system locks up on the raw_spin_irqlock_save(wait_lock) and the holder spins infinitely. Reproduction of the issue: The issue can be reproduced after a long run on my system with 80 threads and having to tweak available memory to very low and running memory stress-ng mmapfork test. It usually takes a long time to reproduce. I am trying to work on a test case that can reproduce the issue faster, but that's work in progress. I am still testing the changes on my still in a loop and the tests seem OK thus far. Big thanks to Benjamin and Nick for helping debug this as well. Ben helped catch the missing barrier, Nick caught every missing bit in my theory. Signed-off-by: Balbir Singh <bsingharora@gmail.com> [ Updated comment to clarify matching barriers. Many architectures do not have a full barrier in switch_to() so that cannot be relied upon. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Piggin <nicholas.piggin@gmail.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e02cce7b-d9ca-1ad0-7a61-ea97c7582b37@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
- 21 Oct, 2016 1 commit
-
-
Willy Tarreau authored
-
- 19 Oct, 2016 2 commits
-
-
Linus Torvalds authored
commit 19be0eaf upstream. This is an ancient bug that was actually attempted to be fixed once (badly) by me eleven years ago in commit 4ceb5db9 ("Fix get_user_pages() race for write access") but that was then undone due to problems on s390 by commit f33ea7f4 ("fix get_user_pages bug"). In the meantime, the s390 situation has long been fixed, and we can now fix it by checking the pte_dirty() bit properly (and do it better). The s390 dirty bit was implemented in abf09bed ("s390/mm: implement software dirty bits") which made it into v3.9. Earlier kernels will have to look at the page state itself. Also, the VM has become more scalable, and what used to be a purely theoretical race back then has become easier to trigger. To fix it, we introduce a new internal FOLL_COW flag to mark the "yes, we already did a COW" rather than play racy games with FOLL_WRITE that is very fundamental, and then use the pte dirty flag to validate that the FOLL_COW flag is still valid. Reported-and-tested-by: Phil "not Paul" Oester <kernel@linuxace.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Michal Hocko <mhocko@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Nick Piggin <npiggin@gmail.com> Cc: Greg Thelen <gthelen@google.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [wt: s/gup.c/memory.c; s/follow_page_pte/follow_page_mask; s/faultin_page/__get_user_page] Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Wei Liu authored
... so that we can make sure the rings are not freed until all SKBs in internal queues are consumed. 1. The VM is receiving packets through bonding + bridge + netback + netfront. 2. For some unknown reason at least one packet remains in the rx queue and is not delivered to the domU immediately by netback. 3. The VM finishes shutting down. 4. The shared ring between dom0 and domU is freed. 5. then xen-netback continues processing the pending requests and tries to put the packet into the now already released shared ring. > XXXlan0: port 9(vif26.0) entered disabled state > BUG: unable to handle kernel paging request at ffffc900108641d8 > IP: [<ffffffffa04147dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback] > PGD 57e20067 PUD 57e21067 PMD 571a7067 PTE 0 > Oops: 0000 [#1] SMP > ... > CPU: 0 PID: 12587 Comm: netback/0 Not tainted 3.10.0-ucs58-amd64 #1 Debian 3.10.11-1.58.201405060908 > Hardware name: FUJITSU PRIMERGY BX620 S6/D3051, BIOS 080015 Rev.3C78.3051 07/22/2011 > task: ffff880004b067c0 ti: ffff8800561ec000 task.ti: ffff8800561ec000 > RIP: e030:[<ffffffffa04147dc>] [<ffffffffa04147dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback] > RSP: e02b:ffff8800561edce8 EFLAGS: 00010202 > RAX: ffffc900104adac0 RBX: ffff8800541e95c0 RCX: ffffc90010864000 > RDX: 000000000000003b RSI: 0000000000000000 RDI: ffff880040014380 > RBP: ffff8800570e6800 R08: 0000000000000000 R09: ffff880004799800 > R10: ffffffff813ca115 R11: ffff88005e4fdb08 R12: ffff880054e6f800 > R13: ffff8800561edd58 R14: ffffc900104a1000 R15: 0000000000000000 > FS: 00007f19a54a8700(0000) GS:ffff88005da00000(0000) knlGS:0000000000000000 > CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: ffffc900108641d8 CR3: 0000000054cb3000 CR4: 0000000000002660 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Stack: > ffff880004b06ba0 0000000000000000 ffff88005da13ec0 ffff88005da13ec0 > 0000000004b067c0 ffffc900104a8ac0 
ffffc900104a1020 000000005da13ec0 > 0000000000000000 0000000000000001 ffffc900104a8ac0 ffffc900104adac0 > Call Trace: > [<ffffffff813ca32d>] ? _raw_spin_lock_irqsave+0x11/0x2f > [<ffffffffa0416033>] ? xen_netbk_kthread+0x174/0x841 [xen_netback] > [<ffffffff8105d373>] ? wake_up_bit+0x20/0x20 > [<ffffffffa0415ebf>] ? xen_netbk_tx_build_gops+0xce8/0xce8 [xen_netback] > [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56 > [<ffffffffa0415ebf>] ? xen_netbk_tx_build_gops+0xce8/0xce8 [xen_netback] > [<ffffffff8105ce1e>] ? kthread+0xab/0xb3 > [<ffffffff81003638>] ? xen_end_context_switch+0xe/0x1c > [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56 > [<ffffffff813cfbfc>] ? ret_from_fork+0x7c/0xb0 > [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56 > Code: 8b b3 d0 00 00 00 48 8b bb d8 00 00 00 0f b7 74 37 02 89 70 08 eb 07 c7 40 08 00 00 00 00 89 d2 c7 40 04 00 00 00 00 48 83 c2 08 <0f> b7 34 d1 89 30 c7 44 24 60 00 00 00 00 8b 44 d1 04 89 44 24 > RIP [<ffffffffa04147dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback] > RSP <ffff8800561edce8> > CR2: ffffc900108641d8 Track the shared ring buffer being unmapped and drop those packets. Ref-count the rings as followed: map -> set to 1 start_xmit -> inc when queueing SKB to internal queue rx_action -> dec after finishing processing a SKB unmap -> dec and wait to be 0 Note that this is different from ref counting the vif structure itself. Currently only guest Rx path is taken care of because that's where the bug surfaced. This bug doesn't exist in kernel >=3.12 as multi-queue support was added there. Link: <https://lists.xenproject.org/archives/html/xen-devel/2014-06/msg00818.html> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Philipp Hahn <hahn@univention.de> Cc: David Vrabel <david.vrabel@citrix.com> Tested-by: Philipp Hahn <hahn@univention.de> Signed-off-by: Willy Tarreau <w@1wt.eu>
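The ring reference counting listed in this commit (map sets the count to 1, start_xmit increments, rx_action decrements, unmap decrements and waits for 0) can be modelled with a C11 atomic counter. This is a sketch under stated assumptions: `struct ring_state` and the helper names are hypothetical, and the blocking wait in unmap is replaced by a predicate the caller would poll.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-ring state: one reference for the mapping itself
 * plus one per SKB sitting in the internal queue. */
struct ring_state {
    atomic_int refs;
};

/* map: the mapping holds the initial reference. */
void ring_map(struct ring_state *r)
{
    atomic_init(&r->refs, 1);
}

/* start_xmit: an SKB was queued towards this ring. */
void ring_queue_skb(struct ring_state *r)
{
    atomic_fetch_add(&r->refs, 1);
}

/* rx_action: processing of one queued SKB has finished. */
void ring_skb_done(struct ring_state *r)
{
    atomic_fetch_sub(&r->refs, 1);
}

/* unmap, step 1: drop the mapping's own reference. */
void ring_unmap_begin(struct ring_state *r)
{
    atomic_fetch_sub(&r->refs, 1);
}

/* unmap, step 2: the ring may be freed only once nothing refers to it. */
bool ring_can_free(struct ring_state *r)
{
    return atomic_load(&r->refs) == 0;
}
```

The teardown path would call `ring_unmap_begin()` and then wait until `ring_can_free()` holds, so queued packets are either processed or dropped before the shared ring memory goes away.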
-