- 22 Aug, 2018 1 commit
-
-
Alexey Kodanev authored
[ Upstream commit 61ef4b07 ] The shift of 'cwnd' by the '(now - hc->tx_lsndtime) / hc->tx_rto' value can lead to undefined behavior [1]. To fix this, use a gradual shift of the window with a 'while' loop, similar to what tcp_cwnd_restart() is doing. When comparing delta and RTO there is a minor difference between TCP and DCCP: the latter also invokes dccp_cwnd_restart() and reduces 'cwnd' if delta equals RTO. That case is preserved in this change.

[1]:
[40850.963623] UBSAN: Undefined behaviour in net/dccp/ccids/ccid2.c:237:7
[40851.043858] shift exponent 67 is too large for 32-bit type 'unsigned int'
[40851.127163] CPU: 3 PID: 15940 Comm: netstress Tainted: G W E 4.18.0-rc7.x86_64 #1
...
[40851.377176] Call Trace:
[40851.408503] dump_stack+0xf1/0x17b
[40851.451331] ? show_regs_print_info+0x5/0x5
[40851.503555] ubsan_epilogue+0x9/0x7c
[40851.548363] __ubsan_handle_shift_out_of_bounds+0x25b/0x2b4
[40851.617109] ? __ubsan_handle_load_invalid_value+0x18f/0x18f
[40851.686796] ? xfrm4_output_finish+0x80/0x80
[40851.739827] ? lock_downgrade+0x6d0/0x6d0
[40851.789744] ? xfrm4_prepare_output+0x160/0x160
[40851.845912] ? ip_queue_xmit+0x810/0x1db0
[40851.895845] ? ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
[40851.963530] ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
[40852.029063] dccp_xmit_packet+0x1d3/0x720 [dccp]
[40852.086254] dccp_write_xmit+0x116/0x1d0 [dccp]
[40852.142412] dccp_sendmsg+0x428/0xb20 [dccp]
[40852.195454] ? inet_dccp_listen+0x200/0x200 [dccp]
[40852.254833] ? sched_clock+0x5/0x10
[40852.298508] ? sched_clock+0x5/0x10
[40852.342194] ? inet_create+0xdf0/0xdf0
[40852.388988] sock_sendmsg+0xd9/0x160
...

Fixes: 113ced1f ("dccp ccid-2: Perform congestion-window validation") Signed-off-by:
Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
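For illustration, a minimal standalone sketch of the gradual window reduction described above (plain C, not the kernel code; names and values are illustrative):

#include <stdint.h>
#include <stdio.h>

/* Halve the window once per elapsed RTO instead of shifting by
 * delta / rto in one go, which is undefined behaviour for a 32-bit
 * type once the exponent reaches 32 (the report above shows 67).
 * The ">= 0" keeps the DCCP behaviour of shrinking when delta == RTO. */
static uint32_t cwnd_restart(uint32_t cwnd, uint32_t restart_cwnd,
                             int64_t delta, uint32_t rto)
{
        while ((delta -= rto) >= 0 && cwnd > restart_cwnd)
                cwnd >>= 1;
        return cwnd > restart_cwnd ? cwnd : restart_cwnd;
}

int main(void)
{
        /* 67 RTOs of idle time: the old code would have shifted by 67 */
        printf("%u\n", cwnd_restart(100, 2, 67LL * 200, 200));
        return 0;
}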
-
- 18 Aug, 2018 2 commits
-
-
Greg Kroah-Hartman authored
-
Sean Christopherson authored
commit f19f5c49 upstream. It turns out that we should *not* invert all not-present mappings, because the all zeroes case is obviously special. clear_page() does not undergo the XOR logic to invert the address bits, i.e. PTE, PMD and PUD entries that have not been individually written will have val=0 and so will trigger __pte_needs_invert(). As a result, {pte,pmd,pud}_pfn() will return the wrong PFN value, i.e. all ones (adjusted by the max PFN mask) instead of zero. A zeroed entry is ok because the page at physical address 0 is reserved early in boot specifically to mitigate L1TF, so explicitly exempt such entries from the inversion when reading the PFN. Manifested as an unexpected mprotect(..., PROT_NONE) failure when called on a VMA that has VM_PFNMAP and was mmap'd as something other than PROT_NONE but never used. mprotect() sends the PROT_NONE request down prot_none_walk(), which walks the PTEs to check the PFNs. prot_none_pte_entry() gets the bogus PFN from pte_pfn() and returns -EACCES because it thinks mprotect() is trying to adjust a high MMIO address. [ This is a very modified version of Sean's original patch, but all credit goes to Sean for doing this and also pointing out that sometimes the __pte_needs_invert() function only gets the protection bits, not the full eventual pte. But zero remains special even in just protection bits, so that's ok. - Linus ] Fixes: f22cc87f ("x86/speculation/l1tf: Invert all not present mappings") Signed-off-by:
Sean Christopherson <sean.j.christopherson@intel.com> Acked-by:
Andi Kleen <ak@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
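A minimal sketch of the exemption described above (the real helper lives in the x86 headers; the macro value here is illustrative):

#include <stdbool.h>
#include <stdint.h>

#define _PAGE_PRESENT 0x001ULL

/* Only invert entries that are non-present *and* non-zero: a never-written
 * (all zeroes) PTE/PMD/PUD must keep reporting PFN 0, not all ones. */
static inline bool pte_needs_invert(uint64_t val)
{
        return val && !(val & _PAGE_PRESENT);
}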
-
- 17 Aug, 2018 14 commits
-
-
Greg Kroah-Hartman authored
-
Toshi Kani authored
commit 5e0fb5df upstream. ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates a pud / pmd map. The following preconditions are met at their entry.
 - All pte entries for a target pud/pmd address range have been cleared.
 - System-wide TLB purges have been performed for a target pud/pmd address range.
The preconditions assure that there is no stale TLB entry for the range. Speculation may not cache TLB entries since it requires all levels of page entries, including ptes, to have P & A-bits set for an associated address. However, speculation may cache pud/pmd entries (paging-structure caches) when they have the P-bit set. Add a system-wide TLB purge (INVLPG) to a single page after clearing the pud/pmd entry's P-bit. SDM 4.10.4.1, Operations that Invalidate TLBs and Paging-Structure Caches, states that: INVLPG invalidates all paging-structure caches associated with the current PCID regardless of the linear addresses to which they correspond. Fixes: 28ee90fe ("x86/mm: implement free pmd/pte page interfaces") Signed-off-by:
Toshi Kani <toshi.kani@hpe.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: mhocko@suse.com Cc: akpm@linux-foundation.org Cc: hpa@zytor.com Cc: cpandya@codeaurora.org Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: Joerg Roedel <joro@8bytes.org> Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
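A hedged sketch of the flow described above, modelled on the x86 helper this commit touches (not the verbatim kernel code):

/* After all ptes of the range are cleared and the pmd entry's P-bit is
 * dropped, flush one page of the range so the paging-structure caches
 * that may have been filled from the old pte table are invalidated too. */
int pmd_free_pte_page_sketch(pmd_t *pmd, unsigned long addr)
{
        pte_t *pte;

        if (pmd_none(*pmd))
                return 1;               /* nothing mapped, nothing to free */

        pte = (pte_t *)pmd_page_vaddr(*pmd);
        pmd_clear(pmd);

        /* single-page kernel TLB purge, as the SDM quote above requires */
        flush_tlb_kernel_range(addr, addr + PAGE_SIZE - 1);

        free_page((unsigned long)pte);
        return 1;
}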
-
Chintan Pandya authored
commit 785a19f9 upstream. The following kernel panic was observed on an ARM64 platform due to a stale TLB entry.
 1. ioremap with 4K size, a valid pte page table is set.
 2. iounmap it, its pte entry is set to 0.
 3. ioremap the same address with 2M size, update its pmd entry with a new value.
 4. CPU may hit an exception because the old pmd entry is still in TLB, which leads to a kernel panic.
Commit b6bdb751 ("mm/vmalloc: add interfaces to free unmapped page table") has addressed this panic by falling back to pte mappings in the above case on ARM64. To support pmd mappings in all cases, TLB purge needs to be performed in this case on ARM64. Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page() so that TLB purge can be added later in separate patches. [toshi.kani@hpe.com: merge changes, rewrite patch description] Fixes: 28ee90fe ("x86/mm: implement free pmd/pte page interfaces") Signed-off-by:
Chintan Pandya <cpandya@codeaurora.org> Signed-off-by:
Toshi Kani <toshi.kani@hpe.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: mhocko@suse.com Cc: akpm@linux-foundation.org Cc: hpa@zytor.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: Will Deacon <will.deacon@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20180627141348.21777-3-toshi.kani@hpe.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mark Salyzyn authored
commit 7992c188 upstream. CVE-2018-9363 The buffer length is unsigned at all layers, but gets cast to int and checked in hidp_process_report(), which can lead to a buffer overflow. Switch the len parameter to unsigned int to resolve the issue. This affects 3.18 and newer kernels. Signed-off-by:
Mark Salyzyn <salyzyn@android.com> Fixes: a4b1b587 ("HID: Bluetooth: hidp: make sure input buffers are big enough") Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Kees Cook <keescook@chromium.org> Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com> Cc: linux-bluetooth@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: security@kernel.org Cc: kernel-team@android.com Acked-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Marcel Holtmann <marcel@holtmann.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
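A standalone illustration of the signed/unsigned problem described above (the helper names here are made up; the real fix simply changes the type of the len parameter of hidp_process_report()):

#include <stdio.h>

/* A length that is huge as an unsigned value turns negative once cast to
 * int (implementation-defined, negative on common ABIs), so a signed
 * "len <= max" clamp wrongly passes, and a later copy of 'len' bytes
 * (converted back to a huge size_t) overflows the buffer. */
static int passes_signed_check(int len, int max)
{
        return len <= max;      /* wrongly true for negative (huge) lengths */
}

static int passes_unsigned_check(unsigned int len, unsigned int max)
{
        return len <= max;      /* correctly false for huge lengths */
}

int main(void)
{
        unsigned int reported = 0x80000010u;    /* attacker-controlled size */

        printf("signed: %d, unsigned: %d\n",
               passes_signed_check((int)reported, 4096),
               passes_unsigned_check(reported, 4096));
        return 0;
}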
-
Thierry Escande authored
commit 3bbda5a3 upstream. If the ts3a227e audio accessory detection hardware is present and its driver probed, the jack needs to be created before enabling jack detection in the ts3a227e driver. With this patch, the jack is instantiated in the max98090 headset init function if the ts3a227e is present. This fixes a null pointer dereference, as the jack detection enabling function in the ts3a227e driver was called before the jack was created. [minor correction to keep error handling on jack creation the same as before by Pierre Bossart] Signed-off-by:
Thierry Escande <thierry.escande@collabora.com> Signed-off-by:
Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Acked-By:
Vinod Koul <vinod.koul@intel.com> Signed-off-by:
Mark Brown <broonie@kernel.org> Signed-off-by:
Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Biggers authored
commit 318abdfb upstream. Like the skcipher_walk and blkcipher_walk cases: scatterwalk_done() is only meant to be called after a nonzero number of bytes have been processed, since scatterwalk_pagedone() will flush the dcache of the *previous* page. But in the error case of ablkcipher_walk_done(), e.g. if the input wasn't an integer number of blocks, scatterwalk_done() was actually called after advancing 0 bytes. This caused a crash ("BUG: unable to handle kernel paging request") during '!PageSlab(page)' on architectures like arm and arm64 that define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was page-aligned as in that case walk->offset == 0. Fix it by reorganizing ablkcipher_walk_done() to skip the scatterwalk_advance() and scatterwalk_done() if an error has occurred. Reported-by:
Liu Chao <liuchao741@huawei.com> Fixes: bf06099d ("crypto: skcipher - Add ablkcipher_walk interfaces") Cc: <stable@vger.kernel.org> # v2.6.35+ Signed-off-by:
Eric Biggers <ebiggers@google.com> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
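A simplified, hedged sketch of the reorganised control flow described above (not the verbatim crypto/ablkcipher.c code; only the error-path structure is the point):

static int walk_done_sketch(struct ablkcipher_walk *walk, int err)
{
        if (err >= 0) {
                unsigned int n = walk->nbytes - err;    /* bytes processed */
                bool more = walk->total != n;

                scatterwalk_advance(&walk->in, n);
                scatterwalk_advance(&walk->out, n);
                scatterwalk_done(&walk->in, 0, more);
                scatterwalk_done(&walk->out, 1, more);
        }
        /* err < 0: skip scatterwalk_advance()/scatterwalk_done() entirely,
         * so scatterwalk_pagedone() never flushes a page after a 0-byte step */
        return err;
}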
-
Eric Biggers authored
commit 0868def3 upstream. Like the skcipher_walk case: scatterwalk_done() is only meant to be called after a nonzero number of bytes have been processed, since scatterwalk_pagedone() will flush the dcache of the *previous* page. But in the error case of blkcipher_walk_done(), e.g. if the input wasn't an integer number of blocks, scatterwalk_done() was actually called after advancing 0 bytes. This caused a crash ("BUG: unable to handle kernel paging request") during '!PageSlab(page)' on architectures like arm and arm64 that define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was page-aligned as in that case walk->offset == 0. Fix it by reorganizing blkcipher_walk_done() to skip the scatterwalk_advance() and scatterwalk_done() if an error has occurred. This bug was found by syzkaller fuzzing. Reproducer, assuming ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE:

#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>

int main()
{
        struct sockaddr_alg addr = {
                .salg_type = "skcipher",
                .salg_name = "ecb(aes-generic)",
        };
        char buffer[4096] __attribute__((aligned(4096))) = { 0 };
        int fd;

        fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
        bind(fd, (void *)&addr, sizeof(addr));
        setsockopt(fd, SOL_ALG, ALG_SET_KEY, buffer, 16);
        fd = accept(fd, NULL, NULL);
        write(fd, buffer, 15);
        read(fd, buffer, 15);
}

Reported-by:
Liu Chao <liuchao741@huawei.com> Fixes: 5cde0af2 ("[CRYPTO] cipher: Added block cipher type") Cc: <stable@vger.kernel.org> # v2.6.19+ Signed-off-by:
Eric Biggers <ebiggers@google.com> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Biggers authored
commit bb296481 upstream. syzbot reported a crash in vmac_final() when multiple threads concurrently use the same "vmac(aes)" transform through AF_ALG. The bug is pretty fundamental: the VMAC template doesn't separate per-request state from per-tfm (per-key) state like the other hash algorithms do, but rather stores it all in the tfm context. That's wrong. Also, vmac_final() incorrectly zeroes most of the state including the derived keys and cached pseudorandom pad. Therefore, only the first VMAC invocation with a given key calculates the correct digest. Fix these bugs by splitting the per-tfm state from the per-request state and using the proper init/update/final sequencing for requests. Reproducer for the crash:

#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>

int main()
{
        int fd;
        struct sockaddr_alg addr = {
                .salg_type = "hash",
                .salg_name = "vmac(aes)",
        };
        char buf[256] = { 0 };

        fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
        bind(fd, (void *)&addr, sizeof(addr));
        setsockopt(fd, SOL_ALG, ALG_SET_KEY, buf, 16);
        fork();
        fd = accept(fd, NULL, NULL);
        for (;;)
                write(fd, buf, 256);
}

The immediate cause of the crash is that vmac_ctx_t.partial_size exceeds VMAC_NHBYTES, causing vmac_final() to memset() a negative length. Reported-by: syzbot+264bca3a6e8d645550d3@syzkaller.appspotmail.com Fixes: f1939f7c ("crypto: vmac - New hash algorithm for intel_txt support") Cc: <stable@vger.kernel.org> # v2.6.32+ Signed-off-by:
Eric Biggers <ebiggers@google.com> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Biggers authored
commit 73bf20ef upstream. The VMAC template assumes the block cipher has a 128-bit block size, but it failed to check for that. Thus it was possible to instantiate it using a 64-bit block size cipher, e.g. "vmac(cast5)", causing uninitialized memory to be used. Add the needed check when instantiating the template. Fixes: f1939f7c ("crypto: vmac - New hash algorithm for intel_txt support") Cc: <stable@vger.kernel.org> # v2.6.32+ Signed-off-by:
Eric Biggers <ebiggers@google.com> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
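A hedged sketch of the added check (the real change lives in vmac_create() in crypto/vmac.c; the helper name here is illustrative):

/* VMAC is only defined for 128-bit block ciphers such as AES; reject
 * anything else at template-instantiation time instead of reading
 * uninitialised pad/key material later. */
static int vmac_check_blocksize(const struct crypto_alg *alg)
{
        return alg->cra_blocksize == 16 ? 0 : -EINVAL;
}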
-
Randy Dunlap authored
commit 934193a6 upstream. Verify that 'depmod' ($DEPMOD) is installed. This is a partial revert of commit 620c231c ("kbuild: do not check for ancient modutils tools"). Also update Documentation/process/changes.rst to refer to kmod instead of module-init-tools. Fixes kernel bugzilla #198965: https://bugzilla.kernel.org/show_bug.cgi?id=198965 Signed-off-by:
Randy Dunlap <rdunlap@infradead.org> Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: Lucas De Marchi <lucas.de.marchi@gmail.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Jessica Yu <jeyu@kernel.org> Cc: Chih-Wei Huang <cwhuang@linux.org.tw> Cc: stable@vger.kernel.org # any kernel since 2012 Signed-off-by:
Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Liwei Song authored
commit 17e83549 upstream. Fix the following kernel bug:

kernel BUG at drivers/iommu/intel-iommu.c:3260!
invalid opcode: 0000 [#5] PREEMPT SMP
Hardware name: Intel Corp. Harcuvar/Server, BIOS HAVLCRB0.X64.0013.D39.1608311820 08/31/2016
task: ffff880175389950 ti: ffff880176bec000 task.ti: ffff880176bec000
RIP: 0010:[<ffffffff8150a83b>] [<ffffffff8150a83b>] intel_unmap+0x25b/0x260
RSP: 0018:ffff880176bef5e8 EFLAGS: 00010296
RAX: 0000000000000024 RBX: ffff8800773c7c88 RCX: 000000000000ce04
RDX: 0000000080000000 RSI: 0000000000000000 RDI: 0000000000000009
RBP: ffff880176bef638 R08: 0000000000000010 R09: 0000000000000004
R10: ffff880175389c78 R11: 0000000000000a4f R12: ffff8800773c7868
R13: 00000000ffffac88 R14: ffff8800773c7818 R15: 0000000000000001
FS: 00007fef21258700(0000) GS:ffff88017b5c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000066d6d8 CR3: 000000007118c000 CR4: 00000000003406e0
Stack:
 00000000ffffac88 ffffffff8199867f ffff880176bef5f8 ffff880100000030
 ffff880176bef668 ffff8800773c7c88 ffff880178288098 ffff8800772c0010
 ffff8800773c7818 0000000000000001 ffff880176bef648 ffffffff8150a86e
Call Trace:
 [<ffffffff8199867f>] ? printk+0x46/0x48
 [<ffffffff8150a86e>] intel_unmap_page+0xe/0x10
 [<ffffffffa039d99b>] ismt_access+0x27b/0x8fa [i2c_ismt]
 [<ffffffff81554420>] ? __pm_runtime_suspend+0xa0/0xa0
 [<ffffffff815544a0>] ? pm_suspend_timer_fn+0x80/0x80
 [<ffffffff81554420>] ? __pm_runtime_suspend+0xa0/0xa0
 [<ffffffff815544a0>] ? pm_suspend_timer_fn+0x80/0x80
 [<ffffffff8143dfd0>] ? pci_bus_read_dev_vendor_id+0xf0/0xf0
 [<ffffffff8172b36c>] i2c_smbus_xfer+0xec/0x4b0
 [<ffffffff810aa4d5>] ? vprintk_emit+0x345/0x530
 [<ffffffffa038936b>] i2cdev_ioctl_smbus+0x12b/0x240 [i2c_dev]
 [<ffffffff810aa829>] ? vprintk_default+0x29/0x40
 [<ffffffffa0389b33>] i2cdev_ioctl+0x63/0x1ec [i2c_dev]
 [<ffffffff811b04c8>] do_vfs_ioctl+0x328/0x5d0
 [<ffffffff8119d8ec>] ? vfs_write+0x11c/0x190
 [<ffffffff8109d449>] ? rt_up_read+0x19/0x20
 [<ffffffff811b07f1>] SyS_ioctl+0x81/0xa0
 [<ffffffff819a351b>] system_call_fastpath+0x16/0x6e

This happens when running "i2cdetect -y 0" to detect the SMBus iSMT adapter. After an I2C block read/write finishes, when the data buffer is unmapped, a wrong device address is passed to dma_unmap_single(). To fix this, give dma_unmap_single() the "dev" parameter, just like what dma_map_single() does, so the unmap can find the right device. Fixes: 13f35ac1 ("i2c: Adding support for Intel iSMT SMBus 2.0 host controller") Signed-off-by:
Liwei Song <liwei.song@windriver.com> Reviewed-by:
Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by:
Wolfram Sang <wsa@the-dreams.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
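A hedged sketch of the map/unmap pairing the fix restores ('dev', 'buf', 'len' and the direction are illustrative):

static int ismt_dma_roundtrip_sketch(struct device *dev, void *buf, size_t len)
{
        dma_addr_t dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

        if (dma_mapping_error(dev, dma))
                return -ENOMEM;

        /* ... the I2C block transfer would run here ... */

        /* must use the same struct device that was used for the mapping,
         * otherwise intel-iommu cannot find the IOVA and hits its BUG_ON */
        dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE);
        return 0;
}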
-
Andrey Konovalov authored
commit 0e410e15 upstream. With KASAN enabled the kernel has two different memset() functions, one with KASAN checks (memset) and one without (__memset). KASAN uses some macro tricks to use the proper version where required. For example memset() calls in mm/slub.c are without KASAN checks, since they operate on poisoned slab object metadata. The issue is that clang emits memset() calls even when there is no memset() in the source code. They get linked with the improper memset() implementation and the kernel fails to boot due to a huge number of KASAN reports during early boot stages. The solution is to add the -fno-builtin flag for files with the KASAN_SANITIZE := n marker. Link: http://lkml.kernel.org/r/8ffecfffe04088c52c42b92739c2bd8a0bcb3f5e.1516384594.git.andreyknvl@google.com Signed-off-by:
Andrey Konovalov <andreyknvl@google.com> Acked-by:
Nick Desaulniers <ndesaulniers@google.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> [ Nick : Backported to 4.4 avoiding KUBSAN ] Signed-off-by:
Nick Desaulniers <ndesaulniers@google.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Takashi Iwai authored
The 4.4.y stable backport dc6ae4df for the upstream commit 3d4bf93a ("tcp: detect malicious patterns in tcp_collapse_ofo_queue()") missed a line that enlarges the range_truesize value, which broke the whole check. Fixes: dc6ae4df ("tcp: detect malicious patterns in tcp_collapse_ofo_queue()") Signed-off-by:
Takashi Iwai <tiwai@suse.de> Cc: Michal Kubecek <mkubecek@suse.cz>
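A hedged sketch of the check in question, reconstructed from the upstream commit rather than the 4.4 backport itself, with the accumulation the backport had dropped:

/* Inside the walk over the out-of-order queue: every skb of the candidate
 * range must contribute to range_truesize, otherwise the comparison below
 * never sees the bloat it is supposed to detect. */
range_truesize += skb->truesize;        /* the line the backport missed */

/* Collapse the range if its truesize grew beyond the head skb alone or
 * the sequence span got large enough. */
if (range_truesize != head->truesize ||
    end - start >= SKB_WITH_OVERHEAD(SK_MEM_QUANTUM))
        tcp_collapse(sk, &tp->out_of_order_queue, head, skb, start, end);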
-
Toshi Kani authored
commit f967db0b upstream. ioremap() supports pmd mappings on x86-PAE. However, the kernel's pmd tables are not shared among processes on x86-PAE. Therefore, any update to sync'd pmd entries needs re-syncing. Freeing a pte page also leads to a vmalloc fault and hits the BUG_ON in vmalloc_sync_one(). Disable free page handling on x86-PAE. pud_free_pmd_page() and pmd_free_pte_page() simply return 0 if a given pud/pmd entry is present. This assures that ioremap() does not update sync'd pmd entries at the cost of falling back to pte mappings. Fixes: 28ee90fe ("x86/mm: implement free pmd/pte page interfaces") Reported-by:
Joerg Roedel <joro@8bytes.org> Signed-off-by:
Toshi Kani <toshi.kani@hpe.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: mhocko@suse.com Cc: akpm@linux-foundation.org Cc: hpa@zytor.com Cc: cpandya@codeaurora.org Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20180627141348.21777-2-toshi.kani@hpe.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
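A hedged sketch of the x86-32/PAE behaviour described above (the 'addr' argument comes from the earlier patch in this series and is unused here):

/* Report "not freeable" whenever a lower-level table is present, so
 * ioremap() falls back to pte mappings instead of rewriting a pmd entry
 * that other processes' page tables would still need to re-sync. */
int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
        return pud_none(*pud);
}

int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
{
        return pmd_none(*pmd);
}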
-
- 15 Aug, 2018 23 commits
-
-
Greg Kroah-Hartman authored
-
Jiri Kosina authored
commit 6c26fcd2 upstream. pfn_modify_allowed() and arch_has_pfn_modify_check() are outside of the !__ASSEMBLY__ section in include/asm-generic/pgtable.h, which confuses the assembler on archs that don't have __HAVE_ARCH_PFN_MODIFY_ALLOWED (e.g. ia64) and breaks the build:

include/asm-generic/pgtable.h: Assembler messages:
include/asm-generic/pgtable.h:538: Error: Unknown opcode `static inline bool pfn_modify_allowed(unsigned long pfn,pgprot_t prot)'
include/asm-generic/pgtable.h:540: Error: Unknown opcode `return true'
include/asm-generic/pgtable.h:543: Error: Unknown opcode `static inline bool arch_has_pfn_modify_check(void)'
include/asm-generic/pgtable.h:545: Error: Unknown opcode `return false'
arch/ia64/kernel/entry.S:69: Error: `mov' does not fit into bundle

Move those two static inlines into the !__ASSEMBLY__ section so that they don't confuse the asm build pass. Fixes: 42e4089c ("x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings") Signed-off-by:
Jiri Kosina <jkosina@suse.cz> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> [groeck: Context changes] Signed-off-by:
Guenter Roeck <linux@roeck-us.net>
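One way to arrange the guards described above, using the default stubs quoted in the error log (a sketch, not the exact hunk):

#ifndef __ASSEMBLY__
#ifndef __HAVE_ARCH_PFN_MODIFY_ALLOWED
static inline bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)
{
        return true;
}

static inline bool arch_has_pfn_modify_check(void)
{
        return false;
}
#endif  /* !__HAVE_ARCH_PFN_MODIFY_ALLOWED */
#endif  /* !__ASSEMBLY__ */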
-
Vlastimil Babka authored
commit 792adb90 upstream. The introduction of generic_max_swapfile_size and arch-specific versions has broken linking on x86 with CONFIG_SWAP=n due to undefined reference to 'generic_max_swapfile_size'. Fix it by compiling the x86-specific max_swapfile_size() only with CONFIG_SWAP=y. Reported-by:
Tomas Pruzina <pruzinat@gmail.com> Fixes: 377eeaa8 ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2") Signed-off-by:
Vlastimil Babka <vbabka@suse.cz> Cc: stable@vger.kernel.org Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Guenter Roeck authored
In linux-4.4.y, the definition of X86_FEATURE_RETPOLINE and X86_FEATURE_RETPOLINE_AMD is different from the upstream definition. Result is an overlap with the newly introduced X86_FEATURE_L1TF_PTEINV. Update RETPOLINE definitions to match upstream definitions to improve alignment with upstream code. Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andi Kleen authored
commit 1063711b upstream The mmio tracer sets io mapping PTEs and PMDs to non present when enabled without inverting the address bits, which makes the PTE entry vulnerable to L1TF. Make it use the right low level macros to actually invert the address bits to protect against L1TF. In principle this could be avoided because MMIO tracing is not likely to be enabled on production machines, but the fix is straightforward and for consistency's sake it's better to get rid of the open coded PTE manipulation. Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andi Kleen authored
commit 958f79b9 upstream set_memory_np() is used to mark kernel mappings not present, but it has its own open coded mechanism which does not have the L1TF protection of inverting the address bits. Replace the open coded PTE manipulation with the L1TF protecting low level PTE routines. Passes the CPA self test. Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> [ dwmw2: Pull in pud_mkhuge() from commit a00cc7d9, and pfn_pud() ] Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> [groeck: port to 4.4] Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andi Kleen authored
commit 0768f915 upstream Some cases in THP like:
 - MADV_FREE
 - mprotect
 - split
mark the PMD temporarily non-present to prevent races. The window for an L1TF attack in these contexts is very small, but it wants to be fixed for correctness' sake. Use the proper low level functions for pmd/pud_mknotpresent() to address this. Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andi Kleen authored
commit f22cc87f upstream For kernel mappings PAGE_PROTNONE is not necessarily set for a non present mapping, but the inversion logic explicitly checks for !PRESENT and PROT_NONE. Remove the PROT_NONE check and make the inversion unconditional for all not present mappings. Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michal Hocko authored
commit e14d7dfb upstream Jan has noticed that pte_pfn and co. resp. pfn_pte are incorrect for CONFIG_PAE because phys_addr_t is wider than unsigned long and so the pte_val resp. shift left would get truncated. Fix this up by using proper types. [dwmw2: Backport to 4.9] Fixes: 6b28baca ("x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation") Reported-by:
Jan Beulich <JBeulich@suse.com> Signed-off-by:
Michal Hocko <mhocko@suse.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vlastimil Babka authored
commit 0d0f6249 upstream The PAE 3-level paging code currently doesn't mitigate L1TF by flipping the offset bits, and uses the high PTE word, thus bits 32-36 for type, 37-63 for offset. The lower word is zeroed, thus systems with less than 4GB memory are safe. With 4GB to 128GB the swap type selects the memory locations vulnerable to L1TF; with even more memory, also the swap offset influences the address. This might be a problem with 32bit PAE guests running on large 64bit hosts. By continuing to keep the whole swap entry in either the high or the low 32bit word of the PTE we would limit the swap size too much. Thus this patch uses the whole PAE PTE with the same layout as the 64bit version does. The macros just become a bit tricky since they assume the arch-dependent swp_entry_t to be 32bit. Signed-off-by:
Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by:
Michal Hocko <mhocko@suse.com> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konrad Rzeszutek Wilk authored
commit 11e34e64 upstream 336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR (IA32_FLUSH_CMD) which is detected by the CPUID.7.EDX[28]=1 bit being set. This new MSR "gives software a way to invalidate structures with finer granularity than other architectural methods like WBINVD." A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=199511 Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vlastimil Babka authored
commit 1a7ed1ba upstream The previous patch has limited swap file size so that large offsets cannot clear bits above MAX_PA/2 in the pte and interfere with L1TF mitigation. It assumed that offsets are encoded starting with bit 12, same as pfn. But on x86_64, offsets are encoded starting with bit 9. Thus the limit can be raised by 3 bits. That means 16TB with 42bit MAX_PA and 256TB with 46bit MAX_PA. Fixes: 377eeaa8 ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2") Signed-off-by:
Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konrad Rzeszutek Wilk authored
commit 56563f53 upstream The pr_warn in l1tf_select_mitigation would have used the prior pr_fmt which was defined as "Spectre V2 : ". Move the function to be past SSBD and also define the pr_fmt. Fixes: 17dbca11 ("x86/speculation/l1tf: Add sysfs reporting for l1tf") Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andi Kleen authored
commit 377eeaa8 upstream For the L1TF workaround it's necessary to limit the swap file size to below MAX_PA/2, so that the higher bits of the swap offset inverted never point to valid memory. Add a mechanism for the architecture to override the swap file size check in swapfile.c and add an x86-specific max swapfile check function that enforces that limit. The check is only enabled if the CPU is vulnerable to L1TF. In VMs with 42bit MAX_PA the typical limit is 2TB now, on a native system with 46bit PA it is 32TB. The limit is only per individual swap file, so it's always possible to exceed these limits with multiple swap files or partitions. Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Josh Poimboeuf <jpoimboe@redhat.com> Acked-by:
Michal Hocko <mhocko@suse.com> Acked-by:
Dave Hansen <dave.hansen@intel.com> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
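A hedged sketch of the x86 override described above (the generic helper returns the largest offset a swap pte can encode; exact constants differ per configuration):

unsigned long max_swapfile_size_sketch(void)
{
        unsigned long pages = generic_max_swapfile_size();

        if (boot_cpu_has_bug(X86_BUG_L1TF)) {
                /* Limit the swap file size to MAX_PA/2, so that an inverted
                 * swap offset can never point into valid memory. */
                pages = min_t(unsigned long, l1tf_pfn_limit() + 1, pages);
        }
        return pages;
}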
-
Andi Kleen authored
commit 42e4089c upstream For L1TF PROT_NONE mappings are protected by inverting the PFN in the page table entry. This sets the high bits in the CPU's address space, thus making sure an unmapped entry does not point to valid cached memory. Some server system BIOSes put the MMIO mappings high up in the physical address space. If such a high mapping was mapped to unprivileged users they could attack low memory by setting such a mapping to PROT_NONE. This could happen through a special device driver which is not access protected. Normal /dev/mem is of course access protected. To avoid this, forbid PROT_NONE mappings or mprotect for high MMIO mappings. Valid page mappings are allowed because the system is then unsafe anyway. It's not expected that users commonly use PROT_NONE on MMIO. But to minimize any impact this is only enforced if the mapping actually refers to a high MMIO address (defined as the MAX_PA-1 bit being set), and also skip the check for root. For mmaps this is straightforward and can be handled in vm_insert_pfn and in remap_pfn_range(). For mprotect it's a bit trickier. At the point where the actual PTEs are accessed a lot of state has been changed and it would be difficult to undo on an error. Since this is an uncommon case, use a separate early page table walk pass for MMIO PROT_NONE mappings that checks for this condition early. For non MMIO and non PROT_NONE there are no changes. [dwmw2: Backport to 4.9] [groeck: Backport to 4.4] Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Josh Poimboeuf <jpoimboe@redhat.com> Acked-by:
Dave Hansen <dave.hansen@intel.com> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
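A hedged sketch of the policy described above, modelled on the x86 helper introduced by this series (not copied verbatim):

bool pfn_modify_allowed_sketch(unsigned long pfn, pgprot_t prot)
{
        if (!boot_cpu_has_bug(X86_BUG_L1TF))
                return true;
        if (!__pte_needs_invert(pgprot_val(prot)))
                return true;    /* only inverted (PROT_NONE) entries matter */
        if (pfn_valid(pfn))
                return true;    /* real memory: the system is unsafe anyway */
        if (pfn >= l1tf_pfn_limit() && !capable(CAP_SYS_ADMIN))
                return false;   /* high MMIO and not root: refuse */
        return true;
}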
-
Dan Williams authored
commit 87744ab3 upstream vm_insert_mixed(), unlike vm_insert_pfn_prot() and vmf_insert_pfn_pmd(), fails to check the pgprot_t it uses for the mapping against the one recorded in the memtype tracking tree. Add the missing call to track_pfn_insert() to preclude cases where incompatible aliased mappings are established for a given physical address range. [groeck: Backport to v4.4.y] Link: http://lkml.kernel.org/r/147328717909.35069.14256589123570653697.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andy Lutomirski authored
commit 1745cbc5 upstream The x86 vvar vma contains pages with differing cacheability flags. x86 currently implements this by manually inserting all the ptes using (io_)remap_pfn_range when the vma is set up. x86 wants to move to using .fault with VM_FAULT_NOPAGE to set up the mappings as needed. The correct API to use to insert a pfn in .fault is vm_insert_pfn(), but vm_insert_pfn() can't override the vma's cache mode, and the HPET page in particular needs to be uncached despite the fact that the rest of the VMA is cached. Add vm_insert_pfn_prot() to support varying cacheability within the same non-COW VMA in a more sane manner. x86 could alternatively use multiple VMAs, but that's messy, would break CRIU, and would create unnecessary VMAs that would waste memory. Signed-off-by:
Andy Lutomirski <luto@kernel.org> Reviewed-by:
Kees Cook <keescook@chromium.org> Acked-by:
Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/d2938d1eb37be7a5e4f86182db646551f11e45aa.1451446564.git.luto@kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
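A hedged usage sketch of the new helper from a .fault handler ('hpet_pfn' is illustrative; this is not the actual vdso code):

static int map_uncached_page_sketch(struct vm_area_struct *vma,
                                    unsigned long addr, unsigned long hpet_pfn)
{
        /* One uncached page inside an otherwise cached, non-COW VMA. */
        if (vm_insert_pfn_prot(vma, addr, hpet_pfn,
                               pgprot_noncached(vma->vm_page_prot)))
                return VM_FAULT_SIGBUS;
        return VM_FAULT_NOPAGE;
}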
-
Andi Kleen authored
commit 17dbca11 upstream L1TF core kernel workarounds are cheap and normally always enabled. However, they should still be reported in sysfs if the system is vulnerable or mitigated. Add the necessary CPU feature/bug bits.
 - Extend the existing checks for Meltdowns to determine if the system is vulnerable. All CPUs which are not vulnerable to Meltdown are also not vulnerable to L1TF.
 - Check for 32bit non PAE and emit a warning as there is no practical way for mitigation due to the limited physical address bits.
 - If the system has more than MAX_PA/2 physical memory the invert page workarounds don't protect the system against the L1TF attack anymore, because an inverted physical address will also point to valid memory. Print a warning in this case and report that the system is vulnerable.
Add a function which returns the PFN limit for the L1TF mitigation, which will be used in follow up patches for sanity and range checks. [ tglx: Renamed the CPU feature bit to L1TF_PTEINV ] [ dwmw2: Backport to 4.9 (cpufeatures.h, E820) ] Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Josh Poimboeuf <jpoimboe@redhat.com> Acked-by:
Dave Hansen <dave.hansen@intel.com> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andi Kleen authored
commit 10a70416 upstream The L1TF workaround doesn't make any attempt to mitigate speculative accesses to the first physical page for zeroed PTEs. Normally it only contains some data from the early real mode BIOS. It's not entirely clear that the first page is reserved in all configurations, so add an extra reservation call to make sure it is really reserved. In most configurations (e.g. with the standard reservations) it's likely a nop. Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Josh Poimboeuf <jpoimboe@redhat.com> Acked-by:
Dave Hansen <dave.hansen@intel.com> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andi Kleen authored
commit 6b28baca upstream When PTEs are set to PROT_NONE the kernel just clears the Present bit and preserves the PFN, which creates attack surface for L1TF speculation attacks. This is important inside guests, because L1TF speculation bypasses physical page remapping. While the host has its own mitigations preventing leaking data from other VMs into the guest, this would still risk leaking the wrong page inside the current guest. This uses the same technique as Linus' swap entry patch: while an entry is in PROTNONE state, invert the complete PFN part of it. This ensures that the highest bit will point to non existing memory. The invert is done by pte/pmd_modify and pfn/pmd/pud_pte for PROTNONE and pte/pmd/pud_pfn undo it. This assumes that no code path touches the PFN part of a PTE directly without using these primitives. This doesn't handle the case that MMIO is on the top of the CPU physical memory. If such an MMIO region was exposed by an unprivileged driver for mmap it would be possible to attack some real memory. However this situation is all rather unlikely. For 32bit non PAE the inversion is not done because there are really not enough bits to protect anything. Q: Why does the guest need to be protected when the HyperVisor already has L1TF mitigations? A: Here's an example: Physical pages 1 and 2 get mapped into a guest as
 GPA 1 -> PA 2
 GPA 2 -> PA 1
through EPT. The L1TF speculation ignores the EPT remapping. Now the guest kernel maps GPA 1 to process A and GPA 2 to process B, and they belong to different users and should be isolated. A sets the GPA 1 PA 2 PTE to PROT_NONE to bypass the EPT remapping and gets read access to the underlying physical page. Which in this case points to PA 2, so it can read process B's data, if it happened to be in L1, so isolation inside the guest is broken. There's nothing the hypervisor can do about this. This mitigation has to be done in the guest itself. [ tglx: Massaged changelog ] [ dwmw2: backported to 4.9 ] Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Josh Poimboeuf <jpoimboe@redhat.com> Acked-by:
Michal Hocko <mhocko@suse.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Acked-by:
Dave Hansen <dave.hansen@intel.com> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
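A standalone illustration of the inversion scheme described above (plain C; the mask and shift are illustrative, not the kernel's macros):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define PFN_MASK   0x000ffffffffff000ULL        /* illustrative PFN bits */

/* Store the PFN inverted while the entry is PROT_NONE (Present clear),
 * so speculation sees an address in non-existent memory; invert again
 * when reading it back. */
static uint64_t mk_protnone(uint64_t pfn)
{
        return ~(pfn << PAGE_SHIFT) & PFN_MASK;
}

static uint64_t protnone_pfn(uint64_t entry)
{
        return (~entry & PFN_MASK) >> PAGE_SHIFT;
}

int main(void)
{
        uint64_t pfn = 0x1234;

        printf("stored 0x%llx, read back 0x%llx\n",
               (unsigned long long)mk_protnone(pfn),
               (unsigned long long)protnone_pfn(mk_protnone(pfn)));
        return 0;
}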
-
Linus Torvalds authored
commit 2f22b4cd upstream With L1 terminal fault the CPU speculates into unmapped PTEs, and resulting side effects allow reading the memory the PTE is pointing to, if its values are still in the L1 cache. For swapped out pages Linux uses unmapped PTEs and stores a swap entry into them. To protect against L1TF it must be ensured that the swap entry is not pointing to valid memory, which requires setting higher bits (between bit 36 and bit 45) that are inside the CPUs physical address space, but outside any real memory. To do this invert the offset to make sure the higher bits are always set, as long as the swap file is not too big. Note there is no workaround for 32bit !PAE, or on systems which have more than MAX_PA/2 worth of memory. The latter case is very unlikely to happen on real systems. [AK: updated description and minor tweaks. Split out from the original patch ] Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Andi Kleen <ak@linux.intel.com> Reviewed-by:
Josh Poimboeuf <jpoimboe@redhat.com> Acked-by:
Michal Hocko <mhocko@suse.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Acked-by:
Dave Hansen <dave.hansen@intel.com> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Torvalds authored
commit bcd11afa upstream If pages are swapped out, the swap entry is stored in the corresponding PTE, which has the Present bit cleared. CPUs vulnerable to L1TF speculate on PTE entries which have the present bit set and would treat the swap entry as a physical address (PFN). To mitigate that, the upper bits of the PTE must be set so the PTE points to non existent memory. The swap entry stores the type and the offset of a swapped out page in the PTE. type is stored in bit 9-13 and offset in bit 14-63. The hardware ignores the bits beyond the physical address space limit, so to make the mitigation effective it's required to start 'offset' at the lowest possible bit so that even large swap offsets do not reach into the physical address space limit bits. Move offset to bit 9-58 and type to bit 59-63 which are the bits that hardware generally doesn't care about. That, in turn, means that if you are on a desktop chip with only 40 bits of physical addressing, now that the offset starts at bit 9, there needs to be 30 bits of offset actually *in use* until bit 39 ends up being set, which means when inverted it will again point into existing memory. So that's 4 terabytes of swap space (because the offset is counted in pages, so 30 bits of offset is 42 bits of actual coverage). With bigger physical addressing, that obviously grows further, until the limit of the offset is hit (at 50 bits of offset - 62 bits of actual swap file coverage). This is a preparatory change for the actual swap entry inversion to protect against L1TF. [ AK: Updated description and minor tweaks. Split into two parts ] [ tglx: Massaged changelog ] Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Andi Kleen <ak@linux.intel.com> Reviewed-by:
Josh Poimboeuf <jpoimboe@redhat.com> Acked-by:
Michal Hocko <mhocko@suse.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Acked-by:
Dave Hansen <dave.hansen@intel.com> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
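The 4 TB figure above can be reproduced with a few lines of arithmetic (illustrative only):

#include <stdio.h>

int main(void)
{
        /* 40-bit physical addressing, offset starting at bit 9: bit 39 is
         * only reached once 30 bits of offset are in use.  The offset is
         * counted in 4 KiB pages, so that is 2^30 * 4 KiB of swap coverage. */
        unsigned long long pages = 1ULL << (39 - 9);

        printf("%llu TiB\n", (pages << 12) >> 40);      /* prints 4 */
        return 0;
}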
-
Naoya Horiguchi authored
commit eee4818b upstream _PAGE_PSE is used to distinguish between a truly non-present (_PAGE_PRESENT=0) PMD, and a PMD which is undergoing a THP split and should be treated as present. But _PAGE_SWP_SOFT_DIRTY currently uses the _PAGE_PSE bit, which would cause confusion between one of those PMDs undergoing a THP split, and a soft-dirty PMD. Dropping the _PAGE_PSE check in pmd_present() does not work well, because it can hurt optimization of tlb handling in thp split. Thus, we need to move the bit. In the current kernel, bits 1-4 are not used in the non-present format since commit 00839ee3 ("x86/mm: Move swap offset/type up in PTE to work around erratum"). So let's move _PAGE_SWP_SOFT_DIRTY to bit 1. Bit 7 is used as reserved (always clear), so please don't use it for any other purpose. [dwmw2: Pulled in to 4.9 backport to support L1TF changes] Link: http://lkml.kernel.org/r/20170717193955.20207-3-zi.yan@sent.com Signed-off-by:
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by:
Zi Yan <zi.yan@cs.rutgers.edu> Acked-by:
Dave Hansen <dave.hansen@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: David Nellans <dnellans@nvidia.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Minchan Kim <minchan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-