- 22 May, 2018 40 commits
-
-
Konrad Rzeszutek Wilk authored
commit e96f46ee upstream The style for the 'status' file is CamelCase or this. _. Fixes: fae1fa0f ("proc: Provide details on speculation flaw mitigations") Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konrad Rzeszutek Wilk authored
commit 9f65fb29 upstream Intel collateral will reference the SSB mitigation bit in IA32_SPEC_CTL[2] as SSBD (Speculative Store Bypass Disable). Hence changing it. It is unclear yet what the MSR_IA32_ARCH_CAPABILITIES (0x10a) Bit(4) name is going to be. Following the rename it would be SSBD_NO but that rolls out to Speculative Store Bypass Disable No. Also fixed the missing space in X86_FEATURE_AMD_SSBD. [ tglx: Fixup x86_amd_rds_enable() and rds_tif_to_amd_ls_cfg() as well ] Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kees Cook authored
commit f21b53b2 upstream Unless explicitly opted out of, anything running under seccomp will have SSB mitigations enabled. Choosing the "prctl" mode will disable this. [ tglx: Adjusted it to the new arch_seccomp_spec_mitigate() mechanism ] Signed-off-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Gleixner authored
commit 8bf37d8c upstream The migitation control is simpler to implement in architecture code as it avoids the extra function call to check the mode. Aside of that having an explicit seccomp enabled mode in the architecture mitigations would require even more workarounds. Move it into architecture code and provide a weak function in the seccomp code. Remove the 'which' argument as this allows the architecture to decide which mitigations are relevant for seccomp. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kees Cook authored
commit 00a02d0c upstream If a seccomp user is not interested in Speculative Store Bypass mitigation by default, it can set the new SECCOMP_FILTER_FLAG_SPEC_ALLOW flag when adding filters. Signed-off-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Gleixner authored
commit b849a812 upstream Use PR_SPEC_FORCE_DISABLE in seccomp() because seccomp does not allow to widen restrictions. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Gleixner authored
commit 356e4bff upstream For certain use cases it is desired to enforce mitigations so they cannot be undone afterwards. That's important for loader stubs which want to prevent a child from disabling the mitigation again. Will also be used for seccomp(). The extra state preserving of the prctl state for SSB is a preparatory step for EBPF dymanic speculation control. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kees Cook authored
commit f9544b2b upstream There's no reason for these to be changed after boot. Signed-off-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kees Cook authored
commit 5c307089 upstream When speculation flaw mitigations are opt-in (via prctl), using seccomp will automatically opt-in to these protections, since using seccomp indicates at least some level of sandboxing is desired. Signed-off-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kees Cook authored
commit fae1fa0f upstream As done with seccomp and no_new_privs, also show speculation flaw mitigation state in /proc/$pid/status. Signed-off-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kees Cook authored
commit 7bbf1373 upstream Adjust arch_prctl_get/set_spec_ctrl() to operate on tasks other than current. This is needed both for /proc/$pid/status queries and for seccomp (since thread-syncing can trigger seccomp in non-current threads). Signed-off-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Gleixner authored
commit a73ec77e upstream Add prctl based control for Speculative Store Bypass mitigation and make it the default mitigation for Intel and AMD. Andi Kleen provided the following rationale (slightly redacted): There are multiple levels of impact of Speculative Store Bypass: 1) JITed sandbox. It cannot invoke system calls, but can do PRIME+PROBE and may have call interfaces to other code 2) Native code process. No protection inside the process at this level. 3) Kernel. 4) Between processes. The prctl tries to protect against case (1) doing attacks. If the untrusted code can do random system calls then control is already lost in a much worse way. So there needs to be system call protection in some way (using a JIT not allowing them or seccomp). Or rather if the process can subvert its environment somehow to do the prctl it can already execute arbitrary code, which is much worse than SSB. To put it differently, the point of the prctl is to not allow JITed code to read data it shouldn't read from its JITed sandbox. If it already has escaped its sandbox then it can already read everything it wants in its address space, and do much worse. The ability to control Speculative Store Bypass allows to enable the protection selectively without affecting overall system performance. Based on an initial patch from Tim Chen. Completely rewritten. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Gleixner authored
commit 885f82bf upstream The Speculative Store Bypass vulnerability can be mitigated with the Reduced Data Speculation (RDS) feature. To allow finer grained control of this eventually expensive mitigation a per task mitigation control is required. Add a new TIF_RDS flag and put it into the group of TIF flags which are evaluated for mismatch in switch_to(). If these bits differ in the previous and the next task, then the slow path function __switch_to_xtra() is invoked. Implement the TIF_RDS dependent mitigation control in the slow path. If the prctl for controlling Speculative Store Bypass is disabled or no task uses the prctl then there is no overhead in the switch_to() fast path. Update the KVM related speculation control functions to take TID_RDS into account as well. Based on a patch from Tim Chen. Completely rewritten. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Gleixner authored
commit b617cfc8 upstream Add two new prctls to control aspects of speculation related vulnerabilites and their mitigations to provide finer grained control over performance impacting mitigations. PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature which is selected with arg2 of prctl(2). The return value uses bit 0-2 with the following meaning: Bit Define Description 0 PR_SPEC_PRCTL Mitigation can be controlled per task by PR_SET_SPECULATION_CTRL 1 PR_SPEC_ENABLE The speculation feature is enabled, mitigation is disabled 2 PR_SPEC_DISABLE The speculation feature is disabled, mitigation is enabled If all bits are 0 the CPU is not affected by the speculation misfeature. If PR_SPEC_PRCTL is set, then the per task control of the mitigation is available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation misfeature will fail. PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which is selected by arg2 of prctl(2) per task. arg3 is used to hand in the control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE. The common return values are: EINVAL prctl is not implemented by the architecture or the unused prctl() arguments are not 0 ENODEV arg2 is selecting a not supported speculation misfeature PR_SET_SPECULATION_CTRL has these additional return values: ERANGE arg3 is incorrect, i.e. it's not either PR_SPEC_ENABLE or PR_SPEC_DISABLE ENXIO prctl control of the selected speculation misfeature is disabled The first supported controlable speculation misfeature is PR_SPEC_STORE_BYPASS. Add the define so this can be shared between architectures. Based on an initial patch from Tim Chen and mostly rewritten. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Gleixner authored
commit 28a27752 upstream Having everything in nospec-branch.h creates a hell of dependencies when adding the prctl based switching mechanism. Move everything which is not required in nospec-branch.h to spec-ctrl.h and fix up the includes in the relevant files. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konrad Rzeszutek Wilk authored
commit da39556f upstream Expose the CPUID.7.EDX[31] bit to the guest, and also guard against various combinations of SPEC_CTRL MSR values. The handling of the MSR (to take into account the host value of SPEC_CTRL Bit(2)) is taken care of in patch: KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konrad Rzeszutek Wilk authored
commit 764f3c21 upstream AMD does not need the Speculative Store Bypass mitigation to be enabled. The parameters for this are already available and can be done via MSR C001_1020. Each family uses a different bit in that MSR for this. [ tglx: Expose the bit mask via a variable and move the actual MSR fiddling into the bugs code as that's the right thing to do and also required to prepare for dynamic enable/disable ] Suggested-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konrad Rzeszutek Wilk authored
commit 1115a859 upstream Intel and AMD SPEC_CTRL (0x48) MSR semantics may differ in the future (or in fact use different MSRs for the same functionality). As such a run-time mechanism is required to whitelist the appropriate MSR values. [ tglx: Made the variable __ro_after_init ] Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konrad Rzeszutek Wilk authored
commit 77243971 upstream Intel CPUs expose methods to: - Detect whether RDS capability is available via CPUID.7.0.EDX[31], - The SPEC_CTRL MSR(0x48), bit 2 set to enable RDS. - MSR_IA32_ARCH_CAPABILITIES, Bit(4) no need to enable RRS. With that in mind if spec_store_bypass_disable=[auto,on] is selected set at boot-time the SPEC_CTRL MSR to enable RDS if the platform requires it. Note that this does not fix the KVM case where the SPEC_CTRL is exposed to guests which can muck with it, see patch titled : KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS. And for the firmware (IBRS to be set), see patch titled: x86/spectre_v2: Read SPEC_CTRL MSR during boot and re-use reserved bits [ tglx: Distangled it from the intel implementation and kept the call order ] Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konrad Rzeszutek Wilk authored
commit 24f7fc83 upstream Contemporary high performance processors use a common industry-wide optimization known as "Speculative Store Bypass" in which loads from addresses to which a recent store has occurred may (speculatively) see an older value. Intel refers to this feature as "Memory Disambiguation" which is part of their "Smart Memory Access" capability. Memory Disambiguation can expose a cache side-channel attack against such speculatively read values. An attacker can create exploit code that allows them to read memory outside of a sandbox environment (for example, malicious JavaScript in a web page), or to perform more complex attacks against code running within the same privilege level, e.g. via the stack. As a first step to mitigate against such attacks, provide two boot command line control knobs: nospec_store_bypass_disable spec_store_bypass_disable=[off,auto,on] By default affected x86 processors will power on with Speculative Store Bypass enabled. Hence the provided kernel parameters are written from the point of view of whether to enable a mitigation or not. The parameters are as follows: - auto - Kernel detects whether your CPU model contains an implementation of Speculative Store Bypass and picks the most appropriate mitigation. - on - disable Speculative Store Bypass - off - enable Speculative Store Bypass [ tglx: Reordered the checks so that the whole evaluation is not done when the CPU does not support RDS ] Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konrad Rzeszutek Wilk authored
commit 0cc5fa00 upstream Add the CPU feature bit CPUID.7.0.EDX[31] which indicates whether the CPU supports Reduced Data Speculation. [ tglx: Split it out from a later patch ] Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konrad Rzeszutek Wilk authored
commit c456442c upstream Add the sysfs file for the new vulerability. It does not do much except show the words 'Vulnerable' for recent x86 cores. Intel cores prior to family 6 are known not to be vulnerable, and so are some Atoms and some Xeon Phi. It assumes that older Cyrix, Centaur, etc. cores are immune. Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konrad Rzeszutek Wilk authored
commit 5cf68754 upstream A guest may modify the SPEC_CTRL MSR from the value used by the kernel. Since the kernel doesn't use IBRS, this means a value of zero is what is needed in the host. But the 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to the other bits as reserved so the kernel should respect the boot time SPEC_CTRL value and use that. This allows to deal with future extensions to the SPEC_CTRL interface if any at all. Note: This uses wrmsrl() instead of native_wrmsl(). I does not make any difference as paravirt will over-write the callq *0xfff.. with the wrmsrl assembler code. Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konrad Rzeszutek Wilk authored
commit 1b86883c upstream The 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to all the other bits as reserved. The Intel SDM glossary defines reserved as implementation specific - aka unknown. As such at bootup this must be taken it into account and proper masking for the bits in use applied. A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=199511 [ tglx: Made x86_spec_ctrl_base __ro_after_init ] Suggested-by:
Jon Masters <jcm@redhat.com> Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konrad Rzeszutek Wilk authored
commit d1059518 upstream Those SysFS functions have a similar preamble, as such make common code to handle them. Suggested-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konrad Rzeszutek Wilk authored
commit 4a28bfe3 upstream Combine the various logic which goes through all those x86_cpu_id matching structures in one function. Suggested-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Torvalds authored
commit 1aa7a573 upstream The macro is not type safe and I did look for why that "g" constraint for the asm doesn't work: it's because the asm is more fundamentally wrong. It does movl %[val], %%eax but "val" isn't a 32-bit value, so then gcc will pass it in a register, and generate code like movl %rsi, %eax and gas will complain about a nonsensical 'mov' instruction (it's moving a 64-bit register to a 32-bit one). Passing it through memory will just hide the real bug - gcc still thinks the memory location is 64-bit, but the "movl" will only load the first 32 bits and it all happens to work because x86 is little-endian. Convert it to a type safe inline function with a little trick which hands the feature into the ALTERNATIVE macro. Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Liu Bo authored
commit 02a3307a upstream. If a btree block, aka. extent buffer, is not available in the extent buffer cache, it'll be read out from the disk instead, i.e. btrfs_search_slot() read_block_for_search() # hold parent and its lock, go to read child btrfs_release_path() read_tree_block() # read child Unfortunately, the parent lock got released before reading child, so commit 5bdd3536 ("Btrfs: Fix block generation verification race") had used 0 as parent transid to read the child block. It forces read_tree_block() not to check if parent transid is different with the generation id of the child that it reads out from disk. A simple PoC is included in btrfs/124, 0. A two-disk raid1 btrfs, 1. Right after mkfs.btrfs, block A is allocated to be device tree's root. 2. Mount this filesystem and put it in use, after a while, device tree's root got COW but block A hasn't been allocated/overwritten yet. 3. Umount it and reload the btrfs module to remove both disks from the global @fs_devices list. 4. mount -odegraded dev1 and write some data, so now block A is allocated to be a leaf in checksum tree. Note that only dev1 has the latest metadata of this filesystem. 5. Umount it and mount it again normally (with both disks), since raid1 can pick up one disk by the writer task's pid, if btrfs_search_slot() needs to read block A, dev2 which does NOT have the latest metadata might be read for block A, then we got a stale block A. 6. As parent transid is not checked, block A is marked as uptodate and put into the extent buffer cache, so the future search won't bother to read disk again, which means it'll make changes on this stale one and make it dirty and flush it onto disk. To avoid the problem, parent transid needs to be passed to read_tree_block(). In order to get a valid parent transid, we need to hold the parent's lock until finishing reading child. This patch needs to be slightly adapted for stable kernels, the &first_key parameter added to read_tree_block() is from 4.16+ (581c1760). The fix is to replace 0 by 'gen'. Fixes: 5bdd3536 ("Btrfs: Fix block generation verification race") CC: stable@vger.kernel.org # 4.4+ Signed-off-by:
Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by:
Filipe Manana <fdmanana@suse.com> Reviewed-by:
Qu Wenruo <wqu@suse.com> [ update changelog ] Signed-off-by:
David Sterba <dsterba@suse.com> Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nikolay Borisov authored
commit fe816d0f upstream. When a transaction is aborted btrfs_cleanup_transaction is called to cleanup all the various in-flight bits and pieces which migth be active. One of those is delalloc inodes - inodes which have dirty pages which haven't been persisted yet. Currently the process of freeing such delalloc inodes in exceptional circumstances such as transaction abort boiled down to calling btrfs_invalidate_inodes whose sole job is to invalidate the dentries for all inodes related to a root. This is in fact wrong and insufficient since such delalloc inodes will likely have pending pages or ordered-extents and will be linked to the sb->s_inode_list. This means that unmounting a btrfs instance with an aborted transaction could potentially lead inodes/their pages visible to the system long after their superblock has been freed. This in turn leads to a "use-after-free" situation once page shrink is triggered. This situation could be simulated by running generic/019 which would cause such inodes to be left hanging, followed by generic/176 which causes memory pressure and page eviction which lead to touching the freed super block instance. This situation is additionally detected by the unmount code of VFS with the following message: "VFS: Busy inodes after unmount of Self-destruct in 5 seconds. Have a nice day..." Additionally btrfs hits WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree)); in free_fs_root for the same reason. This patch aims to rectify the sitaution by doing the following: 1. Change btrfs_destroy_delalloc_inodes so that it calls invalidate_inode_pages2 for every inode on the delalloc list, this ensures that all the pages of the inode are released. This function boils down to calling btrfs_releasepage. During test I observed cases where inodes on the delalloc list were having an i_count of 0, so this necessitates using igrab to be sure we are working on a non-freed inode. 2. Since calling btrfs_releasepage might queue delayed iputs move the call out to btrfs_cleanup_transaction in btrfs_error_commit_super before calling run_delayed_iputs for the last time. This is necessary to ensure that delayed iputs are run. Note: this patch is tagged for 4.14 stable but the fix applies to older versions too but needs to be backported manually due to conflicts. CC: stable@vger.kernel.org # 4.14.x: 2b877331: btrfs: Split btrfs_del_delalloc_inode into 2 functions CC: stable@vger.kernel.org # 4.14.x Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> [ add comment to igrab ] Signed-off-by:
David Sterba <dsterba@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nikolay Borisov authored
commit 2b877331 upstream. This is in preparation of fixing delalloc inodes leakage on transaction abort. Also export the new function. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Reviewed-by:
Anand Jain <anand.jain@oracle.com> Signed-off-by:
David Sterba <dsterba@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Anand Jain authored
commit 02ee654d upstream. We set the BTRFS_BALANCE_RESUME flag in the btrfs_recover_balance() only, which isn't called during the remount. So when resuming from the paused balance we hit the bug: kernel: kernel BUG at fs/btrfs/volumes.c:3890! :: kernel: balance_kthread+0x51/0x60 [btrfs] kernel: kthread+0x111/0x130 :: kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP: ffffba7d0090bde8 Reproducer: On a mounted filesystem: btrfs balance start --full-balance /btrfs btrfs balance pause /btrfs mount -o remount,ro /dev/sdb /btrfs mount -o remount,rw /dev/sdb /btrfs To fix this set the BTRFS_BALANCE_RESUME flag in btrfs_resume_balance_async(). CC: stable@vger.kernel.org # 4.4+ Signed-off-by:
Anand Jain <anand.jain@oracle.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Misono Tomohiro authored
commit 1a63c198 upstream. Incompat flag of LZO/ZSTD compression should be set at: 1. mount time (-o compress/compress-force) 2. when defrag is done 3. when property is set Currently 3. is missing and this commit adds this. This could lead to a filesystem that uses ZSTD but is not marked as such. If a kernel without a ZSTD support encounteres a ZSTD compressed extent, it will handle that but this could be confusing to the user. Typically the filesystem is mounted with the ZSTD option, but the discrepancy can arise when a filesystem is never mounted with ZSTD and then the property on some file is set (and some new extents are written). A simple mount with -o compress=zstd will fix that up on an unpatched kernel. Same goes for LZO, but this has been around for a very long time (2.6.37) so it's unlikely that a pre-LZO kernel would be used. Fixes: 5c1aab1d ("btrfs: Add zstd support") CC: stable@vger.kernel.org # 4.14+ Signed-off-by:
Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> Reviewed-by:
Anand Jain <anand.jain@oracle.com> Reviewed-by:
David Sterba <dsterba@suse.com> [ add user visible impact ] Signed-off-by:
David Sterba <dsterba@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Robbie Ko authored
commit 6f2f0b39 upstream. [BUG] btrfs incremental send BUG happens when creating a snapshot of snapshot that is being used by send. [REASON] The problem can happen if while we are doing a send one of the snapshots used (parent or send) is snapshotted, because snapshoting implies COWing the root of the source subvolume/snapshot. 1. When doing an incremental send, the send process will get the commit roots from the parent and send snapshots, and add references to them through extent_buffer_get(). 2. When a snapshot/subvolume is snapshotted, its root node is COWed (transaction.c:create_pending_snapshot()). 3. COWing releases the space used by the node immediately, through: __btrfs_cow_block() --btrfs_free_tree_block() ----btrfs_add_free_space(bytenr of node) 4. Because send doesn't hold a transaction open, it's possible that the transaction used to create the snapshot commits, switches the commit root and the old space used by the previous root node gets assigned to some other node allocation. Allocation of a new node will use the existing extent buffer found in memory, which we previously got a reference through extent_buffer_get(), and allow the extent buffer's content (pages) to be modified: btrfs_alloc_tree_block --btrfs_reserve_extent ----find_free_extent (get bytenr of old node) --btrfs_init_new_buffer (use bytenr of old node) ----btrfs_find_create_tree_block ------alloc_extent_buffer --------find_extent_buffer (get old node) 5. So send can access invalid memory content and have unpredictable behaviour. [FIX] So we fix the problem by copying the commit roots of the send and parent snapshots and use those copies. CallTrace looks like this: ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:1861! invalid opcode: 0000 [#1] SMP CPU: 6 PID: 24235 Comm: btrfs Tainted: P O 3.10.105 #23721 ffff88046652d680 ti: ffff88041b720000 task.ti: ffff88041b720000 RIP: 0010:[<ffffffffa08dd0e8>] read_node_slot+0x108/0x110 [btrfs] RSP: 0018:ffff88041b723b68 EFLAGS: 00010246 RAX: ffff88043ca6b000 RBX: ffff88041b723c50 RCX: ffff880000000000 RDX: 000000000000004c RSI: ffff880314b133f8 RDI: ffff880458b24000 RBP: 0000000000000000 R08: 0000000000000001 R09: ffff88041b723c66 R10: 0000000000000001 R11: 0000000000001000 R12: ffff8803f3e48890 R13: ffff8803f3e48880 R14: ffff880466351800 R15: 0000000000000001 FS: 00007f8c321dc8c0(0000) GS:ffff88047fcc0000(0000) CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 R2: 00007efd1006d000 CR3: 0000000213a24000 CR4: 00000000003407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: ffff88041b723c50 ffff8803f3e48880 ffff8803f3e48890 ffff8803f3e48880 ffff880466351800 0000000000000001 ffffffffa08dd9d7 ffff88041b723c50 ffff8803f3e48880 ffff88041b723c66 ffffffffa08dde85 a9ff88042d2c4400 Call Trace: [<ffffffffa08dd9d7>] ? tree_move_down.isra.33+0x27/0x50 [btrfs] [<ffffffffa08dde85>] ? tree_advance+0xb5/0xc0 [btrfs] [<ffffffffa08e83d4>] ? btrfs_compare_trees+0x2d4/0x760 [btrfs] [<ffffffffa0982050>] ? finish_inode_if_needed+0x870/0x870 [btrfs] [<ffffffffa09841ea>] ? btrfs_ioctl_send+0xeda/0x1050 [btrfs] [<ffffffffa094bd3d>] ? btrfs_ioctl+0x1e3d/0x33f0 [btrfs] [<ffffffff81111133>] ? handle_pte_fault+0x373/0x990 [<ffffffff8153a096>] ? atomic_notifier_call_chain+0x16/0x20 [<ffffffff81063256>] ? set_task_cpu+0xb6/0x1d0 [<ffffffff811122c3>] ? handle_mm_fault+0x143/0x2a0 [<ffffffff81539cc0>] ? __do_page_fault+0x1d0/0x500 [<ffffffff81062f07>] ? check_preempt_curr+0x57/0x90 [<ffffffff8115075a>] ? do_vfs_ioctl+0x4aa/0x990 [<ffffffff81034f83>] ? do_fork+0x113/0x3b0 [<ffffffff812dd7d7>] ? trace_hardirqs_off_thunk+0x3a/0x6c [<ffffffff81150cc8>] ? SyS_ioctl+0x88/0xa0 [<ffffffff8153e422>] ? system_call_fastpath+0x16/0x1b ---[ end trace 29576629ee80b2e1 ]--- Fixes: 7069830a ("Btrfs: add btrfs_compare_trees function") CC: stable@vger.kernel.org # 3.6+ Signed-off-by:
Robbie Ko <robbieko@synology.com> Reviewed-by:
Filipe Manana <fdmanana@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Filipe Manana authored
commit 9a8fca62 upstream. If a file has xattrs, we fsync it, to ensure we clear the flags BTRFS_INODE_NEEDS_FULL_SYNC and BTRFS_INODE_COPY_EVERYTHING from its inode, the current transaction commits and then we fsync it (without either of those bits being set in its inode), we end up not logging all its xattrs. This results in deleting all xattrs when replying the log after a power failure. Trivial reproducer $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ touch /mnt/foobar $ setfattr -n user.xa -v qwerty /mnt/foobar $ xfs_io -c "fsync" /mnt/foobar $ sync $ xfs_io -c "pwrite -S 0xab 0 64K" /mnt/foobar $ xfs_io -c "fsync" /mnt/foobar <power failure> $ mount /dev/sdb /mnt $ getfattr --absolute-names --dump /mnt/foobar <empty output> $ So fix this by making sure all xattrs are logged if we log a file's inode item and neither the flags BTRFS_INODE_NEEDS_FULL_SYNC nor BTRFS_INODE_COPY_EVERYTHING were set in the inode. Fixes: 36283bf7 ("Btrfs: fix fsync xattr loss in the fast fsync path") Cc: <stable@vger.kernel.org> # 4.2+ Signed-off-by:
Filipe Manana <fdmanana@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Masami Hiramatsu authored
commit 0d73c3f8 upstream. Since do_undefinstr() uses get_user to get the undefined instruction, it can be called before kprobes processes recursive check. This can cause an infinit recursive exception. Prohibit probing on get_user functions. Fixes: 24ba613c ("ARM kprobes: core code") Signed-off-by:
Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Signed-off-by:
Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Masami Hiramatsu authored
commit 70948c05 upstream. Prohibit probing on optimized_callback() because it is called from kprobes itself. If we put a kprobes on it, that will cause a recursive call loop. Mark it NOKPROBE_SYMBOL. Fixes: 0dc016db ("ARM: kprobes: enable OPTPROBES for ARM 32") Signed-off-by:
Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Signed-off-by:
Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Masami Hiramatsu authored
commit 69af7e23 upstream. Since get_kprobe_ctlblk() uses smp_processor_id() to access per-cpu variable, it hits smp_processor_id sanity check as below. [ 7.006928] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 [ 7.007859] caller is debug_smp_processor_id+0x20/0x24 [ 7.008438] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1-00192-g4eb17253e4b5 #1 [ 7.008890] Hardware name: Generic DT based system [ 7.009917] [<c0313f0c>] (unwind_backtrace) from [<c030e6d8>] (show_stack+0x20/0x24) [ 7.010473] [<c030e6d8>] (show_stack) from [<c0c64694>] (dump_stack+0x84/0x98) [ 7.010990] [<c0c64694>] (dump_stack) from [<c071ca5c>] (check_preemption_disabled+0x138/0x13c) [ 7.011592] [<c071ca5c>] (check_preemption_disabled) from [<c071ca80>] (debug_smp_processor_id+0x20/0x24) [ 7.012214] [<c071ca80>] (debug_smp_processor_id) from [<c03335e0>] (optimized_callback+0x2c/0xe4) [ 7.013077] [<c03335e0>] (optimized_callback) from [<bf0021b0>] (0xbf0021b0) To fix this issue, call get_kprobe_ctlblk() right after irq-disabled since that disables preemption. Fixes: 0dc016db ("ARM: kprobes: enable OPTPROBES for ARM 32") Signed-off-by:
Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Signed-off-by:
Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dexuan Cui authored
commit 5596fe34 upstream. for_each_cpu() unintuitively reports CPU0 as set independent of the actual cpumask content on UP kernels. This causes an unexpected PIT interrupt storm on a UP kernel running in an SMP virtual machine on Hyper-V, and as a result, the virtual machine can suffer from a strange random delay of 1~20 minutes during boot-up, and sometimes it can hang forever. Protect if by checking whether the cpumask is empty before entering the for_each_cpu() loop. [ tglx: Use !IS_ENABLED(CONFIG_SMP) instead of #ifdeffery ] Signed-off-by:
Dexuan Cui <decui@microsoft.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poulson <jopoulso@microsoft.com> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: stable@vger.kernel.org Cc: Rakib Mullick <rakib.mullick@gmail.com> Cc: Jork Loeser <Jork.Loeser@microsoft.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: KY Srinivasan <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Link: https://lkml.kernel.org/r/KL1P15301MB000678289FE55BA365B3279ABF990@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM Link: https://lkml.kernel.org/r/KL1P15301MB0006FA63BC22BEB64902EAA0BF930@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COMSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dmitry Safonov authored
commit acf46020 upstream. The x86 mmap() code selects the mmap base for an allocation depending on the bitness of the syscall. For 64bit sycalls it select mm->mmap_base and for 32bit mm->mmap_compat_base. exec() calls mmap() which in turn uses in_compat_syscall() to check whether the mapping is for a 32bit or a 64bit task. The decision is made on the following criteria: ia32 child->thread.status & TS_COMPAT x32 child->pt_regs.orig_ax & __X32_SYSCALL_BIT ia64 !ia32 && !x32 __set_personality_x32() was dropping TS_COMPAT flag, but set_personality_64bit() has kept compat syscall flag making in_compat_syscall() return true during the first exec() syscall. Which in result has user-visible effects, mentioned by Alexey: 1) It breaks ASAN $ gcc -fsanitize=address wrap.c -o wrap-asan $ ./wrap32 ./wrap-asan true ==1217==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING. ==1217==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range. ==1217==Process memory map follows: 0x000000400000-0x000000401000 /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan 0x000000600000-0x000000601000 /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan 0x000000601000-0x000000602000 /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan 0x0000f7dbd000-0x0000f7de2000 /lib64/ld-2.27.so 0x0000f7fe2000-0x0000f7fe3000 /lib64/ld-2.27.so 0x0000f7fe3000-0x0000f7fe4000 /lib64/ld-2.27.so 0x0000f7fe4000-0x0000f7fe5000 0x7fed9abff000-0x7fed9af54000 0x7fed9af54000-0x7fed9af6b000 /lib64/libgcc_s.so.1 [snip] 2) It doesn't seem to be great for security if an attacker always knows that ld.so is going to be mapped into the first 4GB in this case (the same thing happens for PIEs as well). The testcase: $ cat wrap.c int main(int argc, char *argv[]) { execvp(argv[1], &argv[1]); return 127; } $ gcc wrap.c -o wrap $ LD_SHOW_AUXV=1 ./wrap ./wrap true |& grep AT_BASE AT_BASE: 0x7f63b8309000 AT_BASE: 0x7faec143c000 AT_BASE: 0x7fbdb25fa000 $ gcc -m32 wrap.c -o wrap32 $ LD_SHOW_AUXV=1 ./wrap32 ./wrap true |& grep AT_BASE AT_BASE: 0xf7eff000 AT_BASE: 0xf7cee000 AT_BASE: 0x7f8b9774e000 Fixes: 1b028f78 ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()") Fixes: ada26481 ("x86/mm: Make in_compat_syscall() work during exec") Reported-by:
Alexey Izbyshev <izbyshev@ispras.ru> Bisected-by:
Alexander Monakov <amonakov@ispras.ru> Investigated-by:
Andy Lutomirski <luto@kernel.org> Signed-off-by:
Dmitry Safonov <dima@arista.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Cyrill Gorcunov <gorcunov@openvz.org> Cc: Borislav Petkov <bp@suse.de> Cc: Alexander Monakov <amonakov@ispras.ru> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: linux-mm@kvack.org Cc: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20180517233510.24996-1-dima@arista.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Masami Hiramatsu authored
commit eb0146da upstream. Prohibit kprobes on do_undefinstr because kprobes on arm is implemented by undefined instruction. This means if we probe do_undefinstr(), it can cause infinit recursive exception. Fixes: 24ba613c ("ARM kprobes: core code") Signed-off-by:
Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Signed-off-by:
Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-