- 13 Dec, 2015 40 commits
-
-
Paolo Bonzini authored
commit f5f3497c upstream. On 32-bit systems, the initial_page_table is reused by efi_call_phys_prolog as an identity map to call SetVirtualAddressMap. efi_call_phys_prolog takes care of converting the current CPU's GDT to a physical address too. For PAE kernels the identity mapping is achieved by aliasing the first PDPE for the kernel memory mapping into the first PDPE of initial_page_table. This makes the EFI stub's trick "just work". However, for non-PAE kernels there is no guarantee that the identity mapping in the initial_page_table extends as far as the GDT; in this case, accesses to the GDT will cause a page fault (which quickly becomes a triple fault). Fix this by copying the kernel mappings from swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at identity mapping. For some reason, this is only reproducible with QEMU's dynamic translation mode, and not for example with KVM. However, even under KVM one can clearly see that the page table is bogus: $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize $ gdb (gdb) target remote localhost:1234 (gdb) hb *0x02858f6f Hardware assisted breakpoint 1 at 0x2858f6f (gdb) c Continuing. Breakpoint 1, 0x02858f6f in ?? () (gdb) monitor info registers ... GDT= 0724e000 000000ff IDT= fffbb000 000007ff CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690 ... The page directory is sane: (gdb) x/4wx 0x32b7000 0x32b7000: 0x03398063 0x03399063 0x0339a063 0x0339b063 (gdb) x/4wx 0x3398000 0x3398000: 0x00000163 0x00001163 0x00002163 0x00003163 (gdb) x/4wx 0x3399000 0x3399000: 0x00400003 0x00401003 0x00402003 0x00403003 but our particular page directory entry is empty: (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4 0x32b7070: 0x00000000 [ It appears that you can skate past this issue if you don't receive any interrupts while the bogus GDT pointer is loaded, or if you avoid reloading the segment registers in general. Andy Lutomirski provides some additional insight: "AFAICT it's entirely permissible for the GDTR and/or LDT descriptor to point to unmapped memory. Any attempt to use them (segment loads, interrupts, IRET, etc) will try to access that memory as if the access came from CPL 0 and, if the access fails, will generate a valid page fault with CR2 pointing into the GDT or LDT." Up until commit 23a0d4e8 ("efi: Disable interrupts around EFI calls, not in the epilog/prolog calls") interrupts were disabled around the prolog and epilog calls, and the functional GDT was re-installed before interrupts were re-enabled. Which explains why no one has hit this issue until now. ] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reported-by: Laszlo Ersek <lersek@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Matt Fleming <matt.fleming@intel.com> [ Updated changelog. ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Oleg Nesterov authored
commit 54708d28 upstream. The commit 96d0df79 ("proc: make proc_fd_permission() thread-friendly") fixed the access to /proc/self/fd from sub-threads, but introduced another problem: a sub-thread can't access /proc/<tid>/fd/ or /proc/thread-self/fd if generic_permission() fails. Change proc_fd_permission() to check same_thread_group(pid_task(), current). Fixes: 96d0df79 ("proc: make proc_fd_permission() thread-friendly") Reported-by: "Jin, Yihua" <yihua.jin@intel.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Takashi Iwai authored
commit 60603950 upstream. Another Lifebook machine that needs the same quirk as other similar models to make the driver working. Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=883192Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Catalin Marinas authored
commit d4322d88 upstream. On systems with a KMALLOC_MIN_SIZE of 128 (arm64, some mips and powerpc configurations defining ARCH_DMA_MINALIGN to 128), the first kmalloc_caches[] entry to be initialised after slab_early_init = 0 is "kmalloc-128" with index 7. Depending on the debug kernel configuration, sizeof(struct kmem_cache) can be larger than 128 resulting in an INDEX_NODE of 8. Commit 8fc9cf42 ("slab: make more slab management structure off the slab") enables off-slab management objects for sizes starting with PAGE_SIZE >> 5 (128 bytes for a 4KB page configuration) and the creation of the "kmalloc-128" cache would try to place the management objects off-slab. However, since KMALLOC_MIN_SIZE is already 128 and freelist_size == 32 in __kmem_cache_create(), kmalloc_slab(freelist_size) returns NULL (kmalloc_caches[7] not populated yet). This triggers the following bug on arm64: kernel BUG at /work/Linux/linux-2.6-aarch64/mm/slab.c:2283! Internal error: Oops - BUG: 0 [#1] SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.3.0-rc4+ #540 Hardware name: Juno (DT) PC is at __kmem_cache_create+0x21c/0x280 LR is at __kmem_cache_create+0x210/0x280 [...] Call trace: __kmem_cache_create+0x21c/0x280 create_boot_cache+0x48/0x80 create_kmalloc_cache+0x50/0x88 create_kmalloc_caches+0x4c/0xf4 kmem_cache_init+0x100/0x118 start_kernel+0x214/0x33c This patch introduces an OFF_SLAB_MIN_SIZE definition to avoid off-slab management objects for sizes equal to or smaller than KMALLOC_MIN_SIZE. Fixes: 8fc9cf42 ("slab: make more slab management structure off the slab") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Christoph Hellwig authored
commit 40998193 upstream. When dropping a lock while iterating a list we must restart the search as other threads could have manipulated the list under us. Without this we can get stuck in an endless loop. This bug was introduced by commit bc3f02a7 Author: Dan Williams <djbw@fb.com> Date: Tue Aug 28 22:12:10 2012 -0700 [SCSI] scsi_remove_target: fix softlockup regression on hot remove Which was itself trying to fix a reported soft lockup issue http://thread.gmane.org/gmane.linux.kernel/1348679 However, we believe even with this revert of the original patch, the soft lockup problem has been fixed by commit f2495e22 Author: James Bottomley <JBottomley@Parallels.com> Date: Tue Jan 21 07:01:41 2014 -0800 [SCSI] dual scan thread bug fix Thanks go to Dan Williams <dan.j.williams@intel.com> for tracking all this prior history down. Reported-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Fixes: bc3f02a7Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Stefan Richter authored
commit 100ceb66 upstream. Reported by Clifford and Craig for JMicron OHCI-1394 + SDHCI combo controllers: Often or even most of the time, the controller is initialized with the message "added OHCI v1.10 device as card 0, 4 IR + 0 IT contexts, quirks 0x10". With 0 isochronous transmit DMA contexts (IT contexts), applications like audio output are impossible. However, OHCI-1394 demands that at least 4 IT contexts are implemented by the link layer controller, and indeed JMicron JMB38x do implement four of them. Only their IsoXmitIntMask register is unreliable at early access. With my own JMB381 single function controller I found: - I can reproduce the problem with a lower probability than Craig's. - If I put a loop around the section which clears and reads IsoXmitIntMask, then either the first or the second attempt will return the correct initial mask of 0x0000000f. I never encountered a case of needing more than a second attempt. - Consequently, if I put a dummy reg_read(...IsoXmitIntMaskSet) before the first write, the subsequent read will return the correct result. - If I merely ignore a wrong read result and force the known real result, later isochronous transmit DMA usage works just fine. So let's just fix this chip bug up by the latter method. Tested with JMB381 on kernel 3.13 and 4.3. Since OHCI-1394 generally requires 4 IT contexts at a minium, this workaround is simply applied whenever the initial read of IsoXmitIntMask returns 0, regardless whether it's a JMicron chip or not. I never heard of this issue together with any other chip though. I am not 100% sure that this fix works on the OHCI-1394 part of JMB380 and JMB388 combo controllers exactly the same as on the JMB381 single- function controller, but so far I haven't had a chance to let an owner of a combo chip run a patched kernel. Strangely enough, IsoRecvIntMask is always reported correctly, even though it is probed right before IsoXmitIntMask. Reported-by: Clifford Dunn Reported-by: Craig Moore <craig.moore@qenos.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Alexandra Yates authored
commit 5cf92c8b upstream. Adding Intel codename Lewisburg platform device IDs for audio. [rearranged the position by tiwai] Signed-off-by: Alexandra Yates <alexandra.yates@linux.intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Takashi Iwai authored
commit c932b98c upstream. HP ProBook 6550b needs the same pin fixup applied to other HP B-series laptops with docks for making its headphone and dock headphone jacks working properly. We just need to add the codec SSID to the list. Bugzilla: https://bugzilla.kernel.org/attachment.cgi?id=191971Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Krzysztof Kozlowski authored
commit 824ead03 upstream. During probe if the regulator could not be enabled, the error exit path would still disable it. This could lead to unbalanced counter of regulator enable/disable. The patch moves code for getting and enabling the regulator from exynos_map_dt_data() to probe function because it is really not a part of getting Device Tree properties. Acked-by: Lukasz Majewski <l.majewski@samsung.com> Tested-by: Lukasz Majewski <l.majewski@samsung.com> Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com> Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Fixes: 5f09a5cb ("thermal: exynos: Disable the regulator on probe failure") Signed-off-by: Eduardo Valentin <edubezval@gmail.com> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Radim Krčmář authored
commit 656ec4a4 upstream. The comment in code had it mostly right, but we enable paging for emulated real mode regardless of EPT. Without EPT (which implies emulated real mode), secondary VCPUs won't start unless we disable SM[AE]P when the guest doesn't use paging. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
libin authored
commit c84da8b9 upstream. In nop_mcount, shdr->sh_offset and welp->r_offset should handle endianness properly, otherwise it will trigger Segmentation fault if the recordmcount main and file.o have different endianness. Link: http://lkml.kernel.org/r/563806C7.7070606@huawei.comSigned-off-by: Li Bin <huawei.libin@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Max Filippov authored
commit ab45fb14 upstream. There are multiple factors adding to the issue in different configurations: - commit 17290231 ("xtensa: add fixup for double exception raised in window overflow") added function window_overflow_restore_a0_fixup to double exception vector overlapping reset vector location of secondary processor cores. - on MMUv2 cores RESET_VECTOR1_VADDR may point to uncached kernel memory making code overlapping depend on cache type and size, so that without cache or with WT cache reset vector code overwrites double exception code, making issue even harder to detect. - on MMUv3 cores RESET_VECTOR1_VADDR may point to unmapped area, as MMUv3 cores change virtual address map to match MMUv2 layout, but reset vector virtual address is given for the original MMUv3 mapping. - physical memory region of the secondary reset vector is not reserved in the physical memory map, and thus may be allocated and overwritten at arbitrary moment. Fix it as follows: - move window_overflow_restore_a0_fixup code to .text section. - define RESET_VECTOR1_VADDR so that it points to reset vector in the cacheable MMUv2 map for cores with MMU. - reserve reset vector region in the physical memory map. Drop separate literal section and build mxhead.S with text section literals. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Arik Nemtsov authored
commit 254d3dfe upstream. In TDLS channel-switch operations the chandef can sometimes be NULL. Avoid an oops in the trace code for these cases and just print a chandef full of zeros. Fixes: a7a6bdd0 ("mac80211: introduce TDLS channel switch ops") Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Janusz.Dziedzic@tieto.com authored
commit 519ee691 upstream. In case of one shot NOA the interval can be 0, catch that instead of potentially (depending on the driver) crashing like this: divide error: 0000 [#1] SMP [...] Call Trace: <IRQ> [<ffffffffc08e891c>] ieee80211_extend_absent_time+0x6c/0xb0 [mac80211] [<ffffffffc08e8a17>] ieee80211_update_p2p_noa+0xb7/0xe0 [mac80211] [<ffffffffc069cc30>] ath9k_p2p_ps_timer+0x170/0x190 [ath9k] [<ffffffffc070adf8>] ath_gen_timer_isr+0xc8/0xf0 [ath9k_hw] [<ffffffffc0691156>] ath9k_tasklet+0x296/0x2f0 [ath9k] [<ffffffff8107ad65>] tasklet_action+0xe5/0xf0 [...] Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
sumit.saxena@avagotech.com authored
commit 323c4a02 upstream. This is an issue on SMAP enabled CPUs and 32 bit apps running on 64 bit OS. Do not access user memory from kernel code. The SMAP bit restricts accessing user memory from kernel code. Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Max Filippov authored
commit 5029615e upstream. Build-time fixes: - make lbeg/lend/lcount save/restore conditional on kernel entry; - don't clear lcount in platform_restart functions unconditionally. Run-time fixes: - use correct end of range register in __endla paired with __loopt, not the unused temporary register. This fixes .bss zero-initialization. Update comments in asmmacro.h; - don't clobber a10 in the usercopy that leads to access to unmapped memory. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Herbert Xu authored
commit 4afa5f96 upstream. The hash_accept call fails to work on sockets that have not received any data. For some algorithm implementations it may cause crashes. This patch fixes this by ensuring that we only export and import on sockets that have received data. Reported-by: Harsh Jain <harshjain.prof@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Jani Nikula authored
commit 9be64eee upstream. Reported-by: Keith Webb <khwebb@gmail.com> Suggested-by: Keith Webb <khwebb@gmail.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=106671Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1446209424-28801-1-git-send-email-jani.nikula@intel.comSigned-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Mauricio Faria de Oliveira authored
commit 47796938 upstream. This reverts commit a1989b33. That commit introduced a regression at least for the case of the SG_IO ioctl() running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there are no active paths: the ioctl() fails with the ENOTTY errno immediately rather than blocking due to queue_if_no_path until a path becomes active, for example. That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2]) from multipath devices; which leads to SCSI/filesystem errors in such a guest. More general scenarios can hit that regression too. The following demonstration employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective (some output & user changes omitted for brevity and comments added for clarity). Reverting that commit restores normal operation (queueing) in failing scenarios; tested on linux-next (next-20151022). 1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM) $ cat sg_simple0.c ... see [3] ... $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c $ gcc sgio_inquiry.c -o sgio_inquiry 2) The ioctl() works fine with active paths present. # multipath -l 85ag56 85ag56 (...) dm-19 IBM ,2145 size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw |-+- policy='service-time 0' prio=0 status=active | |- 8:0:11:0 sdz 65:144 active undef running | `- 9:0:9:0 sdbf 67:144 active undef running `-+- policy='service-time 0' prio=0 status=enabled |- 8:0:12:0 sdae 65:224 active undef running `- 9:0:12:0 sdbo 68:32 active undef running $ ./sgio_inquiry /dev/mapper/85ag56 Some of the INQUIRY command's response: IBM 2145 0000 INQUIRY duration=0 millisecs, resid=0 3) The ioctl() fails with ENOTTY errno with _no_ active paths present, for unprivileged users (rather than blocking due to queue_if_no_path). # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \ do multipathd -k"fail path $path"; done # multipath -l 85ag56 85ag56 (...) dm-19 IBM ,2145 size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw |-+- policy='service-time 0' prio=0 status=enabled | |- 8:0:11:0 sdz 65:144 failed undef running | `- 9:0:9:0 sdbf 67:144 failed undef running `-+- policy='service-time 0' prio=0 status=enabled |- 8:0:12:0 sdae 65:224 failed undef running `- 9:0:12:0 sdbo 68:32 failed undef running $ ./sgio_inquiry /dev/mapper/85ag56 sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device 4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285); it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl(). $ dmesg <...> [] device-mapper: multipath: Failing path 65:144. [] device-mapper: multipath: Failing path 67:144. [] device-mapper: multipath: Failing path 65:224. [] device-mapper: multipath: Failing path 68:32. [] sgio_inquiry: sending ioctl 2285 to a partition! 5) The ioctl() only works if the SYS_CAP_RAWIO capability is present (then queueing happens -- in this example, queue_if_no_path is set); this is due to a conditional check in scsi_verify_blk_ioctl(). # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56' sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device # ./sgio_inquiry /dev/mapper/85ag56 & [1] 72830 # cat /proc/72830/stack [<c00000171c0df700>] 0xc00000171c0df700 [<c000000000015934>] __switch_to+0x204/0x350 [<c000000000152d4c>] msleep+0x5c/0x80 [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170 [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0 [<c0000000003128e4>] block_ioctl+0x64/0xd0 [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780 [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0 [<c000000000009358>] system_call+0x38/0xd0 6) This is the function call chain exercised in this analysis: SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c -> do_vfs_ioctl() -> vfs_ioctl() ... error = filp->f_op->unlocked_ioctl(filp, cmd, arg); ... -> dm_blk_ioctl() @ drivers/md/dm.c -> multipath_ioctl() @ drivers/md/dm-mpath.c ... (bdev = NULL, due to no active paths) ... if (!bdev || <...>) { int err = scsi_verify_blk_ioctl(NULL, cmd); if (err) r = err; } ... -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c ... if (bd && bd == bd->bd_contains) // not taken (bd = NULL) return 0; ... if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user) return 0; ... printk_ratelimited(KERN_WARNING "%s: sending ioctl %x to a partition!\n" <...>); return -ENOIOCTLCMD; <- ... return r ? : <...> <- ... if (error == -ENOIOCTLCMD) error = -ENOTTY; out: return error; ... Links: [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52 [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device') [3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03) Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Brian Norris authored
commit f3c63795 upstream. Commit 073db4a5 ("mtd: fix: avoid race condition when accessing mtd->usecount") fixed a race condition but due to poor ordering of the mutex acquisition, introduced a potential deadlock. The deadlock can occur, for example, when rmmod'ing the m25p80 module, which will delete one or more MTDs, along with any corresponding mtdblock devices. This could potentially race with an acquisition of the block device as follows. -> blktrans_open() -> mutex_lock(&dev->lock); -> mutex_lock(&mtd_table_mutex); -> del_mtd_device() -> mutex_lock(&mtd_table_mutex); -> blktrans_notify_remove() -> del_mtd_blktrans_dev() -> mutex_lock(&dev->lock); This is a classic (potential) ABBA deadlock, which can be fixed by making the A->B ordering consistent everywhere. There was no real purpose to the ordering in the original patch, AFAIR, so this shouldn't be a problem. This ordering was actually already present in del_mtd_blktrans_dev(), for one, where the function tried to ensure that its caller already held mtd_table_mutex before it acquired &dev->lock: if (mutex_trylock(&mtd_table_mutex)) { mutex_unlock(&mtd_table_mutex); BUG(); } So, reverse the ordering of acquisition of &dev->lock and &mtd_table_mutex so we always acquire mtd_table_mutex first. Snippets of the lockdep output follow: # modprobe -r m25p80 [ 53.419251] [ 53.420838] ====================================================== [ 53.427300] [ INFO: possible circular locking dependency detected ] [ 53.433865] 4.3.0-rc6 #96 Not tainted [ 53.437686] ------------------------------------------------------- [ 53.444220] modprobe/372 is trying to acquire lock: [ 53.449320] (&new->lock){+.+...}, at: [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc [ 53.457271] [ 53.457271] but task is already holding lock: [ 53.463372] (mtd_table_mutex){+.+.+.}, at: [<c0439994>] del_mtd_device+0x18/0x100 [ 53.471321] [ 53.471321] which lock already depends on the new lock. [ 53.471321] [ 53.479856] [ 53.479856] the existing dependency chain (in reverse order) is: [ 53.487660] -> #1 (mtd_table_mutex){+.+.+.}: [ 53.492331] [<c043fc5c>] blktrans_open+0x34/0x1a4 [ 53.497879] [<c01afce0>] __blkdev_get+0xc4/0x3b0 [ 53.503364] [<c01b0bb8>] blkdev_get+0x108/0x320 [ 53.508743] [<c01713c0>] do_dentry_open+0x218/0x314 [ 53.514496] [<c0180454>] path_openat+0x4c0/0xf9c [ 53.519959] [<c0182044>] do_filp_open+0x5c/0xc0 [ 53.525336] [<c0172758>] do_sys_open+0xfc/0x1cc [ 53.530716] [<c000f740>] ret_fast_syscall+0x0/0x1c [ 53.536375] -> #0 (&new->lock){+.+...}: [ 53.540587] [<c063f124>] mutex_lock_nested+0x38/0x3cc [ 53.546504] [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc [ 53.552606] [<c043f164>] blktrans_notify_remove+0x7c/0x84 [ 53.558891] [<c04399f0>] del_mtd_device+0x74/0x100 [ 53.564544] [<c043c670>] del_mtd_partitions+0x80/0xc8 [ 53.570451] [<c0439aa0>] mtd_device_unregister+0x24/0x48 [ 53.576637] [<c046ce6c>] spi_drv_remove+0x1c/0x34 [ 53.582207] [<c03de0f0>] __device_release_driver+0x88/0x114 [ 53.588663] [<c03de19c>] device_release_driver+0x20/0x2c [ 53.594843] [<c03dd9e8>] bus_remove_device+0xd8/0x108 [ 53.600748] [<c03dacc0>] device_del+0x10c/0x210 [ 53.606127] [<c03dadd0>] device_unregister+0xc/0x20 [ 53.611849] [<c046d878>] __unregister+0x10/0x20 [ 53.617211] [<c03da868>] device_for_each_child+0x50/0x7c [ 53.623387] [<c046eae8>] spi_unregister_master+0x58/0x8c [ 53.629578] [<c03e12f0>] release_nodes+0x15c/0x1c8 [ 53.635223] [<c03de0f8>] __device_release_driver+0x90/0x114 [ 53.641689] [<c03de900>] driver_detach+0xb4/0xb8 [ 53.647147] [<c03ddc78>] bus_remove_driver+0x4c/0xa0 [ 53.652970] [<c00cab50>] SyS_delete_module+0x11c/0x1e4 [ 53.658976] [<c000f740>] ret_fast_syscall+0x0/0x1c [ 53.664621] [ 53.664621] other info that might help us debug this: [ 53.664621] [ 53.672979] Possible unsafe locking scenario: [ 53.672979] [ 53.679169] CPU0 CPU1 [ 53.683900] ---- ---- [ 53.688633] lock(mtd_table_mutex); [ 53.692383] lock(&new->lock); [ 53.698306] lock(mtd_table_mutex); [ 53.704658] lock(&new->lock); [ 53.707946] [ 53.707946] *** DEADLOCK *** Fixes: 073db4a5 ("mtd: fix: avoid race condition when accessing mtd->usecount") Reported-by: Felipe Balbi <balbi@ti.com> Tested-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Marek Vasut authored
commit 562b103a upstream. The sizeof() is invoked on an incorrect variable, likely due to some copy-paste error, and this might result in memory corruption. Fix this. Signed-off-by: Marek Vasut <marex@denx.de> Cc: Wolfgang Grandegger <wg@grandegger.com> Cc: netdev@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Robin Murphy authored
commit 5accd17d upstream. For reasons not entirely apparent, but now enshrined in history, the architectural mapping of AArch32 banked registers to AArch64 registers actually orders SP_<mode> and LR_<mode> backwards compared to the intuitive r13/r14 order, for all modes except FIQ. Fix the compat_<reg>_<mode> macros accordingly, in the hope of avoiding subtle bugs with KVM and AArch32 guests. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
David Hildenbrand authored
commit c5c2c393 upstream. We seemed to have missed a few corner cases in commit f6c137ff ("KVM: s390: randomize sca address"). The SCA has a maximum size of 2112 bytes. By setting the sca_offset to some unlucky numbers, we exceed the page. 0x7c0 (1984) -> Fits exactly 0x7d0 (2000) -> 16 bytes out 0x7e0 (2016) -> 32 bytes out 0x7f0 (2032) -> 48 bytes out One VCPU entry is 32 bytes long. For the last two cases, we actually write data to the other page. 1. The address of the VCPU. 2. Injection/delivery/clearing of SIGP externall calls via SIGP IF. Especially the 2. happens regularly. So this could produce two problems: 1. The guest losing/getting external calls. 2. Random memory overwrites in the host. So this problem happens on every 127 + 128 created VM with 64 VCPUs. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
sumit.saxena@avagotech.com authored
commit 357ae967 upstream. Do not use PAGE_SIZE marco to calculate max_sectors per I/O request. Driver code assumes PAGE_SIZE will be always 4096 which can lead to wrongly calculated value if PAGE_SIZE is not 4096. This issue was reported in Ubuntu Bugzilla Bug #1475166. Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Vineet Gupta authored
commit 9acdc911 upstream. Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Kailang Yang authored
commit 6ed1131f upstream. This machine had I2S codec for speaker output. It need to refill the I2S codec initial verb after resume back. Signed-off-by: Kailang Yang <kailang@realtek.com> Reported-and-tested-by: George Gugulea <gugulea@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Chen Yu authored
commit 49e4b843 upstream. Currently when the system is trying to uninstall the ACPI interrupt handler, it uses acpi_gbl_FADT.sci_interrupt as the IRQ number. However, the IRQ number that the ACPI interrupt handled is installed for comes from acpi_gsi_to_irq() and that is the number that should be used for the handler removal. Fix this problem by using the mapped IRQ returned from acpi_gsi_to_irq() as appropriate. Acked-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Larry Finger authored
commit 1e6e6328 upstream. This adds the USB ID for the Sitecom WLA2100. The Windows 10 inf file was checked to verify that the addition is correct. Reported-by: Frans van de Wiel <fvdw@fvdw.eu> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Frans van de Wiel <fvdw@fvdw.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Bjørn Mork authored
commit f504ab18 upstream. New device IDs shamelessly lifted from the vendor driver. Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
David Mosberger-Tang authored
commit 06515f83 upstream. The DMA-slave configuration depends on the whether <= 8 or > 8 bits are transferred per word, so we need to call atmel_spi_dma_slave_config() with the correct value. Signed-off-by: David Mosberger <davidm@egauge.net> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Dmitry Tunin authored
commit 18e0afab upstream. T: Bus=04 Lev=02 Prnt=02 Port=04 Cnt=01 Dev#= 3 Spd=12 MxCh= 0 D: Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0cf3 ProdID=817b Rev=00.02 C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb BugLink: https://bugs.launchpad.net/bugs/1506615Signed-off-by: Dmitry Tunin <hanipouspilot@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Dmitry Tunin authored
commit cd355ff0 upstream. This adapter works with the existing linux-firmware. T: Bus=01 Lev=01 Prnt=01 Port=03 Cnt=02 Dev#= 3 Spd=12 MxCh= 0 D: Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0930 ProdID=021c Rev=00.01 C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb BugLink: https://bugs.launchpad.net/bugs/1502781Signed-off-by: Dmitry Tunin <hanipouspilot@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
David Herrmann authored
commit 660f0fc0 upstream. The HIDP specs define an idle-timeout which automatically disconnects a device. This has always been implemented in the HIDP layer and forced a synchronous shutdown of the hidp-scheduler. This works just fine, but lacks a forced disconnect on the underlying l2cap channels. This has been broken since: commit 5205185d Author: David Herrmann <dh.herrmann@gmail.com> Date: Sat Apr 6 20:28:47 2013 +0200 Bluetooth: hidp: remove old session-management The old session-management always forced an l2cap error on the ctrl/intr channels when shutting down. The new session-management skips this, as we don't want to enforce channel policy on the caller. In other words, if user-space removes an HIDP device, the underlying channels (which are *owned* and *referenced* by user-space) are still left active. User-space needs to call shutdown(2) or close(2) to release them. Unfortunately, this does not work with idle-timeouts. There is no way to signal user-space that the HIDP layer has been stopped. The API simply does not support any event-passing except for poll(2). Hence, we restore old behavior and force EUNATCH on the sockets if the HIDP layer is disconnected due to idle-timeouts (behavior of explicit disconnects remains unmodified). User-space can still call getsockopt(..., SO_ERROR, ...) ..to retrieve the EUNATCH error and clear sk_err. Hence, the channels can still be re-used (which nobody does so far, though). Therefore, the API still supports the new behavior, but with this patch it's also compatible to the old implicit channel shutdown. Reported-by: Mark Haun <haunma@keteu.org> Reported-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Tiffany Lin authored
commit d9a98588 upstream. In videobuf2 dma-contig memory type the prepare and finish ops, instead of passing the number of entries in the original scatterlist as the "nents" parameter to dma_sync_sg_for_device() and dma_sync_sg_for_cpu(), the value returned by dma_map_sg() was used. Albeit this has been suggested in comments of some implementations (which have since been corrected), this is wrong. Fixes: 199d101e ("v4l: vb2-dma-contig: add prepare/finish to dma-contig allocator") Signed-off-by: Tiffany Lin <tiffany.lin@mediatek.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Andy Shevchenko authored
commit 02f20387 upstream. The following warning occurs when DW SPI is compiled as a module and it's a PCI device. On the removal stage pcibios_free_irq() is called earlier than free_irq() due to the latter is called at managed resources free strage. ------------[ cut here ]------------ WARNING: CPU: 1 PID: 1003 at /home/andy/prj/linux/fs/proc/generic.c:575 remove_proc_entry+0x118/0x150() remove_proc_entry: removing non-empty directory 'irq/38', leaking at least 'dw_spi1' Modules linked in: spi_dw_midpci(-) spi_dw [last unloaded: dw_dmac_core] CPU: 1 PID: 1003 Comm: modprobe Not tainted 4.3.0-rc5-next-20151013+ #32 00000000 00000000 f5535d70 c12dc220 f5535db0 f5535da0 c104e912 c198a6bc f5535dcc 000003eb c198a638 0000023f c11b4098 c11b4098 f54f1ec8 f54f1ea0 f642ba20 f5535db8 c104e96e 00000009 f5535db0 c198a6bc f5535dcc f5535df0 Call Trace: [<c12dc220>] dump_stack+0x41/0x61 [<c104e912>] warn_slowpath_common+0x82/0xb0 [<c11b4098>] ? remove_proc_entry+0x118/0x150 [<c11b4098>] ? remove_proc_entry+0x118/0x150 [<c104e96e>] warn_slowpath_fmt+0x2e/0x30 [<c11b4098>] remove_proc_entry+0x118/0x150 [<c109b96a>] unregister_irq_proc+0xaa/0xc0 [<c109575e>] free_desc+0x1e/0x60 [<c10957d2>] irq_free_descs+0x32/0x70 [<c109b1a0>] irq_domain_free_irqs+0x120/0x150 [<c1039e8c>] mp_unmap_irq+0x5c/0x60 [<c16277b0>] intel_mid_pci_irq_disable+0x20/0x40 [<c1627c7f>] pcibios_free_irq+0xf/0x20 [<c13189f2>] pci_device_remove+0x52/0xb0 [<c13f6367>] __device_release_driver+0x77/0x100 [<c13f6da7>] driver_detach+0x87/0x90 [<c13f5eaa>] bus_remove_driver+0x4a/0xc0 [<c128bf0d>] ? selinux_capable+0xd/0x10 [<c13f7483>] driver_unregister+0x23/0x60 [<c10bad8a>] ? find_module_all+0x5a/0x80 [<c1317413>] pci_unregister_driver+0x13/0x60 [<f80ac654>] dw_spi_driver_exit+0xd/0xf [spi_dw_midpci] [<c10bce9a>] SyS_delete_module+0x17a/0x210 Explicitly call free_irq() at removal stage of the DW SPI driver. Fixes: 04f421e7 (spi: dw: use managed resources) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Hon Ching \\(Vicky\\) Lo authored
commit 60ecd86c upstream. At ibm vtpm initialzation, tpm_ibmvtpm_probe() registers its interrupt handler, ibmvtpm_interrupt, which calls ibmvtpm_crq_process to allocate memory for rtce buffer. The current code uses 'GFP_KERNEL' as the type of kernel memory allocation, which resulted a warning at kernel/lockdep.c. This patch uses 'GFP_ATOMIC' instead so that the allocation is high-priority and does not sleep. Signed-off-by: Hon Ching(Vicky) Lo <honclo@linux.vnet.ibm.com> Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Daeho Jeong authored
commit 4327ba52 upstream. If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the journaling will be aborted first and the error number will be recorded into JBD2 superblock and, finally, the system will enter into the panic state in "errors=panic" option. But, in the rare case, this sequence is little twisted like the below figure and it will happen that the system enters into panic state, which means the system reset in mobile environment, before completion of recording an error in the journal superblock. In this case, e2fsck cannot recognize that the filesystem failure occurred in the previous run and the corruption wouldn't be fixed. Task A Task B ext4_handle_error() -> jbd2_journal_abort() -> __journal_abort_soft() -> __jbd2_journal_abort_hard() | -> journal->j_flags |= JBD2_ABORT; | | __ext4_abort() | -> jbd2_journal_abort() | | -> __journal_abort_soft() | | -> if (journal->j_flags & JBD2_ABORT) | | return; | -> panic() | -> jbd2_journal_update_sb_errno() Tested-by: Hobin Woo <hobin.woo@samsung.com> Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Andy Leiserson authored
commit 904dad47 upstream. "group" is the group where the backup will be placed, and is initialized to zero in the declaration. This meant that backups for meta_bg descriptors were erroneously written to the backup block group descriptors in groups 1 and (desc_per_block-1). Reproduction information: mke2fs -Fq -t ext4 -b 1024 -O ^resize_inode /tmp/foo.img 16G truncate -s 24G /tmp/foo.img losetup /dev/loop0 /tmp/foo.img mount /dev/loop0 /mnt resize2fs /dev/loop0 umount /dev/loop0 dd if=/dev/zero of=/dev/loop0 bs=1024 count=2 e2fsck -fy /dev/loop0 losetup -d /dev/loop0 Signed-off-by: Andy Leiserson <andy@leiserson.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Lukas Czerner authored
commit 6934da92 upstream. There is a use-after-free possibility in __ext4_journal_stop() in the case that we free the handle in the first jbd2_journal_stop() because we're referencing handle->h_err afterwards. This was introduced in 9705acd6 and it is wrong. Fix it by storing the handle->h_err value beforehand and avoid referencing potentially freed handle. Fixes: 9705acd6Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Filipe Manana authored
commit 0305cd5f upstream. When truncating a file to a smaller size which consists of an inline extent that is compressed, we did not discard (or made unusable) the data between the new file size and the old file size, wasting metadata space and allowing for the truncated data to be leaked and the data corruption/loss mentioned below. We were also not correctly decrementing the number of bytes used by the inode, we were setting it to zero, giving a wrong report for callers of the stat(2) syscall. The fsck tool also reported an error about a mismatch between the nbytes of the file versus the real space used by the file. Now because we weren't discarding the truncated region of the file, it was possible for a caller of the clone ioctl to actually read the data that was truncated, allowing for a security breach without requiring root access to the system, using only standard filesystem operations. The scenario is the following: 1) User A creates a file which consists of an inline and compressed extent with a size of 2000 bytes - the file is not accessible to any other users (no read, write or execution permission for anyone else); 2) The user truncates the file to a size of 1000 bytes; 3) User A makes the file world readable; 4) User B creates a file consisting of an inline extent of 2000 bytes; 5) User B issues a clone operation from user A's file into its own file (using a length argument of 0, clone the whole range); 6) User B now gets to see the 1000 bytes that user A truncated from its file before it made its file world readbale. User B also lost the bytes in the range [1000, 2000[ bytes from its own file, but that might be ok if his/her intention was reading stale data from user A that was never supposed to be public. Note that this contrasts with the case where we truncate a file from 2000 bytes to 1000 bytes and then truncate it back from 1000 to 2000 bytes. In this case reading any byte from the range [1000, 2000[ will return a value of 0x00, instead of the original data. This problem exists since the clone ioctl was added and happens both with and without my recent data loss and file corruption fixes for the clone ioctl (patch "Btrfs: fix file corruption and data loss after cloning inline extents"). So fix this by truncating the compressed inline extents as we do for the non-compressed case, which involves decompressing, if the data isn't already in the page cache, compressing the truncated version of the extent, writing the compressed content into the inline extent and then truncate it. The following test case for fstests reproduces the problem. In order for the test to pass both this fix and my previous fix for the clone ioctl that forbids cloning a smaller inline extent into a larger one, which is titled "Btrfs: fix file corruption and data loss after cloning inline extents", are needed. Without that other fix the test fails in a different way that does not leak the truncated data, instead part of destination file gets replaced with zeroes (because the destination file has a larger inline extent than the source). seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter # real QA test starts here _need_to_be_root _supported_fs btrfs _supported_os Linux _require_scratch _require_cloner rm -f $seqres.full _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount "-o compress" # Create our test files. File foo is going to be the source of a clone operation # and consists of a single inline extent with an uncompressed size of 512 bytes, # while file bar consists of a single inline extent with an uncompressed size of # 256 bytes. For our test's purpose, it's important that file bar has an inline # extent with a size smaller than foo's inline extent. $XFS_IO_PROG -f -c "pwrite -S 0xa1 0 128" \ -c "pwrite -S 0x2a 128 384" \ $SCRATCH_MNT/foo | _filter_xfs_io $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 256" $SCRATCH_MNT/bar | _filter_xfs_io # Now durably persist all metadata and data. We do this to make sure that we get # on disk an inline extent with a size of 512 bytes for file foo. sync # Now truncate our file foo to a smaller size. Because it consists of a # compressed and inline extent, btrfs did not shrink the inline extent to the # new size (if the extent was not compressed, btrfs would shrink it to 128 # bytes), it only updates the inode's i_size to 128 bytes. $XFS_IO_PROG -c "truncate 128" $SCRATCH_MNT/foo # Now clone foo's inline extent into bar. # This clone operation should fail with errno EOPNOTSUPP because the source # file consists only of an inline extent and the file's size is smaller than # the inline extent of the destination (128 bytes < 256 bytes). However the # clone ioctl was not prepared to deal with a file that has a size smaller # than the size of its inline extent (something that happens only for compressed # inline extents), resulting in copying the full inline extent from the source # file into the destination file. # # Note that btrfs' clone operation for inline extents consists of removing the # inline extent from the destination inode and copy the inline extent from the # source inode into the destination inode, meaning that if the destination # inode's inline extent is larger (N bytes) than the source inode's inline # extent (M bytes), some bytes (N - M bytes) will be lost from the destination # file. Btrfs could copy the source inline extent's data into the destination's # inline extent so that we would not lose any data, but that's currently not # done due to the complexity that would be needed to deal with such cases # (specially when one or both extents are compressed), returning EOPNOTSUPP, as # it's normally not a very common case to clone very small files (only case # where we get inline extents) and copying inline extents does not save any # space (unlike for normal, non-inlined extents). $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar # Now because the above clone operation used to succeed, and due to foo's inline # extent not being shinked by the truncate operation, our file bar got the whole # inline extent copied from foo, making us lose the last 128 bytes from bar # which got replaced by the bytes in range [128, 256[ from foo before foo was # truncated - in other words, data loss from bar and being able to read old and # stale data from foo that should not be possible to read anymore through normal # filesystem operations. Contrast with the case where we truncate a file from a # size N to a smaller size M, truncate it back to size N and then read the range # [M, N[, we should always get the value 0x00 for all the bytes in that range. # We expected the clone operation to fail with errno EOPNOTSUPP and therefore # not modify our file's bar data/metadata. So its content should be 256 bytes # long with all bytes having the value 0xbb. # # Without the btrfs bug fix, the clone operation succeeded and resulted in # leaking truncated data from foo, the bytes that belonged to its range # [128, 256[, and losing data from bar in that same range. So reading the # file gave us the following content: # # 0000000 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 # * # 0000200 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a # * # 0000400 echo "File bar's content after the clone operation:" od -t x1 $SCRATCH_MNT/bar # Also because the foo's inline extent was not shrunk by the truncate # operation, btrfs' fsck, which is run by the fstests framework everytime a # test completes, failed reporting the following error: # # root 5 inode 257 errors 400, nbytes wrong status=0 exit Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-