- 16 Mar, 2017 40 commits
-
-
Alan Stern authored
commit add333a8 upstream. Andrey Konovalov reports that fuzz testing with syzkaller causes a KASAN use-after-free bug report in gadgetfs: BUG: KASAN: use-after-free in gadgetfs_setup+0x208a/0x20e0 at addr ffff88003dfe5bf2 Read of size 2 by task syz-executor0/22994 CPU: 3 PID: 22994 Comm: syz-executor0 Not tainted 4.9.0-rc7+ #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffff88006df06a18 ffffffff81f96aba ffffffffe0528500 1ffff1000dbe0cd6 ffffed000dbe0cce ffff88006df068f0 0000000041b58ab3 ffffffff8598b4c8 ffffffff81f96828 1ffff1000dbe0ccd ffff88006df06708 ffff88006df06748 Call Trace: <IRQ> [ 201.343209] [< inline >] __dump_stack lib/dump_stack.c:15 <IRQ> [ 201.343209] [<ffffffff81f96aba>] dump_stack+0x292/0x398 lib/dump_stack.c:51 [<ffffffff817e4dec>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:159 [< inline >] print_address_description mm/kasan/report.c:197 [<ffffffff817e5080>] kasan_report_error+0x1f0/0x4e0 mm/kasan/report.c:286 [< inline >] kasan_report mm/kasan/report.c:306 [<ffffffff817e562a>] __asan_report_load_n_noabort+0x3a/0x40 mm/kasan/report.c:337 [< inline >] config_buf drivers/usb/gadget/legacy/inode.c:1298 [<ffffffff8322c8fa>] gadgetfs_setup+0x208a/0x20e0 drivers/usb/gadget/legacy/inode.c:1368 [<ffffffff830fdcd0>] dummy_timer+0x11f0/0x36d0 drivers/usb/gadget/udc/dummy_hcd.c:1858 [<ffffffff814807c1>] call_timer_fn+0x241/0x800 kernel/time/timer.c:1308 [< inline >] expire_timers kernel/time/timer.c:1348 [<ffffffff81482de6>] __run_timers+0xa06/0xec0 kernel/time/timer.c:1641 [<ffffffff814832c1>] run_timer_softirq+0x21/0x80 kernel/time/timer.c:1654 [<ffffffff84f4af8b>] __do_softirq+0x2fb/0xb63 kernel/softirq.c:284 The cause of the bug is subtle. The dev_config() routine gets called twice by the fuzzer. The first time, the user data contains both a full-speed configuration descriptor and a high-speed config descriptor, causing dev->hs_config to be set. But it also contains an invalid device descriptor, so the buffer containing the descriptors is deallocated and dev_config() returns an error. The second time dev_config() is called, the user data contains only a full-speed config descriptor. But dev->hs_config still has the stale pointer remaining from the first call, causing the routine to think that there is a valid high-speed config. Later on, when the driver dereferences the stale pointer to copy that descriptor, we get a use-after-free access. The fix is simple: Clear dev->hs_config if the passed-in data does not contain a high-speed config descriptor. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Alan Stern authored
commit faab5098 upstream. Andrey Konovalov reports that fuzz testing with syzkaller causes a KASAN warning in gadgetfs: BUG: KASAN: slab-out-of-bounds in dev_config+0x86f/0x1190 at addr ffff88003c47e160 Write of size 65537 by task syz-executor0/6356 CPU: 3 PID: 6356 Comm: syz-executor0 Not tainted 4.9.0-rc7+ #19 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffff88003c107ad8 ffffffff81f96aba ffffffff3dc11ef0 1ffff10007820eee ffffed0007820ee6 ffff88003dc11f00 0000000041b58ab3 ffffffff8598b4c8 ffffffff81f96828 ffffffff813fb4a0 ffff88003b6eadc0 ffff88003c107738 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff81f96aba>] dump_stack+0x292/0x398 lib/dump_stack.c:51 [<ffffffff817e4dec>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:159 [< inline >] print_address_description mm/kasan/report.c:197 [<ffffffff817e5080>] kasan_report_error+0x1f0/0x4e0 mm/kasan/report.c:286 [<ffffffff817e5705>] kasan_report+0x35/0x40 mm/kasan/report.c:306 [< inline >] check_memory_region_inline mm/kasan/kasan.c:308 [<ffffffff817e3fb9>] check_memory_region+0x139/0x190 mm/kasan/kasan.c:315 [<ffffffff817e4044>] kasan_check_write+0x14/0x20 mm/kasan/kasan.c:326 [< inline >] copy_from_user arch/x86/include/asm/uaccess.h:689 [< inline >] ep0_write drivers/usb/gadget/legacy/inode.c:1135 [<ffffffff83228caf>] dev_config+0x86f/0x1190 drivers/usb/gadget/legacy/inode.c:1759 [<ffffffff817fdd55>] __vfs_write+0x5d5/0x760 fs/read_write.c:510 [<ffffffff817ff650>] vfs_write+0x170/0x4e0 fs/read_write.c:560 [< inline >] SYSC_write fs/read_write.c:607 [<ffffffff81803a5b>] SyS_write+0xfb/0x230 fs/read_write.c:599 [<ffffffff84f47ec1>] entry_SYSCALL_64_fastpath+0x1f/0xc2 Indeed, there is a comment saying that the value of len is restricted to a 16-bit integer, but the code doesn't actually do this. This patch fixes the warning. It replaces the comment with a computation that forces the amount of data copied from the user in ep0_write() to be no larger than the wLength size for the control transfer, which is a 16-bit quantity. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Greg Kroah-Hartman authored
commit 0994b0a2 upstream. Andrey Konovalov reported that we were not properly checking the upper limit before of a device configuration size before calling memdup_user(), which could cause some problems. So set the upper limit to PAGE_SIZE * 4, which should be good enough for all devices. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Alan Stern authored
commit bcdbeb84 upstream. The stop_activity() routine in dummy-hcd is supposed to unlink all active requests for every endpoint, among other things. But it doesn't handle ep0. As a result, fuzz testing can generate a WARNING like the following: WARNING: CPU: 0 PID: 4410 at drivers/usb/gadget/udc/dummy_hcd.c:672 dummy_free_request+0x153/0x170 Modules linked in: CPU: 0 PID: 4410 Comm: syz-executor Not tainted 4.9.0-rc7+ #32 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffff88006a64ed10 ffffffff81f96b8a ffffffff41b58ab3 1ffff1000d4c9d35 ffffed000d4c9d2d ffff880065f8ac00 0000000041b58ab3 ffffffff8598b510 ffffffff81f968f8 0000000041b58ab3 ffffffff859410e0 ffffffff813f0590 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff81f96b8a>] dump_stack+0x292/0x398 lib/dump_stack.c:51 [<ffffffff812b808f>] __warn+0x19f/0x1e0 kernel/panic.c:550 [<ffffffff812b831c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585 [<ffffffff830fcb13>] dummy_free_request+0x153/0x170 drivers/usb/gadget/udc/dummy_hcd.c:672 [<ffffffff830ed1b0>] usb_ep_free_request+0xc0/0x420 drivers/usb/gadget/udc/core.c:195 [<ffffffff83225031>] gadgetfs_unbind+0x131/0x190 drivers/usb/gadget/legacy/inode.c:1612 [<ffffffff830ebd8f>] usb_gadget_remove_driver+0x10f/0x2b0 drivers/usb/gadget/udc/core.c:1228 [<ffffffff830ec084>] usb_gadget_unregister_driver+0x154/0x240 drivers/usb/gadget/udc/core.c:1357 This patch fixes the problem by iterating over all the endpoints in the driver's ep array instead of iterating over the gadget's ep_list, which explicitly leaves out ep0. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Krzysztof Opasiak authored
commit 7e4da3fc upstream. By convention (according to doc) if function does not provide get_alt() callback composite framework should assume that it has only altsetting 0 and should respond with error if host tries to set other one. After commit dd4dff8b ("USB: composite: Fix bug: should test set_alt function pointer before use it") we started checking set_alt() callback instead of get_alt(). This check is useless as we check if set_alt() is set inside usb_add_function() and fail if it's NULL. Let's fix this check and move comment about why we check the get method instead of set a little bit closer to prevent future false fixes. Fixes: dd4dff8b ("USB: composite: Fix bug: should test set_alt function pointer before use it") Signed-off-by: Krzysztof Opasiak <k.opasiak@samsung.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Felipe Balbi authored
commit d6214592 upstream. commit 0416e494 ("usb: dwc3: ep0: correct cache sync issue in case of ep0_bounced") introduced a bug where we would leak DMA resources which would cause us to starve the system of them resulting in failing DMA transfers. Fix the bug by making sure that we always unmap EP0 requests since those are *always* mapped. Fixes: 0416e494 ("usb: dwc3: ep0: correct cache sync issue in case of ep0_bounced") Tested-by: Tomasz Medrek <tomaszx.medrek@intel.com> Reported-by: Janusz Dziedzic <januszx.dziedzic@linux.intel.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Felipe Balbi authored
commit 19ec3123 upstream. Let's call dwc3_ep0_prepare_one_trb() explicitly because there are occasions where we will need more than one TRB to handle an EP0 transfer. A follow-up patch will fix one bug related to multiple-TRB Data Phases when it comes to mapping/unmapping requests for DMA. Reported-by: Janusz Dziedzic <januszx.dziedzic@linux.intel.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> [bwh: Backported to 3.16: - dwc3_ep0_prepare_one_trb() and dwc3_ep0_start_trans() don't take a 'chain' parameter - Some of the call sites don't exist here - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Felipe Balbi authored
commit 7931ec86 upstream. For now this is just a cleanup patch, no functional changes. We will be using the new function to fix a bug introduced long ago by commit 0416e494 ("usb: dwc3: ep0: correct cache sync issue in case of ep0_bounced") and further worsened by commit c0bd5456 ("usb: dwc3: ep0: handle non maxpacket aligned transfers > 512") Reported-by: Janusz Dziedzic <januszx.dziedzic@linux.intel.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> [bwh: Backported to 3.16: - dwc3_ep0_start_trans() doesn't have a 'chain' parameter, so don't give dwc3_ep0_prepare_one_trb() this parameter - Also delete a debug log statement, removed earlier upstream - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Hauke Mehrtens authored
commit 73529c87 upstream. The xway_nand driver accesses the ltq_ebu_membase symbol which is not exported. This also should not get exported and we should handle the EBU interface in a better way later. This quick fix just deactivated support for building as module. Fixes: 99f2b107 ("mtd: lantiq: Add NAND support on Lantiq XWAY SoC.") Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Vladimir Zapolskiy authored
commit af92305e upstream. On i.MX31 AVIC interrupt controller base address is at 0x68000000. The problem was shadowed by the AVIC driver, which takes the correct base address from a SoC specific header file. Fixes: d2a37b3d ("ARM i.MX31: Add devicetree support") Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Leon Romanovsky authored
commit c1d5f8ff upstream. This patch removes BUG_ON() macro from mlx4_alloc_icm_coherent() by checking DMA address alignment in advance and performing proper folding in case of error. Fixes: 5b0bf5e2 ("mlx4_core: Support ICM tables in coherent memory") Reported-by: Ozgur Karatas <okaratas@member.fsf.org> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Eugenia Emantayev authored
commit 6496bbf0 upstream. Single send WQE in RX buffer should be stamped with software ownership in order to prevent the flow of QP in error in FW once UPDATE_QP is called. Fixes: 9f519f68 ('mlx4_en: Not using Shared Receive Queues') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jack Morgenstein authored
commit 3b01fe7f upstream. mlx4_QP_FLOW_STEERING_DETACH_wrapper first removes the steering rule (which results in freeing the rule structure), and then references a field in this struct (the qp number) when releasing the busy-status on the rule's qp. Since this memory was freed, it could reallocated and changed. Therefore, the qp number in the struct may be incorrect, so that we are releasing the incorrect qp. This leaves the rule's qp in the busy state (and could possibly release an incorrect qp as well). Fix this by saving the qp number in a local variable, for use after removing the steering rule. Fixes: 2c473ae7 ("net/mlx4_core: Disallow releasing VF QPs which have steering rules") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Florian Fainelli authored
commit b2eb09af upstream. Commit 57016590 ("net: stmmac: Fix race between stmmac_drv_probe and stmmac_open") re-ordered how the MDIO bus registration and the network device are registered, but missed to unwind the MDIO bus registration in case we fail to register the network device. Fixes: 57016590 ("net: stmmac: Fix race between stmmac_drv_probe and stmmac_open") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Kweh, Hock Leong <hock.leong.kweh@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.16: - stmmac_dvr_probe() returns a pointer - 'pcs' is a member of struct stmmac_priv, not struct mac_device_info] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Florian Fainelli authored
commit 57016590 upstream. There is currently a small window during which the network device registered by stmmac can be made visible, yet all resources, including and clock and MDIO bus have not had a chance to be set up, this can lead to the following error to occur: [ 473.919358] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized): stmmac_dvr_probe: warning: cannot get CSR clock [ 473.919382] stmmaceth 0000:01:00.0: no reset control found [ 473.919412] stmmac - user ID: 0x10, Synopsys ID: 0x42 [ 473.919429] stmmaceth 0000:01:00.0: DMA HW capability register supported [ 473.919436] stmmaceth 0000:01:00.0: RX Checksum Offload Engine supported [ 473.919443] stmmaceth 0000:01:00.0: TX Checksum insertion supported [ 473.919451] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized): Enable RX Mitigation via HW Watchdog Timer [ 473.921395] libphy: PHY stmmac-1:00 not found [ 473.921417] stmmaceth 0000:01:00.0 eth0: Could not attach to PHY [ 473.921427] stmmaceth 0000:01:00.0 eth0: stmmac_open: Cannot attach to PHY (error: -19) [ 473.959710] libphy: stmmac: probed [ 473.959724] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 0 IRQ POLL (stmmac-1:00) active [ 473.959728] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 1 IRQ POLL (stmmac-1:01) [ 473.959731] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 2 IRQ POLL (stmmac-1:02) [ 473.959734] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 3 IRQ POLL (stmmac-1:03) Fix this by making sure that register_netdev() is the last thing being done, which guarantees that the clock and the MDIO bus are available. Fixes: 4bfcbd7a ("stmmac: Move the mdio_register/_unregister in probe/remove") Reported-by: Kweh, Hock Leong <hock.leong.kweh@intel.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.16: - stmmac_dvr_probe() returns a pointer - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Florian Fainelli authored
commit e6afb1ad upstream. Commit beb0babf ("korina: disable napi on close and restart") introduced calls to napi_disable() that were missing before, unfortunately this leaves a small window during which NAPI has a chance to run, yet we just freed resources since korina_free_ring() has been called: Fix this by disabling NAPI first then freeing resource, and make sure that we also cancel the restart task before doing the resource freeing. Fixes: beb0babf ("korina: disable napi on close and restart") Reported-by: Alexandros C. Couloumbis <alex@ozo.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Daniel Borkmann authored
commit 628185cf upstream. Shahar reported a soft lockup in tc_classify(), where we run into an endless loop when walking the classifier chain due to tp->next == tp which is a state we should never run into. The issue only seems to trigger under load in the tc control path. What happens is that in tc_ctl_tfilter(), thread A allocates a new tp, initializes it, sets tp_created to 1, and calls into tp->ops->change() with it. In that classifier callback we had to unlock/lock the rtnl mutex and returned with -EAGAIN. One reason why we need to drop there is, for example, that we need to request an action module to be loaded. This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning after we loaded and found the requested action, we need to redo the whole request so we don't race against others. While we had to unlock rtnl in that time, thread B's request was processed next on that CPU. Thread B added a new tp instance successfully to the classifier chain. When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN and destroying its tp instance which never got linked, we goto replay and redo A's request. This time when walking the classifier chain in tc_ctl_tfilter() for checking for existing tp instances we had a priority match and found the tp instance that was created and linked by thread B. Now calling again into tp->ops->change() with that tp was successful and returned without error. tp_created was never cleared in the second round, thus kernel thinks that we need to link it into the classifier chain (once again). tp and *back point to the same object due to the match we had earlier on. Thus for thread B's already public tp, we reset tp->next to tp itself and link it into the chain, which eventually causes the mentioned endless loop in tc_classify() once a packet hits the data path. Fix is to clear tp_created at the beginning of each request, also when we replay it. On the paths that can cause -EAGAIN we already destroy the original tp instance we had and on replay we really need to start from scratch. It seems that this issue was first introduced in commit 12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup"). Fixes: 12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup") Reported-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Tested-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Chris Brandt authored
commit e2a33c34 upstream. The RZ/A1 is different than the other Renesas SOCs because the MSTP registers are 8-bit instead of 32-bit and if you try writing values as 32-bit nothing happens...meaning this driver never worked for r7s72100. Fixes: b6face40 ("ARM: shmobile: r7s72100: add essential clock nodes to dtsi") Signed-off-by: Chris Brandt <chris.brandt@renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Arnd Bergmann authored
commit c739c0a7 upstream. A rare randconfig build failure shows up in this driver when the CRC32 helper is not there: drivers/media/built-in.o: In function `s5k4ecgx_s_power': s5k4ecgx.c:(.text+0x9eb4): undefined reference to `crc32_le' This adds the 'select' that all other users of this function have. Fixes: 8b99312b ("[media] Add v4l2 subdev driver for S5K4ECGX sensor") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Marcos Paulo de Souza authored
commit 41c567a5 upstream. Avoid AUX loopback in Pegatron C15B touchpad, so input subsystem is able to recognize a Synaptics touchpad in the AUX port. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=93791 (Touchpad is not detected on DNS 0801480 notebook (PEGATRON C15B)) Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Larry Finger authored
commit 8ae679c4 upstream. I am getting the following warning when I build kernel 4.9-git on my PowerBook G4 with a 32-bit PPC processor: AS arch/powerpc/kernel/misc_32.o arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef] This problem is evident after commit 989cea5c ("kbuild: prevent lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an error that has been in the code since 2005 when this source file was created. That was with commit 9994a338 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S"). The offending line does not make a lot of sense. This error does not seem to cause any errors in the executable, thus I am not recommending that it be applied to any stable versions. Thanks to Nicholas Piggin for suggesting this solution. Fixes: 9994a338 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S") Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Aleksa Sarai authored
commit 613cc2b6 upstream. If you have a process that has set itself to be non-dumpable, and it then undergoes exec(2), any CLOEXEC file descriptors it has open are "exposed" during a race window between the dumpable flags of the process being reset for exec(2) and CLOEXEC being applied to the file descriptors. This can be exploited by a process by attempting to access /proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE. The race in question is after set_dumpable has been (for get_link, though the trace is basically the same for readlink): [vfs] -> proc_pid_link_inode_operations.get_link -> proc_pid_get_link -> proc_fd_access_allowed -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS); Which will return 0, during the race window and CLOEXEC file descriptors will still be open during this window because do_close_on_exec has not been called yet. As a result, the ordering of these calls should be reversed to avoid this race window. This is of particular concern to container runtimes, where joining a PID namespace with file descriptors referring to the host filesystem can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect against access of CLOEXEC file descriptors -- file descriptors which may reference filesystem objects the container shouldn't have access to). Cc: dev@opencontainers.org Reported-by: Michael Crosby <crosbymichael@gmail.com> Signed-off-by: Aleksa Sarai <asarai@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Hans de Goede authored
commit bb98e72a upstream. On my Cherrytrail CUBE iwork8 Air tablet PIPE-A would get stuck on loading i915 at boot 1 out of every 3 boots, resulting in a non functional LCD. Once the i915 driver has successfully loaded, the panel can be disabled / enabled without hitting this issue. The getting stuck is caused by vlv_init_display_clock_gating() clearing the DPOUNIT_CLOCK_GATE_DISABLE bit in DSPCLK_GATE_D when called from chv_pipe_power_well_ops.enable() on driver load, while a pipe is enabled driving the DSI LCD by the BIOS. Clearing this bit while DSI is in use is a known issue and intel_dsi_pre_enable() / intel_dsi_post_disable() already set / clear it as appropriate. This commit modifies vlv_init_display_clock_gating() to leave the DPOUNIT_CLOCK_GATE_DISABLE bit alone fixing the pipe getting stuck. Changes in v2: -Replace PIPE-A with "a pipe" or "the pipe" in the commit msg and comment Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97330Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161202142904.25613-1-hdegoede@redhat.comSigned-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (cherry picked from commit 721d4845) Signed-off-by: Jani Nikula <jani.nikula@intel.com> [bwh: Backported to 3.16: adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
NeilBrown authored
commit cfd278c2 upstream. Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL 'ds', then ds->ds_clp will also be non-NULL. This is not necessasrily true in the case when the process received a fatal signal while nfs4_pnfs_ds_connect is waiting in nfs4_wait_ds_connect(). In that case ->ds_clp may not be set, and the devid may not recently have been marked unavailable. So add a test for ds_clp == NULL and return NULL in that case. Fixes: c23266d5 ("NFS4.1 Fix data server connection race") Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: Olga Kornievskaia <aglo@umich.edu> Acked-by: Adamson, Andy <William.Adamson@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Russell Currey authored
commit 298360af upstream. ast_get_dram_info() configures a window in order to access BMC memory. A BMC register can be configured to disallow this, and if so, causes an infinite loop in the ast driver which renders the system unusable. Fix this by erroring out if an error is detected. On powerpc systems with EEH, this leads to the device being fenced and the system continuing to operate. Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161215051241.20815-1-ruscur@russell.ccSigned-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Russell King authored
commit 7f638c1c upstream. smbus functions return -ve on error, 0 on success. However, __i2c_transfer() have a different return signature - -ve on error, or number of buffers transferred (which may be zero or greater.) The upshot of this is that the sense of the test is reversed when using the mux on a bus supporting the master_xfer method: we cache the value and never retry if we fail to transfer any buffers, but if we succeed, we clear the cached value. Fix this by making pca954x_reg_write() return a negative error code for all failure cases. Fixes: 463e8f84 ("i2c: mux: pca954x: retry updating the mux selection on failure") Acked-by: Peter Rosin <peda@axentia.se> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Dan Carpenter authored
commit a91918cd upstream. This iscsit_tpg_add_portal_group() function is only called from lio_target_tiqn_addtpg(). Both functions free the "tpg" pointer on error so it's a double free bug. The memory is allocated in the caller so it should be freed in the caller and not here. Fixes: e48354ce ("iscsi-target: Add iSCSI fabric support for target v4.1") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Disseldorp <ddiss@suse.de> [ bvanassche: Added "Fix" at start of patch title ] Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Wei Fang authored
commit d2a14525 upstream. A race between scanning and fc_remote_port_delete() may result in a permanent stop if the device gets blocked before scsi_sysfs_add_sdev() and unblocked after. The reason is that blocking a device sets both the SDEV_BLOCKED state and the QUEUE_FLAG_STOPPED. However, scsi_sysfs_add_sdev() unconditionally sets SDEV_RUNNING which causes the device to be ignored by scsi_target_unblock() and thus never have its QUEUE_FLAG_STOPPED cleared leading to a device which is apparently running but has a stopped queue. We actually have two places where SDEV_RUNNING is set: once in scsi_add_lun() which respects the blocked flag and once in scsi_sysfs_add_sdev() which doesn't. Since the second set is entirely spurious, simply remove it to fix the problem. Reported-by: Zengxi Chen <chenzengxi@huawei.com> Signed-off-by: Wei Fang <fangwei1@huawei.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Steffen Maier authored
commit 6f2ce1c6 upstream. It is unavoidable that zfcp_scsi_queuecommand() has to finish requests with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time window when zfcp detected an unavailable rport but fc_remote_port_delete(), which is asynchronous via zfcp_scsi_schedule_rport_block(), has not yet blocked the rport. However, for the case when the rport becomes available again, we should prevent unblocking the rport too early. In contrast to other FCP LLDDs, zfcp has to open each LUN with the FCP channel hardware before it can send I/O to a LUN. So if a port already has LUNs attached and we unblock the rport just after port recovery, recoveries of LUNs behind this port can still be pending which in turn force zfcp_scsi_queuecommand() to unnecessarily finish requests with DID_IMM_RETRY. This also opens a time window with unblocked rport (until the followup LUN reopen recovery has finished). If a scsi_cmnd timeout occurs during this time window fc_timed_out() cannot work as desired and such command would indeed time out and trigger scsi_eh. This prevents a clean and timely path failover. This should not happen if the path issue can be recovered on FC transport layer such as path issues involving RSCNs. Fix this by only calling zfcp_scsi_schedule_rport_register(), to asynchronously trigger fc_remote_port_add(), after all LUN recoveries as children of the rport have finished and no new recoveries of equal or higher order were triggered meanwhile. Finished intentionally includes any recovery result no matter if successful or failed (still unblock rport so other successful LUNs work). For simplicity, we check after each finished LUN recovery if there is another LUN recovery pending on the same port and then do nothing. We handle the special case of a successful recovery of a port without LUN children the same way without changing this case's semantics. For debugging we introduce 2 new trace records written if the rport unblock attempt was aborted due to still unfinished or freshly triggered recovery. The records are only written above the default trace level. Benjamin noticed the important special case of new recovery that can be triggered between having given up the erp_lock and before calling zfcp_erp_action_cleanup() within zfcp_erp_strategy(). We must avoid the following sequence: ERP thread rport_work other context ------------------------- -------------- -------------------------------- port is unblocked, rport still blocked, due to pending/running ERP action, so ((port->status & ...UNBLOCK) != 0) and (port->rport == NULL) unlock ERP zfcp_erp_action_cleanup() case ZFCP_ERP_ACTION_REOPEN_LUN: zfcp_erp_try_rport_unblock() ((status & ...UNBLOCK) != 0) [OLD!] zfcp_erp_port_reopen() lock ERP zfcp_erp_port_block() port->status clear ...UNBLOCK unlock ERP zfcp_scsi_schedule_rport_block() port->rport_task = RPORT_DEL queue_work(rport_work) zfcp_scsi_rport_work() (port->rport_task != RPORT_ADD) port->rport_task = RPORT_NONE zfcp_scsi_rport_block() if (!port->rport) return zfcp_scsi_schedule_rport_register() port->rport_task = RPORT_ADD queue_work(rport_work) zfcp_scsi_rport_work() (port->rport_task == RPORT_ADD) port->rport_task = RPORT_NONE zfcp_scsi_rport_register() (port->rport == NULL) rport = fc_remote_port_add() port->rport = rport; Now the rport was erroneously unblocked while the zfcp_port is blocked. This is another situation we want to avoid due to scsi_eh potential. This state would at least remain until the new recovery from the other context finished successfully, or potentially forever if it failed. In order to close this race, we take the erp_lock inside zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or LUN. With that, the possible corresponding rport state sequences would be: (unblock[ERP thread],block[other context]) if the ERP thread gets erp_lock first and still sees ((port->status & ...UNBLOCK) != 0), (block[other context],NOP[ERP thread]) if the ERP thread gets erp_lock after the other context has already cleard ...UNBLOCK from port->status. Since checking fields of struct erp_action is unsafe because they could have been overwritten (re-used for new recovery) meanwhile, we only check status of zfcp_port and LUN since these are only changed under erp_lock elsewhere. Regarding the check of the proper status flags (port or port_forced are similar to the shown adapter recovery): [zfcp_erp_adapter_shutdown()] zfcp_erp_adapter_reopen() zfcp_erp_adapter_block() * clear UNBLOCK ---------------------------------------+ zfcp_scsi_schedule_rports_block() | write_lock_irqsave(&adapter->erp_lock, flags);-------+ | zfcp_erp_action_enqueue() | | zfcp_erp_setup_act() | | * set ERP_INUSE -----------------------------------|--|--+ write_unlock_irqrestore(&adapter->erp_lock, flags);--+ | | .context-switch. | | zfcp_erp_thread() | | zfcp_erp_strategy() | | write_lock_irqsave(&adapter->erp_lock, flags);------+ | | ... | | | zfcp_erp_strategy_check_target() | | | zfcp_erp_strategy_check_adapter() | | | zfcp_erp_adapter_unblock() | | | * set UNBLOCK -----------------------------------|--+ | zfcp_erp_action_dequeue() | | * clear ERP_INUSE ---------------------------------|-----+ ... | write_unlock_irqrestore(&adapter->erp_lock, flags);-+ Hence, we should check for both UNBLOCK and ERP_INUSE because they are interleaved. Also we need to explicitly check ERP_FAILED for the link down case which currently does not clear the UNBLOCK flag in zfcp_fsf_link_down_info_eval(). Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 8830271c ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport") Fixes: a2fa0aed ("[SCSI] zfcp: Block FC transport rports early on errors") Fixes: 5f852be9 ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI") Fixes: 338151e0 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable") Fixes: 3859f6a2 ("[PATCH] zfcp: add rports to enable scsi_add_device to work again") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Steffen Maier authored
commit 56d23ed7 upstream. Since quite a while, Linux issues enough SCSI commands per scsi_device which successfully return with FCP_RESID_UNDER, FSF_FCP_RSP_AVAILABLE, and SAM_STAT_GOOD. This floods the HBA trace area and we cannot see other and important HBA trace records long enough. Therefore, do not trace HBA response errors for pure benign residual under counts at the default trace level. This excludes benign residual under count combined with other validity bits set in FCP_RSP_IU, such as FCP_SNS_LEN_VAL. For all those other cases, we still do want to see both the HBA record and the corresponding SCSI record by default. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: a54ca0f6 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Benjamin Block authored
commit dac37e15 upstream. When SCSI EH invokes zFCP's callbacks for eh_device_reset_handler() and eh_target_reset_handler(), it expects us to relent the ownership over the given scsi_cmnd and all other scsi_cmnds within the same scope - LUN or target - when returning with SUCCESS from the callback ('release' them). SCSI EH can then reuse those commands. We did not follow this rule to release commands upon SUCCESS; and if later a reply arrived for one of those supposed to be released commands, we would still make use of the scsi_cmnd in our ingress tasklet. This will at least result in undefined behavior or a kernel panic because of a wrong kernel pointer dereference. To fix this, we NULLify all pointers to scsi_cmnds (struct zfcp_fsf_req *)->data in the matching scope if a TMF was successful. This is done under the locks (struct zfcp_adapter *)->abort_lock and (struct zfcp_reqlist *)->lock to prevent the requests from being removed from the request-hashtable, and the ingress tasklet from making use of the scsi_cmnd-pointer in zfcp_fsf_fcp_cmnd_handler(). For cases where a reply arrives during SCSI EH, but before we get a chance to NULLify the pointer - but before we return from the callback -, we assume that the code is protected from races via the CAS operation in blk_complete_request() that is called in scsi_done(). The following stacktrace shows an example for a crash resulting from the previous behavior: Unable to handle kernel pointer dereference at virtual kernel address fffffee17a672000 Oops: 0038 [#1] SMP CPU: 2 PID: 0 Comm: swapper/2 Not tainted task: 00000003f7ff5be0 ti: 00000003f3d38000 task.ti: 00000003f3d38000 Krnl PSW : 0404d00180000000 00000000001156b0 (smp_vcpu_scheduled+0x18/0x40) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3 Krnl GPRS: 000000200000007e 0000000000000000 fffffee17a671fd8 0000000300000015 ffffffff80000000 00000000005dfde8 07000003f7f80e00 000000004fa4e800 000000036ce8d8f8 000000036ce8d9c0 00000003ece8fe00 ffffffff969c9e93 00000003fffffffd 000000036ce8da10 00000000003bf134 00000003f3b07918 Krnl Code: 00000000001156a2: a7190000 lghi %r1,0 00000000001156a6: a7380015 lhi %r3,21 #00000000001156aa: e32050000008 ag %r2,0(%r5) >00000000001156b0: 482022b0 lh %r2,688(%r2) 00000000001156b4: ae123000 sigp %r1,%r2,0(%r3) 00000000001156b8: b2220020 ipm %r2 00000000001156bc: 8820001c srl %r2,28 00000000001156c0: c02700000001 xilf %r2,1 Call Trace: ([<0000000000000000>] 0x0) [<000003ff807bdb8e>] zfcp_fsf_fcp_cmnd_handler+0x3de/0x490 [zfcp] [<000003ff807be30a>] zfcp_fsf_req_complete+0x252/0x800 [zfcp] [<000003ff807c0a48>] zfcp_fsf_reqid_check+0xe8/0x190 [zfcp] [<000003ff807c194e>] zfcp_qdio_int_resp+0x66/0x188 [zfcp] [<000003ff80440c64>] qdio_kick_handler+0xdc/0x310 [qdio] [<000003ff804463d0>] __tiqdio_inbound_processing+0xf8/0xcd8 [qdio] [<0000000000141fd4>] tasklet_action+0x9c/0x170 [<0000000000141550>] __do_softirq+0xe8/0x258 [<000000000010ce0a>] do_softirq+0xba/0xc0 [<000000000014187c>] irq_exit+0xc4/0xe8 [<000000000046b526>] do_IRQ+0x146/0x1d8 [<00000000005d6a3c>] io_return+0x0/0x8 [<00000000005d6422>] vtime_stop_cpu+0x4a/0xa0 ([<0000000000000000>] 0x0) [<0000000000103d8a>] arch_cpu_idle+0xa2/0xb0 [<0000000000197f94>] cpu_startup_entry+0x13c/0x1f8 [<0000000000114782>] smp_start_secondary+0xda/0xe8 [<00000000005d6efe>] restart_int_handler+0x56/0x6c [<0000000000000000>] 0x0 Last Breaking-Event-Address: [<00000000003bf12e>] arch_spin_lock_wait+0x56/0xb0 Suggested-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Fixes: ea127f97 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git) Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Bart Van Assche authored
commit d3a2418e upstream. This patch avoids that Coverity complains about not checking the ib_find_pkey() return value. Fixes: commit 547af765 ("IB/multicast: Report errors on multicast groups if P_key changes") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Bart Van Assche authored
commit 11b642b8 upstream. This patch avoids that Coverity reports the following: Using uninitialized value port_attr.state when calling printk Fixes: commit 94232d9c ("IPoIB: Start multicast join process only on active ports") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Bart Van Assche authored
commit 2fe2f378 upstream. The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS (80) elements. Hence compare the array index with that value instead of with IB_MGMT_MAX_METHODS (128). This patch avoids that Coverity reports the following: Overrunning array class->method_table of 80 8-byte elements at element index 127 (byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class) (which evaluates to 127). Fixes: commit b7ab0b19 ("IB/mad: Verify mgmt class in received MADs") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
NeilBrown authored
commit bcc7f5b4 upstream. bdev->bd_contains is not stable before calling __blkdev_get(). When __blkdev_get() is called on a parition with ->bd_openers == 0 it sets bdev->bd_contains = bdev; which is not correct for a partition. After a call to __blkdev_get() succeeds, ->bd_openers will be > 0 and then ->bd_contains is stable. When FMODE_EXCL is used, blkdev_get() calls bd_start_claiming() -> bd_prepare_to_claim() -> bd_may_claim() This call happens before __blkdev_get() is called, so ->bd_contains is not stable. So bd_may_claim() cannot safely use ->bd_contains. It currently tries to use it, and this can lead to a BUG_ON(). This happens when a whole device is already open with a bd_holder (in use by dm in my particular example) and two threads race to open a partition of that device for the first time, one opening with O_EXCL and one without. The thread that doesn't use O_EXCL gets through blkdev_get() to __blkdev_get(), gains the ->bd_mutex, and sets bdev->bd_contains = bdev; Immediately thereafter the other thread, using FMODE_EXCL, calls bd_start_claiming() from blkdev_get(). This should fail because the whole device has a holder, but because bdev->bd_contains == bdev bd_may_claim() incorrectly reports success. This thread continues and blocks on bd_mutex. The first thread then sets bdev->bd_contains correctly and drops the mutex. The thread using FMODE_EXCL then continues and when it calls bd_may_claim() again in: BUG_ON(!bd_may_claim(bdev, whole, holder)); The BUG_ON fires. Fix this by removing the dependency on ->bd_contains in bd_may_claim(). As bd_may_claim() has direct access to the whole device, it can simply test if the target bdev is the whole device. Fixes: 6b4517a7 ("block: implement bd_claiming and claiming block") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Maxim Patlasov authored
commit 2939e1a8 upstream. Problem statement: unprivileged user who has read-write access to more than one btrfs subvolume may easily consume all kernel memory (eventually triggering oom-killer). Reproducer (./mkrmdir below essentially loops over mkdir/rmdir): [root@kteam1 ~]# cat prep.sh DEV=/dev/sdb mkfs.btrfs -f $DEV mount $DEV /mnt for i in `seq 1 16` do mkdir /mnt/$i btrfs subvolume create /mnt/SV_$i ID=`btrfs subvolume list /mnt |grep "SV_$i$" |cut -d ' ' -f 2` mount -t btrfs -o subvolid=$ID $DEV /mnt/$i chmod a+rwx /mnt/$i done [root@kteam1 ~]# sh prep.sh [maxim@kteam1 ~]$ for i in `seq 1 16`; do ./mkrmdir /mnt/$i 2000 2000 & done [root@kteam1 ~]# for i in `seq 1 4`; do grep "kmalloc-128" /proc/slabinfo | grep -v dma; sleep 60; done kmalloc-128 10144 10144 128 32 1 : tunables 0 0 0 : slabdata 317 317 0 kmalloc-128 9992352 9992352 128 32 1 : tunables 0 0 0 : slabdata 312261 312261 0 kmalloc-128 24226752 24226752 128 32 1 : tunables 0 0 0 : slabdata 757086 757086 0 kmalloc-128 42754240 42754240 128 32 1 : tunables 0 0 0 : slabdata 1336070 1336070 0 The huge numbers above come from insane number of async_work-s allocated and queued by btrfs_wq_run_delayed_node. The problem is caused by btrfs_wq_run_delayed_node() queuing more and more works if the number of delayed items is above BTRFS_DELAYED_BACKGROUND. The worker func (btrfs_async_run_delayed_root) processes at least BTRFS_DELAYED_BATCH items (if they are present in the list). So, the machinery works as expected while the list is almost empty. As soon as it is getting bigger, worker func starts to process more than one item at a time, it takes longer, and the chances to have async_works queued more than needed is getting higher. The problem above is worsened by another flaw of delayed-inode implementation: if async_work was queued in a throttling branch (number of items >= BTRFS_DELAYED_WRITEBACK), corresponding worker func won't quit until the number of items < BTRFS_DELAYED_BACKGROUND / 2. So, it is possible that the func occupies CPU infinitely (up to 30sec in my experiments): while the func is trying to drain the list, the user activity may add more and more items to the list. The patch fixes both problems in straightforward way: refuse queuing too many works in btrfs_wq_run_delayed_node and bail out of worker func if at least BTRFS_DELAYED_WRITEBACK items are processed. Changed in v2: remove support of thresh == NO_THRESHOLD. Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Chris Mason <clm@fb.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Daniel Dressler authored
commit a585e948 upstream. This is the second independent patch of a larger project to cleanup btrfs's internal usage of btrfs_root. Many functions take btrfs_root only to grab the fs_info struct. By requiring a root these functions cause programmer overhead. That these functions can accept any valid root is not obvious until inspection. This patch reduces the specificity of such functions to accept the fs_info directly. These patches can be applied independently and thus are not being submitted as a patch series. There should be about 26 patches by the project's completion. Each patch will cleanup between 1 and 34 functions apiece. Each patch covers a single file's functions. This patch affects the following function(s): 1) btrfs_wq_run_delayed_node Signed-off-by: Daniel Dressler <danieru.dressler@gmail.com> Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jack Morgenstein authored
commit c482af64 upstream. For non-special QPs, the port value becomes non-zero only at the RESET-to-INIT transition. If the QP has not undergone that transition, its port number value is still zero. If such a QP is destroyed before being moved out of the RESET state, subtracting one from the qp port number results in a negative value. Using that negative value as an index into the qp1_proxy array results in an out-of-bounds array reference. Fix this by testing that the QP type is one that uses qp1_proxy before using the port number. For special QPs of all types, the port number is specified at QP creation time. Fixes: 9433c188 ("IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Eran Ben Elisha authored
commit 1f22e454 upstream. According to the firmware spec, FLOW_STEERING_IB_UC_QP_RANGE command is supported only if dmfs_ipoib bit is set. If it isn't set we want to ensure allocating NET_IF QPs fail. We do so by filling out the allocation bitmap. By thus, the NET_IF QPs allocating function won't find any free QP and will fail. Fixes: c1c98501 ('IB/mlx4: Add support for steerable IB UD QPs') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jan Kara authored
commit 5716863e upstream. fsnotify_unmount_inodes() plays complex tricks to pin next inode in the sb->s_inodes list when iterating over all inodes. Furthermore the code has a bug that if the current inode is the last on i_sb_list that does not have e.g. I_FREEING set, then we leave next_i pointing to inode which may get removed from the i_sb_list once we drop s_inode_list_lock thus resulting in use-after-free issues (usually manifesting as infinite looping in fsnotify_unmount_inodes()). Fix the problem by keeping current inode pinned somewhat longer. Then we can make the code much simpler and standard. Signed-off-by: Jan Kara <jack@suse.cz> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-