1. 16 Mar, 2017 40 commits
    • Alan Stern's avatar
      USB: gadgetfs: fix use-after-free bug · c2a30a34
      Alan Stern authored
      commit add333a8 upstream.
      
      Andrey Konovalov reports that fuzz testing with syzkaller causes a
      KASAN use-after-free bug report in gadgetfs:
      
      BUG: KASAN: use-after-free in gadgetfs_setup+0x208a/0x20e0 at addr ffff88003dfe5bf2
      Read of size 2 by task syz-executor0/22994
      CPU: 3 PID: 22994 Comm: syz-executor0 Not tainted 4.9.0-rc7+ #16
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       ffff88006df06a18 ffffffff81f96aba ffffffffe0528500 1ffff1000dbe0cd6
       ffffed000dbe0cce ffff88006df068f0 0000000041b58ab3 ffffffff8598b4c8
       ffffffff81f96828 1ffff1000dbe0ccd ffff88006df06708 ffff88006df06748
      Call Trace:
       <IRQ> [  201.343209]  [<     inline     >] __dump_stack lib/dump_stack.c:15
       <IRQ> [  201.343209]  [<ffffffff81f96aba>] dump_stack+0x292/0x398 lib/dump_stack.c:51
       [<ffffffff817e4dec>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:159
       [<     inline     >] print_address_description mm/kasan/report.c:197
       [<ffffffff817e5080>] kasan_report_error+0x1f0/0x4e0 mm/kasan/report.c:286
       [<     inline     >] kasan_report mm/kasan/report.c:306
       [<ffffffff817e562a>] __asan_report_load_n_noabort+0x3a/0x40 mm/kasan/report.c:337
       [<     inline     >] config_buf drivers/usb/gadget/legacy/inode.c:1298
       [<ffffffff8322c8fa>] gadgetfs_setup+0x208a/0x20e0 drivers/usb/gadget/legacy/inode.c:1368
       [<ffffffff830fdcd0>] dummy_timer+0x11f0/0x36d0 drivers/usb/gadget/udc/dummy_hcd.c:1858
       [<ffffffff814807c1>] call_timer_fn+0x241/0x800 kernel/time/timer.c:1308
       [<     inline     >] expire_timers kernel/time/timer.c:1348
       [<ffffffff81482de6>] __run_timers+0xa06/0xec0 kernel/time/timer.c:1641
       [<ffffffff814832c1>] run_timer_softirq+0x21/0x80 kernel/time/timer.c:1654
       [<ffffffff84f4af8b>] __do_softirq+0x2fb/0xb63 kernel/softirq.c:284
      
      The cause of the bug is subtle.  The dev_config() routine gets called
      twice by the fuzzer.  The first time, the user data contains both a
      full-speed configuration descriptor and a high-speed config
      descriptor, causing dev->hs_config to be set.  But it also contains an
      invalid device descriptor, so the buffer containing the descriptors is
      deallocated and dev_config() returns an error.
      
      The second time dev_config() is called, the user data contains only a
      full-speed config descriptor.  But dev->hs_config still has the stale
      pointer remaining from the first call, causing the routine to think
      that there is a valid high-speed config.  Later on, when the driver
      dereferences the stale pointer to copy that descriptor, we get a
      use-after-free access.
      
      The fix is simple: Clear dev->hs_config if the passed-in data does not
      contain a high-speed config descriptor.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      [bwh: Backported to 3.2: adjust filename]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c2a30a34
    • Alan Stern's avatar
      USB: gadgetfs: fix unbounded memory allocation bug · 685731f4
      Alan Stern authored
      commit faab5098 upstream.
      
      Andrey Konovalov reports that fuzz testing with syzkaller causes a
      KASAN warning in gadgetfs:
      
      BUG: KASAN: slab-out-of-bounds in dev_config+0x86f/0x1190 at addr ffff88003c47e160
      Write of size 65537 by task syz-executor0/6356
      CPU: 3 PID: 6356 Comm: syz-executor0 Not tainted 4.9.0-rc7+ #19
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       ffff88003c107ad8 ffffffff81f96aba ffffffff3dc11ef0 1ffff10007820eee
       ffffed0007820ee6 ffff88003dc11f00 0000000041b58ab3 ffffffff8598b4c8
       ffffffff81f96828 ffffffff813fb4a0 ffff88003b6eadc0 ffff88003c107738
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff81f96aba>] dump_stack+0x292/0x398 lib/dump_stack.c:51
       [<ffffffff817e4dec>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:159
       [<     inline     >] print_address_description mm/kasan/report.c:197
       [<ffffffff817e5080>] kasan_report_error+0x1f0/0x4e0 mm/kasan/report.c:286
       [<ffffffff817e5705>] kasan_report+0x35/0x40 mm/kasan/report.c:306
       [<     inline     >] check_memory_region_inline mm/kasan/kasan.c:308
       [<ffffffff817e3fb9>] check_memory_region+0x139/0x190 mm/kasan/kasan.c:315
       [<ffffffff817e4044>] kasan_check_write+0x14/0x20 mm/kasan/kasan.c:326
       [<     inline     >] copy_from_user arch/x86/include/asm/uaccess.h:689
       [<     inline     >] ep0_write drivers/usb/gadget/legacy/inode.c:1135
       [<ffffffff83228caf>] dev_config+0x86f/0x1190 drivers/usb/gadget/legacy/inode.c:1759
       [<ffffffff817fdd55>] __vfs_write+0x5d5/0x760 fs/read_write.c:510
       [<ffffffff817ff650>] vfs_write+0x170/0x4e0 fs/read_write.c:560
       [<     inline     >] SYSC_write fs/read_write.c:607
       [<ffffffff81803a5b>] SyS_write+0xfb/0x230 fs/read_write.c:599
       [<ffffffff84f47ec1>] entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      Indeed, there is a comment saying that the value of len is restricted
      to a 16-bit integer, but the code doesn't actually do this.
      
      This patch fixes the warning.  It replaces the comment with a
      computation that forces the amount of data copied from the user in
      ep0_write() to be no larger than the wLength size for the control
      transfer, which is a 16-bit quantity.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      [bwh: Backported to 3.16: adjust filename]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      685731f4
    • Greg Kroah-Hartman's avatar
      usb: gadgetfs: restrict upper bound on device configuration size · d54d9607
      Greg Kroah-Hartman authored
      commit 0994b0a2 upstream.
      
      Andrey Konovalov reported that we were not properly checking the upper
      limit before of a device configuration size before calling
      memdup_user(), which could cause some problems.
      
      So set the upper limit to PAGE_SIZE * 4, which should be good enough for
      all devices.
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      [bwh: Backported to 3.16: adjust filename]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      d54d9607
    • Alan Stern's avatar
      USB: dummy-hcd: fix bug in stop_activity (handle ep0) · 967066e0
      Alan Stern authored
      commit bcdbeb84 upstream.
      
      The stop_activity() routine in dummy-hcd is supposed to unlink all
      active requests for every endpoint, among other things.  But it
      doesn't handle ep0.  As a result, fuzz testing can generate a WARNING
      like the following:
      
      WARNING: CPU: 0 PID: 4410 at drivers/usb/gadget/udc/dummy_hcd.c:672 dummy_free_request+0x153/0x170
      Modules linked in:
      CPU: 0 PID: 4410 Comm: syz-executor Not tainted 4.9.0-rc7+ #32
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       ffff88006a64ed10 ffffffff81f96b8a ffffffff41b58ab3 1ffff1000d4c9d35
       ffffed000d4c9d2d ffff880065f8ac00 0000000041b58ab3 ffffffff8598b510
       ffffffff81f968f8 0000000041b58ab3 ffffffff859410e0 ffffffff813f0590
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff81f96b8a>] dump_stack+0x292/0x398 lib/dump_stack.c:51
       [<ffffffff812b808f>] __warn+0x19f/0x1e0 kernel/panic.c:550
       [<ffffffff812b831c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
       [<ffffffff830fcb13>] dummy_free_request+0x153/0x170 drivers/usb/gadget/udc/dummy_hcd.c:672
       [<ffffffff830ed1b0>] usb_ep_free_request+0xc0/0x420 drivers/usb/gadget/udc/core.c:195
       [<ffffffff83225031>] gadgetfs_unbind+0x131/0x190 drivers/usb/gadget/legacy/inode.c:1612
       [<ffffffff830ebd8f>] usb_gadget_remove_driver+0x10f/0x2b0 drivers/usb/gadget/udc/core.c:1228
       [<ffffffff830ec084>] usb_gadget_unregister_driver+0x154/0x240 drivers/usb/gadget/udc/core.c:1357
      
      This patch fixes the problem by iterating over all the endpoints in
      the driver's ep array instead of iterating over the gadget's ep_list,
      which explicitly leaves out ep0.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      [bwh: Backported to 3.16: adjust filename]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      967066e0
    • Krzysztof Opasiak's avatar
      usb: gadget: composite: Test get_alt() presence instead of set_alt() · b20f2b40
      Krzysztof Opasiak authored
      commit 7e4da3fc upstream.
      
      By convention (according to doc) if function does not provide
      get_alt() callback composite framework should assume that it has only
      altsetting 0 and should respond with error if host tries to set
      other one.
      
      After commit dd4dff8b ("USB: composite: Fix bug: should test
      set_alt function pointer before use it")
      we started checking set_alt() callback instead of get_alt().
      This check is useless as we check if set_alt() is set inside
      usb_add_function() and fail if it's NULL.
      
      Let's fix this check and move comment about why we check the get
      method instead of set a little bit closer to prevent future false
      fixes.
      
      Fixes: dd4dff8b ("USB: composite: Fix bug: should test set_alt function pointer before use it")
      Signed-off-by: default avatarKrzysztof Opasiak <k.opasiak@samsung.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      b20f2b40
    • Felipe Balbi's avatar
      usb: dwc3: gadget: always unmap EP0 requests · b7c78a2f
      Felipe Balbi authored
      commit d6214592 upstream.
      
      commit 0416e494 ("usb: dwc3: ep0: correct cache
      sync issue in case of ep0_bounced") introduced a bug
      where we would leak DMA resources which would cause
      us to starve the system of them resulting in failing
      DMA transfers.
      
      Fix the bug by making sure that we always unmap EP0
      requests since those are *always* mapped.
      
      Fixes: 0416e494 ("usb: dwc3: ep0: correct cache
      	sync issue in case of ep0_bounced")
      Tested-by: default avatarTomasz Medrek <tomaszx.medrek@intel.com>
      Reported-by: default avatarJanusz Dziedzic <januszx.dziedzic@linux.intel.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      b7c78a2f
    • Felipe Balbi's avatar
      usb: dwc3: ep0: explicitly call dwc3_ep0_prepare_one_trb() · 43a76bc9
      Felipe Balbi authored
      commit 19ec3123 upstream.
      
      Let's call dwc3_ep0_prepare_one_trb() explicitly
      because there are occasions where we will need more
      than one TRB to handle an EP0 transfer.
      
      A follow-up patch will fix one bug related to
      multiple-TRB Data Phases when it comes to
      mapping/unmapping requests for DMA.
      Reported-by: default avatarJanusz Dziedzic <januszx.dziedzic@linux.intel.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      [bwh: Backported to 3.16:
       - dwc3_ep0_prepare_one_trb() and dwc3_ep0_start_trans() don't take a
         'chain' parameter
       - Some of the call sites don't exist here
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      43a76bc9
    • Felipe Balbi's avatar
      usb: dwc3: ep0: add dwc3_ep0_prepare_one_trb() · 77e27869
      Felipe Balbi authored
      commit 7931ec86 upstream.
      
      For now this is just a cleanup patch, no functional
      changes. We will be using the new function to fix a
      bug introduced long ago by commit 0416e494
      ("usb: dwc3: ep0: correct cache sync issue in case
      of ep0_bounced") and further worsened by commit
      c0bd5456 ("usb: dwc3: ep0: handle non maxpacket
      aligned transfers > 512")
      Reported-by: default avatarJanusz Dziedzic <januszx.dziedzic@linux.intel.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      [bwh: Backported to 3.16:
       - dwc3_ep0_start_trans() doesn't have a 'chain' parameter, so don't
         give dwc3_ep0_prepare_one_trb() this parameter
       - Also delete a debug log statement, removed earlier upstream
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      77e27869
    • Hauke Mehrtens's avatar
      mtd: nand: xway: disable module support · 3402d5aa
      Hauke Mehrtens authored
      commit 73529c87 upstream.
      
      The xway_nand driver accesses the ltq_ebu_membase symbol which is not
      exported. This also should not get exported and we should handle the
      EBU interface in a better way later. This quick fix just deactivated
      support for building as module.
      
      Fixes: 99f2b107 ("mtd: lantiq: Add NAND support on Lantiq XWAY SoC.")
      Signed-off-by: default avatarHauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      3402d5aa
    • Vladimir Zapolskiy's avatar
      ARM: dts: imx31: fix AVIC base address · dcef625d
      Vladimir Zapolskiy authored
      commit af92305e upstream.
      
      On i.MX31 AVIC interrupt controller base address is at 0x68000000.
      
      The problem was shadowed by the AVIC driver, which takes the correct
      base address from a SoC specific header file.
      
      Fixes: d2a37b3d ("ARM i.MX31: Add devicetree support")
      Signed-off-by: default avatarVladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
      Reviewed-by: default avatarFabio Estevam <fabio.estevam@nxp.com>
      Signed-off-by: default avatarShawn Guo <shawnguo@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      dcef625d
    • Leon Romanovsky's avatar
      net/mlx4: Remove BUG_ON from ICM allocation routine · 5ebfea6e
      Leon Romanovsky authored
      commit c1d5f8ff upstream.
      
      This patch removes BUG_ON() macro from mlx4_alloc_icm_coherent()
      by checking DMA address alignment in advance and performing proper
      folding in case of error.
      
      Fixes: 5b0bf5e2 ("mlx4_core: Support ICM tables in coherent memory")
      Reported-by: default avatarOzgur Karatas <okaratas@member.fsf.org>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      5ebfea6e
    • Eugenia Emantayev's avatar
      net/mlx4_en: Fix bad WQE issue · 26f2a449
      Eugenia Emantayev authored
      commit 6496bbf0 upstream.
      
      Single send WQE in RX buffer should be stamped with software
      ownership in order to prevent the flow of QP in error in FW
      once UPDATE_QP is called.
      
      Fixes: 9f519f68 ('mlx4_en: Not using Shared Receive Queues')
      Signed-off-by: default avatarEugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      26f2a449
    • Jack Morgenstein's avatar
      net/mlx4_core: Use-after-free causes a resource leak in flow-steering detach · 7e87d5f7
      Jack Morgenstein authored
      commit 3b01fe7f upstream.
      
      mlx4_QP_FLOW_STEERING_DETACH_wrapper first removes the steering
      rule (which results in freeing the rule structure), and then
      references a field in this struct (the qp number) when releasing the
      busy-status on the rule's qp.
      
      Since this memory was freed, it could reallocated and changed.
      Therefore, the qp number in the struct may be incorrect,
      so that we are releasing the incorrect qp. This leaves the rule's qp
      in the busy state (and could possibly release an incorrect qp as well).
      
      Fix this by saving the qp number in a local variable, for use after
      removing the steering rule.
      
      Fixes: 2c473ae7 ("net/mlx4_core: Disallow releasing VF QPs which have steering rules")
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      7e87d5f7
    • Florian Fainelli's avatar
      net: stmmac: Fix error path after register_netdev move · ca92b798
      Florian Fainelli authored
      commit b2eb09af upstream.
      
      Commit 57016590 ("net: stmmac: Fix race between stmmac_drv_probe and
      stmmac_open") re-ordered how the MDIO bus registration and the network
      device are registered, but missed to unwind the MDIO bus registration in
      case we fail to register the network device.
      
      Fixes: 57016590 ("net: stmmac: Fix race between stmmac_drv_probe and stmmac_open")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: default avatarKweh, Hock Leong <hock.leong.kweh@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16:
       - stmmac_dvr_probe() returns a pointer
       - 'pcs' is a member of struct stmmac_priv, not struct mac_device_info]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      ca92b798
    • Florian Fainelli's avatar
      net: stmmac: Fix race between stmmac_drv_probe and stmmac_open · ea6835ec
      Florian Fainelli authored
      commit 57016590 upstream.
      
      There is currently a small window during which the network device registered by
      stmmac can be made visible, yet all resources, including and clock and MDIO bus
      have not had a chance to be set up, this can lead to the following error to
      occur:
      
      [  473.919358] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
                      stmmac_dvr_probe: warning: cannot get CSR clock
      [  473.919382] stmmaceth 0000:01:00.0: no reset control found
      [  473.919412] stmmac - user ID: 0x10, Synopsys ID: 0x42
      [  473.919429] stmmaceth 0000:01:00.0: DMA HW capability register supported
      [  473.919436] stmmaceth 0000:01:00.0: RX Checksum Offload Engine supported
      [  473.919443] stmmaceth 0000:01:00.0: TX Checksum insertion supported
      [  473.919451] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
                      Enable RX Mitigation via HW Watchdog Timer
      [  473.921395] libphy: PHY stmmac-1:00 not found
      [  473.921417] stmmaceth 0000:01:00.0 eth0: Could not attach to PHY
      [  473.921427] stmmaceth 0000:01:00.0 eth0: stmmac_open: Cannot attach to
                      PHY (error: -19)
      [  473.959710] libphy: stmmac: probed
      [  473.959724] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 0 IRQ POLL
                      (stmmac-1:00) active
      [  473.959728] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 1 IRQ POLL
                      (stmmac-1:01)
      [  473.959731] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 2 IRQ POLL
                      (stmmac-1:02)
      [  473.959734] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 3 IRQ POLL
                      (stmmac-1:03)
      
      Fix this by making sure that register_netdev() is the last thing being done,
      which guarantees that the clock and the MDIO bus are available.
      
      Fixes: 4bfcbd7a ("stmmac: Move the mdio_register/_unregister in probe/remove")
      Reported-by: default avatarKweh, Hock Leong <hock.leong.kweh@intel.com>
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16:
       - stmmac_dvr_probe() returns a pointer
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      ea6835ec
    • Florian Fainelli's avatar
      net: korina: Fix NAPI versus resources freeing · 9aeb0ab3
      Florian Fainelli authored
      commit e6afb1ad upstream.
      
      Commit beb0babf ("korina: disable napi on close and restart")
      introduced calls to napi_disable() that were missing before,
      unfortunately this leaves a small window during which NAPI has a chance
      to run, yet we just freed resources since korina_free_ring() has been
      called:
      
      Fix this by disabling NAPI first then freeing resource, and make sure
      that we also cancel the restart task before doing the resource freeing.
      
      Fixes: beb0babf ("korina: disable napi on close and restart")
      Reported-by: default avatarAlexandros C. Couloumbis <alex@ozo.com>
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      9aeb0ab3
    • Daniel Borkmann's avatar
      net, sched: fix soft lockup in tc_classify · 9fa6aa1d
      Daniel Borkmann authored
      commit 628185cf upstream.
      
      Shahar reported a soft lockup in tc_classify(), where we run into an
      endless loop when walking the classifier chain due to tp->next == tp
      which is a state we should never run into. The issue only seems to
      trigger under load in the tc control path.
      
      What happens is that in tc_ctl_tfilter(), thread A allocates a new
      tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
      with it. In that classifier callback we had to unlock/lock the rtnl
      mutex and returned with -EAGAIN. One reason why we need to drop there
      is, for example, that we need to request an action module to be loaded.
      
      This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
      after we loaded and found the requested action, we need to redo the
      whole request so we don't race against others. While we had to unlock
      rtnl in that time, thread B's request was processed next on that CPU.
      Thread B added a new tp instance successfully to the classifier chain.
      When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
      and destroying its tp instance which never got linked, we goto replay
      and redo A's request.
      
      This time when walking the classifier chain in tc_ctl_tfilter() for
      checking for existing tp instances we had a priority match and found
      the tp instance that was created and linked by thread B. Now calling
      again into tp->ops->change() with that tp was successful and returned
      without error.
      
      tp_created was never cleared in the second round, thus kernel thinks
      that we need to link it into the classifier chain (once again). tp and
      *back point to the same object due to the match we had earlier on. Thus
      for thread B's already public tp, we reset tp->next to tp itself and
      link it into the chain, which eventually causes the mentioned endless
      loop in tc_classify() once a packet hits the data path.
      
      Fix is to clear tp_created at the beginning of each request, also when
      we replay it. On the paths that can cause -EAGAIN we already destroy
      the original tp instance we had and on replay we really need to start
      from scratch. It seems that this issue was first introduced in commit
      12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
      and avoid kernel panic when we use cls_cgroup").
      
      Fixes: 12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
      Reported-by: default avatarShahar Klein <shahark@mellanox.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Tested-by: default avatarShahar Klein <shahark@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      9fa6aa1d
    • Chris Brandt's avatar
      clk: renesas: mstp: Support 8-bit registers for r7s72100 · 7eed9ab5
      Chris Brandt authored
      commit e2a33c34 upstream.
      
      The RZ/A1 is different than the other Renesas SOCs because the MSTP
      registers are 8-bit instead of 32-bit and if you try writing values as
      32-bit nothing happens...meaning this driver never worked for r7s72100.
      
      Fixes: b6face40 ("ARM: shmobile: r7s72100: add essential clock nodes to dtsi")
      Signed-off-by: default avatarChris Brandt <chris.brandt@renesas.com>
      Reviewed-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Tested-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: default avatarKuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      [bwh: Backported to 3.16: adjust filename]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      7eed9ab5
    • Arnd Bergmann's avatar
      s5k4ecgx: select CRC32 helper · ecf50dda
      Arnd Bergmann authored
      commit c739c0a7 upstream.
      
      A rare randconfig build failure shows up in this driver when
      the CRC32 helper is not there:
      
      drivers/media/built-in.o: In function `s5k4ecgx_s_power':
      s5k4ecgx.c:(.text+0x9eb4): undefined reference to `crc32_le'
      
      This adds the 'select' that all other users of this function have.
      
      Fixes: 8b99312b ("[media] Add v4l2 subdev driver for S5K4ECGX sensor")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      ecf50dda
    • Marcos Paulo de Souza's avatar
      Input: i8042 - add Pegatron touchpad to noloop table · d32397fe
      Marcos Paulo de Souza authored
      commit 41c567a5 upstream.
      
      Avoid AUX loopback in Pegatron C15B touchpad, so input subsystem is able
      to recognize a Synaptics touchpad in the AUX port.
      
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=93791
      (Touchpad is not detected on DNS 0801480 notebook (PEGATRON C15B))
      Suggested-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarMarcos Paulo de Souza <marcos.souza.org@gmail.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      d32397fe
    • Larry Finger's avatar
      powerpc: Fix build warning on 32-bit PPC · 13f9cb6e
      Larry Finger authored
      commit 8ae679c4 upstream.
      
      I am getting the following warning when I build kernel 4.9-git on my
      PowerBook G4 with a 32-bit PPC processor:
      
          AS      arch/powerpc/kernel/misc_32.o
        arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]
      
      This problem is evident after commit 989cea5c ("kbuild: prevent
      lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
      error that has been in the code since 2005 when this source file was
      created.  That was with commit 9994a338 ("powerpc: Introduce
      entry_{32,64}.S, misc_{32,64}.S, systbl.S").
      
      The offending line does not make a lot of sense.  This error does not
      seem to cause any errors in the executable, thus I am not recommending
      that it be applied to any stable versions.
      
      Thanks to Nicholas Piggin for suggesting this solution.
      
      Fixes: 9994a338 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S")
      Signed-off-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      13f9cb6e
    • Aleksa Sarai's avatar
      fs: exec: apply CLOEXEC before changing dumpable task flags · 8f7ceb94
      Aleksa Sarai authored
      commit 613cc2b6 upstream.
      
      If you have a process that has set itself to be non-dumpable, and it
      then undergoes exec(2), any CLOEXEC file descriptors it has open are
      "exposed" during a race window between the dumpable flags of the process
      being reset for exec(2) and CLOEXEC being applied to the file
      descriptors. This can be exploited by a process by attempting to access
      /proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE.
      
      The race in question is after set_dumpable has been (for get_link,
      though the trace is basically the same for readlink):
      
      [vfs]
      -> proc_pid_link_inode_operations.get_link
         -> proc_pid_get_link
            -> proc_fd_access_allowed
               -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
      
      Which will return 0, during the race window and CLOEXEC file descriptors
      will still be open during this window because do_close_on_exec has not
      been called yet. As a result, the ordering of these calls should be
      reversed to avoid this race window.
      
      This is of particular concern to container runtimes, where joining a
      PID namespace with file descriptors referring to the host filesystem
      can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect
      against access of CLOEXEC file descriptors -- file descriptors which may
      reference filesystem objects the container shouldn't have access to).
      
      Cc: dev@opencontainers.org
      Reported-by: default avatarMichael Crosby <crosbymichael@gmail.com>
      Signed-off-by: default avatarAleksa Sarai <asarai@suse.de>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      8f7ceb94
    • Hans de Goede's avatar
      drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating · 12d4dffb
      Hans de Goede authored
      commit bb98e72a upstream.
      
      On my Cherrytrail CUBE iwork8 Air tablet PIPE-A would get stuck on loading
      i915 at boot 1 out of every 3 boots, resulting in a non functional LCD.
      Once the i915 driver has successfully loaded, the panel can be disabled /
      enabled without hitting this issue.
      
      The getting stuck is caused by vlv_init_display_clock_gating() clearing
      the DPOUNIT_CLOCK_GATE_DISABLE bit in DSPCLK_GATE_D when called from
      chv_pipe_power_well_ops.enable() on driver load, while a pipe is enabled
      driving the DSI LCD by the BIOS.
      
      Clearing this bit while DSI is in use is a known issue and
      intel_dsi_pre_enable() / intel_dsi_post_disable() already set / clear it
      as appropriate.
      
      This commit modifies vlv_init_display_clock_gating() to leave the
      DPOUNIT_CLOCK_GATE_DISABLE bit alone fixing the pipe getting stuck.
      
      Changes in v2:
      -Replace PIPE-A with "a pipe" or "the pipe" in the commit msg and
      comment
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97330Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161202142904.25613-1-hdegoede@redhat.comSigned-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      (cherry picked from commit 721d4845)
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      [bwh: Backported to 3.16: adjust filename, context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      12d4dffb
    • NeilBrown's avatar
      NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success. · adfffd77
      NeilBrown authored
      commit cfd278c2 upstream.
      
      Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL 'ds',
      then ds->ds_clp will also be non-NULL.
      
      This is not necessasrily true in the case when the process received a fatal signal
      while nfs4_pnfs_ds_connect is waiting in nfs4_wait_ds_connect().
      In that case ->ds_clp may not be set, and the devid may not recently have been marked
      unavailable.
      
      So add a test for ds_clp == NULL and return NULL in that case.
      
      Fixes: c23266d5 ("NFS4.1 Fix data server connection race")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Acked-by: default avatarOlga Kornievskaia <aglo@umich.edu>
      Acked-by: default avatarAdamson, Andy <William.Adamson@netapp.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      adfffd77
    • Russell Currey's avatar
      drivers/gpu/drm/ast: Fix infinite loop if read fails · 33b5e6fc
      Russell Currey authored
      commit 298360af upstream.
      
      ast_get_dram_info() configures a window in order to access BMC memory.
      A BMC register can be configured to disallow this, and if so, causes
      an infinite loop in the ast driver which renders the system unusable.
      
      Fix this by erroring out if an error is detected.  On powerpc systems with
      EEH, this leads to the device being fenced and the system continuing to
      operate.
      Signed-off-by: default avatarRussell Currey <ruscur@russell.cc>
      Reviewed-by: default avatarJoel Stanley <joel@jms.id.au>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161215051241.20815-1-ruscur@russell.ccSigned-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      33b5e6fc
    • Russell King's avatar
      i2c: mux: pca954x: fix i2c mux selection caching · cbc43c0d
      Russell King authored
      commit 7f638c1c upstream.
      
      smbus functions return -ve on error, 0 on success.  However,
      __i2c_transfer() have a different return signature - -ve on error, or
      number of buffers transferred (which may be zero or greater.)
      
      The upshot of this is that the sense of the test is reversed when using
      the mux on a bus supporting the master_xfer method: we cache the value
      and never retry if we fail to transfer any buffers, but if we succeed,
      we clear the cached value.
      
      Fix this by making pca954x_reg_write() return a negative error code for
      all failure cases.
      
      Fixes: 463e8f84 ("i2c: mux: pca954x: retry updating the mux selection on failure")
      Acked-by: default avatarPeter Rosin <peda@axentia.se>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      cbc43c0d
    • Dan Carpenter's avatar
      target/iscsi: Fix double free in lio_target_tiqn_addtpg() · f99aa824
      Dan Carpenter authored
      commit a91918cd upstream.
      
      This iscsit_tpg_add_portal_group() function is only called from
      lio_target_tiqn_addtpg().  Both functions free the "tpg" pointer on
      error so it's a double free bug.  The memory is allocated in the caller
      so it should be freed in the caller and not here.
      
      Fixes: e48354ce ("iscsi-target: Add iSCSI fabric support for target v4.1")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarDavid Disseldorp <ddiss@suse.de>
      [ bvanassche: Added "Fix" at start of patch title ]
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f99aa824
    • Wei Fang's avatar
      scsi: avoid a permanent stop of the scsi device's request queue · 2898d5f9
      Wei Fang authored
      commit d2a14525 upstream.
      
      A race between scanning and fc_remote_port_delete() may result in a
      permanent stop if the device gets blocked before scsi_sysfs_add_sdev()
      and unblocked after.  The reason is that blocking a device sets both the
      SDEV_BLOCKED state and the QUEUE_FLAG_STOPPED.  However,
      scsi_sysfs_add_sdev() unconditionally sets SDEV_RUNNING which causes the
      device to be ignored by scsi_target_unblock() and thus never have its
      QUEUE_FLAG_STOPPED cleared leading to a device which is apparently
      running but has a stopped queue.
      
      We actually have two places where SDEV_RUNNING is set: once in
      scsi_add_lun() which respects the blocked flag and once in
      scsi_sysfs_add_sdev() which doesn't.  Since the second set is entirely
      spurious, simply remove it to fix the problem.
      Reported-by: default avatarZengxi Chen <chenzengxi@huawei.com>
      Signed-off-by: default avatarWei Fang <fangwei1@huawei.com>
      Reviewed-by: default avatarEwan D. Milne <emilne@redhat.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      2898d5f9
    • Steffen Maier's avatar
      scsi: zfcp: fix rport unblock race with LUN recovery · 044a8cee
      Steffen Maier authored
      commit 6f2ce1c6 upstream.
      
      It is unavoidable that zfcp_scsi_queuecommand() has to finish requests
      with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time
      window when zfcp detected an unavailable rport but
      fc_remote_port_delete(), which is asynchronous via
      zfcp_scsi_schedule_rport_block(), has not yet blocked the rport.
      
      However, for the case when the rport becomes available again, we should
      prevent unblocking the rport too early.  In contrast to other FCP LLDDs,
      zfcp has to open each LUN with the FCP channel hardware before it can
      send I/O to a LUN.  So if a port already has LUNs attached and we
      unblock the rport just after port recovery, recoveries of LUNs behind
      this port can still be pending which in turn force
      zfcp_scsi_queuecommand() to unnecessarily finish requests with
      DID_IMM_RETRY.
      
      This also opens a time window with unblocked rport (until the followup
      LUN reopen recovery has finished).  If a scsi_cmnd timeout occurs during
      this time window fc_timed_out() cannot work as desired and such command
      would indeed time out and trigger scsi_eh. This prevents a clean and
      timely path failover.  This should not happen if the path issue can be
      recovered on FC transport layer such as path issues involving RSCNs.
      
      Fix this by only calling zfcp_scsi_schedule_rport_register(), to
      asynchronously trigger fc_remote_port_add(), after all LUN recoveries as
      children of the rport have finished and no new recoveries of equal or
      higher order were triggered meanwhile.  Finished intentionally includes
      any recovery result no matter if successful or failed (still unblock
      rport so other successful LUNs work).  For simplicity, we check after
      each finished LUN recovery if there is another LUN recovery pending on
      the same port and then do nothing.  We handle the special case of a
      successful recovery of a port without LUN children the same way without
      changing this case's semantics.
      
      For debugging we introduce 2 new trace records written if the rport
      unblock attempt was aborted due to still unfinished or freshly triggered
      recovery. The records are only written above the default trace level.
      
      Benjamin noticed the important special case of new recovery that can be
      triggered between having given up the erp_lock and before calling
      zfcp_erp_action_cleanup() within zfcp_erp_strategy().  We must avoid the
      following sequence:
      
      ERP thread                 rport_work      other context
      -------------------------  --------------  --------------------------------
      port is unblocked, rport still blocked,
       due to pending/running ERP action,
       so ((port->status & ...UNBLOCK) != 0)
       and (port->rport == NULL)
      unlock ERP
      zfcp_erp_action_cleanup()
      case ZFCP_ERP_ACTION_REOPEN_LUN:
      zfcp_erp_try_rport_unblock()
      ((status & ...UNBLOCK) != 0) [OLD!]
                                                 zfcp_erp_port_reopen()
                                                 lock ERP
                                                 zfcp_erp_port_block()
                                                 port->status clear ...UNBLOCK
                                                 unlock ERP
                                                 zfcp_scsi_schedule_rport_block()
                                                 port->rport_task = RPORT_DEL
                                                 queue_work(rport_work)
                                 zfcp_scsi_rport_work()
                                 (port->rport_task != RPORT_ADD)
                                 port->rport_task = RPORT_NONE
                                 zfcp_scsi_rport_block()
                                 if (!port->rport) return
      zfcp_scsi_schedule_rport_register()
      port->rport_task = RPORT_ADD
      queue_work(rport_work)
                                 zfcp_scsi_rport_work()
                                 (port->rport_task == RPORT_ADD)
                                 port->rport_task = RPORT_NONE
                                 zfcp_scsi_rport_register()
                                 (port->rport == NULL)
                                 rport = fc_remote_port_add()
                                 port->rport = rport;
      
      Now the rport was erroneously unblocked while the zfcp_port is blocked.
      This is another situation we want to avoid due to scsi_eh
      potential. This state would at least remain until the new recovery from
      the other context finished successfully, or potentially forever if it
      failed.  In order to close this race, we take the erp_lock inside
      zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or
      LUN.  With that, the possible corresponding rport state sequences would
      be: (unblock[ERP thread],block[other context]) if the ERP thread gets
      erp_lock first and still sees ((port->status & ...UNBLOCK) != 0),
      (block[other context],NOP[ERP thread]) if the ERP thread gets erp_lock
      after the other context has already cleard ...UNBLOCK from port->status.
      
      Since checking fields of struct erp_action is unsafe because they could
      have been overwritten (re-used for new recovery) meanwhile, we only
      check status of zfcp_port and LUN since these are only changed under
      erp_lock elsewhere. Regarding the check of the proper status flags (port
      or port_forced are similar to the shown adapter recovery):
      
      [zfcp_erp_adapter_shutdown()]
      zfcp_erp_adapter_reopen()
       zfcp_erp_adapter_block()
        * clear UNBLOCK ---------------------------------------+
       zfcp_scsi_schedule_rports_block()                       |
       write_lock_irqsave(&adapter->erp_lock, flags);-------+  |
       zfcp_erp_action_enqueue()                            |  |
        zfcp_erp_setup_act()                                |  |
         * set ERP_INUSE -----------------------------------|--|--+
       write_unlock_irqrestore(&adapter->erp_lock, flags);--+  |  |
      .context-switch.                                         |  |
      zfcp_erp_thread()                                        |  |
       zfcp_erp_strategy()                                     |  |
        write_lock_irqsave(&adapter->erp_lock, flags);------+  |  |
        ...                                                 |  |  |
        zfcp_erp_strategy_check_target()                    |  |  |
         zfcp_erp_strategy_check_adapter()                  |  |  |
          zfcp_erp_adapter_unblock()                        |  |  |
           * set UNBLOCK -----------------------------------|--+  |
        zfcp_erp_action_dequeue()                           |     |
         * clear ERP_INUSE ---------------------------------|-----+
        ...                                                 |
        write_unlock_irqrestore(&adapter->erp_lock, flags);-+
      
      Hence, we should check for both UNBLOCK and ERP_INUSE because they are
      interleaved.  Also we need to explicitly check ERP_FAILED for the link
      down case which currently does not clear the UNBLOCK flag in
      zfcp_fsf_link_down_info_eval().
      Signed-off-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Fixes: 8830271c ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport")
      Fixes: a2fa0aed ("[SCSI] zfcp: Block FC transport rports early on errors")
      Fixes: 5f852be9 ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI")
      Fixes: 338151e0 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable")
      Fixes: 3859f6a2 ("[PATCH] zfcp: add rports to enable scsi_add_device to work again")
      Reviewed-by: default avatarBenjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      044a8cee
    • Steffen Maier's avatar
      scsi: zfcp: do not trace pure benign residual HBA responses at default level · a61aaea9
      Steffen Maier authored
      commit 56d23ed7 upstream.
      
      Since quite a while, Linux issues enough SCSI commands per scsi_device
      which successfully return with FCP_RESID_UNDER, FSF_FCP_RSP_AVAILABLE,
      and SAM_STAT_GOOD.  This floods the HBA trace area and we cannot see
      other and important HBA trace records long enough.
      
      Therefore, do not trace HBA response errors for pure benign residual
      under counts at the default trace level.
      
      This excludes benign residual under count combined with other validity
      bits set in FCP_RSP_IU, such as FCP_SNS_LEN_VAL.  For all those other
      cases, we still do want to see both the HBA record and the corresponding
      SCSI record by default.
      Signed-off-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Fixes: a54ca0f6 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
      Reviewed-by: default avatarBenjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      a61aaea9
    • Benjamin Block's avatar
      scsi: zfcp: fix use-after-"free" in FC ingress path after TMF · 6c98813f
      Benjamin Block authored
      commit dac37e15 upstream.
      
      When SCSI EH invokes zFCP's callbacks for eh_device_reset_handler() and
      eh_target_reset_handler(), it expects us to relent the ownership over
      the given scsi_cmnd and all other scsi_cmnds within the same scope - LUN
      or target - when returning with SUCCESS from the callback ('release'
      them).  SCSI EH can then reuse those commands.
      
      We did not follow this rule to release commands upon SUCCESS; and if
      later a reply arrived for one of those supposed to be released commands,
      we would still make use of the scsi_cmnd in our ingress tasklet. This
      will at least result in undefined behavior or a kernel panic because of
      a wrong kernel pointer dereference.
      
      To fix this, we NULLify all pointers to scsi_cmnds (struct zfcp_fsf_req
      *)->data in the matching scope if a TMF was successful. This is done
      under the locks (struct zfcp_adapter *)->abort_lock and (struct
      zfcp_reqlist *)->lock to prevent the requests from being removed from
      the request-hashtable, and the ingress tasklet from making use of the
      scsi_cmnd-pointer in zfcp_fsf_fcp_cmnd_handler().
      
      For cases where a reply arrives during SCSI EH, but before we get a
      chance to NULLify the pointer - but before we return from the callback
      -, we assume that the code is protected from races via the CAS operation
      in blk_complete_request() that is called in scsi_done().
      
      The following stacktrace shows an example for a crash resulting from the
      previous behavior:
      
      Unable to handle kernel pointer dereference at virtual kernel address fffffee17a672000
      Oops: 0038 [#1] SMP
      CPU: 2 PID: 0 Comm: swapper/2 Not tainted
      task: 00000003f7ff5be0 ti: 00000003f3d38000 task.ti: 00000003f3d38000
      Krnl PSW : 0404d00180000000 00000000001156b0 (smp_vcpu_scheduled+0x18/0x40)
                 R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
      Krnl GPRS: 000000200000007e 0000000000000000 fffffee17a671fd8 0000000300000015
                 ffffffff80000000 00000000005dfde8 07000003f7f80e00 000000004fa4e800
                 000000036ce8d8f8 000000036ce8d9c0 00000003ece8fe00 ffffffff969c9e93
                 00000003fffffffd 000000036ce8da10 00000000003bf134 00000003f3b07918
      Krnl Code: 00000000001156a2: a7190000        lghi    %r1,0
                 00000000001156a6: a7380015        lhi    %r3,21
                #00000000001156aa: e32050000008    ag    %r2,0(%r5)
                >00000000001156b0: 482022b0        lh    %r2,688(%r2)
                 00000000001156b4: ae123000        sigp    %r1,%r2,0(%r3)
                 00000000001156b8: b2220020        ipm    %r2
                 00000000001156bc: 8820001c        srl    %r2,28
                 00000000001156c0: c02700000001    xilf    %r2,1
      Call Trace:
      ([<0000000000000000>] 0x0)
       [<000003ff807bdb8e>] zfcp_fsf_fcp_cmnd_handler+0x3de/0x490 [zfcp]
       [<000003ff807be30a>] zfcp_fsf_req_complete+0x252/0x800 [zfcp]
       [<000003ff807c0a48>] zfcp_fsf_reqid_check+0xe8/0x190 [zfcp]
       [<000003ff807c194e>] zfcp_qdio_int_resp+0x66/0x188 [zfcp]
       [<000003ff80440c64>] qdio_kick_handler+0xdc/0x310 [qdio]
       [<000003ff804463d0>] __tiqdio_inbound_processing+0xf8/0xcd8 [qdio]
       [<0000000000141fd4>] tasklet_action+0x9c/0x170
       [<0000000000141550>] __do_softirq+0xe8/0x258
       [<000000000010ce0a>] do_softirq+0xba/0xc0
       [<000000000014187c>] irq_exit+0xc4/0xe8
       [<000000000046b526>] do_IRQ+0x146/0x1d8
       [<00000000005d6a3c>] io_return+0x0/0x8
       [<00000000005d6422>] vtime_stop_cpu+0x4a/0xa0
      ([<0000000000000000>] 0x0)
       [<0000000000103d8a>] arch_cpu_idle+0xa2/0xb0
       [<0000000000197f94>] cpu_startup_entry+0x13c/0x1f8
       [<0000000000114782>] smp_start_secondary+0xda/0xe8
       [<00000000005d6efe>] restart_int_handler+0x56/0x6c
       [<0000000000000000>] 0x0
      Last Breaking-Event-Address:
       [<00000000003bf12e>] arch_spin_lock_wait+0x56/0xb0
      Suggested-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Block <bblock@linux.vnet.ibm.com>
      Fixes: ea127f97 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git)
      Signed-off-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      6c98813f
    • Bart Van Assche's avatar
      IB/multicast: Check ib_find_pkey() return value · bfe86555
      Bart Van Assche authored
      commit d3a2418e upstream.
      
      This patch avoids that Coverity complains about not checking the
      ib_find_pkey() return value.
      
      Fixes: commit 547af765 ("IB/multicast: Report errors on multicast groups if P_key changes")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      bfe86555
    • Bart Van Assche's avatar
      IPoIB: Avoid reading an uninitialized member variable · e61b09c2
      Bart Van Assche authored
      commit 11b642b8 upstream.
      
      This patch avoids that Coverity reports the following:
      
          Using uninitialized value port_attr.state when calling printk
      
      Fixes: commit 94232d9c ("IPoIB: Start multicast join process only on active ports")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Erez Shitrit <erezsh@mellanox.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      e61b09c2
    • Bart Van Assche's avatar
      IB/mad: Fix an array index check · 7d934d28
      Bart Van Assche authored
      commit 2fe2f378 upstream.
      
      The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS
      (80) elements. Hence compare the array index with that value instead
      of with IB_MGMT_MAX_METHODS (128). This patch avoids that Coverity
      reports the following:
      
      Overrunning array class->method_table of 80 8-byte elements at element index 127 (byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class) (which evaluates to 127).
      
      Fixes: commit b7ab0b19 ("IB/mad: Verify mgmt class in received MADs")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Reviewed-by: default avatarHal Rosenstock <hal@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      7d934d28
    • NeilBrown's avatar
      block_dev: don't test bdev->bd_contains when it is not stable · 0e7994b1
      NeilBrown authored
      commit bcc7f5b4 upstream.
      
      bdev->bd_contains is not stable before calling __blkdev_get().
      When __blkdev_get() is called on a parition with ->bd_openers == 0
      it sets
        bdev->bd_contains = bdev;
      which is not correct for a partition.
      After a call to __blkdev_get() succeeds, ->bd_openers will be > 0
      and then ->bd_contains is stable.
      
      When FMODE_EXCL is used, blkdev_get() calls
         bd_start_claiming() ->  bd_prepare_to_claim() -> bd_may_claim()
      
      This call happens before __blkdev_get() is called, so ->bd_contains
      is not stable.  So bd_may_claim() cannot safely use ->bd_contains.
      It currently tries to use it, and this can lead to a BUG_ON().
      
      This happens when a whole device is already open with a bd_holder (in
      use by dm in my particular example) and two threads race to open a
      partition of that device for the first time, one opening with O_EXCL and
      one without.
      
      The thread that doesn't use O_EXCL gets through blkdev_get() to
      __blkdev_get(), gains the ->bd_mutex, and sets bdev->bd_contains = bdev;
      
      Immediately thereafter the other thread, using FMODE_EXCL, calls
      bd_start_claiming() from blkdev_get().  This should fail because the
      whole device has a holder, but because bdev->bd_contains == bdev
      bd_may_claim() incorrectly reports success.
      This thread continues and blocks on bd_mutex.
      
      The first thread then sets bdev->bd_contains correctly and drops the mutex.
      The thread using FMODE_EXCL then continues and when it calls bd_may_claim()
      again in:
      			BUG_ON(!bd_may_claim(bdev, whole, holder));
      The BUG_ON fires.
      
      Fix this by removing the dependency on ->bd_contains in
      bd_may_claim().  As bd_may_claim() has direct access to the whole
      device, it can simply test if the target bdev is the whole device.
      
      Fixes: 6b4517a7 ("block: implement bd_claiming and claiming block")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      0e7994b1
    • Maxim Patlasov's avatar
      btrfs: limit async_work allocation and worker func duration · c1980027
      Maxim Patlasov authored
      commit 2939e1a8 upstream.
      
      Problem statement: unprivileged user who has read-write access to more than
      one btrfs subvolume may easily consume all kernel memory (eventually
      triggering oom-killer).
      
      Reproducer (./mkrmdir below essentially loops over mkdir/rmdir):
      
      [root@kteam1 ~]# cat prep.sh
      
      DEV=/dev/sdb
      mkfs.btrfs -f $DEV
      mount $DEV /mnt
      for i in `seq 1 16`
      do
      	mkdir /mnt/$i
      	btrfs subvolume create /mnt/SV_$i
      	ID=`btrfs subvolume list /mnt |grep "SV_$i$" |cut -d ' ' -f 2`
      	mount -t btrfs -o subvolid=$ID $DEV /mnt/$i
      	chmod a+rwx /mnt/$i
      done
      
      [root@kteam1 ~]# sh prep.sh
      
      [maxim@kteam1 ~]$ for i in `seq 1 16`; do ./mkrmdir /mnt/$i 2000 2000 & done
      
      [root@kteam1 ~]# for i in `seq 1 4`; do grep "kmalloc-128" /proc/slabinfo | grep -v dma; sleep 60; done
      kmalloc-128        10144  10144    128   32    1 : tunables    0    0    0 : slabdata    317    317      0
      kmalloc-128       9992352 9992352    128   32    1 : tunables    0    0    0 : slabdata 312261 312261      0
      kmalloc-128       24226752 24226752    128   32    1 : tunables    0    0    0 : slabdata 757086 757086      0
      kmalloc-128       42754240 42754240    128   32    1 : tunables    0    0    0 : slabdata 1336070 1336070      0
      
      The huge numbers above come from insane number of async_work-s allocated
      and queued by btrfs_wq_run_delayed_node.
      
      The problem is caused by btrfs_wq_run_delayed_node() queuing more and more
      works if the number of delayed items is above BTRFS_DELAYED_BACKGROUND. The
      worker func (btrfs_async_run_delayed_root) processes at least
      BTRFS_DELAYED_BATCH items (if they are present in the list). So, the machinery
      works as expected while the list is almost empty. As soon as it is getting
      bigger, worker func starts to process more than one item at a time, it takes
      longer, and the chances to have async_works queued more than needed is getting
      higher.
      
      The problem above is worsened by another flaw of delayed-inode implementation:
      if async_work was queued in a throttling branch (number of items >=
      BTRFS_DELAYED_WRITEBACK), corresponding worker func won't quit until
      the number of items < BTRFS_DELAYED_BACKGROUND / 2. So, it is possible that
      the func occupies CPU infinitely (up to 30sec in my experiments): while the
      func is trying to drain the list, the user activity may add more and more
      items to the list.
      
      The patch fixes both problems in straightforward way: refuse queuing too
      many works in btrfs_wq_run_delayed_node and bail out of worker func if
      at least BTRFS_DELAYED_WRITEBACK items are processed.
      
      Changed in v2: remove support of thresh == NO_THRESHOLD.
      Signed-off-by: default avatarMaxim Patlasov <mpatlasov@virtuozzo.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c1980027
    • Daniel Dressler's avatar
      Btrfs: delayed-inode: replace root args iff only fs_info used · f7aa7e8b
      Daniel Dressler authored
      commit a585e948 upstream.
      
      This is the second independent patch of a larger project to cleanup
      btrfs's internal usage of btrfs_root. Many functions take btrfs_root
      only to grab the fs_info struct.
      
      By requiring a root these functions cause programmer overhead. That
      these functions can accept any valid root is not obvious until
      inspection.
      
      This patch reduces the specificity of such functions to accept the
      fs_info directly.
      
      These patches can be applied independently and thus are not being
      submitted as a patch series. There should be about 26 patches by the
      project's completion. Each patch will cleanup between 1 and 34 functions
      apiece.  Each patch covers a single file's functions.
      
      This patch affects the following function(s):
        1) btrfs_wq_run_delayed_node
      Signed-off-by: default avatarDaniel Dressler <danieru.dressler@gmail.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f7aa7e8b
    • Jack Morgenstein's avatar
      IB/mlx4: Fix out-of-range array index in destroy qp flow · 2f31f92e
      Jack Morgenstein authored
      commit c482af64 upstream.
      
      For non-special QPs, the port value becomes non-zero only at the
      RESET-to-INIT transition. If the QP has not undergone that transition,
      its port number value is still zero.
      
      If such a QP is destroyed before being moved out of the RESET state,
      subtracting one from the qp port number results in a negative value.
      Using that negative value as an index into the qp1_proxy array
      results in an out-of-bounds array reference.
      
      Fix this by testing that the QP type is one that uses qp1_proxy before
      using the port number. For special QPs of all types, the port number is
      specified at QP creation time.
      
      Fixes: 9433c188 ("IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes")
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      2f31f92e
    • Eran Ben Elisha's avatar
      IB/mlx4: When no DMFS for IPoIB, don't allow NET_IF QPs · 0af1f0d8
      Eran Ben Elisha authored
      commit 1f22e454 upstream.
      
      According to the firmware spec, FLOW_STEERING_IB_UC_QP_RANGE command is
      supported only if dmfs_ipoib bit is set.
      
      If it isn't set we want to ensure allocating NET_IF QPs fail. We do so
      by filling out the allocation bitmap. By thus, the NET_IF QPs allocating
      function won't find any free QP and will fail.
      
      Fixes: c1c98501 ('IB/mlx4: Add support for steerable IB UD QPs')
      Signed-off-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarDaniel Jurgens <danielj@mellanox.com>
      Reviewed-by: default avatarMark Bloch <markb@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      0af1f0d8
    • Jan Kara's avatar
      fsnotify: Fix possible use-after-free in inode iteration on umount · 9e4fe773
      Jan Kara authored
      commit 5716863e upstream.
      
      fsnotify_unmount_inodes() plays complex tricks to pin next inode in the
      sb->s_inodes list when iterating over all inodes. Furthermore the code has a
      bug that if the current inode is the last on i_sb_list that does not have e.g.
      I_FREEING set, then we leave next_i pointing to inode which may get removed
      from the i_sb_list once we drop s_inode_list_lock thus resulting in
      use-after-free issues (usually manifesting as infinite looping in
      fsnotify_unmount_inodes()).
      
      Fix the problem by keeping current inode pinned somewhat longer. Then we can
      make the code much simpler and standard.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      9e4fe773