1. 13 Mar, 2019 40 commits
    • netfilter: xt_TEE: fix wrong interface selection · 02d86085
      Taehee Yoo authored
      [ Upstream commit f24d2d4f ]
      
      The TEE netdevice notifier handler checks only the interface name.
      However, each netns can have an interface with the same name, so an
      interface in another netns could be selected.
      
      test commands:
         %ip netns add vm1
         %iptables -I INPUT -p icmp -j TEE --gateway 192.168.1.1 --oif enp2s0
         %ip link set enp2s0 netns vm1
      
      The above rule is in the root netns, but the notifier handler could
      give that rule the ifindex of enp2s0 in vm1.
      
      After this patch, TEE rule is added to the per-netns list.
      
      Fixes: 9e2f6c5d ("netfilter: Rework xt_TEE netdevice notifier")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
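The name-only matching described in this commit can be illustrated with a small user-space model. This is a sketch in Python, not the kernel code: `TeeRule`, `notify_by_name_only`, and `notify_per_netns` are hypothetical names, and the ifindex value is made up.

```python
from collections import defaultdict


class TeeRule:
    """Hypothetical stand-in for a TEE rule that tracks an output interface."""

    def __init__(self, netns, oif_name):
        self.netns = netns
        self.oif_name = oif_name
        self.oif_index = None  # filled in when a matching interface appears


def notify_by_name_only(rules, event_netns, ifname, ifindex):
    """Old behaviour: every rule whose name matches is updated, netns ignored."""
    for rule in rules:
        if rule.oif_name == ifname:
            rule.oif_index = ifindex


def notify_per_netns(rules_by_netns, event_netns, ifname, ifindex):
    """New behaviour: only rules registered in the event's netns are considered."""
    for rule in rules_by_netns[event_netns]:
        if rule.oif_name == ifname:
            rule.oif_index = ifindex


# A rule lives in the root netns; enp2s0 then appears in netns "vm1".
buggy_rule = TeeRule("root", "enp2s0")
notify_by_name_only([buggy_rule], "vm1", "enp2s0", 7)

fixed_rule = TeeRule("root", "enp2s0")
per_netns = defaultdict(list)
per_netns["root"].append(fixed_rule)
notify_per_netns(per_netns, "vm1", "enp2s0", 7)

print(buggy_rule.oif_index, fixed_rule.oif_index)
```

In the name-only variant the root-netns rule picks up the ifindex from vm1; with per-netns lists it is left untouched.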
    • drm: disable uncached DMA optimization for ARM and arm64 · f9a0a08d
      Ard Biesheuvel authored
      [ Upstream commit e02f5c1b ]
      
      The DRM driver stack is designed to work with cache coherent devices
      only, but permits an optimization to be enabled in some cases, where
      for some buffers, both the CPU and the GPU use uncached mappings,
      removing the need for DMA snooping and allocation in the CPU caches.
      
      The use of uncached GPU mappings relies on the correct implementation
      of the PCIe NoSnoop TLP attribute by the platform, otherwise the GPU
      will use cached mappings nonetheless. On x86 platforms, this does not
      seem to matter, as uncached CPU mappings will snoop the caches in any
      case. However, on ARM and arm64, enabling this optimization on a
      platform where NoSnoop is ignored results in loss of coherency, which
      breaks correct operation of the device. Since we have no way of
      detecting whether NoSnoop works or not, just disable this
      optimization entirely for ARM and arm64.
      
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: David Zhou <David1.Zhou@amd.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Junwei Zhang <Jerry.Zhang@amd.com>
      Cc: Michel Daenzer <michel.daenzer@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Maxime Ripard <maxime.ripard@bootlin.com>
      Cc: Sean Paul <sean@poorly.run>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: amd-gfx list <amd-gfx@lists.freedesktop.org>
      Cc: dri-devel <dri-devel@lists.freedesktop.org>
      Reported-by: Carsten Haitzler <Carsten.Haitzler@arm.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Link: https://patchwork.kernel.org/patch/10778815/
      Signed-off-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ARM: dts: exynos: Fix max voltage for buck8 regulator on Odroid XU3/XU4 · bb2c205c
      Marek Szyprowski authored
      commit a3238924 upstream.
      
      The maximum voltage value for buck8 regulator on Odroid XU3/XU4 boards is
      set too low. Increase it to the 2000mV as specified on the board schematic.
      So far the board worked fine, because of the bug in the PMIC driver, which
      used incorrect step value for that regulator. It interpreted the voltage
      value set by the bootloader as 1225mV and kept it unchanged. The regulator
      driver was, however, fixed recently in commit 56b5d4ea
      ("regulator: s2mps11: Fix steps for buck7, buck8 and LDO35"), which results
      in reading the proper buck8 value and forcing it to 1500mV on boot. This
      is not enough for proper board operation and results in eMMC errors during
      heavy IO traffic. Increasing maximum voltage value for buck8 restores
      original driver behavior and fixes eMMC issues.

      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Fixes: 86a2d2ac ("ARM: dts: Add dts file for Odroid XU3 board")
      Fixes: 56b5d4ea ("regulator: s2mps11: Fix steps for buck7, buck8 and LDO35")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM: dts: exynos: Add minimal clkout parameters to Exynos3250 PMU · bfc341b6
      Marek Szyprowski authored
      commit a66352e0 upstream.
      
      Add minimal parameters needed by the Exynos CLKOUT driver to Exynos3250
      PMU node. This fixes the following warning on boot:
      
      exynos_clkout_init: failed to register clkout clock
      
      Fixes: d19bb397 ("ARM: dts: exynos: Update PMU node with CLKOUT related data")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM: dts: exynos: Fix pinctrl definition for eMMC RTSN line on Odroid X2/U3 · cd10bc82
      Marek Szyprowski authored
      commit ec33745b upstream.
      
      Commit 225da7e6 ("ARM: dts: add eMMC reset line for
      exynos4412-odroid-common") added MMC power sequence for eMMC card of
      Odroid X2/U3. It reused generic sd1_cd pin control configuration node
      and only disabled pull-up. However, at that time the pinctrl configuration
      was not applied during MMC power sequence driver initialization. This
      has been changed later by commit d97a1e5d ("mmc: pwrseq: convert to
      proper platform device").
      
      It turned out then, that the provided pinctrl configuration is not
      correct, because the eMMC_RTSN line is being re-configured as 'special
      function/card detect function for mmc1 controller' not the simple
      'output', thus the power sequence driver doesn't really set the pin
      value. This in effect broke the reboot of Odroid X2/U3 boards. Fix this
      by providing separate node with eMMC_RTSN pin configuration.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: Markus Reichl <m.reichl@fivetechno.de>
      Suggested-by: Ulf Hansson <ulf.hansson@linaro.org>
      Fixes: 225da7e6 ("ARM: dts: add eMMC reset line for exynos4412-odroid-common")
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: dts: hikey: Revert "Enable HS200 mode on eMMC" · 103ec440
      Alistair Strachan authored
      commit 8d26c139 upstream.
      
      This reverts commit abd7d097. This
      change was already partially reverted by John Stultz in
      commit 9c6d26df ("arm64: dts: hikey: Fix eMMC corruption regression").
      
      This change appears to cause controller resets and block read failures
      which prevents successful booting on some hikey boards.
      
      Cc: Ryan Grachek <ryan@edited.us>
      Cc: Wei Xu <xuwei5@hisilicon.com>
      Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: devicetree@vger.kernel.org
      Cc: stable <stable@vger.kernel.org> #4.17+
      Signed-off-by: Alistair Strachan <astrachan@google.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: dts: hikey: Give wifi some time after power-on · e6eb5e35
      Jan Kiszka authored
      commit 83b94417 upstream.
      
      Somewhere along recent changes to power control of the wl1835, power-on
      became very unreliable on the hikey, failing like this:
      
      wl1271_sdio: probe of mmc2:0001:1 failed with error -16
      wl1271_sdio: probe of mmc2:0001:2 failed with error -16
      
      After playing with some dt parameters and comparing to other users of
      this chip, it turned out we need some power-on delay to make things
      stable again. In contrast to those other users which define 200 ms, the
      hikey would already be happy with 1 ms. Still, we use the safer 10 ms,
      like on the Ultra96.
      
      Fixes: ea452678 ("arm64: dts: hikey: Fix WiFi support")
      Cc: <stable@vger.kernel.org> #4.12+
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: dts: zcu100-revC: Give wifi some time after power-on · 271c5a5d
      Jan Kiszka authored
      commit 35a4f89c upstream.
      
      Somewhere along recent changes to power control of the wl1831, power-on
      became very unreliable on the Ultra96, failing like this:
      
      wl1271_sdio: probe of mmc2:0001:1 failed with error -16
      wl1271_sdio: probe of mmc2:0001:2 failed with error -16
      
      After playing with some dt parameters and comparing to other users of
      this chip, it turned out we need some power-on delay to make things
      stable again. In contrast to those other users which define 200 ms,
      Ultra96 is already happy with 10 ms.
      
      Fixes: 5869ba06 ("arm64: zynqmp: Add support for Xilinx zcu100-revC")
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Michal Simek <michal.simek@xilinx.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/PCI: Fixup RTIT_BAR of Intel Denverton Trace Hub · 36e3673d
      Alexander Shishkin authored
      commit 2e095ce7 upstream.
      
      On Denverton's integration of the Intel(R) Trace Hub (for a reference and
      overview see Documentation/trace/intel_th.rst) the reported size of one of
      its resources (RTIT_BAR) doesn't match its actual size, which leads to
      overlaps with other devices' resources.
      
      In practice, it overlaps with XHCI MMIO space, which results in the xhci
      driver bailing out after seeing its registers as 0xffffffff, and perceived
      disappearance of all USB devices:
      
        intel_th_pci 0000:00:1f.7: enabling device (0004 -> 0006)
        xhci_hcd 0000:00:15.0: xHCI host controller not responding, assume dead
        xhci_hcd 0000:00:15.0: xHC not responding in xhci_irq, assume controller is dead
        xhci_hcd 0000:00:15.0: HC died; cleaning up
        usb 1-1: USB disconnect, device number 2
      
      For this reason, we need to resize the RTIT_BAR on Denverton to its actual
      size, which in this case is 4MB.  The corresponding erratum is DNV36 at the
      link below:
      
        DNV36.       Processor Host Root Complex May Incorrectly Route Memory
                     Accesses to Intel® Trace Hub
      
        Problem:     The Intel® Trace Hub RTIT_BAR (B0:D31:F7 offset 20h) is
      	       reported as a 2KB memory range.  Due to this erratum, the
      	       processor Host Root Complex will forward addresses from
      	       RTIT_BAR to RTIT_BAR + 4MB -1 to Intel® Trace Hub.
      
        Implication: Devices assigned within the RTIT_BAR to RTIT_BAR + 4MB -1
                     space may not function correctly.
      
        Workaround:  A BIOS code change has been identified and may be
                     implemented as a workaround for this erratum.
      
        Status:      No Fix.
      
      Note that 5118ccd3 ("intel_th: pci: Add Denverton SOC support") updates
      the Trace Hub driver so it claims the Denverton device, but the resource
      overlap exists regardless of whether that driver is loaded or that commit
      is included.
      
      Link: https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/atom-c3000-family-spec-update.pdf
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      [bhelgaas: include erratum text, clarify relationship with 5118ccd3]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
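The effect of the size mismatch can be checked with simple interval arithmetic. The sketch below is Python, not kernel code, and the base addresses and the xHCI window size are hypothetical values chosen for illustration; only the 2KB reported size and the 4MB actual size come from the erratum text.

```python
KIB = 1024
MIB = 1024 * KIB


def overlaps(start_a, size_a, start_b, size_b):
    """Return True if the half-open ranges [start, start + size) intersect."""
    return start_a < start_b + size_b and start_b < start_a + size_a


# Hypothetical placement: an xHCI MMIO window 64KB above the RTIT_BAR base.
rtit_base = 0xDF800000
xhci_base = 0xDF810000
xhci_size = 64 * KIB

overlap_as_reported = overlaps(rtit_base, 2 * KIB, xhci_base, xhci_size)
overlap_as_routed = overlaps(rtit_base, 4 * MIB, xhci_base, xhci_size)

print(overlap_as_reported, overlap_as_routed)
```

With the reported 2KB size the two ranges look disjoint, so a resource allocator sees no conflict; with the 4MB range that the root complex actually routes, the xHCI window falls inside it.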
    • scsi: aacraid: Fix missing break in switch statement · 917f9437
      Gustavo A. R. Silva authored
      commit 5e420fe6 upstream.
      
      Add missing break statement and fix indentation issue.
      
      This bug was found thanks to the ongoing efforts to enable
      -Wimplicit-fallthrough.
      
      Fixes: 9cb62fa2 ("aacraid: Log firmware AIF messages")
      Cc: stable@vger.kernel.org
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • iscsi_ibft: Fix missing break in switch statement · dcdd1bcb
      Gustavo A. R. Silva authored
      commit df997abe upstream.
      
      Add missing break statement in order to prevent the code from falling
      through to case ISCSI_BOOT_TGT_NAME, which is unnecessary.
      
      This bug was found thanks to the ongoing efforts to enable
      -Wimplicit-fallthrough.
      
      Fixes: b33a84a3 ("ibft: convert iscsi_ibft module to iscsi boot lib")
      Cc: stable@vger.kernel.org
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Input: elan_i2c - add id for touchpad found in Lenovo s21e-20 · fe34541a
      Vincent Batts authored
      commit e154ab69 upstream.
      
      Lenovo s21e-20 uses ELAN0601 in its ACPI tables for the Elan touchpad.

      Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Input: wacom_serial4 - add support for Wacom ArtPad II tablet · b3b29dc5
      Jason Gerecke authored
      commit 44fc95e2 upstream.
      
      Tablet initially begins communicating at 9600 baud, so this command
      should be used to connect to the device:
      
          $ inputattach --daemon --baud 9600 --wacom_iv /dev/ttyS0
      
      https://github.com/linuxwacom/xf86-input-wacom/issues/40

      Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nvme-pci: add missing unlock for reset error · 7066774e
      Keith Busch authored
      [ Upstream commit 4726bcf3 ]
      
      The reset work holds a mutex to prevent races with removal modifying the
      same resources, but was unlocking only on success. Unlock on failure
      too.
      
      Fixes: 5c959d73 ("nvme-pci: fix rapid add remove sequence")
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
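The class of bug fixed here, a lock released only on the success path, can be shown with a small user-space analogue. This is a Python sketch using `threading.Lock`, not the nvme-pci code; `reset_work_buggy` and `reset_work_fixed` are hypothetical names.

```python
import threading


def reset_work_buggy(lock, setup_ok):
    """Releases the lock only when setup succeeds."""
    lock.acquire()
    if not setup_ok:
        return False  # error path returns while still holding the lock
    lock.release()
    return True


def reset_work_fixed(lock, setup_ok):
    """Releases the lock on both the success and the failure path."""
    lock.acquire()
    try:
        return setup_ok
    finally:
        lock.release()


buggy_lock = threading.Lock()
reset_work_buggy(buggy_lock, setup_ok=False)

fixed_lock = threading.Lock()
reset_work_fixed(fixed_lock, setup_ok=False)

print(buggy_lock.locked(), fixed_lock.locked())
```

After a failed reset the buggy variant leaves the lock held, so any later path that needs the same lock (removal, in the commit's case) would block.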
    • blk-iolatency: fix IO hang due to negative inflight counter · 6d482bc5
      Liu Bo authored
      [ Upstream commit 8c772a9b ]
      
      Our test reported the following stack, and vmcore showed that
      ->inflight counter is -1.
      
      [ffffc9003fcc38d0] __schedule at ffffffff8173d95d
      [ffffc9003fcc3958] schedule at ffffffff8173de26
      [ffffc9003fcc3970] io_schedule at ffffffff810bb6b6
      [ffffc9003fcc3988] blkcg_iolatency_throttle at ffffffff813911cb
      [ffffc9003fcc3a20] rq_qos_throttle at ffffffff813847f3
      [ffffc9003fcc3a48] blk_mq_make_request at ffffffff8137468a
      [ffffc9003fcc3b08] generic_make_request at ffffffff81368b49
      [ffffc9003fcc3b68] submit_bio at ffffffff81368d7d
      [ffffc9003fcc3bb8] ext4_io_submit at ffffffffa031be00 [ext4]
      [ffffc9003fcc3c00] ext4_writepages at ffffffffa03163de [ext4]
      [ffffc9003fcc3d68] do_writepages at ffffffff811c49ae
      [ffffc9003fcc3d78] __filemap_fdatawrite_range at ffffffff811b6188
      [ffffc9003fcc3e30] filemap_write_and_wait_range at ffffffff811b6301
      [ffffc9003fcc3e60] ext4_sync_file at ffffffffa030cee8 [ext4]
      [ffffc9003fcc3ea8] vfs_fsync_range at ffffffff8128594b
      [ffffc9003fcc3ee8] do_fsync at ffffffff81285abd
      [ffffc9003fcc3f18] sys_fsync at ffffffff81285d50
      [ffffc9003fcc3f28] do_syscall_64 at ffffffff81003c04
      [ffffc9003fcc3f50] entry_SYSCALL_64_after_swapgs at ffffffff81742b8e
      
      The ->inflight counter may be negative (-1) if
      
      1) blk-iolatency was disabled when the IO was issued,
      
      2) blk-iolatency was enabled before this IO reached its endio,
      
      3) the ->inflight counter is decreased from 0 to -1 in endio()
      
      In fact the hang can be easily reproduced by the below script,
      
      H=/sys/fs/cgroup/unified/
      P=/sys/fs/cgroup/unified/test
      
      echo "+io" > $H/cgroup.subtree_control
      mkdir -p $P
      
      echo $$ > $P/cgroup.procs
      
      xfs_io -f -d -c "pwrite 0 4k" /dev/sdg
      
      echo "`cat /sys/block/sdg/dev` target=1000000" > $P/io.latency
      
      xfs_io -f -d -c "pwrite 0 4k" /dev/sdg
      
      This fixes the problem by freezing the queue so that while
      enabling/disabling iolatency, there is no inflight rq running.
      
      Note that quiesce_queue is not needed, as this only updates the iolatency
      configuration, which request_queue dispatching does not care about.

      Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
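The three-step sequence that drives the counter to -1 can be replayed in a toy model. The Python sketch below is a deliberate simplification, not the blk-iolatency implementation: `IoLatencyModel` is a hypothetical class, and "freezing the queue" is modelled simply as letting in-flight IO complete before the setting changes.

```python
class IoLatencyModel:
    """Toy accounting: IO is counted only while the feature is enabled."""

    def __init__(self):
        self.enabled = False
        self.inflight = 0

    def submit(self):
        if self.enabled:
            self.inflight += 1

    def endio(self):
        if self.enabled:
            self.inflight -= 1


def enable_while_io_in_flight():
    model = IoLatencyModel()
    model.submit()        # 1) issued while disabled, so not counted
    model.enabled = True  # 2) enabled before the IO completes
    model.endio()         # 3) completion decrements from 0 to -1
    return model.inflight


def enable_after_drain():
    model = IoLatencyModel()
    model.submit()
    model.endio()         # in-flight IO drains first (the "frozen queue")
    model.enabled = True
    return model.inflight


racy_result = enable_while_io_in_flight()
drained_result = enable_after_drain()
print(racy_result, drained_result)
```

Changing the setting only when nothing is in flight keeps every increment paired with its decrement.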
    • qede: Fix system crash on configuring channels. · 1781ae6f
      Sudarsana Reddy Kalluru authored
      [ Upstream commit 0aa4febb ]
      
      Under heavy traffic load, changing the number of channels via ethtool
      (ethtool -L) causes the interface to be reloaded. It was observed that
      some packets are then transmitted on an old TX channel/queue id which no
      longer exists after the channel configuration, leading to a system
      crash.
      
      Add a safeguard in the driver by validating queue id through
      ndo_select_queue() which is called before the ndo_start_xmit().

      Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
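A queue-id safeguard of the kind described can be sketched as follows. The commit message only says the id is validated in ndo_select_queue(); folding a stale id into range with a modulo is an assumption made for this Python illustration, and `select_queue` is a hypothetical helper, not the driver function.

```python
def select_queue(picked_queue, num_tx_queues):
    """Map a possibly stale queue id onto the queues that currently exist."""
    if num_tx_queues == 0:
        return 0  # nothing configured yet; fall back to queue 0
    return picked_queue % num_tx_queues


# A packet was assigned queue 7 before the device was shrunk to 4 queues.
stale = select_queue(7, 4)
valid = select_queue(2, 4)
print(stale, valid)
```

Whatever the exact policy, the point is that the transmit path never indexes a queue that was torn down by the reconfiguration.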
    • qed: Consider TX tcs while deriving the max num_queues for PF. · 84828dd2
      Sudarsana Reddy Kalluru authored
      [ Upstream commit fb1faab7 ]
      
      Max supported queues is derived incorrectly in the case of multi-CoS.
      Need to consider TCs while calculating num_queues for PF.

      Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • qed: Fix EQ full firmware assert. · d727c0ed
      Manish Chopra authored
      [ Upstream commit 660492bc ]
      
      When slowpath messages are sent with high rate, the resulting
      events can lead to a FW assert in case they are not handled fast
      enough (Event Queue Full assert). Attempt to send queued slowpath
      messages only after the newly evacuated entries in the EQ ring
      are indicated to FW.

      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • fs: ratelimit __find_get_block_slow() failure message. · 72426ed2
      Tetsuo Handa authored
      [ Upstream commit 43636c80 ]
      
      When something makes __find_get_block_slow() hit the all_mapped path, it
      calls printk() 100+ times per second. But there is no need to print the
      same message at such a high frequency; it is just asking for a stall
      warning, or at least bloating log files.
      
        [  399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
        [  399.873324][T15342] b_state=0x00000029, b_size=512
        [  399.878403][T15342] device loop0 blocksize: 4096
        [  399.883296][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
        [  399.890400][T15342] b_state=0x00000029, b_size=512
        [  399.895595][T15342] device loop0 blocksize: 4096
        [  399.900556][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
        [  399.907471][T15342] b_state=0x00000029, b_size=512
        [  399.912506][T15342] device loop0 blocksize: 4096
      
      This patch reduces the frequency to at most once per second, in addition
      to concatenating the three lines into one.
      
        [  399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8, b_state=0x00000029, b_size=512, device loop0 blocksize: 4096

      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
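The two changes in this commit, limiting the message to once per second and merging three lines into one, can be modelled in a few lines. This is a Python sketch, not the kernel's rate-limit machinery; `RateLimiter` and `format_failure` are hypothetical names, and the last timestamp is made up.

```python
class RateLimiter:
    """Allow at most one event per `interval` seconds (timestamps passed in)."""

    def __init__(self, interval=1.0):
        self.interval = interval
        self.last_allowed = None

    def allow(self, now):
        if self.last_allowed is None or now - self.last_allowed >= self.interval:
            self.last_allowed = now
            return True
        return False


def format_failure(block, b_blocknr, b_state, b_size, device, blocksize):
    """Build the single-line form of the message shown in the commit."""
    return (
        f"__find_get_block_slow() failed. block={block}, "
        f"b_blocknr={b_blocknr}, b_state=0x{b_state:08x}, b_size={b_size}, "
        f"device {device} blocksize: {blocksize}"
    )


limiter = RateLimiter(interval=1.0)
# The first three timestamps come from the log in the commit; the last is made up.
timestamps = [399.866302, 399.883296, 399.900556, 400.900000]
decisions = [limiter.allow(t) for t in timestamps]
message = format_failure(1, 8, 0x29, 512, "loop0", 4096)

print(decisions)
print(message)
```

Only the first and the last event are let through here, because the two in between arrive within a second of the first.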
    • nvme-pci: fix rapid add remove sequence · 3cc6703d
      Keith Busch authored
      [ Upstream commit 5c959d73 ]
      
      A surprise removal may fail to tear down request queues if it is racing
      with the initial asynchronous probe. If that happens, the remove path
      won't see the queue resources to tear down, and the controller reset
      path may create a new request queue on a removed device, but will not
      be able to make forward progress, deadlocking the pci removal.
      
      Protect setting up non-blocking resources from a shutdown by holding the
      same mutex, and transition to the CONNECTING state after these resources
      are initialized so the probe path may see the dead controller state
      before dispatching new IO.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202081
      Reported-by: Alex Gagniuc <Alex_Gagniuc@Dellteam.com>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Tested-by: Alex Gagniuc <mr.nuke.me@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • nvme: lock NS list changes while handling command effects · e3aabe4c
      Keith Busch authored
      [ Upstream commit e7ad43c3 ]
      
      If a controller supports the NS Change Notification, the namespace
      scan_work is automatically triggered after attaching a new namespace.
      
      Occasionally the namespace scan_work may append the new namespace to the
      list before the admin command effects handling is completed. The effects
      handling unfreezes namespaces, but if it unfreezes the newly attached
      namespace, its request_queue freeze depth will be off and we'll hit the
      warning in blk_mq_unfreeze_queue().
      
      On the next namespace add, we will fail to freeze that queue due to the
      previous bad accounting and deadlock waiting for frozen.
      
      Fix that by preventing scan work from altering the namespace list while
      command effects handling needs to pair freeze with unfreeze.

      Reported-by: Wen Xiong <wenxiong@us.ibm.com>
      Tested-by: Wen Xiong <wenxiong@us.ibm.com>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/amdgpu: use spin_lock_irqsave to protect vm_manager.pasid_idr · 25aa5c8b
      Philip Yang authored
      [ Upstream commit 0a5f49cb ]
      
      amdgpu_vm_get_task_info is called from the interrupt handler and the sched
      timeout workqueue, so we should use the irq version of spin_lock to avoid
      deadlock.

      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • i2c: omap: Use noirq system sleep pm ops to idle device for suspend · ee84b62f
      Tony Lindgren authored
      [ Upstream commit c6e2bd95 ]
      
      We currently get the following error with pixcir_ts driver during a
      suspend resume cycle:
      
      omap_i2c 4802a000.i2c: controller timed out
      pixcir_ts 1-005c: pixcir_int_enable: can't read reg 0x34 : -110
      pixcir_ts 1-005c: Failed to disable interrupt generation: -110
      pixcir_ts 1-005c: Failed to stop
      dpm_run_callback(): pixcir_i2c_ts_resume+0x0/0x98
      [pixcir_i2c_ts] returns -110
      PM: Device 1-005c failed to resume: error -110
      
      And at least am437x based devices with pixcir_ts will fail to resume
      to a touchscreen that is configured as the wakeup-source in device
      tree for these devices.
      
      This is because pixcir_ts tries to reconfigure its registers for
      noirq suspend which fails. This also leaves i2c-omap in enabled state
      for suspend.
      
      Let's fix the pixcir_ts issue and make sure i2c-omap is suspended by
      adding SET_NOIRQ_SYSTEM_SLEEP_PM_OPS.
      
      Let's also get rid of some ifdefs while at it and replace them with
      __maybe_unused as SET_RUNTIME_PM_OPS and SET_NOIRQ_SYSTEM_SLEEP_PM_OPS
      already deal with the various PM Kconfig options.

      Reported-by: Keerthy <j-keerthy@ti.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Acked-by: Vignesh R <vigneshr@ti.com>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Revert "scsi: libfc: Add WARN_ON() when deleting rports" · 29f7b376
      Ross Lagerwall authored
      [ Upstream commit d8f6382a ]
      
      This reverts commit bbc0f8bd.
      
      It added a warning whose intent was to check whether the rport was still
      linked into the peer list. It doesn't work as intended and gives false
      positive warnings for two reasons:
      
      1) If the rport is never linked into the peer list it will not be
      considered empty since the list_head is never initialized.
      
      2) If the rport is deleted from the peer list using list_del_rcu(), then
      the list_head is in an undefined state and it is not considered empty.

      Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
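Both false-positive cases follow from how an "is this list empty" test works on a circular doubly linked list: a node counts as empty only when its `next` pointer refers to itself. The Python sketch below models that with hypothetical helpers named after the kernel list primitives; it is an illustration, not the libfc or kernel list code, and the assumption that the reverted warning was an emptiness check on the rport's list node is inferred from the commit text.

```python
POISON = object()  # stands in for the poison value written on deletion


class ListHead:
    def __init__(self):
        # Deliberately left uninitialized, like a list node that was never set up.
        self.next = None
        self.prev = None


def init_list_head(head):
    head.next = head
    head.prev = head


def list_add(new, head):
    new.next = head.next
    new.prev = head
    head.next.prev = new
    head.next = new


def list_empty(head):
    return head.next is head


def list_del_rcu(entry):
    # Unlink, but keep entry.next intact so concurrent readers can move on.
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    entry.prev = POISON


# Case 1: node never linked and never initialized.
never_linked = ListHead()
case1_empty = list_empty(never_linked)

# Case 2: node linked into a peer list, then removed with the RCU variant.
peers = ListHead()
init_list_head(peers)
removed = ListHead()
list_add(removed, peers)
list_del_rcu(removed)
case2_empty = list_empty(removed)

print(case1_empty, case2_empty, list_empty(peers))
```

In both cases the node is not on any list, yet the emptiness test reports it as non-empty, which is why a warning built on that test fires spuriously.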
    • MIPS: Remove function size check in get_frame_info() · cd8520a2
      Jun-Ru Chang authored
      [ Upstream commit 2b424cfc ]
      
      Patch (b6c7a324 "MIPS: Fix get_frame_info() handling of
      microMIPS function size.") introduces additional function size
      check for microMIPS by only checking insn between ip and ip + func_size.
      However, func_size in get_frame_info() is always 0 if KALLSYMS is not
      enabled. This causes get_frame_info() to return immediately without
      calculating correct frame_size, which in turn causes "Can't analyze
      schedule() prologue" warning messages at boot time.
      
      This patch removes the func_size check and lets the frame_size check run
      up to 128 insns for both MIPS and microMIPS.

      Signed-off-by: Jun-Ru Chang <jrjang@realtek.com>
      Signed-off-by: Tony Wu <tonywu@realtek.com>
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Fixes: b6c7a324 ("MIPS: Fix get_frame_info() handling of microMIPS function size.")
      Cc: <ralf@linux-mips.org>
      Cc: <jhogan@kernel.org>
      Cc: <macro@mips.com>
      Cc: <yamada.masahiro@socionext.com>
      Cc: <peterz@infradead.org>
      Cc: <mingo@kernel.org>
      Cc: <linux-mips@vger.kernel.org>
      Cc: <linux-kernel@vger.kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • perf trace: Support multiple "vfs_getname" probes · 738f9e27
      Arnaldo Carvalho de Melo authored
      [ Upstream commit 6ab3bc24 ]
      
      With a suitably defined "probe:vfs_getname" probe, 'perf trace' can
      "beautify" its output, so syscalls like open() or openat() can print the
      "filename" argument instead of just its hex address, like:
      
        $ perf trace -e open -- touch /dev/null
        [...]
             0.590 ( 0.014 ms): touch/18063 open(filename: /dev/null, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
        [...]
      
      The output without such beautifier looks like:
      
           0.529 ( 0.011 ms): touch/18075 open(filename: 0xc78cf288, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
      
      However, when the vfs_getname probe expands to multiple probes and it is
      not the first one that is hit, the beautifier fails, as follows:
      
           0.326 ( 0.010 ms): touch/18072 open(filename: , flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
      
      Fix it by hooking into all the expanded probes (inlines), now, for instance:
      
        [root@quaco ~]# perf probe -l
          probe:vfs_getname    (on getname_flags:73@fs/namei.c with pathname)
          probe:vfs_getname_1  (on getname_flags:73@fs/namei.c with pathname)
        [root@quaco ~]# perf trace -e open* sleep 1
             0.010 ( 0.005 ms): sleep/5588 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: RDONLY|CLOEXEC)   = 3
             0.029 ( 0.006 ms): sleep/5588 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: RDONLY|CLOEXEC)   = 3
             0.194 ( 0.008 ms): sleep/5588 openat(dfd: CWD, filename: /usr/lib/locale/locale-archive, flags: RDONLY|CLOEXEC) = 3
        [root@quaco ~]#
      
      Works, further verified with:
      
        [root@quaco ~]# perf test vfs
        65: Use vfs_getname probe to get syscall args filenames   : Ok
        66: Add vfs_getname probe to get syscall args filenames   : Ok
        67: Check open filename arg using perf trace + vfs_getname: Ok
        [root@quaco ~]#
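
A minimal sketch of the idea of hooking every expanded probe rather than only the first one. This is not the perf implementation: the predicate, the counting helper, and the plain array of event names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: matches "probe:vfs_getname" and "probe:vfs_getname_<n>". */
static int is_vfs_getname_probe(const char *name)
{
	static const char base[] = "probe:vfs_getname";
	size_t len = sizeof(base) - 1;

	assert(name != NULL);
	if (strncmp(name, base, len) != 0)
		return 0;
	/* Accept the bare name or a "_<suffix>" variant created for an inline. */
	return name[len] == '\0' || name[len] == '_';
}

/* Visit every event and count all matches instead of stopping at the first. */
static size_t count_vfs_getname_probes(const char *const *names, size_t n)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		hits += is_vfs_getname_probe(names[i]) ? 1 : 0;
	return hits;
}
```

With the two probes listed by `perf probe -l` above, both names would match, so both would get a handler in this sketch.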
      Reported-by: Michael Petlan <mpetlan@redhat.com>
      Tested-by: Michael Petlan <mpetlan@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-mv8kolk17xla1smvmp3qabv1@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      738f9e27
    • Jiri Olsa's avatar
      perf symbols: Filter out hidden symbols from labels · 47e3f3c0
      Jiri Olsa authored
      [ Upstream commit 59a17706 ]
      
      When perf is built with the annobin plugin (RHEL8 build) extra symbols
      are added to its binary:
      
        # nm perf | grep annobin | head -10
        0000000000241100 t .annobin_annotate.c
        0000000000326490 t .annobin_annotate.c
        0000000000249255 t .annobin_annotate.c_end
        00000000003283a8 t .annobin_annotate.c_end
        00000000001bce18 t .annobin_annotate.c_end.hot
        00000000001bce18 t .annobin_annotate.c_end.hot
        00000000001bc3e2 t .annobin_annotate.c_end.unlikely
        00000000001bc400 t .annobin_annotate.c_end.unlikely
        00000000001bce18 t .annobin_annotate.c.hot
        00000000001bce18 t .annobin_annotate.c.hot
        ...
      
      Those symbols have no use for report or annotation and should be
      skipped.  Moreover they interfere with the DWARF unwind test on the PPC
      arch, where they are mixed with checked symbols and then the test fails:
      
        # perf test dwarf -v
        59: Test dwarf unwind                                     :
        --- start ---
        test child forked, pid 8515
        unwind: .annobin_dwarf_unwind.c:ip = 0x10dba40dc (0x2740dc)
        ...
        got: .annobin_dwarf_unwind.c 0x10dba40dc, expecting test__arch_unwind_sample
        unwind: failed with 'no error'
      
      The annobin symbols are defined as NOTYPE/LOCAL/HIDDEN:
      
        # readelf -s ./perf | grep annobin | head -1
          40: 00000000001bce4f     0 NOTYPE  LOCAL  HIDDEN    13 .annobin_init.c
      
      They can still pass the check for the label symbol. Add a check for
      HIDDEN and INTERNAL visibility (as suggested by Nick below) and filter
      out such symbols.
      
      >   Just to be awkward, if you are going to ignore STV_HIDDEN
      >   symbols then you should probably also ignore STV_INTERNAL ones
      >   as well...  Annobin does not generate them, but you never know,
      >   one day some other tool might create some.
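
The visibility filter can be sketched as below. The visibility codes follow the ELF gABI (visibility is the low two bits of `st_other`), but they are redefined locally under made-up `SK_` names so the example stands alone; `sk_keep_label()` is a hypothetical helper, not perf's function.

```c
#include <assert.h>
#include <stdint.h>

/* Visibility codes from the ELF gABI, redefined locally for this sketch. */
enum {
	SK_STV_DEFAULT   = 0,
	SK_STV_INTERNAL  = 1,
	SK_STV_HIDDEN    = 2,
	SK_STV_PROTECTED = 3
};

/* Visibility lives in the low two bits of a symbol's st_other field. */
static unsigned int sk_visibility(uint8_t st_other)
{
	return st_other & 0x3u;
}

/* Hypothetical filter: keep a label candidate only if it is neither HIDDEN nor INTERNAL. */
static int sk_keep_label(uint8_t st_other)
{
	unsigned int vis = sk_visibility(st_other);

	assert(vis <= SK_STV_PROTECTED);
	return vis != SK_STV_HIDDEN && vis != SK_STV_INTERNAL;
}
```

A NOTYPE/LOCAL/HIDDEN annobin symbol, as in the readelf output above, would be rejected by this check.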
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nick Clifton <nickc@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20190128133526.GD15461@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      47e3f3c0
    • Julian Wiedmann's avatar
      s390/qeth: cancel close_dev work before removing a card · 825e58bc
      Julian Wiedmann authored
      [ Upstream commit c2780c1a ]
      
      A card's close_dev work is scheduled on a driver-wide workqueue. If the
      card is removed and freed while the work is still active, this causes a
      use-after-free.
      So make sure that the work is completed before freeing the card.
      
      Fixes: 0f54761d ("qeth: Support VEPA mode")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      825e58bc
    • Julian Wiedmann's avatar
      s390/qeth: fix use-after-free in error path · 5327c553
      Julian Wiedmann authored
      [ Upstream commit afa0c590 ]
      
      The error path in qeth_alloc_qdio_buffers() that takes care of
      cleaning up the Output Queues is buggy. It first frees the queue, but
      then calls qeth_clear_outq_buffers() with that very queue struct.
      
      Make the call to qeth_clear_outq_buffers() part of the free action
      (in the correct order), and while at it fix the naming of the helper.
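
The ordering fix can be illustrated with a small user-space sketch: the buffers are cleared while the queue is still allocated, and the queue itself is freed last. The struct layout, buffer count, and function names are assumptions and do not mirror the qeth data structures.

```c
#include <assert.h>
#include <stdlib.h>

#define SK_NBUFS 4  /* hypothetical buffer count */

struct sk_outq {
	void *bufs[SK_NBUFS];
};

/* Release per-buffer state; this needs a queue that is still allocated. */
static int sk_clear_outq_buffers(struct sk_outq *q)
{
	int released = 0;

	assert(q != NULL);
	for (int i = 0; i < SK_NBUFS; i++) {
		if (q->bufs[i]) {
			free(q->bufs[i]);
			q->bufs[i] = NULL;
			released++;
		}
	}
	return released;
}

/* Free helper: clear first, free the queue last. Returns the number of buffers released. */
static int sk_free_outq(struct sk_outq *q)
{
	int released;

	if (!q)
		return 0;
	released = sk_clear_outq_buffers(q);
	free(q);
	return released;
}
```

Folding the clear step into the free helper means no caller can get the two steps in the wrong order.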
      
      Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5327c553
    • Julian Wiedmann's avatar
      s390/qeth: release cmd buffer in error paths · 575a2461
      Julian Wiedmann authored
      [ Upstream commit 5065b2dd ]
      
      Whenever we fail before/while starting an IO, make sure to release the
      IO buffer. Usually qeth_irq() would do this for us, but if the IO
      doesn't even start we obviously won't get an interrupt for it either.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      575a2461
    • Martynas Pumputis's avatar
      netfilter: nf_nat: skip nat clash resolution for same-origin entries · 5058447b
      Martynas Pumputis authored
      [ Upstream commit 4e35c1cb ]
      
      It is possible that two concurrent packets originating from the same
      socket of a connection-less protocol (e.g. UDP) can end up having
      different IP_CT_DIR_REPLY tuples which results in one of the packets
      being dropped.
      
      To illustrate this, consider the following simplified scenario:
      
      1. Packets A and B are sent at the same time from two different threads
         by the same UDP socket.  No matching conntrack entry exists yet.
         Both packets cause allocation of a new conntrack entry.
      2. get_unique_tuple gets called for A.  No clashing entry found.
         conntrack entry for A is added to main conntrack table.
      3. get_unique_tuple is called for B and will find that the reply
         tuple of B is already taken by A.
         It will allocate a new UDP source port for B to resolve the clash.
      4. conntrack entry for B cannot be added to main conntrack table
         because its ORIGINAL direction is clashing with A and the REPLY
         directions of A and B are not the same anymore due to UDP source
         port reallocation done in step 3.
      
      This patch modifies nf_conntrack_tuple_taken so it doesn't consider
      colliding reply tuples if the IP_CT_DIR_ORIGINAL tuples are equal.
      
      [ Florian: simplify patch to not use .allow_clash setting
        and always ignore identical flows ]
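
A simplified sketch of the rule described above: a reply tuple that collides with an existing entry is not reported as taken when both entries share the same ORIGINAL tuple. The structs, the flat array standing in for the conntrack table, and the `sk_` names are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sk_tuple {
	uint32_t src, dst;
	uint16_t sport, dport;
};

struct sk_conn {
	struct sk_tuple orig;   /* ORIGINAL direction */
	struct sk_tuple reply;  /* REPLY direction */
};

static int sk_tuple_equal(const struct sk_tuple *a, const struct sk_tuple *b)
{
	return a->src == b->src && a->dst == b->dst &&
	       a->sport == b->sport && a->dport == b->dport;
}

/*
 * Is `reply` already used by an entry in `table`, from the point of view of
 * the new connection `ct`? Entries whose ORIGINAL tuple equals ct's belong
 * to the same flow and are not reported as a clash.
 */
static int sk_tuple_taken(const struct sk_conn *table, size_t n,
			  const struct sk_tuple *reply,
			  const struct sk_conn *ct)
{
	assert(reply != NULL && ct != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!sk_tuple_equal(&table[i].reply, reply))
			continue;
		if (sk_tuple_equal(&table[i].orig, &ct->orig))
			continue;  /* same origin: skip NAT clash resolution */
		return 1;
	}
	return 0;
}
```

In the scenario above, packet B would no longer be given a new source port in step 3, so its REPLY tuple stays identical to A's.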
      Signed-off-by: Martynas Pumputis <martynas@weave.works>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5058447b
    • Florian Westphal's avatar
      selftests: netfilter: add simple masq/redirect test cases · 5c39e08f
      Florian Westphal authored
      [ Upstream commit 98bfc341 ]
      
      Check basic nat/redirect/masquerade for ipv4 and ipv6.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5c39e08f
    • Naresh Kamboju's avatar
      selftests: netfilter: fix config fragment CONFIG_NF_TABLES_INET · 974ed365
      Naresh Kamboju authored
      [ Upstream commit 952b72f8 ]
      
      In selftests the config fragment for netfilter was added as
      NF_TABLES_INET=y; this patch corrects it to CONFIG_NF_TABLES_INET=y.
      Signed-off-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      974ed365
    • Andy Shevchenko's avatar
      dmaengine: dmatest: Abort test in case of mapping error · 0203f0c9
      Andy Shevchenko authored
      [ Upstream commit 6454368a ]
      
      In case of a mapping error the DMA addresses are invalid, and continuing
      will corrupt system memory or potentially something else.
      
      [  222.480310] dmatest: dma0chan7-copy0: summary 1 tests, 3 failures 6 iops 349 KB/s (0)
      ...
      [  240.912725] check: Corrupted low memory at 00000000c7c75ac9 (2940 phys) = 5656000000000000
      [  240.921998] check: Corrupted low memory at 000000005715a1cd (2948 phys) = 279f2aca5595ab2b
      [  240.931280] check: Corrupted low memory at 000000002f4024c0 (2950 phys) = 5e5624f349e793cf
      ...
      
      Abort any test if mapping failed.
      
      Fixes: 4076e755 ("dmatest: convert to dmaengine_unmap_data")
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0203f0c9
    • Stefano Garzarella's avatar
      vsock/virtio: reset connected sockets on device removal · 5eae5899
      Stefano Garzarella authored
      [ Upstream commit 85965487 ]
      
      When the virtio transport device disappears, we should reset all
      connected sockets in order to inform the users.
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5eae5899
    • Stefano Garzarella's avatar
      vsock/virtio: fix kernel panic after device hot-unplug · cd201356
      Stefano Garzarella authored
      [ Upstream commit 22b5c0b6 ]
      
      virtio_vsock_remove() invokes vsock_core_exit() even if there
      are open sockets for the AF_VSOCK protocol family. As a result,
      the vsock "transport" pointer is set to NULL, triggering a
      kernel panic at the first socket activity.
      
      This patch moves the vsock_core_init()/vsock_core_exit() calls into
      the virtio_vsock module_init and module_exit functions respectively;
      module_exit cannot be invoked while there are open sockets.
      
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1609699
      Reported-by: Yan Fu <yafu@redhat.com>
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      cd201356
    • Codrin Ciubotariu's avatar
      dmaengine: at_xdmac: Fix wrongfull report of a channel as in use · f3ffd455
      Codrin Ciubotariu authored
      [ Upstream commit dc3f595b ]
      
      The atchan->status variable is used to store two different kinds of information:
       - channel interrupt status, passed from the interrupt handler to the tasklet;
       - channel information, such as whether it is cyclic or paused.
      
      This causes a bug when device_terminate_all() is called
      (AT_XDMAC_CHAN_IS_CYCLIC cleared on atchan->status) and then a late End
      of Block interrupt arrives (AT_XDMAC_CIS_BIS), which sets bit 0 of
      atchan->status. Bit 0 is also used for AT_XDMAC_CHAN_IS_CYCLIC, so when
      a new descriptor for a cyclic transfer is created, the driver reports
      the channel as in use:
      
      if (test_and_set_bit(AT_XDMAC_CHAN_IS_CYCLIC, &atchan->status)) {
      	dev_err(chan2dev(chan), "channel currently used\n");
      	return NULL;
      }
      
      This patch fixes the bug by adding a different struct member to keep
      the interrupts status separated from the channel status bits.
      
      Fixes: e1f7c9ee ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver")
      Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
      Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f3ffd455
    • Paul Kocialkowski's avatar
      drm/sun4i: tcon: Prepare and enable TCON channel 0 clock at init · 7cf4466d
      Paul Kocialkowski authored
      [ Upstream commit b14e945b ]
      
      When initializing clocks, a reference to the TCON channel 0 clock is
      obtained. However, the clock is never prepared and enabled later.
      Switching from simplefb to DRM actually disables the clock (that was
      usually configured by U-Boot) because of that.
      
      On the V3s, this results in a hang when writing to some mixer registers
      when switching over to DRM from simplefb.
      
      Fix this by preparing and enabling the clock when initializing other
      clocks. Waiting for sun4i_tcon_channel_enable to enable the clock is
      apparently too late and results in the same mixer register access hang.
      Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
      Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190131132550.26355-1-paul.kocialkowski@bootlin.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7cf4466d
    • Martin KaFai Lau's avatar
      bpf: Fix syscall's stackmap lookup potential deadlock · ae26a710
      Martin KaFai Lau authored
      [ Upstream commit 7c4cd051 ]
      
      map_lookup_elem() used to avoid acquiring a spinlock
      in order to optimize the reader.
      
      That was true until commit 557c0c6e ("bpf: convert stackmap to pre-allocation").
      The syscall's map_lookup_elem(stackmap) calls bpf_stackmap_copy().
      bpf_stackmap_copy() may find the elem no longer needed after the copy is done.
      If that is the case, pcpu_freelist_push() saves this elem for reuse later.
      This push requires a spinlock.
      
      If a tracing bpf_prog got run in the middle of the syscall's
      map_lookup_elem(stackmap) and this tracing bpf_prog is calling
      bpf_get_stackid(stackmap) which also requires the same pcpu_freelist's
      spinlock, it may end up in a deadlock, as reported by
      Eric Dumazet in https://patchwork.ozlabs.org/patch/1030266/
      
      The situation is the same as the syscall's map_update_elem() which
      needs to acquire the pcpu_freelist's spinlock and could race
      with tracing bpf_prog.  Hence, this patch fixes it by protecting
      bpf_stackmap_copy() with this_cpu_inc(bpf_prog_active)
      to prevent tracing bpf_prog from running.
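
The guard can be modelled in user space with a plain counter standing in for the per-CPU bpf_prog_active variable. This is a single-threaded sketch under that assumption; the `sk_` names and the flag that models the freelist spinlock are invented for illustration.

```c
#include <assert.h>

static int sk_prog_active;  /* stand-in for the per-CPU bpf_prog_active counter */
static int sk_lock_held;    /* models the freelist spinlock being held */

/*
 * Tracing program entry: runs only if nothing else on this "CPU" is active.
 * Returns 1 if the program body ran, 0 if it was skipped.
 */
static int sk_run_tracing_prog(void)
{
	int ran = 0;

	if (++sk_prog_active == 1) {
		/* Taking the lock here while it is held would self-deadlock. */
		assert(!sk_lock_held);
		ran = 1;
	}
	sk_prog_active--;
	return ran;
}

/*
 * Syscall-side lookup: mark the CPU busy around the section that takes the
 * lock. Returns whether a tracing program managed to run mid-lookup.
 */
static int sk_syscall_lookup(void)
{
	int traced;

	sk_prog_active++;
	sk_lock_held = 1;
	traced = sk_run_tracing_prog();  /* simulate a tracepoint firing here */
	sk_lock_held = 0;
	sk_prog_active--;
	return traced;
}
```

In this model the tracing program is skipped while the lookup holds the lock and runs normally once the lookup has finished.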
      
      A later syscall's map_lookup_elem commit f1a2e44a ("bpf: add queue and stack maps")
      also acquires a spinlock and races with tracing bpf_prog similarly.
      Hence, this patch is forward looking and protects the majority
      of the map lookups.  bpf_map_offload_lookup_elem() is the exception
      since it is for network bpf_prog only (i.e. never called by tracing
      bpf_prog).
      
      Fixes: 557c0c6e ("bpf: convert stackmap to pre-allocation")
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ae26a710
    • Alexei Starovoitov's avatar
      bpf: fix potential deadlock in bpf_prog_register · 3bbe6a42
      Alexei Starovoitov authored
      [ Upstream commit e16ec340 ]
      
      Lockdep found a potential deadlock between cpu_hotplug_lock, bpf_event_mutex, and cpuctx_mutex:
      [   13.007000] WARNING: possible circular locking dependency detected
      [   13.007587] 5.0.0-rc3-00018-g2fa53f89-dirty #477 Not tainted
      [   13.008124] ------------------------------------------------------
      [   13.008624] test_progs/246 is trying to acquire lock:
      [   13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300
      [   13.009770]
      [   13.009770] but task is already holding lock:
      [   13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
      [   13.010877]
      [   13.010877] which lock already depends on the new lock.
      [   13.010877]
      [   13.011532]
      [   13.011532] the existing dependency chain (in reverse order) is:
      [   13.012129]
      [   13.012129] -> #4 (bpf_event_mutex){+.+.}:
      [   13.012582]        perf_event_query_prog_array+0x9b/0x130
      [   13.013016]        _perf_ioctl+0x3aa/0x830
      [   13.013354]        perf_ioctl+0x2e/0x50
      [   13.013668]        do_vfs_ioctl+0x8f/0x6a0
      [   13.014003]        ksys_ioctl+0x70/0x80
      [   13.014320]        __x64_sys_ioctl+0x16/0x20
      [   13.014668]        do_syscall_64+0x4a/0x180
      [   13.015007]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.015469]
      [   13.015469] -> #3 (&cpuctx_mutex){+.+.}:
      [   13.015910]        perf_event_init_cpu+0x5a/0x90
      [   13.016291]        perf_event_init+0x1b2/0x1de
      [   13.016654]        start_kernel+0x2b8/0x42a
      [   13.016995]        secondary_startup_64+0xa4/0xb0
      [   13.017382]
      [   13.017382] -> #2 (pmus_lock){+.+.}:
      [   13.017794]        perf_event_init_cpu+0x21/0x90
      [   13.018172]        cpuhp_invoke_callback+0xb3/0x960
      [   13.018573]        _cpu_up+0xa7/0x140
      [   13.018871]        do_cpu_up+0xa4/0xc0
      [   13.019178]        smp_init+0xcd/0xd2
      [   13.019483]        kernel_init_freeable+0x123/0x24f
      [   13.019878]        kernel_init+0xa/0x110
      [   13.020201]        ret_from_fork+0x24/0x30
      [   13.020541]
      [   13.020541] -> #1 (cpu_hotplug_lock.rw_sem){++++}:
      [   13.021051]        static_key_slow_inc+0xe/0x20
      [   13.021424]        tracepoint_probe_register_prio+0x28c/0x300
      [   13.021891]        perf_trace_event_init+0x11f/0x250
      [   13.022297]        perf_trace_init+0x6b/0xa0
      [   13.022644]        perf_tp_event_init+0x25/0x40
      [   13.023011]        perf_try_init_event+0x6b/0x90
      [   13.023386]        perf_event_alloc+0x9a8/0xc40
      [   13.023754]        __do_sys_perf_event_open+0x1dd/0xd30
      [   13.024173]        do_syscall_64+0x4a/0x180
      [   13.024519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.024968]
      [   13.024968] -> #0 (tracepoints_mutex){+.+.}:
      [   13.025434]        __mutex_lock+0x86/0x970
      [   13.025764]        tracepoint_probe_register_prio+0x2d/0x300
      [   13.026215]        bpf_probe_register+0x40/0x60
      [   13.026584]        bpf_raw_tracepoint_open.isra.34+0xa4/0x130
      [   13.027042]        __do_sys_bpf+0x94f/0x1a90
      [   13.027389]        do_syscall_64+0x4a/0x180
      [   13.027727]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.028171]
      [   13.028171] other info that might help us debug this:
      [   13.028171]
      [   13.028807] Chain exists of:
      [   13.028807]   tracepoints_mutex --> &cpuctx_mutex --> bpf_event_mutex
      [   13.028807]
      [   13.029666]  Possible unsafe locking scenario:
      [   13.029666]
      [   13.030140]        CPU0                    CPU1
      [   13.030510]        ----                    ----
      [   13.030875]   lock(bpf_event_mutex);
      [   13.031166]                                lock(&cpuctx_mutex);
      [   13.031645]                                lock(bpf_event_mutex);
      [   13.032135]   lock(tracepoints_mutex);
      [   13.032441]
      [   13.032441]  *** DEADLOCK ***
      [   13.032441]
      [   13.032911] 1 lock held by test_progs/246:
      [   13.033239]  #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
      [   13.033909]
      [   13.033909] stack backtrace:
      [   13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f89-dirty #477
      [   13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      [   13.035657] Call Trace:
      [   13.035859]  dump_stack+0x5f/0x8b
      [   13.036130]  print_circular_bug.isra.37+0x1ce/0x1db
      [   13.036526]  __lock_acquire+0x1158/0x1350
      [   13.036852]  ? lock_acquire+0x98/0x190
      [   13.037154]  lock_acquire+0x98/0x190
      [   13.037447]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.037876]  __mutex_lock+0x86/0x970
      [   13.038167]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.038600]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.039028]  ? __mutex_lock+0x86/0x970
      [   13.039337]  ? __mutex_lock+0x24a/0x970
      [   13.039649]  ? bpf_probe_register+0x1d/0x60
      [   13.039992]  ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10
      [   13.040478]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.040906]  tracepoint_probe_register_prio+0x2d/0x300
      [   13.041325]  bpf_probe_register+0x40/0x60
      [   13.041649]  bpf_raw_tracepoint_open.isra.34+0xa4/0x130
      [   13.042068]  ? __might_fault+0x3e/0x90
      [   13.042374]  __do_sys_bpf+0x94f/0x1a90
      [   13.042678]  do_syscall_64+0x4a/0x180
      [   13.042975]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.043382] RIP: 0033:0x7f23b10a07f9
      [   13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
      [   13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9
      [   13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011
      [   13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10
      [   13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
      [   13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000
      
      Since tracepoints_mutex will be taken in tracepoint_probe_register/unregister()
      there is no need to take bpf_event_mutex too.
      bpf_event_mutex is protecting modifications to prog array used in kprobe/perf bpf progs.
      bpf_raw_tracepoints don't need to take this mutex.
      
      Fixes: c4f6699d ("bpf: introduce BPF_RAW_TRACEPOINT")
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3bbe6a42