1. 09 Jan, 2019 40 commits
    • Josef Bacik's avatar
      btrfs: run delayed items before dropping the snapshot · 6911b074
      Josef Bacik authored
      commit 0568e82d upstream.
      
      With my delayed refs patches in place we started seeing a large amount
      of aborts in __btrfs_free_extent:
      
       BTRFS error (device sdb1): unable to find ref byte nr 91947008 parent 0 root 35964  owner 1 offset 0
       Call Trace:
        ? btrfs_merge_delayed_refs+0xaf/0x340
        __btrfs_run_delayed_refs+0x6ea/0xfc0
        ? btrfs_set_path_blocking+0x31/0x60
        btrfs_run_delayed_refs+0xeb/0x180
        btrfs_commit_transaction+0x179/0x7f0
        ? btrfs_check_space_for_delayed_refs+0x30/0x50
        ? should_end_transaction.isra.19+0xe/0x40
        btrfs_drop_snapshot+0x41c/0x7c0
        btrfs_clean_one_deleted_snapshot+0xb5/0xd0
        cleaner_kthread+0xf6/0x120
        kthread+0xf8/0x130
        ? btree_invalidatepage+0x90/0x90
        ? kthread_bind+0x10/0x10
        ret_from_fork+0x35/0x40
      
      This was because btrfs_drop_snapshot depends on the root not being
      modified while it's dropping the snapshot.  It will unlock the root node
      (and really every node) as it walks down the tree, only to re-lock it
      when it needs to do something.  This is a problem because if we modify
      the tree we could cow a block in our path, which frees our reference to
      that block.  Then once we get back to that shared block we'll free our
      reference to it again, and get ENOENT when trying to lookup our extent
      reference to that block in __btrfs_free_extent.
      
      This is ultimately happening because we have delayed items left to be
      processed for our deleted snapshot _after_ all of the inodes are closed
      for the snapshot.  We only run the delayed inode item if we're deleting
      the inode, and even then we do not run the delayed insertions or delayed
      removals.  These can be run at any point after our final inode does its
      last iput, which is what triggers the snapshot deletion.  We can end up
      with the snapshot deletion happening and then have the delayed items run
      on that file system, resulting in the above problem.
      
      This problem has existed forever, however my patches made it much easier
      to hit as I wake up the cleaner much more often to deal with delayed
      iputs, which made us more likely to start the snapshot dropping work
      before the transaction commits, which is when the delayed items would
      generally be run.  Before, generally speaking, we would run the delayed
      items, commit the transaction, and wakeup the cleaner thread to start
      deleting snapshots, which means we were less likely to hit this problem.
      You could still hit it if you had multiple snapshots to be deleted and
      ended up with lots of delayed items, but it was definitely harder.
      
      Fix for now by simply running all the delayed items before starting to
      drop the snapshot.  We could make this smarter in the future by making
      the delayed items per-root, and then simply drop any delayed items for
      roots that we are going to delete.  But for now just a quick and easy
      solution is the safest.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6911b074
    • Filipe Manana's avatar
      Btrfs: fix fsync of files with multiple hard links in new directories · 10b04210
      Filipe Manana authored
      commit 41bd6067 upstream.
      
      The log tree has a long standing problem that when a file is fsync'ed we
      only check for new ancestors, created in the current transaction, by
      following only the hard link for which the fsync was issued. We follow the
      ancestors using the VFS' dget_parent() API. This means that if we create a
      new link for a file in a directory that is new (or in an any other new
      ancestor directory) and then fsync the file using an old hard link, we end
      up not logging the new ancestor, and on log replay that new hard link and
      ancestor do not exist. In some cases, involving renames, the file will not
      exist at all.
      
      Example:
      
        mkfs.btrfs -f /dev/sdb
        mount /dev/sdb /mnt
      
        mkdir /mnt/A
        touch /mnt/foo
        ln /mnt/foo /mnt/A/bar
        xfs_io -c fsync /mnt/foo
      
        <power failure>
      
      In this example after log replay only the hard link named 'foo' exists
      and directory A does not exist, which is unexpected. In other major linux
      filesystems, such as ext4, xfs and f2fs for example, both hard links exist
      and so does directory A after mounting again the filesystem.
      
      Checking if any new ancestors are new and need to be logged was added in
      2009 by commit 12fcfd22 ("Btrfs: tree logging unlink/rename fixes"),
      however only for the ancestors of the hard link (dentry) for which the
      fsync was issued, instead of checking for all ancestors for all of the
      inode's hard links.
      
      So fix this by tracking the id of the last transaction where a hard link
      was created for an inode and then on fsync fallback to a full transaction
      commit when an inode has more than one hard link and at least one new hard
      link was created in the current transaction. This is the simplest solution
      since this is not a common use case (adding frequently hard links for
      which there's an ancestor created in the current transaction and then
      fsync the file). In case it ever becomes a common use case, a solution
      that consists of iterating the fs/subvol btree for each hard link and
      check if any ancestor is new, could be implemented.
      
      This solves many unexpected scenarios reported by Jayashree Mohan and
      Vijay Chidambaram, and for which there is a new test case for fstests
      under review.
      
      Fixes: 12fcfd22 ("Btrfs: tree logging unlink/rename fixes")
      CC: stable@vger.kernel.org # 4.4+
      Reported-by: default avatarVijay Chidambaram <vvijay03@gmail.com>
      Reported-by: default avatarJayashree Mohan <jayashree2912@gmail.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      10b04210
    • Lu Fengqi's avatar
      btrfs: skip file_extent generation check for free_space_inode in run_delalloc_nocow · 7708a830
      Lu Fengqi authored
      commit 27a7ff55 upstream.
      
      The test case btrfs/001 with inode_cache mount option will encounter the
      following warning:
      
        WARNING: CPU: 1 PID: 23700 at fs/btrfs/inode.c:956 cow_file_range.isra.19+0x32b/0x430 [btrfs]
        CPU: 1 PID: 23700 Comm: btrfs Kdump: loaded Tainted: G        W  O      4.20.0-rc4-custom+ #30
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:cow_file_range.isra.19+0x32b/0x430 [btrfs]
        Call Trace:
         ? free_extent_buffer+0x46/0x90 [btrfs]
         run_delalloc_nocow+0x455/0x900 [btrfs]
         btrfs_run_delalloc_range+0x1a7/0x360 [btrfs]
         writepage_delalloc+0xf9/0x150 [btrfs]
         __extent_writepage+0x125/0x3e0 [btrfs]
         extent_write_cache_pages+0x1b6/0x3e0 [btrfs]
         ? __wake_up_common_lock+0x63/0xc0
         extent_writepages+0x50/0x80 [btrfs]
         do_writepages+0x41/0xd0
         ? __filemap_fdatawrite_range+0x9e/0xf0
         __filemap_fdatawrite_range+0xbe/0xf0
         btrfs_fdatawrite_range+0x1b/0x50 [btrfs]
         __btrfs_write_out_cache+0x42c/0x480 [btrfs]
         btrfs_write_out_ino_cache+0x84/0xd0 [btrfs]
         btrfs_save_ino_cache+0x551/0x660 [btrfs]
         commit_fs_roots+0xc5/0x190 [btrfs]
         btrfs_commit_transaction+0x2bf/0x8d0 [btrfs]
         btrfs_mksubvol+0x48d/0x4d0 [btrfs]
         btrfs_ioctl_snap_create_transid+0x170/0x180 [btrfs]
         btrfs_ioctl_snap_create_v2+0x124/0x180 [btrfs]
         btrfs_ioctl+0x123f/0x3030 [btrfs]
      
      The file extent generation of the free space inode is equal to the last
      snapshot of the file root, so the inode will be passed to cow_file_rage.
      But the inode was created and its extents were preallocated in
      btrfs_save_ino_cache, there are no cow copies on disk.
      
      The preallocated extent is not yet in the extent tree, and
      btrfs_cross_ref_exist will ignore the -ENOENT returned by
      check_committed_ref, so we can directly write the inode to the disk.
      
      Fixes: 78d4295b ("btrfs: lift some btrfs_cross_ref_exist checks in nocow path")
      CC: stable@vger.kernel.org # 4.18+
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarLu Fengqi <lufq.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7708a830
    • Anand Jain's avatar
      btrfs: dev-replace: go back to suspend state if another EXCL_OP is running · c1f90eb0
      Anand Jain authored
      commit 05c49e6b upstream.
      
      In a secnario where balance and replace co-exists as below,
      
        - start balance
        - pause balance
        - start replace
        - reboot
      
      and when system restarts, balance resumes first. Then the replace is
      attempted to restart but will fail as the EXCL_OP lock is already held
      by the balance. If so place the replace state back to
      BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED state.
      
      Fixes: 010a47bd ("btrfs: add proper safety check before resuming dev-replace")
      CC: stable@vger.kernel.org # 4.18+
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c1f90eb0
    • Anand Jain's avatar
      btrfs: dev-replace: go back to suspended state if target device is missing · 28867a52
      Anand Jain authored
      commit 0d228ece upstream.
      
      At the time of forced unmount we place the running replace to
      BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED state, so when the system comes
      back and expect the target device is missing.
      
      Then let the replace state continue to be in
      BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED state instead of
      BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED as there isn't any matching scrub
      running as part of replace.
      
      Fixes: e93c89c1 ("Btrfs: add new sources for device replace code")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      28867a52
    • Macpaul Lin's avatar
      cdc-acm: fix abnormal DATA RX issue for Mediatek Preloader. · 326ca6bd
      Macpaul Lin authored
      commit eafb27fa upstream.
      
      Mediatek Preloader is a proprietary embedded boot loader for loading
      Little Kernel and Linux into device DRAM.
      
      This boot loader also handle firmware update. Mediatek Preloader will be
      enumerated as a virtual COM port when the device is connected to Windows
      or Linux OS via CDC-ACM class driver. When the USB enumeration has been
      done, Mediatek Preloader will send out handshake command "READY" to PC
      actively instead of waiting command from the download tool.
      
      Since Linux 4.12, the commit "tty: reset termios state on device
      registration" (93857edd) causes Mediatek
      Preloader receiving some abnoraml command like "READYXX" as it sent.
      This will be recognized as an incorrect response. The behavior change
      also causes the download handshake fail. This change only affects
      subsequent connects if the reconnected device happens to get the same minor
      number.
      
      By disabling the ECHO termios flag could avoid this problem. However, it
      cannot be done by user space configuration when download tool open
      /dev/ttyACM0. This is because the device running Mediatek Preloader will
      send handshake command "READY" immediately once the CDC-ACM driver is
      ready.
      
      This patch wants to fix above problem by introducing "DISABLE_ECHO"
      property in driver_info. When Mediatek Preloader is connected, the
      CDC-ACM driver could disable ECHO flag in termios to avoid the problem.
      Signed-off-by: default avatarMacpaul Lin <macpaul.lin@mediatek.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarJohan Hovold <johan@kernel.org>
      Acked-by: default avatarOliver Neukum <oneukum@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      326ca6bd
    • Tejun Heo's avatar
      cgroup: fix CSS_TASK_ITER_PROCS · 8a2fbdd5
      Tejun Heo authored
      commit e9d81a1b upstream.
      
      CSS_TASK_ITER_PROCS implements process-only iteration by making
      css_task_iter_advance() skip tasks which aren't threadgroup leaders;
      however, when an iteration is started css_task_iter_start() calls the
      inner helper function css_task_iter_advance_css_set() instead of
      css_task_iter_advance().  As the helper doesn't have the skip logic,
      when the first task to visit is a non-leader thread, it doesn't get
      skipped correctly as shown in the following example.
      
        # ps -L 2030
          PID   LWP TTY      STAT   TIME COMMAND
         2030  2030 pts/0    Sl+    0:00 ./test-thread
         2030  2031 pts/0    Sl+    0:00 ./test-thread
        # mkdir -p /sys/fs/cgroup/x/a/b
        # echo threaded > /sys/fs/cgroup/x/a/cgroup.type
        # echo threaded > /sys/fs/cgroup/x/a/b/cgroup.type
        # echo 2030 > /sys/fs/cgroup/x/a/cgroup.procs
        # cat /sys/fs/cgroup/x/a/cgroup.threads
        2030
        2031
        # cat /sys/fs/cgroup/x/cgroup.procs
        2030
        # echo 2030 > /sys/fs/cgroup/x/a/b/cgroup.threads
        # cat /sys/fs/cgroup/x/cgroup.procs
        2031
        2030
      
      The last read of cgroup.procs is incorrectly showing non-leader 2031
      in cgroup.procs output.
      
      This can be fixed by updating css_task_iter_advance() to handle the
      first advance and css_task_iters_tart() to call
      css_task_iter_advance() instead of the inner helper.  After the fix,
      the same commands result in the following (correct) result:
      
        # ps -L 2062
          PID   LWP TTY      STAT   TIME COMMAND
         2062  2062 pts/0    Sl+    0:00 ./test-thread
         2062  2063 pts/0    Sl+    0:00 ./test-thread
        # mkdir -p /sys/fs/cgroup/x/a/b
        # echo threaded > /sys/fs/cgroup/x/a/cgroup.type
        # echo threaded > /sys/fs/cgroup/x/a/b/cgroup.type
        # echo 2062 > /sys/fs/cgroup/x/a/cgroup.procs
        # cat /sys/fs/cgroup/x/a/cgroup.threads
        2062
        2063
        # cat /sys/fs/cgroup/x/cgroup.procs
        2062
        # echo 2062 > /sys/fs/cgroup/x/a/b/cgroup.threads
        # cat /sys/fs/cgroup/x/cgroup.procs
        2062
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatar"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
      Fixes: 8cfd8147 ("cgroup: implement cgroup v2 thread support")
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a2fbdd5
    • Dmitry Eremin-Solenikov's avatar
      crypto: cfb - fix decryption · 99dcd45f
      Dmitry Eremin-Solenikov authored
      commit fa460073 upstream.
      
      crypto_cfb_decrypt_segment() incorrectly XOR'ed generated keystream with
      IV, rather than with data stream, resulting in incorrect decryption.
      Test vectors will be added in the next patch.
      Signed-off-by: default avatarDmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99dcd45f
    • Dmitry Eremin-Solenikov's avatar
      crypto: testmgr - add AES-CFB tests · d8e4b24f
      Dmitry Eremin-Solenikov authored
      commit 7da66670 upstream.
      
      Add AES128/192/256-CFB testvectors from NIST SP800-38A.
      Signed-off-by: default avatarDmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d8e4b24f
    • Atul Gupta's avatar
      crypto: chcr - small packet Tx stalls the queue · cc43a8af
      Atul Gupta authored
      commit c35828ea upstream.
      
      Immediate packets sent to hardware should include the work
      request length in calculating the flits. WR occupy one flit and
      if not accounted result in invalid request which stalls the HW
      queue.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAtul Gupta <atul.gupta@chelsio.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc43a8af
    • Wenwen Wang's avatar
      crypto: cavium/nitrox - fix a DMA pool free failure · 0fa6bead
      Wenwen Wang authored
      commit 7172122b upstream.
      
      In crypto_alloc_context(), a DMA pool is allocated through dma_pool_alloc()
      to hold the crypto context. The meta data of the DMA pool, including the
      pool used for the allocation 'ndev->ctx_pool' and the base address of the
      DMA pool used by the device 'dma', are then stored to the beginning of the
      pool. These meta data are eventually used in crypto_free_context() to free
      the DMA pool through dma_pool_free(). However, given that the DMA pool can
      also be accessed by the device, a malicious device can modify these meta
      data, especially when the device is controlled to deploy an attack. This
      can cause an unexpected DMA pool free failure.
      
      To avoid the above issue, this patch introduces a new structure
      crypto_ctx_hdr and a new field chdr in the structure nitrox_crypto_ctx hold
      the meta data information of the DMA pool after the allocation. Note that
      the original structure ctx_hdr is not changed to ensure the compatibility.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarWenwen Wang <wang6495@umn.edu>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0fa6bead
    • Jernej Skrabec's avatar
      clk: sunxi-ng: Use u64 for calculation of NM rate · d095e1ba
      Jernej Skrabec authored
      commit 65b66576 upstream.
      
      Allwinner H6 SoC has multiplier N range between 1 and 254. Since parent
      rate is 24MHz, intermediate result when calculating final rate easily
      overflows 32 bit variable.
      
      Because of that, introduce function for calculating clock rate which
      uses 64 bit variable for intermediate result.
      
      Fixes: 6174a1e2 ("clk: sunxi-ng: Add N-M-factor clock support")
      Fixes: ee28648c ("clk: sunxi-ng: Remove the use of rational computations")
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: default avatarJernej Skrabec <jernej.skrabec@siol.net>
      Signed-off-by: default avatarMaxime Ripard <maxime.ripard@bootlin.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d095e1ba
    • Johan Jonker's avatar
      clk: rockchip: fix typo in rk3188 spdif_frac parent · 36ef9d14
      Johan Jonker authored
      commit 8b19faf6 upstream.
      
      Fix typo in common_clk_branches.
      Make spdif_pre parent of spdif_frac.
      
      Fixes: 66746420 ("clk: rockchip: include downstream muxes into fractional dividers")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJohan Jonker <jbx9999@hotmail.com>
      Acked-by: default avatarElaine Zhang <zhangqing@rock-chips.com>
      Signed-off-by: default avatarHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      36ef9d14
    • Lukas Wunner's avatar
      spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode · 9e9c6698
      Lukas Wunner authored
      commit 56c17234 upstream.
      
      The IRQ handler bcm2835_spi_interrupt() first reads as much as possible
      from the RX FIFO, then writes as much as possible to the TX FIFO.
      Afterwards it decides whether the transfer is finished by checking if
      the TX FIFO is empty.
      
      If very few bytes were written to the TX FIFO, they may already have
      been transmitted by the time the FIFO's emptiness is checked.  As a
      result, the transfer will be declared finished and the chip will be
      reset without reading the corresponding received bytes from the RX FIFO.
      
      The odds of this happening increase with a high clock frequency (such
      that the TX FIFO drains quickly) and either passing "threadirqs" on the
      command line or enabling CONFIG_PREEMPT_RT_BASE (such that the IRQ
      handler may be preempted between filling the TX FIFO and checking its
      emptiness).
      
      Fix by instead checking whether rx_len has reached zero, which means
      that the transfer has been received in full.  This is also more
      efficient as it avoids one bus read access per interrupt.  Note that
      bcm2835_spi_transfer_one_poll() likewise uses rx_len to determine
      whether the transfer has finished.
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Fixes: e34ff011 ("spi: bcm2835: move to the transfer_one driver model")
      Cc: stable@vger.kernel.org # v4.1+
      Cc: Mathias Duckeck <m.duckeck@kunbus.de>
      Cc: Frank Pavlic <f.pavlic@kunbus.de>
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Cc: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e9c6698
    • Lukas Wunner's avatar
      spi: bcm2835: Fix book-keeping of DMA termination · cc8b83ff
      Lukas Wunner authored
      commit dbc94411 upstream.
      
      If submission of a DMA TX transfer succeeds but submission of the
      corresponding RX transfer does not, the BCM2835 SPI driver terminates
      the TX transfer but neglects to reset the dma_pending flag to false.
      
      Thus, if the next transfer uses interrupt mode (because it is shorter
      than BCM2835_SPI_DMA_MIN_LENGTH) and runs into a timeout,
      dmaengine_terminate_all() will be called both for TX (once more) and
      for RX (which was never started in the first place).  Fix it.
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Fixes: 3ecd37ed ("spi: bcm2835: enable dma modes for transfers meeting certain conditions")
      Cc: stable@vger.kernel.org # v4.2+
      Cc: Mathias Duckeck <m.duckeck@kunbus.de>
      Cc: Frank Pavlic <f.pavlic@kunbus.de>
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Cc: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc8b83ff
    • Lukas Wunner's avatar
      spi: bcm2835: Fix race on DMA termination · 63f97d30
      Lukas Wunner authored
      commit e82b0b38 upstream.
      
      If a DMA transfer finishes orderly right when spi_transfer_one_message()
      determines that it has timed out, the callbacks bcm2835_spi_dma_done()
      and bcm2835_spi_handle_err() race to call dmaengine_terminate_all(),
      potentially leading to double termination.
      
      Prevent by atomically changing the dma_pending flag before calling
      dmaengine_terminate_all().
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Fixes: 3ecd37ed ("spi: bcm2835: enable dma modes for transfers meeting certain conditions")
      Cc: stable@vger.kernel.org # v4.2+
      Cc: Mathias Duckeck <m.duckeck@kunbus.de>
      Cc: Frank Pavlic <f.pavlic@kunbus.de>
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Cc: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      63f97d30
    • Theodore Ts'o's avatar
      ext4: check for shutdown and r/o file system in ext4_write_inode() · 0cb4f655
      Theodore Ts'o authored
      commit 18f2c4fc upstream.
      
      If the file system has been shut down or is read-only, then
      ext4_write_inode() needs to bail out early.
      
      Also use jbd2_complete_transaction() instead of ext4_force_commit() so
      we only force a commit if it is needed.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0cb4f655
    • Theodore Ts'o's avatar
      ext4: force inode writes when nfsd calls commit_metadata() · bf2fd1f9
      Theodore Ts'o authored
      commit fde87268 upstream.
      
      Some time back, nfsd switched from calling vfs_fsync() to using a new
      commit_metadata() hook in export_operations().  If the file system did
      not provide a commit_metadata() hook, it fell back to using
      sync_inode_metadata().  Unfortunately doesn't work on all file
      systems.  In particular, it doesn't work on ext4 due to how the inode
      gets journalled --- the VFS writeback code will not always call
      ext4_write_inode().
      
      So we need to provide our own ext4_nfs_commit_metdata() method which
      calls ext4_write_inode() directly.
      
      Google-Bug-Id: 121195940
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf2fd1f9
    • Theodore Ts'o's avatar
      ext4: avoid declaring fs inconsistent due to invalid file handles · 26366388
      Theodore Ts'o authored
      commit 8a363970 upstream.
      
      If we receive a file handle, either from NFS or open_by_handle_at(2),
      and it points at an inode which has not been initialized, and the file
      system has metadata checksums enabled, we shouldn't try to get the
      inode, discover the checksum is invalid, and then declare the file
      system as being inconsistent.
      
      This can be reproduced by creating a test file system via "mke2fs -t
      ext4 -O metadata_csum /tmp/foo.img 8M", mounting it, cd'ing into that
      directory, and then running the following program.
      
      #define _GNU_SOURCE
      #include <fcntl.h>
      
      struct handle {
      	struct file_handle fh;
      	unsigned char fid[MAX_HANDLE_SZ];
      };
      
      int main(int argc, char **argv)
      {
      	struct handle h = {{8, 1 }, { 12, }};
      
      	open_by_handle_at(AT_FDCWD, &h.fh, O_RDONLY);
      	return 0;
      }
      
      Google-Bug-Id: 120690101
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      26366388
    • Theodore Ts'o's avatar
      ext4: include terminating u32 in size of xattr entries when expanding inodes · 6633fcb2
      Theodore Ts'o authored
      commit a805622a upstream.
      
      In ext4_expand_extra_isize_ea(), we calculate the total size of the
      xattr header, plus the xattr entries so we know how much of the
      beginning part of the xattrs to move when expanding the inode extra
      size.  We need to include the terminating u32 at the end of the xattr
      entries, or else if there is uninitialized, non-zero bytes after the
      xattr entries and before the xattr values, the list of xattr entries
      won't be properly terminated.
      Reported-by: default avatarSteve Graham <stgraham2000@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6633fcb2
    • ruippan (潘睿)'s avatar
      ext4: fix EXT4_IOC_GROUP_ADD ioctl · 11bb168b
      ruippan (潘睿) authored
      commit e647e291 upstream.
      
      Commit e2b911c5 ("ext4: clean up feature test macros with
      predicate functions") broke the EXT4_IOC_GROUP_ADD ioctl.  This was
      not noticed since only very old versions of resize2fs (before
      e2fsprogs 1.42) use this ioctl.  However, using a new kernel with an
      enterprise Linux userspace will cause attempts to use online resize to
      fail with "No reserved GDT blocks".
      
      Fixes: e2b911c5 ("ext4: clean up feature test macros with predicate...")
      Cc: stable@kernel.org # v4.4
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarruippan (潘睿) <ruippan@tencent.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      11bb168b
    • Maurizio Lombardi's avatar
      ext4: missing unlock/put_page() in ext4_try_to_write_inline_data() · 0d078853
      Maurizio Lombardi authored
      commit 132d00be upstream.
      
      In case of error, ext4_try_to_write_inline_data() should unlock
      and release the page it holds.
      
      Fixes: f19d5870 ("ext4: add normal write support for inline data")
      Cc: stable@kernel.org # 3.8
      Signed-off-by: default avatarMaurizio Lombardi <mlombard@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d078853
    • Pan Bian's avatar
      ext4: fix possible use after free in ext4_quota_enable · 0a1c177d
      Pan Bian authored
      commit 61157b24 upstream.
      
      The function frees qf_inode via iput but then pass qf_inode to
      lockdep_set_quota_inode on the failure path. This may result in a
      use-after-free bug. The patch frees df_inode only when it is never used.
      
      Fixes: daf647d2 ("ext4: add lockdep annotations for i_data_sem")
      Cc: stable@kernel.org # 4.6
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarPan Bian <bianpan2016@163.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a1c177d
    • Theodore Ts'o's avatar
      ext4: add ext4_sb_bread() to disambiguate ENOMEM cases · b878c8a7
      Theodore Ts'o authored
      commit fb265c9c upstream.
      
      Today, when sb_bread() returns NULL, this can either be because of an
      I/O error or because the system failed to allocate the buffer.  Since
      it's an old interface, changing would require changing many call
      sites.
      
      So instead we create our own ext4_sb_bread(), which also allows us to
      set the REQ_META flag.
      
      Also fixed a problem in the xattr code where a NULL return in a
      function could also mean that the xattr was not found, which could
      lead to the wrong error getting returned to userspace.
      
      Fixes: ac27a0ec ("ext4: initial copy of files from ext3")
      Cc: stable@kernel.org # 2.6.19
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b878c8a7
    • Greg Kurz's avatar
      ocxl: Fix endiannes bug in read_afu_name() · 6665481e
      Greg Kurz authored
      commit 2f07229f upstream.
      
      The AFU Descriptor Template in the PCI config space has a Name Space
      field which is a 24 Byte ASCII character string of descriptive name
      space for the AFU. The OCXL driver read the string four characters at
      a time with pci_read_config_dword().
      
      This optimization is valid on a little-endian system since this is PCI,
      but a big-endian system ends up with each subset of four characters in
      reverse order.
      
      This could be fixed by switching to read characters one by one. Another
      option is to swap the bytes if we're big-endian.
      
      Go for the latter with le32_to_cpu().
      
      Cc: stable@vger.kernel.org      # v4.16
      Signed-off-by: default avatarGreg Kurz <groug@kaod.org>
      Acked-by: default avatarFrederic Barrat <fbarrat@linux.ibm.com>
      Acked-by: default avatarAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6665481e
    • Greg Kurz's avatar
      ocxl: Fix endiannes bug in ocxl_link_update_pe() · 3fbf78b2
      Greg Kurz authored
      commit e1e71e20 upstream.
      
      All fields in the PE are big-endian. Use cpu_to_be32() like everywhere
      else something is written to the PE. Otherwise a wrong TID will be used
      by the NPU. If this TID happens to point to an existing thread sharing
      the same mm, it could be woken up by error. This is highly improbable
      though. The likely outcome of this is the NPU not finding the target
      thread and forcing the AFU into sending an interrupt, which userspace
      is supposed to handle anyway.
      
      Fixes: e948e06f ("ocxl: Expose the thread_id needed for wait on POWER9")
      Cc: stable@vger.kernel.org      # v4.18
      Signed-off-by: default avatarGreg Kurz <groug@kaod.org>
      Acked-by: default avatarAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3fbf78b2
    • Arnaldo Carvalho de Melo's avatar
      perf env: Also consider env->arch == NULL as local operation · 65e4e67d
      Arnaldo Carvalho de Melo authored
      commit 804234f2 upstream.
      
      We'll set a new machine field based on env->arch, which for live mode,
      like with 'perf top' means we need to use uname() to figure the name of
      the arch, fix perf_env__arch() to consider both (env == NULL) and
      (env->arch == NULL) as local operation.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: stable@vger.kernel.org # 4.19
      Link: https://lkml.kernel.org/n/tip-vcz4ufzdon7cwy8dm2ua53xk@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      65e4e67d
    • Ben Hutchings's avatar
      perf pmu: Suppress potential format-truncation warning · d124dd5c
      Ben Hutchings authored
      commit 11a64a05 upstream.
      
      Depending on which functions are inlined in util/pmu.c, the snprintf()
      calls in perf_pmu__parse_{scale,unit,per_pkg,snapshot}() might trigger a
      warning:
      
        util/pmu.c: In function 'pmu_aliases':
        util/pmu.c:178:31: error: '%s' directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=]
          snprintf(path, PATH_MAX, "%s/%s.unit", dir, name);
                                     ^~
      
      I found this when trying to build perf from Linux 3.16 with gcc 8.
      However I can reproduce the problem in mainline if I force
      __perf_pmu__new_alias() to be inlined.
      
      Suppress this by using scnprintf() as has been done elsewhere in perf.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20181111184524.fux4taownc6ndbx6@decadent.org.ukSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d124dd5c
    • Adrian Hunter's avatar
      perf script: Use fallbacks for branch stacks · 307dbd38
      Adrian Hunter authored
      commit 692d0e63 upstream.
      
      Branch stacks do not necessarily have the same cpumode as the 'ip'. Use
      the fallback functions in those cases.
      
      This patch depends on patch "perf tools: Add fallback functions for cases
      where cpumode is insufficient".
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: stable@vger.kernel.org # 4.19
      Link: http://lkml.kernel.org/r/20181106210712.12098-4-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      307dbd38
    • Adrian Hunter's avatar
      perf tools: Use fallback for sample_addr_correlates_sym() cases · 39dad822
      Adrian Hunter authored
      commit 225f99e0 upstream.
      
      thread__resolve() is used in the sample_addr_correlates_sym() cases
      where 'addr' is a destination of a branch which does not necessarily
      have the same cpumode as the 'ip'. Use the fallback function in that
      case.
      
      This patch depends on patch "perf tools: Add fallback functions for
      cases where cpumode is insufficient".
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: stable@vger.kernel.org # 4.19
      Link: http://lkml.kernel.org/r/20181106210712.12098-3-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      39dad822
    • Adrian Hunter's avatar
      perf thread: Add fallback functions for cases where cpumode is insufficient · 0ada27a7
      Adrian Hunter authored
      commit 8e80ad99 upstream.
      
      For branch stacks or branch samples, the sample cpumode might not be
      correct because it applies only to the sample 'ip' and not necessary to
      'addr' or branch stack addresses. Add fallback functions that can be
      used to deal with those cases
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: stable@vger.kernel.org # 4.19
      Link: http://lkml.kernel.org/r/20181106210712.12098-2-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ada27a7
    • Adrian Hunter's avatar
      perf machine: Record if a arch has a single user/kernel address space · 62977a9b
      Adrian Hunter authored
      commit ec1891af upstream.
      
      Some architectures have a single address space for kernel and user
      addresses, which makes it possible to determine if an address is in
      kernel space or user space. Some don't, e.g.: sparc.
      
      Cache that info in perf_env so that, for instance, code needing to
      fallback failed symbol lookups at the kernel space in single address
      space arches can lookup at userspace.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: stable@vger.kernel.org # 4.19
      Link: http://lkml.kernel.org/r/20181106210712.12098-2-adrian.hunter@intel.com
      [ split from a larger patch ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      62977a9b
    • Alexey Brodkin's avatar
      clocksource/drivers/arc_timer: Utilize generic sched_clock · bf75d938
      Alexey Brodkin authored
      commit bf287607 upstream.
      
      It turned out we used to use default implementation of sched_clock()
      from kernel/sched/clock.c which was as precise as 1/HZ, i.e.
      by default we had 10 msec granularity of time measurement.
      
      Now given ARC built-in timers are clocked with the same frequency as
      CPU cores we may get much higher precision of time tracking.
      
      Thus we switch to generic sched_clock which really reads ARC hardware
      counters.
      
      This is especially helpful for measuring short events.
      That's what we used to have:
      ------------------------------>8------------------------
      $ perf stat /bin/sh -c /root/lmbench-master/bin/arc/hello > /dev/null
      
       Performance counter stats for '/bin/sh -c /root/lmbench-master/bin/arc/hello':
      
               10.000000      task-clock (msec)         #    2.832 CPUs utilized
                       1      context-switches          #    0.100 K/sec
                       1      cpu-migrations            #    0.100 K/sec
                      63      page-faults               #    0.006 M/sec
                 3049480      cycles                    #    0.305 GHz
                 1091259      instructions              #    0.36  insn per cycle
                  256828      branches                  #   25.683 M/sec
                   27026      branch-misses             #   10.52% of all branches
      
             0.003530687 seconds time elapsed
      
             0.000000000 seconds user
             0.010000000 seconds sys
      ------------------------------>8------------------------
      
      And now we'll see:
      ------------------------------>8------------------------
      $ perf stat /bin/sh -c /root/lmbench-master/bin/arc/hello > /dev/null
      
       Performance counter stats for '/bin/sh -c /root/lmbench-master/bin/arc/hello':
      
                3.004322      task-clock (msec)         #    0.865 CPUs utilized
                       1      context-switches          #    0.333 K/sec
                       1      cpu-migrations            #    0.333 K/sec
                      63      page-faults               #    0.021 M/sec
                 2986734      cycles                    #    0.994 GHz
                 1087466      instructions              #    0.36  insn per cycle
                  255209      branches                  #   84.947 M/sec
                   26002      branch-misses             #   10.19% of all branches
      
             0.003474829 seconds time elapsed
      
             0.003519000 seconds user
             0.000000000 seconds sys
      ------------------------------>8------------------------
      
      Note how much more meaningful is the second output - time spent for
      execution pretty much matches number of cycles spent (we're runnign
      @ 1GHz here).
      Signed-off-by: default avatarAlexey Brodkin <abrodkin@synopsys.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Acked-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf75d938
    • Eugeniy Paltsev's avatar
      DRM: UDL: get rid of useless vblank initialization · ca3a6fd2
      Eugeniy Paltsev authored
      commit 32e932e3 upstream.
      
      UDL doesn't support vblank functionality so we don't need to
      initialize vblank here (we are able to send page flip
      completion events even without vblank initialization)
      
      Moreover current drm_vblank_init call with num_crtcs > 0 causes
      sending DRM_EVENT_FLIP_COMPLETE event with zero timestamp every
      time. This breaks userspace apps (for example weston) which
      relies on timestamp value.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarEugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180928144126.21598-1-Eugeniy.Paltsev@synopsys.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ca3a6fd2
    • Eric Anholt's avatar
      drm/v3d: Skip debugfs dumping GCA on platforms without GCA. · 29ac2218
      Eric Anholt authored
      commit 2f20fa8d upstream.
      
      Fixes an oops reading this debugfs entry on BCM7278.
      Signed-off-by: default avatarEric Anholt <eric@anholt.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180928232126.4332-4-eric@anholt.net
      Fixes: 57692c94 ("drm/v3d: Introduce a new DRM driver for Broadcom V3D V3.x+")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: default avatarBoris Brezillon <boris.brezillon@bootlin.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29ac2218
    • Miquel Raynal's avatar
      platform-msi: Free descriptors in platform_msi_domain_free() · 6c56e89e
      Miquel Raynal authored
      commit 81b1e6e6 upstream.
      
      Since the addition of platform MSI support, there were two helpers
      supposed to allocate/free IRQs for a device:
      
          platform_msi_domain_alloc_irqs()
          platform_msi_domain_free_irqs()
      
      In these helpers, IRQ descriptors are allocated in the "alloc" routine
      while they are freed in the "free" one.
      
      Later, two other helpers have been added to handle IRQ domains on top
      of MSI domains:
      
          platform_msi_domain_alloc()
          platform_msi_domain_free()
      
      Seen from the outside, the logic is pretty close with the former
      helpers and people used it with the same logic as before: a
      platform_msi_domain_alloc() call should be balanced with a
      platform_msi_domain_free() call. While this is probably what was
      intended to do, the platform_msi_domain_free() does not remove/free
      the IRQ descriptor(s) created/inserted in
      platform_msi_domain_alloc().
      
      One effect of such situation is that removing a module that requested
      an IRQ will let one orphaned IRQ descriptor (with an allocated MSI
      entry) in the device descriptors list. Next time the module will be
      inserted back, one will observe that the allocation will happen twice
      in the MSI domain, one time for the remaining descriptor, one time for
      the new one. It also has the side effect to quickly overshoot the
      maximum number of allocated MSI and then prevent any module requesting
      an interrupt in the same domain to be inserted anymore.
      
      This situation has been met with loops of insertion/removal of the
      mvpp2.ko module (requesting 15 MSIs each time).
      
      Fixes: 552c494a ("platform-msi: Allow creation of a MSI-based stacked irq domain")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMiquel Raynal <miquel.raynal@bootlin.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6c56e89e
    • Sean Christopherson's avatar
      KVM: nVMX: Free the VMREAD/VMWRITE bitmaps if alloc_kvm_area() fails · c9dae887
      Sean Christopherson authored
      commit 1b3ab5ad upstream.
      
      Fixes: 34a1cd60 ("kvm: x86: vmx: move some vmx setting from vmx_init() to hardware_setup()")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9dae887
    • Marc Zyngier's avatar
      arm64: KVM: Make VHE Stage-2 TLB invalidation operations non-interruptible · 07cbcfc3
      Marc Zyngier authored
      commit c987876a upstream.
      
      Contrary to the non-VHE version of the TLB invalidation helpers, the VHE
      code  has interrupts enabled, meaning that we can take an interrupt in
      the middle of such a sequence, and start running something else with
      HCR_EL2.TGE cleared.
      
      That's really not a good idea.
      
      Take the heavy-handed option and disable interrupts in
      __tlb_switch_to_guest_vhe, restoring them in __tlb_switch_to_host_vhe.
      The latter also gain an ISB in order to make sure that TGE really has
      taken effect.
      
      Cc: stable@vger.kernel.org
      Acked-by: default avatarChristoffer Dall <christoffer.dall@arm.com>
      Reviewed-by: default avatarJames Morse <james.morse@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      07cbcfc3
    • Sean Christopherson's avatar
      KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup · edcf33b1
      Sean Christopherson authored
      commit e8143499 upstream.
      
      ____kvm_handle_fault_on_reboot() provides a generic exception fixup
      handler that is used to cleanly handle faults on VMX/SVM instructions
      during reboot (or at least try to).  If there isn't a reboot in
      progress, ____kvm_handle_fault_on_reboot() treats any exception as
      fatal to KVM and invokes kvm_spurious_fault(), which in turn generates
      a BUG() to get a stack trace and die.
      
      When it was originally added by commit 4ecac3fd ("KVM: Handle
      virtualization instruction #UD faults during reboot"), the "call" to
      kvm_spurious_fault() was handcoded as PUSH+JMP, where the PUSH'd value
      is the RIP of the faulting instructing.
      
      The PUSH+JMP trickery is necessary because the exception fixup handler
      code lies outside of its associated function, e.g. right after the
      function.  An actual CALL from the .fixup code would show a slightly
      bogus stack trace, e.g. an extra "random" function would be inserted
      into the trace, as the return RIP on the stack would point to no known
      function (and the unwinder will likely try to guess who owns the RIP).
      
      Unfortunately, the JMP was replaced with a CALL when the macro was
      reworked to not spin indefinitely during reboot (commit b7c4145b
      "KVM: Don't spin on virt instruction faults during reboot").  This
      causes the aforementioned behavior where a bogus function is inserted
      into the stack trace, e.g. my builds like to blame free_kvm_area().
      
      Revert the CALL back to a JMP.  The changelog for commit b7c4145b
      ("KVM: Don't spin on virt instruction faults during reboot") contains
      nothing that indicates the switch to CALL was deliberate.  This is
      backed up by the fact that the PUSH <insn RIP> was left intact.
      
      Note that an alternative to the PUSH+JMP magic would be to JMP back
      to the "real" code and CALL from there, but that would require adding
      a JMP in the non-faulting path to avoid calling kvm_spurious_fault()
      and would add no value, i.e. the stack trace would be the same.
      
      Using CALL:
      
      ------------[ cut here ]------------
      kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356!
      invalid opcode: 0000 [#1] SMP
      CPU: 4 PID: 1057 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm]
      Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
      RSP: 0018:ffffc900004bbcc8 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffff888273fd8000 R08: 00000000000003e8 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000371fb0
      R13: 0000000000000000 R14: 000000026d763cf4 R15: ffff888273fd8000
      FS:  00007f3d69691700(0000) GS:ffff888277800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055f89bc56fe0 CR3: 0000000271a5a001 CR4: 0000000000362ee0
      Call Trace:
       free_kvm_area+0x1044/0x43ea [kvm_intel]
       ? vmx_vcpu_run+0x156/0x630 [kvm_intel]
       ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm]
       ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
       ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
       ? __set_task_blocked+0x38/0x90
       ? __set_current_blocked+0x50/0x60
       ? __fpu__restore_sig+0x97/0x490
       ? do_vfs_ioctl+0xa1/0x620
       ? __x64_sys_futex+0x89/0x180
       ? ksys_ioctl+0x66/0x70
       ? __x64_sys_ioctl+0x16/0x20
       ? do_syscall_64+0x4f/0x100
       ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc
      ---[ end trace 9775b14b123b1713 ]---
      
      Using JMP:
      
      ------------[ cut here ]------------
      kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356!
      invalid opcode: 0000 [#1] SMP
      CPU: 6 PID: 1067 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm]
      Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
      RSP: 0018:ffffc90000497cd0 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffff88827058bd40 R08: 00000000000003e8 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000369fb0
      R13: 0000000000000000 R14: 00000003c8fc6642 R15: ffff88827058bd40
      FS:  00007f3d7219e700(0000) GS:ffff888277900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f3d64001000 CR3: 0000000271c6b004 CR4: 0000000000362ee0
      Call Trace:
       vmx_vcpu_run+0x156/0x630 [kvm_intel]
       ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm]
       ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
       ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
       ? __set_task_blocked+0x38/0x90
       ? __set_current_blocked+0x50/0x60
       ? __fpu__restore_sig+0x97/0x490
       ? do_vfs_ioctl+0xa1/0x620
       ? __x64_sys_futex+0x89/0x180
       ? ksys_ioctl+0x66/0x70
       ? __x64_sys_ioctl+0x16/0x20
       ? do_syscall_64+0x4f/0x100
       ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc
      ---[ end trace f9daedb85ab3ddba ]---
      
      Fixes: b7c4145b ("KVM: Don't spin on virt instruction faults during reboot")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      edcf33b1
    • Dan Williams's avatar
      x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init() · 49102719
      Dan Williams authored
      commit ba6f508d upstream.
      
      Commit:
      
        f77084d9 "x86/mm/pat: Disable preemption around __flush_tlb_all()"
      
      addressed a case where __flush_tlb_all() is called without preemption
      being disabled. It also left a warning to catch other cases where
      preemption is not disabled.
      
      That warning triggers for the memory hotplug path which is also used for
      persistent memory enabling:
      
       WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460
       RIP: 0010:__flush_tlb_all+0x1b/0x3a
       [..]
       Call Trace:
        phys_pud_init+0x29c/0x2bb
        kernel_physical_mapping_init+0xfc/0x219
        init_memory_mapping+0x1a5/0x3b0
        arch_add_memory+0x2c/0x50
        devm_memremap_pages+0x3aa/0x610
        pmem_attach_disk+0x585/0x700 [nd_pmem]
      
      Andy wondered why a path that can sleep was using __flush_tlb_all() [1]
      and Dave confirmed the expectation for TLB flush is for modifying /
      invalidating existing PTE entries, but not initial population [2]. Drop
      the usage of __flush_tlb_all() in phys_{p4d,pud,pmd}_init() on the
      expectation that this path is only ever populating empty entries for the
      linear map. Note, at linear map teardown time there is a call to the
      all-cpu flush_tlb_all() to invalidate the removed mappings.
      
      [1]: https://lkml.kernel.org/r/9DFD717D-857D-493D-A606-B635D72BAC21@amacapital.net
      [2]: https://lkml.kernel.org/r/749919a4-cdb1-48a3-adb4-adb81a5fa0b5@intel.com
      
      [ mingo: Minor readability edits. ]
      Suggested-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Reported-by: default avatarAndy Lutomirski <luto@kernel.org>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave.hansen@intel.com
      Fixes: f77084d9 ("x86/mm/pat: Disable preemption around __flush_tlb_all()")
      Link: http://lkml.kernel.org/r/154395944713.32119.15611079023837132638.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      49102719