1. 17 Apr, 2020 40 commits
    • rtc: omap: Use define directive for PIN_CONFIG_ACTIVE_HIGH · 23599f81
      Nathan Chancellor authored
      commit c5015652 upstream.
      
      Clang warns when one enumerated type is implicitly converted to another:
      
      drivers/rtc/rtc-omap.c:574:21: warning: implicit conversion from
      enumeration type 'enum rtc_pin_config_param' to different enumeration
      type 'enum pin_config_param' [-Wenum-conversion]
              {"ti,active-high", PIN_CONFIG_ACTIVE_HIGH, 0},
              ~                  ^~~~~~~~~~~~~~~~~~~~~~
      drivers/rtc/rtc-omap.c:579:12: warning: implicit conversion from
      enumeration type 'enum rtc_pin_config_param' to different enumeration
      type 'enum pin_config_param' [-Wenum-conversion]
              PCONFDUMP(PIN_CONFIG_ACTIVE_HIGH, "input active high", NULL, false),
              ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      ./include/linux/pinctrl/pinconf-generic.h:163:11: note: expanded from
      macro 'PCONFDUMP'
              .param = a, .display = b, .format = c, .has_arg = d     \
                       ^
      2 warnings generated.
      
      It is expected that pinctrl drivers can extend pin_config_param because
      of the gap between PIN_CONFIG_END and PIN_CONFIG_MAX so this conversion
      isn't an issue. Most drivers that take advantage of this define the
      PIN_CONFIG variables as constants, rather than enumerated values. Do the
      same thing here so that Clang no longer warns.
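
A minimal sketch of the pattern described above, using hypothetical names (`generic_param`, `GENERIC_END`, `DRIVER_ACTIVE_HIGH`, `struct param_entry`, `is_driver_specific`) rather than the real pinctrl definitions. The driver-specific value sits in the gap past the end of the generic enum and is defined as a preprocessor constant, so no second enumeration type is involved when it is stored in a field of the generic enum type.

```c
/* Hypothetical stand-in for a generic parameter enum with a reserved gap. */
enum generic_param {
	GENERIC_BIAS_DISABLE,
	GENERIC_DRIVE_STRENGTH,
	GENERIC_END = 0x7F,
	GENERIC_MAX = 0xFF,
};

/* Driver-specific extension: a plain constant, not a member of another enum. */
#define DRIVER_ACTIVE_HIGH (GENERIC_END + 1)

/* Hypothetical table entry whose field has the generic enum type. */
struct param_entry {
	const char *property;
	enum generic_param param;
	unsigned int default_value;
};

static const struct param_entry driver_params[] = {
	{ "vendor,active-high", DRIVER_ACTIVE_HIGH, 0 },
};

/* Returns nonzero when the value lies in the gap reserved for drivers. */
static int is_driver_specific(enum generic_param p)
{
	return p > GENERIC_END && p <= GENERIC_MAX;
}
```

With a second enum type in place of the `#define`, the initializer in `driver_params` is the kind of expression that triggers `-Wenum-conversion`.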
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/144
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • selftests: vm: drop dependencies on page flags from mlock2 tests · 01522e4d
      Michal Hocko authored
      commit eea274d6 upstream.
      
      It was noticed that mlock2 tests are failing after 9c4e6b1a ("mm,
      mlock, vmscan: no more skipping pagevecs") because the patch has changed
      the timing on when the page is added to the unevictable LRU list and thus
      gains the unevictable page flag.
      
      The test was just too dependent on the implementation details which were
      true at the time when it was introduced.  Page flags and the timing when
      they are set is something no userspace should ever depend on.  The test
      should be testing only for the user observable contract of the tested
      syscalls.  Those are defined pretty well for the mlock and there are other
      means for testing them.  In fact this is already done and testing for page
      flags can be safely dropped to achieve the aimed purpose.  Present bits
      can be checked by /proc/<pid>/smaps RSS field and the locking state by
      VmFlags although I would argue that Locked: field would be more
      appropriate.
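
As a rough illustration of checking the user-visible contract through smaps fields instead of page flags, the sketch below parses one smaps-style line and compares the `Rss:` and `Locked:` values with the mapping size. The helper names (`smaps_field_kb`, `range_locked_and_present`) are hypothetical and this is not the selftest's actual code.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one smaps-style line such as "Locked:     4 kB".
 * Returns 1 and stores the value in *kb when the line starts with `field`,
 * otherwise returns 0.
 */
static int smaps_field_kb(const char *line, const char *field, unsigned long *kb)
{
	size_t len = strlen(field);

	if (strncmp(line, field, len) != 0)
		return 0;
	return sscanf(line + len, " %lu kB", kb) == 1;
}

/* Treat a mapping as fully resident and locked when both values equal its size. */
static int range_locked_and_present(unsigned long size_kb,
				    unsigned long rss_kb,
				    unsigned long locked_kb)
{
	return rss_kb == size_kb && locked_kb == size_kb;
}
```

A test would feed these helpers the lines of the relevant mapping read from /proc/self/smaps after the mlock call under test.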
      
      Drop all the page flag machinery and considerably simplify the test.  This
      should be more robust for future kernel changes while checking the
      promised contract is still valid.
      
      Fixes: 9c4e6b1a ("mm, mlock, vmscan: no more skipping pagevecs")
      Reported-by: Rafael Aquini <aquini@redhat.com>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Rafael Aquini <aquini@redhat.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200324154218.GS19542@dhcp22.suse.cz
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: armv8_deprecated: Fix undef_hook mask for thumb setend · fb3e9f47
      Fredrik Strupe authored
      commit fc226601 upstream.
      
      For thumb instructions, call_undef_hook() in traps.c first reads a u16,
      and if the u16 indicates a T32 instruction (u16 >= 0xe800), a second
      u16 is read, which then makes up the lower half-word of a T32
      instruction. For T16 instructions, the second u16 is not read,
      which makes the resulting u32 opcode always have the upper half set to
      0.
      
      However, having the upper half of instr_mask in the undef_hook set to 0
      masks out the upper half of all thumb instructions - both T16 and T32.
      This results in trapped T32 instructions with the lower half-word equal
      to the T16 encoding of setend (b650) being matched, even though the upper
      half-word is not 0000 and thus indicates a T32 opcode.
      
      An example of such a T32 instruction is eaa0b650, which should raise a
      SIGILL since T32 instructions with an eaa prefix are unallocated as per
      Arm ARM, but instead works as a SETEND because the second half-word is set
      to b650.
      
      This patch fixes the issue by extending instr_mask to include the
      upper u32 half, which will still match T16 instructions where the upper
      half is 0, but not T32 instructions.
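
The effect of the mask can be sketched as follows. The constants are assumptions for illustration (a value of b650 for T16 SETEND with bit 3 left as a don't-care, one mask covering only the lower half-word and one covering the full word), and `hook_matches` is a hypothetical helper, not the kernel's function.

```c
#include <stdint.h>

/* Assumed values for illustration only. */
#define SETEND_T16_VAL   0x0000b650u
#define MASK_LOWER_ONLY  0x0000fff7u  /* upper half-word is ignored */
#define MASK_FULL        0xfffffff7u  /* upper half-word must be zero */

/* A hook matches when the masked opcode equals the expected value. */
static int hook_matches(uint32_t instr, uint32_t mask, uint32_t val)
{
	return (instr & mask) == val;
}
```

With the lower-only mask, the unallocated T32 opcode eaa0b650 is matched as if it were SETEND. With the full mask it is rejected, while the T16 encodings (upper half 0) still match.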
      
      Fixes: 2d888f48 ("arm64: Emulate SETEND for AArch32 tasks")
      Cc: <stable@vger.kernel.org> # 4.0.x-
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Fredrik Strupe <fredrik@strupe.net>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: zfcp: fix missing erp_lock in port recovery trigger for point-to-point · af77e3e4
      Steffen Maier authored
      commit 819732be upstream.
      
      v2.6.27 commit cc8c2829 ("[SCSI] zfcp: Automatically attach remote
      ports") introduced zfcp automatic port scan.
      
      Before that, the user had to use the sysfs attribute "port_add" of an FCP
      device (adapter) to add and open remote (target) ports, even for the remote
      peer port in point-to-point topology. That code path did a proper port open
      recovery trigger taking the erp_lock.
      
      Since the above commit, a new helper function zfcp_erp_open_ptp_port()
      performed an UNlocked port open recovery trigger. This can race with other
      parallel recovery triggers. In zfcp_erp_action_enqueue() this could corrupt
      e.g. adapter->erp_total_count or adapter->erp_ready_head.
      
      As already found for fabric topology in v4.17 commit fa89adba ("scsi:
      zfcp: fix infinite iteration on ERP ready list"), there was an endless loop
      during tracing of rport (un)block.  A subsequent v4.18 commit 9e156c54
      ("scsi: zfcp: assert that the ERP lock is held when tracing a recovery
      trigger") introduced a lockdep assertion for that case.
      
      As a side effect, that lockdep assertion now uncovered the unlocked code
      path for PtP. It is from within an adapter ERP action:
      
      zfcp_erp_strategy[1479]  intentionally DROPs erp lock around
                               zfcp_erp_strategy_do_action()
      zfcp_erp_strategy_do_action[1441]      NO erp lock
      zfcp_erp_adapter_strategy[876]         NO erp lock
      zfcp_erp_adapter_strategy_open[855]    NO erp lock
      zfcp_erp_adapter_strategy_open_fsf[806]NO erp lock
      zfcp_erp_adapter_strat_fsf_xconf[772]  erp lock only around
                                             zfcp_erp_action_to_running(),
                                             BUT *_not_* around
                                             zfcp_erp_enqueue_ptp_port()
      zfcp_erp_enqueue_ptp_port[728]         BUG: *_not_* taking erp lock
      _zfcp_erp_port_reopen[432]             assumes to be called with erp lock
      zfcp_erp_action_enqueue[314]           assumes to be called with erp lock
      zfcp_dbf_rec_trig[288]                 _checks_ to be called with erp lock:
      	lockdep_assert_held(&adapter->erp_lock);
      
      It causes the following lockdep warning:
      
      WARNING: CPU: 2 PID: 775 at drivers/s390/scsi/zfcp_dbf.c:288
                                  zfcp_dbf_rec_trig+0x16a/0x188
      no locks held by zfcperp0.0.17c0/775.
      
      Fix this by using the proper locked recovery trigger helper function.
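
The difference between the two kinds of helper can be sketched in userspace with a toy lock flag. All names here (`toy_adapter`, `toy_port_reopen_locked`, `toy_port_reopen`) are hypothetical, and the flag merely stands in for the real lock and for the lockdep check; it is not how the driver is implemented.

```c
#include <stdbool.h>

/* Toy adapter: the flag stands in for the ERP lock being held. */
struct toy_adapter {
	bool erp_lock_held;
	int  erp_total_count;
	int  lock_violations;   /* incremented where a lockdep assertion would fire */
};

/* Variant that assumes the caller already holds the lock. */
static void toy_port_reopen_locked(struct toy_adapter *a)
{
	if (!a->erp_lock_held)
		a->lock_violations++;
	a->erp_total_count++;   /* shared state that the lock protects */
}

/* Wrapper that takes the lock around the trigger. */
static void toy_port_reopen(struct toy_adapter *a)
{
	a->erp_lock_held = true;    /* the real code would acquire the lock here */
	toy_port_reopen_locked(a);
	a->erp_lock_held = false;   /* and release it here */
}
```

Calling the lock-assuming variant directly from a context that dropped the lock is the situation the call chain above describes; calling the wrapper avoids it.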
      
      Link: https://lore.kernel.org/r/20200312174505.51294-2-maier@linux.ibm.com
      Fixes: cc8c2829 ("[SCSI] zfcp: Automatically attach remote ports")
      Cc: <stable@vger.kernel.org> #v2.6.27+
      Reviewed-by: Jens Remus <jremus@linux.ibm.com>
      Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
      Signed-off-by: Steffen Maier <maier@linux.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm verity fec: fix memory leak in verity_fec_dtr · 92760667
      Shetty, Harshini X (EXT-Sony Mobile) authored
      commit 75fa6019 upstream.
      
      Fix the kmemleak report below, detected in verity_fec_ctr. output_pool is
      allocated for each dm-verity-fec device, but it is not freed when the
      dm-table for the verity target is removed. Hence free the output
      mempool in the destructor function verity_fec_dtr.
      
      unreferenced object 0xffffffffa574d000 (size 4096):
        comm "init", pid 1667, jiffies 4294894890 (age 307.168s)
        hex dump (first 32 bytes):
          8e 36 00 98 66 a8 0b 9b 00 00 00 00 00 00 00 00  .6..f...........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<0000000060e82407>] __kmalloc+0x2b4/0x340
          [<00000000dd99488f>] mempool_kmalloc+0x18/0x20
          [<000000002560172b>] mempool_init_node+0x98/0x118
          [<000000006c3574d2>] mempool_init+0x14/0x20
          [<0000000008cb266e>] verity_fec_ctr+0x388/0x3b0
          [<000000000887261b>] verity_ctr+0x87c/0x8d0
          [<000000002b1e1c62>] dm_table_add_target+0x174/0x348
          [<000000002ad89eda>] table_load+0xe4/0x328
          [<000000001f06f5e9>] dm_ctl_ioctl+0x3b4/0x5a0
          [<00000000bee5fbb7>] do_vfs_ioctl+0x5dc/0x928
          [<00000000b475b8f5>] __arm64_sys_ioctl+0x70/0x98
          [<000000005361e2e8>] el0_svc_common+0xa0/0x158
          [<000000001374818f>] el0_svc_handler+0x6c/0x88
          [<000000003364e9f4>] el0_svc+0x8/0xc
          [<000000009d84cec9>] 0xffffffffffffffff
      
      Fixes: a739ff3f ("dm verity: add support for forward error correction")
      Depends-on: 6f1c819c ("dm: convert to bioset_init()/mempool_init()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Harshini Shetty <harshini.x.shetty@sony.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm writecache: add cond_resched to avoid CPU hangs · 6f3a303a
      Mikulas Patocka authored
      commit 1edaa447 upstream.
      
      Initializing a dm-writecache device can take a long time when the
      persistent memory device is large.  Add cond_resched() to a few loops
      to avoid warnings that the CPU is stuck.
      
      Cc: stable@vger.kernel.org # v4.18+
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: dts: allwinner: h6: Fix PMU compatible · a6d77a5c
      Maxime Ripard authored
      commit 4c7eeb9a upstream.
      
      The commit 7aa9b9eb ("arm64: dts: allwinner: H6: Add PMU mode")
      introduced support for the PMU found on the Allwinner H6. However, the
      binding only allows for a single compatible, while the patch was adding
      two.
      
      Make sure we follow the binding.
      
      Fixes: 7aa9b9eb ("arm64: dts: allwinner: H6: Add PMU mode")
      Signed-off-by: Maxime Ripard <maxime@cerno.tech>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: qualcomm: rmnet: Allow configuration updates to existing devices · 0389387e
      Subash Abhinov Kasiviswanathan authored
      commit 2abb5792 upstream.
      
      This allows the changelink operation to succeed if the mux_id was
      specified as an argument. Note that the mux_id must match the
      existing mux_id of the rmnet device or should be an unused mux_id.
      
      Fixes: 1dc49e9d ("net: rmnet: do not allow to change mux id if mux id is duplicated")
      Reported-and-tested-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: Use fixed constant in page_frag_alloc instead of size + 1 · 69598616
      Alexander Duyck authored
      commit 86447726 upstream.
      
      This patch replaces the size + 1 value introduced with the recent fix for 1
      byte allocs with a constant value.
      
      The idea here is to reduce code overhead as the previous logic would have
      to read size into a register, then increment it, and write it back to
      whatever field was being used. By using a constant we can avoid those
      memory reads and arithmetic operations in favor of just encoding the
      maximum value into the operation itself.
      
      Fixes: 2c2ade81 ("mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs")
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tools: gpio: Fix out-of-tree build regression · 2e22edcd
      Anssi Hannula authored
      commit 82f04bfe upstream.
      
      Commit 0161a94e ("tools: gpio: Correctly add make dependencies for
      gpio_utils") added a make rule for gpio-utils-in.o but used $(output)
      instead of the correct $(OUTPUT) for the output directory, breaking
      out-of-tree build (O=xx) with the following error:
      
        No rule to make target 'out/tools/gpio/gpio-utils-in.o', needed by 'out/tools/gpio/lsgpio-in.o'.  Stop.
      
      Fix that.
      
      Fixes: 0161a94e ("tools: gpio: Correctly add make dependencies for gpio_utils")
      Cc: <stable@vger.kernel.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
      Link: https://lore.kernel.org/r/20200325103154.32235-1-anssi.hannula@bitwise.fi
      Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/speculation: Remove redundant arch_smt_update() invocation · 6209e098
      Zhenzhong Duan authored
      commit 34d66caf upstream.
      
      With commit a74cfffb ("x86/speculation: Rework SMT state change"),
      arch_smt_update() is invoked from each individual CPU hotplug function.
      
      Therefore the extra arch_smt_update() call in the sysfs SMT control is
      redundant.
      
      Fixes: a74cfffb ("x86/speculation: Rework SMT state change")
      Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: <konrad.wilk@oracle.com>
      Cc: <dwmw@amazon.co.uk>
      Cc: <bp@suse.de>
      Cc: <srinivas.eeda@oracle.com>
      Cc: <peterz@infradead.org>
      Cc: <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/e2e064f2-e8ef-42ca-bf4f-76b612964752@default
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/pseries: Drop pointless static qualifier in vpa_debugfs_init() · f5e2eef0
      YueHaibing authored
      commit 11dd34f3 upstream.
      
      There is no need to have the 'struct dentry *vpa_dir' variable static
      since a new value is always assigned before it is used.
      
      Fixes: c6c26fb5 ("powerpc/pseries: Export raw per-CPU VPA data via debugfs")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reviewed-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190218125644.87448-1-yuehaibing@huawei.com
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • erofs: correct the remaining shrink objects · d8bd8bca
      Gao Xiang authored
      commit 9d5a09c6 upstream.
      
      The remaining count should not include successful
      shrink attempts.
      
      Fixes: e7e9a307 ("staging: erofs: introduce workstation for decompression")
      Cc: <stable@vger.kernel.org> # 4.19+
      Link: https://lore.kernel.org/r/20200226081008.86348-1-gaoxiang25@huawei.com
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: mxs-dcp - fix scatterlist linearization for hash · c127f180
      Rosioru Dragos authored
      commit fa03481b upstream.
      
      The incorrect traversal of the scatterlist during the linearization phase
      led to computing the hash value of the wrong input buffer.
      The new implementation uses scatterwalk_map_and_copy()
      to address this issue.
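
For readers unfamiliar with linearization, the sketch below shows the general idea in userspace: bytes starting at a logical offset are gathered from a list of discontiguous segments into one contiguous buffer. `struct seg` and `linearize` are hypothetical stand-ins, not the kernel scatterlist API or the driver's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one scatterlist entry. */
struct seg {
	const unsigned char *data;
	size_t len;
};

/*
 * Copy up to `nbytes` bytes, starting at logical offset `start` of the
 * segment list, into the contiguous buffer `out`.
 * Returns the number of bytes actually copied.
 */
static size_t linearize(const struct seg *segs, size_t nsegs,
			size_t start, size_t nbytes, unsigned char *out)
{
	size_t copied = 0;

	for (size_t i = 0; i < nsegs && copied < nbytes; i++) {
		size_t off, chunk;

		if (start >= segs[i].len) {
			/* Segment lies entirely before the requested offset. */
			start -= segs[i].len;
			continue;
		}
		off = start;
		start = 0;
		chunk = segs[i].len - off;
		if (chunk > nbytes - copied)
			chunk = nbytes - copied;
		memcpy(out + copied, segs[i].data + off, chunk);
		copied += chunk;
	}
	return copied;
}
```

Getting the per-segment offset and remaining-length bookkeeping wrong in a loop like this is what makes the hash run over the wrong bytes, which is why delegating to a well-tested helper is attractive.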
      
      Cc: <stable@vger.kernel.org>
      Fixes: 15b59e7c ("crypto: mxs - Add Freescale MXS DCP driver")
      Signed-off-by: Rosioru Dragos <dragos.rosioru@nxp.com>
      Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix missing semaphore unlock in btrfs_sync_file · ed870340
      Robbie Ko authored
      commit 6ff06729 upstream.
      
      Ordered ops are started twice in sync file, once outside of inode mutex
      and once inside, taking the dio semaphore. There was one error path
      missing the semaphore unlock.
      
      Fixes: aab15e8e ("Btrfs: fix rare chances for data loss when doing a fast fsync")
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: Robbie Ko <robbieko@synology.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      [ add changelog ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix missing file extent item for hole after ranged fsync · 867ae5eb
      Filipe Manana authored
      commit 95418ed1 upstream.
      
      When doing a fast fsync for a range that starts at an offset greater than
      zero, we can end up with a log that, when replayed, causes the respective
      inode to miss a file extent item representing a hole if we are not using the
      NO_HOLES feature. This is because for fast fsyncs we don't log any extents
      that cover a range different from the one requested in the fsync.
      
      Example scenario to trigger it:
      
        $ mkfs.btrfs -O ^no-holes -f /dev/sdd
        $ mount /dev/sdd /mnt
      
        # Create a file with a single 256K extent and fsync it to clear the full
        # sync bit in the inode - we want the msync below to trigger a fast fsync.
        $ xfs_io -f -c "pwrite -S 0xab 0 256K" -c "fsync" /mnt/foo
      
        # Force a transaction commit and wipe out the log tree.
        $ sync
      
        # Dirty 768K of data, increasing the file size to 1Mb, and flush only
        # the range from 256K to 512K without updating the log tree
        # (sync_file_range() does not trigger fsync, it only starts writeback
        # and waits for it to finish).
      
        $ xfs_io -c "pwrite -S 0xcd 256K 768K" /mnt/foo
        $ xfs_io -c "sync_range -abw 256K 256K" /mnt/foo
      
        # Now dirty the range from 768K to 1M again and sync that range.
        $ xfs_io -c "mmap -w 768K 256K"        \
                 -c "mwrite -S 0xef 768K 256K" \
                 -c "msync -s 768K 256K"       \
                 -c "munmap"                   \
                 /mnt/foo
      
        <power fail>
      
        # Mount to replay the log.
        $ mount /dev/sdd /mnt
        $ umount /mnt
      
        $ btrfs check /dev/sdd
        Opening filesystem to check...
        Checking filesystem on /dev/sdd
        UUID: 482fb574-b288-478e-a190-a9c44a78fca6
        [1/7] checking root items
        [2/7] checking extents
        [3/7] checking free space cache
        [4/7] checking fs roots
        root 5 inode 257 errors 100, file extent discount
        Found file extent holes:
             start: 262144, len: 524288
        ERROR: errors found in fs roots
        found 720896 bytes used, error(s) found
        total csum bytes: 512
        total tree bytes: 131072
        total fs tree bytes: 32768
        total extent tree bytes: 16384
        btree space waste bytes: 123514
        file data blocks allocated: 589824
          referenced 589824
      
      Fix this issue by setting the range to full (0 to LLONG_MAX) when the
      NO_HOLES feature is not enabled. This results in extra work being done
      but it gives the guarantee we don't end up with missing holes after
      replaying the log.
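
The range adjustment described above can be sketched as a small helper. `struct sync_range`, `effective_sync_range`, and the `no_holes_enabled` flag are hypothetical names for illustration, not the btrfs code.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical byte range for an fsync request (inclusive end). */
struct sync_range {
	long long start;
	long long end;
};

/* Widen the requested range to the whole file unless NO_HOLES is enabled. */
static struct sync_range effective_sync_range(long long start, long long end,
					      bool no_holes_enabled)
{
	struct sync_range r = { start, end };

	if (!no_holes_enabled) {
		r.start = 0;
		r.end = LLONG_MAX;
	}
	return r;
}
```

In the example scenario, the msync of the range from 768K to 1M would be widened to cover the entire file when NO_HOLES is off, so the hole between the logged extents is also logged.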
      
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: drop block from cache on error in relocation · d8ecdce1
      Josef Bacik authored
      commit 8e19c973 upstream.
      
      If we have an error while building the backref tree in relocation we'll
      process all the pending edges and then free the node.  However if we
      integrated some edges into the cache we'll lose our link to those edges
      by simply freeing this node, which means we'll leak memory and
      references to any roots that we've found.
      
      Instead we need to use remove_backref_node(), which walks through all of
      the edges that are still linked to this node and frees them up and
      drops any root references we may be holding.
      
      CC: stable@vger.kernel.org # 4.9+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: set update the uuid generation as soon as possible · d3a7c4b8
      Josef Bacik authored
      commit 75ec1db8 upstream.
      
      In my EIO stress testing I noticed I was getting forced to rescan the
      uuid tree pretty often, which was weird.  This is because my error
      injection stuff would sometimes inject an error after log replay but
      before we loaded the UUID tree.  If log replay committed the transaction
      it wouldn't have updated the uuid tree generation, but the tree was
      valid and didn't change, so there's no reason to not update the
      generation here.
      
      Fix this by setting the BTRFS_FS_UPDATE_UUID_TREE_GEN bit immediately
      after reading all the fs roots if the uuid tree generation matches the
      fs generation.  Then any transaction commits that happen during mount
      won't screw up our uuid tree state, forcing us to do needless uuid
      rescans.
      
      Fixes: 70f80175 ("Btrfs: check UUID tree during mount if required")
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix crash during unmount due to race with delayed inode workers · 7ed0c4db
      Filipe Manana authored
      commit f0cc2cd7 upstream.
      
      During unmount we can have a job from the delayed inode items work queue
      still running, which can lead to at least two bad things:
      
      1) A crash, because the worker can try to create a transaction just
         after the fs roots were freed;
      
      2) A transaction leak, because the worker can create a transaction
         before the fs roots are freed and just after we committed the last
         transaction and after we stopped the transaction kthread.
      
      A stack trace example of the crash:
      
       [79011.691214] kernel BUG at lib/radix-tree.c:982!
       [79011.692056] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
       [79011.693180] CPU: 3 PID: 1394 Comm: kworker/u8:2 Tainted: G        W         5.6.0-rc2-btrfs-next-54 #2
       (...)
       [79011.696789] Workqueue: btrfs-delayed-meta btrfs_work_helper [btrfs]
       [79011.697904] RIP: 0010:radix_tree_tag_set+0xe7/0x170
       (...)
       [79011.702014] RSP: 0018:ffffb3c84a317ca0 EFLAGS: 00010293
       [79011.702949] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
       [79011.704202] RDX: ffffb3c84a317cb0 RSI: ffffb3c84a317ca8 RDI: ffff8db3931340a0
       [79011.705463] RBP: 0000000000000005 R08: 0000000000000005 R09: ffffffff974629d0
       [79011.706756] R10: ffffb3c84a317bc0 R11: 0000000000000001 R12: ffff8db393134000
       [79011.708010] R13: ffff8db3931340a0 R14: ffff8db393134068 R15: 0000000000000001
       [79011.709270] FS:  0000000000000000(0000) GS:ffff8db3b6a00000(0000) knlGS:0000000000000000
       [79011.710699] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [79011.711710] CR2: 00007f22c2a0a000 CR3: 0000000232ad4005 CR4: 00000000003606e0
       [79011.712958] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       [79011.714205] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       [79011.715448] Call Trace:
       [79011.715925]  record_root_in_trans+0x72/0xf0 [btrfs]
       [79011.716819]  btrfs_record_root_in_trans+0x4b/0x70 [btrfs]
       [79011.717925]  start_transaction+0xdd/0x5c0 [btrfs]
       [79011.718829]  btrfs_async_run_delayed_root+0x17e/0x2b0 [btrfs]
       [79011.719915]  btrfs_work_helper+0xaa/0x720 [btrfs]
       [79011.720773]  process_one_work+0x26d/0x6a0
       [79011.721497]  worker_thread+0x4f/0x3e0
       [79011.722153]  ? process_one_work+0x6a0/0x6a0
       [79011.722901]  kthread+0x103/0x140
       [79011.723481]  ? kthread_create_worker_on_cpu+0x70/0x70
       [79011.724379]  ret_from_fork+0x3a/0x50
       (...)
      
      The following diagram shows a sequence of steps that lead to the crash
      during unmount of the filesystem:
      
              CPU 1                                             CPU 2                                CPU 3
      
       btrfs_punch_hole()
         btrfs_btree_balance_dirty()
           btrfs_balance_delayed_items()
             --> sees
                 fs_info->delayed_root->items
                 with value 200, which is greater
                 than
                 BTRFS_DELAYED_BACKGROUND (128)
                 and smaller than
                 BTRFS_DELAYED_WRITEBACK (512)
             btrfs_wq_run_delayed_node()
               --> queues a job for
                   fs_info->delayed_workers to run
                   btrfs_async_run_delayed_root()
      
                                                                                                  btrfs_async_run_delayed_root()
                                                                                                    --> job queued by CPU 1
      
                                                                                                    --> starts picking and running
                                                                                                        delayed nodes from the
                                                                                                        prepare_list list
      
                                                       close_ctree()
      
                                                         btrfs_delete_unused_bgs()
      
                                                         btrfs_commit_super()
      
                                                           btrfs_join_transaction()
                                                             --> gets transaction N
      
                                                           btrfs_commit_transaction(N)
                                                             --> set transaction state
                                                              to TRANTS_STATE_COMMIT_START
      
                                                                                                   btrfs_first_prepared_delayed_node()
                                                                                                     --> picks delayed node X through
                                                                                                         the prepared_list list
      
                                                             btrfs_run_delayed_items()
      
                                                               btrfs_first_delayed_node()
                                                                 --> also picks delayed node X
                                                                     but through the node_list
                                                                     list
      
                                                               __btrfs_commit_inode_delayed_items()
                                                                  --> runs all delayed items from
                                                                      this node and drops the
                                                                      node's item count to 0
                                                                      through call to
                                                                      btrfs_release_delayed_inode()
      
                                                               --> finishes running any remaining
                                                                   delayed nodes
      
                                                             --> finishes transaction commit
      
                                                         --> stops cleaner and transaction threads
      
                                                         btrfs_free_fs_roots()
                                                           --> frees all roots and removes them
                                                               from the radix tree
                                                               fs_info->fs_roots_radix
      
                                                                                                   btrfs_join_transaction()
                                                                                                     start_transaction()
                                                                                                       btrfs_record_root_in_trans()
                                                                                                         record_root_in_trans()
                                                                                                           radix_tree_tag_set()
                                                                                                             --> crashes because
                                                                                                                 the root is not in
                                                                                                                 the radix tree
                                                                                                                 anymore
      
      If the worker is able to call btrfs_join_transaction() before the unmount
      task frees the fs roots, we end up leaking a transaction and all its
      resources, since after the call to btrfs_commit_super() and stopping the
      transaction kthread, we don't expect to have any transaction open anymore.
      
      When this situation happens the worker has a delayed node that has no
      more items to run, since the task calling btrfs_run_delayed_items(),
      which is doing a transaction commit, picks the same node and runs all
      its items first.
      
      We can not wait for the worker to complete when running delayed items
      through btrfs_run_delayed_items(), because we call that function in
      several phases of a transaction commit, and that could cause a deadlock
      because the worker calls btrfs_join_transaction() and the task doing the
      transaction commit may have already set the transaction state to
      TRANS_STATE_COMMIT_DOING.
      
      Also it's not possible to get into a situation where only some of the
      items of a delayed node are added to the fs/subvolume tree in the current
      transaction and the remaining ones in the next transaction, because when
      running the items of a delayed inode we lock its mutex, effectively
      waiting for the worker if the worker is running the items of the delayed
      node already.
      
      Since this can only cause issues when unmounting a filesystem, fix it in
      a simple way by waiting for any jobs on the delayed workers queue before
      calling btrfs_commit_super() at close_ctree(). This works because at this
      point no one can call btrfs_btree_balance_dirty() or
      btrfs_balance_delayed_items(), and if we end up waiting for any worker to
      complete, btrfs_commit_super() will commit the transaction created by the
      worker.
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7ed0c4db
    • Frieder Schrempf's avatar
      mtd: spinand: Do not erase the block before writing a bad block marker · d389050b
      Frieder Schrempf authored
      commit b645ad39 upstream.
      
      Currently, when marking a block as bad, we use spinand_erase_op() to
      erase the block before writing the marker to the OOB area. Doing so
      without waiting for the operation to finish can lead to the marking
      failing silently and no bad block marker being written to the flash.
      
      In fact, we don't need to do an erase at all before writing the BBM.
      The ECC is disabled for raw accesses to the OOB data and we don't
      need to work around any issues with chips reporting ECC errors, as
      is known to be the case for raw NAND.
      
      Fixes: 7529df46 ("mtd: nand: Add core infrastructure to support SPI NANDs")
      Cc: stable@vger.kernel.org
      Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
      Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Link: https://lore.kernel.org/linux-mtd/20200218100432.32433-4-frieder.schrempf@kontron.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d389050b
    • Frieder Schrempf's avatar
      mtd: spinand: Stop using spinand->oobbuf for buffering bad block markers · a8899631
      Frieder Schrempf authored
      commit 21489375 upstream.
      
      For reading and writing the bad block markers, spinand->oobbuf is
      currently used as a buffer for the marker bytes. During the
      underlying read and write operations to actually get/set the content
      of the OOB area, the content of spinand->oobbuf is reused and changed
      by accessing it through spinand->oobbuf and/or spinand->databuf.
      
      This is a flaw in the original design of the SPI NAND core and at the
      latest from 13c15e07 ("mtd: spinand: Handle the case where
      PROGRAM LOAD does not reset the cache") on, it results in not having
      the bad block marker written at all, as the spinand->oobbuf is
      cleared to 0xff after setting the marker bytes to zero.
      
      To fix it, we now just store the two bytes for the marker on the
      stack and let the read/write operations copy it from/to the page
      buffer later.
      
      Fixes: 7529df46 ("mtd: nand: Add core infrastructure to support SPI NANDs")
      Cc: stable@vger.kernel.org
      Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
      Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Link: https://lore.kernel.org/linux-mtd/20200218100432.32433-2-frieder.schrempf@kontron.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8899631
    • Yilu Lin's avatar
      CIFS: Fix bug which the return value by asynchronous read is error · 9bc02258
      Yilu Lin authored
      commit 97adda8b upstream.
      
      This patch fixes a bug in collect_uncached_read_data() where rc is
      implicitly converted from a signed number to an unsigned number when
      the CIFS asynchronous read fails. As a result, ctx->rc holds a wrong
      value.
      
      Example:
      Share a directory and create a file on the Windows OS.
      Mount the directory to the Linux OS using CIFS.
      On the CIFS client of the Linux OS, invoke the pread interface to
      deliver the read request.
      
      The read length plus the offset of the read request is greater
      than the maximum file size.
      
      In this case, the CIFS server on the Windows OS returns a failure
      message (for example, the return value of
      smb2.nt_status is STATUS_INVALID_PARAMETER).
      
      After receiving the response message, the CIFS client parses
      smb2.nt_status as STATUS_INVALID_PARAMETER
      and converts it to the Linux error code (rdata->result=-22).
      
      Then the CIFS client invokes the collect_uncached_read_data function to
      assign the value of rdata->result to rc, that is, rc=rdata->result=-22.
      
      The type of the ctx->total_len variable is unsigned integer,
      the type of the rc variable is integer, and the type of
      the ctx->rc variable is ssize_t.
      
      Therefore, during the ternary operation, the value of rc is
      automatically converted to an unsigned number. The final result is
      ctx->rc=4294967274. However, the expected result is ctx->rc=-22.
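      
      The conversion described above can be reproduced in user space. The
      following is a minimal sketch, assuming a 64-bit target where ssize_t is
      wider than unsigned int. The function names are hypothetical and only
      mirror the types named in the text; this is not the CIFS code itself.

```c
#include <assert.h>
#include <sys/types.h>

/*
 * Mirrors the problematic expression: 'total_len' is unsigned int and 'rc'
 * is int, so the usual arithmetic conversions give the conditional
 * expression the type unsigned int before it is widened to ssize_t.
 */
ssize_t collect_rc_buggy(unsigned int total_len, int rc)
{
	assert(rc <= 0);	/* rc is 0 or a negative errno value */
	return (rc == 0) ? total_len : rc;
}

/*
 * Casting the unsigned operand to ssize_t makes the common type signed,
 * so a negative rc survives the conversion unchanged.
 */
ssize_t collect_rc_fixed(unsigned int total_len, int rc)
{
	assert(rc <= 0);	/* rc is 0 or a negative errno value */
	return (rc == 0) ? (ssize_t)total_len : rc;
}
```

      Under the stated assumption, collect_rc_buggy(0, -22) yields 4294967274
      while collect_rc_fixed(0, -22) yields -22.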
      
      Signed-off-by: Yilu Lin <linyilu@huawei.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9bc02258
    • Vitaly Kuznetsov's avatar
      KVM: VMX: fix crash cleanup when KVM wasn't used · f9971a89
      Vitaly Kuznetsov authored
      commit dbef2808 upstream.
      
      If KVM wasn't used at all before we crash, the cleanup procedure fails with:
       BUG: unable to handle page fault for address: ffffffffffffffc8
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 23215067 P4D 23215067 PUD 23217067 PMD 0
       Oops: 0000 [#8] SMP PTI
       CPU: 0 PID: 3542 Comm: bash Kdump: loaded Tainted: G      D           5.6.0-rc2+ #823
       RIP: 0010:crash_vmclear_local_loaded_vmcss.cold+0x19/0x51 [kvm_intel]
      
      The root cause is that the loaded_vmcss_on_cpu list is not yet
      initialized: we initialize it in hardware_enable(), but this only
      happens when we start a VM.
      
      Previously, we used to have a bitmap with enabled CPUs and that was
      preventing [masking] the issue.
      
      Initialize the loaded_vmcss_on_cpu list earlier, right before we assign
      the crash_vmclear_loaded_vmcss pointer. The blocked_vcpu_on_cpu list and
      blocked_vcpu_on_cpu_lock are moved along with it for consistency.
      
      Fixes: 31603d4f ("KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200401081348.1345307-1-vkuznets@redhat.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f9971a89
    • Sean Christopherson's avatar
      KVM: x86: Gracefully handle __vmalloc() failure during VM allocation · 4538f42a
      Sean Christopherson authored
      commit d18b2f43 upstream.
      
      Check the result of __vmalloc() to avoid dereferencing a NULL pointer in
      the event that allocation fails.
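      
      The pattern can be illustrated with a user-space analogue. This is only
      a sketch: calloc() stands in for __vmalloc(), and struct vm_state and
      both helpers are hypothetical names, not the KVM structures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct vm_state {		/* hypothetical stand-in for a per-VM object */
	size_t nr_slots;
	uint64_t *slots;
};

/* Returns a fully initialized object, or NULL if any allocation fails. */
struct vm_state *vm_state_alloc(size_t nr_slots)
{
	struct vm_state *vm;

	assert(nr_slots > 0);

	vm = calloc(1, sizeof(*vm));
	if (!vm)
		return NULL;	/* check before the first dereference */

	vm->slots = calloc(nr_slots, sizeof(*vm->slots));
	if (!vm->slots) {
		free(vm);	/* undo the first allocation on failure */
		return NULL;
	}
	vm->nr_slots = nr_slots;
	return vm;
}

void vm_state_free(struct vm_state *vm)
{
	if (!vm)
		return;
	free(vm->slots);
	free(vm);
}
```

      A caller then only has to test the returned pointer once and can
      propagate the failure instead of crashing.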
      
      Fixes: d1e5b0e9 ("kvm: Make VM ioctl do valloc for some archs")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4538f42a
    • Sean Christopherson's avatar
      KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support · a9f890aa
      Sean Christopherson authored
      commit 31603d4f upstream.
      
      VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown
      interrupted a KVM update of the percpu in-use VMCS list.
      
      Because NMIs are not blocked by disabling IRQs, it's possible that
      crash_vmclear_local_loaded_vmcss() could be called while the percpu list
      of VMCSes is being modified, e.g. in the middle of list_add() in
      vmx_vcpu_load_vmcs().  This potential corner case was called out in the
      original commit[*], but the analysis of its impact was wrong.
      
      Skipping the VMCLEARs is wrong because it all but guarantees that a
      loaded, and therefore cached, VMCS will live across kexec and corrupt
      memory in the new kernel.  Corruption will occur because the CPU's VMCS
      cache is non-coherent, i.e. not snooped, and so the writeback of VMCS
      memory on its eviction will overwrite random memory in the new kernel.
      The VMCS will live because the NMI shootdown also disables VMX, i.e. the
      in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the
      VMCS cache on VMXOFF.
      
      Furthermore, interrupting list_add() and list_del() is safe due to
      crash_vmclear_local_loaded_vmcss() using forward iteration.  list_add()
      ensures the new entry is not visible to forward iteration unless the
      entire add completes, via WRITE_ONCE(prev->next, new).  A bad "prev"
      pointer could be observed if the NMI shootdown interrupted list_del() or
      list_add(), but list_for_each_entry() does not consume ->prev.
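      
      The forward-iteration argument can be checked with a small user-space
      model. The sketch below assumes the store order described above (the
      store to prev->next comes last); the type and helper names are
      hypothetical, and plain stores stand in for WRITE_ONCE().

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Insert 'entry' right after 'head', but perform only the first 'steps'
 * of the four pointer stores (0..4). Stopping early models an NMI that
 * arrives in the middle of the insertion. The store to prev->next, which
 * publishes the entry to forward walkers, is the last one.
 */
void list_add_partial(struct list_node *entry, struct list_node *head,
		      int steps)
{
	struct list_node *prev = head, *next = head->next;

	assert(steps >= 0 && steps <= 4);
	if (steps >= 1)
		next->prev = entry;
	if (steps >= 2)
		entry->next = next;
	if (steps >= 3)
		entry->prev = prev;
	if (steps >= 4)
		prev->next = entry;	/* WRITE_ONCE() in the kernel */
}

/* Forward walk that only follows ->next and never reads ->prev. */
size_t list_count_forward(const struct list_node *head)
{
	const struct list_node *pos;
	size_t n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```

      For every interruption point the forward walk terminates and sees
      either the old list or the complete new list, never a half-linked
      entry.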
      
      In addition to removing the temporary disabling of VMCLEAR, open code
      loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that
      the VMCS is deleted from the list only after it's been VMCLEAR'd.
      Deleting the VMCS before VMCLEAR would allow a race where the NMI
      shootdown could arrive between list_del() and vmcs_clear() and thus
      neither flow would execute a successful VMCLEAR.  Alternatively, more
      code could be moved into loaded_vmcs_init(), but that gets rather silly
      as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb()
      and would need to work around the list_del().
      
      Update the smp_*() comments related to the list manipulation, and
      opportunistically reword them to improve clarity.
      
      [*] https://patchwork.kernel.org/patch/1675731/#3720461
      
      Fixes: 8f536b76 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a9f890aa
    • Sean Christopherson's avatar
      KVM: x86: Allocate new rmap and large page tracking when moving memslot · 4a0efabb
      Sean Christopherson authored
      commit edd4fa37 upstream.
      
      Reallocate a rmap array and recalculate large page compatibility when
      moving an existing memslot to correctly handle the alignment properties
      of the new memslot.  The number of rmap entries required at each level
      is dependent on the alignment of the memslot's base gfn with respect to
      that level, e.g. moving a large-page aligned memslot so that it becomes
      unaligned will increase the number of rmap entries needed at the now
      unaligned level.
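      
      The dependence on alignment can be made concrete with a small
      calculation. This is a sketch under the assumption that each level
      covers 9 more gfn bits than the level below it (x86 paging); the macro
      and function names are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: level 1 = 4K pages, and each level adds 9 gfn bits (x86). */
#define HPAGE_GFN_SHIFT(level)	(((level) - 1) * 9)

/*
 * Number of per-level entries needed to cover the gfn range
 * [base_gfn, base_gfn + npages - 1]: index of the last gfn minus index
 * of the first gfn at that level, plus one.
 */
uint64_t slot_entries_at_level(uint64_t base_gfn, uint64_t npages, int level)
{
	uint64_t first, last;

	assert(npages > 0 && level >= 1 && level <= 3);
	first = base_gfn >> HPAGE_GFN_SHIFT(level);
	last = (base_gfn + npages - 1) >> HPAGE_GFN_SHIFT(level);
	return last - first + 1;
}
```

      With these assumptions, a 512-page slot at base gfn 0x200 needs one
      level-2 entry, while the same slot moved to base gfn 0x201 straddles
      two large pages and needs two.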
      
      Not updating the rmap array is the most obvious bug, as KVM accesses
      garbage data beyond the end of the rmap.  KVM interprets the bad data as
      pointers, leading to non-canonical #GPs, unexpected #PFs, etc...
      
        general protection fault: 0000 [#1] SMP
        CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:rmap_get_first+0x37/0x50 [kvm]
        Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3
        RSP: 0018:ffffc9000021bbc8 EFLAGS: 00010246
        RAX: ffff00617461642e RBX: ffff00617461642e RCX: 0000000000000012
        RDX: ffff88827400f568 RSI: ffffc9000021bbe0 RDI: ffff88827400f570
        RBP: 0010000000000000 R08: ffffc9000021bd00 R09: ffffc9000021bda8
        R10: ffffc9000021bc48 R11: 0000000000000000 R12: 0030000000000000
        R13: 0000000000000000 R14: ffff88827427d700 R15: ffffc9000021bce8
        FS:  00007f7eda014700(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f7ed9216ff8 CR3: 0000000274391003 CR4: 0000000000162eb0
        Call Trace:
         kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm]
         __kvm_set_memory_region.part.64+0x559/0x960 [kvm]
         kvm_set_memory_region+0x45/0x60 [kvm]
         kvm_vm_ioctl+0x30f/0x920 [kvm]
         do_vfs_ioctl+0xa1/0x620
         ksys_ioctl+0x66/0x70
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x4c/0x170
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f7ed9911f47
        Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48
        RSP: 002b:00007ffc00937498 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        RAX: ffffffffffffffda RBX: 0000000001ab0010 RCX: 00007f7ed9911f47
        RDX: 0000000001ab1350 RSI: 000000004020ae46 RDI: 0000000000000004
        RBP: 000000000000000a R08: 0000000000000000 R09: 00007f7ed9214700
        R10: 00007f7ed92149d0 R11: 0000000000000246 R12: 00000000bffff000
        R13: 0000000000000003 R14: 00007f7ed9215000 R15: 0000000000000000
        Modules linked in: kvm_intel kvm irqbypass
        ---[ end trace 0c5f570b3358ca89 ]---
      
      The disallow_lpage tracking is more subtle.  Failure to update results
      in KVM creating large pages when it shouldn't, either due to stale data
      or again due to indexing beyond the end of the metadata arrays, which
      can lead to memory corruption and/or leaking data to guest/userspace.
      
      Note, the arrays for the old memslot are freed by the unconditional call
      to kvm_free_memslot() in __kvm_set_memory_region().
      
      Fixes: 05da4558 ("KVM: MMU: large page support")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a0efabb
    • David Hildenbrand's avatar
      KVM: s390: vsie: Fix delivery of addressing exceptions · de2ac8a7
      David Hildenbrand authored
      commit 4d4cee96 upstream.
      
      Whenever we get an -EFAULT, we failed to read in guest 2 physical
      address space. Such addressing exceptions are reported via a program
      intercept to the nested hypervisor.
      
      Since we faked the intercept, we have to return to guest 2. Instead,
      right now we would be returning -EFAULT from the intercept handler,
      eventually crashing the VM.
      The correct thing to do is to return 1, as rc == 1 is the internal
      representation of "we have to go back into g2".
      
      Addressing exceptions can only happen if the g2->g3 page tables
      reference invalid g2 addresses (say, either a table or the final page is
      not accessible), so something that basically never happens in sane
      environments.
      
      Identified by manual code inspection.
      
      Fixes: a3508fbe ("KVM: s390: vsie: initial support for nested virtualization")
      Cc: <stable@vger.kernel.org> # v4.8+
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Link: https://lore.kernel.org/r/20200403153050.20569-3-david@redhat.com
      Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      [borntraeger@de.ibm.com: fix patch description]
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      de2ac8a7
    • David Hildenbrand's avatar
      KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks · 50a59d2d
      David Hildenbrand authored
      commit a1d032a4 upstream.
      
      In case we have a region 1 ASCE, the following calculation
      (31 + ((gmap->asce & _ASCE_TYPE_MASK) >> 2)*11)
      results in 64. As shifts beyond the size are undefined, the compiler is
      free to use instructions like sllg. sllg will only use 6 bits of the
      shift value (here 64), resulting in no shift at all. That means that ALL
      addresses will be rejected.
      
      This can result in endless loops, e.g. when the prefix cannot get mapped.
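      
      One way to avoid the undefined shift is to special-case the 64-bit
      width before shifting. The following is a user-space sketch of that
      idea, not the actual patch; the function names are hypothetical, and
      'type' stands for (asce & _ASCE_TYPE_MASK) >> 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * 'type' ranges from 0 (segment table) to 3 (region 1 table), so the
 * result is 31, 42, 53 or 64 address bits.
 */
unsigned int asce_addr_bits(unsigned int type)
{
	assert(type <= 3);
	return 31 + type * 11;
}

/* True if 'addr' lies beyond the address range covered by the ASCE. */
bool addr_beyond_asce(uint64_t addr, unsigned int type)
{
	unsigned int bits = asce_addr_bits(type);

	if (bits >= 64)
		return false;	/* region 1 spans all 64 bits; never shift by 64 */
	return (addr >> bits) != 0;
}
```

      With the explicit check, a region 1 ASCE accepts every 64-bit address
      instead of depending on what the compiler emits for a shift by 64.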
      
      Fixes: 4be130a0 ("s390/mm: add shadow gmap support")
      Tested-by: Janosch Frank <frankja@linux.ibm.com>
      Reported-by: Janosch Frank <frankja@linux.ibm.com>
      Cc: <stable@vger.kernel.org> # v4.8+
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Link: https://lore.kernel.org/r/20200403153050.20569-2-david@redhat.com
      Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      [borntraeger@de.ibm.com: fix patch description, remove WARN_ON_ONCE]
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      50a59d2d
    • Sean Christopherson's avatar
      KVM: nVMX: Properly handle userspace interrupt window request · deecbb36
      Sean Christopherson authored
      commit a1c77abb upstream.
      
      Return true for vmx_interrupt_allowed() if the vCPU is in L2 and L1 has
      external interrupt exiting enabled.  IRQs are never blocked in hardware
      if the CPU is in the guest (L2 from L1's perspective) when IRQs trigger
      VM-Exit.
      
      The new check percolates up to kvm_vcpu_ready_for_interrupt_injection()
      and thus vcpu_run(), and so KVM will exit to userspace if userspace has
      requested an interrupt window (to inject an IRQ into L1).
      
      Remove the @external_intr param from vmx_check_nested_events(), which is
      actually an indicator that userspace wants an interrupt window, e.g.
      it's named @req_int_win further up the stack.  Injecting a VM-Exit into
      L1 to try and bounce out to L0 userspace is all kinds of broken and is
      no longer necessary.
      
      Remove the hack in nested_vmx_vmexit() that attempted to work around the
      breakage in vmx_check_nested_events() by only filling interrupt info if
      there's an actual interrupt pending.  The hack actually made things
      worse because it caused KVM to _never_ fill interrupt info when the
      LAPIC resides in userspace (kvm_cpu_has_interrupt() queries
      interrupt.injected, which is always cleared by prepare_vmcs12() before
      reaching the hack in nested_vmx_vmexit()).
      
      Fixes: 6550c4df ("KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"")
      Cc: stable@vger.kernel.org
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      deecbb36
    • Thomas Gleixner's avatar
      x86/entry/32: Add missing ASM_CLAC to general_protection entry · 7460d17c
      Thomas Gleixner authored
      commit 3d51507f upstream.
      
      All exception entry points must have ASM_CLAC right at the
      beginning. The general_protection entry is missing one.
      
      Fixes: e59d1b0a ("x86-32, smap: Add STAC/CLAC instructions to 32-bit kernel entry")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20200225220216.219537887@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7460d17c
    • Eric W. Biederman's avatar
      signal: Extend exec_id to 64bits · a2a1be2d
      Eric W. Biederman authored
      commit d1e7fd64 upstream.
      
      Replace the 32bit exec_id with a 64bit exec_id to make it impossible
      to wrap the exec_id counter.  With care, an attacker can cause exec_id
      to wrap and send arbitrary signals to a newly exec'd parent.  This
      bypasses the signal sending checks if the parent changes their
      credentials during exec.
      
      The severity of this problem can be seen in my limited testing: with
      a 32bit exec_id it can take as little as 19s to exec 65536 times.
      Which means that it can take as little as 14 days to wrap a 32bit
      exec_id.  Adam Zabrocki has succeeded wrapping the self_exec_id in 7
      days.  Even my slower timing is in the uptime of a typical server.
      Which means self_exec_id is simply a speed bump today, and if exec
      gets noticeably faster self_exec_id won't even be a speed bump.
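      
      The 14 day figure follows directly from the measured rate: 2^32 execs
      at 65536 execs per 19s is 65536 * 19s = 1245184s, or about 14.4 days.
      A small sketch of that arithmetic (the helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Seconds needed to perform 2^32 execs, given that 'batch_execs' execs
 * take 'batch_seconds' seconds. Assumes the product fits in 64 bits.
 */
uint64_t seconds_to_wrap_u32(uint64_t batch_execs, uint64_t batch_seconds)
{
	assert(batch_execs > 0);
	return ((UINT64_C(1) << 32) * batch_seconds) / batch_execs;
}

/* Convert a duration in seconds to (fractional) days. */
double seconds_to_days(uint64_t seconds)
{
	return (double)seconds / 86400.0;
}
```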
      
      Extending self_exec_id to 64bits introduces a problem on 32bit
      architectures where reading self_exec_id is no longer atomic and can
      take two read instructions.  Which means that it is possible to hit
      a window where the read value of exec_id does not match the written
      value.  So with very lucky timing after this change this still
      remains exploitable.
      
      I have updated the update of exec_id on exec to use WRITE_ONCE
      and the read of exec_id in do_notify_parent to use READ_ONCE
      to make it clear that there is no locking between these two
      locations.
      
      Link: https://lore.kernel.org/kernel-hardening/20200324215049.GA3710@pi3.com.pl
      Fixes: 2.3.23pre2
      Cc: stable@vger.kernel.org
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a2a1be2d
    • Remi Pommarel's avatar
      ath9k: Handle txpower changes even when TPC is disabled · 19e119d4
      Remi Pommarel authored
      commit 968ae2ca upstream.
      
      When TPC is disabled, the IEEE80211_CONF_CHANGE_POWER event can be
      handled to reconfigure the HW's maximum txpower.
      
      This fixes the 0dBm txpower setting when a user attaches to an interface
      for the first time with the following scenario:
      
      ieee80211_do_open()
          ath9k_add_interface()
              ath9k_set_txpower() /* Set TX power with not yet initialized
                                     sc->hw->conf.power_level */
      
          ieee80211_hw_config() /* Initialize sc->hw->conf.power_level and
                                   raise IEEE80211_CONF_CHANGE_POWER */
      
          ath9k_config() /* IEEE80211_CONF_CHANGE_POWER is ignored */
      
      This issue can be reproduced with the following:
      
        $ modprobe -r ath9k
        $ modprobe ath9k
        $ wpa_supplicant -i wlan0 -c /tmp/wpa.conf &
        $ iw dev /* Here TX power is either 0 or 3 depending on RF chain */
        $ killall wpa_supplicant
        $ iw dev /* TX power goes back to calibrated value and subsequent
                    calls will be fine */
      
      Fixes: 283dd119 ("ath9k: add per-vif TX power capability")
      Cc: stable@vger.kernel.org
      Signed-off-by: Remi Pommarel <repk@triplefau.lt>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      19e119d4
    • Gustavo A. R. Silva's avatar
      MIPS: OCTEON: irq: Fix potential NULL pointer dereference · cde7e660
      Gustavo A. R. Silva authored
      commit 792a402c upstream.
      
      There is a potential NULL pointer dereference in case kzalloc()
      fails and returns NULL.
      
      Fix this by adding a NULL check on *cd*
      
      This bug was detected with the help of Coccinelle.
      
      Fixes: 64b139f9 ("MIPS: OCTEON: irq: add CIB and other fixes")
      Cc: stable@vger.kernel.org
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cde7e660
    • Huacai Chen's avatar
      MIPS/tlbex: Fix LDDIR usage in setup_pw() for Loongson-3 · 67dea3c7
      Huacai Chen authored
      commit d191aaff upstream.
      
      LDDIR/LDPTE is Loongson-3's acceleration for Page Table Walking. If BD
      (Base Directory, the 4th page directory) is not enabled, then GDOffset
      is biased by BadVAddr[63:62]. So, if GDOffset (aka. BadVAddr[47:36] for
      Loongson-3) is big enough, "0b11(BadVAddr[63:62])|BadVAddr[47:36]|...."
      can be far beyond pg_swapper_dir. This means the pg_swapper_dir may NOT be
      accessed by LDDIR correctly, so fix it by setting PWDirExt in CP0_PWCtl.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Pei Huang <huangpei@loongson.cn>
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      67dea3c7
    • Vasily Averin's avatar
      pstore: pstore_ftrace_seq_next should increase position index · 76b48e98
      Vasily Averin authored
      commit 6c871b73 upstream.
      
      In Aug 2018 NeilBrown noticed
      commit 1f4aace6 ("fs/seq_file.c: simplify seq_file iteration code and interface")
      "Some ->next functions do not increment *pos when they return NULL...
      Note that such ->next functions are buggy and should be fixed.
      A simple demonstration is
      
       dd if=/proc/swaps bs=1000 skip=1
      
      Choose any block size larger than the size of /proc/swaps. This will
      always show the whole last line of /proc/swaps"
      
      /proc/swaps output was fixed recently, however there are a lot of other
      affected files, and one of them is related to the pstore subsystem.
      
      If the .next function does not change the position index, the following
      .show function will repeat the output related to the current position index.
      
      There are at least 2 related problems:
      - read after lseek beyond end of file, described above by NeilBrown:
        "dd if=<AFFECTED_FILE> bs=1000 skip=1" will generate the whole last line
      - read after lseek into the middle of the last line will output the
        expected rest of the last line but then repeat the whole last line
        once again.
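      
      The difference between a ->next callback that leaves the position
      untouched at the end and one that always advances it can be sketched
      in user space. This is an illustration only: struct record_iter and
      both functions are hypothetical and are not the pstore or seq_file
      code.

```c
#include <assert.h>
#include <stddef.h>

struct record_iter {		/* hypothetical stand-in for iterator state */
	const int *items;
	size_t count;
};

/* Buggy shape: at the end it returns NULL without touching *pos. */
const int *iter_next_buggy(struct record_iter *it, long long *pos)
{
	assert(*pos >= 0);
	if ((size_t)(*pos + 1) >= it->count)
		return NULL;
	++*pos;
	return &it->items[*pos];
}

/* Fixed shape: *pos is advanced unconditionally, then range-checked. */
const int *iter_next_fixed(struct record_iter *it, long long *pos)
{
	assert(*pos >= 0);
	++*pos;
	if ((size_t)*pos >= it->count)
		return NULL;
	return &it->items[*pos];
}
```

      With three records and *pos at the last one, the buggy variant
      returns NULL and leaves *pos at 2, so the caller believes it is still
      on the last record; the fixed variant returns NULL with *pos at 3.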
      
      If the .show() function generates multi-line output (like
      pstore_ftrace_seq_show() does ?), the following bash script cycles endlessly
      
       $ q=;while read -r r;do echo "$((++q)) $r";done < AFFECTED_FILE
      
      Unfortunately I'm not familiar enough with the pstore subsystem and was
      unable to find an affected pstore-related file on my test node.
      
      Cc: stable@vger.kernel.org
      Fixes: 1f4aace6 ("fs/seq_file.c: simplify seq_file iteration code ...")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=206283
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Link: https://lore.kernel.org/r/4e49830d-4c88-0171-ee24-1ee540028dad@virtuozzo.com
      [kees: with robustness tweak from Joel Fernandes <joelaf@google.com>]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      76b48e98
    • Sungbo Eo's avatar
      irqchip/versatile-fpga: Apply clear-mask earlier · 977cab66
      Sungbo Eo authored
      commit 6a214a28 upstream.
      
      Clear the controller's own IRQs before the parent IRQ gets enabled, so
      that the remaining IRQs do not accidentally interrupt the parent IRQ
      controller.
      
      This patch also fixes a reboot bug on the OX820 SoC, where the remaining
      rps-timer IRQ raises a GIC interrupt that is left pending. After that,
      the rps-timer IRQ is cleared during driver initialization, and there's
      no IRQ left in rps-irq when local_irq_enable() is called, which triggers
      the error message "unexpected IRQ trap".
      
      Fixes: bdd272cb ("irqchip: versatile FPGA: support cascaded interrupts from DT")
      Signed-off-by: Sungbo Eo <mans0n@gorani.run>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20200321133842.2408823-1-mans0n@gorani.run
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      977cab66
    • Yang Xu's avatar
      KEYS: reaching the keys quotas correctly · 14b96359
      Yang Xu authored
      commit 2e356101 upstream.
      
      Currently, when we add a new user key, the call trace is as follows:
      
      add_key()
        key_create_or_update()
          key_alloc()
          __key_instantiate_and_link
            generic_key_instantiate
              key_payload_reserve
                ......
      
      Since commit a08bf91c ("KEYS: allow reaching the keys quotas exactly"),
      we can reach max bytes/keys in key_alloc, but we forgot to remove this
      limit when we reserve space for the payload in key_payload_reserve. So we
      can only reach max keys but not max bytes when there is a delta between
      plen and type->def_datalen. Remove this limit when instantiating the key,
      so that we stay consistent with key_alloc.
      
      Also, fix a similar problem in keyctl_chown_key().
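
The difference between "cannot reach the quota" and "can reach it exactly" can be sketched as two reservation checks. This is a userspace illustration only: `struct demo_quota` and both `demo_reserve_*` functions are hypothetical names, and the assumption that the old limit behaves like a non-strict comparison against the maximum is made for the sketch, not quoted from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-user byte quota. */
struct demo_quota {
	unsigned int used_bytes;
	unsigned int max_bytes;
};

/* Assumed old behaviour: a reservation landing exactly on the limit fails. */
static bool demo_reserve_old(struct demo_quota *q, unsigned int delta)
{
	assert(q->used_bytes <= q->max_bytes);
	/* Comparing against the remaining room avoids unsigned overflow. */
	if (delta >= q->max_bytes - q->used_bytes)
		return false;
	q->used_bytes += delta;
	return true;
}

/* Assumed new behaviour: only a reservation that exceeds the limit fails. */
static bool demo_reserve_new(struct demo_quota *q, unsigned int delta)
{
	assert(q->used_bytes <= q->max_bytes);
	if (delta > q->max_bytes - q->used_bytes)
		return false;
	q->used_bytes += delta;
	return true;
}
```

With 90 of 100 bytes in use, a 10-byte reservation is rejected by demo_reserve_old() but accepted by demo_reserve_new(), and an 11-byte reservation is rejected by both.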
      
      Fixes: 0b77f5bf ("keys: make the keyring quotas controllable through /proc/sys")
      Fixes: a08bf91c ("KEYS: allow reaching the keys quotas exactly")
      Cc: stable@vger.kernel.org # 5.0.x
      Cc: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
      Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      14b96359
    • Vasily Averin's avatar
      tpm: tpm2_bios_measurements_next should increase position index · 64157692
      Vasily Averin authored
      commit f9bf8adb upstream.
      
      If the .next function does not change the position index,
      the following .show function will repeat the output related
      to the current position index.
      
      For /sys/kernel/security/tpm0/binary_bios_measurements:
      1) read after lseek beyond end of file generates the whole last line.
      2) read after lseek to the middle of the last line generates the
      expected end of the last line and, unexpectedly, the whole last line once again.
      
      Cc: stable@vger.kernel.org # 4.19.x
      Fixes: 1f4aace6 ("fs/seq_file.c: simplify seq_file iteration code ...")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=206283
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      64157692
    • Vasily Averin's avatar
      tpm: tpm1_bios_measurements_next should increase position index · 1da36bed
      Vasily Averin authored
      commit d7a47b96 upstream.
      
      If the .next function does not change the position index,
      the following .show function will repeat the output related
      to the current position index.
      
      In the case of /sys/kernel/security/tpm0/ascii_bios_measurements
      and binary_bios_measurements:
      1) read after lseek beyond end of file generates the whole last line.
      2) read after lseek to the middle of the last line generates the
      expected end of the last line and, unexpectedly, the whole last line once again.
      
      Cc: stable@vger.kernel.org # 4.19.x
      Fixes: 1f4aace6 ("fs/seq_file.c: simplify seq_file iteration code ...")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=206283
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1da36bed
    • Matthew Garrett's avatar
      tpm: Don't make log failures fatal · 7c775e8e
      Matthew Garrett authored
      commit 805fa88e upstream.
      
      If a TPM is in disabled state, it's reasonable for it to have an empty
      log. Bailing out of probe in this case means that the PPI interface
      isn't available, so there's no way to then enable the TPM from the OS.
      In general it seems reasonable to ignore log errors - they shouldn't
      interfere with any other TPM functionality.
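
The idea of treating a log failure as non-fatal can be sketched as a probe routine that reports the error and carries on. Everything here is a hypothetical userspace illustration: `struct demo_chip`, `demo_log_setup()` and `demo_probe()` are made-up names, and the simulated return code stands in for whatever the real log setup would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical device state; not the kernel's TPM chip structure. */
struct demo_chip {
	bool log_available;
	bool ppi_available;
};

/* Hypothetical log setup step: 0 on success, negative value on failure. */
static int demo_log_setup(struct demo_chip *chip, int simulated_rc)
{
	chip->log_available = (simulated_rc == 0);
	return simulated_rc;
}

/* Probe that reports a log failure but still completes the later steps. */
static int demo_probe(struct demo_chip *chip, int simulated_log_rc)
{
	int rc;

	assert(chip != NULL);
	rc = demo_log_setup(chip, simulated_log_rc);
	if (rc != 0)
		fprintf(stderr, "demo: event log unavailable (%d), continuing\n", rc);

	/* Later setup (here: the PPI interface) runs regardless of the log. */
	chip->ppi_available = true;
	return 0;
}
```

With this structure, a failing log setup only leaves log_available false; the probe still returns 0 and the interface needed to enable the device remains reachable.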
      
      Signed-off-by: Matthew Garrett <matthewgarrett@google.com>
      Cc: stable@vger.kernel.org # 4.19.x
      Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
      Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c775e8e