1. 28 Feb, 2020 40 commits
    • Oliver Upton's avatar
      KVM: nVMX: Check IO instruction VM-exit conditions · 85dd0eb7
      Oliver Upton authored
      commit 35a57134 upstream.
      
      Consult the 'unconditional IO exiting' and 'use IO bitmaps' VM-execution
      controls when checking instruction interception. If the 'use IO bitmaps'
      VM-execution control is 1, check the instruction access against the IO
      bitmaps to determine if the instruction causes a VM-exit.
      Signed-off-by: default avatarOliver Upton <oupton@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      85dd0eb7
    • Oliver Upton's avatar
      KVM: nVMX: Refactor IO bitmap checks into helper function · e5c0857b
      Oliver Upton authored
      commit e71237d3 upstream.
      
      Checks against the IO bitmap are useful for both instruction emulation
      and VM-exit reflection. Refactor the IO bitmap checks into a helper
      function.
      Signed-off-by: default avatarOliver Upton <oupton@google.com>
      Reviewed-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e5c0857b
    • Eric Biggers's avatar
      ext4: fix race between writepages and enabling EXT4_EXTENTS_FL · 8cf20fb7
      Eric Biggers authored
      commit cb85f4d2 upstream.
      
      If EXT4_EXTENTS_FL is set on an inode while ext4_writepages() is running
      on it, the following warning in ext4_add_complete_io() can be hit:
      
      WARNING: CPU: 1 PID: 0 at fs/ext4/page-io.c:234 ext4_put_io_end_defer+0xf0/0x120
      
      Here's a minimal reproducer (not 100% reliable) (root isn't required):
      
              while true; do
                      sync
              done &
              while true; do
                      rm -f file
                      touch file
                      chattr -e file
                      echo X >> file
                      chattr +e file
              done
      
      The problem is that in ext4_writepages(), ext4_should_dioread_nolock()
      (which only returns true on extent-based files) is checked once to set
      the number of reserved journal credits, and also again later to select
      the flags for ext4_map_blocks() and copy the reserved journal handle to
      ext4_io_end::handle.  But if EXT4_EXTENTS_FL is being concurrently set,
      the first check can see dioread_nolock disabled while the later one can
      see it enabled, causing the reserved handle to unexpectedly be NULL.
      
      Since changing EXT4_EXTENTS_FL is uncommon, and there may be other races
      related to doing so as well, fix this by synchronizing changing
      EXT4_EXTENTS_FL with ext4_writepages() via the existing
      s_writepages_rwsem (previously called s_journal_flag_rwsem).
      
      This was originally reported by syzbot without a reproducer at
      https://syzkaller.appspot.com/bug?extid=2202a584a00fffd19fbf,
      but now that dioread_nolock is the default I also started seeing this
      when running syzkaller locally.
      
      Link: https://lore.kernel.org/r/20200219183047.47417-3-ebiggers@kernel.org
      Reported-by: syzbot+2202a584a00fffd19fbf@syzkaller.appspotmail.com
      Fixes: 6b523df4 ("ext4: use transaction reservation for extent conversion in ext4_end_io")
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8cf20fb7
    • Eric Biggers's avatar
      ext4: rename s_journal_flag_rwsem to s_writepages_rwsem · 48fdbe2a
      Eric Biggers authored
      commit bbd55937 upstream.
      
      In preparation for making s_journal_flag_rwsem synchronize
      ext4_writepages() with changes to both the EXTENTS and JOURNAL_DATA
      flags (rather than just JOURNAL_DATA as it does currently), rename it to
      s_writepages_rwsem.
      
      Link: https://lore.kernel.org/r/20200219183047.47417-2-ebiggers@kernel.orgSigned-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      48fdbe2a
    • Jan Kara's avatar
      ext4: fix mount failure with quota configured as module · b7dc081c
      Jan Kara authored
      commit 9db176bc upstream.
      
      When CONFIG_QFMT_V2 is configured as a module, the test in
      ext4_feature_set_ok() fails and so mount of filesystems with quota or
      project features fails. Fix the test to use IS_ENABLED macro which
      works properly even for modules.
      
      Link: https://lore.kernel.org/r/20200221100835.9332-1-jack@suse.cz
      Fixes: d65d87a0 ("ext4: improve explanation of a mount failure caused by a misconfigured kernel")
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b7dc081c
    • Suraj Jitindar Singh's avatar
      ext4: fix potential race between s_flex_groups online resizing and access · 50017cec
      Suraj Jitindar Singh authored
      commit 7c990728 upstream.
      
      During an online resize an array of s_flex_groups structures gets replaced
      so it can get enlarged. If there is a concurrent access to the array and
      this memory has been reused then this can lead to an invalid memory access.
      
      The s_flex_group array has been converted into an array of pointers rather
      than an array of structures. This is to ensure that the information
      contained in the structures cannot get out of sync during a resize due to
      an accessor updating the value in the old structure after it has been
      copied but before the array pointer is updated. Since the structures them-
      selves are no longer copied but only the pointers to them this case is
      mitigated.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
      Link: https://lore.kernel.org/r/20200221053458.730016-4-tytso@mit.eduSigned-off-by: default avatarSuraj Jitindar Singh <surajjs@amazon.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50017cec
    • Suraj Jitindar Singh's avatar
      ext4: fix potential race between s_group_info online resizing and access · 7720966a
      Suraj Jitindar Singh authored
      commit df3da4ea upstream.
      
      During an online resize an array of pointers to s_group_info gets replaced
      so it can get enlarged. If there is a concurrent access to the array in
      ext4_get_group_info() and this memory has been reused then this can lead to
      an invalid memory access.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
      Link: https://lore.kernel.org/r/20200221053458.730016-3-tytso@mit.eduSigned-off-by: default avatarSuraj Jitindar Singh <surajjs@amazon.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarBalbir Singh <sblbir@amazon.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7720966a
    • Theodore Ts'o's avatar
      ext4: fix potential race between online resizing and write operations · cc9948ab
      Theodore Ts'o authored
      commit 1d0c3924 upstream.
      
      During an online resize an array of pointers to buffer heads gets
      replaced so it can get enlarged.  If there is a racing block
      allocation or deallocation which uses the old array, and the old array
      has gotten reused this can lead to a GPF or some other random kernel
      memory getting modified.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
      Link: https://lore.kernel.org/r/20200221053458.730016-2-tytso@mit.eduReported-by: default avatarSuraj Jitindar Singh <surajjs@amazon.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc9948ab
    • Shijie Luo's avatar
      ext4: add cond_resched() to __ext4_find_entry() · 38884609
      Shijie Luo authored
      commit 9424ef56 upstream.
      
      We tested a soft lockup problem in linux 4.19 which could also
      be found in linux 5.x.
      
      When dir inode takes up a large number of blocks, and if the
      directory is growing when we are searching, it's possible the
      restart branch could be called many times, and the do while loop
      could hold cpu a long time.
      
      Here is the call trace in linux 4.19.
      
      [  473.756186] Call trace:
      [  473.756196]  dump_backtrace+0x0/0x198
      [  473.756199]  show_stack+0x24/0x30
      [  473.756205]  dump_stack+0xa4/0xcc
      [  473.756210]  watchdog_timer_fn+0x300/0x3e8
      [  473.756215]  __hrtimer_run_queues+0x114/0x358
      [  473.756217]  hrtimer_interrupt+0x104/0x2d8
      [  473.756222]  arch_timer_handler_virt+0x38/0x58
      [  473.756226]  handle_percpu_devid_irq+0x90/0x248
      [  473.756231]  generic_handle_irq+0x34/0x50
      [  473.756234]  __handle_domain_irq+0x68/0xc0
      [  473.756236]  gic_handle_irq+0x6c/0x150
      [  473.756238]  el1_irq+0xb8/0x140
      [  473.756286]  ext4_es_lookup_extent+0xdc/0x258 [ext4]
      [  473.756310]  ext4_map_blocks+0x64/0x5c0 [ext4]
      [  473.756333]  ext4_getblk+0x6c/0x1d0 [ext4]
      [  473.756356]  ext4_bread_batch+0x7c/0x1f8 [ext4]
      [  473.756379]  ext4_find_entry+0x124/0x3f8 [ext4]
      [  473.756402]  ext4_lookup+0x8c/0x258 [ext4]
      [  473.756407]  __lookup_hash+0x8c/0xe8
      [  473.756411]  filename_create+0xa0/0x170
      [  473.756413]  do_mkdirat+0x6c/0x140
      [  473.756415]  __arm64_sys_mkdirat+0x28/0x38
      [  473.756419]  el0_svc_common+0x78/0x130
      [  473.756421]  el0_svc_handler+0x38/0x78
      [  473.756423]  el0_svc+0x8/0xc
      [  485.755156] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [tmp:5149]
      
      Add cond_resched() to avoid soft lockup and to provide a better
      system responding.
      
      Link: https://lore.kernel.org/r/20200215080206.13293-1-luoshijie1@huawei.comSigned-off-by: default avatarShijie Luo <luoshijie1@huawei.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      38884609
    • Qian Cai's avatar
      ext4: fix a data race in EXT4_I(inode)->i_disksize · 9b6e9091
      Qian Cai authored
      commit 35df4299 upstream.
      
      EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by
      KCSAN,
      
       BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4]
      
       write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127:
        ext4_write_end+0x4e3/0x750 [ext4]
        ext4_update_i_disksize at fs/ext4/ext4.h:3032
        (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046
        (inlined by) ext4_write_end at fs/ext4/inode.c:1287
        generic_perform_write+0x208/0x2a0
        ext4_buffered_write_iter+0x11f/0x210 [ext4]
        ext4_file_write_iter+0xce/0x9e0 [ext4]
        new_sync_write+0x29c/0x3b0
        __vfs_write+0x92/0xa0
        vfs_write+0x103/0x260
        ksys_write+0x9d/0x130
        __x64_sys_write+0x4c/0x60
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37:
        ext4_writepages+0x10ac/0x1d00 [ext4]
        mpage_map_and_submit_extent at fs/ext4/inode.c:2468
        (inlined by) ext4_writepages at fs/ext4/inode.c:2772
        do_writepages+0x5e/0x130
        __writeback_single_inode+0xeb/0xb20
        writeback_sb_inodes+0x429/0x900
        __writeback_inodes_wb+0xc4/0x150
        wb_writeback+0x4bd/0x870
        wb_workfn+0x6b4/0x960
        process_one_work+0x54c/0xbe0
        worker_thread+0x80/0x650
        kthread+0x1e0/0x200
        ret_from_fork+0x27/0x50
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G        W  O L 5.5.0-next-20200204+ #5
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
       Workqueue: writeback wb_workfn (flush-7:0)
      
      Since only the read is operating as lockless (outside of the
      "i_data_sem"), load tearing could introduce a logic bug. Fix it by
      adding READ_ONCE() for the read and WRITE_ONCE() for the write.
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Link: https://lore.kernel.org/r/1581085751-31793-1-git-send-email-cai@lca.pwSigned-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b6e9091
    • Lyude Paul's avatar
      drm/nouveau/kms/gv100-: Re-set LUT after clearing for modesets · 0e3a6e86
      Lyude Paul authored
      [ Upstream commit f287d3d1 ]
      
      While certain modeset operations on gv100+ need us to temporarily
      disable the LUT, we make the mistake of sometimes neglecting to
      reprogram the LUT after such modesets. In particular, moving a head from
      one encoder to another seems to trigger this quite often. GV100+ is very
      picky about having a LUT in most scenarios, so this causes the display
      engine to hang with the following error code:
      
      disp: chid 1 stat 00005080 reason 5 [INVALID_STATE] mthd 0200 data
      00000001 code 0000002d)
      
      So, fix this by always re-programming the LUT if we're clearing it in a
      state where the wndw is still visible, and has a XLUT handle programmed.
      Signed-off-by: default avatarLyude Paul <lyude@redhat.com>
      Fixes: facaed62 ("drm/nouveau/kms/gv100: initial support")
      Cc: <stable@vger.kernel.org> # v4.18+
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      0e3a6e86
    • Alexander Potapenko's avatar
      lib/stackdepot.c: fix global out-of-bounds in stack_slabs · da3418ad
      Alexander Potapenko authored
      [ Upstream commit 305e519c ]
      
      Walter Wu has reported a potential case in which init_stack_slab() is
      called after stack_slabs[STACK_ALLOC_MAX_SLABS - 1] has already been
      initialized.  In that case init_stack_slab() will overwrite
      stack_slabs[STACK_ALLOC_MAX_SLABS], which may result in a memory
      corruption.
      
      Link: http://lkml.kernel.org/r/20200218102950.260263-1-glider@google.com
      Fixes: cd11016e ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB")
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Reported-by: default avatarWalter Wu <walter-zh.wu@mediatek.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Matthias Brugger <matthias.bgg@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      da3418ad
    • satya priya's avatar
      tty: serial: qcom_geni_serial: Fix RX cancel command failure · 56ad5b4b
      satya priya authored
      [ Upstream commit 679aac5e ]
      
      RX cancel command fails when BT is switched on and off multiple times.
      
      To handle this, poll for the cancel bit in SE_GENI_S_IRQ_STATUS register
      instead of SE_GENI_S_CMD_CTRL_REG.
      
      As per the HPG update, handle the RX last bit after cancel command
      and flush out the RX FIFO buffer.
      Signed-off-by: default avatarsatya priya <skakit@codeaurora.org>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/1581415982-8793-1-git-send-email-skakit@codeaurora.orgSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      56ad5b4b
    • Ryan Case's avatar
      tty: serial: qcom_geni_serial: Remove xfer_mode variable · e6ebad85
      Ryan Case authored
      [ Upstream commit bdc05a8a ]
      
      The driver only supports FIFO mode so setting and checking this variable
      is unnecessary. If DMA support is ever added then such checks can be
      introduced.
      Signed-off-by: default avatarRyan Case <ryandcase@chromium.org>
      Reviewed-by: default avatarEvan Green <evgreen@chromium.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e6ebad85
    • Ryan Case's avatar
      tty: serial: qcom_geni_serial: Remove set_rfr_wm() and related variables · 4e438733
      Ryan Case authored
      [ Upstream commit a85fb9ce ]
      
      The variables of tx_wm and rx_wm were set to the same define value in
      all cases, never updated, and the define was sometimes used
      interchangably. Remove the variables/function and use the fixed value.
      Signed-off-by: default avatarRyan Case <ryandcase@chromium.org>
      Reviewed-by: default avatarEvan Green <evgreen@chromium.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4e438733
    • Ryan Case's avatar
      tty: serial: qcom_geni_serial: Remove use of *_relaxed() and mb() · 1cc88347
      Ryan Case authored
      [ Upstream commit 9e06d55f ]
      
      A frequent side comment has been to remove the use of writel_relaxed,
      readl_relaxed, and mb. This reduces driver complexity and the _relaxed
      variants were not known to provide any noticeable performance benefit.
      Signed-off-by: default avatarRyan Case <ryandcase@chromium.org>
      Reviewed-by: default avatarEvan Green <evgreen@chromium.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1cc88347
    • Ryan Case's avatar
      tty: serial: qcom_geni_serial: Remove interrupt storm · 4d1a94fa
      Ryan Case authored
      [ Upstream commit 64a42807 ]
      
      Disable M_TX_FIFO_WATERMARK_EN after we've sent all data for a given
      transaction so we don't continue to receive a flurry of free space
      interrupts while waiting for the M_CMD_DONE notification. Re-enable the
      watermark when establishing the next transaction.
      
      Also clear the watermark interrupt after filling the FIFO so we do not
      receive notification again prior to actually having free space.
      Signed-off-by: default avatarRyan Case <ryandcase@chromium.org>
      Reviewed-by: default avatarDouglas Anderson <dianders@chromium.org>
      Tested-by: default avatarDouglas Anderson <dianders@chromium.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4d1a94fa
    • Ryan Case's avatar
      tty: serial: qcom_geni_serial: Fix UART hang · 0a38fd93
      Ryan Case authored
      [ Upstream commit 663abb1a ]
      
      If a serial console write occured while a UART transmit command was
      waiting for a done signal then no further data would be sent until
      something new kicked the system into gear. If there is already data
      waiting in the circular buffer we must re-enable the tx watermark so we
      receive the expected interrupts.
      Signed-off-by: default avatarRyan Case <ryandcase@chromium.org>
      Reviewed-by: default avatarEvan Green <evgreen@chromium.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      0a38fd93
    • Miaohe Lin's avatar
      KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOI · fe1cfc64
      Miaohe Lin authored
      commit 7455a832 upstream.
      
      Commit 13db7734 ("KVM: x86: don't notify userspace IOAPIC on edge
      EOI") said, edge-triggered interrupts don't set a bit in TMR, which means
      that IOAPIC isn't notified on EOI. And var level indicates level-triggered
      interrupt.
      But commit 3159d36a ("KVM: x86: use generic function for MSI parsing")
      replace var level with irq.level by mistake. Fix it by changing irq.level
      to irq.trig_mode.
      
      Cc: stable@vger.kernel.org
      Fixes: 3159d36a ("KVM: x86: use generic function for MSI parsing")
      Signed-off-by: default avatarMiaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fe1cfc64
    • Paolo Bonzini's avatar
      KVM: nVMX: Don't emulate instructions in guest mode · ed9e97c3
      Paolo Bonzini authored
      commit 07721fee upstream.
      
      vmx_check_intercept is not yet fully implemented. To avoid emulating
      instructions disallowed by the L1 hypervisor, refuse to emulate
      instructions by default.
      
      Cc: stable@vger.kernel.org
      [Made commit, added commit msg - Oliver]
      Signed-off-by: default avatarOliver Upton <oupton@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed9e97c3
    • Mathias Nyman's avatar
      xhci: apply XHCI_PME_STUCK_QUIRK to Intel Comet Lake platforms · 6ca274be
      Mathias Nyman authored
      commit a3ae87dc upstream.
      
      Intel Comet Lake based platform require the XHCI_PME_STUCK_QUIRK
      quirk as well. Without this xHC can not enter D3 in runtime suspend.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Link: https://lore.kernel.org/r/20200210134553.9144-5-mathias.nyman@linux.intel.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ca274be
    • Alex Deucher's avatar
      drm/amdgpu/soc15: fix xclk for raven · 8300ed5a
      Alex Deucher authored
      commit c657b936 upstream.
      
      It's 25 Mhz (refclk / 4).  This fixes the interpretation
      of the rlc clock counter.
      Acked-by: default avatarEvan Quan <evan.quan@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8300ed5a
    • Gavin Shan's avatar
      mm/vmscan.c: don't round up scan size for online memory cgroup · 837ba482
      Gavin Shan authored
      commit 76073c64 upstream.
      
      Commit 68600f62 ("mm: don't miss the last page because of round-off
      error") makes the scan size round up to @denominator regardless of the
      memory cgroup's state, online or offline.  This affects the overall
      reclaiming behavior: the corresponding LRU list is eligible for
      reclaiming only when its size logically right shifted by @sc->priority
      is bigger than zero in the former formula.
      
      For example, the inactive anonymous LRU list should have at least 0x4000
      pages to be eligible for reclaiming when we have 60/12 for
      swappiness/priority and without taking scan/rotation ratio into account.
      
      After the roundup is applied, the inactive anonymous LRU list becomes
      eligible for reclaiming when its size is bigger than or equal to 0x1000
      in the same condition.
      
          (0x4000 >> 12) * 60 / (60 + 140 + 1) = 1
          ((0x1000 >> 12) * 60) + 200) / (60 + 140 + 1) = 1
      
      aarch64 has 512MB huge page size when the base page size is 64KB.  The
      memory cgroup that has a huge page is always eligible for reclaiming in
      that case.
      
      The reclaiming is likely to stop after the huge page is reclaimed,
      meaing the further iteration on @sc->priority and the silbing and child
      memory cgroups will be skipped.  The overall behaviour has been changed.
      This fixes the issue by applying the roundup to offlined memory cgroups
      only, to give more preference to reclaim memory from offlined memory
      cgroup.  It sounds reasonable as those memory is unlikedly to be used by
      anyone.
      
      The issue was found by starting up 8 VMs on a Ampere Mustang machine,
      which has 8 CPUs and 16 GB memory.  Each VM is given with 2 vCPUs and
      2GB memory.  It took 264 seconds for all VMs to be completely up and
      784MB swap is consumed after that.  With this patch applied, it took 236
      seconds and 60MB swap to do same thing.  So there is 10% performance
      improvement for my case.  Note that KSM is disable while THP is enabled
      in the testing.
      
               total     used    free   shared  buff/cache   available
         Mem:  16196    10065    2049       16        4081        3749
         Swap:  8175      784    7391
               total     used    free   shared  buff/cache   available
         Mem:  16196    11324    3656       24        1215        2936
         Swap:  8175       60    8115
      
      Link: http://lkml.kernel.org/r/20200211024514.8730-1-gshan@redhat.com
      Fixes: 68600f62 ("mm: don't miss the last page because of round-off error")
      Signed-off-by: default avatarGavin Shan <gshan@redhat.com>
      Acked-by: default avatarRoman Gushchin <guro@fb.com>
      Cc: <stable@vger.kernel.org>	[4.20+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      837ba482
    • Zenghui Yu's avatar
      genirq/irqdomain: Make sure all irq domain flags are distinct · ea2a1156
      Zenghui Yu authored
      commit 2546287c upstream.
      
      This was noticed when printing debugfs for MSIs on my ARM64 server.  The
      new dstate IRQD_MSI_NOMASK_QUIRK came out surprisingly while it should only
      be the x86 stuff for the time being...
      
      The new MSI quirk flag uses the same bit as IRQ_DOMAIN_NAME_ALLOCATED which
      is oddly defined as bit 6 for no good reason.
      
      Switch it to the non used bit 1.
      
      Fixes: 6f1a4891 ("x86/apic/msi: Plug non-maskable MSI affinity race")
      Signed-off-by: default avatarZenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20200221020725.2038-1-yuzenghui@huawei.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ea2a1156
    • Logan Gunthorpe's avatar
      nvme-multipath: Fix memory leak with ana_log_buf · 576c04cb
      Logan Gunthorpe authored
      commit 3b783090 upstream.
      
      kmemleak reports a memory leak with the ana_log_buf allocated by
      nvme_mpath_init():
      
      unreferenced object 0xffff888120e94000 (size 8208):
        comm "nvme", pid 6884, jiffies 4295020435 (age 78786.312s)
          hex dump (first 32 bytes):
            00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
            01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
          backtrace:
            [<00000000e2360188>] kmalloc_order+0x97/0xc0
            [<0000000079b18dd4>] kmalloc_order_trace+0x24/0x100
            [<00000000f50c0406>] __kmalloc+0x24c/0x2d0
            [<00000000f31a10b9>] nvme_mpath_init+0x23c/0x2b0
            [<000000005802589e>] nvme_init_identify+0x75f/0x1600
            [<0000000058ef911b>] nvme_loop_configure_admin_queue+0x26d/0x280
            [<00000000673774b9>] nvme_loop_create_ctrl+0x2a7/0x710
            [<00000000f1c7a233>] nvmf_dev_write+0xc66/0x10b9
            [<000000004199f8d0>] __vfs_write+0x50/0xa0
            [<0000000065466fef>] vfs_write+0xf3/0x280
            [<00000000b0db9a8b>] ksys_write+0xc6/0x160
            [<0000000082156b91>] __x64_sys_write+0x43/0x50
            [<00000000c34fbb6d>] do_syscall_64+0x77/0x2f0
            [<00000000bbc574c9>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      nvme_mpath_init() is called by nvme_init_identify() which is called in
      multiple places (nvme_reset_work(), nvme_passthru_end(), etc). This
      means nvme_mpath_init() may be called multiple times before
      nvme_mpath_uninit() (which is only called on nvme_free_ctrl()).
      
      When nvme_mpath_init() is called multiple times, it overwrites the
      ana_log_buf pointer with a new allocation, thus leaking the previous
      allocation.
      
      To fix this, free ana_log_buf before allocating a new one.
      
      Fixes: 0d0b660f ("nvme: add ANA support")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      576c04cb
    • Vasily Averin's avatar
      mm/memcontrol.c: lost css_put in memcg_expand_shrinker_maps() · e75d2de9
      Vasily Averin authored
      commit 75866af6 upstream.
      
      for_each_mem_cgroup() increases css reference counter for memory cgroup
      and requires to use mem_cgroup_iter_break() if the walk is cancelled.
      
      Link: http://lkml.kernel.org/r/c98414fb-7e1f-da0f-867a-9340ec4bd30b@virtuozzo.com
      Fixes: 0a4465d3 ("mm, memcg: assign memcg-aware shrinkers bitmap to memcg")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Acked-by: default avatarKirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarRoman Gushchin <guro@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e75d2de9
    • Ioanna Alifieraki's avatar
      Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()" · cf85f00f
      Ioanna Alifieraki authored
      commit edf28f40 upstream.
      
      This reverts commit a9795584.
      
      Commit a9795584 ("ipc,sem: remove uneeded sem_undo_list lock usage
      in exit_sem()") removes a lock that is needed.  This leads to a process
      looping infinitely in exit_sem() and can also lead to a crash.  There is
      a reproducer available in [1] and with the commit reverted the issue
      does not reproduce anymore.
      
      Using the reproducer found in [1] is fairly easy to reach a point where
      one of the child processes is looping infinitely in exit_sem between
      for(;;) and if (semid == -1) block, while it's trying to free its last
      sem_undo structure which has already been freed by freeary().
      
      Each sem_undo struct is on two lists: one per semaphore set (list_id)
      and one per process (list_proc).  The list_id list tracks undos by
      semaphore set, and the list_proc by process.
      
      Undo structures are removed either by freeary() or by exit_sem().  The
      freeary function is invoked when the user invokes a syscall to remove a
      semaphore set.  During this operation freeary() traverses the list_id
      associated with the semaphore set and removes the undo structures from
      both the list_id and list_proc lists.
      
      For this case, exit_sem() is called at process exit.  Each process
      contains a struct sem_undo_list (referred to as "ulp") which contains
      the head for the list_proc list.  When the process exits, exit_sem()
      traverses this list to remove each sem_undo struct.  As in freeary(),
      whenever a sem_undo struct is removed from list_proc, it is also removed
      from the list_id list.
      
      Removing elements from list_id is safe for both exit_sem() and freeary()
      due to sem_lock().  Removing elements from list_proc is not safe;
      freeary() locks &un->ulp->lock when it performs
      list_del_rcu(&un->list_proc) but exit_sem() does not (locking was
      removed by commit a9795584 ("ipc,sem: remove uneeded sem_undo_list
      lock usage in exit_sem()").
      
      This can result in the following situation while executing the
      reproducer [1] : Consider a child process in exit_sem() and the parent
      in freeary() (because of semctl(sid[i], NSEM, IPC_RMID)).
      
       - The list_proc for the child contains the last two undo structs A and
         B (the rest have been removed either by exit_sem() or freeary()).
      
       - The semid for A is 1 and semid for B is 2.
      
       - exit_sem() removes A and at the same time freeary() removes B.
      
       - Since A and B have different semid sem_lock() will acquire different
         locks for each process and both can proceed.
      
      The bug is that they remove A and B from the same list_proc at the same
      time because only freeary() acquires the ulp lock. When exit_sem()
      removes A it makes ulp->list_proc.next to point at B and at the same
      time freeary() removes B setting B->semid=-1.
      
      At the next iteration of for(;;) loop exit_sem() will try to remove B.
      
      The only way to break from for(;;) is for (&un->list_proc ==
      &ulp->list_proc) to be true which is not. Then exit_sem() will check if
      B->semid=-1 which is and will continue looping in for(;;) until the
      memory for B is reallocated and the value at B->semid is changed.
      
      At that point, exit_sem() will crash attempting to unlink B from the
      lists (this can be easily triggered by running the reproducer [1] a
      second time).
      
      To prove this scenario instrumentation was added to keep information
      about each sem_undo (un) struct that is removed per process and per
      semaphore set (sma).
      
                CPU0                                CPU1
        [caller holds sem_lock(sma for A)]      ...
        freeary()                               exit_sem()
        ...                                     ...
        ...                                     sem_lock(sma for B)
        spin_lock(A->ulp->lock)                 ...
        list_del_rcu(un_A->list_proc)           list_del_rcu(un_B->list_proc)
      
      Undo structures A and B have different semid and sem_lock() operations
      proceed.  However they belong to the same list_proc list and they are
      removed at the same time.  This results into ulp->list_proc.next
      pointing to the address of B which is already removed.
      
      After reverting commit a9795584 ("ipc,sem: remove uneeded
      sem_undo_list lock usage in exit_sem()") the issue was no longer
      reproducible.
      
      [1] https://bugzilla.redhat.com/show_bug.cgi?id=1694779
      
      Link: http://lkml.kernel.org/r/20191211191318.11860-1-ioanna-maria.alifieraki@canonical.com
      Fixes: a9795584 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()")
      Signed-off-by: default avatarIoanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
      Acked-by: default avatarManfred Spraul <manfred@colorfullife.com>
      Acked-by: default avatarHerton R. Krzesinski <herton@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: <malat@debian.org>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf85f00f
    • Jani Nikula's avatar
      MAINTAINERS: Update drm/i915 bug filing URL · af4693da
      Jani Nikula authored
      commit 96228b7d upstream.
      
      We've moved from bugzilla to gitlab.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200212160434.6437-1-jani.nikula@intel.com
      (cherry picked from commit 3a6a4f08)
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af4693da
    • Johan Hovold's avatar
      serdev: ttyport: restore client ops on deregistration · c9ca2010
      Johan Hovold authored
      commit 0c5aae59 upstream.
      
      The serdev tty-port controller driver should reset the tty-port client
      operations also on deregistration to avoid a NULL-pointer dereference in
      case the port is later re-registered as a normal tty device.
      
      Note that this can only happen with tty drivers such as 8250 which have
      statically allocated port structures that can end up being reused and
      where a later registration would not register a serdev controller (e.g.
      due to registration errors or if the devicetree has been changed in
      between).
      
      Specifically, this can be an issue for any statically defined ports that
      would be registered by 8250 core when an 8250 driver is being unbound.
      
      Fixes: bed35c6d ("serdev: add a tty port controller driver")
      Cc: stable <stable@vger.kernel.org>     # 4.11
      Reported-by: default avatarLoic Poulain <loic.poulain@linaro.org>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Link: https://lore.kernel.org/r/20200210145730.22762-1-johan@kernel.orgSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9ca2010
    • Fugang Duan's avatar
      tty: serial: imx: setup the correct sg entry for tx dma · 463a3db8
      Fugang Duan authored
      commit f7670783 upstream.
      
      There has oops as below happen on i.MX8MP EVK platform that has
      6G bytes DDR memory.
      
      when (xmit->tail < xmit->head) && (xmit->head == 0),
      it setups one sg entry with sg->length is zero:
      	sg_set_buf(sgl + 1, xmit->buf, xmit->head);
      
      if xmit->buf is allocated from >4G address space, and SDMA only
      support <4G address space, then dma_map_sg() will call swiotlb_map()
      to do bounce buffer copying and mapping.
      
      But swiotlb_map() don't allow sg entry's length is zero, otherwise
      report BUG_ON().
      
      So the patch is to correct the tx DMA scatter list.
      
      Oops:
      [  287.675715] kernel BUG at kernel/dma/swiotlb.c:497!
      [  287.680592] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
      [  287.686075] Modules linked in:
      [  287.689133] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.3-00016-g3fdc4e0-dirty #10
      [  287.696872] Hardware name: FSL i.MX8MP EVK (DT)
      [  287.701402] pstate: 80000085 (Nzcv daIf -PAN -UAO)
      [  287.706199] pc : swiotlb_tbl_map_single+0x1fc/0x310
      [  287.711076] lr : swiotlb_map+0x60/0x148
      [  287.714909] sp : ffff800010003c00
      [  287.718221] x29: ffff800010003c00 x28: 0000000000000000
      [  287.723533] x27: 0000000000000040 x26: ffff800011ae0000
      [  287.728844] x25: ffff800011ae09f8 x24: 0000000000000000
      [  287.734155] x23: 00000001b7af9000 x22: 0000000000000000
      [  287.739465] x21: ffff000176409c10 x20: 00000000001f7ffe
      [  287.744776] x19: ffff000176409c10 x18: 000000000000002e
      [  287.750087] x17: 0000000000000000 x16: 0000000000000000
      [  287.755397] x15: 0000000000000000 x14: 0000000000000000
      [  287.760707] x13: ffff00017f334000 x12: 0000000000000001
      [  287.766018] x11: 00000000001fffff x10: 0000000000000000
      [  287.771328] x9 : 0000000000000003 x8 : 0000000000000000
      [  287.776638] x7 : 0000000000000000 x6 : 0000000000000000
      [  287.781949] x5 : 0000000000200000 x4 : 0000000000000000
      [  287.787259] x3 : 0000000000000001 x2 : 00000001b7af9000
      [  287.792570] x1 : 00000000fbfff000 x0 : 0000000000000000
      [  287.797881] Call trace:
      [  287.800328]  swiotlb_tbl_map_single+0x1fc/0x310
      [  287.804859]  swiotlb_map+0x60/0x148
      [  287.808347]  dma_direct_map_page+0xf0/0x130
      [  287.812530]  dma_direct_map_sg+0x78/0xe0
      [  287.816453]  imx_uart_dma_tx+0x134/0x2f8
      [  287.820374]  imx_uart_dma_tx_callback+0xd8/0x168
      [  287.824992]  vchan_complete+0x194/0x200
      [  287.828828]  tasklet_action_common.isra.0+0x154/0x1a0
      [  287.833879]  tasklet_action+0x24/0x30
      [  287.837540]  __do_softirq+0x120/0x23c
      [  287.841202]  irq_exit+0xb8/0xd8
      [  287.844343]  __handle_domain_irq+0x64/0xb8
      [  287.848438]  gic_handle_irq+0x5c/0x148
      [  287.852185]  el1_irq+0xb8/0x180
      [  287.855327]  cpuidle_enter_state+0x84/0x360
      [  287.859508]  cpuidle_enter+0x34/0x48
      [  287.863083]  call_cpuidle+0x18/0x38
      [  287.866571]  do_idle+0x1e0/0x280
      [  287.869798]  cpu_startup_entry+0x20/0x40
      [  287.873721]  rest_init+0xd4/0xe0
      [  287.876949]  arch_call_rest_init+0xc/0x14
      [  287.880958]  start_kernel+0x420/0x44c
      [  287.884622] Code: 9124c021 9417aff8 a94363f7 17ffffd5 (d4210000)
      [  287.890718] ---[ end trace 5bc44c4ab6b009ce ]---
      [  287.895334] Kernel panic - not syncing: Fatal exception in interrupt
      [  287.901686] SMP: stopping secondary CPUs
      [  288.905607] SMP: failed to stop secondary CPUs 0-1
      [  288.910395] Kernel Offset: disabled
      [  288.913882] CPU features: 0x0002,2000200c
      [  288.917888] Memory Limit: none
      [  288.920944] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      Reported-by: default avatarEagle Zhou <eagle.zhou@nxp.com>
      Tested-by: default avatarEagle Zhou <eagle.zhou@nxp.com>
      Signed-off-by: default avatarFugang Duan <fugang.duan@nxp.com>
      Cc: stable <stable@vger.kernel.org>
      Fixes: 7942f857 ("serial: imx: TX DMA: clean up sg initialization")
      Reviewed-by: default avatarUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Link: https://lore.kernel.org/r/1581401761-6378-1-git-send-email-fugang.duan@nxp.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      463a3db8
    • Nicolas Ferre's avatar
      tty/serial: atmel: manage shutdown in case of RS485 or ISO7816 mode · 6807593e
      Nicolas Ferre authored
      commit 04b5bfe3 upstream.
      
      In atmel_shutdown() we call atmel_stop_rx() and atmel_stop_tx() functions.
      Prevent the rx restart that is implemented in RS485 or ISO7816 modes when
      calling atmel_stop_tx() by using the atomic information tasklet_shutdown
      that is already in place for this purpose.
      
      Fixes: 98f2082c ("tty/serial: atmel: enforce tasklet init and termination sequences")
      Signed-off-by: default avatarNicolas Ferre <nicolas.ferre@microchip.com>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200210152053.8289-1-nicolas.ferre@microchip.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6807593e
    • Andy Shevchenko's avatar
      serial: 8250: Check UPF_IRQ_SHARED in advance · f4e6d51f
      Andy Shevchenko authored
      commit 7febbcbc upstream.
      
      The commit 54e53b2e
        ("tty: serial: 8250: pass IRQ shared flag to UART ports")
      nicely explained the problem:
      
      ---8<---8<---
      
      On some systems IRQ lines between multiple UARTs might be shared. If so, the
      irqflags have to be configured accordingly. The reason is: The 8250 port startup
      code performs IRQ tests *before* the IRQ handler for that particular port is
      registered. This is performed in serial8250_do_startup(). This function checks
      whether IRQF_SHARED is configured and only then disables the IRQ line while
      testing.
      
      This test is performed upon each open() of the UART device. Imagine two UARTs
      share the same IRQ line: On is already opened and the IRQ is active. When the
      second UART is opened, the IRQ line has to be disabled while performing IRQ
      tests. Otherwise an IRQ might handler might be invoked, but the IRQ itself
      cannot be handled, because the corresponding handler isn't registered,
      yet. That's because the 8250 code uses a chain-handler and invokes the
      corresponding port's IRQ handling routines himself.
      
      Unfortunately this IRQF_SHARED flag isn't configured for UARTs probed via device
      tree even if the IRQs are shared. This way, the actual and shared IRQ line isn't
      disabled while performing tests and the kernel correctly detects a spurious
      IRQ. So, adding this flag to the DT probe solves the issue.
      
      Note: The UPF_SHARE_IRQ flag is configured unconditionally. Therefore, the
      IRQF_SHARED flag can be set unconditionally as well.
      
      Example stack trace by performing `echo 1 > /dev/ttyS2` on a non-patched system:
      
      |irq 85: nobody cared (try booting with the "irqpoll" option)
      | [...]
      |handlers:
      |[<ffff0000080fc628>] irq_default_primary_handler threaded [<ffff00000855fbb8>] serial8250_interrupt
      |Disabling IRQ #85
      
      ---8<---8<---
      
      But unfortunately didn't fix the root cause. Let's try again here by moving
      IRQ flag assignment from serial_link_irq_chain() to serial8250_do_startup().
      
      This should fix the similar issue reported for 8250_pnp case.
      
      Since this change we don't need to have custom solutions in 8250_aspeed_vuart
      and 8250_of drivers, thus, drop them.
      
      Fixes: 1c2f0493 ("serial: 8250: add IRQ trigger support")
      Reported-by: default avatarLi RongQing <lirongqing@baidu.com>
      Cc: Kurt Kanzenbach <kurt@linutronix.de>
      Cc: Vikram Pandita <vikram.pandita@ti.com>
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: stable <stable@vger.kernel.org>
      Acked-by: Kurt Kanzenbach's avatarKurt Kanzenbach <kurt@linutronix.de>
      Link: https://lore.kernel.org/r/20200211135559.85960-1-andriy.shevchenko@linux.intel.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4e6d51f
    • Kim Phillips's avatar
      x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF · f28ec250
      Kim Phillips authored
      commit 21b5ee59 upstream.
      
      Commit
      
        aaf24884 ("perf/x86/msr: Add AMD IRPERF (Instructions Retired)
      		  performance counter")
      
      added support for access to the free-running counter via 'perf -e
      msr/irperf/', but when exercised, it always returns a 0 count:
      
      BEFORE:
      
        $ perf stat -e instructions,msr/irperf/ true
      
         Performance counter stats for 'true':
      
                   624,833      instructions
                         0      msr/irperf/
      
      Simply set its enable bit - HWCR bit 30 - to make it start counting.
      
      Enablement is restricted to all machines advertising IRPERF capability,
      except those susceptible to an erratum that makes the IRPERF return
      bad values.
      
      That erratum occurs in Family 17h models 00-1fh [1], but not in F17h
      models 20h and above [2].
      
      AFTER (on a family 17h model 31h machine):
      
        $ perf stat -e instructions,msr/irperf/ true
      
         Performance counter stats for 'true':
      
                   621,690      instructions
                   622,490      msr/irperf/
      
      [1] Revision Guide for AMD Family 17h Models 00h-0Fh Processors
      [2] Revision Guide for AMD Family 17h Models 30h-3Fh Processors
      
      The revision guides are available from the bugzilla Link below.
      
       [ bp: Massage commit message. ]
      
      Fixes: aaf24884 ("perf/x86/msr: Add AMD IRPERF (Instructions Retired) performance counter")
      Signed-off-by: default avatarKim Phillips <kim.phillips@amd.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
      Link: http://lkml.kernel.org/r/20200214201805.13830-1-kim.phillips@amd.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f28ec250
    • Thomas Gleixner's avatar
      x86/mce/amd: Fix kobject lifetime · 5e5b443a
      Thomas Gleixner authored
      commit 51dede9c upstream.
      
      Accessing the MCA thresholding controls in sysfs concurrently with CPU
      hotplug can lead to a couple of KASAN-reported issues:
      
        BUG: KASAN: use-after-free in sysfs_file_ops+0x155/0x180
        Read of size 8 at addr ffff888367578940 by task grep/4019
      
      and
      
        BUG: KASAN: use-after-free in show_error_count+0x15c/0x180
        Read of size 2 at addr ffff888368a05514 by task grep/4454
      
      for example. Both result from the fact that the threshold block
      creation/teardown code frees the descriptor memory itself instead of
      defining proper ->release function and leaving it to the driver core to
      take care of that, after all sysfs accesses have completed.
      
      Do that and get rid of the custom freeing code, fixing the above UAFs in
      the process.
      
        [ bp: write commit message. ]
      
      Fixes: 95268664 ("[PATCH] x86_64: mce_amd support for family 0x10 processors")
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200214082801.13836-1-bp@alien8.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e5b443a
    • Borislav Petkov's avatar
      x86/mce/amd: Publish the bank pointer only after setup has succeeded · 0a3aca3a
      Borislav Petkov authored
      commit 6e5cf31f upstream.
      
      threshold_create_bank() creates a bank descriptor per MCA error
      thresholding counter which can be controlled over sysfs. It publishes
      the pointer to that bank in a per-CPU variable and then goes on to
      create additional thresholding blocks if the bank has such.
      
      However, that creation of additional blocks in
      allocate_threshold_blocks() can fail, leading to a use-after-free
      through the per-CPU pointer.
      
      Therefore, publish that pointer only after all blocks have been setup
      successfully.
      
      Fixes: 019f34fc ("x86, MCE, AMD: Move shared bank to node descriptor")
      Reported-by: default avatarSaar Amar <Saar.Amar@microsoft.com>
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200128140846.phctkvx5btiexvbx@kili.mountainSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a3aca3a
    • wangyan's avatar
      jbd2: fix ocfs2 corrupt when clearing block group bits · 4512119a
      wangyan authored
      commit 8eedabfd upstream.
      
      I found a NULL pointer dereference in ocfs2_block_group_clear_bits().
      The running environment:
      	kernel version: 4.19
      	A cluster with two nodes, 5 luns mounted on two nodes, and do some
      	file operations like dd/fallocate/truncate/rm on every lun with storage
      	network disconnection.
      
      The fallocate operation on dm-23-45 caused an null pointer dereference.
      
      The information of NULL pointer dereference as follows:
      	[577992.878282] JBD2: Error -5 detected when updating journal superblock for dm-23-45.
      	[577992.878290] Aborting journal on device dm-23-45.
      	...
      	[577992.890778] JBD2: Error -5 detected when updating journal superblock for dm-24-46.
      	[577992.890908] __journal_remove_journal_head: freeing b_committed_data
      	[577992.890916] (fallocate,88392,52):ocfs2_extend_trans:474 ERROR: status = -30
      	[577992.890918] __journal_remove_journal_head: freeing b_committed_data
      	[577992.890920] (fallocate,88392,52):ocfs2_rotate_tree_right:2500 ERROR: status = -30
      	[577992.890922] __journal_remove_journal_head: freeing b_committed_data
      	[577992.890924] (fallocate,88392,52):ocfs2_do_insert_extent:4382 ERROR: status = -30
      	[577992.890928] (fallocate,88392,52):ocfs2_insert_extent:4842 ERROR: status = -30
      	[577992.890928] __journal_remove_journal_head: freeing b_committed_data
      	[577992.890930] (fallocate,88392,52):ocfs2_add_clusters_in_btree:4947 ERROR: status = -30
      	[577992.890933] __journal_remove_journal_head: freeing b_committed_data
      	[577992.890939] __journal_remove_journal_head: freeing b_committed_data
      	[577992.890949] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
      	[577992.890950] Mem abort info:
      	[577992.890951]   ESR = 0x96000004
      	[577992.890952]   Exception class = DABT (current EL), IL = 32 bits
      	[577992.890952]   SET = 0, FnV = 0
      	[577992.890953]   EA = 0, S1PTW = 0
      	[577992.890954] Data abort info:
      	[577992.890955]   ISV = 0, ISS = 0x00000004
      	[577992.890956]   CM = 0, WnR = 0
      	[577992.890958] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000f8da07a9
      	[577992.890960] [0000000000000020] pgd=0000000000000000
      	[577992.890964] Internal error: Oops: 96000004 [#1] SMP
      	[577992.890965] Process fallocate (pid: 88392, stack limit = 0x00000000013db2fd)
      	[577992.890968] CPU: 52 PID: 88392 Comm: fallocate Kdump: loaded Tainted: G        W  OE     4.19.36 #1
      	[577992.890969] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019
      	[577992.890971] pstate: 60400009 (nZCv daif +PAN -UAO)
      	[577992.891054] pc : _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
      	[577992.891082] lr : _ocfs2_free_suballoc_bits+0x618/0x968 [ocfs2]
      	[577992.891084] sp : ffff0000c8e2b810
      	[577992.891085] x29: ffff0000c8e2b820 x28: 0000000000000000
      	[577992.891087] x27: 00000000000006f3 x26: ffffa07957b02e70
      	[577992.891089] x25: ffff807c59d50000 x24: 00000000000006f2
      	[577992.891091] x23: 0000000000000001 x22: ffff807bd39abc30
      	[577992.891093] x21: ffff0000811d9000 x20: ffffa07535d6a000
      	[577992.891097] x19: ffff000001681638 x18: ffffffffffffffff
      	[577992.891098] x17: 0000000000000000 x16: ffff000080a03df0
      	[577992.891100] x15: ffff0000811d9708 x14: 203d207375746174
      	[577992.891101] x13: 73203a524f525245 x12: 20373439343a6565
      	[577992.891103] x11: 0000000000000038 x10: 0101010101010101
      	[577992.891106] x9 : ffffa07c68a85d70 x8 : 7f7f7f7f7f7f7f7f
      	[577992.891109] x7 : 0000000000000000 x6 : 0000000000000080
      	[577992.891110] x5 : 0000000000000000 x4 : 0000000000000002
      	[577992.891112] x3 : ffff000001713390 x2 : 2ff90f88b1c22f00
      	[577992.891114] x1 : ffff807bd39abc30 x0 : 0000000000000000
      	[577992.891116] Call trace:
      	[577992.891139]  _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
      	[577992.891162]  _ocfs2_free_clusters+0x100/0x290 [ocfs2]
      	[577992.891185]  ocfs2_free_clusters+0x50/0x68 [ocfs2]
      	[577992.891206]  ocfs2_add_clusters_in_btree+0x198/0x5e0 [ocfs2]
      	[577992.891227]  ocfs2_add_inode_data+0x94/0xc8 [ocfs2]
      	[577992.891248]  ocfs2_extend_allocation+0x1bc/0x7a8 [ocfs2]
      	[577992.891269]  ocfs2_allocate_extents+0x14c/0x338 [ocfs2]
      	[577992.891290]  __ocfs2_change_file_space+0x3f8/0x610 [ocfs2]
      	[577992.891309]  ocfs2_fallocate+0xe4/0x128 [ocfs2]
      	[577992.891316]  vfs_fallocate+0x11c/0x250
      	[577992.891317]  ksys_fallocate+0x54/0x88
      	[577992.891319]  __arm64_sys_fallocate+0x28/0x38
      	[577992.891323]  el0_svc_common+0x78/0x130
      	[577992.891325]  el0_svc_handler+0x38/0x78
      	[577992.891327]  el0_svc+0x8/0xc
      
      My analysis process as follows:
      ocfs2_fallocate
        __ocfs2_change_file_space
          ocfs2_allocate_extents
            ocfs2_extend_allocation
              ocfs2_add_inode_data
                ocfs2_add_clusters_in_btree
                  ocfs2_insert_extent
                    ocfs2_do_insert_extent
                      ocfs2_rotate_tree_right
                        ocfs2_extend_rotate_transaction
                          ocfs2_extend_trans
                            jbd2_journal_restart
                              jbd2__journal_restart
                                /* handle->h_transaction is NULL,
                                 * is_handle_aborted(handle) is true
                                 */
                                handle->h_transaction = NULL;
                                start_this_handle
                                  return -EROFS;
                  ocfs2_free_clusters
                    _ocfs2_free_clusters
                      _ocfs2_free_suballoc_bits
                        ocfs2_block_group_clear_bits
                          ocfs2_journal_access_gd
                            __ocfs2_journal_access
                              jbd2_journal_get_undo_access
                                /* I think jbd2_write_access_granted() will
                                 * return true, because do_get_write_access()
                                 * will return -EROFS.
                                 */
                                if (jbd2_write_access_granted(...)) return 0;
                                do_get_write_access
                                  /* handle->h_transaction is NULL, it will
                                   * return -EROFS here, so do_get_write_access()
                                   * was not called.
                                   */
                                  if (is_handle_aborted(handle)) return -EROFS;
                          /* bh2jh(group_bh) is NULL, caused NULL
                             pointer dereference */
                          undo_bg = (struct ocfs2_group_desc *)
                                      bh2jh(group_bh)->b_committed_data;
      
      If handle->h_transaction == NULL, then jbd2_write_access_granted()
      does not really guarantee that journal_head will stay around,
      not even speaking of its b_committed_data. The bh2jh(group_bh)
      can be removed after ocfs2_journal_access_gd() and before call
      "bh2jh(group_bh)->b_committed_data". So, we should move
      is_handle_aborted() check from do_get_write_access() into
      jbd2_journal_get_undo_access() and jbd2_journal_get_write_access()
      before the call to jbd2_write_access_granted().
      
      Link: https://lore.kernel.org/r/f72a623f-b3f1-381a-d91d-d22a1c83a336@huawei.comSigned-off-by: default avatarYan Wang <wangyan122@huawei.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJun Piao <piaojun@huawei.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4512119a
    • Gustavo Luiz Duarte's avatar
      powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery · 72e2df70
      Gustavo Luiz Duarte authored
      commit 2464cc4c upstream.
      
      After a treclaim, we expect to be in non-transactional state. If we
      don't clear the current thread's MSR[TS] before we get preempted, then
      tm_recheckpoint_new_task() will recheckpoint and we get rescheduled in
      suspended transaction state.
      
      When handling a signal caught in transactional state,
      handle_rt_signal64() calls get_tm_stackpointer() that treclaims the
      transaction using tm_reclaim_current() but without clearing the
      thread's MSR[TS]. This can cause the TM Bad Thing exception below if
      later we pagefault and get preempted trying to access the user's
      sigframe, using __put_user(). Afterwards, when we are rescheduled back
      into do_page_fault() (but now in suspended state since the thread's
      MSR[TS] was not cleared), upon executing 'rfid' after completion of
      the page fault handling, the exception is raised because a transition
      from suspended to non-transactional state is invalid.
      
        Unexpected TM Bad Thing exception at c00000000000de44 (msr 0x8000000302a03031) tm_scratch=800000010280b033
        Oops: Unrecoverable exception, sig: 6 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        CPU: 25 PID: 15547 Comm: a.out Not tainted 5.4.0-rc2 #32
        NIP:  c00000000000de44 LR: c000000000034728 CTR: 0000000000000000
        REGS: c00000003fe7bd70 TRAP: 0700   Not tainted  (5.4.0-rc2)
        MSR:  8000000302a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[SE]>  CR: 44000884  XER: 00000000
        CFAR: c00000000000dda4 IRQMASK: 0
        PACATMSCRATCH: 800000010280b033
        GPR00: c000000000034728 c000000f65a17c80 c000000001662800 00007fffacf3fd78
        GPR04: 0000000000001000 0000000000001000 0000000000000000 c000000f611f8af0
        GPR08: 0000000000000000 0000000078006001 0000000000000000 000c000000000000
        GPR12: c000000f611f84b0 c00000003ffcb200 0000000000000000 0000000000000000
        GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000f611f8140
        GPR24: 0000000000000000 00007fffacf3fd68 c000000f65a17d90 c000000f611f7800
        GPR28: c000000f65a17e90 c000000f65a17e90 c000000001685e18 00007fffacf3f000
        NIP [c00000000000de44] fast_exception_return+0xf4/0x1b0
        LR [c000000000034728] handle_rt_signal64+0x78/0xc50
        Call Trace:
        [c000000f65a17c80] [c000000000034710] handle_rt_signal64+0x60/0xc50 (unreliable)
        [c000000f65a17d30] [c000000000023640] do_notify_resume+0x330/0x460
        [c000000f65a17e20] [c00000000000dcc4] ret_from_except_lite+0x70/0x74
        Instruction dump:
        7c4ff120 e8410170 7c5a03a6 38400000 f8410060 e8010070 e8410080 e8610088
        60000000 60000000 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed0989
        ---[ end trace 93094aa44b442f87 ]---
      
      The simplified sequence of events that triggers the above exception is:
      
        ...				# userspace in NON-TRANSACTIONAL state
        tbegin			# userspace in TRANSACTIONAL state
        signal delivery		# kernelspace in SUSPENDED state
        handle_rt_signal64()
          get_tm_stackpointer()
            treclaim			# kernelspace in NON-TRANSACTIONAL state
          __put_user()
            page fault happens. We will never get back here because of the TM Bad Thing exception.
      
        page fault handling kicks in and we voluntarily preempt ourselves
        do_page_fault()
          __schedule()
            __switch_to(other_task)
      
        our task is rescheduled and we recheckpoint because the thread's MSR[TS] was not cleared
        __switch_to(our_task)
          switch_to_tm()
            tm_recheckpoint_new_task()
              trechkpt			# kernelspace in SUSPENDED state
      
        The page fault handling resumes, but now we are in suspended transaction state
        do_page_fault()    completes
        rfid     <----- trying to get back where the page fault happened (we were non-transactional back then)
        TM Bad Thing			# illegal transition from suspended to non-transactional
      
      This patch fixes that issue by clearing the current thread's MSR[TS]
      just after treclaim in get_tm_stackpointer() so that we stay in
      non-transactional state in case we are preempted. In order to make
      treclaim and clearing the thread's MSR[TS] atomic from a preemption
      perspective when CONFIG_PREEMPT is set, preempt_disable/enable() is
      used. It's also necessary to save the previous value of the thread's
      MSR before get_tm_stackpointer() is called so that it can be exposed
      to the signal handler later in setup_tm_sigcontexts() to inform the
      userspace MSR at the moment of the signal delivery.
      
      Found with tm-signal-context-force-tm kernel selftest.
      
      Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context")
      Cc: stable@vger.kernel.org # v3.9
      Signed-off-by: default avatarGustavo Luiz Duarte <gustavold@linux.ibm.com>
      Acked-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200211033831.11165-1-gustavold@linux.ibm.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72e2df70
    • Larry Finger's avatar
      staging: rtl8723bs: Fix potential overuse of kernel memory · e34182fb
      Larry Finger authored
      commit 23954cb0 upstream.
      
      In routine wpa_supplicant_ioctl(), the user-controlled p->length is
      checked to be at least the size of struct ieee_param size, but the code
      does not detect the case where p->length is greater than the size
      of the struct, thus a malicious user could be wasting kernel memory.
      Fixes commit 554c0a3a ("staging: Add rtl8723bs sdio wifi driver").
      
      Reported by: Pietro Oliva <pietroliva@gmail.com>
      Cc: Pietro Oliva <pietroliva@gmail.com>
      Cc: Stable <stable@vger.kernel.org>
      Fixes: 554c0a3a ("staging: Add rtl8723bs sdio wifi driver").
      Signed-off-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Link: https://lore.kernel.org/r/20200210180235.21691-5-Larry.Finger@lwfinger.netSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e34182fb
    • Larry Finger's avatar
      staging: rtl8723bs: Fix potential security hole · e4770de3
      Larry Finger authored
      commit ac33597c upstream.
      
      In routine rtw_hostapd_ioctl(), the user-controlled p->length is assumed
      to be at least the size of struct ieee_param size, but this assumption is
      never checked. This could result in out-of-bounds read/write on kernel
      heap in case a p->length less than the size of struct ieee_param is
      specified by the user. If p->length is allowed to be greater than the size
      of the struct, then a malicious user could be wasting kernel memory.
      Fixes commit 554c0a3a ("0taging: Add rtl8723bs sdio wifi driver").
      
      Reported by: Pietro Oliva <pietroliva@gmail.com>
      Cc: Pietro Oliva <pietroliva@gmail.com>
      Cc: Stable <stable@vger.kernel.org>
      Fixes 554c0a3a ("0taging: Add rtl8723bs sdio wifi driver").
      Signed-off-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Link: https://lore.kernel.org/r/20200210180235.21691-3-Larry.Finger@lwfinger.netSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4770de3
    • Larry Finger's avatar
      staging: rtl8188eu: Fix potential overuse of kernel memory · b4eab56d
      Larry Finger authored
      commit 4ddf8ab8 upstream.
      
      In routine wpa_supplicant_ioctl(), the user-controlled p->length is
      checked to be at least the size of struct ieee_param size, but the code
      does not detect the case where p->length is greater than the size
      of the struct, thus a malicious user could be wasting kernel memory.
      Fixes commit a2c60d42 ("Add files for new driver - part 16").
      
      Reported by: Pietro Oliva <pietroliva@gmail.com>
      Cc: Pietro Oliva <pietroliva@gmail.com>
      Cc: Stable <stable@vger.kernel.org>
      Fixes commit a2c60d42 ("Add files for new driver - part 16").
      Signed-off-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Link: https://lore.kernel.org/r/20200210180235.21691-4-Larry.Finger@lwfinger.netSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b4eab56d