1. 03 Sep, 2020 27 commits
  2. 26 Aug, 2020 13 commits
    • Greg Kroah-Hartman's avatar
      f6d5cb9e
    • Will Deacon's avatar
      KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set · 0f090712
      Will Deacon authored
      commit b5331379 upstream.
      
      When an MMU notifier call results in unmapping a range that spans multiple
      PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary,
      since this avoids running into RCU stalls during VM teardown. Unfortunately,
      if the VM is destroyed as a result of OOM, then blocking is not permitted
      and the call to the scheduler triggers the following BUG():
      
       | BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394
       | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
       | INFO: lockdep is turned off.
       | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
       | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
       | Call trace:
       |  dump_backtrace+0x0/0x284
       |  show_stack+0x1c/0x28
       |  dump_stack+0xf0/0x1a4
       |  ___might_sleep+0x2bc/0x2cc
       |  unmap_stage2_range+0x160/0x1ac
       |  kvm_unmap_hva_range+0x1a0/0x1c8
       |  kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
       |  __mmu_notifier_invalidate_range_start+0x218/0x31c
       |  mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
       |  __oom_reap_task_mm+0x128/0x268
       |  oom_reap_task+0xac/0x298
       |  oom_reaper+0x178/0x17c
       |  kthread+0x1e4/0x1fc
       |  ret_from_fork+0x10/0x30
      
      Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we
      only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier
      flags.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 8b3405e3 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      Message-Id: <20200811102725.7121-3-will@kernel.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [will: Backport to 4.19; use 'blockable' instead of non-existent MMU_NOTIFIER_RANGE_BLOCKABLE flag]
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f090712
    • Will Deacon's avatar
      KVM: Pass MMU notifier range flags to kvm_unmap_hva_range() · a53dc164
      Will Deacon authored
      commit fdfe7cbd upstream.
      
      The 'flags' field of 'struct mmu_notifier_range' is used to indicate
      whether invalidate_range_{start,end}() are permitted to block. In the
      case of kvm_mmu_notifier_invalidate_range_start(), this field is not
      forwarded on to the architecture-specific implementation of
      kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
      whether or not to block.
      
      Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
      architectures are aware as to whether or not they are permitted to block.
      
      Cc: <stable@vger.kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      Message-Id: <20200811102725.7121-2-will@kernel.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [will: Backport to 4.19; use 'blockable' instead of non-existent range flags]
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a53dc164
    • Stephen Boyd's avatar
      clk: Evict unregistered clks from parent caches · 903c6bd9
      Stephen Boyd authored
      commit bdcf1dc2 upstream.
      
      We leave a dangling pointer in each clk_core::parents array that has an
      unregistered clk as a potential parent when that clk_core pointer is
      freed by clk{_hw}_unregister(). It is impossible for the true parent of
      a clk to be set with clk_set_parent() once the dangling pointer is left
      in the cache because we compare parent pointers in
      clk_fetch_parent_index() instead of checking for a matching clk name or
      clk_hw pointer.
      
      Before commit ede77858 ("clk: Remove global clk traversal on fetch
      parent index"), we would check clk_hw pointers, which has a higher
      chance of being the same between registration and unregistration, but it
      can still be allocated and freed by the clk provider. In fact, this has
      been a long standing problem since commit da0f0b2c ("clk: Correct
      lookup logic in clk_fetch_parent_index()") where we stopped trying to
      compare clk names and skipped over entries in the cache that weren't
      NULL.
      
      There are good (performance) reasons to not do the global tree lookup in
      cases where the cache holds dangling pointers to parents that have been
      unregistered. Let's take the performance hit on the uncommon
      registration path instead. Loop through all the clk_core::parents arrays
      when a clk is unregistered and set the entry to NULL when the parent
      cache entry and clk being unregistered are the same pointer. This will
      fix this problem and avoid the overhead for the "normal" case.
      
      Based on a patch by Bjorn Andersson.
      
      Fixes: da0f0b2c ("clk: Correct lookup logic in clk_fetch_parent_index()")
      Reviewed-by: default avatarBjorn Andersson <bjorn.andersson@linaro.org>
      Tested-by: default avatarSai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
      Signed-off-by: default avatarStephen Boyd <sboyd@kernel.org>
      Link: https://lkml.kernel.org/r/20190828181959.204401-1-sboyd@kernel.orgTested-by: default avatarNaresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      903c6bd9
    • Juergen Gross's avatar
      xen: don't reschedule in preemption off sections · da1754a2
      Juergen Gross authored
      For support of long running hypercalls xen_maybe_preempt_hcall() is
      calling cond_resched() in case a hypercall marked as preemptible has
      been interrupted.
      
      Normally this is no problem, as only hypercalls done via some ioctl()s
      are marked to be preemptible. In rare cases when during such a
      preemptible hypercall an interrupt occurs and any softirq action is
      started from irq_exit(), a further hypercall issued by the softirq
      handler will be regarded to be preemptible, too. This might lead to
      rescheduling in spite of the softirq handler potentially having set
      preempt_disable(), leading to splats like:
      
      BUG: sleeping function called from invalid context at drivers/xen/preempt.c:37
      in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 20775, name: xl
      INFO: lockdep is turned off.
      CPU: 1 PID: 20775 Comm: xl Tainted: G D W 5.4.46-1_prgmr_debug.el7.x86_64 #1
      Call Trace:
      <IRQ>
      dump_stack+0x8f/0xd0
      ___might_sleep.cold.76+0xb2/0x103
      xen_maybe_preempt_hcall+0x48/0x70
      xen_do_hypervisor_callback+0x37/0x40
      RIP: e030:xen_hypercall_xen_version+0xa/0x20
      Code: ...
      RSP: e02b:ffffc900400dcc30 EFLAGS: 00000246
      RAX: 000000000004000d RBX: 0000000000000200 RCX: ffffffff8100122a
      RDX: ffff88812e788000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffffffff83ee3ad0 R08: 0000000000000001 R09: 0000000000000001
      R10: 0000000000000000 R11: 0000000000000246 R12: ffff8881824aa0b0
      R13: 0000000865496000 R14: 0000000865496000 R15: ffff88815d040000
      ? xen_hypercall_xen_version+0xa/0x20
      ? xen_force_evtchn_callback+0x9/0x10
      ? check_events+0x12/0x20
      ? xen_restore_fl_direct+0x1f/0x20
      ? _raw_spin_unlock_irqrestore+0x53/0x60
      ? debug_dma_sync_single_for_cpu+0x91/0xc0
      ? _raw_spin_unlock_irqrestore+0x53/0x60
      ? xen_swiotlb_sync_single_for_cpu+0x3d/0x140
      ? mlx4_en_process_rx_cq+0x6b6/0x1110 [mlx4_en]
      ? mlx4_en_poll_rx_cq+0x64/0x100 [mlx4_en]
      ? net_rx_action+0x151/0x4a0
      ? __do_softirq+0xed/0x55b
      ? irq_exit+0xea/0x100
      ? xen_evtchn_do_upcall+0x2c/0x40
      ? xen_do_hypervisor_callback+0x29/0x40
      </IRQ>
      ? xen_hypercall_domctl+0xa/0x20
      ? xen_hypercall_domctl+0x8/0x20
      ? privcmd_ioctl+0x221/0x990 [xen_privcmd]
      ? do_vfs_ioctl+0xa5/0x6f0
      ? ksys_ioctl+0x60/0x90
      ? trace_hardirqs_off_thunk+0x1a/0x20
      ? __x64_sys_ioctl+0x16/0x20
      ? do_syscall_64+0x62/0x250
      ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fix that by testing preempt_count() before calling cond_resched().
      
      In kernel 5.8 this can't happen any more due to the entry code rework
      (more than 100 patches, so not a candidate for backporting).
      
      The issue was introduced in kernel 4.3, so this patch should go into
      all stable kernels in [4.3 ... 5.7].
      Reported-by: default avatarSarah Newman <srn@prgmr.com>
      Fixes: 0fa2f5cb ("sched/preempt, xen: Use need_resched() instead of should_resched()")
      Cc: Sarah Newman <srn@prgmr.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Tested-by: default avatarChris Brannon <cmb@prgmr.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da1754a2
    • Peter Xu's avatar
      mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible · 734654ae
      Peter Xu authored
      commit 75802ca6 upstream.
      
      This is found by code observation only.
      
      Firstly, the worst case scenario should assume the whole range was covered
      by pmd sharing.  The old algorithm might not work as expected for ranges
      like (1g-2m, 1g+2m), where the adjusted range should be (0, 1g+2m) but the
      expected range should be (0, 2g).
      
      Since at it, remove the loop since it should not be required.  With that,
      the new code should be faster too when the invalidating range is huge.
      
      Mike said:
      
      : With range (1g-2m, 1g+2m) within a vma (0, 2g) the existing code will only
      : adjust to (0, 1g+2m) which is incorrect.
      :
      : We should cc stable.  The original reason for adjusting the range was to
      : prevent data corruption (getting wrong page).  Since the range is not
      : always adjusted correctly, the potential for corruption still exists.
      :
      : However, I am fairly confident that adjust_range_if_pmd_sharing_possible
      : is only gong to be called in two cases:
      :
      : 1) for a single page
      : 2) for range == entire vma
      :
      : In those cases, the current code should produce the correct results.
      :
      : To be safe, let's just cc stable.
      
      Fixes: 017b1660 ("mm: migration: fix migration of huge PMD shared pages")
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200730201636.74778-1-peterx@redhat.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      734654ae
    • Al Viro's avatar
      do_epoll_ctl(): clean the failure exits up a bit · dcb6e6ef
      Al Viro authored
      commit 52c47969 upstream.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dcb6e6ef
    • Marc Zyngier's avatar
      epoll: Keep a reference on files added to the check list · 4957d564
      Marc Zyngier authored
      commit a9ed4a65 upstream.
      
      When adding a new fd to an epoll, and that this new fd is an
      epoll fd itself, we recursively scan the fds attached to it
      to detect cycles, and add non-epool files to a "check list"
      that gets subsequently parsed.
      
      However, this check list isn't completely safe when deletions
      can happen concurrently. To sidestep the issue, make sure that
      a struct file placed on the check list sees its f_count increased,
      ensuring that a concurrent deletion won't result in the file
      disapearing from under our feet.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4957d564
    • Li Heng's avatar
      efi: add missed destroy_workqueue when efisubsys_init fails · 2ff3c97b
      Li Heng authored
      commit 98086df8 upstream.
      
      destroy_workqueue() should be called to destroy efi_rts_wq
      when efisubsys_init() init resources fails.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarLi Heng <liheng40@huawei.com>
      Link: https://lore.kernel.org/r/1595229738-10087-1-git-send-email-liheng40@huawei.comSigned-off-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ff3c97b
    • Vasant Hegde's avatar
      powerpc/pseries: Do not initiate shutdown when system is running on UPS · fa80b284
      Vasant Hegde authored
      commit 90a9b102 upstream.
      
      As per PAPR we have to look for both EPOW sensor value and event
      modifier to identify the type of event and take appropriate action.
      
      In LoPAPR v1.1 section 10.2.2 includes table 136 "EPOW Action Codes":
      
        SYSTEM_SHUTDOWN 3
      
        The system must be shut down. An EPOW-aware OS logs the EPOW error
        log information, then schedules the system to be shut down to begin
        after an OS defined delay internal (default is 10 minutes.)
      
      Then in section 10.3.2.2.8 there is table 146 "Platform Event Log
      Format, Version 6, EPOW Section", which includes the "EPOW Event
      Modifier":
      
        For EPOW sensor value = 3
        0x01 = Normal system shutdown with no additional delay
        0x02 = Loss of utility power, system is running on UPS/Battery
        0x03 = Loss of system critical functions, system should be shutdown
        0x04 = Ambient temperature too high
        All other values = reserved
      
      We have a user space tool (rtas_errd) on LPAR to monitor for
      EPOW_SHUTDOWN_ON_UPS. Once it gets an event it initiates shutdown
      after predefined time. It also starts monitoring for any new EPOW
      events. If it receives "Power restored" event before predefined time
      it will cancel the shutdown. Otherwise after predefined time it will
      shutdown the system.
      
      Commit 79872e35 ("powerpc/pseries: All events of
      EPOW_SYSTEM_SHUTDOWN must initiate shutdown") changed our handling of
      the "on UPS/Battery" case, to immediately shutdown the system. This
      breaks existing setups that rely on the userspace tool to delay
      shutdown and let the system run on the UPS.
      
      Fixes: 79872e35 ("powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown")
      Cc: stable@vger.kernel.org # v4.0+
      Signed-off-by: default avatarVasant Hegde <hegdevasant@linux.vnet.ibm.com>
      [mpe: Massage change log and add PAPR references]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200820061844.306460-1-hegdevasant@linux.vnet.ibm.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa80b284
    • Tom Rix's avatar
      net: dsa: b53: check for timeout · 405ef1b4
      Tom Rix authored
      [ Upstream commit 774d977a ]
      
      clang static analysis reports this problem
      
      b53_common.c:1583:13: warning: The left expression of the compound
        assignment is an uninitialized value. The computed value will
        also be garbage
              ent.port &= ~BIT(port);
              ~~~~~~~~ ^
      
      ent is set by a successful call to b53_arl_read().  Unsuccessful
      calls are caught by an switch statement handling specific returns.
      b32_arl_read() calls b53_arl_op_wait() which fails with the
      unhandled -ETIMEDOUT.
      
      So add -ETIMEDOUT to the switch statement.  Because
      b53_arl_op_wait() already prints out a message, do not add another
      one.
      
      Fixes: 1da6df85 ("net: dsa: b53: Implement ARL add/del/dump operations")
      Signed-off-by: default avatarTom Rix <trix@redhat.com>
      Acked-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      405ef1b4
    • Haiyang Zhang's avatar
      hv_netvsc: Fix the queue_mapping in netvsc_vf_xmit() · 1bf1ca93
      Haiyang Zhang authored
      [ Upstream commit c3d897e0 ]
      
      netvsc_vf_xmit() / dev_queue_xmit() will call VF NIC’s ndo_select_queue
      or netdev_pick_tx() again. They will use skb_get_rx_queue() to get the
      queue number, so the “skb->queue_mapping - 1” will be used. This may
      cause the last queue of VF not been used.
      
      Use skb_record_rx_queue() here, so that the skb_get_rx_queue() called
      later will get the correct queue number, and VF will be able to use
      all queues.
      
      Fixes: b3bf5666 ("hv_netvsc: defer queue selection to VF")
      Signed-off-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1bf1ca93
    • Wang Hai's avatar
      net: gemini: Fix missing free_netdev() in error path of gemini_ethernet_port_probe() · 391d0ad1
      Wang Hai authored
      [ Upstream commit cf96d977 ]
      
      Replace alloc_etherdev_mq with devm_alloc_etherdev_mqs. In this way,
      when probe fails, netdev can be freed automatically.
      
      Fixes: 4d5ae32f ("net: ethernet: Add a driver for Gemini gigabit ethernet")
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarWang Hai <wanghai38@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      391d0ad1