1. 23 Jun, 2023 37 commits
  2. 22 Jun, 2023 3 commits
    • Paolo Bonzini's avatar
      Merge tag 'kvmarm-fixes-6.4-4' of... · 2623b3dc
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 6.4, take #4
      
      - Correctly save/restore PMUSERNR_EL0 when host userspace is using
        PMU counters directly
      
      - Fix GICv2 emulation on GICv3 after the locking rework
      
      - Don't use smp_processor_id() in kvm_pmu_probe_armpmu(), and
        document why...
      2623b3dc
    • Gavin Shan's avatar
      KVM: Avoid illegal stage2 mapping on invalid memory slot · 2230f9e1
      Gavin Shan authored
      We run into guest hang in edk2 firmware when KSM is kept as running on
      the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash
      device (TYPE_PFLASH_CFI01) during the operation of sector erasing or
      buffered write. The status is returned by reading the memory region of
      the pflash device and the read request should have been forwarded to QEMU
      and emulated by it. Unfortunately, the read request is covered by an
      illegal stage2 mapping when the guest hang issue occurs. The read request
      is completed with QEMU bypassed and wrong status is fetched. The edk2
      firmware runs into an infinite loop with the wrong status.
      
      The illegal stage2 mapping is populated due to same page sharing by KSM
      at (C) even the associated memory slot has been marked as invalid at (B)
      when the memory slot is requested to be deleted. It's notable that the
      active and inactive memory slots can't be swapped when we're in the middle
      of kvm_mmu_notifier_change_pte() because kvm->mn_active_invalidate_count
      is elevated, and kvm_swap_active_memslots() will busy loop until it reaches
      to zero again. Besides, the swapping from the active to the inactive memory
      slots is also avoided by holding &kvm->srcu in __kvm_handle_hva_range(),
      corresponding to synchronize_srcu_expedited() in kvm_swap_active_memslots().
      
        CPU-A                    CPU-B
        -----                    -----
                                 ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
                                 kvm_vm_ioctl_set_memory_region
                                 kvm_set_memory_region
                                 __kvm_set_memory_region
                                 kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
                                   kvm_invalidate_memslot
                                     kvm_copy_memslot
                                     kvm_replace_memslot
                                     kvm_swap_active_memslots        (A)
                                     kvm_arch_flush_shadow_memslot   (B)
        same page sharing by KSM
        kvm_mmu_notifier_invalidate_range_start
              :
        kvm_mmu_notifier_change_pte
          kvm_handle_hva_range
          __kvm_handle_hva_range
          kvm_set_spte_gfn            (C)
              :
        kvm_mmu_notifier_invalidate_range_end
      
      Fix the issue by skipping the invalid memory slot at (C) to avoid the
      illegal stage2 mapping so that the read request for the pflash's status
      is forwarded to QEMU and emulated by it. In this way, the correct pflash's
      status can be returned from QEMU to break the infinite loop in the edk2
      firmware.
      
      We tried a git-bisect and the first problematic commit is cd4c7183 ("
      KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this,
      clean_dcache_guest_page() is called after the memory slots are iterated
      in kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called
      before the iteration on the memory slots before this commit. This change
      literally enlarges the racy window between kvm_mmu_notifier_change_pte()
      and memory slot removal so that we're able to reproduce the issue in a
      practical test case. However, the issue exists since commit d5d8184d
      ("KVM: ARM: Memory virtualization setup").
      
      Cc: stable@vger.kernel.org # v3.9+
      Fixes: d5d8184d ("KVM: ARM: Memory virtualization setup")
      Reported-by: default avatarShuai Hu <hshuai@redhat.com>
      Reported-by: default avatarZhenyu Zhang <zhenyzha@redhat.com>
      Signed-off-by: default avatarGavin Shan <gshan@redhat.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarOliver Upton <oliver.upton@linux.dev>
      Reviewed-by: default avatarPeter Xu <peterx@redhat.com>
      Reviewed-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarShaoqin Huang <shahuang@redhat.com>
      Message-Id: <20230615054259.14911-1-gshan@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      2230f9e1
    • Paolo Abeni's avatar
      Merge tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 2ba7e7eb
      Paolo Abeni authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      This is v3, including a crash fix for patch 01/14.
      
      The following patchset contains Netfilter/IPVS fixes for net:
      
      1) Fix UDP segmentation with IPVS tunneled traffic, from Terin Stock.
      
      2) Fix chain binding transaction logic, add a bound flag to rule
         transactions. Remove incorrect logic in nft_data_hold() and
         nft_data_release().
      
      3) Add a NFT_TRANS_PREPARE_ERROR deactivate state to deal with releasing
         the set/chain as a follow up to 1240eb93 ("netfilter: nf_tables:
         incorrect error path handling with NFT_MSG_NEWRULE")
      
      4) Drop map element references from preparation phase instead of
         set destroy path, otherwise bogus EBUSY with transactions such as:
      
              flush chain ip x y
              delete chain ip x w
      
         where chain ip x y contains jump/goto from set elements.
      
      5) Pipapo set type does not regard generation mask from the walk
         iteration.
      
      6) Fix reference count underflow in set element reference to
         stateful object.
      
      7) Several patches to tighten the nf_tables API:
         - disallow set element updates of bound anonymous set
         - disallow unbound anonymous set/chain at the end of transaction.
         - disallow updates of anonymous set.
         - disallow timeout configuration for anonymous sets.
      
      8) Fix module reference leak in chain updates.
      
      9) Fix nfnetlink_osf module autoload.
      
      10) Fix deletion of basechain when NFTA_CHAIN_HOOK is specified as
          in iptables-nft.
      
      This Netfilter batch is larger than usual at this stage, I am aware we
      are fairly late in the -rc cycle, if you prefer to route them through
      net-next, please let me know.
      
      netfilter pull request 23-06-21
      
      * tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: Fix for deleting base chains with payload
        netfilter: nfnetlink_osf: fix module autoload
        netfilter: nf_tables: drop module reference after updating chain
        netfilter: nf_tables: disallow timeout for anonymous sets
        netfilter: nf_tables: disallow updates of anonymous sets
        netfilter: nf_tables: reject unbound chain set before commit phase
        netfilter: nf_tables: reject unbound anonymous set before commit phase
        netfilter: nf_tables: disallow element updates of bound anonymous sets
        netfilter: nf_tables: fix underflow in object reference counter
        netfilter: nft_set_pipapo: .walk does not deal with generations
        netfilter: nf_tables: drop map element references from preparation phase
        netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
        netfilter: nf_tables: fix chain binding transaction logic
        ipvs: align inner_mac_header for encapsulation
      ====================
      
      Link: https://lore.kernel.org/r/20230621100731.68068-1-pablo@netfilter.orgSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      2ba7e7eb