1. 08 Jul, 2023 2 commits
    • Yuan Can's avatar
      NTB: amd: Fix error handling in amd_ntb_pci_driver_init() · 98af0a33
      Yuan Can authored
      A problem about ntb_hw_amd create debugfs failed is triggered with the
      following log given:
      
       [  618.431232] AMD(R) PCI-E Non-Transparent Bridge Driver 1.0
       [  618.433284] debugfs: Directory 'ntb_hw_amd' with parent '/' already present!
      
      The reason is that amd_ntb_pci_driver_init() returns pci_register_driver()
      directly without checking its return value, if pci_register_driver()
      failed, it returns without destroy the newly created debugfs, resulting
      the debugfs of ntb_hw_amd can never be created later.
      
       amd_ntb_pci_driver_init()
         debugfs_create_dir() # create debugfs directory
         pci_register_driver()
           driver_register()
             bus_add_driver()
               priv = kzalloc(...) # OOM happened
         # return without destroy debugfs directory
      
      Fix by removing debugfs when pci_register_driver() returns error.
      
      Fixes: a1b36958 ("NTB: Add support for AMD PCI-Express Non-Transparent Bridge")
      Signed-off-by: default avatarYuan Can <yuancan@huawei.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      98af0a33
    • Yuan Can's avatar
      ntb: idt: Fix error handling in idt_pci_driver_init() · c0129682
      Yuan Can authored
      A problem about ntb_hw_idt create debugfs failed is triggered with the
      following log given:
      
       [ 1236.637636] IDT PCI-E Non-Transparent Bridge Driver 2.0
       [ 1236.639292] debugfs: Directory 'ntb_hw_idt' with parent '/' already present!
      
      The reason is that idt_pci_driver_init() returns pci_register_driver()
      directly without checking its return value, if pci_register_driver()
      failed, it returns without destroy the newly created debugfs, resulting
      the debugfs of ntb_hw_idt can never be created later.
      
       idt_pci_driver_init()
         debugfs_create_dir() # create debugfs directory
         pci_register_driver()
           driver_register()
             bus_add_driver()
               priv = kzalloc(...) # OOM happened
         # return without destroy debugfs directory
      
      Fix by removing debugfs when pci_register_driver() returns error.
      
      Fixes: bf2a952d ("NTB: Add IDT 89HPESxNTx PCIe-switches support")
      Signed-off-by: default avatarYuan Can <yuancan@huawei.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      c0129682
  2. 25 Jun, 2023 5 commits
  3. 23 Jun, 2023 21 commits
  4. 22 Jun, 2023 12 commits
    • Paolo Bonzini's avatar
      Merge tag 'kvmarm-fixes-6.4-4' of... · 2623b3dc
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 6.4, take #4
      
      - Correctly save/restore PMUSERNR_EL0 when host userspace is using
        PMU counters directly
      
      - Fix GICv2 emulation on GICv3 after the locking rework
      
      - Don't use smp_processor_id() in kvm_pmu_probe_armpmu(), and
        document why...
      2623b3dc
    • Douglas Anderson's avatar
      arm64: dts: qcom: sc7280: Mark SCM as dma-coherent for chrome devices · 7b59e8ae
      Douglas Anderson authored
      Just like for sc7180 devices using the Chrome bootflow (AKA trogdor
      and IDP), sc7280 devices using the Chrome bootflow also need their
      firmware marked dma-coherent. On sc7280 this wasn't causing WiFi to
      fail to startup, since WiFi works differently there. However, on
      sc7280 devices we were still getting the message at bootup after
      commit 7bd6680b ("Revert "Revert "arm64: dma: Drop cache
      invalidation from arch_dma_prep_coherent()"""):
      
       qcom_scm firmware:scm: Assign memory protection call failed -22
       qcom_rmtfs_mem 9c900000.memory: assign memory failed
       qcom_rmtfs_mem: probe of 9c900000.memory failed with error -22
      
      We should mark SCM properly just like we did for trogdor.
      
      Fixes: 7bd6680b ("Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""")
      Fixes: 7a1f4e7f ("arm64: dts: qcom: sc7280: Add basic dts/dtsi files for sc7280 soc")
      Signed-off-by: default avatarDouglas Anderson <dianders@chromium.org>
      Link: https://lore.kernel.org/r/20230616081440.v2.4.I21dc14a63327bf81c6bb58fe8ed91dbdc9849ee2@changeidSigned-off-by: default avatarBjorn Andersson <andersson@kernel.org>
      7b59e8ae
    • Douglas Anderson's avatar
      arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for trogdor · a54b7fa6
      Douglas Anderson authored
      Trogdor devices use firmware backed by TF-A instead of Qualcomm's
      normal TZ. On TF-A we end up mapping memory as cacheable.
      Specifically, you can see in Trogdor's TF-A code [1] in
      qti_sip_mem_assign() that we call qti_mmap_add_dynamic_region() with
      MT_RO_DATA. This translates down to MT_MEMORY instead of
      MT_NON_CACHEABLE or MT_DEVICE. Apparently Qualcomm's normal TZ
      implementation maps the memory as non-cacheable.
      
      Let's add the "dma-coherent" attribute to the SCM for trogdor.
      
      Adding "dma-coherent" like this fixes WiFi on sc7180-trogdor
      devices. WiFi was broken as of commit 7bd6680b ("Revert "Revert
      "arm64: dma: Drop cache invalidation from
      arch_dma_prep_coherent()"""). Specifically at bootup we'd get:
      
       qcom_scm firmware:scm: Assign memory protection call failed -22
       qcom_rmtfs_mem 94600000.memory: assign memory failed
       qcom_rmtfs_mem: probe of 94600000.memory failed with error -22
      
      From discussion on the mailing lists [2] and over IRC [3], it was
      determined that we should always have been tagging the SCM as
      dma-coherent on trogdor but that the old "invalidate" happened to make
      things work most of the time. Tagging it properly like this is a much
      more robust solution.
      
      [1] https://chromium.googlesource.com/chromiumos/third_party/arm-trusted-firmware/+/refs/heads/firmware-trogdor-13577.B/plat/qti/common/src/qti_syscall.c
      [2] https://lore.kernel.org/r/20230614165904.1.I279773c37e2c1ed8fbb622ca6d1397aea0023526@changeid
      [3] https://oftc.irclog.whitequark.org/linux-msm/2023-06-15
      
      Fixes: 7bd6680b ("Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""")
      Fixes: 7ec3e673 ("arm64: dts: qcom: sc7180-trogdor: add initial trogdor and lazor dt")
      Reviewed-by: default avatarKonrad Dybcio <konrad.dybcio@linaro.org>
      Signed-off-by: default avatarDouglas Anderson <dianders@chromium.org>
      Link: https://lore.kernel.org/r/20230616081440.v2.3.Ic62daa649b47b656b313551d646c4de9a7da4bd4@changeidSigned-off-by: default avatarBjorn Andersson <andersson@kernel.org>
      a54b7fa6
    • Douglas Anderson's avatar
      arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for IDP · 9a5f0b11
      Douglas Anderson authored
      sc7180-idp is, for most intents and purposes, a trogdor device.
      Specifically, sc7180-idp is designed to run the same style of firmware
      as trogdor devices. This can be seen from the fact that IDP has the
      same "Reserved memory changes" in its device tree that trogdor has.
      
      Recently it was realized that we need to mark SCM as dma-coherent to
      match what trogdor's style of firmware (based on TF-A) does [1]. That
      means we need this dma-coherent tag on IDP as well.
      
      Without this, on newer versions of Linux, specifically those with
      commit 7bd6680b ("Revert "Revert "arm64: dma: Drop cache
      invalidation from arch_dma_prep_coherent()"""), WiFi will fail to
      work. At bootup you'll see:
      
        qcom_scm firmware:scm: Assign memory protection call failed -22
        qcom_rmtfs_mem 94600000.memory: assign memory failed
        qcom_rmtfs_mem: probe of 94600000.memory failed with error -22
      
      [1] https://lore.kernel.org/r/20230615145253.1.Ic62daa649b47b656b313551d646c4de9a7da4bd4@changeid
      
      Fixes: 7bd6680b ("Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""")
      Fixes: f5ab220d ("arm64: dts: qcom: sc7180: Add remoteproc enablers")
      Signed-off-by: default avatarDouglas Anderson <dianders@chromium.org>
      Link: https://lore.kernel.org/r/20230616081440.v2.2.I3c17d546d553378aa8a0c68c3fe04bccea7cba17@changeidSigned-off-by: default avatarBjorn Andersson <andersson@kernel.org>
      9a5f0b11
    • Douglas Anderson's avatar
      dt-bindings: firmware: qcom,scm: Document that SCM can be dma-coherent · c0877829
      Douglas Anderson authored
      Trogdor devices use firmware backed by TF-A instead of Qualcomm's
      normal TZ. On TF-A we end up mapping memory as cacheable. Specifically,
      you can see in Trogdor's TF-A code [1] in qti_sip_mem_assign() that we
      call qti_mmap_add_dynamic_region() with MT_RO_DATA. This translates
      down to MT_MEMORY instead of MT_NON_CACHEABLE or MT_DEVICE.
      
      Let's allow devices like trogdor to be described properly by allowing
      "dma-coherent" in the SCM node.
      Signed-off-by: default avatarDouglas Anderson <dianders@chromium.org>
      Acked-by: default avatarKrzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Link: https://lore.kernel.org/r/20230616081440.v2.1.Ie79b5f0ed45739695c9970df121e11d724909157@changeidSigned-off-by: default avatarBjorn Andersson <andersson@kernel.org>
      c0877829
    • Gavin Shan's avatar
      KVM: Avoid illegal stage2 mapping on invalid memory slot · 2230f9e1
      Gavin Shan authored
      We run into guest hang in edk2 firmware when KSM is kept as running on
      the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash
      device (TYPE_PFLASH_CFI01) during the operation of sector erasing or
      buffered write. The status is returned by reading the memory region of
      the pflash device and the read request should have been forwarded to QEMU
      and emulated by it. Unfortunately, the read request is covered by an
      illegal stage2 mapping when the guest hang issue occurs. The read request
      is completed with QEMU bypassed and wrong status is fetched. The edk2
      firmware runs into an infinite loop with the wrong status.
      
      The illegal stage2 mapping is populated due to same page sharing by KSM
      at (C) even the associated memory slot has been marked as invalid at (B)
      when the memory slot is requested to be deleted. It's notable that the
      active and inactive memory slots can't be swapped when we're in the middle
      of kvm_mmu_notifier_change_pte() because kvm->mn_active_invalidate_count
      is elevated, and kvm_swap_active_memslots() will busy loop until it reaches
      to zero again. Besides, the swapping from the active to the inactive memory
      slots is also avoided by holding &kvm->srcu in __kvm_handle_hva_range(),
      corresponding to synchronize_srcu_expedited() in kvm_swap_active_memslots().
      
        CPU-A                    CPU-B
        -----                    -----
                                 ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
                                 kvm_vm_ioctl_set_memory_region
                                 kvm_set_memory_region
                                 __kvm_set_memory_region
                                 kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
                                   kvm_invalidate_memslot
                                     kvm_copy_memslot
                                     kvm_replace_memslot
                                     kvm_swap_active_memslots        (A)
                                     kvm_arch_flush_shadow_memslot   (B)
        same page sharing by KSM
        kvm_mmu_notifier_invalidate_range_start
              :
        kvm_mmu_notifier_change_pte
          kvm_handle_hva_range
          __kvm_handle_hva_range
          kvm_set_spte_gfn            (C)
              :
        kvm_mmu_notifier_invalidate_range_end
      
      Fix the issue by skipping the invalid memory slot at (C) to avoid the
      illegal stage2 mapping so that the read request for the pflash's status
      is forwarded to QEMU and emulated by it. In this way, the correct pflash's
      status can be returned from QEMU to break the infinite loop in the edk2
      firmware.
      
      We tried a git-bisect and the first problematic commit is cd4c7183 ("
      KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this,
      clean_dcache_guest_page() is called after the memory slots are iterated
      in kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called
      before the iteration on the memory slots before this commit. This change
      literally enlarges the racy window between kvm_mmu_notifier_change_pte()
      and memory slot removal so that we're able to reproduce the issue in a
      practical test case. However, the issue exists since commit d5d8184d
      ("KVM: ARM: Memory virtualization setup").
      
      Cc: stable@vger.kernel.org # v3.9+
      Fixes: d5d8184d ("KVM: ARM: Memory virtualization setup")
      Reported-by: default avatarShuai Hu <hshuai@redhat.com>
      Reported-by: default avatarZhenyu Zhang <zhenyzha@redhat.com>
      Signed-off-by: default avatarGavin Shan <gshan@redhat.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarOliver Upton <oliver.upton@linux.dev>
      Reviewed-by: default avatarPeter Xu <peterx@redhat.com>
      Reviewed-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarShaoqin Huang <shahuang@redhat.com>
      Message-Id: <20230615054259.14911-1-gshan@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      2230f9e1
    • Qu Wenruo's avatar
      btrfs: fix remaining u32 overflows when left shifting stripe_nr · cb091225
      Qu Wenruo authored
      There was regression caused by a97699d1 ("btrfs: replace
      map_lookup->stripe_len by BTRFS_STRIPE_LEN") and supposedly fixed by
      a7299a18 ("btrfs: fix u32 overflows when left shifting stripe_nr").
      To avoid code churn the fix was open coding the type casts but
      unfortunately missed one which was still possible to hit [1].
      
      The missing place was assignment of bioc->full_stripe_logical inside
      btrfs_map_block().
      
      Fix it by adding a helper that does the safe calculation of the offset
      and use it everywhere even though it may not be strictly necessary due
      to already using u64 types.  This replaces all remaining
      "<< BTRFS_STRIPE_LEN_SHIFT" calls.
      
      [1] https://lore.kernel.org/linux-btrfs/20230622065438.86402-1-wqu@suse.com/
      
      Fixes: a7299a18 ("btrfs: fix u32 overflows when left shifting stripe_nr")
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      [ update changelog ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      cb091225
    • Ming Lei's avatar
      block: make sure local irq is disabled when calling __blkcg_rstat_flush · 9c39b7a9
      Ming Lei authored
      When __blkcg_rstat_flush() is called from cgroup_rstat_flush*() code
      path, interrupt is always disabled.
      
      When we start to flush blkcg per-cpu stats list in __blkg_release()
      for avoiding to leak blkcg_gq's reference in commit 20cb1c2f
      ("blk-cgroup: Flush stats before releasing blkcg_gq"), local irq
      isn't disabled yet, then lockdep warning may be triggered because
      the dependent cgroup locks may be acquired from irq(soft irq) handler.
      
      Fix the issue by disabling local irq always.
      
      Fixes: 20cb1c2f ("blk-cgroup: Flush stats before releasing blkcg_gq")
      Reported-by: default avatarShinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Closes: https://lore.kernel.org/linux-block/pz2wzwnmn5tk3pwpskmjhli6g3qly7eoknilb26of376c7kwxy@qydzpvt6zpis/T/#u
      Cc: stable@vger.kernel.org
      Cc: Jay Shin <jaeshin@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Reviewed-by: default avatarWaiman Long <longman@redhat.com>
      Link: https://lore.kernel.org/r/20230622084249.1208005-1-ming.lei@redhat.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      9c39b7a9
    • Paolo Abeni's avatar
      Merge tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 2ba7e7eb
      Paolo Abeni authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      This is v3, including a crash fix for patch 01/14.
      
      The following patchset contains Netfilter/IPVS fixes for net:
      
      1) Fix UDP segmentation with IPVS tunneled traffic, from Terin Stock.
      
      2) Fix chain binding transaction logic, add a bound flag to rule
         transactions. Remove incorrect logic in nft_data_hold() and
         nft_data_release().
      
      3) Add a NFT_TRANS_PREPARE_ERROR deactivate state to deal with releasing
         the set/chain as a follow up to 1240eb93 ("netfilter: nf_tables:
         incorrect error path handling with NFT_MSG_NEWRULE")
      
      4) Drop map element references from preparation phase instead of
         set destroy path, otherwise bogus EBUSY with transactions such as:
      
              flush chain ip x y
              delete chain ip x w
      
         where chain ip x y contains jump/goto from set elements.
      
      5) Pipapo set type does not regard generation mask from the walk
         iteration.
      
      6) Fix reference count underflow in set element reference to
         stateful object.
      
      7) Several patches to tighten the nf_tables API:
         - disallow set element updates of bound anonymous set
         - disallow unbound anonymous set/chain at the end of transaction.
         - disallow updates of anonymous set.
         - disallow timeout configuration for anonymous sets.
      
      8) Fix module reference leak in chain updates.
      
      9) Fix nfnetlink_osf module autoload.
      
      10) Fix deletion of basechain when NFTA_CHAIN_HOOK is specified as
          in iptables-nft.
      
      This Netfilter batch is larger than usual at this stage, I am aware we
      are fairly late in the -rc cycle, if you prefer to route them through
      net-next, please let me know.
      
      netfilter pull request 23-06-21
      
      * tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: Fix for deleting base chains with payload
        netfilter: nfnetlink_osf: fix module autoload
        netfilter: nf_tables: drop module reference after updating chain
        netfilter: nf_tables: disallow timeout for anonymous sets
        netfilter: nf_tables: disallow updates of anonymous sets
        netfilter: nf_tables: reject unbound chain set before commit phase
        netfilter: nf_tables: reject unbound anonymous set before commit phase
        netfilter: nf_tables: disallow element updates of bound anonymous sets
        netfilter: nf_tables: fix underflow in object reference counter
        netfilter: nft_set_pipapo: .walk does not deal with generations
        netfilter: nf_tables: drop map element references from preparation phase
        netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
        netfilter: nf_tables: fix chain binding transaction logic
        ipvs: align inner_mac_header for encapsulation
      ====================
      
      Link: https://lore.kernel.org/r/20230621100731.68068-1-pablo@netfilter.orgSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      2ba7e7eb
    • Maciej Żenczykowski's avatar
      revert "net: align SO_RCVMARK required privileges with SO_MARK" · a9628e88
      Maciej Żenczykowski authored
      This reverts commit 1f86123b ("net: align SO_RCVMARK required
      privileges with SO_MARK") because the reasoning in the commit message
      is not really correct:
        SO_RCVMARK is used for 'reading' incoming skb mark (via cmsg), as such
        it is more equivalent to 'getsockopt(SO_MARK)' which has no priv check
        and retrieves the socket mark, rather than 'setsockopt(SO_MARK) which
        sets the socket mark and does require privs.
      
        Additionally incoming skb->mark may already be visible if
        sysctl_fwmark_reflect and/or sysctl_tcp_fwmark_accept are enabled.
      
        Furthermore, it is easier to block the getsockopt via bpf
        (either cgroup setsockopt hook, or via syscall filters)
        then to unblock it if it requires CAP_NET_RAW/ADMIN.
      
      On Android the socket mark is (among other things) used to store
      the network identifier a socket is bound to.  Setting it is privileged,
      but retrieving it is not.  We'd like unprivileged userspace to be able
      to read the network id of incoming packets (where mark is set via
      iptables [to be moved to bpf])...
      
      An alternative would be to add another sysctl to control whether
      setting SO_RCVMARK is privilged or not.
      (or even a MASK of which bits in the mark can be exposed)
      But this seems like over-engineering...
      
      Note: This is a non-trivial revert, due to later merged commit e42c7bee
      ("bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt()")
      which changed both 'ns_capable' into 'sockopt_ns_capable' calls.
      
      Fixes: 1f86123b ("net: align SO_RCVMARK required privileges with SO_MARK")
      Cc: Larysa Zaremba <larysa.zaremba@intel.com>
      Cc: Simon Horman <simon.horman@corigine.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Eyal Birger <eyal.birger@gmail.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Patrick Rohr <prohr@google.com>
      Signed-off-by: default avatarMaciej Żenczykowski <maze@google.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Reviewed-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230618103130.51628-1-maze@google.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      a9628e88
    • Kees Cook's avatar
      net: wwan: iosm: Convert single instance struct member to flexible array · dec24b3b
      Kees Cook authored
      struct mux_adth actually ends with multiple struct mux_adth_dg members.
      This is seen both in the comments about the member:
      
      /**
       * struct mux_adth - Structure of the Aggregated Datagram Table Header.
       ...
       * @dg:		datagramm table with variable length
       */
      
      and in the preparation for populating it:
      
                              adth_dg_size = offsetof(struct mux_adth, dg) +
                                              ul_adb->dg_count[i] * sizeof(*dg);
      			...
                              adth_dg_size -= offsetof(struct mux_adth, dg);
                              memcpy(&adth->dg, ul_adb->dg[i], adth_dg_size);
      
      This was reported as a run-time false positive warning:
      
      memcpy: detected field-spanning write (size 16) of single field "&adth->dg" at drivers/net/wwan/iosm/iosm_ipc_mux_codec.c:852 (size 8)
      
      Adjust the struct mux_adth definition and associated sizeof() math; no binary
      output differences are observed in the resulting object file.
      Reported-by: default avatarFlorian Klink <flokli@flokli.de>
      Closes: https://lore.kernel.org/lkml/dbfa25f5-64c8-5574-4f5d-0151ba95d232@gmail.com/
      Fixes: 1f52d7b6 ("net: wwan: iosm: Enable M.2 7360 WWAN card support")
      Cc: M Chetan Kumar <m.chetan.kumar@intel.com>
      Cc: Bagas Sanjaya <bagasdotme@gmail.com>
      Cc: Intel Corporation <linuxwwan@intel.com>
      Cc: Loic Poulain <loic.poulain@linaro.org>
      Cc: Sergey Ryazanov <ryazanov.s.a@gmail.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
      Cc: netdev@vger.kernel.org
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarGustavo A. R. Silva <gustavoars@kernel.org>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230620194234.never.023-kees@kernel.orgSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      dec24b3b
    • Eric Dumazet's avatar
      sch_netem: acquire qdisc lock in netem_change() · 2174a08d
      Eric Dumazet authored
      syzbot managed to trigger a divide error [1] in netem.
      
      It could happen if q->rate changes while netem_enqueue()
      is running, since q->rate is read twice.
      
      It turns out netem_change() always lacked proper synchronization.
      
      [1]
      divide error: 0000 [#1] SMP KASAN
      CPU: 1 PID: 7867 Comm: syz-executor.1 Not tainted 6.1.30-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
      RIP: 0010:div64_u64 include/linux/math64.h:69 [inline]
      RIP: 0010:packet_time_ns net/sched/sch_netem.c:357 [inline]
      RIP: 0010:netem_enqueue+0x2067/0x36d0 net/sched/sch_netem.c:576
      Code: 89 e2 48 69 da 00 ca 9a 3b 42 80 3c 28 00 4c 8b a4 24 88 00 00 00 74 0d 4c 89 e7 e8 c3 4f 3b fd 48 8b 4c 24 18 48 89 d8 31 d2 <49> f7 34 24 49 01 c7 4c 8b 64 24 48 4d 01 f7 4c 89 e3 48 c1 eb 03
      RSP: 0018:ffffc9000dccea60 EFLAGS: 00010246
      RAX: 000001a442624200 RBX: 000001a442624200 RCX: ffff888108a4f000
      RDX: 0000000000000000 RSI: 000000000000070d RDI: 000000000000070d
      RBP: ffffc9000dcceb90 R08: ffffffff849c5e26 R09: fffffbfff10e1297
      R10: 0000000000000000 R11: dffffc0000000001 R12: ffff888108a4f358
      R13: dffffc0000000000 R14: 0000001a8cd9a7ec R15: 0000000000000000
      FS: 00007fa73fe18700(0000) GS:ffff8881f6b00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fa73fdf7718 CR3: 000000011d36e000 CR4: 0000000000350ee0
      Call Trace:
      <TASK>
      [<ffffffff84714385>] __dev_xmit_skb net/core/dev.c:3931 [inline]
      [<ffffffff84714385>] __dev_queue_xmit+0xcf5/0x3370 net/core/dev.c:4290
      [<ffffffff84d22df2>] dev_queue_xmit include/linux/netdevice.h:3030 [inline]
      [<ffffffff84d22df2>] neigh_hh_output include/net/neighbour.h:531 [inline]
      [<ffffffff84d22df2>] neigh_output include/net/neighbour.h:545 [inline]
      [<ffffffff84d22df2>] ip_finish_output2+0xb92/0x10d0 net/ipv4/ip_output.c:235
      [<ffffffff84d21e63>] __ip_finish_output+0xc3/0x2b0
      [<ffffffff84d10a81>] ip_finish_output+0x31/0x2a0 net/ipv4/ip_output.c:323
      [<ffffffff84d10f14>] NF_HOOK_COND include/linux/netfilter.h:298 [inline]
      [<ffffffff84d10f14>] ip_output+0x224/0x2a0 net/ipv4/ip_output.c:437
      [<ffffffff84d123b5>] dst_output include/net/dst.h:444 [inline]
      [<ffffffff84d123b5>] ip_local_out net/ipv4/ip_output.c:127 [inline]
      [<ffffffff84d123b5>] __ip_queue_xmit+0x1425/0x2000 net/ipv4/ip_output.c:542
      [<ffffffff84d12fdc>] ip_queue_xmit+0x4c/0x70 net/ipv4/ip_output.c:556
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Reviewed-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230620184425.1179809-1-edumazet@google.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      2174a08d