Commits · 0d21e6684779493d90f3dee90d4457d5b4aed8ad · Kirill Smelkov / linux

10 Dec, 2021 16 commits

Merge tag 'aio-poll-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux · 0d21e668

Linus Torvalds authored Dec 10, 2021

Pull aio poll fixes from Eric Biggers:
 "Fix three bugs in aio poll, and one issue with POLLFREE more broadly:

   - aio poll didn't handle POLLFREE, causing a use-after-free.

   - aio poll could block while the file is ready.

   - aio poll called eventfd_signal() when it isn't allowed.

   - POLLFREE didn't handle multiple exclusive waiters correctly.

  This has been tested with the libaio test suite, as well as with test
  programs I wrote that reproduce the first two bugs. I am sending this
  pull request myself as no one seems to be maintaining this code"

* tag 'aio-poll-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  aio: Fix incorrect usage of eventfd_signal_allowed()
  aio: fix use-after-free due to missing POLLFREE handling
  aio: keep poll requests on waitqueue until completed
  signalfd: use wake_up_pollfree()
  binder: use wake_up_pollfree()
  wait: add wake_up_pollfree()

0d21e668

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · b9172f9e

Linus Torvalds authored Dec 10, 2021

Pull kvm fixes from Paolo Bonzini:
 "More x86 fixes:

   - Logic bugs in CR0 writes and Hyper-V hypercalls

   - Don't use Enlightened MSR Bitmap for L3

   - Remove user-triggerable WARN

  Plus a few selftest fixes and a regression test for the
  user-triggerable WARN"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  selftests: KVM: Add test to verify KVM doesn't explode on "bad" I/O
  KVM: x86: Don't WARN if userspace mucks with RCX during string I/O exit
  KVM: X86: Raise #GP when clearing CR0_PG in 64 bit mode
  selftests: KVM: avoid failures due to reserved HyperTransport region
  KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI req
  KVM: x86: Wait for IPIs to be delivered when handling Hyper-V TLB flush hypercall
  KVM: x86: selftests: svm_int_ctl_test: fix intercept calculation
  KVM: nVMX: Don't use Enlightened MSR Bitmap for L3

b9172f9e

Merge tag 'pci-v5.16-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · b8a98b6b

Linus Torvalds authored Dec 10, 2021

Pull PCI fixes from Bjorn Helgaas:

 - Revert emulation of Marvell Armada A3720 expansion ROM because it
   doesn't work as expected (Marek Behún)

 - Assert PERST# in Apple M1 driver to fix initialization when booting
   from bootloaders using PCIe, such as U-Boot (Marc Zyngier)

 - Describe PERST# as active low in Apple T8103 DT and update driver to
   match (Marc Zyngier)

* tag 'pci-v5.16-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: apple: Fix PERST# polarity
  arm64: dts: apple: t8103: Mark PCIe PERST# polarity active low in DT
  PCI: apple: Follow the PCIe specifications when resetting the port
  Revert "PCI: aardvark: Fix support for PCI_ROM_ADDRESS1 on emulated bridge"

b8a98b6b

Merge tag 'mmc-v5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 2ca4b651

Linus Torvalds authored Dec 10, 2021

Pull MMC host fixes from Ulf Hansson:

 - mtk-sd: Fix memory leak during tuning

 - renesas_sdhi: Initialize variable properly when tuning

* tag 'mmc-v5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: mediatek: free the ext_csd when mmc_get_ext_csd success
  mmc: renesas_sdhi: initialize variable properly when tuning

2ca4b651

Merge tag 'libata-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · bec8cb26

Linus Torvalds authored Dec 10, 2021

Pull libata fixes from Damien Le Moal:

 - Fix a sparse warning in the ahci_ceva driver (me)

 - Disable the ASMedia 1092 non-functional device (Hannes)

* tag 'libata-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  libata: add horkage for ASMedia 1092
  ata: ahci_ceva: Fix id array access in ceva_ahci_read_id()

bec8cb26

Merge tag 'sound-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 5b46fb03

Linus Torvalds authored Dec 10, 2021

Pull sound fixes from Takashi Iwai:
 "Another collection of small fixes. It's still not quite calm yet, but
  nothing looks scary.

  ALSA core got a few fixes for covering the issues detected by fuzzer
  and the 32bit compat problem of control API, while the rest are all
  device-specific small fixes, including the continued fixes for Tegra"

* tag 'sound-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (23 commits)
  ALSA: hda/realtek - Add headset Mic support for Lenovo ALC897 platform
  ALSA: usb-audio: Reorder snd_djm_devices[] entries
  ALSA: hda/realtek: Fix quirk for TongFang PHxTxX1
  ALSA: ctl: Fix copy of updated id with element read/write
  ALSA: pcm: oss: Handle missing errors in snd_pcm_oss_change_params*()
  ALSA: pcm: oss: Limit the period size to 16MB
  ALSA: pcm: oss: Fix negative period/buffer sizes
  ASoC: codecs: wsa881x: fix return values from kcontrol put
  ASoC: codecs: wcd934x: return correct value from mixer put
  ASoC: codecs: wcd934x: handle channel mappping list correctly
  ASoC: qdsp6: q6routing: Fix return value from msm_routing_put_audio_mixer
  ASoC: SOF: Intel: Retry codec probing if it fails
  ASoC: amd: fix uninitialized variable in snd_acp6x_probe()
  ASoC: rockchip: i2s_tdm: Dup static DAI template
  ASoC: rt5682s: Fix crash due to out of scope stack vars
  ASoC: rt5682: Fix crash due to out of scope stack vars
  ASoC: tegra: Use normal system sleep for ADX
  ASoC: tegra: Use normal system sleep for AMX
  ASoC: tegra: Use normal system sleep for Mixer
  ASoC: tegra: Use normal system sleep for MVC
  ...

5b46fb03

Merge tag 'drm-fixes-2021-12-10' of git://anongit.freedesktop.org/drm/drm · 9b302ffe

Linus Torvalds authored Dec 10, 2021

Pull drm fixes from Dave Airlie:
 "Regular fixes, pretty small overall, couple of core fixes, two i915
  and two amdgpu, hopefully it stays this quiet.

  ttm:
   - fix ttm_bo_swapout

  syncobj:
   - fix fence find bug with signalled fences

  i915:
   - fix error pointer deref in gem execbuffer
   - fix for GT init with GuC/HuC on ICL

  amdgpu:
   - DPIA fix
   - eDP fix"

* tag 'drm-fixes-2021-12-10' of git://anongit.freedesktop.org/drm/drm:
  drm/i915/gen11: Moving WAs to icl_gt_workarounds_init()
  drm/amd/display: prevent reading unitialized links
  drm/amd/display: Fix DPIA outbox timeout after S3/S4/reset
  drm/i915: Fix error pointer dereference in i915_gem_do_execbuffer()
  drm/syncobj: Deal with signalled fences in drm_syncobj_find_fence.
  drm/ttm: fix ttm_bo_swapout

9b302ffe

selftests: KVM: Add test to verify KVM doesn't explode on "bad" I/O · 10e7a099

Sean Christopherson authored Oct 25, 2021

Add an x86 selftest to verify that KVM doesn't WARN or otherwise explode
if userspace modifies RCX during a userspace exit to handle string I/O.
This is a regression test for a user-triggerable WARN introduced by
commit 3b27de27 ("KVM: x86: split the two parts of emulator_pio_in").
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211025201311.1881846-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

10e7a099

KVM: x86: Don't WARN if userspace mucks with RCX during string I/O exit · d07898ea

Sean Christopherson authored Oct 25, 2021

Replace a WARN with a comment to call out that userspace can modify RCX
during an exit to userspace to handle string I/O.  KVM doesn't actually
support changing the rep count during an exit, i.e. the scenario can be
ignored, but the WARN needs to go as it's trivial to trigger from
userspace.

Cc: stable@vger.kernel.org
Fixes: 3b27de27 ("KVM: x86: split the two parts of emulator_pio_in")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211025201311.1881846-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d07898ea

KVM: X86: Raise #GP when clearing CR0_PG in 64 bit mode · 777ab82d

Lai Jiangshan authored Dec 07, 2021

In the SDM:
If the logical processor is in 64-bit mode or if CR4.PCIDE = 1, an
attempt to clear CR0.PG causes a general-protection exception (#GP).
Software should transition to compatibility mode and clear CR4.PCIDE
before attempting to disable paging.
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211207095230.53437-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

777ab82d

selftests: KVM: avoid failures due to reserved HyperTransport region · c8cc43c1

Paolo Bonzini authored Aug 05, 2021

AMD proceessors define an address range that is reserved by HyperTransport
and causes a failure if used for guest physical addresses.  Avoid
selftests failures by reserving those guest physical addresses; the
rules are:

- On parts with <40 bits, its fully hidden from software.

- Before Fam17h, it was always 12G just below 1T, even if there was more
RAM above this location.  In this case we just not use any RAM above 1T.

- On Fam17h and later, it is variable based on SME, and is either just
below 2^48 (no encryption) or 2^43 (encryption).

Fixes: ef4c9f4f ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210805105423.412878-1-pbonzini@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c8cc43c1

KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI req · 3244867a

Sean Christopherson authored Dec 07, 2021

Do not bail early if there are no bits set in the sparse banks for a
non-sparse, a.k.a. "all CPUs", IPI request.  Per the Hyper-V spec, it is
legal to have a variable length of '0', e.g. VP_SET's BankContents in
this case, if the request can be serviced without the extra info.

  It is possible that for a given invocation of a hypercall that does
  accept variable sized input headers that all the header input fits
  entirely within the fixed size header. In such cases the variable sized
  input header is zero-sized and the corresponding bits in the hypercall
  input should be set to zero.

Bailing early results in KVM failing to send IPIs to all CPUs as expected
by the guest.

Fixes: 214ff83d ("KVM: x86: hyperv: implement PV IPI send hypercalls")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211207220926.718794-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3244867a

KVM: x86: Wait for IPIs to be delivered when handling Hyper-V TLB flush hypercall · 1ebfaa11

Vitaly Kuznetsov authored Dec 09, 2021

Prior to commit 0baedd79 ("KVM: x86: make Hyper-V PV TLB flush use
tlb_flush_guest()"), kvm_hv_flush_tlb() was using 'KVM_REQ_TLB_FLUSH |
KVM_REQUEST_NO_WAKEUP' when making a request to flush TLBs on other vCPUs
and KVM_REQ_TLB_FLUSH is/was defined as:

 (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)

so KVM_REQUEST_WAIT was lost. Hyper-V TLFS, however, requires that
"This call guarantees that by the time control returns back to the
caller, the observable effects of all flushes on the specified virtual
processors have occurred." and without KVM_REQUEST_WAIT there's a small
chance that the vCPU making the TLB flush will resume running before
all IPIs get delivered to other vCPUs and a stale mapping can get read
there.

Fix the issue by adding KVM_REQUEST_WAIT flag to KVM_REQ_TLB_FLUSH_GUEST:
kvm_hv_flush_tlb() is the sole caller which uses it for
kvm_make_all_cpus_request()/kvm_make_vcpus_request_mask() where
KVM_REQUEST_WAIT makes a difference.

Cc: stable@kernel.org
Fixes: 0baedd79 ("KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211209102937.584397-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1ebfaa11

Merge tag 'amd-drm-fixes-5.16-2021-12-08' of... · 675a0957

Dave Airlie authored Dec 10, 2021

Merge tag 'amd-drm-fixes-5.16-2021-12-08' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.16-2021-12-08:

amdgpu:
- DPIA fix
- eDP fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211209042824.6720-1-alexander.deucher@amd.com

675a0957

Merge tag 'drm-intel-fixes-2021-12-09' of... · 233bee7e

Dave Airlie authored Dec 10, 2021

Merge tag 'drm-intel-fixes-2021-12-09' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

A fix to a error pointer dereference in gem_execbuffer and
a fix for GT initialization when GuC/HuC are used on ICL.
Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YbJVWYAd/jeERCYY@intel.com

233bee7e

Merge tag 'drm-misc-fixes-2021-12-09' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes · 2eb557d2

Dave Airlie authored Dec 10, 2021

A fix in syncobj to handle fence already signalled better, and a fix for
a ttm_bo_swapout eviction check.
Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20211209124305.gxhid5zwf7m4oasn@houat

2eb557d2

09 Dec, 2021 24 commits

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · c741e491

Linus Torvalds authored Dec 09, 2021

Pull rdma fixes from Jason Gunthorpe:
 "Quite a few small bug fixes old and new, also Doug Ledford is retiring
  now, we thank him for his work. Details:

   - Use after free in rxe

   - mlx5 DM regression

   - hns bugs triggred by device reset

   - Two fixes for CONFIG_DEBUG_PREEMPT

   - Several longstanding corner case bugs in hfi1

   - Two irdma data path bugs in rare cases and some memory issues"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/irdma: Don't arm the CQ more than two times if no CE for this CQ
  RDMA/irdma: Report correct WC errors
  RDMA/irdma: Fix a potential memory allocation issue in 'irdma_prm_add_pble_mem()'
  RDMA/irdma: Fix a user-after-free in add_pble_prm
  IB/hfi1: Fix leak of rcvhdrtail_dummy_kvaddr
  IB/hfi1: Fix early init panic
  IB/hfi1: Insure use of smp_processor_id() is preempt disabled
  IB/hfi1: Correct guard on eager buffer deallocation
  RDMA/rtrs: Call {get,put}_cpu_ptr to silence a debug kernel warning
  RDMA/hns: Do not destroy QP resources in the hw resetting phase
  RDMA/hns: Do not halt commands during reset until later
  Remove Doug Ledford from MAINTAINERS
  RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow
  RDMA: Fix use-after-free in rxe_queue_cleanup

c741e491

Merge tag 'net-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · ded746bf

Linus Torvalds authored Dec 09, 2021

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, can and netfilter.

  Current release - regressions:

   - bpf, sockmap: re-evaluate proto ops when psock is removed from
     sockmap

  Current release - new code bugs:

   - bpf: fix bpf_check_mod_kfunc_call for built-in modules

   - ice: fixes for TC classifier offloads

   - vrf: don't run conntrack on vrf with !dflt qdisc

  Previous releases - regressions:

   - bpf: fix the off-by-two error in range markings

   - seg6: fix the iif in the IPv6 socket control block

   - devlink: fix netns refcount leak in devlink_nl_cmd_reload()

   - dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"

   - dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports

  Previous releases - always broken:

   - ethtool: do not perform operations on net devices being
     unregistered

   - udp: use datalen to cap max gso segments

   - ice: fix races in stats collection

   - fec: only clear interrupt of handling queue in fec_enet_rx_queue()

   - m_can: pci: fix incorrect reference clock rate

   - m_can: disable and ignore ELO interrupt

   - mvpp2: fix XDP rx queues registering

  Misc:

   - treewide: add missing includes masked by cgroup -> bpf.h
     dependency"

* tag 'net-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (82 commits)
  net: dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports
  net: wwan: iosm: fixes unable to send AT command during mbim tx
  net: wwan: iosm: fixes net interface nonfunctional after fw flash
  net: wwan: iosm: fixes unnecessary doorbell send
  net: dsa: felix: Fix memory leak in felix_setup_mmio_filtering
  MAINTAINERS: s390/net: remove myself as maintainer
  net/sched: fq_pie: prevent dismantle issue
  net: mana: Fix memory leak in mana_hwc_create_wq
  seg6: fix the iif in the IPv6 socket control block
  nfp: Fix memory leak in nfp_cpp_area_cache_add()
  nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done
  nfc: fix segfault in nfc_genl_dump_devices_done
  udp: using datalen to cap max gso segments
  net: dsa: mv88e6xxx: error handling for serdes_power functions
  can: kvaser_usb: get CAN clock frequency from device
  can: kvaser_pciefd: kvaser_pciefd_rx_error_frame(): increase correct stats->{rx,tx}_errors counter
  net: mvpp2: fix XDP rx queues registering
  vmxnet3: fix minimum vectors alloc issue
  net, neigh: clear whole pneigh_entry at alloc time
  net: dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"
  ...

ded746bf

Merge tag 'mtd/fixes-for-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · 27698cd2

Linus Torvalds authored Dec 09, 2021

Pull mtd fixes from Miquel Raynal:
 "MTD fixes:

   - dataflash: Add device-tree SPI IDs to avoid new warnings

  Raw NAND fixes:

   - Fix nand_choose_best_timings() on unsupported interface

   - Fix nand_erase_op delay (wrong unit)

   - fsmc:
      - Fix timing computation
      - Take instruction delay into account

   - denali:
      - Add the dependency on HAS_IOMEM to silence robots"

* tag 'mtd/fixes-for-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: dataflash: Add device-tree SPI IDs
  mtd: rawnand: fsmc: Fix timing computation
  mtd: rawnand: fsmc: Take instruction delay into account
  mtd: rawnand: Fix nand_choose_best_timings() on unsupported interface
  mtd: rawnand: Fix nand_erase_op delay
  mtd: rawnand: denali: Add the dependency on HAS_IOMEM

27698cd2

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · 03090cc7

Linus Torvalds authored Dec 09, 2021

Pull HID fixes from Jiri Kosina:

 - fixes for various drivers which assume that a HID device is on USB
   transport, but that might not necessarily be the case, as the device
   can be faked by uhid. (Greg, Benjamin Tissoires)

 - fix for spurious wakeups on certain Lenovo notebooks (Thomas
   Weißschuh)

 - a few other device-specific quirks

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: Ignore battery for Elan touchscreen on Asus UX550VE
  HID: intel-ish-hid: ipc: only enable IRQ wakeup when requested
  HID: google: add eel USB id
  HID: add USB_HID dependancy to hid-prodikeys
  HID: add USB_HID dependancy to hid-chicony
  HID: bigbenff: prevent null pointer dereference
  HID: sony: fix error path in probe
  HID: add USB_HID dependancy on some USB HID drivers
  HID: check for valid USB device for many HID drivers
  HID: wacom: fix problems when device is not a valid USB device
  HID: add hid_is_usb() function to make it simpler for USB detection
  HID: quirks: Add quirk for the Microsoft Surface 3 type-cover

03090cc7

aio: Fix incorrect usage of eventfd_signal_allowed() · 4b374986

Xie Yongji authored Sep 13, 2021

We should defer eventfd_signal() to the workqueue when
eventfd_signal_allowed() return false rather than return
true.

Fixes: b542e383 ("eventfd: Make signal recursion protection a task bit")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210913111928.98-1-xieyongji@bytedance.comReviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>

4b374986

aio: fix use-after-free due to missing POLLFREE handling · 50252e4b

Eric Biggers authored Dec 08, 2021

signalfd_poll() and binder_poll() are special in that they use a
waitqueue whose lifetime is the current task, rather than the struct
file as is normally the case. This is okay for blocking polls, since a
blocking poll occurs within one task; however, non-blocking polls
require another solution. This solution is for the queue to be cleared
before it is freed, by sending a POLLFREE notification to all waiters.

Unfortunately, only eventpoll handles POLLFREE. A second type of
non-blocking poll, aio poll, was added in kernel v4.18, and it doesn't
handle POLLFREE. This allows a use-after-free to occur if a signalfd or
binder fd is polled with aio poll, and the waitqueue gets freed.

Fix this by making aio poll handle POLLFREE.

A patch by Ramji Jiyani <ramjiyani@google.com>
(https://lore.kernel.org/r/20211027011834.2497484-1-ramjiyani@google.com)
tried to do this by making aio_poll_wake() always complete the request
inline if POLLFREE is seen. However, that solution had two bugs.
First, it introduced a deadlock, as it unconditionally locked the aio
context while holding the waitqueue lock, which inverts the normal
locking order. Second, it didn't consider that POLLFREE notifications
are missed while the request has been temporarily de-queued.

The second problem was solved by my previous patch. This patch then
properly fixes the use-after-free by handling POLLFREE in a
deadlock-free way. It does this by taking advantage of the fact that
freeing of the waitqueue is RCU-delayed, similar to what eventpoll does.

Fixes: 2c14fa83 ("aio: implement IOCB_CMD_POLL")
Cc: <stable@vger.kernel.org> # v4.18+
Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.orgSigned-off-by: Eric Biggers <ebiggers@google.com>

50252e4b

aio: keep poll requests on waitqueue until completed · 363bee27

Eric Biggers authored Dec 08, 2021

Currently, aio_poll_wake() will always remove the poll request from the
waitqueue. Then, if aio_poll_complete_work() sees that none of the
polled events are ready and the request isn't cancelled, it re-adds the
request to the waitqueue. (This can easily happen when polling a file
that doesn't pass an event mask when waking up its waitqueue.)

This is fundamentally broken for two reasons:

1. If a wakeup occurs between vfs_poll() and the request being
re-added to the waitqueue, it will be missed because the request
wasn't on the waitqueue at the time. Therefore, IOCB_CMD_POLL
might never complete even if the polled file is ready.

2. When the request isn't on the waitqueue, there is no way to be
notified that the waitqueue is being freed (which happens when its
lifetime is shorter than the struct file's). This is supposed to
happen via the waitqueue entries being woken up with POLLFREE.

Therefore, leave the requests on the waitqueue until they are actually
completed (or cancelled). To keep track of when aio_poll_complete_work
needs to be scheduled, use new fields in struct poll_iocb. Remove the
'done' field which is now redundant.

Note that this is consistent with how sys_poll() and eventpoll work;
their wakeup functions do *not* remove the waitqueue entries.

Fixes: 2c14fa83 ("aio: implement IOCB_CMD_POLL")
Cc: <stable@vger.kernel.org> # v4.18+
Link: https://lore.kernel.org/r/20211209010455.42744-5-ebiggers@kernel.orgSigned-off-by: Eric Biggers <ebiggers@google.com>

363bee27

signalfd: use wake_up_pollfree() · 9537bae0

Eric Biggers authored Dec 08, 2021

wake_up_poll() uses nr_exclusive=1, so it's not guaranteed to wake up
all exclusive waiters. Yet, POLLFREE *must* wake up all waiters. epoll
and aio poll are fortunately not affected by this, but it's very
fragile. Thus, the new function wake_up_pollfree() has been introduced.

Convert signalfd to use wake_up_pollfree().
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: d80e731e ("epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211209010455.42744-4-ebiggers@kernel.orgSigned-off-by: Eric Biggers <ebiggers@google.com>

9537bae0

binder: use wake_up_pollfree() · a880b28a

Eric Biggers authored Dec 08, 2021

Convert binder to use wake_up_pollfree().
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: f5cb779b ("ANDROID: binder: remove waitqueue when thread exits.")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211209010455.42744-3-ebiggers@kernel.orgSigned-off-by: Eric Biggers <ebiggers@google.com>

a880b28a

wait: add wake_up_pollfree() · 42288cb4

Eric Biggers authored Dec 08, 2021

Several ->poll() implementations are special in that they use a
waitqueue whose lifetime is the current task, rather than the struct
file as is normally the case.  This is okay for blocking polls, since a
blocking poll occurs within one task; however, non-blocking polls
require another solution.  This solution is for the queue to be cleared
before it is freed, using 'wake_up_poll(wq, EPOLLHUP | POLLFREE);'.

However, that has a bug: wake_up_poll() calls __wake_up() with
nr_exclusive=1.  Therefore, if there are multiple "exclusive" waiters,
and the wakeup function for the first one returns a positive value, only
that one will be called.  That's *not* what's needed for POLLFREE;
POLLFREE is special in that it really needs to wake up everyone.

Considering the three non-blocking poll systems:

- io_uring poll doesn't handle POLLFREE at all, so it is broken anyway.

- aio poll is unaffected, since it doesn't support exclusive waits.
  However, that's fragile, as someone could add this feature later.

- epoll doesn't appear to be broken by this, since its wakeup function
  returns 0 when it sees POLLFREE.  But this is fragile.

Although there is a workaround (see epoll), it's better to define a
function which always sends POLLFREE to all waiters.  Add such a
function.  Also make it verify that the queue really becomes empty after
all waiters have been woken up.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211209010455.42744-2-ebiggers@kernel.orgSigned-off-by: Eric Biggers <ebiggers@google.com>

42288cb4

Merge tag 'netfs-fixes-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 2990c89d

Linus Torvalds authored Dec 09, 2021

Pull netfslib fixes from David Howells:

 - Fix a lockdep warning and potential deadlock. This is takes the
   simple approach of offloading the write-to-cache done from within a
   network filesystem read to a worker thread to avoid taking the
   sb_writer lock from the cache backing filesystem whilst holding the
   mmap lock on an inode from the network filesystem.

   Jan Kara posits a scenario whereby this can cause deadlock[1], though
   it's quite complex and I think requires someone in userspace to
   actually do I/O on the cache files. Matthew Wilcox isn't so certain,
   though[2].

   An alternative way to fix this, suggested by Darrick Wong, might be
   to allow cachefiles to prevent userspace from performing I/O upon the
   file - something like an exclusive open - but that's beyond the scope
   of a fix here if we do want to make such a facility in the future.

 - In some of the error handling paths where netfs_ops->cleanup() is
   called, the arguments are transposed[3]. gcc doesn't complain because
   one of the parameters is void* and one of the values is void*.

Link: https://lore.kernel.org/r/20210922110420.GA21576@quack2.suse.cz/ [1]
Link: https://lore.kernel.org/r/Ya9eDiFCE2fO7K/S@casper.infradead.org/ [2]
Link: https://lore.kernel.org/r/20211207031449.100510-1-jefflexu@linux.alibaba.com/ [3]

* tag 'netfs-fixes-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  netfs: fix parameter of cleanup()
  netfs: Fix lockdep warning from taking sb_writers whilst holding mmap_lock

2990c89d

KVM: x86: selftests: svm_int_ctl_test: fix intercept calculation · ee3a4f66

Maciej S. Szmigiero authored Dec 03, 2021

INTERCEPT_x are bit positions, but the code was using the raw value of
INTERCEPT_VINTR (4) instead of BIT(INTERCEPT_VINTR).
This resulted in masking of bit 2 - that is, SMI instead of VINTR.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <49b9571d25588870db5380b0be1a41df4bbaaf93.1638486479.git.maciej.szmigiero@oracle.com>

ee3a4f66

tools/lib/lockdep: drop leftover liblockdep headers · 3a49cc22

Sasha Levin authored Dec 09, 2021

Clean up remaining headers that are specific to liblockdep but lived in
the shared header directory.  These are all unused after the liblockdep
code was removed in commit 7246f4dc ("tools/lib/lockdep: drop
liblockdep").

Note that there are still headers that were originally created for
liblockdep, that still have liblockdep references, but they are used by
other tools/ code at this point.
Signed-off-by: Sasha Levin <sashal@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3a49cc22

net: dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports · 04ec4e62

Russell King (Oracle) authored Dec 09, 2021

Martyn Welch reports that his CPU port is unable to link where it has
been necessary to use one of the switch ports with an internal PHY for
the CPU port. The reason behind this is the port control register is
left forcing the link down, preventing traffic flow.

This occurs because during initialisation, phylink expects the link to
be down, and DSA forces the link down by synthesising a call to the
DSA drivers phylink_mac_link_down() method, but we don't touch the
forced-link state when we later reconfigure the port.

Resolve this by also unforcing the link state when we are operating in
PHY mode and the PPU is set to poll the PHY to retrieve link status
information.
Reported-by: Martyn Welch <martyn.welch@collabora.com>
Tested-by: Martyn Welch <martyn.welch@collabora.com>
Fixes: 3be98b2d ("net: dsa: Down cpu/dsa ports phylink will control")
Cc: <stable@vger.kernel.org> # 5.7: 2b29cb9e: net: dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1mvFhP-00F8Zb-Ul@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>

04ec4e62

Merge branch 'net-wwan-iosm-bug-fixes' · 19961780

Jakub Kicinski authored Dec 09, 2021

M Chetan Kumar says:

====================
net: wwan: iosm: bug fixes

This patch series brings in IOSM driver bug fixes. Patch details are
explained below.

PATCH1: stop sending unnecessary doorbell in IP tx flow.
PATCH2: Restore the IP channel configuration after fw flash.
PATCH3: Removed the unnecessary check around control port TX transfer.
====================

Link: https://lore.kernel.org/r/20211209101629.2940877-1-m.chetan.kumar@linux.intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

19961780

net: wwan: iosm: fixes unable to send AT command during mbim tx · 383451ce

M Chetan Kumar authored Dec 09, 2021

ev_cdev_write_pending flag is preventing a TX message post for
AT port while MBIM transfer is ongoing.

Removed the unnecessary check around control port TX transfer.
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

383451ce

net: wwan: iosm: fixes net interface nonfunctional after fw flash · 07d3f274

M Chetan Kumar authored Dec 09, 2021

Devlink initialization flow was overwriting the IP traffic
channel configuration. This was causing wwan0 network interface
to be unusable after fw flash.

When device boots to fully functional mode restore the IP channel
configuration.
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

07d3f274

net: wwan: iosm: fixes unnecessary doorbell send · 373f121a

M Chetan Kumar authored Dec 09, 2021

In TX packet accumulation flow transport layer is
giving a doorbell to device even though there is
no pending control TX transfer that needs immediate
attention.

Introduced a new hpda_ctrl_pending variable to keep
track of pending control TX transfer. If there is a
pending control TX transfer which needs an immediate
attention only then give a doorbell to device.
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

373f121a

net: dsa: felix: Fix memory leak in felix_setup_mmio_filtering · e8b1d769

José Expósito authored Dec 09, 2021

Avoid a memory leak if there is not a CPU port defined.

Fixes: 8d5f7954 ("net: dsa: felix: break at first CPU port during init and teardown")
Addresses-Coverity-ID: 1492897 ("Resource leak")
Addresses-Coverity-ID: 1492899 ("Resource leak")
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20211209110538.11585-1-jose.exposito89@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e8b1d769

MAINTAINERS: s390/net: remove myself as maintainer · 37ad4e2a

Julian Wiedmann authored Dec 09, 2021

I won't have access to the relevant HW and docs much longer.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Link: https://lore.kernel.org/r/20211209153546.1152921-1-jwi@linux.ibm.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

37ad4e2a

net/sched: fq_pie: prevent dismantle issue · 61c24026

Eric Dumazet authored Dec 09, 2021

For some reason, fq_pie_destroy() did not copy
working code from pie_destroy() and other qdiscs,
thus causing elusive bug.

Before calling del_timer_sync(&q->adapt_timer),
we need to ensure timer will not rearm itself.

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:    0-....: (4416 ticks this GP) idle=60d/1/0x4000000000000000 softirq=10433/10434 fqs=2579
        (t=10501 jiffies g=13085 q=3989)
NMI backtrace for cpu 0
CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 5.16.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343
 print_cpu_stall kernel/rcu/tree_stall.h:627 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:711 [inline]
 rcu_pending kernel/rcu/tree.c:3878 [inline]
 rcu_sched_clock_irq.cold+0x9d/0x746 kernel/rcu/tree.c:2597
 update_process_times+0x16d/0x200 kernel/time/timer.c:1785
 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
 tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428
 __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
 __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749
 hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
 __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103
 sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:write_comp_data kernel/kcov.c:221 [inline]
RIP: 0010:__sanitizer_cov_trace_const_cmp1+0x1d/0x80 kernel/kcov.c:273
Code: 54 c8 20 48 89 10 c3 66 0f 1f 44 00 00 53 41 89 fb 41 89 f1 bf 03 00 00 00 65 48 8b 0c 25 40 70 02 00 48 89 ce 4c 8b 54 24 08 <e8> 4e f7 ff ff 84 c0 74 51 48 8b 81 88 15 00 00 44 8b 81 84 15 00
RSP: 0018:ffffc90000d27b28 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff888064bf1bf0 RCX: ffff888011928000
RDX: ffff888011928000 RSI: ffff888011928000 RDI: 0000000000000003
RBP: ffff888064bf1c28 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff875d8295 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8880783dd300 R14: 0000000000000000 R15: 0000000000000000
 pie_calculate_probability+0x405/0x7c0 net/sched/sch_pie.c:418
 fq_pie_timer+0x170/0x2a0 net/sched/sch_fq_pie.c:383
 call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1421
 expire_timers kernel/time/timer.c:1466 [inline]
 __run_timers.part.0+0x675/0xa20 kernel/time/timer.c:1734
 __run_timers kernel/time/timer.c:1715 [inline]
 run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1747
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
 run_ksoftirqd kernel/softirq.c:921 [inline]
 run_ksoftirqd+0x2d/0x60 kernel/softirq.c:913
 smpboot_thread_fn+0x645/0x9c0 kernel/smpboot.c:164
 kthread+0x405/0x4f0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>

Fixes: ec97ecf1 ("net: sched: add Flow Queue PIE packet scheduler")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Cc: Sachin D. Patil <sdp.sachin@gmail.com>
Cc: V. Saicharan <vsaicharan1998@gmail.com>
Cc: Mohit Bhasi <mohitbhasi1998@gmail.com>
Cc: Leslie Monis <lesliemonis@gmail.com>
Cc: Gautam Ramakrishnan <gautamramk@gmail.com>
Link: https://lore.kernel.org/r/20211209084937.3500020-1-eric.dumazet@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

61c24026

net: mana: Fix memory leak in mana_hwc_create_wq · 9acfc57f

José Expósito authored Dec 08, 2021

If allocating the DMA buffer fails, mana_hwc_destroy_wq was called
without previously storing the pointer to the queue.

In order to avoid leaking the pointer to the queue, store it as soon as
it is allocated.

Addresses-Coverity-ID: 1484720 ("Resource leak")
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/20211208223723.18520-1-jose.exposito89@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

9acfc57f

seg6: fix the iif in the IPv6 socket control block · ae68d933

Andrea Mayer authored Dec 08, 2021

When an IPv4 packet is received, the ip_rcv_core(...) sets the receiving
interface index into the IPv4 socket control block (v5.16-rc4,
net/ipv4/ip_input.c line 510):

    IPCB(skb)->iif = skb->skb_iif;

If that IPv4 packet is meant to be encapsulated in an outer IPv6+SRH
header, the seg6_do_srh_encap(...) performs the required encapsulation.
In this case, the seg6_do_srh_encap function clears the IPv6 socket control
block (v5.16-rc4 net/ipv6/seg6_iptunnel.c line 163):

    memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));

The memset(...) was introduced in commit ef489749 ("ipv6: sr: clear
IP6CB(skb) on SRH ip4ip6 encapsulation") a long time ago (2019-01-29).

Since the IPv6 socket control block and the IPv4 socket control block share
the same memory area (skb->cb), the receiving interface index info is lost
(IP6CB(skb)->iif is set to zero).

As a side effect, that condition triggers a NULL pointer dereference if
commit 0857d6f8 ("ipv6: When forwarding count rx stats on the orig
netdev") is applied.

To fix that issue, we set the IP6CB(skb)->iif with the index of the
receiving interface once again.

Fixes: ef489749 ("ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation")
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20211208195409.12169-1-andrea.mayer@uniroma2.itSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ae68d933

nfp: Fix memory leak in nfp_cpp_area_cache_add() · c56c9630

Jianglei Nie authored Dec 09, 2021

In line 800 (#1), nfp_cpp_area_alloc() allocates and initializes a
CPP area structure. But in line 807 (#2), when the cache is allocated
failed, this CPP area structure is not freed, which will result in
memory leak.

We can fix it by freeing the CPP area when the cache is allocated
failed (#2).

792 int nfp_cpp_area_cache_add(struct nfp_cpp *cpp, size_t size)
793 {
794 	struct nfp_cpp_area_cache *cache;
795 	struct nfp_cpp_area *area;

800	area = nfp_cpp_area_alloc(cpp, NFP_CPP_ID(7, NFP_CPP_ACTION_RW, 0),
801 				  0, size);
	// #1: allocates and initializes

802 	if (!area)
803 		return -ENOMEM;

805 	cache = kzalloc(sizeof(*cache), GFP_KERNEL);
806 	if (!cache)
807 		return -ENOMEM; // #2: missing free

817	return 0;
818 }

Fixes: 4cb584e0 ("nfp: add CPP access core")
Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20211209061511.122535-1-niejianglei2021@163.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c56c9630