Commits · 192ad3c27a4895ee4b2fa31c5b54a932f5bb08c1 · Kirill Smelkov / linux

07 Sep, 2021 14 commits

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 192ad3c2

Linus Torvalds authored Sep 07, 2021

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - Page ownership tracking between host EL1 and EL2
   - Rely on userspace page tables to create large stage-2 mappings
   - Fix incompatibility between pKVM and kmemleak
   - Fix the PMU reset state, and improve the performance of the virtual
     PMU
   - Move over to the generic KVM entry code
   - Address PSCI reset issues w.r.t. save/restore
   - Preliminary rework for the upcoming pKVM fixed feature
   - A bunch of MM cleanups
   - a vGIC fix for timer spurious interrupts
   - Various cleanups

  s390:
   - enable interpretation of specification exceptions
   - fix a vcpu_idx vs vcpu_id mixup

  x86:
   - fast (lockless) page fault support for the new MMU
   - new MMU now the default
   - increased maximum allowed VCPU count
   - allow inhibit IRQs on KVM_RUN while debugging guests
   - let Hyper-V-enabled guests run with virtualized LAPIC as long as
     they do not enable the Hyper-V "AutoEOI" feature
   - fixes and optimizations for the toggling of AMD AVIC (virtualized
     LAPIC)
   - tuning for the case when two-dimensional paging (EPT/NPT) is
     disabled
   - bugfixes and cleanups, especially with respect to vCPU reset and
     choosing a paging mode based on CR0/CR4/EFER
   - support for 5-level page table on AMD processors

  Generic:
   - MMU notifier invalidation callbacks do not take mmu_lock unless
     necessary
   - improved caching of LRU kvm_memory_slot
   - support for histogram statistics
   - add statistics for halt polling and remote TLB flush requests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits)
  KVM: Drop unused kvm_dirty_gfn_invalid()
  KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
  KVM: MMU: mark role_regs and role accessors as maybe unused
  KVM: MIPS: Remove a "set but not used" variable
  x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait
  KVM: stats: Add VM stat for remote tlb flush requests
  KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count()
  KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page
  KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality
  Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"
  KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page
  kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710
  kvm: x86: Increase MAX_VCPUS to 1024
  kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS
  KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
  KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host
  KVM: s390: index kvm->arch.idle_mask by vcpu_idx
  KVM: s390: Enable specification exception interpretation
  KVM: arm64: Trim guest debug exception handling
  KVM: SVM: Add 5-level page table support for SVM
  ...

192ad3c2

Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging · a2b28235

Linus Torvalds authored Sep 07, 2021

Pull dmi fix from Jean Delvare.

Unbreak some existing udev/hwdb modalias matches due to misplaced
product_sku field.

* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  firmware: dmi: Move product_sku info to the end of the modalias

a2b28235

Merge tag 'ntb-5.15' of git://github.com/jonmason/ntb · 1735715e

Linus Torvalds authored Sep 07, 2021

Pull NTB updates from Jon Mason:
 "Bug fixes and clean-ups for Linux v5.15"

* tag 'ntb-5.15' of git://github.com/jonmason/ntb:
  NTB: switch from 'pci_' to 'dma_' API
  ntb: ntb_pingpong: remove redundant initialization of variables msg_data and spad_data
  NTB: perf: Fix an error code in perf_setup_inbuf()
  NTB: Fix an error code in ntb_msit_probe()
  ntb: intel: remove invalid email address in header comment

1735715e

Merge tag 'rproc-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc · 21f577b0

Linus Torvalds authored Sep 07, 2021

Pull remoteproc updates from Bjorn Andersson:

 - move the crash recovery worker to the freezable work queue to avoid
   interaction with other drivers during suspend & resume

 - fix a couple of typos in comments

 - add support for handling the audio DSP on SDM660

 - fix a race between the Qualcomm wireless subsystem driver and the
   associated driver for the RF chip

* tag 'rproc-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
  remoteproc: q6v5_pas: Add sdm660 ADSP PIL compatible
  dt-bindings: remoteproc: qcom: adsp: Add SDM660 ADSP
  remoteproc: use freezable workqueue for crash notifications
  remoteproc: fix kernel doc for struct rproc_ops
  remoteproc: fix an typo in fw_elf_get_class code comments
  remoteproc: qcom: wcnss: Fix race with iris probe

21f577b0

Merge tag 'backlight-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight · 2d7b4cdb

Linus Torvalds authored Sep 07, 2021

Pull backlight updates from Lee Jones:
 "Fix-ups:
   - Improve bootloader/kernel device handover

  Bug Fixes:
   - Stabilise backlight in ktd253 driver"

* tag 'backlight-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
  backlight: pwm_bl: Improve bootloader/kernel device handover
  backlight: ktd253: Stabilize backlight

2d7b4cdb

Merge tag 'mfd-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd · 86406a9e

Linus Torvalds authored Sep 07, 2021

Pull MFD updates from Lee Jones:
 "Core Frameworks:
   - Add support for registering devices via MFD cells to Simple MFD (I2C)

  New Drivers:
   - Add support for Renesas Synchronization Management Unit (SMU)

  New Device Support:
   - Add support for N5010 to Intel M10 BMC
   - Add support for Cannon Lake to Intel LPSS ACPI
   - Add support for Samsung SSG{1,2} to ST-Ericsson's U8500 family
   - Add support for TQMx110EB and TQMxE40x to TQ-Systems PLD TQMx86

  New Functionality:
   - Add support for GPIO to Intel LPC ICH
   - Add support for Reset to Texas Instruments TPS65086

  Fix-ups:
   - Trivial, sorting, whitespace, renaming, etc; mt6360-core, db8500-prcmu-regs, tqmx86
   - Device Tree fiddling; syscon, axp20x, qcom,pm8008, ti,tps65086, brcm,cru
   - Use proper APIs for IRQ map resolution; ab8500-core, stmpe, tc3589x, wm8994-irq
   - Pass 'supplied-from' property through axp288_fuel_gauge via swnode
   - Remove unused file entry; MAINTAINERS
   - Make interrupt line optional; tps65086
   - Rename db8500-cpuidle driver symbol; db8500-prcmu
   - Remove support for unused hardware; tqmx86
   - Provide a standard LPC clock frequency for unknown boards; tqmx86
   - Remove unused code; ti_am335x_tscadc
   - Use of_iomap() instead of ioremap(); syscon

  Bug Fixes:
   - Clear GPIO IRQ resource flags when no IRQ is set; tqmx86
   - Fix incorrect/misleading frequencies; db8500-prcmu
   - Mitigate namespace clash with other GPIOBASE users"

* tag 'mfd-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (31 commits)
  mfd: lpc_sch: Rename GPIOBASE to prevent build error
  mfd: syscon: Use of_iomap() instead of ioremap()
  dt-bindings: mfd: Add Broadcom CRU
  mfd: ti_am335x_tscadc: Delete superfluous error message
  mfd: tqmx86: Assume 24MHz LPC clock for unknown boards
  mfd: tqmx86: Add support for TQ-Systems DMI IDs
  mfd: tqmx86: Add support for TQMx110EB and TQMxE40x
  mfd: tqmx86: Fix typo in "platform"
  mfd: tqmx86: Remove incorrect TQMx90UC board ID
  mfd: tqmx86: Clear GPIO IRQ resource when no IRQ is set
  mfd: simple-mfd-i2c: Add support for registering devices via MFD cells
  mfd/cpuidle: ux500: Rename driver symbol
  mfd: tps65086: Add cell entry for reset driver
  mfd: tps65086: Make interrupt line optional
  dt-bindings: mfd: Convert tps65086.txt to YAML
  MAINTAINERS: Adjust ARM/NOMADIK/Ux500 ARCHITECTURES to file renaming
  mfd: db8500-prcmu: Handle missing FW variant
  mfd: db8500-prcmu: Rename register header
  mfd: axp20x: Add supplied-from property to axp288_fuel_gauge cell
  mfd: Don't use irq_create_mapping() to resolve a mapping
  ...

86406a9e

Merge tag 'gpio-updates-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 5e6a5845

Linus Torvalds authored Sep 07, 2021

Pull gpio updates from Bartosz Golaszewski:
 "We mostly have various improvements and refactoring all over the place
  but also some interesting new features - like the virtio GPIO driver
  that allows guest VMs to use host's GPIOs. We also have a new/old GPIO
  driver for rockchip - this one has been split out of the pinctrl
  driver.

  Summary:

   - new driver: gpio-virtio allowing a guest VM running linux to access
     GPIO lines provided by the host

   - split the GPIO driver out of the rockchip pin control driver

   - add support for a new model to gpio-aspeed-sgpio, refactor the
     driver and use generic device property interfaces, improve property
     sanitization

   - add ACPI support to gpio-tegra186

   - improve the code setting the line names to support multiple GPIO
     banks per device

   - constify a bunch of OF functions in the core GPIO code and make the
     declaration for one of the core OF functions we use consistent
     within its header

   - use software nodes in intel_quark_i2c_gpio

   - add support for the gpio-line-names property in gpio-mt7621

   - use the standard GPIO function for setting the GPIO names in
     gpio-brcmstb

   - fix a bunch of leaks and other bugs in gpio-mpc8xxx

   - use generic pm callbacks in gpio-ml-ioh

   - improve resource management and PM handling in gpio-mlxbf2

   - modernize and improve the gpio-dwapb driver

   - coding style improvements in gpio-rcar

   - documentation fixes and improvements

   - update the MAINTAINERS entry for gpio-zynq

   - minor tweaks in several drivers"

* tag 'gpio-updates-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (35 commits)
  gpio: mpc8xxx: Use 'devm_gpiochip_add_data()' to simplify the code and avoid a leak
  gpio: mpc8xxx: Fix a potential double iounmap call in 'mpc8xxx_probe()'
  gpio: mpc8xxx: Fix a resources leak in the error handling path of 'mpc8xxx_probe()'
  gpio: viperboard: remove platform_set_drvdata() call in probe
  gpio: virtio: Add missing mailings lists in MAINTAINERS entry
  gpio: virtio: Fix sparse warnings
  gpio: remove the obsolete MX35 3DS BOARD MC9S08DZ60 GPIO functions
  gpio: max730x: Use the right include
  gpio: Add virtio-gpio driver
  gpio: mlxbf2: Use DEFINE_RES_MEM_NAMED() helper macro
  gpio: mlxbf2: Use devm_platform_ioremap_resource()
  gpio: mlxbf2: Drop wrong use of ACPI_PTR()
  gpio: mlxbf2: Convert to device PM ops
  gpio: dwapb: Get rid of legacy platform data
  mfd: intel_quark_i2c_gpio: Convert GPIO to use software nodes
  gpio: dwapb: Read GPIO base from gpio-base property
  gpio: dwapb: Unify ACPI enumeration checks in get_irq() and configure_irqs()
  gpiolib: Deduplicate forward declaration in the consumer.h header
  MAINTAINERS: update gpio-zynq.yaml reference
  gpio: tegra186: Add ACPI support
  ...

5e6a5845

Merge tag 'fuse-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · 75b96f0e

Linus Torvalds authored Sep 07, 2021

Pull fuse updates from Miklos Szeredi:

 - Allow mounting an active fuse device. Previously the fuse device
   would always be mounted during initialization, and sharing a fuse
   superblock was only possible through mount or namespace cloning

 - Fix data flushing in syncfs (virtiofs only)

 - Fix data flushing in copy_file_range()

 - Fix a possible deadlock in atomic O_TRUNC

 - Misc fixes and cleanups

* tag 'fuse-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: remove unused arg in fuse_write_file_get()
  fuse: wait for writepages in syncfs
  fuse: flush extending writes
  fuse: truncate pagecache on atomic_o_trunc
  fuse: allow sharing existing sb
  fuse: move fget() to fuse_get_tree()
  fuse: move option checking into fuse_fill_super()
  fuse: name fs_context consistently
  fuse: fix use after free in fuse_read_interrupt()

75b96f0e

Merge tag 'kgdb-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux · 996fe061

Linus Torvalds authored Sep 07, 2021

Pull kgdb updates from Daniel Thompson:
 "Changes for kgdb/kdb this cycle are dominated by a change from Sumit
  that removes as small (256K) private heap from kdb. This is change
  I've hoped for ever since I discovered how few users of this heap
  remained in the kernel, so many thanks to Sumit for hunting these
  down.

  The other change is an incremental step towards SPDX headers"

* tag 'kgdb-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
  kernel: debug: Convert to SPDX identifier
  kdb: Rename members of struct kdbtab_t
  kdb: Simplify kdb_defcmd macro logic
  kdb: Get rid of redundant kdb_register_flags()
  kdb: Rename struct defcmd_set to struct kdb_macro
  kdb: Get rid of custom debug heap allocator

996fe061

Revert "memcg: enable accounting for pollfd and select bits arrays" · 0bcfe68b

Linus Torvalds authored Sep 07, 2021

This reverts commit b6558434.

Just like with the memcg lock accounting, the kernel test robot reports
a sizeable performance regression for this commit, and while it clearly
does the rigth thing in theory, we'll need to look at just how to avoid
or minimize the performance overhead of the memcg accounting.

People already have suggestions on how to do that, but it's "future
work".

So revert it for now.

[ Note: the first link below is for this same commit but a different
  commit ID, because it's the kernel test robot ended up noticing it in
  Andrew Morton's patch queue ]

Link: https://lore.kernel.org/lkml/20210905132732.GC15026@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/Acked-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0bcfe68b

Revert "memcg: enable accounting for file lock caches" · 3754707b

Linus Torvalds authored Sep 07, 2021

This reverts commit 0f12156d.

The kernel test robot reports a sizeable performance regression for this
commit, and while it clearly does the rigth thing in theory, we'll need
to look at just how to avoid or minimize the performance overhead of the
memcg accounting.

People already have suggestions on how to do that, but it's "future
work".

So revert it for now.

Link: https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/Acked-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3754707b

Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly" · cd1adf1b

Linus Torvalds authored Sep 07, 2021

This reverts commit 9857a17f.

That commit was completely broken, and I should have caught on to it
earlier.  But happily, the kernel test robot noticed the breakage fairly
quickly.

The breakage is because "try_get_page()" is about avoiding the page
reference count overflow case, but is otherwise the exact same as a
plain "get_page()".

In contrast, "try_get_compound_head()" is an entirely different beast,
and uses __page_cache_add_speculative() because it's not just about the
page reference count, but also about possibly racing with the underlying
page going away.

So all the commentary about how

 "try_get_page() has fallen a little behind in terms of maintenance,
  try_get_compound_head() handles speculative page references more
  thoroughly"

was just completely wrong: yes, try_get_compound_head() handles
speculative page references, but the point is that try_get_page() does
not, and must not.

So there's no lack of maintainance - there are fundamentally different
semantics.

A speculative page reference would be entirely wrong in "get_page()",
and it's entirely wrong in "try_get_page()".  It's not about
speculation, it's purely about "uhhuh, you can't get this page because
you've tried to increment the reference count too much already".

The reason the kernel test robot noticed this bug was that it hit the
VM_BUG_ON() in __page_cache_add_speculative(), which is all about
verifying that the context of any speculative page access is correct.
But since that isn't what try_get_page() is all about, the VM_BUG_ON()
tests things that are not correct to test for try_get_page().
Reported-by: kernel test robot <oliver.sang@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cd1adf1b

mfd: lpc_sch: Rename GPIOBASE to prevent build error · cdff1eda

Randy Dunlap authored Sep 06, 2021

One MIPS platform (mach-rc32434) defines GPIOBASE. This macro
conflicts with one of the same name in lpc_sch.c. Rename the latter one
to prevent the build error.

../drivers/mfd/lpc_sch.c:25: error: "GPIOBASE" redefined [-Werror]
   25 | #define GPIOBASE        0x44
../arch/mips/include/asm/mach-rc32434/rb.h:32: note: this is the location of the previous definition
   32 | #define GPIOBASE        0x050000

Cc: Denis Turischev <denis@compulab.co.il>
Fixes: e82c60ae ("mfd: Introduce lpc_sch for Intel SCH LPC bridge")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>

cdff1eda

mfd: syscon: Use of_iomap() instead of ioremap() · 452d0741

Hector Martin authored Aug 23, 2021

This automatically selects between ioremap() and ioremap_np() on
platforms that require it, such as Apple SoCs.
Signed-off-by: Hector Martin <marcan@marcan.st>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>

452d0741

06 Sep, 2021 26 commits

thunderbolt: test: split up test cases in tb_test_credit_alloc_all · 4b93c544

Linus Torvalds authored Sep 06, 2021

The tb_test_credit_alloc_all() function had a huge number of
KUNIT_ASSERT() statements, all of which (though the magic of many many
layers of inscrutable macros) ended up allocating and initializing
various test assertion structures on the stack.

Don't do that.  The kernel stack isn't infinite, and we have compiler
warnings (now errors) for the case where a stack frame grows too large.

Like it did here, by not an inconsiderable margin:

   drivers/thunderbolt/test.c: In function ‘tb_test_credit_alloc_all’:
   drivers/thunderbolt/test.c:2367:1: error: the frame size of 4500 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
    2367 | }
         | ^

Solve this similarly to the lib/test_scanf case: split out the tests
into several smaller functions, each just testing one particular tunnel
credit allocation.

This makes the i386 allyesconfig build work for me again.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4b93c544

lib/test_scanf: split up number parsing test routines · ba7b1f86

Linus Torvalds authored Sep 06, 2021

It turns out that gcc has real trouble merging all the temporary
on-stack buffer allocation.  So despite the fact that their lifetimes do
not overlap, gcc will allocate stack for all of them when they have
different types.  Which they do in the number scanning test routines.

This is unfortunate in general, but with lots of test-cases in one
function, it becomes a real problem.  gcc will allocate a huge stack
frame for no actual good reason.

We have tried to counteract this tendency of gcc not merging stack slots
(see "-fconserve-stack"), but that has limited effect (and should be on
by default these days, iirc).

So with all the debug options enabled on an i386 allmodconfig build, we
end up with overly big stack frames, and the resulting stack frame size
warnings (now errors):

   lib/test_scanf.c: In function ‘numbers_list_field_width_val_width’:
   lib/test_scanf.c:530:1: error: the frame size of 2088 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
     530 | }
         | ^
   lib/test_scanf.c: In function ‘numbers_list_field_width_typemax’:
   lib/test_scanf.c:488:1: error: the frame size of 2568 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
     488 | }
         | ^
   lib/test_scanf.c: In function ‘numbers_list’:
   lib/test_scanf.c:437:1: error: the frame size of 2088 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
     437 | }
         | ^

In this particular case, the reasonably straightforward solution is to
just split out the test routines into multiple more targeted versions.
That way we don't have one huge stack, but several smaller ones, and
they aren't active all at the same time.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ba7b1f86

iwl: fix debug printf format strings · 1476ff21

Linus Torvalds authored Sep 06, 2021

The variable 'package_size' is an unsigned long, and should be printed
out using '%lu', not '%zd' (that would be for a size_t).

Yes, on many architectures (including x86-64), 'size_t' is in fact the
same type as 'long', but that's a fairly random architecture definition,
and on some platforms 'size_t' is in fact 'int' rather than 'long'.

That is the case on traditional 32-bit x86.  Yes, both types are the
exact same 32-bit size, and it would all print out perfectly correctly,
but '%zd' ends up still being wrong.

And we can't make 'package_size' be a 'size_t', because we get the
actual value using efivar_entry_get() that takes a pointer to an
'unsigned long'.  So '%lu' it is.

This fixes two of the i386 allmodconfig build warnings (that is now an
error due to -Werror).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1476ff21

Merge tag 'block-5.15-2021-09-05' of git://git.kernel.dk/linux-block · 1dbe7e38

Linus Torvalds authored Sep 06, 2021

Pull block fixes from Jens Axboe:
 "Was going to send this one in later this week, but given that -Werror
  is now enabled (or at least available), the mq-deadline fix really
  should go in for the folks hitting that.

   - Ensure dd_queued() is only there if needed (Geert)

   - Fix a kerneldoc warning for bio_alloc_kiocb()

   - BFQ fix for queue merging

   - loop locking fix (Tetsuo)"

* tag 'block-5.15-2021-09-05' of git://git.kernel.dk/linux-block:
  loop: reduce the loop_ctl_mutex scope
  bio: fix kerneldoc documentation for bio_alloc_kiocb()
  block, bfq: honor already-setup queue merges
  block/mq-deadline: Move dd_queued() to fix defined but not used warning

1dbe7e38

Merge tag 'misc-5.15-2021-09-05' of git://git.kernel.dk/linux-block · 03085b3d

Linus Torvalds authored Sep 06, 2021

Pull CDROM maintainer update from Jens Axboe:
 "It's been about 22 years since I originally started maintaining the
  CDROM code, and I just haven't been able to even get reviews done in a
  timely fashion the last handful of years.

  Time to pass it on, and Phillip has volunteered take over these
  duties. I'll be helping as needed for the foreseeable future"

* tag 'misc-5.15-2021-09-05' of git://git.kernel.dk/linux-block:
  cdrom: update uniform CD-ROM maintainership in MAINTAINERS file

03085b3d

Merge tag 'libata-5.15-2021-09-05' of git://git.kernel.dk/linux-block · eebb4159

Linus Torvalds authored Sep 06, 2021

Pull libata fixes from Jens Axboe:
 "Fixes for queued trim on certain Samsung SSDs, in conjunction with
  certain ATI controllers"

* tag 'libata-5.15-2021-09-05' of git://git.kernel.dk/linux-block:
  libata: Add ATA_HORKAGE_NO_NCQ_ON_ATI for Samsung 860 and 870 SSD.
  libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs

eebb4159

Merge tag 'for-5.15/io_uring-2021-09-04' of git://git.kernel.dk/linux-block · 60f8fbaa

Linus Torvalds authored Sep 06, 2021

Pull io_uring fixes from Jens Axboe:
 "As sometimes happens, two reports came in around the merge window open
  that led to some fixes. Hence this one is a bit bigger than usual
  followup fixes, but most of it will be going towards stable, outside
  of the fixes that are addressing regressions from this merge window.

  In detail:

   - postgres is a heavy user of signals between tasks, and if we're
     unlucky this can interfere with io-wq worker creation. Make sure
     we're resilient against unrelated signal handling. This set of
     changes also includes hardening against allocation failures, which
     could previously had led to stalls.

   - Some use cases that end up having a mix of bounded and unbounded
     work would have starvation issues related to that. Split the
     pending work lists to handle that better.

   - Completion trace int -> unsigned -> long fix

   - Fix issue with REGISTER_IOWQ_MAX_WORKERS and SQPOLL

   - Fix regression with hash wait lock in this merge window

   - Fix retry issued on block devices (Ming)

   - Fix regression with links in this merge window (Pavel)

   - Fix race with multi-shot poll and completions (Xiaoguang)

   - Ensure regular file IO doesn't inadvertently skip completion
     batching (Pavel)

   - Ensure submissions are flushed after running task_work (Pavel)"

* tag 'for-5.15/io_uring-2021-09-04' of git://git.kernel.dk/linux-block:
  io_uring: io_uring_complete() trace should take an integer
  io_uring: fix possible poll event lost in multi shot mode
  io_uring: prolong tctx_task_work() with flushing
  io_uring: don't disable kiocb_done() CQE batching
  io_uring: ensure IORING_REGISTER_IOWQ_MAX_WORKERS works with SQPOLL
  io-wq: make worker creation resilient against signals
  io-wq: get rid of FIXED worker flag
  io-wq: only exit on fatal signals
  io-wq: split bounded and unbounded work into separate lists
  io-wq: fix queue stalling race
  io_uring: don't submit half-prepared drain request
  io_uring: fix queueing half-created requests
  io-wq: ensure that hash wait lock is IRQ disabling
  io_uring: retry in case of short read on block device
  io_uring: IORING_OP_WRITE needs hash_reg_file set
  io-wq: fix race between adding work and activating a free worker

60f8fbaa

don't make the syscall checking produce errors from warnings · 20fbb11f

Stephen Rothwell authored Sep 06, 2021

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

20fbb11f

kernel: debug: Convert to SPDX identifier · f8416aa2

Cai Huoqing authored Sep 06, 2021

use SPDX-License-Identifier instead of a verbose license text
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Link: https://lore.kernel.org/r/20210906112302.937-1-caihuoqing@baidu.comSigned-off-by: Daniel Thompson <daniel.thompson@linaro.org>

f8416aa2

KVM: Drop unused kvm_dirty_gfn_invalid() · 109bbba5

Peter Xu authored Sep 01, 2021

Drop the unused function as reported by test bot.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210901230904.15164-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

109bbba5

fuse: remove unused arg in fuse_write_file_get() · a9667ac8

Miklos Szeredi authored Sep 01, 2021

The struct fuse_conn argument is not used and can be removed.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

a9667ac8

fuse: wait for writepages in syncfs · 660585b5

Miklos Szeredi authored Sep 01, 2021

In case of fuse the MM subsystem doesn't guarantee that page writeback
completes by the time ->sync_fs() is called.  This is because fuse
completes page writeback immediately to prevent DoS of memory reclaim by
the userspace file server.

This means that fuse itself must ensure that writes are synced before
sending the SYNCFS request to the server.

Introduce sync buckets, that hold a counter for the number of outstanding
write requests.  On syncfs replace the current bucket with a new one and
wait until the old bucket's counter goes down to zero.

It is possible to have multiple syncfs calls in parallel, in which case
there could be more than one waited-on buckets.  Descendant buckets must
not complete until the parent completes.  Add a count to the child (new)
bucket until the (parent) old bucket completes.

Use RCU protection to dereference the current bucket and to wake up an
emptied bucket.  Use fc->lock to protect against parallel assignments to
the current bucket.

This leaves just the counter to be a possible scalability issue.  The
fc->num_waiting counter has a similar issue, so both should be addressed at
the same time.
Reported-by: Amir Goldstein <amir73il@gmail.com>
Fixes: 2d82ab25 ("virtiofs: propagate sync() to file server")
Cc: <stable@vger.kernel.org> # v5.14
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

660585b5

KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted · d9130a2d

Zelin Deng authored Apr 28, 2021

When MSR_IA32_TSC_ADJUST is written by guest due to TSC ADJUST feature
especially there's a big tsc warp (like a new vCPU is hot-added into VM
which has been up for a long time), tsc_offset is added by a large value
then go back to guest. This causes system time jump as tsc_timestamp is
not adjusted in the meantime and pvclock monotonic character.
To fix this, just notify kvm to update vCPU's guest time before back to
guest.

Cc: stable@vger.kernel.org
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1619576521-81399-2-git-send-email-zelin.deng@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d9130a2d

KVM: MMU: mark role_regs and role accessors as maybe unused · 4ac21457

Paolo Bonzini authored Sep 06, 2021

It is reasonable for these functions to be used only in some configurations,
for example only if the host is 64-bits (and therefore supports 64-bit
guests). It is also reasonable to keep the role_regs and role accessors
in sync even though some of the accessors may be used only for one of the
two sets (as is the case currently for CR4.LA57)..

Because clang reports warnings for unused inlines declared in a .c file,
mark both sets of accessors as __maybe_unused.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4ac21457

KVM: MIPS: Remove a "set but not used" variable · a3cf527e

Huacai Chen authored Apr 06, 2021

This fix a build warning:

   arch/mips/kvm/vz.c: In function '_kvm_vz_restore_htimer':
>> arch/mips/kvm/vz.c:392:10: warning: variable 'freeze_time' set but not used [-Wunused-but-set-variable]
     392 |  ktime_t freeze_time;
         |          ^~~~~~~~~~~
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Message-Id: <20210406024911.2008046-1-chenhuacai@loongson.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a3cf527e

Merge tag 'kvmarm-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD · e99314a3

Paolo Bonzini authored Sep 06, 2021

KVM/arm64 updates for 5.15

- Page ownership tracking between host EL1 and EL2

- Rely on userspace page tables to create large stage-2 mappings

- Fix incompatibility between pKVM and kmemleak

- Fix the PMU reset state, and improve the performance of the virtual PMU

- Move over to the generic KVM entry code

- Address PSCI reset issues w.r.t. save/restore

- Preliminary rework for the upcoming pKVM fixed feature

- A bunch of MM cleanups

- a vGIC fix for timer spurious interrupts

- Various cleanups

e99314a3

Merge tag 'kvm-s390-next-5.15-1' of... · 0d0a1939

Paolo Bonzini authored Sep 06, 2021

Merge tag 'kvm-s390-next-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fix and feature for 5.15

- enable interpretion of specification exceptions
- fix a vcpu_idx vs vcpu_id mixup

0d0a1939

x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait · a40b2fd0

Lai Jiangshan authored Aug 14, 2021

Commit f4e61f0c ("x86/kvm: Fix broken irq restoration in kvm_wait")
replaced "local_irq_restore() when IRQ enabled" with "local_irq_enable()
when IRQ enabled" to suppress a warnning.

Although there is no similar debugging warnning for doing local_irq_enable()
when IRQ enabled as doing local_irq_restore() in the same IRQ situation.  But
doing local_irq_enable() when IRQ enabled is no less broken as doing
local_irq_restore() and we'd better avoid it.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20210814035129.154242-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a40b2fd0

KVM: stats: Add VM stat for remote tlb flush requests · 3cc4e148

Jing Zhang authored Aug 17, 2021

Add a new stat that counts the number of times a remote TLB flush is
requested, regardless of whether it kicks vCPUs out of guest mode. This
allows us to look at how often flushes are initiated.

Unlike remote_tlb_flush, this one applies to ARM's instruction-set-based
TLB flush implementation, so apply it there too.
Original-by: David Matlack <dmatlack@google.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210817002639.3856694-1-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3cc4e148

KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count() · fdde13c1

Sean Christopherson authored Sep 02, 2021

Don't export KVM's MMU notifier count helpers, under no circumstance
should any downstream module, including x86's vendor code, have a
legitimate reason to piggyback KVM's MMU notifier logic.  E.g in the x86
case, only KVM's MMU should be elevating the notifier count, and that
code is always built into the core kvm.ko module.

Fixes: edb298c6 ("KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210902175951.1387989-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fdde13c1

KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page · 1148bfc4

Sean Christopherson authored Sep 01, 2021

Move "lpage_disallowed_link" out of the first 64 bytes, i.e. out of the
first cache line, of kvm_mmu_page so that "spt" and to a lesser extent
"gfns" land in the first cache line.  "lpage_disallowed_link" is accessed
relatively infrequently compared to "spt", which is accessed any time KVM
is walking and/or manipulating the shadow page tables.

No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210901221023.1303578-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1148bfc4

KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality · ca41c34c

Sean Christopherson authored Sep 01, 2021

Move "tdp_mmu_page" into the 1-byte void left by the recently removed
"mmio_cached" so that it resides in the first 64 bytes of kvm_mmu_page,
i.e. in the same cache line as the most commonly accessed fields.

Don't bother wrapping tdp_mmu_page in CONFIG_X86_64, including the field in
32-bit builds doesn't affect the size of kvm_mmu_page, and a future patch
can always wrap the field in the unlikely event KVM gains a 1-byte flag
that is 32-bit specific.

Note, the size of kvm_mmu_page is also unchanged on CONFIG_X86_64=y due
to it previously sharing an 8-byte chunk with write_flooding_count.

No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210901221023.1303578-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ca41c34c

Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()" · e7177339

Sean Christopherson authored Aug 31, 2021

Revert a misguided illegal GPA check when "translating" a non-nested GPA.
The check is woefully incomplete as it does not fill in @exception as
expected by all callers, which leads to KVM attempting to inject a bogus
exception, potentially exposing kernel stack information in the process.

WARNING: CPU: 0 PID: 8469 at arch/x86/kvm/x86.c:525 exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525
CPU: 1 PID: 8469 Comm: syz-executor531 Not tainted 5.14.0-rc7-syzkaller #0
RIP: 0010:exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525
Call Trace:
x86_emulate_instruction+0xef6/0x1460 arch/x86/kvm/x86.c:7853
kvm_mmu_page_fault+0x2f0/0x1810 arch/x86/kvm/mmu/mmu.c:5199
handle_ept_misconfig+0xdf/0x3e0 arch/x86/kvm/vmx/vmx.c:5336
__vmx_handle_exit arch/x86/kvm/vmx/vmx.c:6021 [inline]
vmx_handle_exit+0x336/0x1800 arch/x86/kvm/vmx/vmx.c:6038
vcpu_enter_guest+0x2a1c/0x4430 arch/x86/kvm/x86.c:9712
vcpu_run arch/x86/kvm/x86.c:9779 [inline]
kvm_arch_vcpu_ioctl_run+0x47d/0x1b20 arch/x86/kvm/x86.c:10010
kvm_vcpu_ioctl+0x49e/0xe50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3652

The bug has escaped notice because practically speaking the GPA check is
useless. The GPA check in question only comes into play when KVM is
walking guest page tables (or "translating" CR3), and KVM already handles
illegal GPA checks by setting reserved bits in rsvd_bits_mask for each
PxE, or in the case of CR3 for loading PTDPTRs, manually checks for an
illegal CR3. This particular failure doesn't hit the existing reserved
bits checks because syzbot sets guest.MAXPHYADDR=1, and IA32 architecture
simply doesn't allow for such an absurd MAXPHYADDR, e.g. 32-bit paging
doesn't define any reserved PA bits checks, which KVM emulates by only
incorporating the reserved PA bits into the "high" bits, i.e. bits 63:32.

Simply remove the bogus check. There is zero meaningful value and no
architectural justification for supporting guest.MAXPHYADDR < 32, and
properly filling the exception would introduce non-trivial complexity.

This reverts commit ec7771ab.

Fixes: ec7771ab ("KVM: x86: mmu: Add guest physical address check in translate_gpa()")
Cc: stable@vger.kernel.org
Reported-by: syzbot+200c08e88ae818f849ce@syzkaller.appspotmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210831164224.1119728-2-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e7177339

KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page · 678a305b

Jia He authored Aug 30, 2021

After reverting and restoring the fast tlb invalidation patch series,
the mmio_cached is not removed. Hence a unused field is left in
kvm_mmu_page.

Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jia He <justin.he@arm.com>
Message-Id: <20210830145336.27183-1-justin.he@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

678a305b

kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710 · 1dbaf04c

Eduardo Habkost authored Sep 03, 2021

Support for 710 VCPUs was tested by Red Hat since RHEL-8.4,
so increase KVM_SOFT_MAX_VCPUS to 710.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20210903211600.2002377-4-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1dbaf04c

kvm: x86: Increase MAX_VCPUS to 1024 · 074c82c8

Eduardo Habkost authored Sep 03, 2021

Increase KVM_MAX_VCPUS to 1024, so we can test larger VMs.

I'm not changing KVM_SOFT_MAX_VCPUS yet because I'm afraid it
might involve complicated questions around the meaning of
"supported" and "recommended" in the upstream tree.
KVM_SOFT_MAX_VCPUS will be changed in a separate patch.

For reference, visible effects of this change are:
- KVM_CAP_MAX_VCPUS will now return 1024 (of course)
- Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX
  will now be 1024
- KVM_MAX_VCPU_ID will change from 1151 to 4096
- Size of struct kvm will increase from 19328 to 22272 bytes
  (in x86_64)
- Size of struct kvm_ioapic will increase from 1780 to 5084 bytes
  (in x86_64)
- Bitmap stack variables that will grow:
  - At kvm_hv_flush_tlb() kvm_hv_send_ipi(),
    vp_bitmap[] and vcpu_bitmap[] will now be 128 bytes long
  - vcpu_bitmap at bioapic_write_indirect() will be 128 bytes long
    once patch "KVM: x86: Fix stack-out-of-bounds memory access
    from ioapic_write_indirect()" is applied
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20210903211600.2002377-3-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

074c82c8