Commits · 14a6960b3e928ccea22d687fb0626237885a20bd · Kirill Smelkov / linux

22 Dec, 2023 15 commits

cxl: Add helper function that calculate performance data for downstream ports · 14a6960b

Dave Jiang authored Dec 21, 2023

The CDAT information from the switch, Switch Scoped Latency and Bandwidth
Information Structure (SSLBIS), is parsed and stored under a cxl_dport
based on the correlated downstream port id from the SSLBIS entry. Walk
the entire CXL port paths and collect all the performance data. Also
pick up the link latency number that's stored under the dports. The
entire path PCIe bandwidth can be retrieved using the
pcie_bandwidth_available() call.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319623824.2212653.10302079766473698427.stgit@djiang5-mobl3Signed-off-by: Dan Williams <dan.j.williams@intel.com>

14a6960b

cxl: Store the access coordinates for the generic ports · 1037b82f

Dave Jiang authored Dec 21, 2023

Each CXL host bridge is represented by an ACPI0016 device. A generic port
device handle that is an ACPI device is represented by a string of
ACPI0016 device HID and UID. Create a device handle from the ACPI device
and retrieve the access coordinates from the stored memory targets. The
access coordinates are stored under the cxl_dport that is associated with
the CXL host bridge.

The access coordinates struct is dynamically allocated under cxl_dport in
order for code later on to detect whether the data exists or not.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319623196.2212653.17916695743464172534.stgit@djiang5-mobl3Signed-off-by: Dan Williams <dan.j.williams@intel.com>

1037b82f

tools/testing/cxl: Add hostbridge UID string for cxl_test mock hb devices · f2202f99

Dave Jiang authored Dec 21, 2023

In order to support acpi_device_uid() call, add static string to
acpi_device->pnp.unique_id.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319622564.2212653.1534465446670631698.stgit@djiang5-mobl3Signed-off-by: Dan Williams <dan.j.williams@intel.com>

f2202f99

cxl: Calculate and store PCI link latency for the downstream ports · 4d07a053

Dave Jiang authored Dec 21, 2023

The latency is calculated by dividing the flit size over the bandwidth. Add
support to retrieve the flit size for the CXL switch device and calculate
the latency of the PCIe link. Cache the latency number with cxl_dport.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319621931.2212653.6800240203604822886.stgit@djiang5-mobl3Signed-off-by: Dan Williams <dan.j.williams@intel.com>

4d07a053

cxl: Add support for _DSM Function for retrieving QTG ID · 79081590

Dave Jiang authored Dec 21, 2023

CXL spec v3.0 9.17.3 CXL Root Device Specific Methods (_DSM)

Add support to retrieve QTG ID via ACPI _DSM call. The _DSM call requires
an input of an ACPI package with 4 dwords (read latency, write latency,
read bandwidth, write bandwidth). The call returns a package with 1 WORD
that provides the max supported QTG ID and a package that may contain 0 or
more WORDs as the recommended QTG IDs in the recommended order.

Create a cxl_root container for the root cxl_port and provide a callback
->get_qos_class() in order to retrieve the QoS class. For the ACPI case,
the _DSM helper is used to retrieve the QTG ID and returned. A
devm_cxl_add_root() function is added for root port setup and registration
of the cxl_root callback operation(s).
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/170319621294.2212653.1649682083061569256.stgit@djiang5-mobl3Signed-off-by: Dan Williams <dan.j.williams@intel.com>

79081590

cxl: Add callback to parse the SSLBIS subtable from CDAT · 80aa780d

Dave Jiang authored Dec 21, 2023

Provide a callback to parse the Switched Scoped Latency and Bandwidth
Information Structure (SSLBIS) in the CDAT structures. The SSLBIS
contains the bandwidth and latency information that's tied to the
CXL switch that the data table has been read from. The extracted
values are stored to the cxl_dport correlated by the port_id
depending on the SSLBIS entry.

Coherent Device Attribute Table 1.03 2.1 Switched Scoped Latency
and Bandwidth Information Structure (DSLBIS)
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319620635.2212653.5194389158785365150.stgit@djiang5-mobl3Signed-off-by: Dan Williams <dan.j.williams@intel.com>

80aa780d

cxl: Add callback to parse the DSLBIS subtable from CDAT · 63cef81b

Dave Jiang authored Dec 21, 2023

Provide a callback to parse the Device Scoped Latency and Bandwidth
Information Structure (DSLBIS) in the CDAT structures. The DSLBIS
contains the bandwidth and latency information that's tied to a DSMAS
handle. The driver will retrieve the read and write latency and
bandwidth associated with the DSMAS which is tied to a DPA range.

Coherent Device Attribute Table 1.03 2.1 Device Scoped Latency and
Bandwidth Information Structure (DSLBIS)
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319620005.2212653.7475488478229720542.stgit@djiang5-mobl3Signed-off-by: Dan Williams <dan.j.williams@intel.com>

63cef81b

cxl: Add callback to parse the DSMAS subtables from CDAT · ad6f04c0

Dave Jiang authored Dec 21, 2023

Provide a callback function to the CDAT parser in order to parse the
Device Scoped Memory Affinity Structure (DSMAS). Each DSMAS structure
contains the DPA range and its associated attributes in each entry. See
the CDAT specification for details. The device handle and the DPA range
is saved and to be associated with the DSLBIS locality data when the
DSLBIS entries are parsed. The xarray is a local variable. When the
total path performance data is calculated and storred this xarray can be
discarded.

Coherent Device Attribute Table 1.03 2.1 Device Scoped memory Affinity
Structure (DSMAS)
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319619355.2212653.2675953129671561293.stgit@djiang5-mobl3Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ad6f04c0

acpi: numa: Add helper function to retrieve the performance attributes · ca53543d

Dave Jiang authored Dec 21, 2023

Add helper to retrieve the performance attributes based on the device
handle. The helper function is exported so the CXL driver can use that
to acquire the performance data between the CPU and the CXL host bridge.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/170319618721.2212653.5552947472849081786.stgit@djiang5-mobl3Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ca53543d

acpi: numa: Add setting of generic port system locality attributes · a3a3e341

Dave Jiang authored Dec 21, 2023

Add generic port support for the parsing of HMAT system locality sub-table.
The attributes will be added to the third array member of the access
coordinates in order to not mix with the existing memory attributes. It
only provides the system locality attributes from initiator to the
generic port targets and is missing the rest of the data to the actual
memory device.

The complete attributes will be updated when a memory device is
attached and the system locality information is calculated end to end.

Through hmat_update_target_attrs(), the best performance attributes will
be setup in target->coord.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/170319618135.2212653.13778540010384821833.stgit@djiang5-mobl3Signed-off-by: Dan Williams <dan.j.williams@intel.com>

a3a3e341

acpi: Break out nesting for hmat_parse_locality() · 79205651

Dave Jiang authored Dec 21, 2023

Refactor hmat_parse_locality() to break up the deep nesting of the
function.
Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/170319617537.2212653.10625501075519862509.stgit@djiang5-mobl3Signed-off-by: Dan Williams <dan.j.williams@intel.com>

79205651

acpi: numa: Add genport target allocation to the HMAT parsing · 6373c48b

Dave Jiang authored Dec 21, 2023

Add SRAT parsing for the HMAT init in order to collect the device handle
from the Generic Port Affinity Structure. The device handle will serve as
the key to search for target data.

Consolidate the common code with alloc_memory_target() in a helper function
alloc_target().
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/170319616951.2212653.14862375982250406464.stgit@djiang5-mobl3Signed-off-by: Dan Williams <dan.j.williams@intel.com>

6373c48b

acpi: numa: Create enum for memory_target access coordinates indexing · 69b789b6

Dave Jiang authored Dec 21, 2023

Create enums to provide named indexing for the access coordinate array.
This is in preparation for adding generic port support which will add a
third index in the array to keep the generic port attributes separate from
the memory attributes.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/170319616332.2212653.3872789279950567889.stgit@djiang5-mobl3Signed-off-by: Dan Williams <dan.j.williams@intel.com>

69b789b6

base/node / acpi: Change 'node_hmem_attrs' to 'access_coordinates' · 6a954e94

Dave Jiang authored Dec 21, 2023

Dan Williams suggested changing the struct 'node_hmem_attrs' to
'access_coordinates' [1]. The struct is a container of r/w-latency and
r/w-bandwidth numbers. Moving forward, this container will also be used by
CXL to store the performance characteristics of each link hop in
the PCIE/CXL topology. So, where node_hmem_attrs is just the access
parameters of a memory-node, access_coordinates applies more broadly
to hardware topology characteristics. The observation is that seemed like
an exercise in having the application identify "where" it falls on a
spectrum of bandwidth and latency needs. For the tuple of
read/write-latency and read/write-bandwidth, "coordinates" is not a perfect
fit. Sometimes it is just conveying values in isolation and not a
"location" relative to other performance points, but in the end this data
is used to identify the performance operation point of a given memory-node.
[2]

Link: http://lore.kernel.org/r/64471313421f7_1b66294d5@dwillia2-xfh.jf.intel.com.notmuch/
Link: https://lore.kernel.org/linux-cxl/645e6215ee0de_1e6f2945e@dwillia2-xfh.jf.intel.com.notmuch/Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/170319615734.2212653.15319394025985499185.stgit@djiang5-mobl3Signed-off-by: Dan Williams <dan.j.williams@intel.com>

6a954e94

lib/firmware_table: tables: Add CDAT table parsing support · 60e43fe5

Dave Jiang authored Dec 21, 2023

The CDAT table is very similar to ACPI tables when it comes to sub-table
and entry structures. The helper functions can be also used to parse the
CDAT table. Add support to the helper functions to deal with an external
CDAT table, and also handle the endieness since CDAT can be processed by a
BE host. Export a function cdat_table_parse() for CXL driver to parse
a CDAT table.

In order to minimize ACPICA code changes, __force is being utilized to deal
with the case of a big endian (BE) host parsing a CDAT. All CDAT data
structure variables are being force casted to __leX as appropriate.

Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/170319615131.2212653.10932785667981494238.stgit@djiang5-mobl3Signed-off-by: Dan Williams <dan.j.williams@intel.com>

60e43fe5

17 Dec, 2023 10 commits

Linux 6.7-rc6 · ceb6a6f0
Linus Torvalds authored Dec 17, 2023

ceb6a6f0

Merge tag 'perf_urgent_for_v6.7_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 177c2ffe

Linus Torvalds authored Dec 17, 2023

Pull perf fix from Borislav Petkov:

 - Avoid iterating over newly created group leader event's siblings
   because there are none, and thus prevent a lockdep splat

* tag 'perf_urgent_for_v6.7_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix perf_event_validate_size() lockdep splat

177c2ffe

Merge tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 0e389834

Linus Torvalds authored Dec 17, 2023

Pull btrfs fix from David Sterba:
 "One more fix that verifies that the snapshot source is a root, same
  check is also done in user space but should be done by the ioctl as
  well"

* tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: do not allow non subvolume root targets for snapshot

0e389834

Merge tag 'soundwire-6.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire · accc98af

Linus Torvalds authored Dec 17, 2023

Pull soundwire fixes from Vinod Koul:

 - Null pointer dereference for mult link in core

 - AC timing fix in intel driver

* tag 'soundwire-6.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
  soundwire: intel_ace2x: fix AC timing setting for ACE2.x
  soundwire: stream: fix NULL pointer dereference for multi_link

accc98af

Merge tag 'phy-fixes-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy · 7f499ec2

Linus Torvalds authored Dec 17, 2023

Pull phy fixes from Vinod Koul:

  - register offset fix for TI driver

  - mediatek driver minimal supported frequency fix

  - negative error code in probe fix for sunplus driver

* tag 'phy-fixes-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: sunplus: return negative error code in sp_usb_phy_probe
  phy: mediatek: mipi: mt8183: fix minimal supported frequency
  phy: ti: gmii-sel: Fix register offset when parent is not a syscon node

7f499ec2

Merge tag 'dmaengine-fix-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine · 6d04b70e

Linus Torvalds authored Dec 17, 2023

Pull dmaengine fixes from Vinod Koul:

 - SPI PDMA data fix for TI k3-psil drivers

 - suspend fix, pointer check, logic for arbitration fix and channel
   leak fix in fsl-edma driver

 - couple of fixes in idxd driver for GRPCFG descriptions and int_handle
   field handling

 - single fix for stm32 driver for bitfield overflow

* tag 'dmaengine-fix-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine: fsl-edma: fix DMA channel leak in eDMAv4
  dmaengine: fsl-edma: fix wrong pointer check in fsl_edma3_attach_pd()
  dmaengine: idxd: Fix incorrect descriptions for GRPCFG register
  dmaengine: idxd: Protect int_handle field in hw descriptor
  dmaengine: stm32-dma: avoid bitfield overflow assertion
  dmaengine: fsl-edma: Add judgment on enabling round robin arbitration
  dmaengine: fsl-edma: Do not suspend and resume the masked dma channel when the system is sleeping
  dmaengine: ti: k3-psil-am62a: Fix SPI PDMA data
  dmaengine: ti: k3-psil-am62: Fix SPI PDMA data

6d04b70e

Merge tag 'cxl-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl · 134fdb80

Linus Torvalds authored Dec 17, 2023

Pull CXL (Compute Express Link) fixes from Dan Williams:
 "A collection of CXL fixes.

  The touch outside of drivers/cxl/ is for a helper that allocates
  physical address space. Device hotplug tests showed that the driver
  failed to utilize (skipped over) valid capacity when allocating a new
  memory region. Outside of that, new tests uncovered a small crop of
  lockdep reports.

  There is also some miscellaneous error path and leak fixups that are
  not urgent, but useful to cleanup now.

   - Fix alloc_free_mem_region()'s scan for address space, prevent false
     negative out-of-space events

   - Fix sleeping lock acquisition from CXL trace event (atomic context)

   - Fix put_device() like for the new CXL PMU driver

   - Fix wrong pointer freed on error path

   - Fixup several lockdep reports (missing lock hold) from new
     assertion in cxl_num_decoders_committed() and new tests"

* tag 'cxl-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/pmu: Ensure put_device on pmu devices
  cxl/cdat: Free correct buffer on checksum error
  cxl/hdm: Fix dpa translation locking
  kernel/resource: Increment by align value in get_free_mem_region()
  cxl: Add cxl_num_decoders_committed() usage to cxl_test
  cxl/memdev: Hold region_rwsem during inject and clear poison ops
  cxl/core: Always hold region_rwsem while reading poison lists
  cxl/hdm: Fix a benign lockdep splat

134fdb80

Merge tag 'edac_urgent_for_v6.7_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · ef6a7c27

Linus Torvalds authored Dec 17, 2023

Pull EDAC fix from Borislav Petkov:

 - A single fix for the EDAC Versal driver to read out register fields
   properly

* tag 'edac_urgent_for_v6.7_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/versal: Read num_csrows and num_chans using the correct bitfield macro

ef6a7c27

Merge tag 'powerpc-6.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 5ef3720d

Linus Torvalds authored Dec 17, 2023

Pull powerpc fixes from Michael Ellerman:

 - Fix a bug where heavy VAS (accelerator) usage could race with
   partition migration and prevent the migration from completing.

 - Update MAINTAINERS to add Aneesh & Naveen.

Thanks to Haren Myneni.

* tag 'powerpc-6.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  MAINTAINERS: powerpc: Add Aneesh & Naveen
  powerpc/pseries/vas: Migration suspend waits for no in-progress open windows

5ef3720d

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · dde0672b

Linus Torvalds authored Dec 16, 2023

Pull clk fixes from Stephen Boyd:
 "A handful of clk fixes, mostly in the rockchip clk driver:

   - Fix a clk name, clk parent, and a register for a clk gate in the
     Rockchip rk3128 clk driver

   - Add a PLL frequency on Rockchip rk3568 to fix some display
     artifacts

   - Fix a kbuild dependency for Qualcomm's SM_CAMCC_8550 symbol so that
     it isn't possible to select the associated GCC driver"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: rockchip: rk3128: Fix SCLK_SDMMC's clock name
  clk: rockchip: rk3128: Fix aclk_peri_src's parent
  clk: qcom: Fix SM_CAMCC_8550 dependencies
  clk: rockchip: rk3128: Fix HCLK_OTG gate register
  clk: rockchip: rk3568: Add PLL rate for 292.5MHz

dde0672b

16 Dec, 2023 3 commits

Merge tag 'trace-v6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 3b8a9b2e

Linus Torvalds authored Dec 16, 2023

Pull tracing fixes from Steven Rostedt:

- Fix eventfs to check creating new files for events with names greater
than NAME_MAX. The eventfs lookup needs to check the return result of
simple_lookup().

- Fix the ring buffer to check the proper max data size. Events must be
able to fit on the ring buffer sub-buffer, if it cannot, then it
fails to be written and the logic to add the event is avoided. The
code to check if an event can fit failed to add the possible absolute
timestamp which may make the event not be able to fit. This causes
the ring buffer to go into an infinite loop trying to find a
sub-buffer that would fit the event. Luckily, there's a check that
will bail out if it looped over a 1000 times and it also warns.

The real fix is not to add the absolute timestamp to an event that is
starting at the beginning of a sub-buffer because it uses the
sub-buffer timestamp.

By avoiding the timestamp at the start of the sub-buffer allows
events that pass the first check to always find a sub-buffer that it
can fit on.

- Have large events that do not fit on a trace_seq to print "LINE TOO
BIG" like it does for the trace_pipe instead of what it does now
which is to silently drop the output.

- Fix a memory leak of forgetting to free the spare page that is saved
by a trace instance.

- Update the size of the snapshot buffer when the main buffer is
updated if the snapshot buffer is allocated.

- Fix ring buffer timestamp logic by removing all the places that tried
to put the before_stamp back to the write stamp so that the next
event doesn't add an absolute timestamp. But each of these updates
added a race where by making the two timestamp equal, it was
validating the write_stamp so that it can be incorrectly used for
calculating the delta of an event.

- There's a temp buffer used for printing the event that was using the
event data size for allocation when it needed to use the size of the
entire event (meta-data and payload data)

- For hardening, use "%.*s" for printing the trace_marker output, to
limit the amount that is printed by the size of the event. This was
discovered by development that added a bug that truncated the '\0'
and caused a crash.

- Fix a use-after-free bug in the use of the histogram files when an
instance is being removed.

- Remove a useless update in the rb_try_to_discard of the write_stamp.
The before_stamp was already changed to force the next event to add
an absolute timestamp that the write_stamp is not used. But the
write_stamp is modified again using an unneeded 64-bit cmpxchg.

- Fix several races in the 32-bit implementation of the
rb_time_cmpxchg() that does a 64-bit cmpxchg.

- While looking at fixing the 64-bit cmpxchg, I noticed that because
the ring buffer uses normal cmpxchg, and this can be done in NMI
context, there's some architectures that do not have a working
cmpxchg in NMI context. For these architectures, fail recording
events that happen in NMI context.

* tag 'trace-v6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Do not record in NMI if the arch does not support cmpxchg in NMI
ring-buffer: Have rb_time_cmpxchg() set the msb counter too
ring-buffer: Fix 32-bit rb_time_read() race with rb_time_cmpxchg()
ring-buffer: Fix a race in rb_time_cmpxchg() for 32 bit archs
ring-buffer: Remove useless update to write_stamp in rb_try_to_discard()
ring-buffer: Do not try to put back write_stamp
tracing: Fix uaf issue when open the hist or hist_debug file
tracing: Add size check when printing trace_marker output
ring-buffer: Have saved event hold the entire event
ring-buffer: Do not update before stamp when switching sub-buffers
tracing: Update snapshot buffer on resize if it is allocated
ring-buffer: Fix memory leak of free page
eventfs: Fix events beyond NAME_MAX blocking tasks
tracing: Have large events show up as '[LINE TOO BIG]' instead of nothing
ring-buffer: Fix writing to the buffer with max_data_size

3b8a9b2e

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · c8e97fc6

Linus Torvalds authored Dec 15, 2023

Pull arm64 fixes from Catalin Marinas:

 - Arm CMN perf: fix the DTC allocation failure path which can end up
   erroneously clearing live counters

 - arm64/mm: fix hugetlb handling of the dirty page state leading to a
   continuous fault loop in user on hardware without dirty bit
   management (DBM). That's caused by the dirty+writeable information
   not being properly preserved across a series of mprotect(PROT_NONE),
   mprotect(PROT_READ|PROT_WRITE)

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
  perf/arm-cmn: Fail DTC counter allocation correctly

c8e97fc6

Merge tag 'pci-v6.7-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci · 2e3f280b

Linus Torvalds authored Dec 15, 2023

Pull pci fixes from Bjorn Helgaas:

 - Limit Max_Read_Request_Size (MRRS) on some MIPS Loongson systems
   because they don't all support MRRS > 256, and firmware doesn't
   always initialize it correctly, which meant some PCIe devices didn't
   work (Jiaxun Yang)

 - Add and use pci_enable_link_state_locked() to prevent potential
   deadlocks in vmd and qcom drivers (Johan Hovold)

 - Revert recent (v6.5) acpiphp resource assignment changes that fixed
   issues with hot-adding devices on a root bus or with large BARs, but
   introduced new issues with GPU initialization and hot-adding SCSI
   disks in QEMU VMs and (Bjorn Helgaas)

* tag 'pci-v6.7-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  Revert "PCI: acpiphp: Reassign resources on bridge if necessary"
  PCI/ASPM: Add pci_disable_link_state_locked() lockdep assert
  PCI/ASPM: Clean up __pci_disable_link_state() 'sem' parameter
  PCI: qcom: Clean up ASPM comment
  PCI: qcom: Fix potential deadlock when enabling ASPM
  PCI: vmd: Fix potential deadlock when enabling ASPM
  PCI/ASPM: Add pci_enable_link_state_locked()
  PCI: loongson: Limit MRRS to 256

2e3f280b

15 Dec, 2023 12 commits

btrfs: do not allow non subvolume root targets for snapshot · a8892fd7

Josef Bacik authored Dec 15, 2023

Our btrfs subvolume snapshot <source> <destination> utility enforces
that <source> is the root of the subvolume, however this isn't enforced
in the kernel.  Update the kernel to also enforce this limitation to
avoid problems with other users of this ioctl that don't have the
appropriate checks in place.
Reported-by: Martin Michaelis <code@mgjm.de>
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

a8892fd7

cred: get rid of CONFIG_DEBUG_CREDENTIALS · ae191417

Jens Axboe authored Dec 15, 2023

This code is rarely (never?) enabled by distros, and it hasn't caught
anything in decades. Let's kill off this legacy debug code.
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ae191417

cred: switch to using atomic_long_t · f8fa5d76

Jens Axboe authored Dec 15, 2023

There are multiple ways to grab references to credentials, and the only
protection we have against overflowing it is the memory required to do
so.

With memory sizes only moving in one direction, let's bump the reference
count to 64-bit and move it outside the realm of feasibly overflowing.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f8fa5d76

Revert "PCI: acpiphp: Reassign resources on bridge if necessary" · 5df12742

Bjorn Helgaas authored Dec 14, 2023

This reverts commit 40613da5 and the
subsequent fix to it:

  cc22522f ("PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus")

40613da5 fixed a problem where hot-adding a device with large BARs
failed if the bridge windows programmed by firmware were not large enough.

cc22522f ("PCI: acpiphp: Use pci_assign_unassigned_bridge_resources()
only for non-root bus") fixed a problem with 40613da5: an ACPI hot-add
of a device on a PCI root bus (common in the virt world) or firmware
sending ACPI Bus Check to non-existent Root Ports (e.g., on Dell Inspiron
7352/0W6WV0) caused a NULL pointer dereference and suspend/resume hangs.

Unfortunately the combination of 40613da5 and cc22522f caused other
problems:

  - Fiona reported that hot-add of SCSI disks in QEMU virtual machine fails
    sometimes.

  - Dongli reported a similar problem with hot-add of SCSI disks.

  - Jonathan reported a console freeze during boot on bare metal due to an
    error in radeon GPU initialization.

Revert both patches to avoid adding these problems.  This means we will
again see the problems with hot-adding devices with large BARs and the NULL
pointer dereferences and suspend/resume issues that 40613da5 and
cc22522f were intended to fix.

Fixes: 40613da5 ("PCI: acpiphp: Reassign resources on bridge if necessary")
Fixes: cc22522f ("PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus")
Reported-by: Fiona Ebner <f.ebner@proxmox.com>
Closes: https://lore.kernel.org/r/9eb669c0-d8f2-431d-a700-6da13053ae54@proxmox.comReported-by: Dongli Zhang <dongli.zhang@oracle.com>
Closes: https://lore.kernel.org/r/3c4a446a-b167-11b8-f36f-d3c1b49b42e9@oracle.comReported-by: Jonathan Woithe <jwoithe@just42.net>
Closes: https://lore.kernel.org/r/ZXpaNCLiDM+Kv38H@marvin.atrad.com.auSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Cc: <stable@vger.kernel.org>

5df12742

Merge tag 'io_uring-6.7-2023-12-15' of git://git.kernel.dk/linux · 3bd7d748

Linus Torvalds authored Dec 15, 2023

Pull io_uring fixes from Jens Axboe:
 "Just two minor fixes:

   - Fix for the io_uring socket option commands using the wrong value
     on some archs (Al)

   - Tweak to the poll lazy wake enable (me)"

* tag 'io_uring-6.7-2023-12-15' of git://git.kernel.dk/linux:
  io_uring/cmd: fix breakage in SOCKET_URING_OP_SIOC* implementation
  io_uring/poll: don't enable lazy wake for POLLEXCLUSIVE

3bd7d748

Merge tag 'mm-hotfixes-stable-2023-12-15-07-11' of... · a62aa88b

Linus Torvalds authored Dec 15, 2023

Merge tag 'mm-hotfixes-stable-2023-12-15-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "17 hotfixes. 8 are cc:stable and the other 9 pertain to post-6.6
  issues"

* tag 'mm-hotfixes-stable-2023-12-15-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/mglru: reclaim offlined memcgs harder
  mm/mglru: respect min_ttl_ms with memcgs
  mm/mglru: try to stop at high watermarks
  mm/mglru: fix underprotected page cache
  mm/shmem: fix race in shmem_undo_range w/THP
  Revert "selftests: error out if kernel header files are not yet built"
  crash_core: fix the check for whether crashkernel is from high memory
  x86, kexec: fix the wrong ifdeffery CONFIG_KEXEC
  sh, kexec: fix the incorrect ifdeffery and dependency of CONFIG_KEXEC
  mips, kexec: fix the incorrect ifdeffery and dependency of CONFIG_KEXEC
  m68k, kexec: fix the incorrect ifdeffery and build dependency of CONFIG_KEXEC
  loongarch, kexec: change dependency of object files
  mm/damon/core: make damon_start() waits until kdamond_fn() starts
  selftests/mm: cow: print ksft header before printing anything else
  mm: fix VMA heap bounds checking
  riscv: fix VMALLOC_START definition
  kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMP

a62aa88b

Merge tag 'sound-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 26e7a301

Linus Torvalds authored Dec 15, 2023

Pull sound fixes from Takashi Iwai:
 "A collection of HD-audio quirks for TAS2781 codec and device-specific
  workarounds"

* tag 'sound-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/tas2781: reset the amp before component_add
  ALSA: hda/tas2781: call cleanup functions only once
  ALSA: hda/tas2781: handle missing EFI calibration data
  ALSA: hda/tas2781: leave hda_component in usable state
  ALSA: hda/realtek: Apply mute LED quirk for HP15-db
  ALSA: hda/hdmi: add force-connect quirks for ASUSTeK Z170 variants
  ALSA: hda/hdmi: add force-connect quirk for NUC5CPYB

26e7a301

Merge tag 'drm-fixes-2023-12-15' of git://anongit.freedesktop.org/drm/drm · 595609b2

Linus Torvalds authored Dec 15, 2023

Pull drm fixes from Dave Airlie:
 "More regular fixes, amdgpu, i915, mediatek and nouveau are most of
  them this week. Nothing too major, then a few misc bits and pieces in
  core, panel and ivpu.

  drm:
   - fix uninit problems in crtc
   - fix fd ownership check
   - edid: add modes in fallback paths

  panel:
   - move LG panel into DSI yaml
   - ltk050h3146w: set burst mode

  mediatek:
   - mtk_disp_gamma: Fix breakage due to merge issue
   - fix kernel oops if no crtc is found
   - Add spinlock for setting vblank event in atomic_begin
   - Fix access violation in mtk_drm_crtc_dma_dev_get

  i915:
   - Fix selftest engine reset count storage for multi-tile
   - Fix out-of-bounds reads for engine reset counts
   - Fix ADL+ remapped stride with CCS
   - Fix intel_atomic_setup_scalers() plane_state handling
   - Fix ADL+ tiled plane stride when the POT stride is smaller than the original
   - Fix eDP 1.4 rate select method link configuration

  amdgpu:
   - Fix suspend fix that got accidently mangled last week
   - Fix OD regression
   - PSR fixes
   - OLED Backlight regression fix
   - JPEG 4.0.5 fix
   - Misc display fixes
   - SDMA 5.2 fix
   - SDMA 2.4 regression fix
   - GPUVM race fix

  nouveau:
   - fix gk20a instobj hierarchy
   - fix headless iors inheritance regression

  ivpu:
   - fix WA initialisation"

* tag 'drm-fixes-2023-12-15' of git://anongit.freedesktop.org/drm/drm: (31 commits)
  drm/nouveau/kms/nv50-: Don't allow inheritance of headless iors
  drm/nouveau: Fixup gk20a instobj hierarchy
  drm/amdgpu: warn when there are still mappings when a BO is destroyed v2
  drm/amdgpu: fix tear down order in amdgpu_vm_pt_free
  drm/amd: Fix a probing order problem on SDMA 2.4
  drm/amdgpu/sdma5.2: add begin/end_use ring callbacks
  drm/panel: ltk050h3146w: Set burst mode for ltk050h3148w
  dt-bindings: panel-simple-dsi: move LG 5" HD TFT LCD panel into DSI yaml
  drm/amd/display: Disable PSR-SU on Parade 0803 TCON again
  drm/amd/display: Populate dtbclk from bounding box
  drm/amd/display: Revert "Fix conversions between bytes and KB"
  drm/amdgpu/jpeg: configure doorbell for each playback
  drm/amd/display: Restore guard against default backlight value < 1 nit
  drm/amd/display: fix hw rotated modes when PSR-SU is enabled
  drm/amd/pm: fix pp_*clk_od typo
  drm/amdgpu: fix buffer funcs setting order on suspend harder
  drm/mediatek: Fix access violation in mtk_drm_crtc_dma_dev_get
  drm/edid: also call add modes in EDID connector update fallback
  drm/i915/edp: don't write to DP_LINK_BW_SET when using rate select
  drm/i915: Fix ADL+ tiled plane stride when the POT stride is smaller than the original
  ...

595609b2

ring-buffer: Do not record in NMI if the arch does not support cmpxchg in NMI · 71229230

Steven Rostedt (Google) authored Dec 13, 2023

As the ring buffer recording requires cmpxchg() to work, if the
architecture does not support cmpxchg in NMI, then do not do any recording
within an NMI.

Link: https://lore.kernel.org/linux-trace-kernel/20231213175403.6fc18540@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

71229230

ring-buffer: Have rb_time_cmpxchg() set the msb counter too · 0aa0e528

Steven Rostedt (Google) authored Dec 15, 2023

The rb_time_cmpxchg() on 32-bit architectures requires setting three
32-bit words to represent the 64-bit timestamp, with some salt for
synchronization. Those are: msb, top, and bottom

The issue is, the rb_time_cmpxchg() did not properly salt the msb portion,
and the msb that was written was stale.

Link: https://lore.kernel.org/linux-trace-kernel/20231215084114.20899342@rorschach.local.home

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: f03f2abc ("ring-buffer: Have 32 bit time stamps use all 64 bits")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

0aa0e528

ring-buffer: Fix 32-bit rb_time_read() race with rb_time_cmpxchg() · dec89008

Mathieu Desnoyers authored Dec 12, 2023

The following race can cause rb_time_read() to observe a corrupted time
stamp:

rb_time_cmpxchg()
[...]
        if (!rb_time_read_cmpxchg(&t->msb, msb, msb2))
                return false;
        if (!rb_time_read_cmpxchg(&t->top, top, top2))
                return false;
<interrupted before updating bottom>
__rb_time_read()
[...]
        do {
                c = local_read(&t->cnt);
                top = local_read(&t->top);
                bottom = local_read(&t->bottom);
                msb = local_read(&t->msb);
        } while (c != local_read(&t->cnt));

        *cnt = rb_time_cnt(top);

        /* If top and msb counts don't match, this interrupted a write */
        if (*cnt != rb_time_cnt(msb))
                return false;
          ^ this check fails to catch that "bottom" is still not updated.

So the old "bottom" value is returned, which is wrong.

Fix this by checking that all three of msb, top, and bottom 2-bit cnt
values match.

The reason to favor checking all three fields over requiring a specific
update order for both rb_time_set() and rb_time_cmpxchg() is because
checking all three fields is more robust to handle partial failures of
rb_time_cmpxchg() when interrupted by nested rb_time_set().

Link: https://lore.kernel.org/lkml/20231211201324.652870-1-mathieu.desnoyers@efficios.com/
Link: https://lore.kernel.org/linux-trace-kernel/20231212193049.680122-1-mathieu.desnoyers@efficios.com

Fixes: f458a145 ("ring-buffer: Test last update in 32bit version of __rb_time_read()")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

dec89008

ring-buffer: Fix a race in rb_time_cmpxchg() for 32 bit archs · fff88fa0

Steven Rostedt (Google) authored Dec 12, 2023

Mathieu Desnoyers pointed out an issue in the rb_time_cmpxchg() for 32 bit
architectures. That is:

 static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set)
 {
	unsigned long cnt, top, bottom, msb;
	unsigned long cnt2, top2, bottom2, msb2;
	u64 val;

	/* The cmpxchg always fails if it interrupted an update */
	 if (!__rb_time_read(t, &val, &cnt2))
		 return false;

	 if (val != expect)
		 return false;

<<<< interrupted here!

	 cnt = local_read(&t->cnt);

The problem is that the synchronization counter in the rb_time_t is read
*after* the value of the timestamp is read. That means if an interrupt
were to come in between the value being read and the counter being read,
it can change the value and the counter and the interrupted process would
be clueless about it!

The counter needs to be read first and then the value. That way it is easy
to tell if the value is stale or not. If the counter hasn't been updated,
then the value is still good.

Link: https://lore.kernel.org/linux-trace-kernel/20231211201324.652870-1-mathieu.desnoyers@efficios.com/
Link: https://lore.kernel.org/linux-trace-kernel/20231212115301.7a9c9a64@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: 10464b4a ("ring-buffer: Add rb_time_t 64 bit operations for speeding up 32 bit")
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

fff88fa0