Commits · cdb3033e191fd03da2d7da23b9cd448dfa180a8e · Kirill Smelkov / linux

08 Jan, 2024 1 commit

Merge branch 'sched/urgent' into sched/core, to pick up pending v6.7 fixes for... · cdb3033e

Ingo Molnar authored Jan 08, 2024

Merge branch 'sched/urgent' into sched/core, to pick up pending v6.7 fixes for the v6.8 merge window

This fix didn't make it upstream in time, pick it up
for the v6.8 merge window.
Signed-off-by: Ingo Molnar <mingo@kernel.org>

cdb3033e

29 Dec, 2023 1 commit

sched/fair: Fix tg->load when offlining a CPU · f60a631a

Vincent Guittot authored Dec 21, 2023

When a CPU is taken offline, the contribution of its cfs_rqs to task_groups'
load may remain and will negatively impact the calculation of the share of
the online CPUs.

To fix this bug, clear the contribution of an offlining CPU to task groups'
load and skip its contribution while it is inactive.

Here's the reproducer of the anomaly, by Imran Khan:

	"So far I have encountered only one rather lengthy way of reproducing this issue,
	which is as follows:

	1. Take a KVM guest (booted with 4 CPUs and can be scaled up to 124 CPUs) and
	   create 2 custom cgroups: /sys/fs/cgroup/cpu/test_group_1 and /sys/fs/cgroup/
	   cpu/test_group_2

	2. Assign a CPU intensive workload to each of these cgroups and start the
	   workload.

	For my tests I am using following app:

	int main(int argc, char *argv[])
	{
		unsigned long count, i, val;
		if (argc != 2) {
		      printf("usage: ./a.out <number of random nums to generate> \n");
		      return 0;
		}

		count = strtoul(argv[1], NULL, 10);

		printf("Generating %lu random numbers \n", count);
		for (i = 0; i < count; i++) {
			val = rand();
			val = val % 2;
			//usleep(1);
		}
		printf("Generated %lu random numbers \n", count);
		return 0;
	}

	Also since the system is booted with 4 CPUs, in order to completely load the
	system I am also launching 4 instances of same test app under:

	   /sys/fs/cgroup/cpu/

	3. We can see that both of the cgroups get similar CPU time:

        # systemd-cgtop --depth 1
	Path                                 Tasks    %CPU  Memory  Input/s    Output/s
	/                                      659      -     5.5G        -        -
	/system.slice                            -      -     5.7G        -        -
	/test_group_1                            4      -        -        -        -
	/test_group_2                            3      -        -        -        -
	/user.slice                             31      -    56.5M        -        -

	Path                                 Tasks   %CPU   Memory  Input/s    Output/s
	/                                      659  394.6     5.5G        -        -
	/test_group_2                            3   65.7        -        -        -
	/user.slice                             29   55.1    48.0M        -        -
	/test_group_1                            4   47.3        -        -        -
	/system.slice                            -    2.2     5.7G        -        -

	Path                                 Tasks  %CPU    Memory  Input/s    Output/s
	/                                      659  394.8     5.5G        -        -
	/test_group_1                            4   62.9        -        -        -
	/user.slice                             28   44.9    54.2M        -        -
	/test_group_2                            3   44.7        -        -        -
	/system.slice                            -    0.9     5.7G        -        -

	Path                                 Tasks  %CPU    Memory  Input/s     Output/s
	/                                      659  394.4     5.5G        -        -
	/test_group_2                            3   58.8        -        -        -
	/test_group_1                            4   51.9        -        -        -
	/user.slice                              30   39.3    59.6M        -        -
	/system.slice                            -    1.9     5.7G        -        -

	Path                                 Tasks  %CPU     Memory  Input/s    Output/s
	/                                      659  394.7     5.5G        -        -
	/test_group_1                            4   60.9        -        -        -
	/test_group_2                            3   57.9        -        -        -
	/user.slice                             28   43.5    36.9M        -        -
	/system.slice                            -    3.0     5.7G        -        -

	Path                                 Tasks  %CPU     Memory  Input/s     Output/s
	/                                      659  395.0     5.5G        -        -
	/test_group_1                            4   66.8        -        -        -
	/test_group_2                            3   56.3        -        -        -
	/user.slice                             29   43.1    51.8M        -        -
	/system.slice                            -    0.7     5.7G        -        -

	4. Now move systemd-udevd to one of these test groups, say test_group_1, and
	   perform scale up to 124 CPUs followed by scale down back to 4 CPUs from the
	   host side.

	5. Run the same workload i.e 4 instances of CPU hogger under /sys/fs/cgroup/cpu
	   and one instance of  CPU hogger each in /sys/fs/cgroup/cpu/test_group_1 and
	   /sys/fs/cgroup/test_group_2.

	It can be seen that test_group_1 (the one where systemd-udevd was moved) is getting
	much less CPU time than the test_group_2, even though at this point of time both of
	these groups have only CPU hogger running:

        # systemd-cgtop --depth 1
	Path                                   Tasks   %CPU   Memory  Input/s   Output/s
	/                                      1219     -     5.4G        -        -
	/system.slice                           -       -     5.6G        -        -
	/test_group_1                           4       -        -        -        -
	/test_group_2                           3       -        -        -        -
	/user.slice                            26       -    91.3M        -        -

	Path                                   Tasks  %CPU     Memory  Input/s   Output/s
	/                                      1221  394.3     5.4G        -        -
	/test_group_2                             3   82.7        -        -        -
	/test_group_1                             4   14.3        -        -        -
	/system.slice                             -    0.8     5.6G        -        -
	/user.slice                              26    0.4    91.2M        -        -

	Path                                   Tasks  %CPU    Memory  Input/s    Output/s
	/                                      1221  394.6     5.4G        -        -
	/test_group_2                             3   67.4        -        -        -
	/system.slice                             -   24.6     5.6G        -        -
	/test_group_1                             4   12.5        -        -        -
	/user.slice                              26    0.4    91.2M        -        -

	Path                                  Tasks  %CPU    Memory  Input/s    Output/s
	/                                     1221  395.2     5.4G        -        -
	/test_group_2                            3   60.9        -        -        -
	/system.slice                            -   27.9     5.6G        -        -
	/test_group_1                            4   12.2        -        -        -
	/user.slice                             26    0.4    91.2M        -        -

	Path                                  Tasks  %CPU    Memory  Input/s    Output/s
	/                                     1221  395.2     5.4G        -        -
	/test_group_2                            3   69.4        -        -        -
	/test_group_1                            4   13.9        -        -        -
	/user.slice                             28    1.6    92.0M        -        -
	/system.slice                            -    1.0     5.6G        -        -

	Path                                  Tasks  %CPU    Memory  Input/s    Output/s
	/                                      1221  395.6     5.4G        -        -
	/test_group_2                             3   59.3        -        -        -
	/test_group_1                             4   14.1        -        -        -
	/user.slice                              28    1.3    92.2M        -        -
	/system.slice                             -    0.7     5.6G        -        -

	Path                                  Tasks  %CPU    Memory  Input/s    Output/s
	/                                      1221  395.5     5.4G        -        -
	/test_group_2                            3   67.2        -        -        -
	/test_group_1                            4   11.5        -        -        -
	/user.slice                             28    1.3    92.5M        -        -
	/system.slice                            -    0.6     5.6G        -        -

	Path                                  Tasks  %CPU    Memory  Input/s    Output/s
	/                                      1221  395.1     5.4G        -        -
	/test_group_2                             3   76.8        -        -        -
	/test_group_1                             4   12.9        -        -        -
	/user.slice                              28    1.3    92.8M        -        -
	/system.slice                             -    1.2     5.6G        -        -

	From sched_debug data it can be seen that in bad case the load.weight of per-CPU
	sched entities corresponding to test_group_1 has reduced significantly and
	also load_avg of test_group_1 remains much higher than that of test_group_2,
	even though systemd-udevd stopped running long time back and at this point of
	time both cgroups just have the CPU hogger app as running entity."

[ mingo: Added details from the original discussion, plus minor edits to the patch. ]
Reported-by: Imran Khan <imran.f.khan@oracle.com>
Tested-by: Imran Khan <imran.f.khan@oracle.com>
Tested-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Imran Khan <imran.f.khan@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lore.kernel.org/r/20231223111545.62135-1-vincent.guittot@linaro.org

f60a631a

23 Dec, 2023 14 commits

sched/fair: Remove unused 'next_buddy_marked' local variable in check_preempt_wakeup_fair() · fbb66ce0

Wang Jinchao authored Dec 14, 2023

This variable became unused in:

    5e963f2b ("sched/fair: Commit to EEVDF")
Signed-off-by: Wang Jinchao <wangjinchao@xfusion.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/202312141319+0800-wangjinchao@xfusion.com

fbb66ce0

sched/fair: Use all little CPUs for CPU-bound workloads · 3af7524b

Pierre Gondois authored Dec 06, 2023

Running N CPU-bound tasks on an N CPUs platform:

- with asymmetric CPU capacity

- not being a DynamIq system (i.e. having a PKG level sched domain
  without the SD_SHARE_PKG_RESOURCES flag set)

.. might result in a task placement where two tasks run on a big CPU
and none on a little CPU. This placement could be more optimal by
using all CPUs.

Testing platform:

  Juno-r2:
    - 2 big CPUs (1-2), maximum capacity of 1024
    - 4 little CPUs (0,3-5), maximum capacity of 383

Testing workload ([1]):

  Spawn 6 CPU-bound tasks. During the first 100ms (step 1), each tasks
  is affine to a CPU, except for:

    - one little CPU which is left idle.
    - one big CPU which has 2 tasks affine.

  After the 100ms (step 2), remove the cpumask affinity.

Behavior before the patch:

  During step 2, the load balancer running from the idle CPU tags sched
  domains as:

  - little CPUs: 'group_has_spare'. Cf. group_has_capacity() and
    group_is_overloaded(), 3 CPU-bound tasks run on a 4 CPUs
    sched-domain, and the idle CPU provides enough spare capacity
    regarding the imbalance_pct

  - big CPUs: 'group_overloaded'. Indeed, 3 tasks run on a 2 CPUs
    sched-domain, so the following path is used:

      group_is_overloaded()
      \-if (sgs->sum_nr_running <= sgs->group_weight) return true;

    The following path which would change the migration type to
    'migrate_task' is not taken:

      calculate_imbalance()
      \-if (env->idle != CPU_NOT_IDLE && env->imbalance == 0)

    as the local group has some spare capacity, so the imbalance
    is not 0.

  The migration type requested is 'migrate_util' and the busiest
  runqueue is the big CPU's runqueue having 2 tasks (each having a
  utilization of 512). The idle little CPU cannot pull one of these
  task as its capacity is too small for the task. The following path
  is used:

   detach_tasks()
   \-case migrate_util:
     \-if (util > env->imbalance) goto next;

After the patch:

As the number of failed balancing attempts grows (with
'nr_balance_failed'), progressively make it easier to migrate
a big task to the idling little CPU. A similar mechanism is
used for the 'migrate_load' migration type.

Improvement:

Running the testing workload [1] with the step 2 representing
a ~10s load for a big CPU:

  Before patch: ~19.3s
  After patch:  ~18s (-6.7%)

Similar issue reported at:

  https://lore.kernel.org/lkml/20230716014125.139577-1-qyousef@layalina.io/Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Qais Yousef <qyousef@layalina.io>
Link: https://lore.kernel.org/r/20231206090043.634697-1-pierre.gondois@arm.com

3af7524b

sched/fair: Simplify util_est · 11137d38

Vincent Guittot authored Dec 01, 2023

With UTIL_EST_FASTUP now being permanent, we can take advantage of the
fact that the ewma jumps directly to a higher utilization at dequeue to
simplify util_est and remove the enqueued field.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com>
Reviewed-by: Alex Shi <alexs@kernel.org>
Link: https://lore.kernel.org/r/20231201161652.1241695-3-vincent.guittot@linaro.org

11137d38

sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true) · 7736ae55

Vincent Guittot authored Dec 01, 2023

sched_feat(UTIL_EST_FASTUP) has been added to easily disable the feature
in order to check for possibly related regressions. After 3 years, it has
never been used and no regression has been reported. Let's remove it
and make fast increase a permanent behavior.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com>
Reviewed-by: Tang Yizhou <yizhou.tang@shopee.com>
Reviewed-by: Yanteng Si <siyanteng@loongson.cn> [for the Chinese translation]
Reviewed-by: Alex Shi <alexs@kernel.org>
Link: https://lore.kernel.org/r/20231201161652.1241695-2-vincent.guittot@linaro.org

7736ae55

arm64/amu: Use capacity_ref_freq() to set AMU ratio · 1f023007

Vincent Guittot authored Dec 11, 2023

Use the new capacity_ref_freq() method to set the ratio that is used by AMU for
computing the arch_scale_freq_capacity().
This helps to keep everything aligned using the same reference for
computing CPUs capacity.

The default value of the ratio (stored in per_cpu(arch_max_freq_scale))
ensures that arch_scale_freq_capacity() returns max capacity until it is
set to its correct value with the cpu capacity and capacity_ref_freq().
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20231211104855.558096-8-vincent.guittot@linaro.org

1f023007

cpufreq/cppc: Set the frequency used for computing the capacity · 5477fa24

Vincent Guittot authored Dec 11, 2023

Save the frequency associated to the performance that has been used when
initializing the capacity of CPUs.

Also, cppc cpufreq driver can register an artificial energy model. In such
case, it needs the frequency for this compute capacity.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Pierre Gondois <pierre.gondois@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20231211104855.558096-7-vincent.guittot@linaro.org

5477fa24

cpufreq/cppc: Move and rename cppc_cpufreq_{perf_to_khz|khz_to_perf}() · 50b813b1

Vincent Guittot authored Dec 11, 2023

Move and rename cppc_cpufreq_perf_to_khz() and cppc_cpufreq_khz_to_perf() to
use them outside cppc_cpufreq in topology_init_cpu_capacity_cppc().

Modify the interface to use struct cppc_perf_caps *caps instead of
struct cppc_cpudata *cpu_data as we only use the fields of cppc_perf_caps.

cppc_cpufreq was converting the lowest and nominal freq from MHz to kHz
before using them. We move this conversion inside cppc_perf_to_khz and
cppc_khz_to_perf to make them generic and usable outside cppc_cpufreq.

No functional change
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Pierre Gondois <pierre.gondois@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20231211104855.558096-6-vincent.guittot@linaro.org

50b813b1

energy_model: Use a fixed reference frequency · 15cbbd1d

Vincent Guittot authored Dec 11, 2023

The last item of a performance domain is not always the performance point
that has been used to compute CPU's capacity. This can lead to different
target frequency compared with other part of the system like schedutil and
would result in wrong energy estimation.

A new arch_scale_freq_ref() is available to return a fixed and coherent
frequency reference that can be used when computing the CPU's frequency
for an level of utilization. Use this function to get this reference
frequency.

Energy model is never used without defining arch_scale_freq_ref() but
can be compiled. Define a default arch_scale_freq_ref() returning 0
in such case.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20231211104855.558096-5-vincent.guittot@linaro.org

15cbbd1d

cpufreq/schedutil: Use a fixed reference frequency · b3edde44

Vincent Guittot authored Dec 11, 2023

cpuinfo.max_freq can change at runtime because of boost as an example. This
implies that the value could be different than the one that has been
used when computing the capacity of a CPU.

The new arch_scale_freq_ref() returns a fixed and coherent reference
frequency that can be used when computing a frequency based on utilization.

Use this arch_scale_freq_ref() when available and fallback to
policy otherwise.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20231211104855.558096-4-vincent.guittot@linaro.org

b3edde44

cpufreq: Use the fixed and coherent frequency for scaling capacity · 599457ba

Vincent Guittot authored Dec 11, 2023

cpuinfo.max_freq can change at runtime because of boost as an example. This
implies that the value could be different from the frequency that has been
used to compute the capacity of a CPU.

The new arch_scale_freq_ref() returns a fixed and coherent frequency
that can be used to compute the capacity for a given frequency.

[ Also fix a arch_set_freq_scale()  newline style wart in <linux/cpufreq.h>. ]
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20231211104855.558096-3-vincent.guittot@linaro.org

599457ba

sched/topology: Add a new arch_scale_freq_ref() method · 9942cb22

Vincent Guittot authored Dec 11, 2023

Create a new method to get a unique and fixed max frequency. Currently
cpuinfo.max_freq or the highest (or last) state of performance domain are
used as the max frequency when computing the frequency for a level of
utilization, but:

  - cpuinfo_max_freq can change at runtime. boost is one example of
    such change.

  - cpuinfo.max_freq and last item of the PD can be different leading to
    different results between cpufreq and energy model.

We need to save the reference frequency that has been used when computing
the CPUs capacity and use this fixed and coherent value to convert between
frequency and CPU's capacity.

In fact, we already save the frequency that has been used when computing
the capacity of each CPU. We extend the precision to save kHz instead of
MHz currently and we modify the type to be aligned with other variables
used when converting frequency to capacity and the other way.

[ mingo: Minor edits. ]
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20231211104855.558096-2-vincent.guittot@linaro.org

9942cb22

Merge tag 'v6.7-rc6' into sched/core, to pick up fixes · d2e9f53a
Ingo Molnar authored Dec 23, 2023
```
Signed-off-by: Ingo Molnar <mingo@kernel.org>
```
d2e9f53a

Merge tag 'block-6.7-2023-12-22' of git://git.kernel.dk/linux · 5254c0cb

Linus Torvalds authored Dec 22, 2023

Pull block fixes from Jens Axboe:
 "Just an NVMe pull request this time, with a fix for bad sleeping
  context, and a revert of a patch that caused some trouble"

* tag 'block-6.7-2023-12-22' of git://git.kernel.dk/linux:
  nvme-pci: fix sleeping function called from interrupt context
  Revert "nvme-fc: fix race between error recovery and creating association"

5254c0cb

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 867583b3

Linus Torvalds authored Dec 22, 2023

Pull kvm fixes from Paolo Bonzini:
"RISC-V:

   - Fix a race condition in updating external interrupt for
     trap-n-emulated IMSIC swfile

   - Fix print_reg defaults in get-reg-list selftest

  ARM:

   - Ensure a vCPU's redistributor is unregistered from the MMIO bus if
     vCPU creation fails

   - Fix building KVM selftests for arm64 from the top-level Makefile

  x86:

   - Fix breakage for SEV-ES guests that use XSAVES

  Selftests:

   - Fix bad use of strcat(), by not using strcat() at all"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
  KVM: selftests: Fix dynamic generation of configuration names
  RISCV: KVM: update external interrupt atomically for IMSIC swfile
  KVM: riscv: selftests: Fix get-reg-list print_reg defaults
  KVM: selftests: Ensure sysreg-defs.h is generated at the expected path
  KVM: Convert comment into an assertion in kvm_io_bus_register_dev()
  KVM: arm64: vgic: Ensure that slots_lock is held in vgic_register_all_redist_iodevs()
  KVM: arm64: vgic: Force vcpu vgic teardown on vcpu destroy
  KVM: arm64: vgic: Add a non-locking primitive for kvm_vgic_vcpu_destroy()
  KVM: arm64: vgic: Simplify kvm_vgic_destroy()

867583b3

22 Dec, 2023 12 commits

Merge tag 'kvm-riscv-fixes-6.7-1' of https://github.com/kvm-riscv/linux into kvm-master · ef5b2837

Paolo Bonzini authored Dec 22, 2023

KVM/riscv fixes for 6.7, take #1

- Fix a race condition in updating external interrupt for
  trap-n-emulated IMSIC swfile
- Fix print_reg defaults in get-reg-list selftest

ef5b2837

Merge tag 'kvmarm-fixes-6.7-2' of... · 5c2b2176

Paolo Bonzini authored Dec 22, 2023

Merge tag 'kvmarm-fixes-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master

KVM/arm64 fixes for 6.7, part #2

 - Ensure a vCPU's redistributor is unregistered from the MMIO bus
   if vCPU creation fails

 - Fix building KVM selftests for arm64 from the top-level Makefile

5c2b2176

Merge tag 'printk-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux · c0f65a7c

Linus Torvalds authored Dec 22, 2023

Pull printk fix from Petr Mladek:

 - Prevent refcount warning from code releasing a fwnode

* tag 'printk-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  lib/vsprintf: Fix %pfwf when current node refcount == 0

c0f65a7c

Merge tag 'sound-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 5414aea7

Linus Torvalds authored Dec 22, 2023

Pull sound fixes from Takashi Iwai:
 "Apparently there were so many kids wishing bug fixes that made Santa
  busy; here we have lots of fixes although it's a bit late. But all
  changes are device-specific, hence it should be relatively safe to
  apply.

  Most of changes are for Cirrus codecs (for both ASoC and HD-audio),
  while the remaining are fixes for TI codecs, HD-audio and USB-audio
  quirks"

* tag 'sound-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
  ALSA: hda: cs35l41: Only add SPI CS GPIO if SPI is enabled in kernel
  ALSA: hda: cs35l41: Do not allow uninitialised variables to be freed
  ASoC: fsl_sai: Fix channel swap issue on i.MX8MP
  ASoC: hdmi-codec: fix missing report for jack initial status
  ALSA: hda/realtek: Add quirks for ASUS Zenbook 2023 Models
  ALSA: hda: cs35l41: Support additional ASUS Zenbook 2023 Models
  ALSA: hda/realtek: Add quirks for ASUS Zenbook 2022 Models
  ALSA: hda: cs35l41: Support additional ASUS Zenbook 2022 Models
  ALSA: hda/realtek: Add quirks for ASUS ROG 2023 models
  ALSA: hda: cs35l41: Support additional ASUS ROG 2023 models
  ALSA: hda: cs35l41: Add config table to support many laptops without _DSD
  ASoC: Intel: bytcr_rt5640: Add new swapped-speakers quirk
  ASoC: Intel: bytcr_rt5640: Add quirk for the Medion Lifetab S10346
  kselftest: alsa: fixed a print formatting warning
  ALSA: usb-audio: Increase delay in MOTU M quirk
  ASoC: tas2781: check the validity of prm_no/cfg_no
  ALSA: hda/tas2781: select program 0, conf 0 by default
  ALSA: hda/realtek: Add quirk for ASUS ROG GV302XA
  ASoC: cs42l43: Don't enable bias sense during type detect
  ASoC: Intel: soc-acpi-intel-mtl-match: Change CS35L56 prefixes to AMPn
  ...

5414aea7

Merge tag 'i2c-for-6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 2618280d

Linus Torvalds authored Dec 22, 2023

Pull i2c fixes from Wolfram Sang:

 - error path fixes (qcom-geni)

 - polling mode fix (rk3x)

 - target mode state machine fix (aspeed)

* tag 'i2c-for-6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: aspeed: Handle the coalesced stop conditions with the start conditions.
  i2c: rk3x: fix potential spinlock recursion on poll
  i2c: qcom-geni: fix missing clk_disable_unprepare() and geni_se_resources_off()

2618280d

Merge tag 'gpio-fixes-for-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · a9ca0330

Linus Torvalds authored Dec 22, 2023

Pull gpio fixes from Bartosz Golaszewski:
 "Here's another round of fixes from the GPIO subsystem for this release
  cycle.

  There's one commit adding synchronization to an ioctl() we overlooked
  previously and another synchronization changeset for one of the
  drivers:

   - add protection against GPIO device removal to an overlooked ioctl()

   - synchronize the interrupt mask register manually in gpio-dwapb"

* tag 'gpio-fixes-for-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: dwapb: mask/unmask IRQ when disable/enale it
  gpiolib: cdev: add gpio_device locking wrapper around gpio_ioctl()

a9ca0330

Merge tag 'for-linus-6.7a-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · b7bc7bce

Linus Torvalds authored Dec 22, 2023

Pull xen fix from Juergen Gross:
 "A single patch fixing a build issue for x86 32-bit configurations with
  CONFIG_XEN, which was introduced in the 6.7 development cycle"

* tag 'for-linus-6.7a-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: add CPU dependencies for 32-bit build

b7bc7bce

Merge tag 'drm-fixes-2023-12-22' of git://anongit.freedesktop.org/drm/drm · 8afe6f0e

Linus Torvalds authored Dec 22, 2023

Pull drm fixes from Dave Airlie:
 "Pretty quiet for this week, just i915 and amdgpu fixes,

  I think the misc tree got lost this week, but didn't seem to have too
  much in it, so it can wait. I've also got a bunch of nouveau GSP fixes
  sailing around that'll probably land next time as well.

  amdgpu:
   - DCN 3.5 fixes
   - DCN 3.2 SubVP fix
   - GPUVM fix

  amdkfd:
   - SVM fix for APUs

  i915:
   - Fix state readout and check for DSC and bigjoiner combo
   - Fix a potential integer overflow
   - Reject async flips with bigjoiner
   - Fix MTL HDMI/DP PLL clock selection
   - Fix various issues by disabling pipe DMC events"

* tag 'drm-fixes-2023-12-22' of git://anongit.freedesktop.org/drm/drm:
  drm/amdgpu: re-create idle bo's PTE during VM state machine reset
  drm/amd/display: dereference variable before checking for zero
  drm/amd/display: get dprefclk ss info from integration info table
  drm/amd/display: Add case for dcn35 to support usb4 dmub hpd event
  drm/amd/display: disable FPO and SubVP for older DMUB versions on DCN32x
  drm/amdkfd: svm range always mapped flag not working on APU
  drm/amd/display: Revert " drm/amd/display: Use channel_width = 2 for vram table 3.0"
  drm/i915/dmc: Don't enable any pipe DMC events
  drm/i915/mtl: Fix HDMI/DP PLL clock selection
  drm/i915: Reject async flips with bigjoiner
  drm/i915/hwmon: Fix static analysis tool reported issues
  drm/i915/display: Get bigjoiner config before dsc config during readout

8afe6f0e

Merge tag '9p-for-6.7-rc7' of https://github.com/martinetd/linux · 93a165cb

Linus Torvalds authored Dec 22, 2023

Pull 9p fixes from Dominique Martinet:
 "Two small fixes scheduled for stable trees:

  A tracepoint fix that's been reading past the end of messages forever,
  but semi-recently also went over the end of the buffer. And a
  potential incorrectly freeing garbage in pdu parsing error path"

* tag '9p-for-6.7-rc7' of https://github.com/martinetd/linux:
  net: 9p: avoid freeing uninit memory in p9pdu_vreadf
  9p: prevent read overrun in protocol dump tracepoint

93a165cb

Merge tag 'drm-intel-fixes-2023-12-21' of... · d4b6e7f5

Dave Airlie authored Dec 22, 2023

Merge tag 'drm-intel-fixes-2023-12-21' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

drm/i915 fixes for v6.7-rc7:
- Fix state readout and check for DSC and bigjoiner combo
- Fix a potential integer overflow
- Reject async flips with bigjoiner
- Fix MTL HDMI/DP PLL clock selection
- Fix various issues by disabling pipe DMC events
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87plyzsnxi.fsf@intel.com

d4b6e7f5

Merge tag 'amd-drm-fixes-6.7-2023-12-20' of... · b7ef7caf

Dave Airlie authored Dec 22, 2023

Merge tag 'amd-drm-fixes-6.7-2023-12-20' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.7-2023-12-20:

amdgpu:
- DCN 3.5 fixes
- DCN 3.2 SubVP fix
- GPUVM fix

amdkfd:
- SVM fix for APUs
Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231220164845.4975-1-alexander.deucher@amd.com

b7ef7caf

Merge tag 'pinctrl-v6.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 24e0d2e5

Linus Torvalds authored Dec 21, 2023

Pull pin control fixes from Linus Walleij:
 "Some driver fixes for v6.7, all are in drivers, the most interesting
  one is probably the AMD laptop suspend bug which really needs fixing.
  Freedestop org has the bug description:

    https://gitlab.freedesktop.org/drm/amd/-/issues/2812

  Summary:

   - Ignore disabled device tree nodes in the Starfive 7100 and 7100
     drivers.

   - Mask non-wake source pins with interrupt enabled at suspend in the
     AMD driver, this blocks unnecessary wakeups from misc interrupts.
     This can be power consuming because in many cases the system
     doesn't really suspend, it just wakes right back up.

   - Fix a typo breaking compilation of the cy8c95x0 driver, and fix up
     bugs in the get/set config callbacks.

   - Use a dedicated lock class for the PIO4 drivers IRQ. This fixes a
     crash on suspend"

* tag 'pinctrl-v6.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: at91-pio4: use dedicated lock class for IRQ
  pinctrl: cy8c95x0: Fix get_pincfg
  pinctrl: cy8c95x0: Fix regression
  pinctrl: cy8c95x0: Fix typo
  pinctrl: amd: Mask non-wake source pins with interrupt enabled at suspend
  pinctrl: starfive: jh7100: ignore disabled device tree nodes
  pinctrl: starfive: jh7110: ignore disabled device tree nodes

24e0d2e5

21 Dec, 2023 12 commits

Merge tag 'nvme-6.7-2023-12-21' of git://git.infradead.org/nvme into block-6.7 · 13d822bf

Jens Axboe authored Dec 21, 2023

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.7

 - Revert a commit with improper sleep context (Keith)
 - Fix async event handling sleep context (Maurizio)"

* tag 'nvme-6.7-2023-12-21' of git://git.infradead.org/nvme:
  nvme-pci: fix sleeping function called from interrupt context
  Revert "nvme-fc: fix race between error recovery and creating association"

13d822bf

afs: Fix use-after-free due to get/remove race in volume tree · 9a6b294a

David Howells authored Dec 21, 2023

When an afs_volume struct is put, its refcount is reduced to 0 before
the cell->volume_lock is taken and the volume removed from the
cell->volumes tree.

Unfortunately, this means that the lookup code can race and see a volume
with a zero ref in the tree, resulting in a use-after-free:

    refcount_t: addition on 0; use-after-free.
    WARNING: CPU: 3 PID: 130782 at lib/refcount.c:25 refcount_warn_saturate+0x7a/0xda
    ...
    RIP: 0010:refcount_warn_saturate+0x7a/0xda
    ...
    Call Trace:
     afs_get_volume+0x3d/0x55
     afs_create_volume+0x126/0x1de
     afs_validate_fc+0xfe/0x130
     afs_get_tree+0x20/0x2e5
     vfs_get_tree+0x1d/0xc9
     do_new_mount+0x13b/0x22e
     do_mount+0x5d/0x8a
     __do_sys_mount+0x100/0x12a
     do_syscall_64+0x3a/0x94
     entry_SYSCALL_64_after_hwframe+0x62/0x6a

Fix this by:

 (1) When putting, use a flag to indicate if the volume has been removed
     from the tree and skip the rb_erase if it has.

 (2) When looking up, use a conditional ref increment and if it fails
     because the refcount is 0, replace the node in the tree and set the
     removal flag.

Fixes: 20325960 ("afs: Reorganise volume and server trees to be rooted on the cell")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9a6b294a

ida: Fix crash in ida_free when the bitmap is empty · af73483f

Matthew Wilcox (Oracle) authored Dec 21, 2023

The IDA usually detects double-frees, but that detection failed to
consider the case when there are no nearby IDs allocated and so we have a
NULL bitmap rather than simply having a clear bit.  Add some tests to the
test-suite to be sure we don't inadvertently reintroduce this problem.
Unfortunately they're quite noisy so include a message to disregard
the warnings.
Reported-by: Zhenghan Wang <wzhmmmmm@gmail.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

af73483f

afs: Fix overwriting of result of DNS query · a9e01ac8

David Howells authored Dec 21, 2023

In afs_update_cell(), ret is the result of the DNS lookup and the errors
are to be handled by a switch - however, the value gets clobbered in
between by setting it to -ENOMEM in case afs_alloc_vlserver_list()
fails.

Fix this by moving the setting of -ENOMEM into the error handling for
OOM failure.  Further, only do it if we don't have an alternative error
to return.

Found by Linux Verification Center (linuxtesting.org) with SVACE.  Based
on a patch from Anastasia Belova [1].

Fixes: d5c32c89 ("afs: Fix cell DNS lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Anastasia Belova <abelova@astralinux.ru>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: lvc-project@linuxtesting.org
Link: https://lore.kernel.org/r/20231221085849.1463-1-abelova@astralinux.ru/ [1]
Link: https://lore.kernel.org/r/1700862.1703168632@warthog.procyon.org.uk/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a9e01ac8

Merge tag 'afs-fixes-20231221' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 937fd403

Linus Torvalds authored Dec 21, 2023

Pull AFS fixes from David Howells:
 "Improve the interaction of arbitrary lookups in the AFS dynamic root
  that hit DNS lookup failures [1] where kafs behaves differently from
  openafs and causes some applications to fail that aren't expecting
  that. Further, negative DNS results aren't getting removed and are
  causing failures to persist.

   - Always delete unused (particularly negative) dentries as soon as
     possible so that they don't prevent future lookups from retrying.

   - Fix the handling of new-style negative DNS lookups in ->lookup() to
     make them return ENOENT so that userspace doesn't get confused when
     stat succeeds but the following open on the looked up file then
     fails.

   - Fix key handling so that DNS lookup results are reclaimed almost as
     soon as they expire rather than sitting round either forever or for
     an additional 5 mins beyond a set expiry time returning
     EKEYEXPIRED. They persist for 1s as /bin/ls will do a second stat
     call if the first fails"

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216637 [1]
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>

* tag 'afs-fixes-20231221' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  keys, dns: Allow key types (eg. DNS) to be reclaimed immediately on expiry
  afs: Fix dynamic root lookup DNS check
  afs: Fix the dynamic root's d_delete to always delete unused dentries

937fd403

Merge tag 'trace-v6.7-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 13b73446

Linus Torvalds authored Dec 21, 2023

Pull tracing fixes from Steven Rostedt:

 - Fix another kerneldoc warning

 - Fix eventfs files to inherit the ownership of its parent directory.

   The dynamic creation of dentries in eventfs did not take into account
   if the tracefs file system was mounted with a gid/uid, and would
   still default to the gid/uid of root. This is a regression.

 - Fix warning when synthetic event testing is enabled along with
   startup event tracing testing is enabled

* tag 'trace-v6.7-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing / synthetic: Disable events after testing in synth_event_gen_test_init()
  eventfs: Have event files and directories default to parent uid and gid
  tracing/synthetic: fix kernel-doc warnings

13b73446

Merge tag 'net-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 7c5e046b

Linus Torvalds authored Dec 21, 2023

Pull networking fixes from Paolo Abeni:
 "Including fixes from WiFi and bpf.

  Current release - regressions:

   - bpf: syzkaller found null ptr deref in unix_bpf proto add

   - eth: i40e: fix ST code value for clause 45

  Previous releases - regressions:

   - core: return error from sk_stream_wait_connect() if sk_wait_event()
     fails

   - ipv6: revert remove expired routes with a separated list of routes

   - wifi rfkill:
       - set GPIO direction
       - fix crash with WED rx support enabled

   - bluetooth:
       - fix deadlock in vhci_send_frame
       - fix use-after-free in bt_sock_recvmsg

   - eth: mlx5e: fix a race in command alloc flow

   - eth: ice: fix PF with enabled XDP going no-carrier after reset

   - eth: bnxt_en: do not map packet buffers twice

  Previous releases - always broken:

   - core:
       - check vlan filter feature in vlan_vids_add_by_dev() and
         vlan_vids_del_by_dev()
       - check dev->gso_max_size in gso_features_check()

   - mptcp: fix inconsistent state on fastopen race

   - phy: skip LED triggers on PHYs on SFP modules

   - eth: mlx5e:
       - fix double free of encap_header
       - fix slab-out-of-bounds in mlx5_query_nic_vport_mac_list()"

* tag 'net-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
  net: check dev->gso_max_size in gso_features_check()
  kselftest: rtnetlink.sh: use grep_fail when expecting the cmd fail
  net/ipv6: Revert remove expired routes with a separated list of routes
  net: avoid build bug in skb extension length calculation
  net: ethernet: mtk_wed: fix possible NULL pointer dereference in mtk_wed_wo_queue_tx_clean()
  net: stmmac: fix incorrect flag check in timestamp interrupt
  selftests: add vlan hw filter tests
  net: check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev()
  net: hns3: add new maintainer for the HNS3 ethernet driver
  net: mana: select PAGE_POOL
  net: ks8851: Fix TX stall caused by TX buffer overrun
  ice: Fix PF with enabled XDP going no-carrier after reset
  ice: alter feature support check for SRIOV and LAG
  ice: stop trashing VF VSI aggregator node ID information
  mailmap: add entries for Geliang Tang
  mptcp: fill in missing MODULE_DESCRIPTION()
  mptcp: fix inconsistent state on fastopen race
  selftests: mptcp: join: fix subflow_send_ack lookup
  net: phy: skip LED triggers on PHYs on SFP modules
  bpf: Add missing BPF_LINK_TYPE invocations
  ...

7c5e046b

tracing / synthetic: Disable events after testing in synth_event_gen_test_init() · 88b30c7f

Steven Rostedt (Google) authored Dec 20, 2023

The synth_event_gen_test module can be built in, if someone wants to run
the tests at boot up and not have to load them.

The synth_event_gen_test_init() function creates and enables the synthetic
events and runs its tests.

The synth_event_gen_test_exit() disables the events it created and
destroys the events.

If the module is builtin, the events are never disabled. The issue is, the
events should be disable after the tests are run. This could be an issue
if the rest of the boot up tests are enabled, as they expect the events to
be in a known state before testing. That known state happens to be
disabled.

When CONFIG_SYNTH_EVENT_GEN_TEST=y and CONFIG_EVENT_TRACE_STARTUP_TEST=y
a warning will trigger:

 Running tests on trace events:
 Testing event create_synth_test:
 Enabled event during self test!
 ------------[ cut here ]------------
 WARNING: CPU: 2 PID: 1 at kernel/trace/trace_events.c:4150 event_trace_self_tests+0x1c2/0x480
 Modules linked in:
 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.7.0-rc2-test-00031-gb803d7c6-dirty #276
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
 RIP: 0010:event_trace_self_tests+0x1c2/0x480
 Code: bb e8 a2 ab 5d fc 48 8d 7b 48 e8 f9 3d 99 fc 48 8b 73 48 40 f6 c6 01 0f 84 d6 fe ff ff 48 c7 c7 20 b6 ad bb e8 7f ab 5d fc 90 <0f> 0b 90 48 89 df e8 d3 3d 99 fc 48 8b 1b 4c 39 f3 0f 85 2c ff ff
 RSP: 0000:ffffc9000001fdc0 EFLAGS: 00010246
 RAX: 0000000000000029 RBX: ffff88810399ca80 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffffffffb9f19478 RDI: ffff88823c734e64
 RBP: ffff88810399f300 R08: 0000000000000000 R09: fffffbfff79eb32a
 R10: ffffffffbcf59957 R11: 0000000000000001 R12: ffff888104068090
 R13: ffffffffbc89f0a0 R14: ffffffffbc8a0f08 R15: 0000000000000078
 FS:  0000000000000000(0000) GS:ffff88823c700000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000001f6282001 CR4: 0000000000170ef0
 Call Trace:
  <TASK>
  ? __warn+0xa5/0x200
  ? event_trace_self_tests+0x1c2/0x480
  ? report_bug+0x1f6/0x220
  ? handle_bug+0x6f/0x90
  ? exc_invalid_op+0x17/0x50
  ? asm_exc_invalid_op+0x1a/0x20
  ? tracer_preempt_on+0x78/0x1c0
  ? event_trace_self_tests+0x1c2/0x480
  ? __pfx_event_trace_self_tests_init+0x10/0x10
  event_trace_self_tests_init+0x27/0xe0
  do_one_initcall+0xd6/0x3c0
  ? __pfx_do_one_initcall+0x10/0x10
  ? kasan_set_track+0x25/0x30
  ? rcu_is_watching+0x38/0x60
  kernel_init_freeable+0x324/0x450
  ? __pfx_kernel_init+0x10/0x10
  kernel_init+0x1f/0x1e0
  ? _raw_spin_unlock_irq+0x33/0x50
  ret_from_fork+0x34/0x60
  ? __pfx_kernel_init+0x10/0x10
  ret_from_fork_asm+0x1b/0x30
  </TASK>

This is because the synth_event_gen_test_init() left the synthetic events
that it created enabled. By having it disable them after testing, the
other selftests will run fine.

Link: https://lore.kernel.org/linux-trace-kernel/20231220111525.2f0f49b0@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Fixes: 9fe41efa ("tracing: Add synth event generation test module")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: Alexander Graf <graf@amazon.com>
Tested-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

88b30c7f

eventfs: Have event files and directories default to parent uid and gid · 0dfc852b

Steven Rostedt (Google) authored Dec 20, 2023

Dongliang reported:

  I found that in the latest version, the nodes of tracefs have been
  changed to dynamically created.

  This has caused me to encounter a problem where the gid I specified in
  the mounting parameters cannot apply to all files, as in the following
  situation:

  /data/tmp/events # mount | grep tracefs
  tracefs on /data/tmp type tracefs (rw,seclabel,relatime,gid=3012)

  gid 3012 = readtracefs

  /data/tmp # ls -lh
  total 0
  -r--r-----   1 root readtracefs 0 1970-01-01 08:00 README
  -r--r-----   1 root readtracefs 0 1970-01-01 08:00 available_events

  ums9621_1h10:/data/tmp/events # ls -lh
  total 0
  drwxr-xr-x 2 root root 0 2023-12-19 00:56 alarmtimer
  drwxr-xr-x 2 root root 0 2023-12-19 00:56 asoc

  It will prevent certain applications from accessing tracefs properly, I
  try to avoid this issue by making the following modifications.

To fix this, have the files created default to taking the ownership of
the parent dentry unless the ownership was previously set by the user.

Link: https://lore.kernel.org/linux-trace-kernel/1703063706-30539-1-git-send-email-dongliang.cui@unisoc.com/
Link: https://lore.kernel.org/linux-trace-kernel/20231220105017.1489d790@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Hongyu Jin  <hongyu.jin@unisoc.com>
Fixes: 28e12c09 ("eventfs: Save ownership and mode")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: Dongliang Cui <cuidongliang390@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

0dfc852b

keys, dns: Allow key types (eg. DNS) to be reclaimed immediately on expiry · 39299bdd

David Howells authored Dec 09, 2023

If a key has an expiration time, then when that time passes, the key is
left around for a certain amount of time before being collected (5 mins by
default) so that EKEYEXPIRED can be returned instead of ENOKEY.  This is a
problem for DNS keys because we want to redo the DNS lookup immediately at
that point.

Fix this by allowing key types to be marked such that keys of that type
don't have this extra period, but are reclaimed as soon as they expire and
turn this on for dns_resolver-type keys.  To make this easier to handle,
key->expiry is changed to be permanent if TIME64_MAX rather than 0.

Furthermore, give such new-style negative DNS results a 1s default expiry
if no other expiry time is set rather than allowing it to stick around
indefinitely.  This shouldn't be zero as ls will follow a failing stat call
immediately with a second with AT_SYMLINK_NOFOLLOW added.

Fixes: 1a4240f4 ("DNS: Separate out CIFS DNS Resolver code")
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: Wang Lei <wang840925@gmail.com>
cc: Jeff Layton <jlayton@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jarkko Sakkinen <jarkko@kernel.org>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: keyrings@vger.kernel.org
cc: netdev@vger.kernel.org

39299bdd

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 74769d81

Paolo Abeni authored Dec 21, 2023

Daniel Borkmann says:

====================
pull-request: bpf 2023-12-21

Hi David, hi Jakub, hi Paolo, hi Eric,

The following pull-request contains BPF updates for your *net* tree.

We've added 3 non-merge commits during the last 5 day(s) which contain
a total of 4 files changed, 45 insertions(+).

The main changes are:

1) Fix a syzkaller splat which triggered an oob issue in bpf_link_show_fdinfo(),
   from Jiri Olsa.

2) Fix another syzkaller-found issue which triggered a NULL pointer dereference
   in BPF sockmap for unconnected unix sockets, from John Fastabend.

bpf-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Add missing BPF_LINK_TYPE invocations
  bpf: sockmap, test for unconnected af_unix sock
  bpf: syzkaller found null ptr deref in unix_bpf proto add
====================

Link: https://lore.kernel.org/r/20231221104844.1374-1-daniel@iogearbox.netSigned-off-by: Paolo Abeni <pabeni@redhat.com>

74769d81

gpio: dwapb: mask/unmask IRQ when disable/enale it · 1cc3542c

xiongxin authored Dec 20, 2023

In the hardware implementation of the I2C HID driver based on DesignWare
GPIO IRQ chip, when the user continues to use the I2C HID device in the
suspend process, the I2C HID interrupt will be masked after the resume
process is finished.

This is because the disable_irq()/enable_irq() of the DesignWare GPIO
driver does not synchronize the IRQ mask register state. In normal use
of the I2C HID procedure, the GPIO IRQ irq_mask()/irq_unmask() functions
are called in pairs. In case of an exception, i2c_hid_core_suspend()
calls disable_irq() to disable the GPIO IRQ. With low probability, this
causes irq_unmask() to not be called, which causes the GPIO IRQ to be
masked and not unmasked in enable_irq(), raising an exception.

Add synchronization to the masked register state in the
dwapb_irq_enable()/dwapb_irq_disable() function. mask the GPIO IRQ
before disabling it. After enabling the GPIO IRQ, unmask the IRQ.

Fixes: 7779b345 ("gpio: add a driver for the Synopsys DesignWare APB GPIO block")
Cc: stable@kernel.org
Co-developed-by: Riwen Lu <luriwen@kylinos.cn>
Signed-off-by: Riwen Lu <luriwen@kylinos.cn>
Signed-off-by: xiongxin <xiongxin@kylinos.cn>
Acked-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

1cc3542c