1. 31 Oct, 2021 1 commit
  2. 26 Oct, 2021 1 commit
  3. 22 Oct, 2021 3 commits
  4. 15 Oct, 2021 13 commits
    • Sebastian Andrzej Siewior's avatar
      irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT · 09089db7
      Sebastian Andrzej Siewior authored
      On PREEMPT_RT most items are processed as LAZY via softirq context.
      Avoid to spin-wait for them because irq_work_sync() could have higher
      priority and not allow the irq-work to be completed.
      
      Wait additionally for !IRQ_WORK_HARD_IRQ irq_work items on PREEMPT_RT.
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20211006111852.1514359-5-bigeasy@linutronix.de
      09089db7
    • Sebastian Andrzej Siewior's avatar
      irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT · b4c6f86e
      Sebastian Andrzej Siewior authored
      The irq_work callback is invoked in hard IRQ context. By default all
      callbacks are scheduled for invocation right away (given supported by
      the architecture) except for the ones marked IRQ_WORK_LAZY which are
      delayed until the next timer-tick.
      
      While looking over the callbacks, some of them may acquire locks
      (spinlock_t, rwlock_t) which are transformed into sleeping locks on
      PREEMPT_RT and must not be acquired in hard IRQ context.
      Changing the locks into locks which could be acquired in this context
      will lead to other problems such as increased latencies if everything
      in the chain has IRQ-off locks. This will not solve all the issues as
      one callback has been noticed which invoked kref_put() and its callback
      invokes kfree() and this can not be invoked in hardirq context.
      
      Some callbacks are required to be invoked in hardirq context even on
      PREEMPT_RT to work properly. This includes for instance the NO_HZ
      callback which needs to be able to observe the idle context.
      
      The callbacks which require to be run in hardirq have already been
      marked. Use this information to split the callbacks onto the two lists
      on PREEMPT_RT:
      - lazy_list
        Work items which are not marked with IRQ_WORK_HARD_IRQ will be added
        to this list. Callbacks on this list will be invoked from a per-CPU
        thread.
        The handler here may acquire sleeping locks such as spinlock_t and
        invoke kfree().
      
      - raised_list
        Work items which are marked with IRQ_WORK_HARD_IRQ will be added to
        this list. They will be invoked in hardirq context and must not
        acquire any sleeping locks.
      
      The wake up of the per-CPU thread occurs from irq_work handler/
      hardirq context. The thread runs with lowest RT priority to ensure it
      runs before any SCHED_OTHER tasks do.
      
      [bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a
      	  hard and soft variant. Collected fixes over time from Steven
      	  Rostedt and Mike Galbraith. Move to per-CPU threads instead of
      	  softirq as suggested by PeterZ.]
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20211007092646.uhshe3ut2wkrcfzv@linutronix.de
      b4c6f86e
    • Sebastian Andrzej Siewior's avatar
      irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support. · 81097968
      Sebastian Andrzej Siewior authored
      irq_work() triggers instantly an interrupt if supported by the
      architecture. Otherwise the work will be processed on the next timer
      tick. In worst case irq_work_sync() could spin up to a jiffy.
      
      irq_work_sync() is usually used in tear down context which is fully
      preemptible. Based on review irq_work_sync() is invoked from preemptible
      context and there is one waiter at a time. This qualifies it to use
      rcuwait for synchronisation.
      
      Let irq_work_sync() synchronize with rcuwait if the architecture
      processes irqwork via the timer tick.
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20211006111852.1514359-3-bigeasy@linutronix.de
      81097968
    • Sebastian Andrzej Siewior's avatar
      sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ · da6ff099
      Sebastian Andrzej Siewior authored
      The push-IPI logic for RT tasks expects to be invoked from hardirq
      context. One reason is that a RT task on the remote CPU would block the
      softirq processing on PREEMPT_RT and so avoid pulling / balancing the RT
      tasks as intended.
      
      Annotate root_domain::rto_push_work as IRQ_WORK_HARD_IRQ.
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20211006111852.1514359-2-bigeasy@linutronix.de
      da6ff099
    • Tim Chen's avatar
      sched: Add cluster scheduler level for x86 · 66558b73
      Tim Chen authored
      There are x86 CPU architectures (e.g. Jacobsville) where L2 cahce is
      shared among a cluster of cores instead of being exclusive to one
      single core.
      
      To prevent oversubscription of L2 cache, load should be balanced
      between such L2 clusters, especially for tasks with no shared data.
      On benchmark such as SPECrate mcf test, this change provides a boost
      to performance especially on medium load system on Jacobsville.  on a
      Jacobsville that has 24 Atom cores, arranged into 6 clusters of 4
      cores each, the benchmark number is as follow:
      
       Improvement over baseline kernel for mcf_r
       copies		run time	base rate
       1		-0.1%		-0.2%
       6		25.1%		25.1%
       12		18.8%		19.0%
       24		0.3%		0.3%
      
      So this looks pretty good. In terms of the system's task distribution,
      some pretty bad clumping can be seen for the vanilla kernel without
      the L2 cluster domain for the 6 and 12 copies case. With the extra
      domain for cluster, the load does get evened out between the clusters.
      
      Note this patch isn't an universal win as spreading isn't necessarily
      a win, particually for those workload who can benefit from packing.
      Signed-off-by: default avatarTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: default avatarBarry Song <song.bao.hua@hisilicon.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210924085104.44806-4-21cnbao@gmail.com
      66558b73
    • Barry Song's avatar
      sched: Add cluster scheduler level in core and related Kconfig for ARM64 · 778c558f
      Barry Song authored
      This patch adds scheduler level for clusters and automatically enables
      the load balance among clusters. It will directly benefit a lot of
      workload which loves more resources such as memory bandwidth, caches.
      
      Testing has widely been done in two different hardware configurations of
      Kunpeng920:
      
       24 cores in one NUMA(6 clusters in each NUMA node);
       32 cores in one NUMA(8 clusters in each NUMA node)
      
      Workload is running on either one NUMA node or four NUMA nodes, thus,
      this can estimate the effect of cluster spreading w/ and w/o NUMA load
      balance.
      
      * Stream benchmark:
      
      4threads stream (on 1NUMA * 24cores = 24cores)
                      stream                 stream
                      w/o patch              w/ patch
      MB/sec copy     29929.64 (   0.00%)    32932.68 (  10.03%)
      MB/sec scale    29861.10 (   0.00%)    32710.58 (   9.54%)
      MB/sec add      27034.42 (   0.00%)    32400.68 (  19.85%)
      MB/sec triad    27225.26 (   0.00%)    31965.36 (  17.41%)
      
      6threads stream (on 1NUMA * 24cores = 24cores)
                      stream                 stream
                      w/o patch              w/ patch
      MB/sec copy     40330.24 (   0.00%)    42377.68 (   5.08%)
      MB/sec scale    40196.42 (   0.00%)    42197.90 (   4.98%)
      MB/sec add      37427.00 (   0.00%)    41960.78 (  12.11%)
      MB/sec triad    37841.36 (   0.00%)    42513.64 (  12.35%)
      
      12threads stream (on 1NUMA * 24cores = 24cores)
                      stream                 stream
                      w/o patch              w/ patch
      MB/sec copy     52639.82 (   0.00%)    53818.04 (   2.24%)
      MB/sec scale    52350.30 (   0.00%)    53253.38 (   1.73%)
      MB/sec add      53607.68 (   0.00%)    55198.82 (   2.97%)
      MB/sec triad    54776.66 (   0.00%)    56360.40 (   2.89%)
      
      Thus, it could help memory-bound workload especially under medium load.
      Similar improvement is also seen in lkp-pbzip2:
      
      * lkp-pbzip2 benchmark
      
      2-96 threads (on 4NUMA * 24cores = 96cores)
                        lkp-pbzip2              lkp-pbzip2
                        w/o patch               w/ patch
      Hmean     tput-2   11062841.57 (   0.00%)  11341817.51 *   2.52%*
      Hmean     tput-5   26815503.70 (   0.00%)  27412872.65 *   2.23%*
      Hmean     tput-8   41873782.21 (   0.00%)  43326212.92 *   3.47%*
      Hmean     tput-12  61875980.48 (   0.00%)  64578337.51 *   4.37%*
      Hmean     tput-21 105814963.07 (   0.00%) 111381851.01 *   5.26%*
      Hmean     tput-30 150349470.98 (   0.00%) 156507070.73 *   4.10%*
      Hmean     tput-48 237195937.69 (   0.00%) 242353597.17 *   2.17%*
      Hmean     tput-79 360252509.37 (   0.00%) 362635169.23 *   0.66%*
      Hmean     tput-96 394571737.90 (   0.00%) 400952978.48 *   1.62%*
      
      2-24 threads (on 1NUMA * 24cores = 24cores)
                       lkp-pbzip2               lkp-pbzip2
                       w/o patch                w/ patch
      Hmean     tput-2   11071705.49 (   0.00%)  11296869.10 *   2.03%*
      Hmean     tput-4   20782165.19 (   0.00%)  21949232.15 *   5.62%*
      Hmean     tput-6   30489565.14 (   0.00%)  33023026.96 *   8.31%*
      Hmean     tput-8   40376495.80 (   0.00%)  42779286.27 *   5.95%*
      Hmean     tput-12  61264033.85 (   0.00%)  62995632.78 *   2.83%*
      Hmean     tput-18  86697139.39 (   0.00%)  86461545.74 (  -0.27%)
      Hmean     tput-24 104854637.04 (   0.00%) 104522649.46 *  -0.32%*
      
      In the case of 6 threads and 8 threads, we see the greatest performance
      improvement.
      
      Similar improvement can be seen on lkp-pixz though the improvement is
      smaller:
      
      * lkp-pixz benchmark
      
      2-24 threads lkp-pixz (on 1NUMA * 24cores = 24cores)
                        lkp-pixz               lkp-pixz
                        w/o patch              w/ patch
      Hmean     tput-2   6486981.16 (   0.00%)  6561515.98 *   1.15%*
      Hmean     tput-4  11645766.38 (   0.00%) 11614628.43 (  -0.27%)
      Hmean     tput-6  15429943.96 (   0.00%) 15957350.76 *   3.42%*
      Hmean     tput-8  19974087.63 (   0.00%) 20413746.98 *   2.20%*
      Hmean     tput-12 28172068.18 (   0.00%) 28751997.06 *   2.06%*
      Hmean     tput-18 39413409.54 (   0.00%) 39896830.55 *   1.23%*
      Hmean     tput-24 49101815.85 (   0.00%) 49418141.47 *   0.64%*
      
      * SPECrate benchmark
      
      4,8,16 copies mcf_r(on 1NUMA * 32cores = 32cores)
      		Base     	 	Base
      		Run Time   	 	Rate
      		-------  	 	---------
      4 Copies	w/o 580 (w/ 570)       	w/o 11.1 (w/ 11.3)
      8 Copies	w/o 647 (w/ 605)       	w/o 20.0 (w/ 21.4, +7%)
      16 Copies	w/o 844 (w/ 844)       	w/o 30.6 (w/ 30.6)
      
      32 Copies(on 4NUMA * 32 cores = 128cores)
      [w/o patch]
                       Base     Base        Base
      Benchmarks       Copies  Run Time     Rate
      --------------- -------  ---------  ---------
      500.perlbench_r      32        584       87.2  *
      502.gcc_r            32        503       90.2  *
      505.mcf_r            32        745       69.4  *
      520.omnetpp_r        32       1031       40.7  *
      523.xalancbmk_r      32        597       56.6  *
      525.x264_r            1         --            CE
      531.deepsjeng_r      32        336      109    *
      541.leela_r          32        556       95.4  *
      548.exchange2_r      32        513      163    *
      557.xz_r             32        530       65.2  *
       Est. SPECrate2017_int_base              80.3
      
      [w/ patch]
                        Base     Base        Base
      Benchmarks       Copies  Run Time     Rate
      --------------- -------  ---------  ---------
      500.perlbench_r      32        580      87.8 (+0.688%)  *
      502.gcc_r            32        477      95.1 (+5.432%)  *
      505.mcf_r            32        644      80.3 (+13.574%) *
      520.omnetpp_r        32        942      44.6 (+9.58%)   *
      523.xalancbmk_r      32        560      60.4 (+6.714%%) *
      525.x264_r            1         --           CE
      531.deepsjeng_r      32        337      109  (+0.000%) *
      541.leela_r          32        554      95.6 (+0.210%) *
      548.exchange2_r      32        515      163  (+0.000%) *
      557.xz_r             32        524      66.0 (+1.227%) *
       Est. SPECrate2017_int_base              83.7 (+4.062%)
      
      On the other hand, it is slightly helpful to CPU-bound tasks like
      kernbench:
      
      * 24-96 threads kernbench (on 4NUMA * 24cores = 96cores)
                           kernbench              kernbench
                           w/o cluster            w/ cluster
      Min       user-24    12054.67 (   0.00%)    12024.19 (   0.25%)
      Min       syst-24     1751.51 (   0.00%)     1731.68 (   1.13%)
      Min       elsp-24      600.46 (   0.00%)      598.64 (   0.30%)
      Min       user-48    12361.93 (   0.00%)    12315.32 (   0.38%)
      Min       syst-48     1917.66 (   0.00%)     1892.73 (   1.30%)
      Min       elsp-48      333.96 (   0.00%)      332.57 (   0.42%)
      Min       user-96    12922.40 (   0.00%)    12921.17 (   0.01%)
      Min       syst-96     2143.94 (   0.00%)     2110.39 (   1.56%)
      Min       elsp-96      211.22 (   0.00%)      210.47 (   0.36%)
      Amean     user-24    12063.99 (   0.00%)    12030.78 *   0.28%*
      Amean     syst-24     1755.20 (   0.00%)     1735.53 *   1.12%*
      Amean     elsp-24      601.60 (   0.00%)      600.19 (   0.23%)
      Amean     user-48    12362.62 (   0.00%)    12315.56 *   0.38%*
      Amean     syst-48     1921.59 (   0.00%)     1894.95 *   1.39%*
      Amean     elsp-48      334.10 (   0.00%)      332.82 *   0.38%*
      Amean     user-96    12925.27 (   0.00%)    12922.63 (   0.02%)
      Amean     syst-96     2146.66 (   0.00%)     2122.20 *   1.14%*
      Amean     elsp-96      211.96 (   0.00%)      211.79 (   0.08%)
      
      Note this patch isn't an universal win, it might hurt those workload
      which can benefit from packing. Though tasks which want to take
      advantages of lower communication latency of one cluster won't
      necessarily been packed in one cluster while kernel is not aware of
      clusters, they have some chance to be randomly packed. But this
      patch will make them more likely spread.
      Signed-off-by: default avatarBarry Song <song.bao.hua@hisilicon.com>
      Tested-by: default avatarYicong Yang <yangyicong@hisilicon.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      778c558f
    • Jonathan Cameron's avatar
      topology: Represent clusters of CPUs within a die · c5e22fef
      Jonathan Cameron authored
      Both ACPI and DT provide the ability to describe additional layers of
      topology between that of individual cores and higher level constructs
      such as the level at which the last level cache is shared.
      In ACPI this can be represented in PPTT as a Processor Hierarchy
      Node Structure [1] that is the parent of the CPU cores and in turn
      has a parent Processor Hierarchy Nodes Structure representing
      a higher level of topology.
      
      For example Kunpeng 920 has 6 or 8 clusters in each NUMA node, and each
      cluster has 4 cpus. All clusters share L3 cache data, but each cluster
      has local L3 tag. On the other hand, each clusters will share some
      internal system bus.
      
      +-----------------------------------+                          +---------+
      |  +------+    +------+             +--------------------------+         |
      |  | CPU0 |    | cpu1 |             |    +-----------+         |         |
      |  +------+    +------+             |    |           |         |         |
      |                                   +----+    L3     |         |         |
      |  +------+    +------+   cluster   |    |    tag    |         |         |
      |  | CPU2 |    | CPU3 |             |    |           |         |         |
      |  +------+    +------+             |    +-----------+         |         |
      |                                   |                          |         |
      +-----------------------------------+                          |         |
      +-----------------------------------+                          |         |
      |  +------+    +------+             +--------------------------+         |
      |  |      |    |      |             |    +-----------+         |         |
      |  +------+    +------+             |    |           |         |         |
      |                                   |    |    L3     |         |         |
      |  +------+    +------+             +----+    tag    |         |         |
      |  |      |    |      |             |    |           |         |         |
      |  +------+    +------+             |    +-----------+         |         |
      |                                   |                          |         |
      +-----------------------------------+                          |   L3    |
                                                                     |   data  |
      +-----------------------------------+                          |         |
      |  +------+    +------+             |    +-----------+         |         |
      |  |      |    |      |             |    |           |         |         |
      |  +------+    +------+             +----+    L3     |         |         |
      |                                   |    |    tag    |         |         |
      |  +------+    +------+             |    |           |         |         |
      |  |      |    |      |             |    +-----------+         |         |
      |  +------+    +------+             +--------------------------+         |
      +-----------------------------------|                          |         |
      +-----------------------------------|                          |         |
      |  +------+    +------+             +--------------------------+         |
      |  |      |    |      |             |    +-----------+         |         |
      |  +------+    +------+             |    |           |         |         |
      |                                   +----+    L3     |         |         |
      |  +------+    +------+             |    |    tag    |         |         |
      |  |      |    |      |             |    |           |         |         |
      |  +------+    +------+             |    +-----------+         |         |
      |                                   |                          |         |
      +-----------------------------------+                          |         |
      +-----------------------------------+                          |         |
      |  +------+    +------+             +--------------------------+         |
      |  |      |    |      |             |   +-----------+          |         |
      |  +------+    +------+             |   |           |          |         |
      |                                   |   |    L3     |          |         |
      |  +------+    +------+             +---+    tag    |          |         |
      |  |      |    |      |             |   |           |          |         |
      |  +------+    +------+             |   +-----------+          |         |
      |                                   |                          |         |
      +-----------------------------------+                          |         |
      +-----------------------------------+                          |         |
      |  +------+    +------+             +--------------------------+         |
      |  |      |    |      |             |  +-----------+           |         |
      |  +------+    +------+             |  |           |           |         |
      |                                   |  |    L3     |           |         |
      |  +------+    +------+             +--+    tag    |           |         |
      |  |      |    |      |             |  |           |           |         |
      |  +------+    +------+             |  +-----------+           |         |
      |                                   |                          +---------+
      +-----------------------------------+
      
      That means spreading tasks among clusters will bring more bandwidth
      while packing tasks within one cluster will lead to smaller cache
      synchronization latency. So both kernel and userspace will have
      a chance to leverage this topology to deploy tasks accordingly to
      achieve either smaller cache latency within one cluster or an even
      distribution of load among clusters for higher throughput.
      
      This patch exposes cluster topology to both kernel and userspace.
      Libraried like hwloc will know cluster by cluster_cpus and related
      sysfs attributes. PoC of HWLOC support at [2].
      
      Note this patch only handle the ACPI case.
      
      Special consideration is needed for SMT processors, where it is
      necessary to move 2 levels up the hierarchy from the leaf nodes
      (thus skipping the processor core level).
      
      Note that arm64 / ACPI does not provide any means of identifying
      a die level in the topology but that may be unrelate to the cluster
      level.
      
      [1] ACPI Specification 6.3 - section 5.2.29.1 processor hierarchy node
          structure (Type 0)
      [2] https://github.com/hisilicon/hwloc/tree/linux-clusterSigned-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarTian Tao <tiantao6@hisilicon.com>
      Signed-off-by: default avatarBarry Song <song.bao.hua@hisilicon.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210924085104.44806-2-21cnbao@gmail.com
      c5e22fef
    • Peter Zijlstra's avatar
      sched: Disable -Wunused-but-set-variable · 37b47298
      Peter Zijlstra authored
      The compilers can't deal with obvious DCE vs that warning, resulting
      in code like:
      
      	if (0) {
      		sched sched_statistics *stats;
      
      		stats = __schedstats_from_se(se);
      
      		...
      	}
      
      triggering the warning. Kill the warning to make the robots stop
      reporting this.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarNathan Chancellor <nathan@kernel.org>
      Link: https://lkml.kernel.org/r/YWWPLnaZGybHsTkv@hirez.programming.kicks-ass.net
      37b47298
    • Kees Cook's avatar
      sched: Add wrapper for get_wchan() to keep task blocked · 42a20f86
      Kees Cook authored
      Having a stable wchan means the process must be blocked and for it to
      stay that way while performing stack unwinding.
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm]
      Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
      Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org
      42a20f86
    • Qi Zheng's avatar
      x86: Fix get_wchan() to support the ORC unwinder · bc9bbb81
      Qi Zheng authored
      Currently, the kernel CONFIG_UNWINDER_ORC option is enabled by default
      on x86, but the implementation of get_wchan() is still based on the frame
      pointer unwinder, so the /proc/<pid>/wchan usually returned 0 regardless
      of whether the task <pid> is running.
      
      Reimplement get_wchan() by calling stack_trace_save_tsk(), which is
      adapted to the ORC and frame pointer unwinders.
      
      Fixes: ee9f8fce ("x86/unwind: Add the ORC unwinder")
      Signed-off-by: default avatarQi Zheng <zhengqi.arch@bytedance.com>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20211008111626.271115116@infradead.org
      bc9bbb81
    • Kees Cook's avatar
      proc: Use task_is_running() for wchan in /proc/$pid/stat · 4e046156
      Kees Cook authored
      The implementations of get_wchan() can be expensive. The only information
      imparted here is whether or not a process is currently blocked in the
      scheduler (and even this doesn't need to be exact). Avoid doing the
      heavy lifting of stack walking and just report that information by using
      task_is_running().
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20211008111626.211281780@infradead.org
      4e046156
    • Kees Cook's avatar
      leaking_addresses: Always print a trailing newline · cf2a85ef
      Kees Cook authored
      For files that lack trailing newlines and match a leaking address (e.g.
      wchan[1]), the leaking_addresses.pl report would run together with the
      next line, making things look corrupted.
      
      Unconditionally remove the newline on input, and write it back out on
      output.
      
      [1] https://lore.kernel.org/all/20210103142726.GC30643@xsang-OptiPlex-9020/Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20211008111626.151570317@infradead.org
      cf2a85ef
    • Kees Cook's avatar
      Revert "proc/wchan: use printk format instead of lookup_symbol_name()" · 54354c6a
      Kees Cook authored
      This reverts commit 152c432b.
      
      When a kernel address couldn't be symbolized for /proc/$pid/wchan, it
      would leak the raw value, a potential information exposure. This is a
      regression compared to the safer pre-v5.12 behavior.
      Reported-by: default avatarkernel test robot <oliver.sang@intel.com>
      Reported-by: default avatarVito Caputo <vcaputo@pengaru.com>
      Reported-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20211008111626.090829198@infradead.org
      54354c6a
  5. 14 Oct, 2021 7 commits
  6. 07 Oct, 2021 4 commits
  7. 06 Oct, 2021 1 commit
    • Peter Zijlstra's avatar
      sched: Fix DEBUG && !SCHEDSTATS warn · 769fdf83
      Peter Zijlstra authored
      When !SCHEDSTATS schedstat_enabled() is an unconditional 0 and the
      whole block doesn't exist, however GCC figures the scoped variable
      'stats' is unused and complains about it.
      
      Upgrade the warning from -Wunused-variable to -Wunused-but-set-variable
      by writing it in two statements. This fixes the build because the new
      warning is in W=1.
      
      Given that whole if(0) {} thing, I don't feel motivated to change
      things overly much and quite strongly feel this is the compiler being
      daft.
      
      Fixes: cb3e971c435d ("sched: Make struct sched_statistics independent of fair sched class")
      Reported-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      769fdf83
  8. 05 Oct, 2021 10 commits