    srcu: Make expedited RCU grace periods block even less frequently · 4f2bfd94
    Neeraj Upadhyay authored
    The purpose of commit 282d8998 ("srcu: Prevent expedited GPs
    and blocking readers from consuming CPU") was to prevent a long
    series of never-blocking expedited SRCU grace periods from blocking
    kernel-live-patching (KLP) progress.  Although it was successful, it also
    resulted in excessive boot times on certain embedded workloads running
    under qemu with the "-bios QEMU_EFI.fd" command line.  Here "excessive"
    means increasing the boot time up into the three-to-four minute range.
    This increase in boot time was due to the more than 6000 back-to-back
    invocations of synchronize_rcu_expedited() within the KVM host OS, which
    in turn resulted from qemu's emulation of a long series of MMIO accesses.
    
    Commit 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace
    periods") did not significantly help this particular use case.
    
    Zhangfei Gao and Shameerali Kolothum Thodi did experiments varying the
    value of SRCU_MAX_NODELAY_PHASE with HZ=250 and with various values
    of non-sleeping per phase counts on a system with preemption enabled,
    and observed the following boot times:
    
    +──────────────────────────+────────────────+
    | SRCU_MAX_NODELAY_PHASE   | Boot time (s)  |
    +──────────────────────────+────────────────+
    | 100                      | 30.053         |
    | 150                      | 25.151         |
    | 200                      | 20.704         |
    | 250                      | 15.748         |
    | 500                      | 11.401         |
    | 1000                     | 11.443         |
    | 10000                    | 11.258         |
    | 1000000                  | 11.154         |
    +──────────────────────────+────────────────+
    
    Analysis of the experimental results shows additional improvements with
    CPU-bound delays approaching one jiffy in duration.  This improvement was
    also seen when the number of per-phase iterations was scaled to one jiffy.
    
    This commit therefore scales the per-grace-period-phase number of
    non-sleeping polls so that they extend for about one jiffy.  In addition,
    the delay-calculation call to srcu_get_delay() in srcu_gp_end() is
    replaced with a simple check for an expedited grace period.  This change
    schedules callback invocation immediately after expedited grace periods
    complete, which results in greatly improved boot times.  Testing done
    by Marc and Zhangfei confirms that this change recovers most of the
    performance degradation in boot time; for the CONFIG_HZ_250 configuration,
    specifically, boot times improve from 3m50s to 41s on Marc's setup
    and from 2m40s to ~9.7s on Zhangfei's setup.
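    
    The scaling and the grace-period-end check described above can be
    sketched in userspace C roughly as follows.  This is an illustration
    only, not the code in srcutree.c: the function names, the 5-microsecond
    per-poll retry delay, and the clamp bounds of 3 and 1000 are all
    assumptions made for the example.

```c
#define USEC_PER_SEC 1000000UL

/* Hypothetical values, chosen only for illustration. */
#define SKETCH_RETRY_DELAY_US 5UL    /* assumed busy-wait time per poll */
#define SKETCH_POLLS_MIN      3UL    /* assumed lower clamp bound */
#define SKETCH_POLLS_MAX      1000UL /* assumed upper clamp bound */

/* Clamp v into the inclusive range [lo, hi]. */
static unsigned long clamp_ul(unsigned long v, unsigned long lo,
                              unsigned long hi)
{
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

/*
 * Number of non-sleeping polls per grace-period phase such that the
 * polls together last about one jiffy (1/hz seconds), clamped to the
 * assumed bounds.  Degenerate inputs fall back to the minimum.
 */
static unsigned long polls_per_jiffy(unsigned long hz,
                                     unsigned long retry_delay_us)
{
    unsigned long jiffy_us;

    if (hz == 0 || retry_delay_us == 0)
        return SKETCH_POLLS_MIN;
    jiffy_us = USEC_PER_SEC / hz;
    return clamp_ul(jiffy_us / retry_delay_us,
                    SKETCH_POLLS_MIN, SKETCH_POLLS_MAX);
}

/*
 * Delay before callback invocation is scheduled at grace-period end:
 * none after an expedited grace period, otherwise the caller-supplied
 * normal delay (a hypothetical parameter).
 */
static unsigned long gp_end_delay(int expedited, unsigned long normal_delay)
{
    return expedited ? 0UL : normal_delay;
}
```

    Under these assumptions, HZ=250 gives a 4000-microsecond jiffy and
    therefore 800 polls per phase, while HZ=100 would give 2000 polls and
    is clamped to the assumed maximum of 1000.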
    
    In addition to changing the default per-phase delays, this
    change adds three new kernel parameters: srcutree.srcu_max_nodelay,
    srcutree.srcu_max_nodelay_phase, and srcutree.srcu_retry_check_delay.
    These allow users to configure the SRCU grace-period scanning delays in
    order to react more quickly to additional use cases.
    
    Fixes: 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace periods")
    Fixes: 282d8998 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
    Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
    Reported-by: yueluck <yueluck@163.com>
    Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
    Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
srcutree.c 59.7 KB