    blk-cgroup: Optimize blkcg_rstat_flush() · 3b8cc629
    Waiman Long authored
    For a system with many CPUs and block devices, the time to do
    blkcg_rstat_flush() from cgroup_rstat_flush() can be rather long. It
    can be especially problematic as interrupts are disabled during the flush.
    It was reported that the flush might take seconds to complete in some
    extreme cases, leading to hard lockup messages.
    
    As it is likely that not all the percpu blkg_iostat_set's have been
    updated since the last flush, those stale blkg_iostat_set's don't need
    to be flushed in this case. This patch optimizes blkcg_rstat_flush()
    by keeping a lockless list of recently updated blkg_iostat_set's in a
    newly added percpu blkcg->lhead pointer.
    
    The blkg_iostat_set is added to a lockless list on the update side
    in blk_cgroup_bio_start(). It is removed from the lockless list when
    flushed in blkcg_rstat_flush(). Due to racing, it is possible that
    blkg_iostat_set's in the lockless list may have no new IO stats to be
    flushed, but that is OK.
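
The add-on-update, remove-on-flush scheme described above can be sketched in user space with C11 atomics. This is only an illustration under stated assumptions, not the kernel code: `struct iostat_set`, `bio_start()`, `rstat_flush()`, the `queued` flag, the `lhead[]` array standing in for the percpu list heads, and `NR_CPUS` are all hypothetical names chosen for the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4   /* hypothetical CPU count for this sketch */

/* Simplified stand-in for one percpu blkg_iostat_set. */
struct iostat_set {
    struct iostat_set *next;   /* link in the per-CPU lockless list */
    atomic_bool queued;        /* assumption: true while on a list */
    unsigned long cur;         /* stats accumulated on the update side */
    unsigned long flushed;     /* stats already picked up by a flush */
};

/* One list head per CPU, standing in for the percpu lhead pointer. */
static _Atomic(struct iostat_set *) lhead[NR_CPUS];

/* Update side: account one IO, then queue the set unless already queued. */
static void bio_start(int cpu, struct iostat_set *s)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    s->cur++;
    if (atomic_exchange(&s->queued, true))
        return;                          /* already on the list */

    struct iostat_set *old = atomic_load(&lhead[cpu]);
    do {
        s->next = old;                   /* push at the head with CAS */
    } while (!atomic_compare_exchange_weak(&lhead[cpu], &old, s));
}

/*
 * Flush side: detach the whole list in one atomic exchange, then walk
 * only the sets that were queued. Returns the number of sets visited.
 */
static int rstat_flush(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    struct iostat_set *s = atomic_exchange(&lhead[cpu], NULL);
    int visited = 0;

    while (s) {
        struct iostat_set *next = s->next;
        atomic_store(&s->queued, false); /* may be re-queued from here on */
        s->flushed = s->cur;
        visited++;
        s = next;
    }
    return visited;
}
```

In this sketch a flush on a CPU with no recent updates finds a NULL head and returns 0 without touching any set. A set re-queued by a racing update may be visited again with nothing new to flush, which matches the harmless race noted above.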
    
    To protect against destruction of blkg, a percpu reference is taken
    when putting into the lockless list and dropped when removed.
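
The lifetime rule can be illustrated with a plain atomic counter. This is a minimal single-threaded sketch, assuming a simple counter in place of the kernel's percpu reference and an ordinary pointer in place of the lockless list; `struct blkg_like`, `enqueue()`, and `flush_all()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a blkg with a reference count. */
struct blkg_like {
    atomic_int refcnt;
    bool released;              /* set when the last reference is dropped */
    struct blkg_like *next;
};

/* Plain pointer used here instead of a real lockless list. */
static struct blkg_like *list_head;

static void blkg_like_get(struct blkg_like *g)
{
    atomic_fetch_add(&g->refcnt, 1);
}

static void blkg_like_put(struct blkg_like *g)
{
    /* fetch_sub returns the old value: 1 means this was the last ref. */
    if (atomic_fetch_sub(&g->refcnt, 1) == 1)
        g->released = true;
}

/* Take a reference for as long as the object sits on the list. */
static void enqueue(struct blkg_like *g)
{
    blkg_like_get(g);
    g->next = list_head;
    list_head = g;
}

/* Remove every entry and drop the reference taken in enqueue(). */
static int flush_all(void)
{
    struct blkg_like *g = list_head;
    int n = 0;

    list_head = NULL;
    while (g) {
        struct blkg_like *next = g->next; /* read before the put */
        blkg_like_put(g);
        n++;
        g = next;
    }
    return n;
}
```

If the owner drops its own reference while the object is still queued, the object stays alive until the flush removes it and drops the list's reference.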
    
    When booting up an instrumented test kernel with this patch on a
    2-socket 96-thread system with cgroup v2, out of the 2051 calls to
    cgroup_rstat_flush() after bootup, 1788 of the calls returned
    immediately because the lockless list was empty. After an all-cpu
    kernel build, the ratio became 6295424/6340513, which is more than 99%.
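
As a quick check of the figures above, the share of calls that took the early exit can be computed directly. `fast_path_pct()` is a hypothetical helper written for this note, and it assumes the first number in each pair counts the immediate exits and the second counts all calls.

```c
#include <assert.h>

/* Percentage of flush calls that returned early on an empty list. */
static double fast_path_pct(unsigned long empty_exits,
                            unsigned long total_calls)
{
    assert(total_calls > 0 && empty_exits <= total_calls);
    return 100.0 * (double)empty_exits / (double)total_calls;
}
```

With the reported numbers, `fast_path_pct(1788, 2051)` comes to roughly 87.2 and `fast_path_pct(6295424, 6340513)` to roughly 99.3.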

    Signed-off-by: Waiman Long <longman@redhat.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/20221105005902.407297-3-longman@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk-cgroup.c 53.3 KB