    erofs: add per-cpu threads for decompression as an option · 3fffb589
    Sandeep Dhavale authored
    Using a per-cpu thread pool, we can reduce scheduling latency compared
    to the workqueue implementation. With this patch, scheduling latency and
    its variation are reduced because the per-cpu threads are high-priority
    kthread_workers.
    
    The results were evaluated on arm64 Android devices running a 5.10 kernel.
    
    The table below shows the resulting improvement in total scheduling
    latency for the same app launch benchmark, run for 50 iterations.
    Scheduling latency is the time between when the task (workqueue kworker
    vs kthread_worker) became eligible to run and when it actually started
    running.
    +-------------------------+-----------+----------------+---------+
    |                         | workqueue | kthread_worker |  diff   |
    +-------------------------+-----------+----------------+---------+
    | Average (us)            |     15253 |           2914 | -80.89% |
    | Median (us)             |     14001 |           2912 | -79.20% |
    | Minimum (us)            |      3117 |           1027 | -67.05% |
    | Maximum (us)            |     30170 |           3805 | -87.39% |
    | Standard deviation (us) |      7166 |            359 |         |
    +-------------------------+-----------+----------------+---------+
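    
    For reference, the diff column is consistent with a plain relative
    change against the workqueue baseline. The sketch below is illustrative
    userspace C, not part of the patch; the helper name pct_change() is
    hypothetical.

```c
#include <assert.h>

/*
 * Relative change from the workqueue baseline to the kthread_worker
 * value, in percent. Negative means the latency went down.
 * pct_change() is a hypothetical helper for illustration only.
 */
static double pct_change(double baseline_us, double updated_us)
{
    assert(baseline_us != 0.0); /* a zero baseline has no relative change */
    return (updated_us - baseline_us) / baseline_us * 100.0;
}
```

    For example, pct_change(15253, 2914) is about -80.9, which matches
    the Average row to within rounding. Applying the same formula to the
    standard deviation row (7166 -> 359) gives roughly -95%; that figure
    is derived here and is not stated in the table.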
    
    Background: Boot times and cold app launch benchmarks are very
    important to the Android ecosystem, as they translate directly to
    responsiveness from the user's point of view. While EROFS provides
    many important features such as space savings, we saw a performance
    penalty in cold app launch benchmarks in a few scenarios. Analysis
    showed that the significant variance came from the scheduling cost,
    while the decompression cost was more or less the same.
    
    With a per-cpu thread pool, the table above shows that average
    scheduling latency drops by ~80% and its standard deviation falls from
    7166 us to 359 us. This problem was discussed at LPC 2022; the slides
    and talk are linked at [1].
    
    [1] https://lpc.events/event/16/contributions/1338/
    
    [ Gao Xiang: At least, we have to add this until WQ_UNBOUND workqueue
                 issue [2] on many arm64 devices is resolved. ]
    [2] https://lore.kernel.org/r/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279LmiRxKt7aQYg@mail.gmail.com
    
    Signed-off-by: Sandeep Dhavale <dhavale@google.com>
    Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
    Link: https://lore.kernel.org/r/20230208093322.75816-1-hsiangkao@linux.alibaba.com