    arm64/module: Optimize module load time by optimizing PLT counting · d4e03409
    Saravana Kannan authored
    When loading a module, module_frob_arch_sections() tries to figure out
    the number of PLTs that'll be needed to handle all the RELAs. While
    doing this, it tries to dedupe PLT allocations for multiple
    R_AARCH64_CALL26 relocations to the same symbol. It does the same for
    R_AARCH64_JUMP26 relocations.
    
    To make checks for duplicates easier/faster, it sorts the relocation
    list by type, symbol and addend. That way, to check for a duplicate
    relocation, it just needs to compare with the previous entry.
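The sort-then-compare-with-previous idea described above can be sketched in plain C as follows. This is a minimal userspace illustration, not the kernel's code: `struct rela`, `cmp_rela()` and `count_unique_sorted()` are hypothetical names, and the struct is a simplified stand-in for `Elf64_Rela`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for Elf64_Rela (hypothetical layout for illustration). */
struct rela {
	uint32_t type;    /* relocation type, e.g. 283 for R_AARCH64_CALL26 */
	uint32_t sym;     /* symbol index */
	int64_t  addend;
};

/* Three-way compare that avoids overflow from subtraction. */
#define CMP(a, b) (((a) > (b)) - ((a) < (b)))

/* Order by type, then symbol, then addend. */
static int cmp_rela(const void *a, const void *b)
{
	const struct rela *x = a, *y = b;
	int d = CMP(x->type, y->type);

	if (d == 0)
		d = CMP(x->sym, y->sym);
	if (d == 0)
		d = CMP(x->addend, y->addend);
	return d;
}

/*
 * Sort the array in place, then count entries that differ from their
 * predecessor. After sorting, duplicates are adjacent, so one comparison
 * per entry is enough.
 */
static size_t count_unique_sorted(struct rela *r, size_t n)
{
	size_t i, unique = 0;

	assert(r != NULL || n == 0);
	if (n == 0)
		return 0;

	qsort(r, n, sizeof(*r), cmp_rela);
	for (i = 0; i < n; i++)
		if (i == 0 || cmp_rela(&r[i], &r[i - 1]) != 0)
			unique++;
	return unique;
}
```

Calling `count_unique_sorted()` on six entries that contain two repeated (type, symbol, addend) triples yields four, since each repeat collapses into its neighbour after the sort.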
    
    However, sorting the entire relocation array is unnecessary and
    expensive (O(n log n)) because there are a lot of other relocation types
    that don't need deduping or can't be deduped.
    
    So this commit partitions the array into entries that need deduping and
    those that don't, and then sorts only the part that needs deduping. When
    CONFIG_RANDOMIZE_BASE is disabled, the sorting is skipped entirely
    because PLTs are not allocated for R_AARCH64_CALL26 and R_AARCH64_JUMP26
    in that configuration.
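The partition-then-sort approach can be sketched roughly as below. This is an illustrative userspace sketch under stated assumptions, not the patch itself: `struct rela`, `needs_dedup()`, `partition_relas()` and `count_distinct_branches()` are hypothetical names, the `kaslr` flag stands in for CONFIG_RANDOMIZE_BASE, and the predicate looks only at the relocation type, whereas the kernel's real check may consider more than that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

enum { R_AARCH64_JUMP26 = 282, R_AARCH64_CALL26 = 283 };

/* Simplified stand-in for Elf64_Rela (hypothetical layout for illustration). */
struct rela {
	uint32_t type;
	uint32_t sym;
	int64_t  addend;
};

#define CMP(a, b) (((a) > (b)) - ((a) < (b)))

static int cmp_rela(const void *a, const void *b)
{
	const struct rela *x = a, *y = b;
	int d = CMP(x->type, y->type);

	if (d == 0)
		d = CMP(x->sym, y->sym);
	if (d == 0)
		d = CMP(x->addend, y->addend);
	return d;
}

/* Assumption: only the two branch relocation types are dedup candidates. */
static bool needs_dedup(const struct rela *r)
{
	return r->type == R_AARCH64_JUMP26 || r->type == R_AARCH64_CALL26;
}

/*
 * Move dedup candidates to the front in a single O(n) pass and return
 * how many there are. Invariant: [0, i) are candidates, [j, n) are not.
 */
static size_t partition_relas(struct rela *r, size_t n)
{
	size_t i = 0, j = n;

	assert(r != NULL || n == 0);
	while (i < j) {
		if (needs_dedup(&r[i])) {
			i++;
		} else {
			struct rela tmp = r[i];

			j--;
			r[i] = r[j];
			r[j] = tmp;
		}
	}
	return i;
}

/* Sort only the candidate prefix, then count distinct entries in it. */
static size_t count_distinct_branches(struct rela *r, size_t n, bool kaslr)
{
	size_t k, i, distinct = 0;

	if (!kaslr)	/* stand-in for !CONFIG_RANDOMIZE_BASE: skip the sort */
		return 0;

	k = partition_relas(r, n);
	if (k > 1)
		qsort(r, k, sizeof(*r), cmp_rela);
	for (i = 0; i < k; i++)
		if (i == 0 || cmp_rela(&r[i], &r[i - 1]) != 0)
			distinct++;
	return distinct;
}
```

The partition pass is linear, so the O(n log n) cost applies only to the k candidate entries rather than to all n relocations, which is where the savings in the tables below would come from.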
    
    This gives a significant reduction in module load time for modules with
    a large number of relocations, with no measurable impact on modules with
    a small number of relocations. In my test setup with CONFIG_RANDOMIZE_BASE
    enabled, these were the results for a few downstream modules:
    
    Module		Size (MB)
    wlan		14
    video codec	3.8
    drm		1.8
    IPA		2.5
    audio		1.2
    gpu		1.8
    
    Without this patch:
    Module		Number of entries sorted	Module load time (ms)
    wlan		243739				283
    video codec	74029				138
    drm		53837				67
    IPA		42800				90
    audio		21326				27
    gpu		20967				32
    
    Total time to load all these modules: 637 ms
    
    With this patch:
    Module		Number of entries sorted	Module load time (ms)
    wlan		22454				61
    video codec	10150				47
    drm		13014				40
    IPA		8097				63
    audio		4606				16
    gpu		6527				20
    
    Total time to load all these modules: 247 ms
    
    Time saved during boot for just these 6 modules: 390 ms

    Signed-off-by: Saravana Kannan <saravanak@google.com>
    Acked-by: Will Deacon <will@kernel.org>
    Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Link: https://lore.kernel.org/r/20200623011803.91232-1-saravanak@google.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>