    Merge patch series "riscv: Reduce ARCH_KMALLOC_MINALIGN to 8" · 52b77c28
    Palmer Dabbelt authored
    Jisheng Zhang <jszhang@kernel.org> says:
    
    Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, i.e.
    64 bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support a unified
    kernel Image, we usually have to enable CONFIG_RISCV_DMA_NONCOHERENT,
    which has two drawbacks on coherent platforms:
    
    Firstly, it wastes memory: the kmalloc-96, kmalloc-32, kmalloc-16 and
    kmalloc-8 slab caches no longer exist; they are replaced by either
    kmalloc-128 or kmalloc-64.
    
    Secondly, kmalloc allocations aligned more strictly than necessary
    result in unnecessary cache/TLB pressure.
    
    This issue also exists on arm64 platforms. Since last year, Catalin
    has been working on it by decoupling ARCH_KMALLOC_MINALIGN from
    ARCH_DMA_MINALIGN, limiting the kmalloc() minimum alignment to
    dma_get_cache_alignment() and replacing ARCH_KMALLOC_MINALIGN usage
    in various drivers with ARCH_DMA_MINALIGN etc. [1]
    
    One fact we can make use of for riscv: if the CPU doesn't support
    ZICBOM or T-HEAD CMO, we know the platform is coherent. Based on
    Catalin's work and the above fact, we can easily solve the kmalloc
    alignment issue for riscv: we override dma_get_cache_alignment() so
    that it returns ARCH_DMA_MINALIGN at the beginning and returns 1 once
    we know the underlying HW supports neither ZICBOM nor T-HEAD CMO.
    
    What if the CPU supports ZICBOM or T-HEAD CMO, but all the devices
    are DMA coherent? In that case we still use ARCH_DMA_MINALIGN as the
    kmalloc minimum alignment, so nothing changes. This case can be
    improved in the future once we see such platforms in mainline.
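
    The override described above can be modelled in userspace roughly as
    follows. This is only a sketch: the value 64 for ARCH_DMA_MINALIGN is
    taken from the text, while set_dma_cache_alignment() and its two
    boolean parameters are hypothetical names used for illustration, not
    the kernel's actual interface.

```c
#include <stdbool.h>

/* Assumption: L1_CACHE_BYTES is 64, as stated in the text above. */
#define ARCH_DMA_MINALIGN 64

/*
 * Start conservative: until the CPU features have been probed, assume
 * the platform may be non-coherent and keep the large alignment.
 */
static int dma_cache_alignment = ARCH_DMA_MINALIGN;

/*
 * Hypothetical probe hook, called once CPU feature detection is done.
 * Only when neither cache-management mechanism is present do we know
 * the platform is coherent and can drop the alignment to 1.
 */
static void set_dma_cache_alignment(bool has_zicbom, bool has_thead_cmo)
{
	if (!has_zicbom && !has_thead_cmo)
		dma_cache_alignment = 1;
}

/* Userspace stand-in for the overridden dma_get_cache_alignment(). */
static int dma_get_cache_alignment(void)
{
	return dma_cache_alignment;
}
```

    A CPU with ZICBOM or T-HEAD CMO leaves the value at ARCH_DMA_MINALIGN,
    which matches the "nothing changes" case for CMO-capable CPUs whose
    devices happen to be DMA coherent.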
    
    After this patch, a simple test of booting to a small buildroot rootfs
    on qemu shows:
    
    kmalloc-96           5041    5041     96  ...
    kmalloc-64           9606    9606     64  ...
    kmalloc-32           5128    5128     32  ...
    kmalloc-16           7682    7682     16  ...
    kmalloc-8           10246   10246      8  ...
    
    So we save about 1268KB of memory. The saving will be much larger in
    a normal OS environment on real HW platforms.
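
    The 1268KB figure can be reproduced from the object counts listed
    above. The sketch below assumes that, with a 64-byte minimum
    alignment, every object up to 64 bytes would sit in kmalloc-64 and
    every kmalloc-96 object in kmalloc-128; the struct and function names
    are made up for illustration.

```c
#include <stddef.h>

/* One row of the listing above: object size and number of objects. */
struct cache_count {
	size_t obj_size;
	size_t objects;
};

/*
 * Size class an object would land in with a 64-byte minimum kmalloc
 * alignment (assumption: sizes <= 64 share kmalloc-64, 96 moves up to
 * kmalloc-128, anything else is unchanged).
 */
static size_t size_class_min64(size_t size)
{
	if (size <= 64)
		return 64;
	if (size <= 128)
		return 128;
	return size;
}

/* Bytes saved by storing each object in its exact-size cache instead. */
static size_t saved_bytes(const struct cache_count *c, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += c[i].objects *
			 (size_class_min64(c[i].obj_size) - c[i].obj_size);
	return total;
}

/* Counts from the qemu boot test quoted above. */
static const struct cache_count qemu_boot[] = {
	{ 96,  5041 },
	{ 64,  9606 },	/* unchanged, contributes 0 */
	{ 32,  5128 },
	{ 16,  7682 },
	{  8, 10246 },
};
```

    Calling saved_bytes(qemu_boot, 5) gives 1267920 bytes, i.e. about
    1268KB when counted in units of 1000 bytes.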
    
    patch1 allows kmalloc() caches aligned to the smallest value.
    patch2 enables DMA_BOUNCE_UNALIGNED_KMALLOC.
    
    After this series:
    
    On coherent platforms, i.e. !ZICBOM and !THEAD_CMO, the
    kmalloc-{8,16,32,96} caches come back on both RV32 and RV64.

    On noncoherent RV32 platforms, nothing changes.
    
    On noncoherent RV64 platforms, i.e. either ZICBOM or THEAD_CMO, the
    above kmalloc caches also come back if there is > 4GB of memory, or
    if users pass "swiotlb=mmnn,force" to force swiotlb creation with
    <= 4GB of memory. The right value of mmnn depends on the specific
    platform; it needs to be tried and tested against all possible use
    cases on the specific hardware. For example, I can use the minimal
    I/O TLB slabs on Sipeed M1S Dock.
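
    The three cases above can be summarised as a small decision function.
    This is a sketch of the outcome described in this cover letter only;
    struct platform and its fields are hypothetical and do not correspond
    to any kernel data structure.

```c
#include <stdbool.h>

/* Hypothetical description of a platform, for illustration only. */
struct platform {
	bool is_rv64;		/* false means RV32 */
	bool has_cmo;		/* CPU supports ZICBOM or T-HEAD CMO */
	bool swiotlb_active;	/* > 4GB memory, or "swiotlb=mmnn,force" */
};

/* Do the kmalloc-{8,16,32,96} caches come back after this series? */
static bool small_kmalloc_caches(const struct platform *p)
{
	if (!p->has_cmo)
		return true;	/* coherent: RV32 and RV64 alike */
	if (!p->is_rv64)
		return false;	/* noncoherent RV32: nothing changes */
	return p->swiotlb_active; /* noncoherent RV64: needs swiotlb */
}
```

    Note that swiotlb_active folds both conditions from the text (more
    than 4GB of memory, or swiotlb forced on the command line) into one
    flag.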
    
    * b4-shazam-merge:
      riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent
      riscv: allow kmalloc() caches aligned to the smallest value
    
    Link: https://lore.kernel.org/linux-arm-kernel/20230524171904.3967031-1-catalin.marinas@arm.com/ [1]
    Link: https://lore.kernel.org/r/20230718152214.2907-1-jszhang@kernel.org
    
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>