    ARM: move heavy barrier support out of line · f8130906
    Russell King authored
    The existing memory barrier macro causes a significant amount of code
    to be inserted inline at every call site.  For example, in
    gpio_set_irq_type(), we have this for mb():
    
    c0344c08:       f57ff04e        dsb     st
    c0344c0c:       e59f8190        ldr     r8, [pc, #400]  ; c0344da4 <gpio_set_irq_type+0x230>
    c0344c10:       e3590004        cmp     r9, #4
    c0344c14:       e5983014        ldr     r3, [r8, #20]
    c0344c18:       0a000054        beq     c0344d70 <gpio_set_irq_type+0x1fc>
    c0344c1c:       e3530000        cmp     r3, #0
    c0344c20:       0a000004        beq     c0344c38 <gpio_set_irq_type+0xc4>
    c0344c24:       e50b2030        str     r2, [fp, #-48]  ; 0xffffffd0
    c0344c28:       e50bc034        str     ip, [fp, #-52]  ; 0xffffffcc
    c0344c2c:       e12fff33        blx     r3
    c0344c30:       e51bc034        ldr     ip, [fp, #-52]  ; 0xffffffcc
    c0344c34:       e51b2030        ldr     r2, [fp, #-48]  ; 0xffffffd0
    c0344c38:       e5963004        ldr     r3, [r6, #4]
    
    Moving the outer_cache_sync() call out of line reduces the impact of
    the barrier:
    
    c0344968:       f57ff04e        dsb     st
    c034496c:       e35a0004        cmp     sl, #4
    c0344970:       e50b2030        str     r2, [fp, #-48]  ; 0xffffffd0
    c0344974:       0a000044        beq     c0344a8c <gpio_set_irq_type+0x1b8>
    c0344978:       ebf363dd        bl      c001d8f4 <arm_heavy_mb>
    c034497c:       e5953004        ldr     r3, [r5, #4]
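
The change in shape between the two listings can be sketched in plain user-space C. This is an illustration only, not the kernel source: the simplified `outer_cache` table, `dsb_st()` (a C11 fence standing in for the `dsb st` instruction), `mb_inline()`, `mb_outofline()` and `fake_sync()` are all names assumed for the sketch. Only `arm_heavy_mb` and `outer_cache_sync()` appear in the text above.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Simplified stand-in for the outer cache operations table (assumed). */
struct outer_cache_fns {
	void (*sync)(void);
};

struct outer_cache_fns outer_cache;

/* Stand-in for "dsb st": a C11 fence, used here only as a placeholder. */
static inline void dsb_st(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Before: the pointer load, the NULL test and the indirect call are
 * all expanded at every call site of the barrier.
 */
static inline void mb_inline(void)
{
	dsb_st();
	if (outer_cache.sync)
		outer_cache.sync();
}

/* After: the heavy part exists once, out of line. */
__attribute__((noinline)) void arm_heavy_mb(void)
{
	if (outer_cache.sync)
		outer_cache.sync();
}

/* Each call site now carries only the barrier and a single call. */
static inline void mb_outofline(void)
{
	dsb_st();
	arm_heavy_mb();
}

/* Counting hook so the behaviour of both variants can be compared. */
int sync_calls;

void fake_sync(void)
{
	sync_calls++;
}
```

Both variants do the same work when `outer_cache.sync` is set; the difference is only in how much code each call site carries.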
    
    This should reduce the cache footprint of this code.  Overall, this
    results in a reduction of around 20K in the kernel size:
    
        text    data      bss      dec     hex filename
    10773970  667392 10369656 21811018 14ccf4a ../build/imx6/vmlinux-old
    10754219  667392 10369656 21791267 14c8223 ../build/imx6/vmlinux-new
    
    Another advantage of this approach is that we can finally resolve the
    issue of SoCs which have their own memory barrier requirements within
    multiplatform kernels (such as OMAP).  Here, the bus interconnects
    need additional handling to ensure that writes become visible in the
    correct order (e.g., between dma_map() operations, writes to DMA
    coherent memory, and MMIO accesses).
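
One way an out-of-line barrier can accommodate SoC-specific requirements in a multiplatform kernel is a hook that platform code fills in at boot. The following is a hedged user-space sketch of that idea, not the code from this commit: `outer_sync`, `soc_mb`, `example_interconnect_drain()` and `example_soc_init()` are hypothetical names, and the order in which the two hooks run is a choice made for this sketch.

```c
#include <stddef.h>

/* Both hooks are stand-ins; NULL means "nothing extra to do". */
void (*outer_sync)(void);	/* outer cache sync, if an outer cache exists */
void (*soc_mb)(void);		/* hypothetical per-SoC hook, set once at boot */

/*
 * Single out-of-line barrier body.  Platforms without special
 * requirements pay only for two NULL tests.
 */
void arm_heavy_mb(void)
{
	if (outer_sync)
		outer_sync();
	if (soc_mb)
		soc_mb();
}

/* Example platform code (hypothetical): counts calls instead of
 * touching any interconnect. */
int drain_calls;

static void example_interconnect_drain(void)
{
	drain_calls++;
}

/* Called once by the (hypothetical) platform during early init. */
void example_soc_init(void)
{
	soc_mb = example_interconnect_drain;
}
```

Because the hook is a run-time pointer rather than a compile-time macro, one kernel image can boot on SoCs with and without the extra requirement.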
    
    Acked-by: Tony Lindgren <tony@atomide.com>
    Acked-by: Richard Woodruff <r-woodruff2@ti.com>
    Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>