    randomize_kstack: Improve stack alignment codegen · 872bb37f
    Kees Cook authored
    The codegen for adding architecture-specific stack alignment to the
    effective alloca() usage is somewhat inefficient and allows a bit to get
    carried beyond the desired entropy range. This isn't really a problem,
    but it's unexpected and the codegen is kind of bad.
    
    Quoting Mark[1], the disassembly for arm64's invoke_syscall() looks like:
    
    	// offset = raw_cpu_read(kstack_offset)
    	mov     x4, sp
    	adrp    x0, kstack_offset
    	mrs     x5, tpidr_el1
    	add     x0, x0, #:lo12:kstack_offset
    	ldr     w0, [x0, x5]
    
    	// offset = KSTACK_OFFSET_MAX(offset)
    	and     x0, x0, #0x3ff
    
    	// alloca(offset)
    	add     x0, x0, #0xf
    	and     x0, x0, #0x7f0
    	sub     sp, x4, x0
    
    ... which in C would be:
    
	offset = raw_cpu_read(kstack_offset);
	offset &= 0x3ff;			// [0x0, 0x3ff]
	offset += 0xf;				// [0xf, 0x40e]
	offset &= 0x7f0;			// [0x0, 0x400]
    
    ... so when *all* bits [3:0] are 0, they'll have no impact, and when
    *any* of bits [3:0] are 1 they'll trigger a carry into bit 4, which
    could ripple all the way up and spill into bit 10.
    
    Switch the masking in KSTACK_OFFSET_MAX() to explicitly clear the bottom
    four bits [3:0], using 0b1111110000 instead of 0b1111111111, so there is
    nothing left for the alloca() alignment to round up:
    
    	// offset = raw_cpu_read(kstack_offset)
    	mov     x4, sp
    	adrp    x0, 0 <kstack_offset>
    	mrs     x5, tpidr_el1
    	add     x0, x0, #:lo12:kstack_offset
    	ldr     w0, [x0, x5]
    
    	// offset = KSTACK_OFFSET_MAX(offset)
    	and     x0, x0, #0x3f0
    
    	// alloca(offset)
    	sub     sp, x4, x0

    Suggested-by: Mark Rutland <mark.rutland@arm.com>
    Link: https://lore.kernel.org/lkml/ZnVfOnIuFl2kNWkT@J2N7QTR9R3/ [1]
    Link: https://lore.kernel.org/r/20240702211612.work.576-kees@kernel.org
    Signed-off-by: Kees Cook <kees@kernel.org>