    [PATCH] improve preemption on SMP
    Ingo Molnar authored
    SMP locking latencies are one of the last architectural problems that cause
    millisec-category scheduling delays.  CONFIG_PREEMPT tries to solve some of
    the SMP issues but there are still lots of problems remaining: spinlocks
    nested at multiple levels, spinning with irqs turned off, and non-nested
    spinning with preemption turned off permanently.
    
    The nesting problem goes like this: if a piece of kernel code (e.g.  the MM
    or ext3's journalling code) does the following:
    
    	spin_lock(&spinlock_1);
    	...
    	spin_lock(&spinlock_2);
    	...
    
    then even with CONFIG_PREEMPT enabled, current kernels may spin on
    spinlock_2 indefinitely.  A number of critical sections break up their
    long lock-hold paths by using cond_resched_lock(), but this does not
    break the path on SMP, because a CPU spinning for the lock does not set
    need_resched() on the CPU holding it, so cond_resched_lock() never
    notices that a break is due.
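    
    For reference, that pattern looks roughly like the following (an
    illustrative sketch - the function and the loop are made up, only
    cond_resched_lock() is the real interface):
    
    	#include <linux/spinlock.h>
    	#include <linux/sched.h>
    
    	/*
    	 * Sketch: a long critical section that tries to be preemption
    	 * friendly.  cond_resched_lock() drops and re-takes 'lock' when a
    	 * reschedule is pending - but on SMP (before this patch) a CPU
    	 * spinning for 'lock' does not set need_resched() here, so the
    	 * break never triggers.
    	 */
    	static void long_critical_section(spinlock_t *lock, int nr_items)
    	{
    		int i;
    
    		spin_lock(lock);
    		for (i = 0; i < nr_items; i++) {
    			/* ... work on item i under the lock ... */
    			cond_resched_lock(lock);
    		}
    		spin_unlock(lock);
    	}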
    
    To solve this problem I've introduced a new spinlock field,
    lock->break_lock, which signals to the holding CPU that a
    spinlock-break is requested by another CPU.  This field is only set
    while a CPU is spinning in a spinlock function [at any locking depth],
    so the default overhead is zero.  I've extended cond_resched_lock() to
    check this flag as well - in that case we can also save a reschedule.
    I've added the lock_need_resched(lock) and need_lockbreak(lock) methods
    to check for the need to break out of a critical section.
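    
    The checks can be pictured roughly like this (a hedged sketch of the
    idea, not the patch verbatim - the my_-prefixed names are illustrative
    stand-ins for the real need_lockbreak()/lock_need_resched() and the
    extended cond_resched_lock()):
    
    	#include <linux/spinlock.h>
    	#include <linux/sched.h>
    
    	/* break_lock is the new field; set by a CPU spinning for the lock. */
    	#define my_need_lockbreak(lock)		((lock)->break_lock)
    	#define my_lock_need_resched(lock)	\
    		(my_need_lockbreak(lock) || need_resched())
    
    	/* Illustrative stand-in for the extended cond_resched_lock(). */
    	static int my_cond_resched_lock(spinlock_t *lock)
    	{
    		if (my_lock_need_resched(lock)) {
    			spin_unlock(lock);
    			cpu_relax();	/* give the spinning CPU a window */
    			cond_resched();	/* schedules only if need_resched() */
    			spin_lock(lock);
    			return 1;
    		}
    		return 0;
    	}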
    
    Another latency problem was that the stock kernel, even with CONFIG_PREEMPT
    enabled, didn't have any spin-nicely preemption logic for the following,
    commonly used SMP locking primitives: read_lock(), spin_lock_irqsave(),
    spin_lock_irq(), spin_lock_bh(), read_lock_irqsave(), read_lock_irq(),
    read_lock_bh(), write_lock_irqsave(), write_lock_irq(), write_lock_bh().
    Only spin_lock() and write_lock() [the two simplest cases] were covered.
    
    In addition to the preemption latency problems, the _irq() variants in the
    above list didn't do any IRQ-enabling while spinning - possibly resulting in
    excessive irqs-off sections of code!
    
    preempt-smp.patch fixes all these latency problems by spinning irq-nicely
    (if possible) and by requesting lock-breaks if needed.  Two
    architecture-level changes were necessary for this: the addition of the
    break_lock field to spinlock_t and rwlock_t, and the addition of the
    _raw_read_trylock() function.
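    
    The shape of the fix, for the spin_lock_irqsave() case, can be sketched
    like this (illustrative only - my_spin_lock_irqsave() is a made-up
    stand-in, and returning the flags value is just a simplification of the
    real spin_lock_irqsave(lock, flags) macro interface):
    
    	#include <linux/spinlock.h>
    	#include <linux/preempt.h>
    
    	/*
    	 * Sketch of "spin nicely": attempt the raw trylock with interrupts
    	 * and preemption disabled; on contention, back off with interrupts
    	 * and preemption re-enabled, set break_lock to request a lock-break
    	 * from the holder, and busy-wait politely until the lock looks free.
    	 */
    	static unsigned long my_spin_lock_irqsave(spinlock_t *lock)
    	{
    		unsigned long flags;
    
    		for (;;) {
    			preempt_disable();
    			local_irq_save(flags);
    			if (_raw_spin_trylock(lock)) {
    				lock->break_lock = 0;	/* owned: clear the request */
    				return flags;		/* irqs + preemption stay off */
    			}
    
    			local_irq_restore(flags);	/* don't spin with irqs off... */
    			preempt_enable();		/* ...or with preemption off */
    			lock->break_lock = 1;		/* ask the holder to break out */
    			while (spin_is_locked(lock) && lock->break_lock)
    				cpu_relax();
    		}
    	}
    
    The same loop structure applies to the read_lock()/write_lock() variants,
    which is what the new _raw_read_trylock() is needed for.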
    
    Testing done by Mark H Johnson and myself indicates SMP latencies
    comparable to the UP kernel - while they were essentially unbounded
    without this patch.
    
    I successfully test-compiled and test-booted this patch on top of BK-curr
    using the following .config combinations: SMP && PREEMPT, !SMP && PREEMPT,
    SMP && !PREEMPT and !SMP && !PREEMPT on x86, and !SMP && !PREEMPT and
    SMP && PREEMPT on x64.  I also test-booted x86 with the generic_read_trylock
    function to check that it works fine.  Essentially the same patch has been
    in testing as part of the voluntary-preempt patches for some time already.
    
    NOTE to architecture maintainers: generic_raw_read_trylock() is a crude
    version that should be replaced with the proper arch-optimized version
    ASAP.
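    
    To illustrate what "crude" means here, a minimal generic fallback
    consistent with the description could look like this (a sketch, not
    necessarily the exact code in the patch):
    
    	#include <linux/spinlock.h>
    
    	/*
    	 * Crude fallback sketch: "try" the read lock by simply taking it,
    	 * so it always reports success.  Correct, but it blocks instead of
    	 * failing, so the lock-break back-off loop degenerates to plain
    	 * spinning for read locks on architectures that rely on it.
    	 */
    	static int generic_raw_read_trylock_sketch(rwlock_t *lock)
    	{
    		_raw_read_lock(lock);
    		return 1;
    	}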
    
    From: Hugh Dickins <hugh@veritas.com>
    
    The i386 and x86_64 _raw_read_trylocks in preempt-smp.patch are too
    successful: atomic_read() returns a signed integer.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Hugh Dickins <hugh@veritas.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>