    [PATCH] PPC64 Move thread_info flags to its own cache line (commit de10f9d4)
    Authored by Paul Mackerras
    This patch fixes a problem I have been seeing since all the preempt
    changes went in, which is that ppc64 SMP systems would livelock
    randomly if preempt was enabled.
    
    It turns out that what was happening was that one cpu was spinning in
    spin_lock_irq (the version at line 215 of kernel/spinlock.c) madly
    doing preempt_enable() and preempt_disable() calls.  The other cpu had
    the lock and was trying to set the TIF_NEED_RESCHED flag for the task
    running on the first cpu.  That is an atomic operation which has to be
    retried if another cpu writes to the same cacheline between the load
    and the store, which the other cpu was doing every time it did
    preempt_enable() or preempt_disable().
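    To make the failure mode concrete, here is a minimal user-space sketch of
    what an atomic read-modify-write like the TIF_NEED_RESCHED set boils down
    to.  The function name and the use of C11 atomics are mine, not the
    kernel's set_bit(); note too that the real ppc64 lwarx/stwcx. sequence is
    even touchier than this, since the reservation covers the whole cache line
    rather than just the word being updated.

    #include <stdatomic.h>

    /*
     * Hypothetical sketch, not the kernel's set_bit(): an atomic bit set
     * expressed as a load / compare-and-swap retry loop.
     */
    static void sketch_atomic_set_bit(_Atomic unsigned long *word, int bit)
    {
    	unsigned long old = atomic_load(word);

    	/*
    	 * Retry until the store succeeds.  On ppc64 the equivalent
    	 * lwarx/stwcx. loop loses its reservation whenever another cpu
    	 * stores anywhere in the same cache line -- e.g. that cpu's
    	 * preempt_count updates -- so if the other cpu writes on every
    	 * pass through its lock loop, this loop can retry forever:
    	 * the livelock described above.
    	 */
    	while (!atomic_compare_exchange_weak(word, &old, old | (1UL << bit)))
    		;
    }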
    
    I decided to move the thread_info flags field into the next cache
    line, since it is the only field that would regularly be modified by
    cpus other than the one running the task that owns the thread_info.
    (OK, possibly the `cpu' field would also be written on a rebalance; I
    don't know the rebalancing code well, but that should be pretty
    infrequent.)  Thus, moving the flags field seems like a good idea in
    general, as well as solving the immediate problem.
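    
    A sketch of what that layout change looks like.  The field list is
    abbreviated and the hard-coded 128-byte line size is an assumption for
    illustration (the kernel would normally spell the alignment with
    ____cacheline_aligned_in_smp); this is not the literal
    include/asm-ppc64/thread_info.h hunk.

    struct task_struct;
    struct exec_domain;

    struct thread_info {
    	struct task_struct	*task;		/* main task structure */
    	struct exec_domain	*exec_domain;	/* execution domain */
    	int			cpu;		/* cpu we are running on */
    	int			preempt_count;	/* bumped locally by every
    						   preempt_disable()/enable() */

    	/*
    	 * 'flags' is the one field that other cpus write regularly
    	 * (e.g. setting TIF_NEED_RESCHED), so push it onto its own
    	 * cache line, away from preempt_count and friends.
    	 */
    	unsigned long		flags __attribute__((aligned(128)));
    };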
    
    For the record I am pretty unhappy with the code we use for spin_lock
    et al. with preemption turned on (the BUILD_LOCK_OPS stuff in
    spinlock.c).  For a start we do the atomic op (_raw_spin_trylock) each
    time around the loop.  That is going to be generating a lot of
    unnecessary bus (or fabric) traffic.  Instead, after we fail to get
    the lock we should poll it with simple loads until we see that it is
    clear and then retry the atomic op.  Assuming a reasonable cache
    design, the loads won't generate any bus traffic until another cpu
    writes to the cacheline containing the lock.
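    
    What I have in mind is the classic test-and-test-and-set loop; below is a
    stand-alone sketch with made-up names (sketch_trylock stands in for
    _raw_spin_trylock), not a proposed BUILD_LOCK_OPS replacement.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Stand-ins for the kernel's spinlock type and trylock -- names are made up. */
    typedef struct { atomic_int locked; } sketch_spinlock_t;

    static bool sketch_trylock(sketch_spinlock_t *l)
    {
    	int expected = 0;
    	/* The one atomic operation, i.e. the part that generates bus traffic. */
    	return atomic_compare_exchange_strong(&l->locked, &expected, 1);
    }

    static void sketch_spin_lock(sketch_spinlock_t *l)
    {
    	while (!sketch_trylock(l)) {
    		/*
    		 * Failed to get the lock: poll with plain loads until it
    		 * looks free.  These loads are satisfied from our own
    		 * cache and produce no bus/fabric traffic until the
    		 * holder's unlock store invalidates the line; only then
    		 * do we go back and retry the atomic op.
    		 */
    		while (atomic_load_explicit(&l->locked, memory_order_relaxed))
    			;	/* the kernel would call cpu_relax() here */
    	}
    }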
    
    Secondly we have lost the __spin_yield call that we had on ppc64,
    which is an important optimization when we are running under the
    hypervisor.  I can't just put that in cpu_relax because I need to know
    which (virtual) cpu is holding the lock, so that I can tell the
    hypervisor which virtual cpu to give my time slice to.  That
    information is stored in the lock variable, which is why __spin_yield
    needs the address of the lock.
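    
    Roughly what the yield needs to do, as a sketch: the assumption that the
    lock word records the holder's virtual cpu number, and the
    vcpu_is_running()/hv_confer_to() helpers, are illustrative stand-ins
    built from the description above, not the actual ppc64 __spin_yield.

    #include <stdatomic.h>

    /* Assumed hypervisor helpers -- declarations only, the names are made up. */
    extern int  vcpu_is_running(int vcpu);
    extern void hv_confer_to(int vcpu);

    /*
     * The yield helper needs the lock's *address* so it can read out who
     * holds the lock and confer our time slice to that vcpu, which is
     * exactly why cpu_relax() alone is not enough.
     */
    static void sketch_spin_yield(_Atomic int *lock_word)
    {
    	int holder = atomic_load_explicit(lock_word, memory_order_relaxed);

    	if (holder == 0)
    		return;			/* lock is free, just retry the trylock */
    	if (vcpu_is_running(holder))
    		return;			/* holder is dispatched; plain spinning is fine */

    	hv_confer_to(holder);		/* donate our time slice to the lock holder */
    }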
    Signed-off-by: Paul Mackerras <paulus@samba.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>