1. 13 Jan, 2005 7 commits
    • [PATCH] PPC64 Move thread_info flags to its own cache line · de10f9d4
      Paul Mackerras authored
      This patch fixes a problem I have been seeing since all the preempt
      changes went in, which is that ppc64 SMP systems would livelock
      randomly if preempt was enabled.
      
      It turns out that what was happening was that one cpu was spinning in
      spin_lock_irq (the version at line 215 of kernel/spinlock.c) madly
      doing preempt_enable() and preempt_disable() calls.  The other cpu had
      the lock and was trying to set the TIF_NEED_RESCHED flag for the task
      running on the first cpu.  That is an atomic operation which has to be
      retried if another cpu writes to the same cacheline between the load
      and the store, which the other cpu was doing every time it did
      preempt_enable() or preempt_disable().
      
      I decided to move the thread_info flags field into the next cache
      line, since it is the only field that would regularly be modified by
      cpus other than the one running the task that owns the thread_info.
      (OK possibly the `cpu' field would be on a rebalance; I don't know the
      rebalancing code, but that should be pretty infrequent.)  Thus, moving
      the flags field seems like a good idea generally as well as solving the
      immediate problem.
      
      For the record I am pretty unhappy with the code we use for spin_lock
      et al. with preemption turned on (the BUILD_LOCK_OPS stuff in
      spinlock.c).  For a start we do the atomic op (_raw_spin_trylock) each
      time around the loop.  That is going to be generating a lot of
      unnecessary bus (or fabric) traffic.  Instead, after we fail to get
      the lock we should poll it with simple loads until we see that it is
      clear and then retry the atomic op.  Assuming a reasonable cache
      design, the loads won't generate any bus traffic until another cpu
      writes to the cacheline containing the lock.
      
      Secondly we have lost the __spin_yield call that we had on ppc64,
      which is an important optimization when we are running under the
      hypervisor.  I can't just put that in cpu_relax because I need to know
      which (virtual) cpu is holding the lock, so that I can tell the
      hypervisor which virtual cpu to give my time slice to.  That
      information is stored in the lock variable, which is why __spin_yield
      needs the address of the lock.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
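The two ideas in the commit message above (keeping the remotely written flags word on its own cache line, and polling a contended lock with plain loads before retrying the atomic op) can be sketched in user-space C11. This is only an illustration under stated assumptions, not the kernel's code: `demo_thread_info`, `demo_lock`, the `demo_*` functions and the 128-byte `CACHE_LINE` value are all hypothetical names and values chosen for the example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINE 128 /* assumed line size, hypothetical constant */

/* Hypothetical stand-in for thread_info. Fields written by the owning cpu
 * stay in the first line; the flags word that other cpus set atomically is
 * pushed onto the next line so the two kinds of writes do not collide. */
struct demo_thread_info {
    int preempt_count;                      /* written by the owning cpu */
    int cpu;
    alignas(CACHE_LINE) atomic_ulong flags; /* written by other cpus */
};

static_assert(offsetof(struct demo_thread_info, flags) % CACHE_LINE == 0,
              "flags must start a cache line of its own");

/* Hypothetical lock word: 0 means free, 1 means held. */
struct demo_lock {
    atomic_int locked;
};

/* One attempt at the atomic operation; returns true if we took the lock. */
static bool demo_trylock(struct demo_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->locked, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Test-and-test-and-set: after a failed atomic attempt, spin on plain
 * loads and only retry the atomic op once the lock looks free. */
static void demo_lock_acquire(struct demo_lock *l)
{
    while (!demo_trylock(l)) {
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            ; /* read-only polling, no atomic read-modify-write here */
    }
}

static void demo_unlock(struct demo_lock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The sketch leaves out what the message says is still missing on ppc64: a yield to the hypervisor that names the virtual cpu holding the lock. That would need the holder's identity stored in the lock word, which this example does not model.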
    • [PATCH] PPC64 Add PREEMPT_BKL option · c08c4dd5
      Paul Mackerras authored
      This patch adds the PREEMPT_BKL config option for PPC64, shamelessly
      stolen from the i386 version.  I have this turned on in the kernel on
      my desktop G5 and it seems to be just fine.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] PPC64 can do preempt debug too · 44462c19
      Paul Mackerras authored
      This patch enables the DEBUG_PREEMPT config option for PPC64.  I have
      this turned on on my desktop G5 and it isn't finding any problems.
      (It did find one problem, in flush_tlb_pending(), that I have just
      sent a patch for.)
      
      BTW, do we really need to restrict which architectures the config
      option is available on?
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] PPC64 Call preempt_schedule on exception exit · f4d0d3c5
      Paul Mackerras authored
      This patch mirrors the recent changes on x86 to call preempt_schedule
      rather than schedule in the exception exit path, in the case where the
      preempt_count is zero and the TIF_NEED_RESCHED bit is set.
      
      I'm a little concerned that this means that we have a window where
      interrupts are enabled and we are on our way into preempt_schedule,
      but preempt_count is still zero.  Ingo's proposed preempt_schedule_irq
      would fix this, and I think something like that should go in.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] PPC64 Disable preemption in flush_tlb_pending · ec72859c
      Paul Mackerras authored
      The preempt debug stuff found a place where we were using
      smp_processor_id() without having preemption disabled, in
      flush_tlb_pending.  This patch fixes it by using get_cpu_var and
      put_cpu_var instead of the __get_cpu_var variant.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] possible rq starvation on oom · 80ce63d3
      Jens Axboe authored
      I stumbled across this the other day. The block layer only uses a single
      memory pool for request allocation, so it's very possible for eg writes
      to have allocated them all at any point in time. If that is the case and
      the machine is low on memory, a reader attempting to allocate a request
      and failing in blk_alloc_request() can get stuck for a long time since
      no one is there to wake it up.
      
      The solution is either to add the extra mempool so both reads and writes
      have one, or attempt to handle the situation. I chose the latter, to
      save the extra memory required for the additional mempool with
      BLKDEV_MIN_RQ statically allocated requests per-queue.
      
      If a read allocation fails and we have no readers in flight for this
      queue, mark us rq-starved so that the next write being freed will wake
      up the sleeping reader(s). Same situation would happen for writes as
      well of course, it's just a lot more unlikely.
      Signed-off-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
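The starvation handling described in the commit above can be modelled with a small single-threaded sketch. It is not the block layer's implementation: `demo_queue`, `demo_alloc`, `demo_free` and the `wakeups` counter (standing in for waking a sleeping task) are hypothetical names invented for illustration, and the real code in blk_alloc_request() may differ in its details.

```c
#include <assert.h>
#include <stdbool.h>

enum { DEMO_READ = 0, DEMO_WRITE = 1 };

/* Hypothetical model of one request queue with a single shared pool. */
struct demo_queue {
    int free_rqs;      /* requests left in the shared pool */
    int in_flight[2];  /* allocated requests per direction */
    bool starved[2];   /* direction failed to allocate with none in flight */
    int wakeups[2];    /* wakeups delivered to starved sleepers */
};

/* Try to allocate a request for direction rw. On failure with nothing of
 * our own in flight, nobody would ever wake us, so mark the direction as
 * starved. Returns false when the caller would have to sleep. */
static bool demo_alloc(struct demo_queue *q, int rw)
{
    if (q->free_rqs == 0) {
        if (q->in_flight[rw] == 0)
            q->starved[rw] = true;
        return false;
    }
    q->free_rqs--;
    q->in_flight[rw]++;
    return true;
}

/* Free a request of direction rw. If the opposite direction is marked
 * starved, clear the mark and wake its sleepers. */
static void demo_free(struct demo_queue *q, int rw)
{
    int other = !rw;

    assert(q->in_flight[rw] > 0);
    q->in_flight[rw]--;
    q->free_rqs++;
    if (q->starved[other]) {
        q->starved[other] = false;
        q->wakeups[other]++;
    }
}
```

In this model a reader that fails while writes hold every request is woken by the next write that is freed, and the same path covers the less likely case of a starved writer.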
    • [PATCH] Don't enable ata over eth by default · 6ddb58de
      Jens Axboe authored
      "ATA over Ethernet support" should not default to 'm', it doesn't make
      any sense for a special case driver to do so.
      Signed-off-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 12 Jan, 2005 33 commits