1. 29 Sep, 2010 3 commits
    • xfs: force background CIL push under sustained load · 80168676
      Dave Chinner authored
      I have been seeing occasional pauses in transaction throughput up to
      30s long under heavy parallel workloads. The only notable thing was
      that the xfsaild was trying to be active during the pauses, but
      making no progress. It was running exactly 20 times a second (on the
      50ms no-progress backoff), and the number of pushbuf events was
      constant across this time as well.  IOWs, the xfsaild appeared to be
      stuck on buffers that it could not push out.
      
      Further investigation indicated that it was trying to push out inode
      buffers that were pinned and/or locked. The xfsbufd was also getting
      woken at the same frequency (by the xfsaild, no doubt) to push out
      delayed write buffers. The xfsbufd was not making any progress
      because all the buffers in the delwri queue were pinned. This
      scan-and-make-no-progress dance went on in the trace for some
      seconds, before the xfssyncd came along and issued a log force, and
      then things started going again.
      
      However, I noticed something strange about the log force - there
      were way too many IOs issued. 516 log buffers were written, to be
      exact. That added up to 129MB of log IO, which got me very
      interested because it's almost exactly 25% of the size of the log.
      The delayed logging code is supposed to aggregate the minimum of 25%
      of the log or 8MB worth of changes before flushing. That's what
      really puzzled me - why did a log force write 129MB instead of only
      8MB?
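
The figures quoted above can be checked with a little arithmetic. The following is a minimal sketch, not code from the patch: the helper names are hypothetical, "MB" is assumed to mean MiB, and the 512 MiB log size used in the usage note is an assumption (the message only says that 129MB is almost exactly 25% of the log).

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Average size of one log write, given the total bytes and buffer count. */
static uint64_t avg_log_buffer_bytes(uint64_t total_bytes, uint64_t nr_buffers)
{
    assert(nr_buffers > 0);
    return total_bytes / nr_buffers;
}

/* Push threshold as the message states it: min(25% of the log, 8MB). */
static uint64_t stated_cil_threshold(uint64_t log_bytes)
{
    uint64_t quarter = log_bytes / 4;
    uint64_t cap = 8 * MIB;

    return quarter < cap ? quarter : cap;
}
```

Calling `avg_log_buffer_bytes(129 * MIB, 516)` gives 256KiB per log buffer, and `stated_cil_threshold(512 * MIB)` gives 8MiB for the assumed log size, so the observed 129MB flush was roughly sixteen times larger than the intended aggregation limit.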
      
      Essentially what has happened is that no CIL pushes had occurred
      since the previous tail push which cleared out 25% of the log space.
      That caused all the new transactions to block because there wasn't
      log space for them, but they kick the xfsaild to push the tail.
      However, the xfsaild was not making progress because there were
      buffers it could not lock and flush, and the xfsbufd could not flush
      them because they were pinned. As a result, both the xfsaild and the
      xfsbufd could not move the tail of the log forward without the CIL
      first committing.
      
      The cause of the problem was that the background CIL push, which
      should happen when 8MB of aggregated changes have been committed, is
      being held off by the concurrent transaction commit load. The
      background push does a down_write_trylock() which will fail if there
      is a concurrent transaction commit holding the push lock in read
      mode. With 8 CPUs all doing transactions as fast as they can, there
      were enough concurrent transaction commits to hold off the background
      push until tail-pushing could no longer free log space, and the halt
      would occur.
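
The hold-off described above can be illustrated with a toy model. This is a sketch under stated assumptions, not kernel code: `struct toy_rwsem` and its helpers are hypothetical stand-ins that track only a reader count, and the "one CPU finishes, the push tries once, that CPU starts its next commit" schedule is an assumed simplification of sustained load.

```c
#include <assert.h>

/* Toy stand-in for the push lock: only the reader count matters here. */
struct toy_rwsem {
    int readers;
};

static void toy_down_read(struct toy_rwsem *s)
{
    s->readers++;
}

static void toy_up_read(struct toy_rwsem *s)
{
    assert(s->readers > 0);
    s->readers--;
}

/* Trylock semantics: succeed only if nobody holds the lock in read mode. */
static int toy_down_write_trylock(const struct toy_rwsem *s)
{
    return s->readers == 0;
}

/* Count how many background push attempts get the lock over `rounds`. */
static int count_successful_pushes(int nr_cpus, int rounds)
{
    struct toy_rwsem sem = { 0 };
    int pushes = 0;

    assert(nr_cpus >= 1);

    /* Every CPU starts out inside a transaction commit. */
    for (int c = 0; c < nr_cpus; c++)
        toy_down_read(&sem);

    for (int r = 0; r < rounds; r++) {
        toy_up_read(&sem);                 /* one CPU finishes a commit */
        if (toy_down_write_trylock(&sem))  /* background push tries once */
            pushes++;
        toy_down_read(&sem);               /* that CPU starts the next commit */
    }

    for (int c = 0; c < nr_cpus; c++)
        toy_up_read(&sem);

    return pushes;
}
```

In this model a single committer leaves a gap on every round, so every push attempt succeeds, while with 8 committers at least seven readers are always present and no attempt ever gets the lock.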
      
      It should be noted that there is no reason why it would halt at 25%
      of log space used by a single CIL checkpoint. This bug could
      definitely violate the "no transaction should be larger than half
      the log" requirement and hence result in corruption if the system
      crashed under heavy load. This sort of bug is exactly the reason why
      delayed logging was tagged as experimental....
      
      The fix is to start blocking background pushes once the threshold
      has been exceeded. Rework the threshold calculations to keep the
      amount of log space a CIL checkpoint can use below the AIL push
      threshold, which avoids the problem completely.
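
The effect of switching to a blocking push can be sketched with a small simulation. This is an illustration only, not the patch itself: the policy names and `peak_cil_bytes()` are hypothetical, and the model assumes the worst case in which a trylock-based push never gets the lock under sustained load, while a blocking push always completes and empties the CIL.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

enum push_policy {
    PUSH_TRYLOCK_ONLY,          /* assumed to always lose to committers */
    PUSH_BLOCK_OVER_THRESHOLD   /* wait for the lock once over threshold */
};

/* Largest CIL size seen while `nr_commits` commits of equal size arrive. */
static uint64_t peak_cil_bytes(enum push_policy policy, uint64_t threshold,
                               uint64_t commit_bytes, unsigned nr_commits)
{
    uint64_t cil = 0, peak = 0;

    assert(commit_bytes > 0);

    for (unsigned i = 0; i < nr_commits; i++) {
        cil += commit_bytes;
        if (cil > peak)
            peak = cil;
        if (cil <= threshold)
            continue;           /* no background push wanted yet */
        if (policy == PUSH_BLOCK_OVER_THRESHOLD)
            cil = 0;            /* blocking push flushes the checkpoint */
        /* PUSH_TRYLOCK_ONLY: trylock fails, the CIL keeps growing */
    }
    return peak;
}
```

With an 8MiB threshold and 1MiB commits, the trylock-only policy lets the CIL grow to the full 129MiB after 129 commits in this model, whereas the blocking policy caps it at 9MiB, one commit past the threshold.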
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • Linux 2.6.36-rc6 · 899611ee
      Linus Torvalds authored
    • MN10300: Handle missing sys_cacheflush() when caching disabled · 62bdb288
      David Howells authored
      When caching is disabled on the MN10300 arch, the sys_cacheflush()
      function is removed by conditional stuff in the makefiles, but is still
      referred to by the syscall table.
      
      Provide a null version that just returns 0 when caching is disabled (or
      -EINVAL if the arguments are silly).
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 28 Sep, 2010 9 commits
  3. 27 Sep, 2010 28 commits