    [PATCH] Fix busy-wait with writeback to large queues · 5fa9d488
    Andrew Morton authored
    blk_congestion_wait() is a utility function which various callers use
    to throttle themselves to the rate at which the IO system can retire
    writes.
    
    The current implementation refuses to wait if no queues are "congested"
    (>75% of requests are in flight).
    
    That doesn't work if the queue is so huge that it can hold more than
    40% (dirty_ratio) of memory.  The queue simply cannot enter congestion
    because the VM refuses to allow more than 40% of memory to be dirtied.
    (This busy-wait could happen with a lot of normal-sized queues too.)
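
The arithmetic behind that claim can be sketched in userspace C. This is a rough model, not kernel code: the helper name and the page-based units are assumptions made for illustration, and only the 75% and 40% figures come from the text above.

```c
#include <stdbool.h>

/* Figures quoted in the commit message. */
#define SKETCH_CONGESTION_PCT  75   /* queue is "congested" above this fill level */
#define SKETCH_DIRTY_RATIO_PCT 40   /* VM caps dirty memory at this share of RAM */

/*
 * Hypothetical helper: can a queue with room for `queue_pages` pages of
 * write data ever cross the congestion threshold on a machine with
 * `mem_pages` pages of memory?  Assumes all dirty memory targets this
 * one queue, which is the most favourable case for reaching congestion.
 */
static bool sketch_queue_can_congest(unsigned long mem_pages,
                                     unsigned long queue_pages)
{
    /* The VM will not let more than this many pages be dirty at once. */
    unsigned long dirty_limit = mem_pages * SKETCH_DIRTY_RATIO_PCT / 100;

    /* Writes in flight are bounded by both the dirty limit and the queue. */
    unsigned long max_in_flight =
        dirty_limit < queue_pages ? dirty_limit : queue_pages;

    /* The queue counts as congested only above 75% of its capacity. */
    unsigned long threshold = queue_pages * SKETCH_CONGESTION_PCT / 100;

    return max_in_flight > threshold;
}
```

In this model a small queue fills past its threshold long before the dirty limit is reached, while a queue as large as memory never does, so a caller that only waits on congested queues keeps spinning.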
    
    So this patch simply changes blk_congestion_wait() to throttle even if
    there are no congested queues.  It will cause the caller to sleep until
    someone puts back a write request against any queue.  (Nobody uses
    blk_congestion_wait() for read congestion.)
    
    The patch adds new state to backing_dev_info->state: a couple of flags
    which indicate whether there are _any_ reads or writes in flight
    against that queue.  This was added to prevent blk_congestion_wait()
    from taking a nap when there are no writes at all in flight.
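
A minimal userspace sketch of that bookkeeping follows. The struct, the flag names and the helper functions are all hypothetical stand-ins chosen for illustration; the real patch keeps its flags in backing_dev_info->state and may name and update them differently.

```c
#include <stdbool.h>

/* Hypothetical bit numbers; the real flag names may differ. */
enum sketch_bdi_bit {
    SKETCH_BDI_READ_ACTIVE,    /* at least one read is in flight */
    SKETCH_BDI_WRITE_ACTIVE    /* at least one write is in flight */
};

/* Stand-in for the per-queue state the commit message describes. */
struct sketch_bdi {
    unsigned long state;
    unsigned int reads_in_flight;
    unsigned int writes_in_flight;
};

/* Recompute both "active" bits from the in-flight counters. */
static void sketch_update_state(struct sketch_bdi *bdi)
{
    bdi->state = 0;
    if (bdi->reads_in_flight)
        bdi->state |= 1UL << SKETCH_BDI_READ_ACTIVE;
    if (bdi->writes_in_flight)
        bdi->state |= 1UL << SKETCH_BDI_WRITE_ACTIVE;
}

/* A request is queued against this device. */
static void sketch_submit(struct sketch_bdi *bdi, bool is_write)
{
    if (is_write)
        bdi->writes_in_flight++;
    else
        bdi->reads_in_flight++;
    sketch_update_state(bdi);
}

/* A request is put back (completed). */
static void sketch_complete(struct sketch_bdi *bdi, bool is_write)
{
    if (is_write)
        bdi->writes_in_flight--;
    else
        bdi->reads_in_flight--;
    sketch_update_state(bdi);
}

/*
 * Sleeping only makes sense if a write is in flight: otherwise nothing
 * will ever put back a write request to wake the caller early.
 */
static bool sketch_should_wait(const struct sketch_bdi *bdi)
{
    return (bdi->state & (1UL << SKETCH_BDI_WRITE_ACTIVE)) != 0;
}
```

Reads alone never make the caller sleep in this sketch, which matches the stated purpose of the write flag; the read flag is tracked only so that it is available to other users.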
    
    But the "are there any reads" info could be used to defer background
    writeout from pdflush, to reduce read-vs-write competition.  We'll see.
    
    Large request queues have made a fundamental change:
    blocking in get_request_wait() has been the main form of VM throttling
    for years.  But with large queues it doesn't work any more - all
    throttling happens in blk_congestion_wait().
    
    Also, change io_schedule_timeout() to propagate the schedule_timeout()
    return value.  I was using that in some debug code, but it should have
    been like that from day one.
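
The io_schedule_timeout() change can be illustrated with a userspace model. Everything prefixed sketch_ is a hypothetical stand-in: the stub only mimics the documented schedule_timeout() contract of returning the jiffies left when the sleeper was woken, or 0 if the timeout expired, and the counter stands in for the iowait accounting that the real wrapper does.

```c
/* Stand-in for the per-runqueue iowait counter. */
static int sketch_nr_iowait;

/*
 * Stub for schedule_timeout(): pretend the task is woken after
 * `woken_after` jiffies and return how much of `timeout` was left.
 */
static long sketch_schedule_timeout(long timeout, long woken_after)
{
    if (woken_after >= timeout)
        return 0;                   /* the timer expired first */
    return timeout - woken_after;   /* woken early: jiffies remaining */
}

/*
 * Model of the wrapper after the change: account the sleep as I/O wait
 * and hand the remaining time back to the caller instead of dropping it.
 */
static long sketch_io_schedule_timeout(long timeout, long woken_after)
{
    long ret;

    sketch_nr_iowait++;
    ret = sketch_schedule_timeout(timeout, woken_after);
    sketch_nr_iowait--;
    return ret;
}
```

With the return value propagated, a caller can tell an early wake-up from a full timeout, which is the kind of information debug code would want.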