    [PATCH] fs-writeback rework. · 20b96b52
    Andrew Morton authored
    I've revisited all the superblock->inode->page writeback paths.  There
    were several silly things in there, and things were not as clear as they
    could be.
    
    scenario 1: create and dirty a MAP_SHARED segment over a sparse file,
    then exit.
    
      All the memory turns into dirty pagecache, but the kupdate function
      only writes it out at a trickle - 4 megabytes every thirty seconds.
      We should sync it all within 30 seconds.
    
      What's happening is that when writeback tries to write those pages,
      the filesystem needs to instantiate new blocks for them (they're over
      holes).  The filesystem runs mark_inode_dirty() within the writeback
      function.
    
      This redirtying of the inode while we're writing it out triggers
      some livelock avoidance code in __sync_single_inode().  That function
      says "ah, someone redirtied the file while I was writing it.  Let's
      move the file to the new end of the superblock dirty list and write
      it out later." Problem is, writeback dirtied the inode itself.
    
      (It is rather silly that mark_inode_dirty() sets I_DIRTY_PAGES when
      clearly no pages have been dirtied.  Fixing that properly would be a
      largish job, so it is worked around here.)
    
      So this patch just removes the livelock avoidance from
      __sync_single_inode().  It is no longer needed anyway - writeback
      livelock is now avoided (in all writeback paths) by writing a finite
      number of pages.
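
      A minimal userspace sketch of that idea, not the kernel code: the
      names toy_file, toy_writeback and redirty_rate are hypothetical and
      only illustrate why a finite page budget bounds each writeback pass
      even when the file is redirtied as fast as it is written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a file with dirty pagecache (not struct inode). */
struct toy_file {
    long dirty_pages;   /* pages currently dirty */
    long redirty_rate;  /* pages a simulated writer redirties per page written */
};

/*
 * Write at most `budget` pages and then return, regardless of how many
 * pages are still dirty.  Returns the number of pages written.
 */
long toy_writeback(struct toy_file *f, long budget)
{
    long written = 0;

    assert(budget >= 0);
    while (f->dirty_pages > 0 && written < budget) {
        f->dirty_pages--;                   /* "write" one page */
        written++;
        f->dirty_pages += f->redirty_rate;  /* application dirties more */
    }
    return written;
}
```

      With redirty_rate set to 1 the file never becomes clean, yet a call
      with a budget of 4 returns after exactly 4 pages, so the pass cannot
      livelock.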
    
    scenario 2: an application is continuously dirtying a 200 megabyte
    file, and your disk has a bandwidth of less than 40 megabytes/sec.
    
      What happens is that once thirty seconds have passed, pdflush starts
      writing out the file.  And because that writeout takes more than five
      seconds (a `kupdate' interval), pdflush just keeps writing it out
      forever - continuous I/O.
    
      What we _want_ to happen is that the 200 megabytes gets written,
      and then IO stops for thirty seconds (minus the writeout period).  So
      the file is fully synced every thirty seconds.
    
    The patch solves this by using mapping->io_pages more intelligently.
    When the time comes to write the file out, move all the dirty pages
    onto io_pages.  That is a "batch of pages for this kupdate round".
    When io_pages is empty, we know we're done.
    
    The address_space_operations.writepages() API is changed!  It now only
    needs to write the pages which the caller placed on mapping->io_pages.
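
    The batching described above can be sketched in userspace as follows.
    This is an illustration under assumptions, not the patch itself: the
    toy_* types and functions are hypothetical, the lists are plain singly
    linked lists rather than the kernel's list_head, and the sketch assumes
    io_pages is empty when a round starts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page and mapping types; only the list names follow the text. */
struct toy_page {
    int index;
    struct toy_page *next;
};

struct toy_mapping {
    struct toy_page *dirty_pages;  /* dirtied, not yet picked for writeout */
    struct toy_page *io_pages;     /* the batch for the current round */
    struct toy_page *clean_pages;  /* written out */
};

static void toy_push(struct toy_page **list, struct toy_page *p)
{
    p->next = *list;
    *list = p;
}

/* An application dirties a page: it always lands on dirty_pages. */
void toy_dirty_page(struct toy_mapping *m, struct toy_page *p)
{
    toy_push(&m->dirty_pages, p);
}

/* Start a round: everything dirty right now becomes this round's batch. */
void toy_start_round(struct toy_mapping *m)
{
    assert(m->io_pages == NULL);   /* simplifying assumption of this sketch */
    m->io_pages = m->dirty_pages;
    m->dirty_pages = NULL;
}

/* Write only what the caller placed on io_pages; done when it is empty. */
int toy_writepages(struct toy_mapping *m)
{
    int written = 0;

    while (m->io_pages != NULL) {
        struct toy_page *p = m->io_pages;

        m->io_pages = p->next;
        toy_push(&m->clean_pages, p);  /* "write" the page */
        written++;
    }
    return written;
}
```

    A page dirtied after toy_start_round() stays on dirty_pages and waits
    for the next round, which is what lets the current round terminate.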
    
    This conceptually cleans things up a bit, by more clearly defining the
    role of ->io_pages, and the motion between the various mapping lists.
    
    The treatment of sb->s_dirty and sb->s_io is now conceptually identical
    to mapping->dirty_pages and mapping->io_pages: move the items to be
    written onto ->s_io/io_pages, and walk that list.  As inodes (or pages)
    are written, move them over to the clean/locked/dirty lists.
    
    Oh, scenario 3: start an app which continuously overwrites a 5 meg
    file.  Wait five seconds, start another, wait five seconds, start
    another.  What we _should_ see is three 5-meg writes, five seconds
    apart, every thirty seconds.  Previously this did all sorts of odd
    things.  It now does the right thing.
fs-writeback.c 15.8 KB