1. 24 Aug, 2010 5 commits
    • Dave Chinner's avatar
      xfs: handle negative wbc->nr_to_write during sync writeback · efceab1d
      Dave Chinner authored
      During data integrity (WB_SYNC_ALL) writeback, wbc->nr_to_write will
      go negative on inodes with more than 1024 dirty pages due to
      implementation details of write_cache_pages(). Currently XFS will
      abort page clustering in writeback once nr_to_write drops below
      zero, and so for data integrity writeback we will do very
      inefficient page at a time allocation and IO submission for inodes
      with large numbers of dirty pages.
      
      Fix this by only aborting the page clustering code when
      wbc->nr_to_write is negative and the sync mode is WB_SYNC_NONE.
      
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      efceab1d
    • Dave Chinner's avatar
      writeback: write_cache_pages doesn't terminate at nr_to_write <= 0 · 546a1924
      Dave Chinner authored
      I noticed XFS writeback in 2.6.36-rc1 was much slower than it should have
      been. Enabling writeback tracing showed:
      
          flush-253:16-8516  [007] 1342952.351608: wbc_writepage: bdi 253:16: towrt=1024 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
          flush-253:16-8516  [007] 1342952.351654: wbc_writepage: bdi 253:16: towrt=1023 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
          flush-253:16-8516  [000] 1342952.369520: wbc_writepage: bdi 253:16: towrt=0 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
          flush-253:16-8516  [000] 1342952.369542: wbc_writepage: bdi 253:16: towrt=-1 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
          flush-253:16-8516  [000] 1342952.369549: wbc_writepage: bdi 253:16: towrt=-2 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
      
      Writeback is not terminating in background writeback if ->writepage is
      returning with wbc->nr_to_write == 0, resulting in sub-optimal single page
      writeback on XFS.
      
      Fix the write_cache_pages loop to terminate correctly when this situation
      occurs and so prevent this sub-optimal background writeback pattern. This
      improves sustained sequential buffered write performance from around
      250MB/s to 750MB/s for a 100GB file on an XFS filesystem on my 8p test VM.
      
      Cc:<stable@kernel.org>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarWu Fengguang <fengguang.wu@intel.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      546a1924
    • Dave Chinner's avatar
      xfs: fix untrusted inode number lookup · 4536f2ad
      Dave Chinner authored
      Commit 7124fe0a ("xfs: validate untrusted inode
      numbers during lookup") changes the inode lookup code to do btree lookups for
      untrusted inode numbers. This change made an invalid assumption about the
      alignment of inodes and hence incorrectly calculated the first inode in the
      cluster. As a result, some inode numbers were being incorrectly considered
      invalid when they were actually valid.
      
      The issue was not picked up by the xfstests suite because it always runs fsr
      and dump (the two utilities that utilise the bulkstat interface) on cache hot
      inodes and hence the lookup code in the cold cache path was not sufficiently
      exercised to uncover this intermittent problem.
      
      Fix the issue by relaxing the btree lookup criteria and then checking if the
      record returned contains the inode number we are lookup for. If it we get an
      incorrect record, then the inode number is invalid.
      
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      4536f2ad
    • Dave Chinner's avatar
      xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE · 5b3eed75
      Dave Chinner authored
      Under heavy load parallel metadata loads (e.g. dbench), we can fail
      to mark all the inodes in a cluster being freed as XFS_ISTALE as we
      skip inodes we cannot get the XFS_ILOCK_EXCL or the flush lock on.
      When this happens and the inode cluster buffer has already been
      marked stale and freed, inode reclaim can try to write the inode out
      as it is dirty and not marked stale. This can result in writing th
      metadata to an freed extent, or in the case it has already
      been overwritten trigger a magic number check failure and return an
      EUCLEAN error such as:
      
      Filesystem "ram0": inode 0x442ba1 background reclaim flush failed with 117
      
      Fix this by ensuring that we hoover up all in memory inodes in the
      cluster and mark them XFS_ISTALE when freeing the cluster.
      
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      5b3eed75
    • Dave Chinner's avatar
      xfs: unlock items before allowing the CIL to commit · d17c701c
      Dave Chinner authored
      When we commit a transaction using delayed logging, we need to
      unlock the items in the transaciton before we unlock the CIL context
      and allow it to be checkpointed. If we unlock them after we release
      the CIl context lock, the CIL can checkpoint and complete before
      we free the log items. This breaks stale buffer item unlock and
      unpin processing as there is an implicit assumption that the unlock
      will occur before the unpin.
      
      Also, some log items need to store the LSN of the transaction commit
      in the item (inodes and EFIs) and so can race with other transaction
      completions if we don't prevent the CIL from checkpointing before
      the unlock occurs.
      
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      d17c701c
  2. 23 Aug, 2010 1 commit
  3. 22 Aug, 2010 12 commits
  4. 21 Aug, 2010 6 commits
  5. 20 Aug, 2010 16 commits