  1. 27 Apr, 2013 9 commits
  2. 21 Apr, 2013 8 commits
  3. 16 Apr, 2013 2 commits
  4. 05 Apr, 2013 1 commit
    • xfs: don't free EFIs before the EFDs are committed · 666d644c
      Dave Chinner authored
      Filesystems are occasionally being shut down with this error:
      
      xfs_trans_ail_delete_bulk: attempting to delete a log item that is
      not in the AIL.
      
      It was diagnosed to be related to the EFI/EFD commit order when the
      EFI and EFD are in different checkpoints and the EFD is committed
      before the EFI here:
      
      http://oss.sgi.com/archives/xfs/2013-01/msg00082.html
      
      The real problem is that a single bit cannot fully describe the
      states that the EFI/EFD processing can be in. These completion
      states are:
      
      EFI                  EFI in AIL   EFD         Result
      committed/unpinned   Yes          committed   OK
      committed/pinned     No           committed   Shutdown
      uncommitted          No           committed   Shutdown
      
      Note that the "result" field is what should happen, not what does
      happen. The current logic is broken and handles the first two cases
      correctly by luck.  That is, the code will free the EFI if the
      XFS_EFI_COMMITTED bit is *not* set, rather than if it is set. The
      inverted logic "works" because if both EFI and EFD are committed,
      then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED
      bit, and the second frees the EFI item. Hence as long as
      xfs_efi_item_committed() has been called, everything appears to be
      fine.
      
      It is the third case where the logic fails - where
      xfs_efd_item_committed() is called before xfs_efi_item_committed(),
      and that results in the EFI being freed before it has been
      committed. That is the bug that triggered the shutdown, and hence
      keeping track of whether the EFI has been committed or not is
      insufficient to correctly order the EFI/EFD operations w.r.t. the
      AIL.
      
      What we really want is this: the EFI is always placed into the
      AIL before the last reference goes away. The only way to guarantee
      that is that the EFI is not freed until after it has been unpinned
      *and* the EFD has been committed. That is, restructure the logic so
      that the only case that can occur is the first case.
      
      This can be done easily by replacing the XFS_EFI_COMMITTED with an
      EFI reference count. The EFI is initialised with its own count, and
      that is not released until it is unpinned. However, there is a
      complication to this method - the high level EFI/EFD code in
      xfs_bmap_finish() does not hold direct references to the EFI
      structure, and runs a transaction commit between the EFI and EFD
      processing. Hence the EFI can be freed even before the EFD is
      created using such a method.
      
      Further, log recovery uses the AIL for tracking EFI/EFDs that need
      to be recovered, but it uses the AIL *differently* to the EFI
      transaction commit. Hence log recovery never pins or unpins EFIs, so
      we can't drop the EFI reference count indirectly to free the EFI.
      
      However, this doesn't prevent us from using a reference count here.
      There is a 1:1 relationship between EFIs and EFDs, so when we
      initialise the EFI we can take a reference count for the EFD as
      well. This solves the xfs_bmap_finish() issue - the EFI will never
      be freed until the EFD is processed. In terms of log recovery,
      during the committing of the EFD we can look for the
      XFS_EFI_RECOVERED bit being set and drop the EFI reference as well,
      thereby ensuring everything works correctly there as well.
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
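
      The reference counting described above can be sketched as a small
      userspace model. This is only an illustration under stated
      assumptions: struct efi_model and the efi_model_*/efd_model_*
      helpers are hypothetical names, not the kernel's structures or
      functions, and the "freed" flag stands in for removing the item
      from the AIL and freeing it.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stdbool.h>

      /* Hypothetical userspace model of an EFI log item, not the kernel struct. */
      struct efi_model {
          atomic_int refcount; /* one reference for the EFI, one for its EFD */
          bool in_ail;         /* true once the EFI is in the AIL */
          bool recovered;      /* models the XFS_EFI_RECOVERED bit */
          bool freed;          /* stands in for AIL removal plus freeing */
      };

      static void efi_model_init(struct efi_model *efi, bool recovered)
      {
          /* Take the EFD's reference up front, alongside the EFI's own. */
          atomic_init(&efi->refcount, 2);
          /* Assumption: during recovery the EFI is already tracked in the AIL. */
          efi->in_ail = recovered;
          efi->recovered = recovered;
          efi->freed = false;
      }

      /* Drop one reference; only the last drop frees the item. */
      static void efi_model_release(struct efi_model *efi)
      {
          if (atomic_fetch_sub(&efi->refcount, 1) == 1) {
              /* The EFI must have reached the AIL before it goes away. */
              assert(efi->in_ail);
              efi->freed = true;
          }
      }

      /* Unpin: the EFI is now in the AIL, so drop the EFI's own reference. */
      static void efi_model_unpin(struct efi_model *efi)
      {
          efi->in_ail = true;
          efi_model_release(efi);
      }

      /* EFD commit: drop the EFD's reference, and the EFI's own in recovery. */
      static void efd_model_committed(struct efi_model *efi)
      {
          if (efi->recovered)
              efi_model_release(efi); /* recovery never unpins the EFI */
          efi_model_release(efi);
      }
      ```

      In this model, calling efd_model_committed() before
      efi_model_unpin() leaves the item alive until the unpin, which is
      the ordering that previously triggered the shutdown.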
  5. 03 Apr, 2013 1 commit
  6. 22 Mar, 2013 7 commits
  7. 14 Mar, 2013 3 commits
  8. 07 Mar, 2013 6 commits
    • xfs: rearrange some code in xfs_bmap for better locality · 9e5987a7
      Dave Chinner authored
      xfs_bmap.c is a big file, and some of the related code is spread
      throughout the file, requiring function prototypes for static
      functions and jumping all through the file to follow a single call
      path. Rearrange the code so that:
      
      	a) related functionality is grouped together; and
      	b) functions are grouped in call dependency order
      
      While the diffstat is large, there are no code changes in the patch;
      it is just moving the functionality around and removing the function
      prototypes at the top of the file. The resulting layout of the code
      is as follows (top of file to bottom):
      
      	- miscellaneous helper functions
      	- extent tree block counting routines
      	- debug/sanity checking code
      	- bmap free list manipulation functions
      	- inode fork format manipulation functions
      	- internal/external extent tree search functions
      	- extent tree manipulation functions used during allocation
      	- functions used during extent read/allocate/removal
      	  operations (i.e. xfs_bmapi_write, xfs_bmapi_read,
      	  xfs_bunmapi and xfs_getbmap)
      
      This means that following logic paths through the bmapi code is much
      simpler - most of the code relevant to a specific operation is now
      clustered together rather than spread all over the file....
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: rename random32() to prandom_u32() · ecb3403d
      Akinobu Mita authored
      Use the preferred function name, which makes it clear that a
      pseudo-random number generator is being used.
      
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Acked-by: <bpm@sgi.com>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: Alex Elder <elder@kernel.org>
      Cc: xfs@oss.sgi.com
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: don't verify buffers after IO errors · d5929de8
      Dave Chinner authored
      When we read a buffer, we might get an error from the underlying
      block device and not the real data. Hence if we get an IO error, we
      shouldn't run the verifier but instead just pass the IO error
      straight through.
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
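
      A minimal sketch of the rule above, assuming a read-completion
      path that holds an error code and an optional verifier callback.
      struct buf_model, count_verify() and buf_model_read_done() are
      hypothetical names for illustration, not the XFS buffer API.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stddef.h>

      /* Hypothetical model of a buffer with a read verifier attached. */
      struct buf_model {
          int error;        /* 0 on success, non-zero if the IO failed */
          int verify_calls; /* how many times the verifier has run */
          void (*verify_read)(struct buf_model *bp);
      };

      /* Example verifier that only counts its invocations. */
      static void count_verify(struct buf_model *bp)
      {
          bp->verify_calls++;
      }

      /* Read completion: verify only when the IO returned real data. */
      static int buf_model_read_done(struct buf_model *bp)
      {
          if (!bp->error && bp->verify_read != NULL)
              bp->verify_read(bp);
          /* An IO error is passed straight through to the caller. */
          return bp->error;
      }
      ```

      With error set to EIO the verifier is skipped and EIO is returned
      unchanged; with error set to 0 the verifier runs once.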
    • xfs: fix xfs_iomap_eof_prealloc_initial_size type · e8108ced
      Mark Tinguely authored
      Fix the return type of xfs_iomap_eof_prealloc_initial_size() to
      xfs_fsblock_t to reflect the fact that the return value may be an
      unsigned 64-bit value if XFS_BIG_BLKNOS is defined.
      
      Signed-off-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
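
      To illustrate why the return type matters, the sketch below
      contrasts a narrow return type with a 64-bit one. The names are
      hypothetical, fsblock_model_t is only a stand-in for a 64-bit
      xfs_fsblock_t, and a 32-bit unsigned type is assumed for the
      narrow case purely for demonstration.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Hypothetical stand-in for a 64-bit xfs_fsblock_t. */
      typedef uint64_t fsblock_model_t;

      /* Narrow return type: the upper 32 bits of the block count are lost. */
      static uint32_t initial_size_narrow(fsblock_model_t blocks)
      {
          return (uint32_t)blocks;
      }

      /* Matching return type: the full 64-bit block count is preserved. */
      static fsblock_model_t initial_size_wide(fsblock_model_t blocks)
      {
          return blocks;
      }
      ```

      Passing a block count just above 2^32 shows the difference: the
      narrow variant returns only the low 32 bits, while the wide
      variant returns the value unchanged.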
    • xfs: increase prealloc size to double that of the previous extent · e114b5fc
      Brian Foster authored
      The updated speculative preallocation algorithm for handling sparse
      files can become less effective in situations with a high number of
      concurrent, sequential writers. The number of writers and amount of
      available RAM affect the writeback bandwidth slicing algorithm,
      which in turn affects the block allocation pattern of XFS. For
      example, with 32 sequential writers running on a system with 32GB
      RAM, preallocs become fixed at around 128MB (instead of steadily
      increasing to the 8GB maximum as sequential writes proceed).
      
      Update the speculative prealloc heuristic to base the size of the
      next prealloc on double the size of the preceding extent. This
      preserves the original aggressive speculative preallocation
      behavior and continues to accommodate sparse files at a slight cost
      of increasing the size of preallocated data regions following holes
      of sparse files.
      
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
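
      The doubling heuristic can be sketched as follows. This is a
      simplified model, not the kernel code: next_prealloc_blocks() is
      a hypothetical helper, sizes are in arbitrary block units, and
      the cap is passed in as a parameter rather than derived from the
      8GB maximum mentioned above.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /*
       * Hypothetical helper: size the next prealloc at double the preceding
       * extent, clamped to max_blocks. The comparison against max_blocks / 2
       * avoids overflowing the multiplication.
       */
      static uint64_t next_prealloc_blocks(uint64_t prev_extent_blocks,
                                           uint64_t max_blocks)
      {
          if (prev_extent_blocks > max_blocks / 2)
              return max_blocks;
          return prev_extent_blocks * 2;
      }
      ```

      Feeding each result back in as the next "previous extent" makes
      the prealloc size grow geometrically until it reaches the cap,
      rather than settling at a fixed intermediate value.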
    • xfs: fix potential infinite loop in xfs_iomap_prealloc_size() · e78c420b
      Brian Foster authored
      If freesp == 0, we could end up in an infinite loop while squashing
      the preallocation. Break the loop when we've killed the prealloc
      entirely.
      
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
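
      A sketch of the loop termination issue, assuming the squashing
      step repeatedly shifts the prealloc size down while it is at
      least as large as the free space. squash_prealloc() and the shift
      amount are illustrative assumptions, not the kernel function.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /*
       * Hypothetical helper: shrink the prealloc until it fits in free space.
       * Without the alloc_blocks != 0 test, freesp == 0 would keep the
       * condition "0 >= 0" true forever once the prealloc reached zero.
       */
      static uint64_t squash_prealloc(uint64_t alloc_blocks, uint64_t freesp)
      {
          while (alloc_blocks != 0 && alloc_blocks >= freesp)
              alloc_blocks >>= 4;
          return alloc_blocks;
      }
      ```

      With freesp == 0 the loop now stops as soon as the prealloc has
      been reduced to zero, and the caller simply gets no speculative
      preallocation.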
  9. 03 Mar, 2013 3 commits