1. 17 Sep, 2019 1 commit
    • gfs2: clear buf_in_tr when ending a transaction in sweep_bh_for_rgrps · f0b444b3
      Bob Peterson authored
      The function sweep_bh_for_rgrps, a helper for punch_hole, uses the
      variable buf_in_tr to keep track of when it needs to commit pending
      block frees on a partial delete that overflows the transaction
      created for the delete. The problem is that the variable was
      initialized at the start of sweep_bh_for_rgrps but was never
      cleared, even when starting a new transaction.
      
      This patch reinitializes the variable when the transaction is
      ended, so the next transaction starts out with it cleared.
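
      The effect of the reset can be illustrated with a minimal user-space
      sketch. This is not the gfs2 code: struct sweep_state, free_block and
      end_transaction are hypothetical names, and the sketch assumes the flag
      simply means "the current transaction has uncommitted frees".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state tracked while sweeping blocks. */
struct sweep_state {
    bool buf_in_tr;      /* true once the current transaction has pending frees */
    int  pending_frees;  /* block frees not yet committed */
    int  commits;        /* number of transactions that committed work */
};

/* Record one block free in the current transaction. */
void free_block(struct sweep_state *s)
{
    s->buf_in_tr = true;
    s->pending_frees++;
}

/* End the current transaction and reset the flag for the next one. */
void end_transaction(struct sweep_state *s)
{
    if (s->buf_in_tr) {
        s->commits++;
        s->pending_frees = 0;
    }
    /* Without this reset, the next transaction would start with a stale flag. */
    s->buf_in_tr = false;
}
```

      With the reset in place, ending an empty follow-up transaction does
      not count as a second commit.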
      
      Fixes: d552a2b9 ("GFS2: Non-recursive delete")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  2. 06 Sep, 2019 1 commit
    • gfs2: Improve mmap write vs. truncate consistency · b473bc2d
      Andreas Gruenbacher authored
      On filesystems with a block size smaller than PAGE_SIZE, page_mkwrite is
      called for each memory-mapped page before that page can be written to.
      When such a memory-mapped file is truncated down to size x which is not
      a multiple of the page size and then back to a larger size, the page
      straddling size x can end up with a partial block mapping.  In that
      case, make sure to mark that page read-only so that page_mkwrite will be
      called before the page can be written to the next time.
      
      (There is no point in marking the page straddling size x read-only when
      truncating down as writing to memory beyond the end of the file will
      result in SIGBUS instead of growing the file.)
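
      The arithmetic behind the "page straddling size x" can be sketched as
      follows. The 4 KiB page size, the EX_ macro names and all three
      helper functions are assumptions made for illustration; none of them
      appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12                          /* assumed 4 KiB pages */
#define EX_PAGE_SIZE  (UINT64_C(1) << EX_PAGE_SHIFT)

/* Non-zero when size is not page-aligned, i.e. some page straddles it. */
int has_straddling_page(uint64_t size)
{
    return (size & (EX_PAGE_SIZE - 1)) != 0;
}

/* Index of the page that contains the byte at offset size. */
uint64_t straddling_page(uint64_t size)
{
    return size >> EX_PAGE_SHIFT;
}

/* Number of blocks in the straddling page that lie at least partly below size. */
unsigned blocks_below_size(uint64_t size, unsigned block_size)
{
    uint64_t in_page = size & (EX_PAGE_SIZE - 1);

    return (unsigned)((in_page + block_size - 1) / block_size);
}
```

      For example, with 1 KiB blocks and a truncate down to 5000 bytes,
      page 1 straddles the new size and only one of its four blocks lies
      below it, which is the partial mapping the message describes.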
      
      Fixes xfstests generic/029, generic/030 on filesystems with a block size
      smaller than PAGE_SIZE.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  3. 04 Sep, 2019 5 commits
    • gfs2: Use async glocks for rename · ad26967b
      Bob Peterson authored
      Because s_vfs_rename_mutex is not cluster-wide, multiple nodes can
      reverse the roles of which directories are "old" and which are "new" for
      the purposes of rename. This can cause deadlocks where two nodes end up
      waiting for each other.
      
      There can be several layers of directory dependencies across many nodes.
      
      This patch fixes the problem by acquiring all of gfs2_rename's inode
      glocks asynchronously and waiting for all glocks to be acquired.  That
      way all inodes are locked regardless of the order.
      
      The timeout value for multiple asynchronous glocks is calculated to be
      the total of the individual wait times for each glock times two.
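
      Read literally, the timeout rule above amounts to the following
      sketch. The function name async_wait_timeout and the array of
      per-glock wait times are hypothetical; the units are whatever the
      individual wait times are expressed in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sum the per-glock wait times, then double the total. */
unsigned long async_wait_timeout(const unsigned long *wait_times, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        total += wait_times[i];
    return total * 2;
}
```

      For three glocks with wait times 10, 20 and 30, this gives a timeout
      of 120.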
      
      Since gfs2_exchange is very similar to gfs2_rename, both functions are
      patched in the same way.
      
      A new async glock wait queue, sd_async_glock_wait, keeps a list of
      waiters for these events. If gfs2's holder_wake function detects an
      async holder, it wakes up any waiters for the event. The waiter only
      tests whether any of its requests are still pending.
      
      Since the glocks are sent to dlm asynchronously, the wait function needs
      to check to see which glocks, if any, were granted.
      
      If a glock is granted by dlm (and therefore held), its minimum hold time
      is checked and adjusted as necessary, as is done for other glock grants.
      
      If the event times out, all glocks held thus far must be dequeued to
      resolve any existing deadlocks.  Then, if there are any outstanding
      locking requests, we need to loop around and wait for dlm to respond to
      those requests too.  After we release all requests, we return -ESTALE to
      the caller (vfs rename) which loops around and retries the request.
      
          Node1           Node2
          ---------       ---------
      1.  Enqueue A       Enqueue B
      2.  Enqueue B       Enqueue A
      3.  A granted
      6.                  B granted
      7.  Wait for B
      8.                  Wait for A
      9.                  A times out (since Node 1 holds A)
      10.                 Dequeue B (since it was granted)
      11.                 Wait for all requests from DLM
      12. B Granted (since Node2 released it in step 10)
      13. Rename
      14. Dequeue A
      15.                 DLM Grants A
      16.                 Dequeue A (due to the timeout and since we
                          no longer have B held for our task).
      17. Dequeue B
      18.                 Return -ESTALE to vfs
      19.                 VFS retries the operation, goto step 1.
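
      The timeout handling walked through above (steps 9 to 18 on Node2)
      can be modelled in user space roughly as below. This is only a
      sketch: enum req_state, any_pending, release_granted and finish_wait
      are hypothetical names, and the real code must additionally wait for
      dlm to answer the still-pending requests and release those as well.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum req_state { REQ_PENDING, REQ_GRANTED, REQ_RELEASED };

/* Non-zero while at least one request has not been answered yet. */
int any_pending(const enum req_state *reqs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (reqs[i] == REQ_PENDING)
            return 1;
    return 0;
}

/* Drop every lock granted so far, so that other nodes can make progress. */
void release_granted(enum req_state *reqs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (reqs[i] == REQ_GRANTED)
            reqs[i] = REQ_RELEASED;
}

/*
 * Decide the outcome of one wait. Returns 0 when the wait did not time
 * out and nothing is pending; otherwise releases what was granted and
 * returns -ESTALE so the caller can retry the whole operation.
 */
int finish_wait(enum req_state *reqs, size_t n, int timed_out)
{
    if (!timed_out && !any_pending(reqs, n))
        return 0;
    release_granted(reqs, n);
    return -ESTALE;
}
```

      In the table, Node2 is the case where one request was granted (B) and
      one was still pending (A) when the timeout fired: B is released and
      -ESTALE is eventually returned.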
      
      This release-all-locks / acquire-all-locks approach may slow rename /
      exchange down as both nodes struggle in the same way and do the same thing.
      However, this will only happen when there is contention for the same
      inodes, which ought to be rare.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: create function gfs2_glock_update_hold_time · 01123cf1
      Andreas Gruenbacher authored
      This patch moves the code that updates glock minimum hold
      time to a separate function. This will be called by a future
      patch.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: separate holder for rgrps in gfs2_rename · bc74aaef
      Bob Peterson authored
      Before this patch, gfs2_rename added a holder for the rgrp glock to
      its array of holders, ghs. There's nothing wrong with that, but this
      patch moves it into a separate holder. This is done to ensure
      that the rgrp glock is always locked last, as required by the proper
      glock lock ordering, and also to pave the way for a future patch in
      which we will lock the non-rgrp glocks asynchronously.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Delete an unnecessary check before brelse() · bccaef90
      Markus Elfring authored
      The brelse() function tests whether its argument is NULL and then
      returns immediately.  Thus the test around the call is not needed.
      
      This issue was detected by using the Coccinelle software.
      
      [The same applies to brelse() in gfs2_dir_no_add (which Coccinelle
      apparently missed), so fix that as well.]
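
      The pattern being removed looks roughly like the following user-space
      analogue. struct buf, buf_release and both cleanup functions are
      hypothetical stand-ins; only the NULL-is-a-no-op behaviour is taken
      from the description of brelse() above.

```c
#include <assert.h>
#include <stddef.h>

struct buf {
    int refcount;   /* hypothetical stand-in for a buffer's reference count */
};

/* Like brelse() as described above: a NULL argument returns immediately. */
void buf_release(struct buf *b)
{
    if (b == NULL)
        return;
    b->refcount--;
}

/* Before: the caller repeats a test the callee already performs. */
void cleanup_with_check(struct buf *b)
{
    if (b)
        buf_release(b);
}

/* After: same behaviour for every input, with one fewer branch. */
void cleanup_without_check(struct buf *b)
{
    buf_release(b);
}
```

      Both variants behave identically for NULL and non-NULL arguments,
      which is why the outer test can be dropped.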
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Minor PAGE_SIZE arithmetic cleanups · 45eb0504
      Andreas Gruenbacher authored
      Replace divisions by PAGE_SIZE with shifts by PAGE_SHIFT and similar.
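
      A small sketch of the equivalence this cleanup relies on, assuming a
      4 KiB page size. The EX_ macros and helper names are hypothetical,
      and the modulo/mask pair is only a guess at what "and similar"
      covers.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12                          /* assumed 4 KiB pages */
#define EX_PAGE_SIZE  (UINT64_C(1) << EX_PAGE_SHIFT)

/* Page index computed by division and by shift. */
uint64_t page_index_div(uint64_t off)   { return off / EX_PAGE_SIZE; }
uint64_t page_index_shift(uint64_t off) { return off >> EX_PAGE_SHIFT; }

/* Offset within the page computed by modulo and by mask. */
uint64_t page_offset_mod(uint64_t off)  { return off % EX_PAGE_SIZE; }
uint64_t page_offset_mask(uint64_t off) { return off & (EX_PAGE_SIZE - 1); }
```

      Because the page size is a power of two, the two forms agree for
      every unsigned offset.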
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  4. 03 Sep, 2019 4 commits
  5. 09 Aug, 2019 5 commits
  6. 05 Aug, 2019 1 commit
  7. 04 Aug, 2019 10 commits
  8. 03 Aug, 2019 13 commits