1. 26 Feb, 2010 20 commits
    • ocfs2_dlmfs: Use poll() to signify BASTs. · 65b6f340
      Joel Becker authored
      o2dlm's userspace filesystem is an easy way to use the DLM from
      userspace.  It is intentionally simple: for example, it does not allow
      asynchronous behavior or lock conversion.
      
      Because there is no asynchronous notification, there is no way for a
      process holding a lock to know another node needs the lock.  This is the
      number one complaint of ocfs2_dlmfs users.  Turns out, we can solve this
      very easily.  We add poll() support to ocfs2_dlmfs.  When a BAST is
      received, the lock's file descriptor will receive POLLIN.
      
      This is trivial to implement.  Userdlm already has an appropriate
      waitqueue, and the lock knows when it is blocked.
      
      We add the "bast" capability to tell userspace this is available.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
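From userspace, the poll() behaviour described above might be consumed roughly as follows. This is a minimal sketch, not code from the patch: wait_for_bast() and release_on_bast() are hypothetical helper names, and the sketch assumes the descriptor was obtained with open(2) on a lock file in a dlmfs mount and that close(2) is what drops the lock.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: block until fd reports POLLIN, which on a dlmfs
 * lock file is taken here to mean "a BAST arrived, another node wants
 * this lock".  Returns 1 on POLLIN, 0 on timeout, -1 on error.
 */
int wait_for_bast(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int rc;

	assert(fd >= 0);

	/* Retry if a signal interrupts the wait. */
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	if (rc < 0)
		return -1;
	if (rc == 0)
		return 0;
	return (pfd.revents & POLLIN) ? 1 : -1;
}

/*
 * Hypothetical helper: hold the lock until someone else needs it, then
 * give it up.  Assumes closing the descriptor releases the lock.
 */
int release_on_bast(int fd, int timeout_ms)
{
	int rc = wait_for_bast(fd, timeout_ms);

	if (rc == 1)
		close(fd);
	return rc;
}
```

A caller would first check for the "bast" capability, since without it POLLIN carries no information.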
    • ocfs2_dlmfs: Add capabilities parameter. · 14a437c2
      Joel Becker authored
      Over time, dlmfs has added some features that were not part of the
      initial ABI.  Unfortunately, some of these features are not detectable
      via standard usage.  For example, Linux's default poll always returns
      POLLIN, so there is no way for a caller of poll(2) to know when dlmfs
      added poll support.  Instead, we provide this list of new capabilities.
      
      Capabilities is a read-only attribute.  We do it as a module parameter
      so we can discover it whether dlmfs is built in, loaded, or even not
      loaded (via modinfo).
      
      The ABI features are local to this machine's dlmfs mount.  This is
      distinct from the locking protocol, which is concerned with inter-node
      interaction.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
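A userspace program could test for a capability before relying on it. The sketch below rests on assumptions that are not stated in the commit message: that the parameter value is a whitespace-separated list of names, and that for a loaded module it is readable at a path such as /sys/module/ocfs2_dlmfs/parameters/capabilities. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if `name` appears as a whole word in the whitespace-separated
 * list `caps`, 0 otherwise.  The list format is an assumption.
 */
int has_capability(const char *caps, const char *name)
{
	static const char sep[] = " \t\n";
	size_t want = strlen(name);

	assert(caps != NULL && want > 0);

	while (*caps) {
		size_t len;

		caps += strspn(caps, sep);	/* skip separators */
		len = strcspn(caps, sep);	/* length of this token */
		if (len == want && memcmp(caps, name, len) == 0)
			return 1;
		caps += len;
	}
	return 0;
}

/*
 * Read the capability list from `path` (assumed sysfs location) and
 * look for `name`.  Returns 1 or 0, or -1 if the file cannot be opened.
 */
int dlmfs_has_capability(const char *path, const char *name)
{
	char buf[256];
	FILE *fp = fopen(path, "r");
	int found;

	if (!fp)
		return -1;
	found = fgets(buf, sizeof(buf), fp) ? has_capability(buf, name) : 0;
	fclose(fp);
	return found;
}
```

Matching whole tokens rather than substrings keeps a future capability whose name merely contains "bast" from being mistaken for it.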
    • ocfs2: Handle errors while setting external xattr values. · 399ff3a7
      Joel Becker authored
      ocfs2 can store extended attribute values as large as a single file.  It
      does this using a standard ocfs2 btree for the large value.  However,
      the previous code did not handle all error cases cleanly.
      
      Several things can go wrong.
      
      1) We have trouble allocating space for a new xattr.  This leaves us
         with an empty xattr.
      2) We overwrote an existing local xattr with a value root, and now we
         have an error allocating the storage.  This leaves us with an empty
         xattr where there used to be a value.  The value is lost.
      3) We have trouble truncating a reused value.  This leaves us with the
         original entry pointing to the truncated original value.  The value
         is lost.
      4) We have trouble extending the storage on a reused value.  This leaves
         us with the original value safely in place, but with more storage
         allocated than needed.
      
      This doesn't consider storing local xattrs (values that don't require a
      btree).  Those only fail when the journal fails.
      
      Case (1) is easy.  We just remove the xattr we added.  We leak the
      storage because we can't safely remove it, but otherwise everything is
      happy.  We'll print a warning about the leak.
      
      Case (4) is easy.  We still have the original value in place.  We can
      just leave the extra storage attached to this xattr.  We return the
      error, but the old value is untouched.  We print a warning about the
      storage.
      
      Case (2) and (3) are hard because we've lost the original values.  In
      the old code, we ended up with values that could be partially read.
      That's not good.  Instead, we just wipe the xattr entry and leak the
      storage.  It stinks that the original value is lost, but now there isn't
      a partial value to be read.  We'll print a big fat warning.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Set inline xattr entries with ocfs2_xa_set() · 139fface
      Joel Becker authored
      ocfs2_xattr_ibody_set() is the only remaining user of
      ocfs2_xattr_set_entry().  ocfs2_xattr_set_entry() actually does two
      things: it calls ocfs2_xa_set(), and it initializes the inline xattrs.
      Initializing the inline space really belongs in its own call.
      
      We lift the initialization to ocfs2_xattr_ibody_init(), called from
      ocfs2_xattr_ibody_set() only when necessary.  Now
      ocfs2_xattr_ibody_set() can call ocfs2_xa_set() directly.
      ocfs2_xattr_set_entry() goes away.
      
      Another nice fact is that ocfs2_init_dinode_xa_loc() can trust
      i_xattr_inline_size.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Set xattr block entries with ocfs2_xa_set() · d3981544
      Joel Becker authored
      ocfs2_xattr_block_set() calls into ocfs2_xattr_set_entry() with just the
      HAS_XATTR flag.  Most of the machinery of ocfs2_xattr_set_entry() is
      skipped.  All that really happens other than the call to ocfs2_xa_set()
      is making sure the HAS_XATTR flag is set on the inode.
      
      But HAS_XATTR should be set when we also set di->i_xattr_loc.  And
      that's done in ocfs2_create_xattr_block().  So let's move it there, and
      then ocfs2_xattr_block_set() can just call ocfs2_xa_set().
      
      While we're there, ocfs2_create_xattr_block() can take the set_ctxt for
      a smaller argument list.  It also learns to set HAS_XATTR_FL, because it
      knows for sure.  ocfs2_create_empty_xattr_block() in the reflink path
      fakes a set_ctxt to call ocfs2_create_xattr_block().
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Let ocfs2_xa_prepare_entry() do space checks. · c5d95df5
      Joel Becker authored
      ocfs2_xattr_set_in_bucket() doesn't need to do its own hacky space
      checking.  Let's let ocfs2_xa_prepare_entry() (via ocfs2_xa_set()) do
      the more accurate work.  Whenever it doesn't have space,
      ocfs2_xattr_set_in_bucket() can try to get more space.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Gell into ocfs2_xa_set() · bca5e9bd
      Joel Becker authored
      ocfs2_xa_set() wraps the ocfs2_xa_prepare_entry()/ocfs2_xa_store_value()
      logic.  Both callers can now use the same routine.  ocfs2_xa_remove()
      moves directly into ocfs2_xa_set().
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Allocation in ocfs2_xa_prepare_entry(), values in ocfs2_xa_store_value() · 73857ee0
      Joel Becker authored
      ocfs2_xa_prepare_entry() gets all the logic to add, remove, or modify
      external value trees.  Now, when it exits, the entry is ready to receive
      a value of any size.
      
      ocfs2_xa_remove() is added to handle the complete removal of an entry.
      It truncates the external value tree before calling
      ocfs2_xa_remove_entry().
      
      ocfs2_xa_store_inline_value() becomes ocfs2_xa_store_value().  It can
      store any value.
      
      ocfs2_xattr_set_entry() loses all the allocation logic and just uses
      these functions.  ocfs2_xattr_set_value_outside() disappears.
      
      ocfs2_xattr_set_in_bucket() uses these functions and makes
      ocfs2_xattr_set_entry_in_bucket() obsolete.  That goes away, as does
      ocfs2_xattr_bucket_set_value_outside() and
      ocfs2_xattr_bucket_value_truncate().
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Teach ocfs2_xa_loc how to do its own journal work · cf2bc809
      Joel Becker authored
      We're going to want to make sure our buffers get accessed and dirtied
      correctly.  So have the xa_loc do the work.  This includes storing the
      inode on ocfs2_xa_loc.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Provide ocfs2_xa_fill_value_buf() for external value processing · 3fc12afa
      Joel Becker authored
      We use the ocfs2_xattr_value_buf structure to manage external values.
      It lets the value tree code do its work regardless of the containing
      storage.  ocfs2_xa_fill_value_buf() initializes a value buf from an
      ocfs2_xa_loc entry.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Handle value tree roots in ocfs2_xa_set_inline_value() · 9dc47400
      Joel Becker authored
      Previously the xattr code would send in a fake value, containing a tree
      root, to the function that installed name+value pairs.  Instead, we pass
      the real value to ocfs2_xa_set_inline_value(), and it notices that the
      value cannot fit.  Thus, it installs a tree root.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Set the xattr name+value pair in one place · 69a3e539
      Joel Becker authored
      We create two new functions on ocfs2_xa_loc, ocfs2_xa_prepare_entry()
      and ocfs2_xa_store_inline_value().
      
      ocfs2_xa_prepare_entry() makes sure that the xl_entry field of
      ocfs2_xa_loc is ready to receive an xattr.  The entry will point to an
      appropriately sized name+value region in storage.  If an existing entry
      can be reused, it will be.  If no entry already exists, it will be
      allocated.  If there isn't space to allocate it, -ENOSPC will be
      returned.
      
      ocfs2_xa_store_inline_value() stores the data that goes into the 'value'
      part of the name+value pair.  For values that don't fit directly, this
      stores the value tree root.
      
      A number of operations are added to ocfs2_xa_loc_operations to support
      these functions.  This reflects the disparate behaviors of xattr blocks
      and buckets.
      
      With these functions, the overlapping ocfs2_xattr_set_entry_local() and
      ocfs2_xattr_set_entry_normal() can be replaced with a single call
      scheme.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Wrap calculation of name+value pair size. · 199799a3
      Joel Becker authored
      An ocfs2 xattr entry stores the text name and value as a pair in the
      storage area.  Obviously names and values can be variable-sized.  If a
      value is too large for the entry storage, a tree root is stored instead.
      The name+value pair is also padded.
      
      Because of this, there are a million places in the code that do:
      
      	if (needs_external_tree(value_size))
      		namevalue_size = pad(name_size) + tree_root_size;
      	else
      		namevalue_size = pad(name_size) + pad(value_size);
      
      Let's create some convenience functions to make the code more readable.
      There are three forms.  The first takes the raw sizes.  The second takes
      an ocfs2_xattr_info structure.  The third takes an existing
      ocfs2_xattr_entry.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
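The three forms of the convenience function could look something like the sketch below. It is illustrative only: the padding granularity, the inline limit, the tree root size, the structure layouts, and all names are placeholders rather than the definitions in the ocfs2 headers.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder constants; the real values live in the ocfs2 headers. */
#define XA_PAD(n)		(((n) + 3u) & ~(size_t)3u)
#define XA_INLINE_LIMIT		80u	/* largest value stored in place */
#define XA_TREE_ROOT_SIZE	64u	/* size of an external tree root */

/* Simplified stand-ins for ocfs2_xattr_info and ocfs2_xattr_entry. */
struct xa_info  { size_t name_len; size_t value_len; };
struct xa_entry { unsigned char name_len; size_t value_size; };

/* Form 1: raw sizes. */
size_t namevalue_size(size_t name_len, size_t value_len)
{
	if (value_len > XA_INLINE_LIMIT)
		return XA_PAD(name_len) + XA_TREE_ROOT_SIZE;
	return XA_PAD(name_len) + XA_PAD(value_len);
}

/* Form 2: a description of the xattr the caller wants to set. */
size_t namevalue_size_info(const struct xa_info *xi)
{
	assert(xi != NULL);
	return namevalue_size(xi->name_len, xi->value_len);
}

/* Form 3: an entry that already exists in storage. */
size_t namevalue_size_entry(const struct xa_entry *xe)
{
	assert(xe != NULL);
	return namevalue_size(xe->name_len, xe->value_size);
}
```

Funnelling the second and third forms through the first keeps the "tree root or padded value" decision in one place.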
    • ocfs2: Add a name_len field to ocfs2_xattr_info. · 18853b95
      Joel Becker authored
      Rather than calculating strlen all over the place, let's store the
      name length directly on ocfs2_xattr_info.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Prefix the member fields of struct ocfs2_xattr_info. · 6b240ff6
      Joel Becker authored
      struct ocfs2_xattr_info is a useful structure describing an xattr
      you'd like to set.  Let's put prefixes on the member fields so it's
      easier to read and use.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Remove xattrs via ocfs2_xa_loc · bde1e540
      Joel Becker authored
      Add ocfs2_xa_remove_entry(), which will remove an xattr entry from its
      storage via the ocfs2_xa_loc descriptor.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Introduce ocfs2_xa_loc · 11179f2c
      Joel Becker authored
      The ocfs2 extended attribute (xattr) code is very flexible.  It can
      store xattrs in the inode itself, in an external block, or in a tree of
      data structures.  This allows the number of xattrs to be bounded by the
      filesystem size.
      
      However, the code that manages each possible storage location is
      different.  Maintaining the ocfs2 xattr code requires changing each hunk
      separately.
      
      This patch is the start of a series introducing the ocfs2_xa_loc
      structure.  This structure wraps the on-disk details of an xattr
      entry.  The goal is that the generic xattr routines can use
      ocfs2_xa_loc without knowing the underlying storage location.
      
      This first pass merely implements the basic structure, initializing it,
      and wiping the name+value pair of the entry.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
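The idea of hiding the storage location behind a descriptor can be illustrated with a small operations table. This is a userspace sketch of the pattern only, not the kernel structure: the field names, the two backends, and xa_wipe_namevalue() are hypothetical, and it assumes a name+value region never straddles two blocks of a bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct xa_loc;

/* Storage-specific behaviour lives behind function pointers. */
struct xa_loc_ops {
	void *(*offset_pointer)(struct xa_loc *loc, size_t off);
};

struct xa_loc {
	const struct xa_loc_ops *ops;
	unsigned char *block;		/* block backend: one buffer */
	unsigned char **blocks;		/* bucket backend: several blocks */
	size_t block_size;		/* bucket backend: size of each */
};

/* Inode body or xattr block: storage is one contiguous buffer. */
static void *block_offset_pointer(struct xa_loc *loc, size_t off)
{
	return loc->block + off;
}

/* Bucket: the offset selects a block, then a position inside it. */
static void *bucket_offset_pointer(struct xa_loc *loc, size_t off)
{
	return loc->blocks[off / loc->block_size] + off % loc->block_size;
}

const struct xa_loc_ops block_ops  = { .offset_pointer = block_offset_pointer };
const struct xa_loc_ops bucket_ops = { .offset_pointer = bucket_offset_pointer };

/* Generic routine: wipes a name+value region without knowing the backend. */
void xa_wipe_namevalue(struct xa_loc *loc, size_t off, size_t len)
{
	assert(loc != NULL && loc->ops != NULL);
	memset(loc->ops->offset_pointer(loc, off), 0, len);
}
```

The generic routine contains no storage-specific branches, so a fix to it applies to every backend at once.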
    • ocfs2: Add current->comm in trace output · 8545e03d
      Sunil Mushran authored
      Add current->comm to the standard mlog() output to help with debugging.
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Clean up the checks for CoW and direct I/O. · 96a1cc73
      Wengang Wang authored
      When ocfs2 has to do CoW for refcounted extents, we disable direct I/O
      and go through the buffered I/O path.  This makes the combined check
      easier to read.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: add extent block stealing for ocfs2 v5 · b89c5428
      Tiger Yang authored
      This patch adds an extent block (metadata) stealing mechanism for
      extent allocation.  The mechanism is the same as inode stealing:
      if there is no room in the slot-specific extent_alloc, we try to
      allocate the extent block from the next slot.
      Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
      Acked-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
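The fallback order can be pictured with a short sketch. It is a simplification under stated assumptions: pick_alloc_slot() is a hypothetical name, each slot's allocator is reduced to a free-block count, and the search simply walks every slot in order starting from the node's own, which may be broader than what the patch does.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the slot to allocate an extent block from: the caller's own
 * slot if it has room, otherwise the next slot that does.  Returns -1
 * when every slot is full.
 */
int pick_alloc_slot(const unsigned int *free_blocks, size_t nr_slots,
		    size_t my_slot)
{
	size_t i;

	assert(free_blocks != NULL && my_slot < nr_slots);

	for (i = 0; i < nr_slots; i++) {
		/* i == 0 is our own slot; later iterations "steal". */
		size_t slot = (my_slot + i) % nr_slots;

		if (free_blocks[slot] > 0)
			return (int)slot;
	}
	return -1;
}
```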
  2. 09 Feb, 2010 1 commit
    • Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 · a5f28ae4
      Linus Torvalds authored
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
        ocfs2/cluster: Make o2net connect messages KERN_NOTICE
        ocfs2/dlm: Fix printing of lockname
        ocfs2: Fix contiguousness check in ocfs2_try_to_merge_extent_map()
        ocfs2/dlm: Remove BUG_ON in dlm recovery when freeing locks of a dead node
        ocfs2: Plugs race between the dc thread and an unlock ast message
        ocfs2: Remove overzealous BUG_ON during blocked lock processing
        ocfs2: Do not downconvert if the lock level is already compatible
        ocfs2: Prevent a livelock in dlmglue
        ocfs2: Fix setting of OCFS2_LOCK_BLOCKED during bast
        ocfs2: Use compat_ptr in reflink_arguments.
        ocfs2/dlm: Handle EAGAIN for compatibility - v2
        ocfs2: Add parenthesis to wrap the check for O_DIRECT.
        ocfs2: Only bug out when page size is larger than cluster size.
        ocfs2: Fix memory overflow in cow_by_page.
        ocfs2/dlm: Print more messages during lock migration
        ocfs2/dlm: Ignore LVBs of locks in the Blocked list
        ocfs2/trivial: Remove trailing whitespaces
        ocfs2: fix a misleading variable name
        ocfs2: Sync max_inline_data_with_xattr from tools.
        ocfs2: Fix refcnt leak on ocfs2_fast_follow_link() error path
  3. 08 Feb, 2010 16 commits
  4. 07 Feb, 2010 3 commits
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 · 6339204e
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
        Take ima_file_free() to proper place.
        ima: rename PATH_CHECK to FILE_CHECK
        ima: rename ima_path_check to ima_file_check
        ima: initialize ima before inodes can be allocated
        fix ima breakage
        Take ima_path_check() in nfsd past dentry_open() in nfsd_open()
        freeze_bdev: don't deactivate successfully frozen MS_RDONLY sb
        befs: fix leak
    • Fix race in tty_fasync() properly · 80e1e823
      Linus Torvalds authored
      This reverts commit 70362511 ("tty: fix race in tty_fasync") and
      commit b04da8bf ("fnctl: f_modown should call write_lock_irqsave/
      restore") that tried to fix up some of the fallout but was incomplete.
      
      It turns out that we really cannot hold 'tty->ctrl_lock' over calling
      __f_setown, because not only did that cause problems with interrupt
      disables (which the second commit fixed), it also causes a potential
      ABBA deadlock due to lock ordering.
      
      Thanks to Tetsuo Handa for following up on the issue, and running
      lockdep to show the problem.  It goes roughly like this:
      
       - f_getown gets filp->f_owner.lock for reading without interrupts
         disabled, so an interrupt that happens while that lock is held can
         cause a lockdep chain from f_owner.lock -> sighand->siglock.
      
       - at the same time, the tty->ctrl_lock -> f_owner.lock chain that
         commit 70362511 introduced, together with the pre-existing
         sighand->siglock -> tty->ctrl_lock chain means that we have a lock
         dependency the other way too.
      
      So instead of extending tty->ctrl_lock over the whole __f_setown() call,
      we now just take a reference to the 'pid' structure while holding the
      lock, and then release it after having done the __f_setown.  That still
      guarantees that 'struct pid' won't go away from under us, which is all
      we really ever needed.
      Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
      Acked-by: Américo Wang <xiyou.wangcong@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
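The pattern described in this commit (pin the object under the lock, drop the lock, then make the call that takes other locks) can be shown with a userspace analogue. This is not the kernel code: struct owner stands in for 'struct pid', struct tty_like for the tty, the callback for __f_setown, and every name here is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct owner {
	atomic_int refcount;	/* object is freed when this reaches 0 */
};

struct tty_like {
	pthread_mutex_t ctrl_lock;
	struct owner *pgrp;	/* may be replaced by others under ctrl_lock */
};

typedef int (*setown_fn)(struct owner *o, void *ctx);

/*
 * Hold ctrl_lock only long enough to take a reference, so that the
 * callback (which may take other locks) never runs with ctrl_lock held
 * and cannot create a lock-ordering cycle through it.
 */
int set_owner_pinned(struct tty_like *tty, setown_fn setown, void *ctx)
{
	struct owner *o;
	int rc;

	pthread_mutex_lock(&tty->ctrl_lock);
	o = tty->pgrp;
	assert(o != NULL);
	atomic_fetch_add(&o->refcount, 1);	/* pin */
	pthread_mutex_unlock(&tty->ctrl_lock);

	rc = setown(o, ctx);			/* ctrl_lock is not held here */

	atomic_fetch_sub(&o->refcount, 1);	/* unpin */
	return rc;
}

/* Example callback: records the reference count it observes. */
int record_refcount(struct owner *o, void *ctx)
{
	*(int *)ctx = atomic_load(&o->refcount);
	return 0;
}
```

The reference, not the lock, is what keeps the object alive across the call, which is the only guarantee the caller needed.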
    • Take ima_file_free() to proper place. · 89068c57
      Al Viro authored
      Hooks: Just Say No.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>