1. 18 Jun, 2003 14 commits
    • [PATCH] JBD: Implement b_next_transaction locking rules · e87dd8c3
      Andrew Morton authored
      Go through all b_next_transaction instances, implement locking rules.
      (Nothing to do here - b_transaction locking covered it)
      e87dd8c3
    • [PATCH] JBD: implement b_transaction locking rules · e821ceb2
      Andrew Morton authored
      Go through all use of b_transaction and implement the rules.
      
      Fairly straightforward.
      e821ceb2
    • [PATCH] JBD: implement b_committed_data locking · b07da5e5
      Andrew Morton authored
      Implement the designed locking schema around the
      journal_head.b_committed_data field.
      b07da5e5
    • [PATCH] JBD: Finish protection of journal_head.b_frozen_data · 990aef1a
      Andrew Morton authored
      We now start to move across the JBD data structure's fields, from "innermost"
      and outwards.
      
      Start with journal_head.b_frozen_data, because the locking for this field was
      partially implemented in jbd-010-b_committed_data-race-fix.patch.
      
      It is protected by jbd_lock_bh_state().  We keep the lock_journal() and
      spin_lock(&journal_datalist_lock) calls in place.  Later,
      spin_lock(&journal_datalist_lock) is replaced by
      spin_lock(&journal->j_list_lock).
      
      Of course, this completion of the locking around b_frozen_data also puts a
      lot of the locking for other fields in place.
      990aef1a
    • [PATCH] JBD: rename journal_unlock_journal_head to · eacf9510
      Andrew Morton authored
      journal_unlock_journal_head() is misnamed: what it does is to drop a ref on
      the journal_head and free it if that ref fell to zero.  It doesn't actually
      unlock anything.
      
      Rename it to journal_put_journal_head().
      eacf9510
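The "put" semantics described above (drop a reference, free on zero) can be sketched in user-space C. This is a minimal single-threaded illustration, not the kernel code: `demo_buf`, `demo_jhead`, `demo_get_journal_head()` and `demo_put_journal_head()` are hypothetical names, and the real JBD code would have to hold a lock around the refcount.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for journal_head and buffer_head. */
struct demo_jhead {
    int b_jcount;              /* reference count on this journal_head */
};

struct demo_buf {
    struct demo_jhead *jh;     /* attached journal_head, or NULL */
};

/* Attach a journal_head if there is none yet, then take a reference. */
static struct demo_jhead *demo_get_journal_head(struct demo_buf *bh)
{
    if (bh->jh == NULL) {
        bh->jh = calloc(1, sizeof(*bh->jh));
        if (bh->jh == NULL)
            return NULL;
    }
    bh->jh->b_jcount++;
    return bh->jh;
}

/*
 * Drop one reference; detach and free the journal_head when the count
 * reaches zero.  Returns 1 if it was freed, 0 otherwise.  Nothing is
 * "unlocked" here, which is why "put" is the better name.
 */
static int demo_put_journal_head(struct demo_buf *bh)
{
    struct demo_jhead *jh = bh->jh;

    assert(jh != NULL && jh->b_jcount > 0);
    if (--jh->b_jcount > 0)
        return 0;
    bh->jh = NULL;
    free(jh);
    return 1;
}
```

Every successful `demo_get_journal_head()` call is meant to be paired with exactly one `demo_put_journal_head()` call.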
    • [PATCH] JBD: fine-grain journal_add_journal_head locking · 1c69516f
      Andrew Morton authored
      buffer_heads and journal_heads are joined at the hip.  We need a lock to
      protect the joint and its refcounts.
      
      JBD is currently using a global spinlock for that.  Change it to use one bit
      in bh->b_state.
      1c69516f
    • [PATCH] JBD: remove jh_splice_lock · 6fe2ab38
      Andrew Morton authored
      This was a strange spinlock which was designed to prevent another CPU from
      ripping a buffer's journal_head away while this CPU was inspecting its state.
      
      Really, we don't need it - we can inspect that state directly from bh->b_state.
      
      So kill it off, along with a few things which used it which are themselves
      not actually used any more.
      6fe2ab38
    • [PATCH] JBD: plan JBD locking schema · 13d8498a
      Andrew Morton authored
      This is the start of the JBD locking rework.
      
      The aims of all this are to remove all lock_kernel() calls from JBD, to
      remove all lock_journal() calls (the context switch rate is astonishing when
      the lock_kernel()s are removed) and to remove all sleep_on() instances.
      
      
      
      
      The strategy which is taken is:
      
      a) Define the locking schema (this patch)
      
      b) Work through every JBD data structure and implement its locking fully,
         according to the above schema.  We work from "innermost" data structures
         and outwards.
      
      It isn't guaranteed that the filesystem will work very well at all stages of
      this patch series.
      
      
      
      In this patch:
      
      
      Add commentary and various locks to jbd.h describing the locking scheme which
      is about to be implemented.
      
      Initialise the new locks.
      
      Coding-style goodness in jbd.h
      13d8498a
    • [PATCH] JBD: fix race over access to b_committed_data · 47bb09d8
      Andrew Morton authored
      From: Alex Tomas <bzzz@tmi.comex.ru>
      
      We have a race wherein the block allocator can decide that
      journal_head.b_committed_data is present and then will use it.  But kjournald
      can concurrently free it and set the pointer to NULL.  It goes oops.
      
      We introduce per-buffer_head "spinlocking" based on a bit in b_state.  To do
      this we abstract out pte_chain_lock() and reuse the implementation.
      
      The bit-based spinlocking is pretty inefficient CPU-wise (hence the warning
      in there) and we may move this to a hashed spinlock later.
      47bb09d8
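The idea of a per-buffer "spinlock" built from one bit of a state word can be sketched with C11 atomics. This is an illustration under stated assumptions, not the kernel implementation: `demo_bh`, `DEMO_BH_STATE_LOCK` and the `demo_*` functions are hypothetical, and the bit position is chosen arbitrarily.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for buffer_head: only the state word matters here. */
struct demo_bh {
    atomic_ulong b_state;      /* flag bits; one of them doubles as a lock */
    void *committed_data;      /* field guarded by the lock bit */
};

#define DEMO_BH_STATE_LOCK (1UL << 0)   /* assumed bit position */

/* Spin until this caller is the one that flips the lock bit from 0 to 1. */
static void demo_lock_bh_state(struct demo_bh *bh)
{
    while (atomic_fetch_or_explicit(&bh->b_state, DEMO_BH_STATE_LOCK,
                                    memory_order_acquire) & DEMO_BH_STATE_LOCK)
        ;   /* busy-wait: this is the CPU cost the commit message warns about */
}

/* Clear the lock bit; the caller must currently hold it. */
static void demo_unlock_bh_state(struct demo_bh *bh)
{
    unsigned long old = atomic_fetch_and_explicit(&bh->b_state,
                                                  ~DEMO_BH_STATE_LOCK,
                                                  memory_order_release);
    assert(old & DEMO_BH_STATE_LOCK);
    (void)old;   /* keeps builds with NDEBUG free of unused warnings */
}

/*
 * Reader side of the race: the pointer is tested under the lock, so a
 * concurrent freer that also takes the lock cannot clear it in between.
 */
static int demo_has_committed_data(struct demo_bh *bh)
{
    int present;

    demo_lock_bh_state(bh);
    present = bh->committed_data != NULL;
    demo_unlock_bh_state(bh);
    return present;
}
```

In a real fix the caller would also use the data while still holding the lock; the sketch only shows the check.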
    • [PATCH] ext3: scalable counters and locks · 17aff938
      Andrew Morton authored
      From: Alex Tomas <bzzz@tmi.comex.ru>
      
      This is a port from ext2 of the fuzzy counters (for Orlov allocator
      heuristics) and the hashed spinlocking (for the inode and block allocators).
      17aff938
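One common way to build a "fuzzy" counter is to batch updates in per-CPU slots and fold them into a shared total only occasionally. The sketch below assumes that design; it is not taken from the patch. `demo_fuzzy_counter`, `DEMO_NR_CPUS` and `DEMO_BATCH` are hypothetical, and the CPU is passed in explicitly so the example stays single-threaded.

```c
#include <assert.h>

#define DEMO_NR_CPUS 4   /* assumed CPU count */
#define DEMO_BATCH   8   /* assumed flush threshold */

struct demo_fuzzy_counter {
    long global;                 /* shared total; real code needs a lock here */
    long local[DEMO_NR_CPUS];    /* per-CPU deltas not yet folded in */
};

/* Add n on behalf of one CPU; touch the shared total only on overflow. */
static void demo_counter_add(struct demo_fuzzy_counter *c, int cpu, long n)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    c->local[cpu] += n;
    if (c->local[cpu] >= DEMO_BATCH || c->local[cpu] <= -DEMO_BATCH) {
        c->global += c->local[cpu];
        c->local[cpu] = 0;
    }
}

/* Cheap read: ignores pending per-CPU deltas, so it may be stale. */
static long demo_counter_read_fuzzy(const struct demo_fuzzy_counter *c)
{
    return c->global;
}

/* Expensive read: walks every slot to get the exact value. */
static long demo_counter_read_exact(const struct demo_fuzzy_counter *c)
{
    long sum = c->global;

    for (int i = 0; i < DEMO_NR_CPUS; i++)
        sum += c->local[i];
    return sum;
}
```

The fuzzy read is good enough for allocator heuristics, while the exact read suits rare paths where accuracy matters.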
    • [PATCH] ext3: concurrent block/inode allocation · c12b9866
      Andrew Morton authored
      From: Alex Tomas <bzzz@tmi.comex.ru>
      
      
      This patch weans ext3 off lock_super()-based protection for the inode and
      block allocators.
      
      It's basically the same as the ext2 changes.
      
      
      1) each group has its own spinlock, which is used for group counter
         modifications
      
      2) sb->s_free_blocks_count isn't used any more.  ext2_statfs() and
         find_group_orlov() loop over groups to count free blocks
      
      3) sb->s_free_blocks_count is recalculated at mount/umount/sync_super time
         in order to check consistency and to avoid fsck warnings
      
      4) reserved blocks are distributed over last groups
      
      5) ext3_new_block() tries to use non-reserved blocks and if it fails then
         tries to use reserved blocks
      
      6) ext3_new_block() and ext3_free_blocks() do not modify sb->s_free_blocks,
         therefore they do not call mark_buffer_dirty() for the superblock's
         buffer_head.  This should reduce I/O a bit
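Points 1 and 2 of the list above can be sketched as follows: each group carries its own lock and free-block counter, and a statfs-style total is computed by walking the groups rather than reading a single superblock field. All names here (`demo_group`, `demo_alloc_block()`, `demo_count_free_blocks()`) are hypothetical, and a C11 `atomic_flag` stands in for the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>

struct demo_group {
    atomic_flag lock;             /* per-group spinlock */
    unsigned long total_blocks;   /* fixed size of the group */
    unsigned long free_blocks;    /* protected by lock */
};

static void demo_group_init(struct demo_group *g, unsigned long total)
{
    atomic_flag_clear(&g->lock);
    g->total_blocks = total;
    g->free_blocks = total;
}

static void demo_group_lock(struct demo_group *g)
{
    while (atomic_flag_test_and_set_explicit(&g->lock, memory_order_acquire))
        ;   /* spin */
}

static void demo_group_unlock(struct demo_group *g)
{
    atomic_flag_clear_explicit(&g->lock, memory_order_release);
}

/* Take one block from a group: 0 on success, -1 if the group is full. */
static int demo_alloc_block(struct demo_group *g)
{
    int ret = -1;

    demo_group_lock(g);
    if (g->free_blocks > 0) {
        g->free_blocks--;
        ret = 0;
    }
    demo_group_unlock(g);
    return ret;
}

/* Give one block back to a group. */
static void demo_free_block(struct demo_group *g)
{
    demo_group_lock(g);
    assert(g->free_blocks < g->total_blocks);
    g->free_blocks++;
    demo_group_unlock(g);
}

/* statfs-style total: sum the per-group counters, one lock at a time. */
static unsigned long demo_count_free_blocks(struct demo_group *groups, int n)
{
    unsigned long total = 0;

    for (int i = 0; i < n; i++) {
        demo_group_lock(&groups[i]);
        total += groups[i].free_blocks;
        demo_group_unlock(&groups[i]);
    }
    return total;
}
```

Allocations in different groups never contend on the same lock, which is the point of the scheme. The summed total is only a snapshot, since groups can change while the loop runs.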
      
      
      Also fix orlov allocator boundary case:
      
      In the interests of SMP scalability the ext2 free blocks and free inodes
      counters are "approximate".  But there is a piece of code in the Orlov
      allocator which fails due to boundary conditions on really small
      filesystems.
      
      Fix that up via a final allocation pass which simply uses first-fit for
      allocation of a directory inode.
      c12b9866
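The first-fit fallback can be illustrated with a toy group picker. The heuristic pass below is a deliberately simplified stand-in for the Orlov logic (it only compares each group against an average derived from an approximate total), so treat `demo_pick_group()` and its parameters as assumptions made for illustration.

```c
#include <assert.h>

/*
 * Pick a group for a new directory inode.  Returns the group index,
 * or -1 if no group has a free inode.
 *
 * free_inodes[] holds the exact per-group counts, while approx_total
 * is an approximate filesystem-wide count that may be stale.
 */
static int demo_pick_group(const unsigned *free_inodes, int ngroups,
                           unsigned approx_total)
{
    unsigned avg;

    assert(ngroups > 0);
    avg = approx_total / (unsigned)ngroups;

    /* Heuristic pass: prefer a group with at least the average count. */
    for (int g = 0; g < ngroups; g++)
        if (free_inodes[g] > 0 && free_inodes[g] >= avg)
            return g;

    /*
     * Final pass: plain first-fit.  On a tiny filesystem a stale total
     * can inflate avg so that no group passes the test above, even
     * though free inodes exist.
     */
    for (int g = 0; g < ngroups; g++)
        if (free_inodes[g] > 0)
            return g;

    return -1;
}
```

With per-group counts `{0, 1, 0}` and a stale total of 9, the heuristic pass finds nothing and the first-fit pass returns group 1.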
    • [PATCH] JBD: journal_get_write_access() speedup · 78f2f471
      Andrew Morton authored
      Move some lock_kernel() calls from the caller to the callee, reducing
      holdtimes.
      78f2f471
    • [PATCH] ext3: move lock_kernel() down into the JBD layer. · 3307fbd1
      Andrew Morton authored
      This is the start of the ext3 scalability rework.  It basically comes in two
      halves:
      
      - ext3 BKL/lock_super removal and scalable inode/block allocators
      
      - JBD locking rework.
      
      The ext3 scalability work was completed a couple of months ago.
      
      The JBD rework has been stable for a couple of weeks now.  My gut feeling is
      that there should be one, maybe two bugs left in it, but no problems have
      been discovered...
      
      
      Performance-wise, throughput is increased by up to 2x on dual CPU.  10x on
      16-way has been measured.  Given that current ext3 is able to chew two whole
      CPUs spinning on locks on a 4-way, that wasn't especially surprising.
      
      These patches were prepared by Alex Tomas <bzzz@tmi.comex.ru> and myself.
      
      
      First patch: ext3 lock_kernel() removal.
      
      The only reason ext3 takes lock_kernel() is that it is required by the
      JBD API.
      
      The patch removes the lock_kernel() calls from ext3 and pushes them down
      into JBD itself.
      3307fbd1
    • Merge http://lia64.bkbits.net/to-linus-2.5 · 0d0d8534
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
      0d0d8534
  2. 17 Jun, 2003 26 commits