1. 17 Jul, 2013 1 commit
  2. 12 Jul, 2013 8 commits
    • bcache: Allocation kthread fixes · 79826c35
      Kent Overstreet authored
      The alloc kthread should've been using try_to_freeze() - and also there
      was the potential for the alloc kthread to get woken up after it had
      shut down, which would have been bad.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      79826c35
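
      A minimal sketch of the pattern the fix points at (hypothetical function
      name, not bcache's actual allocator loop): a freezable kthread should call
      try_to_freeze() in its main loop and re-check kthread_should_stop() before
      sleeping, so a late wakeup after shutdown has nothing left to do.

      #include <linux/kthread.h>
      #include <linux/freezer.h>
      #include <linux/sched.h>

      /* Hypothetical worker loop; illustrates try_to_freeze() plus a stop
       * check before sleeping, not bcache's real allocator thread. */
      static int example_alloc_thread(void *data)
      {
              set_freezable();

              while (!kthread_should_stop()) {
                      try_to_freeze();

                      /* ... invalidate and free up buckets here ... */

                      set_current_state(TASK_INTERRUPTIBLE);
                      if (kthread_should_stop()) {
                              __set_current_state(TASK_RUNNING);
                              break;
                      }
                      schedule();
              }
              return 0;
      }
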
    • bcache: Fix GC_SECTORS_USED() calculation · 29ebf465
      Kent Overstreet authored
      Part of the job of garbage collection is to add up however many sectors
      of live data it finds in each bucket, but that doesn't work very well if
      it doesn't reset GC_SECTORS_USED() when it starts. Whoops.
      
      This wouldn't have broken anything horribly, but allocation tries to
      preferentially reclaim buckets that are mostly empty and that's not
      gonna work with an incorrect GC_SECTORS_USED() value.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
      29ebf465
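
      A hedged illustration of the class of fix (made-up structure and field
      names, not bcache's GC_SECTORS_USED() macros): the per-bucket live-sector
      counter must be zeroed before garbage collection starts accumulating into it.

      #include <linux/types.h>

      /* Hypothetical sketch: stale totals from the previous GC pass would
       * otherwise make a mostly-empty bucket look full, defeating the
       * allocator's preference for reclaiming empty buckets. */
      struct example_bucket {
              unsigned int    gc_sectors_used;        /* live sectors found by GC */
      };

      static void example_gc_reset(struct example_bucket *buckets, size_t nbuckets)
      {
              size_t i;

              for (i = 0; i < nbuckets; i++)
                      buckets[i].gc_sectors_used = 0;
      }
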
    • bcache: Journal replay fix · faa56736
      Kent Overstreet authored
      The journal replay code starts by finding something that looks like a
      valid journal entry, then it does a binary search over the unchecked
      region of the journal for the journal entries with the highest sequence
      numbers.
      
      Trouble is, the logic was wrong - journal_read_bucket() returns true if
      it found journal entries we need, but if the range of journal entries
      we're looking for loops around the end of the journal - in that case
      journal_read_bucket() could return true when it hadn't found the highest
      sequence number we'd seen yet, and in that case the binary search did
      the wrong thing. Whoops.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
      faa56736
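
      A generic, hedged illustration of the subtlety (plain C, not bcache's
      journal code): when sequence numbers wrap around a circular journal,
      locating the newest entry by binary search is really a search for the
      rotation point, and the predicate must answer "is this bucket on the new
      side of the wrap?", not merely "did this bucket contain entries we want?".

      #include <linux/types.h>

      /* Hypothetical sketch: seq[i] is the highest sequence number found in
       * journal bucket i.  The values increase monotonically but are rotated
       * because the journal wraps.  Return the bucket holding the newest one. */
      static size_t example_newest_bucket(const u64 *seq, size_t n)
      {
              size_t lo = 0, hi = n - 1;

              while (lo < hi) {
                      size_t mid = lo + (hi - lo) / 2;

                      if (seq[mid] > seq[hi])
                              lo = mid + 1;   /* mid is before the wrap; the oldest entry is to its right */
                      else
                              hi = mid;       /* the oldest entry is at mid or to its left */
              }
              /* lo now indexes the oldest entry; the newest sits just before it,
               * or at the very end if the journal has not wrapped at all. */
              return lo ? lo - 1 : n - 1;
      }
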
    • bcache: Shutdown fix · 5caa52af
      Kent Overstreet authored
      Stopping a cache set is supposed to make it stop attached backing
      devices, but somewhere along the way that code got lost. Fixing this
      mainly has the effect of fixing our reboot notifier.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
      5caa52af
    • bcache: Fix a sysfs splat on shutdown · c9502ea4
      Kent Overstreet authored
      If we stopped a bcache device when we were already detaching (or
      something like that), bcache_device_unlink() would try to remove a
      symlink from sysfs that was already gone because the bcache dev kobject
      had already been removed from sysfs.
      
      So keep track of whether we've removed stuff from sysfs.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
      c9502ea4
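
      A hedged sketch of the guard described above (hypothetical structure, flag
      and symlink name): record whether the sysfs entries are still present and
      skip the unlink once they are gone, so a second call is a harmless no-op.

      #include <linux/kobject.h>
      #include <linux/sysfs.h>

      struct example_dev {
              struct kobject  *parent;                /* where the symlink lives */
              bool            sysfs_registered;       /* hypothetical "still in sysfs" flag */
      };

      /* Hypothetical sketch: only remove the symlink while the entries still
       * exist; stopping while already detaching must not trip a sysfs warning. */
      static void example_device_unlink(struct example_dev *d)
      {
              if (!d->sysfs_registered)
                      return;
              d->sysfs_registered = false;
              sysfs_remove_link(d->parent, "example_link");
      }
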
    • bcache: Advertise that flushes are supported · 54d12f2b
      Kent Overstreet authored
      Whoops - bcache's flush/FUA was mostly correct, but flushes get filtered
      out unless we say we support them...
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
      54d12f2b
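
      For kernels of this vintage the driver has to opt in before the block
      layer will pass flushes down; a minimal sketch of that advertisement
      (the setup function name is hypothetical):

      #include <linux/blkdev.h>

      /* Hypothetical sketch: declare that this queue handles cache flushes
       * and FUA writes; without this the block layer filters them out before
       * the driver ever sees them. */
      static void example_setup_queue(struct request_queue *q)
      {
              blk_queue_flush(q, REQ_FLUSH | REQ_FUA);
      }
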
    • bcache: check for allocation failures · d2a65ce2
      Dan Carpenter authored
      There is a missing NULL check after the kzalloc().
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      d2a65ce2
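
      The fix is the usual missing-check pattern; a minimal generic sketch
      (hypothetical structure and function names):

      #include <linux/errno.h>
      #include <linux/slab.h>

      struct example_ctx {
              int placeholder;
      };

      /* Hypothetical sketch: kzalloc() can return NULL, so fail with -ENOMEM
       * instead of dereferencing the pointer later. */
      static int example_alloc(struct example_ctx **out)
      {
              struct example_ctx *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);

              if (!ctx)
                      return -ENOMEM;

              *out = ctx;
              return 0;
      }
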
    • bcache: Fix a dumb race · 6aa8f1a6
      Kent Overstreet authored
      In the far-too-complicated closure code - closures can have destructors,
      for probably dubious reasons; they get run after the closure is no
      longer waiting on anything but before dropping the parent ref, intended
      just for freeing whatever memory the closure is embedded in.
      
      Trouble is, when remaining goes to 0 and we've got nothing more to run -
      we also have to unlock the closure, setting remaining to -1. If there's
      a destructor, that unlock isn't doing anything - nobody could be trying
      to lock it if we're about to free it - but if the unlock _is_ needed...
      that check for a destructor was racy. Argh.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
      6aa8f1a6
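
      A generic, hedged illustration of the ordering point (made-up names, far
      simpler than the real closure code): anything you still need from the
      object, such as its destructor pointer, must be looked at before the store
      that unlocks it, because once it is unlocked another thread may lock,
      reuse or free it.

      #include <linux/atomic.h>

      struct example_obj {
              atomic_t        remaining;      /* -1 means unlocked */
              void            (*destructor)(struct example_obj *);
      };

      /* Hypothetical sketch: check the destructor *before* unlocking.  If a
       * destructor exists nobody else may touch the object, so just destroy it;
       * otherwise publish the unlock.  Reading obj->destructor after the unlock
       * would race with whoever grabs (and possibly frees) obj. */
      static void example_done(struct example_obj *obj)
      {
              void (*destructor)(struct example_obj *) = obj->destructor;

              if (destructor) {
                      destructor(obj);
                      return;
              }

              atomic_set(&obj->remaining, -1);        /* unlock */
      }
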
  3. 02 Jul, 2013 2 commits
  4. 01 Jul, 2013 5 commits
  5. 28 Jun, 2013 7 commits
    • drbd: Allow online change of al-stripes and al-stripe-size · d752b269
      Philipp Reisner authored
      Allow changing the AL layout with a resize operation. For that,
      the resize command gets two new fields: al_stripes and al_stripe_size.
      
      In order to make the operation crash safe:
      1) Lock out all IO and MD-IO
      2) Write the super block with MDF_PRIMARY_IND clear
      3) Write the bitmap to the new location (all zeros, since
         we only allow this while connected)
      4) Initialize the new AL-area
      5) Write the super block with the restored MDF_PRIMARY_IND.
      6) Unfreeze all IO
      
      Since the AL layout has no influence on the protocol, this operation
      needs to be performed on both sides of a resource (if intended).
      Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d752b269
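
      A hedged pseudo-C outline of the sequence above (every helper below is
      made up, not DRBD's internals); the point is that MDF_PRIMARY_IND is
      cleared before the on-disk layout is touched and only restored once the
      new AL area and bitmap are in place, so a crash in between leaves
      metadata that is recognizably incomplete.

      #include <linux/types.h>

      struct example_device { int placeholder; };

      /* Made-up stubs purely to show the ordering. */
      static void example_suspend_io_and_md_io(struct example_device *d) { }
      static void example_write_super(struct example_device *d, bool primary_ind) { }
      static void example_write_bitmap_at_new_location(struct example_device *d) { }
      static void example_init_new_al_area(struct example_device *d,
                                           unsigned int stripes, unsigned int stripe_size) { }
      static void example_resume_io(struct example_device *d) { }

      static int example_resize_al(struct example_device *dev,
                                   unsigned int al_stripes, unsigned int al_stripe_size)
      {
              example_suspend_io_and_md_io(dev);              /* 1) lock out all IO and MD-IO */
              example_write_super(dev, false);                /* 2) super block with MDF_PRIMARY_IND clear */
              example_write_bitmap_at_new_location(dev);      /* 3) bitmap (all zeros) at the new location */
              example_init_new_al_area(dev, al_stripes, al_stripe_size); /* 4) initialize the new AL area */
              example_write_super(dev, true);                 /* 5) restore MDF_PRIMARY_IND */
              example_resume_io(dev);                         /* 6) unfreeze all IO */
              return 0;
      }
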
    • e96c9633
      Philipp Reisner authored
    • drbd: Ignore the exit code of a fence-peer handler if it returns too late · 28e448bb
      Philipp Reisner authored
      In case the connection was established and lost again before
      a fence-peer handler returns, ignore the exit code of this
      instance. (And use the exit code of the later-started instance.)
      Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      28e448bb
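
      A hedged sketch of one way to express "the handler returned too late"
      (hypothetical names, not DRBD's implementation): snapshot a
      connection-establishment counter before running the handler and discard
      its exit code if the counter moved in the meantime.

      #include <linux/errno.h>

      struct example_conn {
              unsigned int connect_cnt;       /* bumped each time the connection is (re)established */
      };

      /* Stand-in for forking the user-space fence-peer helper. */
      static int example_run_fence_peer_handler(struct example_conn *c)
      {
              return 0;
      }

      /* Hypothetical sketch: if the connection was lost and re-established
       * while the handler ran, its verdict describes a peer state that no
       * longer exists, so ignore it and let the newer instance decide. */
      static int example_fence_peer(struct example_conn *c)
      {
              unsigned int cnt_before = c->connect_cnt;
              int exit_code = example_run_fence_peer_handler(c);

              if (c->connect_cnt != cnt_before)
                      return -EAGAIN;         /* too late: exit code ignored */

              return exit_code;
      }
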
    • Andreas Gruenbacher authored
    • drbd: fix error return code in drbd_init() · 6110d70b
      Wei Yongjun authored
      Fix to return a negative error code from the error handling
      case instead of 0, as returned elsewhere in this function.
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6110d70b
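
      The bug class is the familiar "err left at 0 on a failure path"; a
      minimal generic sketch (hypothetical names, not drbd_init() itself):

      #include <linux/errno.h>
      #include <linux/slab.h>

      struct example_state { int placeholder; };

      static struct example_state *example_global;

      /* Hypothetical sketch: every failure path must set a negative errno
       * before jumping to the cleanup label, otherwise the caller sees 0
       * and assumes initialization succeeded. */
      static int example_init(void)
      {
              int err;

              example_global = kzalloc(sizeof(*example_global), GFP_KERNEL);
              if (!example_global) {
                      err = -ENOMEM;          /* the fix: don't fall through with err == 0 */
                      goto fail;
              }
              return 0;
      fail:
              return err;
      }
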
    • drbd: Do not sleep inside rcu · 26ea8f92
      Andreas Gruenbacher authored
      Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      26ea8f92
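
      The commit body is empty, but the title names a well-known rule; a
      generic hedged sketch of the usual remedy (all names made up): do any
      allocation that may sleep before taking the RCU read lock, or use
      GFP_ATOMIC inside it.

      #include <linux/rcupdate.h>
      #include <linux/slab.h>
      #include <linux/errno.h>

      struct example_cfg {
              int value;
      };

      /* Hypothetical sketch: GFP_KERNEL may sleep, so allocate the copy
       * before rcu_read_lock(); nothing between rcu_read_lock() and
       * rcu_read_unlock() is allowed to sleep. */
      static int example_copy_cfg(struct example_cfg __rcu **src,
                                  struct example_cfg **out)
      {
              struct example_cfg *copy = kzalloc(sizeof(*copy), GFP_KERNEL);
              struct example_cfg *cur;

              if (!copy)
                      return -ENOMEM;

              rcu_read_lock();
              cur = rcu_dereference(*src);
              if (cur)
                      *copy = *cur;           /* no sleeping in here */
              rcu_read_unlock();

              *out = copy;
              return 0;
      }
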
    • Merge branch 'stable/for-jens-3.10' of... · f35546e0
      Jens Axboe authored
      Merge branch 'stable/for-jens-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.11/drivers
      
      Konrad writes:
      
      It has the 'feature-max-indirect-segments' implemented in both backend
      and frontend. The current problem with the backend and frontend is that the
      segment size is limited to 11 pages. It means we can at most squeeze in 44kB per
      request. The ring can hold 32 (next power of two below 36) requests, meaning we
      can have at most 1.4MB of outstanding requests. Nowadays that is not enough.
      
      The problem in the past was addressed in two ways - but neither one went upstream.
      The first solution to this, proposed by Justin from Spectralogic, was to negotiate
      the segment size. This means that the ‘struct blkif_sring_entry’ becomes variable in size.
      It can expand from 112 bytes (covering 11 pages of data - 44kB) to 1580 bytes
      (256 pages of data - so 1MB). It is a simple extension: the array in the request
      just grows from 11 entries to a negotiated variable size. But it had limits: this extension
      still limits the number of segments per request to 255 (as the total number must be
      specified in the request, which only has an 8-bit field for that purpose).
      
      The other solution (from Intel - Ronghui) was to create one extra ring that only has
      ‘struct blkif_request_segment’ entries in it. The ‘struct blkif_request’ would be changed to have
      an index into said ‘segment ring’. There is only one segment ring. This means that the size of
      the initial ring stays the same. The requests would point into the segment ring and enumerate
      how many of the indexes they want to use. The limit is of course the size of the segment ring.
      If one assumes a one-page segment ring, this means we can cover ~4MB in one request.
      
      Those patches were posted as RFC and the author never followed up on the ideas on changing
      it to be a bit more flexible.
      
      There is yet another mechanism that could be employed (which these patches implement) - and it
      borrows from the VirtIO protocol. And that is the ‘indirect descriptors’. This is very similar to
      what Intel suggests, but with a twist. The twist is to negotiate how many of these
      'segment' pages (aka indirect descriptor pages) we want to support (in reality we negotiate
      how many entries in the segment we want to cover, and we cap the number if it is
      bigger than the segment size).
      
      This means that with the existing 36 slots in the ring (single page) we can cover:
      32 slots * (512 entries per blkif_request_indirect * 4096 bytes) ~= 64MB. Since we have ample space
      in the blkif_request_indirect to span more than one indirect page, that number (64MB)
      can also be multiplied by eight = 512MB.
      
      Roger Pau Monne took the idea and implemented it in these patches. They work
      great and the corner cases (migration between backends with and without this extension)
      work nicely. The backend right now has a limit on how many indirect entries
      it can handle: one indirect page, and at maximum 256 entries (out of 512 - so 50% of the page
      is used). That comes out to 32 slots * 256 entries in an indirect page * 1 indirect page
      per request * 4096 bytes = 32MB.
      
      This is a conservative number that can change in the future. Right now it strikes
      a good balance between excellent performance, memory usage in the backend, and
      the needs of many guests.
      
      In the patchset there is also the split of the blkback structure to be per-VBD.
      This means that the spinlock contention we had with many guests trying to do I/O and
      all the blkback threads hitting the same lock has been eliminated.
      
      Also there are bug-fixes to deal with oddly sized sectors, insane amounts on
      the ring, and also a security fix (posted earlier).
      f35546e0
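
      A hedged sketch of the negotiation and sizing arithmetic described above
      (the names and constants below are illustrative only; the real code lives
      in xen-blkfront and xen-blkback): each side caps the other's advertised
      maximum at what it is willing to handle, and per-request coverage is
      simply entries * page size.

      #include <linux/kernel.h>
      #include <linux/mm.h>
      #include <linux/types.h>

      /* Hypothetical cap mirroring the 256-of-512 limit mentioned above:
       * 256 indirect entries * 4096 bytes = 1MB per request, and 32 requests
       * in flight * 1MB = 32MB outstanding. */
      #define EXAMPLE_MAX_INDIRECT_SEGMENTS   256

      static unsigned int example_negotiate_segments(unsigned int advertised_max)
      {
              return min_t(unsigned int, advertised_max, EXAMPLE_MAX_INDIRECT_SEGMENTS);
      }

      static u64 example_bytes_per_request(unsigned int segments)
      {
              return (u64)segments * PAGE_SIZE;
      }
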
  6. 27 Jun, 2013 15 commits
  7. 25 Jun, 2013 1 commit
  8. 22 Jun, 2013 1 commit