1. 02 May, 2013 40 commits
    • rbd: remove parent devices on probe error · 2e93bf9e
      Alex Elder authored
      When an error occurs while finishing probing a device it is assumed
      that parent devices get cleaned up when deleting a device.  They
      don't.  Add a call to clean them up.  Note that this means the
      parent spec will already be cleaned up so it doesn't have to be
      in one of the rbd_add() error paths.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: fix rbd_dev_remove_parent() · ad945fc1
      Alex Elder authored
      In certain error paths, it is possible for an rbd device to have a
      parent spec but no parent rbd_dev.  In rbd_dev_remove_parent() use
      the parent field rather than parent_spec in determining whether to
      try to remove any parent devices.  Use assertions to indicate that
      any non-null parent pointer has parent_spec associated with it.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: kill __rbd_remove() · b480815a
      Alex Elder authored
      The function __rbd_remove() is used in two spots, and it's fairly
      simple.  It combines cleanup of part of the ceph-side state as well
      as cleaning up the Linux-side state.  Just open code it in the two
      callers and eliminate the function.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: set mapping info earlier · d1cf5788
      Alex Elder authored
      Set the mapping size and features earlier in rbd_dev_probe_finish().
      
      Define rbd_dev_mapping_clear() as an inverse for setting those
      fields, and use it both in error handling in rbd_dev_image_probe()
      and in the final cleanup in rbd_dev_release().  Change the name
      of rbd_dev_set_mapping() to rbd_dev_mapping_set().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: encapsulate removing parent devices · 05a46afd
      Alex Elder authored
      Encapsulate the code that removes an rbd device's parent images into
      a new function, rbd_dev_remove_parent().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: encapsulate probing for parent devices · 124afba2
      Alex Elder authored
      Encapsulate the code that probes for an rbd device's parent images
      into a new function, rbd_dev_probe_parent().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: defer setting disk capacity · b5156e76
      Alex Elder authored
      Don't set the disk capacity until right before we announce the
      device as available for use.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: only set device exists flag when ready · 129b79d4
      Alex Elder authored
      Hold off setting the EXISTS rbd device flag until just before we
      announce the disk as available for use.  There's no point in doing
      so any earlier than that, and at that point the device truly is
      fully set up and ready to use.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: fix up some sysfs stuff · fc71d833
      Alex Elder authored
      This just tweaks a few things in the routines that implement
      rbd sysfs files.
      
      All of the entries for an rbd device in /sys/bus/rbd/devices/<id>/
      will represent information whose valid values are known by the time
      they are accessible.
      
      Right now we get the size of the mapped image by a call to
      get_capacity().  There's no need to do this, because that will
      return what we last set the capacity to, which is just the size
      recorded for the mapping.  So just show that value instead.
      
      We also get this under protection of the header semaphore, in order
      to provide a precisely correct value.  This isn't really necessary;
      these files are really informational only and it's not necessary to
      be so careful.
      
      Finally, print a special value in case the major device number is
      not recorded.  Right now that won't matter much but soon the parent
      images won't have devices associated with them.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: fix a bug in resizing a mapping · e28626a0
      Alex Elder authored
      When a snapshot context update occurs, rbd_update_mapping_size() is
      called to set the capacity of the disk to record the updated
      size of the image in case it has changed.
      
      There's a bug though.  The mapping size is in units of *bytes*.  The
      code that updates the mapping size field is assigning a value that
      has been scaled down to *sectors*.
      
      Fix that.  Also, check to see if the size has actually changed, and
      don't bother updating things (specifically, calling set_capacity())
      if it has not.
      
      This resolves:
          http://tracker.ceph.com/issues/4833
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
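The bytes-versus-sectors fix described in the commit above can be sketched in user-space C. This is a minimal illustration, not the driver's code: `struct mapping`, `mapping_update_size()`, and the `disk_sectors` field are hypothetical stand-ins, and 512-byte sectors are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9  /* assumption: 512-byte sectors */

/* Hypothetical stand-in for the per-device mapping state. */
struct mapping {
    uint64_t size;          /* mapping size, in bytes */
    uint64_t disk_sectors;  /* capacity last reported to the block layer, in sectors */
};

/*
 * Record a new image size. Returns true if anything changed.
 * The byte count is stored unscaled; only the value handed to the
 * block layer is converted to sectors.
 */
bool mapping_update_size(struct mapping *m, uint64_t new_bytes)
{
    if (m->size == new_bytes)
        return false;       /* unchanged: skip the capacity update */

    m->size = new_bytes;                          /* bytes */
    m->disk_sectors = new_bytes >> SECTOR_SHIFT;  /* sectors */
    return true;
}
```

Keeping the stored size in bytes and scaling only at the point where the capacity is reported avoids mixing the two units in one field.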
    • rbd: refactor rbd_dev_probe_update_spec() · 2e9f7f1c
      Alex Elder authored
      Fairly straightforward refactoring of rbd_dev_probe_update_spec().
      The name is changed to rbd_dev_spec_update().
      
      Rearrange it so nothing gets assigned to the spec until all of the
      names have been successfully acquired.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: rename rbd_dev_probe() · 71f293e2
      Alex Elder authored
      Rename rbd_dev_probe() to be rbd_dev_image_probe().  Its purpose
      will eventually be to probe for the existence of a valid rbd image
      for the rbd device--focusing only on the ceph side and not the Linux
      device side of initialization.
      
      For now the two "sides" are not fully separated, and this function
      is still the entry point for initializing the full rbd device.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: make rbd_dev_destroy() match rbd_dev_create() · 9f5dffdc
      Alex Elder authored
      Currently, rbd_dev_destroy() does more than just the inverse of what
      rbd_dev_create() does.  Stop doing that, and move the two extra
      things it does into the three call sites.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: define rbd snap context routines · 468521c1
      Alex Elder authored
      Encapsulate the creation of a snapshot context for rbd in a new
      function rbd_snap_context_create().  Define rbd wrappers for getting
      and dropping references to them once they're created.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: use rbd_warn(), not WARN_ON() · c0cd10db
      Alex Elder authored
      Change some calls to WARN_ON() so they use rbd_warn() instead, so we
      get consistent messaging.  A few remain but they can probably just
      go away eventually.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: move stripe_unit and stripe_count into header · 500d0c0f
      Alex Elder authored
      This commit added fetching of fancy striping parameters:
          09186ddb rbd: get and check striping parameters
      
      They are almost unused, but the two fields storing the information
      really belonged in the rbd_image_header structure.
      
      This patch moves them there.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: make rbd spec names pointer to const · ecb4dc22
      Alex Elder authored
      Make the names and image id in an rbd_spec be pointers to constant
      data.  This required the use of a local variable to hold the
      snapshot name in rbd_add_parse_args() to avoid a warning.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: set snapshot id in rbd_dev_probe_update_spec() · e1d4213f
      Alex Elder authored
      Set the rbd spec's snapshot id for an image getting mapped in
      rbd_dev_probe_update_spec() rather than rbd_dev_set_mapping().
      This is the more logical place for that to happen (even though
      it means we might look up the snapshot by name twice).
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: have snap_by_name() return a snapshot · 8b0241f8
      Alex Elder authored
      A function called snap_by_name() ought to just look up a snapshot by
      name.  It does that, but then it assigns some stuff to the rbd
      device structure as well.
      
      Change the function to do just the lookup, and have the caller do
      the assignments that follow.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: fix image id leak in initial probe · 5655c4d9
      Alex Elder authored
      If a format 2 image id is found for an image being mapped, but the
      subsequent probe of the image fails, rbd_dev_probe() quits without
      freeing the image id.  Fix that.
      
      Also drop a redundant hunk of code in rbd_dev_image_id().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: have rbd_dev_image_id() set format 1 image id · c0fba368
      Alex Elder authored
      Currently, rbd_dev_probe() assumes that any error returned by
      rbd_dev_image_id() is most likely -ENOENT, and responds by
      calling the format 1 probe routine, rbd_dev_v1_probe().  Then,
      at the top of rbd_dev_v1_probe(), an empty string is allocated
      for the image id.
      
      This is sort of unbalanced.  Fix this by having rbd_dev_image_id()
      look for -ENOENT from its "get_id" method call.  If that is seen,
      have it allocate the empty string there rather than depending on
      rbd_dev_v1_probe() to do it.
      
      Given that this is effectively defining the format of the image,
      set rbd_dev->image_format inside rbd_dev_image_id() rather than in
      the format-specific probe routines.
      
      Also drop a redundant hunk of code in rbd_dev_image_id().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: avoid dropping extra reference in rbd_free_disk() · a0cab924
      Alex Elder authored
      I found during some failure injection testing that the call to
      rbd_free_disk() in the error path of rbd_dev_probe_finish() was
      dropping an extra reference to the disk queue.  The problem
      occurred when put_disk tried to drop a reference to the disk's
      queue.  A call to blk_cleanup_queue() just prior to that will have
      also dropped a reference to the queue.
      
      The problem is that the reference dropped by put_disk() is assumed
      to have been taken by add_disk().  Our code has error paths that can
      occur after the disk and its queue are initialized, but before the
      call to add_disk(), and in those paths we won't have that extra
      reference.
      
      The fix is easy though.  In rbd_free_disk() we're already checking
      the disk's GENHD_FL_UP flag.  That flag is an indication that
      add_disk() has been called, so just call blk_cleanup_queue()
      conditional on that flag being set.
      
      This resolves:
          http://tracker.ceph.com/issues/4800
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: use rbd_obj_method_sync() return value · f40eb349
      Alex Elder authored
      Now that rbd_obj_method_sync() returns the number of bytes
      returned by the method call, that value should be used by
      callers to ensure we don't overrun the valid portion of the
      buffer.
      
      Fix the two spots that remained that weren't doing that,
      rbd_dev_image_name() and rbd_dev_v2_snap_name().
      
      Rearrange the error path slightly in rbd_dev_v2_snap_name().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: fix leak of format 2 snapshot names · 6e584f52
      Alex Elder authored
      When the snapshot context for an rbd device gets updated (or the
      initial one is recorded) a list of snapshot structures is created
      to represent them, one entry per snapshot.  Each entry includes a
      dynamically-allocated copy of the snapshot name.
      
      Currently the name is allocated in rbd_snap_create(), as a duplicate
      of the passed-in name.
      
      For format 1 images, the snapshot name provided is just a pointer to
      an existing name.  But for format 2 images, the passed-in name is
      already dynamically allocated, and in the process of duplicating
      it here we are leaking the passed-in name.
      
      Fix this by dynamically allocating the name for format 1 snapshots
      also, and then stop allocating a duplicate in rbd_snap_create().
      
      Change rbd_dev_v1_snap_info() so none of its parameters is
      side-effected unless it's going to return success.
      
      This is part of:
          http://tracker.ceph.com/issues/4803
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: rename __rbd_add_snap_dev() · 6087b51b
      Alex Elder authored
      Rename __rbd_add_snap_dev() to be rbd_snap_create().  We no longer
      have devices for non-mapped snapshots, and we're not actually
      "adding" it to the list in this function, just creating it.
      
      Rename rbd_remove_snap_dev() to be rbd_snap_destroy() for reasons
      similar to the above.  Stop having this function delete the snapshot
      from its list (to be symmetrical with its create counterpart) and do
      that in the caller instead.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: only update values on snap_info success · acb1b6ca
      Alex Elder authored
      Change rbd_dev_v2_snap_info() so it only ever sets values of the
      size and features parameters if looking up the snapshot name was
      successful.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: make snap_size order parameter optional · c86f86e9
      Alex Elder authored
      Only one of the two callers of _rbd_dev_v2_snap_size() needs the
      order value returned.  So make that an optional argument--a null
      pointer if the caller doesn't need it.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
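The optional out-parameter pattern from the commit above looks roughly like the following. The structure and function names are hypothetical; in the driver the values come from an osd class method call rather than a local struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot record used only for this sketch. */
struct snap_info {
    uint8_t  order;  /* log2 of the object size */
    uint64_t size;   /* image size in bytes */
};

/*
 * Report the snapshot size, and the object order only when the
 * caller asks for it by passing a non-null pointer.
 */
int snap_size_get(const struct snap_info *info, uint8_t *order, uint64_t *size)
{
    if (order)
        *order = info->order;
    *size = info->size;
    return 0;
}
```

A caller that only needs the size passes `NULL` for `order` and does not have to declare a throwaway variable.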
    • rbd: fix leak of snapshots during initial probe · 522a0cc0
      Alex Elder authored
      When an rbd image is initially mapped, its snapshot context is
      collected, and then a list of snapshot entries representing the
      snapshots in that context is created.  The list is created using
      rbd_dev_snaps_update().  (This function also supports updating an
      existing snapshot list based on a new snapshot context.)
      
      If an error occurs, updating the list is aborted, and the list is
      currently left as-is, in an inconsistent state.  At that point,
      there may be a partially-constructed list, but the calling functions
      (rbd_dev_probe_finish() from rbd_dev_probe() from rbd_add()) never
      clean them up.  So this constitutes a leak.
      
      A snapshot list that is inconsistent with the current snapshot
      context is of no use, and might even be actively bad.  So rather
      than just having the caller clean it up, have rbd_dev_snaps_update()
      just clear out the entire snapshot list in the event an error
      occurs.
      
      The other place rbd_dev_snaps_update() is used is when a refresh is
      triggered, either because of a watch callback or via a write to the
      /sys/bus/rbd/devices/<id>/refresh interface.  An error while
      updating the snapshots has no substantive effect in either of those
      cases, but one of them issues a warning.  Move that warning to the
      common rbd_dev_refresh() function so it gets issued regardless of
      how it got initiated.
      
      This is part of:
          http://tracker.ceph.com/issues/4803
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
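The "clear the whole list on error" behavior described in the commit above can be modeled with a plain singly linked list. This is a user-space sketch under stated assumptions: the `struct snap` layout and both function names are hypothetical, and a `NULL` name is used to simulate a failed snapshot lookup.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical snapshot entry: one per snapshot, owning its name. */
struct snap {
    struct snap *next;
    char *name;
};

/* Free every entry and leave the list empty. */
void snap_list_clear(struct snap **head)
{
    while (*head) {
        struct snap *s = *head;

        *head = s->next;
        free(s->name);
        free(s);
    }
}

/*
 * Rebuild the list from an array of names. A NULL name stands in for
 * a lookup failure (an assumption made to keep the sketch testable).
 * On any error the whole list is cleared rather than left half-built.
 */
int snap_list_update(struct snap **head, const char **names, size_t count)
{
    size_t i;

    snap_list_clear(head);
    for (i = 0; i < count; i++) {
        struct snap *s;
        size_t len;

        if (!names[i])
            goto out_err;
        s = malloc(sizeof(*s));
        if (!s)
            goto out_err;
        len = strlen(names[i]) + 1;
        s->name = malloc(len);
        if (!s->name) {
            free(s);
            goto out_err;
        }
        memcpy(s->name, names[i], len);
        s->next = *head;    /* prepend; ordering is not modeled here */
        *head = s;
    }
    return 0;

out_err:
    snap_list_clear(head);  /* never leave a partial list behind */
    return -1;
}
```

Because the cleanup lives inside the update function, no caller has to remember to free a partially built list.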
    • rbd: don't create sysfs entries for non-mapped snapshots · 3e83b65b
      Alex Elder authored
      When an rbd image gets mapped a device entry gets created for it
      under /sys/bus/rbd/devices/<id>/.  Inside that directory there are
      sysfs files that contain information about the image: its size,
      feature bits, major device number, and so on.
      
      Additionally, if that image has any snapshots, a device entry gets
      created for each of those as a "child" of the mapped device.  Each
      of these is a subdirectory of the mapped device, and each directory
      contains a few files with information about the snapshot (its
      snapshot id, size, and feature mask).
      
      There is no clear benefit to having those device entries for the
      snapshots.  The information provided via sysfs is of little real
      value--and all of it is available via rbd CLI commands.  If we
      still wanted to see the kernel's view of this information it could
      be done much more simply by including it in a single sysfs file for
      the mapped image.
      
      But there *is* a clear cost to supporting them.  Every time a snapshot
      context changes, these entries need to be updated (deleted snapshots
      removed, new snapshots created).  The rbd driver is notified of
      changes to the snapshot context via callbacks from an osd, and care
      must be taken to coordinate removal of snapshot data structures
      with the possibility of one of these notifications occurring.
      
      Things would be considerably simpler if we just didn't have to
      maintain device entries for the snapshots.
      
      So get rid of them.
      
      The ability to map a snapshot of an rbd image will remain; the only
      thing lost will be the ability to query these sysfs directories for
      information about snapshots of mapped images.
      
      This resolves:
          http://tracker.ceph.com/issues/4796
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: fix byte order mismatch · 9ef1ee5a
      Alex Elder authored
      A WATCH op includes an object version.  The version that's supplied
      is incorrectly byte-swapped in osd_req_op_watch_init() where it's first
      assigned (it's been this way since that code was first added).
      
      The result is that the version sent to the osd is wrong, because
      that value gets byte-swapped again in osd_req_encode_op().  This
      is the source of a sparse warning related to improper byte order in
      the assignment.
      
      The approach of using the version to avoid a race is deprecated
      (see http://tracker.ceph.com/issues/3871), and the watch parameter
      is no longer even examined by the osd.  So fix the assignment in
      osd_req_op_watch_init() so it no longer does the byte swap.
      
      This resolves:
          http://tracker.ceph.com/issues/3847
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: activate support for layered images · 770eba6e
      Alex Elder authored
      Now that we have most everything in place to support layered rbd
      images, enable support for them in the kernel client.  Issue a
      warning to the log that the support is considered experimental
      whenever a format 2 layered image is mapped.
      
      Note that we also have to claim to support the STRIPINGV2 feature,
      due to a mistake in the way the rbd CLI set up those flags.  This
      feature can work if it has the right parameters, and safeguards
      have been put in place to reject those images that do not have
      compatible parameters.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: get and check striping parameters · cc070d59
      Alex Elder authored
      If an rbd format 2 image indicates it supports the STRIPINGV2
      feature we need to find out its stripe unit and stripe count in
      order to know whether we can use it.  We don't yet support fancy
      striping fully, but if the default parameters are used the behavior
      is indistinguishable from non-fancy striping.
      
      This is necessary because some images require the STRIPINGV2 feature
      even if they use the default parameters.  (Which is to say the feature
      bit was erroneously set even if the feature was not used.)
      
      This resolves:
          http://tracker.ceph.com/issues/4709
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
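A check along the lines described in the commit above might look like this. The commit message does not spell out what the "default parameters" are; this sketch assumes they mean a stripe unit equal to the object size and a stripe count of 1. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: "default" striping means the stripe unit equals the
 * object size (1 << obj_order) and the stripe count is 1.
 * obj_order must be less than 64.
 */
bool striping_is_default(uint8_t obj_order, uint64_t stripe_unit,
                         uint64_t stripe_count)
{
    uint64_t obj_size = (uint64_t)1 << obj_order;

    return stripe_unit == obj_size && stripe_count == 1;
}
```

An image that sets the STRIPINGV2 feature bit but passes this check can be treated like a non-striped image; any other combination would be rejected at map time.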
    • rbd: have rbd_obj_method_sync() return transfer count · 57385b51
      Alex Elder authored
      Callers of rbd_obj_method_sync() don't know how many bytes of data
      got returned by the class method call.  As a result, they have been
      assuming enough got returned to decode whatever was expected.
      
      This isn't safe.  We know how many bytes got transferred, so have
      rbd_obj_method_sync() return that amount (rather than just 0) if
      the call is successful.
      
      Change all callers to use this return value to ensure decoding of
      the results is done safely.
      
      On the other hand, most callers of rbd_obj_method_sync() only
      indicate success or failure, so all of *their* callers can simply
      test for non-zero result.
      
      This resolves:
          http://tracker.ceph.com/issues/4773
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
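The caller-side check that the commit above calls for can be sketched as follows. `method_call()` is a hypothetical stand-in for rbd_obj_method_sync() that returns the number of bytes transferred or a negative errno; the choice of -ERANGE for a short reply is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for a class method call: copies up to buf_len
 * bytes of the reply into buf and returns the number of bytes
 * transferred, or a negative errno on failure.
 */
int method_call(const void *reply, size_t reply_len, void *buf, size_t buf_len)
{
    size_t n;

    if (!reply)
        return -EIO;
    n = reply_len < buf_len ? reply_len : buf_len;
    memcpy(buf, reply, n);
    return (int)n;
}

/* Decode a 64-bit little-endian value only if enough bytes arrived. */
int decode_size(const void *reply, size_t reply_len, uint64_t *size)
{
    unsigned char buf[sizeof(uint64_t)];
    uint64_t v = 0;
    int ret;
    int i;

    ret = method_call(reply, reply_len, buf, sizeof(buf));
    if (ret < 0)
        return ret;
    if ((size_t)ret < sizeof(buf))
        return -ERANGE;     /* short reply: do not decode */

    for (i = 7; i >= 0; i--)
        v = (v << 8) | buf[i];
    *size = v;
    return 0;
}
```

The point is that the decode step is guarded by the actual transfer count rather than by the size of the buffer that was offered.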
    • rbd: void data pointers for rbd_obj_method_sync() · 4157976b
      Alex Elder authored
      Make the inbound and outbound data parameters have void rather than
      character type for rbd_obj_method_sync().  This makes it more clear
      they don't expect typed data, and eliminates the need for some silly
      type casts.
      
      One more unrelated change: define the features buffer used in
      _rbd_dev_v2_snap_features() to be a packed data structure.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: give rbd_obj_read_sync() buffer void type · 80ef15bf
      Alex Elder authored
      Make the buf parameter into which the data is to be read have type
      void pointer.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: validate timespec conversions · c3f56102
      Alex Elder authored
      A ceph timespec contains 32-bit unsigned values for its seconds and
      nanoseconds components.  For a standard timespec, both fields are
      signed, and the seconds field is almost surely 64 bits.
      
      Add some explicit casts so the fact that this conversion is taking
      place is obvious.  Also trip a bug if we ever try to put out of
      range (negative or too big) values into a ceph timespec.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
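The range validation described in the commit above can be illustrated in user-space C. The names here are hypothetical, byte order is not modeled, and the sketch returns `false` for out-of-range input where the kernel change trips a bug instead.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Wire-format timestamp: two unsigned 32-bit fields (byte order not modeled). */
struct wire_timespec {
    uint32_t tv_sec;
    uint32_t tv_nsec;
};

/* Reject negative or too-large fields before narrowing them. */
bool timespec_encode(struct wire_timespec *out, const struct timespec *ts)
{
    if (ts->tv_sec < 0 || (uint64_t)ts->tv_sec > UINT32_MAX)
        return false;
    if (ts->tv_nsec < 0 || (uint64_t)ts->tv_nsec > UINT32_MAX)
        return false;

    out->tv_sec = (uint32_t)ts->tv_sec;     /* explicit narrowing */
    out->tv_nsec = (uint32_t)ts->tv_nsec;
    return true;
}

/* Widening back to the standard type, again with explicit casts. */
void timespec_decode(struct timespec *ts, const struct wire_timespec *in)
{
    ts->tv_sec = (time_t)in->tv_sec;
    ts->tv_nsec = (long)in->tv_nsec;
}
```

The explicit casts make the narrowing visible at the call site, and the range checks ensure that a negative or oversized value is never silently truncated.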
    • libceph: add signed type limits · b587398a
      Alex Elder authored
      Flesh out the limits defined in <linux/ceph/decode.h> to include the
      maximum and minimum values for signed types S8, S16, S32, and S64.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
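One common way to define such limits is to derive each signed maximum from the unsigned maximum of the same width. The macro names below carry an `EX_` prefix to mark them as hypothetical; they are not claimed to match the definitions in <linux/ceph/decode.h>.

```c
#include <stdint.h>

/* Hypothetical names; each signed limit is derived from the unsigned maximum. */
#define EX_U8_MAX   ((uint8_t)(~0U))
#define EX_S8_MAX   ((int8_t)(EX_U8_MAX >> 1))
#define EX_S8_MIN   ((int8_t)(-EX_S8_MAX - 1))

#define EX_U16_MAX  ((uint16_t)(~0U))
#define EX_S16_MAX  ((int16_t)(EX_U16_MAX >> 1))
#define EX_S16_MIN  ((int16_t)(-EX_S16_MAX - 1))

#define EX_U32_MAX  ((uint32_t)(~0U))
#define EX_S32_MAX  ((int32_t)(EX_U32_MAX >> 1))
#define EX_S32_MIN  ((int32_t)(-EX_S32_MAX - 1))

#define EX_U64_MAX  ((uint64_t)(~0ULL))
#define EX_S64_MAX  ((int64_t)(EX_U64_MAX >> 1))
#define EX_S64_MIN  ((int64_t)(-EX_S64_MAX - 1))
```

Writing the minimum as `-MAX - 1` avoids spelling out a literal that does not fit in the signed type.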
    • rbd: enforce parent overlap · a9e8ba2c
      Alex Elder authored
      A clone image has a defined overlap point with its parent image.
      That is the byte offset beyond which the parent image has no
      defined data to back the clone, and anything thereafter can be
      viewed as being zero-filled by the clone image.
      
      This is needed because a clone image can be resized.  If it gets
      resized larger than the snapshot it is based on, the overlap defines
      the original size.  If the clone gets resized downward below the
      original size the new clone size defines the overlap.  If the clone
      is subsequently resized to be larger, the overlap won't be increased
      because the previous resize invalidated any parent data beyond that
      point.
      
      This resolves:
          http://tracker.ceph.com/issues/4724
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
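The overlap rules in the commit above reduce to two small calculations. Both function names are hypothetical and the sketch ignores object boundaries; it only shows how the overlap shrinks on resize and how much of a read the parent can still back.

```c
#include <stdint.h>

/*
 * New overlap after the clone is resized: it shrinks with the clone
 * but never grows back when the clone is enlarged again.
 */
uint64_t overlap_after_resize(uint64_t overlap, uint64_t new_size)
{
    return new_size < overlap ? new_size : overlap;
}

/*
 * Number of bytes of a read [offset, offset + length) that the parent
 * can back; the remainder is treated as zero-filled.
 */
uint64_t parent_backed_bytes(uint64_t overlap, uint64_t offset, uint64_t length)
{
    if (offset >= overlap)
        return 0;
    if (length > overlap - offset)
        return overlap - offset;
    return length;
}
```

For example, a clone with a 100-byte overlap that is shrunk to 40 bytes and then grown to 200 keeps an overlap of 40, so a 20-byte read at offset 30 gets 10 bytes from the parent and 10 bytes of zeros.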
    • rbd: issue a copyup for layered writes · 0eefd470
      Alex Elder authored
      This implements the main copyup functionality for layered writes.
      
      Here we add a copyup_pages field to the object request, which is
      used only for copyup requests to keep track of the page array
      containing data read from the parent image.
      
      A copyup request is currently the only request rbd has that requires
      two osd operations.  Because of this we handle copyup specially.
      All image object requests get an osd request allocated when they are
      created.  For a write request, if a copyup is required, the osd
      request originally allocated is released, and a new one (with room
      for two osd ops) is allocated to replace it.  A new function
      rbd_osd_req_create_copyup() allocates an osd request suitable for
      a copyup request.
      
      The first op is then filled with a copyup object class method call,
      supplying the array of pages containing data read from the parent.
      The second op is filled in with the original write request.
      
      The original request otherwise remains intact, and it describes the
      original write request (found in the second osd op).  The presence
      of the copyup op is sort of implicit; a non-null copyup_pages field
      could be used to distinguish between a "normal" write request and a
      request containing both a copyup call and a write.
      
      This resolves:
          http://tracker.ceph.com/issues/3419
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: implement full object parent reads · 3d7efd18
      Alex Elder authored
      As a step toward implementing layered writes, implement reading the
      data for a target object from the parent image for a write request
      whose target object is known to not exist.  Add a copyup_pages field
      to an image request to track the page array used (only) for such a
      request.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>