Commits · 3ee5234df68d253c415ba4f2db72ad250d9c21a9 · Kirill Smelkov / linux

20 Dec, 2012 3 commits

libceph: init event->node in ceph_osdc_create_event() · 3ee5234d

Alex Elder authored Dec 17, 2012

The red-black node node in the ceph osd event structure is not
initialized in create_osdc_create_event().  Because this node can
be the subject of a RB_EMPTY_NODE() call later on, we should ensure
the node is initialized properly for that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

3ee5234d

libceph: init osd->o_node in create_osd() · f407731d

Alex Elder authored Dec 06, 2012

The red-black node node in the ceph osd structure is not initialized
in create_osd().  Because this node can be the subject of a
RB_EMPTY_NODE() call later on, we should ensure the node is
initialized properly for that.  Add a call to RB_CLEAR_NODE()
initialize it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

f407731d

libceph: report connection fault with warning · 28362986

Alex Elder authored Dec 14, 2012

When a connection's socket disconnects, or if there's a protocol
error of some kind on the connection, a fault is signaled and
the connection is reset (closed and reopened, basically).  We
currently get an error message on the log whenever this occurs.

A ceph connection will attempt to reestablish a socket connection
repeatedly if a fault occurs.  This means that these error messages
will get repeatedly added to the log, which is undesirable.

Change the error message to be a warning, so they don't get
logged by default.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

28362986

17 Dec, 2012 7 commits

libceph: socket can close in any connection state · 7bb21d68

Alex Elder authored Dec 07, 2012

A connection's socket can close for any reason, independent of the
state of the connection (and without irrespective of the connection
mutex).  As a result, the connectino can be in pretty much any state
at the time its socket is closed.

Handle those other cases at the top of con_work().  Pull this whole
block of code into a separate function to reduce the clutter.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

7bb21d68

rbd: don't use ENOTSUPP · b8f5c6ed

Alex Elder authored Nov 01, 2012

ENOTSUPP is not a standard errno (it shows up as "Unknown error 524"
in an error message).  This is what was getting produced when the
the local rbd code does not implement features required by a
discovered rbd image.

Change the error code returned in this case to ENXIO.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

b8f5c6ed

rbd: remove linger unconditionally · 61c74035

Alex Elder authored Dec 06, 2012

In __unregister_linger_request(), the request is being removed
from the osd client's req_linger list only when the request
has a non-null osd pointer.  It should be done whether or not
the request currently has an osd.

This is most likely a non-issue because I believe the request
will always have an osd when this function is called.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

61c74035

rbd: get rid of RBD_MAX_SEG_NAME_LEN · 2fd82b9e

Alex Elder authored Nov 09, 2012

RBD_MAX_SEG_NAME_LEN represents the maximum length of an rbd object
name (i.e., one of the objects providing storage backing an rbd
image).

Another symbol, MAX_OBJ_NAME_SIZE, is used in the osd client code to
define the maximum length of any object name in an osd request.

Right now they disagree, with RBD_MAX_SEG_NAME_LEN being too big.

There's no real benefit at this point to defining the rbd object
name length limit separate from any other object name, so just
get rid of RBD_MAX_SEG_NAME_LEN and use MAX_OBJ_NAME_SIZE in its
place.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

2fd82b9e

libceph: avoid using freed osd in __kick_osd_requests() · 685a7555

Alex Elder authored Dec 07, 2012

If an osd has no requests and no linger requests, __reset_osd()
will just remove it with a call to __remove_osd().  That drops
a reference to the osd, and therefore the osd may have been free
by the time __reset_osd() returns.  That function offers no
indication this may have occurred, and as a result the osd will
continue to be used even when it's no longer valid.

Change__reset_osd() so it returns an error (ENODEV) when it
deletes the osd being reset.  And change __kick_osd_requests() so it
returns immediately (before referencing osd again) if __reset_osd()
returns *any* error.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

685a7555

ceph: don't reference req after put · 7d5f2481

Alex Elder authored Nov 29, 2012

In __unregister_request(), there is a call to list_del_init()
referencing a request that was the subject of a call to
ceph_osdc_put_request() on the previous line.  This is not
safe, because the request structure could have been freed
by the time we reach the list_del_init().

Fix this by reversing the order of these lines.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-off-by: Sage Weil <sage@inktank.com>

7d5f2481

rbd: do not allow remove of mounted-on image · 42382b70

Alex Elder authored Nov 16, 2012

There is no check in rbd_remove() to see if anybody holds open the
image being removed.  That's not cool.

Add a simple open count that goes up and down with opens and closes
(releases) of the device, and don't allow an rbd image to be removed
if the count is non-zero.

Protect the updates of the open count value with ctl_mutex to ensure
the underlying rbd device doesn't get removed while concurrently
being opened.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

42382b70

13 Dec, 2012 9 commits

libceph: Unlock unprocessed pages in start_read() error path · 8884d53d

David Zafman authored Dec 03, 2012

Function start_read() can get an error before processing all pages.
It must not only release the remaining pages, but unlock them too.

This fixes http://tracker.newdream.net/issues/3370Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

8884d53d

ceph: call handle_cap_grant() for cap import message · 0e5e1774

Yan, Zheng authored Nov 19, 2012

If client sends cap message that requests new max size during
exporting caps, the exporting MDS will drop the message quietly.
So the client may wait for the reply that updates the max size
forever. call handle_cap_grant() for cap import message can
avoid this issue.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>

0e5e1774

ceph: Fix __ceph_do_pending_vmtruncate · a85f50b6

Yan, Zheng authored Nov 19, 2012

we should set i_truncate_pending to 0 after page cache is truncated
to i_truncate_size
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>

a85f50b6

ceph: Don't add dirty inode to dirty list if caps is in migration · 0685235f

Yan, Zheng authored Nov 19, 2012

Add dirty inode to cap_dirty_migrating list instead, this can avoid
ceph_flush_dirty_caps() entering infinite loop.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>

0685235f

ceph: Fix infinite loop in __wake_requests · ed75ec2c

Yan, Zheng authored Nov 19, 2012

__wake_requests() will enter infinite loop if we use it to wake
requests in the session->s_waiting list. __wake_requests() deletes
requests from the list and __do_request() adds requests back to
the list.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>

ed75ec2c

ceph: Don't update i_max_size when handling non-auth cap · 5e62ad30

Yan, Zheng authored Nov 19, 2012

The cap from non-auth mds doesn't have a meaningful max_size value.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>

5e62ad30

bdi_register: add __printf verification, fix arg mismatch · d2cc4dde

Joe Perches authored Nov 29, 2012

__printf is useful to verify format and arguments.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Alex Elder <elder@inktank.com>

d2cc4dde

libceph: remove 'osdtimeout' option · 83aff95e

Sage Weil authored Nov 28, 2012

This would reset a connection with any OSD that had an outstanding
request that was taking more than N seconds.  The idea was that if the
OSD was buggy, the client could compensate by resending the request.

In reality, this only served to hide server bugs, and we haven't
actually seen such a bug in quite a while.  Moreover, the userspace
client code never did this.

More importantly, often the request is taking a long time because the
OSD is trying to recover, or overloaded, and killing the connection
and retrying would only make the situation worse by giving the OSD
more work to do.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

83aff95e

ceph: fix dentry reference leak in ceph_encode_fh(). · cfc84c9f

Cyril Roelandt authored Nov 20, 2012

dput() was not called in the error path.
Signed-off-by: Cyril Roelandt <tipecaml@gmail.com>
Reviewed-by: Alex Elder <elder@inktank.com>

cfc84c9f

05 Nov, 2012 1 commit

ceph: Fix i_size update race · 22cddde1

Sage Weil authored Nov 05, 2012

ceph_aio_write() has an optimization that marks cap EPH_CAP_FILE_WR
dirty before data is copied to page cache and inode size is updated.
If ceph_check_caps() flushes the dirty cap before the inode size is
updated, MDS can miss the new inode size. The fix is move
ceph_{get,put}_cap_refs() into ceph_write_{begin,end}() and call
__ceph_mark_dirty_caps() after inode size is updated.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>

22cddde1

04 Nov, 2012 1 commit
- ceph: Hold caps_list_lock when adjusting caps_{use, total}_count · 4d1d0534
  Yan, Zheng authored Nov 03, 2012
```
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
```
  4d1d0534
01 Nov, 2012 11 commits

rbd: get additional info in parent spec · 9e15b77d

Alex Elder authored Oct 30, 2012

When a layered rbd image has a parent, that parent is identified
only by its pool id, image id, and snapshot id.  Images that have
been mapped also record *names* for those three id's.

Add code to look up these names for parent images so they match
mapped images more closely.  Skip doing this for an image if it
already has its pool name defined (this will be the case for images
mapped by the user).

It is possible that an the name of a parent image can't be
determined, even if the image id is valid.  If this occurs it
does not preclude correct operation, so don't treat this as
an error.

On the other hand, defined pools will always have both an id and a
name.   And any snapshot of an image identified as a parent for a
clone image will exist, and will have a name (if not it indicates
some other internal error).  So treat failure to get these bits
of information as errors.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

9e15b77d

libceph: define ceph_pg_pool_name_by_id() · 72afc71f

Alex Elder authored Oct 30, 2012

Define and export function ceph_pg_pool_name_by_id() to supply
the name of a pg pool whose id is given.  This will be used by
the next patch.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

72afc71f

rbd: get parent spec for version 2 images · 86b00e0d

Alex Elder authored Oct 25, 2012

Add support for getting the the information identifying the parent
image for rbd images that have them.  The child image holds a
reference to its parent image specification structure.  Create a new
entry "parent" in /sys/bus/rbd/image/N/ to report the identifying
information for the parent image, if any.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

86b00e0d

rbd: allow null image name · a92ffdf8

Alex Elder authored Oct 30, 2012

Format 2 parent images are partially identified by their image id,
but it may not be possible to determine their image name.  The name
is not strictly needed for correct operation, so we won't be
treating it as an error if we don't know it.  Handle this case
gracefully in rbd_name_show().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

a92ffdf8

rbd: allow null image name · 2c0d0a10

Alex Elder authored Oct 30, 2012

We will know the image id for format 2 parent images, but won't
initially know its image name.  Avoid making the query for an image
id in rbd_dev_image_id() if it's already known.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

2c0d0a10

rbd: encapsulate last part of probe · 83a06263

Alex Elder authored Oct 30, 2012

Group the activities that now take place after an rbd_dev_probe()
call into a single function, and move the call to that function
into rbd_dev_probe() itself.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

83a06263

rbd: define rbd_dev_{create,destroy}() helpers · c53d5893

Alex Elder authored Oct 25, 2012

Encapsulate the creation/initialization and destruction of rbd
device structures.  The rbd_client and the rbd_spec structures
provided on creation hold references whose ownership is transferred
to the new rbd_device structure.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

c53d5893

rbd: consolidate rbd_dev init in rbd_add() · bd4ba655

Alex Elder authored Oct 25, 2012

Group the allocation and initialization of fields of the rbd device
structure created in rbd_add().  Move the grouped code down later in
the function, just prior to the call to rbd_dev_probe().  This is
for the most part simple code movement.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

bd4ba655

rbd: don't pass rbd_dev to rbd_get_client() · 9d3997fd

Alex Elder authored Oct 25, 2012

The only reason rbd_dev is passed to rbd_get_client() is so its
rbd_client field can get assigned.  Instead, just return the
rbd_client pointer as a result and have the caller do the
assignment.

Change rbd_put_client() so it takes an rbd_client structure,
so follows the more typical symmetry with rbd_get_client().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

9d3997fd

rbd: fill rbd_spec in rbd_add_parse_args() · 859c31df

Alex Elder authored Oct 25, 2012

Pass the address of an rbd_spec structure to rbd_add_parse_args().
Use it to hold the information defining the rbd image to be mapped
in an rbd_add() call.

Use the result in the caller to initialize the rbd_dev->id field.

This means rbd_dev is no longer needed in rbd_add_parse_args(),
so get rid of it.

Now that this transformation of rbd_add_parse_args() is complete,
correct and expand on the its header documentation to reflect the
new reality.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

859c31df

rbd: add reference counting to rbd_spec · 8b8fb99c

Alex Elder authored Oct 26, 2012

With layered images we'll share rbd_spec structures, so add a
reference count to it.  It neatens up some code also.

A silly get/put pair is added to the alloc routine just to avoid
"defined but not used" warnings.  It will go away soon.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

8b8fb99c

30 Oct, 2012 8 commits

rbd: define image specification structure · 0d7dbfce

Alex Elder authored Oct 25, 2012

Group the fields that uniquely specify an rbd image into a new
reference-counted rbd_spec structure.  This structure will be used
to describe the desired image when mapping an image, and when
probing parent images in layered rbd devices.  Replace the set of
fields in the rbd device structure with a pointer to a dynamically
allocated rbd_spec.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

0d7dbfce

rbd: have rbd_add_parse_args() return error · dc79b113

Alex Elder authored Oct 25, 2012

Change the interface to rbd_add_parse_args() so it returns an
error code rather than a pointer.  Return the ceph_options result
via a pointer whose address is passed as an argument.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

dc79b113

rbd: pass and populate rbd_options structure · 4e9afeba

Alex Elder authored Oct 25, 2012

Have the caller pass the address of an rbd_options structure to
rbd_add_parse_args(), to be initialized with the information
gleaned as a result of the parse.

I know, this is another near-reversal of a recent change...
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

4e9afeba

rbd: remove snap_name arg from rbd_add_parse_args() · 819d52bf

Alex Elder authored Oct 25, 2012

The snapshot name returned by rbd_add_parse_args() just gets saved
in the rbd_dev eventually.  So just do that inside that function and
do away with the snap_name argument, both in rbd_add_parse_args()
and rbd_dev_set_mapping().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

819d52bf

rbd: remove options args from rbd_add_parse_args() · f28e565a

Alex Elder authored Oct 25, 2012

They "options" argument to rbd_add_parse_args() (and it's partner
options_size) is now only needed within the function, so there's no
need to have the caller allocate and pass the options buffer.  Just
allocate the options buffer within the function using dup_token().

Also distinguish between failures due to failed memory allocation
and failing because a required argument was missing.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

f28e565a

rbd: get rid of snap_name_len · e5c35534

Alex Elder authored Oct 25, 2012

The value returned in the "snap_name_len" argument to
rbd_add_parse_args() is never actually used, so get rid of it.

The snap_name_len recorded in rbd_dev_v2_snap_name() is not
useful either, so get rid of that too.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

e5c35534

rbd: do all argument parsing in one place · 0ddebc0c

Alex Elder authored Oct 25, 2012

This patch makes rbd_add_parse_args() be the single place all
argument parsing occurs for an image map request:
    - Move the ceph_parse_options() call into that function
    - Use local variables rather than parameters to hold the list
      of monitor addresses supplied
    - Rather than returning it, pass the snapshot name (and its
      length) back via parameters
    - Have the function return a ceph_options structure pointer
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

0ddebc0c

rbd: move ceph_parse_options() call up · 78cea76e

Alex Elder authored Oct 25, 2012

Move option parsing out of rbd_get_client() and into its caller.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

78cea76e