- 26 Nov, 2012 40 commits
-
-
Yan, Zheng authored
(cherry picked from commit 3e8f43a0) When i >= newmap->m_max_mds, ceph_mdsmap_get_addr(newmap, i) return NULL. Passing NULL to memcmp() triggers oops. Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 9bd95261) The ceph_on_in_msg_alloc() method drops con->mutex while it allocates a message. If that races with a timeout that resends a zillion messages and resets the connection, and the ->alloc_msg() method returns a NULL message, it will call ceph_msg_put(NULL) and BUG. Fix by only calling put if msg is non-NULL. Fixes http://tracker.newdream.net/issues/3142Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit 588377d6) If ceph_fault() is unable to queue work after a delay, it sets the BACKOFF connection flag so con_work() will attempt to do so. In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't result in newly-queued work, it simply ignores this condition and proceeds as if no backoff delay were desired. There are two problems with this--one of which is a bug. The first problem is simply that the intended behavior is to back off, and if we aren't able queue the work item to run after a delay we're not doing that. The only reason queue_delayed_work() won't queue work is if the provided work item is already queued. In the messenger, this means that con_work() is already scheduled to be run again. So if we simply set the BACKOFF flag again when this occurs, we know the next con_work() call will again attempt to hold off activity on the connection until after the delay. The second problem--the bug--is a leak of a reference count. If queue_delayed_work() returns 0 in con_work(), con->ops->put() drops the connection reference held on entry to con_work(). However, processing is (was) allowed to continue, and at the end of the function a second con->ops->put() is called. This patch fixes both problems. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit 5ce765a5) In write_partial_msg_pages(), pages need to be kmapped in order to perform a CRC-32c calculation on them. As an artifact of the way this code used to be structured, the kunmap() call was separated from the kmap() call and both were done conditionally. But the conditions under which the kmap() and kunmap() calls were made differed, so there was a chance a kunmap() call would be done on a page that had not been mapped. The symptom of this was tripping a BUG() in kunmap_high() when pkmap_count[nr] became 0. Reported-by:
Bryan K. Wright <bryan@virginia.edu> Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jim Schutt authored
(cherry picked from commit 6d4221b5) Because the Ceph client messenger uses a non-blocking connect, it is possible for the sending of the client banner to race with the arrival of the banner sent by the peer. When ceph_sock_state_change() notices the connect has completed, it schedules work to process the socket via con_work(). During this time the peer is writing its banner, and arrival of the peer banner races with con_work(). If con_work() calls try_read() before the peer banner arrives, there is nothing for it to do, after which con_work() calls try_write() to send the client's banner. In this case Ceph's protocol negotiation can complete succesfully. The server-side messenger immediately sends its banner and addresses after accepting a connect request, *before* actually attempting to read or verify the banner from the client. As a result, it is possible for the banner from the server to arrive before con_work() calls try_read(). If that happens, try_read() will read the banner and prepare protocol negotiation info via prepare_write_connect(). prepare_write_connect() calls con_out_kvec_reset(), which discards the as-yet-unsent client banner. Next, con_work() calls try_write(), which sends the protocol negotiation info rather than the banner that the peer is expecting. The result is that the peer sees an invalid banner, and the client reports "negotiation failed". Fix this by moving con_out_kvec_reset() out of prepare_write_connect() to its callers at all locations except the one where the banner might still need to be sent. [elder@inktak.com: added note about server-side behavior] Signed-off-by:
Jim Schutt <jaschut@sandia.gov> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit d1c338a5) The debugfs directory includes the cluster fsid and our unique global_id. We need to delay the initialization of the debug entry until we have learned both the fsid and our global_id from the monitor or else the second client can't create its debugfs entry and will fail (and multiple client instances aren't properly reflected in debugfs). Reported by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Yehuda Sadeh <yehuda@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sylvain Munaut authored
(cherry picked from commit f0666b1a) Avoid crashing if the crypto key payload was NULL, as when it was not correctly allocated and initialized. Also, avoid leaking it. Signed-off-by:
Sylvain Munaut <tnt@246tNt.com> Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 61399191) We drop the lock when calling the ->alloc_msg() con op, which means we need to (a) not clobber con->in_msg without the mutex held, and (b) we need to verify that we are still in the OPEN state when we retake it to avoid causing any mayhem. If the state does change, -EAGAIN will get us back to con_work() and loop. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 4740a623) This function's calling convention is very limiting. In particular, we can't return any error other than ENOMEM (and only implicitly), which is a problem (see next patch). Instead, return an normal 0 or error code, and make the skip a pointer output parameter. Drop the useless in_hdr argument (we have the con pointer). Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 8636ea67) The ceph_fault() function takes the con mutex, so we should avoid dropping it before calling it. This fixes a potential race with another thread calling ceph_con_close(), or _open(), or similar (we don't reverify con->state after retaking the lock). Add annotation so that lockdep realizes we will drop the mutex before returning. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 7b862e07) We drop the con mutex when delivering a message. When we retake the lock, we need to verify we are still in the OPEN state before preparing to read the next tag, or else we risk stepping on a connection that has been closed. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 4f471e4a) Revoke all mon_client messages when we shut down the old connection. This is mostly moot since we are re-using the same ceph_connection, but it is cleaner. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 8007b8d6) If the connect() call immediately fails such that sock == NULL, we still need con_close_socket() to reset our socket state to CLOSED. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
Signed-off-by:
Sage Weil <sage@inktank.com> (cherry picked from commit 43c7427d)
-
Sage Weil authored
(cherry picked from commit 4a861692) Rename flags with CON_FLAG prefix, move the definitions into the c file, and (better) document their meaning. Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 8dacc7da) Use a simple set of 6 enumerated values for the socket states (CON_STATE_*) and use those instead of the state bits. All of the con->state checks are now under the protection of the con mutex, so this is safe. It also simplifies many of the state checks because we can check for anything other than the expected state instead of various bits for races we can think of. This appears to hold up well to stress testing both with and without socket failure injection on the server side. Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit d7353dd5) If we are CLOSED, the socket is closed and we won't get these. Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit ee76e073) It is simpler to do this immediately, since we already hold the con mutex. It also avoids the need to deal with a not-quite-CLOSED socket in con_work. Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 2e8cb100) If the state is CLOSED or OPENING, we shouldn't have a socket. Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit a59b55a6) Take the con mutex before checking whether the connection is closed to avoid racing with someone else closing it. Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 00650931) Avoid dropping and retaking con->mutex in the ceph_con_send() case by leaving locking up to the caller. Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 3b5ede07) If we fault on a lossy connection, we should still close the socket immediately, and do so under the con mutex. We should also take the con mutex before printing out the state bits in the debug output. Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 85effe18) We exponentially back off when we encounter connection errors. If several errors accumulate, we will eventually wait ages before even trying to reconnect. Fix this by resetting the backoff counter after a successful negotiation/ connection with the remote node. Fixes ceph issue #2802. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Yehuda Sadeh <yehuda@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 5469155f) Take the con mutex while we are initiating a ceph open. This is necessary because the may have previously been in use and then closed, which could result in a racing workqueue running con_work(). Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Yehuda Sadeh <yehuda@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit a4107026) Previously, we were opportunistically initializing the bio_iter if it appeared to be uninitialized in the middle of the read path. The problem is that a sequence like: - start reading message - initialize bio_iter - read half a message - messenger fault, reconnect - restart reading message - ** bio_iter now non-NULL, not reinitialized ** - read past end of bio, crash Instead, initialize the bio_iter unconditionally when we allocate/claim the message for read. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Yehuda Sadeh <yehuda@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 6194ea89) The linger op registration (i.e., watch) modifies the object state. As such, the OSD will reply with success if it has already applied without doing the associated side-effects (setting up the watch session state). If we lose the ACK and resubmit, we will see success but the watch will not be correctly registered and we won't get notifies. To fix this, always resubmit the linger op with a new tid. We accomplish this by re-registering as a linger (i.e., 'registered') if we are not yet registered. Then the second loop will treat this just like a normal case of re-registering. This mirrors a similar fix on the userland ceph.git, commit 5dd68b95, and ceph bug #2796. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Yehuda Sadeh <yehuda@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 8c50c817) Hold the mutex while twiddling all of the state bits to avoid possible races. While we're here, make not of why we cannot close the socket directly. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Yehuda Sadeh <yehuda@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 3a140a0d) We need to set error_msg to something useful before calling ceph_fault(); do so here for try_{read,write}(). This is more informative than libceph: osd0 192.168.106.220:6801 (null) Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Yehuda Sadeh <yehuda@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Guanjun He authored
(cherry picked from commit a2a32584) Add an atomic variable 'stopping' as flag in struct ceph_messenger, set this flag to 1 in function ceph_destroy_client(), and add the condition code in function ceph_data_ready() to test the flag value, if true(1), just return. Signed-off-by:
Guanjun He <gjhe@suse.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit d50b409f) Initialize the type field for messages in a msgpool. The caller was doing this for osd ops, but not for the reply messages. Reported-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit fbb85a47) It is possible to close a socket that is in the OPENING state. For example, it can happen if ceph_con_close() is called on the con before the TCP connection is established. con_work() will come around and shut down the socket. Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 735a72ef) Do not re-initialize the con on every connection attempt. When we ceph_con_close, there may still be work queued on the socket (e.g., to close it), and re-initializing will clobber the work_struct state. Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit b7a9e5dd) The peer name may change on each open attempt, even when the connection is reused. Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit bc18f4b1) Sage liked the state diagram I put in my commit description so I'm putting it in with the code. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit 5821bd8c) This patch gathers a few small changes in "net/ceph/messenger.c": out_msg_pos_next() - small logic change that mostly affects indentation write_partial_msg_pages(). - use a local variable trail_off to represent the offset into a message of the trail portion of the data (if present) - once we are in the trail portion we will always be there, so we don't always need to check against our data position - avoid computing len twice after we've reached the trail - get rid of the variable tmpcrc, which is not needed - trail_off and trail_len never change so mark them const - update some comments read_partial_message_bio() - bio_iovec_idx() will never return an error, so don't bother checking for it Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit 7593af92) Currently a ceph connection enters a "CONNECTING" state when it begins the process of (re-)connecting with its peer. Once the two ends have successfully exchanged their banner and addresses, an additional NEGOTIATING bit is set in the ceph connection's state to indicate the connection information exhange has begun. The CONNECTING bit/state continues to be set during this phase. Rather than have the CONNECTING state continue while the NEGOTIATING bit is set, interpret these two phases as distinct states. In other words, when NEGOTIATING is set, clear CONNECTING. That way only one of them will be active at a time. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit ab166d5a) There are two phases in the process of linking together the two ends of a ceph connection. The first involves exchanging a banner and IP addresses, and if that is successful a second phase exchanges some detail about each side's connection capabilities. When initiating a connection, the client side now queues to send its information for both phases of this process at the same time. This is probably a bit more efficient, but it is slightly messier from a layering perspective in the code. So rearrange things so that the client doesn't send the connection information until it has received and processed the response in the initial banner phase (in process_banner()). Move the code (in the (con->sock == NULL) case in try_write()) that prepares for writing the connection information, delaying doing that until the banner exchange has completed. Move the code that begins the transition to this second "NEGOTIATING" phase out of process_banner() and into its caller, so preparing to write the connection information and preparing to read the response are adjacent to each other. Finally, preparing to write the connection information now requires the output kvec to be reset in all cases, so move that into the prepare_write_connect() and delete it from all callers. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit e27947c7) There is no state explicitly defined when a ceph connection is fully operational. So define one. It's set when the connection sequence completes successfully, and is cleared when the connection gets closed. Be a little more careful when examining the old state when a socket disconnect event is reported. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit 3ec50d18) A connection state's NEGOTIATING bit gets set while in CONNECTING state after we have successfully exchanged a ceph banner and IP addresses with the connection's peer (the server). But that bit is not cleared again--at least not until another connection attempt is initiated. Instead, clear it as soon as the connection is fully established. Also, clear it when a socket connection gets prematurely closed in the midst of establishing a ceph connection (in case we had reached the point where it was set). Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit bb9e6bba) A connection that is closed will no longer be connecting. So clear the CONNECTING state bit in ceph_con_close(). Similarly, if the socket has been closed we no longer are in connecting state (a new connect sequence will need to be initiated). Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-