- 26 Nov, 2012 40 commits
-
-
Josh Durgin authored
(cherry picked from commit 403f24d3) This is updated whenever a snapshot is added or deleted, and the snapc pointer is changed with every refresh of the header. Signed-off-by:
Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by:
Alex Elder <elder@dreamhost.com> Reviewed-by:
Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit cd9d9f5d) A recent change made changes to the rbd_client_list be protected by a spinlock. Unfortunately in rbd_put_client(), the lock is taken before possibly dropping the last reference to an rbd_client, and on the last reference that eventually calls flush_workqueue() which can sleep. The problem was flagged by a debug spinlock warning: BUG: spinlock wrong CPU on CPU#3, rbd/27814 The solution is to move the spinlock acquisition and release inside rbd_client_release(), which is the spot where it's really needed for protecting the removal of the rbd_client from the client list. Signed-off-by:
Alex Elder <elder@dreamhost.com> Reviewed-by:
Sage Weil <sage@newdream.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 5bdca4e0) In ancient times, the messenger could both initiate and accept connections. An artifact if that was data structures to store/process an incoming ceph_msg_connect request and send an outgoing ceph_msg_connect_reply. Sadly, the negotiation code was referencing those structures and ignoring important information (like the peer's connect_seq) from the correct ones. Among other things, this fixes tight reconnect loops where the server sends RETRY_SESSION and we (the client) retries with the same connect_seq as last time. This bug pretty easily triggered by injecting socket failures on the MDS and running some fs workload like workunits/direct_io/test_sync_io. Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit f3dea7ed) (cherry picked from commit 642c0dbd) We need to flush the msgr workqueue during mon_client shutdown to ensure that any work affecting our embedded ceph_connection is finished so that we can be safely destroyed. Previously, we were flushing the work queue after osd_client shutdown and before mon_client shutdown to ensure that any osd connection refs to authorizers are flushed. Remove the redundant flush, and document in the comment that the mon_client flush is needed to cover that case as well. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Yan, Zheng authored
(cherry picked from commit 43643528) (cherry picked from commit b132cf4c) The bug can cause NULL pointer dereference in write_partial_msg_pages Signed-off-by:
Zheng Yan <zheng.z.yan@intel.com> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 0d47766f) There were a few direct calls to ceph_con_{get,put}() instead of the con ops from osd_client.c. This is a bug since those ops aren't defined to be ceph_con_get/put. This breaks refcounting on the ceph_osd structs that contain the ceph_connections, and could lead to all manner of strangeness. The purpose of the ->get and ->put methods in a ceph connection are to allow the connection to indicate it has a reference to something external to the messaging system, *not* to indicate something external has a reference to the connection. [elder@inktank.com: added that last sentence] Signed-off-by:
Sage Weil <sage@newdream.net> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 88ed6ea0)
-
Alex Elder authored
(cherry picked from commit ab8cb34a) In ceph_osdc_release_request(), a reference to the r_reply message is dropped. But just after that, that same message is revoked if it was in use to receive an incoming reply. Reorder these so we are sure we hold a reference until we're actually done with the message. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 680584fa)
-
Sage Weil authored
(cherry picked from commit 6bd9adbd) Usually, we are adding pg_temp entries or removing them. Occasionally they update. In that case, osdmap_apply_incremental() was failing because the rbtree entry already exists. Fix by removing the existing entry before inserting a new one. Fixes http://tracker.newdream.net/issues/2446Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 35f9f8a0) There is a race between two __unregister_request() callers: the reply path and the ceph_osdc_wait_request(). If we get a reply *and* the timeout expires at roughly the same time, both callers will try to unregister the request, and the second one will do bad things. Simply check if the request is still already unregistered; if so, return immediately and do nothing. Fixes http://tracker.newdream.net/issues/2420Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit 3da54776) Move the addition of the authorizer buffer to a connection's out_kvec out of get_connect_authorizer() and into its caller. This way, the caller--prepare_write_connect()--can avoid adding the connect header to out_kvec before it has been fully initialized. Prior to this patch, it was possible for a connect header to be sent over the wire before the authorizer protocol or buffer length fields were initialized. An authorizer buffer associated with that header could also be queued to send only after the connection header that describes it was on the wire. Fixes http://tracker.newdream.net/issues/2424Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit dac1e716) Change the name of prepare_connect_authorizer(). The next patch is going to make this function no longer add anything to the connection's out_kvec, so it will no longer fit the pattern of the rest of the prepare_connect_*() functions. In addition, pass the address of a variable that will hold the authorization protocol to use. Move the assignment of that to the connection's out_connect structure into prepare_write_connect(). Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit 729796be) Change prepare_connect_authorizer() so it returns a pointer (or pointer-coded error). Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit 8f43fb53) Rather than passing a bunch of arguments to be filled in with the content of the ceph_auth_handshake buffer now returned by the get_authorizer method, just use the returned information in the caller, and drop the unnecessary arguments. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit a3530df3) Have the get_authorizer auth_client method return a ceph_auth pointer rather than an integer, pointer-encoding any returned error value. This is to pave the way for making use of the returned value in an upcoming patch. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit a255651d) In the create_authorizer method for both the mds and osd clients, the auth_client->ops pointer is blindly dereferenced. There is no obvious guarantee that this pointer has been assigned. And furthermore, even if the ops pointer is non-null there is definitely no guarantee that the create_authorizer or destroy_authorizer methods are defined. Add checks in both routines to make sure they are defined (non-null) before use. Add similar checks in a few other spots in these files while we're at it. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit 74f1869f) Make use of the new ceph_auth_handshake structure in order to reduce the number of arguments passed to the create_authorizor method in ceph_auth_client_ops. Use a local variable of that type as a shorthand in the get_authorizer method definitions. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit 6c4a1915) The definitions for the ceph_mds_session and ceph_osd both contain five fields related only to "authorizers." Encapsulate those fields into their own struct type, allowing for better isolation in some upcoming patches. Fix the #includes in "linux/ceph/osd_client.h" to lay out their more complete canonical path. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit ed96af64) In prepare_connect_authorizer(), a connection's get_authorizer method is called but ignores its return value. This function can return an error, so check for it and return it if that ever occurs. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit b1c6b980) Change prepare_connect_authorizer() so it returns without dropping the connection mutex if the connection has no get_authorizer method. Use the symbolic CEPH_AUTH_UNKNOWN instead of 0 when assigning authorization protocols. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit 5a0f8fdd) prepare_write_connect() can return an error, but only one of its callers checks for it. All the rest are in functions that already return errors, so it should be fine to return the error if one gets returned. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit e10c758e) prepare_write_connect() prepares a connect message, then sets WRITE_PENDING on the connection. Then *after* this, it calls prepare_connect_authorizer(), which updates the content of the connection buffer already queued for sending. It's also possible it will result in prepare_write_connect() returning -EAGAIN despite the WRITE_PENDING big getting set. Fix this by preparing the connect authorizer first, setting the WRITE_PENDING bit only after that is done. Partially addresses http://tracker.newdream.net/issues/2424Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit e825a66d) In all cases, the value passed as the msgr argument to prepare_write_connect() is just con->msgr. Just get the msgr value from the ceph connection and drop the unneeded argument. The only msgr passed to prepare_write_banner() is also therefore just the one from con->msgr, so change that function to drop the msgr argument as well. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit 41b90c00) prepare_write_connect() has an argument indicating whether a banner should be sent out before sending out a connection message. It's only ever set in one of its callers, so move the code that arranges to send the banner into that caller and drop the "include_banner" argument from prepare_write_connect(). Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit 84fb3adf) Reset a connection's kvec fields in the caller rather than in prepare_write_connect(). This ends up repeating a few lines of code but it's improving the separation between distinct operations on the connection, which we can take advantage of later. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit d329156f) Move the kvec reset for a connection out of prepare_write_banner and into its only caller. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit fd51653f) Make the second argument to read_partial() be the ending input byte position rather than the beginning offset it now represents. This amounts to moving the addition "to + size" into the caller. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit e6cee71f) read_partial() always increases whatever "to" value is supplied by adding the requested size to it, and that's the only thing it does with that pointed-to value. Do that pointer advance in the caller (and then only when the updated value will be subsequently used), and change the "to" parameter to be an in-only and non-pointer value. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit 57dac9d1) There are two blocks of code in read_partial_message()--those that read the header and footer of the message--that can be replaced by a call to read_partial(). Do that. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Elder authored
(cherry picked from commit 065a68f9) From Al Viro <viro@zeniv.linux.org.uk> Al Viro noticed that we were using a non-cpu-encoded value in a switch statement in osd_req_encode_op(). The result would clearly not work correctly on a big-endian machine. Signed-off-by:
Alex Elder <elder@dreamhost.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 6eb43f4b) Reflects ceph.git commit 46d63d98434b3bc9dad2fc9ab23cbaedc3bcb0e4. Reported-by:
Alexander Lyakas <alex.bolshoy@gmail.com> Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit f671d4cd) Fix the node weight lookup for tree buckets by using a correct accessor. Reflects ceph.git commit d287ade5bcbdca82a3aef145b92924cf1e856733. Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit a1f4895b) If we get a map that doesn't make sense, error out or ignore the badness instead of BUGging out. This reflects the ceph.git commits 9895f0bff7dc68e9b49b572613d242315fb11b6c and 8ded26472058d5205803f244c2f33cb6cb10de79. Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit c90f95ed) This small adjustment reflects a change that was made in ceph.git commit af6a9f30696c900a2a8bd7ae24e8ed15fb4964bb, about 6 months ago. An N-1 search is not exhaustive. Fixed ceph.git bug #1594. Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
(cherry picked from commit 8b12d47b) Move various types from int -> __u32 (or similar), and add const as appropriate. This reflects changes that have been present in the userland implementation for some time. Reviewed-by:
Alex Elder <elder@inktank.com> Signed-off-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dave Jones authored
commit 88a693b5 upstream. =============================== [ INFO: suspicious RCU usage. ] 3.5.0-rc1+ #63 Not tainted ------------------------------- security/selinux/netnode.c:178 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by trinity-child1/8750: #0: (sel_netnode_lock){+.....}, at: [<ffffffff812d8f8a>] sel_netnode_sid+0x16a/0x3e0 stack backtrace: Pid: 8750, comm: trinity-child1 Not tainted 3.5.0-rc1+ #63 Call Trace: [<ffffffff810cec2d>] lockdep_rcu_suspicious+0xfd/0x130 [<ffffffff812d91d1>] sel_netnode_sid+0x3b1/0x3e0 [<ffffffff812d8e20>] ? sel_netnode_find+0x1a0/0x1a0 [<ffffffff812d24a6>] selinux_socket_bind+0xf6/0x2c0 [<ffffffff810cd1dd>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff810cdb55>] ? lock_release_holdtime.part.9+0x15/0x1a0 [<ffffffff81093841>] ? lock_hrtimer_base+0x31/0x60 [<ffffffff812c9536>] security_socket_bind+0x16/0x20 [<ffffffff815550ca>] sys_bind+0x7a/0x100 [<ffffffff816c03d5>] ? sysret_check+0x22/0x5d [<ffffffff810d392d>] ? trace_hardirqs_on_caller+0x10d/0x1a0 [<ffffffff8133b09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff816c03a9>] system_call_fastpath+0x16/0x1b This patch below does what Paul McKenney suggested in the previous thread. Signed-off-by:
Dave Jones <davej@redhat.com> Reviewed-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by:
Paul Moore <paul@paul-moore.com> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
James Morris <james.l.morris@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Kara authored
commit 361d94a3 upstream. Calls into reiserfs journalling code and reiserfs_get_block() need to be protected with write lock. We remove write lock around calls to high level quota code in the next patch so these paths would suddently become unprotected. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Kara authored
commit 7af11686 upstream. Calls into highlevel quota code cannot happen under the write lock. These calls take dqio_mutex which ranks above write lock. So drop write lock before calling back into quota code. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Kara authored
commit b9e06ef2 upstream. In reiserfs_quota_on() we do quite some work - for example unpacking tail of a quota file. Thus we have to hold write lock until a moment we call back into the quota code. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Kara authored
commit 3bb3e1fc upstream. When remounting reiserfs dquot_suspend() or dquot_resume() can be called. These functions take dqonoff_mutex which ranks above write lock so we have to drop it before calling into quota code. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bryan Schumaker authored
commit 399f11c3 upstream. Currently, we will schedule session recovery and then return to the caller of nfs4_handle_exception. This works for most cases, but causes a hang on the following test case: Client Server ------ ------ Open file over NFS v4.1 Write to file Expire client Try to lock file The server will return NFS4ERR_BADSESSION, prompting the client to schedule recovery. However, the client will continue placing lock attempts and the open recovery never seems to be scheduled. The simplest solution is to wait for session recovery to run before retrying the lock. Signed-off-by:
Bryan Schumaker <bjschuma@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: Backported to 3.2: adjust context] Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-