- 11 Feb, 2010 12 commits
-
-
Sage Weil authored
Grab inode ref in helper. Make work functions static, with consistent naming. Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
If a sync read gets a short result from the OSD, it may need to do a getattr to see if it is short due to reaching end-of-file. The getattr was being done while holding a reference to FILE_RD, which can lead to a deadlock if the MDS is revoking that capability bit and can't process the getattr until it does. We fix this by setting a flag if EOF size validation is needed, and doing the getattr in ceph_aio_read, after the RD cap ref is dropped. If the read needs to be continued, we loop and continue traversing the file. Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
Never retain caps in __send_cap() that are being revoked. Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
Try to invalidate pages in ceph_check_caps() if FILE_CACHE is being revoked. If we fail, queue an immediate async invalidate if FILE_CACHE is being revoked. (If it's not being revoked, we just queue the caps for later evaluation later, as per the old behavior.) Signed-off-by: Sage Weil <sage@newdream.net>
-
Yehuda Sadeh authored
In the cases where we either do a sync read or a write, we need to make sure that everything in the page cache is flushed. In the case of a sync write we invalidate the relevant pages, so that subsequent read/write reflects the new data written. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
Yehuda Sadeh authored
A truncation should occur when either we have the specified caps for the file, or (in cases where we are not the only ones referencing the file) when it is mapped or when it is opened. The latter two cases were not handled. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
Yehuda Sadeh authored
Originally ceph_page_mkwrite called ceph_write_begin, hoping that the returned locked page would be the page that it was requested to mkwrite. Factored out relevant part of ceph_page_mkwrite and we lock the right page anyway. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
Yehuda Sadeh authored
Zeroing of holes was not done correctly: page_off was miscalculated and zeroing the tail didn't not adjust the 'read' value to include the zeroed portion. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
Also verify encoding version as we go. Signed-off-by: Sage Weil <sage@newdream.net>
-
Yehuda Sadeh authored
Instead of removing osd connection immediately when the requests list is empty, put the osd connection on an lru. Only if that osd has not been used for more than a specified time, will it be removed. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
Yehuda Sadeh authored
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
The auth_x protocol implements support for a kerberos-like mutual authentication infrastructure used by Ceph. We do not simply use vanilla kerberos because of scalability and performance issues when dealing with a large cluster of nodes providing a single logical service. Auth_x provides mutual authentication of client and server and protects against replay and man in the middle attacks. It does not encrypt the full session over the wire, however, so data payload may still be snooped. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
- 10 Feb, 2010 4 commits
-
-
Sage Weil authored
Inlucde struct version in encoding. This will streamline future protocol changes. Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
Add infrastructure to allow the mon_client to periodically renew its auth credentials. Also add a messenger callback that will force such a renewal if a peer rejects our authenticator. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
Helpers to encrypt/decrypt AES and base64. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
Helper for decoding into a ceph_buffer, and other misc decoding helpers we will need. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
- 03 Feb, 2010 2 commits
-
-
Sage Weil authored
We release all the pages, even if the osd response was different than the number of pages written. This could only happen due to truncation that arrives the osd in different order, for which we want the pages released anyway. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
Yehuda Sadeh authored
This fixes a bug where the read/write ops arrive the osd after a following truncation request. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
- 29 Jan, 2010 2 commits
-
-
Yehuda Sadeh authored
We never truncate to a smaller size without contacting the MDS. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
Include a type/version in ceph_entity_addr and filepath. Include extra byte in filepath encoding as necessary. Signed-off-by: Sage Weil <sage@newdream.net>
-
- 26 Jan, 2010 1 commit
-
-
Sage Weil authored
Signed-off-by: Sage Weil <sage@newdream.net>
-
- 25 Jan, 2010 7 commits
-
-
Yehuda Sadeh authored
This includes treating all the data preallocation and revokation at the same place, not having to have a special case for the reserved pages. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
-
Yehuda Sadeh authored
Now doing it in the same callback that is also responsible for allocating the 'front' part of the message. If we get a message that we haven't got a corresponding tid for, mark it for skipping. Moving the mutex unlock/lock from the osd alloc_msg callback to the calling function in the messenger. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
-
Yehuda Sadeh authored
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
-
Yehuda Sadeh authored
Both front and middle parts of the message are now being allocated at the ceph_alloc_msg(). Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
-
Sage Weil authored
Previously, if the MDS request was interrupted, we would unregister the request and ignore any reply. This could cause the caps or other cache state to become out of sync. (For instance, aborting dbench and doing rm -r on clients would complain about a non-empty directory because the client didn't realize it's aborted file create request completed.) Even we don't unregister, we still can't process the reply normally because we are no longer holding the caller's locks (like the dir i_mutex). So, mark aborted operations with r_aborted, and in the reply handler, be sure to process all the caps. Do not process the namespace changes, though, since we no longer will hold the dir i_mutex. The dentry lease state can also be ignored as it's more forgiving. Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
CEPH_MDS_OP_CREATE was not correctly marked as a write operation. Signed-off-by: Sage Weil <sage@newdream.net>
-
Julia Lawall authored
The variable client is initialized twice to the same (side effect-free) expression. Drop one initialization. A simplified version of the semantic match that finds this problem is: (http://coccinelle.lip6.fr/) // <smpl> @forall@ idexpression *x; identifier f!=ERR_PTR; @@ x = f(...) ... when != x ( x = f(...,<+...x...+>,...) | * x = f(...) ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Sage Weil <sage@newdream.net>
-
- 14 Jan, 2010 3 commits
-
-
Sage Weil authored
Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
The ceph_entity_addr erank field is obsolete; remove it. Get rid of trivial addr comparison helpers while we're at it. Signed-off-by: Sage Weil <sage@newdream.net>
-
Yehuda Sadeh authored
This fixes a bug, where we had the parent list have dentries with offsets that are not monotonically increasing, which caused the ceph dcache_readdir to skip entries. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
- 07 Jan, 2010 1 commit
-
-
Yehuda Sadeh authored
The function was broken in the case where there was more than one page involved, broke the ceph sync_write case. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
- 23 Dec, 2009 8 commits
-
-
Sage Weil authored
Use the ceph_pagelist to encode the MDS reconnect message. We change the message encoding (protocol change!) at the same time to make our life easier (we don't know how many snaprealms we have when we start encoding). An empty message implies the session is closed/does not exist. Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
The ceph_pagelist is a simple list of whole pages, strung together via their lru list_head. It facilitates encoding to a "buffer" of unknown size. Allow its use in place of the ceph_msg page vector. This will be used to fix the huge buffer preallocation woes of MDS reconnection. Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
Define supported and required feature set. Fail connection if the server requires features we do not support (TAG_FEATURES), or if the server does not support features we require. Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
Many (most?) message types include a transaction id. By including it in the fixed size header, we always have it available even when we are unable to allocate memory for the (larger, variable sized) message body. This will allow us to error out the appropriate request instead of (silently) dropping the reply. Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
When we issue an OSD read, we specify a vector of pages that the data is to be read into. The request may be sent multiple times, to multiple OSDs, if the osdmap changes, which means we can get more than one reply. Only read data into the page vector if the reply is coming from the OSD we last sent the request to. Keep track of which connection is using the vector by taking a reference. If another connection was already using the vector before and a new reply comes in on the right connection, revoke the pages from the other connection. Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
Use a single mutex (previously out_mutex) to protect both read and write activity from concurrent ceph_con_* calls. Drop the mutex when doing callbacks to avoid nested locking (the callback may need to call something like ceph_con_close). Signed-off-by: Sage Weil <sage@newdream.net>
-
Sage Weil authored
Canceled or timed out osd requests were getting left in the request list and never deallocated (until umount). Unregister if they are canceled (control-c) or time out. Signed-off-by: Sage Weil <sage@newdream.net>
-