Commits · a105f00cf17d711e876b3dc67e15f9a89b7de5a3 · Kirill Smelkov / linux · GitLab

17 Feb, 2010 4 commits

ceph: use rbtree for snap_realms · a105f00c

Sage Weil authored Feb 15, 2010

Switch from radix tree to rbtree for snap realms.  This is much more
appropriate given that realm keys are few and far between.
Signed-off-by: Sage Weil <sage@newdream.net>

a105f00c

ceph: use rbtree for mds requests · 44ca18f2

Sage Weil authored Feb 15, 2010

The rbtree is a more appropriate data structure than a radix_tree.  It
avoids extra memory usage and simplifies the code.

It also fixes a bug where the debugfs 'mdsc' file wasn't including the
most recent mds request.
Signed-off-by: Sage Weil <sage@newdream.net>

44ca18f2
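
A rough user-space sketch of why an ordered tree suits sparse keys: it stores one node per key no matter how far apart the keys are. This is a plain, unbalanced binary search tree, not the kernel rbtree (which also rebalances on insert), and `struct req_node`, `req_insert`, `req_lookup` and `req_last` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request record, ordered by transaction id (tid). */
struct req_node {
    uint64_t tid;
    struct req_node *left, *right;
};

/* Insert n under *root; returns 0, or -1 if the tid is already present. */
int req_insert(struct req_node **root, struct req_node *n)
{
    struct req_node **p = root;

    assert(root && n);
    while (*p) {
        if (n->tid < (*p)->tid)
            p = &(*p)->left;
        else if (n->tid > (*p)->tid)
            p = &(*p)->right;
        else
            return -1;
    }
    n->left = n->right = NULL;
    *p = n;
    return 0;
}

/* Exact-match lookup by tid; NULL if absent. */
struct req_node *req_lookup(struct req_node *root, uint64_t tid)
{
    while (root && root->tid != tid)
        root = tid < root->tid ? root->left : root->right;
    return root;
}

/* Node with the highest tid (the rightmost node); NULL if the tree is empty. */
struct req_node *req_last(struct req_node *root)
{
    while (root && root->right)
        root = root->right;
    return root;
}
```

Because the tree is ordered, walking it in order visits every request exactly once, including the one with the highest tid.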

ceph: cancel delayed work when closing connection · 91e45ce3

Sage Weil authored Feb 15, 2010

This ensures that if/when we reopen the connection, we can requeue work on
the connection immediately, without waiting for an old timer to expire.
Queue new delayed work inside con->mutex to avoid any race.

This fixes problems with clients failing to reconnect to the MDS due to
the client_reconnect message arriving too late (due to waiting for an old
delayed work timeout to expire).
Signed-off-by: Sage Weil <sage@newdream.net>

91e45ce3
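
The effect of cancelling the timer on close can be modelled in user space. The sketch below is a single-threaded simulation with hypothetical names (`struct conn`, `con_queue_work`, `con_close`, `con_open`); it assumes only that queueing delayed work is a no-op while the same work item is still pending, which is how the kernel's `queue_delayed_work()` behaves.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical connection with a single delayed work item. */
struct conn {
    bool open;
    bool work_pending;        /* a timer is armed for the work item */
    unsigned long work_due;   /* time at which the work will run */
};

/* Arm the work item; does nothing if it is already pending. */
bool con_queue_work(struct conn *c, unsigned long now, unsigned long delay)
{
    assert(c);
    if (c->work_pending)
        return false;
    c->work_pending = true;
    c->work_due = now + delay;
    return true;
}

/* Close the connection and cancel any armed timer. */
void con_close(struct conn *c)
{
    assert(c);
    c->open = false;
    c->work_pending = false;  /* stands in for cancelling the delayed work */
}

/*
 * Reopen and queue work immediately. The commit does the queueing
 * under con->mutex; locking is omitted in this single-threaded sketch.
 */
void con_open(struct conn *c, unsigned long now)
{
    assert(c);
    c->open = true;
    con_queue_work(c, now, 0);
}
```

Without the cancel in `con_close()`, the `con_queue_work()` call in `con_open()` would return false and the work would still wait for the old deadline.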

ceph: allow connection to be reopened by fault callback · e2663ab6

Sage Weil authored Feb 16, 2010

Fix the messenger to allow a ceph_con_open() during the fault callback.
Previously the work wasn't getting queued on the connection because the
fault path avoids requeued work (normally spurious).  Loop on reopening by
checking for the OPENING state bit.

This fixes OSD reconnects when a TCP connection drops.
Signed-off-by: Sage Weil <sage@newdream.net>

e2663ab6

15 Feb, 2010 1 commit

ceph: reset osd connections after fault · 153a008b

Sage Weil authored Feb 15, 2010

A single osd connection fault (e.g. tcp disconnect) wasn't
reopening the connection, which causes all current and future
requests for that osd to hang.
Signed-off-by: Sage Weil <sage@newdream.net>

153a008b

14 Feb, 2010 1 commit

ceph: fix msgr to keep sent messages until acked · 6c5d1a49

Sage Weil authored Feb 13, 2010

The test was backwards from commit b3d1dbbd: keep the message if the
connection _isn't_ lossy.  This allows the client to continue when the
TCP connection drops for some reason (network glitch) but both ends
survive.
Signed-off-by: Sage Weil <sage@newdream.net>

6c5d1a49
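
A minimal sketch of the intended rule, assuming a simplified connection that tracks messages by sequence number only: on a lossless connection, sent messages stay around until acked and are requeued after a fault, while a lossy connection drops them once sent. All names here (`struct conn`, `conn_queue`, `conn_send_one`, `conn_ack`, `conn_fault`) are hypothetical and not taken from the messenger code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_MSGS 16

/* seqs[0..nsent) are sent but unacked; the next nqueued entries await sending. */
struct conn {
    bool lossy;
    uint64_t seqs[MAX_MSGS];
    size_t nsent;
    size_t nqueued;
};

/* Append a message to the send queue; returns -1 if the queue is full. */
int conn_queue(struct conn *c, uint64_t seq)
{
    assert(c);
    if (c->nsent + c->nqueued == MAX_MSGS)
        return -1;
    c->seqs[c->nsent + c->nqueued++] = seq;
    return 0;
}

/* "Send" the next queued message; keep it only if the connection isn't lossy. */
bool conn_send_one(struct conn *c)
{
    if (c->nqueued == 0)
        return false;
    if (c->lossy)   /* nothing is kept for resend */
        memmove(&c->seqs[c->nsent], &c->seqs[c->nsent + 1],
                (c->nqueued - 1) * sizeof(c->seqs[0]));
    else            /* the message moves to the sent-but-unacked range */
        c->nsent++;
    c->nqueued--;
    return true;
}

/* The peer acked everything up to and including 'acked': drop those messages. */
void conn_ack(struct conn *c, uint64_t acked)
{
    size_t drop = 0;

    while (drop < c->nsent && c->seqs[drop] <= acked)
        drop++;
    memmove(c->seqs, c->seqs + drop,
            (c->nsent - drop + c->nqueued) * sizeof(c->seqs[0]));
    c->nsent -= drop;
}

/* Connection fault: unacked messages go back on the queue for resend. */
void conn_fault(struct conn *c)
{
    c->nqueued += c->nsent;
    c->nsent = 0;
}
```

After a fault on a lossless connection, every message the peer has not acked is back at the head of the queue in its original order.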

11 Feb, 2010 14 commits

ceph: remove bogus invalidate_mapping_pages · 80310491

Sage Weil authored Feb 09, 2010

We were invalidating mapping pages when dropping FILE_CACHE in
__send_cap().  But ceph_check_caps attempts to invalidate already, and
also checks for success, so we should never get to this point.
Signed-off-by: Sage Weil <sage@newdream.net>

80310491

ceph: invalidate pages even if truncate is pending · 0840d8af

Sage Weil authored Feb 09, 2010

There is no reason not to invalidate pages when a truncate is pending.
Both throw out page cache pages.
Signed-off-by: Sage Weil <sage@newdream.net>

0840d8af

ceph: cleanup async writeback, truncation, invalidate helpers · 3c6f6b79

Sage Weil authored Feb 09, 2010

Grab inode ref in helper.  Make work functions static, with consistent
naming.
Signed-off-by: Sage Weil <sage@newdream.net>

3c6f6b79

ceph: fix sync read eof check deadlock · 6a026589

Sage Weil authored Feb 09, 2010

If a sync read gets a short result from the OSD, it may need to do a
getattr to see if it is short due to reaching end-of-file. The getattr
was being done while holding a reference to FILE_RD, which can lead to
a deadlock if the MDS is revoking that capability bit and can't process
the getattr until it does.

We fix this by setting a flag if EOF size validation is needed, and doing
the getattr in ceph_aio_read, after the RD cap ref is dropped. If the
read needs to be continued, we loop and continue traversing the file.
Signed-off-by: Sage Weil <sage@newdream.net>

6a026589

ceph: do not retain caps that are being revoked · 68c28323

Sage Weil authored Feb 09, 2010

Never retain caps in __send_cap() that are being revoked.
Signed-off-by: Sage Weil <sage@newdream.net>

68c28323

ceph: cap revocation fixes · cbd03635

Sage Weil authored Feb 09, 2010

Try to invalidate pages in ceph_check_caps() if FILE_CACHE is being
revoked.  If we fail, queue an immediate async invalidate if FILE_CACHE
is being revoked.  (If it's not being revoked, we just queue the caps
for later evaluation, as per the old behavior.)
Signed-off-by: Sage Weil <sage@newdream.net>

cbd03635

ceph: sync read/write considers page cache · 29065a51

Yehuda Sadeh authored Feb 09, 2010

In the cases where we either do a sync read or a write, we
need to make sure that everything in the page cache is flushed.
In the case of a sync write we invalidate the relevant pages,
so that subsequent read/write reflects the new data written.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

29065a51

ceph: fix truncation when not holding caps · 3d497d85

Yehuda Sadeh authored Feb 09, 2010

A truncation should occur when either we have the
specified caps for the file, or (in cases where we are
not the only ones referencing the file) when it is mapped
or when it is opened. The latter two cases were not
handled.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

3d497d85

ceph: refactor ceph_write_begin, fix ceph_page_mkwrite · 4af6b225

Yehuda Sadeh authored Feb 09, 2010

Originally ceph_page_mkwrite called ceph_write_begin, hoping that
the returned locked page would be the page that it was requested
to mkwrite. Factored out relevant part of ceph_page_mkwrite and
we lock the right page anyway.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

4af6b225

ceph: fix short synchronous reads · 972f0d3a

Yehuda Sadeh authored Feb 04, 2010

Zeroing of holes was not done correctly: page_off was miscalculated and
zeroing the tail did not adjust the 'read' value to include the zeroed
portion.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

972f0d3a

ceph: add uid field to ceph_pg_pool · 02f90c61

Sage Weil authored Feb 04, 2010

Also verify encoding version as we go.
Signed-off-by: Sage Weil <sage@newdream.net>

02f90c61

ceph: put unused osd connections on lru · f5a2041b

Yehuda Sadeh authored Feb 03, 2010

Instead of removing an osd connection immediately when its
requests list is empty, put the osd connection on an lru.
It is removed only if that osd has not been used for more than a
specified time.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

f5a2041b
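
The idle-timeout policy can be sketched in user space as a FIFO list with an expiry time per entry. This is an illustration under assumptions, not the osd client's code: `struct osd`, `struct osd_lru`, `lru_put`, `lru_take` and `lru_expire` are hypothetical names, time is a plain counter, and counter wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>

struct osd {
    int id;
    unsigned long lru_expiry;   /* idle osd may be removed after this time */
    struct osd *lru_next;
};

struct osd_lru {
    struct osd *head, *tail;    /* head is the longest-idle osd */
    unsigned long ttl;          /* how long an idle osd is kept */
};

/* The osd has no more requests: park it on the lru instead of freeing it. */
void lru_put(struct osd_lru *lru, struct osd *o, unsigned long now)
{
    assert(lru && o);
    o->lru_expiry = now + lru->ttl;
    o->lru_next = NULL;
    if (lru->tail)
        lru->tail->lru_next = o;
    else
        lru->head = o;
    lru->tail = o;
}

/* The osd is needed again: take it off the lru. Returns 1 if it was there. */
int lru_take(struct osd_lru *lru, struct osd *o)
{
    struct osd *prev = NULL, *cur = lru->head;

    while (cur && cur != o) {
        prev = cur;
        cur = cur->lru_next;
    }
    if (!cur)
        return 0;
    if (prev)
        prev->lru_next = cur->lru_next;
    else
        lru->head = cur->lru_next;
    if (lru->tail == cur)
        lru->tail = prev;
    cur->lru_next = NULL;
    return 1;
}

/* Unlink every osd idle for more than ttl; returns how many were removed. */
size_t lru_expire(struct osd_lru *lru, unsigned long now)
{
    size_t n = 0;

    while (lru->head && lru->head->lru_expiry < now) {
        struct osd *o = lru->head;

        lru->head = o->lru_next;
        if (!lru->head)
            lru->tail = NULL;
        o->lru_next = NULL;     /* a real client would close the connection here */
        n++;
    }
    return n;
}
```

Since every entry gets the same ttl and is appended at the tail, the head always expires first, so `lru_expire()` can stop at the first entry that is still fresh.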

ceph: remove unused variable · b056c876

Yehuda Sadeh authored Feb 03, 2010

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

b056c876

ceph: add support for auth_x authentication protocol · ec0994e4

Sage Weil authored Feb 02, 2010

The auth_x protocol implements support for a kerberos-like mutual
authentication infrastructure used by Ceph.  We do not simply use vanilla
kerberos because of scalability and performance issues when dealing with
a large cluster of nodes providing a single logical service.

Auth_x provides mutual authentication of client and server and protects
against replay and man-in-the-middle attacks.  It does not encrypt
the full session over the wire, however, so data payload may still be
snooped.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ec0994e4

10 Feb, 2010 4 commits

ceph: add struct version to auth encoding · 07c8739c

Sage Weil authored Feb 04, 2010

Include struct version in encoding. This will streamline future protocol
changes.
Signed-off-by: Sage Weil <sage@newdream.net>

07c8739c
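
One way to picture a versioned encoding is a leading version byte that the decoder checks before touching the payload. The sketch below is illustrative only: `struct rec`, `REC_STRUCT_V`, `rec_encode` and `rec_decode` are hypothetical, the field layout is invented, and rejecting every version other than the current one is an assumed policy, not necessarily what the ceph decoders do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REC_STRUCT_V    1   /* hypothetical current encoding version */
#define REC_ENCODED_LEN 9   /* 1 version byte + two little-endian u32 fields */

struct rec {
    uint32_t id;
    uint32_t flags;
};

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = v >> 24;
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Encode r into buf; returns the number of bytes written, or 0 if buf is too small. */
size_t rec_encode(uint8_t *buf, size_t len, const struct rec *r)
{
    assert(buf && r);
    if (len < REC_ENCODED_LEN)
        return 0;
    buf[0] = REC_STRUCT_V;      /* the version precedes the struct fields */
    put_le32(buf + 1, r->id);
    put_le32(buf + 5, r->flags);
    return REC_ENCODED_LEN;
}

/* Decode buf into r; returns 0 on success, -1 on short input or unknown version. */
int rec_decode(const uint8_t *buf, size_t len, struct rec *r)
{
    assert(buf && r);
    if (len < 1 || buf[0] != REC_STRUCT_V)
        return -1;
    if (len < REC_ENCODED_LEN)
        return -1;
    r->id = get_le32(buf + 1);
    r->flags = get_le32(buf + 5);
    return 0;
}
```

With the version up front, a later format change only has to bump the version and teach the decoder the new layout; old decoders fail cleanly instead of misreading fields.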

ceph: allow renewal of auth credentials · 9bd2e6f8

Sage Weil authored Feb 02, 2010

Add infrastructure to allow the mon_client to periodically renew its auth
credentials.  Also add a messenger callback that will force such a renewal
if a peer rejects our authenticator.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

9bd2e6f8

ceph: aes crypto and base64 encode/decode helpers · 8b6e4f2d

Sage Weil authored Feb 02, 2010

Helpers to encrypt/decrypt AES and base64.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

8b6e4f2d
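
For reference, a base64 encode helper can look roughly like the user-space sketch below. It uses the standard alphabet with `=` padding; whether the kernel helper matches this byte for byte (line wrapping, for instance) is not stated in the commit message, so treat `b64_encode` as a hypothetical stand-in.

```c
#include <assert.h>
#include <stddef.h>

static const char b64_table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/*
 * Encode len bytes from src into dst as NUL-terminated base64.
 * dst must hold at least 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the encoded length, not counting the terminating NUL.
 */
size_t b64_encode(char *dst, const unsigned char *src, size_t len)
{
    size_t i = 0, o = 0;

    assert(dst && (src || len == 0));

    /* Full 3-byte groups become 4 output characters. */
    while (i + 2 < len) {
        unsigned v = (unsigned)src[i] << 16 | (unsigned)src[i + 1] << 8 | src[i + 2];

        dst[o++] = b64_table[v >> 18];
        dst[o++] = b64_table[(v >> 12) & 63];
        dst[o++] = b64_table[(v >> 6) & 63];
        dst[o++] = b64_table[v & 63];
        i += 3;
    }

    /* One or two leftover bytes are padded with '='. */
    if (i < len) {
        unsigned v = (unsigned)src[i] << 16;

        if (i + 1 < len)
            v |= (unsigned)src[i + 1] << 8;
        dst[o++] = b64_table[v >> 18];
        dst[o++] = b64_table[(v >> 12) & 63];
        dst[o++] = (i + 1 < len) ? b64_table[(v >> 6) & 63] : '=';
        dst[o++] = '=';
    }
    dst[o] = '\0';
    return o;
}
```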

ceph: buffer decoding helpers · c7e337d6

Sage Weil authored Feb 02, 2010

Helper for decoding into a ceph_buffer, and other misc decoding helpers
we will need.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

c7e337d6

03 Feb, 2010 2 commits

ceph: release all pages after successful osd write response · 79788c69

Sage Weil authored Feb 02, 2010

We release all the pages, even if the osd response was
different than the number of pages written. This could only
happen due to a truncation that arrives at the osd in a
different order, for which we want the pages released anyway.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

79788c69

ceph: always send truncation info with read and write osd ops · 0c948992

Yehuda Sadeh authored Feb 01, 2010

This fixes a bug where the read/write ops arrive at the osd after
a following truncation request.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

0c948992

29 Jan, 2010 2 commits

ceph: remove unreachable code · 0f26c4b2

Yehuda Sadeh authored Jan 29, 2010

We never truncate to a smaller size without contacting the MDS.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

0f26c4b2

ceph: include type in ceph_entity_addr, filepath · ac8839d7

Sage Weil authored Jan 27, 2010

Include a type/version in ceph_entity_addr and filepath.  Include extra
byte in filepath encoding as necessary.
Signed-off-by: Sage Weil <sage@newdream.net>

ac8839d7

26 Jan, 2010 1 commit

ceph: precede encoded ceph_pg_pool struct with version · 361be860

Sage Weil authored Jan 25, 2010

Signed-off-by: Sage Weil <sage@newdream.net>

361be860

25 Jan, 2010 7 commits

ceph: keep reserved replies on the request structure · 0d59ab81

Yehuda Sadeh authored Jan 13, 2010

This handles all the data preallocation and revocation
in the same place, with no special case for
the reserved pages.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

0d59ab81

ceph: alloc message data pages and check if tid exists · 0547a9b3

Yehuda Sadeh authored Jan 11, 2010

The data pages are now allocated in the same callback that is also
responsible for allocating the 'front' part of the message. If we get a
message for which we have no corresponding tid, mark it for skipping.

Also move the mutex unlock/lock from the osd alloc_msg callback
to the calling function in the messenger.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

0547a9b3

ceph: refactor messages data section allocation · 9d7f0f13

Yehuda Sadeh authored Jan 11, 2010

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

9d7f0f13

ceph: allocate middle of message before stating to read · 2450418c

Yehuda Sadeh authored Jan 08, 2010

Both front and middle parts of the message are now being
allocated at the ceph_alloc_msg().
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

2450418c

ceph: properly handle aborted mds requests · 5b1daecd

Sage Weil authored Jan 25, 2010

Previously, if the MDS request was interrupted, we would unregister the
request and ignore any reply. This could cause the caps or other cache
state to become out of sync. (For instance, aborting dbench and doing
rm -r on clients would complain about a non-empty directory because the
client didn't realize its aborted file create request completed.)

Even if we don't unregister, we still can't process the reply normally because
we are no longer holding the caller's locks (like the dir i_mutex).

So, mark aborted operations with r_aborted, and in the reply handler, be
sure to process all the caps. Do not process the namespace changes,
though, since we no longer will hold the dir i_mutex. The dentry lease
state can also be ignored as it's more forgiving.
Signed-off-by: Sage Weil <sage@newdream.net>

5b1daecd

ceph: mark MDS CREATE as a write op · 3ea25f94

Sage Weil authored Jan 25, 2010

CEPH_MDS_OP_CREATE was not correctly marked as a write operation.
Signed-off-by: Sage Weil <sage@newdream.net>

3ea25f94

ceph: remove duplicate variable initialization · ec7384ec

Julia Lawall authored Jan 20, 2010

The variable client is initialized twice to the same (side effect-free)
expression.  Drop one initialization.

A simplified version of the semantic match that finds this problem is:
(http://coccinelle.lip6.fr/)

// <smpl>
@forall@
idexpression *x;
identifier f!=ERR_PTR;
@@

x = f(...)
... when != x
(
x = f(...,<+...x...+>,...)
|
* x = f(...)
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Sage Weil <sage@newdream.net>

ec7384ec

14 Jan, 2010 3 commits

ceph: display pgid in debugfs osd request dump · 7740a42f

Sage Weil authored Jan 08, 2010

Signed-off-by: Sage Weil <sage@newdream.net>

7740a42f

ceph: remove unused erank field · 103e2d3a

Sage Weil authored Jan 07, 2010

The ceph_entity_addr erank field is obsolete; remove it.  Get rid of
trivial addr comparison helpers while we're at it.
Signed-off-by: Sage Weil <sage@newdream.net>

103e2d3a

ceph: change dentry offset and position after splice_dentry · 4baa75ef

Yehuda Sadeh authored Jan 07, 2010

This fixes a bug where the parent list had dentries with
offsets that were not monotonically increasing, which caused the ceph
dcache_readdir to skip entries.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

4baa75ef

07 Jan, 2010 1 commit

ceph: fix copy_user_to_page_vector() · 6a4ef481

Yehuda Sadeh authored Dec 31, 2009

The function was broken in the case where there was more than one page
involved, which broke the ceph sync_write case.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

6a4ef481
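
The multi-page case is where such a copy loop tends to go wrong, so a user-space sketch may help. It copies from an ordinary buffer with `memcpy()` rather than from user memory, and `copy_to_page_vector` and `PAGE_SZ` are hypothetical names; it shows the general technique, not the fixed kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096u

/*
 * Copy len bytes from src into the page vector, starting at byte
 * offset off (counted from the start of pages[0]). The caller must
 * supply enough pages to cover off + len. Returns the bytes copied.
 */
size_t copy_to_page_vector(unsigned char **pages, const unsigned char *src,
                           size_t off, size_t len)
{
    size_t i = off / PAGE_SZ;   /* index of the page being filled */
    size_t po = off % PAGE_SZ;  /* offset inside that page */
    size_t left = len;

    assert(pages && (src || len == 0));
    while (left > 0) {
        size_t n = PAGE_SZ - po;    /* room left in this page */

        if (n > left)
            n = left;
        memcpy(pages[i] + po, src, n);
        src += n;
        left -= n;
        po = 0;                 /* later pages are filled from their start */
        i++;
    }
    return len;
}
```

The details that matter are advancing the source pointer, resetting the in-page offset to zero, and moving to the next page after each chunk.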