- 14 Dec, 2020 40 commits
-
-
Ilya Dryomov authored
A couple whitespace fixups, no functional changes. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
A pure move, no other changes. Note that ceph_tcp_recv{msg,page}() and ceph_tcp_send{msg,page}() helpers are also moved. msgr2 will bring its own, more efficient, variants based on iov_iter. Switching msgr1 to them was considered but decided against to avoid subtle regressions. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
In preparation for msgr2, define internal messenger <-> protocol interface (as opposed to external messenger <-> client interface, which is struct ceph_connection_operations) consisting of try_read(), try_write(), revoke(), revoke_incoming(), opened(), reset_session() and reset_protocol() ops. The semantics are exactly the same as they are now. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
In preparation for msgr2, make all protocol independent functions in messenger.c global. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
In preparation for msgr2, make zero_page global. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
In preparation for msgr2, move the defines to the header file. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
In preparation for msgr2, rename msgr1 specific states and move the defines to the header file. Also drop state transition comments. They don't cover all possible transitions (e.g. NEGOTIATING -> STANDBY, etc) and currently do more harm than good. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
unsigned long is a leftover from when con->state used to be a set of bits managed with set_bit(), clear_bit(), etc. Save a bit of memory. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
Our messenger instance addr->port is normally zero -- anything else is nonsensical because as a client we connect to multiple servers and don't listen on any port. However, a user can supply an arbitrary addr:port via ip option and the port is currently preserved. Zero it. Conversely, make sure our addr->nonce is non-zero. A zero nonce is special: in combination with a zero port, it is used to blocklist the entire ip. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
Move the logic of grabbing the next message from the queue into its own function. Like ceph_con_in_msg_alloc(), this is protocol independent. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
ceph_con_in_msg_alloc() is protocol independent, but con->in_hdr (and struct ceph_msg_header in general) is msgr1 specific. While the struct is deeply ingrained inside and outside the messenger, con->in_hdr field can be separated. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
Make it possible to have local cursors and embed them outside struct ceph_msg. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
Make it easier to follow and remove dependency on msgr1 specific CEPH_MSGR_TAG_SEQ. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
It is set in process_ack() but never used. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
Stick with pr_info message because session reset isn't an error most of the time. When it is (i.e. if the server denies the reconnect attempt), we get a bunch of other pr_err messages. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
con->peer_global_seq is part of session state. Clear it when the server tells us to reset, not just in ceph_con_close(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
With just session reset bits left, rename appropriately. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
Move protocol reset bits into ceph_con_reset_protocol(), leaving just session reset bits. Note that con->out_skip is now reset on faults. This fixes a crash in the case of a stateful session getting a fault while in the middle of revoking a message. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
A fault due to a version mismatch or a feature set mismatch used to be treated differently from other faults: the connection would get closed without trying to reconnect and there was a ->bad_proto() connection op for notifying about that. This changed a long time ago, see commits 6384bb8b ("libceph: kill bad_proto ceph connection op") and 0fa6ebc6 ("libceph: fix protocol feature mismatch failure path"). Nowadays these aren't any different from other faults (i.e. we try to reconnect even though the mismatch won't resolve until the server is replaced). reset_connection() calls there are rather confusing because reset_connection() resets a session together an individual instance of the protocol. This is cleaned up in the next patch. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
The current setting allows the backoff to climb up to 5 minutes. This is too high -- it becomes hard to tell whether the client is stuck on something or just in backoff. In userspace, ms_max_backoff is defaulted to 15 seconds. Let's do the same. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
When we added the btime feature in mainline ceph, we had to extend struct ceph_mds_request_args so that it could be set. Implement the same in the kernel client. Rename ceph_mds_request_head with a _old extension, and a union ceph_mds_request_args_ext to allow for the extended size of the new header format. Add the appropriate code to handle both formats in struct create_request_message and key the behavior on whether the peer supports CEPH_FEATURE_FS_BTIME. The gid_list field in the payload is now populated from the saved credential. For now, we don't add any support for setting the btime via setattr, but this does enable us to add that in the future. [ idryomov: break unnecessarily long lines ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
We can always get the mdsc from the session, so there's no need to pass it in as a separate argument. Pass the session to __prepare_send_request as well, to prepare for later patches that will need to access it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
Replace req->r_uid/r_gid with an r_cred pointer and take a reference to that at the point where we previously would sample the two. Use that to populate the uid and gid in the header and release the reference when the request is freed. This should enable us to later add support for sending supplementary group lists in MDS requests. [ idryomov: break unnecessarily long lines ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
We already have a pointer to the argument struct in req->r_args. Use that instead of groveling around in the ceph_mds_request_head. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Xiubo Li authored
When setting the file/dir layout, it may need data pool info. So in mds server, it needs to check the osdmap. At present, if mds doesn't find the data pool specified, it will try to get the latest osdmap. Now if pass the osd epoch for setxattr, the mds server can only check this epoch of osdmap. URL: https://tracker.ceph.com/issues/48504Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Colin Ian King authored
The variable i is being initialized with a value that is never read and it is being updated later with a new value in a for-loop. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Luis Henriques authored
Add a new vxattr that allows userspace to list the caps for a specific directory or file. [ jlayton: change format delimiter to '/' ] Signed-off-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
Geng Jichao reported a rather complex deadlock involving several moving parts: 1) readahead is issued against an inode and some of its pages are locked while the read is in flight 2) the same inode is evicted from the cache, and this task gets stuck waiting for the page lock because of the above readahead 3) another task is processing a reply trace, and looks up the inode being evicted while holding the s_mutex. That ends up waiting for the eviction to complete 4) a write reply for an unrelated inode is then processed in the ceph_con_workfn job. It calls ceph_check_caps after putting wrbuffer caps, and that gets stuck waiting on the s_mutex held by 3. The reply to "1" is stuck behind the write reply in "4", so we deadlock at that point. This patch changes the trace processing to call ceph_get_inode outside of the s_mutex and snap_rwsem, which should break the cycle above. [ idryomov: break unnecessarily long lines ] URL: https://tracker.ceph.com/issues/47998Reported-by: Geng Jichao <gengjichao@jd.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Luis Henriques authored
This reverts commit dffdcd71. When doing a rename across quota realms, there's a corner case that isn't handled correctly. Here's a testcase: mkdir files limit truncate files/file -s 10G setfattr limit -n ceph.quota.max_bytes -v 1000000 mv files limit/ The above will succeed because ftruncate(2) won't immediately notify the MDSs with the new file size, and thus the quota realms stats won't be updated. Since the possible fixes for this issue would have a huge performance impact, the solution for now is to simply revert to returning -EXDEV when doing a cross quota realms rename. URL: https://tracker.ceph.com/issues/48203Signed-off-by: Luis Henriques <lhenriques@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Luis Henriques authored
While the MDS cluster is unstable and changing state the client may get mdsmap updates that will trigger warnings: [144692.478400] ceph: mdsmap_decode got incorrect state(up:standby-replay) [144697.489552] ceph: mdsmap_decode got incorrect state(up:standby-replay) [144697.489580] ceph: mdsmap_decode got incorrect state(up:standby-replay) This patch downgrades these warnings to debug, as they may flood the logs if the cluster is unstable for a while. Signed-off-by: Luis Henriques <lhenriques@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Luis Henriques authored
A NULL pointer dereference may occur in __ceph_remove_cap with some of the callbacks used in ceph_iterate_session_caps, namely trim_caps_cb and remove_session_caps_cb. Those callers hold the session->s_mutex, so they are prevented from concurrent execution, but ceph_evict_inode does not. Since the callers of this function hold the i_ceph_lock, the fix is simply a matter of returning immediately if caps->ci is NULL. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/43272Suggested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Luis Henriques <lhenriques@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
write_begin operations are passed a flags parameter that we need to mirror here, so that we don't (e.g.) recurse back into filesystem code inappropriately. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Xiubo Li authored
These two vxattrs will only exist in local client side, with which we can easily know which mountpoint the file belongs to and also they can help locate the debugfs path quickly. URL: https://tracker.ceph.com/issues/48057Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Xiubo Li authored
This will help list some useful client side info, like the client entity address/name and blocklisted status, etc. URL: https://tracker.ceph.com/issues/48057Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Liu, Changcheng authored
1. monitor's default port is defined by CEPH_MON_PORT 2. CEPH_PORT_START and CEPH_PORT_LAST are not needed. Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
The link count for a directory is defined as inode->i_subdirs + 2, (for "." and ".."). i_subdirs is only populated when Fs caps are held. Ensure we grab Fs caps when fetching the link count for a directory. [ idryomov: break unnecessarily long line ] URL: https://tracker.ceph.com/issues/48125Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Xiubo Li authored
For the old ceph version, if it received this one metric message containing the dentry lease metric info, it will just ignore it. URL: https://tracker.ceph.com/issues/43423Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-