- 01 Apr, 2019 1 commit
-
-
Ilya Dryomov authored
BugLink: https://bugs.launchpad.net/bugs/1822271
commit 0fd3fd0a upstream.
The authorize reply can be empty, for example when the ticket used to build the authorizer is too old and TAG_BADAUTHORIZER is returned from the service. Calling ->verify_authorizer_reply() results in an attempt to decrypt and validate (somewhat) random data in au->buf (most likely the signature block from calc_signature()), which fails and ends up in con_fault_finish() with !con->auth_retry. The ticket isn't invalidated and the connection is retried again and again until a new ticket is obtained from the monitor:
    libceph: osd2 192.168.122.1:6809 bad authorize reply
    libceph: osd2 192.168.122.1:6809 bad authorize reply
    libceph: osd2 192.168.122.1:6809 bad authorize reply
    libceph: osd2 192.168.122.1:6809 bad authorize reply
Let TAG_BADAUTHORIZER handler kick in and increment con->auth_retry.
Cc: stable@vger.kernel.org
Fixes: 5c056fdc ("libceph...
-
- 14 Mar, 2019 1 commit
-
-
Ilya Dryomov authored
BugLink: https://bugs.launchpad.net/bugs/1818813
commit 4aac9228 upstream.
con_fault() can transition the connection into STANDBY right after ceph_con_keepalive() clears STANDBY in clear_standby():

    libceph user thread               ceph-msgr worker

    ceph_con_keepalive()
      mutex_lock(&con->mutex)
      clear_standby(con)
      mutex_unlock(&con->mutex)
                                      mutex_lock(&con->mutex)
                                      con_fault()
                                        ...
                                        if KEEPALIVE_PENDING isn't set
                                          set state to STANDBY
                                        ...
                                      mutex_unlock(&con->mutex)
      set KEEPALIVE_PENDING
      set WRITE_PENDING

This triggers warnings in clear_standby() when either ceph_con_send() or ceph_con_keepalive() get to clearing STANDBY next time.
I don't see a reason to condition queue_con() call on the previous value of KEEPALIVE_PENDING, so move the setting of KEEPALIVE_PENDING into the critical section -- unlike WRITE_PENDING, KEEPALIVE_PENDING could have been a non-atomic flag.
Reported-by: syzbot+acdeb633f6211ccdf886@syzkaller.appspotmail.com
Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Tested-by:
Myungho Jung <mhjungk@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Juerg Haefliger <juergh@canonical.com> Signed-off-by:
Khalid Elmously <khalid.elmously@canonical.com>
-
- 23 May, 2018 1 commit
-
-
Ilya Dryomov authored
BugLink: http://bugs.launchpad.net/bugs/1768825
commit 9c55ad1c upstream.
ceph_con_workfn() validates con->state before calling try_read() and then try_write(). However, try_read() temporarily releases con->mutex, notably in process_message() and ceph_con_in_msg_alloc(), opening the window for ceph_con_close() to sneak in, close the connection and release con->sock. When try_write() is called on the assumption that con->state is still valid (i.e. not STANDBY or CLOSED), a NULL sock gets passed to the networking stack:
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
    IP: selinux_socket_sendmsg+0x5/0x20
Make sure con->state is valid at the top of try_write() and add an explicit BUG_ON for this, similar to try_read().
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/23706
Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Reviewed-by:
Jason Dillaman <dillaman@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Juerg Haefliger <juergh@canonical.com> Signed-off-by:
Kleber Sacilotto de Souza <kleber.souza@canonical.com>
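A minimal user-space sketch of the guard this commit describes. The `CON_STATE_*` names, `struct conn` and `try_write_sketch()` are stand-ins assumed for illustration, not the kernel's definitions; the point is only that the write path re-checks the state itself rather than trusting a check made before the mutex was dropped.

```c
#include <stdbool.h>

/* Hypothetical connection states, named after those in the message above. */
enum con_state {
	CON_STATE_CLOSED,
	CON_STATE_PREOPEN,
	CON_STATE_CONNECTING,
	CON_STATE_NEGOTIATING,
	CON_STATE_OPEN,
	CON_STATE_STANDBY,
};

/* Stand-in for the connection object. */
struct conn {
	enum con_state state;
};

/* True only for states in which writing to the socket makes sense. */
static bool con_state_writable(enum con_state state)
{
	return state == CON_STATE_PREOPEN ||
	       state == CON_STATE_CONNECTING ||
	       state == CON_STATE_NEGOTIATING ||
	       state == CON_STATE_OPEN;
}

/*
 * Guard at the top of a try_write()-like function: the state may have
 * changed while the read path temporarily released the lock, so it is
 * validated again here. Returns 0 when there is nothing to do and 1 when
 * a write would be attempted.
 */
static int try_write_sketch(const struct conn *con)
{
	if (!con_state_writable(con->state))
		return 0;	/* CLOSED or STANDBY: the socket may be gone */

	return 1;
}
```

With this shape, a connection closed between the read and write halves simply produces no write attempt.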
-
- 27 Apr, 2017 1 commit
-
-
Ilya Dryomov authored
BugLink: http://bugs.launchpad.net/bugs/1681862
commit 633ee407 upstream.
sock_alloc_inode() allocates socket+inode and socket_wq with GFP_KERNEL, which is not allowed on the writeback path:
    Workqueue: ceph-msgr con_work [libceph]
    ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000
    0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00
    ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148
    Call Trace:
    [<ffffffff816dd629>] schedule+0x29/0x70
    [<ffffffff816e066d>] schedule_timeout+0x1bd/0x200
    [<ffffffff81093ffc>] ? ttwu_do_wakeup+0x2c/0x120
    [<ffffffff81094266>] ? ttwu_do_activate.constprop.135+0x66/0x70
    [<ffffffff816deb5f>] wait_for_completion+0xbf/0x180
    [<ffffffff81097cd0>] ? try_to_wake_up+0x390/0x390
    [<ffffffff81086335>] flush_work+0x165/0x250
    [<ffffffff81082940>] ? worker_detach_from_pool+0xd0/0xd0
    [<ffffffffa03b65b1>] xlog_cil_force_lsn+0x81/0x200 [xfs]
    [<ffffffff816d6b42>] ? __slab_free+0xee/0x234
    [<ffffffffa03b4b1d>] _xfs_log_force_lsn+0x4d/0x2c0 [xfs]
    [<ffffffff811adc1e>] ? lookup_page_cgroup_used+0xe/0x30
    [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
    [<ffffffffa03b4dcf>] xfs_log_force_lsn+0x3f/0xf0 [xfs]
    [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
    [<ffffffffa03a62c6>] xfs_iunpin_wait+0xc6/0x1a0 [xfs]
    [<ffffffff810aa250>] ? wake_atomic_t_function+0x40/0x40
    [<ffffffffa039a723>] xfs_reclaim_inode+0xa3/0x330 [xfs]
    [<ffffffffa039ac07>] xfs_reclaim_inodes_ag+0x257/0x3d0 [xfs]
    [<ffffffffa039bb13>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
    [<ffffffffa03ab745>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
    [<ffffffff811c0c18>] super_cache_scan+0x178/0x180
    [<ffffffff8115912e>] shrink_slab_node+0x14e/0x340
    [<ffffffff811afc3b>] ? mem_cgroup_iter+0x16b/0x450
    [<ffffffff8115af70>] shrink_slab+0x100/0x140
    [<ffffffff8115e425>] do_try_to_free_pages+0x335/0x490
    [<ffffffff8115e7f9>] try_to_free_pages+0xb9/0x1f0
    [<ffffffff816d56e4>] ? __alloc_pages_direct_compact+0x69/0x1be
    [<ffffffff81150cba>] __alloc_pages_nodemask+0x69a/0xb40
    [<ffffffff8119743e>] alloc_pages_current+0x9e/0x110
    [<ffffffff811a0ac5>] new_slab+0x2c5/0x390
    [<ffffffff816d71c4>] __slab_alloc+0x33b/0x459
    [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
    [<ffffffff8164bda1>] ? inet_sendmsg+0x71/0xc0
    [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
    [<ffffffff811a21f2>] kmem_cache_alloc+0x1a2/0x1b0
    [<ffffffff815b906d>] sock_alloc_inode+0x2d/0xd0
    [<ffffffff811d8566>] alloc_inode+0x26/0xa0
    [<ffffffff811da04a>] new_inode_pseudo+0x1a/0x70
    [<ffffffff815b933e>] sock_alloc+0x1e/0x80
    [<ffffffff815ba855>] __sock_create+0x95/0x220
    [<ffffffff815baa04>] sock_create_kern+0x24/0x30
    [<ffffffffa04794d9>] con_work+0xef9/0x2050 [libceph]
    [<ffffffffa04aa9ec>] ? rbd_img_request_submit+0x4c/0x60 [rbd]
    [<ffffffff81084c19>] process_one_work+0x159/0x4f0
    [<ffffffff8108561b>] worker_thread+0x11b/0x530
    [<ffffffff81085500>] ? create_worker+0x1d0/0x1d0
    [<ffffffff8108b6f9>] kthread+0xc9/0xe0
    [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90
    [<ffffffff816e1b98>] ret_from_fork+0x58/0x90
    [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90
Use memalloc_noio_{save,restore}() to temporarily force GFP_NOIO here.
Link: http://tracker.ceph.com/issues/19309
Reported-by:
Sergey Jerusalimov <wintchester@gmail.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Reviewed-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Tim Gardner <tim.gardner@canonical.com> Signed-off-by:
Stefan Bader <stefan.bader@canonical.com>
-
- 20 Jan, 2017 1 commit
-
-
Ilya Dryomov authored
BugLink: http://bugs.launchpad.net/bugs/1655041 commit 5c056fdc upstream. After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b), the client gets back a ceph_x_authorize_reply, which it is supposed to verify to ensure the authenticity and protect against replay attacks. The code for doing this is there (ceph_x_verify_authorizer_reply(), ceph_auth_verify_authorizer_reply() + plumbing), but it is never invoked by the messenger. AFAICT this goes back to 2009, when ceph authentication protocols support was added to the kernel client in 4e7a5dcd ("ceph: negotiate authentication protocol; implement AUTH_NONE protocol"). The second param of ceph_connection_operations::verify_authorizer_reply is unused all the way down. Pass 0 to facilitate backporting, and kill it in the next commit. Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Reviewed-by:
Sage Weil <sage@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Tim Gardner <tim.gardner@canonical.com> Signed-off-by:
Luis Henriques <luis.henriques@canonical.com>
-
- 06 Apr, 2016 3 commits
-
-
Ilya Dryomov authored
BugLink: http://bugs.launchpad.net/bugs/1553179 commit dbc0d3ca upstream. ceph_msg_footer is 21 bytes long, while ceph_msg_footer_old is only 13. Don't skip too much when CEPH_FEATURE_MSG_AUTH isn't negotiated. Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Reviewed-by:
Alex Elder <elder@linaro.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Tim Gardner <tim.gardner@canonical.com>
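The two footer sizes quoted in this commit can be checked with a small sketch. Only the 13 and 21 byte totals come from the message above; the field layout and the names `msg_footer_old`, `msg_footer` and `footer_size()` are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: three CRCs plus a flags byte, 13 bytes when packed. */
struct msg_footer_old {
	uint32_t front_crc, middle_crc, data_crc;
	uint8_t flags;
} __attribute__((packed));

/* Assumed layout: the same fields plus a 64-bit signature, 21 bytes. */
struct msg_footer {
	uint32_t front_crc, middle_crc, data_crc;
	uint64_t sig;
	uint8_t flags;
} __attribute__((packed));

/*
 * Number of footer bytes to read or skip for one message. Using the new
 * size unconditionally would skip 8 bytes too many when the signing
 * feature was not negotiated.
 */
static size_t footer_size(bool msg_auth_negotiated)
{
	return msg_auth_negotiated ? sizeof(struct msg_footer)
				   : sizeof(struct msg_footer_old);
}
```

The 8-byte difference is exactly the signature field, which is why skipping with the wrong size swallows the start of the next message.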
-
Ilya Dryomov authored
BugLink: http://bugs.launchpad.net/bugs/1553179 commit e7a88e82 upstream. The contract between try_read() and try_write() is that when called each processes as much data as possible. When instructed by osd_client to skip a message, try_read() is violating this contract by returning after receiving and discarding a single message instead of checking for more. try_write() then gets a chance to write out more requests, generating more replies/skips for try_read() to handle, forcing the messenger into a starvation loop. Reported-by:
Varada Kari <Varada.Kari@sandisk.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Tested-by:
Varada Kari <Varada.Kari@sandisk.com> Reviewed-by:
Alex Elder <elder@linaro.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Tim Gardner <tim.gardner@canonical.com>
-
Ilya Dryomov authored
BugLink: http://bugs.launchpad.net/bugs/1553179
commit 67645d76 upstream.
There are a number of problems with revoking a "was sending" message:
(1) We never make any attempt to revoke data - only kvecs contribute to con->out_skip. However, once the header (envelope) is written to the socket, our peer learns data_len and sets itself to expect at least data_len bytes to follow front or front+middle. If ceph_msg_revoke() is called while the messenger is sending message's data portion, anything we send after that call is counted by the OSD towards the now revoked message's data portion. The effects vary, the most common one is the eventual hang - higher layers get stuck waiting for the reply to the message that was sent out after ceph_msg_revoke() returned and treated by the OSD as a bunch of data bytes. This is what Matt ran into.
(2) Flat out zeroing con->out_kvec_bytes worth of bytes to handle kvecs is wrong. If ceph_msg_revoke() is called before the tag is sent out or while the messenger is sending the header, we will get a connection reset, either due to a bad tag (0 is not a valid tag) or a bad header CRC, which kind of defeats the purpose of revoke. Currently the kernel client refuses to work with header CRCs disabled, but that will likely change in the future, making this even worse.
(3) con->out_skip is not reset on connection reset, leading to one or more spurious connection resets if we happen to get a real one between the time con->out_skip is set in ceph_msg_revoke() and the time it's cleared in write_partial_skip().
Fixing (1) and (3) is trivial. The idea behind fixing (2) is to never zero the tag or the header, i.e. send out tag+header regardless of when ceph_msg_revoke() is called. That way the header is always correct, no unnecessary resets are induced and revoke stands ready for disabled CRCs. Since ceph_msg_revoke() rips out con->out_msg, introduce a new "message out temp" and copy the header into it before sending.
Reported-by:
Matt Conner <matt.conner@keepertech.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Tested-by:
Matt Conner <matt.conner@keepertech.com> Reviewed-by:
Sage Weil <sage@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Tim Gardner <tim.gardner@canonical.com>
-
- 02 Nov, 2015 5 commits
-
-
Ilya Dryomov authored
The following bit in ceph_msg_revoke_incoming() is unsafe:

    struct ceph_connection *con = msg->con;

    if (!con)
            return;

    mutex_lock(&con->mutex);
    <more msg->con use>

There is nothing preventing con from getting destroyed right after msg->con test. One easy way to reproduce this is to disable message signing only on the server side and try to map an image. The system will go into a

    libceph: read_partial_message ffff880073f0ab68 signature check failed
    libceph: osd0 192.168.255.155:6801 bad crc/signature
    libceph: read_partial_message ffff880073f0ab68 signature check failed
    libceph: osd0 192.168.255.155:6801 bad crc/signature

loop which has to be interrupted with Ctrl-C. Hit Ctrl-C and you are likely to end up with a random GP fault if the reset handler executes "within" ceph_msg_revoke_incoming():

                     <yet another reply w/o a signature>
                                   ...
          <Ctrl-C>
    rbd_obj_request_end
      ceph_osdc_cancel_request
        __unregister_request
          ceph_osdc_put_request
            ceph_msg_revoke_incoming
                                   ...
                                osd_reset
                                  __kick_osd_requests
                                    __reset_osd
                                      remove_osd
                                        ceph_con_close
                                          reset_connection
                                            <clear con->in_msg->con>
                                            <put con ref>
                                          put_osd
                                            <free osd/con>
                 <msg->con use> <-- !!!

If ceph_msg_revoke_incoming() executes "before" the reset handler, osd/con will be leaked because ceph_msg_revoke_incoming() clears con->in_msg but doesn't put con ref, while reset_connection() only puts con ref if con->in_msg != NULL.
The current msg->con scheme was introduced by commits 38941f80 ("libceph: have messages point to their connection") and 92ce034b ("libceph: have messages take a connection reference"), which defined when messages get associated with a connection and when that association goes away. Part of the problem is that this association is supposed to go away in far too many places; closing this race entirely requires either a rework of the existing or an addition of a new layer of synchronization.
In lieu of that, we can make it *much* less likely to hit by disassociating messages only on their destruction and resend through a different connection. This makes the code simpler and is probably a good thing to do regardless - this patch adds a msg_con_set() helper which is called from only three places: ceph_con_send() and ceph_con_in_msg_alloc() to set msg->con and ceph_msg_release() to clear it.
Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
Support for message signing was merged into 3.19, along with nocephx_require_signatures option. But, all that option does is allow the kernel client to talk to clusters that don't support MSG_AUTH feature bit. That's pretty useless, given that it's been supported since bobtail. Meanwhile, if one disables message signing on the server side with "cephx sign messages = false", it becomes impossible to use the kernel client since it expects messages to be signed if MSG_AUTH was negotiated. Add nocephx_sign_messages option to support this use case. Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
supported_features and required_features serve no purpose at all, while nocrc and tcp_nodelay belong to ceph_options::flags. Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
We can use msg->con instead - at the point we sign an outgoing message or check the signature on the incoming one, msg->con is always set. We wouldn't know how to sign a message without an associated session (i.e. msg->con == NULL) and being able to sign a message using an explicitly provided authorizer is of no use. Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Shraddha Barke authored
Use local variable cursor in place of &msg->cursor in read_partial_msg_data() and write_partial_msg_data(). Signed-off-by:
Shraddha Barke <shraddha.6596@gmail.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
- 17 Sep, 2015 1 commit
-
-
Ilya Dryomov authored
This
    struct ceph_timespec ceph_ts;
    ...
    con_out_kvec_add(con, sizeof(ceph_ts), &ceph_ts);
wraps ceph_ts into a kvec and adds it to con->out_kvec array, yet ceph_ts becomes invalid on return from prepare_write_keepalive(). As a result, we send out bogus keepalive2 stamps. Fix this by encoding into a ceph_timespec member, similar to how acks are read and written.
Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Reviewed-by:
Yan, Zheng <zyan@redhat.com>
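A user-space sketch of the lifetime issue in this commit. The structure and field names (`struct conn`, `out_temp_keepalive2`, `struct kvec_sketch`) are hypothetical stand-ins; the sketch only shows that a buffer queued by address must outlive the function that queues it, which is why the stamp is stored in the connection rather than on the stack.

```c
#include <stddef.h>
#include <stdint.h>

#define OUT_KVEC_MAX 4

/* Stand-ins for the timestamp and the (pointer, length) vector entry. */
struct ts { uint32_t tv_sec; uint32_t tv_nsec; };
struct kvec_sketch { void *base; size_t len; };

struct conn {
	struct kvec_sketch out_kvec[OUT_KVEC_MAX];
	int out_kvec_count;
	struct ts out_temp_keepalive2;	/* lives as long as the connection */
};

/* Queue a buffer by reference; nothing is copied at this point. */
static int con_out_kvec_add(struct conn *con, size_t size, void *data)
{
	if (con->out_kvec_count >= OUT_KVEC_MAX)
		return -1;
	con->out_kvec[con->out_kvec_count].base = data;
	con->out_kvec[con->out_kvec_count].len = size;
	con->out_kvec_count++;
	return 0;
}

/*
 * Encode the stamp into a member of the connection, then queue that
 * member. Queuing the address of a local variable instead would leave
 * out_kvec pointing at dead stack memory once this function returns.
 */
static int prepare_write_keepalive(struct conn *con, uint32_t sec,
				   uint32_t nsec)
{
	con->out_temp_keepalive2.tv_sec = sec;
	con->out_temp_keepalive2.tv_nsec = nsec;
	return con_out_kvec_add(con, sizeof(con->out_temp_keepalive2),
				&con->out_temp_keepalive2);
}
```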
-
- 09 Sep, 2015 1 commit
-
-
Ilya Dryomov authored
Only ->alloc_msg() should check data_len of the incoming message against the preallocated ceph_msg, doing it in the messenger is not right. The contract is that either ->alloc_msg() returns a ceph_msg which will fit all of the portions of the incoming message, or it returns NULL and possibly sets skip, signaling whether NULL is due to an -ENOMEM. ->alloc_msg() should be the only place where we make the skip/no-skip decision. I stumbled upon this while looking at con/osd ref counting. Right now, if we get a non-extent message with a larger data portion than we are prepared for, ->alloc_msg() returns a ceph_msg, and then, when we skip it in the messenger, we don't put the con/osd ref acquired in ceph_con_in_msg_alloc() (which is normally put in process_message()), so this also fixes a memory leak. An existing BUG_ON in ceph_msg_data_cursor_init() ensures we don't corrupt random memory should a buggy ->alloc_msg() return an unfit ceph_msg. While at it, I changed the "unknown tid" dout() to a pr_warn() to make sure all skips are seen and unified format strings. Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Reviewed-by:
Alex Elder <elder@linaro.org>
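The contract described in this commit can be sketched as follows. All names here (`struct msg`, `alloc_msg_sketch()`, `messenger_decide()`) are hypothetical, and the sketch assumes a preallocated reply message; it shows the size check living in the allocation callback while the messenger only interprets the result.

```c
#include <stddef.h>

/* Stand-in for a preallocated message with room for data_cap data bytes. */
struct msg { size_t data_cap; };

/*
 * Hypothetical ->alloc_msg(): prealloc is the message prepared for this
 * reply, or NULL if the reply is not expected. Returns a message that fits
 * data_len, or NULL with *skip set to 1 when the incoming message should
 * be discarded. NULL with *skip == 0 would signal an allocation failure;
 * this sketch never allocates, so it never takes that path.
 */
static struct msg *alloc_msg_sketch(struct msg *prealloc, size_t data_len,
				    int *skip)
{
	*skip = 0;
	if (!prealloc || data_len > prealloc->data_cap) {
		*skip = 1;
		return NULL;
	}
	return prealloc;
}

enum in_msg_action { IN_MSG_READ, IN_MSG_SKIP, IN_MSG_ENOMEM };

/* The messenger makes no size decision of its own. */
static enum in_msg_action messenger_decide(const struct msg *m, int skip)
{
	if (m)
		return IN_MSG_READ;
	return skip ? IN_MSG_SKIP : IN_MSG_ENOMEM;
}
```

Keeping the skip decision in one place means a reference taken while allocating is never stranded by a second, later check.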
-
- 08 Sep, 2015 3 commits
-
-
Yan, Zheng authored
Signed-off-by:
Yan, Zheng <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Ilya Dryomov authored
Even though it's static, con_work(), being a work func, shows up in various stacktraces a lot. Prefix it with ceph_. Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Benoît Canet authored
ceph_msgr_slab_init may fail due to a temporary ENOMEM. Delay the initialization of zero_page in ceph_msgr_init slightly and reorder its cleanup in _ceph_msgr_exit so that cleanup is done in reverse order of setup. Postponing the BUG_ON() does no harm should it be triggered. Signed-off-by:
Benoît Canet <benoit.canet@nodalink.com> Reviewed-by:
Alex Elder <elder@linaro.org> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
- 09 Jul, 2015 2 commits
-
-
Ilya Dryomov authored
addr_is_blank() should return true if family is neither AF_INET nor AF_INET6. This is what its counterpart entity_addr_t::is_blank_ip() is doing and it is the right thing to do: in process_banner() we check if our address is blank and if it is "learn" it from our peer. As it is, we never learn our address and always send out a blank one. This goes way back to ceph.git commit dd732cbfc1c9 ("use sockaddr_storage; and some ipv6 support groundwork") from 2009. While at it, do not open-code ipv6_addr_any() and use INADDR_ANY constant instead of 0. Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Reviewed-by:
Sage Weil <sage@redhat.com>
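A user-space sketch of the rule stated in this commit: an address whose family is neither AF_INET nor AF_INET6 counts as blank. It uses the POSIX `IN6_IS_ADDR_UNSPECIFIED()` macro in place of the kernel's `ipv6_addr_any()`, and takes a plain `struct sockaddr_storage`, which is an assumption made to keep the example standalone.

```c
#include <stdbool.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Returns true when the address carries no usable IP. */
static bool addr_is_blank(const struct sockaddr_storage *ss)
{
	switch (ss->ss_family) {
	case AF_INET: {
		struct sockaddr_in in4;

		memcpy(&in4, ss, sizeof(in4));
		return in4.sin_addr.s_addr == htonl(INADDR_ANY);
	}
	case AF_INET6: {
		struct sockaddr_in6 in6;

		memcpy(&in6, ss, sizeof(in6));
		return IN6_IS_ADDR_UNSPECIFIED(&in6.sin6_addr);
	}
	default:
		/* Unknown or unset family: nothing learned yet. */
		return true;
	}
}
```

A zero-filled address has family 0, so it falls into the default branch and is reported as blank, which is what lets the client learn its address from the peer.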
-
Ilya Dryomov authored
Grab a reference on a network namespace of the 'rbd map' (in case of rbd) or 'mount' (in case of ceph) process and use that to open sockets instead of always using init_net and bailing if network namespace is anything but init_net. Be careful to not share struct ceph_client instances between different namespaces and don't add any code in the !CONFIG_NET_NS case. This is based on a patch from Hong Zhiguo <zhiguohong@tencent.com>. Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Reviewed-by:
Sage Weil <sage@redhat.com>
-
- 29 Jun, 2015 1 commit
-
-
Benoît Canet authored
From struct ceph_msg_data_cursor in include/linux/ceph/messenger.h:
    bool last_piece; /* current is last piece */
In ceph_msg_data_next():
    *last_piece = cursor->last_piece;
A call to ceph_msg_data_next() is followed by:
    ret = ceph_tcp_sendpage(con->sock, page, page_offset, length, last_piece);
while ceph_tcp_sendpage() is:
    static int ceph_tcp_sendpage(struct socket *sock, struct page *page, int offset, size_t size, bool more)
The logic is inverted: correct it.
Signed-off-by:
Benoît Canet <benoit.canet@nodalink.com> Reviewed-by:
Alex Elder <elder@linaro.org> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
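The inversion in this commit comes down to passing a "last piece" flag where a "more data follows" flag is expected. A minimal sketch, with hypothetical flag constants and helper names standing in for the real socket flags:

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration only. */
#define SKETCH_MSG_MORE 0x1	/* more data follows this piece */
#define SKETCH_MSG_EOR  0x2	/* this piece ends the record */

/* The send helper speaks in terms of "more". */
static int sendpage_flags(bool more)
{
	return more ? SKETCH_MSG_MORE : SKETCH_MSG_EOR;
}

/* The cursor speaks in terms of "last piece", so negate at the boundary. */
static int flags_for_piece(bool last_piece)
{
	return sendpage_flags(!last_piece);
}
```

Passing `last_piece` straight through would mark every final piece as having more data to come and every earlier piece as the end.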
-
- 25 Jun, 2015 1 commit
-
-
Benoît Canet authored
ceph_tcp_sendpage already does the work of mapping/unmapping the zero page if needed. Signed-off-by:
Benoît Canet <benoit.canet@nodalink.com> Reviewed-by:
Alex Elder <elder@linaro.org> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
- 11 May, 2015 1 commit
-
-
Eric W. Biederman authored
This is long overdue, and is part of cleaning up how we allocate kernel sockets that don't reference count struct net. Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 20 Apr, 2015 1 commit
-
-
Ilya Dryomov authored
- specific con->error_msg messages (e.g. "protocol version mismatch") end up getting overwritten by a catch-all "socket error on read / write", introduced in commit 3a140a0d ("libceph: report socket read/write error message")
- "bad message sequence # for incoming message" loses to "bad crc" due to the fact that -EBADMSG is used for both
Fix it, and tidy up con->error_msg assignments and pr_errs while at it.
Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
- 07 Apr, 2015 1 commit
-
-
Ilya Dryomov authored
This reverts commit 89baaa57. Dirty page throttling should be sufficient for us in the general case so there is no need to use __GFP_MEMALLOC - it would be needed only in the swap-over-rbd case, which we currently don't support. (It would probably take approximately the commit that is being reverted to add that support, but we would also need the "swap" option to distinguish from the general case and make sure swap ceph_client-s aren't shared with anything else.) See ceph-devel threads [1] and [2] for the details of why enabling pfmemalloc reserves for all cases is a bad thing. On top of potential system lockups related to drained emergency reserves, this turned out to cause ceph lockups in case peers are on the same host and communicating via loopback due to sk_filter() dropping pfmemalloc skbs on the receiving side because the receiving loopback socket is not tagged with SOCK_MEMALLOC. [1] "SOCK_MEMALLOC vs loopback" ...
-
- 19 Feb, 2015 1 commit
-
-
Chaitanya Huilgol authored
Setting the TCP_NODELAY socket option on connection sockets disables Nagle’s algorithm and improves latency characteristics. The tcp_nodelay (default) and notcp_nodelay option flags are provided to enable or disable setting the socket option. Signed-off-by:
Chaitanya Huilgol <chaitanya.huilgol@sandisk.com> [idryomov@redhat.com: NO_TCP_NODELAY -> TCP_NODELAY, minor adjustments] Signed-off-by:
Ilya Dryomov <idryomov@redhat.com>
-
- 17 Dec, 2014 2 commits
-
-
Yan, Zheng authored
Signed-off-by:
Yan, Zheng <zyan@redhat.com>
-
Ilya Dryomov authored
Use kvfree() from linux/mm.h instead, which is identical. Also fix the ceph_buffer comment: we will allocate with kmalloc() up to 32k - the value of PAGE_ALLOC_COSTLY_ORDER, but that really is just an implementation detail so don't mention it at all. Signed-off-by:
Ilya Dryomov <idryomov@redhat.com>
-
- 30 Oct, 2014 1 commit
-
-
Mike Christie authored
This patch has ceph's lib code use the memalloc flags. If the VM layer needs to write data out to free up memory to handle new allocation requests, the block layer must be able to make forward progress. To handle that requirement we use structs like mempools to reserve memory for objects like bios and requests. The problem is when we send/receive block layer requests over the network layer, net skb allocations can fail and the system can lock up. To solve this, the memalloc related flags were added. NBD, iSCSI and NFS use these flags to tell the network/vm layer that it should use memory reserves to fulfill allocation requests for structs like skbs. I am running ceph in a bunch of VMs in my laptop, so this patch was not tested very harshly. Signed-off-by:
Mike Christie <michaelc@cs.wisc.edu> Reviewed-by:
Ilya Dryomov <idryomov@redhat.com>
-
- 14 Oct, 2014 3 commits
-
-
Ilya Dryomov authored
Commit f363e45f ("net/ceph: make ceph_msgr_wq non-reentrant") effectively removed WQ_MEM_RECLAIM flag from ceph_msgr_wq. This is wrong - libceph is very much a memory reclaim path, so restore it. Cc: stable@vger.kernel.org # needs backporting for < 3.12 Signed-off-by:
Ilya Dryomov <idryomov@redhat.com> Tested-by:
Micha Krause <micha@krausam.de> Reviewed-by:
Sage Weil <sage@redhat.com>
-
Yan, Zheng authored
This allows a pagelist to present data that may be sent multiple times. Signed-off-by:
Yan, Zheng <zyan@redhat.com> Reviewed-by:
Sage Weil <sage@redhat.com>
-
Joe Perches authored
Use the more common pr_warn.
Other miscellanea:
  o Coalesce formats
  o Realign arguments
Signed-off-by:
Joe Perches <joe@perches.com> Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com>
-
- 09 Aug, 2014 1 commit
-
-
Ilya Dryomov authored
Determining ->last_piece based on the value of ->page_offset + length is incorrect because length here is the length of the entire message. ->last_piece is set to false even if page array data item length is <= PAGE_SIZE, which results in invalid length passed to ceph_tcp_{send,recv}page() and causes various asserts to fire.
    # cat pages-cursor-init.sh
    #!/bin/bash
    rbd create --size 10 --image-format 2 foo
    FOO_DEV=$(rbd map foo)
    dd if=/dev/urandom of=$FOO_DEV bs=1M &>/dev/null
    rbd snap create foo@snap
    rbd snap protect foo@snap
    rbd clone foo@snap bar
    # rbd_resize calls librbd rbd_resize(), size is in bytes
    ./rbd_resize bar $(((4 << 20) + 512))
    rbd resize --size 10 bar
    BAR_DEV=$(rbd map bar)
    # trigger a 512-byte copyup -- 512-byte page array data item
    dd if=/dev/urandom of=$BAR_DEV bs=1M count=1 seek=5
The problem exists only in ceph_msg_data_pages_cursor_init(), ceph_msg_data_pages_advance() does the right thing. The size_t cast is unnecessary.
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@redhat.com> Reviewed-by:
Alex Elder <elder@linaro.org>
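A sketch of the corrected calculation, under the assumption of a 4096-byte page and with hypothetical names (`struct pages_cursor`, `pages_cursor_init()`). The residual length of the data item, not the length of the whole message, decides whether the first page is also the last piece.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

struct pages_cursor {
	size_t resid;			/* bytes of this data item left */
	unsigned int page_offset;	/* offset into the current page */
	bool last_piece;		/* current page is the last one */
};

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * data_length is the length of this page array data item, msg_length the
 * length of the entire message, alignment the item's offset within its
 * first page.
 */
static void pages_cursor_init(struct pages_cursor *c, size_t data_length,
			      unsigned int alignment, size_t msg_length)
{
	c->resid = min_size(msg_length, data_length);
	c->page_offset = alignment % SKETCH_PAGE_SIZE;
	/* Compare against resid; using msg_length here was the bug. */
	c->last_piece = c->page_offset + c->resid <= SKETCH_PAGE_SIZE;
}
```

For the 512-byte copyup in the reproducer, `resid` is 512, so the single page is correctly reported as the last piece even though the message itself is over 4 MiB.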
-
- 08 Jul, 2014 2 commits
-
-
Ilya Dryomov authored
queue_con() bumps osd ref count. We should do the reverse when canceling con work. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Alex Elder <elder@linaro.org>
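The reference pairing in this commit can be modelled in a few lines. The `struct conn` fields and both helpers below are simplified stand-ins, not the kernel code: queuing work takes a reference, so successfully cancelling that work has to drop it again.

```c
#include <stdbool.h>

struct conn {
	int refs;		/* stand-in for the osd/con reference count */
	bool work_pending;	/* stand-in for a queued work item */
};

/* Take a reference for the queued work; give it back if already queued. */
static bool queue_con(struct conn *con)
{
	con->refs++;
	if (con->work_pending) {
		con->refs--;
		return false;
	}
	con->work_pending = true;
	return true;
}

/* Cancelling pending work must drop the reference queue_con() took. */
static bool cancel_con(struct conn *con)
{
	if (!con->work_pending)
		return false;
	con->work_pending = false;
	con->refs--;
	return true;
}
```

Without the decrement in `cancel_con()`, each cancelled work item would leave the count one too high and the object would never be freed.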
-
Ilya Dryomov authored
Add dout()s to ceph_msg_{get,put}(). Also move them to .c and turn kref release callback into a static function. Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Alex Elder <elder@linaro.org>
-
- 16 May, 2014 1 commit
-
-
Chunwei Chen authored
It has been reported that using ZFSonLinux on rbd will result in memory corruption. The bug report can be found here: https://github.com/zfsonlinux/spl/issues/241 http://tracker.ceph.com/issues/7790 The reason is that ZFS will send pages with page_count 0 into rbd, which in turn sends them to tcp_sendpage. However, tcp_sendpage cannot deal with page_count 0, as it will do get_page and put_page, and erroneously free the page. This type of issue has been noted before, and handled in iscsi, drbd, etc. So, rbd should also handle this. This fix addresses the issue by falling back to the slower sendmsg when page_count 0 is detected. Cc: Sage Weil <sage@inktank.com> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: stable@vger.kernel.org Signed-off-by:
Chunwei Chen <tuxoko@gmail.com> Reviewed-by:
Ilya Dryomov <ilya.dryomov@inktank.com>
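The fallback decision in this commit reduces to a check on the page reference count. A sketch with hypothetical types (`struct page_sketch`, `enum send_path`); the real code operates on `struct page` and the kernel send helpers.

```c
/* Stand-in for a page; only the reference count matters here. */
struct page_sketch { int refcount; };

enum send_path { SEND_VIA_SENDPAGE, SEND_VIA_SENDMSG };

/*
 * The zero-copy path takes and drops a page reference. On a page whose
 * count is 0 that get/put pair would end with the page being freed, so
 * such pages go through the slower copying path instead.
 */
static enum send_path choose_send_path(const struct page_sketch *page)
{
	return page->refcount >= 1 ? SEND_VIA_SENDPAGE : SEND_VIA_SENDMSG;
}
```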
-
- 11 Apr, 2014 1 commit
-
-
David S. Miller authored
Several spots in the kernel perform a sequence like:
    skb_queue_tail(&sk->sk_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);
But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no implementation of this callback actually uses the length argument. And since nobody actually cared about its value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide.
Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 05 Apr, 2014 1 commit
-
-
Yan, Zheng authored
When there is no more data, ceph_msg_data_{pages,pagelist}_advance() should not move on to the next page. Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com>
-
- 07 Feb, 2014 1 commit
-
-
Ilya Dryomov authored
Commit f38a5181 ("ceph: Convert to immutable biovecs") introduced a NULL pointer dereference, which broke rbd in -rc1. Fix it. Cc: Kent Overstreet <kmo@daterainc.com> Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- 26 Jan, 2014 1 commit
-
-
Ilya Dryomov authored
Encapsulate kmalloc vs vmalloc memory allocation and freeing logic into two helpers, ceph_kvmalloc() and ceph_kvfree(), and switch to them. ceph_kvmalloc() kmalloc()'s a maximum of 8 pages, anything bigger is vmalloc()'ed with __GFP_HIGHMEM set. This changes the existing behaviour:
- for buffers (ceph_buffer_new()), from trying to kmalloc() everything and using vmalloc() just as a fallback
- for messages (ceph_msg_new()), from going to vmalloc() for anything bigger than a page
- for messages (ceph_msg_new()), from disallowing vmalloc() to use high memory
Signed-off-by:
Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
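The size threshold described in this commit can be sketched as a simple predicate. The 4096-byte page size and the names below are assumptions for illustration; only the "at most 8 pages via kmalloc(), anything bigger via vmalloc()" rule comes from the message above.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u		/* assumed page size */
#define SKETCH_KMALLOC_MAX_PAGES 8u	/* limit quoted in the message */

enum alloc_kind { ALLOC_KMALLOC, ALLOC_VMALLOC };

/* Which allocator a ceph_kvmalloc()-style helper would reach for first. */
static enum alloc_kind kvmalloc_first_choice(size_t size)
{
	size_t limit = (size_t)SKETCH_PAGE_SIZE * SKETCH_KMALLOC_MAX_PAGES;

	return size <= limit ? ALLOC_KMALLOC : ALLOC_VMALLOC;
}
```

With these assumed values the cutoff is 32768 bytes: a request of exactly that size still goes to the contiguous allocator, and one byte more does not.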
-