Commit 677eb17e authored by Chuck Lever's avatar Chuck Lever Committed by Anna Schumaker

xprtrdma: Fix XDR tail buffer marshalling

Currently xprtrdma appends an extra chunk element to the RPC/RDMA
read chunk list of each NFSv4 WRITE compound. The extra element
contains the final GETATTR operation in the compound.

The result is an extra RDMA READ operation to transfer a very short
piece of each NFS WRITE compound (typically 16 bytes). This is
inefficient.

It is also incorrect.

The client is sending the trailing GETATTR at the same Position as
the preceding WRITE data payload. Whether or not RFC 5667 allows
the GETATTR to appear in a read chunk, RFC 5666 requires that these
two separate RPC arguments appear at two distinct Positions.

It can also be argued that the GETATTR operation is not bulk data,
and therefore RFC 5667 forbids its appearance in a read chunk at
all.

Although RFC 5667 is not precise about when using a read list with
NFSv4 COMPOUND is allowed, the intent is that only data arguments
not touched by NFS (ie, read and write payloads) are to be sent
using RDMA READ or WRITE.

The NFS client constructs GETATTR arguments itself, and therefore is
required to send the trailing GETATTR operation as additional inline
content, not as a data payload.

NB: This change is not backwards compatible. Some older servers do
not accept inline content following the read list. The Linux NFS
server should handle this content correctly as of commit
a97c331f ("svcrdma: Handle additional inline content").
Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
Tested-by: default avatarDevesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
parent 33943b29
...@@ -96,6 +96,42 @@ static bool rpcrdma_results_inline(struct rpc_rqst *rqst) ...@@ -96,6 +96,42 @@ static bool rpcrdma_results_inline(struct rpc_rqst *rqst)
return repsize <= RPCRDMA_INLINE_READ_THRESHOLD(rqst); return repsize <= RPCRDMA_INLINE_READ_THRESHOLD(rqst);
} }
static int
rpcrdma_tail_pullup(struct xdr_buf *buf)
{
size_t tlen = buf->tail[0].iov_len;
size_t skip = tlen & 3;
/* Do not include the tail if it is only an XDR pad */
if (tlen < 4)
return 0;
/* xdr_write_pages() adds a pad at the beginning of the tail
* if the content in "buf->pages" is unaligned. Force the
* tail's actual content to land at the next XDR position
* after the head instead.
*/
if (skip) {
unsigned char *src, *dst;
unsigned int count;
src = buf->tail[0].iov_base;
dst = buf->head[0].iov_base;
dst += buf->head[0].iov_len;
src += skip;
tlen -= skip;
dprintk("RPC: %s: skip=%zu, memmove(%p, %p, %zu)\n",
__func__, skip, dst, src, tlen);
for (count = tlen; count; count--)
*dst++ = *src++;
}
return tlen;
}
/* /*
* Chunk assembly from upper layer xdr_buf. * Chunk assembly from upper layer xdr_buf.
* *
...@@ -147,6 +183,10 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos, ...@@ -147,6 +183,10 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
if (len && n == nsegs) if (len && n == nsegs)
return -EIO; return -EIO;
/* When encoding the read list, the tail is always sent inline */
if (type == rpcrdma_readch)
return n;
if (xdrbuf->tail[0].iov_len) { if (xdrbuf->tail[0].iov_len) {
/* the rpcrdma protocol allows us to omit any trailing /* the rpcrdma protocol allows us to omit any trailing
* xdr pad bytes, saving the server an RDMA operation. */ * xdr pad bytes, saving the server an RDMA operation. */
...@@ -476,8 +516,8 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst) ...@@ -476,8 +516,8 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
headerp->rm_body.rm_nochunks.rm_empty[2] = xdr_zero; headerp->rm_body.rm_nochunks.rm_empty[2] = xdr_zero;
/* new length after pullup */ /* new length after pullup */
rpclen = rqst->rq_svec[0].iov_len; rpclen = rqst->rq_svec[0].iov_len;
} } else if (rtype == rpcrdma_readch)
rpclen += rpcrdma_tail_pullup(&rqst->rq_snd_buf);
if (rtype != rpcrdma_noch) { if (rtype != rpcrdma_noch) {
hdrlen = rpcrdma_create_chunks(rqst, &rqst->rq_snd_buf, hdrlen = rpcrdma_create_chunks(rqst, &rqst->rq_snd_buf,
headerp, rtype); headerp, rtype);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment