- 22 Mar, 2016 1 commit
-
-
J. Bruce Fields authored
You could add any multiple of 2^32/PNFS_SCSI_RANGE_SIZE to nr_iomaps and still pass this check. You'd probably still fail the following kcalloc, but best to be paranoid since this is from-the-wire data. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
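The wraparound described above can be sketched as follows. This is a minimal illustration, not the nfsd code: the helper names and the 16-byte `RANGE_SIZE` are assumptions chosen so the arithmetic is easy to follow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration only: one range is two 64-bit fields. */
#define RANGE_SIZE 16u

/* Naive check: the multiplication is done in 32 bits and can wrap,
 * so nr_iomaps + k * (2^32 / RANGE_SIZE) yields the same product. */
static bool len_matches_naive(uint32_t len, uint32_t nr_iomaps)
{
    uint32_t expected = (uint32_t)sizeof(uint32_t) + nr_iomaps * RANGE_SIZE;
    return len == expected;
}

/* Safer check: widen to 64 bits before multiplying so nothing wraps. */
static bool len_matches_checked(uint32_t len, uint32_t nr_iomaps)
{
    uint64_t expected = sizeof(uint32_t) + (uint64_t)nr_iomaps * RANGE_SIZE;
    return (uint64_t)len == expected;
}
```

With a 36-byte payload, both helpers accept a count of 2, but only the naive one also accepts 2 + 2^28.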
-
- 18 Mar, 2016 5 commits
-
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Christoph Hellwig authored
This is a simple extension to the block layout driver to use SCSI persistent reservations for access control and fencing, as well as SCSI VPD pages for device identification. For this we need to pass the nfs4_client to the proc_getdeviceinfo method to generate the reservation key, and add a new fence_client method to allow for fence actions in the layout driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Christoph Hellwig authored
Trivial reorganization, no change in behavior. Move some code around, pull some code out of block layoutcommit that will be useful for the scsi layout. [bfields@redhat.com: split off from "nfsd: add SCSI layout support"] Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Christoph Hellwig authored
Split the config symbols into a generic pNFS one, which is invisible and gets selected by the layout drivers, and one for the block layout driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Christoph Hellwig authored
This is a trivial extension to the block layout driver to support the new SCSI layouts draft. There are three changes: - device identification through the SCSI VPD page. This allows us to directly use the udev generated persistent device names instead of requiring an expensive lookup by crawling every block device node in /dev and reading a signature for it. - use of SCSI persistent reservations to protect device access and allow for robust fencing. On the client side this just means registering and unregistering a server supplied key. - an optimized LAYOUTCOMMIT payload that doesn't send unnecessary fields to the server. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 17 Mar, 2016 3 commits
-
-
Christoph Hellwig authored
Based on draft-ietf-nfsv4-scsi-layout-05 after the WG last call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
NeilBrown authored
sunrpc_cache_pipe_upcall() can detect a race if CACHE_PENDING is no longer set. In this case it aborts the queuing of the upcall. However it has already taken a new counted reference on "h" and doesn't "put" it, even though it frees the data structure holding the reference. So let's delay the "cache_get" until we know we need it. Fixes: f9e1aedc ("sunrpc/cache: remove races with queuing an upcall.") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Sudip Mukherjee authored
nfsd4_cltrack_grace_start() will allocate the memory for grace_start but when we returned due to error we missed freeing it. Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 16 Mar, 2016 1 commit
-
-
J. Bruce Fields authored
nfsd_lookup_dentry exits with the parent filehandle locked. fh_put also unlocks if necessary (nfsd filehandle locking is probably too lenient), so it gets unlocked eventually, but if the following op in the compound needs to lock it again, we can deadlock. A fuzzer ran into this; normal clients don't send a secinfo followed by a readdir in the same compound. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 02 Mar, 2016 1 commit
-
-
J. Bruce Fields authored
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 01 Mar, 2016 16 commits
-
-
Chuck Lever authored
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2 ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. This new API also aims each completion at a function that is specific to the WR's opcode. Thus the ctxt->wr_op field and the switch in process_context is replaced by a set of methods that handle each completion type. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. The server's rdma_stat_sq_poll and rdma_stat_sq_prod metrics are no longer updated. As a clean up, the cq_event_handler, the dto_tasklet, and all associated locking is removed, as they are no longer referenced or used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
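The per-completion method dispatch described above can be sketched in plain C. This is a simplified userspace model, not the IB core API: `cqe_sketch`, `wc_sketch`, `op_ctxt`, and `deliver()` are hypothetical names, and the real `ib_cqe` callback takes different arguments.

```c
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct cqe_sketch;

/* Hypothetical work completion: points back at the cqe of its request. */
struct wc_sketch {
    struct cqe_sketch *wr_cqe;
    int status;
};

/* Hypothetical completion entry: carries the method to call. */
struct cqe_sketch {
    void (*done)(struct wc_sketch *wc);
};

/* Hypothetical per-operation context embedding the completion entry. */
struct op_ctxt {
    int sends_done;
    int recvs_done;
    struct cqe_sketch cqe;
};

static void send_done(struct wc_sketch *wc)
{
    struct op_ctxt *ctxt = container_of(wc->wr_cqe, struct op_ctxt, cqe);
    ctxt->sends_done++;
}

static void recv_done(struct wc_sketch *wc)
{
    struct op_ctxt *ctxt = container_of(wc->wr_cqe, struct op_ctxt, cqe);
    ctxt->recvs_done++;
}

/* What a core poll loop would do: no opcode switch, just call the method. */
static void deliver(struct wc_sketch *wc)
{
    wc->wr_cqe->done(wc);
}
```

Because the handler is chosen when the request is posted, the consumer no longer needs an opcode field or a switch in a shared completion routine.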
-
Chuck Lever authored
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2 ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. svcrdma receive completions no longer use the dto_tasklet. Each polled Receive WC is now handled individually in soft IRQ context. The server transport's rdma_stat_rq_poll and rdma_stat_rq_prod metrics are no longer updated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Chuck Lever authored
Clean up: close_out is reached only when ctxt == NULL and XPT_CLOSE is already set. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Chuck Lever authored
RFC 5666 Section 4.2 states:
> When the peer detects an RPC-over-RDMA header version that it does not support (currently this document defines only version 1), it replies with an error code of ERR_VERS, and provides the low and high inclusive version numbers it does, in fact, support.
And:
> When other decoding errors are detected in the header or chunks, either an RPC decode error MAY be returned or the RPC/RDMA error code ERR_CHUNK MUST be returned.
The Linux NFS server does throw ERR_VERS when a client sends it a request whose rdma_version is not "one." But it does not return ERR_CHUNK when a header decoding error occurs. It just drops the request. To improve protocol extensibility, it should reject invalid values in the rdma_proc field instead of treating them all like RDMA_MSG. Otherwise clients can't detect when the server doesn't support new rdma_proc values. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
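The validation policy described above might look roughly like this. It is a sketch under assumptions: `check_rdma_header()` is a hypothetical helper, and the choice to accept only RDMA_MSG and RDMA_NOMSG is illustrative rather than taken from the patch. The numeric values follow RFC 5666.

```c
#include <stdint.h>

/* Procedure and error values as listed in RFC 5666. */
enum { RDMA_MSG = 0, RDMA_NOMSG = 1, RDMA_MSGP = 2, RDMA_DONE = 3, RDMA_ERROR = 4 };
enum { ERR_VERS = 1, ERR_CHUNK = 2 };

/* Hypothetical helper: returns 0 to accept the header, otherwise the
 * RPC-over-RDMA error code to send back instead of silently dropping. */
static int check_rdma_header(uint32_t vers, uint32_t proc)
{
    if (vers != 1)
        return ERR_VERS;

    switch (proc) {
    case RDMA_MSG:
    case RDMA_NOMSG:
        return 0;          /* assumed set of procedures handled here */
    default:
        return ERR_CHUNK;  /* unknown or unsupported rdma_proc value */
    }
}
```

Rejecting unknown values explicitly is what lets a client probe for support of a future rdma_proc.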
-
Chuck Lever authored
When constructing an error reply, svc_rdma_xdr_encode_error() needs to view the client's request message so it can get the failing request's XID. svc_rdma_xdr_decode_req() is supposed to return a pointer to the client's request header. But if it fails to decode the client's message (and thus an error reply is needed) it does not return the pointer. The server then sends a bogus XID in the error reply. Instead, unconditionally generate the pointer to the client's header in svc_rdma_recvfrom(), and pass that pointer to both functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Chuck Lever authored
Fix several issues with svc_rdma_send_error(): - Post a receive buffer to replace the one that was consumed by the incoming request - Posting a send should use DMA_TO_DEVICE, not DMA_FROM_DEVICE - No need to put_page _and_ free pages in svc_rdma_put_context - Make sure the sge is set up completely in case the error path goes through svc_rdma_unmap_dma() - Replace the use of ENOSYS, which has a reserved meaning Related fixes in svc_rdma_recvfrom(): - Don't leak the ctxt associated with the incoming request - Don't close the connection after sending an error reply - Let svc_rdma_send_error() figure out the right header error code As a last clean up, move svc_rdma_send_error() to svc_rdma_sendto.c with other similar functions. There is some common logic in these functions that could someday be combined to reduce code duplication. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Chuck Lever authored
Error headers are shorter than either RDMA_MSG or RDMA_NOMSG. Since HDRLEN_MIN is already used in several other places that would be annoying to change, add RPCRDMA_HDRLEN_ERR for the one or two spots where the shorter length is needed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Chuck Lever authored
Clean up: Most svc_rdma_post_recv() call sites close the transport connection when a receive cannot be posted. Wrap that in a common helper. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Chuck Lever authored
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Chuck Lever authored
The maximum size of a backchannel message on RPC-over-RDMA depends on the connection's inline threshold. Today that threshold is typically 1024 bytes, making the maximum message size 996 bytes. The Linux server's CREATE_SESSION operation checks that the size of callback Calls can be as large as 1044 bytes, to accommodate RPCSEC_GSS. Thus CREATE_SESSION fails if a client advertises the true message size maximum of 996 bytes. But the server's backchannel currently does not support RPCSEC_GSS. The actual maximum size it needs is much smaller. It is safe to reduce the limit to enable NFSv4.1 on RDMA backchannel operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
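The 1024 and 996 figures above differ by 28 bytes. Assuming that difference is the minimal RPC-over-RDMA header (seven 4-byte XDR words), the relationship can be sketched as below; the constant and function names are hypothetical.

```c
#include <stdint.h>

enum {
    XDR_UNIT_BYTES = 4,  /* size of one XDR word */
    HDR_WORDS_MIN  = 7   /* assumed minimal transport header, in words */
};

/* Largest RPC message that still fits inline alongside the header. */
static uint32_t max_backchannel_msg(uint32_t inline_threshold)
{
    uint32_t hdr = HDR_WORDS_MIN * XDR_UNIT_BYTES;

    if (inline_threshold < hdr)
        return 0;
    return inline_threshold - hdr;
}
```

For a 1024-byte inline threshold this gives 996, which is below the 1044 bytes the server's CREATE_SESSION check previously required.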
-
Chuck Lever authored
The NFS server's XDR encoders add an XDR pad for content in the xdr_buf page list at the beginning of the xdr_buf's tail buffer. On RDMA transports, Write chunks are sent separately and without an XDR pad. If a Write chunk is being sent, strip off the pad in the tail buffer so that inline content following the Write chunk remains XDR-aligned when it is sent to the client. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Chuck Lever authored
When the Linux NFS server writes an odd-length data item into a Write chunk, it finishes with XDR pad bytes. If the data item is smaller than the Write chunk, the pad bytes are written at the end of the data item, but still inside the chunk (i.e., in the application's buffer). Since this is direct data placement, that exposes the pad bytes. XDR pad bytes are inserted in order to preserve the XDR alignment of the next XDR data item in an XDR stream. But Write chunks do not appear in the payload XDR stream, and only one data item is allowed in each chunk. Thus XDR padding is not needed in a Write chunk. With NFSv4, the Linux NFS server places the results of any operations that follow an NFSv4 READ or READLINK in the xdr_buf's tail. Those results also should never be sent as a part of a Write chunk. The current logic in send_write_chunks() appears to assume that the xdr_buf's tail contains only pad bytes (i.e., NFSv3). The server should write only the contents of the xdr_buf's page list in a Write chunk. If there's more than an XDR pad in the tail, that needs to go inline or in the Reply chunk. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
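For reference, XDR rounds every item up to a multiple of four bytes. A minimal sketch of the pad computation, with a hypothetical helper name, shows how many bytes would be exposed if the pad were written into a Write chunk:

```c
#include <stdint.h>

/* Number of zero bytes XDR appends after an item of `len` bytes so the
 * next item in the stream starts on a 4-byte boundary. */
static uint32_t xdr_pad_len(uint32_t len)
{
    return (4 - (len & 3)) & 3;
}
```

A 5-byte READ payload, for example, carries 3 pad bytes in the XDR stream; under the reasoning above, only the 5 data bytes belong in the Write chunk.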
-
Chuck Lever authored
The client provides the location of Write chunks into which the server writes bulk payload. The client provides these when the Upper Layer Protocol wants direct data placement and the Binding allows it. (For NFS, this is READ and READLINK operations). The client also provides the location of a Reply chunk into which the server writes the non-bulk part of an RPC reply. The client provides this chunk whenever it believes the reply can be larger than its receive buffers. The server then uses the presence of these chunks to determine how it will form its reply message. svc_rdma_sendto() was looking for Write and Reply chunks multiple times for every reply message. It would be more efficient to do it just once. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Chuck Lever authored
The server does indeed now support NFSv4.1 on RDMA transports. It does not support shifting an RDMA-capable TCP transport (such as iWARP) to RDMA mode. Reported-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Kinglong Mee authored
Remember to free the allocated client when an unsupported state protect how is encountered. Fixes: 50c7b948 ("nfsd: minor consolidation of mach_cred handling code") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
J. Bruce Fields authored
A number of spots in the xdr decoding follow a pattern like n = be32_to_cpup(p++); READ_BUF(n + 4); where n is a u32. The only bounds checking is done in READ_BUF itself, but since it's checking (n + 4), it won't catch cases where n is very large, (u32)(-4) or higher. I'm not sure exactly what the consequences are, but we've seen crashes soon after. Instead, just break these up into two READ_BUF()s. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
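The failure mode above can be reproduced with a small model. `have_bytes()` is a hypothetical stand-in for the bounds check inside READ_BUF, and the two wrappers contrast the combined check with the split one; none of these names come from the nfsd source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the READ_BUF bounds check: true if `want`
 * bytes are available when `avail` bytes remain in the buffer. */
static bool have_bytes(uint32_t avail, uint32_t want)
{
    return want <= avail;
}

/* Combined check: n + 4 is computed in 32 bits, so it wraps to a small
 * number when n is (u32)(-4) or higher and the check wrongly passes. */
static bool check_combined(uint32_t avail, uint32_t n)
{
    return have_bytes(avail, n + 4);
}

/* Split check: validate n on its own, then the 4 trailing bytes
 * against what is left. No addition, so nothing can wrap. */
static bool check_split(uint32_t avail, uint32_t n)
{
    if (!have_bytes(avail, n))
        return false;
    return have_bytes(avail - n, 4);
}
```

With 64 bytes remaining, a length of 0xFFFFFFFE passes the combined check (it wraps to 2) but is rejected by the split one.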
-
- 23 Feb, 2016 1 commit
-
-
Stefan Hajnoczi authored
The qword_get() function NUL-terminates its output buffer. If the input string is in hex format \xXXXX... and the same length as the output buffer, there is an off-by-one:

    int qword_get(char **bpp, char *dest, int bufsize)
    {
        ...
        while (len < bufsize) {
            ...
            *dest++ = (h << 4) | l;
            len++;
        }
        ...
        *dest = '\0';
        return len;
    }

This patch ensures the NUL terminator doesn't fall outside the output buffer. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
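One way to keep the terminator inside the buffer is to stop decoding one byte early. The sketch below is a simplified standalone hex decoder, not the kernel's qword_get(): `hex_val()` and `hex_decode()` are hypothetical helpers, and the `bufsize - 1` bound is one plausible form of the fix.

```c
/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Decode pairs of hex digits from src into dest. At most bufsize - 1
 * bytes are decoded so that the NUL terminator always fits in dest.
 * Returns the number of decoded bytes, or -1 if bufsize is unusable. */
static int hex_decode(const char *src, char *dest, int bufsize)
{
    int len = 0;

    if (bufsize <= 0)
        return -1;

    while (len < bufsize - 1) {
        int h = hex_val((unsigned char)src[0]);
        if (h < 0)
            break;
        int l = hex_val((unsigned char)src[1]);
        if (l < 0)
            break;
        *dest++ = (char)((h << 4) | l);
        src += 2;
        len++;
    }
    *dest = '\0';
    return len;
}
```

Decoding "41424344" into a 4-byte buffer then yields "ABC" plus the terminator, leaving memory past the buffer untouched.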
-
- 14 Feb, 2016 12 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Linus Torvalds authored
Pull char/misc driver fixes from Greg KH: "Here are 3 fixes for some reported issues. Two nvmem driver fixes, and one mei fix. All have been in linux-next just fine" * tag 'char-misc-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: nvmem: qfprom: Specify LE device endianness nvmem: core: return error for non word aligned access mei: validate request value in client notify request ioctl
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Linus Torvalds authored
Pull driver core fix from Greg KH: "Here is one driver core, well klist, fix for 4.5-rc4. It fixes a problem found in the scsi device list traversal that probably also could be triggered by other subsystems. The fix has been in linux-next for a while with no reported problems" * tag 'driver-core-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: klist: fix starting point removed bug in klist iterators
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds authored
Pull tty/serial fixes from Greg KH: "Here are a number of small tty and serial driver fixes for 4.5-rc4 that resolve some reported issues. One of them got reverted as it wasn't correct based on testing, and all have been in linux-next for a while" * tag 'tty-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: Revert "8250: uniphier: allow modular build with 8250 console" pty: make sure super_block is still valid in final /dev/tty close pty: fix possible use after free of tty->driver_data tty: Add support for PCIe WCH382 2S multi-IO card serial/omap: mark wait_for_xmitr as __maybe_unused serial: omap: Prevent DoS using unprivileged ioctl(TIOCSRS485) 8250: uniphier: allow modular build with 8250 console tty: Drop krefs for interrupted tty lock
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds authored
Pull PHY fixes from Greg KH: "Here are a couple of PHY driver fixes for 4.5-rc4. A few small phy issues. All have been in linux-next with no reported issues" * tag 'usb-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: phy: twl4030-usb: Fix unbalanced pm_runtime_enable on module reload phy: twl4030-usb: Relase usb phy on unload phy: core: fix wrong err handle for phy_power_on phy: Restrict phy-hi6220-usb to HiSilicon arm64
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull perf tooling fixes from Thomas Gleixner: "Another round of fixes for the perf tooling side: - Prevent a NULL pointer dereference in tracepoint error handling - Fix a thread handling bug in the intel_pt error handling code - Search both .eh_frame and .debug_frame sections as toolchains seem to have random choices of storing the CFI information - Fix the perf stat interval output values, which got broken when fixing the overall output" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf stat: Fix interval output values perf probe: Search both .eh_frame and .debug_frame sections for probe location perf tools: Fix thread lifetime related segfaut in intel_pt perf tools: tracepoint_error() can receive e=NULL, robustify it
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull lockdep fix from Thomas Gleixner: "A single fix for the stack trace caching logic in lockdep, where the duplicate avoidance managed to store no back trace at all" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Fix stack trace caching logic
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull timer fix from Thomas Gleixner: "A single fix preventing a 32bit overflow in timespec/val to cputime conversions on 32bit machines" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cputime: Prevent 32bit overflow in time[val|spec]_to_cputime()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull irqchip fixes from Thomas Gleixner: "Another set of ARM SoC related irqchip fixes: - Plug a memory leak in gicv3-its - Limit features to the root gic interrupt controller - Add a missing barrier in the gic-v3 IAR access - Another compile test fix for sun4i" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v3: Make sure read from ICC_IAR1_EL1 is visible on redestributor irqchip/gic: Only set the EOImodeNS bit for the root controller irqchip/gic: Only populate set_affinity for the root controller irqchip/gicv3-its: Fix memory leak in its_free_tables() irqchip/sun4i: Fix compilation outside of arch/arm
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull x86 fixes from Thomas Gleixner: "Two small fixlets for x86: - Prevent a KASAN false positive in thread_saved_pc() - Fix a 32-bit truncation problem in the x86 numa code" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/numa: Fix 32-bit memblock range truncation bug on 32-bit NUMA kernels x86: Fix KASAN false positives in thread_saved_pc()
-
git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Linus Torvalds authored
Pull MIPS fixes from Ralf Baechle: "Here's the first round of MIPS fixes after the merge window: - Detect Octeon III's PCI correctly. - Fix return value of the MT7620 probing function. - Wire up the copy_file_range syscall. - Fix 64k page support on 32 bit kernels. - Fix the early Coherency Manager probe. - Allow only hardware-supported page sizes to be selected for R6000. - Fix corner cases for the RDHWR instruction emulation on old hardware. - Fix FPU handling corner cases. - Remove stale entry for BCM33xx from the MAINTAINERS file. - 32 and 64 bit ELF headers are different, handle them correctly" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: mips: Differentiate between 32 and 64 bit ELF header MIPS: Octeon: Update OCTEON_FEATURE_PCIE for Octeon III MIPS: pci-mt7620: Fix return value check in mt7620_pci_probe() MIPS: Fix early CM probing MIPS: Wire up copy_file_range syscall. MIPS: Fix 64k page support for 32 bit kernels. MIPS: R6000: Don't allow 64k pages for R6000. MIPS: traps.c: Correct microMIPS RDHWR emulation MIPS: traps.c: Don't emulate RDHWR in the CpU #0 exception handler MAINTAINERS: Remove stale entry for BCM33xx chips MIPS: Fix FPU disable with preemption MIPS: Properly disable FPU in start_thread() MIPS: Fix buffer overflow in syscall_get_arguments()
-
git://ftp.arm.linux.org.uk/~rmk/linux-arm
Linus Torvalds authored
Pull ARM fixes from Russell King: "A couple of ARM fixes from Linus for the ICST clock generator code" [ "Linus" here is Linus Walleij. Name-stealer. Linus "there can be only one" Torvalds ] * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: ARM: 8519/1: ICST: try other dividends than 1 ARM: 8517/1: ICST: avoid arithmetic overflow in icst_hz()
-