    Add io_uring IO interface · 2b188cc1
    Jens Axboe authored
    The submission queue (SQ) and completion queue (CQ) rings are shared
    between the application and the kernel. This eliminates the need to
    copy data back and forth to submit and complete IO.
    
    IO submissions use the io_uring_sqe data structure, and completions
    are generated in the form of io_uring_cqe data structures. The SQ
    ring is an array of indexes into the io_uring_sqe array, which makes
    it possible to submit a batch of IOs without them being contiguous
    in the ring. The CQ ring is always contiguous, as completion events
    are inherently unordered, and hence any io_uring_cqe entry can point
    back to an arbitrary submission.
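
The index indirection described above can be sketched as a small
single-threaded userspace model. This is an illustration only, not the
kernel code: the names sketch_sqe, sketch_sq, sq_publish and sq_consume
are hypothetical, the sqe carries only a few of the real fields, and the
memory barriers a ring shared with the kernel would need are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_ENTRIES 8u                 /* power of two, so masking works */
#define RING_MASK    (RING_ENTRIES - 1u)

/* Simplified stand-in for struct io_uring_sqe (assumption: fields trimmed). */
struct sketch_sqe {
	uint8_t  opcode;
	int32_t  fd;
	uint64_t off;
	uint64_t user_data;             /* echoed back in the completion */
};

struct sketch_sq {
	uint32_t head;                  /* consumer position (kernel side) */
	uint32_t tail;                  /* producer position (application) */
	uint32_t array[RING_ENTRIES];   /* the ring: indexes into sqes[] */
	struct sketch_sqe sqes[RING_ENTRIES];
};

/* Publish the sqe in slot 'sqe_index'. Returns 0, or -1 if the ring is full. */
static int sq_publish(struct sketch_sq *sq, uint32_t sqe_index)
{
	assert(sqe_index < RING_ENTRIES);
	if (sq->tail - sq->head == RING_ENTRIES)
		return -1;
	sq->array[sq->tail & RING_MASK] = sqe_index;
	sq->tail++;
	return 0;
}

/* Fetch the next submitted sqe in ring order, or NULL if the ring is empty. */
static const struct sketch_sqe *sq_consume(struct sketch_sq *sq)
{
	uint32_t idx;

	if (sq->head == sq->tail)
		return NULL;
	idx = sq->array[sq->head & RING_MASK];
	sq->head++;
	return &sq->sqes[idx];
}
```

Because the ring stores indexes, the application can fill sqe slots 5 and
2 and submit them back to back; the slots themselves do not have to be
adjacent or in order.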
    
    Two new system calls are added for this:
    
    io_uring_setup(entries, params)
    	Sets up an io_uring instance for doing async IO. On success,
    	returns a file descriptor that the application can mmap to
    	gain access to the SQ ring, CQ ring, and io_uring_sqes.
    
    io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
    	Initiates IO against the rings mapped to this fd, or waits for
    	them to complete, or both. The behavior is controlled by the
    	parameters passed in. If 'to_submit' is non-zero, then we'll
    	try to submit new IO. If IORING_ENTER_GETEVENTS is set, the
    	kernel will wait for 'min_complete' events, if they aren't
    	already available. It's valid to set IORING_ENTER_GETEVENTS
    	and 'min_complete' == 0 at the same time; this allows the
    	kernel to return already completed events without waiting
    	for them. This is useful only for polling, as for IRQ
    	driven IO, the application can just check the CQ ring
    	without entering the kernel.
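
As a rough sketch of the two calling patterns described above, the
syscall can be reached through syscall(2). The wrapper and helper names
below are hypothetical, and two things are assumptions: the fallback
syscall number 426 is the x86-64 value, used only when the installed
headers do not define __NR_io_uring_enter, and no signal mask is passed
(sigset is NULL).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_io_uring_enter
#define __NR_io_uring_enter 426         /* assumption: x86-64 number */
#endif
#ifndef IORING_ENTER_GETEVENTS
#define IORING_ENTER_GETEVENTS (1U << 0)
#endif

/* Hypothetical thin wrapper; returns -1 and sets errno on failure. */
static int sketch_io_uring_enter(int fd, unsigned to_submit,
				 unsigned min_complete, unsigned flags)
{
	/* IORING_ENTER_GETEVENTS is the only flag this text describes. */
	assert((flags & ~IORING_ENTER_GETEVENTS) == 0);
	return (int)syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
			    flags, NULL, _NSIG / 8);
}

/* Submit 'n' queued sqes and wait until at least one completion exists. */
static int submit_and_wait_one(int ring_fd, unsigned n)
{
	return sketch_io_uring_enter(ring_fd, n, 1, IORING_ENTER_GETEVENTS);
}

/* Polling case: let the kernel reap finished IO, but never wait for it. */
static int reap_without_waiting(int ring_fd)
{
	return sketch_io_uring_enter(ring_fd, 0, 0, IORING_ENTER_GETEVENTS);
}
```

Here ring_fd is expected to be the descriptor returned by
io_uring_setup(); with any other value the call fails and errno says why.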
    
    With this setup, it's possible to do async IO with a single system
    call. Future developments will enable polled IO with this interface,
    and polled submission as well. The latter will enable an application
    to do IO without doing ANY system calls at all.
    
    For IRQ driven IO, an application only needs to enter the kernel for
    completions if it wants to wait for them to occur.
    
    Each io_uring is backed by a workqueue, to support buffered async IO
    as well. We will only punt to an async context if the command would
    need to wait for IO on the device side. Any request whose data can be
    served directly from the page cache is completed inline. This avoids
    the overhead of the usual thread pool approach, since cached data is
    accessed as quickly as through a sync interface.
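
The inline-first policy can be modelled in a few lines of userspace C.
This is a sketch of the idea only and not how the kernel implements it:
the request type, the 'cached' flag, the fixed 4096-byte result and the
list standing in for the workqueue are all assumptions made for the
example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_req {
	int cached;                     /* model input: data in page cache? */
	int result;                     /* bytes "read", valid once done */
	int done;
	int punted;                     /* set if sent to the workqueue */
	struct sketch_req *next;
};

struct sketch_wq {
	struct sketch_req *head, *tail; /* FIFO of punted requests */
};

/* Model of a read: with 'nowait' set, refuse to block on the device. */
static int model_read(const struct sketch_req *req, int nowait)
{
	if (nowait && !req->cached)
		return -EAGAIN;
	return 4096;
}

/* Try the request inline first; punt only if it would have to wait. */
static void sketch_submit(struct sketch_wq *wq, struct sketch_req *req)
{
	int ret;

	assert(wq != NULL && req != NULL);
	ret = model_read(req, 1);
	if (ret == -EAGAIN) {
		req->punted = 1;
		req->next = NULL;
		if (wq->tail)
			wq->tail->next = req;
		else
			wq->head = req;
		wq->tail = req;
		return;
	}
	req->result = ret;
	req->done = 1;
}

/* Stand-in for the async context: here the request is allowed to block. */
static void sketch_run_workqueue(struct sketch_wq *wq)
{
	while (wq->head) {
		struct sketch_req *req = wq->head;

		wq->head = req->next;
		if (!wq->head)
			wq->tail = NULL;
		req->result = model_read(req, 0);
		req->done = 1;
	}
}
```

In this model a cached request finishes inside sketch_submit itself,
while an uncached one is only completed once the worker runs.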
    
    Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c
    
    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>