Commits · 3c96c7f4caeb044da53a85092903f9192f4e2342 · Kirill Smelkov / linux

30 May, 2018 1 commit

aio: take list removal to (some) callers of aio_complete() · 3c96c7f4

Al Viro authored May 28, 2018

We really want iocb out of io_cancel(2) reach before we start tearing
it down.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3c96c7f4

28 May, 2018 1 commit

aio: add missing break for the IOCB_CMD_FDSYNC case · ac060cba

Christoph Hellwig authored May 28, 2018

Looks like this got lost in a merge.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ac060cba

26 May, 2018 33 commits

random: convert to ->poll_mask · 89b310a2

Christoph Hellwig authored Apr 09, 2018

The big change is that random_read_wait and random_write_wait are merged
into a single waitqueue that uses keyed wakeups.  Because wait_event_*
doesn't know about that, this will lead to occasional spurious wakeups
in _random_read and add_hwgenerator_randomness, but wait_event_* is
designed to handle these and we are not in a hot path there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

89b310a2
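
The keyed-wakeup behaviour described in the random commit above can be pictured with a small userspace model. This is only a sketch under stated assumptions: `struct waiter`, `struct wait_queue`, `wake_up_keyed()` and the `EV_*` bits are hypothetical names invented for illustration, not kernel APIs. A waiter whose key is `~0u` stands in for a wait_event_*-style sleeper that cannot filter by key and therefore sees spurious wakeups.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event bits standing in for read/write readiness. */
#define EV_READ  0x1u
#define EV_WRITE 0x4u

struct waiter {
    unsigned int key;   /* events this waiter cares about */
    int woken;          /* how many times it has been woken */
};

struct wait_queue {
    struct waiter *waiters;
    size_t nr;
};

/*
 * Keyed wakeup on a single shared queue: only waiters whose key
 * overlaps the event mask are woken. Returns the number woken.
 */
static int wake_up_keyed(struct wait_queue *q, unsigned int events)
{
    int n = 0;

    assert(q != NULL);
    for (size_t i = 0; i < q->nr; i++) {
        if (q->waiters[i].key & events) {
            q->waiters[i].woken++;
            n++;
        }
    }
    return n;
}
```

With one reader (`EV_READ`), one writer (`EV_WRITE`) and one unfiltered waiter (`~0u`) on the queue, a write-side wakeup skips the reader but still wakes the unfiltered waiter, which then has to re-check its own condition.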

timerfd: convert to ->poll_mask · 652fe8e8

Christoph Hellwig authored Mar 05, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>

652fe8e8

eventfd: switch to ->poll_mask · 9e42f195

Christoph Hellwig authored Dec 31, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>

9e42f195

pipe: convert to ->poll_mask · dd67081b

Christoph Hellwig authored Dec 31, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>

dd67081b

crypto: af_alg: convert to ->poll_mask · b28fc822

Christoph Hellwig authored Jan 11, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>

b28fc822

net/rxrpc: convert to ->poll_mask · 5001c2dc

Christoph Hellwig authored Dec 31, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>

5001c2dc

net/iucv: convert to ->poll_mask · f87be894

Christoph Hellwig authored Dec 31, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>

f87be894

net/phonet: convert to ->poll_mask · e7a98d47

Christoph Hellwig authored Dec 31, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>

e7a98d47

net/nfc: convert to ->poll_mask · 4bac2bcd

Christoph Hellwig authored Dec 31, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>

4bac2bcd

net/caif: convert to ->poll_mask · 9490e40a

Christoph Hellwig authored Dec 31, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>

9490e40a

net/bluetooth: convert to ->poll_mask · 17112d80

Christoph Hellwig authored Dec 31, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>

17112d80

net/sctp: convert to ->poll_mask · 568ea88e

Christoph Hellwig authored Dec 31, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>

568ea88e

net/tipc: convert to ->poll_mask · 4df7338f

Christoph Hellwig authored Dec 31, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>

4df7338f

net/vmw_vsock: convert to ->poll_mask · 31f50b55

Christoph Hellwig authored Dec 31, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>

31f50b55

net/atm: convert to ->poll_mask · 9f728af3

Christoph Hellwig authored Dec 31, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>

9f728af3

net/dccp: convert to ->poll_mask · f4335f52

Christoph Hellwig authored Dec 31, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>

f4335f52

net: convert datagram_poll users tp ->poll_mask · db5051ea

Christoph Hellwig authored Apr 09, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

db5051ea

net/unix: convert to ->poll_mask · e76cd24d

Christoph Hellwig authored Apr 09, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>

e76cd24d

net/tcp: convert to ->poll_mask · 2c7d3dac

Christoph Hellwig authored Apr 09, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>

2c7d3dac

net: remove sock_no_poll · 984652dd

Christoph Hellwig authored Apr 09, 2018

Now that sock_poll handles a NULL ->poll or ->poll_mask there is no need
for a stub.
Signed-off-by: Christoph Hellwig <hch@lst.de>

984652dd

net: add support for ->poll_mask in proto_ops · 15252423

Christoph Hellwig authored Apr 09, 2018

The socket file operations still implement ->poll until all protocols are
switched over.
Signed-off-by: Christoph Hellwig <hch@lst.de>

15252423

net: refactor socket_poll · 3cafb376

Christoph Hellwig authored Jan 09, 2018

Factor out two busy poll related helpers for later reuse, and remove
a comment that isn't very helpful, especially with the __poll_t
annotations in place.
Signed-off-by: Christoph Hellwig <hch@lst.de>

3cafb376

aio: try to complete poll iocbs without context switch · 1962da0d

Christoph Hellwig authored May 20, 2018

If we can acquire ctx_lock without spinning we can just remove our
iocb from the active_reqs list, and thus complete the iocbs from the
wakeup context.
Signed-off-by: Christoph Hellwig <hch@lst.de>

1962da0d
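
The "complete from the wakeup context if the lock is free" idea in the commit above can be sketched in userspace with a trylock. This is a simplified model, not the kernel code: `struct ctx`, `ctx_trylock()`, `ctx_unlock()` and `poll_wake()` are hypothetical names, a C11 `atomic_flag` stands in for `ctx_lock`, and a plain counter stands in for the `active_reqs` list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct ctx {
    atomic_flag lock;       /* stands in for ctx_lock */
    int active_reqs;        /* stands in for the active_reqs list */
    int completed_inline;   /* completions done in the wakeup path */
    int deferred;           /* completions punted to a worker */
};

/* Try to take the lock without spinning; true on success. */
static bool ctx_trylock(struct ctx *c)
{
    return !atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire);
}

static void ctx_unlock(struct ctx *c)
{
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/*
 * Wakeup callback model: if the lock is uncontended, take the request
 * off the active list and complete it right here; otherwise defer the
 * completion so the wakeup path never spins on the lock.
 */
static bool poll_wake(struct ctx *c)
{
    assert(c->active_reqs > 0);
    if (ctx_trylock(c)) {
        c->active_reqs--;
        ctx_unlock(c);
        c->completed_inline++;
        return true;
    }
    c->deferred++;
    return false;
}
```

When the lock is free, `poll_wake()` completes the request inline; when another path already holds it, the request stays on the list and the completion is counted as deferred.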

aio: implement IOCB_CMD_POLL · 2c14fa83

Christoph Hellwig authored Mar 20, 2018

Simple one-shot poll through the io_submit() interface.  To poll for
a file descriptor the application should submit an iocb of type
IOCB_CMD_POLL.  It will poll the fd for the events specified in the
first 32 bits of the aio_buf field of the iocb.

Unlike poll or epoll without EPOLLONESHOT this interface always works
in one shot mode, that is once the iocb is completed, it will have to be
resubmitted.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

2c14fa83
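
A userspace sketch of the one-shot poll interface described above might look as follows. It assumes a kernel that actually supports IOCB_CMD_POLL and headers that provide `<linux/aio_abi.h>`; the fallback value 5 for IOCB_CMD_POLL and the helper name `aio_poll_once()` are assumptions made for illustration. Raw `syscall()` is used because glibc does not wrap the native aio system calls. On kernels without support, io_submit() is expected to fail and the helper returns -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/aio_abi.h>
#include <poll.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed opcode value for headers that predate IOCB_CMD_POLL. */
#ifndef IOCB_CMD_POLL
#define IOCB_CMD_POLL 5
#endif

/*
 * Submit a single poll iocb for fd and wait for its completion.
 * Returns the reported event mask, or -1 if any step fails
 * (for example, when the kernel does not know IOCB_CMD_POLL).
 */
static long aio_poll_once(int fd, unsigned int events)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    long res = -1;

    assert(fd >= 0);
    if (syscall(SYS_io_setup, 1, &ctx) < 0)
        return -1;

    memset(&cb, 0, sizeof(cb));
    cb.aio_lio_opcode = IOCB_CMD_POLL;
    cb.aio_fildes = (unsigned int)fd;
    cb.aio_buf = events;            /* requested events go in aio_buf */

    /* One-shot: a second notification would need a fresh io_submit(). */
    if (syscall(SYS_io_submit, ctx, 1, list) == 1 &&
        syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) == 1)
        res = (long)ev.res;

    syscall(SYS_io_destroy, ctx);
    return res;
}
```

Calling `aio_poll_once(fd, POLLIN)` on the read end of a pipe that already holds data should return a mask with POLLIN set on a supporting kernel. Note that the helper blocks in io_getevents() until the descriptor becomes ready.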

aio: simplify cancellation · 888933f8

Christoph Hellwig authored May 23, 2018

With the current aio code there is no need for the magic KIOCB_CANCELLED
value, as a cancelation just kicks the driver to queue the completion
ASAP, with all actual completion handling done in another thread. Given
that both the completion path and cancelation take the context lock there
is no need for magic cmpxchg loops either.  If we remove iocbs from the
active list after calling ->ki_cancel (but with ctx_lock still held), we
can also rely on the invariant that anything found on the list has a
->ki_cancel callback and can be cancelled, further simplifying the code.
Signed-off-by: Christoph Hellwig <hch@lst.de>

888933f8

aio: simplify KIOCB_KEY handling · f3a2752a

Christoph Hellwig authored Mar 30, 2018

No need to pass the key field to lookup_iocb to compare it with KIOCB_KEY,
as we can do that right after retrieving it from userspace.  Also move the
KIOCB_KEY definition to aio.c as it is an internal value not used by any
other place in the kernel.
Signed-off-by: Christoph Hellwig <hch@lst.de>

f3a2752a

fs: introduce new ->get_poll_head and ->poll_mask methods · 3deb642f

Christoph Hellwig authored Jan 09, 2018

->get_poll_head returns the waitqueue that the poll operation is going
to sleep on.  Note that this means we can only use a single waitqueue
for the poll, unlike some current drivers that use two waitqueues for
different events.  But now that we have keyed wakeups and heavily use
those for poll there aren't that many good reasons left to keep the
multiple waitqueues, and if there are any, ->poll is still around; the
driver just won't support aio poll.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

3deb642f
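
The split between ->get_poll_head and ->poll_mask described above can be modelled in plain C. This is a userspace sketch and not the kernel's definitions: the struct layouts, the `EV_*` bits, `vfs_poll_model()` and the `demo_*` driver are all hypothetical, and the dispatch logic is a simplification of what the message describes (one waitqueue per file, with legacy ->poll kept as a fallback).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event bits. */
#define EV_IN  0x1u
#define EV_OUT 0x4u

struct wait_queue_head { int nr_waiters; };

struct file;

struct file_ops {
    /* Legacy method: queues the caller and returns the mask in one go. */
    unsigned int (*poll)(struct file *f);
    /* New pair: name the single waitqueue, then report ready events. */
    struct wait_queue_head *(*get_poll_head)(struct file *f, unsigned int events);
    unsigned int (*poll_mask)(struct file *f, unsigned int events);
};

struct file {
    const struct file_ops *ops;
    struct wait_queue_head wqh;   /* the one waitqueue for all events */
    unsigned int ready;           /* events currently pending */
};

static int file_can_poll(const struct file *f)
{
    return f->ops->poll != NULL || f->ops->poll_mask != NULL;
}

/* Simplified dispatcher: prefer ->poll, else use the new pair. */
static unsigned int vfs_poll_model(struct file *f, unsigned int events)
{
    struct wait_queue_head *head;

    assert(file_can_poll(f));
    if (f->ops->poll)
        return f->ops->poll(f);

    head = f->ops->get_poll_head(f, events);
    if (!head)
        return 0;
    head->nr_waiters++;           /* caller is queued by the core, not the driver */
    return f->ops->poll_mask(f, events);
}

/* A demo "driver" that only implements the new methods. */
static struct wait_queue_head *demo_get_poll_head(struct file *f, unsigned int events)
{
    (void)events;
    return &f->wqh;
}

static unsigned int demo_poll_mask(struct file *f, unsigned int events)
{
    return f->ready & events;
}

static const struct file_ops demo_ops = {
    .poll = NULL,
    .get_poll_head = demo_get_poll_head,
    .poll_mask = demo_poll_mask,
};
```

In this model the driver never touches the caller's wait entry: it only says where to sleep and which events are ready, which is what lets a non-blocking consumer such as aio poll reuse the same two callbacks.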

fs: add new vfs_poll and file_can_poll helpers · 9965ed17

Christoph Hellwig authored Mar 05, 2018

These abstract out calls to the poll method in preparation for changes
in how we poll.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

9965ed17

fs: update documentation to mention __poll_t and match the code · 6e8b704d

Christoph Hellwig authored Jan 02, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6e8b704d

fs: cleanup do_pollfd · a0f8dcfc

Christoph Hellwig authored Mar 05, 2018

Use straightline code with failure handling gotos instead of a lot
of nested conditionals.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

a0f8dcfc

fs: unexport poll_schedule_timeout · 8f546ae1

Christoph Hellwig authored Jan 11, 2018

No users outside of select.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

8f546ae1

uapi: turn __poll_t sparse checks on by default · ee219b94

Christoph Hellwig authored May 23, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>

ee219b94

Merge branch 'fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into aio-base · ed0d523a

Christoph Hellwig authored May 26, 2018

ed0d523a

24 May, 2018 1 commit

fix io_destroy()/aio_complete() race · 4faa9996

Al Viro authored May 23, 2018

If io_destroy() gets to cancelling everything that can be cancelled and
gets to kiocb_cancel() calling the function the driver has left in ->ki_cancel,
it becomes vulnerable to a race with IO completion. At that point req
is already taken off the list and aio_complete() does *NOT* spin until
we (in free_ioctx_users()) release ->ctx_lock. As a result, it proceeds
to kiocb_free(), freeing req just as it gets passed to ->ki_cancel().

The fix is simple - remove from the list after the call of kiocb_cancel(). All
instances of ->ki_cancel() already have to cope with being called with
the iocb still on the list - that's what happens in io_cancel(2).

Cc: stable@kernel.org
Fixes: 0460fef2 "aio: use cancellation list lazily"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4faa9996

21 May, 2018 4 commits

aio: fix io_destroy(2) vs. lookup_ioctx() race · baf10564

Al Viro authored May 20, 2018

kill_ioctx() used to have an explicit RCU delay between removing the
reference from ->ioctx_table and percpu_ref_kill() dropping the refcount.
At some point that delay had been removed, on the theory that
percpu_ref_kill() itself contained an RCU delay. Unfortunately, that was
the wrong kind of RCU delay and it didn't care about rcu_read_lock() used
by lookup_ioctx(). As a result, we could get ctx freed right under
lookup_ioctx(). Tejun has fixed that in a6d7cff4 ("fs/aio: Add explicit
RCU grace period when freeing kioctx"); however, that fix is not enough.

Suppose io_destroy() from one thread races with e.g. io_setup() from another;
CPU1 removes the reference from current->mm->ioctx_table[...] just as CPU2
has picked it (under rcu_read_lock()). Then CPU1 proceeds to drop the
refcount, getting it to 0 and triggering a call of free_ioctx_users(),
which proceeds to drop the secondary refcount and once that reaches zero
calls free_ioctx_reqs(). That does
	INIT_RCU_WORK(&ctx->free_rwork, free_ioctx);
	queue_rcu_work(system_wq, &ctx->free_rwork);
and schedules freeing the whole thing after RCU delay.

In the meantime CPU2 has gotten around to percpu_ref_get(), bumping the
refcount from 0 to 1 and returning the reference to io_setup().

Tejun's fix (that queue_rcu_work() in there) guarantees that ctx won't get
freed until after percpu_ref_get(). Sure, we'd increment the counter before
ctx can be freed. Now we are out of rcu_read_lock() and there's nothing to
stop freeing of the whole thing. Unfortunately, CPU2 assumes that since it
has grabbed the reference, ctx is *NOT* going away until it gets around to
dropping that reference.

The fix is obvious - use percpu_ref_tryget_live() and treat failure as miss.
It's not costlier than what we currently do in normal case, it's safe to
call since freeing *is* delayed and it closes the race window - either
lookup_ioctx() comes before percpu_ref_kill() (in which case ctx->users
won't reach 0 until the caller of lookup_ioctx() drops it) or lookup_ioctx()
fails, ctx->users is unaffected and caller of lookup_ioctx() doesn't see
the object in question at all.

Cc: stable@kernel.org
Fixes: a6d7cff4 "fs/aio: Add explicit RCU grace period when freeing kioctx"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

baf10564
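
The difference between an unconditional get and a "tryget live" in the commit above can be illustrated with a small C11 model. This is a sketch under stated assumptions, not how percpu_ref is implemented: the real thing uses per-CPU counters and RCU, whereas here a single atomic word holds both the count and a dead bit. `struct ref`, `REF_DEAD` and the `ref_*` helpers are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Top bit marks the reference as killed; low bits hold the count. */
#define REF_DEAD (UINT64_C(1) << 63)

struct ref {
    _Atomic uint64_t state;
};

static void ref_init(struct ref *r)
{
    atomic_init(&r->state, 1);    /* initial reference, as held by the table */
}

/* Unconditional get: succeeds even after kill, which is the racy pattern. */
static void ref_get(struct ref *r)
{
    atomic_fetch_add(&r->state, 1);
}

/* Take a reference only if the object has not been killed yet. */
static bool ref_tryget_live(struct ref *r)
{
    uint64_t old = atomic_load(&r->state);

    do {
        if (old & REF_DEAD)
            return false;         /* treat as a lookup miss */
    } while (!atomic_compare_exchange_weak(&r->state, &old, old + 1));
    return true;
}

/* Drop one reference; true when the count has reached zero. */
static bool ref_put(struct ref *r)
{
    uint64_t old = atomic_fetch_sub(&r->state, 1);

    return (old & ~REF_DEAD) == 1;
}

/* Mark dead and drop the initial reference; true if that was the last one. */
static bool ref_kill(struct ref *r)
{
    uint64_t old = atomic_fetch_or(&r->state, REF_DEAD);

    assert(!(old & REF_DEAD));    /* kill must happen only once */
    return ref_put(r);
}
```

Either the lookup's `ref_tryget_live()` lands before `ref_kill()`, in which case the count cannot reach zero until the caller drops its reference, or it lands after and fails, leaving the count untouched. `ref_get()` is included only to contrast with the pre-fix behaviour, where a count that had already hit zero could be bumped back to one.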

ext2: fix a block leak · 5aa1437d

Al Viro authored May 17, 2018

open file, unlink it, then use ioctl(2) to make it immutable or
append only.  Now close it and watch the blocks *not* freed...

Immutable/append-only checks belong in ->setattr().
Note: the bug is old and backport to anything prior to 737f2e93
("ext2: convert to use the new truncate convention") will need
these checks lifted into ext2_setattr().

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5aa1437d

nfsd: vfs_mkdir() might succeed leaving dentry negative unhashed · 3819bb0d

Al Viro authored May 11, 2018

That can (and does, on some filesystems) happen - ->mkdir() (and thus
vfs_mkdir()) can legitimately leave its argument negative and just
unhash it, counting upon the lookup to pick the object we'd created
next time we try to look at that name.

Some vfs_mkdir() callers forget about that possibility...
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3819bb0d

cachefiles: vfs_mkdir() might succeed leaving dentry negative unhashed · 9c3e9025

Al Viro authored May 10, 2018

That can (and does, on some filesystems) happen - ->mkdir() (and thus
vfs_mkdir()) can legitimately leave its argument negative and just
unhash it, counting upon the lookup to pick the object we'd created
next time we try to look at that name.

Some vfs_mkdir() callers forget about that possibility...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9c3e9025