Commits · df889b363198d946c0286b3fb2cfcca18d08a029 · Kirill Smelkov / linux

02 May, 2016 30 commits

Merge branch 'for-linus' into work.lookups · df889b36
Al Viro authored May 02, 2016

df889b36

lookup_open(): expand the call of vfs_create() · ce8644fc

Al Viro authored Apr 26, 2016

Lift IS_DEADDIR handling up into the part common with atomic_open(),
remove it from the latter.  Collapse permission checks into the
call of may_o_create(), getting it closer to atomic_open() case.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ce8644fc

path_openat(): take O_PATH handling out of do_last() · 6ac08709

Al Viro authored Apr 26, 2016

do_last() and lookup_open() are simpler that way, and so is O_PATH
itself.  As it bloody well should be: we find what the pathname
resolves to, same way as in stat() et al., and associate it with
an FMODE_PATH struct file.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6ac08709

simple local filesystems: switch to ->iterate_shared() · 3b0a3c1a

Al Viro authored Apr 20, 2016

no changes needed (XFS isn't simple, but it has the same parallelism
in the interesting parts exercised from CXFS).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3b0a3c1a

dcache_{readdir,dir_lseek}() users: switch to ->iterate_shared · 4e82901c

Al Viro authored Apr 20, 2016

no need to lock directory in dcache_dir_lseek(), while we are
at it - per-struct file exclusion is enough.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4e82901c

cifs: switch to ->iterate_shared() · 3125d265

Al Viro authored Apr 20, 2016

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3125d265

fuse: switch to ->iterate_shared() · d9b3dbdc

Al Viro authored Apr 20, 2016

Switch dcache pre-seeding on readdir to d_alloc_parallel();
nothing else is needed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d9b3dbdc

switch all procfs directories ->iterate_shared() · f50752ea

Al Viro authored Apr 20, 2016

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f50752ea

proc_sys_fill_cache(): switch to d_alloc_parallel() · 76aab3ab

Al Viro authored Apr 20, 2016

make it usable with directory locked shared
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

76aab3ab

proc_fill_cache(): switch to d_alloc_parallel() · 3781764b

Al Viro authored Apr 20, 2016

... making it usable with directory locked shared
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3781764b

introduce a parallel variant of ->iterate() · 61922694

Al Viro authored Apr 20, 2016

New method: ->iterate_shared().  Same arguments as in ->iterate(),
called with the directory locked only shared.  Once all filesystems
switch, the old one will be gone.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

61922694
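
The transition rule in that message (prefer the new method and lock shared, fall back to the old method and lock exclusively) can be pictured with a small userspace sketch. Everything here is hypothetical: `dir_ops_sketch`, `iterate_dir_sketch()`, `count_call()` and the `lock_mode` enum are illustration names, not kernel definitions, and real locking is replaced by recording which mode the caller would have taken.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Which lock the caller would take on the directory (recorded, not taken). */
enum lock_mode { LOCK_NONE, LOCK_SHARED, LOCK_EXCLUSIVE };

/* Hypothetical, trimmed-down method table; both callbacks take the same arguments. */
struct dir_ops_sketch {
    int (*iterate)(void *ctx);         /* old method: directory locked exclusively */
    int (*iterate_shared)(void *ctx);  /* new method: directory locked only shared */
};

/* Sample callback standing in for a filesystem's readdir; counts its calls. */
int count_call(void *ctx)
{
    ++*(int *)ctx;
    return 0;
}

/* Pick the method and the matching lock mode, then call it. */
int iterate_dir_sketch(const struct dir_ops_sketch *ops, void *ctx,
                       enum lock_mode *mode)
{
    assert(ops != NULL && mode != NULL);

    if (ops->iterate_shared) {
        *mode = LOCK_SHARED;
        return ops->iterate_shared(ctx);
    }
    if (ops->iterate) {
        *mode = LOCK_EXCLUSIVE;
        return ops->iterate(ctx);
    }
    *mode = LOCK_NONE;
    return -ENOTDIR;   /* neither method: not something we can iterate */
}
```

A filesystem that has not been converted keeps working through the exclusive path, which is what allows the conversion to proceed one filesystem at a time.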

give readdir(2)/getdents(2)/etc. uniform exclusion with lseek() · 63b6df14

Al Viro authored Apr 20, 2016

same as read() on regular files has, and for the same reason.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

63b6df14

parallel lookups: actual switch to rwsem · 9902af79

Al Viro authored Apr 15, 2016

ta-da!

The main issue is the lack of down_write_killable(), so the places
like readdir.c switched to plain inode_lock(); once killable
variants of rwsem primitives appear, that'll be dealt with.

lockdep side also might need more work
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9902af79

parallel lookups machinery, part 4 (and last) · d9171b93

Al Viro authored Apr 15, 2016

If we *do* run into an in-lookup match, we need to wait for it to
cease being in-lookup.  Fortunately, we do have unused space in
in-lookup dentries - d_lru is never looked at until it stops being
in-lookup.

So we can stash a pointer to wait_queue_head from stack frame of
the caller of ->lookup().  Some precautions are needed while
waiting, but it's not that hard - we do hold a reference to dentry
we are waiting for, so it can't go away.  If it's found to be
in-lookup the wait_queue_head is still alive and will remain so
at least while ->d_lock is held.  Moreover, the condition we
are waiting for becomes true at the same point where everything
on that wq gets woken up, so we can just add ourselves to the
queue once.

d_alloc_parallel() gets a pointer to wait_queue_head_t from its
caller; lookup_slow() adjusted, d_add_ci() taught to use
d_alloc_parallel() if the dentry passed to it happens to be
in-lookup one (i.e. if it's been called from the parallel lookup).

That's pretty much it - all that remains is to switch ->i_mutex
to rwsem and have lookup_slow() take it shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d9171b93

parallel lookups machinery, part 3 · 94bdd655

Al Viro authored Apr 15, 2016

We will need to be able to check if there is an in-lookup
dentry with matching parent/name.  Right now it's impossible,
but as soon as we start locking directories shared such beasts
will appear.

Add a secondary hash for locating those.  Hash chains go through
the same space where d_alias will be once it's not in-lookup anymore.
Search is done under the same bitlock we use for modifications -
with the primary hash we can rely on d_rehash() into the wrong
chain being the worst that could happen, but here the pointers are
buggered once it's removed from the chain.  On the other hand,
the chains are not going to be long and normally we'll end up
adding to the chain anyway.  That allows us to avoid bothering with
->d_lock when doing the comparisons - everything is stable until
removed from chain.

New helper: d_alloc_parallel().  Right now it allocates, verifies
that no hashed and in-lookup matches exist and adds to in-lookup
hash.

Returns ERR_PTR() for error, hashed match (in the unlikely case it's
been found) or new dentry.  In-lookup matches trigger BUG() for
now; that will change in the next commit when we introduce waiting
for ongoing lookup to finish.  Note that in-lookup matches won't be
possible until we actually go for shared locking.

lookup_slow() switched to use of d_alloc_parallel().

Again, these commits are separated only for making it easier to
review.  All this machinery will start doing something useful only
when we go for shared locking; it's just that the combination is
too large for my taste.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

94bdd655

parallel lookups machinery, part 2 · 84e710da

Al Viro authored Apr 15, 2016

We'll need to verify that there's neither a hashed nor in-lookup
dentry with desired parent/name before adding to in-lookup set.

One possible solution would be to hold the parent's ->d_lock through
both checks, but while the in-lookup set is relatively small at any
time, dcache is not.  And holding the parent's ->d_lock through
something like __d_lookup_rcu() would suck too badly.

So we leave the parent's ->d_lock alone, which means that we watch
out for the following scenario:
	* we verify that there's no hashed match
	* existing in-lookup match gets hashed by another process
	* we verify that there are no in-lookup matches and decide
	  that everything's fine.

Solution: per-directory kinda-sorta seqlock, bumped around the times
we hash something that used to be in-lookup or move (and hash)
something in place of in-lookup.  Then the above would turn into
	* read the counter
	* do dcache lookup
	* if no matches found, check for in-lookup matches
	* if there had been none of those either, check if the
	  counter has changed; repeat if it has.

The "kinda-sorta" part is due to the fact that we don't have much spare
space in inode.  There is a spare word (shared with i_bdev/i_cdev/i_pipe),
so the counter part is not a problem, but spinlock is a different story.

We could use the parent's ->d_lock, and it would be less painful in
terms of contention, but for __d_add() it would be rather inconvenient to
grab; we could do that (using lock_parent()), but...

Fortunately, we can get serialization on the counter itself, and it
might be a good idea in general; we can use cmpxchg() in a loop to
get from even to odd and smp_store_release() from odd to even.

This commit adds the counter and the update logic; the readers will be
added in the next commit.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

84e710da
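
The even/odd counter protocol from that message can be sketched in userspace C11 atomics. This is a minimal illustration under stated assumptions, not the kernel code: `struct dir_seq` and the `dir_seq_*()` helpers are hypothetical names, `atomic_compare_exchange_weak_explicit()` stands in for cmpxchg(), and a release store stands in for smp_store_release(). Treating an odd sample as "retry" on the reader side is an assumption of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-directory counter. */
struct dir_seq {
    atomic_uint seq;   /* even: stable, odd: an update is in progress */
};

void dir_seq_init(struct dir_seq *d)
{
    atomic_init(&d->seq, 0);
}

/* Writer: go from even to odd with a compare-and-swap loop. */
unsigned dir_seq_begin_update(struct dir_seq *d)
{
    for (;;) {
        unsigned n = atomic_load_explicit(&d->seq, memory_order_relaxed);

        if (!(n & 1) &&
            atomic_compare_exchange_weak_explicit(&d->seq, &n, n + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return n;
        /* Odd value or lost race: another updater is active, try again. */
    }
}

/* Writer: go from odd back to even with a release store. */
void dir_seq_end_update(struct dir_seq *d, unsigned n)
{
    assert(!(n & 1));   /* n is the even value returned by begin_update */
    atomic_store_explicit(&d->seq, n + 2, memory_order_release);
}

/* Reader: sample the counter before the dcache and in-lookup checks. */
unsigned dir_seq_read(struct dir_seq *d)
{
    return atomic_load_explicit(&d->seq, memory_order_acquire);
}

/* Reader: nonzero means the checks must be repeated. */
int dir_seq_retry(struct dir_seq *d, unsigned seen)
{
    atomic_thread_fence(memory_order_acquire);
    return (seen & 1) ||
           atomic_load_explicit(&d->seq, memory_order_relaxed) != seen;
}
```

A reader would call `dir_seq_read()`, do the hashed lookup and the in-lookup check, and start over whenever `dir_seq_retry()` returns nonzero. No separate spinlock is needed because the writers serialize on the counter itself.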

beginning of transition to parallel lookups - marking in-lookup dentries · 85c7f810

Al Viro authored Apr 14, 2016

marked as such when (would be) parallel lookup is about to pass them
to actual ->lookup(); unmarked when
	* __d_add() is about to make it hashed, positive or not.
	* __d_move() (from d_splice_alias(), directly or via
	  __d_unalias()) puts a preexisting dentry in its place
	* in caller of ->lookup() if it has escaped all of the
	  above.
Bug (WARN_ON, actually) if it reaches the final dput()
or d_instantiate() while still marked such.

As the result, we are guaranteed that for as long as the flag is
set, dentry will
	* remain negative unhashed with positive refcount
	* never have its ->d_alias looked at
	* never have its ->d_lru looked at
	* never have its ->d_parent and ->d_name changed

Right now we have at most one such for any given parent directory.
With parallel lookups that restriction will weaken to
	* only exist when parent is locked shared
	* at most one with given (parent,name) pair (comparison of
	  names is according to ->d_compare())
	* only exist when there's no hashed dentry with the same
	  (parent,name)

Transition will take the next several commits; unfortunately, we'll
only be able to switch to rwsem at the end of this series.  The
reason for not making it a single patch is to simplify review.

New primitives: d_in_lookup() (a predicate checking if dentry is in
the in-lookup state) and d_lookup_done() (tells the system that
we are done with lookup and if it's still marked as in-lookup, it
should cease to be such).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

85c7f810
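
The life cycle of the in-lookup mark described in that message can be modelled with a flag bit. This is a toy sketch only: `struct dentry_sketch`, `SKETCH_PAR_LOOKUP` and the `sketch_*()` helpers are hypothetical names loosely mirroring d_in_lookup() and d_lookup_done(), and all locking (->d_lock in the kernel) is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAR_LOOKUP 0x1u   /* hypothetical "in-lookup" flag bit */

struct dentry_sketch {
    unsigned flags;
    bool hashed;
};

/* Predicate: is this dentry still in the in-lookup state? */
bool sketch_in_lookup(const struct dentry_sketch *d)
{
    return (d->flags & SKETCH_PAR_LOOKUP) != 0;
}

/* Mark just before the dentry is handed to ->lookup(); it must be unhashed. */
void sketch_mark_in_lookup(struct dentry_sketch *d)
{
    assert(!d->hashed);
    d->flags |= SKETCH_PAR_LOOKUP;
}

/* Clear the mark if it is still set; harmless to call more than once. */
void sketch_lookup_done(struct dentry_sketch *d)
{
    d->flags &= ~SKETCH_PAR_LOOKUP;
}

/* Hashing a dentry ends the in-lookup state first. */
void sketch_add(struct dentry_sketch *d)
{
    sketch_lookup_done(d);
    d->hashed = true;
}
```

The point of the model is the invariant: while the flag is set the dentry stays unhashed, and every path that hashes it clears the flag first, so the caller's final `sketch_lookup_done()` is only a safety net.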

__d_add(): don't drop/regain ->d_lock · 0568d705

Al Viro authored Apr 14, 2016

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0568d705

lookup_slow(): bugger off on IS_DEADDIR() from the very beginning · 1936386e

Al Viro authored Apr 14, 2016

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1936386e

nfs: missing wakeup in nfs_unblock_sillyrename() · d2caaa0a

Al Viro authored Apr 30, 2016

will be needed as soon as lookups are not serialized by ->i_mutex
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d2caaa0a

make ext2_get_page() and friends work without external serialization · be5b82db

Al Viro authored Apr 22, 2016

Right now ext2_get_page() (and its analogues in a bunch of other filesystems)
relies upon the directory being locked - the way it sets and tests Checked and
Error bits would be racy without that.  Switch to a slightly different scheme,
_not_ setting Checked in case of failure.  That way the logic becomes
	if Checked => OK
	else if Error => fail
	else if !validate => fail
	else => OK
with validation setting Checked on success or Error on failure and
returning which one had happened.  Equivalent to the current logic, but unlike
the current logic not sensitive to set_bit and test_bit getting
reordered by the CPU, etc.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

be5b82db

ovl_lookup_real(): use lookup_one_len_unlocked() · b9e1d435

Al Viro authored Apr 14, 2016

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b9e1d435

reconnect_one(): use lookup_one_len_unlocked() · 383d4e8a

Al Viro authored Apr 14, 2016

... and explain the non-obvious logic in the case when lookup yields
a different dentry.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

383d4e8a

reiserfs: open-code reiserfs_mutex_lock_safe() in reiserfs_unpack() · 1ae1f3f6

Al Viro authored Apr 15, 2016

... and have it use inode_lock()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1ae1f3f6

orangefs: don't open-code inode_lock/inode_unlock · 5ecfcb26

Al Viro authored Apr 12, 2016

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5ecfcb26

ocfs2: don't open-code inode_lock/inode_unlock · 7b9743eb

Al Viro authored Apr 12, 2016

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7b9743eb

configfs_detach_prep(): make sure that wait_mutex won't go away · 48f35b7b

Al Viro authored Apr 12, 2016

grab a reference to dentry we'd got the sucker from, and return
that dentry via *wait, rather than just returning the address of
->i_mutex.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

48f35b7b

kernfs: use lookup_one_len_unlocked() · 779b8391

Al Viro authored Apr 11, 2016

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

779b8391

security_d_instantiate(): move to the point prior to attaching dentry to inode · b9680917

Al Viro authored Apr 11, 2016

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b9680917

Merge getxattr prototype change into work.lookups · 84695ffe

Al Viro authored May 02, 2016

The rest of work.xattr stuff isn't needed for this branch

84695ffe

30 Apr, 2016 1 commit

atomic_open(): fix the handling of create_error · 10c64cea

Al Viro authored Apr 27, 2016

* if we have a hashed negative dentry and either CREAT|EXCL on
r/o filesystem, or CREAT|TRUNC on r/o filesystem, or CREAT|EXCL
with failing may_o_create(), we should fail with EROFS or the
error may_o_create() has returned, but not with ENOENT, which is what
the current code ends up returning.

* if we have CREAT|TRUNC hitting a regular file on a read-only
filesystem, we can't fail with EROFS here.  At the very least,
not until we'd done follow_managed() - we might have a writable
file (or a device, for that matter) bound on top of that one.
Moreover, the code downstream will see that O_TRUNC and attempt
to grab the write access (*after* following possible mount), so
if we really should fail with EROFS, it will happen.  No need
to do that inside atomic_open().

The real logic is much simpler than what the current code is
trying to do - if we decided to go for simple lookup, ended
up with a negative dentry *and* had create_error set, fail with
create_error.  No matter whether we'd got that negative dentry
from lookup_real() or had found it in dcache.

Cc: stable@vger.kernel.org # v3.6+
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

10c64cea

11 Apr, 2016 6 commits

->getxattr(): pass dentry and inode as separate arguments · ce23e640

Al Viro authored Apr 11, 2016

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ce23e640

Linux 4.6-rc3 · bf162006

Linus Torvalds authored Apr 10, 2016

bf162006

xattr_handler: pass dentry and inode as separate arguments of ->get() · b296821a

Al Viro authored Apr 10, 2016

... and do not assume they are already attached to each other
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b296821a

Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm · 08b15d13

Linus Torvalds authored Apr 10, 2016

Pull ARM fixes from Russell King:
 "A couple of small fixes, and wiring up the new syscalls which appeared
  during the merge window"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8550/1: protect idiv patching against undefined gcc behavior
  ARM: wire up preadv2 and pwritev2 syscalls
  ARM: SMP enable of cache maintanence broadcast

08b15d13

Merge tag 'mmc-v4.6-rc1' of git://git.linaro.org/people/ulf.hansson/mmc · 2f422f94

Linus Torvalds authored Apr 10, 2016

Pull MMC fixes from Ulf Hansson:
 "Here are a couple of mmc fixes intended for v4.6 rc3:

  MMC host:
   - sdhci: Fix regression setting power on Trats2 board
   - sdhci-pci: Add support and PCI IDs for more Broxton host controllers"

* tag 'mmc-v4.6-rc1' of git://git.linaro.org/people/ulf.hansson/mmc:
  mmc: sdhci-pci: Add support and PCI IDs for more Broxton host controllers
  mmc: sdhci: Fix regression setting power on Trats2 board

2f422f94

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 6a7c9243

Linus Torvalds authored Apr 10, 2016

Pull i2c fixes from Wolfram Sang:
 "Some bugfixes from I2C:

   - fix a uevent triggered boot problem by removing a useless debug
     print

   - fix sysfs-attributes of the new i2c-demux-pinctrl driver to follow
     standard kernel behaviour

   - fix a potential division-by-zero error (needed two takes)"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: jz4780: really prevent potential division by zero
  Revert "i2c: jz4780: prevent potential division by zero"
  i2c: jz4780: prevent potential division by zero
  i2c: mux: demux-pinctrl: Update docs to new sysfs-attributes
  i2c: mux: demux-pinctrl: Clean up sysfs attributes
  i2c: prevent endless uevent loop with CONFIG_I2C_DEBUG_CORE

6a7c9243

10 Apr, 2016 3 commits

Revert "ext4: allow readdir()'s of large empty directories to be interrupted" · 9f2394c9

Linus Torvalds authored Apr 10, 2016

This reverts commit 1028b55b.

It's broken: it makes ext4 return an error at an invalid point, causing
the readdir wrappers to write the position of the last successful
directory entry into the position field, which means that the next
readdir will now return that last successful entry _again_.

You can only return fatal errors (that terminate the readdir directory
walk) from within the filesystem readdir functions; the "normal" errors
(that happen when the readdir buffer fills up, for example) happen in
the iterator where we know the position of the actual failing entry.

I do have a very different patch that does the "signal_pending()"
handling inside the iterator function where it is allowable, but while
that one passes all the sanity checks, I screwed up something like four
times while emailing it out, so I'm not going to commit it today.

So my track record is not good enough, and the stars will have to align
better before that one gets committed.  And it would be good to get some
review too, of course, since celestial alignments are always an iffy
debugging model.

IOW, let's just revert the commit that caused the problem for now.
Reported-by: Greg Thelen <gthelen@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9f2394c9
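
The position rule behind that revert can be shown with a toy model. This is not ext4 or VFS code: `struct readdir_ctx`, `emit_entry()` and `readdir_sketch()` are hypothetical names, and the "buffer" is reduced to a count of free slots. The non-fatal stop happens inside the emit step, so the saved position always names the first entry that was not delivered.

```c
#include <assert.h>
#include <stdbool.h>

struct readdir_ctx {
    long pos;       /* index of the next entry to hand out */
    int  room;      /* free slots left in the caller's buffer */
    int  emitted;   /* entries handed out so far */
};

/* Emit step: the only place where "buffer full" is detected. */
bool emit_entry(struct readdir_ctx *ctx)
{
    assert(ctx->room >= 0);
    if (ctx->room == 0)
        return false;   /* not a fatal error, just stop here */
    ctx->room--;
    ctx->emitted++;
    return true;
}

/* Directory walk: advance pos only after an entry was really emitted. */
int readdir_sketch(struct readdir_ctx *ctx, long nentries)
{
    while (ctx->pos < nentries) {
        if (!emit_entry(ctx))
            break;      /* pos still names the entry that did not fit */
        ctx->pos++;
    }
    return 0;           /* only fatal conditions would return an error */
}
```

Calling `readdir_sketch()` again with a fresh buffer resumes at `pos`, so no entry is returned twice. Returning an error from the walk at a point where `pos` is stale is the failure mode the revert message describes.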

reiserfs: switch to generic_{get,set,remove}xattr() · 79a628d1

Al Viro authored Apr 10, 2016

reiserfs_xattr_[sg]et() will fail with -EOPNOTSUPP for V1 inodes anyway,
and all reiserfs instances of ->[sg]et() call it and so does ->set_acl().

Checks for name length in the instances had been bogus; they should've
been "bugger off if it's _exactly_ the prefix" (as generic would
do on its own) and not "bugger off if it's shorter than the prefix" -
that can't happen.

xattr_full_name() is needed to adjust for the fact that generic instances
will skip the prefix in the name passed to ->[gs]et(); reiserfs homegrown
analogues didn't.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

79a628d1
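
The two points about names in that message (reject a name that is exactly the prefix, and step back over the prefix to recover the full name) can be sketched with plain string handling. The helpers `xattr_suffix_sketch()` and `full_name_sketch()` are hypothetical and only imitate the behaviour the message describes; the second one assumes its argument really points into a full name that starts with the prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the part of name after prefix, or NULL if the prefix does not
 * match or if name is exactly the prefix (nothing left after it).
 */
const char *xattr_suffix_sketch(const char *name, const char *prefix)
{
    size_t n;

    assert(name != NULL && prefix != NULL);
    n = strlen(prefix);
    if (strncmp(name, prefix, n) != 0)
        return NULL;
    if (name[n] == '\0')
        return NULL;   /* "bugger off if it's exactly the prefix" */
    return name + n;
}

/*
 * Recover the full name from the suffix a handler was given.  Only valid
 * when suffix points just past the prefix inside the full name.
 */
const char *full_name_sketch(const char *suffix, const char *prefix)
{
    return suffix - strlen(prefix);
}
```

A name shorter than the prefix can never pass the `strncmp()` test, which is why a separate "shorter than the prefix" check is redundant.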

cifs: kill more bogus checks in ->...xattr() methods · 5fdccfef

Al Viro authored Apr 10, 2016

none of that stuff can ever be called for NULL or negative
dentry.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5fdccfef