Commits · 5e53084d7734a1e1a64346f8480c0bb66c218764 · Kirill Smelkov / linux

11 Dec, 2014 6 commits

path_init(): store the "base" pointer to file in nameidata itself · 5e53084d
Al Viro authored Nov 20, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
5e53084d

make default ->i_fop have ->open() fail with ENXIO · bd9b51e7

Al Viro authored Nov 18, 2014

As it is, default ->i_fop has NULL ->open() (along with all other methods).
The only case where it matters is reopening (via procfs symlink) a file that
didn't get its ->f_op from ->i_fop - anything else will have ->i_fop assigned
to something sane (default would fail on read/write/ioctl/etc.).

	Unfortunately, such a case exists - alloc_file() users, especially
anon_get_file() ones.  There we have tons of opened files of very different
kinds sharing the same inode.  As a result, an attempt to reopen those via
procfs succeeds and you get a descriptor you can't do anything with.

	Moreover, in case of sockets we set ->i_fop that will only be used
on such reopen attempts - and put a failing ->open() into it to make sure
those do not succeed.

	It would be simpler to put such ->open() into default ->i_fop and leave
it unchanged both for anon inode (as we do anyway) and for socket ones.  Result:
	* everything going through do_dentry_open() works as it used to
	* sock_no_open() kludge is gone
	* attempts to reopen anon-inode files fail as they really ought to
	* ditto for aio_private_file()
	* ditto for perfmon - this one actually tried to imitate the sock_no_open()
	  trick, but failed to set ->i_fop, so in the current tree reopens succeed and
	  yield a completely useless descriptor.  The intent clearly had been to fail
	  with -ENXIO on such reopens; now it actually does.
	* everything else that used alloc_file() keeps working - it has ->i_fop
	  set for its inodes anyway
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
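
A minimal userspace sketch of the idea described in this message, not the kernel code itself. The structures are cut down to the fields that matter here, and `model_inode_init()` and `model_dentry_open()` are hypothetical stand-ins for the real inode setup and open path; only the notion of a default ->i_fop whose ->open() returns -ENXIO comes from the commit text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct inode;
struct file;

struct file_operations {
	int (*open)(struct inode *inode, struct file *file);
	/* read/write/ioctl/... omitted in this model */
};

struct inode {
	const struct file_operations *i_fop;
};

struct file {
	const struct file_operations *f_op;
};

/* Default ->open(): every open attempt fails with -ENXIO. */
static int no_open(struct inode *inode, struct file *file)
{
	(void)inode;
	(void)file;
	return -ENXIO;
}

/* Default ->i_fop: ->open() fails, all other methods stay NULL. */
static const struct file_operations no_open_fops = { .open = no_open };

/* Model of inode setup: a fresh inode starts with the failing default. */
static void model_inode_init(struct inode *inode)
{
	inode->i_fop = &no_open_fops;
}

/* Model of the open path: ->f_op comes from ->i_fop, then ->open() runs if set. */
static int model_dentry_open(struct inode *inode, struct file *file)
{
	assert(inode && file);
	file->f_op = inode->i_fop;
	if (file->f_op && file->f_op->open)
		return file->f_op->open(inode, file);
	return 0;
}
```

In this model an inode that never replaces its ->i_fop (the anon-inode and socket cases) rejects every reopen, while an inode that installs its own operations behaves as before.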

bd9b51e7

make nameidata completely opaque outside of fs/namei.c · 1f55a6ec
Al Viro authored Nov 01, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
1f55a6ec
Merge branch 'nsfs' into for-next · 707c5960
Al Viro authored Dec 10, 2014

707c5960

kill proc_ns completely · 3d3d35b1

Al Viro authored Nov 01, 2014

procfs inodes need only the ns_ops part; nsfs inodes don't need it at all
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3d3d35b1

take the targets of /proc/*/ns/* symlinks to separate fs · e149ed2b

Al Viro authored Nov 01, 2014

New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now.
It's not mountable (not even registered, so it's not in /proc/filesystems,
etc.). Files on it *are* bindable - we explicitly permit that in do_loopback().

This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
get_proc_ns() is a macro now (it's simply returning ->i_private; would
have been an inline, if not for header ordering headache).
proc_ns_inode() is an ex-parrot. The interface used in procfs is
ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).

Dentries and inodes are never hashed; a non-counting reference to dentry
is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
of that mechanism.

As a result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets
from ns_get_path().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e149ed2b

09 Dec, 2014 5 commits

Merge branch 'iov_iter' into for-next · ba00410b
Al Viro authored Dec 08, 2014

ba00410b

copy_from_iter_nocache() · aa583096
Al Viro authored Nov 27, 2014
```
BTW, do we want memcpy_nocache()?
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
aa583096
new helper: iov_iter_kvec() · abb78f87
Al Viro authored Nov 24, 2014
```
initialization of kvec-backed iov_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
abb78f87
csum_and_copy_..._iter() · a604ec7e
Al Viro authored Nov 24, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
a604ec7e
iov_iter.c: handle ITER_KVEC directly · a280455f
Al Viro authored Nov 27, 2014
```
... without bothering with copy_..._user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
a280455f

04 Dec, 2014 8 commits

bury struct proc_ns in fs/proc · f77c8014

Al Viro authored Nov 01, 2014

a) make get_proc_ns() return a pointer to struct ns_common
b) mirror ns_ops in dentry->d_fsdata of ns dentries, so that
is_mnt_ns_file() could get away with fewer dereferences.

That way struct proc_ns becomes invisible outside of fs/proc/*.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f77c8014

copy address of proc_ns_ops into ns_common · 33c42940
Al Viro authored Nov 01, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
33c42940

new helpers: ns_alloc_inum/ns_free_inum · 6344c433

Al Viro authored Nov 01, 2014

take struct ns_common *, for now simply wrappers around proc_{alloc,free}_inum()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6344c433

make proc_ns_operations work with struct ns_common * instead of void * · 64964528

Al Viro authored Nov 01, 2014

We can do that now.  And kill ->inum(), while we are at it - all instances
are identical.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

64964528

switch the rest of proc_ns_operations to working with &...->ns · 3c041184
Al Viro authored Nov 01, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
3c041184
netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns · ff24870f
Al Viro authored Nov 01, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
ff24870f
make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns · 58be2825
Al Viro authored Nov 01, 2014
```
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
58be2825

common object embedded into various struct ....ns · 435d5f4b

Al Viro authored Oct 31, 2014

for now - just move corresponding ->proc_inum instances over there
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
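
A rough userspace illustration of what embedding a common object buys, under stated assumptions: `struct mini_net`, `struct mini_mnt_namespace`, `mini_container_of()`, `mini_ns_inum()` and `to_mini_net()` are hypothetical names made up for this sketch, and the common part holds only the inode number, as the message says it does for now.

```c
#include <assert.h>
#include <stddef.h>

/* Common part shared by all namespace types (only the inode number here). */
struct ns_common {
	unsigned int inum;
};

/* Hypothetical cut-down namespace structures embedding the common object. */
struct mini_net {
	int ifcount;
	struct ns_common ns;
};

struct mini_mnt_namespace {
	struct ns_common ns;
	int mounts;
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define mini_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* One accessor works for every namespace type. */
static unsigned int mini_ns_inum(const struct ns_common *ns)
{
	assert(ns);
	return ns->inum;
}

/* Type-specific code goes back from the common object to its container. */
static struct mini_net *to_mini_net(struct ns_common *ns)
{
	return mini_container_of(ns, struct mini_net, ns);
}
```

Generic code can then pass `struct ns_common *` around instead of `void *`, and only the per-type code needs to know which structure surrounds it.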

435d5f4b

27 Nov, 2014 9 commits

iov_iter.c: convert copy_to_iter() to iterate_and_advance · 3d4d3e48
Al Viro authored Nov 27, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
3d4d3e48
iov_iter.c: convert copy_from_iter() to iterate_and_advance · 0dbca9a4
Al Viro authored Nov 27, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
0dbca9a4

iov_iter.c: get rid of bvec_copy_page_{to,from}_iter() · d271524a

Al Viro authored Nov 27, 2014

Just have copy_page_{to,from}_iter() fall back to kmap_atomic() +
copy_{to,from}_iter() + kunmap_atomic() in the ITER_BVEC case.  As
a matter of fact, that's what we want to do for any iov_iter
kind that isn't blocking - e.g. ITER_KVEC will also go that way
once we recognize it at the level of the iov_iter.c primitives.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d271524a

iov_iter.c: convert iov_iter_zero() to iterate_and_advance · 8442fa46
Al Viro authored Nov 27, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
8442fa46
iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds · 1b17f1f2
Al Viro authored Nov 27, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
1b17f1f2
iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds · e5393fae
Al Viro authored Nov 27, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
e5393fae
iov_iter.c: convert iov_iter_npages() to iterate_all_kinds · e0f2dc40
Al Viro authored Nov 27, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
e0f2dc40

iov_iter.c: iterate_and_advance · 7ce2a91e

Al Viro authored Nov 27, 2014

same as iterate_all_kinds, but iterator is moved to the position past
the last byte we'd handled.

iov_iter_advance() converted to it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
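
The "moved to the position past the last byte we'd handled" part can be pictured with a small userspace model. This is a sketch written as a plain function rather than the macro the commit adds; `struct mini_iter` and `mini_iter_advance()` are hypothetical names, only the iovec kind is modelled, and `count` is assumed to be consistent with the segments.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical, cut-down iterator: only the iovec kind is modelled. */
struct mini_iter {
	const struct iovec *iov;  /* current segment */
	size_t nr_segs;           /* segments left, including the current one */
	size_t iov_offset;        /* bytes already consumed in the current segment */
	size_t count;             /* total bytes left; must match the segments */
};

/*
 * Move the iterator forward by up to 'size' bytes and return how many
 * bytes were actually skipped.  Exhausted segments are stepped over, so
 * the iterator ends up just past the last byte handled.
 */
static size_t mini_iter_advance(struct mini_iter *i, size_t size)
{
	size_t todo, done = 0;

	assert(i);
	todo = size < i->count ? size : i->count;

	while (todo) {
		size_t avail = i->iov->iov_len - i->iov_offset;
		size_t step = todo < avail ? todo : avail;

		i->iov_offset += step;
		done += step;
		todo -= step;
		if (i->iov_offset == i->iov->iov_len) {
			/* current segment used up: go to the next one */
			i->iov++;
			i->nr_segs--;
			i->iov_offset = 0;
		}
	}
	i->count -= done;
	return done;
}
```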

7ce2a91e

iov_iter.c: macros for iterating over iov_iter · 04a31165

Al Viro authored Nov 27, 2014

iterate_all_kinds(iter, size, ident, step_iovec, step_bvec)
iterates through the ranges covered by iter (up to size bytes total),
repeating step_iovec or step_bvec for each of those.  ident is
declared in expansion of that thing, either as struct iovec or
struct bvec, and it contains the range we are currently looking
at.  step_bvec should be a void expression, step_iovec - a size_t
one, with non-zero meaning "stop here, that many bytes from this
range left".  In the end, the amount actually handled is stored
in size.

iov_iter_copy_from_user_atomic() and iov_iter_alignment() converted
to it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
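
To make the calling convention concrete, here is a simplified userspace sketch of such a macro. It is an illustration under assumptions, not the kernel macro: it handles only the iovec kind, and `struct mini_iter`, `mini_iterate_iovec()` and `mini_copy_from_iter()` are hypothetical names. What it keeps from the description above is that the identifier is declared by the expansion, the step is a size_t expression where non-zero means "stop, that many bytes of this range left", and the amount handled is stored back in size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical, cut-down iterator: only the iovec kind is modelled. */
struct mini_iter {
	const struct iovec *iov;  /* current segment */
	size_t nr_segs;           /* segments left, including the current one */
	size_t iov_offset;        /* bytes already consumed in the current segment */
	size_t count;             /* total bytes left in the iterator */
};

/*
 * Walk up to 'n' bytes of 'i' without modifying the iterator.  'v' is
 * declared by the expansion as a struct iovec holding the current range.
 * STEP is a size_t expression; non-zero stops the walk and says how many
 * bytes of the current range were left unhandled.  On exit 'n' holds the
 * number of bytes actually handled.
 */
#define mini_iterate_iovec(i, n, v, STEP) do {                          \
	const struct iovec *p_ = (i)->iov;                               \
	size_t segs_ = (i)->nr_segs;                                     \
	size_t skip_ = (i)->iov_offset;                                  \
	size_t todo_ = (n) < (i)->count ? (n) : (i)->count;              \
	size_t done_ = 0;                                                \
	size_t left_ = 0;                                                \
	while (todo_ && !left_ && segs_) {                               \
		struct iovec v;                                          \
		size_t avail_ = p_->iov_len - skip_;                     \
		v.iov_len = todo_ < avail_ ? todo_ : avail_;             \
		if (v.iov_len) {                                         \
			v.iov_base = (char *)p_->iov_base + skip_;       \
			left_ = (STEP);                                  \
			v.iov_len -= left_;                              \
			done_ += v.iov_len;                              \
			todo_ -= v.iov_len;                              \
		}                                                        \
		p_++;                                                    \
		segs_--;                                                 \
		skip_ = 0;                                               \
	}                                                                \
	(n) = done_;                                                     \
} while (0)

/* Copy up to 'bytes' from the iterator into 'to'; returns bytes copied. */
static size_t mini_copy_from_iter(void *to, size_t bytes,
				  const struct mini_iter *i)
{
	char *dst = to;

	assert(to && i);
	mini_iterate_iovec(i, bytes, v,
		(memcpy(dst, v.iov_base, v.iov_len),
		 dst += v.iov_len,
		 (size_t)0));
	return bytes;
}
```

The step here always yields 0 because a plain memcpy() cannot fail partway; a step that copies to or from user memory would instead return the number of bytes it could not copy, which ends the walk early.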

04a31165

23 Nov, 2014 11 commits

Linux 3.18-rc6 · 5d01410f
Linus Torvalds authored Nov 23, 2014

5d01410f

uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME · 82975bc6

Andy Lutomirski authored Nov 21, 2014

x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but
not on non-paranoid returns.  I suspect that this is a mistake and that
the code only works because int3 is paranoid.

Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround
for the x86 bug.  With that bug fixed, we can remove _TIF_NOTIFY_RESUME
from the uprobes code.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

82975bc6

sched: Provide update_curr callbacks for stop/idle scheduling classes · 90e362f4

Thomas Gleixner authored Nov 23, 2014

Chris bisected a NULL pointer dereference in task_sched_runtime() to
commit 6e998916 'sched/cputime: Fix clock_nanosleep()/clock_gettime()
inconsistency'.

Chris observed crashes in atop or other /proc walking programs when he
started fork bombs on his machine.  He assumed that this is a new exit
race, but that does not make any sense when looking at that commit.

What's interesting is that the commit provides update_curr callbacks
for all scheduling classes except stop_task and idle_task.

While nothing can ever hit that via the clock_nanosleep() and
clock_gettime() interfaces, which have been the target of the commit in
question, the author obviously forgot that there are other code paths
which invoke task_sched_runtime():

do_task_stat()
 thread_group_cputime_adjusted()
   thread_group_cputime()
     task_cputime()
       task_sched_runtime()
        if (task_current(rq, p) && task_on_rq_queued(p)) {
          update_rq_clock(rq);
          p->sched_class->update_curr(rq);
        }

If the stats are read for a stomp machine task, aka 'migration/N' and
that task is current on its cpu, this will happily call the NULL pointer
of stop_task->update_curr.  Ooops.

Chris's observation that this happens faster when he runs the fork bomb
makes sense, as the fork bomb will kick migration threads more often, so
the probability of hitting the issue will increase.

Add the missing update_curr callbacks to the scheduler classes stop_task
and idle_task.  While idle tasks cannot be monitored via /proc we have
other means to hit the idle case.

Fixes: 6e998916 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency'
Reported-by: Chris Mason <clm@fb.com>
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
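
The failure mode and the fix can be modelled in a few lines of userspace C. This is a sketch under assumptions, not the scheduler code: `struct rq`, `struct task`, the class tables and `model_task_sched_runtime()` are hypothetical simplifications, and the model returns -1 at the point where the unfixed kernel would have called through a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down runqueue and scheduling-class structures. */
struct rq {
	int updates;  /* how often a class really updated its statistics */
};

struct sched_class {
	const char *name;
	void (*update_curr)(struct rq *rq);
};

static void update_curr_fair(struct rq *rq) { rq->updates++; }

/* The fix: empty callbacks, so the pointer is never NULL. */
static void update_curr_stop(struct rq *rq) { (void)rq; }
static void update_curr_idle(struct rq *rq) { (void)rq; }

static const struct sched_class fair_class        = { "fair", update_curr_fair };
static const struct sched_class stop_class_before = { "stop", NULL };
static const struct sched_class stop_class_after  = { "stop", update_curr_stop };
static const struct sched_class idle_class_after  = { "idle", update_curr_idle };

struct task {
	const struct sched_class *sched_class;
	int is_current;  /* models task_current(rq, p) */
	int on_rq;       /* models task_on_rq_queued(p) */
};

/*
 * Model of the statistics path: returns 0 on success and -1 where the
 * unfixed code would have jumped through a NULL ->update_curr.
 */
static int model_task_sched_runtime(struct rq *rq, const struct task *p)
{
	assert(rq && p && p->sched_class);
	if (p->is_current && p->on_rq) {
		if (!p->sched_class->update_curr)
			return -1;
		p->sched_class->update_curr(rq);
	}
	return 0;
}
```

Reading the stats of a task that is not currently running never reaches the callback, which fits the report that the crash only showed up under load that kept the migration threads busy.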

90e362f4

Merge branch 'x86-traps' (trap handling from Andy Lutomirski) · 00c89b2f

Linus Torvalds authored Nov 23, 2014

Merge x86-64 iret fixes from Andy Lutomirski:
 "This addresses the following issues:

   - an unrecoverable double-fault triggerable with modify_ldt.
   - invalid stack usage in espfix64 failed IRET recovery from IST
     context.
   - invalid stack usage in non-espfix64 failed IRET recovery from IST
     context.

  It also makes a good but IMO scary change: non-espfix64 failed IRET
  will now report the correct error.  Hopefully nothing depended on the
  old incorrect behavior, but maybe Wine will get confused in some
  obscure corner case"

* emailed patches from Andy Lutomirski <luto@amacapital.net>:
  x86_64, traps: Rework bad_iret
  x86_64, traps: Stop using IST for #SS
  x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C

00c89b2f

x86_64, traps: Rework bad_iret · b645af2d

Andy Lutomirski authored Nov 22, 2014

It's possible for iretq to userspace to fail.  This can happen because
of a bad CS, SS, or RIP.

Historically, we've handled it by fixing up an exception from iretq to
land at bad_iret, which pretends that the failed iret frame was really
the hardware part of #GP(0) from userspace.  To make this work, there's
an extra fixup to fudge the gs base into a usable state.

This is suboptimal because it loses the original exception.  It's also
buggy because there's no guarantee that we were on the kernel stack to
begin with.  For example, if the failing iret happened on return from an
NMI, then we'll end up executing general_protection on the NMI stack.
This is bad for several reasons, the most immediate of which is that
general_protection, as a non-paranoid idtentry, will try to deliver
signals and/or schedule from the wrong stack.

This patch throws out bad_iret entirely.  As a replacement, it augments
the existing swapgs fudge into a full-blown iret fixup, mostly written
in C.  It should be clearer and more correct.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b645af2d

x86_64, traps: Stop using IST for #SS · 6f442be2

Andy Lutomirski authored Nov 22, 2014

On a 32-bit kernel, this has no effect, since there are no IST stacks.

On a 64-bit kernel, #SS can only happen in user code, on a failed iret
to user space, a canonical violation on access via RSP or RBP, or a
genuine stack segment violation in 32-bit kernel code.  The first two
cases don't need IST, and the latter two cases are unlikely fatal bugs,
and promoting them to double faults would be fine.

This fixes a bug in which the espfix64 code mishandles a stack segment
violation.

This saves 4k of memory per CPU and a tiny bit of code.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6f442be2

x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C · af726f21

Andy Lutomirski authored Nov 22, 2014

There's nothing special enough about the espfix64 double fault fixup to
justify writing it in assembly.  Move it to C.

This also fixes a bug: if the double fault came from an IST stack, the
old asm code would return to a partially uninitialized stack frame.

Fixes: 3891a04a
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

af726f21

Merge tag 'armsoc-for-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 27946315

Linus Torvalds authored Nov 23, 2014

Pull ARM SoC fixes from Olof Johansson:
 "A collection of fixes this week:

   - A set of clock fixes for shmobile platforms
   - A fix for tegra that moves serial port labels to be per board.
     We're choosing to merge this for 3.18 because the labels will start
     being parsed in 3.19, and without this change serial port numbers
     that used to be stable since the dawn of time will change numbers.
   - A few other DT tweaks for Tegra.
   - A fix for multi_v7_defconfig that makes it stop spewing cpufreq
     errors on Arndale (Exynos)"

* tag 'armsoc-for-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: multi_v7_defconfig: fix failure setting CPU voltage by enabling dependent I2C controller
  ARM: tegra: roth: Fix SD card VDD_IO regulator
  ARM: tegra: Remove eMMC vmmc property for roth/tn7
  ARM: dts: tegra: move serial aliases to per-board
  ARM: tegra: Add serial port labels to Tegra124 DT
  ARM: shmobile: kzm9g legacy: Set i2c clks_per_count to 2
  ARM: shmobile: r8a7740 dtsi: Correct IIC0 parent clock
  ARM: shmobile: r8a7790: Fix SD3CKCR address to device tree
  ARM: shmobile: r8a7740 legacy: Correct IIC0 parent clock
  ARM: shmobile: r8a7740 legacy: Add missing INTCA clock for irqpin module
  ARM: shmobile: r8a7790: Fix SD3CKCR address
  ARM: dts: sun6i: Re-parent ahb1_mux to pll6 as required by dma controller

27946315

Merge branch 'for-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu · 9f2e0f63

Linus Torvalds authored Nov 23, 2014

Pull percpu fix from Tejun Heo:
 "This contains one patch to fix a race condition which can lead to
  percpu_ref using a percpu pointer which is corrupted with a set DEAD
  bit.  The bug was introduced while separating out the ATOMIC mode flag
  from the DEAD flag.  The fix is pretty straightforward.

  I just committed the patch to the percpu tree but am sending out the
  pull request early as I'll be on vacation for a week.  The patch
  should be fairly safe and while the latency will be higher I'll be
  checking emails"

* 'for-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu-ref: fix DEAD flag contamination of percpu pointer

9f2e0f63

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · d038a63a

Linus Torvalds authored Nov 23, 2014

Pull btrfs deadlock fix from Chris Mason:
 "This has a fix for a long standing deadlock that we've been trying to
  nail down for a while.  It ended up being a bad interaction with the
  fair reader/writer locks and the order btrfs reacquires locks in the
  btree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: fix lockups from btrfs_clear_path_blocking

d038a63a

percpu-ref: fix DEAD flag contamination of percpu pointer · 4aab3b5b

Tejun Heo authored Nov 22, 2014

While decoupling ATOMIC and DEAD flags, f47ad457 ("percpu_ref:
decouple switching to percpu mode and reinit") updated
__ref_is_percpu() so that it only tests ATOMIC flag to determine
whether the ref is in percpu mode or not; however, while DEAD implies
ATOMIC, the two flags are set separately during percpu_ref_kill() and
if __ref_is_percpu() races percpu_ref_kill(), it may see DEAD w/o
ATOMIC.  Because __ref_is_percpu() returns @ref->percpu_count_ptr
value verbatim as the percpu pointer after testing ATOMIC, the pointer
may now be contaminated with the DEAD flag.

This can be fixed by clearing the flag bits before returning the
pointer which was the fix proposed by Shaohua; however, as DEAD
implies ATOMIC, we can just test for both flags at once and avoid the
explicit masking.

Update __ref_is_percpu() so that it tests that both ATOMIC and DEAD
are clear before returning @ref->percpu_count_ptr as the percpu
pointer.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-Reviewed-by: Shaohua Li <shli@kernel.org>
Link: http://lkml.kernel.org/r/995deb699f5b873c45d667df4add3b06f73c2c25.1416638887.git.shli@kernel.org
Fixes: f47ad457 ("percpu_ref: decouple switching to percpu mode and reinit")
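
The race window can be reproduced deterministically in a userspace model. This is a sketch with hypothetical names (`struct mini_ref`, `REF_ATOMIC`, `REF_DEAD`, `ref_is_percpu_buggy()`, `ref_is_percpu_fixed()`), assuming the two flags live in the low two bits of the pointer word; it returns the raw word instead of a pointer so that the contamination is visible without dereferencing anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits kept in the low bits of the pointer word (assumed layout). */
enum {
	REF_ATOMIC    = 1 << 0,
	REF_DEAD      = 1 << 1,
	REF_FLAG_MASK = REF_ATOMIC | REF_DEAD,
};

/* Hypothetical, cut-down ref: pointer and flags share one word. */
struct mini_ref {
	uintptr_t count_ptr;
};

/* Before the fix: only ATOMIC is tested, so a set DEAD bit leaks out. */
static bool ref_is_percpu_buggy(const struct mini_ref *ref, uintptr_t *countp)
{
	uintptr_t ptr = ref->count_ptr;

	assert(countp);
	if (ptr & REF_ATOMIC)
		return false;
	*countp = ptr;  /* may still carry REF_DEAD */
	return true;
}

/* After the fix: both flags must be clear before the word is handed out. */
static bool ref_is_percpu_fixed(const struct mini_ref *ref, uintptr_t *countp)
{
	uintptr_t ptr = ref->count_ptr;

	assert(countp);
	if (ptr & REF_FLAG_MASK)
		return false;
	*countp = ptr;  /* guaranteed flag-free */
	return true;
}
```

Testing both bits at once costs the same single mask-and-compare as testing one, which is why no separate masking of the returned pointer is needed.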

4aab3b5b

22 Nov, 2014 1 commit

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cb954139

Linus Torvalds authored Nov 22, 2014

Pull timer fix from Thomas Gleixner:
 "A single bugfix for an init order problem in the sun4i subarch
  clockevents code"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clockevent: sun4i: Fix race condition in the probe code

cb954139