- 11 Dec, 2014 6 commits
-
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
As it is, default ->i_fop has NULL ->open() (along with all other methods). The only case where it matters is reopening (via procfs symlink) a file that didn't get its ->f_op from ->i_fop - anything else will have ->i_fop assigned to something sane (default would fail on read/write/ioctl/etc.). Unfortunately, such case exists - alloc_file() users, especially anon_get_file() ones. There we have tons of opened files of very different kinds sharing the same inode. As the result, attempt to reopen those via procfs succeeds and you get a descriptor you can't do anything with. Moreover, in case of sockets we set ->i_fop that will only be used on such reopen attempts - and put a failing ->open() into it to make sure those do not succeed. It would be simpler to put such ->open() into default ->i_fop and leave it unchanged both for anon inode (as we do anyway) and for socket ones. Result: * everything going through do_dentry_open() works as it used to * sock_no_open() kludge is gone * attempts to reopen anon-inode files fail as they really ought to * ditto for aio_private_file() * ditto for perfmon - this one actually tried to imitate sock_no_open() trick, but failed to set ->i_fop, so in the current tree reopens succeed and yield completely useless descriptor. Intent clearly had been to fail with -ENXIO on such reopens; now it actually does. * everything else that used alloc_file() keeps working - it has ->i_fop set for its inodes anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
-
Al Viro authored
procfs inodes need only the ns_ops part; nsfs inodes don't need it at all Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now. It's not mountable (not even registered, so it's not in /proc/filesystems, etc.). Files on it *are* bindable - we explicitly permit that in do_loopback(). This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well. get_proc_ns() is a macro now (it's simply returning ->i_private; would have been an inline, if not for header ordering headache). proc_ns_inode() is an ex-parrot. The interface used in procfs is ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops). Dentries and inodes are never hashed; a non-counting reference to dentry is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path() if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details of that mechanism. As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt; it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets from ns_get_path(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 09 Dec, 2014 5 commits
-
-
Al Viro authored
-
Al Viro authored
BTW, do we want memcpy_nocache()? Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
initialization of kvec-backed iov_iter Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
... without bothering with copy_..._user() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 04 Dec, 2014 8 commits
-
-
Al Viro authored
a) make get_proc_ns() return a pointer to struct ns_common b) mirror ns_ops in dentry->d_fsdata of ns dentries, so that is_mnt_ns_file() could get away with fewer dereferences. That way struct proc_ns becomes invisible outside of fs/proc/*.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
take struct ns_common *, for now simply wrappers around proc_{alloc,free}_inum() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
We can do that now. And kill ->inum(), while we are at it - all instances are identical. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
for now - just move corresponding ->proc_inum instances over there Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 27 Nov, 2014 9 commits
-
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Just have copy_page_{to,from}_iter() fall back to kmap_atomic + copy_{to,from}_iter() + kunmap_atomic() in ITER_BVEC case. As the matter of fact, that's what we want to do for any iov_iter kind that isn't blocking - e.g. ITER_KVEC will also go that way once we recognize it on iov_iter.c primitives level Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
same as iterate_all_kinds, but iterator is moved to the position past the last byte we'd handled. iov_iter_advance() converted to it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
iterate_all_kinds(iter, size, ident, step_iovec, step_bvec) iterates through the ranges covered by iter (up to size bytes total), repeating step_iovec or step_bvec for each of those. ident is declared in expansion of that thing, either as struct iovec or struct bvec, and it contains the range we are currently looking at. step_bvec should be a void expression, step_iovec - a size_t one, with non-zero meaning "stop here, that many bytes from this range left". In the end, the amount actually handled is stored in size. iov_iter_copy_from_user_atomic() and iov_iter_alignment() converted to it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 23 Nov, 2014 11 commits
-
-
Linus Torvalds authored
-
Andy Lutomirski authored
x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but not on non-paranoid returns. I suspect that this is a mistake and that the code only works because int3 is paranoid. Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround for the x86 bug. With that bug fixed, we can remove _TIF_NOTIFY_RESUME from the uprobes code. Reported-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Thomas Gleixner authored
Chris bisected a NULL pointer deference in task_sched_runtime() to commit 6e998916 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency'. Chris observed crashes in atop or other /proc walking programs when he started fork bombs on his machine. He assumed that this is a new exit race, but that does not make any sense when looking at that commit. What's interesting is that, the commit provides update_curr callbacks for all scheduling classes except stop_task and idle_task. While nothing can ever hit that via the clock_nanosleep() and clock_gettime() interfaces, which have been the target of the commit in question, the author obviously forgot that there are other code paths which invoke task_sched_runtime() do_task_stat(() thread_group_cputime_adjusted() thread_group_cputime() task_cputime() task_sched_runtime() if (task_current(rq, p) && task_on_rq_queued(p)) { update_rq_clock(rq); up->sched_class->update_curr(rq); } If the stats are read for a stomp machine task, aka 'migration/N' and that task is current on its cpu, this will happily call the NULL pointer of stop_task->update_curr. Ooops. Chris observation that this happens faster when he runs the fork bomb makes sense as the fork bomb will kick migration threads more often so the probability to hit the issue will increase. Add the missing update_curr callbacks to the scheduler classes stop_task and idle_task. While idle tasks cannot be monitored via /proc we have other means to hit the idle case. Fixes: 6e998916 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency' Reported-by: Chris Mason <clm@fb.com> Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
Merge x86-64 iret fixes from Andy Lutomirski: "This addresses the following issues: - an unrecoverable double-fault triggerable with modify_ldt. - invalid stack usage in espfix64 failed IRET recovery from IST context. - invalid stack usage in non-espfix64 failed IRET recovery from IST context. It also makes a good but IMO scary change: non-espfix64 failed IRET will now report the correct error. Hopefully nothing depended on the old incorrect behavior, but maybe Wine will get confused in some obscure corner case" * emailed patches from Andy Lutomirski <luto@amacapital.net>: x86_64, traps: Rework bad_iret x86_64, traps: Stop using IST for #SS x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
-
Andy Lutomirski authored
It's possible for iretq to userspace to fail. This can happen because of a bad CS, SS, or RIP. Historically, we've handled it by fixing up an exception from iretq to land at bad_iret, which pretends that the failed iret frame was really the hardware part of #GP(0) from userspace. To make this work, there's an extra fixup to fudge the gs base into a usable state. This is suboptimal because it loses the original exception. It's also buggy because there's no guarantee that we were on the kernel stack to begin with. For example, if the failing iret happened on return from an NMI, then we'll end up executing general_protection on the NMI stack. This is bad for several reasons, the most immediate of which is that general_protection, as a non-paranoid idtentry, will try to deliver signals and/or schedule from the wrong stack. This patch throws out bad_iret entirely. As a replacement, it augments the existing swapgs fudge into a full-blown iret fixup, mostly written in C. It's should be clearer and more correct. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andy Lutomirski authored
On a 32-bit kernel, this has no effect, since there are no IST stacks. On a 64-bit kernel, #SS can only happen in user code, on a failed iret to user space, a canonical violation on access via RSP or RBP, or a genuine stack segment violation in 32-bit kernel code. The first two cases don't need IST, and the latter two cases are unlikely fatal bugs, and promoting them to double faults would be fine. This fixes a bug in which the espfix64 code mishandles a stack segment violation. This saves 4k of memory per CPU and a tiny bit of code. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andy Lutomirski authored
There's nothing special enough about the espfix64 double fault fixup to justify writing it in assembly. Move it to C. This also fixes a bug: if the double fault came from an IST stack, the old asm code would return to a partially uninitialized stack frame. Fixes: 3891a04aSigned-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds authored
Pull ARM SoC fixes from Olof Johansson: "A collection of fixes this week: - A set of clock fixes for shmobile platforms - A fix for tegra that moves serial port labels to be per board. We're choosing to merge this for 3.18 because the labels will start being parsed in 3.19, and without this change serial port numbers that used to be stable since the dawn of time will change numbers. - A few other DT tweaks for Tegra. - A fix for multi_v7_defconfig that makes it stop spewing cpufreq errors on Arndale (Exynos)" * tag 'armsoc-for-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: multi_v7_defconfig: fix failure setting CPU voltage by enabling dependent I2C controller ARM: tegra: roth: Fix SD card VDD_IO regulator ARM: tegra: Remove eMMC vmmc property for roth/tn7 ARM: dts: tegra: move serial aliases to per-board ARM: tegra: Add serial port labels to Tegra124 DT ARM: shmobile: kzm9g legacy: Set i2c clks_per_count to 2 ARM: shmobile: r8a7740 dtsi: Correct IIC0 parent clock ARM: shmobile: r8a7790: Fix SD3CKCR address to device tree ARM: shmobile: r8a7740 legacy: Correct IIC0 parent clock ARM: shmobile: r8a7740 legacy: Add missing INTCA clock for irqpin module ARM: shmobile: r8a7790: Fix SD3CKCR address ARM: dts: sun6i: Re-parent ahb1_mux to pll6 as required by dma controller
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpuLinus Torvalds authored
Pull percpu fix from Tejun Heo: "This contains one patch to fix a race condition which can lead to percpu_ref using a percpu pointer which is corrupted with a set DEAD bit. The bug was introduced while separating out the ATOMIC mode flag from the DEAD flag. The fix is pretty straight forward. I just committed the patch to the percpu tree but am sending out the pull request early as I'll be on vacation for a week. The patch should be fairly safe and while the latency will be higher I'll be checking emails" * 'for-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu-ref: fix DEAD flag contamination of percpu pointer
-
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfsLinus Torvalds authored
Pull btrfs deadlock fix from Chris Mason: "This has a fix for a long standing deadlock that we've been trying to nail down for a while. It ended up being a bad interaction with the fair reader/writer locks and the order btrfs reacquires locks in the btree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: fix lockups from btrfs_clear_path_blocking
-
Tejun Heo authored
While decoupling ATOMIC and DEAD flags, f47ad457 ("percpu_ref: decouple switching to percpu mode and reinit") updated __ref_is_percpu() so that it only tests ATOMIC flag to determine whether the ref is in percpu mode or not; however, while DEAD implies ATOMIC, the two flags are set separately during percpu_ref_kill() and if __ref_is_percpu() races percpu_ref_kill(), it may see DEAD w/o ATOMIC. Because __ref_is_percpu() returns @ref->percpu_count_ptr value verbatim as the percpu pointer after testing ATOMIC, the pointer may now be contaminated with the DEAD flag. This can be fixed by clearing the flag bits before returning the pointer which was the fix proposed by Shaohua; however, as DEAD implies ATOMIC, we can just test for both flags at once and avoid the explicit masking. Update __ref_is_percpu() so that it tests that both ATOMIC and DEAD are clear before returning @ref->percpu_count_ptr as the percpu pointer. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-Reviewed-by: Shaohua Li <shli@kernel.org> Link: http://lkml.kernel.org/r/995deb699f5b873c45d667df4add3b06f73c2c25.1416638887.git.shli@kernel.org Fixes: f47ad457 ("percpu_ref: decouple switching to percpu mode and reinit")
-
- 22 Nov, 2014 1 commit
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull timer fix from Thomas Gleixner: "A single bugfix for an init order problem in the sun4i subarch clockevents code" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clockevent: sun4i: Fix race condition in the probe code
-