- 06 May, 2014 40 commits
-
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Beginning to introduce those. Just the callers for now, and it's clumsier than it'll eventually become; once we finish converting aio_read and aio_write instances, the things will get nicer. For now, these guys are in parallel to ->aio_read() and ->aio_write(); they take iocb and iov_iter, with everything in iov_iter already validated. File offset is passed in iocb->ki_pos, iov/nr_segs - in iov_iter. Main concerns in that series are stack footprint and ability to split the damn thing cleanly. [fix from Peter Ujfalusi <peter.ujfalusi@ti.com> folded] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Since we are about to introduce new methods (read_iter/write_iter), the tests in a bunch of places would have to grow inconveniently. Check once (at open() time) and store results in ->f_mode as FMODE_CAN_READ and FMODE_CAN_WRITE resp. It might end up being a temporary measure - once everything switches from ->aio_{read,write} to ->{read,write}_iter it might make sense to return to open-coded checks. We'll see... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
pos is redundant (it's iocb->ki_pos), and iov/nr_segs/count are taken care of by lifting iov_iter into the caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Now It Can Be Done(tm) - we don't need to do iov_shorten() in generic_file_direct_write() anymore, now that all ->direct_IO() instances are converted to proper iov_iter methods and honour iter->count and iter->iov_offset properly. Get rid of count/ocount arguments of generic_file_direct_write(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
... and don't open-code iov_iter_alignment() there Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
same as iov_iter_get_pages(), except that pages array is allocated (kmalloc if possible, vmalloc if that fails) and left for caller to free. Lustre and NFS ->direct_IO() switched to it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
counts the pages covered by iov_iter, up to given limit. do_block_direct_io() and fuse_iter_npages() switched to it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
... to fuse_direct_{read,write}(). ->direct_IO() path uses the iov_iter passed by the caller instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
iov_iter_get_pages(iter, pages, maxsize, &start) grabs references pinning the pages of up to maxsize of (contiguous) data from iter. Returns the amount of memory grabbed or -error. In case of success, the requested area begins at offset start in pages[0] and runs through pages[1], etc. Less than requested amount might be returned - either because the contiguous area in the beginning of iterator is smaller than requested, or because the kernel failed to pin that many pages. direct-io.c switched to using iov_iter_get_pages() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
For now, just use the same thing we pass to ->direct_IO() - it's all iovec-based at the moment. Pass it explicitly to iov_iter_init() and account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO() uses. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
iov_iter-using variant of generic_file_aio_read(). Some callers converted. Note that it's still not quite there for use as ->read_iter() - we depend on having zero iter->iov_offset in O_DIRECT case. Fortunately, that's true for all converted callers (and for generic_file_aio_read() itself). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
returns the value aligned as badly as the worst remaining segment in iov_iter is. Use instead of open-coded equivalents. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
the thing is, we want to advance what's given to ->direct_IO() as we are forming the request; however, the callers care about the amount of data actually transferred, not the amount we tried to transfer. It's more convenient to allow ->direct_IO() instances do use iov_iter_advance() on the copy of iov_iter, leaving the actual advancing of the original to caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
all callers have iov_length(iter->iov, iter->nr_segs) == iov_iter_count(iter) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
unmodified, for now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
all callers of ->aio_read() and ->aio_write() have iov/nr_segs already checked - generic_segment_checks() done after that is just an odd way to spell iov_length(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
all callers can use copy_page_from_iter() and it actually simplifies them. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "13 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: agp: info leak in agpioc_info_wrap() fs/affs/super.c: bugfix / double free fanotify: fix -EOVERFLOW with large files on 64-bit slub: use sysfs'es release mechanism for kmem_cache revert "mm: vmscan: do not swap anon pages just because free+file is low" autofs: fix lockref lookup mm: filemap: update find_get_pages_tag() to deal with shadow entries mm/compaction: make isolate_freepages start at pageblock boundary MAINTAINERS: zswap/zbud: change maintainer email address mm/page-writeback.c: fix divide by zero in pos_ratio_polynom hugetlb: ensure hugepage access is denied if hugepages are not supported slub: fix memcg_propagate_slab_attrs drivers/rtc/rtc-pcf8523.c: fix month definition
-
Dan Carpenter authored
On 64 bit systems the agp_info struct has a 4 byte hole between ->agp_mode and ->aper_base. We need to clear it to avoid disclosing stack information to userspace. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Fabian Frederick authored
Commit 842a859d ("affs: use ->kill_sb() to simplify ->put_super() and failure exits of ->mount()") adds .kill_sb which frees sbi but doesn't remove sbi free in case of parse_options error causing double free+random crash. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> [3.14.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Will Woods authored
On 64-bit systems, O_LARGEFILE is automatically added to flags inside the open() syscall (also openat(), blkdev_open(), etc). Userspace therefore defines O_LARGEFILE to be 0 - you can use it, but it's a no-op. Everything should be O_LARGEFILE by default. But: when fanotify does create_fd() it uses dentry_open(), which skips all that. And userspace can't set O_LARGEFILE in fanotify_init() because it's defined to 0. So if fanotify gets an event regarding a large file, the read() will just fail with -EOVERFLOW. This patch adds O_LARGEFILE to fanotify_init()'s event_f_flags on 64-bit systems, using the same test as open()/openat()/etc. Addresses https://bugzilla.redhat.com/show_bug.cgi?id=696821Signed-off-by: Will Woods <wwoods@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Lameter authored
debugobjects warning during netfilter exit: ------------[ cut here ]------------ WARNING: CPU: 6 PID: 4178 at lib/debugobjects.c:260 debug_print_object+0x8d/0xb0() ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20 Modules linked in: CPU: 6 PID: 4178 Comm: kworker/u16:2 Tainted: G W 3.11.0-next-20130906-sasha #3984 Workqueue: netns cleanup_net Call Trace: dump_stack+0x52/0x87 warn_slowpath_common+0x8c/0xc0 warn_slowpath_fmt+0x46/0x50 debug_print_object+0x8d/0xb0 __debug_check_no_obj_freed+0xa5/0x220 debug_check_no_obj_freed+0x15/0x20 kmem_cache_free+0x197/0x340 kmem_cache_destroy+0x86/0xe0 nf_conntrack_cleanup_net_list+0x131/0x170 nf_conntrack_pernet_exit+0x5d/0x70 ops_exit_list+0x5e/0x70 cleanup_net+0xfb/0x1c0 process_one_work+0x338/0x550 worker_thread+0x215/0x350 kthread+0xe7/0xf0 ret_from_fork+0x7c/0xb0 Also during dcookie cleanup: WARNING: CPU: 12 PID: 9725 at lib/debugobjects.c:260 debug_print_object+0x8c/0xb0() ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20 Modules linked in: CPU: 12 PID: 9725 Comm: trinity-c141 Not tainted 3.15.0-rc2-next-20140423-sasha-00018-gc4ff6c4 #408 Call Trace: dump_stack (lib/dump_stack.c:52) warn_slowpath_common (kernel/panic.c:430) warn_slowpath_fmt (kernel/panic.c:445) debug_print_object (lib/debugobjects.c:262) __debug_check_no_obj_freed (lib/debugobjects.c:697) debug_check_no_obj_freed (lib/debugobjects.c:726) kmem_cache_free (mm/slub.c:2689 mm/slub.c:2717) kmem_cache_destroy (mm/slab_common.c:363) dcookie_unregister (fs/dcookies.c:302 fs/dcookies.c:343) event_buffer_release (arch/x86/oprofile/../../../drivers/oprofile/event_buffer.c:153) __fput (fs/file_table.c:217) ____fput (fs/file_table.c:253) task_work_run (kernel/task_work.c:125 (discriminator 1)) do_notify_resume (include/linux/tracehook.h:196 arch/x86/kernel/signal.c:751) int_signal (arch/x86/kernel/entry_64.S:807) Sysfs has a release mechanism. Use that to release the kmem_cache structure if CONFIG_SYSFS is enabled. Only slub is changed - slab currently only supports /proc/slabinfo and not /sys/kernel/slab/*. We talked about adding that and someone was working on it. [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build] [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build even more] Signed-off-by: Christoph Lameter <cl@linux.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Greg KH <greg@kroah.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Pekka Enberg <penberg@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Johannes Weiner authored
This reverts commit 0bf1457f ("mm: vmscan: do not swap anon pages just because free+file is low") because it introduced a regression in mostly-anonymous workloads, where reclaim would become ineffective and trap every allocating task in direct reclaim. The problem is that there is a runaway feedback loop in the scan balance between file and anon, where the balance tips heavily towards a tiny thrashing file LRU and anonymous pages are no longer being looked at. The commit in question removed the safe guard that would detect such situations and respond with forced anonymous reclaim. This commit was part of a series to fix premature swapping in loads with relatively little cache, and while it made a small difference, the cure is obviously worse than the disease. Revert it. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: <stable@kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ian Kent authored
autofs needs to be able to see private data dentry flags for its dentrys that are being created but not yet hashed and for its dentrys that have been rmdir()ed but not yet freed. It needs to do this so it can block processes in these states until a status has been returned to indicate the given operation is complete. It does this by keeping two lists, active and expring, of dentrys in this state and uses ->d_release() to keep them stable while it checks the reference count to determine if they should be used. But with the recent lockref changes dentrys being freed sometimes don't transition to a reference count of 0 before being freed so autofs can occassionally use a dentry that is invalid which can lead to a panic. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Johannes Weiner authored
Dave Jones reports the following crash when find_get_pages_tag() runs into an exceptional entry: kernel BUG at mm/filemap.c:1347! RIP: find_get_pages_tag+0x1cb/0x220 Call Trace: find_get_pages_tag+0x36/0x220 pagevec_lookup_tag+0x21/0x30 filemap_fdatawait_range+0xbe/0x1e0 filemap_fdatawait+0x27/0x30 sync_inodes_sb+0x204/0x2a0 sync_inodes_one_sb+0x19/0x20 iterate_supers+0xb2/0x110 sys_sync+0x44/0xb0 ia32_do_call+0x13/0x13 1343 /* 1344 * This function is never used on a shmem/tmpfs 1345 * mapping, so a swap entry won't be found here. 1346 */ 1347 BUG(); After commit 0cd6144a ("mm + fs: prepare for non-page entries in page cache radix trees") this comment and BUG() are out of date because exceptional entries can now appear in all mappings - as shadows of recently evicted pages. However, as Hugh Dickins notes, "it is truly surprising for a PAGECACHE_TAG_WRITEBACK (and probably any other PAGECACHE_TAG_*) to appear on an exceptional entry. I expect it comes down to an occasional race in RCU lookup of the radix_tree: lacking absolute synchronization, we might sometimes catch an exceptional entry, with the tag which really belongs with the unexceptional entry which was there an instant before." And indeed, not only is the tree walk lockless, the tags are also read in chunks, one radix tree node at a time. There is plenty of time for page reclaim to swoop in and replace a page that was already looked up as tagged with a shadow entry. Remove the BUG() and update the comment. While reviewing all other lookup sites for whether they properly deal with shadow entries of evicted pages, update all the comments and fix memcg file charge moving to not miss shmem/tmpfs swapcache pages. Fixes: 0cd6144a ("mm + fs: prepare for non-page entries in page cache radix trees") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Dave Jones <davej@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-