- 16 Jan, 2014 1 commit
-
-
Steven Whitehouse authored
Al Viro has tactfully pointed out that we are using the incorrect error code in some cases. This patch fixes that, and also removes the (unused) return value for glock dumping. > * gfs2_iget() - ENOBUFS instead of ENOMEM. ENOBUFS is > "No buffer space available (POSIX.1 (XSI STREAMS option))" and since > we don't support STREAMS it's probably fair game, but... what the hell? Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk>
-
- 15 Jan, 2014 1 commit
-
-
Steven Whitehouse authored
Well I don't get the same warning locally as the kbuild robot, but I guess this should fix the problem, anyway. Here is the warning: head: 2d9e7230 commit: ee2411a8 [19/20] GFS2: Clean up quota slot allocation config: make ARCH=powerpc allmodconfig All error/warnings: fs/gfs2/quota.c: In function 'gfs2_quota_init': >> fs/gfs2/quota.c:1246:3: error: implicit declaration of function '__vmalloc' [-Werror=implicit-function-declaration] sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_NOFS, PAGE_KERNEL); ^ >> fs/gfs2/quota.c:1246:24: warning: assignment makes pointer from integer without a cast [enabled by default] sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_NOFS, PAGE_KERNEL); ^ fs/gfs2/quota.c: In function 'gfs2_quota_cleanup': >> fs/gfs2/quota.c:1361:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration] vfree(sdp->sd_quota_bitmap); Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 14 Jan, 2014 5 commits
-
-
Steven Whitehouse authored
Gradually, the global qd_lock is being used for less and less. After this patch it will only be used for the per super block list whose purpose is to allow syncing of changes back to the master quota file from the local quota changes file. Fixing up that process to make it more efficient will be the subject of a later patch, however this patch removes another barrier to doing that. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
-
Steven Whitehouse authored
Quota slot allocation has historically used a vector of pages and a set of homegrown find/test/set/clear bit functions. Since the size of the bitmap is likely to be based on the default qc file size, thats a couple of pages at most. So we ought to be able to allocate that as a single chunk, with a vmalloc fallback, just in case of memory fragmentation. We are then able to use the kernel's own find/test/set/clear bit functions, rather than rolling our own. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
-
Steven Whitehouse authored
While investigating a rather strange bit of code in the quota clean up function, I spotted that the reason for its existence was that when remounting read only, we were not stopping the quotad thread, and thus it was possible for it to still have a reference to some of the quotas in that case. This patch moves the logd and quota thread start and stop into the make_fs_rw/ro functions, so that we now stop those threads when mounted read only. This means that quotad will always be stopped before we call the quota clean up function, and we can thus dispose of the (rather hackish) code that waits for it to give up its reference on the quotas. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
-
Steven Whitehouse authored
Prior to this patch, GFS2 kept all the quotas for each super block in a single linked list. This is rather slow when there are large numbers of quotas. This patch introduces a hlist_bl based hash table, similar to the one used for glocks. The initial look up of the quota is now lockless in the case where it is already cached, although we still have to take the per quota spinlock in order to bump the ref count. Either way though, this is a big improvement on what was there before. The qd_lock and the per super block list is preserved, for the time being. However it is intended that since this is no longer used for its original role, it should be possible to shrink the number of items on that list in due course and remove the requirement to take qd_lock in qd_get. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Steven Whitehouse authored
We recently fixed the writeback of pages prior to performing direct i/o, however the initial fix was perhaps a bit heavy handed. There is no need to invalidate pages if the direct i/o is only a read, since they will be identical to what has been flushed to disk anyway. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 09 Jan, 2014 1 commit
-
-
Steven Whitehouse authored
Spotted by Andy Price. This should fix the odd messages from lockdep caused by 70d4ee94Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Price <anprice@redhat.com>
-
- 08 Jan, 2014 2 commits
-
-
Steven Whitehouse authored
This patch adds four new fields to directory leaf blocks. The intent is not to use them in the kernel itself, although perhaps we may be able to use them as hints at some later date, but instead to provide more information for debug/fsck use. One new field adds a pointer to the inode to which the leaf belongs. This can be useful if the pointer to the leaf block has become corrupt, as it will allow us to know which inode this block should be associated with. This field is set when the leaf is created and never changed over its lifetime. The second field is a "distance from the hash table" field. The meaning is as follows: 0 = An old leaf in which this value has not been set 1 = This leaf is pointed to directly from the hash table 2+ = This leaf is part of a chain, pointed to by another leaf block, the value gives the position in the chain. The third and fourth fields combine to give a time stamp of the most recent directory insertion or deletion from this leaf block. The time stamp is not updated when a new leaf block is chained from the current one. The code is currently written such that the timestamp on the dir inode will match that of the leaf block for the most recent insertion/deletion. For backwards compatibility, any of these new fields which is zero should be considered to be "unknown". Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
For most cases, only a single new block is needed when we reach the point of converting from stuffed to exhash directory. The exception being when the file name is so long that it will not fit within the new leaf block. So this patch adds a simple test for that situation so that we do not need to request the full reservation size in this case. Potentially we could calculate more accurately the value to use in other cases too, but that is much more complicated to do and it is doubtful that the benefit would outweigh the extra cost in code complexity. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 07 Jan, 2014 1 commit
-
-
Bob Peterson authored
This patch calls get_write_access in function gfs2_setattr_chown, which merely increases inode->i_writecount for the duration of the function. That will ensure that any file closes won't delete the inode's multi-block reservation while the function is running. It also ensures that a multi-block reservation exists when needed for quota change operations during the chown. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 06 Jan, 2014 3 commits
-
-
Steven Whitehouse authored
When we look to see if there is enough space to add a dir entry without allocation, we have then been repeating the same search later when we do the actual insertion. This patch caches the details of the location in the gfs2_diradd structure, so that we do not have to repeat the search. This will provide a performance improvement which will be greater as the size of the directory increases. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
There are three cases where we need to calculate the number of blocks to reserve in a transaction involving linking an inode into a directory. The one in rename is a bit more complicated, but the basis of it is the same as for link and create. So it makes sense to move this calculation into a single function rather than repeating it three times. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
The intent is that this structure will hold the information required when adding entries to a directory (linking). To start with, it will contain only the number of blocks which are required to link the new entry into the directory. The current calculation returns either 0 or the maximim number of blocks that can ever be requested by such a transaction. The intent is that in a later patch, we can update the dir code to calculate this value more accurately. In addition further patches will also add further fields to the new structure to increase its utility. In addition this patch fixes a bug where the link used during inode creation was adding requesting too many blocks in some cases. This is harmless unless the fs is close to being full. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 03 Jan, 2014 8 commits
-
-
Steven Whitehouse authored
Prior to this patch, GFS2 had one address space for each rgrp, stored in the glock. This patch changes them to use a single address space in the super block. This therefore saves (sizeof(struct address_space) * nr_of_rgrps) bytes of memory and for large filesystems, that can be significant. It would be nice to be able to do something similar and merge the inode metadata address space into the same global address space. However, that is rather more complicated as the on-disk location doesn't have a 1:1 mapping with the inodes in general. So while it could be done, it will be a more complicated operation as it requires changing a lot more code paths. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
Each rgrp header is represented as a single extent on disk, so we can calculate the position within the address space, since we are using address spaces mapped 1:1 to the disk. This means that it is possible to use the range based versions of filemap_fdatawrite/wait and for invalidating the page cache. Our eventual intent is to then be able to merge the address spaces used for rgrps into a single address space, rather than to have one for each glock, saving memory and reducing complexity. Since during umount, the rgrp structures are disposed of before the glocks, we need to store the extent information in the glock so that is is available for a final invalidation. This patch uses a field which is otherwise unused in rgrp glocks to do that, so that we do not have to expand the size of a glock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
Since gfs2_inplace_reserve() is always called with a valid alloc parms structure, there is no need to test for this within the function itself - and in any case, after we've all ready dereferenced it anyway. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
There is only one place this is used, when reading in the quota changes at mount time. It is not really required and much simpler to just convert the fields from the on-disk structure as required. There should be no functional change as a result of this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
For historical reasons, we drop and retake the log lock in ->releasepage() however, since there is no reason why we cannot hold the log lock over the whole function, this allows some simplification. In particular, pinning a buffer is only ever done under the log lock, so it is possible here to remove the test for pinned buffers in the second loop, since it is impossible for that to happen (it is also tested in the first loop). As a result, two tests made later in the second loop become constants and can also be reduced to the only possible branch. So the net result is to remove various bits of unreachable code and make this more readable. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Bob Peterson authored
With the preceding patch, we started accepting block reservations smaller than the ideal size, which requires a lot more parsing of the bitmaps. To reduce the amount of bitmap searching, this patch implements a scheme whereby each rgrp keeps track of the point at this multi-block reservations will fail. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Bob Peterson authored
This is just basically a resend of a patch I posted earlier. It didn't change from its original, except in diff offsets, etc: This patch fixes a bug in the GFS2 block allocation code. The problem starts if a process already has a multi-block reservation, but for some reason, another process disqualifies it from further allocations. For example, the other process might set on the GFS2_RDF_ERROR bit. The process holding the reservation jumps to label skip_rgrp, but that label comes after the code that removes the reservation from the tree. Therefore, the no longer usable reservation is not removed from the rgrp's reservations tree; it's lost. Eventually, the lost reservation causes the count of reserved blocks to get off, and eventually that causes a BUG_ON(rs->rs_rbm.rgd->rd_reserved < rs->rs_free) to trigger. This patch moves the call to after label skip_rgrp so that the disqualified reservation is properly removed from the tree, thus keeping the rgrp rd_reserved count sane. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Bob Peterson authored
Here is a second try at a patch I posted earlier, which also implements suggestions Steve made: Before this patch, GFS2 would keep searching through all the rgrps until it found one that had a chunk of free blocks big enough to satisfy the size hint, which is based on the file write size, regardless of whether the chunk was big enough to perform the write. However, when doing big writes there may not be a large enough chunk of free blocks in any rgrp, due to file system fragmentation. The largest chunk may be big enough to satisfy the write request, but it may not meet the ideal reservation size from the "size hint". The writes would slow to a crawl because every write would search every rgrp, then finally give up and default to a single-block write. In my case, performance would drop from 425MB/s to 18KB/s, or 24000 times slower. This patch basically makes it so that if we can't find a contiguous chunk of blocks big enough to satisfy the sizehint, we'll use the largest chunk of blocks we found that will still contain the write. It does so by keeping track of the largest run of blocks within the rgrp. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 02 Jan, 2014 16 commits
-
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull kvm bugfixes from Marcelo Tosatti. * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nVMX: Unconditionally uninit the MMU on nested vmexit KVM: x86: Fix APIC map calculation after re-enabling
-
Linus Torvalds authored
Merge patches from Andrew Morton: "Ten fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: epoll: do not take the nested ep->mtx on EPOLL_CTL_DEL sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c drivers/dma/ioat/dma.c: check DMA mapping error in ioat_dma_self_test() mm/memory-failure.c: transfer page count from head page to tail page after split thp MAINTAINERS: set up proper record for Xilinx Zynq mm: remove bogus warning in copy_huge_pmd() memcg: fix memcg_size() calculation mm: fix use-after-free in sys_remap_file_pages mm: munlock: fix deadlock in __munlock_pagevec() mm: munlock: fix a bug where THP tail page is encountered
-
Jason Baron authored
The EPOLL_CTL_DEL path of epoll contains a classic, ab-ba deadlock. That is, epoll_ctl(a, EPOLL_CTL_DEL, b, x), will deadlock with epoll_ctl(b, EPOLL_CTL_DEL, a, x). The deadlock was introduced with commmit 67347fe4 ("epoll: do not take global 'epmutex' for simple topologies"). The acquistion of the ep->mtx for the destination 'ep' was added such that a concurrent EPOLL_CTL_ADD operation would see the correct state of the ep (Specifically, the check for '!list_empty(&f.file->f_ep_links') However, by simply not acquiring the lock, we do not serialize behind the ep->mtx from the add path, and thus may perform a full path check when if we had waited a little longer it may not have been necessary. However, this is a transient state, and performing the full loop checking in this case is not harmful. The important point is that we wouldn't miss doing the full loop checking when required, since EPOLL_CTL_ADD always locks any 'ep's that its operating upon. The reason we don't need to do lock ordering in the add path, is that we are already are holding the global 'epmutex' whenever we do the double lock. Further, the original posting of this patch, which was tested for the intended performance gains, did not perform this additional locking. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: Nathan Zimmer <nzimmer@sgi.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nobuhiro Iwamatsu authored
Min_low_pfn and max_low_pfn were used in pfn_valid macro if defined CONFIG_FLATMEM. When the functions that use the pfn_valid is used in driver module, max_low_pfn and min_low_pfn is to undefined, and fail to build. ERROR: "min_low_pfn" [drivers/block/aoe/aoe.ko] undefined! ERROR: "max_low_pfn" [drivers/block/aoe/aoe.ko] undefined! make[2]: *** [__modpost] Error 1 make[1]: *** [modules] Error 2 This patch fix this problem. Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Cc: Kuninori Morimoto <kuninori.morimoto.gx@gmail.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jiang Liu authored
Check DMA mapping return values in function ioat_dma_self_test() to get rid of following warning message. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1203 at lib/dma-debug.c:937 check_unmap+0x4c0/0x9a0() ioatdma 0000:00:04.0: DMA-API: device driver failed to check map error[device address=0x000000085191b000] [size=2000 bytes] [mapped as single] Modules linked in: ioatdma(+) mac_hid wmi acpi_pad lp parport hidd_generic usbhid hid ixgbe isci dca libsas ahci ptp libahci scsi_transport_sas meegaraid_sas pps_core mdio CPU: 0 PID: 1203 Comm: systemd-udevd Not tainted 3.13.0-rc4+ #8 Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIIN1.86B.0044.L09.1311181644 11/18/2013 Call Trace: dump_stack+0x4d/0x66 warn_slowpath_common+0x7d/0xa0 warn_slowpath_fmt+0x4c/0x50 check_unmap+0x4c0/0x9a0 debug_dma_unmap_page+0x81/0x90 ioat_dma_self_test+0x3d2/0x680 [ioatdma] ioat3_dma_self_test+0x12/0x30 [ioatdma] ioat_probe+0xf4/0x110 [ioatdma] ioat3_dma_probe+0x268/0x410 [ioatdma] ioat_pci_probe+0x122/0x1b0 [ioatdma] local_pci_probe+0x45/0xa0 pci_device_probe+0xd9/0x130 driver_probe_device+0x171/0x490 __driver_attach+0x93/0xa0 bus_for_each_dev+0x6b/0xb0 driver_attach+0x1e/0x20 bus_add_driver+0x1f8/0x2b0 driver_register+0x81/0x110 __pci_register_driver+0x60/0x70 ioat_init_module+0x89/0x1000 [ioatdma] do_one_initcall+0xe2/0x250 load_module+0x2313/0x2a00 SyS_init_module+0xd9/0x130 system_call_fastpath+0x1a/0x1f ---[ end trace 990c591681d27c31 ]--- Mapped at: debug_dma_map_page+0xbe/0x180 ioat_dma_self_test+0x1ab/0x680 [ioatdma] ioat3_dma_self_test+0x12/0x30 [ioatdma] ioat_probe+0xf4/0x110 [ioatdma] ioat3_dma_probe+0x268/0x410 [ioatdma] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Vinod Koul <vinod.koul@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Naoya Horiguchi authored
Memory failures on thp tail pages cause kernel panic like below: mce: [Hardware Error]: Machine check events logged MCE exception done on CPU 7 BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 IP: [<ffffffff811b7cd1>] dequeue_hwpoisoned_huge_page+0x131/0x1e0 PGD bae42067 PUD ba47d067 PMD 0 Oops: 0000 [#1] SMP ... CPU: 7 PID: 128 Comm: kworker/7:2 Tainted: G M O 3.13.0-rc4-131217-1558-00003-g83b7df08e462 #25 ... Call Trace: me_huge_page+0x3e/0x50 memory_failure+0x4bb/0xc20 mce_process_work+0x3e/0x70 process_one_work+0x171/0x420 worker_thread+0x11b/0x3a0 ? manage_workers.isra.25+0x2b0/0x2b0 kthread+0xe4/0x100 ? kthread_create_on_node+0x190/0x190 ret_from_fork+0x7c/0xb0 ? kthread_create_on_node+0x190/0x190 ... RIP dequeue_hwpoisoned_huge_page+0x131/0x1e0 CR2: 0000000000000058 The reasoning of this problem is shown below: - when we have a memory error on a thp tail page, the memory error handler grabs a refcount of the head page to keep the thp under us. - Before unmapping the error page from processes, we split the thp, where page refcounts of both of head/tail pages don't change. - Then we call try_to_unmap() over the error page (which was a tail page before). We didn't pin the error page to handle the memory error, this error page is freed and removed from LRU list. - We never have the error page on LRU list, so the first page state check returns "unknown page," then we move to the second check with the saved page flag. - The saved page flag have PG_tail set, so the second page state check returns "hugepage." - We call me_huge_page() for freed error page, then we hit the above panic. The root cause is that we didn't move refcount from the head page to the tail page after split thp. So this patch suggests to do this. This panic was introduced by commit 524fca1e ("HWPOISON: fix misjudgement of page_action() for errors on mlocked pages"). Note that we did have the same refcount problem before this commit, but it was just ignored because we had only first page state check which returned "unknown page." The commit changed the refcount problem from "doesn't work" to "kernel panic." Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: <stable@vger.kernel.org> [3.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Simek authored
Setup correct zynq entry. - Add missing cadence_ttc_timer maintainership - Add zynq wildcard - Add xilinx wildcard Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mel Gorman authored
Sasha Levin reported the following warning being triggered WARNING: CPU: 28 PID: 35287 at mm/huge_memory.c:887 copy_huge_pmd+0x145/ 0x3a0() Call Trace: copy_huge_pmd+0x145/0x3a0 copy_page_range+0x3f2/0x560 dup_mmap+0x2c9/0x3d0 dup_mm+0xad/0x150 copy_process+0xa68/0x12e0 do_fork+0x96/0x270 SyS_clone+0x16/0x20 stub_clone+0x69/0x90 This warning was introduced by "mm: numa: Avoid unnecessary disruption of NUMA hinting during migration" for paranoia reasons but the warning is bogus. I was thinking of parallel races between NUMA hinting faults and forks but this warning would also be triggered by a parallel reclaim splitting a THP during a fork. Remote the bogus warning. Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vladimir Davydov authored
The mem_cgroup structure contains nr_node_ids pointers to mem_cgroup_per_node objects, not the objects themselves. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@openvz.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Rik van Riel authored
remap_file_pages calls mmap_region, which may merge the VMA with other existing VMAs, and free "vma". This can lead to a use-after-free bug. Avoid the bug by remembering vm_flags before calling mmap_region, and not trying to dereference vma later. Signed-off-by: Rik van Riel <riel@redhat.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: PaX Team <pageexec@freemail.hu> Cc: Kees Cook <keescook@chromium.org> Cc: Michel Lespinasse <walken@google.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vlastimil Babka authored
Commit 7225522b ("mm: munlock: batch non-THP page isolation and munlock+putback using pagevec" introduced __munlock_pagevec() to speed up munlock by holding lru_lock over multiple isolated pages. Pages that fail to be isolated are put_page()d immediately, also within the lock. This can lead to deadlock when __munlock_pagevec() becomes the holder of the last page pin and put_page() leads to __page_cache_release() which also locks lru_lock. The deadlock has been observed by Sasha Levin using trinity. This patch avoids the deadlock by deferring put_page() operations until lru_lock is released. Another pagevec (which is also used by later phases of the function is reused to gather the pages for put_page() operation. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vlastimil Babka authored
Since commit ff6a6da6 ("mm: accelerate munlock() treatment of THP pages") munlock skips tail pages of a munlocked THP page. However, when the head page already has PageMlocked unset, it will not skip the tail pages. Commit 7225522b ("mm: munlock: batch non-THP page isolation and munlock+putback using pagevec") has added a PageTransHuge() check which contains VM_BUG_ON(PageTail(page)). Sasha Levin found this triggered using trinity, on the first tail page of a THP page without PageMlocked flag. This patch fixes the issue by skipping tail pages also in the case when PageMlocked flag is unset. There is still a possibility of race with THP page split between clearing PageMlocked and determining how many pages to skip. The race might result in former tail pages not being skipped, which is however no longer a bug, as during the skip the PageTail flags are cleared. However this race also affects correctness of NR_MLOCK accounting, which is to be fixed in a separate patch. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixesLinus Torvalds authored
Pull GFS2 fixes from Steven Whitehouse: "Here is a set of small fixes for GFS2. There is a fix to drop s_umount which is copied in from the core vfs, two patches relate to a hard to hit "use after free" and memory leak. Two patches related to using DIO and buffered I/O on the same file to ensure correct operation in relation to glock state changes. The final patch adds an RCU read lock to ensure correct locking on an error path" * tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes: GFS2: Fix unsafe dereference in dump_holder() GFS2: Wait for async DIO in glock state changes GFS2: Fix incorrect invalidation for DIO/buffered I/O GFS2: Fix slab memory leak in gfs2_bufdata GFS2: Fix use-after-free race when calling gfs2_remove_from_ail GFS2: don't hold s_umount over blkdev_put
-
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds authored
Pull s390 fixes from Martin Schwidefsky: "Two small bug fixes and a follow-up to the CONFIG_NR_CPUS change. A kernel compiled with CONFIG_NR_CPUS=256 will waste quite a bit of memory for the per-cpu arrays. Under z/VM the maximum number of CPUs is 64, the code now limits the possible cpu mask to 64 if running under z/VM" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/pci: obtain function handle in hotplug notifier s390/3270: fix allocation of tty3270_screen structure s390/smp: improve setup of possible cpu mask
-
Jan Kiszka authored
Three reasons for doing this: 1. arch.walk_mmu points to arch.mmu anyway in case nested EPT wasn't in use. 2. this aligns VMX with SVM. But 3. is most important: nested_cpu_has_ept(vmcs12) queries the VMCS page, and if one guest VCPU manipulates the page of another VCPU in L2, we may be fooled to skip over the nested_ept_uninit_mmu_context, leaving mmu in nested state. That can crash the host later on if nested_ept_get_cr3 is invoked while L1 already left vmxon and nested.current_vmcs12 became NULL therefore. Cc: stable@kernel.org Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Tetsuo Handa authored
GLOCK_BUG_ON() might call this function without RCU read lock. Make sure that RCU read lock is held when using task_struct returned from pid_task(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 01 Jan, 2014 2 commits
-
-
git://people.freedesktop.org/~airlied/linuxLinus Torvalds authored
Pull radeon drm fixes from Dave Airlie: "Just piping a bunch of fixes from pre-xmas from Alex for radeon, all either fix bad hw setup issues or regressions" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/radeon: Bump version for CIK DCE tiling fix drm/radeon: set correct number of banks for CIK chips in DCE drm/radeon: set correct pipe config for Hawaii in DCE drm/radeon: expose render backend mask to the userspace drm/radeon: fix render backend setup for SI and CIK drm/radeon: 0x9649 is SUMO2 not SUMO drm/radeon: fix UVD 256MB check
-
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds authored
Pull crypto fix from Herbert Xu: "Fix a build error on ARM that was introduced in 3.13-rc1" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: ixp4xx - Fix kernel compile error
-