- 25 May, 2010 40 commits
-
-
Samu Onkalo authored
For misc devices, inode->i_cdev doesn't point to the device drivers own data. Link between file operations and device driver internal data is lost. Pass pointer to misc device struct via file private data for driver open function use. Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrew Morton authored
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Fritzsche authored
32-bit Sparc used to only allow usage of 24-bit of it's atomic_t type. This was corrected with linux 2.6.3 when Keith M Wesolowski changed the implementation to use the parisc approach of having an array of spinlocks to protect the atomic_t. These warnings were also removed from the sparc implementation when the new implementation was merged in BKrev:402e4949VThdc6D3iaosSFUgabMfvw, but the warning still remained in some other places without any 24-bit-only atomic_t implementation inside the kernel. We should remove these warnings to allow users to rely on the full 32-bit range of atomic_t. Signed-off-by: Peter Fritzsche <peter.fritzsche@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joe Perches authored
The current logging macros are pr_<level>, dev_<level>, netdev_<level>, and netif_<level>. pr_ uses warning, the other use warn. Standardize these logging macros a bit more by adding pr_warn and pr_warn_ratelimited. Right now, there are: $ for level in emerg alert crit err warn warning notice info ; do \ for prefix in pr dev netdev netif ; do \ echo -n "${prefix}_${level}: `git grep -w "${prefix}_${level}" | wc -l` " ; \ done ; \ echo ; \ done pr_emerg: 45 dev_emerg: 4 netdev_emerg: 1 netif_emerg: 4 pr_alert: 24 dev_alert: 36 netdev_alert: 1 netif_alert: 6 pr_crit: 24 dev_crit: 22 netdev_crit: 1 netif_crit: 4 pr_err: 2013 dev_err: 8467 netdev_err: 267 netif_err: 240 pr_warn: 0 dev_warn: 1818 netdev_warn: 126 netif_warn: 23 pr_warning: 773 dev_warning: 0 netdev_warning: 0 netif_warning: 0 pr_notice: 148 dev_notice: 111 netdev_notice: 9 netif_notice: 3 pr_info: 1717 dev_info: 3007 netdev_info: 101 netif_info: 85 Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Minchan Kim authored
Quote from Nick piggin's about btrfs patch - http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg04472.html. "add_to_page_cache_lru is exported, so it should be used. Benefits over using a private pagevec: neater code, 128 bytes fewer stack used, percpu lru ordering is preserved, and finally don't need to flush pagevec before returning so batching may be shared with other LRU insertions." Let's use it instead of private pagevec in ntfs, too. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Anton Altaparmakov <aia21@cantab.net> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Minchan Kim authored
cached_page and lru_pvec have not been used. Let's remove the arguments. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alex Riesen authored
gcc-4.3.3 produces the warning: "format not a string literal and no format arguments" Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <cel@citi.umich.edu> Cc: David S. Miller <davem@davemloft.net> Acked-by: Tom Talpey <tmtalpey@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Phil Carmody authored
Handle out-of-range indices before reading what they refer to. And don't access the one-past-the-end element of the array either. Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Roel Kluin <roel.kluin@gmail.com> Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Dobriyan authored
- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not USHORT_MAX/SHORT_MAX/SHORT_MIN. - Make SHRT_MIN of type s16, not int, for consistency. [akpm@linux-foundation.org: fix drivers/dma/timb_dma.c] [akpm@linux-foundation.org: fix security/keys/keyring.c] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yury Polyanskiy authored
drivers/char/hangcheck-timer.c is doubly broken. When the overflown value of TIMER_FREQ is abnormally low, it spams the syslog with KERN_CRIT messages "Hangcheck: hangcheck value past margin!" But whether it happens or not depends on HZ and lpj in a complex way. People have hit it occasionally as far as google search can tell. First, the following line overflows unsigned long: # define TIMER_FREQ (HZ*loops_per_jiffy) Second, and more importantly, loops_per_jiffy has little to do with the con= version from the the time scale of get_cycles() (aka rdtsc) to the time scale of jiffies. The attached patch resolves both of the problems. Acked-by: Joel Becker <joel.becker@oracle.com> Cc: john stultz <johnstul@us.ibm.com> Cc: Jan Glauber <jan.glauber@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joakim Tjernlund authored
Linux does not define __BYTE_ORDER in its endian header files which makes some header files bend backwards to get at the current endian. Lets #define __BYTE_ORDER in big_endian.h/litte_endian.h to make it easier for header files that are used in user space too. In userspace the convention is that 1. _both_ __LITTLE_ENDIAN and __BIG_ENDIAN are defined, 2. you have to test for e.g. __BYTE_ORDER == __BIG_ENDIAN. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jani Nikula authored
Add __must_check to error pointer handlers to have the compiler warn about mistakes like: if (err) ERR_PTR(err); It found two bugs: Mar 12 Nikula Jani [PATCH] enclosure: fix error path - actually return ERR_PTR() on error Mar 12 Nikula Jani [PATCH] sunrpc: fix error path - actually return ERR_PTR() on error Signed-off-by: Jani Nikula <ext-jani.1.nikula@nokia.com> Cc: Phil Carmody <ext-phil.2.carmody@nokia.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Arjan van de Ven authored
Currently, the menu governor uses the (corrected) next timer as key item for predicting the idle duration. It turns out that there are specific cases where this breaks down: There are cases where we have a very repetitive pattern of idle durations, where the idle period is pretty much the same, for reasons completely unrelated to the next timer event. Examples of such repeating patterns are network loads with irq mitigation, the mouse moving but in theory also the wifi beacons. This patch adds a relatively simple detector for such repeating patterns, where the standard deviation of the last 8 idle periods is compared to a threshold. With this extra predictor in place, measurements show that the DECAY factor can now be increased (the decaying average will now decay slower) to get an even more stable result. [arjan@infradead.org: fix bug identified by Frank] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Corrado Zoccolo <czoccolo@gmail.com> Cc: Frank Rowand <frank.rowand@am.sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
FUJITA Tomonori authored
Architectures that handle DMA-non-coherent memory need to set ARCH_KMALLOC_MINALIGN to make sure that kmalloc'ed buffer is DMA-safe: the buffer doesn't share a cache with the others. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: David Howells <dhowells@redhat.com> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mathieu Desnoyers authored
asm-generic/atomic.h has been derived from the mn10300 implementation. Remove the now duplicated mn10300 implementation by including the generic version instead. This adds cmpxchg_local() and cmpxchg64_local() for free to the architecture, as they are implemented in asm-generic/atomic.h. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Peter Fritzsche <peter.fritzsche@gmx.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Jamie Lokier <jamie@shareable.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Keith M Wesolowski <wesolows@foobazco.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Greg Ungerer authored
Commit 8b505ca8 ("serial: 68328serial.c: remove BAUD_TABLE_SIZE macro") misses one use of BAUD_TABLE_SIZE. So the resulting 68328serial.c does not compile: drivers/serial/68328serial.c: In function `m68328_console_setup': drivers/serial/68328serial.c:1439: error: `BAUD_TABLE_SIZE' undeclared (first use in this function) drivers/serial/68328serial.c:1439: error: (Each undeclared identifier is reported only once drivers/serial/68328serial.c:1439: error: for each function it appears in.) Fix that last use of it. Signed-off-by: Greg Ungerer <gerg@uclinux.org> Cc: Thiago Farina <tfransosi@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
FUJITA Tomonori authored
Architectures that handle DMA-non-coherent memory need to set ARCH_KMALLOC_MINALIGN to make sure that kmalloc'ed buffer is DMA-safe: the buffer doesn't share a cache with the others. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
David Howells authored
Extend gdbstub to support more features of gdb remote protocol to keep gdb-7 and emacs gud mode happy: (*) The D command. Detach debugger. (*) The H command. Handle setting the target thread by ignoring it. (*) The qAttached command. Indicate we 'attached' to an existing process. (*) The qC command. Indicate that the current thread ID is 0. (*) The qOffsets command. Indicate that no relocation has been done. (*) The qSymbol:: command. Indicate that we're not interested in looking up any symbol addresses. (*) The qSupported command. Indicate the maximum packet size and the fact that reverse step and continue aren't supported. (*) The vCont? command. Indicate that we don't support any of its variants. Also make it possible to trace the commands and replies without tracing the individual character I/O. [akpm@linux-foundation.org: make gdbstub_handle_query() static] Signed-off-by: David Howells <dhowells@redhat.com> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Chris Metcalf authored
This ensures that platforms with lowmem PAs above 32 bits work correctly by avoiding truncating the PA during a left shift. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Barry Song <21cnbao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Haicheng Li authored
Add global mutex zonelists_mutex to fix the possible race: CPU0 CPU1 CPU2 (1) zone->present_pages += online_pages; (2) build_all_zonelists(); (3) alloc_page(); (4) free_page(); (5) build_all_zonelists(); (6) __build_all_zonelists(); (7) zone->pageset = alloc_percpu(); In step (3,4), zone->pageset still points to boot_pageset, so bad things may happen if 2+ nodes are in this state. Even if only 1 node is accessing the boot_pageset, (3) may still consume too much memory to fail the memory allocations in step (7). Besides, atomic operation ensures alloc_percpu() in step (7) will never fail since there is a new fresh memory block added in step(6). [haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists] Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <andi.kleen@intel.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Haicheng Li authored
For each new populated zone of hotadded node, need to update its pagesets with dynamically allocated per_cpu_pageset struct for all possible CPUs: 1) Detach zone->pageset from the shared boot_pageset at end of __build_all_zonelists(). 2) Use mutex to protect zone->pageset when it's still shared in onlined_pages() Otherwises, multiple zones of different nodes would share same boot strapping boot_pageset for same CPU, which will finally cause below kernel panic: ------------[ cut here ]------------ kernel BUG at mm/page_alloc.c:1239! invalid opcode: 0000 [#1] SMP ... Call Trace: [<ffffffff811300c1>] __alloc_pages_nodemask+0x131/0x7b0 [<ffffffff81162e67>] alloc_pages_current+0x87/0xd0 [<ffffffff81128407>] __page_cache_alloc+0x67/0x70 [<ffffffff811325f0>] __do_page_cache_readahead+0x120/0x260 [<ffffffff81132751>] ra_submit+0x21/0x30 [<ffffffff811329c6>] ondemand_readahead+0x166/0x2c0 [<ffffffff81132ba0>] page_cache_async_readahead+0x80/0xa0 [<ffffffff8112a0e4>] generic_file_aio_read+0x364/0x670 [<ffffffff81266cfa>] nfs_file_read+0xca/0x130 [<ffffffff8117b20a>] do_sync_read+0xfa/0x140 [<ffffffff8117bf75>] vfs_read+0xb5/0x1a0 [<ffffffff8117c151>] sys_read+0x51/0x80 [<ffffffff8103c032>] system_call_fastpath+0x16/0x1b RIP [<ffffffff8112ff13>] get_page_from_freelist+0x883/0x900 RSP <ffff88000d1e78a8> ---[ end trace 4bda28328b9990db ] [akpm@linux-foundation.org: merge fix] Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <andi.kleen@intel.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Wu Fengguang authored
No behavior change here. Move some of setup_per_cpu_pageset() code into a new function setup_zone_pageset() that will be useful for memory hotplug. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Reviewed-by: Andi Kleen <andi.kleen@intel.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Marcelo Roberto Jimenez authored
Got this while compiling for ARM/SA1100: mm/sparse.c: In function '__section_nr': mm/sparse.c:135: warning: 'root' is used uninitialized in this function This patch follows Russell King's suggestion for a new calculation for NR_SECTION_ROOTS. Thanks also to Sergei Shtylyov for pointing out the existence of the macro DIV_ROUND_UP. Atsushi Nemoto observed: : This fix doesn't just silence the warning - it fixes a real problem. : : Without this fix, mem_section[] might have 0 size so mem_section[0] : will share other variable area. For example, I got: : : c030c700 b __warned.16478 : c030c700 B mem_section : c030c701 b __warned.16483 : : This might cause very strange behavior. Your patch actually fixes it. Signed-off-by: Marcelo Roberto Jimenez <mroberto@cpti.cetuc.puc-rio.br> Cc: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Akinobu Mita authored
In f4112de6 ("mm: introduce debug_kmap_atomic") I said that debug_kmap_atomic() needs CONFIG_TRACE_IRQFLAGS_SUPPORT. It was wrong. (I thought irqs_disabled() is only available when the architecture has CONFIG_TRACE_IRQFLAGS_SUPPORT) Remove the #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT check to enable kmap_atomic() debugging for the architectures which do not have CONFIG_TRACE_IRQFLAGS_SUPPORT. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
matt mooney authored
Add parenthesis in a define. This doesn't change functionality. checkpatch errors: 1) white space fixes 2) add spaces after comas Signed-off-by: matt mooney <mfm@muteddisk.com> Cc: Dan Carpenter <error27@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
matt mooney authored
Fix minor spelling errors in a few comments; no code changes. Signed-off-by: matt mooney <mfm@muteddisk.com> Cc: Dan Carpenter <error27@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
minskey guo authored
Enable users to online CPUs even if the CPUs belongs to a numa node which doesn't have onlined local memory. The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either in system boot/init period, or at the time of local memory online. For a numa node without onlined local memory, its zonelists are not initialized at present. As a result, any memory allocation operations executed by CPUs within this node will fail. In fact, an out-of-memory error is triggered when attempt to online CPUs before memory comes to online. This patch tries to create zonelists for such numa nodes, so that the memory allocation for this node can be fallback'ed to other nodes. [akpm@linux-foundation.org: remove unneeded export] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: minskey guo<chaohong.guo@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Johannes Weiner authored
For now, we have global isolation vs. memory control group isolation, do not allow the reclaim entry function to set an arbitrary page isolation callback, we do not need that flexibility. And since we already pass around the group descriptor for the memory control group isolation case, just use it to decide which one of the two isolator functions to use. The decisions can be merged into nearby branches, so no extra cost there. In fact, we save the indirect calls. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Johannes Weiner authored
This scan control is abused to communicate a return value from shrink_zones(). Write this idiomatically and remove the knob. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Johannes Weiner authored
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Carpenter <error27@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Richard Kennedy authored
When wb_writeback() hasn't written anything it will re-acquire the inode lock before calling inode_wait_for_writeback. This change tests the sync bit first so that is doesn't need to drop & re-acquire the lock if the inode became available while wb_writeback() was waiting to get the lock. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
KOSAKI Motohiro authored
free_hot_cold_page() and __free_pages_ok() have very similar freeing preparation. Consolidate them. [akpm@linux-foundation.org: fix busted coding style] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
KOSAKI Motohiro authored
If vmscan is under lumpy reclaim mode, it have to ignore referenced bit for making contenious free pages. but current page_check_references() doesn't. Fix it. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Huang Shijie authored
Fix a wrong comment over page_cache_async_readahead(). Signed-off-by: Huang Shijie <shijie8@gmail.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Shaohua Li authored
get_scan_ratio() calculates percentage and if the percentage is < 1%, it will round percentage down to 0% and cause we completely ignore scanning anon/file pages to reclaim memory even the total anon/file pages are very big. To avoid underflow, we don't use percentage, instead we directly calculate how many pages should be scaned. In this way, we should get several scanned pages for < 1% percent. This has some benefits: 1. increase our calculation precision 2. making our scan more smoothly. Without this, if percent[x] is underflow, shrink_zone() doesn't scan any pages and suddenly it scans all pages when priority is zero. With this, even priority isn't zero, shrink_zone() gets chance to scan some pages. Note, this patch doesn't really change logics, but just increase precision. For system with a lot of memory, this might slightly changes behavior. For example, in a sequential file read workload, without the patch, we don't swap any anon pages. With it, if anon memory size is bigger than 16G, we will see one anon page swapped. The 16G is calculated as PAGE_SIZE * priority(4096) * (fp/ap). fp/ap is assumed to be 1024 which is common in this workload. So the impact sounds not a big deal. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Greg Thelen authored
Use mm->task_size instead of TASK_SIZE to ensure that the entire user address space is migrated. mm->task_size is independent of the calling task context. TASK SIZE may be dependant on the address space size of the calling process. Usage of TASK_SIZE can lead to partial address space migration if the calling process was 32 bit and the migrating process was 64 bit. Here is the test script used on 64 system with a 32 bit echo process: mount -t cgroup none /cgroup -o cpuset cd /cgroup mkdir 0 echo 1 > 0/cpuset.cpus echo 0 > 0/cpuset.mems echo 1 > 0/cpuset.memory_migrate mkdir 1 echo 1 > 1/cpuset.cpus echo 1 > 1/cpuset.mems echo 1 > 1/cpuset.memory_migrate echo $$ > 0/tasks 64_bit_process & pid=$! echo $pid > 1/tasks # This does not migrate all process pages without # this patch. If 64 bit echo is used or this patch is # applied, then the full address space of $pid is # migrated. To check memory migration, I watched: grep MemUsed /sys/devices/system/node/node*/meminfo Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mel Gorman authored
The fragmentation index may indicate that a failure is due to external fragmentation but after a compaction run completes, it is still possible for an allocation to fail. There are two obvious reasons as to why o Page migration cannot move all pages so fragmentation remains o A suitable page may exist but watermarks are not met In the event of compaction followed by an allocation failure, this patch defers further compaction in the zone (1 << compact_defer_shift) times. If the next compaction attempt also fails, compact_defer_shift is increased up to a maximum of 6. If compaction succeeds, the defer counters are reset again. The zone that is deferred is the first zone in the zonelist - i.e. the preferred zone. To defer compaction in the other zones, the information would need to be stored in the zonelist or implemented similar to the zonelist_cache. This would impact the fast-paths and is not justified at this time. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mel Gorman authored
mm: compaction: add a tunable that decides when memory should be compacted and when it should be reclaimed The kernel applies some heuristics when deciding if memory should be compacted or reclaimed to satisfy a high-order allocation. One of these is based on the fragmentation. If the index is below 500, memory will not be compacted. This choice is arbitrary and not based on data. To help optimise the system and set a sensible default for this value, this patch adds a sysctl extfrag_threshold. The kernel will only compact memory if the fragmentation index is above the extfrag_threshold. [randy.dunlap@oracle.com: Fix build errors when proc fs is not configured] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mel Gorman authored
Ordinarily when a high-order allocation fails, direct reclaim is entered to free pages to satisfy the allocation. With this patch, it is determined if an allocation failed due to external fragmentation instead of low memory and if so, the calling process will compact until a suitable page is freed. Compaction by moving pages in memory is considerably cheaper than paging out to disk and works where there are locked pages or no swap. If compaction fails to free a page of a suitable size, then reclaim will still occur. Direct compaction returns as soon as possible. As each block is compacted, it is checked if a suitable page has been freed and if so, it returns. [akpm@linux-foundation.org: Fix build errors] [aarcange@redhat.com: fix count_vm_event preempt in memory compaction direct reclaim] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mel Gorman authored
Add a per-node sysfs file called compact. When the file is written to, each zone in that node is compacted. The intention that this would be used by something like a job scheduler in a batch system before a job starts so that the job can allocate the maximum number of hugepages without significant start-up cost. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-