- 31 Jan, 2011 3 commits
-
-
Russell King authored
0ea12930 (arm: return both physical and virtual addresses from addruart) changed the way the 'addruart' worked, making it return both the virt and phys addresses. Unfortunately, for footbridge, these were reversed. Fix that. Tested on Netwinder. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
Russell King authored
We should not report incomplete blocks on error. Return the number of bytes successfully transferred, rounded down to the nearest block. Acked-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
Russell King authored
When we encounter an error, make sure we complete the transaction otherwise we'll leave the request dangling. Acked-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
- 27 Jan, 2011 8 commits
-
-
Linus Walleij authored
The MMCIDATACNT register contain the number of byte left at error not the number of words, so loose the << 2 thing. Further if CRC fails on the first block, we may end up with a negative number of transferred bytes which is not good, and the formula was in wrong order. Signed-off-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
Linus Torvalds authored
Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Use rq->clock_task instead of rq->clock for correctly maintaining load averages sched: Fix/remove redundant cfs_rq checks sched: Fix sign under-flows in wake_affine
-
Linus Torvalds authored
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: percpu, x86: Fix percpu_xchg_op() x86: Remove left over system_64.h x86-64: Don't use pointer to out-of-scope variable in dump_trace()
-
git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmcLinus Torvalds authored
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: mmc: bfin_sdh: fix alloc size for private data mmc: sdhci-s3c: add platform_8bit_width() hook mmc: jz4740: don't treat NULL clk as an error mmc: mmci: don't read command response when invalid mmc: ushc: Remove duplicate include of usb.h
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (43 commits) bnx2: Eliminate AER error messages on systems not supporting it cnic: Fix big endian bug xfrm6: Don't forget to propagate peer into ipsec route. tg3: Use new VLAN code bonding: update documentation - alternate configuration. TCP: fix a bug that triggers large number of TCP RST by mistake MAINTAINERS: remove Reinette Chatre as iwlwifi maintainer rt2x00: add device id for windy31 usb device mac80211: fix a crash in ieee80211_beacon_get_tim on change_interface ipv6: Revert 'administrative down' address handling changes. textsearch: doc - fix spelling in lib/textsearch.c. USB NET KL5KUSB101: Fix mem leak in error path of kaweth_download_firmware() pch_gbe: don't use flush_scheduled_work() bnx2: Always set ETH_FLAG_TXVLAN net: clear heap allocation for ethtool_get_regs() ipv6: Always clone offlink routes. dcbnl: make get_app handling symmetric for IEEE and CEE DCBx tcp: fix bug in listening_get_next() inetpeer: Use correct AVL tree base pointer in inet_getpeer(). GRO: fix merging a paged skb after non-paged skbs ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/stagingLinus Torvalds authored
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging: hwmon: (lis3) turn down the no IRQ message hwmon: (asus_atk0110) Override interface detection on Sabertooth X58 hwmon: (applesmc) Properly initialize lockdep attributes
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6Linus Torvalds authored
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / Runtime: Don't enable interrupts while running in_interrupt
-
git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/avr32-2.6Linus Torvalds authored
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/avr32-2.6: avr32: add missing include causing undefined pgtable_page_* references
-
- 26 Jan, 2011 29 commits
-
-
Michael Chan authored
On PPC for example, AER is not supported and we see unnecessary AER error message without this patch: bnx2 0003:01:00.1: pci_cleanup_aer_uncorrect_error_status failed 0xfffffffb Reported-by: Breno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
The chip's page tables did not set up properly on big endian machines, causing EEH errors on PPC machines. Reported-by: Breno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Like ipv4, we have to propagate the ipv6 route peer into the ipsec top-level route during instantiation. Signed-off-by: David S. Miller <davem@davemloft.net>
-
Matt Carlson authored
This patch pivots the tg3 driver to the new VLAN infrastructure. All references to vlgrp have been removed. The driver still attempts to disable VLAN tag stripping if CONFIG_VLAN_8021Q or CONFIG_VLAN_8021Q_MODULE is not defined. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Reviewed-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hans-Christian Egtvedt authored
This patch adds the linux/mm.h header file to the AVR32 arch pgalloc.c implementation to fix the undefined reference to pgtable_page_ctor() and pgtable_page_dtor(). Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
-
Paul Turner authored
The delta in clock_task is a more fair attribution of how much time a tg has been contributing load to the current cpu. While not really important it also means we're more in sync (by magnitude) with respect to periodic updates (since __update_curr deltas are clock_task based). Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20110122044852.007092349@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Paul Turner authored
Since updates are against an entity's queuing cfs_rq it's not possible to enter update_cfs_{shares,load} with a NULL cfs_rq. (Indeed, update_cfs_load would crash prior to the check if we did anyway since we load is examined during the initializers). Also, in the update_cfs_load case there's no point in maintaining averages for rq->cfs_rq since we don't perform shares distribution at that level -- NULL check is replaced accordingly. Thanks to Dan Carpenter for pointing out the deference before NULL check. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20110122044851.825284940@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Paul Turner authored
While care is taken around the zero-point in effective_load to not exceed the instantaneous rq->weight, it's still possible (e.g. using wake_idx != 0) for (load + effective_load) to underflow. In this case the comparing the unsigned values can result in incorrect balanced decisions. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20110122044851.734245014@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Eric Dumazet authored
These recent percpu commits: 2485b646: x86,percpu: Move out of place 64 bit ops into X86_64 section 8270137a: cpuops: Use cmpxchg for xchg to avoid lock semantics Caused this 'perf top' crash: Kernel panic - not syncing: Fatal exception in interrupt Pid: 0, comm: swapper Tainted: G D 2.6.38-rc2-00181-gef71723 #413 Call Trace: <IRQ> [<ffffffff810465b5>] ? panic ? kmsg_dump ? kmsg_dump ? oops_end ? no_context ? __bad_area_nosemaphore ? perf_output_begin ? bad_area_nosemaphore ? do_page_fault ? __task_pid_nr_ns ? perf_event_tid ? __perf_event_header__init_id ? validate_chain ? perf_output_sample ? trace_hardirqs_off ? page_fault ? irq_work_run ? update_process_times ? tick_sched_timer ? tick_sched_timer ? __run_hrtimer ? hrtimer_interrupt ? account_system_vtime ? smp_apic_timer_interrupt ? apic_timer_interrupt ... Looking at assembly code, I found: list = this_cpu_xchg(irq_work_list, NULL); gives this wrong code : (gcc-4.1.2 cross compiler) ffffffff810bc45e: mov %gs:0xead0,%rax cmpxchg %rax,%gs:0xead0 jne ffffffff810bc45e <irq_work_run+0x3e> test %rax,%rax je ffffffff810bc4aa <irq_work_run+0x8a> Tell gcc we dirty eax/rax register in percpu_xchg_op() Compiler must use another register to store pxo_new__ We also dont need to reload percpu value after a jump, since a 'failed' cmpxchg already updated eax/rax Wrong generated code was : xor %rax,%rax /* load 0 into %rax */ 1: mov %gs:0xead0,%rax cmpxchg %rax,%gs:0xead0 jne 1b test %rax,%rax After patch : xor %rdx,%rdx /* load 0 into %rdx */ mov %gs:0xead0,%rax 1: cmpxchg %rdx,%gs:0xead0 jne 1b: test %rax,%rax Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> LKML-Reference: <1295973114.3588.312.camel@edumazet-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Yinghai Lu authored
Left-over from the x86 merge ... Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D3E23D1.7010405@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds authored
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: wacom - pass touch resolution to clients through input_absinfo Input: wacom - add 2 Bamboo Pen and touch models Input: sysrq - ensure sysrq_enabled and __sysrq_enabled are consistent Input: sparse-keymap - fix KEY_VSW handling in sparse_keymap_setup Input: tegra-kbc - add tegra keyboard driver Input: gpio_keys - switch to using request_any_context_irq Input: serio - allow registered drivers to get status flag Input: ct82710c - return proper error code for ct82c710_open Input: bu21013_ts - added regulator support Input: bu21013_ts - remove duplicate resolution parameters Input: tnetv107x-ts - don't treat NULL clk as an error Input: tnetv107x-keypad - don't treat NULL clk as an error Fix up trivial conflicts in drivers/input/keyboard/Makefile due to additions of tc3589x/Tegra drivers
-
Sonic Zhang authored
The bfin_sdh driver allocates the wrong size for the private data in the mmc_host. The first parameter of mmc_alloc_host should be the size of the local driver struct rather than the common mmc_host. Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: <stable@kernel.org> Signed-off-by: Chris Ball <cjb@laptop.org>
-
Jaehoon Chung authored
We have 8-bit width support but is not a v3 controller. So we need platform_8bit_width() to support 8-bit buswidth. Also we need MMC_CAP_8_BIT_DATA, so we add it in platdata. This gets 8-bit support working again on s3c, after we previously disabled 8-bit by default on non-v3 controllers. Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Chris Ball <cjb@laptop.org>
-
Jamie Iles authored
clk_get() returns a struct clk cookie to the driver and some platforms may return NULL if they only support a single clock. clk_get() has only failed if it returns a ERR_PTR() encoded pointer. Signed-off-by: Jamie Iles <jamie@jamieiles.com> Signed-off-by: Chris Ball <cjb@laptop.org>
-
Russell King - ARM Linux authored
Don't read the command response from the registers when either the command timed out (because there was no response from the card) or the checksum on the response was invalid. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Chris Ball <cjb@laptop.org>
-
Jesper Juhl authored
Including usb.h once is enough in drivers/mmc/host/ushc.c This removes the duplicate. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Chris Ball <cjb@laptop.org>
-
Ping Cheng authored
Also remove fake ABS_RX/ABS_RY "axes" that were used to report physical dimensions now that we have better way. Signed-off-by: Ping Cheng <pingc@wacom.com> Reviewed-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
-
Torben Hohn authored
The -rt patches change the console_semaphore to console_mutex. As a result, a quite large chunk of the patches changes all acquire/release_console_sem() to acquire/release_console_mutex() This commit makes things use more neutral function names which dont make implications about the underlying lock. The only real change is the return value of console_trylock which is inverted from try_acquire_console_sem() This patch also paves the way to switching console_sem from a semaphore to a mutex. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: make console_trylock return 1 on success, per Geert] Signed-off-by: Torben Hohn <torbenh@gmx.de> Cc: Thomas Gleixner <tglx@tglx.de> Cc: Greg KH <gregkh@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Phillip Lougher authored
Fix potential use of uninitialised variable caused by recent decompressor code optimisations. In zlib_uncompress (zlib_wrapper.c) we have int zlib_err, zlib_init = 0; ... do { ... if (avail == 0) { offset = 0; put_bh(bh[k++]); continue; } ... zlib_err = zlib_inflate(stream, Z_SYNC_FLUSH); ... } while (zlib_err == Z_OK); If continue is executed (avail == 0) then the while condition will be evaluated testing zlib_err, which is uninitialised first time around the loop. Fix this by getting rid of the 'if (avail == 0)' condition test, this edge condition should not be being handled in the decompressor code, and instead handle it generically in the caller code. Similarly for xz_wrapper.c. Incidentally, on most architectures (bar Mips and Parisc), no uninitialised variable warning is generated by gcc, this is because the while condition test on continue is optimised out and not performed (when executing continue zlib_err has not been changed since entering the loop, and logically if the while condition was true previously, then it's still true). Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk> Reported-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Toshiyuki Okajima authored
Executed command: fsstress -d /mnt -n 600 -p 850 crash> bt PID: 7947 TASK: ffff880160546a70 CPU: 0 COMMAND: "fsstress" #0 [ffff8800dfc07d00] machine_kexec at ffffffff81030db9 #1 [ffff8800dfc07d70] crash_kexec at ffffffff810a7952 #2 [ffff8800dfc07e40] oops_end at ffffffff814aa7c8 #3 [ffff8800dfc07e70] die_nmi at ffffffff814aa969 #4 [ffff8800dfc07ea0] do_nmi_callback at ffffffff8102b07b #5 [ffff8800dfc07f10] do_nmi at ffffffff814aa514 #6 [ffff8800dfc07f50] nmi at ffffffff814a9d60 [exception RIP: __lookup_tag+100] RIP: ffffffff812274b4 RSP: ffff88016056b998 RFLAGS: 00000287 RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000006 RDX: 000000000000001d RSI: ffff88016056bb18 RDI: ffff8800c85366e0 RBP: ffff88016056b9c8 R8: ffff88016056b9e8 R9: 0000000000000000 R10: 000000000000000e R11: ffff8800c8536908 R12: 0000000000000010 R13: 0000000000000040 R14: ffffffffffffffc0 R15: ffff8800c85366e0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 <NMI exception stack> #7 [ffff88016056b998] __lookup_tag at ffffffff812274b4 #8 [ffff88016056b9d0] radix_tree_gang_lookup_tag_slot at ffffffff81227605 #9 [ffff88016056ba20] find_get_pages_tag at ffffffff810fc110 #10 [ffff88016056ba80] pagevec_lookup_tag at ffffffff81105e85 #11 [ffff88016056baa0] write_cache_pages at ffffffff81104c47 #12 [ffff88016056bbd0] generic_writepages at ffffffff81105014 #13 [ffff88016056bbe0] do_writepages at ffffffff81105055 #14 [ffff88016056bbf0] __filemap_fdatawrite_range at ffffffff810fb2cb #15 [ffff88016056bc40] filemap_write_and_wait_range at ffffffff810fb32a #16 [ffff88016056bc70] generic_file_direct_write at ffffffff810fb3dc #17 [ffff88016056bce0] __generic_file_aio_write at ffffffff810fcee5 #18 [ffff88016056bda0] generic_file_aio_write at ffffffff810fd085 #19 [ffff88016056bdf0] do_sync_write at ffffffff8114f9ea #20 [ffff88016056bf00] vfs_write at ffffffff8114fcf8 #21 [ffff88016056bf30] sys_write at ffffffff81150691 #22 [ffff88016056bf80] system_call_fastpath at ffffffff8100c0b2 I think this root cause is the following: radix_tree_range_tag_if_tagged() always tags the root tag with settag if the root tag is set with iftag even if there are no iftag tags in the specified range (Of course, there are some iftag tags outside the specified range). =============================================================================== [[[Detailed description]]] (1) Why cannot radix_tree_gang_lookup_tag_slot() return forever? __lookup_tag(): - Return with 0. - Return with the index which is not bigger than the old one as the input parameter. Therefore the following "while" repeats forever because the above conditions cause "ret" not to be updated and the cur_index cannot be changed into the bigger one. (So, radix_tree_gang_lookup_tag_slot() cannot return forever.) radix_tree_gang_lookup_tag_slot(): 1178 while (ret < max_items) { 1179 unsigned int slots_found; 1180 unsigned long next_index; /* Index of next search */ 1181 1182 if (cur_index > max_index) 1183 break; 1184 slots_found = __lookup_tag(node, results + ret, 1185 cur_index, max_items - ret, &next_index, tag); 1186 ret += slots_found; // cannot update ret because slots_found == 0. // so, this while loops forever. 1187 if (next_index == 0) 1188 break; 1189 cur_index = next_index; 1190 } (2) Why does __lookup_tag() return with 0 and doesn't update the index? Assuming the following: - the one of the slot in radix_tree_node is NULL. - the one of the tag which corresponds to the slot sets with PAGECACHE_TAG_TOWRITE or other. - In a certain height(!=0), the corresponding index is 0. a) __lookup_tag() notices that the tag is set. 1005 static unsigned int 1006 __lookup_tag(struct radix_tree_node *slot, void ***results, unsigned long index, 1007 unsigned int max_items, unsigned long *next_index, unsigned int tag) 1008 { 1009 unsigned int nr_found = 0; 1010 unsigned int shift, height; 1011 1012 height = slot->height; 1013 if (height == 0) 1014 goto out; 1015 shift = (height-1) * RADIX_TREE_MAP_SHIFT; 1016 1017 while (height > 0) { 1018 unsigned long i = (index >> shift) & RADIX_TREE_MAP_MASK ; 1019 1020 for (;;) { 1021 if (tag_get(slot, tag, i)) 1022 break; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ * the index is not updated yet. b) __lookup_tag() notices that the slot is NULL. 1023 index &= ~((1UL << shift) - 1); 1024 index += 1UL << shift; 1025 if (index == 0) 1026 goto out; /* 32-bit wraparound */ 1027 i++; 1028 if (i == RADIX_TREE_MAP_SIZE) 1029 goto out; 1030 } 1031 height--; 1032 if (height == 0) { /* Bottom level: grab some items */ ... 1055 } 1056 shift -= RADIX_TREE_MAP_SHIFT; 1057 slot = rcu_dereference_raw(slot->slots[i]); 1058 if (slot == NULL) 1059 break; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ c) __lookup_tag() doesn't update the index and return with 0. 1060 } 1061 out: 1062 *next_index = index; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1063 return nr_found; 1064 } (3) Why is the slot NULL even if the tag is set? Because radix_tree_range_tag_if_tagged() always sets the root tag with PAGECACHE_TAG_TOWRITE if the root tag is set with PAGECACHE_TAG_DIRTY, even if there is no tag which can be set with PAGECACHE_TAG_TOWRITE in the specified range (from *first_indexp to last_index). Of course, some PAGECACHE_TAG_DIRTY nodes must exist outside the specified range. (radix_tree_range_tag_if_tagged() is called only from tag_pages_for_writeback()) 640 unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root, 641 unsigned long *first_indexp, unsigned long last_index, 642 unsigned long nr_to_tag, 643 unsigned int iftag, unsigned int settag) 644 { 645 unsigned int height = root->height; 646 struct radix_tree_path path[height]; 647 struct radix_tree_path *pathp = path; 648 struct radix_tree_node *slot; 649 unsigned int shift; 650 unsigned long tagged = 0; 651 unsigned long index = *first_indexp; 652 653 last_index = min(last_index, radix_tree_maxindex(height)); 654 if (index > last_index) 655 return 0; 656 if (!nr_to_tag) 657 return 0; 658 if (!root_tag_get(root, iftag)) { 659 *first_indexp = last_index + 1; 660 return 0; 661 } 662 if (height == 0) { 663 *first_indexp = last_index + 1; 664 root_tag_set(root, settag); 665 return 1; 666 } ... 733 root_tag_set(root, settag); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 734 *first_indexp = index; 735 736 return tagged; 737 } As the result, there is no radix_tree_node which is set with PAGECACHE_TAG_TOWRITE but the root tag(radix_tree_root) is set with PAGECACHE_TAG_TOWRITE. [figure: inside radix_tree] (Please see the figure with typewriter font) =========================================== [roottag = DIRTY] | tag=0:NOTHING tag[0 0 0 1] 1:DIRTY [x x x +] 2:WRITEBACK | 3:DIRTY,WRITEBACK p 4:TOWRITE <---> 5:DIRTY,TOWRITE ... specified range (index: 0 to 2) * There is no DIRTY tag within the specified range. (But there is a DIRTY tag outside that range.) | | | | | | | | | after calling tag_pages_for_writeback() | | | | | | | | | v v v v v v v v v [roottag = DIRTY,TOWRITE] | p is "page". tag[0 0 0 1] x is NULL. [x x x +] +- is a pointer to "page". | p * But TOWRITE tag is set on the root tag. ============================================ After that, radix_tree_extend() via radix_tree_insert() is called when the page is added. This function sets the new radix_tree_node with PAGECACHE_TAG_TOWRITE to succeed the status of the root tag. 246 static int radix_tree_extend(struct radix_tree_root *root, unsigned long index) 247 { 248 struct radix_tree_node *node; 249 unsigned int height; 250 int tag; 251 252 /* Figure out what the height should be. */ 253 height = root->height + 1; 254 while (index > radix_tree_maxindex(height)) 255 height++; 256 257 if (root->rnode == NULL) { 258 root->height = height; 259 goto out; 260 } 261 262 do { 263 unsigned int newheight; 264 if (!(node = radix_tree_node_alloc(root))) 265 return -ENOMEM; 266 267 /* Increase the height. */ 268 node->slots[0] = radix_tree_indirect_to_ptr(root->rnode); 269 270 /* Propagate the aggregated tag info into the new root */ 271 for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) { 272 if (root_tag_get(root, tag)) 273 tag_set(node, tag, 0); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 274 } =========================================== [roottag = DIRTY,TOWRITE] | : tag[0 0 0 1] [0 0 0 0] [x x x +] [+ x x x] | | p p (new page) | | | | | | | | | after calling radix_tree_insert | | | | | | | | | v v v v v v v v v [roottag = DIRTY,TOWRITE] | tag [5 0 0 0] * DIRTY and TOWRITE tags are [+ + x x] succeeded to the new node. | | tag [0 0 0 1] [0 0 0 0] [x x x +] [+ x x x] | | p p ============================================ After that, the index 3 page is released by remove_from_page_cache(). Then we can make the situation that the tag is set with PAGECACHE_TAG_TOWRITE and that the slot which corresponds to the tag is NULL. =========================================== [roottag = DIRTY,TOWRITE] | tag [5 0 0 0] [+ + x x] | | tag [0 0 0 1] [0 0 0 0] [x x x +] [+ x x x] | | p p (remove) | | | | | | | | | after calling remove_page_cache | | | | | | | | | v v v v v v v v v [roottag = DIRTY,TOWRITE] | tag [4 0 0 0] * Only DIRTY tag is cleared [x + x x] because no TOWRITE tag is existed | in the bottom node. [0 0 0 0] [+ x x x] | p ============================================ To solve this problem Change to that radix_tree_tag_if_tagged() doesn't tag the root tag if it doesn't set any tags within the specified range. Like this. ============================================ 640 unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root, 641 unsigned long *first_indexp, unsigned long last_index, 642 unsigned long nr_to_tag, 643 unsigned int iftag, unsigned int settag) 644 { 650 unsigned long tagged = 0; ... 733 if (tagged) ^^^^^^^^^^^^^^^^^^^^^^^^ 734 root_tag_set(root, settag); 735 *first_indexp = index; 736 737 return tagged; 738 } ============================================ Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Voss, Nikolaus authored
setup_irq() was called before clockevents_register_device() which is needed by the irq handler. Bug was reproducible by restarting the kernel using kexec (reliable crash). Signed-off-by: Nikolaus Voss <n.voss@weinmann.de> Cc: David Brownell <dbrownell@users.sourceforge.net> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
KAMEZAWA Hiroyuki authored
A fix up mem_cgroup_move_parent() which use compound_order() in asynchronous manner. This compound_order() may return unknown value because we don't take lock. Use PageTransHuge() and HPAGE_SIZE instead of it. Also clean up for mem_cgroup_move_parent(). - remove unnecessary initialization of local variable. - rename charge_size -> page_size - remove unnecessary (wrong) comment. - added a comment about THP. Note: Current design take compound_page_lock() in caller of move_account(). This should be revisited when we implement direct move_task of hugepage without splitting. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
KAMEZAWA Hiroyuki authored
mem_cgroup_disabled() should be checked at splitting. If disabled, no heavy work is necesary. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
KAMEZAWA Hiroyuki authored
Commit 4b534334 ("memcg: clean up try_charge main loop") removes a cancel of charge at case: memory charge-> success. mem+swap charge-> failure. This leaks usage of memory. Fix it. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@in.ibm.com> Cc: <stable@kernel.org> [2.6.36+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Minchan Kim authored
Callers of migrate_pages should putback_lru_pages to return pages isolated to LRU or free list. Now comment is rather confusing. It says caller always have to call it. It is more clear to point out that the caller has to call it if migrate_pages's return value isn't zero. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrea Arcangeli authored
Commit 5d689240 ("thp: select CONFIG_COMPACTION if TRANSPARENT_HUGEPAGE enabled") causes this warning during the configuration process: warning: (TRANSPARENT_HUGEPAGE) selects COMPACTION which has unmet direct dependencies (EXPERIMENTAL && HUGETLB_PAGE && MMU) COMPACTION doesn't depend on HUGETLB_PAGE, it doesn't depend on THP either, it is also useful for regular alloc_pages(order > 0) including the very kernel stack during fork (THREAD_ORDER = 1). It's always better to enable COMPACTION. The warning should be an error because we would end up with MIGRATION not selected, and COMPACTION wouldn't work without migration (despite it seems to build with an inline migrate_pages returning -ENOSYS). I'd also like to remove EXPERIMENTAL: compaction has been in the kernel for some releases (for full safety the default remains disabled which I think is enough). Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Luca Tettamanti <kronos.it@gmail.com> Tested-by: Luca Tettamanti <kronos.it@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jesper Juhl authored
In mm/memcontrol.c::mem_cgroup_move_parent() there's a path that jumps to the 'put_back' label ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false, charge); if (ret || !parent) goto put_back; where we'll if (charge > PAGE_SIZE) compound_unlock_irqrestore(page, flags); but, we have not assigned anything to 'flags' at this point, nor have we called 'compound_lock_irqsave()' (which is what sets 'flags'). The 'put_back' label should be moved below the call to compound_unlock_irqrestore() as per this patch. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
David Rientjes authored
Commit 0e093d99 ("writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone") uncovered a livelock in the page allocator that resulted in tasks infinitely looping trying to find memory and kswapd running at 100% cpu. The issue occurs because drain_all_pages() is called immediately following direct reclaim when no memory is freed and try_to_free_pages() returns non-zero because all zones in the zonelist do not have their all_unreclaimable flag set. When draining the per-cpu pagesets back to the buddy allocator for each zone, the zone->pages_scanned counter is cleared to avoid erroneously setting zone->all_unreclaimable later. The problem is that no pages may actually be drained and, thus, the unreclaimable logic never fails direct reclaim so the oom killer may be invoked. This apparently only manifested after wait_iff_congested() was introduced and the zone was full of anonymous memory that would not congest the backing store. The page allocator would infinitely loop if there were no other tasks waiting to be scheduled and clear zone->pages_scanned because of drain_all_pages() as the result of this change before kswapd could scan enough pages to trigger the reclaim logic. Additionally, with every loop of the page allocator and in the reclaim path, kswapd would be kicked and would end up running at 100% cpu. In this scenario, current and kswapd are all running continuously with kswapd incrementing zone->pages_scanned and current clearing it. The problem is even more pronounced when current swaps some of its memory to swap cache and the reclaimable logic then considers all active anonymous memory in the all_unreclaimable logic, which requires a much higher zone->pages_scanned value for try_to_free_pages() to return zero that is never attainable in this scenario. Before wait_iff_congested(), the page allocator would incur an unconditional timeout and allow kswapd to elevate zone->pages_scanned to a level that the oom killer would be called the next time it loops. The fix is to only attempt to drain pcp pages if there is actually a quantity to be drained. The unconditional clearing of zone->pages_scanned in free_pcppages_bulk() need not be changed since other callers already ensure that draining will occur. This patch ensures that free_pcppages_bulk() will actually free memory before calling into it from drain_all_pages() so zone->pages_scanned is only cleared if appropriate. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-