- 13 Oct, 2014 22 commits
-
-
Oleg Chernovskiy authored
commit 6bce8d97 upstream. Properly set the thermal min and max temp on CI. Otherwise, we end up setting the thermal ranges to 0 on resume and end up in the lowest power state. Signed-off-by: Oleg Chernovskiy <algonkvel@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Thomas Hellstrom authored
commit f01ea0c3 upstream. The code waiting for fifo idle was incorrect and could possibly spin forever under certain circumstances. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reported-by: Mark Sheldon <markshel@vmware.com> Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> Reivewed-by: Mark Sheldon <markshel@vmware.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Y.C. Chen authored
commit 83502a5d upstream. Type error and cause AST2000 cannot be detected correctly Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com> Reviewed-by: Egbert Eich <eich@suse.de> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Ville Syrjälä authored
commit 7a98948f upstream. The vblank waits in intel_tv_detect_type() are timing out for some reason. This is a regression caused removing seemingly useless vblank waits from the modeset seqeuence in: commit 56ef52ca Author: Ville Syrjälä <ville.syrjala@linux.intel.com> Date: Thu May 8 19:23:15 2014 +0300 drm/i915: Kill vblank waits after pipe enable on gmch platforms So it turns out they weren't all entirely useless. Apparently the pipe has to go through one full frame before we enable the TV port. Add a vblank wait to intel_enable_tv() to make sure that happens. Another approach was attempted by placing the vblank wait just after enabling the port. The theory behind that attempt was that we need to let the port stay enabled for one full frame before disabling it again during load detection. But that didn't work, and we definitely must have the vblank wait before enabling the port. Cc: Alan Bartlett <ajb@elrepo.org> Tested-by: Alan Bartlett <ajb@elrepo.org> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79311Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Daniel Vetter authored
commit 2232f031 upstream. In commit 1f83fee0 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Thu Nov 15 17:17:22 2012 +0100 drm/i915: clear up wedged transitions I've accidentally inverted the EIO/wedged handling in the fault handler: We want to return the EIO as a SIGBUS only if it's not because of the gpu having died, to prevent userspace from unduly dying. In my defence the comment right above is completely misleading, so fix both. v2: Drop the WARN_ON, it's not actually a bug to e.g. receive an -EIO when swap-in fails. v3: Don't remove too much ... oops. Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Mathias Krause authored
commit bbe1c274 upstream. The __init annotations for the DMI callback functions are wrong as this code can be called even after the module has been initialized, e.g. like this: # echo 1 > /sys/bus/pci/devices/0000:00:02.0/remove # modprobe i915 # echo 1 > /sys/bus/pci/rescan The first command will remove the PCI device from the kernel's device list so the second command won't see it right away. But as it registers a PCI driver it'll see it on the third command. If the system happens to match one of the DMI table entries we'll try to call a function in long released memory and generate an Oops, at best. Fix this by removing the bogus annotation. Modpost should have caught that one but it ignores section reference mismatches from the .rodata section. :/ Fixes: 25e341cf ("drm/i915: quirk away broken OpRegion VBT") Fixes: 8ca4013d ("CHROMIUM: i915: Add DMI override to skip CRT...") Fixes: 425d244c ("drm/i915: ignore LVDS on intel graphics systems...") Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Duncan Laurie <dlaurie@chromium.org> Cc: Jarod Wilson <jarod@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> # Can modpost be fixed? Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Filipe Brandenburger authored
commit bfcfd44c upstream. The guard was introduced in commit ea1a8217 ("xattr: guard against simultaneous glibc header inclusion") but it is using #ifdef to check for a define that is either set to 1 or 0. Fix it to use #if instead. * Without this patch: $ { echo "#include <sys/xattr.h>"; echo "#include <linux/xattr.h>"; } | gcc -E -Iinclude/uapi - >/dev/null include/uapi/linux/xattr.h:19:0: warning: "XATTR_CREATE" redefined [enabled by default] #define XATTR_CREATE 0x1 /* set value, fail if attr already exists */ ^ /usr/include/x86_64-linux-gnu/sys/xattr.h:32:0: note: this is the location of the previous definition #define XATTR_CREATE XATTR_CREATE ^ * With this patch: $ { echo "#include <sys/xattr.h>"; echo "#include <linux/xattr.h>"; } | gcc -E -Iinclude/uapi - >/dev/null (no warnings) Signed-off-by: Filipe Brandenburger <filbranden@google.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Cc: Allan McRae <allan@archlinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Benjamin Tissoires authored
commit 5abfe85c upstream. Commit "HID: logitech: perform bounds checking on device_id early enough" unfortunately leaks some errors to dmesg which are not real ones: - if the report is not a DJ one, then there is not point in checking the device_id - the receiver (index 0) can also receive some notifications which can be safely ignored given the current implementation Move out the test regarding the report_id and also discards printing errors when the receiver got notified. Fixes: ad3e14d7Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Jiri Kosina authored
commit c54def7b upstream. The report passed to us from transport driver could potentially be arbitrarily large, therefore we better sanity-check it so that magicmouse_emit_touch() gets only valid values of raw_id. Reported-by: Steven Vittitoe <scvitti@google.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Jiri Kosina authored
commit 844817e4 upstream. The report passed to us from transport driver could potentially be arbitrarily large, therefore we better sanity-check it so that raw_data that we hold in picolcd_pending structure are always kept within proper bounds. Reported-by: Steven Vittitoe <scvitti@google.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Toshiaki Makita authored
commit e15693ef upstream. cfq_group_service_tree_add() is applying new_weight at the beginning of the function via cfq_update_group_weight(). This actually allows weight to change between adding it to and subtracting it from children_weight, and triggers WARN_ON_ONCE() in cfq_group_service_tree_del(), or even causes oops by divide error during vfr calculation in cfq_group_service_tree_add(). The detailed scenario is as follows: 1. Create blkio cgroups X and Y as a child of X. Set X's weight to 500 and perform some I/O to apply new_weight. This X's I/O completes before starting Y's I/O. 2. Y starts I/O and cfq_group_service_tree_add() is called with Y. 3. cfq_group_service_tree_add() walks up the tree during children_weight calculation and adds parent X's weight (500) to children_weight of root. children_weight becomes 500. 4. Set X's weight to 1000. 5. X starts I/O and cfq_group_service_tree_add() is called with X. 6. cfq_group_service_tree_add() applies its new_weight (1000). 7. I/O of Y completes and cfq_group_service_tree_del() is called with Y. 8. I/O of X completes and cfq_group_service_tree_del() is called with X. 9. cfq_group_service_tree_del() subtracts X's weight (1000) from children_weight of root. children_weight becomes -500. This triggers WARN_ON_ONCE(). 10. Set X's weight to 500. 11. X starts I/O and cfq_group_service_tree_add() is called with X. 12. cfq_group_service_tree_add() applies its new_weight (500) and adds it to children_weight of root. children_weight becomes 0. Calcularion of vfr triggers oops by divide error. weight should be updated right before adding it to children_weight. Reported-by: Ruki Sekiya <sekiya.ruki@lab.ntt.co.jp> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Clemens Ladisch authored
commit a9960e6a upstream. The calculated frame size was wrong because snd_pcm_format_physical_width() actually returns the number of bits, not bytes. Use snd_pcm_format_size() instead, which not only returns bytes, but also simplifies the calculation. Fixes: 8bea869c ("ALSA: PCM midlevel: improve fifo_size handling") Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Takashi Iwai authored
commit 7a9744cb upstream. When a driver is set up without the jack detection explicitly (either by passing a model option or via a specific fixup), the pin powermap of IDT/STAC codecs is set up wrongly, resulting in the silence output. It's because of a logic failure in stac_init_power_map(). It tries to avoid creating a callback for the pins that have other auto-hp and auto-mic callbacks, but the check is done in a wrong way at a wrong time. The stac_init_power_map() should be called after creating other jack detection ctls, and the jack callback should be created only for jack-detectable widgets. This patch fixes the check in stac_init_power_map() and its callee at the right place, after snd_hda_gen_build_controls(). Reported-by: Adam Richter <adam_richter2004@yahoo.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Takashi Iwai authored
commit acf08081 upstream. ALC1150 codec seems to need the COEF- and PLL-setups just like its compatible ALC882 codec. Some machines (e.g. SunMicro X10SAT) show the problem like too low output volumes unless the COEF setup is applied. Reported-and-tested-by: Dana Goyette <danagoyette@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Takashi Iwai authored
commit ff50479a upstream. Acer Aspire 3830TG with CX20588 codec has a digital built-in mic that has the same problem like many others, the inverted signal in stereo. Apply the same fixup to this machine, too. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Clemens Ladisch authored
commit ddc64b27 upstream. snd_info_get_line() documents that its last parameter must be one less than the buffer size, but this API design guarantees that (literally) every caller gets it wrong. Just change this parameter to have its obvious meaning. Reported-by: Tommi Rantala <tt.rantala@gmail.com> Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Will Deacon authored
commit 27d7ff27 upstream. I'm not sure what I was on when I wrote this, but when iterating over the hardware watchpoint array (hbp_watch_array), our index is off by ARM_MAX_BRP, so we walk off the end of our thread_struct... ... except, a dodgy condition in the loop means that it never executes at all (bp cannot be NULL). This patch fixes the code so that we remove the bp check and use the correct index for accessing the watchpoint structures. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Josef Bacik authored
commit 4ce97dbf upstream. Epoll on trace_pipe can sometimes hang in a weird case. If the ring buffer is empty when we set waiters_pending but an event shows up exactly at that moment we can miss being woken up by the ring buffers irq work. Since ring_buffer_empty() is inherently racey we will sometimes think that the buffer is not empty. So we don't get woken up and we don't think there are any events even though there were some ready when we added the watch, which makes us hang. This patch fixes this by making sure that we are actually on the wait list before we set waiters_pending, and add a memory barrier to make sure ring_buffer_empty() is going to be correct. Link: http://lkml.kernel.org/p/1408989581-23727-1-git-send-email-jbacik@fb.com Cc: Martin Lau <kafai@fb.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Fan Du authored
commit 979bbf7b upstream. In block write mode, when encapsulating dma_buffer, first element is 'command', the rest is data buffer, so only copy actual data buffer starting from block[1] with the size indicating by block[0]. Signed-off-by: Fan Du <fan.du@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Simon Lindgren authored
commit 6721f28a upstream. There is a race condition in at91_do_twi_xfer when signals arrive. If a signal is recieved while waiting for a transfer to complete wait_for_completion_interruptible_timeout() will return -ERESTARTSYS. This is not handled correctly resulting in interrupts still being enabled and a transfer being in flight when we return. Symptoms include a range of oopses and bus lockups. Oopses can happen when the transfer completes because the interrupt handler will corrupt the stack. If a new transfer is started before the interrupt fires the controller will start a new transfer in the middle of the old one, resulting in confused slaves and a locked bus. To avoid this, use wait_for_completion_io_timeout instead so that we don't have to deal with gracefully shutting down the transfer and disabling the interrupts. Signed-off-by: Simon Lindgren <simon@aqwary.com> Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Marek Roszko authored
commit 75b81f33 upstream. The driver was not bound checking the received length byte to ensure it was within the the buffer size that is allocated for SMBus blocks. This resulted in buffer overflows whenever an invalid length byte was received. It also failed to ensure the length byte was not zero. If it received zero, it would end up in an infinite loop as the at91_twi_read_next_byte function returned immediately without allowing RHR to be read to clear the RXRDY interrupt. Tested agaisnt a SMBus compliant battery. Signed-off-by: Marek Roszko <mark.roszko@gmail.com> Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Chen-Yu Tsai authored
commit 0ce4bc1d upstream. The "clock-frequency" DT property is listed as optional, However, the current code stores the return value of of_property_read_u32 in the return code of mv64xxx_of_config, but then forgets to clear it after setting the default value of "clock-frequency". It is then passed out to the main probe function, resulting in a probe failure when "clock-frequency" is missing. This patch checks and then throws away the return value of of_property_read_u32, instead of storing it and having to clear it afterwards. This issue was discovered after the property was removed from all sunxi DTs. Fixes: 4c730a06 ("i2c: mv64xxx: Set bus frequency to 100kHz if clock-frequency is not provided") Signed-off-by: Chen-Yu Tsai <wens@csie.org> Acked-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
- 01 Oct, 2014 1 commit
-
-
Dave Martin authored
commit e2ccba49 upstream. Copying a function with memcpy() and then trying to execute the result isn't trivially portable to Thumb. This patch modifies the kexec soft restart code to copy its assembler trampoline relocate_new_kernel() using fncpy() instead, so that relocate_new_kernel can be in the same ISA as the rest of the kernel without problems. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Reported-by: Taras Kondratiuk <taras.kondratiuk@linaro.org> Tested-by: Taras Kondratiuk <taras.kondratiuk@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
- 26 Sep, 2014 17 commits
-
-
Julian Anastasov authored
commit eb90b0c7 upstream. commit fc604767 ("ipvs: changes for local real server") from 2.6.37 introduced DNAT support to local real server but the IPv6 LOCAL_OUT handler ip_vs_local_reply6() is registered incorrectly as IPv4 hook causing any outgoing IPv4 traffic to be dropped depending on the IP header values. Chris tracked down the problem to CONFIG_IP_VS_IPV6=y Bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349768Reported-by: Chris J Arges <chris.j.arges@canonical.com> Tested-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Alex Gartrell authored
commit 76f084bc upstream. Previously, only the four high bits of the tclass were maintained in the ipv6 case. This matches the behavior of ipv4, though whether or not we should reflect ECN bits may be up for debate. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Julian Anastasov authored
commit 2627b7e1 upstream. commit 8f4e0a18 ("IPVS netns exit causes crash in conntrack") added second ip_vs_conn_drop_conntrack call instead of just adding the needed check. As result, the first call still can cause crash on netns exit. Remove it. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Michal Hocko authored
commit cc81717e upstream. Commit 11c731e8 ("mm/mempolicy: fix !vma in new_vma_page()") has removed BUG_ON(!vma) from new_vma_page which is partially correct because page_address_in_vma will return EFAULT for non-linear mappings and at least shared shmem might be mapped this way. The patch also tried to prevent NULL ptr for hugetlb pages which is not correct AFAICS because hugetlb pages cannot be mapped as VM_NONLINEAR and other conditions in page_address_in_vma seem to be legit and catch real bugs. This patch restores BUG_ON for PageHuge to catch potential issues when the to-be-migrated page is not setup properly. Signed-off-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Bob Liu <bob.liu@oracle.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Wanpeng Li authored
commit 11c731e8 upstream. BUG_ON(!vma) assumption is introduced by commit 0bf598d8 ("mbind: add BUG_ON(!vma) in new_vma_page()"), however, even if address = __vma_address(page, vma); and vma->start < address < vma->end page_address_in_vma() may still return -EFAULT because of many other conditions in it. As a result the while loop in new_vma_page() may end with vma=NULL. This patch revert the commit and also fix the potential dereference NULL pointer reported by Dan. http://marc.info/?l=linux-mm&m=137689530323257&w=2 kernel BUG at mm/mempolicy.c:1204! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC CPU: 3 PID: 7056 Comm: trinity-child3 Not tainted 3.13.0-rc3+ #2 task: ffff8801ca5295d0 ti: ffff88005ab20000 task.ti: ffff88005ab20000 RIP: new_vma_page+0x70/0x90 RSP: 0000:ffff88005ab21db0 EFLAGS: 00010246 RAX: fffffffffffffff2 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000008040075 RSI: ffff8801c3d74600 RDI: ffffea00079a8b80 RBP: ffff88005ab21dc8 R08: 0000000000000004 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffff2 R13: ffffea00079a8b80 R14: 0000000000400000 R15: 0000000000400000 FS: 00007ff49c6f4740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff49c68f994 CR3: 000000005a205000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: ffffea00079a8b80 ffffea00079a8bc0 ffffea00079a8ba0 ffff88005ab21e50 ffffffff811adc7a 0000000000000000 ffff8801ca5295d0 0000000464e224f8 0000000000000000 0000000000000002 0000000000000000 ffff88020ce75c00 Call Trace: migrate_pages+0x12a/0x850 SYSC_mbind+0x513/0x6a0 SyS_mbind+0xe/0x10 ia32_do_call+0x13/0x13 Code: 85 c0 75 2f 4c 89 e1 48 89 da 31 f6 bf da 00 02 00 65 44 8b 04 25 08 f7 1c 00 e8 ec fd ff ff 5b 41 5c 41 5d 5d c3 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 4c 89 e6 48 89 df ba 01 00 00 00 e8 48 RIP [<ffffffff8119f200>] new_vma_page+0x70/0x90 RSP <ffff88005ab21db0> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reported-by: Dave Jones <davej@redhat.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Jiri Slaby authored
-
Mel Gorman authored
commit 4ffeaf35 upstream. The fair zone allocation policy round-robins allocations between zones within a node to avoid age inversion problems during reclaim. If the first allocation fails, the batch counts are reset and a second attempt made before entering the slow path. One assumption made with this scheme is that batches expire at roughly the same time and the resets each time are justified. This assumption does not hold when zones reach their low watermark as the batches will be consumed at uneven rates. Allocation failure due to watermark depletion result in additional zonelist scans for the reset and another watermark check before hitting the slowpath. On UMA, the benefit is negligible -- around 0.25%. On 4-socket NUMA machine it's variable due to the variability of measuring overhead with the vmstat changes. The system CPU overhead comparison looks like 3.16.0-rc3 3.16.0-rc3 3.16.0-rc3 vanilla vmstat-v5 lowercost-v5 User 746.94 774.56 802.00 System 65336.22 32847.27 40852.33 Elapsed 27553.52 27415.04 27368.46 However it is worth noting that the overall benchmark still completed faster and intuitively it makes sense to take as few passes as possible through the zonelists. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Mel Gorman authored
commit f7b5d647 upstream. The purpose of numa_zonelist_order=zone is to preserve lower zones for use with 32-bit devices. If locality is preferred then the numa_zonelist_order=node policy should be used. Unfortunately, the fair zone allocation policy overrides this by skipping zones on remote nodes until the lower one is found. While this makes sense from a page aging and performance perspective, it breaks the expected zonelist policy. This patch restores the expected behaviour for zone-list ordering. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Mel Gorman authored
commit bb0b6dff upstream. When kswapd is awake reclaiming, the per-cpu stat thresholds are lowered to get more accurate counts to avoid breaching watermarks. This threshold update iterates over all possible CPUs which is unnecessary. Only online CPUs need to be updated. If a new CPU is onlined, refresh_zone_stat_thresholds() will set the thresholds correctly. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Mel Gorman authored
commit 0d5d823a upstream. zone->pages_scanned is a write-intensive cache line during page reclaim and it's also updated during page free. Move the counter into vmstat to take advantage of the per-cpu updates and do not update it in the free paths unless necessary. On a small UMA machine running tiobench the difference is marginal. On a 4-node machine the overhead is more noticable. Note that automatic NUMA balancing was disabled for this test as otherwise the system CPU overhead is unpredictable. 3.16.0-rc3 3.16.0-rc3 3.16.0-rc3 vanillarearrange-v5 vmstat-v5 User 746.94 759.78 774.56 System 65336.22 58350.98 32847.27 Elapsed 27553.52 27282.02 27415.04 Note that the overhead reduction will vary depending on where exactly pages are allocated and freed. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Mel Gorman authored
commit 3484b2de upstream. The arrangement of struct zone has changed over time and now it has reached the point where there is some inappropriate sharing going on. On x86-64 for example o The zone->node field is shared with the zone lock and zone->node is accessed frequently from the page allocator due to the fair zone allocation policy. o span_seqlock is almost never used by shares a line with free_area o Some zone statistics share a cache line with the LRU lock so reclaim-intensive and allocator-intensive workloads can bounce the cache line on a stat update This patch rearranges struct zone to put read-only and read-mostly fields together and then splits the page allocator intensive fields, the zone statistics and the page reclaim intensive fields into their own cache lines. Note that the type of lowmem_reserve changes due to the watermark calculations being signed and avoiding a signed/unsigned conversion there. On the test configuration I used the overall size of struct zone shrunk by one cache line. On smaller machines, this is not likely to be noticable. However, on a 4-node NUMA machine running tiobench the system CPU overhead is reduced by this patch. 3.16.0-rc3 3.16.0-rc3 vanillarearrange-v5r9 User 746.94 759.78 System 65336.22 58350.98 Elapsed 27553.52 27282.02 Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Mel Gorman authored
commit 24b7e581 upstream. This was formerly the series "Improve sequential read throughput" which noted some major differences in performance of tiobench since 3.0. While there are a number of factors, two that dominated were the introduction of the fair zone allocation policy and changes to CFQ. The behaviour of fair zone allocation policy makes more sense than tiobench as a benchmark and CFQ defaults were not changed due to insufficient benchmarking. This series is what's left. It's one functional fix to the fair zone allocation policy when used on NUMA machines and a reduction of overhead in general. tiobench was used for the comparison despite its flaws as an IO benchmark as in this case we are primarily interested in the overhead of page allocator and page reclaim activity. On UMA, it makes little difference to overhead 3.16.0-rc3 3.16.0-rc3 vanilla lowercost-v5 User 383.61 386.77 System 403.83 401.74 Elapsed 5411.50 5413.11 On a 4-socket NUMA machine it's a bit more noticable 3.16.0-rc3 3.16.0-rc3 vanilla lowercost-v5 User 746.94 802.00 System 65336.22 40852.33 Elapsed 27553.52 27368.46 This patch (of 6): The LRU insertion and activate tracepoints take PFN as a parameter forcing the overhead to the caller. Move the overhead to the tracepoint fast-assign method to ensure the cost is only incurred when the tracepoint is active. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Jerome Marchand authored
commit 2ab051e1 upstream. When memory cgoups are enabled, the code that decides to force to scan anonymous pages in get_scan_count() compares global values (free, high_watermark) to a value that is restricted to a memory cgroup (file). It make the code over-eager to force anon scan. For instance, it will force anon scan when scanning a memcg that is mainly populated by anonymous page, even when there is plenty of file pages to get rid of in others memcgs, even when swappiness == 0. It breaks user's expectation about swappiness and hurts performance. This patch makes sure that forced anon scan only happens when there not enough file pages for the all zone, not just in one random memcg. [hannes@cmpxchg.org: cleanups] Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Joonsoo Kim authored
commit 474750ab upstream. Richard Yao reported a month ago that his system have a trouble with vmap_area_lock contention during performance analysis by /proc/meminfo. Andrew asked why his analysis checks /proc/meminfo stressfully, but he didn't answer it. https://lkml.org/lkml/2014/4/10/416 Although I'm not sure that this is right usage or not, there is a solution reducing vmap_area_lock contention with no side-effect. That is just to use rcu list iterator in get_vmalloc_info(). rcu can be used in this function because all RCU protocol is already respected by writers, since Nick Piggin commit db64fe02 ("mm: rewrite vmap layer") back in linux-2.6.28 Specifically : insertions use list_add_rcu(), deletions use list_del_rcu() and kfree_rcu(). Note the rb tree is not used from rcu reader (it would not be safe), only the vmap_area_list has full RCU protection. Note that __purge_vmap_area_lazy() already uses this rcu protection. rcu_read_lock(); list_for_each_entry_rcu(va, &vmap_area_list, list) { if (va->flags & VM_LAZY_FREE) { if (va->va_start < *start) *start = va->va_start; if (va->va_end > *end) *end = va->va_end; nr += (va->va_end - va->va_start) >> PAGE_SHIFT; list_add_tail(&va->purge_list, &valist); va->flags |= VM_LAZY_FREEING; va->flags &= ~VM_LAZY_FREE; } } rcu_read_unlock(); Peter: : While rcu list traversal over the vmap_area_list is safe, this may : arrive at different results than the spinlocked version. The rcu list : traversal version will not be a 'snapshot' of a single, valid instant : of the entire vmap_area_list, but rather a potential amalgam of : different list states. Joonsoo: : Yes, you are right, but I don't think that we should be strict here. : Meminfo is already not a 'snapshot' at specific time. While we try to get : certain stats, the other stats can change. And, although we may arrive at : different results than the spinlocked version, the difference would not be : large and would not make serious side-effect. [edumazet@google.com: add more commit description] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reported-by: Richard Yao <ryao@gentoo.org> Acked-by: Eric Dumazet <edumazet@google.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Zhang Yanfei <zhangyanfei.yes@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Jerome Marchand authored
commit 21bda264 upstream. Commit 71e3aac0 ("thp: transparent hugepage core") adds copy_pte_range prototype to huge_mm.h. I'm not sure why (or if) this function have been used outside of memory.c, but it currently isn't. This patch makes copy_pte_range() static again. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
David Rientjes authored
commit 14a4e214 upstream. Commit 9f1b868a ("mm: thp: khugepaged: add policy for finding target node") improved the previous khugepaged logic which allocated a transparent hugepages from the node of the first page being collapsed. However, it is still possible to collapse pages to remote memory which may suffer from additional access latency. With the current policy, it is possible that 255 pages (with PAGE_SHIFT == 12) will be collapsed remotely if the majority are allocated from that node. When zone_reclaim_mode is enabled, it means the VM should make every attempt to allocate locally to prevent NUMA performance degradation. In this case, we do not want to collapse hugepages to remote nodes that would suffer from increased access latency. Thus, when zone_reclaim_mode is enabled, only allow collapsing to nodes with RECLAIM_DISTANCE or less. There is no functional change for systems that disable zone_reclaim_mode. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Bob Liu <bob.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Hugh Dickins authored
commit c0d73261 upstream. Use ACCESS_ONCE() in handle_pte_fault() when getting the entry or orig_pte upon which all subsequent decisions and pte_same() tests will be made. I have no evidence that its lack is responsible for the mm/filemap.c:202 BUG_ON(page_mapped(page)) in __delete_from_page_cache() found by trinity, and I am not optimistic that it will fix it. But I have found no other explanation, and ACCESS_ONCE() here will surely not hurt. If gcc does re-access the pte before passing it down, then that would be disastrous for correct page fault handling, and certainly could explain the page_mapped() BUGs seen (concurrent fault causing page to be mapped in a second time on top of itself: mapcount 2 for a single pte). Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-