- 22 Feb, 2014 40 commits
-
-
Bjørn Mork authored
commit 67847bae upstream. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
K. Y. Srinivasan authored
commit 269f9794 upstream. When the guest attempts to connect with the host when there may already be a connection with the host (as would be the case during the kdump/kexec path), it is difficult to guarantee timely response from the host. Starting with WS2012 R2, the host supports this ability to re-connect with the host (explicitly to support kexec). Prior to responding to the guest, the host needs to ensure that device states based on the previous connection to the host have been properly torn down. This may introduce unbounded delays. To deal with this issue, don't do a timed wait during the initial connect with the host. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Martyn Welch authored
commit f0342e66 upstream. In order to ensure the correct width cycles on the VME bus, the VME bridge drivers implement an algorithm to utilise the largest possible width reads and writes whilst maintaining natural alignment constraints. The algorithm currently looks at the start address rather than the current read/write address when determining whether a 16-bit width cycle is required to get to 32-bit alignment. This results in incorrect alignment, Reported-by: Jim Strouth <james.strouth@ge.com> Tested-by: Jim Strouth <james.strouth@ge.com> Signed-off-by: Martyn Welch <martyn.welch@ge.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexander Usyskin authored
commit 5cb906c7 upstream. Don't set read callback to NULL during reset as this leads to memory leak of both cb and its buffer. The memory is correctly freed during mei_release. The memory leak is detectable by kmemleak if application has open read call while system is going through suspend/resume. unreferenced object 0xecead780 (size 64): comm "AsyncTask #1", pid 1018, jiffies 4294949621 (age 152.440s) hex dump (first 32 bytes): 00 01 10 00 00 02 20 00 00 bf 30 f1 00 00 00 00 ...... ...0..... 00 00 00 00 00 00 00 00 36 01 00 00 00 70 da e2 ........6....p.. backtrace: [<c1a60aec>] kmemleak_alloc+0x3c/0xa0 [<c131ed56>] kmem_cache_alloc_trace+0xc6/0x190 [<c16243c9>] mei_io_cb_init+0x29/0x50 [<c1625722>] mei_cl_read_start+0x102/0x360 [<c16268f3>] mei_read+0x103/0x4e0 [<c1324b09>] vfs_read+0x89/0x160 [<c1324d5f>] SyS_read+0x4f/0x80 [<c1a7b318>] syscall_call+0x7/0xb [<ffffffff>] 0xffffffff unreferenced object 0xe2da7000 (size 512): comm "AsyncTask #1", pid 1018, jiffies 4294949621 (age 152.440s) hex dump (first 32 bytes): 00 6c da e2 7c 00 00 00 00 00 00 00 c0 eb 0c 59 .l..|..........Y 1b 00 00 00 01 00 00 00 02 10 00 00 01 00 00 00 ................ backtrace: [<c1a60aec>] kmemleak_alloc+0x3c/0xa0 [<c131f127>] __kmalloc+0xe7/0x1d0 [<c162447e>] mei_io_cb_alloc_resp_buf+0x2e/0x60 [<c162574c>] mei_cl_read_start+0x12c/0x360 [<c16268f3>] mei_read+0x103/0x4e0 [<c1324b09>] vfs_read+0x89/0x160 [<c1324d5f>] SyS_read+0x4f/0x80 [<c1a7b318>] syscall_call+0x7/0xb [<ffffffff>] 0xffffffff Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexander Usyskin authored
commit 30c54df7 upstream. Clear write callbacks sitting in write_waiting list on reset. Otherwise these callbacks are left dangling and cause memory leak. Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Takashi Iwai authored
commit f88abaa0 upstream. The very same fixup is needed to make the mic on Sony VAIO Pro 11 working as well as VAIO Pro 13 model. Reported-and-tested-by: Hendrik-Jan Heins <hjheins@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steven Rostedt (Red Hat) authored
commit 87fbb2ac upstream. When the conversion was made to remove stop machine and use the breakpoint logic instead, the modification of the function graph caller is still done directly as though it was being done under stop machine. As it is not converted via stop machine anymore, there is a possibility that the code could be layed across cache lines and if another CPU is accessing that function graph call when it is being updated, it could cause a General Protection Fault. Convert the update of the function graph caller to use the breakpoint method as well. Cc: H. Peter Anvin <hpa@zytor.com> Fixes: 08d636b6 "ftrace/x86: Have arch x86_64 use breakpoints instead of stop machine" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
H. Peter Anvin authored
commit 4640c7ee upstream. If CONFIG_X86_SMAP is disabled, smap_violation() tests for conditions which are incorrect (as the AC flag doesn't matter), causing spurious faults. The dynamic disabling of SMAP (nosmap on the command line) is fine because it disables X86_FEATURE_SMAP, therefore causing the static_cpu_has() to return false. Found by Fengguang Wu's test system. [ v3: move all predicates into smap_violation() ] [ v2: use IS_ENABLED() instead of #ifdef ] Reported-by: Fengguang Wu <fengguang.wu@intel.com> Link: http://lkml.kernel.org/r/20140213124550.GA30497@localhostSigned-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
H. Peter Anvin authored
commit 03bbd596 upstream. If SMAP support is not compiled into the kernel, don't enable SMAP in CR4 -- in fact, we should clear it, because the kernel doesn't contain the proper STAC/CLAC instructions for SMAP support. Found by Fengguang Wu's test system. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Link: http://lkml.kernel.org/r/20140213124550.GA30497@localhostSigned-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marcus Folkesson authored
commit c76782d1 upstream. This is necessary since timestamp is calculated as the last element in iio_compute_scan_bytes(). Without this fix any userspace code reading the layout of the buffer via sysfs will incorrectly interpret the data leading some nasty corruption. Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com> Acked-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hartmut Knaack authored
commit 38408d05 upstream. Only free an IRQ in error_free_irq, if it has been requested previously. Signed-off-by: Hartmut Knaack <knaack.h@gmx.de> Acked-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
H Hartley Sweeten authored
commit 1e85c1ea upstream. The last value written to a analog output channel is cached in the private data of this driver for readback. Currently, the wrong value is cached in the (*insn_write) functions. The current code stores the data[n] value for readback afer the loop has written all the values. At this time 'n' points past the end of the data array. Fix the functions by using a local variable to hold the data being written to the analog output channel. This variable is then used after the loop is complete to store the readback value. The current value is retrieved before the loop in case no values are actually written.. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Reviewed-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bjørn Mork authored
commit f948dcf9 upstream. This device was mentioned in an OpenWRT forum. Seems to have a "standard" Sierra Wireless ifnumber to function layout: 0: qcdm 2: nmea 3: modem 8: qmi 9: storage Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Petr Písař authored
commit 0930b095 upstream. \E[3J console code (secure clear screen) needs to update_screen(vc) in order to write-through blanks into off-screen video memory. This has been removed accidentally in 3.6 by: commit 81732c3b Author: Jean-François Moine <moinejf@free.fr> Date: Thu Sep 6 19:24:13 2012 +0200 tty vt: Fix line garbage in virtual console on command line edition Signed-off-by: Petr Písař <petr.pisar@atlas.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christian König authored
commit b927e1c2 upstream. Otherwise decoding isn't really useable. bug: https://bugs.freedesktop.org/show_bug.cgi?id=71448Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Deucher authored
commit 858a41c8 upstream. Otherwise decoding isn't really useable. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lars Poeschel authored
commit 3ac06b90 upstream. 3GPP TS 07.10 states in section 5.4.6.3.7: "The length byte contains the value 2 or 3 ... depending on the break signal." The break byte is optional and if it is sent, the length is 3. In fact the driver was not able to work with modems that send this break byte in their modem status control message. If the modem just sends the break byte if it is really set, then weird things might happen. The code for deconding the modem status to the internal linux presentation in gsm_process_modem has already a big comment about this 2 or 3 byte length thing and it is already able to decode the brk, but the code calling the gsm_process_modem function in gsm_control_modem does not encode it and hand it over the right way. This patch fixes this. Without this fix if the modem sends the brk byte in it's modem status control message the driver will hang when opening a muxed channel. Signed-off-by: Lars Poeschel <poeschel@lemonage.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
NeilBrown authored
commit 2ec197db upstream. If an NFS client attempts to get a lock (using NLM) and the lock is not available, the server will remember the request and when the lock becomes available it will send a GRANT request to the client to provide the lock. If the client already held an adjacent lock, the GRANT callback will report the union of the existing and new locks, which can confuse the client. This happens because __posix_lock_file (called by vfs_lock_file) updates the passed-in file_lock structure when adjacent or over-lapping locks are found. To avoid this problem we take a copy of the two fields that can be changed (fl_start and fl_end) before the call and restore them afterwards. An alternate would be to allocate a 'struct file_lock', initialise it, use locks_copy_lock() to take a copy, then locks_release_private() after the vfs_lock_file() call. But that is a lot more work. Reported-by: Olaf Kirch <okir@suse.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -- v1 had a couple of issues (large on-stack struct and didn't really work properly). This version is much better tested. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Doug Anderson authored
commit d3d89c46 upstream. The ntc thermistor code was doing math whose temporary result might have overflowed 32-bits. We need some casts in there to make it safe. In one example I found: - pullup_uV: 1800000 - result of iio_read_channel_raw: 3226 - 1800000 * 3226 => 0x15a1cbc80 Signed-off-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paul Bolle authored
commit 5bbb2ae3 upstream. bind_get() checks the device number it is called with. It uses MAX_RAW_MINORS for the upper bound. But MAX_RAW_MINORS is set at compile time while the actual number of raw devices can be set at runtime. This means the test can either be too strict or too lenient. And if the test ends up being too lenient bind_get() might try to access memory beyond what was allocated for "raw_devices". So check against the runtime value (max_raw_minors) in this function. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kleber Sacilotto de Souza authored
commit 14e2abb7 upstream. On IBM pseries systems the device_type device-tree property of a PCIe bridge contains the string "pciex". The of_bus_pci_match() function was looking only for "pci" on this property, so in such cases the bus matching code was falling back to the default bus, causing problems on functions that should be using "assigned-addresses" for region address translation. This patch fixes the problem by also looking for "pciex" on the PCI bus match function. v2: added comment Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Acked-by: Grant Likely <grant.likely@linaro.org> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Emmanuel Grumbach authored
commit 8e2a866e upstream. Not doing so will let BT kill our probe requests leading to failures in scan. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Emmanuel Grumbach authored
commit b900a87b upstream. This can be useful to be able to spot the firmware version from the error reports without needing to fetch it from another place. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Emmanuel Grumbach authored
commit c5128654 upstream. The driver wasn't reading the NVM properly. While this didn't lead to any issue until now, it seems that there is an old version of the NVM in the wild. In this version, the A band channels appear to be valid but the SKU capabilities (another field of the NVM) says that A band isn't supported at all. With this specific version of the NVM, the driver would think that A band is supported while the HW / firmware don't. This leads to asserts. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Geert Uytterhoeven authored
commit 1f802f82 upstream. This reverts commit e120cc0d. It causes a NULL pointer dereference with drivers using the generic spi_transfer_one_message(), which always calls spi_finalize_current_message(), which zeroes master->cur_msg. Drivers implementing transfer_one_message() theirselves must always call spi_finalize_current_message(), even if the transfer failed: * @transfer_one_message: the subsystem calls the driver to transfer a single * message while queuing transfers that arrive in the meantime. When the * driver is finished with this message, it must call * spi_finalize_current_message() so the subsystem can issue the next * transfer Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org> Signed-off-by: Mark Brown <broonie@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Martin Schwidefsky authored
commit 8d7f6690 upstream. The kernel currently crashes with a low-address-protection exception if a user space process executes an instruction that tries to use the linkage stack. Set the base-ASTE origin and the subspace-ASTE origin of the dispatchable-unit-control-table to point to a dummy ASTE. Set up control register 15 to point to an empty linkage stack with no room left. A user space process with a linkage stack instruction will still crash but with a different exception which is correctly translated to a segmentation fault instead of a kernel oops. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Holzheu authored
commit d7736ff5 upstream. Dumps created by kdump or zfcpdump can contain invalid memory holes when dumping z/VM systems that have memory pressure. For example: # zgetdump -i /proc/vmcore. Memory map: 0000000000000000 - 0000000000bfffff (12 MB) 0000000000e00000 - 00000000014fffff (7 MB) 000000000bd00000 - 00000000f3bfffff (3711 MB) The memory detection function find_memory_chunks() issues tprot to find valid memory chunks. In case of CMM it can happen that pages are marked as unstable via set_page_unstable() in arch_free_page(). If z/VM has released that pages, tprot returns -EFAULT and indicates a memory hole. So fix this and switch off CMM in case of kdump or zfcpdump. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Oleksij Rempel authored
commit 4fcfc744 upstream. Raw id and FW id should be switched. Tested-by: Oleksij Rempel <linux@rempel-privat.de> Signed-off-by: Oleksij Rempel <linux@rempel-privat.de> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stanislaw Gruszka authored
commit 2fa4cb90 upstream. sta_rc_update() callback must be atomic, hence we can not take mutexes or do other operations, which can sleep in ath9k_htc_sta_rc_update(). I think we can just return from ath9k_htc_sta_rc_update(), if it is called without IEEE80211_RC_SUPP_RATES_CHANGED bit. That will help with scheduling while atomic bug for most cases (except mesh and IBSS modes). For mesh and IBSS I do not see other solution like creating additional workqueue, because sending firmware command require us to sleep, but this can be done in additional patch. Patch partially fixes bug: https://bugzilla.redhat.com/show_bug.cgi?id=990955Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Johannes Berg authored
commit 338f977f upstream. The "new" fragmentation code (since my rewrite almost 5 years ago) erroneously sets skb->len rather than using skb_trim() to adjust the length of the first fragment after copying out all the others. This leaves the skb tail pointer pointing to after where the data originally ended, and thus causes the encryption MIC to be written at that point, rather than where it belongs: immediately after the data. The impact of this is that if software encryption is done, then a) encryption doesn't work for the first fragment, the connection becomes unusable as the first fragment will never be properly verified at the receiver, the MIC is practically guaranteed to be wrong b) we leak up to 8 bytes of plaintext (!) of the packet out into the air This is only mitigated by the fact that many devices are capable of doing encryption in hardware, in which case this can't happen as the tail pointer is irrelevant in that case. Additionally, fragmentation is not used very frequently and would normally have to be configured manually. Fix this by using skb_trim() properly. Fixes: 2de8e0d9 ("mac80211: rewrite fragmentation") Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Emmanuel Grumbach authored
commit 0297ea17 upstream. When the driver cannot start the AP or when the assignement of the beacon goes wrong, we need to unassign the vif. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eliad Peller authored
commit 2f617435 upstream. ieee80211_start_roc_work() might add a new roc to existing roc, and tell cfg80211 it has already started. However, this might happen before the roc cookie was set, resulting in REMAIN_ON_CHANNEL (started) event with null cookie. Consequently, it can make wpa_supplicant go out of sync. Fix it by setting the roc cookie earlier. Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steve French authored
commit 83e3bc23 upstream. The get/set ACL xattr support for CIFS ACLs attempts to send old cifs dialect protocol requests even when mounted with SMB2 or later dialects. Sending cifs requests on an smb2 session causes problems - the server drops the session due to the illegal request. This patch makes CIFS ACL operations protocol specific to fix that. Attempting to query/set CIFS ACLs for SMB2 will now return EOPNOTSUPP (until we add worker routines for sending query ACL requests via SMB2) instead of sending invalid (cifs) requests. A separate followon patch will be needed to fix cifs_acl_to_fattr (which takes a cifs specific u16 fid so can't be abstracted to work with SMB2 until that is changed) and will be needed to fix mount problems when "cifsacl" is specified on mount with e.g. vers=2.1 Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steve French authored
commit d979f3b0 upstream. Changeset 666753c3 added protocol operations for get/setxattr to avoid calling cifs operations on smb2/smb3 mounts for xattr operations and this changeset adds the calls to cifs specific protocol operations for xattrs (in order to reenable cifs support for xattrs which was temporarily disabled by the previous changeset. We do not have SMB2/SMB3 worker function for setting xattrs yet so this only enables it for cifs. CCing stable since without these two small changsets (its small coreq 666753c3 is also needed) calling getfattr/setfattr on smb2/smb3 mounts causes problems. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steve French authored
commit 666753c3 upstream. When mounting with smb2 (or smb2.1 or smb3) we need to check to make sure that attempts to query or set extended attributes do not attempt to send the request with the older cifs protocol instead (eventually we also need to add the support in SMB2 to query/set extended attributes but this patch prevents us from using the wrong protocol for extended attribute operations). Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Naoya Horiguchi authored
commit 8d547ff4 upstream. mce-test detected a test failure when injecting error to a thp tail page. This is because we take page refcount of the tail page in madvise_hwpoison() while the fix in commit a3e0f9e4 ("mm/memory-failure.c: transfer page count from head page to tail page after split thp") assumes that we always take refcount on the head page. When a real memory error happens we take refcount on the head page where memory_failure() is called without MF_COUNT_INCREASED set, so it seems to me that testing memory error on thp tail page using madvise makes little sense. This patch cancels moving refcount in !MF_COUNT_INCREASED for valid testing. [akpm@linux-foundation.org: s/&&/&/] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric W. Biederman authored
commit 96c7a2ff upstream. Recently due to a spike in connections per second memcached on 3 separate boxes triggered the OOM killer from accept. At the time the OOM killer was triggered there was 4GB out of 36GB free in zone 1. The problem was that alloc_fdtable was allocating an order 3 page (32KiB) to hold a bitmap, and there was sufficient fragmentation that the largest page available was 8KiB. I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious but I do agree that order 3 allocations are very likely to succeed. There are always pathologies where order > 0 allocations can fail when there are copious amounts of free memory available. Using the pigeon hole principle it is easy to show that it requires 1 page more than 50% of the pages being free to guarantee an order 1 (8KiB) allocation will succeed, 1 page more than 75% of the pages being free to guarantee an order 2 (16KiB) allocation will succeed and 1 page more than 87.5% of the pages being free to guarantee an order 3 allocate will succeed. A server churning memory with a lot of small requests and replies like memcached is a common case that if anything can will skew the odds against large pages being available. Therefore let's not give external applications a practical way to kill linux server applications, and specify __GFP_NORETRY to the kmalloc in alloc_fdmem. Unless I am misreading the code and by the time the code reaches should_alloc_retry in __alloc_pages_slowpath (where __GFP_NORETRY becomes signification). We have already tried everything reasonable to allocate a page and the only thing left to do is wait. So not waiting and falling back to vmalloc immediately seems like the reasonable thing to do even if there wasn't a chance of triggering the OOM killer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Cong Wang <cwang@twopensource.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Frediano Ziglio authored
commit 7cde9b27 upstream. Due to the way kernel is initialized under Xen is possible that the ring1 selector used by the kernel for the boot cpu end up to be copied to userspace leading to segmentation fault in the userspace. Xen code in the kernel initialize no-boot cpus with correct selectors (ds and es set to __USER_DS) but the boot one keep the ring1 (passed by Xen). On task context switch (switch_to) we assume that ds, es and cs already point to __USER_DS and __KERNEL_CSso these selector are not changed. If processor is an Intel that support sysenter instruction sysenter/sysexit is used so ds and es are not restored switching back from kernel to userspace. In the case the selectors point to a ring1 instead of __USER_DS the userspace code will crash on first memory access attempt (to be precise Xen on the emulated iret used to do sysexit will detect and set ds and es to zero which lead to GPF anyway). Now if an userspace process call kernel using sysenter and get rescheduled (for me it happen on a specific init calling wait4) could happen that the ring1 selector is set to ds and es. This is quite hard to detect cause after a while these selectors are fixed (__USER_DS seems sticky). Bisecting the code commit 7076aada appears to be the first one that have this issue. Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Vrabel authored
commit 0160676b upstream. On hosts with more than 168 GB of memory, a 32-bit guest may attempt to grant map an MFN that is error cannot lookup in its mapping of the m2p table. There is an m2p lookup as part of m2p_add_override() and m2p_remove_override(). The lookup falls off the end of the mapped portion of the m2p and (because the mapping is at the highest virtual address) wraps around and the lookup causes a fault on what appears to be a user space address. do_page_fault() (thinking it's a fault to a userspace address), tries to lock mm->mmap_sem. If the gntdev device is used for the grant map, m2p_add_override() is called from from gnttab_mmap() with mm->mmap_sem already locked. do_page_fault() then deadlocks. The deadlock would most commonly occur when a 64-bit guest is started and xenconsoled attempts to grant map its console ring. Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the mapped portion of the m2p table before accessing the table and use this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn() (which already had the correct range check). All faults caused by accessing the non-existant parts of the m2p are thus within the kernel address space and exception_fixup() is called without trying to lock mm->mmap_sem. This means that for MFNs that are outside the mapped range of the m2p then mfn_to_pfn() will always look in the m2p overrides. This is correct because it must be a foreign MFN (and the PFN in the m2p in this case is only relevant for the other domain). v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides() v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of range as it's probably foreign. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Stefano Stabellini <stefano.stabellini@citrix.com> Cc: Jan Beulich <JBeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Vrabel authored
commit 36613717 upstream. Backend drivers shouldn't transistion to CLOSED unless the frontend is CLOSED. If a backend does transition to CLOSED too soon then the frontend may not see the CLOSING state and will not properly shutdown. So, treat an unexpected backend CLOSED state the same as CLOSING. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-