- 06 Feb, 2014 10 commits
-
-
Marcelo Tosatti authored
commit 9ed96e87 upstream. Limit PIT timer frequency similarly to the limit applied by LAPIC timer. Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dave Young authored
commit a7f84f03 upstream. Current code check boot service region with kernel text region by: start+size >= __pa_symbol(_text) The end of the above region should be start + size - 1 instead. I see this problem in ovmf + Fedora 19 grub boot: text start: 1000000 md start: 800000 md size: 800000 Signed-off-by: Dave Young <dyoung@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Toshi Kani <toshi.kani@hp.com> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
PaX Team authored
commit 2def2ef2 upstream. The x32 case for the recvmsg() timout handling is broken: asmlinkage long compat_sys_recvmmsg(int fd, struct compat_mmsghdr __user *mmsg, unsigned int vlen, unsigned int flags, struct compat_timespec __user *timeout) { int datagrams; struct timespec ktspec; if (flags & MSG_CMSG_COMPAT) return -EINVAL; if (COMPAT_USE_64BIT_TIME) return __sys_recvmmsg(fd, (struct mmsghdr __user *)mmsg, vlen, flags | MSG_CMSG_COMPAT, (struct timespec *) timeout); ... The timeout pointer parameter is provided by userland (hence the __user annotation) but for x32 syscalls it's simply cast to a kernel pointer and is passed to __sys_recvmmsg which will eventually directly dereference it for both reading and writing. Other callers to __sys_recvmmsg properly copy from userland to the kernel first. The bug was introduced by commit ee4fa23c ("compat: Use COMPAT_USE_64BIT_TIME in net/compat.c") and should affect all kernels since 3.4 (and perhaps vendor kernels if they backported x32 support along with this code). Note that CONFIG_X86_X32_ABI gets enabled at build time and only if CONFIG_X86_X32 is enabled and ld can build x32 executables. Other uses of COMPAT_USE_64BIT_TIME seem fine. This addresses CVE-2014-0038. Signed-off-by: PaX Team <pageexec@freemail.hu> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Rientjes authored
commit 8790c71a upstream. As a result of commit 5606e387 ("mm: numa: Migrate on reference policy"), /proc/<pid>/numa_maps prints the mempolicy for any <pid> as "prefer:N" for the local node, N, of the process reading the file. This should only be printed when the mempolicy of <pid> is MPOL_PREFERRED for node N. If the process is actually only using the default mempolicy for local node allocation, make sure "default" is printed as expected. Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Robert Lippert <rlippert@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> [3.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Ertman authored
commit 9e6c3b63 upstream. This patch is to fix a compiler warning of __bad_udelay due to a value of >999 being passed as a parameter to udelay() in the function e1000e_phy_has_link_generic(). This affects the gcc compiler when it is given a flag of -O3 and the icc compiler. This patch is also making the change from mdelay() to msleep() in the same function, since it was determined though code inspection that this function is never called in atomic context. Signed-off-by: David Ertman <davidx.m.ertman@intel.com> Acked-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Aristeu Rozanski authored
commit 90ed4988 upstream. In case the device 0, function 1 is not found using pci_get_device(), pci_scan_single_device() will be used but, differently than pci_get_device(), it allocates a pci_dev but doesn't does bump the usage count on the pci_dev and after few module removals and loads the pci_dev will be freed. Signed-off-by: Aristeu Rozanski <aris@redhat.com> Reviewed-by: mark gross <mark.gross@intel.com> Link: http://lkml.kernel.org/r/20131205153755.GL4545@redhat.comSigned-off-by: Borislav Petkov <bp@suse.de> Cc: Jean Delvare <jdelvare@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andrea Arcangeli authored
commit 27c73ae7 upstream. Commit 7cb2ef56 ("mm: fix aio performance regression for database caused by THP") can cause dereference of a dangling pointer if split_huge_page runs during PageHuge() if there are updates to the tail_page->private field. Also it is repeating compound_head twice for hugetlbfs and it is running compound_head+compound_trans_head for THP when a single one is needed in both cases. The new code within the PageSlab() check doesn't need to verify that the THP page size is never bigger than the smallest hugetlbfs page size, to avoid memory corruption. A longstanding theoretical race condition was found while fixing the above (see the change right after the skip_unlock label, that is relevant for the compound_lock path too). By re-establishing the _mapcount tail refcounting for all compound pages, this also fixes the below problem: echo 0 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages BUG: Bad page state in process bash pfn:59a01 page:ffffea000139b038 count:0 mapcount:10 mapping: (null) index:0x0 page flags: 0x1c00000000008000(tail) Modules linked in: CPU: 6 PID: 2018 Comm: bash Not tainted 3.12.0+ #25 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x55/0x76 bad_page+0xd5/0x130 free_pages_prepare+0x213/0x280 __free_pages+0x36/0x80 update_and_free_page+0xc1/0xd0 free_pool_huge_page+0xc2/0xe0 set_max_huge_pages.part.58+0x14c/0x220 nr_hugepages_store_common.isra.60+0xd0/0xf0 nr_hugepages_store+0x13/0x20 kobj_attr_store+0xf/0x20 sysfs_write_file+0x189/0x1e0 vfs_write+0xc5/0x1f0 SyS_write+0x55/0xb0 system_call_fastpath+0x16/0x1b Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: Pravin Shelar <pshelar@nicira.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Guillaume Morin <guillaume@morinfr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ilia Mirkin authored
commit 5d2f4767 upstream. The only BIOS on record that needs the 14 offset has a bios major version 2 but BMP version 1.01. Another bunch of BIOSes that need the 18 offset have BMP version 2.01 or 5.01 or higher. So instead of looking at the bios major version, look at the BMP version. BIOSes with BMP version 0 do not contain a detectable script, so always return 0 for them. See https://bugs.freedesktop.org/show_bug.cgi?id=68835Reported-by: Mauro Molinari <mauromol@tiscali.it> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
NeilBrown authored
commit 9f97e4b1 upstream. Before a write starts we set a bit in the write-intent bitmap. When the write completes we clear that bit if the write was successful to all devices. However if the write wasn't fully successful we should not clear the bit. If the faulty drive is subsequently re-added, the fact that the bit is still set ensure that we will re-write the data that is missing. This logic is mediated by the STRIPE_DEGRADED flag - we only clear the bitmap bit when this flag is not set. Currently we correctly set the flag if a write starts when some devices are failed or missing. But we do *not* set the flag if some device failed during the write attempt. This is wrong and can result in clearing the bit inappropriately. So: set the flag when a write fails. This bug has been present since bitmaps were introduces, so the fix is suitable for any -stable kernel. Reported-by: Ethan Wilson <ethan.wilson@shiftmail.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andrew Jones authored
commit 0dce7cd6 upstream. Commit e66d2ae7 moved the assignment vcpu->arch.apic_base = value above a condition with (vcpu->arch.apic_base ^ value), causing that check to always fail. Use old_value, vcpu->arch.apic_base's old value, in the condition instead. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 25 Jan, 2014 28 commits
-
-
Greg Kroah-Hartman authored
-
Taras Kondratiuk authored
commit b25f3e1c upstream. Kexec disables outer cache before jumping to reboot code, but it doesn't flush it explicitly. Flush is done implicitly inside of l2x0_disable(). But some SoC's override default .disable handler and don't flush cache. This may lead to a corrupted memory during Kexec reboot on these platforms. This patch adds cache flush inside of OMAP4 and Highbank outer_cache.disable() handlers to make it consistent with default l2x0_disable(). Acked-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Wang Nan <wangnan0@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ville Syrjälä authored
commit 7ad228b1 upstream. When the pipe A force quirk is applied the code will attempt to grab a crtc mutex during intel_modeset_setup_hw_state(). If we're already holding all crtc mutexes this will obviously deadlock every time. So instead of using drm_modeset_lock_all() just grab the mode_config.mutex. This is enough to avoid the unlocked mutex warnings from certain lower level functions. The regression was introduced in: commit 02747664 Author: Ville Syrjälä <ville.syrjala@linux.intel.com> Date: Mon Dec 2 11:08:06 2013 +0200 drm/i915: Take modeset locks around intel_modeset_setup_hw_state() Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> [danvet: Add cc: stable since the offending commit has that, too.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sudeep Holla authored
commit e44ef891 upstream. The MPIDR contains specific bitfields(MPIDR.Aff{2..0}) which uniquely identify a CPU, in addition to some non-identifying information and reserved bits. The ARM cpu binding defines the 'reg' property to only contain the affinity bits, and any cpu nodes with other bits set in their 'reg' entry are skipped. As such it is not necessary to mask the phys_id with MPIDR_HWID_BITMASK, and doing so could lead to matching erroneous CPU nodes in the device tree. This patch removes the masking of the physical identifier. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jon Medhurst authored
commit fe433907 upstream. When the pl011 is being used for a console, pl011_console_write forces the control register (CR) to enable the UART for transmission and then restores this to the original value afterwards. It does this while holding the port lock. Unfortunately, when the uart is started or shutdown - say in response to userland using the serial device for a terminal - then this updates the control register without any locking. This means we can have pl011_console_write Save CR pl011_startup Initialise CR, e.g. enable receive pl011_console_write Restore old CR with receive not enabled this result is a serial port which doesn't respond to any input. A similar race in reverse could happen when the device is shutdown. We can fix these problems by taking the port lock when updating CR. Signed-off-by: Jon Medhurst <tixy@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Geert Uytterhoeven authored
commit f92f455f upstream. {,set}page_address() are macros if WANT_PAGE_VIRTUAL. If !WANT_PAGE_VIRTUAL, they're plain C functions. If someone calls them with a void *, this pointer is auto-converted to struct page * if !WANT_PAGE_VIRTUAL, but causes a build failure on architectures using WANT_PAGE_VIRTUAL (arc, m68k and sparc64): drivers/md/bcache/bset.c: In function `__btree_sort': drivers/md/bcache/bset.c:1190: warning: dereferencing `void *' pointer drivers/md/bcache/bset.c:1190: error: request for member `virtual' in something not a structure or union Convert them to static inline functions to fix this. There are already plenty of users of struct page members inside <linux/mm.h>, so there's no reason to keep them as macros. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
NeilBrown authored
commit 1cc03eb9 upstream. commit 5d8c71f9 md: raid5 crash during degradation Fixed a crash in an overly simplistic way which could leave R5_WriteError or R5_MadeGood set in the stripe cache for devices for which it is no longer relevant. When those devices are removed and spares added the flags are still set and can cause incorrect behaviour. commit 14a75d3e md/raid5: preferentially read from replacement device if possible. Fixed the same bug if a more effective way, so we can now revert the original commit. Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com> Fixes: 5d8c71f9Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
NeilBrown authored
commit b50c259e upstream. If we discover a bad block when reading we split the request and potentially read some of it from a different device. The code path of this has two bugs in RAID10. 1/ we get a spin_lock with _irq, but unlock without _irq!! 2/ The calculation of 'sectors_handled' is wrong, as can be clearly seen by comparison with raid1.c This leads to at least 2 warnings and a probable crash is a RAID10 ever had known bad blocks. Fixes: 856e08e2Reported-by: Damian Nowak <spam@nowaker.net> URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
NeilBrown authored
commit e8b84915 upstream. commit e875ecea md/raid10 record bad blocks as needed during recovery. added code to the "cannot recover this block" path to record a bad block rather than fail the whole recovery. Unfortunately this new case was placed *after* r10bio was freed rather than *before*, yet it still uses r10bio. This is will crash with a null dereference. So move the freeing of r10bio down where it is safe. Fixes: e875eceaReported-by: Damian Nowak <spam@nowaker.net> URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
NeilBrown authored
commit 8313b8e5 upstream. If an array is started degraded, and then the missing device is found it can be re-added and a minimal bitmap-based recovery will bring it fully up-to-date. If the array is read-only a recovery would not be allowed. But also if the array is read-only and the missing device was present very recently, then there could be no need for any recovery at all, so we simply include the device in the read-only array without any recovery. However... if the missing device was removed a little longer ago it could be missing some updates, but if a bitmap is present it will be conditionally accepted pending a bitmap-based update. We don't currently detect this case properly and will include that old device into the read-only array with no recovery even though it really needs a recovery. This patch keeps track of whether a bitmap-based-recovery is really needed or not in the new Bitmap_sync rdev flag. If that is set, then the device will not be added to a read-only array. Cc: Andrei Warkentin <andreiw@vmware.com> Fixes: d70ed2e4Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paulo Zanoni authored
commit 0882dae9 upstream. Properly zero the refcounts and crtc->ddi_pll_set so the previous HW state doesn't affect the result of reading the current HW state. This fixes WARNs about WRPLL refcount if we have an HDMI monitor on HSW and then suspend/resume. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64379Tested-by: Qingshuai Tian <qingshuai.tian@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andreas Rohner authored
commit 70f2fe3a upstream. There is a bug in the function nilfs_segctor_collect, which results in active data being written to a segment, that is marked as clean. It is possible, that this segment is selected for a later segment construction, whereby the old data is overwritten. The problem shows itself with the following kernel log message: nilfs_sufile_do_cancel_free: segment 6533 must be clean Usually a few hours later the file system gets corrupted: NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0 NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660) The issue can be reproduced with a file system that is nearly full and with the cleaner running, while some IO intensive task is running. Although it is quite hard to reproduce. This is what happens: 1. The cleaner starts the segment construction 2. nilfs_segctor_collect is called 3. sc_stage is on NILFS_ST_SUFILE and segments are freed 4. sc_stage is on NILFS_ST_DAT current segment is full 5. nilfs_segctor_extend_segments is called, which allocates a new segment 6. The new segment is one of the segments freed in step 3 7. nilfs_sufile_cancel_freev is called and produces an error message 8. Loop around and the collection starts again 9. sc_stage is on NILFS_ST_SUFILE and segments are freed including the newly allocated segment, which will contain active data and can be allocated at a later time 10. A few hours later another segment construction allocates the segment and causes file system corruption This can be prevented by simply reordering the statements. If nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments the freed segments are marked as dirty and cannot be allocated any more. Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Reviewed-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mikulas Patocka authored
commit 03e5ac2f upstream. Commit 8456a648 ("slab: use struct page for slab management") causes a crash in the LVM2 testsuite on PA-RISC (the crashing test is fsadm.sh). The testsuite doesn't crash on 3.12, crashes on 3.13-rc1 and later. Bad Address (null pointer deref?): Code=15 regs=000000413edd89a0 (Addr=000006202224647d) CPU: 3 PID: 24008 Comm: loop0 Not tainted 3.13.0-rc6 #5 task: 00000001bf3c0048 ti: 000000413edd8000 task.ti: 000000413edd8000 YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI PSW: 00001000000001101111100100001110 Not tainted r00-03 000000ff0806f90e 00000000405c8de0 000000004013e6c0 000000413edd83f0 r04-07 00000000405a95e0 0000000000000200 00000001414735f0 00000001bf349e40 r08-11 0000000010fe3d10 0000000000000001 00000040829c7778 000000413efd9000 r12-15 0000000000000000 000000004060d800 0000000010fe3000 0000000010fe3000 r16-19 000000413edd82a0 00000041078ddbc0 0000000000000010 0000000000000001 r20-23 0008f3d0d83a8000 0000000000000000 00000040829c7778 0000000000000080 r24-27 00000001bf349e40 00000001bf349e40 202d66202224640d 00000000405a95e0 r28-31 202d662022246465 000000413edd88f0 000000413edd89a0 0000000000000001 sr00-03 000000000532c000 0000000000000000 0000000000000000 000000000532c000 sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000401fe42c 00000000401fe430 IIR: 539c0030 ISR: 00000000202d6000 IOR: 000006202224647d CPU: 3 CR30: 000000413edd8000 CR31: 0000000000000000 ORIG_R28: 00000000405a95e0 IAOQ[0]: vma_interval_tree_iter_first+0x14/0x48 IAOQ[1]: vma_interval_tree_iter_first+0x18/0x48 RP(r2): flush_dcache_page+0x128/0x388 Backtrace: flush_dcache_page+0x128/0x388 lo_splice_actor+0x90/0x148 [loop] splice_from_pipe_feed+0xc0/0x1d0 __splice_from_pipe+0xac/0xc0 lo_direct_splice_actor+0x1c/0x70 [loop] splice_direct_to_actor+0xec/0x228 lo_receive+0xe4/0x298 [loop] loop_thread+0x478/0x640 [loop] kthread+0x134/0x168 end_fault_vector+0x20/0x28 xfs_setsize_buftarg+0x0/0x90 [xfs] Kernel panic - not syncing: Bad Address (null pointer deref?) Commit 8456a648 changes the page structure so that the slab subsystem reuses the page->mapping field. The crash happens in the following way: * XFS allocates some memory from slab and issues a bio to read data into it. * the bio is sent to the loopback device. * lo_receive creates an actor and calls splice_direct_to_actor. * lo_splice_actor copies data to the target page. * lo_splice_actor calls flush_dcache_page because the page may be mapped by userspace. In that case we need to flush the kernel cache. * flush_dcache_page asks for the list of userspace mappings, however that page->mapping field is reused by the slab subsystem for a different purpose. This causes the crash. Note that other architectures without coherent caches (sparc, arm, mips) also call page_mapping from flush_dcache_page, so they may crash in the same way. This patch fixes this bug by testing if the page is a slab page in page_mapping and returning NULL if it is. The patch also fixes VM_BUG_ON(PageSlab(page)) that could happen in earlier kernels in the same scenario on architectures without cache coherence when CONFIG_DEBUG_VM is enabled - so it should be backported to stable kernels. In the old kernels, the function page_mapping is placed in include/linux/mm.h, so you should modify the patch accordingly when backporting it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: John David Anglin <dave.anglin@bell.net>] Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Acked-by: Pekka Enberg <penberg@kernel.org> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Helge Deller <deller@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Qais Yousef authored
commit 5a610fcc upstream. In file included from kernel/crash_dump.c:2:0: include/linux/crash_dump.h:22:27: error: unknown type name `pgprot_t' when CONFIG_CRASH_DUMP=y The error was traced back to commit 9cb21813 ("vmcore: introduce remap_oldmem_pfn_range()") include <asm/pgtable.h> to get the missing definition Signed-off-by: Qais Yousef <qais.yousef@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steven Rostedt authored
commit 1739f09e upstream. Function tracing callbacks expect to have the ftrace_ops that registered it passed to them, not the address of the variable that holds the ftrace_ops that registered it. Use a mov instead of a lea to store the ftrace_ops into the parameter of the function tracing callback. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20131113152004.459787f9@gandalf.local.homeSigned-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hugh Dickins authored
commit eecc1e42 upstream. We see General Protection Fault on RSI in copy_page_rep: that RSI is what you get from a NULL struct page pointer. RIP: 0010:[<ffffffff81154955>] [<ffffffff81154955>] copy_page_rep+0x5/0x10 RSP: 0000:ffff880136e15c00 EFLAGS: 00010286 RAX: ffff880000000000 RBX: ffff880136e14000 RCX: 0000000000000200 RDX: 6db6db6db6db6db7 RSI: db73880000000000 RDI: ffff880dd0c00000 RBP: ffff880136e15c18 R08: 0000000000000200 R09: 000000000005987c R10: 000000000005987c R11: 0000000000000200 R12: 0000000000000001 R13: ffffea00305aa000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f195752f700(0000) GS:ffff880c7fc20000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000093010000 CR3: 00000001458e1000 CR4: 00000000000027e0 Call Trace: copy_user_huge_page+0x93/0xab do_huge_pmd_wp_page+0x710/0x815 handle_mm_fault+0x15d8/0x1d70 __do_page_fault+0x14d/0x840 do_page_fault+0x2f/0x90 page_fault+0x22/0x30 do_huge_pmd_wp_page() tests is_huge_zero_pmd(orig_pmd) four times: but since shrink_huge_zero_page() can free the huge_zero_page, and we have no hold of our own on it here (except where the fourth test holds page_table_lock and has checked pmd_same), it's possible for it to answer yes the first time, but no to the second or third test. Change all those last three to tests for NULL page. (Note: this is not the same issue as trinity's DEBUG_PAGEALLOC BUG in copy_page_rep with RSI: ffff88009c422000, reported by Sasha Levin in https://lkml.org/lkml/2013/3/29/103. I believe that one is due to the source page being split, and a tail page freed, while copy is in progress; and not a problem without DEBUG_PAGEALLOC, since the pmd_same check will prevent a miscopy from being made visible.) Fixes: 97ae1749 ("thp: implement refcounting for huge zero page") Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steven Rostedt authored
commit 3dc91d43 upstream. While running stress tests on adding and deleting ftrace instances I hit this bug: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: selinux_inode_permission+0x85/0x160 PGD 63681067 PUD 7ddbe067 PMD 0 Oops: 0000 [#1] PREEMPT CPU: 0 PID: 5634 Comm: ftrace-test-mki Not tainted 3.13.0-rc4-test-00033-gd2a6dde-dirty #20 Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006 task: ffff880078375800 ti: ffff88007ddb0000 task.ti: ffff88007ddb0000 RIP: 0010:[<ffffffff812d8bc5>] [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160 RSP: 0018:ffff88007ddb1c48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000800000 RCX: ffff88006dd43840 RDX: 0000000000000001 RSI: 0000000000000081 RDI: ffff88006ee46000 RBP: ffff88007ddb1c88 R08: 0000000000000000 R09: ffff88007ddb1c54 R10: 6e6576652f6f6f66 R11: 0000000000000003 R12: 0000000000000000 R13: 0000000000000081 R14: ffff88006ee46000 R15: 0000000000000000 FS: 00007f217b5b6700(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M CR2: 0000000000000020 CR3: 000000006a0fe000 CR4: 00000000000007f0 Call Trace: security_inode_permission+0x1c/0x30 __inode_permission+0x41/0xa0 inode_permission+0x18/0x50 link_path_walk+0x66/0x920 path_openat+0xa6/0x6c0 do_filp_open+0x43/0xa0 do_sys_open+0x146/0x240 SyS_open+0x1e/0x20 system_call_fastpath+0x16/0x1b Code: 84 a1 00 00 00 81 e3 00 20 00 00 89 d8 83 c8 02 40 f6 c6 04 0f 45 d8 40 f6 c6 08 74 71 80 cf 02 49 8b 46 38 4c 8d 4d cc 45 31 c0 <0f> b7 50 20 8b 70 1c 48 8b 41 70 89 d9 8b 78 04 e8 36 cf ff ff RIP selinux_inode_permission+0x85/0x160 CR2: 0000000000000020 Investigating, I found that the inode->i_security was NULL, and the dereference of it caused the oops. in selinux_inode_permission(): isec = inode->i_security; rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd); Note, the crash came from stressing the deletion and reading of debugfs files. I was not able to recreate this via normal files. But I'm not sure they are safe. It may just be that the race window is much harder to hit. What seems to have happened (and what I have traced), is the file is being opened at the same time the file or directory is being deleted. As the dentry and inode locks are not held during the path walk, nor is the inodes ref counts being incremented, there is nothing saving these structures from being discarded except for an rcu_read_lock(). The rcu_read_lock() protects against freeing of the inode, but it does not protect freeing of the inode_security_struct. Now if the freeing of the i_security happens with a call_rcu(), and the i_security field of the inode is not changed (it gets freed as the inode gets freed) then there will be no issue here. (Linus Torvalds suggested not setting the field to NULL such that we do not need to check if it is NULL in the permission check). Note, this is a hack, but it fixes the problem at hand. A real fix is to restructure the destroy_inode() to call all the destructor handlers from the RCU callback. But that is a major job to do, and requires a lot of work. For now, we just band-aid this bug with this fix (it works), and work on a more maintainable solution in the future. Link: http://lkml.kernel.org/r/20140109101932.0508dec7@gandalf.local.home Link: http://lkml.kernel.org/r/20140109182756.17abaaa8@gandalf.local.homeSigned-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Kara authored
commit f9b0e058 upstream. Commit 4f8ad655 "writeback: Refactor writeback_single_inode()" added a condition to skip clean inode. However this is wrong in WB_SYNC_ALL mode because there we also want to wait for outstanding writeback on possibly clean inode. This was causing occasional data corruption issues on NFS because it uses sync_inode() to make sure all outstanding writes are flushed to the server before truncating the inode and with sync_inode() returning prematurely file was sometimes extended back by an outstanding write after it was truncated. So modify the test to also check for pages under writeback in WB_SYNC_ALL mode. Fixes: 4f8ad655Reported-and-tested-by: Dan Duval <dan.duval@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jean Delvare authored
commit 3f9aec76 upstream. When the core number exceeds 9, the size of the buffer storing the alarm attribute name is insufficient and the attribute name is truncated. This causes libsensors to skip these attributes as the truncated name is not recognized. Reported-by: Andreas Hollmann <hollmann@in.tum.de> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Warren authored
commit 2fac2b89 upstream. The body of i2c_parent_is_i2c_adapter() is currently guarded by I2C_MUX. It should be CONFIG_I2C_MUX instead. Among potentially other problems, this resulted in i2c_lock_adapter() only locking I2C mux child adapters, and not the parent adapter. In turn, this could allow inter-mingling of mux child selection and I2C transactions, which could result in I2C transactions being directed to the wrong I2C bus, and possibly even switching between busses in the middle of a transaction. One concrete issue caused by this bug was corrupted HDMI EDID reads during boot on the NVIDIA Tegra Seaboard system, although this only became apparent in recent linux-next, when the boot timing was changed just enough to trigger the race condition. Fixes: 3923172b ("i2c: reduce parent checking to a NOOP in non-I2C_MUX case") Cc: Phil Carmody <phil.carmody@partner.samsung.com> Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric W. Biederman authored
commit 1f7f4dde upstream. Serge Hallyn <serge.hallyn@ubuntu.com> writes: > Hi Oleg, > > commit 40a0d32d : > "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks" > breaks lxc-attach in 3.12. That code forks a child which does > setns() and then does a clone(CLONE_PARENT). That way the > grandchild can be in the right namespaces (which the child was > not) and be a child of the original task, which is the monitor. > > lxc-attach in 3.11 was working fine with no side effects that I > could see. Is there a real danger in allowing CLONE_PARENT > when current->nsproxy->pidns_for_children is not our pidns, > or was this done out of an "over-abundance of caution"? Can we > safely revert that new extra check? The two fundamental things I know we can not allow are: - A shared signal queue aka CLONE_THREAD. Because we compute the pid and uid of the signal when we place it in the queue. - Changing the pid and by extention pid_namespace of an existing process. From a parents perspective there is nothing special about the pid namespace, to deny CLONE_PARENT, because the parent simply won't know or care. From the childs perspective all that is special really are shared signal queues. User mode threading with CLONE_PARENT|CLONE_VM|CLONE_SIGHAND and tasks in different pid namespaces is almost certainly going to break because it is complicated. But shared signal handlers can look at per thread information to know which pid namespace a process is in, so I don't know of any reason not to support CLONE_PARENT|CLONE_VM|CLONE_SIGHAND threads at the kernel level. It would be absolutely stupid to implement but that is a different thing. So hmm. Because it can do no harm, and because it is a regression let's remove the CLONE_PARENT check and send it stable. Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric W. Biederman authored
commit 41301ae7 upstream. Gao feng <gaofeng@cn.fujitsu.com> reported that commit e51db735 userns: Better restrictions on when proc and sysfs can be mounted caused a regression on mounting a new instance of proc in a mount namespace created with user namespace privileges, when binfmt_misc is mounted on /proc/sys/fs/binfmt_misc. This is an unintended regression caused by the absolutely bogus empty directory check in fs_fully_visible. The check fs_fully_visible replaced didn't even bother to attempt to verify proc was fully visible and hiding proc files with any kind of mount is rare. So for now fix the userspace regression by allowing directory with nlink == 1 as /proc/sys/fs/binfmt_misc has. I will have a better patch but it is not stable material, or last minute kernel material. So it will have to wait. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Gao feng <gaofeng@cn.fujitsu.com> Tested-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric W. Biederman authored
commit f48cfddc upstream. Aditya Kali (adityakali@google.com) wrote: > Commit bf056bfa: > "proc: Fix the namespace inode permission checks." converted > the namespace files into symlinks. The same commit changed > the way namespace bind mounts appear in /proc/mounts: > $ mount --bind /proc/self/ns/ipc /mnt/ipc > Originally: > $ cat /proc/mounts | grep ipc > proc /mnt/ipc proc rw,nosuid,nodev,noexec 0 0 > > After commit bf056bfa: > $ cat /proc/mounts | grep ipc > proc ipc:[4026531839] proc rw,nosuid,nodev,noexec 0 0 > > This breaks userspace which expects the 2nd field in > /proc/mounts to be a valid path. The symlink /proc/<pid>/ns/{ipc,mnt,net,pid,user,uts} point to dentries allocated with d_alloc_pseudo that we can mount, and that have interesting names printed out with d_dname. When these files are bind mounted /proc/mounts is not currently displaying the mount point correctly because d_dname is called instead of just displaying the path where the file is mounted. Solve this by adding an explicit check to distinguish mounted pseudo inodes and unmounted pseudo inodes. Unmounted pseudo inodes always use mount of their filesstem as the mnt_root in their path making these two cases easy to distinguish. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reported-by: Aditya Kali <adityakali@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
H Hartley Sweeten authored
commit 48108fe3 upstream. The dev->irq passed to request_irq() will always be 0 when the auto_attach function is called. The pcidev->irq should be used instead to get the correct irq number. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Reviewed-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
H Hartley Sweeten authored
commit 90daf69a upstream. The SDF_CMD_READ should be one of the s->subdev_flags not part of the s->type. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Reviewed-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bob Peterson authored
commit 62e96cf8 upstream. This patch calls get_write_access in function gfs2_setattr_chown, which merely increases inode->i_writecount for the duration of the function. That will ensure that any file closes won't delete the inode's multi-block reservation while the function is running. It also ensures that a multi-block reservation exists when needed for quota change operations during the chown. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Robert Richter authored
commit bee09ed9 upstream. On AMD family 10h we see following error messages while waking up from S3 for all non-boot CPUs leading to a failed IBS initialization: Enabling non-boot CPUs ... smpboot: Booting Node 0 Processor 1 APIC 0x1 [Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu perf: IBS APIC setup failed on cpu #1 process: Switch to broadcast mode on CPU1 CPU1 is up ... ACPI: Waking up from system sleep state S3 Reason for this is that during suspend the LVT offset for the IBS vector gets lost and needs to be reinialized while resuming. The offset is read from the IBSCTL msr. On family 10h the offset needs to be 1 as offset 0 is used for the MCE threshold interrupt, but firmware assings it for IBS to 0 too. The kernel needs to reprogram the vector. The msr is a readonly node msr, but a new value can be written via pci config space access. The reinitialization is implemented for family 10h in setup_ibs_ctl() which is forced during IBS setup. This patch fixes IBS setup after waking up from S3 by adding resume/supend hooks for the boot cpu which does the offset reinitialization. Marking it as stable to let distros pick up this fix. Signed-off-by: Robert Richter <rric@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1389797849-5565-1-git-send-email-rric.net@gmail.comSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rafael J. Wysocki authored
commit 2b844ba7 upstream. This reverts commit f6308b36 (ACPI: Add BayTrail SoC GPIO and LPSS ACPI IDs), because it causes the Alan Cox' ASUS T100TA to "crash and burn" during boot if the Baytrail pinctrl driver is compiled in. Fixes: f6308b36 (ACPI: Add BayTrail SoC GPIO and LPSS ACPI IDs) Reported-by: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Requested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 15 Jan, 2014 2 commits
-
-
Greg Kroah-Hartman authored
-
Paul Turner authored
commit 0ac9b1c2 upstream. Currently, group entity load-weights are initialized to zero. This admits some races with respect to the first time they are re-weighted in earlty use. ( Let g[x] denote the se for "g" on cpu "x". ) Suppose that we have root->a and that a enters a throttled state, immediately followed by a[0]->t1 (the only task running on cpu[0]) blocking: put_prev_task(group_cfs_rq(a[0]), t1) put_prev_entity(..., t1) check_cfs_rq_runtime(group_cfs_rq(a[0])) throttle_cfs_rq(group_cfs_rq(a[0])) Then, before unthrottling occurs, let a[0]->b[0]->t2 wake for the first time: enqueue_task_fair(rq[0], t2) enqueue_entity(group_cfs_rq(b[0]), t2) enqueue_entity_load_avg(group_cfs_rq(b[0]), t2) account_entity_enqueue(group_cfs_ra(b[0]), t2) update_cfs_shares(group_cfs_rq(b[0])) < skipped because b is part of a throttled hierarchy > enqueue_entity(group_cfs_rq(a[0]), b[0]) ... We now have b[0] enqueued, yet group_cfs_rq(a[0])->load.weight == 0 which violates invariants in several code-paths. Eliminate the possibility of this by initializing group entity weight. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131016181627.22647.47543.stgit@sword-of-the-dawn.mtv.corp.google.comSigned-off-by: Ingo Molnar <mingo@kernel.org> Cc: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-