- 20 Oct, 2014 18 commits
-
-
Lai Jiangshan authored
commit 82cfb90b upstream. Commit 98683650 "Merge branch 'drbd-8.4_ed6' into for-3.8-drivers-drbd-8.4_ed6" switches to the new augment API, but the new API requires that the tree is augmented before rb_insert_augmented() is called, which is missing. So we add the augment-code to drbd_insert_interval() when it travels the tree up to down before rb_insert_augmented(). See the example in include/linux/interval_tree_generic.h or Documentation/rbtree.txt. drbd_insert_interval() may cancel the insertion when traveling, in this case, the just added augment-code does nothing before cancel since the @this node is already in the subtrees in this case. CC: Michel Lespinasse <walken@google.com> Signed-off-by:
Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by:
Andreas Gruenbacher <agruen@linbit.com> Signed-off-by:
Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by:
Jens Axboe <axboe@fb.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Douglas Lehr authored
commit 9fe373f9 upstream. The Crocodile chip occasionally comes up with 4k and 8k BAR sizes. Due to an erratum, setting the SR-IOV page size causes the physical function BARs to expand to the system page size. Since ppc64 uses 64k pages, when Linux tries to assign the smaller resource sizes to the now 64k BARs the address will be truncated and the BARs will overlap. Force Linux to allocate the resource as a full page, which avoids the overlap. [bhelgaas: print expanded resource, too] Signed-off-by:
Douglas Lehr <dllehr@us.ibm.com> Signed-off-by:
Anton Blanchard <anton@samba.org> Signed-off-by:
Bjorn Helgaas <bhelgaas@google.com> Acked-by:
Milton Miller <miltonm@us.ibm.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Richard Genoud authored
commit 1bf1890e upstream. I ran into this error after a ubiupdatevol, because I forgot to backport e9110361 UBI: fix the volumes tree sorting criteria. UBI error: process_pool_aeb: orphaned volume in fastmap pool UBI error: ubi_scan_fastmap: Attach by fastmap failed, doing a full scan! kmem_cache_destroy ubi_ainf_peb_slab: Slab cache still has objects CPU: 0 PID: 1 Comm: swapper Not tainted 3.14.18-00053-gf05cac8dbf85 #1 [<c000d298>] (unwind_backtrace) from [<c000baa8>] (show_stack+0x10/0x14) [<c000baa8>] (show_stack) from [<c01b7a68>] (destroy_ai+0x230/0x244) [<c01b7a68>] (destroy_ai) from [<c01b8fd4>] (ubi_attach+0x98/0x1ec) [<c01b8fd4>] (ubi_attach) from [<c01ade90>] (ubi_attach_mtd_dev+0x2b8/0x868) [<c01ade90>] (ubi_attach_mtd_dev) from [<c038b510>] (ubi_init+0x1dc/0x2ac) [<c038b510>] (ubi_init) from [<c0008860>] (do_one_initcall+0x94/0x140) [<c0008860>] (do_one_initcall) from [<c037aadc>] (kernel_init_freeable+0xe8/0x1b0) [<c037aadc>] (kernel_init_freeable) from [<c02730ac>] (kernel_init+0x8/0xe4) [<c02730ac>] (kernel_init) from [<c00093f0>] (ret_from_fork+0x14/0x24) UBI: scanning is finished Freeing the cache in the error path fixes the Slab error. Tested on at91sam9g35 (3.14.18+fastmap backports) Signed-off-by:
Richard Genoud <richard.genoud@gmail.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Christian Borntraeger authored
commit f346026e upstream. We must not fallthrough if the conditions for external call are not met. Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by:
Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Roger Tseng authored
commit d1419d50 upstream. Current code erroneously fill the last byte of R2 response with an undefined value. In addition, the controller actually 'offloads' the last byte (CRC7, end bit) while receiving R2 response and thus it's impossible to get the actual value. This could cause mmc stack to obtain inconsistent CID from the same card after resume and misidentify it as a different card. Fix by assigning dummy CRC and end bit: {7'b0, 1} = 0x1 to the last byte of R2. Fixes: ff984e57 ("mmc: Add realtek pcie sdmmc host driver") Signed-off-by:
Roger Tseng <rogerable@realtek.com> Signed-off-by:
Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Champion Chen authored
commit 85560c4a upstream. Suspend could fail for some platforms because btusb_suspend==> btusb_stop_traffic ==> usb_kill_anchored_urbs. When btusb_bulk_complete returns before system suspend and resubmits an URB, the system cannot enter suspend state. Signed-off-by:
Champion Chen <champion_chen@realsil.com.cn> Signed-off-by:
Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by:
Marcel Holtmann <marcel@holtmann.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Artem Bityutskiy authored
commit ba29e721 upstream. Hu (hujianyang <hujianyang@huawei.com>) discovered an issue in the 'empty_log_bytes()' function, which calculates how many bytes are left in the log: " If 'c->lhead_lnum + 1 == c->ltail_lnum' and 'c->lhead_offs == c->leb_size', 'h' would equalent to 't' and 'empty_log_bytes()' would return 'c->log_bytes' instead of 0. " At this point it is not clear what would be the consequences of this, and whether this may lead to any problems, but this patch addresses the issue just in case. Tested-by:
hujianyang <hujianyang@huawei.com> Reported-by:
hujianyang <hujianyang@huawei.com> Signed-off-by:
Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
David Matlack authored
commit 56f17dd3 upstream. The following events can lead to an incorrect KVM_EXIT_MMIO bubbling up to userspace: (1) Guest accesses gpa X without a memory slot. The gfn is cached in struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets the SPTE write-execute-noread so that future accesses cause EPT_MISCONFIGs. (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION covering the page just accessed. (3) Guest attempts to read or write to gpa X again. On Intel, this generates an EPT_MISCONFIG. The memory slot generation number that was incremented in (2) would normally take care of this but we fast path mmio faults through quickly_check_mmio_pf(), which only checks the per-vcpu mmio cache. Since we hit the cache, KVM passes a KVM_EXIT_MMIO up to userspace. This patch fixes the issue by using the memslot generation number to validate the mmio cache. Signed-off-by:
David Matlack <dmatlack@google.com> [xiaoguangrong: adjust the code to make it simpler for stable-tree fix.] Signed-off-by:
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by:
David Matlack <dmatlack@google.com> Reviewed-by:
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Tested-by:
David Matlack <dmatlack@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
David Matlack authored
commit ee3d1570 upstream. vcpu exits and memslot mutations can run concurrently as long as the vcpu does not aquire the slots mutex. Thus it is theoretically possible for memslots to change underneath a vcpu that is handling an exit. If we increment the memslot generation number again after synchronize_srcu_expedited(), vcpus can safely cache memslot generation without maintaining a single rcu_dereference through an entire vm exit. And much of the x86/kvm code does not maintain a single rcu_dereference of the current memslots during each exit. We can prevent the following case: vcpu (CPU 0) | thread (CPU 1) --------------------------------------------+-------------------------- 1 vm exit | 2 srcu_read_unlock(&kvm->srcu) | 3 decide to cache something based on | old memslots | 4 | change memslots | (increments generation) 5 | synchronize_srcu(&kvm->srcu); 6 retrieve generation # from new memslots | 7 tag cache with new memslot generation | 8 srcu_read_unlock(&kvm->srcu) | ... | <action based on cache occurs even | though the caching decision was based | on the old memslots> | ... | <action *continues* to occur until next | memslot generation change, which may | be never> | | By incrementing the generation after synchronizing with kvm->srcu readers, we ensure that the generation retrieved in (6) will become invalid soon after (8). Keeping the existing increment is not strictly necessary, but we do keep it and just move it for consistency from update_memslots to install_new_memslots. It invalidates old cached MMIOs immediately, instead of having to wait for the end of synchronize_srcu_expedited, which makes the code more clearly correct in case CPU 1 is preempted right after synchronize_srcu() returns. To avoid halving the generation space in SPTEs, always presume that the low bit of the generation is zero when reconstructing a generation number out of an SPTE. This effectively disables MMIO caching in SPTEs during the call to synchronize_srcu_expedited. Using the low bit this way is somewhat like a seqcount---where the protected thing is a cache, and instead of retrying we can simply punt if we observe the low bit to be 1. Signed-off-by:
David Matlack <dmatlack@google.com> Reviewed-by:
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by:
David Matlack <dmatlack@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> [ kamal: backport to 3.13-stable: also made update_memslots() static, as per 7940876e "kvm: make local functions static" ] Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Paolo Bonzini authored
commit 00f034a1 upstream. The next patch will give a meaning (a la seqcount) to the low bit of the generation number. Ensure that it matches between kvm->memslots->generation and kvm_current_mmio_generation(). Reviewed-by:
David Matlack <dmatlack@google.com> Reviewed-by:
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Oleg Nesterov authored
commit df24fb85 upstream. Add preempt_disable() + preempt_enable() around math_state_restore() in __restore_xstate_sig(). Otherwise __switch_to() after __thread_fpu_begin() can overwrite fpu->state we are going to restore. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20140902175717.GA21649@redhat.comReviewed-by:
Suresh Siddha <sbsiddha@gmail.com> Signed-off-by:
H. Peter Anvin <hpa@linux.intel.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Oleg Nesterov authored
commit 66463db4 upstream. save_xstate_sig()->drop_init_fpu() doesn't look right. setup_rt_frame() can fail after that, in this case the next setup_rt_frame() triggered by SIGSEGV won't save fpu simply because the old state was lost. This obviously mean that fpu won't be restored after sys_rt_sigreturn() from SIGSEGV handler. Shift drop_init_fpu() into !failed branch in handle_signal(). Test-case (needs -O2): #include <stdio.h> #include <signal.h> #include <unistd.h> #include <sys/syscall.h> #include <sys/mman.h> #include <pthread.h> #include <assert.h> volatile double D; void test(double d) { int pid = getpid(); for (D = d; D == d; ) { /* sys_tkill(pid, SIGHUP); asm to avoid save/reload * fp regs around "C" call */ asm ("" : : "a"(200), "D"(pid), "S"(1)); asm ("syscall" : : : "ax"); } printf("ERR!!\n"); } void sigh(int sig) { } char altstack[4096 * 10] __attribute__((aligned(4096))); void *tfunc(void *arg) { for (;;) { mprotect(altstack, sizeof(altstack), PROT_READ); mprotect(altstack, sizeof(altstack), PROT_READ|PROT_WRITE); } } int main(void) { stack_t st = { .ss_sp = altstack, .ss_size = sizeof(altstack), .ss_flags = SS_ONSTACK, }; struct sigaction sa = { .sa_handler = sigh, }; pthread_t pt; sigaction(SIGSEGV, &sa, NULL); sigaltstack(&st, NULL); sa.sa_flags = SA_ONSTACK; sigaction(SIGHUP, &sa, NULL); pthread_create(&pt, NULL, tfunc, NULL); test(123.456); return 0; } Reported-by:
Bean Anderson <bean@azulsystems.com> Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20140902175713.GA21646@redhat.comSigned-off-by:
H. Peter Anvin <hpa@linux.intel.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Cristian Stoica authored
commit 4451d494 upstream. buf_0 and buf_1 in caam_hash_state are not next to each other. Accessing buf_1 is incorrect from &buf_0 with an offset of only size_of(buf_0). The same issue is also with buflen_0 and buflen_1 Signed-off-by:
Cristian Stoica <cristian.stoica@freescale.com> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Frank Schaefer authored
commit 662c97cf upstream. The correct field order in progressive mode is V4L2_FIELD_NONE, not V4L2_FIELD_INTERLACED. Signed-off-by:
Frank Schäfer <fschaefer.oss@googlemail.com> Signed-off-by:
Mauro Carvalho Chehab <m.chehab@samsung.com> [ kamal: backport to 3.13-stable: context ] Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Frank Schaefer authored
commit 627530c3 upstream. When a new video frame is started, the driver takes the next video buffer from the list of active buffers and moves it to dev->usb_ctl.vid_buf / dev->usb_ctl.vbi_buf for further processing. On streaming stop we currently only give back the pending buffers from the list but not the ones which are currently processed. This causes the following warning from the vb2 core since kernel 3.15: ... ------------[ cut here ]------------ WARNING: CPU: 1 PID: 2284 at drivers/media/v4l2-core/videobuf2-core.c:2115 __vb2_queue_cancel+0xed/0x150 [videobuf2_core]() [...] Call Trace: [<c0769c46>] dump_stack+0x48/0x69 [<c0245b69>] warn_slowpath_common+0x79/0x90 [<f925e4ad>] ? __vb2_queue_cancel+0xed/0x150 [videobuf2_core] [<f925e4ad>] ? __vb2_queue_cancel+0xed/0x150 [videobuf2_core] [<c0245bfd>] warn_slowpath_null+0x1d/0x20 [<f925e4ad>] __vb2_queue_cancel+0xed/0x150 [videobuf2_core] [<f925fa35>] vb2_internal_streamoff+0x35/0x90 [videobuf2_core] [<f925fac5>] vb2_streamoff+0x35/0x60 [videobuf2_core] [<f925fb27>] vb2_ioctl_streamoff+0x37/0x40 [videobuf2_core] [<f8e45895>] v4l_streamoff+0x15/0x20 [videodev] [<f8e4925d>] __video_do_ioctl+0x23d/0x2d0 [videodev] [<f8e49020>] ? video_ioctl2+0x20/0x20 [videodev] [<f8e48c63>] video_usercopy+0x203/0x5a0 [videodev] [<f8e49020>] ? video_ioctl2+0x20/0x20 [videodev] [<c039d0e7>] ? fsnotify+0x1e7/0x2b0 [<f8e49012>] video_ioctl2+0x12/0x20 [videodev] [<f8e49020>] ? video_ioctl2+0x20/0x20 [videodev] [<f8e4461e>] v4l2_ioctl+0xee/0x130 [videodev] [<f8e44530>] ? v4l2_open+0xf0/0xf0 [videodev] [<c0378de2>] do_vfs_ioctl+0x2e2/0x4d0 [<c0368eec>] ? vfs_write+0x13c/0x1c0 [<c0369a8f>] ? vfs_writev+0x2f/0x50 [<c0379028>] SyS_ioctl+0x58/0x80 [<c076fff3>] sysenter_do_call+0x12/0x12 ---[ end trace 5545f934409f13f4 ]--- ... Many thanks to Hans Verkuil, whose recently added check in the vb2 core unveiled this long standing issue and who has investigated it further. Signed-off-by:
Frank Schäfer <fschaefer.oss@googlemail.com> Signed-off-by:
Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Mauro Carvalho Chehab authored
commit 29bbb7bd upstream. Add support for PCTV microStick (77e) device that uses a sms1140 chipset. Signed-off-by:
Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Xuelin Shi authored
commit 87cea763 upstream. the partial xor result must be kept until the next tx is generated. Signed-off-by:
Xuelin Shi <xuelin.shi@freescale.com> Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Loic Poulain authored
commit 4807b518 upstream. In this expression: seq = (seq - 1) % 8 seq (u8) is implicitly converted to an int in the arithmetic operation. So if seq value is 0, operation is ((0 - 1) % 8) => (-1 % 8) => -1. The new seq value is 0xff which is an invalid ACK value, we expect 0x07. It leads to frequent dropped ACK and retransmission. Fix this by using '&' binary operator instead of '%'. Signed-off-by:
Loic Poulain <loic.poulain@intel.com> Signed-off-by:
Marcel Holtmann <marcel@holtmann.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
- 14 Oct, 2014 1 commit
-
-
Kamal Mostafa authored
Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
- 13 Oct, 2014 1 commit
-
-
Jens Axboe authored
commit 46f341ff upstream. Commit 2da78092 changed the locking from a mutex to a spinlock, so we now longer sleep in this context. But there was a leftover might_sleep() in there, which now triggers since we do the final free from an RCU callback. Get rid of it. Reported-by:
Pontus Fuchs <pontus.fuchs@gmail.com> Signed-off-by:
Jens Axboe <axboe@fb.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
- 09 Oct, 2014 20 commits
-
-
Josh Triplett authored
commit 62b4d204 upstream. commit 03b8c7b6 ("futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test") added the HAVE_FUTEX_CMPXCHG symbol right below FUTEX. This placed it right in the middle of the options for the EXPERT menu. However, HAVE_FUTEX_CMPXCHG does not depend on EXPERT or FUTEX, so Kconfig stops placing items in the EXPERT menu, and displays the remaining several EXPERT items (starting with EPOLL) directly in the General Setup menu. Since both users of HAVE_FUTEX_CMPXCHG only select it "if FUTEX", make HAVE_FUTEX_CMPXCHG itself depend on FUTEX. With this change, the subsequent items display as part of the EXPERT menu again; the EMBEDDED menu now appears as the next top-level item in the General Setup menu, which makes General Setup much shorter and more usable. Signed-off-by:
Josh Triplett <josh@joshtriplett.org> Acked-by:
Randy Dunlap <rdunlap@infradead.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Josh Triplett authored
commit 361e9dfb upstream. The buffers sized by CONFIG_LOG_BUF_SHIFT and CONFIG_LOG_CPU_MAX_BUF_SHIFT do not exist if CONFIG_PRINTK=n, so don't ask about their size at all. Signed-off-by:
Josh Triplett <josh@joshtriplett.org> Acked-by:
Randy Dunlap <rdunlap@infradead.org> [ kamal: backport to 3.13-stable: only LOG_BUF_SHIFT ] Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Johannes Weiner authored
commit abe5f972 upstream. The zone allocation batches can easily underflow due to higher-order allocations or spills to remote nodes. On SMP that's fine, because underflows are expected from concurrency and dealt with by returning 0. But on UP, zone_page_state will just return a wrapped unsigned long, which will get past the <= 0 check and then consider the zone eligible until its watermarks are hit. 3a025760 ("mm: page_alloc: spill to remote nodes before waking kswapd") already made the counter-resetting use atomic_long_read() to accomodate underflows from remote spills, but it didn't go all the way with it. Make it clear that these batches are expected to go negative regardless of concurrency, and use atomic_long_read() everywhere. Fixes: 81c0a2bb ("mm: page_alloc: fair zone allocator policy") Reported-by:
Vlastimil Babka <vbabka@suse.cz> Reported-by:
Leon Romanovsky <leon@leon.nu> Signed-off-by:
Johannes Weiner <hannes@cmpxchg.org> Acked-by:
Mel Gorman <mgorman@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> [ kamal: Johannes' 3.12 backport applied to 3.13-stable ] Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Peter Zijlstra authored
commit 6c72e350 upstream. Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by calling perf_event_free_task() when failing sched_fork() we will not yet have done the memset() on ->perf_event_ctxp[] and will therefore try and 'free' the inherited contexts, which are still in use by the parent process. This is bad.. Suggested-by:
Oleg Nesterov <oleg@redhat.com> Reported-by:
Oleg Nesterov <oleg@redhat.com> Reported-by:
Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Johannes Weiner authored
commit 2f7dd7a4 upstream. The cgroup iterators yield css objects that have not yet gone through css_online(), but they are not complete memcgs at this point and so the memcg iterators should not return them. Commit d8ad3055 ("mm/memcg: iteration skip memcgs not yet fully initialized") set out to implement exactly this, but it uses CSS_ONLINE, a cgroup-internal flag that does not meet the ordering requirements for memcg, and so the iterator may skip over initialized groups, or return partially initialized memcgs. The cgroup core can not reasonably provide a clear answer on whether the object around the css has been fully initialized, as that depends on controller-specific locking and lifetime rules. Thus, introduce a memcg-specific flag that is set after the memcg has been initialized in css_online(), and read before mem_cgroup_iter() callers access the memcg members. Signed-off-by:
Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Acked-by:
Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> [ kamal: Johannes' 3.12 backport applied to 3.13-stable ] Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Steve French authored
commit 19e81573 upstream. Changeset eb85d94b introduced a problem where if a cifs open fails during query info of a file we will still try to close the file (happens with certain types of reparse points) even though the file handle is not valid. In addition for SMB2/SMB3 we were not mapping the return code returned by Windows when trying to open a file (like a Windows NFS symlink) which is a reparse point. Signed-off-by:
Steve French <smfrench@gmail.com> Reviewed-by:
Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Mel Gorman authored
commit abc40bd2 upstream. This patch reverts 1ba6e0b5 ("mm: numa: split_huge_page: transfer the NUMA type from the pmd to the pte"). If a huge page is being split due a protection change and the tail will be in a PROT_NONE vma then NUMA hinting PTEs are temporarily created in the protected VMA. VM_RW|VM_PROTNONE |-----------------| ^ split here In the specific case above, it should get fixed up by change_pte_range() but there is a window of opportunity for weirdness to happen. Similarly, if a huge page is shrunk and split during a protection update but before pmd_numa is cleared then a pte_numa can be left behind. Instead of adding complexity trying to deal with the case, this patch will not mark PTEs NUMA when splitting a huge page. NUMA hinting faults will not be triggered which is marginal in comparison to the complexity in dealing with the corner cases during THP split. Signed-off-by:
Mel Gorman <mgorman@suse.de> Acked-by:
Rik van Riel <riel@redhat.com> Acked-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Mel Gorman authored
commit d3cb8bf6 upstream. A migration entry is marked as write if pte_write was true at the time the entry was created. The VMA protections are not double checked when migration entries are being removed as mprotect marks write-migration-entries as read. It means that potentially we take a spurious fault to mark PTEs write again but it's straight-forward. However, there is a race between write migrations being marked read and migrations finishing. This potentially allows a PTE to be write that should have been read. Close this race by double checking the VMA permissions using maybe_mkwrite when migration completes. [torvalds@linux-foundation.org: use maybe_mkwrite] Signed-off-by:
Mel Gorman <mgorman@suse.de> Acked-by:
Rik van Riel <riel@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
NeilBrown authored
commit 8e0e99ba upstream. It has come to my attention (thanks Martin) that 'discard_zeroes_data' is only a hint. Some devices in some cases don't do what it says on the label. The use of DISCARD in RAID5 depends on reads from discarded regions being predictably zero. If a write to a previously discarded region performs a read-modify-write cycle it assumes that the parity block was consistent with the data blocks. If all were zero, this would be the case. If some are and some aren't this would not be the case. This could lead to data corruption after a device failure when data needs to be reconstructed from the parity. As we cannot trust 'discard_zeroes_data', ignore it by default and so disallow DISCARD on all raid4/5/6 arrays. As many devices are trustworthy, and as there are benefits to using DISCARD, add a module parameter to over-ride this caution and cause DISCARD to work if discard_zeroes_data is set. If a site want to enable DISCARD on some arrays but not on others they should select DISCARD support at the filesystem level, and set the raid456 module parameter. raid456.devices_handle_discard_safely=Y As this is a data-safety issue, I believe this patch is suitable for -stable. DISCARD support for RAID456 was added in 3.7 Cc: Shaohua Li <shli@kernel.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Heinz Mauelshagen <heinzm@redhat.com> Acked-by:
Martin K. Petersen <martin.petersen@oracle.com> Acked-by:
Mike Snitzer <snitzer@redhat.com> Fixes: 620125f2Signed-off-by:
NeilBrown <neilb@suse.de> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Nathan Lynch authored
commit 9cc6d9e5 upstream. Joachim Eastwood reports that commit fbfb872f "ARM: 8148/1: flush TLS and thumbee register state during exec" causes a boot-time crash on a Cortex-M4 nommu system: Freeing unused kernel memory: 68K (281e5000 - 281f6000) Unhandled exception: IPSR = 00000005 LR = fffffff1 CPU: 0 PID: 1 Comm: swapper Not tainted 3.17.0-rc6-00313-gd2205fa30aa7 #191 task: 29834000 ti: 29832000 task.ti: 29832000 PC is at flush_thread+0x2e/0x40 LR is at flush_thread+0x21/0x40 pc : [<2800954a>] lr : [<2800953d>] psr: 4100000b sp : 29833d60 ip : 00000000 fp : 00000001 r10: 00003cf8 r9 : 29b1f000 r8 : 00000000 r7 : 29b0bc00 r6 : 29834000 r5 : 29832000 r4 : 29832000 r3 : ffff0ff0 r2 : 29832000 r1 : 00000000 r0 : 282121f0 xPSR: 4100000b CPU: 0 PID: 1 Comm: swapper Not tainted 3.17.0-rc6-00313-gd2205fa30aa7 #191 [<2800afa5>] (unwind_backtrace) from [<2800a327>] (show_stack+0xb/0xc) [<2800a327>] (show_stack) from [<2800a963>] (__invalid_entry+0x4b/0x4c) The problem is that set_tls is attempting to clear the TLS location in the kernel-user helper page, which isn't set up on V7M. Fix this by guarding the write to the kuser helper page with a CONFIG_KUSER_HELPERS ifdef. Fixes: fbfb872f ARM: 8148/1: flush TLS and thumbee register state during exec Reported-by:
Joachim Eastwood <manabian@gmail.com> Tested-by:
Joachim Eastwood <manabian@gmail.com> Signed-off-by:
Nathan Lynch <nathan_lynch@mentor.com> Signed-off-by:
Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Chris Wilson authored
commit 91e56499 upstream. As we use WC updates of the PTE, we are responsible for notifying the hardware when to flush its TLBs. Do so after we zap all the PTEs before suspend (and the BIOS tries to read our GTT). Fixes a regression from commit 828c7908 Author: Ben Widawsky <benjamin.widawsky@intel.com> Date: Wed Oct 16 09:21:30 2013 -0700 drm/i915: Disable GGTT PTEs on GEN6+ suspend that survived and continue to cause harm even after commit e568af1c Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Mar 26 20:08:20 2014 +0100 drm/i915: Undo gtt scratch pte unmapping again v2: Trivial rebase. v3: Fixes requires pointer dances. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82340 Tested-by: ming.yao@intel.com Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Cc: Takashi Iwai <tiwai@suse.de> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Todd Previte <tprevite@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by:
Jani Nikula <jani.nikula@intel.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Arnd Bergmann authored
commit d62dbf77 upstream. When building this driver as a module, we get a helpful warning about the return type: drivers/cpufreq/integrator-cpufreq.c:232:2: warning: initialization from incompatible pointer type .remove = __exit_p(integrator_cpufreq_remove), If the remove callback returns void, the caller gets an undefined value as it expects an integer to be returned. This fixes the problem by passing down the value from cpufreq_unregister_driver. Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Xiubo Li authored
commit 6596aa04 upstream. Since we cannot make sure the 'params->num_regs' will always be none zero here, and then if it equals to zero, the kmemdup() will return ZERO_SIZE_PTR, which equals to ((void *)16). So this patch fix this with just doing the zero check before calling kmemdup(). Signed-off-by:
Xiubo Li <Li.Xiubo@freescale.com> Signed-off-by:
Mark Brown <broonie@kernel.org> [ kamal: backport to 3.13-stable: context ] Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Robin Murphy authored
commit 5ca918e5 upstream. The alignment fixup incorrectly decodes faulting ARM VLDn/VSTn instructions (where the optional alignment hint is given but incorrect) as LDR/STR, leading to register corruption. Detect these and correctly treat them as unhandled, so that userspace gets the fault it expects. Reported-by:
Simon Hosie <simon.hosie@arm.com> Signed-off-by:
Robin Murphy <robin.murphy@arm.com> Signed-off-by:
Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Miklos Szeredi authored
commit b928095b upstream. If overwriting an empty directory with rename, then need to drop the extra nlink. Test prog: #include <stdio.h> #include <fcntl.h> #include <err.h> #include <sys/stat.h> int main(void) { const char *test_dir1 = "test-dir1"; const char *test_dir2 = "test-dir2"; int res; int fd; struct stat statbuf; res = mkdir(test_dir1, 0777); if (res == -1) err(1, "mkdir(\"%s\")", test_dir1); res = mkdir(test_dir2, 0777); if (res == -1) err(1, "mkdir(\"%s\")", test_dir2); fd = open(test_dir2, O_RDONLY); if (fd == -1) err(1, "open(\"%s\")", test_dir2); res = rename(test_dir1, test_dir2); if (res == -1) err(1, "rename(\"%s\", \"%s\")", test_dir1, test_dir2); res = fstat(fd, &statbuf); if (res == -1) err(1, "fstat(%i)", fd); if (statbuf.st_nlink != 0) { fprintf(stderr, "nlink is %lu, should be 0\n", statbuf.st_nlink); return 1; } return 0; } Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Peter Feiner authored
commit dbab31aa upstream. This fixes the same bug as b43790ee ("mm: softdirty: don't forget to save file map softdiry bit on unmap") and 9aed8614 ("mm/memory.c: don't forget to set softdirty on file mapped fault") where the return value of pte_*mksoft_dirty was being ignored. To be sure that no other pte/pmd "mk" function return values were being ignored, I annotated the functions in arch/x86/include/asm/pgtable.h with __must_check and rebuilt. The userspace effect of this bug is that the softdirty mark might be lost if a file mapped pte get zapped. Signed-off-by:
Peter Feiner <pfeiner@google.com> Acked-by:
Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Jamie Liu <jamieliu@google.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
David Rientjes authored
commit d4a5fca5 upstream. Since commit 45906855 ("mm/sl[aou]b: Common alignment code"), the "ralign" automatic variable in __kmem_cache_create() may be used as uninitialized. The proper alignment defaults to BYTES_PER_WORD and can be overridden by SLAB_RED_ZONE or the alignment specified by the caller. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=85031Signed-off-by:
David Rientjes <rientjes@google.com> Reported-by:
Andrei Elovikov <a.elovikov@gmail.com> Acked-by:
Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Joseph Qi authored
commit 5760a97c upstream. There is a deadlock case which reported by Guozhonghua: https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html This case is caused by &res->spinlock and &dlm->master_lock misordering in different threads. It was introduced by commit 8d400b81 ("ocfs2/dlm: Clean up refmap helpers"). Since lockres is new, it doesn't not require the &res->spinlock. So remove it. Fixes: 8d400b81 ("ocfs2/dlm: Clean up refmap helpers") Signed-off-by:
Joseph Qi <joseph.qi@huawei.com> Reviewed-by:
joyce.xue <xuejiufei@huawei.com> Reported-by:
Guozhonghua <guozhonghua@h3c.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Andreas Rohner authored
commit 56d7acc7 upstream. This bug leads to reproducible silent data loss, despite the use of msync(), sync() and a clean unmount of the file system. It is easily reproducible with the following script: ----------------[BEGIN SCRIPT]-------------------- mkfs.nilfs2 -f /dev/sdb mount /dev/sdb /mnt dd if=/dev/zero bs=1M count=30 of=/mnt/testfile umount /mnt mount /dev/sdb /mnt CHECKSUM_BEFORE="$(md5sum /mnt/testfile)" /root/mmaptest/mmaptest /mnt/testfile 30 10 5 sync CHECKSUM_AFTER="$(md5sum /mnt/testfile)" umount /mnt mount /dev/sdb /mnt CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)" umount /mnt echo "BEFORE MMAP:\t$CHECKSUM_BEFORE" echo "AFTER MMAP:\t$CHECKSUM_AFTER" echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT" ----------------[END SCRIPT]-------------------- The mmaptest tool looks something like this (very simplified, with error checking removed): ----------------[BEGIN mmaptest]-------------------- data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE, MAP_SHARED, fd, file_offset); for (i = 0; i < write_count; ++i) { memcpy(data + i * 4096, buf, sizeof(buf)); msync(data, file_size - file_offset, MS_SYNC)) } ----------------[END mmaptest]-------------------- The output of the script looks something like this: BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile AFTER MMAP: 6604a1c31f10780331a6850371b3a313 /mnt/testfile AFTER REMOUNT: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile So it is clear, that the changes done using mmap() do not survive a remount. This can be reproduced a 100% of the time. The problem was introduced in commit 136e8770 ("nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary"). If the page was read with mpage_readpage() or mpage_readpages() for example, then it has no buffers attached to it. In that case page_has_buffers(page) in nilfs_set_page_dirty() will be false. Therefore nilfs_set_file_dirty() is never called and the pages are never collected and never written to disk. This patch fixes the problem by also calling nilfs_set_file_dirty() if the page has no buffers attached to it. [akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/] Signed-off-by:
Andreas Rohner <andreas.rohner@gmx.net> Tested-by:
Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Markos Chandras authored
commit 8a574cfa upstream. Every mcount() call in the MIPS 32-bit kernel is done as follows: [...] move at, ra jal _mcount addiu sp, sp, -8 [...] but upon returning from the mcount() function, the stack pointer is not adjusted properly. This is explained in details in 58b69401 (MIPS: Function tracer: Fix broken function tracing). Commit ad8c3969 ("MIPS: Unbreak function tracer for 64-bit kernel.) fixed the stack manipulation for 64-bit but it didn't fix it completely for MIPS32. Signed-off-by:
Markos Chandras <markos.chandras@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/7792/Signed-off-by:
Ralf Baechle <ralf@linux-mips.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-