1. 01 Feb, 2017 19 commits
    • userns: Make ucounts lock irq-safe · c7556867
      Nikolay Borisov authored
      commit 880a3854 upstream.
      
      The ucounts_lock is being used to protect various ucounts lifecycle
      management functionalities. However, those services can also be invoked
      when a pidns is being freed in an RCU callback (e.g. softirq context).
      This can lead to deadlocks. There were already efforts trying to
      prevent similar deadlocks in add7c65c ("pid: fix lockdep deadlock
      warning due to ucount_lock"), however they just moved the context
      from hardirq to softirq. Fix this issue once and for all by explicitly
      making the lock disable irqs altogether.
      
      Dmitry Vyukov <dvyukov@google.com> reported:
      
      > I've got the following deadlock report while running syzkaller fuzzer
      > on eec0d3d065bfcdf9cd5f56dd2a36b94d12d32297 of linux-next (on odroid
      > device if it matters):
      >
      > =================================
      > [ INFO: inconsistent lock state ]
      > 4.10.0-rc3-next-20170112-xc2-dirty #6 Not tainted
      > ---------------------------------
      > inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      > swapper/2/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
      >  (ucounts_lock){+.?...}, at: [<     inline     >] spin_lock
      > ./include/linux/spinlock.h:302
      >  (ucounts_lock){+.?...}, at: [<ffff2000081678c8>]
      > put_ucounts+0x60/0x138 kernel/ucount.c:162
      > {SOFTIRQ-ON-W} state was registered at:
      > [<ffff2000081c82d8>] mark_lock+0x220/0xb60 kernel/locking/lockdep.c:3054
      > [<     inline     >] mark_irqflags kernel/locking/lockdep.c:2941
      > [<ffff2000081c97a8>] __lock_acquire+0x388/0x3260 kernel/locking/lockdep.c:3295
      > [<ffff2000081cce24>] lock_acquire+0xa4/0x138 kernel/locking/lockdep.c:3753
      > [<     inline     >] __raw_spin_lock ./include/linux/spinlock_api_smp.h:144
      > [<ffff200009798128>] _raw_spin_lock+0x90/0xd0 kernel/locking/spinlock.c:151
      > [<     inline     >] spin_lock ./include/linux/spinlock.h:302
      > [<     inline     >] get_ucounts kernel/ucount.c:131
      > [<ffff200008167c28>] inc_ucount+0x80/0x6c8 kernel/ucount.c:189
      > [<     inline     >] inc_mnt_namespaces fs/namespace.c:2818
      > [<ffff200008481850>] alloc_mnt_ns+0x78/0x3a8 fs/namespace.c:2849
      > [<ffff200008487298>] create_mnt_ns+0x28/0x200 fs/namespace.c:2959
      > [<     inline     >] init_mount_tree fs/namespace.c:3199
      > [<ffff200009bd6674>] mnt_init+0x258/0x384 fs/namespace.c:3251
      > [<ffff200009bd60bc>] vfs_caches_init+0x6c/0x80 fs/dcache.c:3626
      > [<ffff200009bb1114>] start_kernel+0x414/0x460 init/main.c:648
      > [<ffff200009bb01e8>] __primary_switched+0x6c/0x70 arch/arm64/kernel/head.S:456
      > irq event stamp: 2316924
      > hardirqs last  enabled at (2316924): [<     inline     >] rcu_do_batch
      > kernel/rcu/tree.c:2911
      > hardirqs last  enabled at (2316924): [<     inline     >]
      > invoke_rcu_callbacks kernel/rcu/tree.c:3182
      > hardirqs last  enabled at (2316924): [<     inline     >]
      > __rcu_process_callbacks kernel/rcu/tree.c:3149
      > hardirqs last  enabled at (2316924): [<ffff200008210414>]
      > rcu_process_callbacks+0x7a4/0xc28 kernel/rcu/tree.c:3166
      > hardirqs last disabled at (2316923): [<     inline     >] rcu_do_batch
      > kernel/rcu/tree.c:2900
      > hardirqs last disabled at (2316923): [<     inline     >]
      > invoke_rcu_callbacks kernel/rcu/tree.c:3182
      > hardirqs last disabled at (2316923): [<     inline     >]
      > __rcu_process_callbacks kernel/rcu/tree.c:3149
      > hardirqs last disabled at (2316923): [<ffff20000820fe80>]
      > rcu_process_callbacks+0x210/0xc28 kernel/rcu/tree.c:3166
      > softirqs last  enabled at (2316912): [<ffff20000811b4c4>]
      > _local_bh_enable+0x4c/0x80 kernel/softirq.c:155
      > softirqs last disabled at (2316913): [<     inline     >]
      > do_softirq_own_stack ./include/linux/interrupt.h:488
      > softirqs last disabled at (2316913): [<     inline     >]
      > invoke_softirq kernel/softirq.c:371
      > softirqs last disabled at (2316913): [<ffff20000811c994>]
      > irq_exit+0x264/0x308 kernel/softirq.c:405
      >
      > other info that might help us debug this:
      >  Possible unsafe locking scenario:
      >
      >        CPU0
      >        ----
      >   lock(ucounts_lock);
      >   <Interrupt>
      >     lock(ucounts_lock);
      >
      >  *** DEADLOCK ***
      >
      > 1 lock held by swapper/2/0:
      >  #0:  (rcu_callback){......}, at: [<     inline     >] __rcu_reclaim
      > kernel/rcu/rcu.h:108
      >  #0:  (rcu_callback){......}, at: [<     inline     >] rcu_do_batch
      > kernel/rcu/tree.c:2919
      >  #0:  (rcu_callback){......}, at: [<     inline     >]
      > invoke_rcu_callbacks kernel/rcu/tree.c:3182
      >  #0:  (rcu_callback){......}, at: [<     inline     >]
      > __rcu_process_callbacks kernel/rcu/tree.c:3149
      >  #0:  (rcu_callback){......}, at: [<ffff200008210390>]
      > rcu_process_callbacks+0x720/0xc28 kernel/rcu/tree.c:3166
      >
      > stack backtrace:
      > CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-rc3-next-20170112-xc2-dirty #6
      > Hardware name: Hardkernel ODROID-C2 (DT)
      > Call trace:
      > [<ffff20000808fa60>] dump_backtrace+0x0/0x440 arch/arm64/kernel/traps.c:500
      > [<ffff20000808fec0>] show_stack+0x20/0x30 arch/arm64/kernel/traps.c:225
      > [<ffff2000088a99e0>] dump_stack+0x110/0x168
      > [<ffff2000082fa2b4>] print_usage_bug.part.27+0x49c/0x4bc
      > kernel/locking/lockdep.c:2387
      > [<     inline     >] print_usage_bug kernel/locking/lockdep.c:2357
      > [<     inline     >] valid_state kernel/locking/lockdep.c:2400
      > [<     inline     >] mark_lock_irq kernel/locking/lockdep.c:2617
      > [<ffff2000081c89ec>] mark_lock+0x934/0xb60 kernel/locking/lockdep.c:3065
      > [<     inline     >] mark_irqflags kernel/locking/lockdep.c:2923
      > [<ffff2000081c9a60>] __lock_acquire+0x640/0x3260 kernel/locking/lockdep.c:3295
      > [<ffff2000081cce24>] lock_acquire+0xa4/0x138 kernel/locking/lockdep.c:3753
      > [<     inline     >] __raw_spin_lock ./include/linux/spinlock_api_smp.h:144
      > [<ffff200009798128>] _raw_spin_lock+0x90/0xd0 kernel/locking/spinlock.c:151
      > [<     inline     >] spin_lock ./include/linux/spinlock.h:302
      > [<ffff2000081678c8>] put_ucounts+0x60/0x138 kernel/ucount.c:162
      > [<ffff200008168364>] dec_ucount+0xf4/0x158 kernel/ucount.c:214
      > [<     inline     >] dec_pid_namespaces kernel/pid_namespace.c:89
      > [<ffff200008293dc8>] delayed_free_pidns+0x40/0xe0 kernel/pid_namespace.c:156
      > [<     inline     >] __rcu_reclaim kernel/rcu/rcu.h:118
      > [<     inline     >] rcu_do_batch kernel/rcu/tree.c:2919
      > [<     inline     >] invoke_rcu_callbacks kernel/rcu/tree.c:3182
      > [<     inline     >] __rcu_process_callbacks kernel/rcu/tree.c:3149
      > [<ffff2000082103d8>] rcu_process_callbacks+0x768/0xc28 kernel/rcu/tree.c:3166
      > [<ffff2000080821dc>] __do_softirq+0x324/0x6e0 kernel/softirq.c:284
      > [<     inline     >] do_softirq_own_stack ./include/linux/interrupt.h:488
      > [<     inline     >] invoke_softirq kernel/softirq.c:371
      > [<ffff20000811c994>] irq_exit+0x264/0x308 kernel/softirq.c:405
      > [<ffff2000081ecc28>] __handle_domain_irq+0xc0/0x150 kernel/irq/irqdesc.c:636
      > [<ffff200008081c80>] gic_handle_irq+0x68/0xd8
      > Exception stack(0xffff8000648e7dd0 to 0xffff8000648e7f00)
      > 7dc0:                                   ffff8000648d4b3c 0000000000000007
      > 7de0: 0000000000000000 1ffff0000c91a967 1ffff0000c91a967 1ffff0000c91a967
      > 7e00: ffff20000a4b6b68 0000000000000001 0000000000000007 0000000000000001
      > 7e20: 1fffe4000149ae90 ffff200009d35000 0000000000000000 0000000000000002
      > 7e40: 0000000000000000 0000000000000000 0000000002624a1a 0000000000000000
      > 7e60: 0000000000000000 ffff200009cbcd88 000060006d2ed000 0000000000000140
      > 7e80: ffff200009cff000 ffff200009cb6000 ffff200009cc2020 ffff200009d2159d
      > 7ea0: 0000000000000000 ffff8000648d4380 0000000000000000 ffff8000648e7f00
      > 7ec0: ffff20000820a478 ffff8000648e7f00 ffff20000820a47c 0000000010000145
      > 7ee0: 0000000000000140 dfff200000000000 ffffffffffffffff ffff20000820a478
      > [<ffff2000080837f8>] el1_irq+0xb8/0x130 arch/arm64/kernel/entry.S:486
      > [<     inline     >] arch_local_irq_restore
      > ./arch/arm64/include/asm/irqflags.h:81
      > [<ffff20000820a47c>] rcu_idle_exit+0x64/0xa8 kernel/rcu/tree.c:1030
      > [<     inline     >] cpuidle_idle_call kernel/sched/idle.c:200
      > [<ffff2000081bcbfc>] do_idle+0x1dc/0x2d0 kernel/sched/idle.c:243
      > [<ffff2000081bd1cc>] cpu_startup_entry+0x24/0x28 kernel/sched/idle.c:345
      > [<ffff200008099f8c>] secondary_start_kernel+0x2cc/0x358
      > arch/arm64/kernel/smp.c:276
      > [<000000000279f1a4>] 0x279f1a4

      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Fixes: add7c65c ("pid: fix lockdep deadlock warning due to ucount_lock")
      Fixes: f333c700 ("pidns: Add a limit on the number of pid namespaces")
      Link: https://www.spinics.net/lists/kernel/msg2426637.html
      Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vring: Force use of DMA API for ARM-based systems with legacy devices · 13e39d59
      Will Deacon authored
      commit c7070619 upstream.
      
      Booting Linux on an ARM fastmodel containing an SMMU emulation results
      in an unexpected I/O page fault from the legacy virtio-blk PCI device:
      
      [    1.211721] arm-smmu-v3 2b400000.smmu: event 0x10 received:
      [    1.211800] arm-smmu-v3 2b400000.smmu:	0x00000000fffff010
      [    1.211880] arm-smmu-v3 2b400000.smmu:	0x0000020800000000
      [    1.211959] arm-smmu-v3 2b400000.smmu:	0x00000008fa081002
      [    1.212075] arm-smmu-v3 2b400000.smmu:	0x0000000000000000
      [    1.212155] arm-smmu-v3 2b400000.smmu: event 0x10 received:
      [    1.212234] arm-smmu-v3 2b400000.smmu:	0x00000000fffff010
      [    1.212314] arm-smmu-v3 2b400000.smmu:	0x0000020800000000
      [    1.212394] arm-smmu-v3 2b400000.smmu:	0x00000008fa081000
      [    1.212471] arm-smmu-v3 2b400000.smmu:	0x0000000000000000
      
      <system hangs failing to read partition table>
      
      This is because the legacy virtio-blk device is behind an SMMU, so we
      have consequently swizzled its DMA ops and configured the SMMU to
      translate accesses. This then requires the vring code to use the DMA API
      to establish translations, otherwise all transactions will result in
      fatal faults and termination.
      
      Given that ARM-based systems only see an SMMU if one is really present
      (the topology is all described by firmware tables such as device-tree or
      IORT), then we can safely use the DMA API for all legacy virtio devices.
      Modern devices can advertise the presence of an IOMMU using the
      VIRTIO_F_IOMMU_PLATFORM feature flag.
      
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Fixes: 876945db ("arm64: Hook up IOMMU dma_ops")
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm, page_alloc: fix premature OOM when racing with cpuset mems update · 96e5cec1
      Vlastimil Babka authored
      commit e47483bc upstream.
      
      Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode
      triggers the OOM killer in a few seconds, despite lots of free memory.  The
      test attempts to repeatedly fault in memory in one process in a cpuset,
      while changing allowed nodes of the cpuset between 0 and 1 in another
      process.
      
      The problem comes from insufficient protection against cpuset changes,
      which can cause get_page_from_freelist() to consider all zones as
      non-eligible due to nodemask and/or current->mems_allowed.  This was
      masked in the past by sufficient retries, but since commit 682a3385
      ("mm, page_alloc: inline the fast path of the zonelist iterator") we fix
      the preferred_zoneref once, and don't iterate over the whole zonelist in
      further attempts, thus the only eligible zones might be placed in the
      zonelist before our starting point and we always miss them.
      
      A previous patch fixed this problem for current->mems_allowed.  However,
      cpuset changes also update the task's mempolicy nodemask.  The fix has
      two parts.  We have to repeat the preferred_zoneref search when we
      detect cpuset update by way of seqcount, and we have to check the
      seqcount before considering OOM.
      
      [akpm@linux-foundation.org: fix typo in comment]
      Link: http://lkml.kernel.org/r/20170120103843.24587-5-vbabka@suse.cz
      Fixes: c33d6c06 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
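
The two-part fix in the cpuset race commit (redo the preferred-zone search after a cpuset update, and check the sequence count before declaring OOM) can be sketched in user space. Everything here is a hypothetical model and not the kernel's code: two nodes, a plain integer `mems_seq` in place of the kernel's seqcount, and a callback that simulates the racing cpuset update.

```c
#include <stddef.h>

#define NR_NODES 2

/* Hypothetical model: a mask of allowed nodes plus a sequence counter
 * that every "cpuset update" bumps. Both nodes have free pages. */
static unsigned int mems_seq;
static unsigned int mems_allowed = 1u << 1;          /* node 1 only */
static const int node_has_free_pages[NR_NODES] = { 1, 1 };

static void cpuset_update(unsigned int new_mask)
{
    mems_allowed = new_mask;
    mems_seq++;
}

/* Simulated concurrent update: the cpuset now allows node 0 only. */
static void switch_to_node0(void)
{
    cpuset_update(1u << 0);
}

/* Models the preferred-zone search: first node allowed right now. */
static int first_allowed_node(void)
{
    for (int n = 0; n < NR_NODES; n++)
        if (mems_allowed & (1u << n))
            return n;
    return NR_NODES;
}

/* Models the freelist scan: never looks at nodes before 'start'. */
static int scan_from(int start)
{
    for (int n = start; n < NR_NODES; n++)
        if ((mems_allowed & (1u << n)) && node_has_free_pages[n])
            return n;
    return -1;
}

/* Returns the node allocated from, or -1 meaning "declare OOM". */
static int alloc_with_retry(void (*racing_update)(void))
{
    for (;;) {
        unsigned int cookie = mems_seq;      /* sample before searching */
        int start = first_allowed_node();    /* redone on every retry */
        int node;

        if (racing_update) {                 /* inject the race once */
            racing_update();
            racing_update = NULL;
        }
        node = scan_from(start);
        if (node >= 0)
            return node;
        if (mems_seq != cookie)
            continue;                        /* mask changed: retry, no OOM */
        return -1;
    }
}
```

In the first pass the scan starts at node 1, the update then leaves only node 0 eligible, and the scan misses it. Because the counter changed, the loop repeats the search from the new starting point instead of reporting OOM.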
    • mm, page_alloc: move cpuset seqcount checking to slowpath · b678e4ff
      Vlastimil Babka authored
      commit 5ce9bfef upstream.
      
      This is a preparation for the following patch to make review simpler.
      While the primary motivation is a bug fix, this also simplifies the fast
      path, although the moved code is only enabled when cpusets are in use.
      
      Link: http://lkml.kernel.org/r/20170120103843.24587-4-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm, page_alloc: fix fast-path race with cpuset update or removal · d1656c5a
      Vlastimil Babka authored
      commit 16096c25 upstream.
      
      Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode
      triggers the OOM killer in a few seconds, despite lots of free memory.  The
      test attempts to repeatedly fault in memory in one process in a cpuset,
      while changing allowed nodes of the cpuset between 0 and 1 in another
      process.
      
      One possible cause is that in the fast path we find the preferred
      zoneref according to current mems_allowed, so that it points to the
      middle of the zonelist, skipping e.g.  zones of node 1 completely.  If
      the mems_allowed is updated to contain only node 1, we never reach it in
      the zonelist, and trigger OOM before checking the cpuset_mems_cookie.
      
      This patch fixes the particular case by redoing the preferred zoneref
      search if we switch back to the original nodemask.  The condition is
      also slightly changed so that when the last non-root cpuset is removed,
      we don't miss it.
      
      Note that this is not a full fix, and more patches will follow.
      
      Link: http://lkml.kernel.org/r/20170120103843.24587-3-vbabka@suse.cz
      Fixes: 682a3385 ("mm, page_alloc: inline the fast path of the zonelist iterator")
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm, page_alloc: fix check for NULL preferred_zone · ade7afe9
      Vlastimil Babka authored
      commit ea57485a upstream.
      
      Patch series "fix premature OOM regression in 4.7+ due to cpuset races".
      
      This is v2 of my attempt to fix the recent report based on LTP cpuset
      stress test [1].  The intention is to go to stable 4.9 LTS with this,
      as triggering repeated OOMs is not nice.  That's why the patches try to
      be not too intrusive.
      
      Unfortunately, while investigating I found that modifying the testcase to
      use per-VMA policies instead of per-task policies will bring the OOMs
      back, but that seems to be a much older and harder-to-fix problem.  I have
      posted a RFC [2] but I believe that fixing the recent regressions has a
      higher priority.
      
      Longer-term we might try to think how to fix the cpuset mess in a better
      and less error prone way.  I was for example very surprised to learn,
      that cpuset updates change not only task->mems_allowed, but also
      nodemask of mempolicies.  Until now I expected the parameter to
      alloc_pages_nodemask() to be stable.  I wonder why do we then treat
      cpusets specially in get_page_from_freelist() and distinguish HARDWALL
      etc, when there's unconditional intersection between mempolicy and
      cpuset.  I would expect the nodemask adjustment for saving overhead in
      g_p_f(), but that clearly doesn't happen in the current form.  So we
      have both crazy complexity and overhead, AFAICS.
      
      [1] https://lkml.kernel.org/r/CAFpQJXUq-JuEP=QPidy4p_=FN0rkH5Z-kfB4qBvsf6jMS87Edg@mail.gmail.com
      [2] https://lkml.kernel.org/r/7c459f26-13a6-a817-e508-b65b903a8378@suse.cz
      
      This patch (of 4):
      
      Since commit c33d6c06 ("mm, page_alloc: avoid looking up the first
      zone in a zonelist twice") we have a wrong check for NULL preferred_zone,
      which can theoretically happen due to concurrent cpuset modification.  We
      check the zoneref pointer which is never NULL and we should check the zone
      pointer.  Also document this in first_zones_zonelist() comment per Michal
      Hocko.
      
      Fixes: c33d6c06 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
      Link: http://lkml.kernel.org/r/20170120103843.24587-2-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
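
The NULL preferred_zone check fixed in this commit comes down to which pointer is tested. A minimal sketch follows, with assumptions stated up front: `struct zone` and `struct zoneref` are simplified, hypothetical versions of the kernel types, and `first_allowed()` is an illustrative stand-in for `first_zones_zonelist()` that returns the list's terminating entry when nothing matches.

```c
#include <stddef.h>

/* Simplified, hypothetical versions of the kernel structures. */
struct zone { int id; };
struct zoneref { struct zone *zone; };

/* The list ends with a sentinel entry whose zone pointer is NULL. */
static struct zone zone_a = { 0 }, zone_b = { 1 };
static struct zoneref zonelist[] = { { &zone_a }, { &zone_b }, { NULL } };

/* Return the first entry whose zone id is in 'mask'. If none matches,
 * the sentinel entry is returned, so the result itself is never NULL. */
static struct zoneref *first_allowed(unsigned int mask)
{
    struct zoneref *z = zonelist;

    while (z->zone && !(mask & (1u << z->zone->id)))
        z++;
    return z;
}

/* Wrong: tests the zoneref pointer, which always points into the array. */
static int wrong_check_sees_failure(unsigned int mask)
{
    return first_allowed(mask) == NULL;
}

/* Right: tests the zone pointer stored inside the zoneref. */
static int right_check_sees_failure(unsigned int mask)
{
    return first_allowed(mask)->zone == NULL;
}
```

With an empty mask, only the second check reports that no zone was found.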
    • mm/mempolicy.c: do not put mempolicy before using its nodemask · 9b1a1ae9
      Vlastimil Babka authored
      commit d51e9894 upstream.
      
      Since commit be97a41b ("mm/mempolicy.c: merge alloc_hugepage_vma to
      alloc_pages_vma") alloc_pages_vma() can potentially free a mempolicy by
      mpol_cond_put() before accessing the embedded nodemask by
      __alloc_pages_nodemask().  The commit log says it's so "we can use a
      single exit path within the function" but that's clearly wrong.  We can
      still do that when doing mpol_cond_put() after the allocation attempt.
      
      Make sure the mempolicy is not freed prematurely, otherwise
      __alloc_pages_nodemask() can end up using a bogus nodemask, which could
      lead e.g.  to premature OOM.
      
      Fixes: be97a41b ("mm/mempolicy.c: merge alloc_hugepage_vma to alloc_pages_vma")
      Link: http://lkml.kernel.org/r/20170118141124.8345-1-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp · 6676aa65
      Keno Fischer authored
      commit 8310d48b upstream.
      
      In commit 19be0eaf ("mm: remove gup_flags FOLL_WRITE games from
      __get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE
      after a COW was resolved to setting the (newly introduced) FOLL_COW
      instead.  Simultaneously, the check in gup.c was updated to still allow
      writes with FOLL_FORCE set if FOLL_COW had also been set.
      
      However, a similar check in huge_memory.c was forgotten.  As a result,
      remote memory writes to read-only regions of memory backed by transparent
      huge pages cause an infinite loop in the kernel (handle_mm_fault sets
      FOLL_COW and returns 0, causing a retry, but follow_trans_huge_pmd bails
      out immediately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is
      true).
      
      While in this state the process is still SIGKILLable, but little else
      works (e.g.  no ptrace attach, no other signals).  This is easily
      reproduced with the following code (assuming thp are set to always):
      
          #include <assert.h>
          #include <fcntl.h>
          #include <stdint.h>
          #include <stdio.h>
          #include <string.h>
          #include <sys/mman.h>
          #include <sys/stat.h>
          #include <sys/types.h>
          #include <sys/wait.h>
          #include <unistd.h>

          /* 5 MiB; parenthesized so the macro expands safely in expressions. */
          #define TEST_SIZE (5 * 1024 * 1024)

          int main(void) {
            int status;
            pid_t child, waited;
            /* Opened before fork(), so the child's copy of fd still refers
               to the parent's address space. */
            int fd = open("/proc/self/mem", O_RDWR);
            assert(fd >= 0);
            /* Read-only target region in the parent. */
            void *addr = mmap(NULL, TEST_SIZE, PROT_READ,
                              MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
            assert(addr != MAP_FAILED);
            if ((child = fork()) == 0) {
              void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE,
                                 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
              assert(addr2 != MAP_FAILED);
              memset(addr2, 'a', TEST_SIZE);
              /* Remote write into the parent's read-only region; on an
                 affected kernel this call loops forever. */
              pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr);
              return 0;
            }
            /* Keep the side effect outside assert() so it survives NDEBUG. */
            waited = waitpid(child, &status, 0);
            assert(waited == child);
            assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
            return 0;
          }
      
      Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously
      to the update in gup.c in the original commit.  The same pattern exists
      in follow_devmap_pmd.  However, we should not be able to reach that
      check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we
      ever do.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170106015025.GA38411@juliacomputing.com
      Signed-off-by: Keno Fischer <keno@juliacomputing.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
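
The difference between the forgotten THP check and the updated one can be sketched as plain predicates. This is an illustration under stated assumptions and not the kernel source: the flag values are placeholders, `writable` and `dirty` stand in for the page-table bits, and the extra `dirty` condition is modelled on the helper that gup.c uses for regular pages.

```c
#include <stdbool.h>

/* Placeholder flag values for illustration; the kernel defines its own. */
#define FOLL_WRITE 0x01u
#define FOLL_FORCE 0x10u
#define FOLL_COW   0x4000u

/* A write may proceed if the mapping is writable, or if a forced write
 * has already gone through COW and left the entry dirty. */
static bool can_follow_write(bool writable, bool dirty, unsigned int flags)
{
    return writable ||
           ((flags & FOLL_FORCE) && (flags & FOLL_COW) && dirty);
}

/* Old THP check: bails out on any write to a non-writable mapping. */
static bool old_thp_check_bails(bool writable, unsigned int flags)
{
    return (flags & FOLL_WRITE) && !writable;
}

/* Updated THP check: same rule as the regular-page path. */
static bool new_thp_check_bails(bool writable, bool dirty, unsigned int flags)
{
    return (flags & FOLL_WRITE) && !can_follow_write(writable, dirty, flags);
}
```

After the fault handler has set FOLL_COW, the old predicate still bails out on every retry, which is the infinite loop the commit message describes; the updated predicate lets the retry succeed.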
    • drm/atomic: clear out fence when duplicating state · a2104c7c
      Lucas Stach authored
      [Fixed differently in 4.10]
      
      The fence needs to be cleared out, otherwise the following commit
      might wait on a stale fence from the previous commit. This was fixed
      as a side effect of 96260142 (drm/fence: add in-fences support)
      in kernel 4.10.
      
      As this commit introduces new functionality and as such can not be
      applied to stable, this patch is the minimal fix for the kernel 4.9
      stable series.

      Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Tested-by: Fabio Estevam <fabio.estevam@nxp.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "drm/radeon: always apply pci shutdown callbacks" · bbae3c45
      Alex Deucher authored
      commit b9b487e4 upstream.
      
      This seems to break reboot on some evergreen systems.
      
      bugs:
      https://bugs.freedesktop.org/show_bug.cgi?id=99524
      https://bugzilla.kernel.org/show_bug.cgi?id=192271
      
      This reverts commit a481daa8.

      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/vc4: fix a bounds check · 5270c017
      Dan Carpenter authored
      commit 21ccc324 upstream.
      
      We accidentally return success even if vc4_full_res_bounds_check() fails.
      
      Fixes: d5b1a78a ("drm/vc4: Add support for drawing 3D frames.")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Eric Engestrom <eric@engestrom.ch>
      Reviewed-by: Eric Anholt <eric@anholt.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/vc4: Return -EINVAL on the overflow checks failing. · cfba2a00
      Eric Anholt authored
      commit 6b8ac638 upstream.
      
      By failing to set the errno, we'd continue on to trying to set up the
      RCL, and then oops on trying to dereference the tile_bo that binning
      validation should have set up.

      Reported-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Eric Anholt <eric@anholt.net>
      Fixes: d5b1a78a ("drm/vc4: Add support for drawing 3D frames.")
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/vc4: Fix an integer overflow in temporary allocation layout. · b9edac54
      Eric Anholt authored
      commit 0f2ff82e upstream.
      
      We copy the unvalidated ioctl arguments from the user into kernel
      temporary memory to run the validation from, to avoid a race where the
      user updates the unvalidated contents in between validating them and
      copying them into the validated BO.
      
      However, in setting up the layout of the kernel side, we failed to
      check one of the additions (the roundup() for shader_rec_offset)
      against integer overflow, allowing a nearly MAX_UINT value of
      bin_cl_size to cause us to under-allocate the temporary space that we
      then copy_from_user into.

      Reported-by: Murray McAllister <murray.mcallister@insomniasec.com>
      Signed-off-by: Eric Anholt <eric@anholt.net>
      Fixes: d5b1a78a ("drm/vc4: Add support for drawing 3D frames.")
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
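
The overflow in the shader_rec_offset roundup is easy to reproduce with 32-bit arithmetic. The sketch below is illustrative only: `roundup16()` and `layout_offset()` are hypothetical helpers, and the 16-byte alignment is an assumption chosen for the example, not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round x up to a multiple of 16 in 32-bit arithmetic; this wraps
 * around when x is within 15 of UINT32_MAX. */
static uint32_t roundup16(uint32_t x)
{
    return (uint32_t)(x + 15u) / 16u * 16u;
}

/* Compute the offset that follows a region of bin_cl_size bytes.
 * Returns false if the rounding wrapped, i.e. the rounded offset
 * ended up smaller than the size it was supposed to cover. */
static bool layout_offset(uint32_t bin_cl_size, uint32_t *shader_rec_offset)
{
    uint32_t off = roundup16(bin_cl_size);

    if (off < bin_cl_size)
        return false;            /* overflow: reject the request */
    *shader_rec_offset = off;
    return true;
}
```

For example, a size of `0xFFFFFFF8` rounds "up" to 0, so an allocation sized from that offset would be far smaller than the data later copied into it. Comparing the rounded value against the original size catches the wrap.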
    • drm/vc4: Fix memory leak of the CRTC state. · 32600835
      Eric Anholt authored
      commit 7622b255 upstream.
      
      The underscores variant frees the pointers inside, while the
      no-underscores variant calls underscores and then frees the struct.

      Signed-off-by: Eric Anholt <eric@anholt.net>
      Fixes: d8dbf44f ("drm/vc4: Make the CRTCs cooperate on allocating display lists.")
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/i915: Ignore bogus plane coordinates on SKL when the plane is not visible · 4c741e2a
      Ville Syrjälä authored
      commit 3bfdfdcb upstream.
      
      When the plane is invisible we may have all sorts of bogus stuff
      in the coordinates, which we must ignore or else we might fail the
      plane update. This started to happen on SKL when I moved the plane
      offset computation to happen in the check phase. Previously we
      happily ignored it all since we never called the update_plane hook
      with an invisible plane.
      
      Cc: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
      Cc: drm-intel-fixes@lists.freedesktop.org
      Fixes: b63a16f6 ("drm/i915: Compute display surface offset in the plane check hook for SKL+")
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98258
      Testcase: igt/pm_rpm/legacy-planes
      Testcase: igt/pm_rpm/universal-planes
      Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
      Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1478550057-24864-3-git-send-email-ville.syrjala@linux.intel.com
      (cherry picked from commit a5e4c7d0)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm: Fix broken VT switch with video=1366x768 option · f1dc9aae
      Takashi Iwai authored
      commit fdf35a6b upstream.
      
      I noticed that the VT switch doesn't work any longer with a Dell
      laptop with 1366x768 eDP when the machine is connected with a DP
      monitor.  It behaves as if VT were switched, but the graphics remain
      frozen.  Actually the keyboard works, so I could switch back to VT7
      again.
      
      I tried to track down the problem, and it took a long chain of steps
      to arrive at this error:
      
      - The machine is booted with video=1366x768 option (the distro
        installer seems to add it as default).
      - Recently, drm_helper_probe_single_connector_modes() deals with
        cmdline modes, and it tries to create a new mode when no
        matching mode is found.
      - drm_mode_create_from_cmdline_mode() creates a mode based on
        either CVT or GTF according to the given cmdline mode; in our case,
        it's 1366x768.
      - Since neither CVT nor GTF can express the width 1366 due to
        alignment, the resultant mode becomes 1368x768, slightly larger than
        the given size.
      - Later on, the atomic commit is performed, and in
        drm_atomic_check_only(), the size of each plane is checked.
      - The size check of 1366x768 fails due to the above, and eventually
        the whole VT switch fails.
      
      Historically, we've had a manual fix-up of 1368x768 in various
      places via c09dedb7 ("drm/edid: Add a workaround for 1366x768 HD
      panel"), but all of them are in drm_edid.c, where the modes are
      probed from EDID.  To address the problem above, we need a similar
      hack for the mode newly created from cmdline, manually adjusting the
      width when the expected size is 1366 but we get 1368 instead.
      
      Fixes: eaf99c74 ("drm: Perform cmdline mode parsing during...")
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170109145614.29454-1-tiwai@suse.de
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Peter Ujfalusi's avatar
      drm: Schedule the output_poll_work with 1s delay if we have delayed event · 2abb7f40
      Peter Ujfalusi authored
      commit 68f458ee upstream.
      
      Instead of scheduling the work immediately to handle the initial
      delayed event, schedule it with a 1s delay.
      
      This delay should not be needed, but Optimus/nouveau will fail in a
      mysterious way if the delayed event is handled as soon as possible like it
      is done in drm_helper_probe_single_connector_modes() in case the poll
      was enabled before.
      
      Reverting 339fd362 would give back the 10 sec (!) delay to handle the
      delayed event. Adding 1sec delay to the poll_work is enough to work around
      the issue in Optimus setups and gives shorter response on handling the
      initial delayed event.
      
      Fixes: 339fd362 ("drm: drm_probe_helper: Fix output_poll_work scheduling")
      Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
      [danvet: Add FIXME to the comment to make it stick out more.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170109143158.21917-1-peter.ujfalusi@ti.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Dave Martin's avatar
      tile/ptrace: Preserve previous registers for short regset write · e4be4d49
      Dave Martin authored
      commit fd7c9914 upstream.
      
      Ensure that if userspace supplies insufficient data to
      PTRACE_SETREGSET to fill all the registers, the thread's old
      registers are preserved.
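
A minimal sketch of the idea, assuming a hypothetical four-register thread state (`struct sketch_regs` and `sketch_set_regs()` are illustrative names, not the tile code): seed a temporary copy with the thread's current registers, then overwrite only the bytes the caller actually supplied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NREGS 4

/* Hypothetical thread state: a small register file. */
struct sketch_regs {
    unsigned long r[SKETCH_NREGS];
};

/*
 * Apply a possibly short write of 'count' bytes to the thread's registers.
 * The temporary starts from the thread's current values, so registers the
 * caller did not supply keep their old contents instead of stack garbage.
 * Returns 0 on success, -1 if the write is larger than the register file.
 */
static int sketch_set_regs(struct sketch_regs *thread, const void *buf, size_t count)
{
    struct sketch_regs tmp;

    assert(thread != NULL && buf != NULL);

    if (count > sizeof(tmp))
        return -1;

    tmp = *thread;              /* seed with the old registers */
    memcpy(&tmp, buf, count);   /* overwrite only what was supplied */
    *thread = tmp;
    return 0;
}
```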
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Kees Cook's avatar
      fbdev: color map copying bounds checking · 544160b6
      Kees Cook authored
      commit 2dc705a9 upstream.
      
      Copying color maps to userspace doesn't check the value of to->start,
      which can cause a kernel heap out-of-bounds read when the signed
      offset arithmetic wraps.
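
One way to bound such a copy, shown here as a hedged sketch rather than the upstream change: keep the offsets unsigned and reject any offset that falls outside either map before copying. `struct sketch_cmap` and `sketch_cmap_copy()` are hypothetical, single-channel simplifications invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified color map with a single channel. */
struct sketch_cmap {
    unsigned int start;    /* index of the first entry */
    unsigned int len;      /* number of entries */
    unsigned short *red;   /* 'len' entries */
};

/* Copy the overlapping range of 'from' into 'to'. Returns 0 or -1. */
static int sketch_cmap_copy(const struct sketch_cmap *from, struct sketch_cmap *to)
{
    unsigned int fromoff = 0, tooff = 0;
    size_t n;

    assert(from != NULL && to != NULL);

    /* Unsigned offsets: a huge 'start' cannot turn into a negative index. */
    if (to->start > from->start)
        fromoff = to->start - from->start;
    else
        tooff = from->start - to->start;

    /* Reject offsets outside either map before touching memory. */
    if (fromoff >= from->len || tooff >= to->len)
        return -1;

    /* Copy no more than what remains in both maps. */
    n = from->len - fromoff;
    if (to->len - tooff < n)
        n = to->len - tooff;

    memcpy(to->red + tooff, from->red + fromoff, n * sizeof(*to->red));
    return 0;
}
```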
      
      CVE-2016-8405
      
      Link: http://lkml.kernel.org/r/20170105224249.GA50925@beast
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reported-by: Peter Pi (@heisecode) of Trend Micro
      Cc: Min Chong <mchong@google.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 26 Jan, 2017 21 commits