1. 20 Feb, 2014 3 commits
    • KOSAKI Motohiro's avatar
      mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq · d89985cb
      KOSAKI Motohiro authored
      commit 227d53b3 upstream.
      
      To use spin_{un}lock_irq is dangerous if caller disabled interrupt.
      During aio buffer migration, we have a possibility to see the following
      call stack.
      
      aio_migratepage  [disable interrupt]
        migrate_page_copy
          clear_page_dirty_for_io
            set_page_dirty
              __set_page_dirty_buffers
                __set_page_dirty
                  spin_lock_irq
      
      This mean, current aio migration is a deadlockable.  spin_lock_irqsave
      is a safer alternative and we should use it.
      Signed-off-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: David Rientjes rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d89985cb
    • KOSAKI Motohiro's avatar
      mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq() · 4d4bed81
      KOSAKI Motohiro authored
      commit a85d9df1 upstream.
      
      During aio stress test, we observed the following lockdep warning.  This
      mean AIO+numa_balancing is currently deadlockable.
      
      The problem is, aio_migratepage disable interrupt, but
      __set_page_dirty_nobuffers unintentionally enable it again.
      
      Generally, all helper function should use spin_lock_irqsave() instead of
      spin_lock_irq() because they don't know caller at all.
      
         other info that might help us debug this:
          Possible unsafe locking scenario:
      
                CPU0
                ----
           lock(&(&ctx->completion_lock)->rlock);
           <Interrupt>
             lock(&(&ctx->completion_lock)->rlock);
      
          *** DEADLOCK ***
      
            dump_stack+0x19/0x1b
            print_usage_bug+0x1f7/0x208
            mark_lock+0x21d/0x2a0
            mark_held_locks+0xb9/0x140
            trace_hardirqs_on_caller+0x105/0x1d0
            trace_hardirqs_on+0xd/0x10
            _raw_spin_unlock_irq+0x2c/0x50
            __set_page_dirty_nobuffers+0x8c/0xf0
            migrate_page_copy+0x434/0x540
            aio_migratepage+0xb1/0x140
            move_to_new_page+0x7d/0x230
            migrate_pages+0x5e5/0x700
            migrate_misplaced_page+0xbc/0xf0
            do_numa_page+0x102/0x190
            handle_pte_fault+0x241/0x970
            handle_mm_fault+0x265/0x370
            __do_page_fault+0x172/0x5a0
            do_page_fault+0x1a/0x70
            page_fault+0x28/0x30
      Signed-off-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d4bed81
    • Stephen Smalley's avatar
      SELinux: Fix kernel BUG on empty security contexts. · a0f916d4
      Stephen Smalley authored
      commit 2172fa70 upstream.
      
      Setting an empty security context (length=0) on a file will
      lead to incorrectly dereferencing the type and other fields
      of the security context structure, yielding a kernel BUG.
      As a zero-length security context is never valid, just reject
      all such security contexts whether coming from userspace
      via setxattr or coming from the filesystem upon a getxattr
      request by SELinux.
      
      Setting a security context value (empty or otherwise) unknown to
      SELinux in the first place is only possible for a root process
      (CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
      if the corresponding SELinux mac_admin permission is also granted
      to the domain by policy.  In Fedora policies, this is only allowed for
      specific domains such as livecd for setting down security contexts
      that are not defined in the build host policy.
      
      Reproducer:
      su
      setenforce 0
      touch foo
      setfattr -n security.selinux foo
      
      Caveat:
      Relabeling or removing foo after doing the above may not be possible
      without booting with SELinux disabled.  Any subsequent access to foo
      after doing the above will also trigger the BUG.
      
      BUG output from Matthew Thode:
      [  473.893141] ------------[ cut here ]------------
      [  473.962110] kernel BUG at security/selinux/ss/services.c:654!
      [  473.995314] invalid opcode: 0000 [#6] SMP
      [  474.027196] Modules linked in:
      [  474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G      D   I
      3.13.0-grsec #1
      [  474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
      07/29/10
      [  474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti:
      ffff8805f50cd488
      [  474.183707] RIP: 0010:[<ffffffff814681c7>]  [<ffffffff814681c7>]
      context_struct_compute_av+0xce/0x308
      [  474.219954] RSP: 0018:ffff8805c0ac3c38  EFLAGS: 00010246
      [  474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX:
      0000000000000100
      [  474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI:
      ffff8805e8aaa000
      [  474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09:
      0000000000000006
      [  474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12:
      0000000000000006
      [  474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15:
      0000000000000000
      [  474.453816] FS:  00007f2e75220800(0000) GS:ffff88061fc00000(0000)
      knlGS:0000000000000000
      [  474.489254] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4:
      00000000000207f0
      [  474.556058] Stack:
      [  474.584325]  ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98
      ffff8805f1190a40
      [  474.618913]  ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990
      ffff8805e8aac860
      [  474.653955]  ffff8805c0ac3cb8 000700068113833a ffff880606c75060
      ffff8805c0ac3d94
      [  474.690461] Call Trace:
      [  474.723779]  [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a
      [  474.778049]  [<ffffffff81468824>] security_compute_av+0xf4/0x20b
      [  474.811398]  [<ffffffff8196f419>] avc_compute_av+0x2a/0x179
      [  474.843813]  [<ffffffff8145727b>] avc_has_perm+0x45/0xf4
      [  474.875694]  [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31
      [  474.907370]  [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e
      [  474.938726]  [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22
      [  474.970036]  [<ffffffff811b057d>] vfs_getattr+0x19/0x2d
      [  475.000618]  [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91
      [  475.030402]  [<ffffffff811b063b>] vfs_lstat+0x19/0x1b
      [  475.061097]  [<ffffffff811b077e>] SyS_newlstat+0x15/0x30
      [  475.094595]  [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3
      [  475.148405]  [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b
      [  475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
      8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
      75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
      [  475.255884] RIP  [<ffffffff814681c7>]
      context_struct_compute_av+0xce/0x308
      [  475.296120]  RSP <ffff8805c0ac3c38>
      [  475.328734] ---[ end trace f076482e9d754adc ]---
      Reported-by: default avatarMatthew Thode <mthode@mthode.org>
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0f916d4
  2. 13 Feb, 2014 31 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.4.80 · a6d2ebcd
      Greg Kroah-Hartman authored
      a6d2ebcd
    • Colin Cross's avatar
      3.4.y: timekeeping: fix 32-bit overflow in get_monotonic_boottime · cd34de10
      Colin Cross authored
      fixed upstream in v3.6 by ec145bab
      
      get_monotonic_boottime adds three nanonsecond values stored
      in longs, followed by an s64.  If the long values are all
      close to 1e9 the first three additions can overflow and
      become negative when added to the s64.  Cast the first
      value to s64 so that all additions are 64 bit.
      Signed-off-by: default avatarColin Cross <ccross@android.com>
      [jstultz: Fished this out of the AOSP commong.git tree. This was
      fixed upstream in v3.6 by ec145bab]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd34de10
    • John Stultz's avatar
      timekeeping: Avoid possible deadlock from clock_was_set_delayed · cf85cc93
      John Stultz authored
      commit 6fdda9a9 upstream.
      
      As part of normal operaions, the hrtimer subsystem frequently calls
      into the timekeeping code, creating a locking order of
        hrtimer locks -> timekeeping locks
      
      clock_was_set_delayed() was suppoed to allow us to avoid deadlocks
      between the timekeeping the hrtimer subsystem, so that we could
      notify the hrtimer subsytem the time had changed while holding
      the timekeeping locks. This was done by scheduling delayed work
      that would run later once we were out of the timekeeing code.
      
      But unfortunately the lock chains are complex enoguh that in
      scheduling delayed work, we end up eventually trying to grab
      an hrtimer lock.
      
      Sasha Levin noticed this in testing when the new seqlock lockdep
      enablement triggered the following (somewhat abrieviated) message:
      
      [  251.100221] ======================================================
      [  251.100221] [ INFO: possible circular locking dependency detected ]
      [  251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
      [  251.101967] -------------------------------------------------------
      [  251.101967] kworker/10:1/4506 is trying to acquire lock:
      [  251.101967]  (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70
      [  251.101967]
      [  251.101967] but task is already holding lock:
      [  251.101967]  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
      [  251.101967]
      [  251.101967] which lock already depends on the new lock.
      [  251.101967]
      [  251.101967]
      [  251.101967] the existing dependency chain (in reverse order) is:
      [  251.101967]
      -> #5 (hrtimer_bases.lock#11){-.-...}:
      [snipped]
      -> #4 (&rt_b->rt_runtime_lock){-.-...}:
      [snipped]
      -> #3 (&rq->lock){-.-.-.}:
      [snipped]
      -> #2 (&p->pi_lock){-.-.-.}:
      [snipped]
      -> #1 (&(&pool->lock)->rlock){-.-...}:
      [  251.101967]        [<ffffffff81194803>] validate_chain+0x6c3/0x7b0
      [  251.101967]        [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580
      [  251.101967]        [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0
      [  251.101967]        [<ffffffff84398500>] _raw_spin_lock+0x40/0x80
      [  251.101967]        [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0
      [  251.101967]        [<ffffffff81154168>] queue_work_on+0x98/0x120
      [  251.101967]        [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30
      [  251.101967]        [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160
      [  251.101967]        [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70
      [  251.101967]        [<ffffffff843a4b49>] ia32_sysret+0x0/0x5
      [  251.101967]
      -> #0 (timekeeper_seq){----..}:
      [snipped]
      [  251.101967] other info that might help us debug this:
      [  251.101967]
      [  251.101967] Chain exists of:
        timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11
      
      [  251.101967]  Possible unsafe locking scenario:
      [  251.101967]
      [  251.101967]        CPU0                    CPU1
      [  251.101967]        ----                    ----
      [  251.101967]   lock(hrtimer_bases.lock#11);
      [  251.101967]                                lock(&rt_b->rt_runtime_lock);
      [  251.101967]                                lock(hrtimer_bases.lock#11);
      [  251.101967]   lock(timekeeper_seq);
      [  251.101967]
      [  251.101967]  *** DEADLOCK ***
      [  251.101967]
      [  251.101967] 3 locks held by kworker/10:1/4506:
      [  251.101967]  #0:  (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
      [  251.101967]  #1:  (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
      [  251.101967]  #2:  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
      [  251.101967]
      [  251.101967] stack backtrace:
      [  251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053
      [  251.101967] Workqueue: events clock_was_set_work
      
      So the best solution is to avoid calling clock_was_set_delayed() while
      holding the timekeeping lock, and instead using a flag variable to
      decide if we should call clock_was_set() once we've released the locks.
      
      This works for the case here, where the do_adjtimex() was the deadlock
      trigger point. Unfortuantely, in update_wall_time() we still hold
      the jiffies lock, which would deadlock with the ipi triggered by
      clock_was_set(), preventing us from calling it even after we drop the
      timekeeping lock. So instead call clock_was_set_delayed() at that point.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      cf85cc93
    • Borislav Petkov's avatar
      rtc-cmos: Add an alarm disable quirk · ab99a94d
      Borislav Petkov authored
      commit d5a1c7e3 upstream.
      
      41c7f742 ("rtc: Disable the alarm in the hardware (v2)") added the
      functionality to disable the RTC wake alarm when shutting down the box.
      
      However, there are at least two b0rked BIOSes we know about:
      
      https://bugzilla.novell.com/show_bug.cgi?id=812592
      https://bugzilla.novell.com/show_bug.cgi?id=805740
      
      where, when wakeup alarm is enabled in the BIOS, the machine reboots
      automatically right after shutdown, regardless of what wakeup time is
      programmed.
      
      Bisecting the issue lead to this patch so disable its functionality with
      a DMI quirk only for those boxes.
      
      Cc: Brecht Machiels <brecht@mos6581.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Rabin Vincent <rabin.vincent@stericsson.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      [jstultz: Changed variable name for clarity, added extra dmi entry]
      Tested-by: default avatarBrecht Machiels <brecht@mos6581.org>
      Tested-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab99a94d
    • Ying Xue's avatar
      sched/rt: Avoid updating RT entry timeout twice within one tick period · dbf32394
      Ying Xue authored
      commit 57d2aa00 upstream.
      
      The issue below was found in 2.6.34-rt rather than mainline rt
      kernel, but the issue still exists upstream as well.
      
      So please let me describe how it was noticed on 2.6.34-rt:
      
      On this version, each softirq has its own thread, it means there
      is at least one RT FIFO task per cpu. The priority of these
      tasks is set to 49 by default. If user launches an RT FIFO task
      with priority lower than 49 of softirq RT tasks, it's possible
      there are two RT FIFO tasks enqueued one cpu runqueue at one
      moment. By current strategy of balancing RT tasks, when it comes
      to RT tasks, we really need to put them off to a CPU that they
      can run on as soon as possible. Even if it means a bit of cache
      line flushing, we want RT tasks to be run with the least latency.
      
      When the user RT FIFO task which just launched before is
      running, the sched timer tick of the current cpu happens. In this
      tick period, the timeout value of the user RT task will be
      updated once. Subsequently, we try to wake up one softirq RT
      task on its local cpu. As the priority of current user RT task
      is lower than the softirq RT task, the current task will be
      preempted by the higher priority softirq RT task. Before
      preemption, we check to see if current can readily move to a
      different cpu. If so, we will reschedule to allow the RT push logic
      to try to move current somewhere else. Whenever the woken
      softirq RT task runs, it first tries to migrate the user FIFO RT
      task over to a cpu that is running a task of lesser priority. If
      migration is done, it will send a reschedule request to the found
      cpu by IPI interrupt. Once the target cpu responds the IPI
      interrupt, it will pick the migrated user RT task to preempt its
      current task. When the user RT task is running on the new cpu,
      the sched timer tick of the cpu fires. So it will tick the user
      RT task again. This also means the RT task timeout value will be
      updated again. As the migration may be done in one tick period,
      it means the user RT task timeout value will be updated twice
      within one tick.
      
      If we set a limit on the amount of cpu time for the user RT task
      by setrlimit(RLIMIT_RTTIME), the SIGXCPU signal should be posted
      upon reaching the soft limit.
      
      But exactly when the SIGXCPU signal should be sent depends on the
      RT task timeout value. In fact the timeout mechanism of sending
      the SIGXCPU signal assumes the RT task timeout is increased once
      every tick.
      
      However, currently the timeout value may be added twice per
      tick. So it results in the SIGXCPU signal being sent earlier
      than expected.
      
      To solve this issue, we prevent the timeout value from increasing
      twice within one tick time by remembering the jiffies value of
      last updating the timeout. As long as the RT task's jiffies is
      different with the global jiffies value, we allow its timeout to
      be updated.
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarFan Du <fan.du@windriver.com>
      Reviewed-by: default avatarYong Zhang <yong.zhang0@gmail.com>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1342508623-2887-1-git-send-email-ying.xue@windriver.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [ lizf: backported to 3.4: adjust context ]
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dbf32394
    • Peter Boonstoppel's avatar
      sched: Unthrottle rt runqueues in __disable_runtime() · f61eb9ce
      Peter Boonstoppel authored
      commit a4c96ae3 upstream.
      
      migrate_tasks() uses _pick_next_task_rt() to get tasks from the
      real-time runqueues to be migrated. When rt_rq is throttled
      _pick_next_task_rt() won't return anything, in which case
      migrate_tasks() can't move all threads over and gets stuck in an
      infinite loop.
      
      Instead unthrottle rt runqueues before migrating tasks.
      
      Additionally: move unthrottle_offline_cfs_rqs() to rq_offline_fair()
      Signed-off-by: default avatarPeter Boonstoppel <pboonstoppel@nvidia.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/5FBF8E85CA34454794F0F7ECBA79798F379D3648B7@HQMAIL04.nvidia.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [ lizf: backported to 3.4: adjust context ]
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f61eb9ce
    • Mike Galbraith's avatar
      sched,rt: fix isolated CPUs leaving root_task_group indefinitely throttled · 1e5c13ec
      Mike Galbraith authored
      commit e221d028 upstream.
      
      Root task group bandwidth replenishment must service all CPUs, regardless of
      where the timer was last started, and regardless of the isolation mechanism,
      lest 'Quoth the Raven, "Nevermore"' become rt scheduling policy.
      Signed-off-by: default avatarMike Galbraith <efault@gmx.de>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1344326558.6968.25.camel@marge.simpson.netSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e5c13ec
    • Colin Cross's avatar
      sched/rt: Fix SCHED_RR across cgroups · 21b53baf
      Colin Cross authored
      commit 454c7999 upstream.
      
      task_tick_rt() has an optimization to only reschedule SCHED_RR tasks
      if they were the only element on their rq.  However, with cgroups
      a SCHED_RR task could be the only element on its per-cgroup rq but
      still be competing with other SCHED_RR tasks in its parent's
      cgroup.  In this case, the SCHED_RR task in the child cgroup would
      never yield at the end of its timeslice.  If the child cgroup
      rt_runtime_us was the same as the parent cgroup rt_runtime_us,
      the task in the parent cgroup would starve completely.
      
      Modify task_tick_rt() to check that the task is the only task on its
      rq, and that the each of the scheduling entities of its ancestors
      is also the only entity on its rq.
      Signed-off-by: default avatarColin Cross <ccross@android.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1337229266-15798-1-git-send-email-ccross@android.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21b53baf
    • Al Viro's avatar
      hpfs: deadlock and race in directory lseek() · d5c20298
      Al Viro authored
      commit 31abdab9 upstream.
      
      For one thing, there's an ABBA deadlock on hpfs fs-wide lock and i_mutex
      in hpfs_dir_lseek() - there's a lot of methods that grab the former with
      the caller already holding the latter, so it must take i_mutex first.
      
      For another, locking the damn thing, carefully validating the offset,
      then dropping locks and assigning the offset is obviously racy.
      
      Moreover, we _must_ do hpfs_add_pos(), or the machinery in dnode.c
      won't modify the sucker on B-tree surgeries.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5c20298
    • Yijing Wang's avatar
      PCI: Enable ARI if dev and upstream bridge support it; disable otherwise · d7c16d1e
      Yijing Wang authored
      commit b0cc6020 upstream.
      
      Currently, we enable ARI in a device's upstream bridge if the bridge and
      the device support it.  But we never disable ARI, even if the device is
      removed and replaced with a device that doesn't support ARI.
      
      This means that if we hot-remove an ARI device and replace it with a
      non-ARI multi-function device, we find only function 0 of the new device
      because the upstream bridge still has ARI enabled, and next_ari_fn()
      only returns function 0 for the new non-ARI device.
      
      This patch disables ARI in the upstream bridge if the device doesn't
      support ARI.  See the PCIe spec, r3.0, sec 6.13.
      
      [bhelgaas: changelog, function comment]
      [yijing: replace PCIe Cap accessor with legacy PCI accessor]
      Signed-off-by: default avatarYijing Wang <wangyijing@huawei.com>
      Signed-off-by: default avatarJiang Liu <jiang.liu@huawei.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      d7c16d1e
    • Alex Deucher's avatar
      drm/radeon/DCE4+: clear bios scratch dpms bit (v2) · 27fb12b9
      Alex Deucher authored
      commit 6802d4ba upstream.
      
      The BlankCrtc table in some DCE8 boards has some
      logic shortcuts for the vbios when this bit is set.
      Clear it for driver use.
      
      v2: fix typo
      
      Bug:
      https://bugs.freedesktop.org/show_bug.cgi?id=73420Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      27fb12b9
    • Alex Deucher's avatar
      drm/radeon: set the full cache bit for fences on r7xx+ · 00a67d1c
      Alex Deucher authored
      commit d45b964a upstream.
      
      Needed to properly flush the read caches for fences.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      00a67d1c
    • Marek Olšák's avatar
      drm/radeon: skip colorbuffer checking if COLOR_INFO.FORMAT is set to INVALID · 790d8449
      Marek Olšák authored
      commit 56492e0f upstream.
      
      This fixes a bug which was causing rejections of valid GPU commands
      from userspace.
      Signed-off-by: default avatarMarek Olšák <marek.olsak@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      790d8449
    • Michel Dänzer's avatar
      radeon/pm: Guard access to rdev->pm.power_state array · e4496194
      Michel Dänzer authored
      commit 37016951 upstream.
      
      It's never allocated on systems without an ATOMBIOS or COMBIOS ROM.
      
      Should fix an oops I encountered while resetting the GPU after a lockup
      on my PowerBook with an RV350.
      Signed-off-by: default avatarMichel Dänzer <michel.daenzer@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4496194
    • Alex Deucher's avatar
      drm/radeon: warn users when hw_i2c is enabled (v2) · aea570ea
      Alex Deucher authored
      commit d1951782 upstream.
      
      The hw i2c engines are disabled by default as the
      current implementation is still experimental.  Print
      a warning when users enable it so that it's obvious
      when the option is enabled.
      
      v2: check for non-0 rather than 1
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aea570ea
    • Joe Thornber's avatar
      dm space map common: make sure new space is used during extend · 11690e14
      Joe Thornber authored
      commit 12c91a5c upstream.
      
      When extending a low level space map we should update nr_blocks at
      the start so the new space is used for the index entries.
      
      Otherwise extend can fail, e.g.: sm_metadata_extend call sequence
      that fails:
       -> sm_ll_extend
          -> dm_tm_new_block -> dm_sm_new_block -> sm_bootstrap_new_block
          => returns -ENOSPC because smm->begin == smm->ll.nr_blocks
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      11690e14
    • Mikulas Patocka's avatar
      dm: wait until embedded kobject is released before destroying a device · 39bbeb69
      Mikulas Patocka authored
      commit be35f486 upstream.
      
      There may be other parts of the kernel holding a reference on the dm
      kobject.  We must wait until all references are dropped before
      deallocating the mapped_device structure.
      
      The dm_kobject_release method signals that all references are dropped
      via completion.  But dm_kobject_release doesn't free the kobject (which
      is embedded in the mapped_device structure).
      
      This is the sequence of operations:
      * when destroying a DM device, call kobject_put from dm_sysfs_exit
      * wait until all users stop using the kobject, when it happens the
        release method is called
      * the release method signals the completion and should return without
        delay
      * the dm device removal code that waits on the completion continues
      * the dm device removal code drops the dm_mod reference the device had
      * the dm device removal code frees the mapped_device structure that
        contains the kobject
      
      Using kobject this way should avoid the module unload race that was
      mentioned at the beginning of this thread:
      https://lkml.org/lkml/2014/1/4/83Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      39bbeb69
    • Weston Andros Adamson's avatar
      sunrpc: Fix infinite loop in RPC state machine · b0c0d5a3
      Weston Andros Adamson authored
      commit 6ff33b7d upstream.
      
      When a task enters call_refreshresult with status 0 from call_refresh and
      !rpcauth_uptodatecred(task) it enters call_refresh again with no rate-limiting
      or max number of retries.
      
      Instead of trying forever, make use of the retry path that other errors use.
      
      This only seems to be possible when the crrefresh callback is gss_refresh_null,
      which only happens when destroying the context.
      
      To reproduce:
      
      1) mount with sec=krb5 (or sec=sys with krb5 negotiated for non FSID specific
         operations).
      
      2) reboot - the client will be stuck and will need to be hard rebooted
      
      BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:46]
      Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 parport_pc parport shpchp nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy
      irq event stamp: 195724
      hardirqs last  enabled at (195723): [<ffffffff814a925c>] restore_args+0x0/0x30
      hardirqs last disabled at (195724): [<ffffffff814b0a6a>] apic_timer_interrupt+0x6a/0x80
      softirqs last  enabled at (195722): [<ffffffff8103f583>] __do_softirq+0x1df/0x276
      softirqs last disabled at (195717): [<ffffffff8103f852>] irq_exit+0x53/0x9a
      CPU: 0 PID: 46 Comm: kworker/0:2 Not tainted 3.13.0-rc3-branch-dros_testing+ #4
      Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
      Workqueue: rpciod rpc_async_schedule [sunrpc]
      task: ffff8800799c4260 ti: ffff880079002000 task.ti: ffff880079002000
      RIP: 0010:[<ffffffffa0064fd4>]  [<ffffffffa0064fd4>] __rpc_execute+0x8a/0x362 [sunrpc]
      RSP: 0018:ffff880079003d18  EFLAGS: 00000246
      RAX: 0000000000000005 RBX: 0000000000000007 RCX: 0000000000000007
      RDX: 0000000000000007 RSI: ffff88007aecbae8 RDI: ffff8800783d8900
      RBP: ffff880079003d78 R08: ffff88006e30e9f8 R09: ffffffffa005a3d7
      R10: ffff88006e30e7b0 R11: ffff8800783d8900 R12: ffffffffa006675e
      R13: ffff880079003ce8 R14: ffff88006e30e7b0 R15: ffff8800783d8900
      FS:  0000000000000000(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f3072333000 CR3: 0000000001a0b000 CR4: 00000000001407f0
      Stack:
       ffff880079003d98 0000000000000246 0000000000000000 ffff88007a9a4830
       ffff880000000000 ffffffff81073f47 ffff88007f212b00 ffff8800799c4260
       ffff8800783d8988 ffff88007f212b00 ffffe8ffff604800 0000000000000000
      Call Trace:
       [<ffffffff81073f47>] ? trace_hardirqs_on_caller+0x145/0x1a1
       [<ffffffffa00652d3>] rpc_async_schedule+0x27/0x32 [sunrpc]
       [<ffffffff81052974>] process_one_work+0x211/0x3a5
       [<ffffffff810528d5>] ? process_one_work+0x172/0x3a5
       [<ffffffff81052eeb>] worker_thread+0x134/0x202
       [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280
       [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280
       [<ffffffff810584a0>] kthread+0xc9/0xd1
       [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61
       [<ffffffff814afd6c>] ret_from_fork+0x7c/0xb0
       [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61
      Code: e8 87 63 fd e0 c6 05 10 dd 01 00 01 48 8b 43 70 4c 8d 6b 70 45 31 e4 a8 02 0f 85 d5 02 00 00 4c 8b 7b 48 48 c7 43 48 00 00 00 00 <4c> 8b 4b 50 4d 85 ff 75 0c 4d 85 c9 4d 89 cf 0f 84 32 01 00 00
      
      And the output of "rpcdebug -m rpc -s all":
      
      RPC:    61 call_refresh (status 0)
      RPC:    61 call_refresh (status 0)
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      RPC:    61 call_refreshresult (status 0)
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      RPC:    61 call_refreshresult (status 0)
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      RPC:    61 call_refresh (status 0)
      RPC:    61 call_refreshresult (status 0)
      RPC:    61 call_refresh (status 0)
      RPC:    61 call_refresh (status 0)
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      RPC:    61 call_refreshresult (status 0)
      RPC:    61 call_refresh (status 0)
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      RPC:    61 call_refresh (status 0)
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      RPC:    61 call_refreshresult (status 0)
      RPC:    61 call_refresh (status 0)
      RPC:    61 call_refresh (status 0)
      RPC:    61 call_refresh (status 0)
      RPC:    61 call_refresh (status 0)
      RPC:    61 call_refreshresult (status 0)
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      Signed-off-by: default avatarWeston Andros Adamson <dros@netapp.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0c0d5a3
    • Weston Andros Adamson's avatar
      nfs4.1: properly handle ENOTSUP in SECINFO_NO_NAME · b5f0608e
      Weston Andros Adamson authored
      commit 78b19bae upstream.
      
      Don't check for -NFS4ERR_NOTSUPP, it's already been mapped to -ENOTSUPP
      by nfs4_stat_to_errno.
      
      This allows the client to mount v4.1 servers that don't support
      SECINFO_NO_NAME by falling back to the "guess and check" method of
      nfs4_find_root_sec.
      Signed-off-by: default avatarWeston Andros Adamson <dros@primarydata.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b5f0608e
    • Trond Myklebust's avatar
      NFSv4: OPEN must handle the NFS4ERR_IO return code correctly · 3c16dfe2
      Trond Myklebust authored
      commit c7848f69 upstream.
      
      decode_op_hdr() cannot distinguish between an XDR decoding error and
      the perfectly valid errorcode NFS4ERR_IO. This is normally not a
      problem, but for the particular case of OPEN, we need to be able
      to increment the NFSv4 open sequence id when the server returns
      a valid response.
      Reported-by: default avatarJ Bruce Fields <bfields@fieldses.org>
      Link: http://lkml.kernel.org/r/20131204210356.GA19452@fieldses.orgSigned-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c16dfe2
    • Daniel Santos's avatar
      spidev: fix hang when transfer_one_message fails · 8b939c23
      Daniel Santos authored
      commit e120cc0d upstream.
      
      This corrects a problem in spi_pump_messages() that leads to an spi
      message hanging forever when a call to transfer_one_message() fails.
      This failure occurs in my MCP2210 driver when the cs_change bit is set
      on the last transfer in a message, an operation which the hardware does
      not support.
      
      Rationale
      Since the transfer_one_message() returns an int, we must presume that it
      may fail.  If transfer_one_message() should never fail, it should return
      void.  Thus, calls to transfer_one_message() should properly manage a
      failure.
      
      Fixes: ffbbdd21 (spi: create a message queueing infrastructure)
      Signed-off-by: default avatarDaniel Santos <daniel.santos@pobox.com>
      Signed-off-by: default avatarMark Brown <broonie@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b939c23
    • Ira Weiny's avatar
      IB/qib: Fix QP check when looping back to/from QP1 · 2332020b
      Ira Weiny authored
      commit 6e0ea9e6 upstream.
      
      The GSI QP type is compatible with and should be allowed to send data
      to/from any UD QP.  This was found when testing ibacm on the same node
      as an SA.
      Reviewed-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarIra Weiny <ira.weiny@intel.com>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2332020b
    • Boaz Harrosh's avatar
      ore: Fix wrong math in allocation of per device BIO · 0cc70c5e
      Boaz Harrosh authored
      commit aad560b7 upstream.
      
      At IO preparation we calculate the max pages at each device and
      allocate a BIO per device of that size. The calculation was wrong
      on some unaligned corner cases offset/length combination and would
      make prepare return with -ENOMEM. This would be bad for pnfs-objects
      that would in that case IO through MDS. And fatal for exofs were it
      would fail writes with EIO.
      
      Fix it by doing the proper math, that will work in all cases. (I
      ran a test with all possible offset/length combinations this time
      round).
      
      Also when reading we do not need to allocate for the parity units
      since we jump over them.
      
      Also lower the max_io_length to take into account the parity pages
      so not to allocate BIOs bigger than PAGE_SIZE
      Signed-off-by: default avatarBoaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0cc70c5e
    • Michael Grzeschik's avatar
      mtd: mxc_nand: remove duplicated ecc_stats counting · 0b909374
      Michael Grzeschik authored
      commit 05664777 upstream.
      
      The ecc_stats.corrected count variable will already be incremented in
      the above framework-layer just after this callback.
      Signed-off-by: default avatarMichael Grzeschik <m.grzeschik@pengutronix.de>
      Signed-off-by: default avatarBrian Norris <computersforpeace@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b909374
    • Mark Brown's avatar
      ACPI / init: Flag use of ACPI and ACPI idioms for power supplies to regulator API · 3111943e
      Mark Brown authored
      commit 49a12877 upstream.
      
      There is currently no facility in ACPI to express the hookup of voltage
      regulators, the expectation is that the regulators that exist in the
      system will be handled transparently by firmware if they need software
      control at all. This means that if for some reason the regulator API is
      enabled on such a system it should assume that any supplies that devices
      need are provided by the system at all relevant times without any software
      intervention.
      
      Tell the regulator core to make this assumption by calling
      regulator_has_full_constraints(). Do this as soon as we know we are using
      ACPI so that the information is available to the regulator core as early
      as possible. This will cause the regulator core to pretend that there is
      an always on regulator supplying any supply that is requested but that has
      not otherwise been mapped which is the behaviour expected on a system with
      ACPI.
      
      Should the ability to specify regulators be added in future revisions of
      ACPI then once we have support for ACPI mappings in the kernel the same
      assumptions will apply. It is also likely that systems will default to a
      mode of operation which does not require any interpretation of these
      mappings in order to be compatible with existing operating system releases
      so it should remain safe to make these assumptions even if the mappings
      exist but are not supported by the kernel.
      Signed-off-by: default avatarMark Brown <broonie@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3111943e
    • Josh Triplett's avatar
      turbostat: Use GCC's CPUID functions to support PIC · 4c6544f6
      Josh Triplett authored
      commit 2b92865e upstream.
      
      turbostat uses inline assembly to call cpuid.  On 32-bit x86, on systems
      that have certain security features enabled by default that make -fPIC
      the default, this causes a build error:
      
      turbostat.c: In function ‘check_cpuid’:
      turbostat.c:1906:2: error: PIC register clobbered by ‘ebx’ in ‘asm’
        asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx");
        ^
      
      GCC provides a header cpuid.h, containing a __get_cpuid function that
      works with both PIC and non-PIC.  (On PIC, it saves and restores ebx
      around the cpuid instruction.)  Use that instead.
      Signed-off-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c6544f6
    • Li Zefan's avatar
      slub: Fix calculation of cpu slabs · b06c0a0c
      Li Zefan authored
      commit 8afb1474 upstream.
      
        /sys/kernel/slab/:t-0000048 # cat cpu_slabs
        231 N0=16 N1=215
        /sys/kernel/slab/:t-0000048 # cat slabs
        145 N0=36 N1=109
      
      See, the number of slabs is smaller than that of cpu slabs.
      
      The bug was introduced by commit 49e22585
      ("slub: per cpu cache for partial pages").
      
      We should use page->pages instead of page->pobjects when calculating
      the number of cpu partial slabs. This also fixes the mapping of slabs
      and nodes.
      
      As there's no variable storing the number of total/active objects in
      cpu partial slabs, and we don't have user interfaces requiring those
      statistics, I just add WARN_ON for those cases.
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Reviewed-by: default avatarWanpeng Li <liwanp@linux.vnet.ibm.com>
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarPekka Enberg <penberg@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b06c0a0c
    • Ludovic Desroches's avatar
      mmc: atmel-mci: fix timeout errors in SDIO mode when using DMA · 6fadde4e
      Ludovic Desroches authored
      commit 66b512ed upstream.
      
      With some SDIO devices, timeout errors can happen when reading data.
      To solve this issue, the DMA transfer has to be activated before sending
      the command to the device. This order is incorrect in PDC mode. So we
      have to take care if we are using DMA or PDC to know when to send the
      MMC command.
      Signed-off-by: default avatarLudovic Desroches <ludovic.desroches@atmel.com>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: default avatarChris Ball <cjb@laptop.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6fadde4e
    • AKASHI Takahiro's avatar
      audit: correct a type mismatch in audit_syscall_exit() · cd1188e7
      AKASHI Takahiro authored
      commit 06bdadd7 upstream.
      
      audit_syscall_exit() saves a result of regs_return_value() in intermediate
      "int" variable and passes it to __audit_syscall_exit(), which expects its
      second argument as a "long" value.  This will result in truncating the
      value returned by a system call and making a wrong audit record.
      
      I don't know why gcc compiler doesn't complain about this, but anyway it
      causes a problem at runtime on arm64 (and probably most 64-bit archs).
      Signed-off-by: default avatarAKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd1188e7
    • Alex Williamson's avatar
      intel-iommu: fix off-by-one in pagetable freeing · 55931654
      Alex Williamson authored
      commit 08336fd2 upstream.
      
      dma_pte_free_level() has an off-by-one error when checking whether a pte
      is completely covered by a range.  Take for example the case of
      attempting to free pfn 0x0 - 0x1ff, ie.  512 entries covering the first
      2M superpage.
      
      The level_size() is 0x200 and we test:
      
        static void dma_pte_free_level(...
      	...
      
      	if (!(0 > 0 || 0x1ff < 0 + 0x200)) {
      		...
      	}
      
      Clearly the 2nd test is true, which means we fail to take the branch to
      clear and free the pagetable entry.  As a result, we're leaking
      pagetables and failing to install new pages over the range.
      
      This was found with a PCI device assigned to a QEMU guest using vfio-pci
      without a VGA device present.  The first 1M of guest address space is
      mapped with various combinations of 4K pages, but eventually the range
      is entirely freed and replaced with a 2M contiguous mapping.
      intel-iommu errors out with something like:
      
        ERROR: DMA PTE for vPFN 0x0 already set (to 5c2b8003 not 849c00083)
      
      In this case 5c2b8003 is the pointer to the previous leaf page that was
      neither freed nor cleared and 849c00083 is the superpage entry that
      we're trying to replace it with.
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      55931654
    • Tetsuo Handa's avatar
      SELinux: Fix memory leak upon loading policy · ef609edc
      Tetsuo Handa authored
      commit 8ed81460 upstream.
      
      Hello.
      
      I got below leak with linux-3.10.0-54.0.1.el7.x86_64 .
      
      [  681.903890] kmemleak: 5538 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      
      Below is a patch, but I don't know whether we need special handing for undoing
      ebitmap_set_bit() call.
      ----------
      >>From fe97527a90fe95e2239dfbaa7558f0ed559c0992 Mon Sep 17 00:00:00 2001
      From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Date: Mon, 6 Jan 2014 16:30:21 +0900
      Subject: SELinux: Fix memory leak upon loading policy
      
      Commit 2463c26d "SELinux: put name based create rules in a hashtable" did not
      check return value from hashtab_insert() in filename_trans_read(). It leaks
      memory if hashtab_insert() returns error.
      
        unreferenced object 0xffff88005c9160d0 (size 8):
          comm "systemd", pid 1, jiffies 4294688674 (age 235.265s)
          hex dump (first 8 bytes):
            57 0b 00 00 6b 6b 6b a5                          W...kkk.
          backtrace:
            [<ffffffff816604ae>] kmemleak_alloc+0x4e/0xb0
            [<ffffffff811cba5e>] kmem_cache_alloc_trace+0x12e/0x360
            [<ffffffff812aec5d>] policydb_read+0xd1d/0xf70
            [<ffffffff812b345c>] security_load_policy+0x6c/0x500
            [<ffffffff812a623c>] sel_write_load+0xac/0x750
            [<ffffffff811eb680>] vfs_write+0xc0/0x1f0
            [<ffffffff811ec08c>] SyS_write+0x4c/0xa0
            [<ffffffff81690419>] system_call_fastpath+0x16/0x1b
            [<ffffffffffffffff>] 0xffffffffffffffff
      
      However, we should not return EEXIST error to the caller, or the systemd will
      show below message and the boot sequence freezes.
      
        systemd[1]: Failed to load SELinux policy. Freezing.
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ef609edc
  3. 06 Feb, 2014 6 commits