1. 17 Jul, 2014 33 commits
  2. 16 Jul, 2014 6 commits
• rtmutex: Plug slow unlock race · 0c57bb13
      Thomas Gleixner authored
      commit 27e35715 upstream.
      
      When the rtmutex fast path is enabled the slow unlock function can
      create the following situation:
      
      spin_lock(foo->m->wait_lock);
      foo->m->owner = NULL;
      	    			rt_mutex_lock(foo->m); <-- fast path
      				free = atomic_dec_and_test(foo->refcnt);
      				rt_mutex_unlock(foo->m); <-- fast path
      				if (free)
      				   kfree(foo);
      
      spin_unlock(foo->m->wait_lock); <--- Use after free.
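
      For illustration, the object in the diagram could look like the
      following minimal sketch (struct foo and foo_put() are hypothetical,
      they only give a name to the pattern shown above):

           struct foo {
           	struct rt_mutex m;
           	atomic_t refcnt;
           };

           void foo_put(struct foo *foo)
           {
           	int free;

           	rt_mutex_lock(&foo->m);			/* fast path */
           	free = atomic_dec_and_test(&foo->refcnt);
           	rt_mutex_unlock(&foo->m);		/* fast path */
           	if (free)
           		kfree(foo);	/* old owner still touches m->wait_lock */
           }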
      
      Plug the race by changing the slow unlock to the following scheme:
      
     while (!rt_mutex_has_waiters(m)) {
     	/* Clear the waiters bit in m->owner */
     	clear_rt_mutex_waiters(m);
     	owner = rt_mutex_owner(m);
     	spin_unlock(m->wait_lock);
     	if (cmpxchg(m->owner, owner, 0) == owner)
     		return;
     	spin_lock(m->wait_lock);
     }
      
So if a new waiter comes in while the owner takes the slow unlock
path, there are two possible situations:
      
       unlock(wait_lock);
      					lock(wait_lock);
       cmpxchg(p, owner, 0) == owner
					mark_rt_mutex_waiters(lock);
					acquire(lock);
      
      Or:
      
 unlock(wait_lock);
					lock(wait_lock);
					mark_rt_mutex_waiters(lock);
 cmpxchg(p, owner, 0) != owner
					enqueue_waiter();
					unlock(wait_lock);
 lock(wait_lock);
 wakeup_next_waiter();
 unlock(wait_lock);
					lock(wait_lock);
					acquire(lock);
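
      These diagrams work because the waiters bit lives in the lowest bit
      of m->owner, so a new waiter can make the owner's cmpxchg fail. For
      reference, a sketch of the two helpers involved, close to what
      kernel/locking/rtmutex.c had at the time (simplified):

           #define RT_MUTEX_HAS_WAITERS	1UL

           static void mark_rt_mutex_waiters(struct rt_mutex *lock)
           {
           	unsigned long owner, *p = (unsigned long *) &lock->owner;

           	do {
           		owner = *p;
           	} while (cmpxchg(p, owner, owner | RT_MUTEX_HAS_WAITERS) != owner);
           }

           static void clear_rt_mutex_waiters(struct rt_mutex *lock)
           {
           	lock->owner = (struct task_struct *)
           			((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS);
           }

      Once mark_rt_mutex_waiters() has set the bit, cmpxchg(m->owner, owner, 0)
      in the slow unlock loop fails, so the owner re-takes wait_lock and sees
      the new waiter instead of unlocking behind its back.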
      
      If the fast path is disabled, then the simple
      
         m->owner = NULL;
         unlock(m->wait_lock);
      
is sufficient, as all access to m->owner is serialized via
m->wait_lock.
      
      Also document and clarify the wakeup_next_waiter function as suggested
      by Oleg Nesterov.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140611183852.937945560@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
• rtmutex: Handle deadlock detection smarter · a2e64fcd
      Thomas Gleixner authored
      commit 3d5c9340 upstream.
      
      Even in the case when deadlock detection is not requested by the
      caller, we can detect deadlocks. Right now the code stops the lock
      chain walk and keeps the waiter enqueued, even on itself. Silly not to
      yell when such a scenario is detected and to keep the waiter enqueued.
      
      Return -EDEADLK unconditionally and handle it at the call sites.
      
The futex calls return -EDEADLK. The non-futex ones dequeue the
waiter, throw a warning and put the task into a schedule loop.
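
      A sketch of that non-futex handling, close to what the upstream
      patch adds to kernel/locking/rtmutex.c (simplified; the WARN here
      stands in for the patch's deadlock report):

           static void rt_mutex_handle_deadlock(int res, int detect_deadlock,
           				     struct rt_mutex_waiter *w)
           {
           	/*
           	 * The caller requested deadlock detection or got some
           	 * other error code: nothing to do here.
           	 */
           	if (res != -EDEADLK || detect_deadlock)
           		return;

           	/* Yell and park the task; the waiter was already dequeued. */
           	WARN(1, "rtmutex deadlock detected\n");
           	while (1) {
           		set_current_state(TASK_INTERRUPTIBLE);
           		schedule();
           	}
           }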
      
      Tagged for stable as it makes the code more robust.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brad Mouring <bmouring@ni.com>
Link: http://lkml.kernel.org/r/20140605152801.836501969@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
• rtmutex: Detect changes in the pi lock chain · 7ae4100d
      Thomas Gleixner authored
      commit 82084984 upstream.
      
      When we walk the lock chain, we drop all locks after each step. So the
      lock chain can change under us before we reacquire the locks. That's
harmless in principle as we just follow the wrong lock path. But it
can lead to a false positive in the deadlock detection logic:
      
      T0 holds L0
      T0 blocks on L1 held by T1
      T1 blocks on L2 held by T2
      T2 blocks on L3 held by T3
T3 blocks on L4 held by T4
      
      Now we walk the chain
      
      lock T1 -> lock L2 -> adjust L2 -> unlock T1 ->
           lock T2 ->  adjust T2 ->  drop locks
      
      T2 times out and blocks on L0
      
      Now we continue:
      
      lock T2 -> lock L0 -> deadlock detected, but it's not a deadlock at all.
      
      Brad tried to work around that in the deadlock detection logic itself,
      but the more I looked at it the less I liked it, because it's crystal
      ball magic after the fact.
      
We can actually detect a chain change very simply:
      
      lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 ->
      
           next_lock = T2->pi_blocked_on->lock;
      
      drop locks
      
      T2 times out and blocks on L0
      
      Now we continue:
      
      lock T2 ->
      
           if (next_lock != T2->pi_blocked_on->lock)
           	   return;
      
      So if we detect that T2 is now blocked on a different lock we stop the
      chain walk. That's also correct in the following scenario:
      
      lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 ->
      
           next_lock = T2->pi_blocked_on->lock;
      
      drop locks
      
      T3 times out and drops L3
      T2 acquires L3 and blocks on L4 now
      
      Now we continue:
      
      lock T2 ->
      
           if (next_lock != T2->pi_blocked_on->lock)
           	   return;
      
      We don't have to follow up the chain at that point, because T2
      propagated our priority up to T4 already.
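
      A sketch of how this looks in rt_mutex_adjust_prio_chain(), close
      to the upstream patch (heavily trimmed, the real function is much
      larger):

           static inline struct rt_mutex *
           task_blocked_on_lock(struct task_struct *p)
           {
           	return p->pi_blocked_on ? p->pi_blocked_on->lock : NULL;
           }

           ...
           	/* Record, still under task->pi_lock, which lock the task
           	 * is expected to block on in the next step of the walk. */
           	next_lock = task_blocked_on_lock(task);
           	raw_spin_unlock_irqrestore(&task->pi_lock, flags);
           	/* All locks dropped: the chain may change under us. */
           ...
           	raw_spin_lock_irqsave(&task->pi_lock, flags);
           	waiter = task->pi_blocked_on;
           	/*
           	 * The task blocks on a different lock now: someone else
           	 * took over that part of the chain and our priority was
           	 * propagated already, so stop the walk.
           	 */
           	if (!waiter || next_lock != waiter->lock)
           		goto out_unlock_pi;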
      
      [ Folded a cleanup patch from peterz ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Brad Mouring <bmouring@ni.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140605152801.930031935@linutronix.de
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
• rtmutex: Fix deadlock detector for real · 9e30d699
      Thomas Gleixner authored
      commit 397335f0 upstream.
      
      The current deadlock detection logic does not work reliably due to the
      following early exit path:
      
      	/*
      	 * Drop out, when the task has no waiters. Note,
      	 * top_waiter can be NULL, when we are in the deboosting
      	 * mode!
      	 */
      	if (top_waiter && (!task_has_pi_waiters(task) ||
      			   top_waiter != task_top_pi_waiter(task)))
      		goto out_unlock_pi;
      
      So this not only exits when the task has no waiters, it also exits
      unconditionally when the current waiter is not the top priority waiter
      of the task.
      
So in a nested locking scenario, it might abort the lock chain walk
and therefore miss a potential deadlock.
      
      Simple fix: Continue the chain walk, when deadlock detection is
      enabled.
      
We also avoid the whole enqueue if we detect the deadlock right away
(A-A). That is an optimization, but it also prevents a wrong result:
another waiter who comes in after the detection, but before the task
has undone the damage, would observe the situation, detect the
deadlock and return -EDEADLOCK, which is wrong as that other task is
not in a deadlock situation.
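
      For reference, a sketch of the fixed exit condition and the early
      A-A check, close to the upstream patch (trimmed):

           	/*
           	 * Drop out, when the task has no waiters. Note,
           	 * top_waiter can be NULL, when we are in the deboosting mode!
           	 */
           	if (top_waiter) {
           		if (!task_has_pi_waiters(task))
           			goto out_unlock_pi;
           		/*
           		 * If deadlock detection is off, we stop here if we
           		 * are not the top pi waiter of the task.
           		 */
           		if (!detect_deadlock && top_waiter != task_top_pi_waiter(task))
           			goto out_unlock_pi;
           	}

      and, in task_blocks_on_rt_mutex(), the A-A case is caught before the
      waiter is ever enqueued:

           	if (owner == task)
           		return -EDEADLK;
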
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/20140522031949.725272460@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
• netfilter: nf_nat: fix oops on netns removal · 26e4e17c
      Florian Westphal authored
      [ Upstream commit 945b2b2d ]
      
      Quoting Samu Kallio:
      
       Basically what's happening is, during netns cleanup,
       nf_nat_net_exit gets called before ipv4_net_exit. As I understand
       it, nf_nat_net_exit is supposed to kill any conntrack entries which
       have NAT context (through nf_ct_iterate_cleanup), but for some
       reason this doesn't happen (perhaps something else is still holding
       refs to those entries?).
      
       When ipv4_net_exit is called, conntrack entries (including those
       with NAT context) are cleaned up, but the
       nat_bysource hashtable is long gone - freed in nf_nat_net_exit. The
       bug happens when attempting to free a conntrack entry whose NAT hash
       'prev' field points to a slot in the freed hash table (head for that
       bin).
      
We ignore conntracks with null nat bindings. But this is wrong, as
these are in the bysource hash table as well.
      
      Restore nat-cleaning for the netns-is-being-removed case.
      
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=65191
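
      A conceptual sketch of the restored cleanup (simplified and partly
      hypothetical: the callback shape, locking and the 3.12-era
      nf_ct_iterate_cleanup() signature are assumptions, the real patch
      differs in detail). On netns teardown, walk every conntrack,
      including those with a null (identity) NAT binding, and unhash it
      from nat_bysource before that table is freed:

           static int nf_nat_proto_clean(struct nf_conn *ct, void *data)
           {
           	struct nf_conn_nat *nat = nfct_nat(ct);

           	if (!nat || !nat->ct)
           		return 0;		/* never entered bysource */

           	spin_lock_bh(&nf_nat_lock);
           	hlist_del_rcu(&nat->bysource);	/* drop the stale link */
           	nat->ct = NULL;
           	spin_unlock_bh(&nf_nat_lock);
           	return 0;			/* keep the conntrack alive */
           }

           static void __net_exit nf_nat_net_exit(struct net *net)
           {
           	nf_ct_iterate_cleanup(net, nf_nat_proto_clean, NULL, 0, 0);
           	synchronize_rcu();
           	nf_ct_free_hashtable(net->ct.nat_bysource,
           			     net->ct.nat_htable_size);
           }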
      
      Cc: <stable@vger.kernel.org> # 3.15.x
      Cc: <stable@vger.kernel.org> # 3.14.x
      Cc: <stable@vger.kernel.org> # 3.12.x
      Cc: <stable@vger.kernel.org> # 3.10.x
      Fixes: c2d421e1 ('netfilter: nf_nat: fix race when unloading protocol modules')
Reported-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Debugged-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
• ipvs: stop tot_stats estimator only under CONFIG_SYSCTL · f730d2ab
      Julian Anastasov authored
      [ Upstream commit 9802d21e ]
      
      The tot_stats estimator is started only when CONFIG_SYSCTL
      is defined. But it is stopped without checking CONFIG_SYSCTL.
      Fix the crash by moving ip_vs_stop_estimator into
      ip_vs_control_net_cleanup_sysctl.
      
      The change is needed after commit 14e40546
      ("IPVS: Add __ip_vs_control_{init,cleanup}_sysctl()") from 2.6.39.
      
      Cc: <stable@vger.kernel.org> # 3.15.x
      Cc: <stable@vger.kernel.org> # 3.14.x
      Cc: <stable@vger.kernel.org> # 3.12.x
      Cc: <stable@vger.kernel.org> # 3.10.x
      Cc: <stable@vger.kernel.org> # 3.2.x
Reported-by: Jet Chen <jet.chen@intel.com>
Tested-by: Jet Chen <jet.chen@intel.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
  3. 14 Jul, 2014 1 commit