Commit d34883d4 authored by Xiao Guangrong's avatar Xiao Guangrong Committed by Linus Torvalds

mm: mmu_notifier: re-fix freed page still mapped in secondary MMU

Commit 751efd86 ("mmu_notifier_unregister NULL Pointer deref and
multiple ->release()") breaks the fix 3ad3d901 ("mm: mmu_notifier:
fix freed page still mapped in secondary MMU").

Since hlist_for_each_entry_rcu() is changed now, we can not revert that
patch directly, so this patch reverts the commit and simply fix the bug
spotted by that patch

This bug spotted by commit 751efd86 is:

    There is a race condition between mmu_notifier_unregister() and
    __mmu_notifier_release().

    Assume two tasks, one calling mmu_notifier_unregister() as a result
    of a filp_close() ->flush() callout (task A), and the other calling
    mmu_notifier_release() from an mmput() (task B).

                        A                               B
    t1                                            srcu_read_lock()
    t2            if (!hlist_unhashed())
    t3                                            srcu_read_unlock()
    t4            srcu_read_lock()
    t5                                            hlist_del_init_rcu()
    t6                                            synchronize_srcu()
    t7            srcu_read_unlock()
    t8            hlist_del_rcu()  <--- NULL pointer deref.

This can be fixed by using hlist_del_init_rcu instead of hlist_del_rcu.

The another issue spotted in the commit is "multiple ->release()
callouts", we needn't care it too much because it is really rare (e.g,
can not happen on kvm since mmu-notify is unregistered after
exit_mmap()) and the later call of multiple ->release should be fast
since all the pages have already been released by the first call.
Anyway, this issue should be fixed in a separate patch.

-stable suggestions: Any version that has commit 751efd86 need to be
backported.  I find the oldest version has this commit is 3.0-stable.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: default avatarXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Tested-by: default avatarRobin Holt <holt@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent 4c663cfc
...@@ -40,48 +40,44 @@ void __mmu_notifier_release(struct mm_struct *mm) ...@@ -40,48 +40,44 @@ void __mmu_notifier_release(struct mm_struct *mm)
int id; int id;
/* /*
* srcu_read_lock() here will block synchronize_srcu() in * SRCU here will block mmu_notifier_unregister until
* mmu_notifier_unregister() until all registered * ->release returns.
* ->release() callouts this function makes have
* returned.
*/ */
id = srcu_read_lock(&srcu); id = srcu_read_lock(&srcu);
hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist)
/*
* If ->release runs before mmu_notifier_unregister it must be
* handled, as it's the only way for the driver to flush all
* existing sptes and stop the driver from establishing any more
* sptes before all the pages in the mm are freed.
*/
if (mn->ops->release)
mn->ops->release(mn, mm);
srcu_read_unlock(&srcu, id);
spin_lock(&mm->mmu_notifier_mm->lock); spin_lock(&mm->mmu_notifier_mm->lock);
while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) { while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) {
mn = hlist_entry(mm->mmu_notifier_mm->list.first, mn = hlist_entry(mm->mmu_notifier_mm->list.first,
struct mmu_notifier, struct mmu_notifier,
hlist); hlist);
/* /*
* Unlink. This will prevent mmu_notifier_unregister() * We arrived before mmu_notifier_unregister so
* from also making the ->release() callout. * mmu_notifier_unregister will do nothing other than to wait
* for ->release to finish and for mmu_notifier_unregister to
* return.
*/ */
hlist_del_init_rcu(&mn->hlist); hlist_del_init_rcu(&mn->hlist);
spin_unlock(&mm->mmu_notifier_mm->lock);
/*
* Clear sptes. (see 'release' description in mmu_notifier.h)
*/
if (mn->ops->release)
mn->ops->release(mn, mm);
spin_lock(&mm->mmu_notifier_mm->lock);
} }
spin_unlock(&mm->mmu_notifier_mm->lock); spin_unlock(&mm->mmu_notifier_mm->lock);
/* /*
* All callouts to ->release() which we have done are complete. * synchronize_srcu here prevents mmu_notifier_release from returning to
* Allow synchronize_srcu() in mmu_notifier_unregister() to complete * exit_mmap (which would proceed with freeing all pages in the mm)
*/ * until the ->release method returns, if it was invoked by
srcu_read_unlock(&srcu, id); * mmu_notifier_unregister.
*
/* * The mmu_notifier_mm can't go away from under us because one mm_count
* mmu_notifier_unregister() may have unlinked a notifier and may * is held by exit_mmap.
* still be calling out to it. Additionally, other notifiers
* may have been active via vmtruncate() et. al. Block here
* to ensure that all notifier callouts for this mm have been
* completed and the sptes are really cleaned up before returning
* to exit_mmap().
*/ */
synchronize_srcu(&srcu); synchronize_srcu(&srcu);
} }
...@@ -292,31 +288,34 @@ void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) ...@@ -292,31 +288,34 @@ void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
{ {
BUG_ON(atomic_read(&mm->mm_count) <= 0); BUG_ON(atomic_read(&mm->mm_count) <= 0);
spin_lock(&mm->mmu_notifier_mm->lock);
if (!hlist_unhashed(&mn->hlist)) { if (!hlist_unhashed(&mn->hlist)) {
/*
* SRCU here will force exit_mmap to wait for ->release to
* finish before freeing the pages.
*/
int id; int id;
id = srcu_read_lock(&srcu);
/* /*
* Ensure we synchronize up with __mmu_notifier_release(). * exit_mmap will block in mmu_notifier_release to guarantee
* that ->release is called before freeing the pages.
*/ */
id = srcu_read_lock(&srcu);
hlist_del_rcu(&mn->hlist);
spin_unlock(&mm->mmu_notifier_mm->lock);
if (mn->ops->release) if (mn->ops->release)
mn->ops->release(mn, mm); mn->ops->release(mn, mm);
srcu_read_unlock(&srcu, id);
spin_lock(&mm->mmu_notifier_mm->lock);
/* /*
* Allow __mmu_notifier_release() to complete. * Can not use list_del_rcu() since __mmu_notifier_release
* can delete it before we hold the lock.
*/ */
srcu_read_unlock(&srcu, id); hlist_del_init_rcu(&mn->hlist);
} else
spin_unlock(&mm->mmu_notifier_mm->lock); spin_unlock(&mm->mmu_notifier_mm->lock);
}
/* /*
* Wait for any running method to finish, including ->release() if it * Wait for any running method to finish, of course including
* was run by __mmu_notifier_release() instead of us. * ->release if it was run by mmu_notifier_relase instead of us.
*/ */
synchronize_srcu(&srcu); synchronize_srcu(&srcu);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment