Commit 7cf9c2c7 authored by Nick Piggin, committed by Linus Torvalds

[PATCH] radix-tree: RCU lockless readside

Make radix tree lookups safe to be performed without locks.  Readers are
protected against nodes being deleted by using RCU based freeing.  Readers
are protected against new node insertion by using memory barriers to ensure
the node itself will be properly written before it is visible in the radix
tree.
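
As an illustration of the insertion-side ordering described above, here is a minimal userspace sketch using C11 atomics. It is not the kernel code: demo_node, demo_slot, demo_publish and demo_read are hypothetical names, and release/acquire ordering stands in (as a stronger substitute) for the kernel's rcu_assign_pointer()/rcu_dereference() pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a radix tree node; not the kernel's struct. */
struct demo_node {
	unsigned int height;
	void *payload;
};

/* The slot that a lockless reader may load at any time. */
static _Atomic(struct demo_node *) demo_slot;

/* Writer: fill in the node completely, then publish it. */
static void demo_publish(struct demo_node *node, unsigned int height,
			 void *payload)
{
	assert(node != NULL);
	node->height = height;
	node->payload = payload;
	/* Release: the stores above are visible before the pointer is. */
	atomic_store_explicit(&demo_slot, node, memory_order_release);
}

/* Reader: take one copy of the pointer and use only that copy. */
static void *demo_read(void)
{
	struct demo_node *node =
		atomic_load_explicit(&demo_slot, memory_order_acquire);

	if (node == NULL)
		return NULL;	/* branch is empty */
	return node->payload;	/* node was initialised before publication */
}
```

A reader that calls demo_read() before demo_publish() sees an empty slot; afterwards it sees a fully initialised node, never a half-written one.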

Each radix tree node keeps a record of its height (above leaf nodes).
This height does not change after insertion -- when the radix tree is
extended, higher nodes are only inserted at the top.  So a lookup can take
the pointer to what is *now* the root node, and traverse down it even if
the tree is concurrently extended and this node becomes a subtree of a new
root.
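
The height argument can be sketched in plain C. This is a single-threaded toy that ignores memory ordering entirely; the 4-slot demo_node and the helpers demo_lookup_from and demo_extend are hypothetical and much simpler than the kernel structures. It only shows that a walk starting from the old root still works after a new root has been put above it, because the old node's stored height is unchanged.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SHIFT 2			/* 4 slots per node, for brevity */
#define DEMO_SIZE  (1u << DEMO_SHIFT)
#define DEMO_MASK  (DEMO_SIZE - 1)

/* Hypothetical simplified node; at height 1 the slots hold data items. */
struct demo_node {
	unsigned int height;
	void *slots[DEMO_SIZE];
};

/* Walk down from 'node' using the height stored in the node itself. */
static void *demo_lookup_from(struct demo_node *node, unsigned long index)
{
	unsigned int height;

	if (node == NULL)
		return NULL;
	height = node->height;
	if (index >> (height * DEMO_SHIFT))
		return NULL;		/* index is beyond this subtree */
	while (height > 1) {
		node = node->slots[(index >> ((height - 1) * DEMO_SHIFT))
				   & DEMO_MASK];
		if (node == NULL)
			return NULL;
		height--;
	}
	return node->slots[index & DEMO_MASK];
}

/* Extend the tree by one level: the old root becomes slot 0 of the new. */
static struct demo_node *demo_extend(struct demo_node *new_root,
				     struct demo_node *old_root)
{
	assert(new_root != NULL && old_root != NULL);
	new_root->height = old_root->height + 1;
	new_root->slots[0] = old_root;	/* old_root->height is untouched */
	return new_root;
}
```

A reader that sampled the old root before demo_extend() ran can keep calling demo_lookup_from() on it and will find the same items as a reader starting from the new root.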

"Direct" pointers (tree height of 0, where root->rnode points directly to
the data item) are handled by using the low bit of the pointer to signal
whether rnode is a direct pointer or a pointer to a radix tree node.
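
A userspace round trip of the low-bit scheme might look like the following. The demo_* names are hypothetical local copies for illustration, and the sketch assumes data items are at least 2-byte aligned so that the low bit of their address is always free.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DIRECT_PTR ((uintptr_t)1)

/* Tag an item pointer as "direct" by setting the low bit. */
static void *demo_ptr_to_direct(void *ptr)
{
	/* Assumption: the item is aligned, so the low bit is unused. */
	assert(((uintptr_t)ptr & DEMO_DIRECT_PTR) == 0);
	return (void *)((uintptr_t)ptr | DEMO_DIRECT_PTR);
}

/* Strip the tag again; harmless on an untagged pointer. */
static void *demo_direct_to_ptr(void *ptr)
{
	return (void *)((uintptr_t)ptr & ~DEMO_DIRECT_PTR);
}

/* Non-zero if the pointer carries the direct tag. */
static int demo_is_direct_ptr(void *ptr)
{
	return (int)((uintptr_t)ptr & DEMO_DIRECT_PTR);
}
```

Because the tag travels inside the pointer value, a reader can tell the two cases apart from a single load of the root pointer, without consulting a separate height field.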

When a reader wants to traverse the next branch, they will take a copy of
the pointer.  This pointer will be either NULL (and the branch is empty) or
non-NULL (and will point to a valid node).

[akpm@osdl.org: cleanups]
[Lee.Schermerhorn@hp.com: bugfixes, comments, simplifications]
[clameter@sgi.com: build fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
parent 36de6437
 /*
  * Copyright (C) 2001 Momchil Velikov
  * Portions Copyright (C) 2001 Christoph Hellwig
+ * Copyright (C) 2006 Nick Piggin
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as
@@ -21,6 +22,35 @@
 #include <linux/preempt.h>
 #include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/rcupdate.h>
+
+/*
+ * A direct pointer (root->rnode pointing directly to a data item,
+ * rather than another radix_tree_node) is signalled by the low bit
+ * set in the root->rnode pointer.
+ *
+ * In this case root->height is also 0, but the direct pointer tests are
+ * needed for RCU lookups when root->height is unreliable.
+ */
+#define RADIX_TREE_DIRECT_PTR	1
+
+static inline void *radix_tree_ptr_to_direct(void *ptr)
+{
+	return (void *)((unsigned long)ptr | RADIX_TREE_DIRECT_PTR);
+}
+
+static inline void *radix_tree_direct_to_ptr(void *ptr)
+{
+	return (void *)((unsigned long)ptr & ~RADIX_TREE_DIRECT_PTR);
+}
+
+static inline int radix_tree_is_direct_ptr(void *ptr)
+{
+	return (int)((unsigned long)ptr & RADIX_TREE_DIRECT_PTR);
+}
+
+/*** radix-tree API starts here ***/
+
 #define RADIX_TREE_MAX_TAGS 2
@@ -47,6 +77,77 @@ do { \
 	(root)->rnode = NULL; \
 } while (0)
+/**
+ * Radix-tree synchronization
+ *
+ * The radix-tree API requires that users provide all synchronization (with
+ * specific exceptions, noted below).
+ *
+ * Synchronization of access to the data items being stored in the tree, and
+ * management of their lifetimes must be completely managed by API users.
+ *
+ * For API usage, in general,
+ * - any function _modifying_ the tree or tags (inserting or deleting
+ *   items, setting or clearing tags) must exclude other modifications, and
+ *   exclude any functions reading the tree.
+ * - any function _reading_ the tree or tags (looking up items or tags,
+ *   gang lookups) must exclude modifications to the tree, but may occur
+ *   concurrently with other readers.
+ *
+ * The notable exceptions to this rule are the following functions:
+ * radix_tree_lookup
+ * radix_tree_tag_get
+ * radix_tree_gang_lookup
+ * radix_tree_gang_lookup_tag
+ * radix_tree_tagged
+ *
+ * The first 4 functions are able to be called locklessly, using RCU. The
+ * caller must ensure calls to these functions are made within rcu_read_lock()
+ * regions. Other readers (lock-free or otherwise) and modifications may be
+ * running concurrently.
+ *
+ * It is still required that the caller manage the synchronization and lifetimes
+ * of the items. So if RCU lock-free lookups are used, typically this would mean
+ * that the items have their own locks, or are amenable to lock-free access; and
+ * that the items are freed by RCU (or only freed after having been deleted from
+ * the radix tree *and* a synchronize_rcu() grace period).
+ *
+ * (Note, rcu_assign_pointer and rcu_dereference are not needed to control
+ * access to data items when inserting into or looking up from the radix tree)
+ *
+ * radix_tree_tagged is able to be called without locking or RCU.
+ */
+
+/**
+ * radix_tree_deref_slot	- dereference a slot
+ * @pslot:	pointer to slot, returned by radix_tree_lookup_slot
+ * Returns:	item that was stored in that slot with any direct pointer flag
+ *		removed.
+ *
+ * For use with radix_tree_lookup_slot().  Caller must hold tree at least read
+ * locked across slot lookup and dereference.  More likely, will be used with
+ * radix_tree_replace_slot(), as well, so caller will hold tree write locked.
+ */
+static inline void *radix_tree_deref_slot(void **pslot)
+{
+	return radix_tree_direct_to_ptr(*pslot);
+}
+/**
+ * radix_tree_replace_slot	- replace item in a slot
+ * @pslot:	pointer to slot, returned by radix_tree_lookup_slot
+ * @item:	new item to store in the slot.
+ *
+ * For use with radix_tree_lookup_slot().  Caller must hold tree write locked
+ * across slot lookup and replacement.
+ */
+static inline void radix_tree_replace_slot(void **pslot, void *item)
+{
+	BUG_ON(radix_tree_is_direct_ptr(item));
+	rcu_assign_pointer(*pslot,
+		(void *)((unsigned long)item |
+			((unsigned long)*pslot & RADIX_TREE_DIRECT_PTR)));
+}
+
 int radix_tree_insert(struct radix_tree_root *, unsigned long, void *);
 void *radix_tree_lookup(struct radix_tree_root *, unsigned long);
 void **radix_tree_lookup_slot(struct radix_tree_root *, unsigned long);
......
This diff is collapsed.
@@ -294,7 +294,7 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 static int migrate_page_move_mapping(struct address_space *mapping,
 		struct page *newpage, struct page *page)
 {
-	struct page **radix_pointer;
+	void **pslot;
 
 	if (!mapping) {
 		/* Anonymous page */
@@ -305,12 +305,11 @@ static int migrate_page_move_mapping(struct address_space *mapping,
 	write_lock_irq(&mapping->tree_lock);
 
-	radix_pointer = (struct page **)radix_tree_lookup_slot(
-						&mapping->page_tree,
-						page_index(page));
+	pslot = radix_tree_lookup_slot(&mapping->page_tree,
+					page_index(page));
 
 	if (page_count(page) != 2 + !!PagePrivate(page) ||
-			*radix_pointer != page) {
+			(struct page *)radix_tree_deref_slot(pslot) != page) {
 		write_unlock_irq(&mapping->tree_lock);
 		return -EAGAIN;
 	}
@@ -318,7 +317,7 @@ static int migrate_page_move_mapping(struct address_space *mapping,
 	/*
 	 * Now we know that no one else is looking at the page.
 	 */
-	get_page(newpage);
+	get_page(newpage);	/* add cache reference */
 #ifdef CONFIG_SWAP
 	if (PageSwapCache(page)) {
 		SetPageSwapCache(newpage);
@@ -326,8 +325,14 @@ static int migrate_page_move_mapping(struct address_space *mapping,
 	}
 #endif
 
-	*radix_pointer = newpage;
+	radix_tree_replace_slot(pslot, newpage);
+
+	/*
+	 * Drop cache reference from old page.
+	 * We know this isn't the last reference.
+	 */
 	__put_page(page);
+
 	write_unlock_irq(&mapping->tree_lock);
 
 	return 0;
......