Commit 3b4c7bc0 authored by Christian Brauner's avatar Christian Brauner Committed by Christian Brauner (Microsoft)

xattr: use rbtree for simple_xattrs

A while ago Vasily reported that it is possible to set a large number of
xattrs on inodes of filesystems that make use of the simple xattr
infrastructure. This includes all kernfs-based filesystems that support
xattrs (e.g., cgroupfs and tmpfs). Both cgroupfs and tmpfs can be
mounted by unprivileged users in unprivileged containers and root in an
unprivileged container can set an unrestricted number of security.*
xattrs and privileged users can also set unlimited trusted.* xattrs. As
there are apparently users that have a fairly large number of xattrs we
should scale a bit better. Other xattrs such as user.* are restricted
for kernfs-based instances to a fairly limited number.

Using a simple linked list protected by a spinlock used for set, get,
and list operations doesn't scale well if users use a lot of xattrs even
if it's not a crazy number. There's no need to bring in the big guns
like rhashtables or rw semaphores for this. An rbtree with a rwlock, or
limited rcu semanics and seqlock is enough.

It scales within the constraints we are working in. By far the most
common operation is getting an xattr. Setting xattrs should be a
moderately rare operation. And listxattr() often only happens when
copying xattrs between files or together with the contents to a new
file. Holding a lock across listxattr() is unproblematic because it
doesn't list the values of xattrs. It can only be used to list the names
of all xattrs set on a file. And the number of xattr names that can be
listed with listxattr() is limited to XATTR_LIST_MAX aka 65536 bytes. If
a larger buffer is passed then vfs_listxattr() caps it to XATTR_LIST_MAX
and if more xattr names are found it will return -E2BIG. In short, the
maximum amount of memory that can be retrieved via listxattr() is
limited.

Of course, the API is broken as documented on xattr(7) already. In the
future we might want to address this but for now this is the world we
live in and have lived for a long time. But it does indeed mean that
once an application goes over XATTR_LIST_MAX limit of xattrs set on an
inode it isn't possible to copy the file and include its xattrs in the
copy unless the caller knows all xattrs or limits the copy of the xattrs
to important ones it knows by name (At least for tmpfs, and kernfs-based
filesystems. Other filesystems might provide ways of achieving this.).

Bonus of this port to rbtree+rwlock is that we shrink the memory
consumption for users of the simple xattr infrastructure.

Also add proper kernel documentation to all the functions.
A big thanks to Paul for his comments.

Cc: Vasily Averin <vvs@openvz.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Acked-by: default avatarRoman Gushchin <roman.gushchin@linux.dev>
Acked-by: default avatarPaul E. McKenney <paulmck@kernel.org>
Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
parent 9abf2313
...@@ -992,8 +992,29 @@ const char *xattr_full_name(const struct xattr_handler *handler, ...@@ -992,8 +992,29 @@ const char *xattr_full_name(const struct xattr_handler *handler,
} }
EXPORT_SYMBOL(xattr_full_name); EXPORT_SYMBOL(xattr_full_name);
/* /**
* Allocate new xattr and copy in the value; but leave the name to callers. * free_simple_xattr - free an xattr object
* @xattr: the xattr object
*
* Free the xattr object. Can handle @xattr being NULL.
*/
static inline void free_simple_xattr(struct simple_xattr *xattr)
{
if (xattr)
kfree(xattr->name);
kvfree(xattr);
}
/**
* simple_xattr_alloc - allocate new xattr object
* @value: value of the xattr object
* @size: size of @value
*
* Allocate a new xattr object and initialize respective members. The caller is
* responsible for handling the name of the xattr.
*
* Return: On success a new xattr object is returned. On failure NULL is
* returned.
*/ */
struct simple_xattr *simple_xattr_alloc(const void *value, size_t size) struct simple_xattr *simple_xattr_alloc(const void *value, size_t size)
{ {
...@@ -1014,20 +1035,69 @@ struct simple_xattr *simple_xattr_alloc(const void *value, size_t size) ...@@ -1014,20 +1035,69 @@ struct simple_xattr *simple_xattr_alloc(const void *value, size_t size)
return new_xattr; return new_xattr;
} }
/* /**
* xattr GET operation for in-memory/pseudo filesystems * rbtree_simple_xattr_cmp - compare xattr name with current rbtree xattr entry
* @key: xattr name
* @node: current node
*
* Compare the xattr name with the xattr name attached to @node in the rbtree.
*
* Return: Negative value if continuing left, positive if continuing right, 0
* if the xattr attached to @node matches @key.
*/
static int rbtree_simple_xattr_cmp(const void *key, const struct rb_node *node)
{
const char *xattr_name = key;
const struct simple_xattr *xattr;
xattr = rb_entry(node, struct simple_xattr, rb_node);
return strcmp(xattr->name, xattr_name);
}
/**
* rbtree_simple_xattr_node_cmp - compare two xattr rbtree nodes
* @new_node: new node
* @node: current node
*
* Compare the xattr attached to @new_node with the xattr attached to @node.
*
* Return: Negative value if continuing left, positive if continuing right, 0
* if the xattr attached to @new_node matches the xattr attached to @node.
*/
static int rbtree_simple_xattr_node_cmp(struct rb_node *new_node,
const struct rb_node *node)
{
struct simple_xattr *xattr;
xattr = rb_entry(new_node, struct simple_xattr, rb_node);
return rbtree_simple_xattr_cmp(xattr->name, node);
}
/**
* simple_xattr_get - get an xattr object
* @xattrs: the header of the xattr object
* @name: the name of the xattr to retrieve
* @buffer: the buffer to store the value into
* @size: the size of @buffer
*
* Try to find and retrieve the xattr object associated with @name.
* If @buffer is provided store the value of @xattr in @buffer
* otherwise just return the length. The size of @buffer is limited
* to XATTR_SIZE_MAX which currently is 65536.
*
* Return: On success the length of the xattr value is returned. On error a
* negative error code is returned.
*/ */
int simple_xattr_get(struct simple_xattrs *xattrs, const char *name, int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
void *buffer, size_t size) void *buffer, size_t size)
{ {
struct simple_xattr *xattr; struct simple_xattr *xattr = NULL;
struct rb_node *rbp;
int ret = -ENODATA; int ret = -ENODATA;
spin_lock(&xattrs->lock); read_lock(&xattrs->lock);
list_for_each_entry(xattr, &xattrs->head, list) { rbp = rb_find(name, &xattrs->rb_root, rbtree_simple_xattr_cmp);
if (strcmp(name, xattr->name)) if (rbp) {
continue; xattr = rb_entry(rbp, struct simple_xattr, rb_node);
ret = xattr->size; ret = xattr->size;
if (buffer) { if (buffer) {
if (size < xattr->size) if (size < xattr->size)
...@@ -1035,34 +1105,44 @@ int simple_xattr_get(struct simple_xattrs *xattrs, const char *name, ...@@ -1035,34 +1105,44 @@ int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
else else
memcpy(buffer, xattr->value, xattr->size); memcpy(buffer, xattr->value, xattr->size);
} }
break;
} }
spin_unlock(&xattrs->lock); read_unlock(&xattrs->lock);
return ret; return ret;
} }
/** /**
* simple_xattr_set - xattr SET operation for in-memory/pseudo filesystems * simple_xattr_set - set an xattr object
* @xattrs: target simple_xattr list * @xattrs: the header of the xattr object
* @name: name of the extended attribute * @name: the name of the xattr to retrieve
* @value: value of the xattr. If %NULL, will remove the attribute. * @value: the value to store along the xattr
* @size: size of the new xattr * @size: the size of @value
* @flags: %XATTR_{CREATE|REPLACE} * @flags: the flags determining how to set the xattr
* @removed_size: returns size of the removed xattr, -1 if none removed * @removed_size: the size of the removed xattr
*
* Set a new xattr object.
* If @value is passed a new xattr object will be allocated. If XATTR_REPLACE
* is specified in @flags a matching xattr object for @name must already exist.
* If it does it will be replaced with the new xattr object. If it doesn't we
* fail. If XATTR_CREATE is specified and a matching xattr does already exist
* we fail. If it doesn't we create a new xattr. If @flags is zero we simply
* insert the new xattr replacing any existing one.
*
* If @value is empty and a matching xattr object is found we delete it if
* XATTR_REPLACE is specified in @flags or @flags is zero.
* *
* %XATTR_CREATE is set, the xattr shouldn't exist already; otherwise fails * If @value is empty and no matching xattr object for @name is found we do
* with -EEXIST. If %XATTR_REPLACE is set, the xattr should exist; * nothing if XATTR_CREATE is specified in @flags or @flags is zero. For
* otherwise, fails with -ENODATA. * XATTR_REPLACE we fail as mentioned above.
* *
* Returns 0 on success, -errno on failure. * Return: On success zero and on error a negative error code is returned.
*/ */
int simple_xattr_set(struct simple_xattrs *xattrs, const char *name, int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
const void *value, size_t size, int flags, const void *value, size_t size, int flags,
ssize_t *removed_size) ssize_t *removed_size)
{ {
struct simple_xattr *xattr; struct simple_xattr *xattr = NULL, *new_xattr = NULL;
struct simple_xattr *new_xattr = NULL; struct rb_node *parent = NULL, **rbp;
int err = 0; int err = 0, ret;
if (removed_size) if (removed_size)
*removed_size = -1; *removed_size = -1;
...@@ -1075,42 +1155,68 @@ int simple_xattr_set(struct simple_xattrs *xattrs, const char *name, ...@@ -1075,42 +1155,68 @@ int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
new_xattr->name = kstrdup(name, GFP_KERNEL); new_xattr->name = kstrdup(name, GFP_KERNEL);
if (!new_xattr->name) { if (!new_xattr->name) {
kvfree(new_xattr); free_simple_xattr(new_xattr);
return -ENOMEM; return -ENOMEM;
} }
} }
spin_lock(&xattrs->lock); write_lock(&xattrs->lock);
list_for_each_entry(xattr, &xattrs->head, list) { rbp = &xattrs->rb_root.rb_node;
if (!strcmp(name, xattr->name)) { while (*rbp) {
if (flags & XATTR_CREATE) { parent = *rbp;
xattr = new_xattr; ret = rbtree_simple_xattr_cmp(name, *rbp);
err = -EEXIST; if (ret < 0)
} else if (new_xattr) { rbp = &(*rbp)->rb_left;
list_replace(&xattr->list, &new_xattr->list); else if (ret > 0)
if (removed_size) rbp = &(*rbp)->rb_right;
*removed_size = xattr->size; else
} else { xattr = rb_entry(*rbp, struct simple_xattr, rb_node);
list_del(&xattr->list); if (xattr)
if (removed_size) break;
*removed_size = xattr->size;
}
goto out;
}
}
if (flags & XATTR_REPLACE) {
xattr = new_xattr;
err = -ENODATA;
} else {
list_add(&new_xattr->list, &xattrs->head);
xattr = NULL;
} }
out:
spin_unlock(&xattrs->lock);
if (xattr) { if (xattr) {
kfree(xattr->name); /* Fail if XATTR_CREATE is requested and the xattr exists. */
kvfree(xattr); if (flags & XATTR_CREATE) {
err = -EEXIST;
goto out_unlock;
}
if (new_xattr)
rb_replace_node(&xattr->rb_node, &new_xattr->rb_node,
&xattrs->rb_root);
else
rb_erase(&xattr->rb_node, &xattrs->rb_root);
if (!err && removed_size)
*removed_size = xattr->size;
} else {
/* Fail if XATTR_REPLACE is requested but no xattr is found. */
if (flags & XATTR_REPLACE) {
err = -ENODATA;
goto out_unlock;
}
/*
* If XATTR_CREATE or no flags are specified together with a
* new value simply insert it.
*/
if (new_xattr) {
rb_link_node(&new_xattr->rb_node, parent, rbp);
rb_insert_color(&new_xattr->rb_node, &xattrs->rb_root);
}
/*
* If XATTR_CREATE or no flags are specified and neither an
* old or new xattr exist then we don't need to do anything.
*/
} }
out_unlock:
write_unlock(&xattrs->lock);
if (err)
free_simple_xattr(new_xattr);
else
free_simple_xattr(xattr);
return err; return err;
} }
...@@ -1134,14 +1240,31 @@ static int xattr_list_one(char **buffer, ssize_t *remaining_size, ...@@ -1134,14 +1240,31 @@ static int xattr_list_one(char **buffer, ssize_t *remaining_size,
return 0; return 0;
} }
/* /**
* xattr LIST operation for in-memory/pseudo filesystems * simple_xattr_list - list all xattr objects
* @inode: inode from which to get the xattrs
* @xattrs: the header of the xattr object
* @buffer: the buffer to store all xattrs into
* @size: the size of @buffer
*
* List all xattrs associated with @inode. If @buffer is NULL we returned
* the required size of the buffer. If @buffer is provided we store the
* xattrs value into it provided it is big enough.
*
* Note, the number of xattr names that can be listed with listxattr(2) is
* limited to XATTR_LIST_MAX aka 65536 bytes. If a larger buffer is passed
* then vfs_listxattr() caps it to XATTR_LIST_MAX and if more xattr names
* are found it will return -E2BIG.
*
* Return: On success the required size or the size of the copied xattrs is
* returned. On error a negative error code is returned.
*/ */
ssize_t simple_xattr_list(struct inode *inode, struct simple_xattrs *xattrs, ssize_t simple_xattr_list(struct inode *inode, struct simple_xattrs *xattrs,
char *buffer, size_t size) char *buffer, size_t size)
{ {
bool trusted = capable(CAP_SYS_ADMIN); bool trusted = capable(CAP_SYS_ADMIN);
struct simple_xattr *xattr; struct simple_xattr *xattr;
struct rb_node *rbp;
ssize_t remaining_size = size; ssize_t remaining_size = size;
int err = 0; int err = 0;
...@@ -1162,8 +1285,10 @@ ssize_t simple_xattr_list(struct inode *inode, struct simple_xattrs *xattrs, ...@@ -1162,8 +1285,10 @@ ssize_t simple_xattr_list(struct inode *inode, struct simple_xattrs *xattrs,
} }
#endif #endif
spin_lock(&xattrs->lock); read_lock(&xattrs->lock);
list_for_each_entry(xattr, &xattrs->head, list) { for (rbp = rb_first(&xattrs->rb_root); rbp; rbp = rb_next(rbp)) {
xattr = rb_entry(rbp, struct simple_xattr, rb_node);
/* skip "trusted." attributes for unprivileged callers */ /* skip "trusted." attributes for unprivileged callers */
if (!trusted && xattr_is_trusted(xattr->name)) if (!trusted && xattr_is_trusted(xattr->name))
continue; continue;
...@@ -1172,18 +1297,76 @@ ssize_t simple_xattr_list(struct inode *inode, struct simple_xattrs *xattrs, ...@@ -1172,18 +1297,76 @@ ssize_t simple_xattr_list(struct inode *inode, struct simple_xattrs *xattrs,
if (err) if (err)
break; break;
} }
spin_unlock(&xattrs->lock); read_unlock(&xattrs->lock);
return err ? err : size - remaining_size; return err ? err : size - remaining_size;
} }
/* /**
* Adds an extended attribute to the list * rbtree_simple_xattr_less - compare two xattr rbtree nodes
* @new_node: new node
* @node: current node
*
* Compare the xattr attached to @new_node with the xattr attached to @node.
* Note that this function technically tolerates duplicate entries.
*
* Return: True if insertion point in the rbtree is found.
*/ */
void simple_xattr_list_add(struct simple_xattrs *xattrs, static bool rbtree_simple_xattr_less(struct rb_node *new_node,
struct simple_xattr *new_xattr) const struct rb_node *node)
{ {
spin_lock(&xattrs->lock); return rbtree_simple_xattr_node_cmp(new_node, node) < 0;
list_add(&new_xattr->list, &xattrs->head); }
spin_unlock(&xattrs->lock);
/**
* simple_xattr_add - add xattr objects
* @xattrs: the header of the xattr object
* @new_xattr: the xattr object to add
*
* Add an xattr object to @xattrs. This assumes no replacement or removal
* of matching xattrs is wanted. Should only be called during inode
* initialization when a few distinct initial xattrs are supposed to be set.
*/
void simple_xattr_add(struct simple_xattrs *xattrs,
struct simple_xattr *new_xattr)
{
write_lock(&xattrs->lock);
rb_add(&new_xattr->rb_node, &xattrs->rb_root, rbtree_simple_xattr_less);
write_unlock(&xattrs->lock);
}
/**
* simple_xattrs_init - initialize new xattr header
* @xattrs: header to initialize
*
* Initialize relevant fields of a an xattr header.
*/
void simple_xattrs_init(struct simple_xattrs *xattrs)
{
xattrs->rb_root = RB_ROOT;
rwlock_init(&xattrs->lock);
}
/**
* simple_xattrs_free - free xattrs
* @xattrs: xattr header whose xattrs to destroy
*
* Destroy all xattrs in @xattr. When this is called no one can hold a
* reference to any of the xattrs anymore.
*/
void simple_xattrs_free(struct simple_xattrs *xattrs)
{
struct rb_node *rbp;
rbp = rb_first(&xattrs->rb_root);
while (rbp) {
struct simple_xattr *xattr;
struct rb_node *rbp_next;
rbp_next = rb_next(rbp);
xattr = rb_entry(rbp, struct simple_xattr, rb_node);
rb_erase(&xattr->rb_node, &xattrs->rb_root);
free_simple_xattr(xattr);
rbp = rbp_next;
}
} }
...@@ -80,48 +80,28 @@ static inline const char *xattr_prefix(const struct xattr_handler *handler) ...@@ -80,48 +80,28 @@ static inline const char *xattr_prefix(const struct xattr_handler *handler)
} }
struct simple_xattrs { struct simple_xattrs {
struct list_head head; struct rb_root rb_root;
spinlock_t lock; rwlock_t lock;
}; };
struct simple_xattr { struct simple_xattr {
struct list_head list; struct rb_node rb_node;
char *name; char *name;
size_t size; size_t size;
char value[]; char value[];
}; };
/* void simple_xattrs_init(struct simple_xattrs *xattrs);
* initialize the simple_xattrs structure void simple_xattrs_free(struct simple_xattrs *xattrs);
*/
static inline void simple_xattrs_init(struct simple_xattrs *xattrs)
{
INIT_LIST_HEAD(&xattrs->head);
spin_lock_init(&xattrs->lock);
}
/*
* free all the xattrs
*/
static inline void simple_xattrs_free(struct simple_xattrs *xattrs)
{
struct simple_xattr *xattr, *node;
list_for_each_entry_safe(xattr, node, &xattrs->head, list) {
kfree(xattr->name);
kvfree(xattr);
}
}
struct simple_xattr *simple_xattr_alloc(const void *value, size_t size); struct simple_xattr *simple_xattr_alloc(const void *value, size_t size);
int simple_xattr_get(struct simple_xattrs *xattrs, const char *name, int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
void *buffer, size_t size); void *buffer, size_t size);
int simple_xattr_set(struct simple_xattrs *xattrs, const char *name, int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
const void *value, size_t size, int flags, const void *value, size_t size, int flags,
ssize_t *removed_size); ssize_t *removed_size);
ssize_t simple_xattr_list(struct inode *inode, struct simple_xattrs *xattrs, char *buffer, ssize_t simple_xattr_list(struct inode *inode, struct simple_xattrs *xattrs,
size_t size); char *buffer, size_t size);
void simple_xattr_list_add(struct simple_xattrs *xattrs, void simple_xattr_add(struct simple_xattrs *xattrs,
struct simple_xattr *new_xattr); struct simple_xattr *new_xattr);
#endif /* _LINUX_XATTR_H */ #endif /* _LINUX_XATTR_H */
...@@ -3255,7 +3255,7 @@ static int shmem_initxattrs(struct inode *inode, ...@@ -3255,7 +3255,7 @@ static int shmem_initxattrs(struct inode *inode,
memcpy(new_xattr->name + XATTR_SECURITY_PREFIX_LEN, memcpy(new_xattr->name + XATTR_SECURITY_PREFIX_LEN,
xattr->name, len); xattr->name, len);
simple_xattr_list_add(&info->xattrs, new_xattr); simple_xattr_add(&info->xattrs, new_xattr);
} }
return 0; return 0;
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment