Commit fa7773de authored by Jens Axboe

Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into for-5.6/io_uring-vfs

Pull in Al's openat2 branch, since we'll need that for the openat2
support.

* 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Documentation: path-lookup: include new LOOKUP flags
  selftests: add openat2(2) selftests
  open: introduce openat2(2) syscall
  namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
  namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
  namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
  namei: LOOKUP_NO_XDEV: block mountpoint crossing
  namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
  namei: LOOKUP_NO_SYMLINKS: block symlink resolution
  namei: allow set_root() to produce errors
  namei: allow nd_jump_link() to produce errors
  nsfs: clean-up ns_get_path() signature to return int
  namei: only return -ECHILD from follow_dotdot_rcu()
parents def9d278 b55eef87
......@@ -3302,7 +3302,9 @@ S: France
N: Aleksa Sarai
E: cyphar@cyphar.com
W: https://www.cyphar.com/
D: `pids` cgroup subsystem
D: /sys/fs/cgroup/pids
D: openat2(2)
S: Sydney, Australia
N: Dipankar Sarma
E: dipankar@in.ibm.com
......
......@@ -13,6 +13,7 @@ It has subsequently been updated to reflect changes in the kernel
including:
- per-directory parallel name lookup.
- ``openat2()`` resolution restriction flags.
Introduction to pathname lookup
===============================
......@@ -235,6 +236,13 @@ renamed. If ``d_lookup`` finds that a rename happened while it
unsuccessfully scanned a chain in the hash table, it simply tries
again.
``rename_lock`` is also used to detect and defend against potential attacks
against ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` when resolving ".." (where
the parent directory is moved outside the root, bypassing the ``path_equal()``
check). If ``rename_lock`` is updated during the lookup and the path encounters
a "..", an attack may have occurred and ``handle_dots()`` will bail out with
``-EAGAIN``.
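The detection above follows the usual seqcount reader pattern. ``path_init()`` samples the counters into ``nd->m_seq`` and ``nd->r_seq`` when the walk starts. After ".." has been resolved, a changed counter is treated as a possible escape. The single-threaded userspace model below illustrates only that logic. ``struct toy_seqcount`` and its helpers are hypothetical stand-ins, not the kernel's ``seqcount_t`` API, and sequentially consistent C11 atomics replace the kernel's explicit barriers (``smp_rmb()``).

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel seqcount: odd while a writer is active. */
struct toy_seqcount {
    atomic_uint sequence;
};

/* Reader side, like __read_seqcount_begin(): wait for an even value, remember it. */
static unsigned toy_read_begin(struct toy_seqcount *s)
{
    unsigned seq;

    while ((seq = atomic_load(&s->sequence)) & 1)
        ; /* a writer is mid-update */
    return seq;
}

/* Like __read_seqcount_retry(): true if any writer ran since toy_read_begin(). */
static bool toy_read_retry(struct toy_seqcount *s, unsigned start)
{
    return atomic_load(&s->sequence) != start;
}

/* Writer side: a rename or mount change bumps the count twice. */
static void toy_write_begin(struct toy_seqcount *s) { atomic_fetch_add(&s->sequence, 1); }
static void toy_write_end(struct toy_seqcount *s) { atomic_fetch_add(&s->sequence, 1); }

/* Model of the check handle_dots() performs after resolving "..". */
static int toy_check_dotdot(struct toy_seqcount *mount_lock, unsigned m_seq,
                            struct toy_seqcount *rename_lock, unsigned r_seq)
{
    if (toy_read_retry(mount_lock, m_seq))
        return -EAGAIN;
    if (toy_read_retry(rename_lock, r_seq))
        return -EAGAIN;
    return 0;
}
```

Any write between the sample and the check makes the check fail, even a write unrelated to the path being walked. That is why the result is ``-EAGAIN`` (try again) rather than a hard error.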
inode->i_rwsem
~~~~~~~~~~~~~~
......@@ -348,6 +356,13 @@ any changes to any mount points while stepping up. This locking is
needed to stabilize the link to the mounted-on dentry, which the
refcount on the mount itself doesn't ensure.
``mount_lock`` is also used to detect and defend against potential attacks
against ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` when resolving ".." (where
the parent directory is moved outside the root, bypassing the ``path_equal()``
check). If ``mount_lock`` is updated during the lookup and the path encounters
a "..", an attack may have occurred and ``handle_dots()`` will bail out with
``-EAGAIN``.
RCU
~~~
......@@ -405,6 +420,10 @@ is requested. Keeping a reference in the ``nameidata`` ensures that
only one root is in effect for the entire path walk, even if it races
with a ``chroot()`` system call.
It should be noted that in the case of ``LOOKUP_IN_ROOT`` or
``LOOKUP_BENEATH``, the effective root becomes the directory file descriptor
passed to ``openat2()`` (which exposes these ``LOOKUP_`` flags).
The root is needed when either of two conditions holds: (1) the
pathname or a symbolic link starts with a "``/``", or (2) a "``..``"
component is being handled, since "``..``" from the root must always stay
......@@ -1149,7 +1168,7 @@ so ``NULL`` is returned to indicate that the symlink can be released and
the stack frame discarded.
The other case involves things in ``/proc`` that look like symlinks but
aren't really::
aren't really (and are therefore commonly referred to as "magic-links")::
$ ls -l /proc/self/fd/1
lrwx------ 1 neilb neilb 64 Jun 13 10:19 /proc/self/fd/1 -> /dev/pts/4
......@@ -1286,7 +1305,9 @@ A few flags
A suitable way to wrap up this tour of pathname walking is to list
the various flags that can be stored in the ``nameidata`` to guide the
lookup process. Many of these are only meaningful on the final
component, others reflect the current state of the pathname lookup.
component, others reflect the current state of the pathname lookup, and some
apply restrictions to all path components encountered in the path lookup.
And then there is ``LOOKUP_EMPTY``, which doesn't fit conceptually with
the others. If this is not set, an empty pathname causes an error
very early on. If it is set, empty pathnames are not considered to be
......@@ -1310,13 +1331,48 @@ longer needed.
``LOOKUP_JUMPED`` means that the current dentry was chosen not because
it had the right name but for some other reason. This happens when
following "``..``", following a symlink to ``/``, crossing a mount point
or accessing a "``/proc/$PID/fd/$FD``" symlink. In this case the
filesystem has not been asked to revalidate the name (with
``d_revalidate()``). In such cases the inode may still need to be
revalidated, so ``d_op->d_weak_revalidate()`` is called if
or accessing a "``/proc/$PID/fd/$FD``" symlink (also known as a "magic
link"). In this case the filesystem has not been asked to revalidate the
name (with ``d_revalidate()``). In such cases the inode may still need
to be revalidated, so ``d_op->d_weak_revalidate()`` is called if
``LOOKUP_JUMPED`` is set when the lookup completes, which may be at the
final component or, when creating, unlinking, or renaming, at the penultimate component.
Resolution-restriction flags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In order to allow userspace to protect itself against certain race conditions
and attack scenarios involving changing path components, a series of flags are
available which apply restrictions to all path components encountered during
path lookup. These flags are exposed through ``openat2()``'s ``resolve`` field.
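From userspace, these restrictions are requested by filling in ``struct open_how`` and calling ``openat2()`` directly, since a libc wrapper may not be available. The sketch below mirrors the layout and ``RESOLVE_*`` values from ``include/uapi/linux/openat2.h`` under hypothetical ``toy_`` names, so it compiles without that header. The fallback syscall number 437 is an assumption that holds only on architectures using the common numbering; alpha, for instance, uses 547. On kernels without ``openat2()``, the call fails with ``-ENOSYS``.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_openat2
#define SYS_openat2 437 /* assumption: common numbering only (not alpha, not MIPS) */
#endif

/* Mirrors struct open_how from <linux/openat2.h> (24 bytes, OPEN_HOW_SIZE_VER0). */
struct toy_open_how {
    uint64_t flags;
    uint64_t mode;
    uint64_t resolve;
};

/* Values assumed to match RESOLVE_* in <linux/openat2.h>. */
#define TOY_RESOLVE_NO_XDEV       0x01
#define TOY_RESOLVE_NO_MAGICLINKS 0x02
#define TOY_RESOLVE_NO_SYMLINKS   0x04
#define TOY_RESOLVE_BENEATH       0x08
#define TOY_RESOLVE_IN_ROOT       0x10

/* Returns a new fd, or a negative errno value on failure. */
static int toy_openat2(int dirfd, const char *path, uint64_t flags, uint64_t resolve)
{
    struct toy_open_how how = {
        .flags = flags,
        .mode = 0,          /* must be 0 unless O_CREAT or O_TMPFILE is set */
        .resolve = resolve,
    };
    long ret = syscall(SYS_openat2, dirfd, path, &how, sizeof(how));

    return ret < 0 ? -errno : (int)ret;
}
```

For example, ``toy_openat2(dirfd, "../etc/passwd", O_RDONLY, TOY_RESOLVE_BENEATH)`` would be expected to fail with ``-EXDEV`` instead of escaping ``dirfd``.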
``LOOKUP_NO_SYMLINKS`` blocks all symlink traversals (including magic-links).
This is distinctly different from ``LOOKUP_FOLLOW``, because the latter only
relates to restricting the following of trailing symlinks.
``LOOKUP_NO_MAGICLINKS`` blocks all magic-link traversals. Filesystems must
ensure that they return errors from ``nd_jump_link()``, because that is how
``LOOKUP_NO_MAGICLINKS`` and other magic-link restrictions are implemented.
``LOOKUP_NO_XDEV`` blocks all ``vfsmount`` traversals (this includes both
bind-mounts and ordinary mounts). Note that the ``vfsmount`` which contains the
lookup is determined by the first mountpoint the path lookup reaches --
absolute paths start with the ``vfsmount`` of ``/``, and relative paths start
with the ``dfd``'s ``vfsmount``. Magic-links are only permitted if the
``vfsmount`` of the path is unchanged.
``LOOKUP_BENEATH`` blocks any path components which resolve outside the
starting point of the resolution. This is done by blocking ``nd_jump_root()``
as well as blocking ".." if it would jump outside the starting point.
``rename_lock`` and ``mount_lock`` are used to detect attacks against the
resolution of "..". Magic-links are also blocked.
``LOOKUP_IN_ROOT`` resolves all path components as though the starting point
were the filesystem root. ``nd_jump_root()`` brings the resolution back to
the starting point, and ".." at the starting point will act as a no-op. As with
``LOOKUP_BENEATH``, ``rename_lock`` and ``mount_lock`` are used to detect
attacks against ".." resolution. Magic-links are also blocked.
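The difference between the two scoped modes can be illustrated with a purely lexical model that tracks only the depth below the starting point. This is a hypothetical simplification, not how namei works. It ignores symlinks, mounts, and the rename/mount races described earlier. ``TOY_BENEATH`` and ``TOY_IN_ROOT`` are made-up stand-ins for the real ``LOOKUP_`` flags.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_BENEATH 0x1 /* hypothetical stand-in for LOOKUP_BENEATH */
#define TOY_IN_ROOT 0x2 /* hypothetical stand-in for LOOKUP_IN_ROOT */

/*
 * Lexically walks "path" from a starting point and stores the final depth
 * below that point in *depth (negative means above it). Returns 0 or a
 * negative errno. Symlinks, mounts and races are deliberately ignored.
 */
static int toy_scoped_walk(const char *path, unsigned flags, int *depth)
{
    const char *p = path;
    int d = 0;

    if (*p == '/') {
        /* Like nd_jump_root(): refused under BENEATH, back to the start under IN_ROOT. */
        if (flags & TOY_BENEATH)
            return -EXDEV;
        if (!(flags & TOY_IN_ROOT))
            return -EINVAL; /* jumping to the real "/" is not modelled */
    }
    for (;;) {
        size_t len;

        while (*p == '/')
            p++;
        len = strcspn(p, "/");
        if (len == 0)
            break;
        if (len == 2 && p[0] == '.' && p[1] == '.') {
            if (d > 0)
                d--;
            else if (flags & TOY_BENEATH)
                return -EXDEV;  /* ".." would leave the starting point */
            else if (!(flags & TOY_IN_ROOT))
                d--;            /* unrestricted: step above the start */
            /* IN_ROOT: ".." at the starting point is a no-op */
        } else if (!(len == 1 && p[0] == '.')) {
            d++;
        }
        p += len;
    }
    *depth = d;
    return 0;
}
```

Under ``LOOKUP_BENEATH``, an escape attempt is an error. Under ``LOOKUP_IN_ROOT``, the same attempt is quietly clamped, matching the chroot-like behaviour described above.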
Final-component flags
~~~~~~~~~~~~~~~~~~~~~
......
......@@ -6402,6 +6402,7 @@ F: fs/*
F: include/linux/fs.h
F: include/linux/fs_types.h
F: include/uapi/linux/fs.h
F: include/uapi/linux/openat2.h
FINTEK F75375S HARDWARE MONITOR AND FAN CONTROLLER DRIVER
M: Riku Voipio <riku.voipio@iki.fi>
......
......@@ -475,3 +475,4 @@
543 common fspick sys_fspick
544 common pidfd_open sys_pidfd_open
# 545 reserved for clone3
547 common openat2 sys_openat2
......@@ -449,3 +449,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
435 common clone3 sys_clone3
437 common openat2 sys_openat2
......@@ -38,7 +38,7 @@
#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
#define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)
#define __NR_compat_syscalls 436
#define __NR_compat_syscalls 438
#endif
#define __ARCH_WANT_SYS_CLONE
......
......@@ -879,6 +879,8 @@ __SYSCALL(__NR_fspick, sys_fspick)
__SYSCALL(__NR_pidfd_open, sys_pidfd_open)
#define __NR_clone3 435
__SYSCALL(__NR_clone3, sys_clone3)
#define __NR_openat2 437
__SYSCALL(__NR_openat2, sys_openat2)
/*
* Please add new compat syscalls above this comment and update
......
......@@ -356,3 +356,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
# 435 reserved for clone3
437 common openat2 sys_openat2
......@@ -435,3 +435,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
# 435 reserved for clone3
437 common openat2 sys_openat2
......@@ -441,3 +441,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
435 common clone3 sys_clone3
437 common openat2 sys_openat2
......@@ -374,3 +374,4 @@
433 n32 fspick sys_fspick
434 n32 pidfd_open sys_pidfd_open
435 n32 clone3 __sys_clone3
437 n32 openat2 sys_openat2
......@@ -350,3 +350,4 @@
433 n64 fspick sys_fspick
434 n64 pidfd_open sys_pidfd_open
435 n64 clone3 __sys_clone3
437 n64 openat2 sys_openat2
......@@ -423,3 +423,4 @@
433 o32 fspick sys_fspick
434 o32 pidfd_open sys_pidfd_open
435 o32 clone3 __sys_clone3
437 o32 openat2 sys_openat2
......@@ -433,3 +433,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
435 common clone3 sys_clone3_wrapper
437 common openat2 sys_openat2
......@@ -517,3 +517,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
435 nospu clone3 ppc_clone3
437 common openat2 sys_openat2
......@@ -438,3 +438,4 @@
433 common fspick sys_fspick sys_fspick
434 common pidfd_open sys_pidfd_open sys_pidfd_open
435 common clone3 sys_clone3 sys_clone3
437 common openat2 sys_openat2 sys_openat2
......@@ -438,3 +438,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
# 435 reserved for clone3
437 common openat2 sys_openat2
......@@ -481,3 +481,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
# 435 reserved for clone3
437 common openat2 sys_openat2
......@@ -440,3 +440,4 @@
433 i386 fspick sys_fspick __ia32_sys_fspick
434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open
435 i386 clone3 sys_clone3 __ia32_sys_clone3
437 i386 openat2 sys_openat2 __ia32_sys_openat2
......@@ -357,6 +357,7 @@
433 common fspick __x64_sys_fspick
434 common pidfd_open __x64_sys_pidfd_open
435 common clone3 __x64_sys_clone3/ptregs
437 common openat2 __x64_sys_openat2
#
# x32-specific system call numbers start at 512 to avoid cache impact
......
......@@ -406,3 +406,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
435 common clone3 sys_clone3
437 common openat2 sys_openat2
......@@ -491,7 +491,7 @@ struct nameidata {
struct path root;
struct inode *inode; /* path.dentry.d_inode */
unsigned int flags;
unsigned seq, m_seq;
unsigned seq, m_seq, r_seq;
int last_type;
unsigned depth;
int total_link_count;
......@@ -641,6 +641,14 @@ static bool legitimize_links(struct nameidata *nd)
static bool legitimize_root(struct nameidata *nd)
{
/*
* For scoped-lookups (where nd->root has been zeroed), we need to
* restart the whole lookup from scratch -- because set_root() is wrong
* for these lookups (nd->dfd is the root, not the filesystem root).
*/
if (!nd->root.mnt && (nd->flags & LOOKUP_IS_SCOPED))
return false;
/* Nothing to do if nd->root is zero or is managed by the VFS user. */
if (!nd->root.mnt || (nd->flags & LOOKUP_ROOT))
return true;
nd->flags |= LOOKUP_ROOT_GRABBED;
......@@ -776,12 +784,37 @@ static int complete_walk(struct nameidata *nd)
int status;
if (nd->flags & LOOKUP_RCU) {
if (!(nd->flags & LOOKUP_ROOT))
/*
* We don't want to zero nd->root for scoped-lookups or
* externally-managed nd->root.
*/
if (!(nd->flags & (LOOKUP_ROOT | LOOKUP_IS_SCOPED)))
nd->root.mnt = NULL;
if (unlikely(unlazy_walk(nd)))
return -ECHILD;
}
if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
/*
* While the guarantee of LOOKUP_IS_SCOPED is (roughly) "don't
* ever step outside the root during lookup" and should already
* be guaranteed by the rest of namei, we want to avoid a namei
* BUG resulting in userspace being given a path that was not
* scoped within the root at some point during the lookup.
*
* So, do a final sanity-check to make sure that in the
* worst-case scenario (a complete bypass of LOOKUP_IS_SCOPED)
* we won't silently return an fd completely outside of the
* requested root to userspace.
*
* Userspace could move the path outside the root after this
* check, but as discussed elsewhere this is not a concern (the
* resolved file was inside the root at some point).
*/
if (!path_is_under(&nd->path, &nd->root))
return -EXDEV;
}
if (likely(!(nd->flags & LOOKUP_JUMPED)))
return 0;
......@@ -798,10 +831,18 @@ static int complete_walk(struct nameidata *nd)
return status;
}
static void set_root(struct nameidata *nd)
static int set_root(struct nameidata *nd)
{
struct fs_struct *fs = current->fs;
/*
* Jumping to the real root in a scoped-lookup is a BUG in namei, but we
* still have to ensure it doesn't happen because it will cause a breakout
* from the dirfd.
*/
if (WARN_ON(nd->flags & LOOKUP_IS_SCOPED))
return -ENOTRECOVERABLE;
if (nd->flags & LOOKUP_RCU) {
unsigned seq;
......@@ -814,6 +855,7 @@ static void set_root(struct nameidata *nd)
get_fs_root(fs, &nd->root);
nd->flags |= LOOKUP_ROOT_GRABBED;
}
return 0;
}
static void path_put_conditional(struct path *path, struct nameidata *nd)
......@@ -837,6 +879,18 @@ static inline void path_to_nameidata(const struct path *path,
static int nd_jump_root(struct nameidata *nd)
{
if (unlikely(nd->flags & LOOKUP_BENEATH))
return -EXDEV;
if (unlikely(nd->flags & LOOKUP_NO_XDEV)) {
/* Absolute path arguments to path_init() are allowed. */
if (nd->path.mnt != NULL && nd->path.mnt != nd->root.mnt)
return -EXDEV;
}
if (!nd->root.mnt) {
int error = set_root(nd);
if (error)
return error;
}
if (nd->flags & LOOKUP_RCU) {
struct dentry *d;
nd->path = nd->root;
......@@ -859,14 +913,32 @@ static int nd_jump_root(struct nameidata *nd)
* Helper to directly jump to a known parsed path from ->get_link,
* caller must have taken a reference to path beforehand.
*/
void nd_jump_link(struct path *path)
int nd_jump_link(struct path *path)
{
int error = -ELOOP;
struct nameidata *nd = current->nameidata;
path_put(&nd->path);
if (unlikely(nd->flags & LOOKUP_NO_MAGICLINKS))
goto err;
error = -EXDEV;
if (unlikely(nd->flags & LOOKUP_NO_XDEV)) {
if (nd->path.mnt != path->mnt)
goto err;
}
/* Not currently safe for scoped-lookups. */
if (unlikely(nd->flags & LOOKUP_IS_SCOPED))
goto err;
path_put(&nd->path);
nd->path = *path;
nd->inode = nd->path.dentry->d_inode;
nd->flags |= LOOKUP_JUMPED;
return 0;
err:
path_put(path);
return error;
}
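nd_jump_link() now reports why a magic-link jump was refused. The order of its checks can be summarised with the hypothetical model below. Mounts are plain integer ids, and the TOY_ flags stand in for LOOKUP_NO_MAGICLINKS, LOOKUP_NO_XDEV and LOOKUP_IS_SCOPED. The reference counting (the path_put() calls) is left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag values; the real LOOKUP_* constants live in namei.h. */
#define TOY_NO_MAGICLINKS 0x1
#define TOY_NO_XDEV       0x2
#define TOY_IS_SCOPED     0x4

/*
 * Mirrors the order of checks in nd_jump_link(): returns the error a
 * magic-link jump would produce, or 0 if the jump is allowed.
 */
static int toy_jump_link_check(unsigned flags, int cur_mnt, int target_mnt)
{
    if (flags & TOY_NO_MAGICLINKS)
        return -ELOOP;
    if ((flags & TOY_NO_XDEV) && cur_mnt != target_mnt)
        return -EXDEV;
    if (flags & TOY_IS_SCOPED)
        return -EXDEV; /* not currently safe for scoped lookups */
    return 0;
}
```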
static inline void put_link(struct nameidata *nd)
......@@ -1049,6 +1121,9 @@ const char *get_link(struct nameidata *nd)
int error;
const char *res;
if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS))
return ERR_PTR(-ELOOP);
if (!(nd->flags & LOOKUP_RCU)) {
touch_atime(&last->link);
cond_resched();
......@@ -1083,10 +1158,9 @@ const char *get_link(struct nameidata *nd)
return res;
}
if (*res == '/') {
if (!nd->root.mnt)
set_root(nd);
if (unlikely(nd_jump_root(nd)))
return ERR_PTR(-ECHILD);
error = nd_jump_root(nd);
if (unlikely(error))
return ERR_PTR(error);
while (unlikely(*++res == '/'))
;
}
......@@ -1268,10 +1342,14 @@ static int follow_managed(struct path *path, struct nameidata *nd)
break;
}
if (need_mntput && path->mnt == mnt)
if (need_mntput) {
if (path->mnt == mnt)
mntput(path->mnt);
if (need_mntput)
if (unlikely(nd->flags & LOOKUP_NO_XDEV))
ret = -EXDEV;
else
nd->flags |= LOOKUP_JUMPED;
}
if (ret == -EISDIR || !ret)
ret = 1;
if (ret > 0 && unlikely(d_flags_negative(flags)))
......@@ -1332,6 +1410,8 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
mounted = __lookup_mnt(path->mnt, path->dentry);
if (!mounted)
break;
if (unlikely(nd->flags & LOOKUP_NO_XDEV))
return false;
path->mnt = &mounted->mnt;
path->dentry = mounted->mnt.mnt_root;
nd->flags |= LOOKUP_JUMPED;
......@@ -1352,8 +1432,11 @@ static int follow_dotdot_rcu(struct nameidata *nd)
struct inode *inode = nd->inode;
while (1) {
if (path_equal(&nd->path, &nd->root))
if (path_equal(&nd->path, &nd->root)) {
if (unlikely(nd->flags & LOOKUP_BENEATH))
return -ECHILD;
break;
}
if (nd->path.dentry != nd->path.mnt->mnt_root) {
struct dentry *old = nd->path.dentry;
struct dentry *parent = old->d_parent;
......@@ -1366,7 +1449,7 @@ static int follow_dotdot_rcu(struct nameidata *nd)
nd->path.dentry = parent;
nd->seq = seq;
if (unlikely(!path_connected(&nd->path)))
return -ENOENT;
return -ECHILD;
break;
} else {
struct mount *mnt = real_mount(nd->path.mnt);
......@@ -1378,6 +1461,8 @@ static int follow_dotdot_rcu(struct nameidata *nd)
return -ECHILD;
if (&mparent->mnt == nd->path.mnt)
break;
if (unlikely(nd->flags & LOOKUP_NO_XDEV))
return -ECHILD;
/* we know that mountpoint was pinned */
nd->path.dentry = mountpoint;
nd->path.mnt = &mparent->mnt;
......@@ -1392,6 +1477,8 @@ static int follow_dotdot_rcu(struct nameidata *nd)
return -ECHILD;
if (!mounted)
break;
if (unlikely(nd->flags & LOOKUP_NO_XDEV))
return -ECHILD;
nd->path.mnt = &mounted->mnt;
nd->path.dentry = mounted->mnt.mnt_root;
inode = nd->path.dentry->d_inode;
......@@ -1479,9 +1566,12 @@ static int path_parent_directory(struct path *path)
static int follow_dotdot(struct nameidata *nd)
{
while(1) {
if (path_equal(&nd->path, &nd->root))
while (1) {
if (path_equal(&nd->path, &nd->root)) {
if (unlikely(nd->flags & LOOKUP_BENEATH))
return -EXDEV;
break;
}
if (nd->path.dentry != nd->path.mnt->mnt_root) {
int ret = path_parent_directory(&nd->path);
if (ret)
......@@ -1490,6 +1580,8 @@ static int follow_dotdot(struct nameidata *nd)
}
if (!follow_up(&nd->path))
break;
if (unlikely(nd->flags & LOOKUP_NO_XDEV))
return -EXDEV;
}
follow_mount(&nd->path);
nd->inode = nd->path.dentry->d_inode;
......@@ -1698,12 +1790,33 @@ static inline int may_lookup(struct nameidata *nd)
static inline int handle_dots(struct nameidata *nd, int type)
{
if (type == LAST_DOTDOT) {
if (!nd->root.mnt)
set_root(nd);
if (nd->flags & LOOKUP_RCU) {
return follow_dotdot_rcu(nd);
} else
return follow_dotdot(nd);
int error = 0;
if (!nd->root.mnt) {
error = set_root(nd);
if (error)
return error;
}
if (nd->flags & LOOKUP_RCU)
error = follow_dotdot_rcu(nd);
else
error = follow_dotdot(nd);
if (error)
return error;
if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
/*
* If there was a racing rename or mount along our
* path, then we can't be sure that ".." hasn't jumped
* above nd->root (and so userspace should retry or use
* some fallback).
*/
smp_rmb();
if (unlikely(__read_seqcount_retry(&mount_lock.seqcount, nd->m_seq)))
return -EAGAIN;
if (unlikely(__read_seqcount_retry(&rename_lock.seqcount, nd->r_seq)))
return -EAGAIN;
}
}
return 0;
}
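As the comment in handle_dots() notes, a racing rename or mount makes scoped ".." resolution fail with -EAGAIN, and userspace is expected to retry or fall back. The following is a minimal bounded-retry helper. It assumes the caller wraps its openat2() call in a callback. open_fn and fake_open are hypothetical names; fake_open is a test double, so no real race is needed.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callback type: returns an fd or a negative errno value. */
typedef int (*open_fn)(void *ctx);

/* Calls fn until it stops returning -EAGAIN, at most max_tries times. */
static int retry_on_eagain(open_fn fn, void *ctx, int max_tries)
{
    int ret = -EAGAIN;

    for (int i = 0; i < max_tries && ret == -EAGAIN; i++)
        ret = fn(ctx);
    return ret;
}

/* Test double: fails with -EAGAIN until its counter reaches zero. */
static int fake_open(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return -EAGAIN;
    }
    return 3; /* pretend file descriptor */
}
```

In real code, the callback would issue openat2() with RESOLVE_BENEATH or RESOLVE_IN_ROOT. What to do once the retries run out, such as resolving the path in userspace, is left to the caller.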
......@@ -2157,6 +2270,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
/* must be paired with terminate_walk() */
static const char *path_init(struct nameidata *nd, unsigned flags)
{
int error;
const char *s = nd->name->name;
if (!*s)
......@@ -2167,6 +2281,11 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
nd->last_type = LAST_ROOT; /* if there are only slashes... */
nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT;
nd->depth = 0;
nd->m_seq = __read_seqcount_begin(&mount_lock.seqcount);
nd->r_seq = __read_seqcount_begin(&rename_lock.seqcount);
smp_rmb();
if (flags & LOOKUP_ROOT) {
struct dentry *root = nd->root.dentry;
struct inode *inode = root->d_inode;
......@@ -2175,9 +2294,8 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
nd->path = nd->root;
nd->inode = inode;
if (flags & LOOKUP_RCU) {
nd->seq = __read_seqcount_begin(&nd->path.dentry->d_seq);
nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
nd->root_seq = nd->seq;
nd->m_seq = read_seqbegin(&mount_lock);
} else {
path_get(&nd->path);
}
......@@ -2188,13 +2306,16 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
nd->path.mnt = NULL;
nd->path.dentry = NULL;
nd->m_seq = read_seqbegin(&mount_lock);
if (*s == '/') {
set_root(nd);
if (likely(!nd_jump_root(nd)))
/* Absolute pathname -- fetch the root (LOOKUP_IN_ROOT uses nd->dfd). */
if (*s == '/' && !(flags & LOOKUP_IN_ROOT)) {
error = nd_jump_root(nd);
if (unlikely(error))
return ERR_PTR(error);
return s;
return ERR_PTR(-ECHILD);
} else if (nd->dfd == AT_FDCWD) {
}
/* Relative pathname -- get the starting-point it is relative to. */
if (nd->dfd == AT_FDCWD) {
if (flags & LOOKUP_RCU) {
struct fs_struct *fs = current->fs;
unsigned seq;
......@@ -2209,7 +2330,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
get_fs_pwd(current->fs, &nd->path);
nd->inode = nd->path.dentry->d_inode;
}
return s;
} else {
/* Caller must check execute permissions on the starting path component */
struct fd f = fdget_raw(nd->dfd);
......@@ -2234,8 +2354,19 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
nd->inode = nd->path.dentry->d_inode;
}
fdput(f);
return s;
}
/* For scoped-lookups we need to set the root to the dirfd as well. */
if (flags & LOOKUP_IS_SCOPED) {
nd->root = nd->path;
if (flags & LOOKUP_RCU) {
nd->root_seq = nd->seq;
} else {
path_get(&nd->root);
nd->flags |= LOOKUP_ROOT_GRABBED;
}
}
return s;
}
static const char *trailing_symlink(struct nameidata *nd)
......
......@@ -55,7 +55,7 @@ static void nsfs_evict(struct inode *inode)
ns->ops->put(ns);
}
static void *__ns_get_path(struct path *path, struct ns_common *ns)
static int __ns_get_path(struct path *path, struct ns_common *ns)
{
struct vfsmount *mnt = nsfs_mnt;
struct dentry *dentry;
......@@ -74,13 +74,13 @@ static void *__ns_get_path(struct path *path, struct ns_common *ns)
got_it:
path->mnt = mntget(mnt);
path->dentry = dentry;
return NULL;
return 0;
slow:
rcu_read_unlock();
inode = new_inode_pseudo(mnt->mnt_sb);
if (!inode) {
ns->ops->put(ns);
return ERR_PTR(-ENOMEM);
return -ENOMEM;
}
inode->i_ino = ns->inum;
inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
......@@ -92,7 +92,7 @@ static void *__ns_get_path(struct path *path, struct ns_common *ns)
dentry = d_alloc_anon(mnt->mnt_sb);
if (!dentry) {
iput(inode);
return ERR_PTR(-ENOMEM);
return -ENOMEM;
}
d_instantiate(dentry, inode);
dentry->d_fsdata = (void *)ns->ops;
......@@ -101,23 +101,22 @@ static void *__ns_get_path(struct path *path, struct ns_common *ns)
d_delete(dentry); /* make sure ->d_prune() does nothing */
dput(dentry);
cpu_relax();
return ERR_PTR(-EAGAIN);
return -EAGAIN;
}
goto got_it;
}
void *ns_get_path_cb(struct path *path, ns_get_path_helper_t *ns_get_cb,
int ns_get_path_cb(struct path *path, ns_get_path_helper_t *ns_get_cb,
void *private_data)
{
void *ret;
int ret;
do {
struct ns_common *ns = ns_get_cb(private_data);
if (!ns)
return ERR_PTR(-ENOENT);
return -ENOENT;
ret = __ns_get_path(path, ns);
} while (ret == ERR_PTR(-EAGAIN));
} while (ret == -EAGAIN);
return ret;
}
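The nsfs clean-up replaces void * results encoded with ERR_PTR() by plain negative errno values. The retry loops can then compare with -EAGAIN directly instead of with ERR_PTR(-EAGAIN). For reference, the pointer-encoding convention being removed can be re-created in userspace as below. The toy_ helpers are hypothetical copies of the include/linux/err.h macros, which reserve the top MAX_ERRNO (4095) addresses for error codes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno as a pointer into the top page of the address space. */
static inline void *toy_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Decode the errno back out of such a pointer. */
static inline long toy_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True only for pointers produced by toy_err_ptr(-1 .. -4095). */
static inline bool toy_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}
```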
......@@ -134,7 +133,7 @@ static struct ns_common *ns_get_path_task(void *private_data)
return args->ns_ops->get(args->task);
}
void *ns_get_path(struct path *path, struct task_struct *task,
int ns_get_path(struct path *path, struct task_struct *task,
const struct proc_ns_operations *ns_ops)
{
struct ns_get_path_task_args args = {
......@@ -150,7 +149,7 @@ int open_related_ns(struct ns_common *ns,
{
struct path path = {};
struct file *f;
void *err;
int err;
int fd;
fd = get_unused_fd_flags(O_CLOEXEC);
......@@ -167,11 +166,11 @@ int open_related_ns(struct ns_common *ns,
}
err = __ns_get_path(&path, relative);
} while (err == ERR_PTR(-EAGAIN));
} while (err == -EAGAIN);
if (IS_ERR(err)) {
if (err) {
put_unused_fd(fd);
return PTR_ERR(err);
return err;
}
f = dentry_open(&path, O_RDONLY, current_cred());
......
......@@ -955,48 +955,84 @@ struct file *open_with_fake_path(const struct path *path, int flags,
}
EXPORT_SYMBOL(open_with_fake_path);
static inline int build_open_flags(int flags, umode_t mode, struct open_flags *op)
#define WILL_CREATE(flags) (flags & (O_CREAT | __O_TMPFILE))
#define O_PATH_FLAGS (O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC)
static inline struct open_how build_open_how(int flags, umode_t mode)
{
struct open_how how = {
.flags = flags & VALID_OPEN_FLAGS,
.mode = mode & S_IALLUGO,
};
/* O_PATH beats everything else. */
if (how.flags & O_PATH)
how.flags &= O_PATH_FLAGS;
/* Modes should only be set for create-like flags. */
if (!WILL_CREATE(how.flags))
how.mode = 0;
return how;
}
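build_open_how() keeps open(2) and openat(2) lenient. Rather than rejecting arguments that do not apply, it drops them before the stricter build_open_flags() sees them. Below is a userspace sketch of that sanitising step. It assumes glibc's <fcntl.h> with _GNU_SOURCE. The TOY_ macros are local re-creations of S_IALLUGO, __O_TMPFILE, WILL_CREATE() and O_PATH_FLAGS, and the VALID_OPEN_FLAGS mask is omitted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>

#define TOY_S_IALLUGO 07777                          /* S_ISUID|S_ISGID|S_ISVTX|0777 */
#define TOY_O_TMPFILE_BIT (O_TMPFILE & ~O_DIRECTORY) /* __O_TMPFILE */
#define TOY_WILL_CREATE(f) ((f) & (O_CREAT | TOY_O_TMPFILE_BIT))
#define TOY_O_PATH_FLAGS (O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC)

struct toy_open_how {
    uint64_t flags;
    uint64_t mode;
    uint64_t resolve;
};

/* Silently discards arguments that do not apply, as build_open_how() does. */
static struct toy_open_how toy_build_open_how(int flags, unsigned mode)
{
    struct toy_open_how how = {
        .flags = (unsigned)flags,
        .mode = mode & TOY_S_IALLUGO,
    };

    if (how.flags & O_PATH)          /* O_PATH beats everything else */
        how.flags &= TOY_O_PATH_FLAGS;
    if (!TOY_WILL_CREATE(how.flags)) /* mode only matters when creating */
        how.mode = 0;
    return how;
}
```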
static inline int build_open_flags(const struct open_how *how,
struct open_flags *op)
{
int flags = how->flags;
int lookup_flags = 0;
int acc_mode = ACC_MODE(flags);
/* Must never be set by userspace */
flags &= ~(FMODE_NONOTIFY | O_CLOEXEC);
/*
* Clear out all open flags we don't know about so that we don't report
* them in fcntl(F_GETFD) or similar interfaces.
* Older syscalls implicitly clear all of the invalid flags or argument
* values before calling build_open_flags(), but openat2(2) checks all
* of its arguments.
*/
flags &= VALID_OPEN_FLAGS;
if (flags & ~VALID_OPEN_FLAGS)
return -EINVAL;
if (how->resolve & ~VALID_RESOLVE_FLAGS)
return -EINVAL;
if (flags & (O_CREAT | __O_TMPFILE))
op->mode = (mode & S_IALLUGO) | S_IFREG;
else
/* Deal with the mode. */
if (WILL_CREATE(flags)) {
if (how->mode & ~S_IALLUGO)
return -EINVAL;
op->mode = how->mode | S_IFREG;
} else {
if (how->mode != 0)
return -EINVAL;
op->mode = 0;
/* Must never be set by userspace */
flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC;
}
/*
* O_SYNC is implemented as __O_SYNC|O_DSYNC. As many places only
* check for O_DSYNC if they need any syncing at all, we enforce that it is
* always set instead of having to deal with possibly weird behaviour
* for malicious applications setting only __O_SYNC.
* In order to ensure programs get explicit errors when trying to use
* O_TMPFILE on old kernels, O_TMPFILE is implemented such that it
* looks like (O_DIRECTORY|O_RDWR & ~O_CREAT) to old kernels. But we
* have to require userspace to explicitly set it.
*/
if (flags & __O_SYNC)
flags |= O_DSYNC;
if (flags & __O_TMPFILE) {
if ((flags & O_TMPFILE_MASK) != O_TMPFILE)
return -EINVAL;
if (!(acc_mode & MAY_WRITE))
return -EINVAL;
} else if (flags & O_PATH) {
/*
* If we have O_PATH in the open flags, then we
* cannot have anything other than the below set of flags
*/
flags &= O_DIRECTORY | O_NOFOLLOW | O_PATH;
}
if (flags & O_PATH) {
/* O_PATH only permits certain other flags to be set. */
if (flags & ~O_PATH_FLAGS)
return -EINVAL;
acc_mode = 0;
}
/*
* O_SYNC is implemented as __O_SYNC|O_DSYNC. As many places only
* check for O_DSYNC if they need any syncing at all, we enforce that it is
* always set instead of having to deal with possibly weird behaviour
* for malicious applications setting only __O_SYNC.
*/
if (flags & __O_SYNC)
flags |= O_DSYNC;
op->open_flag = flags;
/* O_TRUNC implies we need access checks for write permissions */
......@@ -1022,6 +1058,18 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
lookup_flags |= LOOKUP_DIRECTORY;
if (!(flags & O_NOFOLLOW))
lookup_flags |= LOOKUP_FOLLOW;
if (how->resolve & RESOLVE_NO_XDEV)
lookup_flags |= LOOKUP_NO_XDEV;
if (how->resolve & RESOLVE_NO_MAGICLINKS)
lookup_flags |= LOOKUP_NO_MAGICLINKS;
if (how->resolve & RESOLVE_NO_SYMLINKS)
lookup_flags |= LOOKUP_NO_SYMLINKS;
if (how->resolve & RESOLVE_BENEATH)
lookup_flags |= LOOKUP_BENEATH;
if (how->resolve & RESOLVE_IN_ROOT)
lookup_flags |= LOOKUP_IN_ROOT;
op->lookup_flags = lookup_flags;
return 0;
}
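For openat2(2), the same situations are errors instead. The sketch below models a subset of the new checks in build_open_flags(). Unknown resolve bits, a non-zero mode without O_CREAT or O_TMPFILE, and O_PATH combined with disallowed flags all yield -EINVAL. TOY_VALID_RESOLVE (0x1f) assumes the five RESOLVE_* flags of this commit. The VALID_OPEN_FLAGS and O_TMPFILE checks are omitted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>

#define TOY_S_IALLUGO 07777
#define TOY_O_TMPFILE_BIT (O_TMPFILE & ~O_DIRECTORY)
#define TOY_WILL_CREATE(f) ((f) & (O_CREAT | TOY_O_TMPFILE_BIT))
#define TOY_O_PATH_FLAGS (O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC)
#define TOY_VALID_RESOLVE 0x1fULL /* assumed VALID_RESOLVE_FLAGS for this commit */

/* Returns 0 if the open_how fields pass these checks, else -EINVAL. */
static int toy_check_open_how(uint64_t flags, uint64_t mode, uint64_t resolve)
{
    if (resolve & ~TOY_VALID_RESOLVE)
        return -EINVAL;
    if (TOY_WILL_CREATE(flags)) {
        if (mode & ~(uint64_t)TOY_S_IALLUGO)
            return -EINVAL;
    } else if (mode != 0) {
        return -EINVAL;
    }
    if ((flags & O_PATH) && (flags & ~(uint64_t)TOY_O_PATH_FLAGS))
        return -EINVAL;
    return 0;
}
```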
......@@ -1040,8 +1088,11 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
struct file *file_open_name(struct filename *name, int flags, umode_t mode)
{
struct open_flags op;
int err = build_open_flags(flags, mode, &op);
return err ? ERR_PTR(err) : do_filp_open(AT_FDCWD, name, &op);
struct open_how how = build_open_how(flags, mode);
int err = build_open_flags(&how, &op);
if (err)
return ERR_PTR(err);
return do_filp_open(AT_FDCWD, name, &op);
}
/**
......@@ -1072,17 +1123,19 @@ struct file *file_open_root(struct dentry *dentry, struct vfsmount *mnt,
const char *filename, int flags, umode_t mode)
{
struct open_flags op;
int err = build_open_flags(flags, mode, &op);
struct open_how how = build_open_how(flags, mode);
int err = build_open_flags(&how, &op);
if (err)
return ERR_PTR(err);
return do_file_open_root(dentry, mnt, filename, &op);
}
EXPORT_SYMBOL(file_open_root);
long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
static long do_sys_openat2(int dfd, const char __user *filename,
struct open_how *how)
{
struct open_flags op;
int fd = build_open_flags(flags, mode, &op);
int fd = build_open_flags(how, &op);
struct filename *tmp;
if (fd)
......@@ -1092,7 +1145,7 @@ long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
if (IS_ERR(tmp))
return PTR_ERR(tmp);
fd = get_unused_fd_flags(flags);
fd = get_unused_fd_flags(how->flags);
if (fd >= 0) {
struct file *f = do_filp_open(dfd, tmp, &op);
if (IS_ERR(f)) {
......@@ -1107,12 +1160,16 @@ long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
return fd;
}
SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
{
if (force_o_largefile())
flags |= O_LARGEFILE;
struct open_how how = build_open_how(flags, mode);
return do_sys_openat2(dfd, filename, &how);
}
return do_sys_open(AT_FDCWD, filename, flags, mode);
SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
{
return ksys_open(filename, flags, mode);
}
SYSCALL_DEFINE4(openat, int, dfd, const char __user *, filename, int, flags,
......@@ -1120,10 +1177,32 @@ SYSCALL_DEFINE4(openat, int, dfd, const char __user *, filename, int, flags,
{
if (force_o_largefile())
flags |= O_LARGEFILE;
return do_sys_open(dfd, filename, flags, mode);
}
SYSCALL_DEFINE4(openat2, int, dfd, const char __user *, filename,
struct open_how __user *, how, size_t, usize)
{
int err;
struct open_how tmp;
BUILD_BUG_ON(sizeof(struct open_how) < OPEN_HOW_SIZE_VER0);
BUILD_BUG_ON(sizeof(struct open_how) != OPEN_HOW_SIZE_LATEST);
if (unlikely(usize < OPEN_HOW_SIZE_VER0))
return -EINVAL;
err = copy_struct_from_user(&tmp, sizeof(tmp), how, usize);
if (err)
return err;
/* O_LARGEFILE is only allowed for non-O_PATH. */
if (!(tmp.flags & O_PATH) && force_o_largefile())
tmp.flags |= O_LARGEFILE;
return do_sys_openat2(dfd, filename, &tmp);
}
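The usize argument lets struct open_how grow in later kernels. openat2() first rejects anything smaller than OPEN_HOW_SIZE_VER0. copy_struct_from_user() then reconciles the two sizes. A shorter user struct is zero-extended. A longer one is accepted only if every byte beyond the kernel's struct is zero; otherwise the call fails with -E2BIG. The following is a userspace model on plain buffers. toy_copy_struct is a hypothetical name, and the fault handling of a real user copy is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Copies a usize-byte struct into a ksize-byte one using the rules above. */
static int toy_copy_struct(void *dst, size_t ksize, const void *src, size_t usize)
{
    size_t size = usize < ksize ? usize : ksize;
    const unsigned char *s = src;

    if (usize > ksize) {
        /* Newer userspace: fields this kernel doesn't know must be zero. */
        for (size_t i = ksize; i < usize; i++)
            if (s[i] != 0)
                return -E2BIG;
    }
    memset(dst, 0, ksize);  /* Older userspace: missing fields read as zero. */
    memcpy(dst, src, size);
    return 0;
}
```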
#ifdef CONFIG_COMPAT
/*
* Exactly like sys_open(), except that it doesn't set the
......
......@@ -1626,8 +1626,7 @@ static const char *proc_pid_get_link(struct dentry *dentry,
if (error)
goto out;
nd_jump_link(&path);
return NULL;
error = nd_jump_link(&path);
out:
return ERR_PTR(error);
}
......
......@@ -42,22 +42,26 @@ static const char *proc_ns_get_link(struct dentry *dentry,
const struct proc_ns_operations *ns_ops = PROC_I(inode)->ns_ops;
struct task_struct *task;
struct path ns_path;
void *error = ERR_PTR(-EACCES);
int error = -EACCES;
if (!dentry)
return ERR_PTR(-ECHILD);
task = get_proc_task(inode);
if (!task)
return error;
return ERR_PTR(-EACCES);
if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
goto out;
if (ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS)) {
error = ns_get_path(&ns_path, task, ns_ops);
if (!error)
nd_jump_link(&ns_path);
}
if (error)
goto out;
error = nd_jump_link(&ns_path);
out:
put_task_struct(task);
return error;
return ERR_PTR(error);
}
static int proc_ns_readlink(struct dentry *dentry, char __user *buffer, int buflen)
......
......@@ -2,15 +2,29 @@
#ifndef _LINUX_FCNTL_H
#define _LINUX_FCNTL_H
#include <linux/stat.h>
#include <uapi/linux/fcntl.h>
/* list of all valid flags for the open/openat flags argument: */
/* List of all valid flags for the open/openat flags argument: */
#define VALID_OPEN_FLAGS \
(O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC | \
O_APPEND | O_NDELAY | O_NONBLOCK | O_NDELAY | __O_SYNC | O_DSYNC | \
FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \
O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE)
/* List of all valid flags for the how->upgrade_mask argument: */
#define VALID_UPGRADE_FLAGS \
(UPGRADE_NOWRITE | UPGRADE_NOREAD)
/* List of all valid flags for the how->resolve argument: */
#define VALID_RESOLVE_FLAGS \
(RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \
RESOLVE_BENEATH | RESOLVE_IN_ROOT)
/* List of all open_how "versions". */
#define OPEN_HOW_SIZE_VER0 24 /* sizeof first published struct */
#define OPEN_HOW_SIZE_LATEST OPEN_HOW_SIZE_VER0
#ifndef force_o_largefile
#define force_o_largefile() (!IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T))
#endif
......
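The size handling above (OPEN_HOW_SIZE_VER0, copy_struct_from_user() and the -EINVAL check in sys_openat2()) can be modelled in plain userspace C. This is a hypothetical sketch: model_copy_struct() and model_openat2_copy() are illustrative names, and the __user pointer is replaced by an ordinary buffer, so there is no -EFAULT path. It assumes the copy_struct_from_user() rules the openat2 selftests rely on: a shorter struct is zero-filled, and a longer struct is accepted only if its extra bytes are zero (otherwise -E2BIG).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_OPEN_HOW_SIZE_VER0 24 /* sizeof first published struct */

/* Userspace mirror of struct open_how as published in this merge. */
struct model_open_how {
	uint64_t flags;
	uint64_t mode;
	uint64_t resolve;
};

/*
 * Hypothetical model of copy_struct_from_user(): @src stands in for the
 * __user pointer, so faults (-EFAULT) are not modelled.
 */
static int model_copy_struct(void *dst, size_t ksize, const void *src,
			     size_t usize)
{
	const unsigned char *u = src;
	size_t size = usize < ksize ? usize : ksize;

	assert(dst != NULL && src != NULL);
	if (usize > ksize) {
		/* Newer userspace: bytes the kernel doesn't know must be zero. */
		for (size_t i = ksize; i < usize; i++)
			if (u[i] != 0)
				return -E2BIG;
	} else if (usize < ksize) {
		/* Older userspace: fields it doesn't know about become zero. */
		memset((unsigned char *)dst + usize, 0, ksize - usize);
	}
	memcpy(dst, src, size);
	return 0;
}

/* The size check sys_openat2() performs before copying the struct. */
static int model_openat2_copy(struct model_open_how *how, const void *uhow,
			      size_t usize)
{
	if (usize < MODEL_OPEN_HOW_SIZE_VER0)
		return -EINVAL;
	return model_copy_struct(how, sizeof(*how), uhow, usize);
}
```

Under this model, a program built against a future, larger struct open_how still works on this kernel as long as it leaves the new fields zeroed.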
......@@ -2,6 +2,7 @@
#ifndef _LINUX_NAMEI_H
#define _LINUX_NAMEI_H
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/path.h>
#include <linux/fcntl.h>
......@@ -38,6 +39,15 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
#define LOOKUP_ROOT 0x2000
#define LOOKUP_ROOT_GRABBED 0x0008
/* Scoping flags for lookup. */
#define LOOKUP_NO_SYMLINKS 0x010000 /* No symlink crossing. */
#define LOOKUP_NO_MAGICLINKS 0x020000 /* No nd_jump_link() crossing. */
#define LOOKUP_NO_XDEV 0x040000 /* No mountpoint crossing. */
#define LOOKUP_BENEATH 0x080000 /* No escaping from starting point. */
#define LOOKUP_IN_ROOT 0x100000 /* Treat dirfd as fs root. */
/* LOOKUP_* flags which do scope-related checks based on the dirfd. */
#define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT)
extern int path_pts(struct path *path);
extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty);
......@@ -68,7 +78,7 @@ extern int follow_up(struct path *);
extern struct dentry *lock_rename(struct dentry *, struct dentry *);
extern void unlock_rename(struct dentry *, struct dentry *);
extern void nd_jump_link(struct path *path);
extern int __must_check nd_jump_link(struct path *path);
static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
{
......
......@@ -76,10 +76,10 @@ static inline int ns_alloc_inum(struct ns_common *ns)
extern struct file *proc_ns_fget(int fd);
#define get_proc_ns(inode) ((struct ns_common *)(inode)->i_private)
extern void *ns_get_path(struct path *path, struct task_struct *task,
extern int ns_get_path(struct path *path, struct task_struct *task,
const struct proc_ns_operations *ns_ops);
typedef struct ns_common *ns_get_path_helper_t(void *);
extern void *ns_get_path_cb(struct path *path, ns_get_path_helper_t ns_get_cb,
extern int ns_get_path_cb(struct path *path, ns_get_path_helper_t ns_get_cb,
void *private_data);
extern int ns_get_name(char *buf, size_t size, struct task_struct *task,
......
......@@ -69,6 +69,7 @@ struct rseq;
union bpf_attr;
struct io_uring_params;
struct clone_args;
struct open_how;
#include <linux/types.h>
#include <linux/aio_abi.h>
......@@ -439,6 +440,8 @@ asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
asmlinkage long sys_fchown(unsigned int fd, uid_t user, gid_t group);
asmlinkage long sys_openat(int dfd, const char __user *filename, int flags,
umode_t mode);
asmlinkage long sys_openat2(int dfd, const char __user *filename,
struct open_how *how, size_t size);
asmlinkage long sys_close(unsigned int fd);
asmlinkage long sys_vhangup(void);
......
......@@ -851,8 +851,11 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
__SYSCALL(__NR_clone3, sys_clone3)
#endif
#define __NR_openat2 437
__SYSCALL(__NR_openat2, sys_openat2)
#undef __NR_syscalls
#define __NR_syscalls 436
#define __NR_syscalls 438
/*
* 32 bit systems traditionally used different
......
......@@ -3,6 +3,7 @@
#define _UAPI_LINUX_FCNTL_H
#include <asm/fcntl.h>
#include <linux/openat2.h>
#define F_SETLEASE (F_LINUX_SPECIFIC_BASE + 0)
#define F_GETLEASE (F_LINUX_SPECIFIC_BASE + 1)
......@@ -100,5 +101,4 @@
#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
#endif /* _UAPI_LINUX_FCNTL_H */
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef _UAPI_LINUX_OPENAT2_H
#define _UAPI_LINUX_OPENAT2_H
#include <linux/types.h>
/*
* Arguments for how openat2(2) should open the target path. If only @flags and
* @mode are non-zero, then openat2(2) operates very similarly to openat(2).
*
* However, unlike openat(2), unknown or invalid bits in @flags result in
* -EINVAL rather than being silently ignored. @mode must be zero unless one of
* {O_CREAT, O_TMPFILE} is set.
*
* @flags: O_* flags.
* @mode: O_CREAT/O_TMPFILE file mode.
* @resolve: RESOLVE_* flags.
*/
struct open_how {
__u64 flags;
__u64 mode;
__u64 resolve;
};
/* how->resolve flags for openat2(2). */
#define RESOLVE_NO_XDEV 0x01 /* Block mount-point crossings
(includes bind-mounts). */
#define RESOLVE_NO_MAGICLINKS 0x02 /* Block traversal through procfs-style
"magic-links". */
#define RESOLVE_NO_SYMLINKS 0x04 /* Block traversal through all symlinks
(implies RESOLVE_NO_MAGICLINKS) */
#define RESOLVE_BENEATH 0x08 /* Block "lexical" trickery like
"..", symlinks, and absolute
paths which escape the dirfd. */
#define RESOLVE_IN_ROOT 0x10 /* Make all jumps to "/" and ".."
be scoped inside the dirfd
(similar to chroot(2)). */
#endif /* _UAPI_LINUX_OPENAT2_H */
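The difference between RESOLVE_BENEATH and RESOLVE_IN_ROOT for absolute paths and ".." can be illustrated with a purely lexical model. This is a sketch under strong assumptions: model_scoped_resolve() and the MODEL_* constants are hypothetical, symlinks, magic-links and mount points are not modelled, and the kernel itself resolves ".." against the real parent dentry (returning -EAGAIN if it detects a racing rename or mount) rather than by string manipulation. The expected results mirror the symlink-free cases in resolve_test.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_RESOLVE_BENEATH 0x08 /* same values as the uapi header */
#define MODEL_RESOLVE_IN_ROOT 0x10
#define MODEL_MAX_DEPTH 64

/*
 * Hypothetical, purely lexical model of scoped resolution. Writes the
 * root-relative result to @out ("" means the root itself) and returns 0,
 * or returns a negative errno. Symlinks and mounts are not modelled.
 */
static int model_scoped_resolve(const char *path, int resolve, char *out,
				size_t outlen)
{
	const char *start[MODEL_MAX_DEPTH];
	size_t len[MODEL_MAX_DEPTH];
	int depth = 0;
	size_t used = 0;

	assert(path != NULL && out != NULL);
	if (resolve != MODEL_RESOLVE_BENEATH && resolve != MODEL_RESOLVE_IN_ROOT)
		return -EINVAL;
	/* BENEATH: absolute paths escape the dirfd. IN_ROOT: "/" is the dirfd. */
	if (path[0] == '/' && resolve == MODEL_RESOLVE_BENEATH)
		return -EXDEV;

	for (const char *p = path; *p != '\0';) {
		const char *s;
		size_t n;

		while (*p == '/')
			p++;
		s = p;
		while (*p != '\0' && *p != '/')
			p++;
		n = (size_t)(p - s);
		if (n == 0 || (n == 1 && s[0] == '.'))
			continue;
		if (n == 2 && s[0] == '.' && s[1] == '.') {
			if (depth > 0)
				depth--;
			else if (resolve == MODEL_RESOLVE_BENEATH)
				return -EXDEV; /* even a temporary escape fails */
			/* IN_ROOT: ".." at the root stays at the root. */
			continue;
		}
		if (depth == MODEL_MAX_DEPTH)
			return -ENAMETOOLONG;
		start[depth] = s;
		len[depth] = n;
		depth++;
	}

	if (outlen == 0)
		return -ENAMETOOLONG;
	out[0] = '\0';
	for (int i = 0; i < depth; i++) {
		size_t sep = i > 0 ? 1 : 0;

		if (used + sep + len[i] + 1 > outlen)
			return -ENAMETOOLONG;
		if (sep)
			out[used++] = '/';
		memcpy(out + used, start[i], len[i]);
		used += len[i];
		out[used] = '\0';
	}
	return 0;
}
```

For example, "../root/" is refused with -EXDEV under RESOLVE_BENEATH (the temporary escape is enough), but resolves to "root" under RESOLVE_IN_ROOT, where ".." at the root stays at the root.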
......@@ -302,14 +302,14 @@ int bpf_prog_offload_info_fill(struct bpf_prog_info *info,
struct inode *ns_inode;
struct path ns_path;
char __user *uinsns;
void *res;
int res;
u32 ulen;
res = ns_get_path_cb(&ns_path, bpf_prog_offload_info_fill_ns, &args);
if (IS_ERR(res)) {
if (res) {
if (!info->ifindex)
return -ENODEV;
return PTR_ERR(res);
return res;
}
down_read(&bpf_devs_lock);
......@@ -526,13 +526,13 @@ int bpf_map_offload_info_fill(struct bpf_map_info *info, struct bpf_map *map)
};
struct inode *ns_inode;
struct path ns_path;
void *res;
int res;
res = ns_get_path_cb(&ns_path, bpf_map_offload_info_fill_ns, &args);
if (IS_ERR(res)) {
if (res) {
if (!info->ifindex)
return -ENODEV;
return PTR_ERR(res);
return res;
}
ns_inode = ns_path.dentry->d_inode;
......
......@@ -7495,7 +7495,7 @@ static void perf_fill_ns_link_info(struct perf_ns_link_info *ns_link_info,
{
struct path ns_path;
struct inode *ns_inode;
void *error;
int error;
error = ns_get_path(&ns_path, task, ns_ops);
if (!error) {
......
......@@ -2573,16 +2573,18 @@ static const char *policy_get_link(struct dentry *dentry,
{
struct aa_ns *ns;
struct path path;
int error;
if (!dentry)
return ERR_PTR(-ECHILD);
ns = aa_get_current_ns();
path.mnt = mntget(aafs_mnt);
path.dentry = dget(ns_dir(ns));
nd_jump_link(&path);
error = nd_jump_link(&path);
aa_put_ns(ns);
return NULL;
return ERR_PTR(error);
}
static int policy_readlink(struct dentry *dentry, char __user *buffer,
......
......@@ -40,6 +40,7 @@ TARGETS += powerpc
TARGETS += proc
TARGETS += pstore
TARGETS += ptrace
TARGETS += openat2
TARGETS += rseq
TARGETS += rtc
TARGETS += seccomp
......
# SPDX-License-Identifier: GPL-2.0-or-later
CFLAGS += -Wall -O2 -g -fsanitize=address -fsanitize=undefined
TEST_GEN_PROGS := openat2_test resolve_test rename_attack_test
include ../lib.mk
$(TEST_GEN_PROGS): helpers.c
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Author: Aleksa Sarai <cyphar@cyphar.com>
* Copyright (C) 2018-2019 SUSE LLC.
*/
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <syscall.h>
#include <limits.h>
#include "helpers.h"
bool needs_openat2(const struct open_how *how)
{
return how->resolve != 0;
}
int raw_openat2(int dfd, const char *path, void *how, size_t size)
{
int ret = syscall(__NR_openat2, dfd, path, how, size);
return ret >= 0 ? ret : -errno;
}
int sys_openat2(int dfd, const char *path, struct open_how *how)
{
return raw_openat2(dfd, path, how, sizeof(*how));
}
int sys_openat(int dfd, const char *path, struct open_how *how)
{
int ret = openat(dfd, path, how->flags, how->mode);
return ret >= 0 ? ret : -errno;
}
int sys_renameat2(int olddirfd, const char *oldpath,
int newdirfd, const char *newpath, unsigned int flags)
{
int ret = syscall(__NR_renameat2, olddirfd, oldpath,
newdirfd, newpath, flags);
return ret >= 0 ? ret : -errno;
}
int touchat(int dfd, const char *path)
{
int fd = openat(dfd, path, O_CREAT, 0700); /* O_CREAT needs a mode */
if (fd >= 0)
close(fd);
return fd;
}
char *fdreadlink(int fd)
{
char *target, *tmp;
E_asprintf(&tmp, "/proc/self/fd/%d", fd);
target = malloc(PATH_MAX);
if (!target)
ksft_exit_fail_msg("fdreadlink: malloc failed\n");
memset(target, 0, PATH_MAX);
E_readlink(tmp, target, PATH_MAX);
free(tmp);
return target;
}
bool fdequal(int fd, int dfd, const char *path)
{
char *fdpath, *dfdpath, *other;
bool cmp;
fdpath = fdreadlink(fd);
dfdpath = fdreadlink(dfd);
if (!path)
E_asprintf(&other, "%s", dfdpath);
else if (*path == '/')
E_asprintf(&other, "%s", path);
else
E_asprintf(&other, "%s/%s", dfdpath, path);
cmp = !strcmp(fdpath, other);
free(fdpath);
free(dfdpath);
free(other);
return cmp;
}
bool openat2_supported = false;
void __attribute__((constructor)) init(void)
{
struct open_how how = {};
int fd;
BUILD_BUG_ON(sizeof(struct open_how) != OPEN_HOW_SIZE_VER0);
/* Check openat2(2) support. */
fd = sys_openat2(AT_FDCWD, ".", &how);
openat2_supported = (fd >= 0);
if (fd >= 0)
close(fd);
}
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Author: Aleksa Sarai <cyphar@cyphar.com>
* Copyright (C) 2018-2019 SUSE LLC.
*/
#ifndef __RESOLVEAT_H__
#define __RESOLVEAT_H__
#define _GNU_SOURCE
#include <stdint.h>
#include <errno.h>
#include <linux/types.h>
#include "../kselftest.h"
#define ARRAY_LEN(X) (sizeof (X) / sizeof (*(X)))
#define BUILD_BUG_ON(e) ((void)(sizeof(struct { int:(-!!(e)); })))
#ifndef SYS_openat2
#ifndef __NR_openat2
#define __NR_openat2 437
#endif /* __NR_openat2 */
#define SYS_openat2 __NR_openat2
#endif /* SYS_openat2 */
/*
* Arguments for how openat2(2) should open the target path. If @resolve is
* zero, then openat2(2) operates very similarly to openat(2).
*
* However, unlike openat(2), unknown bits in @flags result in -EINVAL rather
* than being silently ignored. @mode must be zero unless one of {O_CREAT,
* O_TMPFILE} is set.
*
* @flags: O_* flags.
* @mode: O_CREAT/O_TMPFILE file mode.
* @resolve: RESOLVE_* flags.
*/
struct open_how {
__u64 flags;
__u64 mode;
__u64 resolve;
};
#define OPEN_HOW_SIZE_VER0 24 /* sizeof first published struct */
#define OPEN_HOW_SIZE_LATEST OPEN_HOW_SIZE_VER0
bool needs_openat2(const struct open_how *how);
#ifndef RESOLVE_IN_ROOT
/* how->resolve flags for openat2(2). */
#define RESOLVE_NO_XDEV 0x01 /* Block mount-point crossings
(includes bind-mounts). */
#define RESOLVE_NO_MAGICLINKS 0x02 /* Block traversal through procfs-style
"magic-links". */
#define RESOLVE_NO_SYMLINKS 0x04 /* Block traversal through all symlinks
(implies RESOLVE_NO_MAGICLINKS) */
#define RESOLVE_BENEATH 0x08 /* Block "lexical" trickery like
"..", symlinks, and absolute
paths which escape the dirfd. */
#define RESOLVE_IN_ROOT 0x10 /* Make all jumps to "/" and ".."
be scoped inside the dirfd
(similar to chroot(2)). */
#endif /* RESOLVE_IN_ROOT */
#define E_func(func, ...) \
do { \
if (func(__VA_ARGS__) < 0) \
ksft_exit_fail_msg("%s:%d %s failed\n", \
__FILE__, __LINE__, #func);\
} while (0)
#define E_asprintf(...) E_func(asprintf, __VA_ARGS__)
#define E_chmod(...) E_func(chmod, __VA_ARGS__)
#define E_dup2(...) E_func(dup2, __VA_ARGS__)
#define E_fchdir(...) E_func(fchdir, __VA_ARGS__)
#define E_fstatat(...) E_func(fstatat, __VA_ARGS__)
#define E_kill(...) E_func(kill, __VA_ARGS__)
#define E_mkdirat(...) E_func(mkdirat, __VA_ARGS__)
#define E_mount(...) E_func(mount, __VA_ARGS__)
#define E_prctl(...) E_func(prctl, __VA_ARGS__)
#define E_readlink(...) E_func(readlink, __VA_ARGS__)
#define E_setresuid(...) E_func(setresuid, __VA_ARGS__)
#define E_symlinkat(...) E_func(symlinkat, __VA_ARGS__)
#define E_touchat(...) E_func(touchat, __VA_ARGS__)
#define E_unshare(...) E_func(unshare, __VA_ARGS__)
#define E_assert(expr, msg, ...) \
do { \
if (!(expr)) \
ksft_exit_fail_msg("ASSERT(%s:%d) failed (%s): " msg "\n", \
__FILE__, __LINE__, #expr, ##__VA_ARGS__); \
} while (0)
int raw_openat2(int dfd, const char *path, void *how, size_t size);
int sys_openat2(int dfd, const char *path, struct open_how *how);
int sys_openat(int dfd, const char *path, struct open_how *how);
int sys_renameat2(int olddirfd, const char *oldpath,
int newdirfd, const char *newpath, unsigned int flags);
int touchat(int dfd, const char *path);
char *fdreadlink(int fd);
bool fdequal(int fd, int dfd, const char *path);
extern bool openat2_supported;
#endif /* __RESOLVEAT_H__ */
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Author: Aleksa Sarai <cyphar@cyphar.com>
* Copyright (C) 2018-2019 SUSE LLC.
*/
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mount.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include "../kselftest.h"
#include "helpers.h"
/*
* O_LARGEFILE is set to 0 by glibc.
* XXX: This is wrong on {mips, parisc, powerpc, sparc}.
*/
#undef O_LARGEFILE
#define O_LARGEFILE 0x8000
struct open_how_ext {
struct open_how inner;
uint32_t extra1;
char pad1[128];
uint32_t extra2;
char pad2[128];
uint32_t extra3;
};
struct struct_test {
const char *name;
struct open_how_ext arg;
size_t size;
int err;
};
#define NUM_OPENAT2_STRUCT_TESTS 7
#define NUM_OPENAT2_STRUCT_VARIATIONS 13
void test_openat2_struct(void)
{
int misalignments[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 17, 87 };
struct struct_test tests[] = {
/* Normal struct. */
{ .name = "normal struct",
.arg.inner.flags = O_RDONLY,
.size = sizeof(struct open_how) },
/* Bigger struct, with zeroed out end. */
{ .name = "bigger struct (zeroed out)",
.arg.inner.flags = O_RDONLY,
.size = sizeof(struct open_how_ext) },
/* TODO: Once expanded, check zero-padding. */
/* Smaller than version-0 struct. */
{ .name = "zero-sized 'struct'",
.arg.inner.flags = O_RDONLY, .size = 0, .err = -EINVAL },
{ .name = "smaller-than-v0 struct",
.arg.inner.flags = O_RDONLY,
.size = OPEN_HOW_SIZE_VER0 - 1, .err = -EINVAL },
/* Bigger struct, with non-zero trailing bytes. */
{ .name = "bigger struct (non-zero data in first 'future field')",
.arg.inner.flags = O_RDONLY, .arg.extra1 = 0xdeadbeef,
.size = sizeof(struct open_how_ext), .err = -E2BIG },
{ .name = "bigger struct (non-zero data in middle of 'future fields')",
.arg.inner.flags = O_RDONLY, .arg.extra2 = 0xfeedcafe,
.size = sizeof(struct open_how_ext), .err = -E2BIG },
{ .name = "bigger struct (non-zero data at end of 'future fields')",
.arg.inner.flags = O_RDONLY, .arg.extra3 = 0xabad1dea,
.size = sizeof(struct open_how_ext), .err = -E2BIG },
};
BUILD_BUG_ON(ARRAY_LEN(misalignments) != NUM_OPENAT2_STRUCT_VARIATIONS);
BUILD_BUG_ON(ARRAY_LEN(tests) != NUM_OPENAT2_STRUCT_TESTS);
for (int i = 0; i < ARRAY_LEN(tests); i++) {
struct struct_test *test = &tests[i];
struct open_how_ext how_ext = test->arg;
for (int j = 0; j < ARRAY_LEN(misalignments); j++) {
int fd, misalign = misalignments[j];
char *fdpath = NULL;
bool failed;
void (*resultfn)(const char *msg, ...) = ksft_test_result_pass;
void *copy = NULL, *how_copy = &how_ext;
if (!openat2_supported) {
ksft_print_msg("openat2(2) unsupported\n");
resultfn = ksft_test_result_skip;
goto skip;
}
if (misalign) {
/*
* Explicitly misalign the structure by copying it with the given
* (mis)alignment offset. The other data is set to be non-zero to
* make sure that non-zero bytes outside the struct aren't checked.
*
* This effectively checks that check_zeroed_user() works.
*/
copy = malloc(misalign + sizeof(how_ext));
how_copy = copy + misalign;
memset(copy, 0xff, misalign);
memcpy(how_copy, &how_ext, sizeof(how_ext));
}
fd = raw_openat2(AT_FDCWD, ".", how_copy, test->size);
if (test->err >= 0)
failed = (fd < 0);
else
failed = (fd != test->err);
if (fd >= 0) {
fdpath = fdreadlink(fd);
close(fd);
}
if (failed) {
resultfn = ksft_test_result_fail;
ksft_print_msg("openat2 unexpectedly returned ");
if (fdpath)
ksft_print_msg("%d['%s']\n", fd, fdpath);
else
ksft_print_msg("%d (%s)\n", fd, strerror(-fd));
}
skip:
if (test->err >= 0)
resultfn("openat2 with %s argument [misalign=%d] succeeds\n",
test->name, misalign);
else
resultfn("openat2 with %s argument [misalign=%d] fails with %d (%s)\n",
test->name, misalign, test->err,
strerror(-test->err));
free(copy);
free(fdpath);
fflush(stdout);
}
}
}
struct flag_test {
const char *name;
struct open_how how;
int err;
};
#define NUM_OPENAT2_FLAG_TESTS 23
void test_openat2_flags(void)
{
struct flag_test tests[] = {
/* O_TMPFILE is incompatible with O_PATH and O_CREAT. */
{ .name = "incompatible flags (O_TMPFILE | O_PATH)",
.how.flags = O_TMPFILE | O_PATH | O_RDWR, .err = -EINVAL },
{ .name = "incompatible flags (O_TMPFILE | O_CREAT)",
.how.flags = O_TMPFILE | O_CREAT | O_RDWR, .err = -EINVAL },
/* O_PATH only permits certain other flags to be set ... */
{ .name = "compatible flags (O_PATH | O_CLOEXEC)",
.how.flags = O_PATH | O_CLOEXEC },
{ .name = "compatible flags (O_PATH | O_DIRECTORY)",
.how.flags = O_PATH | O_DIRECTORY },
{ .name = "compatible flags (O_PATH | O_NOFOLLOW)",
.how.flags = O_PATH | O_NOFOLLOW },
/* ... and others are absolutely not permitted. */
{ .name = "incompatible flags (O_PATH | O_RDWR)",
.how.flags = O_PATH | O_RDWR, .err = -EINVAL },
{ .name = "incompatible flags (O_PATH | O_CREAT)",
.how.flags = O_PATH | O_CREAT, .err = -EINVAL },
{ .name = "incompatible flags (O_PATH | O_EXCL)",
.how.flags = O_PATH | O_EXCL, .err = -EINVAL },
{ .name = "incompatible flags (O_PATH | O_NOCTTY)",
.how.flags = O_PATH | O_NOCTTY, .err = -EINVAL },
{ .name = "incompatible flags (O_PATH | O_DIRECT)",
.how.flags = O_PATH | O_DIRECT, .err = -EINVAL },
{ .name = "incompatible flags (O_PATH | O_LARGEFILE)",
.how.flags = O_PATH | O_LARGEFILE, .err = -EINVAL },
/* ->mode must only be set with O_{CREAT,TMPFILE}. */
{ .name = "non-zero how.mode and O_RDONLY",
.how.flags = O_RDONLY, .how.mode = 0600, .err = -EINVAL },
{ .name = "non-zero how.mode and O_PATH",
.how.flags = O_PATH, .how.mode = 0600, .err = -EINVAL },
{ .name = "valid how.mode and O_CREAT",
.how.flags = O_CREAT, .how.mode = 0600 },
{ .name = "valid how.mode and O_TMPFILE",
.how.flags = O_TMPFILE | O_RDWR, .how.mode = 0600 },
/* ->mode must only contain 0777 bits. */
{ .name = "invalid how.mode and O_CREAT",
.how.flags = O_CREAT,
.how.mode = 0xFFFF, .err = -EINVAL },
{ .name = "invalid (very large) how.mode and O_CREAT",
.how.flags = O_CREAT,
.how.mode = 0xC000000000000000ULL, .err = -EINVAL },
{ .name = "invalid how.mode and O_TMPFILE",
.how.flags = O_TMPFILE | O_RDWR,
.how.mode = 0x1337, .err = -EINVAL },
{ .name = "invalid (very large) how.mode and O_TMPFILE",
.how.flags = O_TMPFILE | O_RDWR,
.how.mode = 0x0000A00000000000ULL, .err = -EINVAL },
/* ->resolve must only contain RESOLVE_* flags. */
{ .name = "invalid how.resolve and O_RDONLY",
.how.flags = O_RDONLY,
.how.resolve = 0x1337, .err = -EINVAL },
{ .name = "invalid how.resolve and O_CREAT",
.how.flags = O_CREAT,
.how.resolve = 0x1337, .err = -EINVAL },
{ .name = "invalid how.resolve and O_TMPFILE",
.how.flags = O_TMPFILE | O_RDWR,
.how.resolve = 0x1337, .err = -EINVAL },
{ .name = "invalid how.resolve and O_PATH",
.how.flags = O_PATH,
.how.resolve = 0x1337, .err = -EINVAL },
};
BUILD_BUG_ON(ARRAY_LEN(tests) != NUM_OPENAT2_FLAG_TESTS);
for (int i = 0; i < ARRAY_LEN(tests); i++) {
int fd, fdflags = -1;
char *path, *fdpath = NULL;
bool failed = false;
struct flag_test *test = &tests[i];
void (*resultfn)(const char *msg, ...) = ksft_test_result_pass;
if (!openat2_supported) {
ksft_print_msg("openat2(2) unsupported\n");
resultfn = ksft_test_result_skip;
goto skip;
}
path = (test->how.flags & O_CREAT) ? "/tmp/ksft.openat2_tmpfile" : ".";
unlink(path);
fd = sys_openat2(AT_FDCWD, path, &test->how);
if (test->err >= 0)
failed = (fd < 0);
else
failed = (fd != test->err);
if (fd >= 0) {
int otherflags;
fdpath = fdreadlink(fd);
fdflags = fcntl(fd, F_GETFL);
otherflags = fcntl(fd, F_GETFD);
close(fd);
E_assert(fdflags >= 0, "fcntl F_GETFL of new fd");
E_assert(otherflags >= 0, "fcntl F_GETFD of new fd");
/* O_CLOEXEC isn't shown in F_GETFL. */
if (otherflags & FD_CLOEXEC)
fdflags |= O_CLOEXEC;
/* O_CREAT is hidden from F_GETFL. */
if (test->how.flags & O_CREAT)
fdflags |= O_CREAT;
if (!(test->how.flags & O_LARGEFILE))
fdflags &= ~O_LARGEFILE;
failed |= (fdflags != test->how.flags);
}
if (failed) {
resultfn = ksft_test_result_fail;
ksft_print_msg("openat2 unexpectedly returned ");
if (fdpath)
ksft_print_msg("%d['%s'] with %X (!= %X)\n",
fd, fdpath, fdflags,
test->how.flags);
else
ksft_print_msg("%d (%s)\n", fd, strerror(-fd));
}
skip:
if (test->err >= 0)
resultfn("openat2 with %s succeeds\n", test->name);
else
resultfn("openat2 with %s fails with %d (%s)\n",
test->name, test->err, strerror(-test->err));
free(fdpath);
fflush(stdout);
}
}
#define NUM_TESTS (NUM_OPENAT2_STRUCT_VARIATIONS * NUM_OPENAT2_STRUCT_TESTS + \
NUM_OPENAT2_FLAG_TESTS)
int main(int argc, char **argv)
{
ksft_print_header();
ksft_set_plan(NUM_TESTS);
test_openat2_struct();
test_openat2_flags();
if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0)
ksft_exit_fail();
else
ksft_exit_pass();
}
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Author: Aleksa Sarai <cyphar@cyphar.com>
* Copyright (C) 2018-2019 SUSE LLC.
*/
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mount.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <syscall.h>
#include <limits.h>
#include <unistd.h>
#include "../kselftest.h"
#include "helpers.h"
/* Construct a test directory with the following structure:
*
* root/
* |-- a/
* | `-- c/
* `-- b/
*/
int setup_testdir(void)
{
int dfd;
char dirname[] = "/tmp/ksft-openat2-rename-attack.XXXXXX";
/* Make the top-level directory. */
if (!mkdtemp(dirname))
ksft_exit_fail_msg("setup_testdir: failed to create tmpdir\n");
dfd = open(dirname, O_PATH | O_DIRECTORY);
if (dfd < 0)
ksft_exit_fail_msg("setup_testdir: failed to open tmpdir\n");
E_mkdirat(dfd, "a", 0755);
E_mkdirat(dfd, "b", 0755);
E_mkdirat(dfd, "a/c", 0755);
return dfd;
}
/* Swap @dirfd/@a and @dirfd/@b constantly. Parent must kill this process. */
pid_t spawn_attack(int dirfd, char *a, char *b)
{
pid_t child = fork();
if (child != 0)
return child;
/* If the parent (the test process) dies, kill ourselves too. */
E_prctl(PR_SET_PDEATHSIG, SIGKILL);
/* Swap @a and @b. */
for (;;)
renameat2(dirfd, a, dirfd, b, RENAME_EXCHANGE);
exit(1);
}
#define NUM_RENAME_TESTS 2
#define ROUNDS 400000
const char *flagname(int resolve)
{
switch (resolve) {
case RESOLVE_IN_ROOT:
return "RESOLVE_IN_ROOT";
case RESOLVE_BENEATH:
return "RESOLVE_BENEATH";
}
return "(unknown)";
}
void test_rename_attack(int resolve)
{
int dfd, afd;
pid_t child;
void (*resultfn)(const char *msg, ...) = ksft_test_result_pass;
int escapes = 0, other_errs = 0, exdevs = 0, eagains = 0, successes = 0;
struct open_how how = {
.flags = O_PATH,
.resolve = resolve,
};
if (!openat2_supported) {
how.resolve = 0;
ksft_print_msg("openat2(2) unsupported -- using openat(2) instead\n");
}
dfd = setup_testdir();
afd = openat(dfd, "a", O_PATH);
if (afd < 0)
ksft_exit_fail_msg("test_rename_attack: failed to open 'a'\n");
child = spawn_attack(dfd, "a/c", "b");
for (int i = 0; i < ROUNDS; i++) {
int fd;
char *victim_path = "c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../..";
if (openat2_supported)
fd = sys_openat2(afd, victim_path, &how);
else
fd = sys_openat(afd, victim_path, &how);
if (fd < 0) {
if (fd == -EAGAIN)
eagains++;
else if (fd == -EXDEV)
exdevs++;
else if (fd == -ENOENT)
escapes++; /* escaped outside and got ENOENT... */
else
other_errs++; /* unexpected error */
} else {
if (fdequal(fd, afd, NULL))
successes++;
else
escapes++; /* we got an unexpected fd */
close(fd);
}
}
if (escapes > 0)
resultfn = ksft_test_result_fail;
ksft_print_msg("non-escapes: EAGAIN=%d EXDEV=%d E<other>=%d success=%d\n",
eagains, exdevs, other_errs, successes);
resultfn("rename attack with %s (%d runs, got %d escapes)\n",
flagname(resolve), ROUNDS, escapes);
/* Should be killed anyway, but might as well make sure. */
E_kill(child, SIGKILL);
}
#define NUM_TESTS NUM_RENAME_TESTS
int main(int argc, char **argv)
{
ksft_print_header();
ksft_set_plan(NUM_TESTS);
test_rename_attack(RESOLVE_BENEATH);
test_rename_attack(RESOLVE_IN_ROOT);
if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0)
ksft_exit_fail();
else
ksft_exit_pass();
}
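Because scoped lookups can fail with -EAGAIN when a concurrent rename or mount races with ".." resolution (the rename attack test above counts these as non-escapes), a caller may want to retry. The sketch below is hypothetical: openat2_beneath_retry() and struct sketch_open_how are illustrative names, the struct mirrors the 24-byte uapi struct open_how, and __NR_openat2 falls back to 437 (the number used in this merge) when the libc headers lack it. Like the selftest helpers, it returns an fd or a negative errno.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_openat2
#define __NR_openat2 437 /* assumption: number assigned in this merge */
#endif

#ifndef RESOLVE_BENEATH
#define RESOLVE_BENEATH 0x08 /* value from the uapi <linux/openat2.h> */
#endif

/* Local mirror of the 24-byte uapi struct open_how (OPEN_HOW_SIZE_VER0). */
struct sketch_open_how {
	uint64_t flags;
	uint64_t mode;
	uint64_t resolve;
};

/*
 * Hypothetical helper: open @path beneath @dfd with RESOLVE_BENEATH,
 * retrying up to @max_tries times while openat2(2) reports EAGAIN.
 * Returns a new fd, or a negative errno value on failure.
 */
static int openat2_beneath_retry(int dfd, const char *path, uint64_t flags,
				 int max_tries)
{
	struct sketch_open_how how = {
		.flags = flags,
		.mode = 0, /* must be 0 unless O_CREAT/O_TMPFILE */
		.resolve = RESOLVE_BENEATH,
	};
	long ret = -1;

	assert(path != NULL);
	if (max_tries < 1)
		max_tries = 1;
	for (int i = 0; i < max_tries; i++) {
		ret = syscall(__NR_openat2, dfd, path, &how, sizeof(how));
		if (ret >= 0 || errno != EAGAIN)
			break;
	}
	return ret >= 0 ? (int)ret : -errno;
}
```

Bounding the retries keeps a persistent attacker from turning the loop into a livelock; what to do after the final -EAGAIN (fail, or fall back to a slower userspace resolver) is left to the caller.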
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Author: Aleksa Sarai <cyphar@cyphar.com>
* Copyright (C) 2018-2019 SUSE LLC.
*/
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mount.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include "../kselftest.h"
#include "helpers.h"
/*
* Construct a test directory with the following structure:
*
* root/
* |-- procexe -> /proc/self/exe
* |-- procroot -> /proc/self/root
* |-- root/
* |-- mnt/ [mountpoint]
* | |-- self -> ../mnt/
* | `-- absself -> /mnt/
* |-- etc/
* | `-- passwd
* |-- creatlink -> /newfile3
* |-- reletc -> etc/
* |-- relsym -> etc/passwd
* |-- absetc -> /etc/
* |-- abssym -> /etc/passwd
* |-- abscheeky -> /cheeky
* `-- cheeky/
* |-- absself -> /
* |-- self -> ../../root/
* |-- garbageself -> /../../root/
* |-- passwd -> ../cheeky/../etc/../etc/passwd
* |-- abspasswd -> /../cheeky/../etc/../etc/passwd
* |-- dotdotlink -> ../../../../../../../../../../../../../../etc/passwd
* `-- garbagelink -> /../../../../../../../../../../../../../../etc/passwd
*/
int setup_testdir(void)
{
int dfd, tmpfd;
char dirname[] = "/tmp/ksft-openat2-testdir.XXXXXX";
/* Unshare the mount namespace and make /tmp a private mount. */
E_unshare(CLONE_NEWNS);
E_mount("", "/tmp", "", MS_PRIVATE, "");
/* Make the top-level directory. */
if (!mkdtemp(dirname))
ksft_exit_fail_msg("setup_testdir: failed to create tmpdir\n");
dfd = open(dirname, O_PATH | O_DIRECTORY);
if (dfd < 0)
ksft_exit_fail_msg("setup_testdir: failed to open tmpdir\n");
/* A sub-directory which is actually used for tests. */
E_mkdirat(dfd, "root", 0755);
tmpfd = openat(dfd, "root", O_PATH | O_DIRECTORY);
if (tmpfd < 0)
ksft_exit_fail_msg("setup_testdir: failed to open tmpdir\n");
close(dfd);
dfd = tmpfd;
E_symlinkat("/proc/self/exe", dfd, "procexe");
E_symlinkat("/proc/self/root", dfd, "procroot");
E_mkdirat(dfd, "root", 0755);
/* There is no mountat(2), so use chdir. */
E_mkdirat(dfd, "mnt", 0755);
E_fchdir(dfd);
E_mount("tmpfs", "./mnt", "tmpfs", MS_NOSUID | MS_NODEV, "");
E_symlinkat("../mnt/", dfd, "mnt/self");
E_symlinkat("/mnt/", dfd, "mnt/absself");
E_mkdirat(dfd, "etc", 0755);
E_touchat(dfd, "etc/passwd");
E_symlinkat("/newfile3", dfd, "creatlink");
E_symlinkat("etc/", dfd, "reletc");
E_symlinkat("etc/passwd", dfd, "relsym");
E_symlinkat("/etc/", dfd, "absetc");
E_symlinkat("/etc/passwd", dfd, "abssym");
E_symlinkat("/cheeky", dfd, "abscheeky");
E_mkdirat(dfd, "cheeky", 0755);
E_symlinkat("/", dfd, "cheeky/absself");
E_symlinkat("../../root/", dfd, "cheeky/self");
E_symlinkat("/../../root/", dfd, "cheeky/garbageself");
E_symlinkat("../cheeky/../etc/../etc/passwd", dfd, "cheeky/passwd");
E_symlinkat("/../cheeky/../etc/../etc/passwd", dfd, "cheeky/abspasswd");
E_symlinkat("../../../../../../../../../../../../../../etc/passwd",
dfd, "cheeky/dotdotlink");
E_symlinkat("/../../../../../../../../../../../../../../etc/passwd",
dfd, "cheeky/garbagelink");
return dfd;
}
struct basic_test {
const char *name;
const char *dir;
const char *path;
struct open_how how;
bool pass;
union {
int err;
const char *path;
} out;
};
#define NUM_OPENAT2_OPATH_TESTS 88
void test_openat2_opath_tests(void)
{
int rootfd, hardcoded_fd;
char *procselfexe, *hardcoded_fdpath;
E_asprintf(&procselfexe, "/proc/%d/exe", getpid());
rootfd = setup_testdir();
hardcoded_fd = open("/dev/null", O_RDONLY);
E_assert(hardcoded_fd >= 0, "open fd to hardcode");
E_asprintf(&hardcoded_fdpath, "self/fd/%d", hardcoded_fd);
struct basic_test tests[] = {
/** RESOLVE_BENEATH **/
/* Attempts to cross dirfd should be blocked. */
{ .name = "[beneath] jump to /",
.path = "/", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
{ .name = "[beneath] absolute link to $root",
.path = "cheeky/absself", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
{ .name = "[beneath] chained absolute links to $root",
.path = "abscheeky/absself", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
{ .name = "[beneath] jump outside $root",
.path = "..", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
{ .name = "[beneath] temporary jump outside $root",
.path = "../root/", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
{ .name = "[beneath] symlink temporary jump outside $root",
.path = "cheeky/self", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
{ .name = "[beneath] chained symlink temporary jump outside $root",
.path = "abscheeky/self", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
{ .name = "[beneath] garbage links to $root",
.path = "cheeky/garbageself", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
{ .name = "[beneath] chained garbage links to $root",
.path = "abscheeky/garbageself", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
/* Only relative paths that stay inside dirfd should work. */
{ .name = "[beneath] ordinary path to 'root'",
.path = "root", .how.resolve = RESOLVE_BENEATH,
.out.path = "root", .pass = true },
{ .name = "[beneath] ordinary path to 'etc'",
.path = "etc", .how.resolve = RESOLVE_BENEATH,
.out.path = "etc", .pass = true },
{ .name = "[beneath] ordinary path to 'etc/passwd'",
.path = "etc/passwd", .how.resolve = RESOLVE_BENEATH,
.out.path = "etc/passwd", .pass = true },
{ .name = "[beneath] relative symlink inside $root",
.path = "relsym", .how.resolve = RESOLVE_BENEATH,
.out.path = "etc/passwd", .pass = true },
{ .name = "[beneath] chained-'..' relative symlink inside $root",
.path = "cheeky/passwd", .how.resolve = RESOLVE_BENEATH,
.out.path = "etc/passwd", .pass = true },
{ .name = "[beneath] absolute symlink component outside $root",
.path = "abscheeky/passwd", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
{ .name = "[beneath] absolute symlink target outside $root",
.path = "abssym", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
{ .name = "[beneath] absolute path outside $root",
.path = "/etc/passwd", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
{ .name = "[beneath] cheeky absolute path outside $root",
.path = "cheeky/abspasswd", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
{ .name = "[beneath] chained cheeky absolute path outside $root",
.path = "abscheeky/abspasswd", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
/* Tricky paths should fail. */
{ .name = "[beneath] tricky '..'-chained symlink outside $root",
.path = "cheeky/dotdotlink", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
{ .name = "[beneath] tricky absolute + '..'-chained symlink outside $root",
.path = "abscheeky/dotdotlink", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
{ .name = "[beneath] tricky garbage link outside $root",
.path = "cheeky/garbagelink", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
{ .name = "[beneath] tricky absolute + garbage link outside $root",
.path = "abscheeky/garbagelink", .how.resolve = RESOLVE_BENEATH,
.out.err = -EXDEV, .pass = false },
/** RESOLVE_IN_ROOT **/
/* All attempts to cross the dirfd will be scoped-to-root. */
{ .name = "[in_root] jump to /",
.path = "/", .how.resolve = RESOLVE_IN_ROOT,
.out.path = NULL, .pass = true },
{ .name = "[in_root] absolute symlink to /root",
.path = "cheeky/absself", .how.resolve = RESOLVE_IN_ROOT,
.out.path = NULL, .pass = true },
{ .name = "[in_root] chained absolute symlinks to /root",
.path = "abscheeky/absself", .how.resolve = RESOLVE_IN_ROOT,
.out.path = NULL, .pass = true },
{ .name = "[in_root] '..' at root",
.path = "..", .how.resolve = RESOLVE_IN_ROOT,
.out.path = NULL, .pass = true },
{ .name = "[in_root] '../root' at root",
.path = "../root/", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "root", .pass = true },
{ .name = "[in_root] relative symlink containing '..' above root",
.path = "cheeky/self", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "root", .pass = true },
{ .name = "[in_root] garbage link to /root",
.path = "cheeky/garbageself", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "root", .pass = true },
{ .name = "[in_root] chained garbage links to /root",
.path = "abscheeky/garbageself", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "root", .pass = true },
{ .name = "[in_root] relative path to 'root'",
.path = "root", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "root", .pass = true },
{ .name = "[in_root] relative path to 'etc'",
.path = "etc", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "etc", .pass = true },
{ .name = "[in_root] relative path to 'etc/passwd'",
.path = "etc/passwd", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "etc/passwd", .pass = true },
{ .name = "[in_root] relative symlink to 'etc/passwd'",
.path = "relsym", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "etc/passwd", .pass = true },
{ .name = "[in_root] chained-'..' relative symlink to 'etc/passwd'",
.path = "cheeky/passwd", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "etc/passwd", .pass = true },
{ .name = "[in_root] chained-'..' absolute + relative symlink to 'etc/passwd'",
.path = "abscheeky/passwd", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "etc/passwd", .pass = true },
{ .name = "[in_root] absolute symlink to 'etc/passwd'",
.path = "abssym", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "etc/passwd", .pass = true },
{ .name = "[in_root] absolute path 'etc/passwd'",
.path = "/etc/passwd", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "etc/passwd", .pass = true },
{ .name = "[in_root] cheeky absolute path 'etc/passwd'",
.path = "cheeky/abspasswd", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "etc/passwd", .pass = true },
{ .name = "[in_root] chained cheeky absolute path 'etc/passwd'",
.path = "abscheeky/abspasswd", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "etc/passwd", .pass = true },
{ .name = "[in_root] tricky '..'-chained symlink outside $root",
.path = "cheeky/dotdotlink", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "etc/passwd", .pass = true },
{ .name = "[in_root] tricky absolute + '..'-chained symlink outside $root",
.path = "abscheeky/dotdotlink", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "etc/passwd", .pass = true },
{ .name = "[in_root] tricky absolute path + absolute + '..'-chained symlink outside $root",
.path = "/../../../../abscheeky/dotdotlink", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "etc/passwd", .pass = true },
{ .name = "[in_root] tricky garbage link outside $root",
.path = "cheeky/garbagelink", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "etc/passwd", .pass = true },
{ .name = "[in_root] tricky absolute + garbage link outside $root",
.path = "abscheeky/garbagelink", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "etc/passwd", .pass = true },
{ .name = "[in_root] tricky absolute path + absolute + garbage link outside $root",
.path = "/../../../../abscheeky/garbagelink", .how.resolve = RESOLVE_IN_ROOT,
.out.path = "etc/passwd", .pass = true },
/* O_CREAT should handle trailing symlinks correctly. */
{ .name = "[in_root] O_CREAT of relative path inside $root",
.path = "newfile1", .how.flags = O_CREAT,
.how.mode = 0700,
.how.resolve = RESOLVE_IN_ROOT,
.out.path = "newfile1", .pass = true },
{ .name = "[in_root] O_CREAT of absolute path",
.path = "/newfile2", .how.flags = O_CREAT,
.how.mode = 0700,
.how.resolve = RESOLVE_IN_ROOT,
.out.path = "newfile2", .pass = true },
{ .name = "[in_root] O_CREAT of tricky symlink outside root",
.path = "/creatlink", .how.flags = O_CREAT,
.how.mode = 0700,
.how.resolve = RESOLVE_IN_ROOT,
.out.path = "newfile3", .pass = true },
/** RESOLVE_NO_XDEV **/
/* Crossing *down* into a mountpoint is disallowed. */
{ .name = "[no_xdev] cross into $mnt",
.path = "mnt", .how.resolve = RESOLVE_NO_XDEV,
.out.err = -EXDEV, .pass = false },
{ .name = "[no_xdev] cross into $mnt/",
.path = "mnt/", .how.resolve = RESOLVE_NO_XDEV,
.out.err = -EXDEV, .pass = false },
{ .name = "[no_xdev] cross into $mnt/.",
.path = "mnt/.", .how.resolve = RESOLVE_NO_XDEV,
.out.err = -EXDEV, .pass = false },
/* Crossing *up* out of a mountpoint is disallowed. */
{ .name = "[no_xdev] goto mountpoint root",
.dir = "mnt", .path = ".", .how.resolve = RESOLVE_NO_XDEV,
.out.path = "mnt", .pass = true },
{ .name = "[no_xdev] cross up through '..'",
.dir = "mnt", .path = "..", .how.resolve = RESOLVE_NO_XDEV,
.out.err = -EXDEV, .pass = false },
{ .name = "[no_xdev] temporary cross up through '..'",
.dir = "mnt", .path = "../mnt", .how.resolve = RESOLVE_NO_XDEV,
.out.err = -EXDEV, .pass = false },
{ .name = "[no_xdev] temporary relative symlink cross up",
.dir = "mnt", .path = "self", .how.resolve = RESOLVE_NO_XDEV,
.out.err = -EXDEV, .pass = false },
{ .name = "[no_xdev] temporary absolute symlink cross up",
.dir = "mnt", .path = "absself", .how.resolve = RESOLVE_NO_XDEV,
.out.err = -EXDEV, .pass = false },
/* Jumping to "/" is ok, but later components cannot cross. */
{ .name = "[no_xdev] jump to / directly",
.dir = "mnt", .path = "/", .how.resolve = RESOLVE_NO_XDEV,
.out.path = "/", .pass = true },
{ .name = "[no_xdev] jump to / (from /) directly",
.dir = "/", .path = "/", .how.resolve = RESOLVE_NO_XDEV,
.out.path = "/", .pass = true },
{ .name = "[no_xdev] jump to / then proc",
.path = "/proc/1", .how.resolve = RESOLVE_NO_XDEV,
.out.err = -EXDEV, .pass = false },
{ .name = "[no_xdev] jump to / then tmp",
.path = "/tmp", .how.resolve = RESOLVE_NO_XDEV,
.out.err = -EXDEV, .pass = false },
/* Magic-links are blocked since they can switch vfsmounts. */
{ .name = "[no_xdev] cross through magic-link to self/root",
.dir = "/proc", .path = "self/root", .how.resolve = RESOLVE_NO_XDEV,
.out.err = -EXDEV, .pass = false },
{ .name = "[no_xdev] cross through magic-link to self/cwd",
.dir = "/proc", .path = "self/cwd", .how.resolve = RESOLVE_NO_XDEV,
.out.err = -EXDEV, .pass = false },
/* Except magic-link jumps inside the same vfsmount. */
{ .name = "[no_xdev] jump through magic-link to same procfs",
.dir = "/proc", .path = hardcoded_fdpath, .how.resolve = RESOLVE_NO_XDEV,
.out.path = "/proc", .pass = true, },
/** RESOLVE_NO_MAGICLINKS **/
/* Regular symlinks should work. */
{ .name = "[no_magiclinks] ordinary relative symlink",
.path = "relsym", .how.resolve = RESOLVE_NO_MAGICLINKS,
.out.path = "etc/passwd", .pass = true },
/* Magic-links should not work. */
{ .name = "[no_magiclinks] symlink to magic-link",
.path = "procexe", .how.resolve = RESOLVE_NO_MAGICLINKS,
.out.err = -ELOOP, .pass = false },
{ .name = "[no_magiclinks] normal path to magic-link",
.path = "/proc/self/exe", .how.resolve = RESOLVE_NO_MAGICLINKS,
.out.err = -ELOOP, .pass = false },
{ .name = "[no_magiclinks] normal path to magic-link with O_NOFOLLOW",
.path = "/proc/self/exe", .how.flags = O_NOFOLLOW,
.how.resolve = RESOLVE_NO_MAGICLINKS,
.out.path = procselfexe, .pass = true },
{ .name = "[no_magiclinks] symlink to magic-link path component",
.path = "procroot/etc", .how.resolve = RESOLVE_NO_MAGICLINKS,
.out.err = -ELOOP, .pass = false },
{ .name = "[no_magiclinks] magic-link path component",
.path = "/proc/self/root/etc", .how.resolve = RESOLVE_NO_MAGICLINKS,
.out.err = -ELOOP, .pass = false },
{ .name = "[no_magiclinks] magic-link path component with O_NOFOLLOW",
.path = "/proc/self/root/etc", .how.flags = O_NOFOLLOW,
.how.resolve = RESOLVE_NO_MAGICLINKS,
.out.err = -ELOOP, .pass = false },
/** RESOLVE_NO_SYMLINKS **/
/* Normal paths should work. */
{ .name = "[no_symlinks] ordinary path to '.'",
.path = ".", .how.resolve = RESOLVE_NO_SYMLINKS,
.out.path = NULL, .pass = true },
{ .name = "[no_symlinks] ordinary path to 'root'",
.path = "root", .how.resolve = RESOLVE_NO_SYMLINKS,
.out.path = "root", .pass = true },
{ .name = "[no_symlinks] ordinary path to 'etc'",
.path = "etc", .how.resolve = RESOLVE_NO_SYMLINKS,
.out.path = "etc", .pass = true },
{ .name = "[no_symlinks] ordinary path to 'etc/passwd'",
.path = "etc/passwd", .how.resolve = RESOLVE_NO_SYMLINKS,
.out.path = "etc/passwd", .pass = true },
/* Regular symlinks are blocked. */
{ .name = "[no_symlinks] relative symlink target",
.path = "relsym", .how.resolve = RESOLVE_NO_SYMLINKS,
.out.err = -ELOOP, .pass = false },
{ .name = "[no_symlinks] relative symlink component",
.path = "reletc/passwd", .how.resolve = RESOLVE_NO_SYMLINKS,
.out.err = -ELOOP, .pass = false },
{ .name = "[no_symlinks] absolute symlink target",
.path = "abssym", .how.resolve = RESOLVE_NO_SYMLINKS,
.out.err = -ELOOP, .pass = false },
{ .name = "[no_symlinks] absolute symlink component",
.path = "absetc/passwd", .how.resolve = RESOLVE_NO_SYMLINKS,
.out.err = -ELOOP, .pass = false },
{ .name = "[no_symlinks] cheeky garbage link",
.path = "cheeky/garbagelink", .how.resolve = RESOLVE_NO_SYMLINKS,
.out.err = -ELOOP, .pass = false },
{ .name = "[no_symlinks] cheeky absolute + garbage link",
.path = "abscheeky/garbagelink", .how.resolve = RESOLVE_NO_SYMLINKS,
.out.err = -ELOOP, .pass = false },
{ .name = "[no_symlinks] cheeky absolute + absolute symlink",
.path = "abscheeky/absself", .how.resolve = RESOLVE_NO_SYMLINKS,
.out.err = -ELOOP, .pass = false },
/* Trailing symlinks with O_NOFOLLOW. */
{ .name = "[no_symlinks] relative symlink with O_NOFOLLOW",
.path = "relsym", .how.flags = O_NOFOLLOW,
.how.resolve = RESOLVE_NO_SYMLINKS,
.out.path = "relsym", .pass = true },
{ .name = "[no_symlinks] absolute symlink with O_NOFOLLOW",
.path = "abssym", .how.flags = O_NOFOLLOW,
.how.resolve = RESOLVE_NO_SYMLINKS,
.out.path = "abssym", .pass = true },
{ .name = "[no_symlinks] trailing symlink with O_NOFOLLOW",
.path = "cheeky/garbagelink", .how.flags = O_NOFOLLOW,
.how.resolve = RESOLVE_NO_SYMLINKS,
.out.path = "cheeky/garbagelink", .pass = true },
{ .name = "[no_symlinks] multiple symlink components with O_NOFOLLOW",
.path = "abscheeky/absself", .how.flags = O_NOFOLLOW,
.how.resolve = RESOLVE_NO_SYMLINKS,
.out.err = -ELOOP, .pass = false },
{ .name = "[no_symlinks] multiple symlink (and garbage link) components with O_NOFOLLOW",
.path = "abscheeky/garbagelink", .how.flags = O_NOFOLLOW,
.how.resolve = RESOLVE_NO_SYMLINKS,
.out.err = -ELOOP, .pass = false },
};
	BUILD_BUG_ON(ARRAY_LEN(tests) != NUM_OPENAT2_OPATH_TESTS);

	for (int i = 0; i < ARRAY_LEN(tests); i++) {
		int dfd, fd;
		char *fdpath = NULL;
		bool failed;
		void (*resultfn)(const char *msg, ...) = ksft_test_result_pass;
		struct basic_test *test = &tests[i];

		if (!openat2_supported) {
			ksft_print_msg("openat2(2) unsupported\n");
			resultfn = ksft_test_result_skip;
			goto skip;
		}

		/* Auto-set O_PATH. */
		if (!(test->how.flags & O_CREAT))
			test->how.flags |= O_PATH;

		if (test->dir)
			dfd = openat(rootfd, test->dir, O_PATH | O_DIRECTORY);
		else
			dfd = dup(rootfd);
		E_assert(dfd, "failed to openat root '%s': %m", test->dir);

		E_dup2(dfd, hardcoded_fd);

		fd = sys_openat2(dfd, test->path, &test->how);
		if (test->pass)
			failed = (fd < 0 || !fdequal(fd, rootfd, test->out.path));
		else
			failed = (fd != test->out.err);
		if (fd >= 0) {
			fdpath = fdreadlink(fd);
			close(fd);
		}
		close(dfd);

		if (failed) {
			resultfn = ksft_test_result_fail;

			ksft_print_msg("openat2 unexpectedly returned ");
			if (fdpath)
				ksft_print_msg("%d['%s']\n", fd, fdpath);
			else
				ksft_print_msg("%d (%s)\n", fd, strerror(-fd));
		}

skip:
		if (test->pass)
			resultfn("%s gives path '%s'\n", test->name,
				 test->out.path ?: ".");
		else
			resultfn("%s fails with %d (%s)\n", test->name,
				 test->out.err, strerror(-test->out.err));

		fflush(stdout);
		free(fdpath);
	}

	free(procselfexe);
	close(rootfd);

	free(hardcoded_fdpath);
	close(hardcoded_fd);
}
#define NUM_TESTS NUM_OPENAT2_OPATH_TESTS

int main(int argc, char **argv)
{
	ksft_print_header();
	ksft_set_plan(NUM_TESTS);

	/* NOTE: We should be checking for CAP_SYS_ADMIN here... */
	if (geteuid() != 0)
		ksft_exit_skip("all tests require euid == 0\n");

	test_openat2_opath_tests();

	if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0)
		ksft_exit_fail();
	else
		ksft_exit_pass();
}
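The test table pins down the scoping semantics. An escape attempt under `RESOLVE_BENEATH` fails with `-EXDEV`, while `RESOLVE_IN_ROOT` clamps it to the dirfd. Below is a minimal userspace sketch of calling openat2(2) with these flags outside the kselftest harness. It rests on several assumptions:

- The kernel is Linux 5.6 or newer.
- `SYS_openat2` falls back to 437, the generic syscall number used on x86-64 and arm64, if the libc headers predate it.
- `struct open_how_v0` is a local mirror of the first (24-byte) version of `struct open_how` from `include/uapi/linux/openat2.h`.
- The helper name `openat2_scoped()` and its bounded `-EAGAIN` retry are hypothetical and not part of the selftest.

Like the selftest's `sys_openat2()`, the helper returns `-errno` on failure.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_openat2
#define SYS_openat2 437 /* assumption: generic syscall table (x86-64, arm64, ...) */
#endif

#ifndef RESOLVE_BENEATH /* values from include/uapi/linux/openat2.h */
#define RESOLVE_NO_XDEV		0x01
#define RESOLVE_NO_MAGICLINKS	0x02
#define RESOLVE_NO_SYMLINKS	0x04
#define RESOLVE_BENEATH		0x08
#define RESOLVE_IN_ROOT		0x10
#endif

/* Local mirror of the first (24-byte) version of struct open_how. */
struct open_how_v0 {
	uint64_t flags;
	uint64_t mode;
	uint64_t resolve;
};
static_assert(sizeof(struct open_how_v0) == 24, "must match OPEN_HOW_SIZE_VER0");

/*
 * Hypothetical helper: open @path relative to @dirfd under the RESOLVE_*
 * scope in @resolve. Returns an fd, or -errno (e.g. -EXDEV when
 * RESOLVE_BENEATH blocks an escape, -ENOSYS on kernels without openat2).
 * mode is always 0, so this helper is not meant for O_CREAT/O_TMPFILE.
 */
static int openat2_scoped(int dirfd, const char *path,
			  uint64_t flags, uint64_t resolve)
{
	struct open_how_v0 how;
	long ret;

	/* openat2 rejects unknown bits and a non-zero mode without O_CREAT. */
	memset(&how, 0, sizeof(how));
	how.flags = flags;
	how.resolve = resolve;

	/*
	 * -EAGAIN means a racing rename or mount made a scoped ".." lookup
	 * unverifiable; the caller may retry, so try a bounded number of times.
	 */
	for (int tries = 0; tries < 8; tries++) {
		ret = syscall(SYS_openat2, dirfd, path, &how, sizeof(how));
		if (ret >= 0)
			return (int)ret;
		if (errno != EAGAIN)
			break;
	}
	return -errno;
}
```

Usage follows the table above. Suppose `root` was opened with `open(dir, O_PATH | O_DIRECTORY)`. Then `openat2_scoped(root, "/etc/passwd", O_PATH, RESOLVE_IN_ROOT)` resolves to `$root/etc/passwd`, as in the `[in_root] absolute path 'etc/passwd'` case. With `RESOLVE_BENEATH`, the same call returns `-EXDEV`, as in `[beneath] absolute path outside $root`. The retry is capped rather than unbounded so that a sustained rename race cannot make the caller spin forever.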