Commit d0d4730a authored by Lokesh Gidra's avatar Lokesh Gidra Committed by Linus Torvalds

userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob

With this change, when the knob is set to 0, it allows unprivileged users
to call userfaultfd, like when it is set to 1, but with the restriction
that page faults from only user-mode can be handled.  In this mode, an
unprivileged user (without SYS_CAP_PTRACE capability) must pass
UFFD_USER_MODE_ONLY to userfaultd or the API will fail with EPERM.

This enables administrators to reduce the likelihood that an attacker with
access to userfaultfd can delay faulting kernel code to widen timing
windows for other exploits.

The default value of this knob is changed to 0.  This is required for
correct functioning of pipe mutex.  However, this will fail postcopy live
migration, which will be unnoticeable to the VM guests.  To avoid this,
set 'vm.userfault = 1' in /sys/sysctl.conf.

The main reason this change is desirable as in the short term is that the
Android userland will behave as with the sysctl set to zero.  So without
this commit, any Linux binary using userfaultfd to manage its memory would
behave differently if run within the Android userland.  For more details,
refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/

Link: https://lkml.kernel.org/r/20201120030411.2690816-3-lokeshgidra@google.comSigned-off-by: default avatarLokesh Gidra <lokeshgidra@google.com>
Reviewed-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Daniel Colascione <dancol@dancol.org>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: <calin@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Daniel Colascione <dancol@google.com>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent 37cd0575
...@@ -873,12 +873,17 @@ file-backed pages is less than the high watermark in a zone. ...@@ -873,12 +873,17 @@ file-backed pages is less than the high watermark in a zone.
unprivileged_userfaultfd unprivileged_userfaultfd
======================== ========================
This flag controls whether unprivileged users can use the userfaultfd This flag controls the mode in which unprivileged users can use the
system calls. Set this to 1 to allow unprivileged users to use the userfaultfd system calls. Set this to 0 to restrict unprivileged users
userfaultfd system calls, or set this to 0 to restrict userfaultfd to only to handle page faults in user mode only. In this case, users without
privileged users (with SYS_CAP_PTRACE capability). SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
succeed. Prohibiting use of userfaultfd for handling faults from kernel
mode may make certain vulnerabilities more difficult to exploit.
The default value is 1. Set this to 1 to allow unprivileged users to use the userfaultfd system
calls without any restrictions.
The default value is 0.
user_reserve_kbytes user_reserve_kbytes
......
...@@ -28,7 +28,7 @@ ...@@ -28,7 +28,7 @@
#include <linux/security.h> #include <linux/security.h>
#include <linux/hugetlb.h> #include <linux/hugetlb.h>
int sysctl_unprivileged_userfaultfd __read_mostly = 1; int sysctl_unprivileged_userfaultfd __read_mostly;
static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly; static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
...@@ -1966,8 +1966,14 @@ SYSCALL_DEFINE1(userfaultfd, int, flags) ...@@ -1966,8 +1966,14 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
struct userfaultfd_ctx *ctx; struct userfaultfd_ctx *ctx;
int fd; int fd;
if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE)) if (!sysctl_unprivileged_userfaultfd &&
(flags & UFFD_USER_MODE_ONLY) == 0 &&
!capable(CAP_SYS_PTRACE)) {
printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
"sysctl knob to 1 if kernel faults must be handled "
"without obtaining CAP_SYS_PTRACE capability\n");
return -EPERM; return -EPERM;
}
BUG_ON(!current->mm); BUG_ON(!current->mm);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment