Commit cefdca0a authored by Peter Xu, committed by Linus Torvalds

userfaultfd/sysctl: add vm.unprivileged_userfaultfd

Userfaultfd can be misused to make it easier to exploit existing
use-after-free (and similar) bugs that might otherwise only make a
short window or race condition available.  By using userfaultfd to
stall a kernel thread, a malicious program can keep some state that it
wrote, stable for an extended period, which it can then access using an
existing exploit.  While it doesn't cause the exploit itself, and while
it's not the only thing that can stall a kernel thread when accessing a
memory location, it's one of the few that never needs privilege.

We can add a flag, allowing userfaultfd to be restricted, so that in
general it won't be useable by arbitrary user programs, but in
environments that require userfaultfd it can be turned back on.

Add a global sysctl knob "vm.unprivileged_userfaultfd" to control
whether userfaultfd is allowed by unprivileged users.  When this is
set to zero, only privileged users (root user, or users with the
CAP_SYS_PTRACE capability) will be able to use the userfaultfd
syscalls.
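
As a rough illustration of how a userspace program might inspect the knob,
the sketch below reads an integer from a sysctl file.  It assumes the knob
is exposed at /proc/sys/vm/unprivileged_userfaultfd (the usual procfs
mapping for vm.* sysctls); read_uffd_knob is a hypothetical helper name and
is not part of this patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): read an integer sysctl
 * value from the file at `path`.
 * Returns the parsed value, or -1 if the file cannot be opened or parsed.
 */
int read_uffd_knob(const char *path)
{
	FILE *f;
	int value;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;	/* file missing or unreadable */

	if (fscanf(f, "%d", &value) != 1)
		value = -1;	/* empty or non-numeric content */

	fclose(f);
	return value;
}
```

Calling read_uffd_knob("/proc/sys/vm/unprivileged_userfaultfd") would be
expected to return 1 on a kernel carrying this patch with default settings.
A return of -1 suggests the knob is absent, for example on a kernel built
without CONFIG_USERFAULTFD.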

Andrea said:

: The only difference between the bpf sysctl and the userfaultfd sysctl
: this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability
: requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement,
: because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE
: already if it's doing other kind of tracking on processes runtime, in
: addition of userfaultfd.  In other words both syscalls works only for
: root, when the two sysctl are opt-in set to 1.
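
The access rule described in this changelog reduces to a two-input check.
The following is a userspace model for illustration only: uffd_gate and its
parameters are hypothetical names, and the boolean stands in for the
kernel's CAP_SYS_PTRACE capability check.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the permission gate (not kernel code).
 * sysctl_unprivileged: value of vm.unprivileged_userfaultfd (0 or 1).
 * has_cap_sys_ptrace:  whether the caller holds CAP_SYS_PTRACE.
 * Returns 0 if userfaultfd would be allowed, -EPERM otherwise.
 */
int uffd_gate(int sysctl_unprivileged, bool has_cap_sys_ptrace)
{
	/* Denied only when the knob is 0 and the caller lacks the capability. */
	if (!sysctl_unprivileged && !has_cap_sys_ptrace)
		return -EPERM;
	return 0;
}
```

With the knob at its default of 1 the capability is irrelevant; with the
knob at 0 only callers holding the capability pass.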

[dgilbert@redhat.com: changelog additions]
[akpm@linux-foundation.org: documentation tweak, per Mike]
Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent f0fd5050
@@ -61,6 +61,7 @@ Currently, these files are in /proc/sys/vm:
 - stat_refresh
 - numa_stat
 - swappiness
+- unprivileged_userfaultfd
 - user_reserve_kbytes
 - vfs_cache_pressure
 - watermark_boost_factor
@@ -818,6 +819,17 @@ The default value is 60.
 ==============================================================
+unprivileged_userfaultfd
+This flag controls whether unprivileged users can use the userfaultfd
+system calls. Set this to 1 to allow unprivileged users to use the
+userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
+privileged users (with SYS_CAP_PTRACE capability).
+The default value is 1.
+==============================================================
 - user_reserve_kbytes
 When overcommit_memory is set to 2, "never overcommit" mode, reserve
......
@@ -30,6 +30,8 @@
 #include <linux/security.h>
 #include <linux/hugetlb.h>
+int sysctl_unprivileged_userfaultfd __read_mostly = 1;
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 enum userfaultfd_state {
@@ -1930,6 +1932,9 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	struct userfaultfd_ctx *ctx;
 	int fd;
+	if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+		return -EPERM;
 	BUG_ON(!current->mm);
 	/* Check the UFFD_* constants for consistency. */
......
@@ -28,6 +28,8 @@
 #define UFFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
 #define UFFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS)
+extern int sysctl_unprivileged_userfaultfd;
 extern vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason);
 extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
......
@@ -66,6 +66,7 @@
 #include <linux/kexec.h>
 #include <linux/bpf.h>
 #include <linux/mount.h>
+#include <linux/userfaultfd_k.h>
 #include "../lib/kstrtox.h"
@@ -1719,6 +1720,17 @@ static struct ctl_table vm_table[] = {
 		.extra1 = (void *)&mmap_rnd_compat_bits_min,
 		.extra2 = (void *)&mmap_rnd_compat_bits_max,
 	},
+#endif
+#ifdef CONFIG_USERFAULTFD
+	{
+		.procname = "unprivileged_userfaultfd",
+		.data = &sysctl_unprivileged_userfaultfd,
+		.maxlen = sizeof(sysctl_unprivileged_userfaultfd),
+		.mode = 0644,
+		.proc_handler = proc_dointvec_minmax,
+		.extra1 = &zero,
+		.extra2 = &one,
+	},
 #endif
 	{ }
 };
......
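
The table entry above pairs proc_dointvec_minmax with extra1 = &zero and
extra2 = &one, so only the values 0 and 1 are accepted.  A userspace sketch
of that bounded-store behaviour follows, under the assumption that an
out-of-range write fails with -EINVAL and leaves the stored value untouched.
knob_store is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of a range-checked integer knob (not kernel code).
 * Stores `requested` into *knob only if it lies within [min, max].
 * Returns 0 on success, -EINVAL if the value is out of range.
 */
int knob_store(int *knob, int requested, int min, int max)
{
	assert(knob != NULL);

	if (requested < min || requested > max)
		return -EINVAL;	/* reject; *knob keeps its previous value */

	*knob = requested;
	return 0;
}
```

For a knob initialised to 1, knob_store(&knob, 0, 0, 1) succeeds and sets
it to 0, while knob_store(&knob, 2, 0, 1) fails and leaves it unchanged.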