• Dave Hansen's avatar
    x86/pkeys: Allocation/free syscalls · e8c24d3a
    Dave Hansen authored
    This patch adds two new system calls:
    
    	int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
    	int pkey_free(int pkey);
    
    These implement an "allocator" for the protection keys
    themselves, which can be thought of as analogous to the allocator
    that the kernel has for file descriptors.  The kernel tracks
    which numbers are in use, and only allows operations on keys that
    are valid.  A key which was not obtained by pkey_alloc() may not,
    for instance, be passed to pkey_mprotect().
    
    These system calls are also very important given the kernel's use
    of pkeys to implement execute-only support.  These help ensure
    that userspace can never assume that it has control of a key
    unless it first asks the kernel.  The kernel does not promise to
    preserve PKRU (right register) contents except for allocated
    pkeys.
    
    The 'init_access_rights' argument to pkey_alloc() specifies the
    rights that will be established for the returned pkey.  For
    instance:
    
    	pkey = pkey_alloc(flags, PKEY_DENY_WRITE);
    
    will allocate 'pkey', but also sets the bits in PKRU[1] such that
    writing to 'pkey' is already denied.
    
    The kernel does not prevent pkey_free() from successfully freeing
    in-use pkeys (those still assigned to a memory range by
    pkey_mprotect()).  It would be expensive to implement the checks
    for this, so we instead say, "Just don't do it" since sane
    software will never do it anyway.
    
    Any piece of userspace calling pkey_alloc() needs to be prepared
    for it to fail.  Why?  pkey_alloc() returns the same error code
    (ENOSPC) when there are no pkeys and when pkeys are unsupported.
    They can be unsupported for a whole host of reasons, so apps must
    be prepared for this.  Also, libraries or LD_PRELOADs might steal
    keys before an application gets access to them.
    
    This allocation mechanism could be implemented in userspace.
    Even if we did it in userspace, we would still need additional
    user/kernel interfaces to tell userspace which keys are being
    used by the kernel internally (such as for execute-only
    mappings).  Having the kernel provide this facility completely
    removes the need for these additional interfaces, or having an
    implementation of this in userspace at all.
    
    Note that we have to make changes to all of the architectures
    that do not use mman-common.h because we use the new
    PKEY_DENY_ACCESS/WRITE macros in arch-independent code.
    
    1. PKRU is the Protection Key Rights User register.  It is a
       usermode-accessible register that controls whether writes
       and/or access to each individual pkey is allowed or denied.
    Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
    Acked-by: default avatarMel Gorman <mgorman@techsingularity.net>
    Cc: linux-arch@vger.kernel.org
    Cc: Dave Hansen <dave@sr71.net>
    Cc: arnd@arndb.de
    Cc: linux-api@vger.kernel.org
    Cc: linux-mm@kvack.org
    Cc: luto@kernel.org
    Cc: akpm@linux-foundation.org
    Cc: torvalds@linux-foundation.org
    Link: http://lkml.kernel.org/r/20160729163015.444FE75F@viggo.jf.intel.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    e8c24d3a
mman.h 3.5 KB