    userfaultfd: linux/Documentation/vm/userfaultfd.txt · 25edd8bf
    Andrea Arcangeli authored
    This is the latest userfaultfd patchset.  The postcopy live migration
    feature on the qemu side is mostly ready to be merged, and it depends
    entirely on the userfaultfd syscall being merged as well.  So it'd be
    great if this patchset could be reviewed for merging in -mm.
    
    Userfaults allow on-demand paging to be implemented in userland and,
    more generally, let userland take control of page fault behavior more
    efficiently than was previously possible (PROT_NONE + SIGSEGV trap).
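    
    For comparison, the older PROT_NONE + SIGSEGV approach mentioned above
    can be sketched roughly as follows. This is only an illustration under
    stated assumptions, not code from this patchset: the names
    legacy_demand_paging_demo() and on_segv() and the 0x5a fill byte are
    hypothetical, and calling mprotect() and memset() from a signal handler
    is not formally async-signal-safe, although it generally works on
    Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical state shared between the demo and its SIGSEGV handler. */
static char *region;
static size_t region_len;
static size_t page_size;
static volatile sig_atomic_t faults_handled;

/* On a fault inside the region: unprotect the page and fill it. */
static void on_segv(int sig, siginfo_t *info, void *ucontext)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t start = (uintptr_t)region;
    char *page;

    (void)sig;
    (void)ucontext;

    /* A fault outside our region is a genuine crash: give up. */
    if (addr < start || addr >= start + region_len)
        _exit(1);

    page = (char *)(addr & ~((uintptr_t)page_size - 1));
    if (mprotect(page, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
    memset(page, 0x5a, page_size);  /* stand-in for "fetch the page" */
    faults_handled++;
}

/* Returns the number of faults handled, or -1 on error. */
int legacy_demand_paging_demo(size_t npages)
{
    struct sigaction sa, old;
    long ps = sysconf(_SC_PAGESIZE);
    int ok = 1;
    size_t i;

    assert(ps > 0);
    page_size = (size_t)ps;
    region_len = npages * page_size;

    /* Reserve address space with no access rights at all. */
    region = mmap(NULL, region_len, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(region, region_len);
        return -1;
    }

    faults_handled = 0;
    for (i = 0; i < npages; i++) {
        /* First touch traps; the access restarts after the handler. */
        volatile char *p = region + i * page_size;
        if (*p != 0x5a)
            ok = 0;
    }

    sigaction(SIGSEGV, &old, NULL);
    munmap(region, region_len);
    return ok ? (int)faults_handled : -1;
}
```

    Every first touch costs a signal delivery plus an mprotect() call that
    modifies the mapping, which is the kind of overhead userfaults are
    meant to avoid.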
    
    The use cases are:
    
    1) KVM postcopy live migration (one form of cloud memory
       externalization).
    
       KVM postcopy live migration is the primary driver of this work:
    
        http://blog.zhaw.ch/icclab/setting-up-post-copy-live-migration-in-openstack/
        http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04873.html
    
    2) Postcopy live migration of binaries inside Linux containers:
    
        http://thread.gmane.org/gmane.linux.kernel.mm/132662
    
    3) KVM postcopy live snapshotting (which allows limiting/throttling
       memory usage, unlike fork, and avoids the fork overhead in the
       first place).
    
       While wrprotect tracking is not implemented yet, the syscall API
       already accounts for wrprotect fault tracking and is generic enough
       to allow implementing it later in a backwards-compatible fashion.
    
    4) KVM userfaults on shared memory. The UFFDIO_COPY low-level method
       should be extended to also work on tmpfs; uffdio_register.ioctls
       will then notify userland that UFFDIO_COPY is available even when
       the registered virtual memory range is tmpfs-backed.
    
    5) An alternate mechanism to notify web browsers or apps on embedded
       devices that volatile pages have been reclaimed. This basically
       avoids the need to run a syscall before the app can access the
       virtual regions marked volatile with the CPU. This depends on
       point 4) being fulfilled first, as volatile pages apply naturally
       to tmpfs.
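    
    Point 4) above relies on uffdio_register.ioctls advertising which
    operations a registered range supports. A minimal sketch of checking
    that bitmask for an anonymous mapping and then filling one page with
    UFFDIO_COPY might look like this. The function name uffd_copy_probe()
    and its return convention are hypothetical, and the sketch assumes the
    <linux/userfaultfd.h> header and the SYS_userfaultfd syscall number
    are available on the build host.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns 1 if UFFDIO_COPY was advertised and filled
 * a page, 0 if it was not advertised, -1 if userfaultfd is unusable here
 * (missing syscall, insufficient privileges, or any other failure).
 */
int uffd_copy_probe(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    int flags = O_CLOEXEC | O_NONBLOCK;
    int uffd = -1;
    int result = -1;
    char *dst, *src;
    size_t page;

    assert(ps > 0);
    page = (size_t)ps;

#ifdef UFFD_USER_MODE_ONLY
    /* Newer kernels may allow this mode for unprivileged callers. */
    uffd = (int)syscall(SYS_userfaultfd, flags | UFFD_USER_MODE_ONLY);
#endif
    if (uffd < 0)
        uffd = (int)syscall(SYS_userfaultfd, flags);
    if (uffd < 0)
        return -1;

    dst = mmap(NULL, page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    src = mmap(NULL, page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (dst == MAP_FAILED || src == MAP_FAILED)
        goto out;
    memset(src, 0x5a, page);  /* page contents to install */

    /* Handshake on the API version before any other ioctl. */
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    if (ioctl(uffd, UFFDIO_API, &api) != 0)
        goto out;

    /* Track missing-page faults on the (still untouched) dst page. */
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)dst, .len = page },
        .mode = UFFDIO_REGISTER_MODE_MISSING,
    };
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) != 0)
        goto out;

    /* reg.ioctls is a bitmask of operations valid on this range. */
    if (!(reg.ioctls & ((__u64)1 << _UFFDIO_COPY))) {
        result = 0;
        goto out;
    }

    /* Atomically install the page; no fault needs to be pending. */
    struct uffdio_copy copy = {
        .dst = (uintptr_t)dst,
        .src = (uintptr_t)src,
        .len = page,
        .mode = 0,
    };
    if (ioctl(uffd, UFFDIO_COPY, &copy) == 0 &&
        copy.copy == (long long)page && dst[0] == 0x5a)
        result = 1;

out:
    if (dst != MAP_FAILED)
        munmap(dst, page);
    if (src != MAP_FAILED)
        munmap(src, page);
    close(uffd);
    return result;
}
```

    A real user such as a postcopy migration thread would instead read
    fault events from the userfaultfd and answer each one with
    UFFDIO_COPY; the probe above only shows the registration handshake
    and the ioctls check.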
    
    Even though there hasn't been a real use case requesting it yet, it
    also allows distributed shared memory to be implemented in such a way
    that read-only shared mappings can exist simultaneously on different
    hosts and become exclusive at the first wrprotect fault.
    
    This patch (of 22):
    
    Add documentation.
    Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
    Acked-by: Pavel Emelyanov <xemul@parallels.com>
    Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
    Cc: zhang.zhanghailiang@huawei.com
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Andres Lagar-Cavilla <andreslc@google.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Peter Feiner <pfeiner@google.com>
    Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>