Commit b8da5cd4 authored by Axel Rasmussen, committed by Linus Torvalds

userfaultfd: update documentation to describe minor fault handling

Reword / reorganize things a little bit into "lists", so new features /
modes / ioctls can sort of just be appended.

Describe how UFFDIO_REGISTER_MODE_MINOR and UFFDIO_CONTINUE can be used to
intercept and resolve minor faults.  Make it clear that COPY and ZEROPAGE
are used for MISSING faults, whereas CONTINUE is used for MINOR faults.

Link: https://lkml.kernel.org/r/20210301222728.176417-6-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent f6191471
@@ -63,36 +63,36 @@ the generic ioctl available.

 The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl
 defines what memory types are supported by the ``userfaultfd`` and what
-events, except page fault notifications, may be generated.
-
-If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs
-virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in
-``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be
-set if the kernel supports registering ``userfaultfd`` ranges on shared
-memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``,
-``MAP_SHARED``, ``memfd_create``, etc).
-
-The userland application that wants to use ``userfaultfd`` with hugetlbfs
-or shared memory need to set the corresponding flag in
-``uffdio_api.features`` to enable those features.
-
-If the userland desires to receive notifications for events other than
-page faults, it has to verify that ``uffdio_api.features`` has appropriate
-``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more
-detail below in `Non-cooperative userfaultfd`_ section.
-
-Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
-be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
-register a memory range in the ``userfaultfd`` by setting the
+events, except page fault notifications, may be generated:
+
+- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
+  other than page faults are supported. These events are described in more
+  detail below in the `Non-cooperative userfaultfd`_ section.
+
+- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
+  indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING``
+  registrations for hugetlbfs and shared memory (covering all shmem APIs,
+  i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``,
+  etc) virtual memory areas, respectively.
+
+- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
+  ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
+  areas.
+
+The userland application should set the feature flags it intends to use
+when invoking the ``UFFDIO_API`` ioctl, to request that those features be
+enabled if supported.
+
+Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER``
+ioctl should be invoked (if present in the returned ``uffdio_api.ioctls``
+bitmask) to register a memory range in the ``userfaultfd`` by setting the
 uffdio_register structure accordingly. The ``uffdio_register.mode``
 bitmask will specify to the kernel which kind of faults to track for
-the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing
-pages). The ``UFFDIO_REGISTER`` ioctl will return the
+the range. The ``UFFDIO_REGISTER`` ioctl will return the
 ``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve
 userfaults on the range registered. Not all ioctls will necessarily be
-supported for all memory types depending on the underlying virtual
-memory backend (anonymous memory vs tmpfs vs real filebacked
-mappings).
+supported for all memory types (e.g. anonymous memory vs. shmem vs.
+hugetlbfs), or all types of intercepted faults.

 Userland can use the ``uffdio_register.ioctls`` to manage the virtual
 address space in the background (to add or potentially also remove
@@ -100,21 +100,46 @@ memory from the ``userfaultfd`` registered range). This means a userfault
 could be triggering just before userland maps in the background the
 user-faulted page.

-The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That
-atomically copies a page into the userfault registered range and wakes
-up the blocked userfaults
-(unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set).
-Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in
-guaranteeing that nothing can see an half copied page since it'll
-keep userfaulting until the copy has finished.
+Resolving Userfaults
+--------------------
+
+There are three basic ways to resolve userfaults:
+
+- ``UFFDIO_COPY`` atomically copies some existing page contents from
+  userspace.
+
+- ``UFFDIO_ZEROPAGE`` atomically zeros the new page.
+
+- ``UFFDIO_CONTINUE`` maps an existing, previously-populated page.
+
+These operations are atomic in the sense that they guarantee nothing can
+see a half-populated page, since readers will keep userfaulting until the
+operation has finished.
+
+By default, these wake up userfaults blocked on the range in question.
+They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates
+that waking will be done separately at some later time.
+
+Which ioctl to choose depends on the kind of page fault, and what we'd
+like to do to resolve it:
+
+- For ``UFFDIO_REGISTER_MODE_MISSING`` faults, the fault needs to be
+  resolved by either providing a new page (``UFFDIO_COPY``), or mapping
+  the zero page (``UFFDIO_ZEROPAGE``). By default, the kernel would map
+  the zero page for a missing fault. With userfaultfd, userspace can
+  decide what content to provide before the faulting thread continues.
+
+- For ``UFFDIO_REGISTER_MODE_MINOR`` faults, there is an existing page (in
+  the page cache). Userspace has the option of modifying the page's
+  contents before resolving the fault. Once the contents are correct
+  (modified or not), userspace asks the kernel to map the page and let the
+  faulting thread continue with ``UFFDIO_CONTINUE``.

 Notes:

-- If you requested ``UFFDIO_REGISTER_MODE_MISSING`` when registering then
-  you must provide some kind of page in your thread after reading from
-  the uffd. You must provide either ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE``.
-  The normal behavior of the OS automatically providing a zero page on
-  an anonymous mmaping is not in place.
+- You can tell which kind of fault occurred by examining
+  ``pagefault.flags`` within the ``uffd_msg``, checking for the
+  ``UFFD_PAGEFAULT_FLAG_*`` flags.

 - None of the page-delivering ioctls default to the range that you
   registered with. You must fill in all fields for the appropriate
@@ -122,9 +147,9 @@ Notes:

 - You get the address of the access that triggered the missing page
   event out of a struct uffd_msg that you read in the thread from the
-  uffd. You can supply as many pages as you want with ``UFFDIO_COPY`` or
-  ``UFFDIO_ZEROPAGE``. Keep in mind that unless you used DONTWAKE then
-  the first of any of those IOCTLs wakes up the faulting thread.
+  uffd. You can supply as many pages as you want with these IOCTLs.
+  Keep in mind that unless you used DONTWAKE then the first of any of
+  those IOCTLs wakes up the faulting thread.

 - Be sure to test for all errors including
   (``pollfd[0].revents & POLLERR``). This can happen, e.g. when ranges
......
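
Illustration only, not part of the commit: the updated text says the fault kind can be read from ``pagefault.flags`` in the ``uffd_msg``, and that MISSING faults are resolved with ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE`` while MINOR faults are resolved with ``UFFDIO_CONTINUE``. A minimal C sketch of that dispatch might look like the following. The names ``enum uffd_resolution``, ``pick_resolution`` and ``fault_page_start`` are hypothetical helpers, not kernel API. The sketch assumes only MISSING and MINOR modes are registered (write-protect faults are not handled) and that the page size passed in is a power of two. The ``#ifndef`` fallback is an assumption for building against UAPI headers that predate minor fault support.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/userfaultfd.h>

/* Fallback for older UAPI headers that lack minor fault support;
 * the value mirrors the upstream header definition. */
#ifndef UFFD_PAGEFAULT_FLAG_MINOR
#define UFFD_PAGEFAULT_FLAG_MINOR (1 << 2)
#endif

/* Hypothetical enum for this sketch, not part of the kernel API. */
enum uffd_resolution {
    RESOLVE_NONE,     /* message is not a page fault notification */
    RESOLVE_MISSING,  /* resolve with UFFDIO_COPY or UFFDIO_ZEROPAGE */
    RESOLVE_CONTINUE  /* resolve with UFFDIO_CONTINUE */
};

/* Decide how a message read from the uffd should be resolved.
 * Assumes the range was registered for MISSING and/or MINOR only. */
static enum uffd_resolution pick_resolution(const struct uffd_msg *msg)
{
    assert(msg != NULL);

    if (msg->event != UFFD_EVENT_PAGEFAULT)
        return RESOLVE_NONE;

    /* A minor fault means a populated page already exists in the
     * page cache; it only needs to be mapped. */
    if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_MINOR)
        return RESOLVE_CONTINUE;

    /* Otherwise treat it as a missing fault: userspace must supply
     * the page contents or ask for the zero page. */
    return RESOLVE_MISSING;
}

/* Round the faulting address down to the start of its page, since the
 * resolving ioctls need an explicit range filled in by the caller.
 * page_size must be a power of two. */
static unsigned long long fault_page_start(const struct uffd_msg *msg,
                                           unsigned long long page_size)
{
    assert(msg != NULL);
    return msg->arg.pagefault.address & ~(page_size - 1);
}
```

A fault-handling thread would call ``pick_resolution`` on each message it reads from the uffd, then fill in the matching ioctl structure using the page start returned by ``fault_page_start`` and the page size of the registered mapping (for hugetlbfs, the huge page size).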