    mm: userfaultfd: add new UFFDIO_POISON ioctl · fc71884a
    Axel Rasmussen authored
    The basic idea here is to "simulate" memory poisoning for VMs.  A VM
    running on some host might encounter a memory error, after which some
    page(s) are poisoned (i.e., future accesses SIGBUS).  Guests expect that
    once poisoned, pages can never become "un-poisoned".  So, when we live
    migrate the VM, we need to preserve the poisoned status of these pages.
    
    When live migrating, we try to get the guest running on its new host as
    quickly as possible.  So, we start it running before all memory has been
    copied, and before we're certain which pages should be poisoned or not.
    
    So the basic way to use this new feature is:
    
    - On the new host, the guest's memory is registered with userfaultfd, in
      either MISSING or MINOR mode (doesn't really matter for this purpose).
    - On any first access, we get a userfaultfd event. At this point we can
      communicate with the old host to find out if the page was poisoned.
    - If so, we can respond with a UFFDIO_POISON - this places a swap marker
      so any future accesses will SIGBUS. Because the PTE is now "present",
      future accesses won't generate more userfaultfd events; they'll just
      SIGBUS directly.
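
The last step above can be sketched in C roughly as follows. This is only an illustration, not code from this patch: `poison_page()` and `page_align_down()` are hypothetical helper names, the userfaultfd descriptor is assumed to be already opened and registered over the guest's memory, and the `#ifdef` fallback is an assumption added so the sketch still builds against UAPI headers that predate this ioctl.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Round an address down to the start of its page.
 * page_size must be a non-zero power of two. */
static uint64_t page_align_down(uint64_t addr, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return addr & ~(page_size - 1);
}

/*
 * Hypothetical helper: after the old host reports that the page containing
 * fault_addr was poisoned, install a poison marker for it.
 * Returns 0 on success or a negative errno value on failure.
 */
static int poison_page(int uffd, uint64_t fault_addr, uint64_t page_size)
{
#ifdef UFFDIO_POISON
	struct uffdio_poison req = {
		.range = {
			.start = page_align_down(fault_addr, page_size),
			.len = page_size,
		},
		/* mode 0: wake the faulting thread so it retries the access */
		.mode = 0,
	};

	if (ioctl(uffd, UFFDIO_POISON, &req) == -1)
		return -errno;
	return 0;
#else
	/* Assumption: installed headers are older than this ioctl. */
	(void)uffd;
	(void)fault_addr;
	(void)page_size;
	return -ENOSYS;
#endif
}
```

A fault-handling thread would call `poison_page(uffd, msg.arg.pagefault.address, page_size)` for a page the old host reports as poisoned, and resolve the fault normally (for example with UFFDIO_COPY or UFFDIO_CONTINUE) otherwise.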
    
    UFFDIO_POISON does not handle unmapping previously-present PTEs.  This
    isn't needed, because during live migration we want to intercept all
    accesses with userfaultfd (not just writes, so WP mode isn't useful for
    this).  So whether minor or missing mode is being used (or both), the PTE
    won't be present when the fault is handled.
    
    Similarly, UFFDIO_POISON won't replace existing PTE markers.  This might
    be okay to do, but it seems to be safer to just refuse to overwrite any
    existing entry (like a UFFD_WP PTE marker).
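
A caller might separate "the entry was already populated" from other failures along these lines. This is a sketch under stated assumptions rather than documented behaviour from this message: it assumes the refusal to overwrite an existing entry is reported as EEXIST, and that EAGAIN signals a transient condition as it does for other userfaultfd ioctls. The enum and function names are hypothetical.

```c
#include <errno.h>

/* Hypothetical classification of a UFFDIO_POISON attempt. */
enum poison_outcome {
	POISON_DONE,            /* marker installed */
	POISON_ALREADY_MAPPED,  /* entry already populated; nothing overwritten */
	POISON_RETRY,           /* transient condition; try again */
	POISON_FAILED,          /* any other error */
};

/* err is 0 on success or a negative errno value. */
static enum poison_outcome classify_poison_result(int err)
{
	switch (err) {
	case 0:
		return POISON_DONE;
	case -EEXIST:   /* assumption: existing PTE or PTE marker was found */
		return POISON_ALREADY_MAPPED;
	case -EAGAIN:   /* assumption: address space changed underneath us */
		return POISON_RETRY;
	default:
		return POISON_FAILED;
	}
}
```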
    
    Link: https://lkml.kernel.org/r/20230707215540.2324998-5-axelrasmussen@google.com
    Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
    Acked-by: Peter Xu <peterx@redhat.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Brian Geffon <bgeffon@google.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
    Cc: Huang, Ying <ying.huang@intel.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
    Cc: Jiaqi Yan <jiaqiyan@google.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Nadav Amit <namit@vmware.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Ryan Roberts <ryan.roberts@arm.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Suleiman Souhlal <suleiman@google.com>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: T.J. Alumbaugh <talumbau@google.com>
    Cc: Yu Zhao <yuzhao@google.com>
    Cc: ZhangPeng <zhangpeng362@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>