    KVM: Add KVM_EXIT_MEMORY_FAULT exit to report faults to userspace · 16f95f3b
    Chao Peng authored
    Add a new KVM exit type to allow userspace to handle memory faults that
    KVM cannot resolve, but that userspace *may* be able to handle (without
    terminating the guest).
    
    KVM will initially use KVM_EXIT_MEMORY_FAULT to report implicit
    conversions between private and shared memory.  With guest private memory,
    there will be two kinds of memory conversions:
    
      - explicit conversion: happens when the guest explicitly calls into KVM
        to map a range (as private or shared)
    
      - implicit conversion: happens when the guest attempts to access a gfn
        that is configured in the "wrong" state (private vs. shared)
    
    On x86 (first architecture to support guest private memory), explicit
    conversions will be reported via KVM_EXIT_HYPERCALL+KVM_HC_MAP_GPA_RANGE,
    but reporting KVM_EXIT_HYPERCALL for implicit conversions is undesirable
    as there is (obviously) no hypercall, and there is no guarantee that the
    guest actually intends to convert between private and shared, i.e. what
    KVM thinks is an implicit conversion "request" could actually be the
    result of a guest code bug.
    
    KVM_EXIT_MEMORY_FAULT will be used to report memory faults that appear to
    be implicit conversions.
    
    Note!  To allow for future possibilities where KVM reports
    KVM_EXIT_MEMORY_FAULT and fills run->memory_fault on _any_ unresolved
    fault, KVM returns "-EFAULT" (-1 with errno == EFAULT from userspace's
    perspective), not '0'!  Due to historical baggage within KVM, exiting to
    userspace with '0' from deep callstacks, e.g. in emulation paths, is
    infeasible as doing so would require a near-complete overhaul of KVM,
    whereas KVM already propagates -errno return codes to userspace even when
    the -errno originated in a low level helper.
    
    Report the gpa+size instead of a single gfn even though the initial usage
    is expected to always report single pages.  It's entirely possible, likely
    even, that KVM will someday support sub-page granularity faults, e.g.
    Intel's sub-page protection feature allows for additional protections at
    128-byte granularity.
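    One way to see why gpa+size generalizes a single gfn: userspace can map
    any reported range back to the set of pages it touches. The helper
    below is a hypothetical sketch (sketch_* names are illustrative, and
    4 KiB pages are assumed); this message does not prescribe how userspace
    should consume the range.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB guest pages */

/* Half-open range of guest frame numbers: [start, end). */
struct sketch_gfn_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Hypothetical helper: map a reported (gpa, size) fault to the gfns it
 * touches. Computing end from the last byte (gpa + size - 1) avoids adding
 * a full page to gpa; callers must still ensure gpa + size does not wrap.
 */
static struct sketch_gfn_range sketch_fault_to_gfns(uint64_t gpa, uint64_t size)
{
    struct sketch_gfn_range r;

    assert(size != 0);
    r.start = gpa >> SKETCH_PAGE_SHIFT;
    r.end = ((gpa + size - 1) >> SKETCH_PAGE_SHIFT) + 1;
    return r;
}
```

    For example, a full-page report (gpa 0x1000, size 4096) and a 128-byte
    report inside the next page (gpa 0x2080, size 128) each cover exactly
    one gfn, while a 32-byte range starting at 0x1ff0 straddles a page
    boundary and covers gfns 1 and 2.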
    
    Link: https://lore.kernel.org/all/20230908222905.1321305-5-amoorthy@google.com
    Link: https://lore.kernel.org/all/ZQ3AmLO2SYv3DszH@google.com
    Cc: Anish Moorthy <amoorthy@google.com>
    Cc: David Matlack <dmatlack@google.com>
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
    Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Co-developed-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Message-Id: <20231027182217.3615211-10-seanjc@google.com>
    Reviewed-by: Fuad Tabba <tabba@google.com>
    Tested-by: Fuad Tabba <tabba@google.com>
    Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>