    KVM: x86/xen: Compatibility fixes for shared runstate area · 5ec3289b
    David Woodhouse authored
    The guest runstate area can be arbitrarily byte-aligned. In fact, even
    when a sane 32-bit guest aligns the overall structure nicely, the 64-bit
    fields in the structure end up being unaligned because the 32-bit ABI
    only aligns them to 32 bits.
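
To make the alignment point concrete, here is a minimal userspace sketch of the two layouts. The struct and function names are illustrative assumptions, not the kernel's or Xen's definitions; `__attribute__((packed))` (a GCC/Clang extension) is used only to reproduce the 32-bit offsets on a 64-bit build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the layout a 64-bit guest sees:
 * uint64_t is 8-byte aligned, so 4 bytes of padding follow 'state'. */
struct runstate_native {
    int32_t  state;
    uint64_t state_entry_time;
    uint64_t time[4];
};

/* Hypothetical stand-in for the layout a 32-bit guest sees:
 * the 64-bit fields follow 'state' directly, at a 4-byte boundary. */
struct runstate_compat {
    int32_t  state;
    uint64_t state_entry_time;
    uint64_t time[4];
} __attribute__((packed));

/* Offset of the 64-bit entry-time field in each layout. */
static size_t native_entry_time_offset(void)
{
    return offsetof(struct runstate_native, state_entry_time);
}

static size_t compat_entry_time_offset(void)
{
    return offsetof(struct runstate_compat, state_entry_time);
}
```

With the 32-bit layout the field starts at offset 4, so even a structure placed on an 8-byte boundary leaves state_entry_time misaligned for a 64-bit access.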
    
    So setting the ->state_entry_time field to something|XEN_RUNSTATE_UPDATE
    is buggy, because if it's unaligned then we can't update the whole field
    atomically; the low bytes might be observable before the _UPDATE bit is.
    Xen actually updates the *byte* containing that top bit, on its own. KVM
    should do the same.
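
The byte-wise update can be sketched as follows. This is a userspace illustration under stated assumptions, not the KVM code: it assumes a little-endian host (as on x86), the helper names are hypothetical, and the write barriers that real concurrent code needs between setting the flag and updating the fields are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Top bit of state_entry_time, as defined in the Xen public headers. */
#define XEN_RUNSTATE_UPDATE (1ULL << 63)

/* On little-endian, byte 7 of the field holds bits 56..63, so the flag
 * can be set with a single byte store regardless of field alignment. */
static void runstate_set_update_flag(uint8_t *entry_time_bytes)
{
    assert(entry_time_bytes != NULL);
    entry_time_bytes[7] |= (uint8_t)(XEN_RUNSTATE_UPDATE >> 56);
}

/* Clear the flag again, also with a single byte store. */
static void runstate_clear_update_flag(uint8_t *entry_time_bytes)
{
    assert(entry_time_bytes != NULL);
    entry_time_bytes[7] &= (uint8_t)~(XEN_RUNSTATE_UPDATE >> 56);
}

/* Read the possibly unaligned 64-bit field without an unaligned load. */
static uint64_t runstate_read_entry_time(const uint8_t *entry_time_bytes)
{
    uint64_t value;
    memcpy(&value, entry_time_bytes, sizeof(value));
    return value;
}
```

A single byte store is always atomic, so an observer can never see the low bytes change before the flag does, which is the property a 64-bit store to an unaligned address cannot guarantee.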
    
    In addition, we cannot assume that the runstate area fits within a single
    page. One option might be to make the gfn_to_pfn cache cope with regions
    that cross a page — but getting a contiguous virtual kernel mapping of a
    discontiguous set of IOMEM pages is a distinctly non-trivial exercise,
    and it seems this is the *only* current use case for the GPC which would
    benefit from it.
    
    An earlier version of the runstate code did use a gfn_to_hva cache for
    this purpose, but it still had the single-page restriction because it
    used the uhva directly. It had to, because it needs to be able to do
    the update atomically when the vCPU is being scheduled out, so it used
    pagefault_disable() around the accesses instead of simply calling
    kvm_write_guest_cached(), which has a fallback path.
    
    So... use a pair of GPCs for the first and potential second page covering
    the runstate area. We can get away with locking both at once because
    nothing else takes more than one GPC lock at a time so we can invent
    a trivial ordering rule.
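
As a rough illustration of how the area divides between the two caches, the sketch below computes how many bytes land in the first page and how many spill into the second. The names and the 4 KiB PAGE_SIZE are assumptions for the example; it does not use the GPC API. The ordering rule mentioned above would then amount to always taking the first page's lock before the second's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Hypothetical result type: bytes in the first and second page. */
struct runstate_span {
    size_t len1;
    size_t len2;
};

/* Split an area of 'size' bytes starting at guest physical address 'gpa'. */
static struct runstate_span runstate_split(uint64_t gpa, size_t size)
{
    struct runstate_span span;
    /* Bytes left in the first page, counting from gpa. */
    size_t room = PAGE_SIZE - (size_t)(gpa & (PAGE_SIZE - 1));

    assert(size <= PAGE_SIZE);  /* the area spans at most two pages */
    span.len1 = size < room ? size : room;
    span.len2 = size - span.len1;  /* 0 means the single-page fast path */
    return span;
}
```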
    
    The common case where it's all in the same page is kept as a fast path,
    but in both cases, the actual guest structure (compat or not) is built
    up from the fields in @vx, following preset pointers to the state and
    times fields. The only difference is whether those pointers point to
    the kernel stack (in the split case) or to guest memory directly via
    the GPC. The fast path is also fixed to use a byte access for the
    XEN_RUNSTATE_UPDATE bit, so the only real difference is the dual
    memcpy.
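
The split case can be pictured as building the structure in a local buffer and then copying it out in two pieces. The helper below is a hypothetical userspace sketch of that dual memcpy only; it leaves out the locking, the XEN_RUNSTATE_UPDATE handling, and the barriers that the real path requires.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy 'size' bytes from 'src' (e.g. a structure built on the stack)
 * into two destinations: the first 'len1' bytes go to the tail of the
 * first page, the remainder to the start of the second page. */
static void runstate_copy_split(uint8_t *page1_dst, size_t len1,
                                uint8_t *page2_dst,
                                const void *src, size_t size)
{
    assert(len1 <= size);
    memcpy(page1_dst, src, len1);
    if (size > len1)  /* nothing spills over in the single-page case */
        memcpy(page2_dst, (const uint8_t *)src + len1, size - len1);
}
```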
    
    Finally, Xen also does write the runstate area immediately when it's
    configured. Flip the kvm_xen_update_runstate() and …_guest() functions
    and call the latter directly when the runstate area is set. This means
    that other ioctls which modify the runstate also write it immediately
    to the guest when they do so, which is also intended.
    
    Update the xen_shinfo_test to exercise the pathological case where the
    XEN_RUNSTATE_UPDATE flag in the top byte of the state_entry_time is
    actually in a different page to the rest of the 64-bit word.
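
For illustration, an address with that property can be derived as below. This is a sketch under assumptions (4 KiB pages, the 64-bit layout with state_entry_time at offset 8, hypothetical helper names), not a copy of the selftest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE         4096u  /* assumed page size */
#define ENTRY_TIME_OFFSET 8u     /* state_entry_time offset, 64-bit layout */

/* Page number containing a given address. */
static uint64_t page_of(uint64_t addr)
{
    return addr / PAGE_SIZE;
}

/* Choose a start address so that the top byte of state_entry_time
 * (structure byte 15) is the first byte of the page after 'page_base',
 * while the other seven bytes of the field stay in 'page_base'. */
static uint64_t pathological_runstate_addr(uint64_t page_base)
{
    size_t top_byte = ENTRY_TIME_OFFSET + sizeof(uint64_t) - 1;  /* 15 */

    assert(page_base % PAGE_SIZE == 0);
    return page_base + PAGE_SIZE - top_byte;
}
```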
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
xen_shinfo_test.c 26.9 KB