s390/mm: Re-enable the shared zeropage for !PV and !skeys KVM guests

Author: David Hildenbrand <david@redhat.com>
commit fa41ba0d ("s390/mm: avoid empty zero pages for KVM guests to
avoid postcopy hangs") introduced an undesired side effect when combined
with memory ballooning and VM migration: memory that is part of the
inflated memory balloon will consume memory.
    
Assume we have a 100GiB VM and inflated the balloon to 40GiB. Our VM
will consume ~60GiB of memory. If we now trigger a VM migration,
hypervisors like QEMU will read all VM memory. As s390x does not support
the shared zeropage, we'll end up allocating memory for all
previously-inflated memory that is part of the memory balloon: 40 GiB.
So we might easily (unexpectedly) crash the VM on the migration source.
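
To illustrate the effect (a minimal userspace sketch, not part of the
patch): reading untouched anonymous memory normally maps the shared
zeropage and barely increases RSS, but with mm_forbids_zeropage() in
effect, every read fault allocates a real zeroed page instead:

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>

  #define SIZE (1024UL * 1024 * 1024)   /* 1 GiB of anonymous memory */

  static void print_rss(void)
  {
          char line[256];
          FILE *f = fopen("/proc/self/status", "r");

          while (f && fgets(line, sizeof(line), f))
                  if (!strncmp(line, "VmRSS:", 6))
                          printf("%s", line);
          if (f)
                  fclose(f);
  }

  int main(void)
  {
          unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          volatile unsigned char sink;
          unsigned long i;

          if (p == MAP_FAILED)
                  return 1;
          print_rss();
          /* Read every page, like a hypervisor migrating all VM memory. */
          for (i = 0; i < SIZE; i += 4096)
                  sink = p[i];
          (void)sink;
          print_rss();    /* ~0 with the shared zeropage, ~1 GiB without */
          return 0;
  }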
    
Even worse, hypervisors like QEMU optimize zeropage migration to not
consume memory on the migration destination: when migrating a
"page full of zeroes", they check whether the destination memory is
already zero (by reading it) and, if so, avoid writing to it so that no
memory gets allocated. However, on s390x that read itself allocates
memory, implying that also on the migration destination we will end up
allocating all previously-inflated memory that is part of the memory
balloon.
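
The destination-side optimization boils down to something like the
following (a simplified sketch of the idea; QEMU's actual
implementation, e.g. its buffer_is_zero() fast path, is more elaborate):

  #include <stdbool.h>
  #include <stddef.h>
  #include <string.h>

  /* Returns true iff the page contains only zero bytes. */
  static bool page_is_zero(const unsigned char *page, size_t size)
  {
          size_t i;

          for (i = 0; i < size; i++)
                  if (page[i])
                          return false;
          return true;
  }

  /*
   * Place an incoming all-zero page: only write if the destination
   * differs. On architectures with the shared zeropage, the read in
   * page_is_zero() costs no memory; on s390x with
   * mm_forbids_zeropage(), it already allocates a page.
   */
  static void place_zero_page(unsigned char *dst, size_t size)
  {
          if (!page_is_zero(dst, size))
                  memset(dst, 0, size);
  }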
    
    This is especially bad if actual memory overcommit was not desired, when
    memory ballooning is used for dynamic VM memory resizing, setting aside
    some memory during boot that can be added later on demand. Alternatives
    like virtio-mem that would avoid this issue are not yet available on
    s390x.
    
There could be ways to optimize some cases in user space: before reading
memory in an anonymous private mapping on the migration source, check via
/proc/self/pagemap whether anything is already populated. Similarly,
check on the migration destination before reading; see the sketch after
this paragraph. While that would avoid populating tables full of shared
zeropages on all architectures, it's harder to get right and performant,
and it requires user space changes.
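
Such a populated-check could look roughly like this (a sketch; the entry
format is documented in Documentation/admin-guide/mm/pagemap.rst, where
bit 63 means "page present" and bit 62 means "page swapped"):

  #include <fcntl.h>
  #include <stdbool.h>
  #include <stdint.h>
  #include <unistd.h>

  /*
   * Returns true if the page containing 'addr' is populated (present in
   * RAM or in swap), false if reading it would newly allocate memory.
   */
  static bool page_is_populated(int pagemap_fd, uintptr_t addr)
  {
          uint64_t entry = 0;
          long page_size = sysconf(_SC_PAGESIZE);
          off_t offset = (off_t)(addr / (uintptr_t)page_size) * sizeof(entry);

          if (pread(pagemap_fd, &entry, sizeof(entry), offset) != sizeof(entry))
                  return true;    /* be conservative on error */
          return entry & ((UINT64_C(1) << 63) | (UINT64_C(1) << 62));
  }

  /* Usage: int fd = open("/proc/self/pagemap", O_RDONLY); */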
    
Further, with postcopy live migration we must place a page, so there,
"avoid touching memory to avoid allocating memory" is not really
possible. (Note that previously we would have falsely inserted
shared zeropages into processes using UFFDIO_ZEROPAGE where
mm_forbids_zeropage() would have actually forbidden it.)
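
For reference, placing a zeropage during postcopy is done via the
userfaultfd UFFDIO_ZEROPAGE ioctl, roughly like this (a sketch that
assumes an already-registered userfaultfd 'uffd'):

  #include <linux/userfaultfd.h>
  #include <string.h>
  #include <sys/ioctl.h>

  /* Resolve a postcopy fault at 'addr' by placing a zeroed page. */
  static int place_zeropage(int uffd, unsigned long addr, unsigned long len)
  {
          struct uffdio_zeropage zp;

          memset(&zp, 0, sizeof(zp));
          zp.range.start = addr;      /* page-aligned */
          zp.range.len = len;         /* multiple of the page size */

          return ioctl(uffd, UFFDIO_ZEROPAGE, &zp);
  }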
    
PV is currently incompatible with memory ballooning, and in the common
case, KVM guests don't make use of storage keys. Instead of zapping
zeropages when enabling storage keys / PV, which turned out to be
problematic in the past, let's do exactly the same as we do with KSM
pages: trigger unsharing faults to replace the shared zeropages by
proper anonymous folios.
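
Condensed, the mechanism looks roughly like the following (a simplified
sketch of the idea, not a verbatim copy of the patch; the function name
and 'find_zeropage_ops' are illustrative, the latter standing for an
mm_walk_ops whose pte_entry callback stops at the first shared-zeropage
PTE and stores its address in 'addr'):

  /*
   * Walk a VMA and, for each shared zeropage found, trigger an unshare
   * fault so it gets replaced by a proper anonymous folio. Caller must
   * hold the mmap lock.
   */
  static int unshare_zeropages_in_vma(struct vm_area_struct *vma)
  {
          unsigned long addr = vma->vm_start;
          vm_fault_t fault;
          int rc;

          while (addr < vma->vm_end) {
                  /* >0: found a zeropage, 'addr' updated to its address */
                  rc = walk_page_range_vma(vma, addr, vma->vm_end,
                                           &find_zeropage_ops, &addr);
                  if (rc <= 0)
                          return rc;

                  fault = handle_mm_fault(vma, addr, FAULT_FLAG_UNSHARE, NULL);
                  if (fault & VM_FAULT_ERROR)
                          return -EFAULT;
                  addr += PAGE_SIZE;
          }
          return 0;
  }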
    
What about added latency when enabling storage keys? Having a lot of
zeropages in applicable environments (PV, legacy guests, unittests) is
unexpected. Further, KSM could already unshare the zeropages today, and
unmerging KSM pages when enabling storage keys would unshare the
KSM-placed zeropages in the same way, resulting in the same latency.
    
    [ agordeev: Fixed sparse and checkpatch complaints and error handling ]
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Fixes: fa41ba0d ("s390/mm: avoid empty zero pages for KVM guests to avoid postcopy hangs")
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20240411161441.910170-3-david@redhat.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>