    kernel/resource: disallow access to exclusive system RAM regions · a9e7b8d4
    David Hildenbrand authored
    virtio-mem dynamically exposes memory inside a device memory region as
    system RAM to Linux, coordinating with the hypervisor which parts are
    actually "plugged" and consequently usable/accessible.
    
    On the one hand, the virtio-mem driver adds/removes whole memory blocks,
    creating/removing busy IORESOURCE_SYSTEM_RAM resources; on the other
    hand, it logically (un)plugs memory inside added memory blocks,
    dynamically either exposing them to the buddy or hiding them from the
    buddy and marking them PG_offline.
    
    In contrast to physical devices, like a DIMM, the virtio-mem driver is
    required to actually make use of any of the device-provided memory,
    because it performs the handshake with the hypervisor.  virtio-mem
    memory cannot simply be accessed via /dev/mem without a driver.
    
    There is no safe way to:
    a) Access plugged memory blocks via /dev/mem, as they might contain
       unplugged holes or might get silently unplugged by the virtio-mem
       driver and consequently turned inaccessible.
    b) Access unplugged memory blocks via /dev/mem because the virtio-mem
       driver is required to make them actually accessible first.
    
    The virtio-spec states that unplugged memory blocks MUST NOT be written,
    and only selected unplugged memory blocks MAY be read.  We want to make
    sure this is the case in sane environments -- where the virtio-mem driver
    was loaded.
    
    We want to make sure that in a sane environment, nobody "accidentally"
    accesses unplugged memory inside the device managed region.  For example,
    a user might spot a memory region in /proc/iomem and try accessing it via
    /dev/mem via gdb or dumping it via something else.  By the time the mmap()
    happens, the memory might already have been removed by the virtio-mem
    driver silently: the mmap() would succeed and user space might
    accidentally access unplugged memory.
    
    So once the driver was loaded and detected the device along with the
    device-managed region, we just want to disallow any access via /dev/mem to
    it.
    
    In an ideal world, we would mark the whole region as busy ("owned by a
    driver") and exclude it; however, that would be wrong, as we don't really
    have actual system RAM at these ranges added to Linux ("busy system RAM").
    Instead, we want to mark such ranges as "not actual busy system RAM but
    still soft-reserved and prepared by a driver for future use."
    
    Let's teach iomem_is_exclusive() to reject access to any range with
    "IORESOURCE_SYSTEM_RAM | IORESOURCE_EXCLUSIVE", even if not busy and even
    if "iomem=relaxed" is set.  Introduce EXCLUSIVE_SYSTEM_RAM to make it
    easier for applicable drivers to depend on this setting in their Kconfig.
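
    The decision described above can be sketched in user-space C as
    follows.  This is only a simplified model, not the kernel code: the
    real iomem_is_exclusive() walks the iomem resource tree under a lock,
    whereas this sketch scans a flat array.  The names sketch_resource,
    sketch_iomem_is_exclusive, SKETCH_PAGE_SIZE and the two boolean
    parameters (standing in for the "iomem=" setting and for
    CONFIG_IO_STRICT_DEVMEM) are hypothetical, and the flag values are
    illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values; treat them as assumptions of this sketch. */
#define IORESOURCE_MEM        0x00000200UL
#define IORESOURCE_SYSRAM     0x01000000UL
#define IORESOURCE_EXCLUSIVE  0x08000000UL
#define IORESOURCE_BUSY       0x80000000UL
#define IORESOURCE_SYSTEM_RAM (IORESOURCE_MEM | IORESOURCE_SYSRAM)

#define SKETCH_PAGE_SIZE 4096ULL /* hypothetical stand-in for PAGE_SIZE */

/* Hypothetical, flattened stand-in for struct resource. */
struct sketch_resource {
    uint64_t start;      /* first byte of the range */
    uint64_t end;        /* last byte of the range (inclusive) */
    unsigned long flags; /* IORESOURCE_* bits */
};

/*
 * Return true if the page containing addr must not be accessed.
 * strict_iomem_checks models "iomem=strict" (false models "iomem=relaxed");
 * io_strict_devmem models CONFIG_IO_STRICT_DEVMEM being enabled.
 */
static bool sketch_iomem_is_exclusive(const struct sketch_resource *res,
                                      size_t n, uint64_t addr,
                                      bool strict_iomem_checks,
                                      bool io_strict_devmem)
{
    const unsigned long exclusive_system_ram =
        IORESOURCE_SYSTEM_RAM | IORESOURCE_EXCLUSIVE;
    size_t i;

    assert(res != NULL || n == 0);
    addr &= ~(SKETCH_PAGE_SIZE - 1); /* round down to the page start */

    for (i = 0; i < n; i++) {
        const struct sketch_resource *p = &res[i];

        /* Skip resources that do not overlap the page. */
        if (p->start >= addr + SKETCH_PAGE_SIZE || p->end < addr)
            continue;

        /* New rule: exclusive system RAM is rejected even if it is
         * not busy and even if the relaxed setting is in effect. */
        if ((p->flags & exclusive_system_ram) == exclusive_system_ram)
            return true;

        /* Otherwise only busy resources matter, and only with strict checks. */
        if (!strict_iomem_checks || !(p->flags & IORESOURCE_BUSY))
            continue;
        if (io_strict_devmem || (p->flags & IORESOURCE_EXCLUSIVE))
            return true;
    }
    return false;
}
```

    In this model, a range flagged IORESOURCE_SYSTEM_RAM |
    IORESOURCE_EXCLUSIVE is rejected with both booleans set to false,
    while ordinary busy system RAM is rejected only when both strict
    settings are on.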
    
    For now, there are no applicable ranges and we'll modify virtio-mem next
    to properly set IORESOURCE_EXCLUSIVE on the parent resource container it
    creates to contain all actual busy system RAM added via
    add_memory_driver_managed().
    
    Link: https://lkml.kernel.org/r/20210920142856.17758-3-david@redhat.com
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Hanjun Guo <guohanjun@huawei.com>
    Cc: Jason Wang <jasowang@redhat.com>
    Cc: "Michael S. Tsirkin" <mst@redhat.com>
    Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>