    mm/mmap: undo ->mmap() when arch_validate_flags() fails · deb0f656
    Carlos Llamas authored
    Commit c462ac28 ("mm: Introduce arch_validate_flags()") added a late
    check in mmap_region() to let architectures validate vm_flags.  The check
    needs to happen after calling ->mmap() as the flags can potentially be
    modified during this callback.
    
    If the arch_validate_flags() check fails we unmap and free the vma.
    However, the error path fails to undo the ->mmap() call that previously
    succeeded.  Depending on the specific ->mmap() implementation, this
    translates to reference increments, memory allocations and other
    operations that will not be cleaned up.
    
    There are several places (mainly device drivers) where this is an issue.
    However, one specific example is bpf_map_mmap() which keeps count of the
    mappings in map->writecnt.  The count is incremented on ->mmap() and then
    decremented on vm_ops->close().  When arch_validate_flags() fails this
    count is off since bpf_map_mmap_close() is never called.
    
    One can reproduce this issue on arm64 devices with MTE support.  Here the
    vm_flags are checked to only allow VM_MTE if VM_MTE_ALLOWED has been set
    previously.  From userspace it is then enough to pass the PROT_MTE flag to
    the mmap() syscall to trigger the arch_validate_flags() failure.
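
The flag rule described above can be modelled in a few lines of userspace C. This is only a sketch: the MODEL_* bit values and the model_validate_flags() name are made up for illustration and are not the kernel's definitions, and the real arm64 check may also depend on whether the system supports MTE at all.

```c
#include <stdbool.h>

/*
 * Illustrative bit values only (assumption): the kernel's real VM_MTE and
 * VM_MTE_ALLOWED definitions live in its own headers.
 */
#define MODEL_VM_MTE		(1UL << 0)
#define MODEL_VM_MTE_ALLOWED	(1UL << 1)

/*
 * Hypothetical stand-in for the arch check: VM_MTE is accepted only on a
 * mapping that was already marked VM_MTE_ALLOWED.
 */
static bool model_validate_flags(unsigned long vm_flags)
{
	return !(vm_flags & MODEL_VM_MTE) ||
	       (vm_flags & MODEL_VM_MTE_ALLOWED);
}
```

In this model a file-backed mapping that never gets the "allowed" bit is rejected as soon as the MTE bit is requested, which is the situation the PROT_MTE reproducer relies on.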
    
    The following program reproduces this issue:
    
      #include <stdio.h>
      #include <unistd.h>
      #include <linux/unistd.h>
      #include <linux/bpf.h>
      #include <sys/mman.h>
    
      int main(void)
      {
    	union bpf_attr attr = {
    		.map_type = BPF_MAP_TYPE_ARRAY,
    		.key_size = sizeof(int),
    		.value_size = sizeof(long long),
    		.max_entries = 256,
    		.map_flags = BPF_F_MMAPABLE,
    	};
    	int fd;
    
    	fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
    	mmap(NULL, 4096, PROT_WRITE | PROT_MTE, MAP_SHARED, fd, 0);
    
    	return 0;
      }
    
    By manually adding some log statements to the vm_ops callbacks we can
    confirm that when passing PROT_MTE to mmap() the map->writecnt is off upon
    ->release():
    
    With PROT_MTE flag:
      root@debian:~# ./bpf-test
      [  111.263874] bpf_map_write_active_inc: map=9 writecnt=1
      [  111.288763] bpf_map_release: map=9 writecnt=1
    
    Without PROT_MTE flag:
      root@debian:~# ./bpf-test
      [  157.816912] bpf_map_write_active_inc: map=10 writecnt=1
      [  157.830442] bpf_map_write_active_dec: map=10 writecnt=0
      [  157.832396] bpf_map_release: map=10 writecnt=0
    
    This patch fixes the above issue by calling vm_ops->close() when the
    arch_validate_flags() check fails.  After this we can proceed to unmap
    and free the vma on the error path.
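
The effect of the fix on the counter can be illustrated with a small userspace model. This is a sketch, not the kernel code: every model_* name is hypothetical, the undo_mmap parameter exists only to compare the old and new error paths side by side, and -EINVAL is assumed as the error value.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a map that counts its writable mappings. */
struct model_map {
	int writecnt;
};

struct model_vma;

struct model_vm_ops {
	void (*close)(struct model_vma *vma);
};

struct model_vma {
	const struct model_vm_ops *ops;
	struct model_map *map;
};

/* Mirrors the role of vm_ops->close(): drop the count taken in ->mmap(). */
static void model_close(struct model_vma *vma)
{
	vma->map->writecnt--;
}

static const struct model_vm_ops model_ops = { .close = model_close };

/* Mirrors the role of a driver ->mmap(): take a count, install the ops. */
static int model_mmap(struct model_map *map, struct model_vma *vma)
{
	map->writecnt++;
	vma->map = map;
	vma->ops = &model_ops;
	return 0;
}

/*
 * Simplified model of the late flag check. flags_ok stands for the result
 * of the arch check; undo_mmap selects the fixed error path (true) or the
 * old one (false).
 */
static int model_mmap_region(struct model_map *map, bool flags_ok,
			     bool undo_mmap)
{
	struct model_vma vma = { 0 };
	int err = model_mmap(map, &vma);

	if (err)
		return err;

	if (!flags_ok) {
		/* The fix: undo ->mmap() before the vma is torn down. */
		if (undo_mmap && vma.ops && vma.ops->close)
			vma.ops->close(&vma);
		return -EINVAL;
	}

	/* Success: the mapping stays live and keeps its count. */
	return 0;
}
```

Calling model_mmap_region() with flags_ok set to false leaves writecnt at 1 when undo_mmap is false and at 0 when it is true, matching the two logs above.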
    
    Link: https://lkml.kernel.org/r/20220930003844.1210987-1-cmllamas@google.com
    Fixes: c462ac28 ("mm: Introduce arch_validate_flags()")
    Signed-off-by: Carlos Llamas <cmllamas@google.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Liam Howlett <liam.howlett@oracle.com>
    Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: <stable@vger.kernel.org>	[5.10+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>