    mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison · 191fcdb6
    John Hubbard authored
    The following crash happens for me when running the -mm selftests (below).
    Specifically, it happens while running the uffd-stress subtests:
    
    kernel BUG at mm/hugetlb.c:7249!
    invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 3238 Comm: uffd-stress Not tainted 6.4.0-hubbard-github+ #109
    Hardware name: ASUS X299-A/PRIME X299-A, BIOS 1503 08/03/2018
    RIP: 0010:huge_pte_alloc+0x12c/0x1a0
    ...
    Call Trace:
     <TASK>
     ? __die_body+0x63/0xb0
     ? die+0x9f/0xc0
     ? do_trap+0xab/0x180
     ? huge_pte_alloc+0x12c/0x1a0
     ? do_error_trap+0xc6/0x110
     ? huge_pte_alloc+0x12c/0x1a0
     ? handle_invalid_op+0x2c/0x40
     ? huge_pte_alloc+0x12c/0x1a0
     ? exc_invalid_op+0x33/0x50
     ? asm_exc_invalid_op+0x16/0x20
     ? __pfx_put_prev_task_idle+0x10/0x10
     ? huge_pte_alloc+0x12c/0x1a0
     hugetlb_fault+0x1a3/0x1120
     ? finish_task_switch+0xb3/0x2a0
     ? lock_is_held_type+0xdb/0x150
     handle_mm_fault+0xb8a/0xd40
     ? find_vma+0x5d/0xa0
     do_user_addr_fault+0x257/0x5d0
     exc_page_fault+0x7b/0x1f0
     asm_exc_page_fault+0x22/0x30
    
    That happens because a BUG() statement in huge_pte_alloc() attempts to
    check that a pte, if present, is a hugetlb pte, but it does so in a way
    that is not safe against concurrent, lockless pte updates, which leads
    to a false BUG() report.
    
    We got here due to a couple of bugs, each of which by itself was not quite
    enough to cause a problem:
    
    First of all, before commit c33c7948 ("mm: ptep_get() conversion"), the
    BUG() statement in huge_pte_alloc() was itself fragile: it relied upon
    compiler behavior to only read the pte once, despite using it twice in the
    same conditional.
    
    Next, commit c33c7948 ("mm: ptep_get() conversion") broke that
    delicate situation, by causing all direct pte reads to be done via
    READ_ONCE().  And so READ_ONCE() got called twice within the same BUG()
    conditional, leading to comparing (potentially, occasionally) different
    versions of the pte, and thus to false BUG() reports.
    
    Fix this by taking a single snapshot of the pte before using it in the
    BUG conditional.
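
    For illustration only, here is a minimal userspace sketch of the
    double-read problem and the single-snapshot pattern. It is not the
    kernel code: struct fake_slot, slot_read(), check_double_read(),
    check_snapshot() and the bit values are hypothetical stand-ins, and
    the "concurrent update" is simulated by returning a different value
    on the second read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel types; the bit layout is illustrative. */
typedef struct { uint64_t val; } pte_t;

#define PTE_PRESENT (1ULL << 0)
#define PTE_HUGE    (1ULL << 7)

static bool pte_present(pte_t pte) { return (pte.val & PTE_PRESENT) != 0; }
static bool pte_huge(pte_t pte)    { return (pte.val & PTE_HUGE) != 0; }

/*
 * A fake page table slot whose contents change after the first read,
 * simulating another CPU updating the entry between two lockless reads.
 */
struct fake_slot {
	pte_t first;	/* value returned by the first read */
	pte_t later;	/* value returned by every later read */
	int reads;	/* number of reads performed so far */
};

/* Models one READ_ONCE()-style access to the slot. */
static pte_t slot_read(struct fake_slot *s)
{
	return s->reads++ == 0 ? s->first : s->later;
}

/* Fragile: each predicate does its own read, so they may disagree. */
static bool check_double_read(struct fake_slot *s)
{
	return pte_present(slot_read(s)) && !pte_huge(slot_read(s));
}

/* Robust: take one snapshot and evaluate both predicates on it. */
static bool check_snapshot(struct fake_slot *s)
{
	pte_t pteval = slot_read(s);

	return pte_present(pteval) && !pte_huge(pteval);
}
```

    With a slot that starts as present+huge and is then cleared, neither
    value on its own violates the check, yet check_double_read() reports
    a violation because it mixes the two values. check_snapshot() does
    not, since both tests see the same value.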
    
    Now, commit c33c7948 is only partially to blame here, but people doing
    bisections will invariably land there, so this will help them find a fix
    for a real crash.  Also, the previous behavior was unlikely to ever
    expose this bug--it was fragile, yet not actually broken.
    
    So that's why I chose this commit for the Fixes tag, rather than the
    commit that created the original BUG() statement.
    
    Link: https://lkml.kernel.org/r/20230701010442.2041858-1-jhubbard@nvidia.com
    Fixes: c33c7948 ("mm: ptep_get() conversion")
    Signed-off-by: John Hubbard <jhubbard@nvidia.com>
    Acked-by: James Houghton <jthoughton@google.com>
    Acked-by: Muchun Song <songmuchun@bytedance.com>
    Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
    Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Alex Williamson <alex.williamson@redhat.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Andrey Konovalov <andreyknvl@gmail.com>
    Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: Dave Airlie <airlied@gmail.com>
    Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Ian Rogers <irogers@google.com>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Lorenzo Stoakes <lstoakes@gmail.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
    Cc: Yu Zhao <yuzhao@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>