    iov_iter: get rid of 'copy_mc' flag · a50026bd
    Linus Torvalds authored
    This flag is only set by one single user: the magical core dumping code
    that looks up user pages one by one, and then writes them out using
    their kernel addresses (by using a BVEC_ITER).
    
    That actually ends up being a huge problem, because while we do use
    copy_mc_to_kernel() for this case and it is able to handle the possible
    machine checks involved, nothing else is really ready to handle the
    failures caused by the machine check.
    
    In particular, as reported by Tong Tiangen, we don't actually support
    fault_in_iov_iter_readable() on a machine check area.
    
    As a result, the usual logic for writing things to a file under a
    filesystem lock does not work at all. That logic does a copy with page
    faults disabled and, if that fails, tries to fault the pages in with
    fault_in_iov_iter_readable() without holding the locks.
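
The copy-then-fault-in pattern described above can be modeled in userspace. This is only a toy sketch, not kernel code: `toy_src`, `toy_copy_nofault()`, `toy_fault_in()`, `toy_write()` and the retry bound are all hypothetical names invented for illustration. It assumes that faulting a page in makes it resident but cannot repair a poisoned (machine check) source, so in the model the retry never makes progress. How the real kernel path fails is not modeled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a source buffer; not a kernel structure. */
struct toy_src {
    const char *data;
    size_t len;
    bool resident;  /* false: a copy with faults disabled fails */
    bool poisoned;  /* true: models a machine check area */
};

/* Models a copy with page faults disabled: bytes copied, 0 on failure. */
static size_t toy_copy_nofault(char *dst, const struct toy_src *src)
{
    if (!src->resident || src->poisoned)
        return 0;
    memcpy(dst, src->data, src->len);
    return src->len;
}

/*
 * Models faulting the source in with no locks held. Returns 0 on
 * success. Assumption of this sketch: poison is not repaired here.
 */
static int toy_fault_in(struct toy_src *src)
{
    if (!src->data)
        return -EFAULT;
    src->resident = true;
    return 0;
}

/* Returns bytes written, -EFAULT if fault-in fails, -EIO if retries run out. */
static long toy_write(char *dst, struct toy_src *src, int max_retries)
{
    assert(dst && src && max_retries > 0);

    for (int i = 0; i < max_retries; i++) {
        /* A filesystem lock would be taken here. */
        size_t n = toy_copy_nofault(dst, src);
        /* ... and dropped here, before any fault-in attempt. */
        if (n)
            return (long)n;
        if (toy_fault_in(src))
            return -EFAULT;
    }
    return -EIO;  /* fault-in "succeeded" every time, the copy never did */
}
```

In this model a source that is merely not resident is written on the second attempt, while a poisoned source exhausts the retry budget because fault-in has nothing it can fix.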
    
    We could decide to always just make the MC copy "succeed" (filling
    the destination with zeroes), and that would then create a core dump
    file that just ignores any machine checks.
    
    But honestly, this single special case has been problematic before, and
    means that all the normal iov_iter code ends up slightly more complex
    and slower.
    
    See for example commit c9eec08b ("iov_iter: Don't deal with
    iter->copy_mc in memcpy_from_iter_mc()") where David Howells
    re-organized the code just to avoid having to check the 'copy_mc' flag
    inside the inner iov_iter loops.
    
    So considering that we have exactly one user, and that one user is a
    non-critical special case that doesn't actually ever trigger in real
    life (Tong found this with manual error injection), the sane solution is
    to just decide that the onus of handling the machine check lies on that
    user instead.
    
    Ergo, do the copy_mc_to_kernel() in the core dump logic itself, copying
    the user data to a stable kernel page before writing it out.
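
The bounce-copy idea can be sketched in userspace as follows. This is a minimal model under stated assumptions, not the actual patch: `toy_page`, `toy_copy_mc()`, `toy_dump` and `toy_dump_pages()` are hypothetical names, and `toy_copy_mc()` only imitates the copy_mc_to_kernel() convention of returning the number of bytes not copied (0 on success). Leaving a zeroed hole for a page whose copy fails is a choice made by this sketch; the commit message does not spell out what the dump does in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16  /* tiny "page" so the example stays readable */

/* Hypothetical stand-in for a user page; not a kernel structure. */
struct toy_page {
    unsigned char bytes[TOY_PAGE_SIZE];
    bool poisoned;  /* true: reading it would raise a machine check */
};

/* Imitates copy_mc_to_kernel(): returns bytes NOT copied, 0 on success. */
static size_t toy_copy_mc(void *dst, const struct toy_page *src, size_t len)
{
    if (src->poisoned)
        return len;
    memcpy(dst, src->bytes, len);
    return 0;
}

/* Hypothetical output buffer standing in for the core dump file. */
struct toy_dump {
    unsigned char out[4 * TOY_PAGE_SIZE];
    size_t pos;
};

/*
 * Copy each source page into a stable bounce page first, and only
 * write the bounce page out. Returns how many pages had their contents
 * written; pages whose copy failed become zero-filled holes.
 */
static size_t toy_dump_pages(struct toy_dump *d,
                             const struct toy_page *pages, size_t n)
{
    unsigned char bounce[TOY_PAGE_SIZE];
    size_t written = 0;

    assert(d->pos + n * TOY_PAGE_SIZE <= sizeof(d->out));

    for (size_t i = 0; i < n; i++) {
        if (toy_copy_mc(bounce, &pages[i], TOY_PAGE_SIZE) == 0) {
            /* The writer only ever sees the stable copy. */
            memcpy(d->out + d->pos, bounce, TOY_PAGE_SIZE);
            written++;
        } else {
            memset(d->out + d->pos, 0, TOY_PAGE_SIZE);
        }
        d->pos += TOY_PAGE_SIZE;
    }
    return written;
}
```

The point of the design is that the machine-check-aware copy happens once, in the single caller that needs it, so the generic write path never has to deal with a source that can fail mid-copy.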
    
    Fixes: f1982740 ("iov_iter: Convert iterate*() to inline funcs")
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
    Link: https://lore.kernel.org/r/20240305133336.3804360-1-tongtiangen@huawei.com
    Link: https://lore.kernel.org/all/4e80924d-9c85-f13a-722a-6a5d2b1c225a@huawei.com/
    Tested-by: David Howells <dhowells@redhat.com>
    Reviewed-by: David Howells <dhowells@redhat.com>
    Reviewed-by: Jens Axboe <axboe@kernel.dk>
    Reported-by: Tong Tiangen <tongtiangen@huawei.com>
    Signed-off-by: Christian Brauner <brauner@kernel.org>
coredump.c 30.4 KB