• Linus Torvalds's avatar
    fs: fd tables have to be multiples of BITS_PER_LONG · 1c24a186
    Linus Torvalds authored
    This has always been the rule: fdtables have several bitmaps in them,
    and as a result they have to be sized properly for bitmaps.  We walk
    those bitmaps in chunks of 'unsigned long' in serveral cases, but even
    when we don't, we use the regular kernel bitops that are defined to work
    on arrays of 'unsigned long', not on some byte array.
    
    Now, the distinction between arrays of bytes and 'unsigned long'
    normally only really ends up being noticeable on big-endian systems, but
    Fedor Pchelkin and Alexey Khoroshilov reported that copy_fd_bitmaps()
    could be called with an argument that wasn't even a multiple of
    BITS_PER_BYTE.  And then it fails to do the proper copy even on
    little-endian machines.
    
    The bug wasn't in copy_fd_bitmap(), but in sane_fdtable_size(), which
    didn't actually sanitize the fdtable size sufficiently, and never made
    sure it had the proper BITS_PER_LONG alignment.
    
    That's partly because the alignment historically came not from having to
    explicitly align things, but simply from previous fdtable sizes, and
    from count_open_files(), which counts the file descriptors by walking
    them one 'unsigned long' word at a time and thus naturally ends up doing
    sizing in the proper 'chunks of unsigned long'.
    
    But with the introduction of close_range(), we now have an external
    source of "this is how many files we want to have", and so
    sane_fdtable_size() needs to do a better job.
    
    This also adds that explicit alignment to alloc_fdtable(), although
    there it is mainly just for documentation at a source code level.  The
    arithmetic we do there to pick a reasonable fdtable size already aligns
    the result sufficiently.
    
    In fact,clang notices that the added ALIGN() in that function doesn't
    actually do anything, and does not generate any extra code for it.
    
    It turns out that gcc ends up confusing itself by combining a previous
    constant-sized shift operation with the variable-sized shift operations
    in roundup_pow_of_two().  And probably due to that doesn't notice that
    the ALIGN() is a no-op.  But that's a (tiny) gcc misfeature that doesn't
    matter.  Having the explicit alignment makes sense, and would actually
    matter on a 128-bit architecture if we ever go there.
    
    This also adds big comments above both functions about how fdtable sizes
    have to have that BITS_PER_LONG alignment.
    
    Fixes: 60997c3d ("close_range: add CLOSE_RANGE_UNSHARE")
    Reported-by: default avatarFedor Pchelkin <aissur0002@gmail.com>
    Reported-by: default avatarAlexey Khoroshilov <khoroshilov@ispras.ru>
    Link: https://lore.kernel.org/all/20220326114009.1690-1-aissur0002@gmail.com/Tested-and-acked-by: default avatarChristian Brauner <brauner@kernel.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    1c24a186
file.c 32.4 KB