• Filipe David Borba Manana's avatar
    Btrfs: improve inode hash function/inode lookup · 778ba82b
    Filipe David Borba Manana authored
    Currently the hash value used for adding an inode to the VFS's inode
    hash table consists of the plain inode number, which is a 64 bits
    integer. This results in hash table buckets (hlist_head lists) with
    too many elements for at least 2 important scenarios:
    
    1) When we have many subvolumes. Each subvolume has its own btree
       where its files and directories are added to, and each has its
       own objectid (inode number) namespace. This means that if we have
       N subvolumes, and all have inode number X associated to a file or
       directory, the corresponding inodes all map to the same hash table
       entry, resulting in a bucket (hlist_head list) with N elements;
    
    2) On 32 bits machines. Th VFS hash values are unsigned longs, which
       are 32 bits wide on 32 bits machines, and the inode (objectid)
       numbers are 64 bits unsigned integers. We simply cast the inode
       numbers to hash values, which means that for all inodes with the
       same 32 bits lower half, the same hash bucket is used for all of
       them. For example, all inodes with a number (objectid) between
       0x0000_0000_ffff_ffff and 0xffff_ffff_ffff_ffff will end up in
       the same hash table bucket.
    
    This change ensures the inode's hash value depends both on the
    objectid (inode number) and its subvolume's (btree root) objectid.
    For 32 bits machines, this change gives better entropy by making
    the hash value depend on both the upper and lower 32 bits of the
    64 bits hash previously computed.
    Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
    Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
    Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
    778ba82b
disk-io.c 112 KB