    [PATCH] reiserfs: block allocator optimizations · 734db689
    Chris Mason authored
    From: <mason@suse.com>
    From: <jeffm@suse.com>
    
    The current reiserfs allocator allocates blocks more or less sequentially
    from the start of the disk.  It works very nicely for desktop loads, but
    once you've got more than one proc doing io, data files can fragment badly.
    
    One obvious solution is something like ext2's bitmap groups, which put
    file data into different areas of the disk based on which subdirectory
    the files are in.  The problem with bitmap groups is that if you've got a
    group of subdirectories, their contents will be spread out all over the
    disk, leading to lots of seeks during a sequential read.
    
    This allocator patch uses the packing locality to determine which bitmap
    group to allocate from, but when you create a file it looks in the bitmaps
    to see how 'full' that packing locality already is.  If it hasn't been
    heavily used yet, the packing locality is inherited from the parent
    directory, which puts files in new subdirs close to the parent subdir.
    Otherwise the packing locality is the inode number of the parent
    directory, which puts new files far away from the parent subdir.
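
    The decision described above can be sketched roughly as follows.  This
    is an illustration only, not the code in the patch: the function name,
    its parameters and the 60% "heavily used" threshold are all assumptions
    made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical threshold: a packing locality counts as "heavily used" once
 * more than this percentage of its bitmap group is allocated. */
#define LOCALITY_FULL_PERCENT 60u

/*
 * Pick the packing locality for a newly created file.
 * used_blocks / group_blocks describe the bitmap group that the parent's
 * packing locality maps to (illustrative inputs, not kernel structures).
 */
static uint32_t choose_packing_locality(uint32_t parent_locality,
                                        uint32_t parent_objectid,
                                        uint32_t used_blocks,
                                        uint32_t group_blocks)
{
    assert(group_blocks > 0 && used_blocks <= group_blocks);

    /* Lightly used: inherit, so the new file lands close to the parent. */
    if ((uint64_t)used_blocks * 100 <=
        (uint64_t)group_blocks * LOCALITY_FULL_PERCENT)
        return parent_locality;

    /* Heavily used: start a new locality keyed by the parent's inode number. */
    return parent_objectid;
}
```

    With these assumed numbers, a parent whose group has 100 of 32768 blocks
    in use keeps its locality, while one with 30000 of 32768 in use switches
    to the parent's inode number.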
    
    The end result is fewer bitmap groups for the same working set.  For
    example, one test data set created by 20 procs running in parallel has
    6822 subdirs.  With vanilla reiserfs that would mean 6822 packing
    localities.  This patch turns that into 26 packing localities.
    
    This makes sequential reads of big directory trees more efficient, but
    it also makes the btree more efficient in general.  Things end up sorted
    better because groups of subdirs end up with similar keys in the btree,
    instead of being spread out all over.
    
    The bitmap grouping code tries to use the start of each bitmap group
    for metadata, and offsets the data slightly.  The data and metadata
    are still close together, but not completely intermixed like they are
    in the default allocator.  The end result is that leaf nodes tend to be
    close to each other, making metadata readahead more effective.
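
    As a rough sketch of that layout (not the patch's code), the search
    start for an allocation could be derived as below.  The function name
    and the 1024-block data offset are hypothetical; the text above only
    says that data is offset "slightly" from the start of the group.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical distance between a group's metadata area and its data area. */
#define DATA_OFFSET_BLOCKS 1024u

/*
 * Return the block number where the search for free space begins.
 * Metadata starts at the beginning of the bitmap group; data starts a
 * little further in, so the two stay close without being intermixed.
 */
static uint64_t search_start(uint32_t group, uint32_t blocks_per_group,
                             int is_metadata)
{
    assert(blocks_per_group > DATA_OFFSET_BLOCKS);

    uint64_t group_start = (uint64_t)group * blocks_per_group;
    return is_metadata ? group_start : group_start + DATA_OFFSET_BLOCKS;
}
```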
    
    The old block allocator had the ability to enforce a minimum
    allocation size, but did not use it.  The allocator now makes a first
    pass looking for larger allocation chunks before falling back to the old
    behaviour of taking any blocks it can find.
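
    A minimal sketch of that two-pass idea, assuming a simple in-memory
    array with one flag per block (true = in use).  The helper names are
    made up for the example and do not correspond to functions in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return the start of the first run of at least `want` free blocks, or -1. */
static long find_free_run(const bool *used, size_t nblocks, size_t want)
{
    size_t run = 0;

    if (want == 0)
        want = 1;
    for (size_t i = 0; i < nblocks; i++) {
        run = used[i] ? 0 : run + 1;
        if (run >= want)
            return (long)(i + 1 - want);
    }
    return -1;
}

/*
 * Pass 1: look for a chunk of at least min_chunk free blocks.
 * Pass 2: fall back to the old behaviour and take any free block.
 */
static long alloc_two_pass(const bool *used, size_t nblocks, size_t min_chunk)
{
    if (min_chunk > 1) {
        long start = find_free_run(used, nblocks, min_chunk);
        if (start >= 0)
            return start;
    }
    return find_free_run(used, nblocks, 1);
}
```

    The first pass only matters when free space is already fragmented; on a
    mostly empty bitmap both passes find the same starting block.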
    
    The patch changes the defaults to:
    
    mount -o alloc=skip_busy:dirid_groups:packing_groups
    
    You can get back the old behaviour with mount -o alloc=skip_busy
    
    mount -o alloc=dirid_groups turns on the bitmap groups
    mount -o alloc=packing_groups turns on the packing locality reduction code
    mount -o alloc=skip_busy:dirid_groups turns on both dirid_groups and skip_busy
    
    Finally, the patch adds mount -o alloc=oid_groups, which puts files into
    bitmap groups based on a hash of their objectid.  This is intended for
    databases or other situations where you have a limited number of very
    large files.
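
    To illustrate the oid_groups idea, here is a sketch of mapping an
    objectid to a bitmap group.  The multiplicative hash is a stand-in
    chosen for the example; it is an assumption, not the hash the patch
    uses, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map an objectid to one of group_count bitmap groups.
 * Consecutive objectids are scattered across groups, so a handful of
 * very large files do not all grow out of the same area of the disk.
 */
static uint32_t oid_to_group(uint32_t objectid, uint32_t group_count)
{
    assert(group_count > 0);

    /* Multiplicative hash; unsigned arithmetic wraps modulo 2^32. */
    uint32_t h = objectid * 2654435761u;

    h ^= h >> 16;            /* fold the high bits into the low bits */
    return h % group_count;
}
```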
    
    This command will tell you how many packing localities are actually in
    use:
    
    debugreiserfs -d /dev/xxx | grep '^|.*SD' | sed 's/^.....//' | awk '{print $1}' | sort -u | wc -l
    
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
inode.c 85.4 KB