    [PATCH] reiserfs: block allocator optimizations · 734db689
    Chris Mason authored
    From: <mason@suse.com>
    From: <jeffm@suse.com>
    
    The current reiserfs allocator allocates blocks more or less sequentially
    from the start of the disk.  That works very well for desktop loads, but
    once you have more than one process doing I/O, data files can fragment badly.
    
    One obvious solution is something like ext2's bitmap groups, which puts
    file data into different areas of the disk based on which subdirectory
    they are in.  The problem with bitmap groups is that if you've got a
    group of subdirectories their contents will be spread out all over the
    disk, leading to lots of seeks during a sequential read.
    
    This allocator patch uses the packing locality to determine which bitmap
    group to allocate from, but when you create a file it looks in the bitmaps
    to see how 'full' that packing locality already is.  If it hasn't been
    heavily used yet, the packing locality is inherited from the parent
    directory, putting files in new subdirs close to the parent subdir.
    Otherwise it is the inode number of the parent directory, putting new
    files far away from the parent subdir.
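    The decision above can be sketched roughly as follows.  This is an
    illustrative standalone sketch, not the actual reiserfs code: the function
    name, the way fullness is passed in, and the threshold are all assumptions.

    ```c
    /* Illustrative sketch of the packing-locality heuristic described in
     * this changelog.  used_blocks would come from scanning the bitmaps of
     * the parent's packing locality; the threshold is hypothetical. */
    unsigned long choose_packing_locality(unsigned long parent_locality,
                                          unsigned long parent_dirid,
                                          unsigned long used_blocks,
                                          unsigned long threshold)
    {
        if (used_blocks < threshold)
            return parent_locality;   /* lightly used: keep new subdirs
                                       * close to the parent subdir */
        return parent_dirid;          /* heavily used: start a new locality,
                                       * pushing new files away */
    }
    ```
    
    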
    
    The end result is fewer bitmap groups for the same working set.  For
    example, one test data set created by 20 procs running in parallel has
    6822 subdirs.  With vanilla reiserfs that would mean 6822 packing
    localities; this patch turns it into 26.
    
    This makes sequential reads of big directory trees more efficient, but
    it also makes the btree more efficient in general.  Things end up sorted
    better because groups of subdirs end up with similar keys in the btree,
    instead of being spread out all over.
    
    The bitmap grouping code tries to use the start of each bitmap group
    for metadata, and offsets the data slightly.  The data and metadata
    are still close together, but not completely intermixed like they are
    in the default allocator.  The end result is that leaf nodes tend to be
    close to each other, making metadata readahead more effective.
    
    The old block allocator had the ability to enforce a minimum
    allocation size, but never used it.  The allocator now does a first pass
    looking for larger allocation chunks before falling back to the old
    behaviour of taking any blocks it can find.
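    The two-pass idea can be sketched like this.  It is a simplified model
    only: the bitmap here is a plain in-memory byte array (0 meaning free),
    whereas the real code walks the on-disk bitmap blocks, and the function
    names are made up for illustration.

    ```c
    /* Find the start of a run of at least min_chunk consecutive free
     * blocks, or -1 if no such run exists. */
    long find_run(const unsigned char *bitmap, long nblocks, long min_chunk)
    {
        long run = 0;
        for (long i = 0; i < nblocks; i++) {
            run = bitmap[i] ? 0 : run + 1;   /* 0 == free block */
            if (run >= min_chunk)
                return i - run + 1;          /* start of the run */
        }
        return -1;
    }

    /* Pass 1: look for a chunk of at least min_chunk blocks.
     * Pass 2: fall back to taking any free block at all. */
    long allocate_start(const unsigned char *bitmap, long nblocks,
                        long min_chunk)
    {
        long start = find_run(bitmap, nblocks, min_chunk);
        if (start < 0)
            start = find_run(bitmap, nblocks, 1);
        return start;
    }
    ```
    
    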
    
    The patch changes the defaults to:
    
    mount -o alloc=skip_busy:dirid_groups:packing_groups
    
    You can get back the old behaviour with mount -o alloc=skip_busy
    
    mount -o alloc=dirid_groups turns on the bitmap groups.
    mount -o alloc=packing_groups turns on the packing locality reduction code.
    mount -o alloc=skip_busy:dirid_groups turns on both dirid_groups and
    skip_busy.
    
    Finally the patch adds a mount -o alloc=oid_groups, which puts files into
    bitmap groups based on a hash of their objectid.  This would be used for
    databases or other situations where you have a limited number of very
    large files.
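    The changelog does not say which hash oid_groups uses, so the sketch
    below stands in a trivial modulo for it; both the function name and the
    hash are assumptions, shown only to make the group-selection idea concrete.

    ```c
    /* Illustrative only: map a file's objectid to one of ngroups bitmap
     * groups.  The real reiserfs hash is not specified in this changelog,
     * so a plain modulo is used as a placeholder. */
    unsigned long oid_to_group(unsigned long objectid, unsigned long ngroups)
    {
        return objectid % ngroups;
    }
    ```
    
    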
    
    This command will tell you how many packing localities are actually in
    use:
    
    debugreiserfs -d /dev/xxx | grep '^|.*SD' | sed 's/^.....//' | awk '{print $1}' | sort -u | wc -l
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>