[PATCH] reiserfs: block allocator optimizations
From: <mason@suse.com> From: <jeffm@suse.com> The current reiserfs allocator pretty much allocates things sequentially from the start of the disk, it works very nicely for desktop loads but once you've got more then one proc doing io data files can fragment badly. One obvious solution is something like ext2's bitmap groups, which puts file data into different areas of the disk based on which subdirectory they are in. The problem with bitmap groups is that if you've got a group of subdirectories their contents will be spread out all over the disk, leading to lots of seeks during a sequential read. This allocator patch uses the packing locality to determine which bitmap group to allocate from, but when you create a file it looks in the bitmaps to see how 'full' that packing locality already is. If it hasn't been heavily used yet, the packing locality is inherited from the parent directory putting files in new subdirs close to the parent subdir, otherwise it is the inode number of the parent directory putting new files far away from the parent subdir. The end result is fewer bitmap groups for the same working set. For example, one test data set created by 20 procs running in parallel has 6822 subdirs. And with vanilla reiserfs that would mean 6822 packing localities. This patch turns that into 26 packing localities. This makes sequential reads of big directory trees more efficient, but it also makes the btree more efficient in general. Things end up sorted better because groups of subdirs end up with similar keys in the btree, instead of being spread out all over. The bitmap grouping code tries to use the start of each bitmap group for metadata, and offsets the data slightly. The data and metadata are still close together, but not completely intermixed like they are in the default allocator. The end result is that leaf nodes tend to be close to each other, making metadata readahead more effective. The old block allocator had the ability to enforce a minimum allocation size, but did not use it. It now tries to do a pass looking for larger allocation chunks before falling back to the old behaviour of taking any blocks it can find. The patch changes the defaults to: mount -o alloc=skip_busy:dirid_groups:packing_groups You can get back the old behaviour with mount -o alloc=skip_busy mount -o alloc=dirid_groups will turn on the bitmap groups mount -o alloc=packing_groups turns on the packing locality reduction code mount -o alloc=skip_busy:dirid_groups turns on both dirid_groups and skip_busy Finally the patch adds a mount -o alloc=oid_groups, which puts files into bitmap groups based on a hash of their objectid. This would be used for databases or other situations where you have a limited number of very large files. This command will tell you how many packing localities are actually in use: debugreiserfs -d /dev/xxx | grep '^|.*SD' | sed 's/^.....//' | awk '{print $1}' | sort -u | wc -l Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Showing
Please register or sign in to comment