• Jonathan Brassow's avatar
    MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1) · 475901af
    Jonathan Brassow authored
    The MD RAID10 'far' and 'offset' algorithms make copies of entire stripe
    widths - copying them to a different location on the same devices after
    shifting the stripe.  An example layout of each follows below:
    
    	        "far" algorithm
    	dev1 dev2 dev3 dev4 dev5 dev6
    	==== ==== ==== ==== ==== ====
    	 A    B    C    D    E    F
    	 G    H    I    J    K    L
    	            ...
    	 F    A    B    C    D    E  --> Copy of stripe0, but shifted by 1
    	 L    G    H    I    J    K
    	            ...
    
    		"offset" algorithm
    	dev1 dev2 dev3 dev4 dev5 dev6
    	==== ==== ==== ==== ==== ====
    	 A    B    C    D    E    F
    	 F    A    B    C    D    E  --> Copy of stripe0, but shifted by 1
    	 G    H    I    J    K    L
    	 L    G    H    I    J    K
    	            ...
    
    Redundancy for these algorithms is gained by shifting the copied stripes
    one device to the right.  This patch proposes that array be divided into
    sets of adjacent devices and when the stripe copies are shifted, they wrap
    on set boundaries rather than the array size boundary.  That is, for the
    purposes of shifting, the copies are confined to their sets within the
    array.  The sets are 'near_copies * far_copies' in size.
    
    The above "far" algorithm example would change to:
    	        "far" algorithm
    	dev1 dev2 dev3 dev4 dev5 dev6
    	==== ==== ==== ==== ==== ====
    	 A    B    C    D    E    F
    	 G    H    I    J    K    L
    	            ...
    	 B    A    D    C    F    E  --> Copy of stripe0, shifted 1, 2-dev sets
    	 H    G    J    I    L    K      Dev sets are 1-2, 3-4, 5-6
    	            ...
    
    This has the affect of improving the redundancy of the array.  We can
    always sustain at least one failure, but sometimes more than one can
    be handled.  In the first examples, the pairs of devices that CANNOT fail
    together are:
    	(1,2) (2,3) (3,4) (4,5) (5,6) (1, 6) [40% of possible pairs]
    In the example where the copies are confined to sets, the pairs of
    devices that cannot fail together are:
    	(1,2) (3,4) (5,6)                    [20% of possible pairs]
    
    We cannot simply replace the old algorithms, so the 17th bit of the 'layout'
    variable is used to indicate whether we use the old or new method of computing
    the shift.  (This is similar to the way the 16th bit indicates whether the
    "far" algorithm or the "offset" algorithm is being used.)
    
    This patch only handles the cases where the number of total raid disks is
    a multiple of 'far_copies'.  A follow-on patch addresses the condition where
    this is not true.
    Signed-off-by: default avatarJonathan Brassow <jbrassow@redhat.com>
    Signed-off-by: default avatarNeilBrown <neilb@suse.de>
    475901af
raid10.c 129 KB