    blk-throttle: make bandwidth change smooth · 7394e31f
    Shaohua Li authored
    When all cgroups reach their low limit, cgroups can dispatch more IO.
    This could let some cgroups dispatch more IO while others cannot, and
    some cgroups could even dispatch less IO than their low limit. For
    example, cg1 has a low limit of 10MB/s, cg2 has a low limit of 80MB/s,
    and assume the disk's maximum bandwidth is 120MB/s for the workload.
    Their bps could be something like this:
    
    cg1/cg2 bps: T1: 10/80 -> T2: 60/60 -> T3: 10/80
    
    At T1, all cgroups reach their low limit, so they can dispatch more IO
    later. Then cg1 dispatches more IO and cg2 has no room to dispatch
    enough IO. At T2, cg2 only dispatches 60MB/s. Since we detect that cg2
    dispatches less IO than its low limit of 80MB/s, we downgrade the queue
    from LIMIT_MAX to LIMIT_LOW, and then all cgroups are throttled to
    their low limit (T3). cg2 will have bandwidth below its low limit most
    of the time.
    
    The big problem here is that we don't know the maximum bandwidth of the
    workload, so we can't make a smart decision to avoid the situation.
    This patch makes cgroup bandwidth changes smooth. After the disk
    upgrades from LIMIT_LOW to LIMIT_MAX, we don't allow cgroups to use all
    bandwidth up to their max limit immediately. Their bandwidth limit is
    increased gradually to avoid the above situation. So the above example
    will become something like:
    
    cg1/cg2 bps: 10/80 -> 15/105 -> 20/100 -> 25/95 -> 30/90 -> 35/85 -> 40/80
    -> 45/75 -> 22/98
    
    In this way cgroups' bandwidth will be above their low limit most of
    the time. This still doesn't fully utilize disk bandwidth, but that's
    the price we pay for sharing.
    
    Scale up is linear. The limit scales up by 1/2 of the .low limit every
    throtl_slice after upgrade. The scale up stops if the adjusted limit
    hits the .max limit. Scale down is exponential. We cut the scale value
    in half if a cgroup doesn't hit its .low limit. If the scale becomes 0,
    we then fully downgrade the queue to the LIMIT_LOW state.
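
    The scaling rule can be sketched as a small user-space model. This is
    an illustration only, not the code in blk-throttle.c: the struct and
    function names (scale_model, adjusted_limit, scale_up, scale_down) are
    hypothetical, and time is reduced to a counter of elapsed throtl_slice
    periods.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the scaling rule. The names here are
 * illustrative and are not the kernel's.
 */
struct scale_model {
	unsigned int scale;	/* throtl_slice periods credited since upgrade */
};

/* Limit in effect while upgraded: low + scale * (low / 2), capped at max. */
static uint64_t adjusted_limit(uint64_t low, uint64_t max, unsigned int scale)
{
	uint64_t adjusted;

	assert(low <= max);
	adjusted = low + (low / 2) * scale;
	return adjusted < max ? adjusted : max;
}

/* Linear scale up: one more step for each throtl_slice that has elapsed. */
static void scale_up(struct scale_model *m)
{
	m->scale++;
}

/*
 * Exponential scale down: halve the scale when a cgroup misses its .low
 * limit. Returns 1 when the scale reaches 0, meaning the queue should
 * fully downgrade to LIMIT_LOW.
 */
static int scale_down(struct scale_model *m)
{
	m->scale /= 2;
	return m->scale == 0;
}
```

    With a .low limit of 10, successive scale_up() steps give adjusted
    limits of 15, 20, 25 and so on, up to the .max cap. Starting from a
    scale of 7, three scale_down() calls take the scale to 3, 1 and then
    0, at which point the model reports a full downgrade.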
    
    Note this doesn't completely prevent a cgroup from running under its
    low limit. The best way to guarantee a cgroup doesn't run under its
    low limit is to set a max limit. For example, if we set cg1's max
    limit to 40, cg2 will never run under its low limit.
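
    The arithmetic behind that example can be checked with a tiny sketch.
    It assumes only two cgroups share the disk and that the disk really
    delivers the 120MB/s from the earlier example. The helper names are
    hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bandwidth left for one cgroup when its only peer
 * is capped at peer_max, out of disk_bw total (all values in MB/s).
 */
static uint64_t worst_case_share(uint64_t disk_bw, uint64_t peer_max)
{
	assert(disk_bw > 0);
	return peer_max >= disk_bw ? 0 : disk_bw - peer_max;
}

/* Returns 1 if the cgroup can always reach its low limit under that cap. */
static int low_limit_guaranteed(uint64_t disk_bw, uint64_t peer_max,
				uint64_t low)
{
	return worst_case_share(disk_bw, peer_max) >= low;
}
```

    With disk_bw = 120 and cg1 capped at 40, cg2 is left with at least
    80, which meets its low limit of 80. With cg1 allowed up to 60, as at
    T2 in the first example, cg2 can be left with 60 and falls below it.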
    Signed-off-by: Shaohua Li <shli@fb.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
blk-throttle.c 56.7 KB