• Coly Li's avatar
    bcache: set error_limit correctly · 7ba0d830
    Coly Li authored
    Struct cache uses io_errors for two purposes,
    - Error decay: when cache set error_decay is set, io_errors is used to
      generate a small piece of delay when I/O error happens.
    - I/O errors counter: in order to generate big enough value for error
      decay, I/O errors counter value is stored by left shifting 20 bits (a.k.a
      IO_ERROR_SHIFT).
    
    In function bch_count_io_errors(), if I/O errors counter reaches cache set
    error limit, bch_cache_set_error() will be called to retire the whold cache
    set. But current code is problematic when checking the error limit, see the
    following code piece from bch_count_io_errors(),
    
     90     if (error) {
     91             char buf[BDEVNAME_SIZE];
     92             unsigned errors = atomic_add_return(1 << IO_ERROR_SHIFT,
     93                                                 &ca->io_errors);
     94             errors >>= IO_ERROR_SHIFT;
     95
     96             if (errors < ca->set->error_limit)
     97                     pr_err("%s: IO error on %s, recovering",
     98                            bdevname(ca->bdev, buf), m);
     99             else
    100                     bch_cache_set_error(ca->set,
    101                                         "%s: too many IO errors %s",
    102                                         bdevname(ca->bdev, buf), m);
    103     }
    
    At line 94, errors is right shifting IO_ERROR_SHIFT bits, now it is real
    errors counter to compare at line 96. But ca->set->error_limit is initia-
    lized with an amplified value in bch_cache_set_alloc(),
    1545         c->error_limit  = 8 << IO_ERROR_SHIFT;
    
    It means by default, in bch_count_io_errors(), before 8<<20 errors happened
    bch_cache_set_error() won't be called to retire the problematic cache
    device. If the average request size is 64KB, it means bcache won't handle
    failed device until 512GB data is requested. This is too large to be an I/O
    threashold. So I believe the correct error limit should be much less.
    
    This patch sets default cache set error limit to 8, then in
    bch_count_io_errors() when errors counter reaches 8 (if it is default
    value), function bch_cache_set_error() will be called to retire the whole
    cache set. This patch also removes bits shifting when store or show
    io_error_limit value via sysfs interface.
    
    Nowadays most of SSDs handle internal flash failure automatically by LBA
    address re-indirect mapping. If an I/O error can be observed by upper layer
    code, it will be a notable error because that SSD can not re-indirect
    map the problematic LBA address to an available flash block. This situation
    indicates the whole SSD will be failed very soon. Therefore setting 8 as
    the default io error limit value makes sense, it is enough for most of
    cache devices.
    
    Changelog:
    v2: add reviewed-by from Hannes.
    v1: initial version for review.
    Signed-off-by: default avatarColy Li <colyli@suse.de>
    Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
    Reviewed-by: default avatarTang Junhui <tang.junhui@zte.com.cn>
    Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
    Cc: Junhui Tang <tang.junhui@zte.com.cn>
    Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    7ba0d830
super.c 51.1 KB