• Sergey Senozhatsky's avatar
    zsmalloc: fix zs_can_compact() integer overflow · 44f43e99
    Sergey Senozhatsky authored
    zs_can_compact() has two race conditions in its core calculation:
    
    unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
    				zs_stat_get(class, OBJ_USED);
    
    1) classes are not locked, so the numbers of allocated and used
       objects can change by the concurrent ops happening on other CPUs
    2) shrinker invokes it from preemptible context
    
    Depending on the circumstances, thus, OBJ_ALLOCATED can become
    less than OBJ_USED, which can result in either very high or
    negative `total_scan' value calculated later in do_shrink_slab().
    
    do_shrink_slab() has some logic to prevent those cases:
    
     vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
     vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
     vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
     vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
     vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
     vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    
    However, due to the way `total_scan' is calculated, not every
    shrinker->count_objects() overflow can be spotted and handled.
    To demonstrate the latter, I added some debugging code to do_shrink_slab()
    (x86_64) and the results were:
    
     vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
     vmscan: but total_scan > 0: 92679974445502
     vmscan: resulting total_scan: 92679974445502
    [..]
     vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
     vmscan: but total_scan > 0: 22634041808232578
     vmscan: resulting total_scan: 22634041808232578
    
    Even though shrinker->count_objects() has returned an overflowed value,
    the resulting `total_scan' is positive, and, what is more worrisome, it
    is insanely huge. This value is getting used later on in
    shrinker->scan_objects() loop:
    
            while (total_scan >= batch_size ||
                   total_scan >= freeable) {
                    unsigned long ret;
                    unsigned long nr_to_scan = min(batch_size, total_scan);
    
                    shrinkctl->nr_to_scan = nr_to_scan;
                    ret = shrinker->scan_objects(shrinker, shrinkctl);
                    if (ret == SHRINK_STOP)
                            break;
                    freed += ret;
    
                    count_vm_events(SLABS_SCANNED, nr_to_scan);
                    total_scan -= nr_to_scan;
    
                    cond_resched();
            }
    
    `total_scan >= batch_size' is true for a very-very long time and
    'total_scan >= freeable' is also true for quite some time, because
    `freeable < 0' and `total_scan' is large enough, for example,
    22634041808232578. The only break condition, in the given scheme of
    things, is shrinker->scan_objects() == SHRINK_STOP test, which is a
    bit too weak to rely on, especially in heavy zsmalloc-usage scenarios.
    
    To fix the issue, take a pool stat snapshot and use it instead of
    racy zs_stat_get() calls.
    
    Link: http://lkml.kernel.org/r/20160509140052.3389-1-sergey.senozhatsky@gmail.comSigned-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: <stable@vger.kernel.org>        [4.3+]
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    44f43e99
zsmalloc.c 48.5 KB