    mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges · 869712fd
    Johannes Weiner authored
    While upgrading from 4.16 to 5.2, we noticed these allocation errors in
    the log of the new kernel:
    
      SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
        cache: tw_sock_TCPv6(960:helper-logs), object size: 232, buffer size: 240, default order: 1, min order: 0
        node 0: slabs: 5, objs: 170, free: 0
    
            slab_out_of_memory+1
            ___slab_alloc+969
            __slab_alloc+14
            kmem_cache_alloc+346
            inet_twsk_alloc+60
            tcp_time_wait+46
            tcp_fin+206
            tcp_data_queue+2034
            tcp_rcv_state_process+784
            tcp_v6_do_rcv+405
            __release_sock+118
            tcp_close+385
            inet_release+46
            __sock_release+55
            sock_close+17
            __fput+170
            task_work_run+127
            exit_to_usermode_loop+191
            do_syscall_64+212
            entry_SYSCALL_64_after_hwframe+68
    
    accompanied by an increase in machines going completely radio silent
    under memory pressure.
    
    One thing that changed since 4.16 is e699e2c6 ("net, mm: account
    sock objects to kmemcg"), which made these slab caches subject to cgroup
    memory accounting and control.
    
    The problem with that is that cgroups, unlike the page allocator, do not
    maintain dedicated atomic reserves.  As a cgroup's usage hovers at its
    limit, atomic allocations - such as those done during network rx - can fail
    consistently for extended periods of time.  The kernel is not able to
    operate under these conditions.
    
    We don't want to revert the culprit patch, because it indeed tracks a
    potentially substantial amount of memory used by a cgroup.
    
    We also don't want to implement dedicated atomic reserves for cgroups.
    There is no point in keeping a fixed margin of unused bytes in the
    cgroup's memory budget to accommodate a consumer that is impossible to
    predict - we'd be wasting memory and get into configuration headaches,
    not unlike what we have going with min_free_kbytes.  We do this for
    physical mem because we have to, but cgroups are an accounting game.
    
    Instead, account these privileged allocations to the cgroup, but let
    them bypass the configured limit if they have to.  This way, we get the
    benefits of accounting the consumed memory and have it exert pressure on
    the rest of the cgroup, but like with the page allocator, we shift the
    burden of reclaiming on behalf of atomic allocations onto the regular
    allocations that can block.
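
    As a rough illustration of the policy described above, here is a
    minimal user-space sketch.  It is not the kernel code from this patch:
    struct toy_cgroup, toy_try_charge() and the allow_atomic_bypass switch
    are hypothetical names invented for this model, and "reclaim" is
    reduced to decrementing a counter.  It only shows that an atomic charge
    at the limit fails without the bypass, succeeds (and stays accounted)
    with it, and that the next blocking charge reclaims the overage.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a cgroup memory counter; not kernel code. */
struct toy_cgroup {
    long usage;        /* pages currently charged */
    long limit;        /* configured limit, in pages */
    long reclaimable;  /* charged pages that a blocking caller could free */
};

/*
 * Try to charge nr_pages to the cgroup.
 *
 * atomic:              the caller cannot block, so it cannot reclaim.
 * allow_atomic_bypass: models the behavior proposed in this commit; when
 *                      false, atomic charges fail at the limit.
 */
static bool toy_try_charge(struct toy_cgroup *cg, long nr_pages,
                           bool atomic, bool allow_atomic_bypass)
{
    assert(nr_pages > 0);

    /* Fast path: the charge fits under the limit. */
    if (cg->usage + nr_pages <= cg->limit) {
        cg->usage += nr_pages;
        return true;
    }

    if (atomic) {
        if (!allow_atomic_bypass)
            return false;       /* fails for as long as usage sits at the limit */
        cg->usage += nr_pages;  /* still accounted, but the limit is exceeded */
        return true;
    }

    /*
     * Blocking caller: reclaim the whole excess, which includes any
     * overage left behind by earlier atomic charges.
     */
    long excess = cg->usage + nr_pages - cg->limit;
    long freed = excess < cg->reclaimable ? excess : cg->reclaimable;

    cg->reclaimable -= freed;
    cg->usage -= freed;

    if (cg->usage + nr_pages > cg->limit)
        return false;           /* could not reclaim enough */

    cg->usage += nr_pages;
    return true;
}
```

    In this model the atomic caller never pays for reclaim; the cost shows
    up later as extra reclaim work for the first allocation that is allowed
    to block.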
    
    Link: http://lkml.kernel.org/r/20191022233708.365764-1-hannes@cmpxchg.org
    Fixes: e699e2c6 ("net, mm: account sock objects to kmemcg")
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Cc: Suleiman Souhlal <suleiman@google.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: <stable@vger.kernel.org>	[4.18+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>