    mm, proc: account for shmem swap in /proc/pid/smaps · c261e7d9
    Author: Vlastimil Babka <vbabka@suse.cz>

    Currently, /proc/pid/smaps always shows "Swap: 0 kB" for
    shmem-backed mappings, even if the mapped portion does contain pages
    that were swapped out.  This is because, unlike with private anonymous
    mappings, shmem does not replace the pte with a swap entry when
    swapping the page out; it clears the pte to pte_none instead.  In the
    smaps page walk, such a page thus looks like it was never faulted in.
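As an illustration of where this shows up, here is a user-space sketch that sums the Swap: field of every mapping whose smaps header line contains a given string (e.g. "/dev/shm/"). `sum_swap_kb()` is a hypothetical helper, not an existing API. It assumes the usual smaps layout: a header line starting with the hexadecimal "start-end" address range, followed by "Key: value kB" lines. Without this patch, it reports 0 for shmem-backed mappings whose pages were swapped out (swapped out COWed private copies aside).

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Hypothetical helper: sum the "Swap:" field (in kB) over all mappings in
 * smaps-formatted text whose header line contains `needle`, or over every
 * mapping if needle is NULL. Headers are recognised only by their leading
 * "start-end" hex address range. */
static unsigned long sum_swap_kb(const char *text, const char *needle)
{
    char line[4352];
    unsigned long total = 0;
    int match = 0;

    assert(text != NULL);
    while (*text) {
        const char *nl = strchr(text, '\n');
        size_t len = nl ? (size_t)(nl - text) : strlen(text);
        if (len >= sizeof(line))
            len = sizeof(line) - 1;      /* truncate overly long lines */
        memcpy(line, text, len);
        line[len] = '\0';

        unsigned long start, end, kb;
        if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
            /* New mapping header: decide whether its fields are counted. */
            match = (needle == NULL) || (strstr(line, needle) != NULL);
        } else if (match && strncmp(line, "Swap:", 5) == 0 &&
                   sscanf(line + 5, "%lu", &kb) == 1) {
            /* "SwapPss:" is not matched: its 5th character is 'P'. */
            total += kb;
        }
        text = nl ? nl + 1 : text + strlen(text);
    }
    return total;
}
```

In practice the text would come from reading /proc/<pid>/smaps into a buffer.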
    
    This patch changes smaps_pte_entry() to determine the swap status of
    such pte_none entries for shmem mappings, similarly to how
    mincore_page() does it.  Swapped out shmem pages are thus accounted for.
    For private mappings of tmpfs files that COWed some of the pages, the
    swapped out status of the original shmem pages is naturally ignored.  If
    some of the private copies were also swapped out, they are accounted via
    their page table swap entries, so the reported swap usage is the sum of
    the swapped out private copies and the swapped out shmem pages that were
    not COWed.  No double accounting can thus happen.
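The accounting rule above can be modelled in a few lines of user-space C. This is not the kernel code. `enum pte_kind`, `model_smaps_swap_kb()` and the `shmem_swapped[]` array are hypothetical, and 4 kB pages are assumed. `shmem_swapped[]` stands in for looking up each page index in the shmem file's page cache and finding a swap entry there. Each page index is examined once, through its pte, which is why no page can be counted twice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical model of the state of one pte in a VMA. */
enum pte_kind { PTE_NONE, PTE_PRESENT, PTE_SWAP };

/* Returns swap usage in kB, assuming 4 kB pages.
 * shmem_swapped[i] models the page-cache lookup for page index i of the
 * backing shmem file: true if that slot holds a swap entry.
 * is_shmem models "the VMA is backed by a shmem/tmpfs file". */
static unsigned long model_smaps_swap_kb(const enum pte_kind *ptes,
                                         const bool *shmem_swapped,
                                         size_t n, bool is_shmem)
{
    unsigned long kb = 0;

    assert(n == 0 || (ptes != NULL && shmem_swapped != NULL));
    for (size_t i = 0; i < n; i++) {
        switch (ptes[i]) {
        case PTE_SWAP:
            /* A private (e.g. COWed) copy was swapped out. */
            kb += 4;
            break;
        case PTE_NONE:
            /* Either never faulted in, or a swapped out shmem page:
             * only the page-cache lookup can tell them apart. */
            if (is_shmem && shmem_swapped[i])
                kb += 4;
            break;
        case PTE_PRESENT:
            /* Resident; the shmem original's status is irrelevant. */
            break;
        }
    }
    return kb;
}
```

Consider ptes {none, swap, present, none} with the shmem pages at indices 0-2 swapped out. The shmem case counts index 0 through the lookup and index 1 through its swap pte, i.e. 8 kB. Index 2 is a resident COWed copy, so its swapped out shmem original is ignored.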
    
    The accounting is arguably still not as precise as for private anonymous
    mappings, since we now also count pages that the process in question
    never accessed, but that another process populated and then let become
    swapped out.  I believe this is still less confusing and subtle than not
    showing any swap usage for shmem mappings at all.  The swapped out
    counter might be of interest to users who would like to prevent future
    swapins during a performance critical operation and instead pre-fault
    the pages at their convenience.  Especially for larger swapped out
    regions, the cost of swapin is much higher than that of a fresh page
    allocation.  So distinguishing pte_none from swapped out entries is
    important for those use cases.
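Here is a sketch of the pre-faulting use case described above. It assumes the caller already knows which range it wants resident. Reading one byte per page faults every page in, swapping it in if needed. mincore(2), whose kernel implementation also contains mincore_page(), can then confirm residency. `prefault_range()` and `resident_pages()` are hypothetical names. Pages may of course be swapped out again later under memory pressure.

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <unistd.h>
#include <stdlib.h>
#include <stddef.h>
#include <assert.h>

/* Touch one byte per page so that every page in [addr, addr+len) is
 * faulted in (and swapped in, if it was swapped out) ahead of time. */
static void prefault_range(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    volatile const unsigned char *p = addr;

    assert(page > 0);
    for (size_t off = 0; off < len; off += (size_t)page)
        (void)p[off];               /* volatile read forces the access */
}

/* Count the pages that mincore(2) reports as resident; -1 on error. */
static long resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages ? npages : 1);
    long count = 0;

    if (!vec)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;        /* bit 0 = page is resident */
    free(vec);
    return count;
}
```

A MAP_SHARED|MAP_ANONYMOUS mapping is shmem-backed, so it is a convenient way to try this on the kind of mapping this patch is about.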
    
    One downside of this patch is that it makes /proc/pid/smaps more
    expensive for shmem mappings, as we consult the radix tree for each
    pte_none entry, so the overall complexity is O(n*log(n)).  I have
    measured this on a process that creates a 2GB mapping and dirties single
    pages with a stride of 2MB, and timed how long it takes to cat
    /proc/pid/smaps of this process 100 times.
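The workload can be approximated with the following sketch, which rests on three assumptions. The shmem case uses a MAP_SHARED|MAP_ANONYMOUS mapping, which is also shmem-backed, rather than a file in /dev/shm. The process reads its own /proc/self/smaps instead of running cat from outside. The helper names are hypothetical. Absolute timings will differ from the numbers below.

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <stddef.h>
#include <assert.h>

/* Map `size` bytes (shared => shmem-backed, otherwise private anonymous)
 * and dirty one byte every `stride` bytes. Returns MAP_FAILED on error. */
static void *make_sparse_mapping(size_t size, size_t stride, int shared)
{
    int flags = (shared ? MAP_SHARED : MAP_PRIVATE) | MAP_ANONYMOUS;
    unsigned char *p;

    assert(stride > 0);
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (p == MAP_FAILED)
        return MAP_FAILED;
    for (size_t off = 0; off < size; off += stride)
        p[off] = 1;                 /* dirty a single page per stride */
    return p;
}

/* Read /proc/self/smaps `iterations` times. Returns the total number of
 * bytes read (or -1 on error) and stores the elapsed time in *secs. */
static long read_smaps_n(int iterations, double *secs)
{
    static char buf[65536];
    long total = 0;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        int fd = open("/proc/self/smaps", O_RDONLY);
        ssize_t n;

        if (fd < 0)
            return -1;
        while ((n = read(fd, buf, sizeof(buf))) > 0)
            total += n;
        close(fd);
        if (n < 0)
            return -1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *secs = (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return total;
}
```

A run close to the described one would call make_sparse_mapping(2UL << 30, 2UL << 20, 1) and then read_smaps_n(100, &secs). Passing 0 for `shared` gives the private anonymous variant.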
    
    Private anonymous mapping:
    
    real    0m0.949s
    user    0m0.116s
    sys     0m0.348s
    
    Mapping of a /dev/shm/file:
    
    real    0m3.831s
    user    0m0.180s
    sys     0m3.212s
    
    The difference is rather substantial, so the next patch will reduce the
    cost for shared or read-only mappings.
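Expressed as ratios of the timings above, the shmem mapping took:

```latex
\frac{3.831\,\mathrm{s}}{0.949\,\mathrm{s}} \approx 4.04 \ (\text{real}),
\qquad
\frac{3.212\,\mathrm{s}}{0.348\,\mathrm{s}} \approx 9.23 \ (\text{sys})
```

That is roughly 38 ms per read of /proc/pid/smaps instead of roughly 9.5 ms.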
    
    In a less controlled experiment, I gathered the pids of processes on my
    desktop that have either '/dev/shm/*' or 'SYSV*' in smaps.  This
    included the Chrome browser and some KDE processes.  Again, I ran cat
    /proc/pid/smaps on each of them 100 times.
    
    Before this patch:
    
    real    0m9.050s
    user    0m0.518s
    sys     0m8.066s
    
    After this patch:
    
    real    0m9.221s
    user    0m0.541s
    sys     0m8.187s
    
    This suggests low impact on average systems.
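In relative terms, the desktop experiment's increase is:

```latex
\frac{9.221 - 9.050}{9.050} \approx 1.9\% \ (\text{real}),
\qquad
\frac{8.187 - 8.066}{8.066} \approx 1.5\% \ (\text{sys})
```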
    
    Note that this patch doesn't attempt to adjust the SwapPss field for
    shmem mappings, which would need extra work to determine who else could
    have the pages mapped.  Thus the value stays zero except for COWed
    swapped out pages in a shmem mapping, which are accounted as usual.

    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Acked-by: Jerome Marchand <jmarchan@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Hugh Dickins <hughd@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>