    mm: madvise: pageout: ignore references rather than clearing young · 2864f3d0
    Barry Song authored
    While doing MADV_PAGEOUT, the current code clears the PTE young bit so
    that vmscan does not see young flags and the reclamation of madvised
    folios can go ahead.  We can get the same result by directly ignoring
    references, which removes the TLB flush in madvise and the rmap
    overhead in vmscan.
    
    Regarding the side effect: in the original code, if a parallel thread
    accesses the madvised memory while another thread is doing the madvise,
    folios get a chance to be re-activated by vmscan (though the time gap
    is actually quite small, since the PTEs are checked immediately after
    their young bits are cleared).  With this patch, they will still be
    reclaimed.  But doing PAGEOUT and accessing the same memory at the same
    time is quite silly, much like a DoS, so we probably don't need to
    care.  Ignoring a new access during that small time gap may even be
    better.
    
    For DAMON's DAMOS_PAGEOUT based on physical address regions, we keep
    the behaviour as is, since a physical address might be mapped by
    multiple processes.  MADV_PAGEOUT, based on virtual addresses, is
    actually much more aggressive about reclamation.  To leave paddr's
    DAMOS_PAGEOUT untouched, we simply pass ignore_references as false in
    reclaim_pages().
    
    The microbenchmark below shows a 6% reduction in MADV_PAGEOUT latency:
    
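     #include <sys/mman.h>

     #define PGSIZE 4096
     #define SIZE (512UL * 1024 * 1024)

     int main(void)
     {
     	size_t i;
     	/* 512 MiB of private anonymous memory */
     	volatile long *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
     			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

     	if (p == MAP_FAILED)
     		return 1;

     	/* touch one long per page so every page is faulted in */
     	for (i = 0; i < SIZE / sizeof(long); i += PGSIZE / sizeof(long))
     		p[i] = 0x11;

     	/* the call whose latency is being measured */
     	madvise((void *)p, SIZE, MADV_PAGEOUT);
     	return 0;
     }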
    
    w/o patch                    w/ patch
    root@10:~# time ./a.out      root@10:~# time ./a.out
    real    0m49.634s            real    0m46.334s
    user    0m0.637s             user    0m0.648s
    sys     0m47.434s            sys     0m44.265s
    
    Link: https://lkml.kernel.org/r/20240226005739.24350-1-21cnbao@gmail.com
    Signed-off-by: Barry Song <v-songbaohua@oppo.com>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
internal.h 39.1 KB