Commit c9f4cd71 authored by Huang Ying, committed by Linus Torvalds

mm, huge page: copy target sub-page last when copy huge page

Huge pages help to reduce the TLB miss rate, but they have a larger
cache footprint, which can sometimes cause problems.  For example, when
copying a huge page on an x86_64 platform, the cache footprint is 4M.
But a Xeon E5 v3 2699 CPU has 18 cores, 36 threads, and only 45M of LLC
(last level cache).  That is, on average, there is only 2.5M of LLC per
core and 1.25M per thread.

If cache contention is heavy while copying the huge page, and we copy
the huge page from beginning to end, it is possible that the beginning
of the huge page has already been evicted from the cache by the time we
finish copying its end.  And it is possible that the application will
access the beginning of the huge page right after the copy.

In commit c79b57e4 ("mm: hugetlb: clear target sub-page last when
clearing huge page"), to keep the cache lines of the target subpage
hot, the order in which clear_huge_page() clears the subpages of a huge
page was changed: the subpage furthest from the target subpage is
cleared first, and the target subpage last.  The same change of order
helps huge page copying too, and that is what this patch implements.
Because the ordering algorithm has been put into a separate function,
the implementation is quite simple.
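
As an illustration of the ordering idea only (not the kernel code
itself), the minimal userspace sketch below copies the subpage furthest
from the target first and the target subpage last.  The helper name
copy_huge_target_last() and the 4K subpage size are made up for the
example; the in-kernel ordering in process_huge_page() additionally
alternates between the two sides of the target.

#include <string.h>

#define SUBPAGE_SIZE	4096UL	/* illustrative 4K subpage */

/*
 * Copy subpages in order of decreasing distance from the target
 * subpage, so the target's cache lines are the most recently touched
 * when the copy finishes.
 */
static void copy_huge_target_last(unsigned char *dst,
				  const unsigned char *src,
				  unsigned int nr_subpages,
				  unsigned int target)
{
	unsigned int right = nr_subpages - 1 - target;
	unsigned int max_d = target > right ? target : right;
	unsigned int d;

	for (d = max_d; d > 0; d--) {
		if (target + d < nr_subpages)	/* subpage d to the right */
			memcpy(dst + (target + d) * SUBPAGE_SIZE,
			       src + (target + d) * SUBPAGE_SIZE,
			       SUBPAGE_SIZE);
		if (d <= target)		/* subpage d to the left */
			memcpy(dst + (target - d) * SUBPAGE_SIZE,
			       src + (target - d) * SUBPAGE_SIZE,
			       SUBPAGE_SIZE);
	}
	/* Finally, the target subpage itself, so it stays hot in cache. */
	memcpy(dst + target * SUBPAGE_SIZE, src + target * SUBPAGE_SIZE,
	       SUBPAGE_SIZE);
}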

The patch is a generic optimization that should benefit quite a few
workloads, not a specific use case.  To demonstrate its performance
benefit, we tested it with vm-scalability running on transparent huge
pages.

With this patch, throughput increases ~16.6% in the vm-scalability
anon-cow-seq test case with 36 processes on a 2-socket Xeon E5 v3 2699
system (36 cores, 72 threads).  The test case sets
/sys/kernel/mm/transparent_hugepage/enabled to "always", mmap()s a big
anonymous memory area and populates it, then forks 36 child processes,
each of which writes to the anonymous memory area from beginning to
end, causing copy-on-write.  For each child process, the other child
processes can be seen as other workloads generating heavy cache
pressure.  At the same time, the IPC (instructions per cycle) increased
from 0.63 to 0.78, and the time spent in user space was reduced by
~7.2%.
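
A minimal sketch of this kind of anon-cow-seq workload follows (this is
not the actual vm-scalability test code); the 1GB area size and the
values written are arbitrary choices for the example, and THP is
assumed to be set to "always" as described above.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define AREA_SIZE	(1UL << 30)	/* 1GB area, arbitrary for the example */
#define NR_CHILDREN	36

int main(void)
{
	char *area = mmap(NULL, AREA_SIZE, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int i;

	if (area == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Populate the area in the parent; with THP "always" it is backed
	 * by huge pages. */
	memset(area, 1, AREA_SIZE);

	for (i = 0; i < NR_CHILDREN; i++) {
		if (fork() == 0) {
			/*
			 * Sequential writes in each child hit pages shared
			 * copy-on-write with the parent, so every huge page
			 * is copied while the sibling processes generate
			 * heavy cache pressure.
			 */
			memset(area, i + 2, AREA_SIZE);
			_exit(0);
		}
	}
	for (i = 0; i < NR_CHILDREN; i++)
		wait(NULL);
	return 0;
}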

Link: http://lkml.kernel.org/r/20180524005851.4079-3-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shaohua Li <shli@fb.com>
Cc: Christopher Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent c6ddfb6c
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2752,7 +2752,8 @@ extern void clear_huge_page(struct page *page,
 			    unsigned long addr_hint,
 			    unsigned int pages_per_huge_page);
 extern void copy_user_huge_page(struct page *dst, struct page *src,
-				unsigned long addr, struct vm_area_struct *vma,
+				unsigned long addr_hint,
+				struct vm_area_struct *vma,
 				unsigned int pages_per_huge_page);
 extern long copy_huge_page_from_user(struct page *dst_page,
 					const void __user *usr_src,
...
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1328,7 +1328,8 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 	if (!page)
 		clear_huge_page(new_page, vmf->address, HPAGE_PMD_NR);
 	else
-		copy_user_huge_page(new_page, page, haddr, vma, HPAGE_PMD_NR);
+		copy_user_huge_page(new_page, page, vmf->address,
+				    vma, HPAGE_PMD_NR);
 	__SetPageUptodate(new_page);
 
 	mmun_start = haddr;
...
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4705,11 +4705,31 @@ static void copy_user_gigantic_page(struct page *dst, struct page *src,
 	}
 }
 
+struct copy_subpage_arg {
+	struct page *dst;
+	struct page *src;
+	struct vm_area_struct *vma;
+};
+
+static void copy_subpage(unsigned long addr, int idx, void *arg)
+{
+	struct copy_subpage_arg *copy_arg = arg;
+
+	copy_user_highpage(copy_arg->dst + idx, copy_arg->src + idx,
+			   addr, copy_arg->vma);
+}
+
 void copy_user_huge_page(struct page *dst, struct page *src,
-			 unsigned long addr, struct vm_area_struct *vma,
+			 unsigned long addr_hint, struct vm_area_struct *vma,
 			 unsigned int pages_per_huge_page)
 {
-	int i;
+	unsigned long addr = addr_hint &
+		~(((unsigned long)pages_per_huge_page << PAGE_SHIFT) - 1);
+	struct copy_subpage_arg arg = {
+		.dst = dst,
+		.src = src,
+		.vma = vma,
+	};
 
 	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
 		copy_user_gigantic_page(dst, src, addr, vma,
@@ -4717,11 +4737,7 @@ void copy_user_huge_page(struct page *dst, struct page *src,
 		return;
 	}
 
-	might_sleep();
-	for (i = 0; i < pages_per_huge_page; i++) {
-		cond_resched();
-		copy_user_highpage(dst + i, src + i, addr + i*PAGE_SIZE, vma);
-	}
+	process_huge_page(addr_hint, pages_per_huge_page, copy_subpage, &arg);
 }
 
 long copy_huge_page_from_user(struct page *dst_page,
...
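
For context, copy_subpage() in the hunk above is a callback for the
process_huge_page() helper added by the parent patch c6ddfb6c; its
interface, inferred here from the call site rather than quoted from
that patch, looks roughly like:

/*
 * Rough interface sketch, inferred from the call site above: the helper
 * takes the faulting address hint, the number of subpages, and a
 * per-subpage callback plus opaque argument, and invokes the callback
 * on each subpage in the "target last" order.
 */
static void process_huge_page(unsigned long addr_hint,
			      unsigned int pages_per_huge_page,
			      void (*process_subpage)(unsigned long addr,
						      int idx, void *arg),
			      void *arg);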