Commit fb3d1c08 authored by Christian Borntraeger's avatar Christian Borntraeger Committed by Martin Schwidefsky

s390: let the compiler do page clearing

The hardware folks told me that for page clearing "when you exactly
know what to do, hand written xc+pfd is usally faster then mvcl for
page clearing, as it saves millicode overhead and parameter parsing
and checking" as long as you dont need the cache bypassing.
Turns out that gcc already does a proper xc,pfd loop.

A small test on z196 that does

buff = mmap(NULL, bufsize,PROT_EXEC|PROT_WRITE|PROT_READ,AP_PRIVATE| MAP_ANONYMOUS,0,0);
for ( i = 0; i < bufsize; i+= 256)
    buff[i] = 0x5;

gets 20% faster (touches every cache line of a page)

and

buff = mmap(NULL, bufsize,PROT_EXEC|PROT_WRITE|PROT_READ,AP_PRIVATE| MAP_ANONYMOUS,0,0);
for ( i = 0; i < bufsize; i+= 4096)
    buff[i] = 0x5;

is within noise ratio (touches one cache line of a page).

As the clear_page is usually called for first memory accesses
we can assume that at least one cache line is used afterwards,
so this change should be always better.
Another benchmark, a make -j 40 of my testsuite in tmpfs with
hot caches on a 32cpu system:

 -- unpatched --       --  patched  --
real     0m1.017s     real     0m0.994s   (~2% faster, but in noise)
user     0m5.339s     user     0m5.016s   (~6% faster)
sys      0m0.691s     sys      0m0.632s   (~8% faster)

Let use the same define to memset as the asm-generic variant
Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
parent f0483044
...@@ -37,16 +37,7 @@ static inline void storage_key_init_range(unsigned long start, unsigned long end ...@@ -37,16 +37,7 @@ static inline void storage_key_init_range(unsigned long start, unsigned long end
#endif #endif
} }
static inline void clear_page(void *page) #define clear_page(page) memset((page), 0, PAGE_SIZE)
{
register unsigned long reg1 asm ("1") = 0;
register void *reg2 asm ("2") = page;
register unsigned long reg3 asm ("3") = 4096;
asm volatile(
" mvcl 2,0"
: "+d" (reg2), "+d" (reg3) : "d" (reg1)
: "memory", "cc");
}
/* /*
* copy_page uses the mvcl instruction with 0xb0 padding byte in order to * copy_page uses the mvcl instruction with 0xb0 padding byte in order to
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment