Commit 236222d3 authored by Paolo Abeni, committed by Ingo Molnar

x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings

According to the Intel datasheet, the REP MOVSB instruction
carries a fairly heavy setup cost (about 50 cycles), which hurts
short string copy operations.

This change avoids that cost for strings shorter than 64 bytes
by jumping to the explicit copy loop already present in the
unrolled variant.
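
In C terms, the new check behaves roughly like the sketch below.
This is only an illustration: the real implementation is the
assembly in the diff, and copy_short_string()/rep_movsb_copy() are
hypothetical stand-ins for the .L_copy_short_string label and the
'rep movsb' path.

	/* Hypothetical helpers standing in for the existing asm paths; */
	/* both return the number of bytes left uncopied. */
	unsigned long copy_short_string(void *to, const void *from, unsigned long len);
	unsigned long rep_movsb_copy(void *to, const void *from, unsigned long len);

	/* Illustrative C equivalent of the patched entry path. */
	unsigned long copy_user_enhanced_fast_string(void *to, const void *from,
						     unsigned long len)
	{
		if (len < 64)
			/* short copy: skip the costly REP MOVSB setup */
			return copy_short_string(to, from, len);

		/* bulk copy: the ERMS-optimized REP MOVSB wins here */
		return rep_movsb_copy(to, from, len);
	}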

The 64-byte cutoff is arbitrary as far as the code logic is
concerned; it was selected based on measurements, as the largest
value that still ensures a measurable gain.

Micro-benchmarks of the __copy_from_user() function with lengths
in the [0-63] range show the following performance gain (the
shorter the string, the larger the gain):

 - in the [55%-4%] range on Intel Xeon(R) CPU E5-2690 v4
 - in the [72%-9%] range on Intel Core i7-4810MQ

Other tested CPUs - namely Intel Atom S1260 and AMD Opteron
8216 - show no difference, because they do not expose the
ERMS feature bit.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4533a1d101fd460f80e21329a34928fad521c1d4.1498744345.git.pabeni@redhat.com
[ Clarified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
parent 4d8a991d
@@ -37,7 +37,7 @@ ENTRY(copy_user_generic_unrolled)
 	movl %edx,%ecx
 	andl $63,%edx
 	shrl $6,%ecx
-	jz 17f
+	jz .L_copy_short_string
 1:	movq (%rsi),%r8
 2:	movq 1*8(%rsi),%r9
 3:	movq 2*8(%rsi),%r10
@@ -58,7 +58,8 @@ ENTRY(copy_user_generic_unrolled)
 	leaq 64(%rdi),%rdi
 	decl %ecx
 	jnz 1b
-17:	movl %edx,%ecx
+.L_copy_short_string:
+	movl %edx,%ecx
 	andl $7,%edx
 	shrl $3,%ecx
 	jz 20f
@@ -174,6 +175,8 @@ EXPORT_SYMBOL(copy_user_generic_string)
  */
 ENTRY(copy_user_enhanced_fast_string)
 	ASM_STAC
+	cmpl $64,%edx
+	jb .L_copy_short_string	/* less than 64 bytes, avoid the costly 'rep' */
 	movl %edx,%ecx
 1:	rep
 	movsb