    commit a792a27c
    Author: Andrew Morton

    [PATCH] faster copy_*_user for bad alignments on intel ia32

    This patch speeds up copy_*_user for some Intel ia32 processors.  It is
    based on work by Mala Anand.
    
    It is a good win: around 30% for all src/dest alignments except 32/32.
    
    In this test a fully-cached one gigabyte file was read into an
    8192-byte userspace buffer using read(fd, buf, 8192).  The alignment of
    the user-side buffer was altered between runs.  This is a PIII.  Times
    are in seconds.
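
    A minimal sketch of that harness (hypothetical: the filename, the
    timing source and the offset handling are my assumptions, not the
    original test code):

        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>
        #include <unistd.h>

        #define BUFSIZE 8192

        int main(int argc, char **argv)
        {
                static char raw[BUFSIZE + 64];
                long offset = argc > 2 ? atol(argv[2]) : 0;  /* 0..32 */
                char *buf = raw + offset;  /* deliberately misalign the buffer */
                int fd = open(argc > 1 ? argv[1] : "bigfile", O_RDONLY);
                struct timespec t0, t1;

                if (fd < 0) {
                        perror("open");
                        return 1;
                }
                clock_gettime(CLOCK_MONOTONIC, &t0);
                while (read(fd, buf, BUFSIZE) > 0)  /* file must be fully cached */
                        ;
                clock_gettime(CLOCK_MONOTONIC, &t1);
                printf("buf=%p  %.3f sec\n", (void *)buf,
                       (t1.tv_sec - t0.tv_sec) +
                       (t1.tv_nsec - t0.tv_nsec) / 1e9);
                close(fd);
                return 0;
        }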
    
    User buffer	2.5.41		2.5.41+patch
    
    0x804c000	4.373		4.343
    0x804c001	10.024		6.401
    0x804c002	10.002		6.347
    0x804c003	10.013		6.328
    0x804c004	10.105		6.273
    0x804c005	10.184		6.323
    0x804c006	10.179		6.322
    0x804c007	10.185		6.319
    0x804c008	9.725		6.347
    0x804c009	9.780		6.275
    0x804c00a	9.779		6.355
    0x804c00b	9.778		6.350
    0x804c00c	9.723		6.351
    0x804c00d	9.790		6.307
    0x804c00e	9.790		6.289
    0x804c00f	9.785		6.294
    0x804c010	9.727		6.277
    0x804c011	9.779		6.251
    0x804c012	9.783		6.246
    0x804c013	9.786		6.245
    0x804c014	9.772		6.063
    0x804c015	9.919		6.237
    0x804c016	9.920		6.234
    0x804c017	9.918		6.237
    0x804c018	9.846		6.372
    0x804c019	10.060		6.294
    0x804c01a	10.049		6.328
    0x804c01b	10.041		6.337
    0x804c01c	9.931		6.347
    0x804c01d	10.013		6.273
    0x804c01e	10.020		6.346
    0x804c01f	10.016		6.356
    0x804c020	4.442		4.366
    
    So `rep;movsl' is slower at all non-cache-aligned offsets.
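
    For reference, the two inner-loop styles being compared look roughly
    like this (an illustrative sketch, not the patch's actual code; the
    function names are made up, and the asm is ia32-only):

        /* Old style: hand the whole copy to one rep;movsl. */
        static inline void copy_rep_movsl(void *to, const void *from, int dwords)
        {
                int d0, d1, d2;

                __asm__ __volatile__(
                        "rep ; movsl"
                        : "=&c" (d0), "=&D" (d1), "=&S" (d2)
                        : "0" (dwords), "1" ((long)to), "2" ((long)from)
                        : "memory");
        }

        /* New style: explicit word loads and stores, which the PII/PIII
           cores run faster than rep;movsl on misaligned buffers. */
        static inline void copy_unrolled(void *to, const void *from, int dwords)
        {
                unsigned int *d = to;
                const unsigned int *s = from;

                for (; dwords >= 8; d += 8, s += 8, dwords -= 8) {
                        d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
                        d[4] = s[4]; d[5] = s[5]; d[6] = s[6]; d[7] = s[7];
                }
                while (dwords--)
                        *d++ = *s++;
        }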
    
    The PII uses the PIII alignment code.  I don't have a PII any more,
    but I do recall that it demonstrated the same behaviour as the PIII.
    
    The patch contains an enhancement (based on careful testing) from
    Hirokazu Takahashi <taka@valinux.co.jp>.  In cases where source and
    dest have the same alignment, but that alignment is poor, we do a short
    copy of a few bytes to bring the two pointers onto a favourable
    boundary and then do the big copy.
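
    The shape of that fixup is roughly this (a sketch: the 8-byte target
    boundary and the function name are assumptions, not taken from the
    patch):

        #include <stddef.h>
        #include <string.h>

        void copy_with_prealign(void *dst, const void *src, size_t n)
        {
                unsigned char *d = dst;
                const unsigned char *s = src;

                /* Only worthwhile when both pointers are misaligned the
                   same way, so one short byte copy can fix both at once. */
                if (((unsigned long)d & 7) &&
                    ((unsigned long)d & 7) == ((unsigned long)s & 7)) {
                        size_t head = (0 - (unsigned long)d) & 7;

                        if (head > n)
                                head = n;
                        n -= head;
                        while (head--)
                                *d++ = *s++;
                }
                memcpy(d, s, n);  /* stands in for the big aligned copy */
        }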
    
    The patch also includes a bugfix from Hirokazu Takahashi.
    
    As an added bonus, this patch decreases the kernel text by 28 kbytes.
    22k of this is in .text and the rest in __ex_table.  I'm not really
    sure why .text shrank so much.
    
    These copy routines have no special-case for constant-sized copies.  So
    a lot of uaccess.h becomes dead code with this patch.  The next patch
    which uninlines the copy_*_user functions cleans all that up and saves
    an additional 5k.
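
    For context, the constant-size dispatch that goes dead looks roughly
    like this (the historical i386 uaccess.h shape, reproduced from
    memory rather than quoted):

        unsigned long __constant_copy_to_user(void *to, const void *from,
                                              unsigned long n);
        unsigned long __generic_copy_to_user(void *to, const void *from,
                                             unsigned long n);

        /* Once the copy loop no longer exploits a compile-time size,
           the __constant_ arm is never worth taking. */
        #define copy_to_user(to, from, n)                           \
                (__builtin_constant_p(n) ?                          \
                 __constant_copy_to_user((to), (from), (n)) :       \
                 __generic_copy_to_user((to), (from), (n)))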