1. 17 Apr, 2015 6 commits
    • Sebastian Ott's avatar
      lib/dma-debug: fix bucket_find_contain() · a7a2c02a
      Sebastian Ott authored
      bucket_find_contain() will search the bucket list for a dma_debug_entry.
      When the entry isn't found it needs to search other buckets too, since
      only the start address of a dma range is hashed (which might be in a
      different bucket).
      
      A copy of the dma_debug_entry is used to get the previous hash bucket
      but when its list is searched the original dma_debug_entry is to be used
      not its modified copy.
      
      This fixes false "device driver tries to sync DMA memory it has not allocated"
      warnings.
      Signed-off-by: default avatarSebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Horia Geanta <horia.geanta@freescale.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a7a2c02a
    • Rasmus Villemoes's avatar
      lib/vsprintf.c: even faster binary to decimal conversion · 7c43d9a3
      Rasmus Villemoes authored
      The most expensive part of decimal conversion is the divisions by 10
      (albeit done using reciprocal multiplication with appropriately chosen
      constants).  I decided to see if one could eliminate around half of
      these multiplications by emitting two digits at a time, at the cost of a
      200 byte lookup table, and it does indeed seem like there is something
      to be gained, especially on 64 bits.  Microbenchmarking shows
      improvements ranging from -50% (for numbers uniformly distributed in [0,
      2^64-1]) to -25% (for numbers heavily biased toward the smaller end, a
      more realistic distribution).
      
      On a larger scale, perf shows that top, one of the big consumers of /proc
      data, uses 0.5-1.0% fewer cpu cycles.
      
      I had to jump through some hoops to get the 32 bit code to compile and run
      on my 64 bit machine, so I'm not sure how relevant these numbers are, but
      just for comparison the microbenchmark showed improvements between -30%
      and -10%.
      
      The bloat-o-meter costs are around 150 bytes (the generated code is a
      little smaller, so it's not the full 200 bytes) on both 32 and 64 bit.
      I'm aware that extra cache misses won't show up in a microbenchmark as
      used above, but on the other hand decimal conversions often happen in bulk
      (for example in the case of top).
      
      I have of course tested that the new code generates the same output as the
      old, for both the first and last 1e10 numbers in [0,2^64-1] and 4e9
      'random' numbers in-between.
      
      Test and verification code on github: https://github.com/Villemoes/dec.
      Signed-off-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Tested-by: default avatarJeff Epler <jepler@unpythonic.net>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7c43d9a3
    • Yury Norov's avatar
      lib: rename lib/find_next_bit.c to lib/find_bit.c · 840620a1
      Yury Norov authored
      This file contains implementation for all find_*_bit{,_le}
      So giving it more generic name looks reasonable.
      Signed-off-by: default avatarYury Norov <yury.norov@gmail.com>
      Reviewed-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: default avatarGeorge Spelvin <linux@horizon.com>
      Cc: Alexey Klimov <klimov.linux@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Valentin Rothberg <valentinrothberg@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      840620a1
    • Yury Norov's avatar
      lib: move find_last_bit to lib/find_next_bit.c · 8f6f19dd
      Yury Norov authored
      Currently all 'find_*_bit' family is located in lib/find_next_bit.c,
      except 'find_last_bit', which is in lib/find_last_bit.c. It seems,
      there's no major benefit to have it separated.
      Signed-off-by: default avatarYury Norov <yury.norov@gmail.com>
      Reviewed-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: default avatarGeorge Spelvin <linux@horizon.com>
      Cc: Alexey Klimov <klimov.linux@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Valentin Rothberg <valentinrothberg@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8f6f19dd
    • Yury Norov's avatar
      lib: find_*_bit reimplementation · 2c57a0e2
      Yury Norov authored
      This patchset does rework to find_bit function family to achieve better
      performance, and decrease size of text.  All rework is done in patch 1.
      Patches 2 and 3 are about code moving and renaming.
      
      It was boot-tested on x86_64 and MIPS (big-endian) machines.
      Performance tests were ran on userspace with code like this:
      
      	/* addr[] is filled from /dev/urandom */
      	start = clock();
      	while (ret < nbits)
      		ret = find_next_bit(addr, nbits, ret + 1);
      
      	end = clock();
      	printf("%ld\t", (unsigned long) end - start);
      
      On Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz measurements are: (for
      find_next_bit, nbits is 8M, for find_first_bit - 80K)
      
      	find_next_bit:		find_first_bit:
      	new	current		new	current
      	26932	43151		14777	14925
      	26947	43182		14521	15423
      	26507	43824		15053	14705
      	27329	43759		14473	14777
      	26895	43367		14847	15023
      	26990	43693		15103	15163
      	26775	43299		15067	15232
      	27282	42752		14544	15121
      	27504	43088		14644	14858
      	26761	43856		14699	15193
      	26692	43075		14781	14681
      	27137	42969		14451	15061
      	...			...
      
      find_next_bit performance gain is 35-40%;
      find_first_bit - no measurable difference.
      
      On ARM machine, there is arch-specific implementation for find_bit.
      
      Thanks a lot to George Spelvin and Rasmus Villemoes for hints and
      helpful discussions.
      
      This patch (of 3):
      
      New implementations takes less space in source file (see diffstat) and in
      object.  For me it's 710 vs 453 bytes of text.  It also shows better
      performance.
      
      find_last_bit description fixed due to obvious typo.
      
      [akpm@linux-foundation.org: include linux/bitmap.h, per Rasmus]
      Signed-off-by: default avatarYury Norov <yury.norov@gmail.com>
      Reviewed-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: default avatarGeorge Spelvin <linux@horizon.com>
      Cc: Alexey Klimov <klimov.linux@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Valentin Rothberg <valentinrothberg@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2c57a0e2
    • Richard Weinberger's avatar
      alpha: forward declare struct pt_regs in processor.h · 396ada68
      Richard Weinberger authored
      Removal of exec domains uncovered this new warning.  processor.h re-used
      struct pt_regs from personality.h which is now gone.
      
        ./arch/alpha/include/asm/processor.h:47:33: warning: 'struct pt_regs' declared inside parameter list [enabled by default]
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      396ada68
  2. 16 Apr, 2015 1 commit
  3. 15 Apr, 2015 33 commits