1. 01 Mar, 2010 8 commits
  2. 26 Feb, 2010 4 commits
    • Luca Barbieri's avatar
      x86-32: Rewrite 32-bit atomic64 functions in assembly · a7e926ab
      Luca Barbieri authored
      This patch replaces atomic64_32.c with two assembly implementations,
      one for 386/486 machines using pushf/cli/popf and one for 586+ machines
      using cmpxchg8b.
      
      The cmpxchg8b implementation provides the following advantages over the
      current one:
      
      1. Implements atomic64_add_unless, atomic64_dec_if_positive and
         atomic64_inc_not_zero
      
      2. Uses the ZF flag changed by cmpxchg8b instead of doing a comparison
      
      3. Uses custom register calling conventions that reduce or eliminate
         register moves to suit cmpxchg8b
      
      4. Reads the initial value instead of using cmpxchg8b to do that.
         Currently we use lock xaddl and movl, which seems the fastest.
      
      5. Does not use the lock prefix for atomic64_set
         64-bit writes are already atomic, so we don't need that.
         We still need it for atomic64_read to avoid restoring a value
         changed in the meantime.
      
      6. Allocates registers as well or better than gcc
      
      The 386 implementation provides support for 386 and 486 machines.
      386/486 SMP is not supported (we dropped it), but such support can be
      added easily if desired.
      
      A pure assembly implementation is required due to the custom calling
      conventions, and desire to use %ebp in atomic64_add_return (we need
      7 registers...), as well as the ability to use pushf/popf in the 386
      code without an intermediate pop/push.
      
      The parameter names are changed to match the convention in atomic_64.h
      
      Changes in v3 (due to rebasing to tip/x86/asm):
      - Patches atomic64_32.h instead of atomic_32.h
      - Uses the CALL alternative mechanism from commit
        1b1d9258
      
      Changes in v2:
      - Merged 386 and cx8 support in the same patch
      - 386 support now done in assembly, C code no longer used at all
      - cmpxchg64 is used for atomic64_cmpxchg
      - stop using macros, use one-line inline functions instead
      - miscellanous changes and improvements
      Signed-off-by: default avatarLuca Barbieri <luca@luca-barbieri.com>
      LKML-Reference: <1267005265-27958-5-git-send-email-luca@luca-barbieri.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      a7e926ab
    • Luca Barbieri's avatar
      lib: Add self-test for atomic64_t · 86a89380
      Luca Barbieri authored
      This patch adds self-test on boot code for atomic64_t.
      
      This has been used to test the later changes in this patchset.
      Signed-off-by: default avatarLuca Barbieri <luca@luca-barbieri.com>
      LKML-Reference: <1267005265-27958-4-git-send-email-luca@luca-barbieri.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      86a89380
    • Luca Barbieri's avatar
      x86-32: Allow UP/SMP lock replacement in cmpxchg64 · 9c76b384
      Luca Barbieri authored
      Use the functionality just introduced in the previous patch: mark the
      lock prefixes in cmpxchg64 alternatives for UP removal.
      
      Changes in v2:
      - Naming change
      Signed-off-by: default avatarLuca Barbieri <luca@luca-barbieri.com>
      LKML-Reference: <1267005265-27958-3-git-send-email-luca@luca-barbieri.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      9c76b384
    • Luca Barbieri's avatar
      x86: Add support for lock prefix in alternatives · b3ac891b
      Luca Barbieri authored
      The current lock prefix UP/SMP alternative code doesn't allow
      LOCK_PREFIX to be used in alternatives code.
      
      This patch solves the problem by adding a new LOCK_PREFIX_ALTERNATIVE_PATCH
      macro that only records the lock prefix location but does not emit
      the prefix.
      
      The user of this macro can then start any alternative sequence with
      "lock" and have it UP/SMP patched.
      
      To make this work, the UP/SMP alternative code is changed to do the
      lock/DS prefix switching only if the byte actually contains a lock or
      DS prefix.
      
      Thus, if an alternative without the "lock" is selected, it will now do
      nothing instead of clobbering the code.
      
      Changes in v2:
      - Naming change
      - Change label to not conflict with alternatives
      Signed-off-by: default avatarLuca Barbieri <luca@luca-barbieri.com>
      LKML-Reference: <1267005265-27958-2-git-send-email-luca@luca-barbieri.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      b3ac891b
  3. 16 Feb, 2010 1 commit
  4. 13 Jan, 2010 3 commits
    • Brian Gerst's avatar
      x86: Merge show_regs() · 3bef4447
      Brian Gerst authored
      Using kernel_stack_pointer() allows 32-bit and 64-bit versions to
      be merged.  This is more correct for 64-bit, since the old %rsp is
      always saved on the stack.
      Signed-off-by: default avatarBrian Gerst <brgerst@gmail.com>
      LKML-Reference: <1263397555-27695-1-git-send-email-brgerst@gmail.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      3bef4447
    • Dave Jones's avatar
      x86: Macroise x86 cache descriptors · 2ca49b2f
      Dave Jones authored
      Use a macro to define the cache sizes when cachesize > 1 MB.
      
      This is less typing, and less prone to introducing bugs like we
      saw in e02e0e1a, and means we
      don't have to do maths when adding new non-power-of-2 updates
      like those seen recently.
      Signed-off-by: default avatarDave Jones <davej@redhat.com>
      LKML-Reference: <20100104144735.GA18390@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      2ca49b2f
    • Linus Torvalds's avatar
      x86-32: clean up rwsem inline asm statements · 59c33fa7
      Linus Torvalds authored
      This makes gcc use the right register names and instruction operand sizes
      automatically for the rwsem inline asm statements.
      
      So instead of using "(%%eax)" to specify the memory address that is the
      semaphore, we use "(%1)" or similar. And instead of forcing the operation
      to always be 32-bit, we use "%z0", taking the size from the actual
      semaphore data structure itself.
      
      This doesn't actually matter on x86-32, but if we want to use the same
      inline asm for x86-64, we'll need to have the compiler generate the proper
      64-bit names for the registers (%rax instead of %eax), and if we want to
      use a 64-bit counter too (in order to avoid the 15-bit limit on the
      write counter that limits concurrent users to 32767 threads), we'll need
      to be able to generate instructions with "q" accesses rather than "l".
      
      Since this header currently isn't enabled on x86-64, none of that matters,
      but we do want to use the xadd version of the semaphores rather than have
      to take spinlocks to do a rwsem. The mm->mmap_sem can be heavily contended
      when you have lots of threads all taking page faults, and the fallback
      rwsem code that uses a spinlock performs abysmally badly in that case.
      
      [ hpa: modified the patch to skip size suffixes entirely when they are
        redundant due to register operands. ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <alpine.LFD.2.00.1001121613560.17145@localhost.localdomain>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      59c33fa7
  5. 07 Jan, 2010 3 commits
  6. 30 Dec, 2009 3 commits
    • Jan Beulich's avatar
      x86-64: Modify memcpy()/memset() alternatives mechanism · 7269e881
      Jan Beulich authored
      In order to avoid unnecessary chains of branches, rather than
      implementing memcpy()/memset()'s access to their alternative
      implementations via a jump, patch the (larger) original function
      directly.
      
      The memcpy() part of this is slightly subtle: while alternative
      instruction patching does itself use memcpy(), with the
      replacement block being less than 64-bytes in size the main loop
      of the original function doesn't get used for copying memcpy_c()
      over memcpy(), and hence we can safely write over its beginning.
      
      Also note that the CFI annotations are fine for both variants of
      each of the functions.
      Signed-off-by: default avatarJan Beulich <jbeulich@novell.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B2BB8D30200007800026AF2@vpn.id2.novell.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      7269e881
    • Jan Beulich's avatar
      x86-64: Modify copy_user_generic() alternatives mechanism · 1b1d9258
      Jan Beulich authored
      In order to avoid unnecessary chains of branches, rather than
      implementing copy_user_generic() as a function consisting of
      just a single (possibly patched) branch, instead properly deal
      with patching call instructions in the alternative instructions
      framework, and move the patching into the callers.
      
      As a follow-on, one could also introduce something like
      __EXPORT_SYMBOL_ALT() to avoid patching call sites in modules.
      Signed-off-by: default avatarJan Beulich <jbeulich@novell.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B2BB8180200007800026AE7@vpn.id2.novell.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      1b1d9258
    • Jan Beulich's avatar
      x86: Lift restriction on the location of FIX_BTMAP_* · 499a5f1e
      Jan Beulich authored
      The early ioremap fixmap entries cover half (or for 32-bit
      non-PAE, a quarter) of a page table, yet they got
      uncondtitionally aligned so far to a 256-entry boundary. This is
      not necessary if the range of page table entries anyway falls
      into a single page table.
      
      This buys back, for (theoretically) 50% of all configurations
      (25% of all non-PAE ones), at least some of the lowmem
      necessarily lost with commit e621bd18.
      Signed-off-by: default avatarJan Beulich <jbeulich@novell.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B2BB66F0200007800026AD6@vpn.id2.novell.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      499a5f1e
  7. 28 Dec, 2009 1 commit
    • Akinobu Mita's avatar
      x86, core: Optimize hweight32() · 39d997b5
      Akinobu Mita authored
      Optimize hweight32 by using the same technique in hweight64.
      
      The proof of this technique can be found in the commit log for
      f9b41929 ("bitops: hweight()
      speedup").
      
      The userspace benchmark on x86_32 showed 20% speedup with
      bitmap_weight() which uses hweight32 to count bits for each
      unsigned long on 32bit architectures.
      
       int main(void)
       {
      	#define SZ (1024 * 1024 * 512)
      
      	static DECLARE_BITMAP(bitmap, SZ) = {
      	        [0 ... 100] = 1,
      	};
      
      	return bitmap_weight(bitmap, SZ);
       }
      Signed-off-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1258603932-4590-1-git-send-email-akinobu.mita@gmail.com>
      [ only x86 sets ARCH_HAS_FAST_MULTIPLIER so we do this via the x86 tree]
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      39d997b5
  8. 24 Dec, 2009 17 commits