1. 06 Nov, 2013 12 commits
    • ARC: Fix bogus gcc warning and micro-optimise TLB iteration loop · 0a4c40a3
      Vineet Gupta authored
      ```
      --------------->8----------------------
      arch/arc/mm/tlb.c: In function ‘do_tlb_overlap_fault’:
      arch/arc/mm/tlb.c:688:13: warning: array subscript is above array bounds
      [-Warray-bounds]
               (pd0[n] & PAGE_MASK)) {
                   ^
      --------------->8----------------------
      ```
      
      While at it, remove the useless last iteration of the outer loop when
      reading a TLB set for duplicate entries.
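      A minimal userspace sketch of the duplicate scan, assuming a hypothetical
      4-way set and 4 KiB pages (N_WAYS, SKETCH_PAGE_MASK and count_dup_pairs()
      are illustrative names, not the kernel's). Because the inner loop starts
      at i + 1, the outer loop can stop one short of the last way; that final
      pass would compare nothing.

      ```c
      #define N_WAYS 4                      /* hypothetical ways per TLB set */
      #define SKETCH_PAGE_MASK (~0xFFFUL)   /* assumes 4 KiB pages */

      /*
       * Count duplicate (i, j) pairs among the page-aligned addresses held in
       * one TLB set. Stopping the outer loop at N_WAYS - 1 skips the pass in
       * which the inner loop would start at N_WAYS and compare nothing.
       */
      static int count_dup_pairs(const unsigned long pd0[N_WAYS])
      {
      	int dups = 0;

      	for (int i = 0; i < N_WAYS - 1; i++)
      		for (int j = i + 1; j < N_WAYS; j++)
      			if ((pd0[i] & SKETCH_PAGE_MASK) ==
      			    (pd0[j] & SKETCH_PAGE_MASK))
      				dups++;
      	return dups;
      }
      ```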
      Suggested-by: Mischa Jonker <mjonker@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: Add support for irqflags tracing and lockdep · 0dafafc3
      Vineet Gupta authored
      Lockdep required a small fix to the stacktrace API, which was incorrectly
      unwinding out of __switch_to for the current call frame.
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: Reset the value of Interrupt Priority Register · 54c8bff1
      Vineet Gupta authored
      Reset it in case the bootloader has changed the priority of one or more IRQ lines.
      Reported-by: Noam Camus <noamc@ezchip.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: Reduce #ifdef'ery for unaligned access emulation · 07ba69a4
      Vineet Gupta authored
      When emulation is not enabled, it is treated as if the fixup failed, so
      there is no need for special #ifdef checks.
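      A sketch of the pattern with hypothetical names
      (CONFIG_SKETCH_UNALIGNED_EMUL, sketch_misaligned_fixup(),
      sketch_do_misaligned_access()): when emulation is compiled out, a stub
      reports failure, so the caller handles "disabled" and "fixup failed"
      through the same path without an #ifdef of its own.

      ```c
      /* Hypothetical config switch; in the kernel this would come from Kconfig. */
      /* #define CONFIG_SKETCH_UNALIGNED_EMUL 1 */

      #ifdef CONFIG_SKETCH_UNALIGNED_EMUL
      /* Returns 0 if the access was emulated, nonzero if the fixup failed. */
      static int sketch_misaligned_fixup(unsigned long address)
      {
      	(void)address;
      	/* Real emulation would decode and replay the access here. */
      	return 0;
      }
      #else
      /* Emulation compiled out: report it exactly like a failed fixup. */
      static inline int sketch_misaligned_fixup(unsigned long address)
      {
      	(void)address;
      	return 1;
      }
      #endif

      /*
       * No #ifdef needed here: a nonzero fixup result always means the
       * error path (e.g. delivering a signal), whatever the reason.
       */
      static int sketch_do_misaligned_access(unsigned long address)
      {
      	if (sketch_misaligned_fixup(address) == 0)
      		return 0;     /* emulated, resume the task */
      	return -1;        /* fall back to the error path */
      }
      ```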
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: Change calling convention of do_page_fault() · 21a63b56
      Vineet Gupta authored
      Switch the args (address, pt_regs) to match all the other "C"
      exception handlers.
      
      This removes the awkwardness in EV_ProtV for page fault vs. unaligned
      access.
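      A hypothetical model of why a shared calling convention helps, assuming
      every handler takes (address, regs) in that order; the struct and
      handler names are stand-ins, not the ARC sources. Once the signatures
      match, an entry point that can mean either a page fault or an unaligned
      access sets up its arguments once and only chooses the branch target.

      ```c
      /* Hypothetical stand-in for the kernel's struct pt_regs. */
      struct sketch_pt_regs {
      	unsigned long ret;   /* e.g. the faulting PC; unused in this sketch */
      };

      /* Shared handler signature: (address, regs), same order for all. */
      typedef int (*exc_handler_t)(unsigned long address,
      			     struct sketch_pt_regs *regs);

      /* Hypothetical handlers; each returns an id so dispatch can be checked. */
      static int sketch_do_page_fault(unsigned long address,
      				struct sketch_pt_regs *regs)
      {
      	(void)address;
      	(void)regs;
      	return 1;
      }

      static int sketch_do_misaligned_access(unsigned long address,
      				       struct sketch_pt_regs *regs)
      {
      	(void)address;
      	(void)regs;
      	return 2;
      }

      /* Model of a protection-violation entry serving both causes. */
      static int sketch_ev_protv(int is_misaligned, unsigned long address,
      			   struct sketch_pt_regs *regs)
      {
      	exc_handler_t handler = is_misaligned ? sketch_do_misaligned_access
      					      : sketch_do_page_fault;
      	return handler(address, regs);
      }
      ```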
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: cacheflush optim - PTAG can be loop invariant if V-P is const · d4599baf
      Vineet Gupta authored
      A line op needs vaddr (indexing) and paddr (tag match). For page sized
      flushes (V-P const), each line op will need a different index, but the
      tag bits will remain constant, hence paddr can be set up once outside the
      loop.
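      A sketch that only counts register writes, assuming a 64-byte line and an
      8 KiB page; aux_model, aux_write_ptag() and aux_write_cmd() are
      hypothetical stand-ins for the real aux register accessors. For a
      page-sized op the naive loop programs the tag once per line, while the
      hoisted version programs it once per page.

      ```c
      #include <stddef.h>

      #define LINE_SZ 64UL    /* assumed cache line size */
      #define PAGE_SZ 8192UL  /* assumed page size */

      /* Hypothetical model of the aux registers: only counts writes. */
      struct aux_model {
      	size_t ptag_writes;   /* writes to the tag (paddr) register */
      	size_t cmd_writes;    /* writes to the per-line command register */
      };

      static void aux_write_ptag(struct aux_model *m, unsigned long paddr)
      {
      	(void)paddr;
      	m->ptag_writes++;
      }

      static void aux_write_cmd(struct aux_model *m, unsigned long vaddr)
      {
      	(void)vaddr;
      	m->cmd_writes++;
      }

      /* Before: paddr is programmed for every line. */
      static void page_op_naive(struct aux_model *m, unsigned long paddr,
      			  unsigned long vaddr)
      {
      	for (unsigned long off = 0; off < PAGE_SZ; off += LINE_SZ) {
      		aux_write_ptag(m, paddr + off);
      		aux_write_cmd(m, vaddr + off);   /* index bits from vaddr */
      	}
      }

      /* After: V-P is constant across the page, so the tag is set once. */
      static void page_op_ptag_invariant(struct aux_model *m, unsigned long paddr,
      				   unsigned long vaddr)
      {
      	aux_write_ptag(m, paddr);
      	for (unsigned long off = 0; off < PAGE_SZ; off += LINE_SZ)
      		aux_write_cmd(m, vaddr + off);
      }
      ```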
      
      This improves select LMBench numbers for Aliasing dcache where we have
      more "preventive" cache flushing.
      
      Processor, Processes - times in microseconds - smaller is better
      ------------------------------------------------------------------------------
      Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                                   call  I/O stat clos TCP  inst hndl proc proc proc
      --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
      3.11-rc7- Linux 3.11.0-   80 4.66 8.88 69.7 112. 268. 8.60 28.0 3489 13.K 27.K	# Non alias ARC700
      3.11-rc7- Linux 3.11.0-   80 4.64 8.51 68.6 98.5 271. 8.58 28.1 4160 15.K 32.K	# Aliasing
      3.11-rc7- Linux 3.11.0-   80 4.64 8.51 69.8 99.4 270. 8.73 27.5 3880 15.K 31.K	# PTAG loop Inv
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: cacheflush refactor #3: Unify the {d,i}cache flush leaf helpers · bd12976c
      Vineet Gupta authored
      With the line length now being the same for both caches, we can fold the
      2 helpers into 1. This allows any (forthcoming) optimizations to be
      applied in a single place.
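      A minimal sketch of a single leaf loop serving both caches, assuming a
      common 64-byte line; enum cache_id, struct cache_model and
      cache_line_loop() are hypothetical. The line count is derived once from
      the shared line size, which is what lets the I and D variants collapse
      into one function.

      ```c
      #define LINE_BYTES 64UL                  /* assumed line size, shared by I and D */
      #define LINE_MASK  (~(LINE_BYTES - 1))

      /* Hypothetical selector for which cache a line op targets. */
      enum cache_id { ICACHE = 0, DCACHE = 1 };

      /* Hypothetical model: records work instead of writing aux registers. */
      struct cache_model {
      	unsigned long lines_done[2];   /* per-cache count of line ops */
      	unsigned long last_line[2];    /* last line address issued per cache */
      };

      /* One helper for both caches; valid only because line sizes match. */
      static void cache_line_loop(struct cache_model *m, enum cache_id cache,
      			    unsigned long vaddr, unsigned long sz)
      {
      	unsigned long num_lines;

      	/* Widen the range so it starts on a line boundary. */
      	sz += vaddr & ~LINE_MASK;
      	vaddr &= LINE_MASK;
      	num_lines = (sz + LINE_BYTES - 1) / LINE_BYTES;

      	while (num_lines--) {
      		/* Real code would issue the I- or D-cache line command here. */
      		m->lines_done[cache]++;
      		m->last_line[cache] = vaddr;
      		vaddr += LINE_BYTES;
      	}
      }
      ```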
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: cacheflush refactor #2: I and D caches lines to have same size · 63d2dfdb
      Vineet Gupta authored
      Having them differ seems to be an obscure configuration.
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: cacheflush refactor #1: push aux reg ascertaining into leaf routine · f3e4de32
      Vineet Gupta authored
      ARC dcache supports 3 ops: Inv, Flush, Flush-n-Inv.
      The programming model, however, provides only 2 commands: FLUSH and INV.
      INV will either discard or flush-n-discard (based on the DT_CTRL bit).
      
      The leaf helper __dc_line_loop() used to take the AUX register
      (corresponding to the 2 commands) as an argument. Now that choice is
      made within the helper, paving the way for the code consolidation to follow.
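      A sketch of the selection the helper now makes internally, using
      hypothetical op flags and command names (the real register names are not
      given here). Any op carrying the INV bit maps to the INV command;
      Flush-n-Inv is INV with the control bit set to flush first. Returning
      the mode bit alongside the command is a simplification for this sketch.

      ```c
      #include <stdbool.h>

      /* Hypothetical op encoding: Flush-n-Inv sets both bits. */
      #define DC_OP_INV          0x1u
      #define DC_OP_FLUSH        0x2u
      #define DC_OP_FLUSH_N_INV  (DC_OP_INV | DC_OP_FLUSH)

      /* Hypothetical names for the two hardware line commands. */
      enum dc_cmd { DC_CMD_INV, DC_CMD_FLUSH };

      struct dc_line_cmd {
      	enum dc_cmd cmd;            /* command register the leaf would write */
      	bool flush_before_discard;  /* models the control bit for INV */
      };

      /* Caller passes only the logical op; the helper picks the command. */
      static struct dc_line_cmd dc_pick_cmd(unsigned int op)
      {
      	struct dc_line_cmd c;

      	/* Any op with the INV bit goes through the INV command ... */
      	c.cmd = (op & DC_OP_INV) ? DC_CMD_INV : DC_CMD_FLUSH;
      	/* ... and the control bit decides whether INV flushes first. */
      	c.flush_before_discard = (op == DC_OP_FLUSH_N_INV);
      	return c;
      }
      ```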
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • 064a6269
    • ARC: Annotate some functions as static · 8e457d6a
      Vineet Gupta authored
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • arc: Replace __get_cpu_var uses · 6855e95c
      Christoph Lameter authored
      __get_cpu_var() is used for multiple purposes in the kernel source. One of them is
      address calculation via the form &__get_cpu_var(x). This calculates the address for
      the instance of the percpu variable of the current processor based on an offset.
      
      Other use cases are for storing and retrieving data from the current processor's percpu area.
      __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store and retrieve operations
      could use a segment prefix (or global register on other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use
      optimized assembly code to read and write per cpu variables.
      
      This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr()
      or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided
      and fewer registers are used when code is generated.
      
      At the end of the patchset all uses of __get_cpu_var have been removed so the macro is removed too.
      
      The patchset includes passes over all arches as well. Once these operations are used throughout,
      specialized macros can be defined in non-x86 arches as well in order to optimize per cpu access,
      e.g. by using a global register that may be set to the per cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processor's instance of a per cpu variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y);
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++;
      
         Converts to
      
      	this_cpu_inc(y);
      Acked-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Christoph Lameter <cl@linux.com>
  2. 03 Nov, 2013 3 commits
  3. 02 Nov, 2013 2 commits
  4. 01 Nov, 2013 20 commits
  5. 31 Oct, 2013 3 commits
    • Merge branch 'akpm' (fixes from Andrew Morton) · 4f794ee8
      Linus Torvalds authored
      Merge four more fixes from Andrew Morton.
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        lib/scatterlist.c: don't flush_kernel_dcache_page on slab page
        mm: memcg: fix test for child groups
        mm: memcg: lockdep annotation for memcg OOM lock
        mm: memcg: use proper memcg in limit bypass
    • lib/scatterlist.c: don't flush_kernel_dcache_page on slab page · 3d77b50c
      Ming Lei authored
      Commit b1adaf65 ("[SCSI] block: add sg buffer copy helper
      functions") introduces two sg buffer copy helpers, and calls
      flush_kernel_dcache_page() on pages in SG list after these pages are
      written to.
      
      Unfortunately, the commit may introduce a potential bug:
      
       - Before sending some SCSI commands, a kmalloc() buffer may be passed to
         the block layer, so flush_kernel_dcache_page() can end up seeing a
         slab page.
      
       - According to cachetlb.txt, flush_kernel_dcache_page() is only called
         on "a user page", which surely can't be a slab page.
      
       - An arch's implementation of flush_kernel_dcache_page() may use page
         mapping information for optimization, so page_mapping() will see the
         slab page and VM_BUG_ON() is triggered.
      
      Aaro Koskinen reported the bug on ARM/kirkwood when DEBUG_VM is enabled,
      and this patch fixes the bug by adding test of '!PageSlab(miter->page)'
      before calling flush_kernel_dcache_page().
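      A mock of the guard, for illustration only: struct mock_page,
      MOCK_SG_MITER_TO_SG and mock_miter_stop() are hypothetical stand-ins for
      the kernel's page, the iterator's "written to" flag and the point where
      the iterator finishes with a page. The flush now runs only for
      written-to pages that are not slab pages.

      ```c
      #include <stdbool.h>

      /* Hypothetical mock of the page properties the check needs. */
      struct mock_page {
      	bool slab;              /* stands in for PageSlab(page) */
      	int dcache_flushes;     /* how often the flush hook ran */
      };

      /* Hypothetical flag: the iterator wrote to the SG buffer. */
      #define MOCK_SG_MITER_TO_SG 0x1u

      static void mock_flush_kernel_dcache_page(struct mock_page *page)
      {
      	page->dcache_flushes++;
      }

      /* Flush written-to pages, but never a slab (kmalloc()) page. */
      static void mock_miter_stop(unsigned int flags, struct mock_page *page)
      {
      	if ((flags & MOCK_SG_MITER_TO_SG) && !page->slab)
      		mock_flush_kernel_dcache_page(page);
      }
      ```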
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Tested-by: Simon Baatz <gmbnomis@gmail.com>
      Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>	[3.2+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcg: fix test for child groups · 696ac172
      Johannes Weiner authored
      When memcg code needs to know whether any given memcg has children, it
      uses the cgroup child iteration primitives and returns true/false
      depending on whether the iteration loop is executed at least once or
      not.
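      The "loop body runs at least once" test can be sketched in userspace as
      below; struct group and group_has_children() are hypothetical, and the
      RCU read-side locking that the kernel's child iteration requires is not
      modeled in this single-threaded sketch.

      ```c
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical minimal group with a singly linked list of children. */
      struct group {
      	struct group *first_child;
      	struct group *next_sibling;
      };

      /* "Bounce at first found": true iff the loop body executes once. */
      static bool group_has_children(const struct group *g)
      {
      	for (const struct group *pos = g->first_child; pos;
      	     pos = pos->next_sibling)
      		return true;
      	return false;
      }
      ```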
      
      Because a cgroup's list of children is RCU protected, these primitives
      require the RCU read-lock to be held, which is not the case for all
      memcg callers.  This results in the following splat when e.g.  enabling
      hierarchy mode:
      
        WARNING: CPU: 3 PID: 1 at kernel/cgroup.c:3043 css_next_child+0xa3/0x160()
        CPU: 3 PID: 1 Comm: systemd Not tainted 3.12.0-rc5-00117-g83f11a9c-dirty #18
        Hardware name: LENOVO 3680B56/3680B56, BIOS 6QET69WW (1.39 ) 04/26/2012
        Call Trace:
          dump_stack+0x54/0x74
          warn_slowpath_common+0x78/0xa0
          warn_slowpath_null+0x1a/0x20
          css_next_child+0xa3/0x160
          mem_cgroup_hierarchy_write+0x5b/0xa0
          cgroup_file_write+0x108/0x2a0
          vfs_write+0xbd/0x1e0
          SyS_write+0x4c/0xa0
          system_call_fastpath+0x16/0x1b
      
      In the memcg case, we only care about children when we are attempting to
      modify inheritable attributes interactively.  Racing with deletion could
      mean a spurious -EBUSY, no problem.  Racing with addition is handled
      just fine as well through the memcg_create_mutex: if the child group is
      not on the list after the mutex is acquired, it won't be initialized
      from the parent's attributes until after the unlock.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>