1. 18 Dec, 2010 5 commits
    • omap4: l2x0: Enable early BRESP bit · b89cd71a
      Santosh Shilimkar authored
      The AXI protocol specifies that a write response can be sent back
      to an AXI master only once the last write data has been accepted.
      The Early BRESP optimization lets the PL310 send the write response
      for certain write transactions as soon as the store buffer accepts
      the write address. This behavior is not compatible with the AXI
      protocol and is disabled by default. Enable the optimization by
      setting the Early BRESP Enable bit, bit [30] of the Auxiliary
      Control Register.
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Signed-off-by: Mans Rullgard <mans@mansr.com>
      Tested-by: Nishanth Menon <nm@ti.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
    • omap4: l2x0: Set share override bit · b0f20ff9
      Santosh Shilimkar authored
      Clearing bit 22 in the PL310 Auxiliary Control Register (shared
      attribute override enable) has the side effect of transforming Normal
      Shared Non-cacheable reads into Cacheable no-allocate reads.
      
      Coherent DMA buffers in Linux always have a Cacheable alias via the
      kernel linear mapping, and the processor can speculatively load cache
      lines into the PL310 controller. With bit 22 cleared, Non-cacheable
      reads would unexpectedly hit such cache lines, leading to buffer
      corruption. Setting bit 22 avoids this.
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Tested-by: Nishanth Menon <nm@ti.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
    • omap4: l2x0: enable instruction and data prefetching · 11e02640
      Mans Rullgard authored
      Enabling L2 prefetching measurably improves performance, as shown by
      the memory test results below from a Panda ES2.1 board. I think we
      should consider it, even though it slows down writes a bit.
      (rebased to k.org)
      
      Prefetch is usually enabled at both levels together (L1 + L2).
      However, the CP15 prefetch engines are under security control, so on
      GP devices (e.g. PandaBoard) we cannot enable them. Enabling the
      PL310 prefetch alone still seems to provide a performance
      improvement, as shown in the data below (from Ubuntu), and would be
      a great thing to pull in.
      
      Prefetch here means automatic next-line prefetching: whenever the
      PL310 receives a cacheable read request, it automatically prefetches
      the following cache line as well.
      
      Measurement Data:
      =================
      STOCK 10.10 WITHOUT PATCH
      =========================
      ~# ./memspeed
      size    8388608 8192k 8M
      offset  8388608, 0
      buffers 0x2aaad000 0x2b2ad000
      copy  libc          133 MB/s
      copy  Android v5    273 MB/s
      copy  Android NEON  235 MB/s
      copy  INT32         116 MB/s
      copy  ASM ARM       187 MB/s
      copy  ASM VLDM 64   204 MB/s
      copy  ASM VLDM 128  173 MB/s
      copy  ASM VLD1      216 MB/s
      read  ASM ARM       286 MB/s
      read  ASM VLDM      242 MB/s
      read  ASM VLD1      286 MB/s
      write libc         1947 MB/s
      write ASM ARM      1943 MB/s
      write ASM VSTM     1942 MB/s
      write ASM VST1     1935 MB/s
      
      10.10 + PATCH
      =============
      ~# ./memspeed
      size    8388608 8192k 8M
      offset  8388608, 0
      buffers 0x2ab17000 0x2b317000
      copy  libc          129 MB/s
      copy  Android v5    256 MB/s
      copy  Android NEON  356 MB/s
      copy  INT32         127 MB/s
      copy  ASM ARM       321 MB/s
      copy  ASM VLDM 64   337 MB/s
      copy  ASM VLDM 128  321 MB/s
      copy  ASM VLD1      350 MB/s
      read  ASM ARM       496 MB/s
      read  ASM VLDM      470 MB/s
      read  ASM VLD1      488 MB/s
      write libc         1701 MB/s
      write ASM ARM      1682 MB/s
      write ASM VSTM     1693 MB/s
      write ASM VST1     1681 MB/s
      Signed-off-by: Mans Rullgard <mans@mansr.com>
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Tested-by: Nishanth Menon <nm@ti.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
    • omap4: l2x0: Construct the AUXCTRL value using defines · 1773e60a
      Santosh Shilimkar authored
      This patch removes the hardcoded AUXCTRL value and constructs it
      from bitfield defines instead.
      
      Bit 25 is reserved and is always set to 1. This patch retains
      that value.
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Tested-by: Nishanth Menon <nm@ti.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
    • ARM: l2x0: Add aux control register bitfields · 0aaa6f8f
      Santosh Shilimkar authored
      This patch adds the PL310 Auxiliary Control Register bitfields
      so that SoCs can use them to construct the AUXCTRL value to be
      programmed instead of hardcoding it.
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
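The 18 Dec commits describe building the PL310 AUXCTRL value from named bits instead of a hardcoded constant. The following is a minimal sketch of that idea, not the code from the patches. Bits 22, 25 and 30 are the ones named in the commit messages. Bits 28 and 29 (data and instruction prefetch enable) are an assumption taken from the PL310 documentation. All macro and function names are hypothetical, and the result covers only these five bits, not the full value an OMAP4 board would program.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in the PL310 Auxiliary Control Register.
 * 22, 25, 30: named in the commit messages.
 * 28, 29: assumed from the PL310 documentation (data / instruction prefetch).
 * Macro names are illustrative, not the kernel's. */
#define AUX_SHARE_OVERRIDE_SHIFT  22
#define AUX_RESERVED_ONE_SHIFT    25
#define AUX_DATA_PREFETCH_SHIFT   28
#define AUX_INSTR_PREFETCH_SHIFT  29
#define AUX_EARLY_BRESP_SHIFT     30

#define AUX_BIT(shift) (UINT32_C(1) << (shift))

/* Compose the value from named bits rather than a magic number. */
static uint32_t build_aux_ctrl(void)
{
    uint32_t aux = 0;

    aux |= AUX_BIT(AUX_SHARE_OVERRIDE_SHIFT);  /* keep Non-cacheable reads non-cacheable */
    aux |= AUX_BIT(AUX_RESERVED_ONE_SHIFT);    /* reserved, always 1 */
    aux |= AUX_BIT(AUX_DATA_PREFETCH_SHIFT);   /* next-line data prefetch */
    aux |= AUX_BIT(AUX_INSTR_PREFETCH_SHIFT);  /* next-line instruction prefetch */
    aux |= AUX_BIT(AUX_EARLY_BRESP_SHIFT);     /* early write response */

    /* Sanity check: the reserved bit must be retained. */
    assert(aux & AUX_BIT(AUX_RESERVED_ONE_SHIFT));
    return aux;
}

/* Return 1 if the given bit is set in an AUXCTRL value, else 0. */
static int aux_bit_is_set(uint32_t aux, unsigned shift)
{
    return (int)((aux >> shift) & 1u);
}
```

Naming each bit makes a later change, such as turning prefetch off again, a one-line edit that is visible in review.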
  2. 16 Dec, 2010 2 commits
  3. 15 Dec, 2010 16 commits
  4. 14 Dec, 2010 17 commits
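The memspeed figures in the prefetching commit (11e02640) are easier to compare as relative changes. The helper below is a small sketch for that arithmetic. The function name is hypothetical, and the inputs are meant to be the MB/s values copied from the two benchmark runs.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent.
 * Positive means faster with the patch, negative means slower.
 * `before` must be non-zero. */
static double pct_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

For example, `pct_change(286, 496)` for "read ASM ARM" gives roughly +73%, while `pct_change(1947, 1701)` for "write libc" gives roughly -13%, which matches the commit's remark that prefetching helps reads and copies but costs some write throughput.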