1. 31 Jul, 2018 6 commits
  2. 30 Jul, 2018 16 commits
  3. 24 Jul, 2018 18 commits
    • tty: hvc: remove unexplained "just in case" spin delay · cca3d529
      Nicholas Piggin authored
      This delay was in the very first OPAL console commit 6.5 years ago,
      and came from the vio hvc driver. The firmware console has hardened
      sufficiently to remove it.
      Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      cca3d529
    • powerpc/powernv: implement opal_put_chars_atomic · 17cc1dd4
      Nicholas Piggin authored
      The RAW console does not need writes to be atomic, so relax
      opal_put_chars to be able to do partial writes, and implement an
      _atomic variant which does not take a spinlock. This API is used
      in xmon, so the less locking that is used, the better chance there
      is that a crash can be debugged.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      17cc1dd4
    • powerpc/powernv: move opal console flushing to udbg · ac4ac788
      Nicholas Piggin authored
      OPAL console writes do not have to synchronously flush firmware /
      hardware buffers unless they are going through the udbg path.
      
      Remove the unconditional flushing from opal_put_chars. Flush if
      there was no space in the buffer as an optimisation (callers loop
      waiting for success in that case). udbg flushing is moved to
      udbg_opal_putc.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      ac4ac788
    • powerpc/powernv: Remove OPALv1 support from opal console driver · b74d2807
      Nicholas Piggin authored
      opal_put_chars deals with partial writes because, in OPALv1,
      opal_console_write_buffer_space did not work correctly. That firmware
      version is no longer supported.
      
      This reworks the opal_put_chars code to no longer deal with partial
      writes by turning them into full writes. Partial write handling is still
      supported in terms of what gets returned to the caller, but it may not
      go to the console atomically. A warning message is printed in this
      case.
      
      This allows console flushing to be moved out of the opal_write_lock
      spinlock. That could cause the lock to be held for long periods if the
      console is busy (especially if it was being spammed by firmware),
      which is dangerous because the lock is taken by xmon to debug the
      system. Flushing outside the lock improves the situation a bit.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b74d2807
    • powerpc/powernv: Implement and use opal_flush_console · d2a2262e
      Nicholas Piggin authored
      A new console flushing firmware API was introduced to replace event
      polling loops, and implemented in opal-kmsg with affddff6
      ("powerpc/powernv: Add a kmsg_dumper that flushes console output on
      panic"), to flush the console in the panic path.
      
      The OPAL console driver has other situations where interrupts are off
      and it needs to flush the console synchronously. These still use a
      polling loop.
      
      So move the opal-kmsg flush code to opal_flush_console, and use the
      new function in opal-kmsg and opal_put_chars.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Reviewed-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      d2a2262e
    • powerpc/powernv: opal-kmsg use flush fallback from console code · e00da0f2
      Nicholas Piggin authored
      Use the more refined and tested event polling loop from opal_put_chars
      as the fallback console flush in the opal-kmsg path. This loop is used
      by the console driver today, whereas the opal-kmsg fallback is not
      likely to have been used for years.
      
      Use WARN_ONCE rather than a printk when the fallback is invoked to
      prepare for moving the console flush into a common function.
      Reviewed-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      e00da0f2
    • powerpc/powernv: opal-kmsg standardise OPAL_BUSY handling · 3a80bfc7
      Nicholas Piggin authored
      OPAL_CONSOLE_FLUSH is documented as being able to return OPAL_BUSY,
      so implement the standard OPAL_BUSY handling for it.
      Reviewed-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      3a80bfc7
    • powerpc/powernv: Fix OPAL console driver OPAL_BUSY loops · 36d2dabc
      Nicholas Piggin authored
      The OPAL console driver does not delay in case it gets OPAL_BUSY or
      OPAL_BUSY_EVENT from firmware.
      
      It can't yet be made to sleep because it is called under spinlock,
      but it can be changed to the standard OPAL_BUSY loop form, and a
      delay added to keep it from hitting the firmware too frequently.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      36d2dabc
    • powerpc/powernv: opal_put_chars partial write fix · bd90284c
      Nicholas Piggin authored
      The intention here is to consume and discard the remaining buffer
      upon error. This works if there has not been a previous partial write.
      If there has been, then total_len is no longer total number of bytes
      to copy. total_len is always "bytes left to copy", so it should be
      added to written bytes.
      
      This code may not be exercised any more if partial writes will not be
      hit, but this is a small bugfix before a larger change.
      Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      bd90284c
    • powerpc/powernv/opal-dump : Use IRQ_HANDLED instead of numbers in interrupt handler · b29336c0
      Mukesh Ojha authored
      Fixes: 8034f715 ("powernv/opal-dump: Convert to irq domain")
      
      Convert the explicit numeric return values to the proper IRQ_HANDLED,
      which is the correct way for an interrupt handler to report success.

      This also removes the "nobody cared" error message that was triggered
      when the handler returned -1 or 0.
      Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
      Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b29336c0
    • powerpc/powernv/opal-dump : Handles opal_dump_info properly · a5bbe8fd
      Mukesh Ojha authored
      Move the return value check of opal_dump_info() to the proper place;
      previously all the dump info was filled in unnecessarily even on failure.
      Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
      Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
      Acked-by: Jeremy Kerr <jk@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a5bbe8fd
    • powerpc/tm: Remove struct thread_info param from tm_reclaim_thread() · edd00b83
      Cyril Bur authored
      Since commit dc310669 ("powerpc: tm: Always use fp_state and
      vr_state to store live registers") tm_reclaim_thread() no longer uses
      the parameter, yet both callers still have to obtain it even though
      they have no other need for a struct thread_info.
      
      Just remove it and adjust the callers.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      edd00b83
    • powerpc/tm: Update function prototype comment · a596a7e9
      Cyril Bur authored
      In commit eb5c3f1c ("powerpc: Always save/restore checkpointed regs
      during treclaim/trecheckpoint") __tm_recheckpoint was modified to no
      longer take the second parameter 'unsigned long orig_msr' as part of a
      TM rewrite to simplify the reclaiming/recheckpointing process.
      
      There is a comment in the asm file where the function is declared
      which still shows the old prototype with the 'orig_msr' parameter.
      
      This patch corrects the comment.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a596a7e9
    • selftests/powerpc: Update memcmp_64 selftest for VMX implementation · c827ac45
      Simon Guo authored
      This patch reworks the memcmp_64 selftest so that it covers more
      test cases.

      It adds test cases for:
      - memcmp over sizes larger than 4K bytes.
      - s1/s2 with different/random offsets from a 16-byte boundary.
      - enter/exit_vmx_ops pairing.
      Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
      [mpe: Add -maltivec to fix build on some toolchains]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c827ac45
    • powerpc/64: add 32 bytes prechecking before using VMX optimization on memcmp() · c2a4e54e
      Simon Guo authored
      This patch is based on the previous VMX patch on memcmp().
      
      To optimize ppc64 memcmp() with VMX instructions, we need to consider
      the penalty VMX brings with it: if the kernel uses VMX instructions,
      it needs to save/restore the current thread's VMX registers. There are
      32 x 128-bit VMX registers in PPC, which means 32 x 16 = 512 bytes to
      load and store.

      The major concern regarding memcmp() performance in the kernel is KSM,
      which uses memcmp() frequently to merge identical pages. So it makes
      sense to take some measures/enhancements on KSM to see whether any
      improvement can be made here. Cyril Bur indicated that memcmp() for
      KSM has a high probability of failing (mismatching) within the first
      few bytes, in the following mail:
      	https://patchwork.ozlabs.org/patch/817322/#1773629
      This patch is a follow-up on that observation.
      
      Some testing shows that KSM memcmp() calls fail early, within the
      first 32 bytes. More specifically:
          - 76% of cases fail/mismatch before 16 bytes;
          - 83% of cases fail/mismatch before 32 bytes;
          - 84% of cases fail/mismatch before 64 bytes;
      So 32 bytes looks like a better pre-check size than the alternatives.

      The early failure also holds for memcmp() in the non-KSM case. With a
      non-typical call load, ~73% of cases fail before the first 32 bytes.
      
      This patch adds a 32-byte pre-check before jumping into VMX
      operations, to avoid the unnecessary VMX penalty. It is not limited
      to the KSM case, and testing shows a ~20% improvement in average
      memcmp() execution time with this patch.

      Note that the 32-byte pre-check is only performed when the compare
      size is long enough (>= 4K currently) to allow the VMX operation.
      
      The detailed data and analysis are at:
      https://github.com/justdoitqd/publicFiles/blob/master/memcmp/README.md
      Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c2a4e54e
    • powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision · d58badfb
      Simon Guo authored
      This patch adds VMX primitives to do memcmp() when the compare size
      is equal to or greater than 4K bytes. The KSM feature can benefit
      from this.
      
      Test result with the following test program:
      ------
      # cat tools/testing/selftests/powerpc/stringloops/memcmp.c
      #include <malloc.h>
      #include <stdlib.h>
      #include <string.h>
      #include <time.h>
      #include "utils.h"
      #define SIZE (1024 * 1024 * 900)
      #define ITERATIONS 40
      
      int test_memcmp(const void *s1, const void *s2, size_t n);
      
      static int testcase(void)
      {
              char *s1;
              char *s2;
              unsigned long i;
      
              s1 = memalign(128, SIZE);
              if (!s1) {
                      perror("memalign");
                      exit(1);
              }
      
              s2 = memalign(128, SIZE);
              if (!s2) {
                      perror("memalign");
                      exit(1);
              }
      
              for (i = 0; i < SIZE; i++)  {
                      s1[i] = i & 0xff;
                      s2[i] = i & 0xff;
              }
              for (i = 0; i < ITERATIONS; i++) {
      		int ret = test_memcmp(s1, s2, SIZE);
      
      		if (ret) {
      			printf("return %d at[%ld]! should have returned zero\n", ret, i);
      			abort();
      		}
      	}
      
              return 0;
      }
      
      int main(void)
      {
              return test_harness(testcase, "memcmp");
      }
      ------
      Without this patch (but with the first patch "powerpc/64: Align bytes
      before fall back to .Lshort in powerpc64 memcmp()." in the series):
      	4.726728762 seconds time elapsed                                          ( +-  3.54%)
      With VMX patch:
      	4.234335473 seconds time elapsed                                          ( +-  2.63%)
      		There is ~+10% improvement.
      
      Testing with an unaligned and different-offset version (s1 and s2
      shifted by a random offset within 16 bytes) achieves an even higher
      improvement than 10%.
      Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      d58badfb
    • powerpc: add vcmpequd/vcmpequb ppc instruction macro · f1ecbaf4
      Simon Guo authored
      Some old toolchains don't know about instructions like vcmpequd.

      This patch adds .long macros for vcmpequd and vcmpequb, in
      preparation for optimizing ppc64 memcmp() with VMX instructions.
      Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      f1ecbaf4
    • powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp() · 2d9ee327
      Simon Guo authored
      Currently the 64-byte version of memcmp() on powerpc falls back to
      .Lshort (compare-per-byte mode) if either the src or dst address is
      not 8-byte aligned. This can be optimized in 2 situations:

      1) If both addresses have the same offset from an 8-byte boundary:
      memcmp() can compare the unaligned bytes up to the 8-byte boundary
      first and then compare the remaining 8-byte-aligned content in .Llong
      mode.

      2) If the src/dst addresses do not have the same offset from an 8-byte
      boundary: memcmp() can align the src address to 8 bytes, increment the
      dst address accordingly, then load src in aligned mode and dst in
      unaligned mode.

      This patch optimizes memcmp() behavior in the above 2 situations.
      
      Tested with both little/big endian. Performance result below is based on
      little endian.
      
      Following is the test result with src/dst having the same offset
      (a similar result was observed with src/dst having different offsets):
      (1) 256 bytes
      Test with the existing tools/testing/selftests/powerpc/stringloops/memcmp:
      - without patch
      	29.773018302 seconds time elapsed                                          ( +- 0.09% )
      - with patch
      	16.485568173 seconds time elapsed                                          ( +-  0.02% )
      		-> There is a ~80% improvement
      
      (2) 32 bytes
      To observe performance impact on < 32 bytes, modify
      tools/testing/selftests/powerpc/stringloops/memcmp.c with following:
      -------
       #include <string.h>
       #include "utils.h"
      
      -#define SIZE 256
      +#define SIZE 32
       #define ITERATIONS 10000
      
       int test_memcmp(const void *s1, const void *s2, size_t n);
      --------
      
      - Without patch
      	0.244746482 seconds time elapsed                                          ( +-  0.36%)
      - with patch
      	0.215069477 seconds time elapsed                                          ( +-  0.51%)
      		-> There is ~+13% improvement
      
      (3) 0~8 bytes
      To observe <8 bytes performance impact, modify
      tools/testing/selftests/powerpc/stringloops/memcmp.c with following:
      -------
       #include <string.h>
       #include "utils.h"
      
      -#define SIZE 256
      -#define ITERATIONS 10000
      +#define SIZE 8
      +#define ITERATIONS 1000000
      
       int test_memcmp(const void *s1, const void *s2, size_t n);
      -------
      - Without patch
             1.845642503 seconds time elapsed                                          ( +- 0.12% )
      - With patch
             1.849767135 seconds time elapsed                                          ( +- 0.26% )
      		-> They are nearly the same. (-0.2%)
      Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      2d9ee327