1. 04 Jul, 2019 1 commit
  2. 03 Jul, 2019 5 commits
    • x86/fsgsbase: Revert FSGSBASE support · 049331f2
      Thomas Gleixner authored
      The FSGSBASE series turned out to have serious bugs and there is still an
      open issue which is not fully understood yet.
      
      The confidence in those changes has become close to zero especially as the
      test cases which have been shipped with that series were obviously never
      run before sending the final series out to LKML.
      
        ./fsgsbase_64 >/dev/null
        Segmentation fault
      
      As the merge window is close, the only sane decision is to revert FSGSBASE
      support. The revert is necessary as this branch has been merged into
      perf/core already and rebasing all of that a few days before the merge
      window is not the most brilliant idea.
      
      I could definitely slap myself for not noticing the test case fail when
      merging that series, but TBH my expectations weren't that low back
      then. Won't happen again.
      
      Revert the following commits:
      539bca53 ("x86/entry/64: Fix and clean up paranoid_exit")
      2c7b5ac5 ("Documentation/x86/64: Add documentation for GS/FS addressing mode")
      f987c955 ("x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2")
      2032f1f9 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
      5bf0cab6 ("x86/entry/64: Document GSBASE handling in the paranoid path")
      708078f6 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")
      79e1932f ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
      1d07316b ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry")
      f60a83df ("x86/process/64: Use FSGSBASE instructions on thread copy and ptrace")
      1ab5f3f7 ("x86/process/64: Use FSBSBASE in switch_to() if available")
      a86b4625 ("x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions")
      8b71340d ("x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions")
      b64ed19b ("x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Chang S. Bae <chang.seok.bae@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
    • selftests/x86/fsgsbase: Fix some test case bugs · 697096b1
      Andy Lutomirski authored
      This refactors do_unexpected_base() to clean up some code.  It also
      fixes the following bugs in test_ptrace_write_gsbase():
      
       - Incorrect printf() format string caused crashes.
      
       - Hardcoded 0x7 for the gs selector was not reliably correct.
      
      It also documents the fact that the test is expected to fail on old
      kernels.
      
      Fixes: a87730cc ("selftests/x86/fsgsbase: Test ptracer-induced GSBASE write with FSGSBASE")
      Fixes: 1b6858d5 ("selftests/x86/fsgsbase: Test ptracer-induced GSBASE write")
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc:  "BaeChang Seok" <chang.seok.bae@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: "BaeChang Seok" <chang.seok.bae@intel.com>
      Link: https://lkml.kernel.org/r/bab29c84f2475e2c30ddb00f1b877fcd7f4f96a8.1562125333.git.luto@kernel.org
      
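To illustrate why a hardcoded 0x7 for the gs selector is fragile, here is a minimal sketch that decodes an x86 segment selector into its architectural fields (index, table indicator, requested privilege level). The helper name `decode_selector` and the second example value 0x63 are hypothetical and chosen only for illustration; they are not taken from the selftest.

```python
def decode_selector(selector):
    """Split a 16-bit x86 segment selector into its architectural fields."""
    return {
        "index": (selector >> 3) & 0x1FFF,            # bits 3..15: descriptor table index
        "table": "LDT" if selector & 0x4 else "GDT",  # bit 2: table indicator (TI)
        "rpl": selector & 0x3,                        # bits 0..1: requested privilege level
    }


# 0x7 only names LDT entry 0 at RPL 3. A segment that was set up through
# the GDT instead has a different selector value (0x63 is an illustrative
# GDT selector), so comparing against a constant 0x7 can fail.
print(decode_selector(0x7))
print(decode_selector(0x63))
```

Under this reading, a test that compares against the selector value actually loaded, rather than a constant, does not depend on which descriptor table was used.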
    • Merge tag 'perf-core-for-mingo-5.3-20190703' of... · a328a259
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-5.3-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      perf metrics:
      
        Andi Kleen:
      
        - Fixes for SkylakeX and CascadeLakeX Intel vendor events.
      
        - Avoid extra ':' for --raw metrics.
      
        - Don't include duration_time in group.
      
      perf script:
      
        Arnaldo Carvalho de Melo/Jiri Olsa:
      
        - Fix processing guest samples.
      
      perf diff:
      
        Jin Yao:
      
        - Do diffs by basic blocks.
      
      objtool:
      
        Jiri Olsa:
      
        - Fix build by linking against tools/lib/ctype.o sources.
      
      perf pmu:
      
        John Garry:
      
        - Support more complex PMU event aliasing.
      
        - Add support for Hisi hip08 DDRC, HHA and L3C PMU aliasing.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Merge tag 'perf-core-for-mingo-5.3-20190701' of... · a041ede0
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-5.3-20190701' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      perf annotate:
      
        Mao Han:
      
        - Add support for the csky processor architecture.
      
      perf stat:
      
        Andi Kleen:
      
        - Fix metrics with --no-merge.
      
        - Don't merge events in the same PMU.
      
        - Fix group lookup for metric group.
      
      Intel PT:
      
        Adrian Hunter:
      
        - Improve CBR (Core to Bus Ratio) packets support.
      
        - Fix thread stack return from kernel for kernel only case.
      
        - Export power and ptwrite events to sqlite and postgresql.
      
      core libraries:
      
        Arnaldo Carvalho de Melo:
      
        - Find routines in tools/perf/util/ that have implementations in the kernel
          libraries (lib/*.c), such as strreplace(), strim(), skip_spaces() and reuse
          them after making a copy into tools/lib and tools/include/.
      
          This continues the effort of having tools/ code looking as much as possible
          like kernel source code, to help encourage people to work on both the kernel
          and in tools hosted in the kernel sources.
      
          That in turn will help moving stuff that uses those routines to
          tools/lib/perf/ where they will be made available for use in other tools.
      
          In the process ditch old cruft, remove unused variables and add missing
          include directives for headers providing things used in places that were
          building by sheer luck.
      
        Kyle Meyer:
      
        - Bump MAX_NR_CPUS and MAX_CACHES to get these tools to work on more machines.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf script: Allow specifying the files to process guest samples · 15a108af
      Arnaldo Carvalho de Melo authored
      The 'perf kvm' command sets things up so that we can record, report, top,
      etc., but not 'script'. Make 'perf script' able to process guest samples
      by allowing the guest kallsyms, vmlinux, modules, etc. to be passed on the
      command line; if at least one of those is provided, set perf_guest to true
      so that guest samples get properly resolved.
      
      Testing it:
      
        # perf kvm --guest --guestkallsyms /wb/rhel6.kallsyms --guestmodules /wb/rhel6.modules record -e cycles:Gk
        ^C[ perf record: Woken up 7 times to write data ]
        [ perf record: Captured and wrote 3.602 MB perf.data.guest (10492 samples) ]

        #
        # perf evlist -i perf.data.guest
        cycles:Gk
        # perf evlist -v -i perf.data.guest
        cycles:Gk: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, exclude_user: 1, exclude_hv: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, exclude_host: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
        #
        # perf kvm --guestkallsyms /wb/rhel6.kallsyms --guestmodules /wb/rhel6.modules report --stdio -s sym | head -30
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 10K of event 'cycles:Gk'
        # Event count (approx.): 2434201408
        #
        # Overhead  Symbol
        # ........  ..............................................
        #
            11.93%  [g] avtab_search_node
             3.95%  [g] sidtab_context_to_sid
             2.41%  [g] n_tty_write
             2.20%  [g] _spin_unlock_irqrestore
             1.37%  [g] _aesni_dec4
             1.33%  [g] kmem_cache_alloc
             1.07%  [g] native_write_cr0
             0.99%  [g] kfree
             0.95%  [g] _spin_lock
             0.91%  [g] __memset
             0.87%  [g] schedule
             0.83%  [g] _spin_lock_irqsave
             0.76%  [g] __kmalloc
             0.67%  [g] avc_has_perm_noaudit
             0.66%  [g] kmem_cache_free
             0.65%  [g] glue_xts_crypt_128bit
             0.59%  [g] __d_lookup
             0.59%  [g] __audit_syscall_exit
             0.56%  [g] __memcpy
        #
      
      Then, when trying to use perf script to generate a python script and
      then process the events after adding a python hook for non-tracepoint
      events:
      
        # perf script -i perf.data.guest -g python
        generated Python script: perf-script.py
        # vim perf-script.py
        # tail -2 perf-script.py
        def process_event(param_dict):
              print(param_dict["symbol"])
        #
        # perf script -i perf.data.guest -s perf-script.py  | head
        in trace_begin
        vmx_vmexit
        vmx_vmexit
        vmx_vmexit
        vmx_vmexit
        vmx_vmexit
        vmx_vmexit
        vmx_vmexit
        vmx_vmexit
        vmx_vmexit
        231
        #
      
      We'd see just the vmx_vmexit, i.e. the samples from the guest don't show
      up.
      
      After this patch:
      
        # perf script --guestkallsyms /wb/rhel6.kallsyms --guestmodules /wb/rhel6.modules -i perf.data.guest -s perf-script.py 2> /dev/null | head -30
        in trace_begin
        apic_timer_interrupt
        apic_timer_interrupt
        apic_timer_interrupt
        apic_timer_interrupt
        apic_timer_interrupt
        save_args
        do_timer
        drain_array
        inode_permission
        avc_has_perm_noaudit
        run_timer_softirq
        apic_timer_interrupt
        apic_timer_interrupt
        apic_timer_interrupt
        apic_timer_interrupt
        apic_timer_interrupt
        kvm_guest_apic_eoi_write
        run_posix_cpu_timers
        _spin_lock
        handle_pte_fault
        rcu_irq_enter
        delay_tsc
        delay_tsc
        native_read_tsc
        apic_timer_interrupt
        sys_open
        internal_add_timer
        list_del
        rcu_exit_nohz
        #
      
      Jiri Olsa noticed that we need to set 'perf_guest' to true if we want to
      process guest samples, so I made it be set whenever one of the guest file
      settings is given via the command line options added in this patch, which
      match those present in the 'perf kvm' command.
      
      We probably want 'perf record', 'perf report', etc. to notice that there
      are guest samples and do the right thing, which is to look for files with
      some suffix that associates them with the guest used to collect the
      samples. For example, if a vmlinux file is passed, we can get the build-id
      from it; if not, we could use some other identifier or simply look for
      "kallsyms.guest", for instance, in the current directory.
      Reported-by: Mariano Pache <npache@redhat.com>
      Tested-by: Mariano Pache <npache@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
      Cc: Ali Raza <alirazabhutta.10@gmail.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Orran Krieger <okrieger@redhat.com>
      Cc: Ramkumar Ramachandra <artagnon@gmail.com>
      Cc: Yunlong Song <yunlong.song@huawei.com>
      Link: https://lkml.kernel.org/n/tip-d54gj64rerlxcqsrod05biwn@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
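The two-line `process_event()` hook shown in the commit message prints every symbol as it arrives. A slightly fuller sketch of such a handler is given below: it tallies symbols and prints the most frequent ones at the end. The `"symbol"` key comes from the commit message, and `trace_begin`, `process_event` and `trace_end` are the hooks that perf-generated Python scripts define. The `"[unknown]"` fallback label and the top-10 cutoff are assumptions made for this example.

```python
from collections import Counter

# Running tally of how many samples resolved to each symbol.
symbol_counts = Counter()


def trace_begin():
    # Called once before any event is processed.
    symbol_counts.clear()


def process_event(param_dict):
    # Samples that could not be resolved may lack a "symbol" entry;
    # "[unknown]" is a label chosen for this sketch.
    symbol_counts[param_dict.get("symbol", "[unknown]")] += 1


def trace_end():
    # Called once after the last event; print the ten most frequent symbols.
    for symbol, count in symbol_counts.most_common(10):
        print("%8d  %s" % (count, symbol))
```

Assuming the options added by this patch, it could be run the same way as the example above, e.g. `perf script --guestkallsyms ... -i perf.data.guest -s perf-script.py`, with the handler saved as `perf-script.py`.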
  3. 02 Jul, 2019 29 commits
  4. 28 Jun, 2019 2 commits
    • x86/mtrr: Skip cache flushes on CPUs with cache self-snooping · fd329f27
      Ricardo Neri authored
      Programming MTRR registers in multi-processor systems is a rather lengthy
      process. Furthermore, all processors must program these registers in lock
      step and with interrupts disabled; the process also involves flushing
      caches and TLBs twice. As a result, the process may take a considerable
      amount of time.
      
      On some platforms, this can lead to a large skew of the refined-jiffies
      clock source. Early when booting, if no other clock is available (e.g.,
      booting with hpet=disabled), the refined-jiffies clock source is used to
      monitor the TSC clock source. If the skew of refined-jiffies is too large,
      Linux wrongly assumes that the TSC is unstable:
      
        clocksource: timekeeping watchdog on CPU1: Marking clocksource
                     'tsc-early' as unstable because the skew is too large:
        clocksource: 'refined-jiffies' wd_now: fffedc10 wd_last:
                     fffedb90 mask: ffffffff
        clocksource: 'tsc-early' cs_now: 5eccfddebc cs_last: 5e7e3303d4
                     mask: ffffffffffffffff
        tsc: Marking TSC unstable due to clocksource watchdog
      
      As per measurements, around 98% of the time needed by the procedure to
      program MTRRs in multi-processor systems is spent flushing caches with
      wbinvd(). As per Section 11.11.8 of the Intel 64 and IA-32 Architectures
      Software Developer's Manual, it is not necessary to flush caches if the
      CPU supports cache self-snooping. Thus, skipping the cache flushes can
      reduce the time needed to complete the programming of the MTRR registers
      by several tens of milliseconds:
      
      Platform                        Before    After
      104-core (208 Threads) Skylake  1437ms     28ms
        2-core (  4 Threads) Haswell   114ms      2ms
      Reported-by: Mohammad Etemadi <mohammad.etemadi@intel.com>
      Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Alan Cox <alan.cox@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Andi Kleen <andi.kleen@intel.com>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jordan Borgner <mail@jordan-borgner.de>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Ricardo Neri <ricardo.neri@intel.com>
      Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Link: https://lkml.kernel.org/r/1561689337-19390-3-git-send-email-ricardo.neri-calderon@linux.intel.com
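As a quick cross-check of the figures in this commit message, the sketch below works out the time saved, the relative reduction and the speedup from the before/after numbers in the table. The measurements are copied from the table; the `summarize` helper and the output layout are assumptions of this example.

```python
# Before/after MTRR programming times in milliseconds, from the table above.
measurements = {
    "104-core (208 threads) Skylake": (1437, 28),
    "2-core (4 threads) Haswell": (114, 2),
}


def summarize(before_ms, after_ms):
    """Return (milliseconds saved, percent reduction, speedup factor)."""
    saved_ms = before_ms - after_ms
    return saved_ms, 100.0 * saved_ms / before_ms, before_ms / after_ms


for platform, (before_ms, after_ms) in measurements.items():
    saved_ms, percent, speedup = summarize(before_ms, after_ms)
    print("%-32s saved %4d ms (%.1f%%), %.1fx faster"
          % (platform, saved_ms, percent, speedup))
```

Both platforms come out at a reduction of roughly 98%, which is consistent with the statement that around 98% of the procedure's time is spent in wbinvd().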
    • x86/cpu/intel: Clear cache self-snoop capability in CPUs with known errata · 1e03bff3
      Ricardo Neri authored
      Processors which have self-snooping capability can handle conflicting
      memory types across CPUs by snooping their own caches. However, there
      exist CPU models in which having conflicting memory types still leads to
      unpredictable behavior, machine check errors, or hangs.
      
      Clear this feature on affected CPUs to prevent its use.
      Suggested-by: Alan Cox <alan.cox@intel.com>
      Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Andi Kleen <andi.kleen@intel.com>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jordan Borgner <mail@jordan-borgner.de>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Mohammad Etemadi <mohammad.etemadi@intel.com>
      Cc: Ricardo Neri <ricardo.neri@intel.com>
      Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Link: https://lkml.kernel.org/r/1561689337-19390-2-git-send-email-ricardo.neri-calderon@linux.intel.com
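One way to observe from user space whether a CPU advertises self-snoop is to look at the feature flags in /proc/cpuinfo. The sketch below assumes that the self-snoop capability is reported there as the `ss` flag, and that a capability cleared by the kernel no longer appears in the list; both are assumptions about the running kernel rather than statements from this commit. The helper names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the flag set of the first 'flags' line in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_self_snoop(cpuinfo_text):
    # Assumption: the self-snoop capability is listed as "ss".
    # Whole-word matching avoids false hits on "sse", "ssse3", etc.
    return "ss" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as cpuinfo:
            print("self-snoop advertised:", has_self_snoop(cpuinfo.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```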
  5. 26 Jun, 2019 3 commits