    perf/x86/intel/lbr: Support Architectural LBR · 47125db2
    Kan Liang authored
    Last Branch Records (LBR) enable recording of software path history by
    logging taken branches and other control flows. Intel CPUs have had
    model-specific LBR for quite some time; Architectural LBR evolves it
    into an architectural feature, recorded in architectural registers.
    
    The main improvements of Architectural LBR implemented here include:
    - Linux kernel can support the LBR features without knowing the model
      number of the current CPU.
    - Architectural LBR capabilities are enumerated by CPUID, and the
      lbr_ctl_map is based on that CPUID enumeration.
    - The supported LBR depths can be retrieved from CPUID enumeration.
      The maximum value is written to the new MSR_ARCH_LBR_DEPTH as the
      number of LBR entries.
    - A new IA32_LBR_CTL MSR is introduced to enable and configure LBRs.
      It replaces IA32_DEBUGCTL[bit 0] and the LBR_SELECT MSR.
    - Each LBR record or entry is still composed of three MSRs,
      IA32_LBR_x_FROM_IP, IA32_LBR_x_TO_IP and IA32_LBR_x_INFO, but they
      are now architectural MSRs.
    - Architectural LBR is now stack-like. Entry 0 is always the youngest
      branch, entry 1 the next youngest, and so on. The TOS MSR has been
      removed.
    
    Architectural LBR is enabled and disabled much like the previous
    model-specific LBR. __intel_pmu_lbr_enable/disable() can be reused,
    with the following modifications:
    - MSR_ARCH_LBR_CTL is used to enable and configure the Architectural
      LBR.
    - When checking the value of the IA32_DEBUGCTL MSR, ignore
      DEBUGCTLMSR_LBR (bit 0) for Architectural LBR; it has no meaning and
      always reads as 0.
    - FREEZE_LBRS_ON_PMI has to be explicitly set or cleared, because
      MSR_IA32_DEBUGCTLMSR is not touched in __intel_pmu_lbr_disable() for
      Architectural LBR.
    - Only MSR_ARCH_LBR_CTL is cleared in __intel_pmu_lbr_disable() for
      Architectural LBR.
    
    Dedicated Architectural LBR functions are implemented to reset, read,
    save and restore the LBR.
    - For reset, writing to the ARCH_LBR_DEPTH MSR clears all Arch LBR
      entries. This is much faster than clearing the entries one by one
      and can improve context switch latency.
    - For read, the branch type information can be retrieved from
      MSR_ARCH_LBR_INFO_*. However, it is not fully compatible because of
      the OTHER_BRANCH type, so software decoding is still required for
      the OTHER_BRANCH case.
      LBR records are also stored in age order, so intel_pmu_store_lbr()
      is reused. The CPUID enumeration is checked before accessing the
      corresponding bits in LBR_INFO.
    - For save/restore, apply the fast reset (writing ARCH_LBR_DEPTH).
      Read 'lbr_from' of entry 0 instead of the TOS MSR to check whether
      the LBR registers were reset in a deep C-state. If the 'deep C-state
      reset' bit is not set in the CPUID enumeration, skip the check.
      XSAVE support for Architectural LBR will be implemented later.
    
    The number of LBR entries can no longer be hardcoded; it must be
    retrieved from CPUID enumeration. A new structure,
    x86_perf_task_context_arch_lbr, is introduced for Architectural LBR.

    Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/1593780569-62993-15-git-send-email-kan.liang@linux.intel.com