1. 25 Jun, 2019 9 commits
  2. 24 Jun, 2019 11 commits
    • Ian Rogers's avatar
      perf/cgroups: Don't rotate events for cgroups unnecessarily · fd7d5517
      Ian Rogers authored
      Currently perf_rotate_context assumes that if the context's nr_events !=
      nr_active a rotation is necessary for perf event multiplexing. With
      cgroups, nr_events is the total count of events for all cgroups and
      nr_active will not include events in a cgroup other than the current
      task's. This makes rotation appear necessary for cgroups when it is not.
      
      Add a perf_event_context flag that is set when rotation is necessary.
      Clear the flag during sched_out and set it when a flexible sched_in
      fails due to resources.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190601082722.44543-1-irogers@google.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      fd7d5517
    • Jiri Olsa's avatar
      perf/x86/rapl: Get quirk state from new probe framework · 637d97b5
      Jiri Olsa authored
      Getting the apply_quirk bool from new rapl_model_match array.
      
      And because apply_quirk was the last remaining piece of data
      in rapl_cpu_match, replacing it with rapl_model_match as device
      table.
      
      The switch to new perf_msr_probe detection API is done.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan <kan.liang@linux.intel.com>
      Cc: Liang
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190616140358.27799-9-jolsa@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      637d97b5
    • Jiri Olsa's avatar
      perf/x86/rapl: Get attributes from new probe framework · 5fc1bd84
      Jiri Olsa authored
      We no longer need model specific attribute arrays,
      because we get all this detected in rapl_events_attrs.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan <kan.liang@linux.intel.com>
      Cc: Liang
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190616140358.27799-8-jolsa@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5fc1bd84
    • Jiri Olsa's avatar
      perf/x86/rapl: Get MSR values from new probe framework · 122f1c51
      Jiri Olsa authored
      There's no need to have special code for getting
      the bit and MSR value for given event. We can
      now easily get it from rapl_msrs array.
      
      Also getting rid of RAPL_IDX_*, which is no longer
      needed and replacing INTEL_RAPL* with PERF_RAPL*
      enums.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan <kan.liang@linux.intel.com>
      Cc: Liang
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190616140358.27799-7-jolsa@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      122f1c51
    • Jiri Olsa's avatar
      perf/x86/rapl: Get rapl_cntr_mask from new probe framework · cd105aed
      Jiri Olsa authored
      We get rapl_cntr_mask from perf_msr_probe call, as a replacement
      for current intel_rapl_init_fun::cntr_mask value for each model.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan <kan.liang@linux.intel.com>
      Cc: Liang
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190616140358.27799-6-jolsa@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      cd105aed
    • Jiri Olsa's avatar
      perf/x86/rapl: Use new MSR detection interface · 5fb5273a
      Jiri Olsa authored
      Using perf_msr_probe function to probe for RAPL MSRs.
      
      Adding new rapl_model_match device table, that
      gathers events info for given model, following
      the MSR and cstate module design.
      
      It will replace the current rapl_cpu_match device
      table and detection code in following patches.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan <kan.liang@linux.intel.com>
      Cc: Liang
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190616140358.27799-5-jolsa@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5fb5273a
    • Jiri Olsa's avatar
      perf/x86/cstate: Use new probe function · 8f2a28c5
      Jiri Olsa authored
      Using perf_msr_probe function to probe for cstate events.
      
      The functionality is the same, with one exception, that
      perf_msr_probe checks for rdmsr to return value != 0 for
      given MSR register.
      
      Using the new attribute groups and adding the events via
      pmu::attr_update.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan <kan.liang@linux.intel.com>
      Cc: Liang
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190616140358.27799-4-jolsa@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8f2a28c5
    • Jiri Olsa's avatar
      perf/x86/msr: Use new probe function · dde5e720
      Jiri Olsa authored
      Using perf_msr_probe function to probe for msr events.
      
      The functionality is the same, with one exception, that
      perf_msr_probe checks for rdmsr to return value != 0 for
      given MSR register.
      
      Using the new attribute groups and adding the events via
      pmu::attr_update.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan <kan.liang@linux.intel.com>
      Cc: Liang
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190616140358.27799-3-jolsa@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      dde5e720
    • Jiri Olsa's avatar
      perf/x86: Add MSR probe interface · 98253a54
      Jiri Olsa authored
      Adding perf_msr_probe function to provide interface for
      checking up on MSR register and set the related attribute
      group visibility.
      
      User defines following struct for each MSR register:
      
        struct perf_msr {
             u64                       msr;
             struct attribute_group   *grp;
             bool                    (*test)(int idx, void *data);
             bool                      no_check;
        };
      
      Where:
        msr      - is the MSR address
        attrs    - is attribute groups array to add if the check passed
        test     - is test function pointer
        no_check - is bool that bypass the check and adds the
                    attribute without any test
      
      The array of struct perf_msr is passed into:
      
        perf_msr_probe(struct perf_msr *msr, int cnt, bool zero, void *data)
      
      Together with:
        cnt  - which is the number of struct msr array elements
        data - which is user pointer passed to the test function
        zero - allow counters that returns zero on rdmsr
      
      The perf_msr_probe will executed test code, read the MSR and
      check the value is != 0. If all these tests pass, related
      attribute group is kept visible.
      
      Also adding PMU_EVENT_GROUP macro helper to define attribute
      group for single attribute. It will be used in following patches.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan <kan.liang@linux.intel.com>
      Cc: Liang
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190616140358.27799-2-jolsa@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      98253a54
    • Ingo Molnar's avatar
    • Ingo Molnar's avatar
      b9271f0c
  3. 23 Jun, 2019 5 commits
    • Fenghua Yu's avatar
      Documentation/ABI: Document umwait control sysfs interfaces · 203dffac
      Fenghua Yu authored
      Since two new sysfs interface files are created for umwait control, add
      an ABI document entry for the files:
      
         /sys/devices/system/cpu/umwait_control/enable_c02
         /sys/devices/system/cpu/umwait_control/max_time
      
      [ tglx: Made the write value instructions readable ]
      Signed-off-by: default avatarFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarAshok Raj <ashok.raj@intel.com>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Andy Lutomirski" <luto@kernel.org>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-6-git-send-email-fenghua.yu@intel.com
      203dffac
    • Fenghua Yu's avatar
      x86/umwait: Add sysfs interface to control umwait maximum time · bd9a0c97
      Fenghua Yu authored
      IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta
      that processor can stay in C0.1 or C0.2. A zero value means no maximum
      time.
      
      Each instruction sets its own deadline in the instruction's implicit
      input EDX:EAX value. The instruction wakes up if the time-stamp counter
      reaches or exceeds the specified deadline, or the umwait maximum time
      expires, or a store happens in the monitored address range in umwait.
      
      The administrator can write an unsigned 32-bit number to
      /sys/devices/system/cpu/umwait_control/max_time to change the default
      value. Note that a value of zero means there is no limit. The lower two
      bits of the value must be zero.
      
      [ tglx: Simplify the write function. Massage changelog ]
      Signed-off-by: default avatarFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarAshok Raj <ashok.raj@intel.com>
      Reviewed-by: default avatarTony Luck <tony.luck@intel.com>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Andy Lutomirski" <luto@kernel.org>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-5-git-send-email-fenghua.yu@intel.com
      bd9a0c97
    • Fenghua Yu's avatar
      x86/umwait: Add sysfs interface to control umwait C0.2 state · ff4b353f
      Fenghua Yu authored
      C0.2 state in umwait and tpause instructions can be enabled or disabled
      on a processor through IA32_UMWAIT_CONTROL MSR register.
      
      By default, C0.2 is enabled and the user wait instructions results in
      lower power consumption with slower wakeup time.
      
      But in real time systems which require faster wakeup time although power
      savings could be smaller, the administrator needs to disable C0.2 and all
      umwait invocations from user applications use C0.1.
      
      Create a sysfs interface which allows the administrator to control C0.2
      state during run time.
      
      Andy Lutomirski suggested to turn off local irqs before writing the MSR to
      ensure the cached control value is not changed by a concurrent sysfs write
      from a different CPU via IPI.
      
      [ tglx: Simplified the update logic in the write function and got rid of
        	all the convoluted type casts. Added a shared update function and
      	made the namespace consistent. Moved the sysfs create invocation.
      	Massaged changelog ]
      Signed-off-by: default avatarFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarAshok Raj <ashok.raj@intel.com>
      Reviewed-by: default avatarTony Luck <tony.luck@intel.com>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Andy Lutomirski" <luto@kernel.org>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-4-git-send-email-fenghua.yu@intel.com
      ff4b353f
    • Fenghua Yu's avatar
      x86/umwait: Initialize umwait control values · bd688c69
      Fenghua Yu authored
      umwait or tpause allows the processor to enter a light-weight
      power/performance optimized state (C0.1 state) or an improved
      power/performance optimized state (C0.2 state) for a period specified by
      the instruction or until the system time limit or until a store to the
      monitored address range in umwait.
      
      IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on
      the processor and to set the maximum time the processor can reside in C0.1
      or C0.2.
      
      By default C0.2 is enabled so the user wait instructions can enter the
      C0.2 state to save more power with slower wakeup time.
      
      Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by
      default. A quote from Andy:
      
        "What I want to avoid is the case where it works dramatically differently
         on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may
         behave a bit differently if the max timeout is hit, and I'd like that
         path to get exercised widely by making it happen even on default
         configs."
      
      A sysfs interface to adjust the time and the C0.2 enablement is provided in
      a follow up change.
      
      [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to
        	MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as
        	mask throughout the code.
      	Massaged comments and changelog ]
      Signed-off-by: default avatarFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarAshok Raj <ashok.raj@intel.com>
      Reviewed-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
      bd688c69
    • Fenghua Yu's avatar
      x86/cpufeatures: Enumerate user wait instructions · 6dbbf5ec
      Fenghua Yu authored
      umonitor, umwait, and tpause are a set of user wait instructions.
      
      umonitor arms address monitoring hardware using an address. The
      address range is determined by using CPUID.0x5. A store to
      an address within the specified address range triggers the
      monitoring hardware to wake up the processor waiting in umwait.
      
      umwait instructs the processor to enter an implementation-dependent
      optimized state while monitoring a range of addresses. The optimized
      state may be either a light-weight power/performance optimized state
      (C0.1 state) or an improved power/performance optimized state
      (C0.2 state).
      
      tpause instructs the processor to enter an implementation-dependent
      optimized state C0.1 or C0.2 state and wake up when time-stamp counter
      reaches specified timeout.
      
      The three instructions may be executed at any privilege level.
      
      The instructions provide power saving method while waiting in
      user space. Additionally, they can allow a sibling hyperthread to
      make faster progress while this thread is waiting. One example of an
      application usage of umwait is when waiting for input data from another
      application, such as a user level multi-threaded packet processing
      engine.
      
      Availability of the user wait instructions is indicated by the presence
      of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].
      
      Detailed information on the instructions and CPUID feature WAITPKG flag
      can be found in the latest Intel Architecture Instruction Set Extensions
      and Future Features Programming Reference and Intel 64 and IA-32
      Architectures Software Developer's Manual.
      Signed-off-by: default avatarFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarAshok Raj <ashok.raj@intel.com>
      Reviewed-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-2-git-send-email-fenghua.yu@intel.com
      6dbbf5ec
  4. 22 Jun, 2019 15 commits
    • Linus Torvalds's avatar
      Linux 5.2-rc6 · 4b972a01
      Linus Torvalds authored
      4b972a01
    • Linus Torvalds's avatar
      Merge tag 'iommu-fix-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 6698a71a
      Linus Torvalds authored
      Pull iommu fix from Joerg Roedel:
       "Revert a commit from the previous pile of fixes which causes new
        lockdep splats. It is better to revert it for now and work on a better
        and more well tested fix"
      
      * tag 'iommu-fix-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
      6698a71a
    • Peter Xu's avatar
      Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock" · 0aafc8ae
      Peter Xu authored
      This reverts commit 7560cc3c.
      
      With 5.2.0-rc5 I can easily trigger this with lockdep and iommu=pt:
      
          ======================================================
          WARNING: possible circular locking dependency detected
          5.2.0-rc5 #78 Not tainted
          ------------------------------------------------------
          swapper/0/1 is trying to acquire lock:
          00000000ea2b3beb (&(&iommu->lock)->rlock){+.+.}, at: domain_context_mapping_one+0xa5/0x4e0
          but task is already holding lock:
          00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
          which lock already depends on the new lock.
          the existing dependency chain (in reverse order) is:
          -> #1 (device_domain_lock){....}:
                 _raw_spin_lock_irqsave+0x3c/0x50
                 dmar_insert_one_dev_info+0xbb/0x510
                 domain_add_dev_info+0x50/0x90
                 dev_prepare_static_identity_mapping+0x30/0x68
                 intel_iommu_init+0xddd/0x1422
                 pci_iommu_init+0x16/0x3f
                 do_one_initcall+0x5d/0x2b4
                 kernel_init_freeable+0x218/0x2c1
                 kernel_init+0xa/0x100
                 ret_from_fork+0x3a/0x50
          -> #0 (&(&iommu->lock)->rlock){+.+.}:
                 lock_acquire+0x9e/0x170
                 _raw_spin_lock+0x25/0x30
                 domain_context_mapping_one+0xa5/0x4e0
                 pci_for_each_dma_alias+0x30/0x140
                 dmar_insert_one_dev_info+0x3b2/0x510
                 domain_add_dev_info+0x50/0x90
                 dev_prepare_static_identity_mapping+0x30/0x68
                 intel_iommu_init+0xddd/0x1422
                 pci_iommu_init+0x16/0x3f
                 do_one_initcall+0x5d/0x2b4
                 kernel_init_freeable+0x218/0x2c1
                 kernel_init+0xa/0x100
                 ret_from_fork+0x3a/0x50
      
          other info that might help us debug this:
           Possible unsafe locking scenario:
                 CPU0                    CPU1
                 ----                    ----
            lock(device_domain_lock);
                                         lock(&(&iommu->lock)->rlock);
                                         lock(device_domain_lock);
            lock(&(&iommu->lock)->rlock);
      
           *** DEADLOCK ***
          2 locks held by swapper/0/1:
           #0: 00000000033eb13d (dmar_global_lock){++++}, at: intel_iommu_init+0x1e0/0x1422
           #1: 00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
      
          stack backtrace:
          CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5 #78
          Hardware name: LENOVO 20KGS35G01/20KGS35G01, BIOS N23ET50W (1.25 ) 06/25/2018
          Call Trace:
           dump_stack+0x85/0xc0
           print_circular_bug.cold.57+0x15c/0x195
           __lock_acquire+0x152a/0x1710
           lock_acquire+0x9e/0x170
           ? domain_context_mapping_one+0xa5/0x4e0
           _raw_spin_lock+0x25/0x30
           ? domain_context_mapping_one+0xa5/0x4e0
           domain_context_mapping_one+0xa5/0x4e0
           ? domain_context_mapping_one+0x4e0/0x4e0
           pci_for_each_dma_alias+0x30/0x140
           dmar_insert_one_dev_info+0x3b2/0x510
           domain_add_dev_info+0x50/0x90
           dev_prepare_static_identity_mapping+0x30/0x68
           intel_iommu_init+0xddd/0x1422
           ? printk+0x58/0x6f
           ? lockdep_hardirqs_on+0xf0/0x180
           ? do_early_param+0x8e/0x8e
           ? e820__memblock_setup+0x63/0x63
           pci_iommu_init+0x16/0x3f
           do_one_initcall+0x5d/0x2b4
           ? do_early_param+0x8e/0x8e
           ? rcu_read_lock_sched_held+0x55/0x60
           ? do_early_param+0x8e/0x8e
           kernel_init_freeable+0x218/0x2c1
           ? rest_init+0x230/0x230
           kernel_init+0xa/0x100
           ret_from_fork+0x3a/0x50
      
      domain_context_mapping_one() is taking device_domain_lock first then
      iommu lock, while dmar_insert_one_dev_info() is doing the reverse.
      
      That should be introduced by commit:
      
      7560cc3c ("iommu/vt-d: Fix lock inversion between iommu->lock and
                    device_domain_lock", 2019-05-27)
      
      So far I still cannot figure out how the previous deadlock was
      triggered (I cannot find iommu lock taken before calling of
      iommu_flush_dev_iotlb()), however I'm pretty sure that that change
      should be incomplete at least because it does not fix all the places
      so we're still taking the locks in different orders, while reverting
      that commit is very clean to me so far that we should always take
      device_domain_lock first then the iommu lock.
      
      We can continue to try to find the real culprit mentioned in
      7560cc3c, but for now I think we should revert it to fix current
      breakage.
      
      CC: Joerg Roedel <joro@8bytes.org>
      CC: Lu Baolu <baolu.lu@linux.intel.com>
      CC: dave.jiang@intel.com
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Tested-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      0aafc8ae
    • Linus Torvalds's avatar
      Merge tag 'pci-v5.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · b253d5f3
      Linus Torvalds authored
      Pull PCI fix from Bjorn Helgaas:
       "If an IOMMU is present, ignore the P2PDMA whitelist we added for v5.2
        because we don't yet know how to support P2PDMA in that case (Logan
        Gunthorpe)"
      
      * tag 'pci-v5.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present
      b253d5f3
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · f4102766
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Three driver fixes (and one version number update): a suspend hang in
        ufs, a qla hard lock on module removal and a qedi panic during
        discovery"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: qla2xxx: Fix hardlockup in abort command during driver remove
        scsi: ufs: Avoid runtime suspend possibly being blocked forever
        scsi: qedi: update driver version to 8.37.0.20
        scsi: qedi: Check targetname while finding boot target information
      f4102766
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · a8282bf0
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "This is a frustratingly large batch at rc5. Some of these were sent
        earlier but were missed by me due to being distracted by other things,
        and some took a while to track down due to needing manual bisection on
        old hardware. But still we clearly need to improve our testing of KVM,
        and of 32-bit, so that we catch these earlier.
      
        Summary: seven fixes, all for bugs introduced this cycle.
      
         - The commit to add KASAN support broke booting on 32-bit SMP
           machines, due to a refactoring that moved some setup out of the
           secondary CPU path.
      
         - A fix for another 32-bit SMP bug introduced by the fast syscall
           entry implementation for 32-bit BOOKE. And a build fix for the same
           commit.
      
         - Our change to allow the DAWR to be force enabled on Power9
           introduced a bug in KVM, where we clobber r3 leading to a host
           crash.
      
         - The same commit also exposed a previously unreachable bug in the
           nested KVM handling of DAWR, which could lead to an oops in a
           nested host.
      
         - One of the DMA reworks broke the b43legacy WiFi driver on some
           people's powermacs, fix it by enabling a 30-bit ZONE_DMA on 32-bit.
      
         - A fix for TLB flushing in KVM introduced a new bug, as it neglected
           to also flush the ERAT, this could lead to memory corruption in the
           guest.
      
        Thanks to: Aaro Koskinen, Christoph Hellwig, Christophe Leroy, Larry
        Finger, Michael Neuling, Suraj Jitindar Singh"
      
      * tag 'powerpc-5.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
        powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac
        KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in real mode
        KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()
        powerpc/32: fix build failure on book3e with KVM
        powerpc/booke: fix fast syscall entry on SMP
        powerpc/32s: fix initial setup of segment registers on secondary CPU
      a8282bf0
    • Marcel Holtmann's avatar
      Bluetooth: Fix regression with minimum encryption key size alignment · 693cd8ce
      Marcel Holtmann authored
      When trying to align the minimum encryption key size requirement for
      Bluetooth connections, it turns out doing this in a central location in
      the HCI connection handling code is not possible.
      
      Original Bluetooth version up to 2.0 used a security model where the
      L2CAP service would enforce authentication and encryption.  Starting
      with Bluetooth 2.1 and Secure Simple Pairing that model has changed into
      that the connection initiator is responsible for providing an encrypted
      ACL link before any L2CAP communication can happen.
      
      Now connecting Bluetooth 2.1 or later devices with Bluetooth 2.0 and
      before devices are causing a regression.  The encryption key size check
      needs to be moved out of the HCI connection handling into the L2CAP
      channel setup.
      
      To achieve this, the current check inside hci_conn_security() has been
      moved into l2cap_check_enc_key_size() helper function and then called
      from four decisions point inside L2CAP to cover all combinations of
      Secure Simple Pairing enabled devices and device using legacy pairing
      and legacy service security model.
      
      Fixes: d5bb334a ("Bluetooth: Align minimum encryption key size for LE and BR/EDR connections")
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203643Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      693cd8ce
    • Konstantin Khlebnikov's avatar
      x86/cpu: Disable frequency requests via aperfmperf IPI for nohz_full CPUs · cc9e303c
      Konstantin Khlebnikov authored
      Since commit 7d5905dc ("x86 / CPU: Always show current CPU frequency
      in /proc/cpuinfo") open and read of /proc/cpuinfo sends IPI to all CPUs.
      Many applications read /proc/cpuinfo at the start for trivial reasons like
      counting cores or detecting cpu features. While sensitive workloads like
      DPDK network polling don't like any interrupts.
      
      Integrates this feature with cpu isolation and do not send IPIs to CPUs
      without housekeeping flag HK_FLAG_MISC (set by nohz_full).
      
      Code that requests cpu frequency like show_cpuinfo() falls back to the last
      frequency set by the cpufreq driver if this method returns 0.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Link: https://lkml.kernel.org/r/155790354043.1104.15333317408370209.stgit@buzz
      cc9e303c
    • Tony W Wang-oc's avatar
      x86/acpi/cstate: Add Zhaoxin processors support for cache flush policy in C3 · f8c0e061
      Tony W Wang-oc authored
      Same as Intel, Zhaoxin MP CPUs support C3 share cache and on all
      recent Zhaoxin platforms ARB_DISABLE is a nop. So set related
      flags correctly in the same way as Intel does.
      Signed-off-by: default avatarTony W Wang-oc <TonyWWang-oc@zhaoxin.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: "hpa@zytor.com" <hpa@zytor.com>
      Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
      Cc: "rjw@rjwysocki.net" <rjw@rjwysocki.net>
      Cc: "lenb@kernel.org" <lenb@kernel.org>
      Cc: David Wang <DavidWang@zhaoxin.com>
      Cc: "Cooper Yan(BJ-RD)" <CooperYan@zhaoxin.com>
      Cc: "Qiyuan Wang(BJ-RD)" <QiyuanWang@zhaoxin.com>
      Cc: "Herry Yang(BJ-RD)" <HerryYang@zhaoxin.com>
      Link: https://lkml.kernel.org/r/a370503660994669991a7f7cda7c5e98@zhaoxin.com
      f8c0e061
    • Tony W Wang-oc's avatar
      ACPI, x86: Add Zhaoxin processors support for NONSTOP TSC · 773b2f30
      Tony W Wang-oc authored
      Zhaoxin CPUs have NONSTOP TSC feature, so enable the ACPI
      driver support for it.
      Signed-off-by: default avatarTony W Wang-oc <TonyWWang-oc@zhaoxin.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: "hpa@zytor.com" <hpa@zytor.com>
      Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
      Cc: "rjw@rjwysocki.net" <rjw@rjwysocki.net>
      Cc: "lenb@kernel.org" <lenb@kernel.org>
      Cc: David Wang <DavidWang@zhaoxin.com>
      Cc: "Cooper Yan(BJ-RD)" <CooperYan@zhaoxin.com>
      Cc: "Qiyuan Wang(BJ-RD)" <QiyuanWang@zhaoxin.com>
      Cc: "Herry Yang(BJ-RD)" <HerryYang@zhaoxin.com>
      Link: https://lkml.kernel.org/r/d1cfd937dabc44518d42038b55522c53@zhaoxin.com
      773b2f30
    • Tony W Wang-oc's avatar
      x86/cpu: Create Zhaoxin processors architecture support file · 761fdd5e
      Tony W Wang-oc authored
      Add x86 architecture support for new Zhaoxin processors.
      Carve out initialization code needed by Zhaoxin processors into
      a separate compilation unit.
      
      To identify Zhaoxin CPU, add a new vendor type X86_VENDOR_ZHAOXIN
      for system recognition.
      Signed-off-by: default avatarTony W Wang-oc <TonyWWang-oc@zhaoxin.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: "hpa@zytor.com" <hpa@zytor.com>
      Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
      Cc: "rjw@rjwysocki.net" <rjw@rjwysocki.net>
      Cc: "lenb@kernel.org" <lenb@kernel.org>
      Cc: David Wang <DavidWang@zhaoxin.com>
      Cc: "Cooper Yan(BJ-RD)" <CooperYan@zhaoxin.com>
      Cc: "Qiyuan Wang(BJ-RD)" <QiyuanWang@zhaoxin.com>
      Cc: "Herry Yang(BJ-RD)" <HerryYang@zhaoxin.com>
      Link: https://lkml.kernel.org/r/01042674b2f741b2aed1f797359bdffb@zhaoxin.com
      761fdd5e
    • Andy Shevchenko's avatar
      x86/cpu: Split Tremont based Atoms from the rest · 0a05fa67
      Andy Shevchenko authored
      Split Tremont based Atoms from the rest to keep logical grouping.
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/20190617115537.33309-1-andriy.shevchenko@linux.intel.com
      0a05fa67
    • Thomas Gleixner's avatar
      Documentation/x86/64: Add documentation for GS/FS addressing mode · 2c7b5ac5
      Thomas Gleixner authored
      Explain how the GS/FS based addressing can be utilized in user space
      applications along with the differences between the generic prctl() based
      GS/FS base control and the FSGSBASE version available on newer CPUs.
      Originally-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: "Bae, Chang Seok" <chang.seok.bae@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>,
      Cc: H . Peter Anvin <hpa@zytor.com>
      Cc: "Shankar, Ravi V" <ravi.v.shankar@intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1906132246310.1791@nanos.tec.linutronix.de
      2c7b5ac5
    • Andi Kleen's avatar
      x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2 · f987c955
      Andi Kleen authored
      The kernel needs to explicitly enable FSGSBASE. So, the application needs
      to know if it can safely use these instructions. Just looking at the CPUID
      bit is not enough because it may be running in a kernel that does not
      enable the instructions.
      
      One way for the application would be to just try and catch the SIGILL.
      But that is difficult to do in libraries which may not want to overwrite
      the signal handlers of the main application.
      
      Enumerate the enabled FSGSBASE capability in bit 1 of AT_HWCAP2 in the ELF
      aux vector. AT_HWCAP2 is already used by PPC for similar purposes.
      
      The application can access it open coded or by using the getauxval()
      function in newer versions of glibc.
      
      [ tglx: Massaged changelog ]
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarChang S. Bae <chang.seok.bae@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/1557309753-24073-18-git-send-email-chang.seok.bae@intel.com
      f987c955
    • Andy Lutomirski's avatar
      x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit · 2032f1f9
      Andy Lutomirski authored
      Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable
      FSGSBASE by default, and add nofsgsbase to disable it.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Signed-off-by: default avatarChang S. Bae <chang.seok.bae@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/1557309753-24073-17-git-send-email-chang.seok.bae@intel.com
      2032f1f9