    x86, perf event: Turn off unstructured raw event access to offcore registers · b52c55c6
    Ingo Molnar authored
    Andi Kleen pointed out that the Intel offcore support patches were merged
    without user-space tool support for the functionality:
    
     |
     | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
     | user space bits were not. This made it impossible to set the extra mask
     | and actually do the OFFCORE profiling
     |
    
    Andi submitted a preliminary patch for user-space support, as an
    extension to perf's raw event syntax:
    
     |
     | Some raw events -- like the Intel OFFCORE events -- support additional
     | parameters. These can be appended after a ':'.
     |
     | For example on a multi socket Intel Nehalem:
     |
     |    perf stat -e r1b7:20ff -a sleep 1
     |
     | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
     | that measures any access to DRAM on another socket.
     |
    
    But this kind of usability is absolutely unacceptable - users should not
    be expected to type in magic, CPU and model specific incantations to get
    access to useful hardware functionality.
    
    The proper solution is to expose useful offcore functionality via
    generalized events - that way users do not have to care which specific
    CPU model they are using, they can use the conceptual event and not some
    model-specific, quirky hex number.
    
    We already have such generalization in place for CPU cache events,
    and it's all very extensible.
    
    "Offcore" events measure general DRAM access patterns along various
    parameters. They are particularly useful in NUMA systems.
    
    We want to support them via generalized DRAM events: either as the
    fourth level of cache (after the last-level cache), or as a separate
    generalization category.
    
    That way user-space support would be very obvious: memory access
    profiling could be done via self-explanatory commands like:
    
      perf record -e dram ./myapp
      perf record -e dram-remote ./myapp
    
    ... to measure DRAM accesses or more expensive cross-node NUMA DRAM
    accesses.
    
    These generalized events would work on all CPUs and architectures that
    have comparable PMU features.
    
    ( Note, these are just examples: actual implementation could have more
      sophistication and more parameters - as long as they center around
      similarly simple use cases. )
    
    Now we do not want to revert *all* of the current offcore bits, as they
    are still somewhat useful for generic last-level-cache events, implemented
    in this commit:
    
      e994d7d2: perf: Fix LLC-* events on Intel Nehalem/Westmere
    
    But we definitely do not yet want to expose the unstructured raw events
    to user-space, until better generalization and usability are implemented
    for these hardware event features.
    
    ( Note: after generalization has been implemented raw offcore events can be
      supported as well: there can always be an odd event that is marginally
      useful but not useful enough to generalize. DRAM profiling is definitely
      *not* such a category so generalization must be done first. )
    
    Furthermore, PERF_TYPE_RAW access to these registers was not intended
    to go upstream without proper support - it was a side-effect of the above
    e994d7d2 commit, not mentioned in the changelog.
    
    As v2.6.39 is nearing release we go for the simplest approach: disable
    the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
    kernel and becomes an ABI.
    
    Once proper structure is implemented for these hardware events and users
    are offered usable solutions we can revisit this issue.

    Reported-by: Andi Kleen <ak@linux.intel.com>
    Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf_event.c 42.6 KB