1. 29 Aug, 2024 4 commits
    • perf: cs-etm: Move traceid_list to each queue · 77c123f5
      James Clark authored
      The global list won't work for per-sink trace ID allocations, so put a
      list in each queue where the IDs will be unique to that queue.
      
      To keep the same behavior as before, for version 0 of the HW_ID packets,
      copy all the HW_ID mappings into all queues.
      
      This change doesn't affect the decoders, only trace ID lookups on the
      Perf side. The decoders are still created with global mappings which
      will be fixed in a later commit.
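
The idea of a per-queue trace ID list can be sketched roughly as follows. This is a minimal illustration, not the perf implementation: the `sketch_*` names, the fixed-size array and the 7-bit trace ID limit are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_TRACE_IDS 128 /* assumption: trace IDs fit in 7 bits */
#define SKETCH_NO_CPU (-1)

/* Hypothetical stand-in for a queue that owns its own trace ID list. */
struct sketch_queue {
	int cpu_for_id[SKETCH_MAX_TRACE_IDS]; /* trace ID -> CPU, or SKETCH_NO_CPU */
};

static void sketch_queue_init(struct sketch_queue *q)
{
	assert(q != NULL);
	for (size_t i = 0; i < SKETCH_MAX_TRACE_IDS; i++)
		q->cpu_for_id[i] = SKETCH_NO_CPU;
}

/*
 * Record a mapping in one queue only. Returns 0 on success, -1 if the ID
 * is out of range or already mapped to a different CPU in this queue.
 */
static int sketch_queue_map_id(struct sketch_queue *q, unsigned int trace_id,
			       int cpu)
{
	if (trace_id >= SKETCH_MAX_TRACE_IDS)
		return -1;
	if (q->cpu_for_id[trace_id] != SKETCH_NO_CPU &&
	    q->cpu_for_id[trace_id] != cpu)
		return -1;
	q->cpu_for_id[trace_id] = cpu;
	return 0;
}

/* Version 0 style behavior: copy the same mapping into every queue. */
static int sketch_map_id_all_queues(struct sketch_queue *queues,
				    size_t nr_queues, unsigned int trace_id,
				    int cpu)
{
	for (size_t i = 0; i < nr_queues; i++) {
		if (sketch_queue_map_id(&queues[i], trace_id, cpu))
			return -1;
	}
	return 0;
}

/* Look up which CPU a trace ID belongs to within one queue. */
static int sketch_queue_lookup(const struct sketch_queue *q,
			       unsigned int trace_id)
{
	if (trace_id >= SKETCH_MAX_TRACE_IDS)
		return SKETCH_NO_CPU;
	return q->cpu_for_id[trace_id];
}
```

With per-queue lists, the same trace ID can map to different CPUs in different queues, which a single global list could not express.
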
      Reviewed-by: Mike Leach <mike.leach@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20240722101202.26915-4-james.clark@linaro.org
      Signed-off-by: James Clark <james.clark@linaro.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf: cs-etm: Allocate queues for all CPUs · 57880a79
      James Clark authored
      Make cs_etm__setup_queue() set up a queue even if it's empty, and
      pre-allocate queues based on the max CPU that was recorded. In per-CPU
      mode, aux queues are indexed by CPU ID even if not all CPUs were
      recorded; sparse queue arrays aren't used.
      
      This will allow HW_IDs to be saved even if no aux data was received in
      that queue without having to call cs_etm__setup_queue() from two
      different places.
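
As a rough sketch of the pre-allocation described above, the following allocates one queue slot per CPU index up to the highest recorded CPU and marks every slot as set up, even when it holds no data. The `sketch_*` names and fields are hypothetical and do not mirror the real perf structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-CPU queue state; not the real perf structure. */
struct sketch_aux_queue {
	bool set_up;       /* state exists even if no AUX data arrived */
	size_t nr_buffers; /* number of AUX buffers received, may stay 0 */
};

/*
 * Allocate queues indexed by CPU ID, from 0 to the highest recorded CPU.
 * Returns NULL (and *nr_queues == 0) if nothing was recorded or on
 * allocation failure. The caller frees the result with free().
 */
static struct sketch_aux_queue *
sketch_alloc_queues(const int *recorded_cpus, size_t nr_cpus,
		    size_t *nr_queues)
{
	struct sketch_aux_queue *queues;
	int max_cpu = -1;

	assert(nr_queues != NULL);
	*nr_queues = 0;

	for (size_t i = 0; i < nr_cpus; i++) {
		if (recorded_cpus[i] > max_cpu)
			max_cpu = recorded_cpus[i];
	}
	if (max_cpu < 0)
		return NULL;

	queues = calloc((size_t)max_cpu + 1, sizeof(*queues));
	if (!queues)
		return NULL;

	/* Every slot gets set up, including CPUs that were not recorded. */
	for (size_t i = 0; i <= (size_t)max_cpu; i++)
		queues[i].set_up = true;

	*nr_queues = (size_t)max_cpu + 1;
	return queues;
}
```

The array is dense rather than sparse, so a queue can be looked up directly by CPU ID at the cost of a few unused slots.
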
      Reviewed-by: Mike Leach <mike.leach@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20240722101202.26915-3-james.clark@linaro.org
      Signed-off-by: James Clark <james.clark@linaro.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Create decoders after both AUX and HW_ID search passes · b6aa0de9
      James Clark authored
      Both of these passes gather information about how to create the
      decoders. AUX records determine formatted/unformatted, and the HW_IDs
      determine the traceID/metadata mappings.
      
      Therefore it makes sense to cache the information and wait until both
      passes are over before creating the decoders, rather than creating them
      at the first HW_ID found.
      
      This will allow a simplification of the creation process where
      cs_etm_queue->traceid_list will be exclusively used to create the
      decoders, rather than the current two methods depending on whether the
      trace is formatted or not.
      
      Previously the sample CPU from the AUX record was used to initialize
      the decoder CPU, but actually sample CPU == AUX queue index in per-CPU
      mode, so saving the sample CPU isn't required.
      
      Similarly formatted/unformatted was used upfront to create the decoders,
      but now it's cached until later.
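
The "cache first, create later" ordering can be illustrated with a small sketch. Everything here is hypothetical: the `sketch_*` names, the fields, and the rule that a queue without a known format or without any trace IDs gets no decoder are assumptions for the example, not the perf code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_format {
	SKETCH_FORMAT_UNSET,
	SKETCH_FORMATTED,
	SKETCH_UNFORMATTED,
};

/* Hypothetical per-queue cache filled in by the two search passes. */
struct sketch_queue_info {
	enum sketch_format format; /* cached by the AUX record pass */
	size_t nr_trace_ids;       /* cached by the HW_ID pass */
	bool decoder_created;
};

/* AUX pass: only remember whether the trace is formatted. */
static void sketch_on_aux_record(struct sketch_queue_info *q, bool formatted)
{
	q->format = formatted ? SKETCH_FORMATTED : SKETCH_UNFORMATTED;
}

/* HW_ID pass: only remember the mapping; do not create anything yet. */
static void sketch_on_hw_id(struct sketch_queue_info *q)
{
	q->nr_trace_ids++;
}

/*
 * Run once, after both passes have finished. Returns how many queues got
 * a decoder. Assumption: queues with no format or no IDs are skipped.
 */
static size_t sketch_create_decoders(struct sketch_queue_info *queues,
				     size_t nr_queues)
{
	size_t created = 0;

	for (size_t i = 0; i < nr_queues; i++) {
		struct sketch_queue_info *q = &queues[i];

		if (q->format == SKETCH_FORMAT_UNSET || q->nr_trace_ids == 0)
			continue;
		q->decoder_created = true;
		created++;
	}
	return created;
}
```

Because neither pass creates a decoder on its own, the order in which AUX records and HW_IDs are seen no longer matters in this sketch.
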
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Mike Leach <mike.leach@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Signed-off-by: James Clark <james.clark@linaro.org>
      Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
      Tested-by: Leo Yan <leo.yan@arm.com>
      Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20240722101202.26915-2-james.clark@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Revert "tools build: Remove leftover libcap tests that prevents fast path... · 0fd77ae4
      Arnaldo Carvalho de Melo authored
      Revert "tools build: Remove leftover libcap tests that prevents fast path feature detection from working"
      
      Ian pointed out that the libcap feature test is also used by bpftool, so
      we can't remove it just because perf stopped using it.  Revert the
      removal of the feature test.
      
      Since both perf and libcap use the fast path feature detection
      (tools/build/feature/test-all.c), probably the best thing is to keep
      libcap-devel when building perf even though it is not used there.
      
      This reverts commit 47b3b643.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 28 Aug, 2024 32 commits
  3. 26 Aug, 2024 1 commit
  4. 22 Aug, 2024 3 commits
    • perf python: Disable -Wno-cast-function-type-mismatch if present on clang · 00dc5146
      Arnaldo Carvalho de Melo authored
      The -Wcast-function-type-mismatch option was introduced in clang 19 and
      it's enabled by default.  Since we use -Werror, and the python bindings
      do casts that are valid but trip this warning, disable it if present.
      
      Closes: https://lore.kernel.org/all/CA+icZUXoJ6BS3GMhJHV3aZWyb5Cz2haFneX0C5pUMUUhG-UVKQ@mail.gmail.com
      Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org # To allow building with the upcoming clang 19
      Link: https://lore.kernel.org/lkml/CA+icZUVtHn8X1Tb_Y__c-WswsO0K8U9uy3r2MzKXwTA5THtL7w@mail.gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf python: Allow checking for the existence of warning options in clang · b8116230
      Arnaldo Carvalho de Melo authored
      We'll need to check if a warning option introduced in clang 19 is
      available on the clang version being used, so cover the error message
      emitted when testing for a -W option.
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/lkml/CA+icZUVtHn8X1Tb_Y__c-WswsO0K8U9uy3r2MzKXwTA5THtL7w@mail.gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate-data: Copy back variable types after move · 1cfd01eb
      Namhyung Kim authored
      In some cases, compilers don't set the location expression in DWARF
      precisely.  For instance, a compiler may assign a variable to a register
      after copying it from a different register.  The code should then use
      the new register for the new type, but it still uses the old register.
      This makes it hard to track the type information properly.
      
      This is an example I found in __tcp_transmit_skb().  The first argument
      (sk) of this function is a pointer to sock and there's a variable (tp)
      for tcp_sock.
      
        static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
        				int clone_it, gfp_t gfp_mask, u32 rcv_nxt)
        {
        	...
        	struct tcp_sock *tp;
      
        	BUG_ON(!skb || !tcp_skb_pcount(skb));
        	tp = tcp_sk(sk);
        	prior_wstamp = tp->tcp_wstamp_ns;
        	tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache);
        	...
      
      So it basically calls tcp_sk(sk) to get the tcp_sock pointer from sk.
      But it turned out to be the same value because tcp_sock embeds sock as
      the first member.  The sk is located in reg5 (RDI) and tp is in reg3
      (RBX).  The offset of tcp_wstamp_ns is 0x748 and tcp_clock_cache is
      0x750.  So you need to use RBX (reg3) to access the fields in the
      tcp_sock.  But the code used RDI (reg5) as it has the same value.
      
        $ pahole --hex -C tcp_sock vmlinux | grep -e 748 -e 750
      	u64                tcp_wstamp_ns;        /* 0x748   0x8 */
      	u64                tcp_clock_cache;      /* 0x750   0x8 */
      
      And this is the disassembly of the part of the function.
      
        <__tcp_transmit_skb>:
        ...
        44:  mov    %rdi, %rbx
        47:  mov    0x748(%rdi), %rsi
        4e:  mov    0x750(%rdi), %rax
        55:  cmp    %rax, %rsi
      
      Because the compiler attached the debug info to RBX, the tool only knows
      that RDI is a pointer to sock, and accessing those two fields through
      RDI resulted in an error due to the offset being beyond the type size.
      
        -----------------------------------------------------------
        find data type for 0x748(reg5) at __tcp_transmit_skb+0x63
        CU for net/ipv4/tcp_output.c (die:0x817f543)
        frame base: cfa=0 fbreg=6
        scope: [1/1] (die:81aac3e)
        bb: [0 - 30]
        var [0] -0x98(stack) type='struct tcp_out_options' size=0x28 (die:0x81af3df)
        var [5] reg8 type='unsigned int' size=0x4 (die:0x8180ed6)
        var [5] reg2 type='unsigned int' size=0x4 (die:0x8180ed6)
        var [5] reg1 type='int' size=0x4 (die:0x818059e)
        var [5] reg4 type='struct sk_buff*' size=0x8 (die:0x8181360)
        var [5] reg5 type='struct sock*' size=0x8 (die:0x8181a0c)                   <<<--- the first argument ('sk' at %RDI)
        mov [19] reg8 -> -0xa8(stack) type='unsigned int' size=0x4 (die:0x8180ed6)
        mov [20] stack canary -> reg0
        mov [29] reg0 -> -0x30(stack) stack canary
        bb: [36 - 3e]
        mov [36] reg4 -> reg15 type='struct sk_buff*' size=0x8 (die:0x8181360)
        bb: [44 - 63]
        mov [44] reg5 -> reg3 type='struct sock*' size=0x8 (die:0x8181a0c)          <<<--- calling tcp_sk()
        var [47] reg3 type='struct tcp_sock*' size=0x8 (die:0x819eead)              <<<--- new variable ('tp' at %RBX)
        var [4e] reg4 type='unsigned long long' size=0x8 (die:0x8180edd)
        mov [58] reg4 -> -0xc0(stack) type='unsigned long long' size=0x8 (die:0x8180edd)
        chk [63] reg5 offset=0x748 ok=1 kind=1 (struct sock*) : offset bigger than size    <<<--- access with old variable
        final result: offset bigger than size
      
      While it's a fault in the compiler, we could work around this issue by
      using the type of the new variable when it's copied directly.  So I've
      added a copied_from field in the register state to track those direct
      register to register copies.  If the new register then gets a new type
      while the old register still has the same type, it'll update (copy
      back) the type of the old register.
      
      For example, if we can update type of reg5 at __tcp_transmit_skb+0x47,
      we can find the target type of the instruction at 0x63 like below:
      
        -----------------------------------------------------------
        find data type for 0x748(reg5) at __tcp_transmit_skb+0x63
        ...
        bb: [44 - 63]
        mov [44] reg5 -> reg3 type='struct sock*' size=0x8 (die:0x8181a0c)
        var [47] reg3 type='struct tcp_sock*' size=0x8 (die:0x819eead)
        var [47] copyback reg5 type='struct tcp_sock*' size=0x8 (die:0x819eead)     <<<--- here
        mov [47] 0x748(reg5) -> reg4 type='unsigned long long' size=0x8 (die:0x8180edd)
        mov [4e] 0x750(reg5) -> reg0 type='unsigned long long' size=0x8 (die:0x8180edd)
        mov [58] reg4 -> -0xc0(stack) type='unsigned long long' size=0x8 (die:0x8180edd)
        chk [63] reg5 offset=0x748 ok=1 kind=1 (struct tcp_sock*) : Good!           <<<--- new type
        found by insn track: 0x748(reg5) type-offset=0x748
        final result:  type='struct tcp_sock' size=0xa98 (die:0x819eeb2)
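
The copy-back idea can be sketched in a few lines of C. This is a simplified illustration, not the perf code: apart from the `copied_from` field name and the register numbers and tcp_sock size quoted above, every name is hypothetical, types are compared by pointer, and invalidating `copied_from` when a register is overwritten is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_REGS 16
#define SKETCH_NO_REG (-1)

/* Hypothetical type record: size is that of the pointed-to struct. */
struct sketch_type {
	const char *name;
	size_t size;
};

struct sketch_reg_state {
	const struct sketch_type *type; /* NULL when unknown */
	int copied_from;                /* source of a direct copy, or SKETCH_NO_REG */
};

struct sketch_state {
	struct sketch_reg_state regs[SKETCH_NR_REGS];
};

static void sketch_state_init(struct sketch_state *st)
{
	for (int i = 0; i < SKETCH_NR_REGS; i++) {
		st->regs[i].type = NULL;
		st->regs[i].copied_from = SKETCH_NO_REG;
	}
}

/* Model "mov %src, %dst": dst inherits the type and remembers its source. */
static void sketch_mov_reg(struct sketch_state *st, int src, int dst)
{
	st->regs[dst].type = st->regs[src].type;
	st->regs[dst].copied_from = src;
}

/* Debug info says a variable of 'type' now lives in register 'reg'. */
static void sketch_set_var_type(struct sketch_state *st, int reg,
				const struct sketch_type *type)
{
	struct sketch_reg_state *r = &st->regs[reg];
	const struct sketch_type *old = r->type;
	int src = r->copied_from;

	r->type = type;
	/* Copy back: the source still holds the type it had at the copy. */
	if (src != SKETCH_NO_REG && st->regs[src].type == old)
		st->regs[src].type = type;
}

/* The access is plausible only if the offset lies inside the type. */
static bool sketch_offset_ok(const struct sketch_state *st, int reg,
			     size_t offset)
{
	const struct sketch_type *t = st->regs[reg].type;

	return t != NULL && offset < t->size;
}
```

Replaying the example with reg5 holding a sock pointer, a move from reg5 to reg3, and a tcp_sock variable declared in reg3 makes the 0x748 access through reg5 pass the offset check in this sketch, provided the size given for struct sock (not stated in the message above) is smaller than 0x748.
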
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240821232628.353177-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>