1. 27 Aug, 2022 8 commits
    • perf stat: Capitalize topdown metrics' names · 48648548
      Zhengjun Xing authored
      Capitalize topdown metrics' names to follow the Intel SDM.
      
      Before:
      
       # ./perf stat -a  sleep 1
      
       Performance counter stats for 'system wide':
      
              228,094.05 msec cpu-clock                        #  225.026 CPUs utilized
                     842      context-switches                 #    3.691 /sec
                     224      cpu-migrations                   #    0.982 /sec
                      70      page-faults                      #    0.307 /sec
              23,164,105      cycles                           #    0.000 GHz
              29,403,446      instructions                     #    1.27  insn per cycle
               5,268,185      branches                         #   23.097 K/sec
                  33,239      branch-misses                    #    0.63% of all branches
             136,248,990      slots                            #  597.337 K/sec
              32,976,450      topdown-retiring                 #     24.2% retiring
               4,651,918      topdown-bad-spec                 #      3.4% bad speculation
              26,148,695      topdown-fe-bound                 #     19.2% frontend bound
              72,515,776      topdown-be-bound                 #     53.2% backend bound
               6,008,540      topdown-heavy-ops                #      4.4% heavy operations       #     19.8% light operations
               3,934,049      topdown-br-mispredict            #      2.9% branch mispredict      #      0.5% machine clears
              16,655,439      topdown-fetch-lat                #     12.2% fetch latency          #      7.0% fetch bandwidth
              41,635,972      topdown-mem-bound                #     30.5% memory bound           #     22.7% Core bound
      
             1.013634593 seconds time elapsed
      
      After:
      
       # ./perf stat -a  sleep 1
      
       Performance counter stats for 'system wide':
      
              228,081.94 msec cpu-clock                        #  225.003 CPUs utilized
                     824      context-switches                 #    3.613 /sec
                     224      cpu-migrations                   #    0.982 /sec
                      67      page-faults                      #    0.294 /sec
              22,647,423      cycles                           #    0.000 GHz
              28,870,551      instructions                     #    1.27  insn per cycle
               5,167,099      branches                         #   22.655 K/sec
                  32,383      branch-misses                    #    0.63% of all branches
             133,411,074      slots                            #  584.926 K/sec
              32,352,607      topdown-retiring                 #     24.3% Retiring
               4,456,977      topdown-bad-spec                 #      3.3% Bad Speculation
              25,626,487      topdown-fe-bound                 #     19.2% Frontend Bound
              70,955,316      topdown-be-bound                 #     53.2% Backend Bound
               5,834,844      topdown-heavy-ops                #      4.4% Heavy Operations       #     19.9% Light Operations
               3,738,781      topdown-br-mispredict            #      2.8% Branch Mispredict      #      0.5% Machine Clears
              16,286,803      topdown-fetch-lat                #     12.2% Fetch Latency          #      7.0% Fetch Bandwidth
              40,802,069      topdown-mem-bound                #     30.6% Memory Bound           #     22.6% Core Bound
      
             1.013683125 seconds time elapsed
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220825015458.3252239-1-zhengjun.xing@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
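The percentages in the right-hand column of the output above can be checked by hand. The following is a minimal sketch, assuming each topdown percentage is the event count divided by the `slots` count. That assumption matches the printed figures but is not a statement about perf's internals, and `topdown_pct` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: share of pipeline slots attributed to one
 * topdown event, expressed as a percentage. The caller must pass a
 * non-zero slots count.
 */
double topdown_pct(uint64_t count, uint64_t slots)
{
	assert(slots > 0);
	return 100.0 * (double)count / (double)slots;
}
```

For the "After" run, `topdown_pct(32352607, 133411074)` is about 24.25, which is printed as 24.3% Retiring. Subtracting the heavy-operations share (about 4.37) leaves about 19.88, in line with the 19.9% Light Operations column.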
    • perf docs: Update the documentation for the save_type filter · 3126204c
      Kan Liang authored
      Update the documentation to reflect the kernel changes.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20220816125612.2042397-2-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sched: Fix memory leaks in __cmd_record detected with -fsanitize=address · d72e5cf3
      Ian Rogers authored
      An array of strings is passed to cmd_record but not freed. As
      cmd_record modifies the array, add a second array as a copy that can be
      mutated, allowing all of the original array's contents to be freed.
      
      Detected with -fsanitize=address.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20220824145733.409005-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
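The ownership pattern described in the perf sched commit above can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch: `consume_args` is a hypothetical stand-in for a callee that rewrites its argument array in place, and `run_with_owned_args` and `dup_string` are made-up names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a callee that rewrites argv in place. */
static int consume_args(int argc, const char **argv)
{
	int seen = 0;

	for (int i = 0; i < argc; i++) {
		if (argv[i])
			seen++;
		argv[i] = NULL;	/* the pointer stored in this slot is lost */
	}
	return seen;
}

/* Duplicate a string with malloc(); returns NULL on allocation failure. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Build an owned argument array, hand a shallow copy to the callee,
 * then free every string through the untouched owner array.
 * Returns the callee's result, or -1 on allocation failure.
 */
int run_with_owned_args(int argc, const char *const *src)
{
	assert(argc >= 0);

	char **owned = calloc((size_t)argc + 1, sizeof(*owned));
	const char **view = calloc((size_t)argc + 1, sizeof(*view));
	int ret = -1;

	if (!owned || !view)
		goto out;

	for (int i = 0; i < argc; i++) {
		owned[i] = dup_string(src[i]);
		if (!owned[i])
			goto out;
		view[i] = owned[i];	/* shallow copy: same strings */
	}
	ret = consume_args(argc, view);
out:
	if (owned) {
		for (int i = 0; i < argc; i++)
			free(owned[i]);	/* free(NULL) is a no-op */
	}
	free(owned);
	free(view);
	return ret;
}
```

Because the callee only ever sees `view`, every string stays reachable through `owned` and can be freed no matter how the callee reorders or clears its array.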
    • perf record: Fix manpage formatting of description of support to hybrid systems · e89eaa61
      Andi Kleen authored
      The Intel hybrid description is written in a different style than the
      rest of the perf record man page. There were some new command line
      options added after it which resulted in very strange section ordering.
      Move the hybrid include last.
      
      Also the sub sections in the hybrid document don't fit the record
      manpage well (especially since it talks about all kinds of unrelated
      commands). I left this for now, but it would be better to separate this
      properly into the different man pages.
      
      It would be better to use sub sections for the other sections, but these
      don't seem to be supported in AsciiDoc?
      
      Some of the examples are still misrendered in the manpage with an
      indented troff command, but I don't know how to fix that.
      
      In any case it's now better than before.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220818100127.249401-1-ak@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf test: Stat test for repeat with a weak group · 0c361c6e
      Ian Rogers authored
      Breaking a weak group requires multiple passes over an evlist. With
      multiple runs, this can introduce bugs that ultimately lead to
      segfaults. Add a test to cover this.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20220822213352.75721-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Clear evsel->reset_group for each stat run · bf515f02
      Ian Rogers authored
      If a weak group is broken then the reset_group flag remains set for
      the next run. With reset_group still set, the counter isn't created,
      which ultimately leads to a segfault.
      
      A simple reproduction of this is:
      
        # perf stat -r2 -e '{cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles}:W'
      
      which will be added as a test in the next patch.
      
      Fixes: 4804e011 ("perf stat: Use affinity for opening events")
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20220822213352.75721-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
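The stale-flag failure described in the reset_group commit above can be reproduced in a toy model. The sketch below is an assumption-laden illustration, not perf's code: `struct toy_counter`, its fields, and both functions are hypothetical, and the two-pass open loop is only loosely modeled on the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an event; these fields do not mirror perf's structs. */
struct toy_counter {
	bool group_intact;	/* still a member of an unbroken weak group */
	bool reset_group;	/* marked for re-opening after a group break */
	bool opened;
};

/*
 * One "stat run": try to open every counter. A counter that is still in
 * an intact weak group fails to open, breaks the group, and is opened on
 * a second pass instead.
 */
static void toy_open_all(struct toy_counter *c, size_t n, bool clear_flag)
{
	bool second_pass = false;

	for (size_t i = 0; i < n; i++) {
		c[i].opened = false;
		if (clear_flag)
			c[i].reset_group = false;	/* the per-run reset */
		if (c[i].reset_group)
			continue;	/* deferred to the second pass */
		if (c[i].group_intact) {
			c[i].group_intact = false;
			c[i].reset_group = true;
			second_pass = true;
			continue;
		}
		c[i].opened = true;
	}

	if (!second_pass)
		return;
	for (size_t i = 0; i < n; i++) {
		if (c[i].reset_group)
			c[i].opened = true;
	}
}

/* Run twice (as with -r2) and count counters opened in the second run. */
size_t toy_opened_in_second_run(bool clear_flag)
{
	struct toy_counter c[3];
	size_t opened = 0;

	for (size_t i = 0; i < 3; i++)
		c[i] = (struct toy_counter){ .group_intact = true };

	toy_open_all(c, 3, clear_flag);
	for (size_t i = 0; i < 3; i++)
		assert(c[i].opened);	/* the first run always succeeds */

	toy_open_all(c, 3, clear_flag);
	for (size_t i = 0; i < 3; i++)
		opened += c[i].opened ? 1 : 0;
	return opened;
}
```

In this model, `toy_opened_in_second_run(false)` leaves every counter unopened on the second run because the stale flag defers them to a second pass that never happens, while `toy_opened_in_second_run(true)` opens all three.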
    • tools kvm headers arm64: Update KVM header from the kernel sources · dbcfe5ec
      Arnaldo Carvalho de Melo authored
      To pick the changes from:
      
        ae3b1da9 ("KVM: arm64: Fix compile error due to sign extension")
      
      That doesn't result in any changes in tooling (when built on x86); it
      only addresses this perf build warning:
      
        Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
        diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Yang Yingliang <yangyingliang@huawei.com>
      Link: https://lore.kernel.org/all/YwOMCCc4E79FuvDe@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf python: Fix build when PYTHON_CONFIG is user supplied · bc9e7fe3
      James Clark authored
      The previous change to Python autodetection had a small mistake where
      the auto value was used to determine the Python binary, rather than the
      user supplied value. The Python binary is only used for one part of the
      build process, rather than the final linking, so it was producing
      correct builds in most scenarios, especially when the auto detected
      value matched what the user wanted, or the system only had a valid set
      of Pythons.
      
      Change it so that the Python binary path is derived from either the
      PYTHON_CONFIG value or PYTHON value, depending on what is specified by
      the user. This was the original intention.
      
      This error was spotted in a build failure in an odd cross-compilation
      environment after commit 4c41cb46 ("perf python: Prefer python3") was
      merged.
      
      Fixes: 630af16e ("perf tools: Use Python devtools for version autodetection rather than runtime")
      Signed-off-by: James Clark <james.clark@arm.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220728093946.1337642-1-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 26 Aug, 2022 8 commits
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · e022620b
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "A bumper crop of arm64 fixes for -rc3.
      
        The largest change is fixing our parsing of the 'rodata=full' command
        line option, which kstrtobool() started treating as 'rodata=false'.
        The fix actually makes the parsing of that option much less fragile
        and updates the documentation at the same time.
      
        We still have a boot issue pending when KASLR is disabled at compile
        time, but there's a fresh fix on the list which I'll send next week if
        it holds up to testing.
      
        Summary:
      
         - Fix workaround for Cortex-A76 erratum #1286807
      
         - Add workaround for AMU erratum #2457168 on Cortex-A510
      
         - Drop reference to removed CONFIG_ARCH_RANDOM #define
      
         - Fix parsing of the "rodata=full" cmdline option
      
         - Fix a bunch of issues in the SME register state switching and sigframe code
      
         - Fix incorrect extraction of the CTR_EL0.CWG register field
      
         - Fix ACPI cache topology probing when the PPTT is not present
      
         - Trivial comment and whitespace fixes"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64/sme: Don't flush SVE register state when handling SME traps
        arm64/sme: Don't flush SVE register state when allocating SME storage
        arm64/signal: Flush FPSIMD register state when disabling streaming mode
        arm64/signal: Raise limit on stack frames
        arm64/cache: Fix cache_type_cwg() for register generation
        arm64/sysreg: Guard SYS_FIELD_ macros for asm
        arm64/sysreg: Directly include bitfield.h
        arm64: cacheinfo: Fix incorrect assignment of signed error value to unsigned fw_level
        arm64: errata: add detection for AMEVCNTR01 incrementing incorrectly
        arm64: fix rodata=full
        arm64: Fix comment typo
        docs/arm64: elf_hwcaps: unify newlines in HWCAP lists
        arm64: adjust KASLR relocation after ARCH_RANDOM removal
        arm64: Fix match_list for erratum 1286807 on Arm Cortex-A76
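The 'rodata=full' problem mentioned in the arm64 pull message above comes down to a value that looks boolean by its first letter. A minimal sketch, assuming a simplified first-character boolean parser: `first_char_bool` is a hypothetical stand-in, not kstrtobool()'s actual code, and `parse_rodata` here is an illustrative exact-match parser rather than the kernel's implementation.

```c
#include <assert.h>
#include <string.h>

enum rodata_mode {
	RODATA_INVALID = -1,
	RODATA_OFF,
	RODATA_ON,
	RODATA_FULL,
};

/*
 * Hypothetical first-character boolean guess.
 * Returns 1 for true, 0 for false, -1 when the input is not recognised.
 */
int first_char_bool(const char *s)
{
	assert(s != NULL);
	switch (s[0]) {
	case 'y': case 'Y': case 't': case 'T': case '1':
		return 1;
	case 'n': case 'N': case 'f': case 'F': case '0':
		return 0;
	default:
		return -1;
	}
}

/* Exact-match parser: "full" can no longer be mistaken for "false". */
enum rodata_mode parse_rodata(const char *arg)
{
	assert(arg != NULL);
	if (strcmp(arg, "full") == 0)
		return RODATA_FULL;
	if (strcmp(arg, "on") == 0)
		return RODATA_ON;
	if (strcmp(arg, "off") == 0)
		return RODATA_OFF;
	return RODATA_INVALID;
}
```

With the first-character guess, `"full"` reads as false because it starts with `f`. Matching the complete string avoids that ambiguity and rejects anything unexpected.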
    • Merge tag 'riscv-for-linus-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 012bd7e8
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - A handful of fixes for the Microchip device trees
      
       - A pair of fixes to eliminate build warnings
      
      * tag 'riscv-for-linus-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: dts: microchip: mpfs: remove pci axi address translation property
        riscv: dts: microchip: mpfs: remove bogus card-detect-delay
        riscv: dts: microchip: mpfs: remove ti,fifo-depth property
        riscv: dts: microchip: mpfs: fix incorrect pcie child node name
        riscv: traps: add missing prototype
        riscv: signal: fix missing prototype warning
        riscv: dts: microchip: correct L2 cache interrupts
    • Merge tag 'loongarch-fixes-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson · c23f864d
      Linus Torvalds authored
      Pull LoongArch fixes from Huacai Chen:
       "Fix a bunch of build errors/warnings, a poweroff error and an
        unbalanced locking in do_page_fault()"
      
      * tag 'loongarch-fixes-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        LoongArch: mm: Avoid unnecessary page fault retires on shared memory types
        LoongArch: Add subword xchg/cmpxchg emulation
        LoongArch: Cleanup headers to avoid circular dependency
        LoongArch: Cleanup reset routines with new API
        LoongArch: Fix build warnings in VDSO
        LoongArch: Select PCI_QUIRKS to avoid build error
    • Merge tag 'drm-fixes-2022-08-26-1' of git://anongit.freedesktop.org/drm/drm · 78effb4a
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Weekly fixes, lots of amdgpu fixes mostly for IP blocks introduced in
        6.0-rc1, otherwise vc4, nouveau fixes.
      
        gem:
         - Fix handle release leak
      
        nouveau:
         - Fix fencing when moving BO
      
        vc4:
         - HDMI fixes
      
        amdgpu:
         - GFX 11.0 fixes
         - PSP XGMI handling fixes
         - GFX9 fix for compute-only IPs
         - Drop duplicated function call
         - Fix warning due to missing header
         - NBIO 7.7 fixes
         - DCN 3.1.4 fixes
         - SDMA 6.0 fixes
         - SMU 13.0 fixes
         - Arcturus GPUVM page table fix
         - MMHUB 1.0 fix
      
        amdkfd:
         - GC 10.3.7 fix
      
        radeon:
         - Delayed work flush fix"
      
      * tag 'drm-fixes-2022-08-26-1' of git://anongit.freedesktop.org/drm/drm: (21 commits)
        drm/amdgpu: mmVM_L2_CNTL3 register not initialized correctly
        drm/amdgpu: add MGCG perfmon setting for gfx11
        drm/amdkfd: Fix isa version for the GC 10.3.7
        drm/amdgpu: Fix page table setup on Arcturus
        drm/amd/pm: update SMU 13.0.0 driver_if header
        drm/amdgpu: add sdma instance check for gfx11 CGCG
        drm/amd/display: enable PCON support for dcn314
        drm/amdgpu: enable NBIO IP v7.7.0 Clock Gating
        drm/amdgpu: add NBIO IP v7.7.0 Clock Gating support
        drm/amdgpu: add TX_POWER_CTRL_1 macro definitions for NBIO IP v7.7.0
        nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf
        drm/radeon: add a force flush to delay work when radeon
        drm/amd/display: Include missing header
        drm/amdgpu: Remove the additional kfd pre reset call for sriov
        drm/amdgpu: Check num_gfx_rings for gfx v9_0 rb setup.
        drm/amdgpu: fix hive reference leak when adding xgmi device
        drm/amdgpu: Move psp_xgmi_terminate call from amdgpu_xgmi_remove_device to psp_hw_fini
        drm/amdgpu: enable GFXOFF allow control for GC IP v11.0.1
        drm/gem: Fix GEM handle release errors
        drm/vc4: hdmi: Rework power up
        ...
    • Merge tag 'block-6.0-2022-08-26' of git://git.kernel.dk/linux-block · 3e5c673f
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - MD pull request via Song:
            - Fix for clustered raid (Guoqing Jiang)
            - req_op fix (Bart Van Assche)
            - Fix race condition in raid recreate (David Sloan)
      
       - loop configuration overflow fix (Siddh)
      
       - Fix missing commit_rqs call for certain conditions (Yu)
      
      * tag 'block-6.0-2022-08-26' of git://git.kernel.dk/linux-block:
        md: call __md_stop_writes in md_stop
        Revert "md-raid: destroy the bitmap after destroying the thread"
        md: Flush workqueue md_rdev_misc_wq in md_alloc()
        md/raid10: Fix the data type of an r10_sync_page_io() argument
        loop: Check for overflow while configuring loop
        blk-mq: fix io hung due to missing commit_rqs
    • Merge tag 'io_uring-6.0-2022-08-26' of git://git.kernel.dk/linux-block · 0b0861eb
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - Add missing header file to the MAINTAINERS entry for io_uring (Ammar)
      
       - liburing and the kernel ship the same io_uring.h header, but one
         change we've had for a long time only in liburing is to ensure it's
         C++ safe. Add extern C around it, so we can more easily sync them in
         the future (Ammar)
      
       - Fix an off-by-one in the sync cancel added in this merge window (me)
      
       - Error handling fix for passthrough (Kanchan)
      
       - Fix for address saving for async execution for the zc tx support
         (Pavel)
      
       - Fix ordering for TCP zc notifications, so we always have them ordered
         correctly between "data was sent" and "data was acked". This isn't
         strictly needed with the notification slots, but we've been pondering
         disabling the slot support for 6.0 - and if we do, then we do require
         the ordering to be sane. Regardless of that, it's the sane thing to
         do in terms of API (Pavel)
      
       - Minor cleanup for indentation and lockdep annotation (Pavel)
      
      * tag 'io_uring-6.0-2022-08-26' of git://git.kernel.dk/linux-block:
        io_uring/net: save address for sendzc async execution
        io_uring: conditional ->async_data allocation
        io_uring/notif: order notif vs send CQEs
        io_uring/net: fix indentation
        io_uring/net: fix zc send link failing
        io_uring/net: fix must_hold annotation
        io_uring: fix submission-failure handling for uring-cmd
        io_uring: fix off-by-one in sync cancelation file check
        io_uring: uapi: Add `extern "C"` in io_uring.h for liburing
        MAINTAINERS: Add `include/linux/io_uring_types.h`
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 5373081b
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Ten fixes.
      
        Of the three core changes, the two large ones are a complete reversion
        of the async rework and an ALUA timing rework (the latter shouldn't
        affect non-ALUA paths).
      
        The remaining patches are all small and all but one in drivers"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: sd: Revert "Rework asynchronous resume support"
        scsi: core: Fix passthrough retry counter handling
        scsi: ufs: core: Reduce the power mode change timeout
        scsi: storvsc: Remove WQ_MEM_RECLAIM from storvsc_error_wq
        scsi: ufs: host: ufs-exynos: Make fsd_ufs_drvs static
        scsi: megaraid_sas: Remove unnecessary kfree()
        scsi: megaraid_sas: Fix double kfree()
        scsi: ufs: core: Enable link lost interrupt
        scsi: core: Allow the ALUA transitioning state enough time
        scsi: qla2xxx: Disable ATIO interrupt coalesce for quad port ISP27XX
    • wait_on_bit: add an acquire memory barrier · 8238b457
      Mikulas Patocka authored
      There are several places in the kernel where wait_on_bit is not followed
      by a memory barrier (for example, in drivers/md/dm-bufio.c:new_read).
      
      On architectures with weak memory ordering, it may happen that memory
      accesses that follow wait_on_bit are reordered before wait_on_bit and
      they may return invalid data.
      
      Fix this class of bugs by introducing a new function "test_bit_acquire"
      that works like test_bit, but has acquire memory ordering semantics.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Acked-by: Will Deacon <will@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
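The idea behind a bit test with acquire semantics, as described in the wait_on_bit commit above, can be sketched in userspace with C11 atomics. This is an illustration only, assuming C11 `<stdatomic.h>` as a stand-in for the kernel's primitives: the `sketch_` functions are hypothetical names and do not reproduce the kernel's implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Test bit nr with acquire ordering: memory accesses that follow this
 * call in program order cannot be reordered before the load.
 */
bool sketch_test_bit_acquire(size_t nr, _Atomic unsigned long *addr)
{
	assert(addr != NULL);
	unsigned long word = atomic_load_explicit(
		&addr[nr / SKETCH_BITS_PER_LONG], memory_order_acquire);

	return (word >> (nr % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* Set bit nr; no ordering is needed for this demonstration. */
void sketch_set_bit(size_t nr, _Atomic unsigned long *addr)
{
	assert(addr != NULL);
	atomic_fetch_or_explicit(&addr[nr / SKETCH_BITS_PER_LONG],
				 1UL << (nr % SKETCH_BITS_PER_LONG),
				 memory_order_relaxed);
}

/*
 * Clear bit nr with release ordering: writes made before the clear are
 * visible to a reader whose acquire load observes the cleared bit.
 */
void sketch_clear_bit_release(size_t nr, _Atomic unsigned long *addr)
{
	assert(addr != NULL);
	atomic_fetch_and_explicit(&addr[nr / SKETCH_BITS_PER_LONG],
				  ~(1UL << (nr % SKETCH_BITS_PER_LONG)),
				  memory_order_release);
}
```

A waiter that polls `sketch_test_bit_acquire()` until the bit clears, and only then reads the protected data, pairs with a writer that fills in the data and then calls `sketch_clear_bit_release()`. On weakly ordered hardware a plain load in the waiter would not give that guarantee.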
  3. 25 Aug, 2022 24 commits