linux-stable/tools/include/linux
Linus Torvalds 48a577dc1b perf tools changes for v6.0: 1st batch
- Introduce 'perf lock contention' subtool, using new lock contention
   tracepoints and using BPF for in kernel aggregation and then userspace
   processing using the perf tooling infrastructure for resolving symbols, target
   specification, etc.
 
   Since the new lock contention tracepoints don't provide lock names, get up to
   8 stack traces and display the first non-lock function symbol name as a caller:
 
   $ perf lock report -F acquired,contended,avg_wait,wait_total
 
                   Name   acquired  contended     avg wait    total wait
 
    update_blocked_a...         40         40      3.61 us     144.45 us
    kernfs_fop_open+...          5          5      3.64 us      18.18 us
     _nohz_idle_balance          3          3      2.65 us       7.95 us
    tick_do_update_j...          1          1      6.04 us       6.04 us
     ep_scan_ready_list          1          1      3.93 us       3.93 us
 
   Supports the usual 'perf record' + 'perf report' workflow as well as a
   BCC/bpftrace like mode where you start the tool and then press control+C to get
   results:
 
    $ sudo perf lock contention -b
   ^C
    contended   total wait     max wait     avg wait         type   caller
 
           42    192.67 us     13.64 us      4.59 us     spinlock   queue_work_on+0x20
           23     85.54 us     10.28 us      3.72 us     spinlock   worker_thread+0x14a
            6     13.92 us      6.51 us      2.32 us        mutex   kernfs_iop_permission+0x30
            3     11.59 us     10.04 us      3.86 us        mutex   kernfs_dop_revalidate+0x3c
            1      7.52 us      7.52 us      7.52 us     spinlock   kthread+0x115
            1      7.24 us      7.24 us      7.24 us     rwlock:W   sys_epoll_wait+0x148
            2      7.08 us      3.99 us      3.54 us     spinlock   delayed_work_timer_fn+0x1b
            1      6.41 us      6.41 us      6.41 us     spinlock   idle_balance+0xa06
            2      2.50 us      1.83 us      1.25 us        mutex   kernfs_iop_lookup+0x2f
            1      1.71 us      1.71 us      1.71 us        mutex   kernfs_iop_getattr+0x2c
   ...
 
 - Add new 'perf kwork' tool to trace time properties of kernel work (such as
   softirq, and workqueue), uses eBPF skeletons to collect info in kernel space,
   aggregating data that then gets processed by the userspace tool, e.g.:
 
   # perf kwork report
 
    Kwork Name      | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end |
   ----------------------------------------------------------------------------------------------------
    nvme0q5:130     | 004 |      1.101 ms |    49 |    0.051 ms |    26035.056403 s |  26035.056455 s |
    amdgpu:162      | 002 |      0.176 ms |     9 |    0.046 ms |    26035.268020 s |  26035.268066 s |
    nvme0q24:149    | 023 |      0.161 ms |    55 |    0.009 ms |    26035.655280 s |  26035.655288 s |
    nvme0q20:145    | 019 |      0.090 ms |    33 |    0.014 ms |    26035.939018 s |  26035.939032 s |
    nvme0q31:156    | 030 |      0.075 ms |    21 |    0.010 ms |    26035.052237 s |  26035.052247 s |
    nvme0q8:133     | 007 |      0.062 ms |    12 |    0.021 ms |    26035.416840 s |  26035.416861 s |
    nvme0q6:131     | 005 |      0.054 ms |    22 |    0.010 ms |    26035.199919 s |  26035.199929 s |
    nvme0q19:144    | 018 |      0.052 ms |    14 |    0.010 ms |    26035.110615 s |  26035.110625 s |
    nvme0q7:132     | 006 |      0.049 ms |    13 |    0.007 ms |    26035.125180 s |  26035.125187 s |
    nvme0q18:143    | 017 |      0.033 ms |    14 |    0.007 ms |    26035.169698 s |  26035.169705 s |
    nvme0q17:142    | 016 |      0.013 ms |     1 |    0.013 ms |    26035.565147 s |  26035.565160 s |
    enp5s0-rx-0:164 | 006 |      0.004 ms |     4 |    0.002 ms |    26035.928882 s |  26035.928884 s |
    enp5s0-tx-0:166 | 008 |      0.003 ms |     3 |    0.002 ms |    26035.870923 s |  26035.870925 s |
   --------------------------------------------------------------------------------------------------------
 
   See commit log messages for more examples with extra options to limit the events time window, etc.
 
 - Add support for new AMD IBS (Instruction Based Sampling) features:
 
   With the DataSrc extensions, the source of data can be decoded among:
  - Local L3 or other L1/L2 in CCX.
  - A peer cache in a near CCX.
  - Data returned from DRAM.
  - A peer cache in a far CCX.
  - DRAM address map with "long latency" bit set.
  - Data returned from MMIO/Config/PCI/APIC.
  - Extension Memory (S-Link, GenZ, etc - identified by the CS target
     and/or address map at DF's choice).
  - Peer Agent Memory.
 
 - Support hardware tracing with Intel PT on guest machines, combining the
   traces with the ones in the host machine.
 
 - Add a "-m" option to 'perf buildid-list' to show kernel and modules
   build-ids, to display all of the information needed to do external
   symbolization of kernel stack traces, such as those collected by
   bpf_get_stackid().
 
 - Add arch TSC frequency information to perf.data file headers.
 
 - Handle changes in the binutils disassembler function signatures in
   perf, bpftool and bpf_jit_disasm (Acked by the bpftool maintainer).
 
 - Fix building the perf perl binding with the newest gcc in distros such
   as fedora rawhide, where some new warnings were breaking the build as
   perf uses -Werror.
 
 - Add 'perf test' entry for branch stack sampling.
 
 - Add ARM SPE system wide 'perf test' entry.
 
 - Add user space counter reading tests to 'perf test'.
 
 - Build with python3 by default, if available.
 
 - Add python converter script for the vendor JSON event files.
 
 - Update vendor JSON files for alderlake, bonnell, broadwell, broadwellde,
   broadwellx, cascadelakex, elkhartlake, goldmont, goldmontplus, haswell,
   haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, knightslanding,
   nehalemep, nehalemex, sandybridge, sapphirerapids, silvermont, skylake,
   skylakex, snowridgex, tigerlake, westmereep-dp, westmereep-sp and westmereex.
 
 - Add vendor JSON File for Intel meteorlake.
 
 - Add Arm Cortex-A78C and X1C JSON vendor event files.
 
 - Add workaround to symbol address reading from ELF files without phdr,
   falling back to the previoous equation.
 
 - Convert legacy map definition to BTF-defined in the perf BPF script test.
 
 - Rework prologue generation code to stop using libbpf deprecated APIs.
 
 - Add default hybrid events for 'perf stat' on x86.
 
 - Add topdown metrics in the default 'perf stat' on the hybrid machines (big/little cores).
 
 - Prefer sampled CPU when exporting JSON in 'perf data convert'
 
 - Fix ('perf stat CSV output linter') and ("Check branch stack sampling") 'perf test' entries on s390.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYuw6gwAKCRCyPKLppCJ+
 J5+iAP0RL6sKMhzdkRjRYfG8CluJ401YaPHadzv5jxP8gOZz2gEAsuYDrMF9t1zB
 4DqORfobdX9UQEJjP9oRltU73GM0swI=
 =2/M0
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v6.0-2022-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools updates from Arnaldo Carvalho de Melo:

 - Introduce 'perf lock contention' subtool, using new lock contention
   tracepoints and using BPF for in kernel aggregation and then
   userspace processing using the perf tooling infrastructure for
   resolving symbols, target specification, etc.

   Since the new lock contention tracepoints don't provide lock names,
   get up to 8 stack traces and display the first non-lock function
   symbol name as a caller:

    $ perf lock report -F acquired,contended,avg_wait,wait_total

                    Name   acquired  contended     avg wait    total wait

     update_blocked_a...         40         40      3.61 us     144.45 us
     kernfs_fop_open+...          5          5      3.64 us      18.18 us
      _nohz_idle_balance          3          3      2.65 us       7.95 us
     tick_do_update_j...          1          1      6.04 us       6.04 us
      ep_scan_ready_list          1          1      3.93 us       3.93 us

   Supports the usual 'perf record' + 'perf report' workflow as well as
   a BCC/bpftrace like mode where you start the tool and then press
   control+C to get results:

     $ sudo perf lock contention -b
    ^C
    contended   total wait     max wait     avg wait         type   caller

            42    192.67 us     13.64 us      4.59 us     spinlock   queue_work_on+0x20
            23     85.54 us     10.28 us      3.72 us     spinlock   worker_thread+0x14a
             6     13.92 us      6.51 us      2.32 us        mutex   kernfs_iop_permission+0x30
             3     11.59 us     10.04 us      3.86 us        mutex   kernfs_dop_revalidate+0x3c
             1      7.52 us      7.52 us      7.52 us     spinlock   kthread+0x115
             1      7.24 us      7.24 us      7.24 us     rwlock:W   sys_epoll_wait+0x148
             2      7.08 us      3.99 us      3.54 us     spinlock   delayed_work_timer_fn+0x1b
             1      6.41 us      6.41 us      6.41 us     spinlock   idle_balance+0xa06
             2      2.50 us      1.83 us      1.25 us        mutex   kernfs_iop_lookup+0x2f
             1      1.71 us      1.71 us      1.71 us        mutex   kernfs_iop_getattr+0x2c
    ...

 - Add new 'perf kwork' tool to trace time properties of kernel work
   (such as softirq, and workqueue), uses eBPF skeletons to collect info
   in kernel space, aggregating data that then gets processed by the
   userspace tool, e.g.:

    # perf kwork report

     Kwork Name      | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end |
    ----------------------------------------------------------------------------------------------------
     nvme0q5:130     | 004 |      1.101 ms |    49 |    0.051 ms |    26035.056403 s |  26035.056455 s |
     amdgpu:162      | 002 |      0.176 ms |     9 |    0.046 ms |    26035.268020 s |  26035.268066 s |
     nvme0q24:149    | 023 |      0.161 ms |    55 |    0.009 ms |    26035.655280 s |  26035.655288 s |
     nvme0q20:145    | 019 |      0.090 ms |    33 |    0.014 ms |    26035.939018 s |  26035.939032 s |
     nvme0q31:156    | 030 |      0.075 ms |    21 |    0.010 ms |    26035.052237 s |  26035.052247 s |
     nvme0q8:133     | 007 |      0.062 ms |    12 |    0.021 ms |    26035.416840 s |  26035.416861 s |
     nvme0q6:131     | 005 |      0.054 ms |    22 |    0.010 ms |    26035.199919 s |  26035.199929 s |
     nvme0q19:144    | 018 |      0.052 ms |    14 |    0.010 ms |    26035.110615 s |  26035.110625 s |
     nvme0q7:132     | 006 |      0.049 ms |    13 |    0.007 ms |    26035.125180 s |  26035.125187 s |
     nvme0q18:143    | 017 |      0.033 ms |    14 |    0.007 ms |    26035.169698 s |  26035.169705 s |
     nvme0q17:142    | 016 |      0.013 ms |     1 |    0.013 ms |    26035.565147 s |  26035.565160 s |
     enp5s0-rx-0:164 | 006 |      0.004 ms |     4 |    0.002 ms |    26035.928882 s |  26035.928884 s |
     enp5s0-tx-0:166 | 008 |      0.003 ms |     3 |    0.002 ms |    26035.870923 s |  26035.870925 s |
    --------------------------------------------------------------------------------------------------------

   See commit log messages for more examples with extra options to limit
   the events time window, etc.

 - Add support for new AMD IBS (Instruction Based Sampling) features:

   With the DataSrc extensions, the source of data can be decoded among:
     - Local L3 or other L1/L2 in CCX.
     - A peer cache in a near CCX.
     - Data returned from DRAM.
     - A peer cache in a far CCX.
     - DRAM address map with "long latency" bit set.
     - Data returned from MMIO/Config/PCI/APIC.
     - Extension Memory (S-Link, GenZ, etc - identified by the CS target
       and/or address map at DF's choice).
     - Peer Agent Memory.

 - Support hardware tracing with Intel PT on guest machines, combining
   the traces with the ones in the host machine.

 - Add a "-m" option to 'perf buildid-list' to show kernel and modules
   build-ids, to display all of the information needed to do external
   symbolization of kernel stack traces, such as those collected by
   bpf_get_stackid().

 - Add arch TSC frequency information to perf.data file headers.

 - Handle changes in the binutils disassembler function signatures in
   perf, bpftool and bpf_jit_disasm (Acked by the bpftool maintainer).

 - Fix building the perf perl binding with the newest gcc in distros
   such as fedora rawhide, where some new warnings were breaking the
   build as perf uses -Werror.

 - Add 'perf test' entry for branch stack sampling.

 - Add ARM SPE system wide 'perf test' entry.

 - Add user space counter reading tests to 'perf test'.

 - Build with python3 by default, if available.

 - Add python converter script for the vendor JSON event files.

 - Update vendor JSON files for most Intel cores.

 - Add vendor JSON File for Intel meteorlake.

 - Add Arm Cortex-A78C and X1C JSON vendor event files.

 - Add workaround to symbol address reading from ELF files without phdr,
   falling back to the previoous equation.

 - Convert legacy map definition to BTF-defined in the perf BPF script
   test.

 - Rework prologue generation code to stop using libbpf deprecated APIs.

 - Add default hybrid events for 'perf stat' on x86.

 - Add topdown metrics in the default 'perf stat' on the hybrid machines
   (big/little cores).

 - Prefer sampled CPU when exporting JSON in 'perf data convert'

 - Fix ('perf stat CSV output linter') and ("Check branch stack
   sampling") 'perf test' entries on s390.

* tag 'perf-tools-for-v6.0-2022-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits)
  perf stat: Refactor __run_perf_stat() common code
  perf lock: Print the number of lost entries for BPF
  perf lock: Add --map-nr-entries option
  perf lock: Introduce struct lock_contention
  perf scripting python: Do not build fail on deprecation warnings
  genelf: Use HAVE_LIBCRYPTO_SUPPORT, not the never defined HAVE_LIBCRYPTO
  perf build: Suppress openssl v3 deprecation warnings in libcrypto feature test
  perf parse-events: Break out tracepoint and printing
  perf parse-events: Don't #define YY_EXTRA_TYPE
  tools bpftool: Don't display disassembler-four-args feature test
  tools bpftool: Fix compilation error with new binutils
  tools bpf_jit_disasm: Don't display disassembler-four-args feature test
  tools bpf_jit_disasm: Fix compilation error with new binutils
  tools perf: Fix compilation error with new binutils
  tools include: add dis-asm-compat.h to handle version differences
  tools build: Don't display disassembler-four-args feature test
  tools build: Add feature test for init_disassemble_info API changes
  perf test: Add ARM SPE system wide test
  perf tools: Rework prologue generation code
  perf bpf: Convert legacy map definition to BTF-defined
  ...
2022-08-06 09:36:08 -07:00
..
sched XArray: Add calls to might_alloc() 2022-07-10 21:17:30 -04:00
unaligned License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
arm-smccc.h tools: Import ARM SMCCC definitions 2022-05-03 21:30:19 +01:00
atomic.h tools/include: Update atomic definitions 2022-02-20 08:44:37 +02:00
bitmap.h bitmap: Fix return values to be unsigned 2022-06-03 06:52:58 -07:00
bitops.h bitops: more BITS_TO_* macros 2020-02-04 03:05:26 +00:00
bits.h linux/bits.h: fix compilation error with GENMASK 2021-05-22 15:09:07 -10:00
btf_ids.h tools/bpf: Sync btf_ids.h to tools 2022-06-29 13:21:52 -07:00
bug.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
build_bug.h tools headers: Syncronize linux/build_bug.h with the kernel sources 2021-01-15 16:31:46 -03:00
cache.h tools/include: Add cache.h stub 2022-02-20 08:44:37 +02:00
compiler_types.h tools: Add sparse context/locking annotations in compiler-types.h 2021-08-19 15:39:22 -03:00
compiler-gcc.h tools: compiler-gcc.h: Guard error attribute use with __has_attribute 2021-09-13 15:51:41 -07:00
compiler.h tools compiler.h: Remove duplicate #ifndef noinline block 2022-03-12 11:00:57 -03:00
const.h linux/bits.h: fix compilation error with GENMASK 2021-05-22 15:09:07 -10:00
coresight-pmu.h perf cs-etm: Update deduction of TRCCONFIGR register for branch broadcast 2022-02-15 17:15:32 -03:00
ctype.h tools headers: Update linux/ctype.h with the kernel sources 2020-12-18 17:32:28 -03:00
debugfs.h tools/include: Add debugfs.h stub 2022-02-20 08:44:37 +02:00
delay.h tools/lib/lockdep: Remove private kernel headers 2017-06-05 09:28:14 +02:00
err.h docs: fix broken documentation links 2019-06-08 13:42:13 -06:00
export.h module: remove EXPORT_UNUSED_SYMBOL* 2021-02-08 12:28:07 +01:00
filter.h bpf: Add bitwise atomic instructions 2021-01-14 18:34:29 -08:00
find.h tools: sync tools/bitmap with mother linux 2022-01-15 08:47:31 -08:00
ftrace.h tools/lib/lockdep: Remove private kernel headers 2017-06-05 09:28:14 +02:00
gfp.h tools: Move gfp.h and slab.h from radix-tree to lib 2022-02-20 08:44:37 +02:00
hash.h hash.h: remove unused define directive 2022-01-20 08:52:54 +02:00
hashtable.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
interrupt.h tools/lib/lockdep: Remove private kernel headers 2017-06-05 09:28:14 +02:00
io.h tools/include: Add io.h stub 2022-02-20 08:44:37 +02:00
jhash.h tools/: replace HTTP links with HTTPS ones 2020-08-07 11:33:21 -07:00
kallsyms.h kallsyms/printk: add loglvl to print_ip_sym() 2020-06-09 09:39:10 -07:00
kconfig.h tools headers: Remove broken definition of __LITTLE_ENDIAN 2021-07-14 14:39:36 -03:00
kern_levels.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kernel.h tools/include: Add _RET_IP_ and math definitions to kernel.h 2022-02-20 08:44:37 +02:00
linkage.h tools/lib/lockdep: Remove private kernel headers 2017-06-05 09:28:14 +02:00
list_sort.h tools lib: Adopt list_sort() from the kernel sources 2021-10-20 10:30:59 -03:00
list.h tools lib: Add list_last_entry_or_null() 2022-07-26 16:02:13 -03:00
log2.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
math64.h tools include: Add an initial math64.h 2021-04-15 16:38:02 -03:00
math.h tools: Fix math.h breakage 2021-11-30 09:14:42 -08:00
mm.h tools/include: Add mm.h file 2022-02-20 08:44:37 +02:00
module.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mutex.h tools/lib/lockdep: Remove private kernel headers 2017-06-05 09:28:14 +02:00
nmi.h tools/lib/lockdep: Add empty nmi.h 2018-09-10 13:48:23 +02:00
numa.h tools/: replace open encodings for NUMA_NO_NODE 2019-03-05 21:07:14 -08:00
objtool.h This was a moderately busy cycle for documentation, but nothing all that 2022-08-02 19:24:24 -07:00
overflow.h compiler.h: drop fallback overflow checkers 2021-09-13 10:18:28 -07:00
pfn.h tools/include: Add pfn.h stub 2022-02-20 08:44:37 +02:00
poison.h mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO 2020-12-15 12:13:46 -08:00
rbtree_augmented.h docs: Add rbtree documentation to the core-api 2020-04-21 10:29:19 -06:00
rbtree.h rbtree: Add generic add and find helpers 2021-02-17 14:07:31 +01:00
rcu.h rcu: Don't return a value from rcu_assign_pointer() 2019-06-13 15:38:34 -07:00
refcount.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ring_buffer.h tools headers: Add missing perf_event.h include 2019-08-22 11:12:36 -03:00
seq_file.h tools/lib/lockdep: Remove private kernel headers 2017-06-05 09:28:14 +02:00
sizes.h tools: bpftool: add "prog run" subcommand to test-run programs 2019-07-05 23:48:07 +02:00
slab.h tools: Add kmem_cache_alloc_lru() 2022-04-22 14:24:28 -04:00
spinlock.h tools/lib/lockdep: drop leftover liblockdep headers 2021-12-09 09:37:49 -08:00
static_call_types.h static_call: Move struct static_call_key definition to static_call_types.h 2021-03-11 16:04:39 +01:00
string.h tools lib: Adopt memchr_inv() from kernel 2020-11-27 08:34:52 -03:00
stringify.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
time64.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
types.h memblock: test suite and a small cleanup 2022-03-27 13:36:06 -07:00
zalloc.h tools lib: Adopt zalloc()/zfree() from tools/perf 2019-07-09 10:13:26 -03:00