mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
synced 2025-01-09 15:29:16 +00:00
314 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Josh Poimboeuf
|
fb799447ae |
x86,objtool: Split UNWIND_HINT_EMPTY in two
Mark reported that the ORC unwinder incorrectly marks an unwind as reliable when the unwind terminates prematurely in the dark corners of return_to_handler() due to lack of information about the next frame. The problem is UNWIND_HINT_EMPTY is used in two different situations: 1) The end of the kernel stack unwind before hitting user entry, boot code, or fork entry 2) A blind spot in ORC coverage where the unwinder has to bail due to lack of information about the next frame The ORC unwinder has no way to tell the difference between the two. When it encounters an undefined stack state with 'end=1', it blindly marks the stack reliable, which can break the livepatch consistency model. Fix it by splitting UNWIND_HINT_EMPTY into UNWIND_HINT_UNDEFINED and UNWIND_HINT_END_OF_STACK. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/fd6212c8b450d3564b855e1cb48404d6277b4d9f.1677683419.git.jpoimboe@kernel.org |
||
Josh Poimboeuf
|
f902cfdd46 |
x86,objtool: Introduce ORC_TYPE_*
Unwind hints and ORC entry types are two distinct things. Separate them out more explicitly. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/cc879d38fff8a43f8f7beb2fd56e35a5a384d7cd.1677683419.git.jpoimboe@kernel.org |
||
Linus Torvalds
|
857f1268a5 |
Changes in this cycle were:
- Shrink 'struct instruction', to improve objtool performance & memory footprint. - Other maximum memory usage reductions - this makes the build both faster, and fixes kernel build OOM failures on allyesconfig and similar configs when they try to build the final (large) vmlinux.o. - Fix ORC unwinding when a kprobe (INT3) is set on a stack-modifying single-byte instruction (PUSH/POP or LEAVE). This requires the extension of the ORC metadata structure with a 'signal' field. - Misc fixes & cleanups. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmQAVp8RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gV6A//YbWb4nNxYbRFBd1O3FnFfy4efrDQ4btI hwkL6f7jka9RnIpIEatJvaLdNvyN5tuPCC/+B5eVnvFdd1JcBUmj5D+zYFt6H6qt BG4M6TNHFkP1kOJVfFGn8UPRfoMz2oMiEqilpsc1Yuf7b3ldMJtGUoHaeZC9pyqe RUisKNw4WHZp2G/gTBUWxW17xpWY3Awgch/w4HCu8wMnR+uEC44i0UCBfnAadl36 ar66PfhMJcQIv0XkK9wu43g7+HFnjpxHOx35JW3lRot0xRnwl/JcsmaX5iPkh0gt HV8eLH80J0homeMZDY7vWIKJxGeLkIdfjO5gxwTdnFc9rQw3GwHp1B7WTS6J3Vwe gM00kyaGly3CvkKMiz5QQBfViWCjE25nYS8X0i9Oz6Gk58IkRPGByaDTKRjNrDJB BwH9DE9xb3dPVZRv/PejkTdggQWo+FDTrL8ulHIjUFK11M7VubwkskecNHkfpAOE TRy5iLjMocF8u7hdyec6Mma2K6qEndC2Rw9ZMPQ7TeieMsBcl63cSRgSJLFfdRhr /5c6Hr2SNQKU8xu+3j49GyBwFvp4CwCa+GPs9/o+l0uCvuKNIn9B788cm4TjxLJ9 C3PRzE6B/CaLhYvlC5k5cNM+I4YpoMU/mvSvY6HcC0Duj2nSAWS2VV60MVMDpqVX 8nK4xnla2tM= =bpPY -----END PGP SIGNATURE----- Merge tag 'objtool-core-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: - Shrink 'struct instruction', to improve objtool performance & memory footprint - Other maximum memory usage reductions - this makes the build both faster, and fixes kernel build OOM failures on allyesconfig and similar configs when they try to build the final (large) vmlinux.o - Fix ORC unwinding when a kprobe (INT3) is set on a stack-modifying single-byte instruction (PUSH/POP or LEAVE). This requires the extension of the ORC metadata structure with a 'signal' field - Misc fixes & cleanups * tag 'objtool-core-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) objtool: Fix ORC 'signal' propagation objtool: Remove instruction::list x86: Fix FILL_RETURN_BUFFER objtool: Fix overlapping alternatives objtool: Union instruction::{call_dest,jump_table} objtool: Remove instruction::reloc objtool: Shrink instruction::{type,visited} objtool: Make instruction::alts a single-linked list objtool: Make instruction::stack_ops a single-linked list objtool: Change arch_decode_instruction() signature x86/entry: Fix unwinding from kprobe on PUSH/POP instruction x86/unwind/orc: Add 'signal' field to ORC metadata objtool: Optimize layout of struct special_alt objtool: Optimize layout of struct symbol objtool: Allocate multiple structures with calloc() objtool: Make struct check_options static objtool: Make struct entries[] static and const objtool: Fix HOSTCC flag usage objtool: Properly support make V=1 objtool: Install libsubcmd in build ... |
||
Linus Torvalds
|
0df82189bc |
perf tools changes for v6.3:
- 'perf lock contention' improvements: - Add -o/--lock-owner option: $ sudo ./perf lock contention -abo -- ./perf bench sched pipe # Running 'sched/pipe' benchmark: # Executed 1000000 pipe operations between two processes Total time: 4.766 [sec] 4.766540 usecs/op 209795 ops/sec contended total wait max wait avg wait pid owner 403 565.32 us 26.81 us 1.40 us -1 Unknown 4 27.99 us 8.57 us 7.00 us 1583145 sched-pipe 1 8.25 us 8.25 us 8.25 us 1583144 sched-pipe 1 2.03 us 2.03 us 2.03 us 5068 chrome The owner is unknown in most cases. Filtering only for the mutex locks, it will more likely get the owners. - -S/--callstack-filter is to limit display entries having the given string in the callstack $ sudo ./perf lock contention -abv -S net sleep 1 ... contended total wait max wait avg wait type caller 5 70.20 us 16.13 us 14.04 us spinlock __dev_queue_xmit+0xb6d 0xffffffffa5dd1c60 _raw_spin_lock+0x30 0xffffffffa5b8f6ed __dev_queue_xmit+0xb6d 0xffffffffa5cd8267 ip6_finish_output2+0x2c7 0xffffffffa5cdac14 ip6_finish_output+0x1d4 0xffffffffa5cdb477 ip6_xmit+0x457 0xffffffffa5d1fd17 inet6_csk_xmit+0xd7 0xffffffffa5c5f4aa __tcp_transmit_skb+0x54a 0xffffffffa5c6467d tcp_keepalive_timer+0x2fd Please note that to have the -b option (BPF) working above one has to build with BUILD_BPF_SKEL=1. - Add more 'perf test' entries to test these new features. - Add Ian Rogers to MAINTAINERS as a perf tools reviewer. - Add support for retire latency feature (pipeline stall of a instruction compared to the previous one, in cycles) present on some Intel processors. - Add 'perf c2c' report option to show false sharing with adjacent cachelines, to be used in machines with cacheline prefetching, where accesses to a cacheline brings the next one too. - Skip 'perf test bpf' when the required kernel-debuginfo package isn't installed. perf script: - Add 'cgroup' field for 'perf script' output: $ perf record --all-cgroups -- true $ perf script -F comm,pid,cgroup true 337112 /user.slice/user-657345.slice/user@657345.service/... true 337112 /user.slice/user-657345.slice/user@657345.service/... true 337112 /user.slice/user-657345.slice/user@657345.service/... true 337112 /user.slice/user-657345.slice/user@657345.service/... - Add support for showing branch speculation information in 'perf script' and in the 'perf report' raw dump (-D). perf record: - Fix 'perf record' segfault with --overwrite and --max-size. Intel PT: - Add support for synthesizing "cycle" events from Intel PT traces as we support "instruction" events when Intel PT CYC packets are available. This enables much more accurate profiles than when using the regular 'perf record -e cycles' (the default) when the workload lasts for very short periods (<10ms). - .plt symbol handling improvements, better handling IBT (in the past MPX) done in the context of decoding Intel PT processor traces, IFUNC symbols on x86_64, static executables, understanding .plt.got symbols on x86_64. - Add a 'perf test' to test symbol resolution, part of the .plt improvements series, this tests things like symbol size in contexts where only the symbol start is available (kallsyms), etc. - Better handle auxtrace/Intel PT data when using pipe mode (perf record sleep 1|perf report). - Fix symbol lookup with kcore with multiple segments match stext, getting the symbol resolution to just show DSOs as unknown. ARM: - Timestamp improvements for ARM64 systems with ETMv4 (Embedded Trace Macrocell v4). - Ensure ARM64 CoreSight timestamps don't go backwards. - Document that ARM64 SPE (Statistical Profiling Extension) is used with 'perf c2c/mem'. - Add raw decoding for ARM64 SPEv1.2 previous branch address. - Update neoverse-n2-v2 ARM vendor events (JSON tables): topdown L1, TLB, cache, branch, PE utilization and instruction mix metrics. - Update decoder code for OpenCSD version 1.4, on ARM64 systems. - Fix command line auto-complete of CPU events on aarch64. perf test/bench: - Switch basic BPF filtering test to use syscall tracepoint to avoid the variable number of probes inserted when using the previous probe point (do_epoll_wait) that happens on different CPU architectures. - Fix DWARF unwind test by adding non-inline to expected function in a backtrace. - Use 'grep -c' where the longer form 'grep | wc -l' was being used. - Add getpid and execve benchmarks to 'perf bench syscall'. Miscellaneous: - Avoid d3-flame-graph package dependency in 'perf script flamegraph', making this feature more generally available. - Add JSON metric events to present CPI stall cycles in Power10. - Assorted improvements/refactorings on the JSON metrics parsing code. Build: - Fix 'perf probe' and 'perf test' when libtraceevent isn't linked, as several tests use tracepoints, those should be skipped. - More fallout fixes for the removal of tools/lib/traceevent/. - Fix build error when linking with libpfm. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCY/YzGgAKCRCyPKLppCJ+ J98CAP4/GD3E86Dk+S+w5FmPEHuBKootuZ3pHOqCnXLiyKFZqgEAs9TWOg9KVKGh io9cLluMjzfRwQrND8cpn3VfXxWvVAQ= =L1qh -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools updates from Arnaldo Carvalho de Melo: "Miscellaneous: - Add Ian Rogers to MAINTAINERS as a perf tools reviewer. - Add support for retire latency feature (pipeline stall of a instruction compared to the previous one, in cycles) present on some Intel processors. - Add 'perf c2c' report option to show false sharing with adjacent cachelines, to be used in machines with cacheline prefetching, where accesses to a cacheline brings the next one too. - Skip 'perf test bpf' when the required kernel-debuginfo package isn't installed. - Avoid d3-flame-graph package dependency in 'perf script flamegraph', making this feature more generally available. - Add JSON metric events to present CPI stall cycles in Power10. - Assorted improvements/refactorings on the JSON metrics parsing code. perf lock contention: - Add -o/--lock-owner option: $ sudo ./perf lock contention -abo -- ./perf bench sched pipe # Running 'sched/pipe' benchmark: # Executed 1000000 pipe operations between two processes Total time: 4.766 [sec] 4.766540 usecs/op 209795 ops/sec contended total wait max wait avg wait pid owner 403 565.32 us 26.81 us 1.40 us -1 Unknown 4 27.99 us 8.57 us 7.00 us 1583145 sched-pipe 1 8.25 us 8.25 us 8.25 us 1583144 sched-pipe 1 2.03 us 2.03 us 2.03 us 5068 chrome The owner is unknown in most cases. Filtering only for the mutex locks, it will more likely get the owners. - -S/--callstack-filter is to limit display entries having the given string in the callstack: $ sudo ./perf lock contention -abv -S net sleep 1 ... contended total wait max wait avg wait type caller 5 70.20 us 16.13 us 14.04 us spinlock __dev_queue_xmit+0xb6d 0xffffffffa5dd1c60 _raw_spin_lock+0x30 0xffffffffa5b8f6ed __dev_queue_xmit+0xb6d 0xffffffffa5cd8267 ip6_finish_output2+0x2c7 0xffffffffa5cdac14 ip6_finish_output+0x1d4 0xffffffffa5cdb477 ip6_xmit+0x457 0xffffffffa5d1fd17 inet6_csk_xmit+0xd7 0xffffffffa5c5f4aa __tcp_transmit_skb+0x54a 0xffffffffa5c6467d tcp_keepalive_timer+0x2fd Please note that to have the -b option (BPF) working above one has to build with BUILD_BPF_SKEL=1. - Add more 'perf test' entries to test these new features. perf script: - Add 'cgroup' field for 'perf script' output: $ perf record --all-cgroups -- true $ perf script -F comm,pid,cgroup true 337112 /user.slice/user-657345.slice/user@657345.service/... true 337112 /user.slice/user-657345.slice/user@657345.service/... true 337112 /user.slice/user-657345.slice/user@657345.service/... true 337112 /user.slice/user-657345.slice/user@657345.service/... - Add support for showing branch speculation information in 'perf script' and in the 'perf report' raw dump (-D). perf record: - Fix 'perf record' segfault with --overwrite and --max-size. perf test/bench: - Switch basic BPF filtering test to use syscall tracepoint to avoid the variable number of probes inserted when using the previous probe point (do_epoll_wait) that happens on different CPU architectures. - Fix DWARF unwind test by adding non-inline to expected function in a backtrace. - Use 'grep -c' where the longer form 'grep | wc -l' was being used. - Add getpid and execve benchmarks to 'perf bench syscall'. Intel PT: - Add support for synthesizing "cycle" events from Intel PT traces as we support "instruction" events when Intel PT CYC packets are available. This enables much more accurate profiles than when using the regular 'perf record -e cycles' (the default) when the workload lasts for very short periods (<10ms). - .plt symbol handling improvements, better handling IBT (in the past MPX) done in the context of decoding Intel PT processor traces, IFUNC symbols on x86_64, static executables, understanding .plt.got symbols on x86_64. - Add a 'perf test' to test symbol resolution, part of the .plt improvements series, this tests things like symbol size in contexts where only the symbol start is available (kallsyms), etc. - Better handle auxtrace/Intel PT data when using pipe mode (perf record sleep 1|perf report). - Fix symbol lookup with kcore with multiple segments match stext, getting the symbol resolution to just show DSOs as unknown. ARM: - Timestamp improvements for ARM64 systems with ETMv4 (Embedded Trace Macrocell v4). - Ensure ARM64 CoreSight timestamps don't go backwards. - Document that ARM64 SPE (Statistical Profiling Extension) is used with 'perf c2c/mem'. - Add raw decoding for ARM64 SPEv1.2 previous branch address. - Update neoverse-n2-v2 ARM vendor events (JSON tables): topdown L1, TLB, cache, branch, PE utilization and instruction mix metrics. - Update decoder code for OpenCSD version 1.4, on ARM64 systems. - Fix command line auto-complete of CPU events on aarch64. Build: - Fix 'perf probe' and 'perf test' when libtraceevent isn't linked, as several tests use tracepoints, those should be skipped. - More fallout fixes for the removal of tools/lib/traceevent/. - Fix build error when linking with libpfm" * tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (114 commits) perf tests stat_all_metrics: Change true workload to sleep workload for system wide check perf vendor events power10: Add JSON metric events to present CPI stall cycles in powerpc perf intel-pt: Synthesize cycle events perf c2c: Add report option to show false sharing in adjacent cachelines perf record: Fix segfault with --overwrite and --max-size perf stat: Avoid merging/aggregating metric counts twice perf tools: Fix perf tool build error in util/pfm.c perf tools: Fix auto-complete on aarch64 perf lock contention: Support old rw_semaphore type perf lock contention: Add -o/--lock-owner option perf lock contention: Fix to save callstack for the default modified perf test bpf: Skip test if kernel-debuginfo is not present perf probe: Update the exit error codes in function try_to_find_probe_trace_event perf script: Fix missing Retire Latency fields option documentation perf event x86: Add retire_lat when synthesizing PERF_SAMPLE_WEIGHT_STRUCT perf test x86: Support the retire_lat (Retire Latency) sample_type check perf test bpf: Check for libtraceevent support perf script: Support Retire Latency perf report: Support Retire Latency perf lock contention: Support filters for different aggregation ... |
||
Ingo Molnar
|
585a78c1f7 |
Merge branch 'linus' into objtool/core, to pick up Xen dependencies
Pick up dependencies - freshly merged upstream via xen-next - before applying dependent objtool changes. Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
Linus Torvalds
|
877934769e |
- Cache the AMD debug registers in per-CPU variables to avoid MSR writes
where possible, when supporting a debug registers swap feature for SEV-ES guests - Add support for AMD's version of eIBRS called Automatic IBRS which is a set-and-forget control of indirect branch restriction speculation resources on privilege change - Add support for a new x86 instruction - LKGS - Load kernel GS which is part of the FRED infrastructure - Reset SPEC_CTRL upon init to accomodate use cases like kexec which rediscover - Other smaller fixes and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmP1RDIACgkQEsHwGGHe VUohBw//ZB9ZRqsrKdm6D9YaP2x4Zb+kqKqo6rjYeWaYqyPyCwDujPwh+pb3Oq1t aj62muDv1t/wEJc8mKNkfXkjEEtBVAOcpb5YIpKreoEvNKyevol83Ih0u5iJcTRE E5qf8HDS8b/JZrcazJJLl6WQmQNH5RiKSu5bbCpRhoeOcyo5pRYR5MztK9vNmAQk GMdwHsUSU+jN8uiE4HnpaOb/luhgFindRwZVTpdjJegQWLABS8cl3CKeTv4+PW45 isvv37XnQP248wsptIEVRHeG6g3g/HtvwRx7DikUw06QwUyUK7H9hJssOoSP8TL9 u4psRwfWnJ1OxU6klL+s0Ii+pjQ97wXmK/oqK7QkdUwhWqR/mQAW2e9kWHAngyDn A6mKbzSM6HFAeSXQpB9cMb6uvYRD44SngDFe3WXtEK8jiiQ70ikUm4E28I5KJOPg s+RyioHk0NFRHYSOOBqNG1NKz6ED7L3GbgbbzxkgMh21AAyI3X351t+PtGoLV5ew eqOsM7lbg9Scg1LvPk1JcoALS8USWqgar397rz9qGUs+OkPWBtEBCmTdMz/Eb+2t g/WHdLS5/ajSs5gNhT99W3DeqZMPDEkgBRSeyBBmY3CUD3gBL2wXEktRXv504zBR RC4oyUPX3c9E2ib6GATLE3kBLbcz9hTWbMxF+X3lLJvTVd/Qc2o= =v/ZC -----END PGP SIGNATURE----- Merge tag 'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpuid updates from Borislav Petkov: - Cache the AMD debug registers in per-CPU variables to avoid MSR writes where possible, when supporting a debug registers swap feature for SEV-ES guests - Add support for AMD's version of eIBRS called Automatic IBRS which is a set-and-forget control of indirect branch restriction speculation resources on privilege change - Add support for a new x86 instruction - LKGS - Load kernel GS which is part of the FRED infrastructure - Reset SPEC_CTRL upon init to accomodate use cases like kexec which rediscover - Other smaller fixes and cleanups * tag 'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/amd: Cache debug register values in percpu variables KVM: x86: Propagate the AMD Automatic IBRS feature to the guest x86/cpu: Support AMD Automatic IBRS x86/cpu, kvm: Add the SMM_CTL MSR not present feature x86/cpu, kvm: Add the Null Selector Clears Base feature x86/cpu, kvm: Move X86_FEATURE_LFENCE_RDTSC to its native leaf x86/cpu, kvm: Add the NO_NESTED_DATA_BP feature KVM: x86: Move open-coded CPUID leaf 0x80000021 EAX bit propagation code x86/cpu, kvm: Add support for CPUID_80000021_EAX x86/gsseg: Add the new <asm/gsseg.h> header to <asm/asm-prototypes.h> x86/gsseg: Use the LKGS instruction if available for load_gs_index() x86/gsseg: Move load_gs_index() to its own new header file x86/gsseg: Make asm_load_gs_index() take an u16 x86/opcode: Add the LKGS instruction to x86-opcode-map x86/cpufeature: Add the CPU feature bit for LKGS x86/bugs: Reset speculation control settings on init x86/cpu: Remove redundant extern x86_read_arch_cap_msr() |
||
Josh Poimboeuf
|
ffb1b4a410 |
x86/unwind/orc: Add 'signal' field to ORC metadata
Add a 'signal' field which allows unwind hints to specify whether the instruction pointer should be taken literally (like for most interrupts and exceptions) rather than decremented (like for call stack return addresses) when used to find the next ORC entry. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/d2c5ec4d83a45b513d8fd72fab59f1a8cfa46871.1676068346.git.jpoimboe@kernel.org |
||
Tiezhu Yang
|
540f8b5640 |
perf bench syscall: Add execve syscall benchmark
This commit adds the execve syscall benchmark, more syscall benchmarks can be added in the future. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/1668052208-14047-5-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Tiezhu Yang
|
391f84e555 |
perf bench syscall: Add getpgid syscall benchmark
This commit adds a simple getpgid syscall benchmark, more syscall benchmarks can be added in the future. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/1668052208-14047-4-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Tiezhu Yang
|
3fe91f3262 |
perf bench syscall: Introduce bench_syscall_common()
In the current code, there is only a basic syscall benchmark via getppid, this is not enough. Introduce bench_syscall_common() so that we can add more syscalls to benchmark. This is preparation for later patch, no functionality change. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/1668052208-14047-3-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Tiezhu Yang
|
1bad502775 |
tools x86: Keep list sorted by number in unistd_{32,64}.h
It is better to keep list sorted by number in unistd_{32,64}.h, so that we can add more syscall number to a proper position. This is preparation for later patch, no functionality change. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/1668052208-14047-2-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
effa76856f |
tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
To pick the changes in: 8aff460f216753d8 ("KVM: x86: Add a VALID_MASK for the flags in kvm_msr_filter_range") c1340fe3590ebbe7 ("KVM: x86: Add a VALID_MASK for the flag in kvm_msr_filter") be83794210e7020f ("KVM: x86: Disallow the use of KVM_MSR_FILTER_DEFAULT_ALLOW in the kernel") That just rebuilds kvm-stat.c on x86, no change in functionality. This silences these perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h' diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h Cc: Aaron Lewis <aaronlewis@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lore.kernel.org/lkml/Y8VR5wSAkd2A0HxS@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
H. Peter Anvin (Intel)
|
5a91f12660 |
x86/opcode: Add the LKGS instruction to x86-opcode-map
Add the instruction opcode used by LKGS to x86-opcode-map. Opcode number is per public FRED draft spec v3.0. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230112072032.35626-3-xin3.li@intel.com |
||
H. Peter Anvin (Intel)
|
660569472d |
x86/cpufeature: Add the CPU feature bit for LKGS
Add the CPU feature bit for LKGS (Load "Kernel" GS). LKGS instruction is introduced with Intel FRED (flexible return and event delivery) specification. Search for the latest FRED spec in most search engines with this search pattern: site:intel.com FRED (flexible return and event delivery) specification LKGS behaves like the MOV to GS instruction except that it loads the base address into the IA32_KERNEL_GS_BASE MSR instead of the GS segment’s descriptor cache, which is exactly what Linux kernel does to load a user level GS base. Thus, with LKGS, there is no need to SWAPGS away from the kernel GS base. [ mingo: Minor tweaks to the description. ] Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230112072032.35626-2-xin3.li@intel.com |
||
Arnaldo Carvalho de Melo
|
a66558dcb1 |
tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes in: 97fa21f65c3eb5bb ("x86/resctrl: Move MSR defines into msr-index.h") 7420ae3bb977b46e ("x86/intel_epb: Set Alder Lake N and Raptor Lake P normal EPB") Addressing these tools/perf build warnings: diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' That makes the beautification scripts to pick some new entries: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2022-12-20 14:28:40.893794072 -0300 +++ after 2022-12-20 14:28:54.831993914 -0300 @@ -266,6 +266,7 @@ [0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO", [0xc000010e - x86_64_specific_MSRs_offset] = "AMD64_LBR_SELECT", [0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG", + [0xc0000200 - x86_64_specific_MSRs_offset] = "IA32_MBA_BW_BASE", [0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS", [0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL", [0xc0000302 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS_CLR", $ Now one can trace systemwide asking to see backtraces to where that MSR is being read/written, see this example with a previous update: # perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" ^C# If we use -v (verbose mode) we can see what it does behind the scenes: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" Using CPUID AuthenticAMD-25-21-0 0x6a0 0x6a8 New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) 0x6a0 0x6a8 New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) mmap size 528384B ^C# Example with a frequent msr: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2 Using CPUID AuthenticAMD-25-21-0 0x48 New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) 0x48 New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) mmap size 528384B Looking at the vmlinux_path (8 entries long) symsrc__init: build id mismatch for vmlinux. Using /proc/kcore for kernel data Using /proc/kallsyms for symbols 0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule ([kernel.kallsyms]) futex_wait_queue_me ([kernel.kallsyms]) futex_wait ([kernel.kallsyms]) do_futex ([kernel.kallsyms]) __x64_sys_futex ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so) 0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule_idle ([kernel.kallsyms]) do_idle ([kernel.kallsyms]) cpu_startup_entry ([kernel.kallsyms]) secondary_startup_64_no_verify ([kernel.kallsyms]) # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lore.kernel.org/lkml/Y6HyTOGRNvKfCVe4@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
51c4f2bf53 |
tools headers cpufeatures: Sync with the kernel sources
To pick the changes from: 5e85c4ebf206e50c ("x86: KVM: Advertise AVX-IFMA CPUID to user space") af2872f622547656 ("x86: KVM: Advertise AMX-FP16 CPUID to user space") 6a19d7aa5821522e ("x86: KVM: Advertise CMPccXADD CPUID to user space") aaa65d17eec372c6 ("x86/tsx: Add a feature bit for TSX control MSR support") b1599915f09157e9 ("x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit") 16a7fe3728a8b832 ("KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guest") 80e4c1cd42fff110 ("x86/retbleed: Add X86_FEATURE_CALL_DEPTH") 7df548840c496b01 ("x86/bugs: Add "unknown" reporting for MMIO Stale Data") This only causes these perf files to be rebuilt: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiaxi Chen <jiaxi.chen@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kai Huang <kai.huang@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/Y6CD%2FIcEbDW5X%2FpN@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
0bc1d0e2c1 |
tools headers disabled-cpufeatures: Sync with the kernel sources
To pick the changes from: 15e15d64bd8e12d8 ("x86/cpufeatures: Add X86_FEATURE_XENPV to disabled-features.h") This only causes these perf files to be rebuilt: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h' diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h Cc: Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Y6B2w3WqifB%2FV70T@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Linus Torvalds
|
8fa590bf34 |
ARM64:
* Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. * Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. * Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on (see merge commit 382b5b87a97d: "Fix a number of issues with MTE, such as races on the tags being initialised vs the PG_mte_tagged flag as well as the lack of support for VM_SHARED when KVM is involved. Patches from Catalin Marinas and Peter Collingbourne"). * Merge the pKVM shadow vcpu state tracking that allows the hypervisor to have its own view of a vcpu, keeping that state private. * Add support for the PMUv3p5 architecture revision, bringing support for 64bit counters on systems that support it, and fix the no-quite-compliant CHAIN-ed counter support for the machines that actually exist out there. * Fix a handful of minor issues around 52bit VA/PA support (64kB pages only) as a prefix of the oncoming support for 4kB and 16kB pages. * Pick a small set of documentation and spelling fixes, because no good merge window would be complete without those. s390: * Second batch of the lazy destroy patches * First batch of KVM changes for kernel virtual != physical address support * Removal of a unused function x86: * Allow compiling out SMM support * Cleanup and documentation of SMM state save area format * Preserve interrupt shadow in SMM state save area * Respond to generic signals during slow page faults * Fixes and optimizations for the non-executable huge page errata fix. * Reprogram all performance counters on PMU filter change * Cleanups to Hyper-V emulation and tests * Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest running on top of a L1 Hyper-V hypervisor) * Advertise several new Intel features * x86 Xen-for-KVM: ** Allow the Xen runstate information to cross a page boundary ** Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured ** Add support for 32-bit guests in SCHEDOP_poll * Notable x86 fixes and cleanups: ** One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0). ** Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few years back when eliminating unnecessary barriers when switching between vmcs01 and vmcs02. ** Clean up vmread_error_trampoline() to make it more obvious that params must be passed on the stack, even for x86-64. ** Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective of the current guest CPUID. ** Fudge around a race with TSC refinement that results in KVM incorrectly thinking a guest needs TSC scaling when running on a CPU with a constant TSC, but no hardware-enumerated TSC frequency. ** Advertise (on AMD) that the SMM_CTL MSR is not supported ** Remove unnecessary exports Generic: * Support for responding to signals during page faults; introduces new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks Selftests: * Fix an inverted check in the access tracking perf test, and restore support for asserting that there aren't too many idle pages when running on bare metal. * Fix build errors that occur in certain setups (unsure exactly what is unique about the problematic setup) due to glibc overriding static_assert() to a variant that requires a custom message. * Introduce actual atomics for clear/set_bit() in selftests * Add support for pinning vCPUs in dirty_log_perf_test. * Rename the so called "perf_util" framework to "memstress". * Add a lightweight psuedo RNG for guest use, and use it to randomize the access pattern and write vs. read percentage in the memstress tests. * Add a common ucall implementation; code dedup and pre-work for running SEV (and beyond) guests in selftests. * Provide a common constructor and arch hook, which will eventually be used by x86 to automatically select the right hypercall (AMD vs. Intel). * A bunch of added/enabled/fixed selftests for ARM64, covering memslots, breakpoints, stage-2 faults and access tracking. * x86-specific selftest changes: ** Clean up x86's page table management. ** Clean up and enhance the "smaller maxphyaddr" test, and add a related test to cover generic emulation failure. ** Clean up the nEPT support checks. ** Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values. ** Fix an ordering issue in the AMX test introduced by recent conversions to use kvm_cpu_has(), and harden the code to guard against similar bugs in the future. Anything that tiggers caching of KVM's supported CPUID, kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if the caching occurs before the test opts in via prctl(). Documentation: * Remove deleted ioctls from documentation * Clean up the docs for the x86 MSR filter. * Various fixes -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmOaFrcUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroPemQgAq49excg2Cc+EsHnZw3vu/QWdA0Rt KhL3OgKxuHNjCbD2O9n2t5di7eJOTQ7F7T0eDm3xPTr4FS8LQ2327/mQePU/H2CF mWOpq9RBWLzFsSTeVA2Mz9TUTkYSnDHYuRsBvHyw/n9cL76BWVzjImldFtjYjjex yAwl8c5itKH6bc7KO+5ydswbvBzODkeYKUSBNdbn6m0JGQST7XppNwIAJvpiHsii Qgpk0e4Xx9q4PXG/r5DedI6BlufBsLhv0aE9SHPzyKH3JbbUFhJYI8ZD5OhBQuYW MwxK2KlM5Jm5ud2NZDDlsMmmvd1lnYCFDyqNozaKEWC1Y5rq1AbMa51fXA== =QAYX -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "ARM64: - Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. - Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on (see merge commit 382b5b87a97d: "Fix a number of issues with MTE, such as races on the tags being initialised vs the PG_mte_tagged flag as well as the lack of support for VM_SHARED when KVM is involved. Patches from Catalin Marinas and Peter Collingbourne"). - Merge the pKVM shadow vcpu state tracking that allows the hypervisor to have its own view of a vcpu, keeping that state private. - Add support for the PMUv3p5 architecture revision, bringing support for 64bit counters on systems that support it, and fix the no-quite-compliant CHAIN-ed counter support for the machines that actually exist out there. - Fix a handful of minor issues around 52bit VA/PA support (64kB pages only) as a prefix of the oncoming support for 4kB and 16kB pages. - Pick a small set of documentation and spelling fixes, because no good merge window would be complete without those. s390: - Second batch of the lazy destroy patches - First batch of KVM changes for kernel virtual != physical address support - Removal of a unused function x86: - Allow compiling out SMM support - Cleanup and documentation of SMM state save area format - Preserve interrupt shadow in SMM state save area - Respond to generic signals during slow page faults - Fixes and optimizations for the non-executable huge page errata fix. - Reprogram all performance counters on PMU filter change - Cleanups to Hyper-V emulation and tests - Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest running on top of a L1 Hyper-V hypervisor) - Advertise several new Intel features - x86 Xen-for-KVM: - Allow the Xen runstate information to cross a page boundary - Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured - Add support for 32-bit guests in SCHEDOP_poll - Notable x86 fixes and cleanups: - One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0). - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few years back when eliminating unnecessary barriers when switching between vmcs01 and vmcs02. - Clean up vmread_error_trampoline() to make it more obvious that params must be passed on the stack, even for x86-64. - Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective of the current guest CPUID. - Fudge around a race with TSC refinement that results in KVM incorrectly thinking a guest needs TSC scaling when running on a CPU with a constant TSC, but no hardware-enumerated TSC frequency. - Advertise (on AMD) that the SMM_CTL MSR is not supported - Remove unnecessary exports Generic: - Support for responding to signals during page faults; introduces new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks Selftests: - Fix an inverted check in the access tracking perf test, and restore support for asserting that there aren't too many idle pages when running on bare metal. - Fix build errors that occur in certain setups (unsure exactly what is unique about the problematic setup) due to glibc overriding static_assert() to a variant that requires a custom message. - Introduce actual atomics for clear/set_bit() in selftests - Add support for pinning vCPUs in dirty_log_perf_test. - Rename the so called "perf_util" framework to "memstress". - Add a lightweight psuedo RNG for guest use, and use it to randomize the access pattern and write vs. read percentage in the memstress tests. - Add a common ucall implementation; code dedup and pre-work for running SEV (and beyond) guests in selftests. - Provide a common constructor and arch hook, which will eventually be used by x86 to automatically select the right hypercall (AMD vs. Intel). - A bunch of added/enabled/fixed selftests for ARM64, covering memslots, breakpoints, stage-2 faults and access tracking. - x86-specific selftest changes: - Clean up x86's page table management. - Clean up and enhance the "smaller maxphyaddr" test, and add a related test to cover generic emulation failure. - Clean up the nEPT support checks. - Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values. - Fix an ordering issue in the AMX test introduced by recent conversions to use kvm_cpu_has(), and harden the code to guard against similar bugs in the future. Anything that tiggers caching of KVM's supported CPUID, kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if the caching occurs before the test opts in via prctl(). Documentation: - Remove deleted ioctls from documentation - Clean up the docs for the x86 MSR filter. - Various fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits) KVM: x86: Add proper ReST tables for userspace MSR exits/flags KVM: selftests: Allocate ucall pool from MEM_REGION_DATA KVM: arm64: selftests: Align VA space allocator with TTBR0 KVM: arm64: Fix benign bug with incorrect use of VA_BITS KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow KVM: x86: Advertise that the SMM_CTL MSR is not supported KVM: x86: remove unnecessary exports KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic" tools: KVM: selftests: Convert clear/set_bit() to actual atomics tools: Drop "atomic_" prefix from atomic test_and_set_bit() tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests perf tools: Use dedicated non-atomic clear/set bit helpers tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers KVM: arm64: selftests: Enable single-step without a "full" ucall() KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself KVM: Remove stale comment about KVM_REQ_UNHALT KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR KVM: Reference to kvm_userspace_memory_region in doc and comments KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl ... |
||
Linus Torvalds
|
7a76117f9f |
platform-drivers-x86 for v6.2-1
Highlights: - Intel: - PMC: Add support for Meteor Lake - Intel On Demand: various updates - ideapad-laptop: - Add support for various Fn keys on new models - Fix touchpad on/off handling in a generic way to avoid having to add more and more quirks - android-x86-tablets: Add support for 2 more X86 Android tablet models - New Dell WMI DDV driver - Miscellaneous cleanups and small bugfixes The following is an automated git shortlog grouped by driver: ACPI: - battery: Pass battery hook pointer to hook callbacks ISST: - Fix typo in comments Move existing HP drivers to a new hp subdir: - Move existing HP drivers to a new hp subdir dell: - Add new dell-wmi-ddv driver dell-ddv: - Warn if ePPID has a suspicious length - Improve buffer handling huawei-wmi: - remove unnecessary member - fix return value calculation - do not hard-code sizes ideapad-laptop: - Make touchpad_ctrl_via_ec a module option - Stop writing VPCCMD_W_TOUCHPAD at probe time - Send KEY_TOUCHPAD_TOGGLE on some models - Only toggle ps2 aux port on/off on select models - Do not send KEY_TOUCHPAD* events on probe / resume - Refactor ideapad_sync_touchpad_state() - support for more special keys in WMI - Add new _CFG bit numbers for future use - Revert "check for touchpad support in _CFG" intel/pmc: - Relocate Alder Lake PCH support - Relocate Tiger Lake PCH support - Relocate Ice Lake PCH support - Relocate Cannon Lake Point PCH support - Relocate Sunrise Point PCH support - Move variable declarations and definitions to header and core.c - Replace all the reg_map with init functions intel/pmc/core: - Add Meteor Lake support to pmc core driver intel_scu_ipc: - fix possible name leak in __intel_scu_ipc_register() mxm-wmi: - fix memleak in mxm_wmi_call_mx[ds|mx]() platform/mellanox: - mlxbf-pmc: Fix event typo - Add BlueField-3 support in the tmfifo driver platform/x86/amd: - pmc: Add a workaround for an s0i3 issue on Cezanne platform/x86/amd/pmf: - pass the struct by reference platform/x86/dell: - alienware-wmi: Use sysfs_emit() instead of scnprintf() platform/x86/intel: - pmc: Fix repeated word in comment platform/x86/intel/hid: - Add module-params for 5 button array + SW_TABLET_MODE reporting platform/x86/intel/sdsi: - Add meter certificate support - Support different GUIDs - Hide attributes if hardware doesn't support - Add Intel On Demand text sony-laptop: - Convert to use sysfs_emit_at() API thinkpad_acpi: - use strstarts() - Fix max_brightness of thinklight tools/arch/x86: - intel_sdsi: Add support for reading meter certificates - intel_sdsi: Add support for new GUID - intel_sdsi: Read more On Demand registers - intel_sdsi: Add Intel On Demand text - intel_sdsi: Add support for reading state certificates uv_sysfs: - Use sysfs_emit() instead of scnprintf() wireless-hotkey: - use ACPI HID as phys x86-android-tablets: - Add Advantech MICA-071 extra button - Add Lenovo Yoga Tab 3 (YT3-X90F) charger + fuel-gauge data - Add Medion Lifetab S10346 data -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmOW+QgUHGhkZWdvZWRl QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9yAPwf/dAYLHiC2ox5YlNTLX2DvU+jOpeBv W+EIx4oHQz1+O9jrWMLyvS9zTwTEAf6ANLiMP3damEvtJnB72ClgFITzlJAaB4zN yj0SdxoBRMt6zDL2QwMkwitvb5kJonLfO2H7NsMwA6f0KP1X8sio3oVRAMMVwlzz nwDKM/VBpuxmy+d880wRRoAkgRkTsPIOwBkYdo1525NU7kkTmtrMpgM+SXQsHTJn TB9uQnyuiq5/znh3k1Qn+OGwXQezmGz2Fb76IcW5RzUQDew6n6b3kzILee5ddynT Pa7/ibwpV+FtZjm2kS/l4tV+WPdA+s5TSWoq7Hz0jzBX9GdOORcMZmEneg== =z16d -----END PGP SIGNATURE----- Merge tag 'platform-drivers-x86-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Hans de Goede: - Intel: - PMC: Add support for Meteor Lake - Intel On Demand: various updates - Ideapad-laptop: - Add support for various Fn keys on new models - Fix touchpad on/off handling in a generic way to avoid having to add more and more quirks - Android x86 tablets: - Add support for two more X86 Android tablet models - New Dell WMI DDV driver - Miscellaneous cleanups and small bugfixes * tag 'platform-drivers-x86-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (52 commits) platform/mellanox: mlxbf-pmc: Fix event typo platform/x86: intel_scu_ipc: fix possible name leak in __intel_scu_ipc_register() platform/x86: sony-laptop: Convert to use sysfs_emit_at() API platform/x86/dell: alienware-wmi: Use sysfs_emit() instead of scnprintf() platform/x86: uv_sysfs: Use sysfs_emit() instead of scnprintf() platform/x86: mxm-wmi: fix memleak in mxm_wmi_call_mx[ds|mx]() platform/x86: x86-android-tablets: Add Advantech MICA-071 extra button platform/x86: x86-android-tablets: Add Lenovo Yoga Tab 3 (YT3-X90F) charger + fuel-gauge data platform/x86: x86-android-tablets: Add Medion Lifetab S10346 data platform/x86: wireless-hotkey: use ACPI HID as phys platform/x86/intel/hid: Add module-params for 5 button array + SW_TABLET_MODE reporting platform/x86: ideapad-laptop: Make touchpad_ctrl_via_ec a module option platform/x86: ideapad-laptop: Stop writing VPCCMD_W_TOUCHPAD at probe time platform/x86: ideapad-laptop: Send KEY_TOUCHPAD_TOGGLE on some models platform/x86: ideapad-laptop: Only toggle ps2 aux port on/off on select models platform/x86: ideapad-laptop: Do not send KEY_TOUCHPAD* events on probe / resume platform/x86: ideapad-laptop: Refactor ideapad_sync_touchpad_state() tools/arch/x86: intel_sdsi: Add support for reading meter certificates tools/arch/x86: intel_sdsi: Add support for new GUID tools/arch/x86: intel_sdsi: Read more On Demand registers ... |
||
Sean Christopherson
|
bb056c0f08 |
tools: KVM: selftests: Convert clear/set_bit() to actual atomics
Convert {clear,set}_bit() to atomics as KVM's ucall implementation relies on clear_bit() being atomic, they are defined in atomic.h, and the same helpers in the kernel proper are atomic. KVM's ucall infrastructure is the only user of clear_bit() in tools/, and there are no true set_bit() users. tools/testing/nvdimm/ does make heavy use of set_bit(), but that code builds into a kernel module of sorts, i.e. pulls in all of the kernel's header and so is already getting the kernel's atomic set_bit(). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221119013450.2643007-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
Sean Christopherson
|
36293352ff |
tools: Drop "atomic_" prefix from atomic test_and_set_bit()
Drop the "atomic_" prefix from tools' atomic_test_and_set_bit() to match the kernel nomenclature where test_and_set_bit() is atomic, and __test_and_set_bit() provides the non-atomic variant. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221119013450.2643007-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
Javier Martinez Canillas
|
66a9221d73 |
KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
The documentation says that the ioctl has been deprecated, but it has been actually removed and the remaining references are just left overs. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Message-Id: <20221202105011.185147-3-javierm@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
David E. Box
|
7fdc03a737 |
tools/arch/x86: intel_sdsi: Add support for reading meter certificates
Add option to read and decode On Demand meter certificates. Link: https://github.com/intel/intel-sdsi/blob/master/meter-certificate.rst Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20221119002343.1281885-10-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> |
||
David E. Box
|
429e789c67 |
tools/arch/x86: intel_sdsi: Add support for new GUID
The structure and content of the On Demand registers is based on the GUID which is read from hardware through sysfs. Add support for decoding the registers of a new GUID 0xF210D9EF. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Link: https://lore.kernel.org/r/20221119002343.1281885-9-david.e.box@linux.intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> |
||
David E. Box
|
a8041a89b7 |
tools/arch/x86: intel_sdsi: Read more On Demand registers
Add decoding of the following On Demand register fields: 1. NVRAM content authorization error status 2. Enabled features: telemetry and attestation 3. Key provisioning status 4. NVRAM update limit 5. PCU_CR3_CAPID_CFG Link: https://github.com/intel/intel-sdsi/blob/master/state-certificate-encoding.rst Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20221119002343.1281885-8-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> |
||
David E. Box
|
334599bccb |
tools/arch/x86: intel_sdsi: Add Intel On Demand text
Intel Software Defined Silicon (SDSi) is now officially known as Intel On Demand. Change text in tool to indicate this. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20221119002343.1281885-7-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> |
||
David E. Box
|
3088258ea7 |
tools/arch/x86: intel_sdsi: Add support for reading state certificates
Add option to read and decode On Demand state certificates. Link: https://github.com/intel/intel-sdsi/blob/master/state-certificate-encoding.rst Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20221119002343.1281885-6-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> |
||
Peter Gonda
|
cf4694be2b |
tools: Add atomic_test_and_set_bit()
Add x86 and generic implementations of atomic_test_and_set_bit() to allow KVM selftests to atomically manage bitmaps. Note, the generic version is taken from arch_test_and_set_bit() as of commit 415d83249709 ("locking/atomic: Make test_and_*_bit() ordered on failure"). Signed-off-by: Peter Gonda <pgonda@google.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006003409.649993-5-seanjc@google.com |
||
Borislav Petkov
|
2632daebaf |
x86/cpu: Restore AMD's DE_CFG MSR after resume
DE_CFG contains the LFENCE serializing bit, restore it on resume too. This is relevant to older families due to the way how they do S3. Unify and correct naming while at it. Fixes: e4d0e84e4907 ("x86/cpu/AMD: Make LFENCE a serializing instruction") Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Reported-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Arnaldo Carvalho de Melo
|
74455fd7e4 |
tools headers cpufeatures: Sync with the kernel sources
To pick the changes from: 257449c6a50298bd ("x86/cpufeatures: Add LbrExtV2 feature bit") This only causes these perf files to be rebuilt: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/lkml/Y1g6vGPqPhOrXoaN@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
4402e360d0 |
tools headers: Update the copy of x86's memcpy_64.S used in 'perf bench'
We also need to add SYM_TYPED_FUNC_START() to util/include/linux/linkage.h and update tools/perf/check_headers.sh to ignore the include cfi_types.h line when checking if the kernel original files drifted from the copies we carry. This is to get the changes from: ccace936eec7b805 ("x86: Add types to indirectly called assembly functions") Addressing these tools/perf build warnings: Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S' diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/lkml/Y1f3VRIec9EBgX6F@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
a3a365655a |
tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes in: b8d1d163604bd1e6 ("x86/apic: Don't disable x2APIC if locked") ca5b7c0d9621702e ("perf/x86/amd/lbr: Add LbrExtV2 branch record support") Addressing these tools/perf build warnings: diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' That makes the beautification scripts to pick some new entries: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2022-10-14 18:06:34.294561729 -0300 +++ after 2022-10-14 18:06:41.285744044 -0300 @@ -264,6 +264,7 @@ [0xc0000102 - x86_64_specific_MSRs_offset] = "KERNEL_GS_BASE", [0xc0000103 - x86_64_specific_MSRs_offset] = "TSC_AUX", [0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO", + [0xc000010e - x86_64_specific_MSRs_offset] = "AMD64_LBR_SELECT", [0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG", [0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS", [0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL", $ Now one can trace systemwide asking to see backtraces to where that MSR is being read/written, see this example with a previous update: # perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" ^C# If we use -v (verbose mode) we can see what it does behind the scenes: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" Using CPUID AuthenticAMD-25-21-0 0x6a0 0x6a8 New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) 0x6a0 0x6a8 New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) mmap size 528384B ^C# Example with a frequent msr: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2 Using CPUID AuthenticAMD-25-21-0 0x48 New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) 0x48 New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) mmap size 528384B Looking at the vmlinux_path (8 entries long) symsrc__init: build id mismatch for vmlinux. Using /proc/kcore for kernel data Using /proc/kallsyms for symbols 0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule ([kernel.kallsyms]) futex_wait_queue_me ([kernel.kallsyms]) futex_wait ([kernel.kallsyms]) do_futex ([kernel.kallsyms]) __x64_sys_futex ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so) 0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule_idle ([kernel.kallsyms]) do_idle ([kernel.kallsyms]) cpu_startup_entry ([kernel.kallsyms]) secondary_startup_64_no_verify ([kernel.kallsyms]) # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/lkml/Y0nQkz2TUJxwfXJd@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ravi Bangoria
|
160ae99365 |
perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernel
Although new details added into this header is currently used by kernel only, tools copy needs to be in sync with kernel file to avoid tools/perf/check-headers.sh warnings. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: https://lore.kernel.org/r/20221006153946.7816-3-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
356edeca2e |
tools headers cpufeatures: Sync with the kernel sources
To pick the changes from: 7df548840c496b01 ("x86/bugs: Add "unknown" reporting for MMIO Stale Data") This only causes these perf files to be rebuilt: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Link: https://lore.kernel.org/lkml/YysTRji90sNn2p5f@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Nick Desaulniers
|
a0a12c3ed0 |
asm goto: eradicate CC_HAS_ASM_GOTO
GCC has supported asm goto since 4.5, and Clang has since version 9.0.0. The minimum supported versions of these tools for the build according to Documentation/process/changes.rst are 5.1 and 11.0.0 respectively. Remove the feature detection script, Kconfig option, and clean up some fallback code that is no longer supported. The removed script was also testing for a GCC specific bug that was fixed in the 4.7 release. Also remove workarounds for bpftrace using clang older than 9.0.0, since other BPF backend fixes are required at this point. Link: https://lore.kernel.org/lkml/CAK7LNATSr=BXKfkdW8f-H5VT_w=xBpT2ZQcZ7rm6JfkdE+QnmA@mail.gmail.com/ Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48637 Acked-by: Borislav Petkov <bp@suse.de> Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Arnaldo Carvalho de Melo
|
e5bc0deae5 |
tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
To pick the changes in: 43bb9e000ea4c621 ("KVM: x86: Tweak name of MONITOR/MWAIT #UD quirk to make it #UD specific") 94dfc73e7cf4a31d ("treewide: uapi: Replace zero-length arrays with flexible-array members") bfbcc81bb82cbbad ("KVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior") b172862241b48499 ("KVM: x86: PIT: Preserve state of speaker port data bit") ed2351174e38ad4f ("KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault") That just rebuilds kvm-stat.c on x86, no change in functionality. This silences these perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h' diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h Cc: Chenyi Qiang <chenyi.qiang@intel.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Durrant <pdurrant@amazon.com> Link: https://lore.kernel.org/lkml/Yv6OMPKYqYSbUxwZ@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
eea085d114 |
tools headers UAPI: Sync KVM's vmx.h header with the kernel sources
To pick the changes in: 2f4073e08f4cc5a4 ("KVM: VMX: Enable Notify VM exit") That makes 'perf kvm-stat' aware of this new NOTIFY exit reason, thus addressing the following perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h' diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Tao Xu <tao3.xu@intel.com> Link: http://lore.kernel.org/lkml/Yv6LavXMZ+njijpq@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
62ed93d199 |
tools headers cpufeatures: Sync with the kernel sources
To pick the changes from: 2b1299322016731d ("x86/speculation: Add RSB VM Exit protections") 28a99e95f55c6185 ("x86/amd: Use IBPB for firmware calls") 4ad3278df6fe2b08 ("x86/speculation: Disable RRSBA behavior") 26aae8ccbc197223 ("x86/cpu/amd: Enumerate BTC_NO") 9756bba28470722d ("x86/speculation: Fill RSB on vmexit for IBRS") 3ebc170068885b6f ("x86/bugs: Add retbleed=ibpb") 2dbb887e875b1de3 ("x86/entry: Add kernel IBRS implementation") 6b80b59b35557065 ("x86/bugs: Report AMD retbleed vulnerability") a149180fbcf336e9 ("x86: Add magic AMD return-thunk") 15e67227c49a5783 ("x86: Undo return-thunk damage") a883d624aed463c8 ("x86/cpufeatures: Move RETPOLINE flags to word 11") aae99a7c9ab371b2 ("x86/cpufeatures: Introduce x2AVIC CPUID bit") 6f33a9daff9f0790 ("x86: Fix comment for X86_FEATURE_ZEN") 51802186158c74a0 ("x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug") This only causes these perf files to be rebuilt: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexandre Chartre <alexandre.chartre@oracle.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Borislav Petkov <bp@suse.de> Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Wyes Karny <wyes.karny@amd.com> Link: https://lore.kernel.org/lkml/Yvznmu5oHv0ZDN2w@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
7f7f86a7bd |
tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes in: 2b1299322016731d ("x86/speculation: Add RSB VM Exit protections") 4af184ee8b2c0a69 ("tools/power turbostat: dump secondary Turbo-Ratio-Limit") 4ad3278df6fe2b08 ("x86/speculation: Disable RRSBA behavior") d7caac991feeef1b ("x86/cpu/amd: Add Spectral Chicken") 6ad0ad2bf8a67e27 ("x86/bugs: Report Intel retbleed vulnerability") c59a1f106f5cd484 ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") 465932db25f36648 ("x86/cpu: Add new VMX feature, Tertiary VM-Execution control") 027bbb884be006b0 ("KVM: x86/speculation: Disable Fill buffer clear within guests") 51802186158c74a0 ("x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug") Addressing these tools/perf build warnings: diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' That makes the beautification scripts to pick some new entries: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2022-08-17 09:05:13.938246475 -0300 +++ after 2022-08-17 09:05:22.221455851 -0300 @@ -161,6 +161,7 @@ [0x0000048f] = "IA32_VMX_TRUE_EXIT_CTLS", [0x00000490] = "IA32_VMX_TRUE_ENTRY_CTLS", [0x00000491] = "IA32_VMX_VMFUNC", + [0x00000492] = "IA32_VMX_PROCBASED_CTLS3", [0x000004c1] = "IA32_PMC0", [0x000004d0] = "IA32_MCG_EXT_CTL", [0x00000560] = "IA32_RTIT_OUTPUT_BASE", @@ -212,6 +213,7 @@ [0x0000064D] = "PLATFORM_ENERGY_STATUS", [0x0000064e] = "PPERF", [0x0000064f] = "PERF_LIMIT_REASONS", + [0x00000650] = "SECONDARY_TURBO_RATIO_LIMIT", [0x00000658] = "PKG_WEIGHTED_CORE_C0_RES", [0x00000659] = "PKG_ANY_CORE_C0_RES", [0x0000065A] = "PKG_ANY_GFXE_C0_RES", $ Now one can trace systemwide asking to see backtraces to where those MSRs are being read/written, see this example with a previous update: # perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" ^C# If we use -v (verbose mode) we can see what it does behind the scenes: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" Using CPUID AuthenticAMD-25-21-0 0x6a0 0x6a8 New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) 0x6a0 0x6a8 New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) mmap size 528384B ^C# Example with a frequent msr: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2 Using CPUID AuthenticAMD-25-21-0 0x48 New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) 0x48 New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) mmap size 528384B Looking at the vmlinux_path (8 entries long) symsrc__init: build id mismatch for vmlinux. Using /proc/kcore for kernel data Using /proc/kallsyms for symbols 0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule ([kernel.kallsyms]) futex_wait_queue_me ([kernel.kallsyms]) futex_wait ([kernel.kallsyms]) do_futex ([kernel.kallsyms]) __x64_sys_futex ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so) 0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule_idle ([kernel.kallsyms]) do_idle ([kernel.kallsyms]) cpu_startup_entry ([kernel.kallsyms]) secondary_startup_64_no_verify ([kernel.kallsyms]) # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Like Xu <like.xu@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Hoo <robert.hu@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/lkml/YvzbT24m2o5U%2F7+q@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Linus Torvalds
|
5318b987fe |
More from the CPU vulnerability nightmares front:
Intel eIBRS machines do not sufficiently mitigate against RET mispredictions when doing a VM Exit therefore an additional RSB, one-entry stuffing is needed. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLqsGsACgkQEsHwGGHe VUpXGg//ZEkxhf3Ri7X9PknAWNG6eIEqigKqWcdnOw+Oq/GMVb6q7JQsqowK7KBZ AKcY5c/KkljTJNohditnfSOePyCG5nDTPgfkjzIawnaVdyJWMRCz/L4X2cv6ykDl 2l2EvQm4Ro8XAogYhE7GzDg/osaVfx93OkLCQj278VrEMWgM/dN2RZLpn+qiIkNt DyFlQ7cr5UASh/svtKLko268oT4JwhQSbDHVFLMJ52VaLXX36yx4rValZHUKFdox ZDyj+kiszFHYGsI94KAD0dYx76p6mHnwRc4y/HkVcO8vTacQ2b9yFYBGTiQatITf 0Nk1RIm9m3rzoJ82r/U0xSIDwbIhZlOVNm2QtCPkXqJZZFhopYsZUnq2TXhSWk4x GQg/2dDY6gb/5MSdyLJmvrTUtzResVyb/hYL6SevOsIRnkwe35P6vDDyp15F3TYK YvidZSfEyjtdLISBknqYRQD964dgNZu9ewrj+WuJNJr+A2fUvBzUebXjxHREsugN jWp5GyuagEKTtneVCvjwnii+ptCm6yfzgZYLbHmmV+zhinyE9H1xiwVDvo5T7DDS ZJCBgoioqMhp5qR59pkWz/S5SNGui2rzEHbAh4grANy8R/X5ASRv7UHT9uAo6ve1 xpw6qnE37CLzuLhj8IOdrnzWwLiq7qZ/lYN7m+mCMVlwRWobbOo= =a8em -----END PGP SIGNATURE----- Merge tag 'x86_bugs_pbrsb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 eIBRS fixes from Borislav Petkov: "More from the CPU vulnerability nightmares front: Intel eIBRS machines do not sufficiently mitigate against RET mispredictions when doing a VM Exit therefore an additional RSB, one-entry stuffing is needed" * tag 'x86_bugs_pbrsb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation: Add LFENCE to RSB fill sequence x86/speculation: Add RSB VM Exit protections |
||
Linus Torvalds
|
48a577dc1b |
perf tools changes for v6.0: 1st batch
- Introduce 'perf lock contention' subtool, using new lock contention tracepoints and using BPF for in kernel aggregation and then userspace processing using the perf tooling infrastructure for resolving symbols, target specification, etc. Since the new lock contention tracepoints don't provide lock names, get up to 8 stack traces and display the first non-lock function symbol name as a caller: $ perf lock report -F acquired,contended,avg_wait,wait_total Name acquired contended avg wait total wait update_blocked_a... 40 40 3.61 us 144.45 us kernfs_fop_open+... 5 5 3.64 us 18.18 us _nohz_idle_balance 3 3 2.65 us 7.95 us tick_do_update_j... 1 1 6.04 us 6.04 us ep_scan_ready_list 1 1 3.93 us 3.93 us Supports the usual 'perf record' + 'perf report' workflow as well as a BCC/bpftrace like mode where you start the tool and then press control+C to get results: $ sudo perf lock contention -b ^C contended total wait max wait avg wait type caller 42 192.67 us 13.64 us 4.59 us spinlock queue_work_on+0x20 23 85.54 us 10.28 us 3.72 us spinlock worker_thread+0x14a 6 13.92 us 6.51 us 2.32 us mutex kernfs_iop_permission+0x30 3 11.59 us 10.04 us 3.86 us mutex kernfs_dop_revalidate+0x3c 1 7.52 us 7.52 us 7.52 us spinlock kthread+0x115 1 7.24 us 7.24 us 7.24 us rwlock:W sys_epoll_wait+0x148 2 7.08 us 3.99 us 3.54 us spinlock delayed_work_timer_fn+0x1b 1 6.41 us 6.41 us 6.41 us spinlock idle_balance+0xa06 2 2.50 us 1.83 us 1.25 us mutex kernfs_iop_lookup+0x2f 1 1.71 us 1.71 us 1.71 us mutex kernfs_iop_getattr+0x2c ... - Add new 'perf kwork' tool to trace time properties of kernel work (such as softirq, and workqueue), uses eBPF skeletons to collect info in kernel space, aggregating data that then gets processed by the userspace tool, e.g.: # perf kwork report Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | ---------------------------------------------------------------------------------------------------- nvme0q5:130 | 004 | 1.101 ms | 49 | 0.051 ms | 26035.056403 s | 26035.056455 s | amdgpu:162 | 002 | 0.176 ms | 9 | 0.046 ms | 26035.268020 s | 26035.268066 s | nvme0q24:149 | 023 | 0.161 ms | 55 | 0.009 ms | 26035.655280 s | 26035.655288 s | nvme0q20:145 | 019 | 0.090 ms | 33 | 0.014 ms | 26035.939018 s | 26035.939032 s | nvme0q31:156 | 030 | 0.075 ms | 21 | 0.010 ms | 26035.052237 s | 26035.052247 s | nvme0q8:133 | 007 | 0.062 ms | 12 | 0.021 ms | 26035.416840 s | 26035.416861 s | nvme0q6:131 | 005 | 0.054 ms | 22 | 0.010 ms | 26035.199919 s | 26035.199929 s | nvme0q19:144 | 018 | 0.052 ms | 14 | 0.010 ms | 26035.110615 s | 26035.110625 s | nvme0q7:132 | 006 | 0.049 ms | 13 | 0.007 ms | 26035.125180 s | 26035.125187 s | nvme0q18:143 | 017 | 0.033 ms | 14 | 0.007 ms | 26035.169698 s | 26035.169705 s | nvme0q17:142 | 016 | 0.013 ms | 1 | 0.013 ms | 26035.565147 s | 26035.565160 s | enp5s0-rx-0:164 | 006 | 0.004 ms | 4 | 0.002 ms | 26035.928882 s | 26035.928884 s | enp5s0-tx-0:166 | 008 | 0.003 ms | 3 | 0.002 ms | 26035.870923 s | 26035.870925 s | -------------------------------------------------------------------------------------------------------- See commit log messages for more examples with extra options to limit the events time window, etc. - Add support for new AMD IBS (Instruction Based Sampling) features: With the DataSrc extensions, the source of data can be decoded among: - Local L3 or other L1/L2 in CCX. - A peer cache in a near CCX. - Data returned from DRAM. - A peer cache in a far CCX. - DRAM address map with "long latency" bit set. - Data returned from MMIO/Config/PCI/APIC. - Extension Memory (S-Link, GenZ, etc - identified by the CS target and/or address map at DF's choice). - Peer Agent Memory. - Support hardware tracing with Intel PT on guest machines, combining the traces with the ones in the host machine. - Add a "-m" option to 'perf buildid-list' to show kernel and modules build-ids, to display all of the information needed to do external symbolization of kernel stack traces, such as those collected by bpf_get_stackid(). - Add arch TSC frequency information to perf.data file headers. - Handle changes in the binutils disassembler function signatures in perf, bpftool and bpf_jit_disasm (Acked by the bpftool maintainer). - Fix building the perf perl binding with the newest gcc in distros such as fedora rawhide, where some new warnings were breaking the build as perf uses -Werror. - Add 'perf test' entry for branch stack sampling. - Add ARM SPE system wide 'perf test' entry. - Add user space counter reading tests to 'perf test'. - Build with python3 by default, if available. - Add python converter script for the vendor JSON event files. - Update vendor JSON files for alderlake, bonnell, broadwell, broadwellde, broadwellx, cascadelakex, elkhartlake, goldmont, goldmontplus, haswell, haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, knightslanding, nehalemep, nehalemex, sandybridge, sapphirerapids, silvermont, skylake, skylakex, snowridgex, tigerlake, westmereep-dp, westmereep-sp and westmereex. - Add vendor JSON File for Intel meteorlake. - Add Arm Cortex-A78C and X1C JSON vendor event files. - Add workaround to symbol address reading from ELF files without phdr, falling back to the previoous equation. - Convert legacy map definition to BTF-defined in the perf BPF script test. - Rework prologue generation code to stop using libbpf deprecated APIs. - Add default hybrid events for 'perf stat' on x86. - Add topdown metrics in the default 'perf stat' on the hybrid machines (big/little cores). - Prefer sampled CPU when exporting JSON in 'perf data convert' - Fix ('perf stat CSV output linter') and ("Check branch stack sampling") 'perf test' entries on s390. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYuw6gwAKCRCyPKLppCJ+ J5+iAP0RL6sKMhzdkRjRYfG8CluJ401YaPHadzv5jxP8gOZz2gEAsuYDrMF9t1zB 4DqORfobdX9UQEJjP9oRltU73GM0swI= =2/M0 -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.0-2022-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools updates from Arnaldo Carvalho de Melo: - Introduce 'perf lock contention' subtool, using new lock contention tracepoints and using BPF for in kernel aggregation and then userspace processing using the perf tooling infrastructure for resolving symbols, target specification, etc. Since the new lock contention tracepoints don't provide lock names, get up to 8 stack traces and display the first non-lock function symbol name as a caller: $ perf lock report -F acquired,contended,avg_wait,wait_total Name acquired contended avg wait total wait update_blocked_a... 40 40 3.61 us 144.45 us kernfs_fop_open+... 5 5 3.64 us 18.18 us _nohz_idle_balance 3 3 2.65 us 7.95 us tick_do_update_j... 1 1 6.04 us 6.04 us ep_scan_ready_list 1 1 3.93 us 3.93 us Supports the usual 'perf record' + 'perf report' workflow as well as a BCC/bpftrace like mode where you start the tool and then press control+C to get results: $ sudo perf lock contention -b ^C contended total wait max wait avg wait type caller 42 192.67 us 13.64 us 4.59 us spinlock queue_work_on+0x20 23 85.54 us 10.28 us 3.72 us spinlock worker_thread+0x14a 6 13.92 us 6.51 us 2.32 us mutex kernfs_iop_permission+0x30 3 11.59 us 10.04 us 3.86 us mutex kernfs_dop_revalidate+0x3c 1 7.52 us 7.52 us 7.52 us spinlock kthread+0x115 1 7.24 us 7.24 us 7.24 us rwlock:W sys_epoll_wait+0x148 2 7.08 us 3.99 us 3.54 us spinlock delayed_work_timer_fn+0x1b 1 6.41 us 6.41 us 6.41 us spinlock idle_balance+0xa06 2 2.50 us 1.83 us 1.25 us mutex kernfs_iop_lookup+0x2f 1 1.71 us 1.71 us 1.71 us mutex kernfs_iop_getattr+0x2c ... - Add new 'perf kwork' tool to trace time properties of kernel work (such as softirq, and workqueue), uses eBPF skeletons to collect info in kernel space, aggregating data that then gets processed by the userspace tool, e.g.: # perf kwork report Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | ---------------------------------------------------------------------------------------------------- nvme0q5:130 | 004 | 1.101 ms | 49 | 0.051 ms | 26035.056403 s | 26035.056455 s | amdgpu:162 | 002 | 0.176 ms | 9 | 0.046 ms | 26035.268020 s | 26035.268066 s | nvme0q24:149 | 023 | 0.161 ms | 55 | 0.009 ms | 26035.655280 s | 26035.655288 s | nvme0q20:145 | 019 | 0.090 ms | 33 | 0.014 ms | 26035.939018 s | 26035.939032 s | nvme0q31:156 | 030 | 0.075 ms | 21 | 0.010 ms | 26035.052237 s | 26035.052247 s | nvme0q8:133 | 007 | 0.062 ms | 12 | 0.021 ms | 26035.416840 s | 26035.416861 s | nvme0q6:131 | 005 | 0.054 ms | 22 | 0.010 ms | 26035.199919 s | 26035.199929 s | nvme0q19:144 | 018 | 0.052 ms | 14 | 0.010 ms | 26035.110615 s | 26035.110625 s | nvme0q7:132 | 006 | 0.049 ms | 13 | 0.007 ms | 26035.125180 s | 26035.125187 s | nvme0q18:143 | 017 | 0.033 ms | 14 | 0.007 ms | 26035.169698 s | 26035.169705 s | nvme0q17:142 | 016 | 0.013 ms | 1 | 0.013 ms | 26035.565147 s | 26035.565160 s | enp5s0-rx-0:164 | 006 | 0.004 ms | 4 | 0.002 ms | 26035.928882 s | 26035.928884 s | enp5s0-tx-0:166 | 008 | 0.003 ms | 3 | 0.002 ms | 26035.870923 s | 26035.870925 s | -------------------------------------------------------------------------------------------------------- See commit log messages for more examples with extra options to limit the events time window, etc. - Add support for new AMD IBS (Instruction Based Sampling) features: With the DataSrc extensions, the source of data can be decoded among: - Local L3 or other L1/L2 in CCX. - A peer cache in a near CCX. - Data returned from DRAM. - A peer cache in a far CCX. - DRAM address map with "long latency" bit set. - Data returned from MMIO/Config/PCI/APIC. - Extension Memory (S-Link, GenZ, etc - identified by the CS target and/or address map at DF's choice). - Peer Agent Memory. - Support hardware tracing with Intel PT on guest machines, combining the traces with the ones in the host machine. - Add a "-m" option to 'perf buildid-list' to show kernel and modules build-ids, to display all of the information needed to do external symbolization of kernel stack traces, such as those collected by bpf_get_stackid(). - Add arch TSC frequency information to perf.data file headers. - Handle changes in the binutils disassembler function signatures in perf, bpftool and bpf_jit_disasm (Acked by the bpftool maintainer). - Fix building the perf perl binding with the newest gcc in distros such as fedora rawhide, where some new warnings were breaking the build as perf uses -Werror. - Add 'perf test' entry for branch stack sampling. - Add ARM SPE system wide 'perf test' entry. - Add user space counter reading tests to 'perf test'. - Build with python3 by default, if available. - Add python converter script for the vendor JSON event files. - Update vendor JSON files for most Intel cores. - Add vendor JSON File for Intel meteorlake. - Add Arm Cortex-A78C and X1C JSON vendor event files. - Add workaround to symbol address reading from ELF files without phdr, falling back to the previoous equation. - Convert legacy map definition to BTF-defined in the perf BPF script test. - Rework prologue generation code to stop using libbpf deprecated APIs. - Add default hybrid events for 'perf stat' on x86. - Add topdown metrics in the default 'perf stat' on the hybrid machines (big/little cores). - Prefer sampled CPU when exporting JSON in 'perf data convert' - Fix ('perf stat CSV output linter') and ("Check branch stack sampling") 'perf test' entries on s390. * tag 'perf-tools-for-v6.0-2022-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits) perf stat: Refactor __run_perf_stat() common code perf lock: Print the number of lost entries for BPF perf lock: Add --map-nr-entries option perf lock: Introduce struct lock_contention perf scripting python: Do not build fail on deprecation warnings genelf: Use HAVE_LIBCRYPTO_SUPPORT, not the never defined HAVE_LIBCRYPTO perf build: Suppress openssl v3 deprecation warnings in libcrypto feature test perf parse-events: Break out tracepoint and printing perf parse-events: Don't #define YY_EXTRA_TYPE tools bpftool: Don't display disassembler-four-args feature test tools bpftool: Fix compilation error with new binutils tools bpf_jit_disasm: Don't display disassembler-four-args feature test tools bpf_jit_disasm: Fix compilation error with new binutils tools perf: Fix compilation error with new binutils tools include: add dis-asm-compat.h to handle version differences tools build: Don't display disassembler-four-args feature test tools build: Add feature test for init_disassemble_info API changes perf test: Add ARM SPE system wide test perf tools: Rework prologue generation code perf bpf: Convert legacy map definition to BTF-defined ... |
||
Daniel Sneddon
|
2b12993220 |
x86/speculation: Add RSB VM Exit protections
tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as documented for RET instructions after VM exits. Mitigate it with a new one-entry RSB stuffing mechanism and a new LFENCE. == Background == Indirect Branch Restricted Speculation (IBRS) was designed to help mitigate Branch Target Injection and Speculative Store Bypass, i.e. Spectre, attacks. IBRS prevents software run in less privileged modes from affecting branch prediction in more privileged modes. IBRS requires the MSR to be written on every privilege level change. To overcome some of the performance issues of IBRS, Enhanced IBRS was introduced. eIBRS is an "always on" IBRS, in other words, just turn it on once instead of writing the MSR on every privilege level change. When eIBRS is enabled, more privileged modes should be protected from less privileged modes, including protecting VMMs from guests. == Problem == Here's a simplification of how guests are run on Linux' KVM: void run_kvm_guest(void) { // Prepare to run guest VMRESUME(); // Clean up after guest runs } The execution flow for that would look something like this to the processor: 1. Host-side: call run_kvm_guest() 2. Host-side: VMRESUME 3. Guest runs, does "CALL guest_function" 4. VM exit, host runs again 5. Host might make some "cleanup" function calls 6. Host-side: RET from run_kvm_guest() Now, when back on the host, there are a couple of possible scenarios of post-guest activity the host needs to do before executing host code: * on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not touched and Linux has to do a 32-entry stuffing. * on eIBRS hardware, VM exit with IBRS enabled, or restoring the host IBRS=1 shortly after VM exit, has a documented side effect of flushing the RSB except in this PBRSB situation where the software needs to stuff the last RSB entry "by hand". IOW, with eIBRS supported, host RET instructions should no longer be influenced by guest behavior after the host retires a single CALL instruction. However, if the RET instructions are "unbalanced" with CALLs after a VM exit as is the RET in #6, it might speculatively use the address for the instruction after the CALL in #3 as an RSB prediction. This is a problem since the (untrusted) guest controls this address. Balanced CALL/RET instruction pairs such as in step #5 are not affected. == Solution == The PBRSB issue affects a wide variety of Intel processors which support eIBRS. But not all of them need mitigation. Today, X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e., eIBRS systems which enable legacy IBRS explicitly. However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT and most of them need a new mitigation. Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT. The lighter-weight mitigation performs a CALL instruction which is immediately followed by a speculative execution barrier (INT3). This steers speculative execution to the barrier -- just like a retpoline -- which ensures that speculation can never reach an unbalanced RET. Then, ensure this CALL is retired before continuing execution with an LFENCE. In other words, the window of exposure is opened at VM exit where RET behavior is troublesome. While the window is open, force RSB predictions sampling for RET targets to a dead end at the INT3. Close the window with the LFENCE. There is a subset of eIBRS systems which are not vulnerable to PBRSB. Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB. Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO. [ bp: Massage, incorporate review comments from Andy Cooper. ] Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> |
||
Linus Torvalds
|
e2b5421007 |
flexible-array transformations in UAPI for 6.0-rc1
Hi Linus, Please, pull the following treewide patch that replaces zero-length arrays with flexible-array members in UAPI. This patch has been baking in linux-next for 5 weeks now. -fstrict-flex-arrays=3 is coming and we need to land these changes to prevent issues like these in the short future: ../fs/minix/dir.c:337:3: warning: 'strcpy' will always overflow; destination buffer has size 0, but the source string has length 2 (including NUL byte) [-Wfortify-source] strcpy(de3->name, "."); ^ Since these are all [0] to [] changes, the risk to UAPI is nearly zero. If this breaks anything, we can use a union with a new member name. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836 Thanks -- Gustavo -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmLoNdcACgkQRwW0y0cG 2zEVeg//QYJ3j2pbKt9zB6muO3SkrNoMPc5wpY/SITUeiDscukLvGzJG88eIZskl NaEjbmacHmdlQrBkUdr10i1+hkb2zRd6/j42GIDXEhhKTMoT2UxJCBp47KSvd7VY dKNLGsgQs3kwmmxLEGu6w6vywWpI5wxXTKWL1Q/RpUXoOnLmsMEbzKTjf12a1Edl 9gPNY+tMHIHyB0pGIRXDY/ZF5c+FcRFn6kKeMVzJL0bnX7FI4UmYe83k9ajEiLWA MD3JAw/mNv2X0nizHHuQHIjtky8Pr+E8hKs5ni88vMYmFqeABsTw4R1LJykv/mYa NakU1j9tHYTKcs2Ju+gIvSKvmatKGNmOpti/8RAjEX1YY4cHlHWNsigVbVRLqfo7 SKImlSUxOPGFS3HAJQCC9P/oZgICkUdD6sdLO1PVBnE1G3Fvxg5z6fGcdEuEZkVR PQwlYDm1nlTuScbkgVSBzyU/AkntVMJTuPWgbpNo+VgSXWZ8T/U8II0eGrFVf9rH +y5dAS52/bi6OP0la7fNZlq7tcPfNG9HJlPwPb1kQtuPT4m6CBhth/rRrDJwx8za 0cpJT75Q3CI0wLZ7GN4yEjtNQrlAeeiYiS4LMQ/SFFtg1KzvmYYVmWDhOf0+mMDA f7bq4cxEg2LHwrhRgQQWowFVBu7yeiwKbcj9sybfA27bMqCtfto= =8yMq -----END PGP SIGNATURE----- Merge tag 'flexible-array-transformations-UAPI-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull uapi flexible array update from Gustavo Silva: "A treewide patch that replaces zero-length arrays with flexible-array members in UAPI. This has been baking in linux-next for 5 weeks now. '-fstrict-flex-arrays=3' is coming and we need to land these changes to prevent issues like these in the short future: fs/minix/dir.c:337:3: warning: 'strcpy' will always overflow; destination buffer has size 0, but the source string has length 2 (including NUL byte) [-Wfortify-source] strcpy(de3->name, "."); ^ Since these are all [0] to [] changes, the risk to UAPI is nearly zero. If this breaks anything, we can use a union with a new member name" Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836 * tag 'flexible-array-transformations-UAPI-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: treewide: uapi: Replace zero-length arrays with flexible-array members |
||
Arnaldo Carvalho de Melo
|
18808564aa |
Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up the fixes that went upstream via acme/perf/urgent and to get to v5.19. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
553de6e115 |
tools headers cpufeatures: Sync with the kernel sources
To pick the changes from: 28a99e95f55c6185 ("x86/amd: Use IBPB for firmware calls") This only causes these perf files to be rebuilt: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org Link: https://lore.kernel.org/lkml/Yt6oWce9UDAmBAtX@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
0698461ad2 |
Merge remote-tracking branch 'torvalds/master' into perf/core
To update the perf/core codebase. Fix conflict by moving arch__post_evsel_config(evsel, attr) to the end of evsel__config(), after what was added in: 49c692b7dfc9b6c0 ("perf offcpu: Accept allowed sample types only") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
91d248c3b9 |
tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes from these csets: 4ad3278df6fe2b08 ("x86/speculation: Disable RRSBA behavior") d7caac991feeef1b ("x86/cpu/amd: Add Spectral Chicken") That cause no changes to tooling: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after $ Just silences this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/YtQTm9wsB3hxQWvy@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
f098addbdb |
tools headers cpufeatures: Sync with the kernel sources
To pick the changes from: f43b9876e857c739 ("x86/retbleed: Add fine grained Kconfig knobs") a149180fbcf336e9 ("x86: Add magic AMD return-thunk") 15e67227c49a5783 ("x86: Undo return-thunk damage") 369ae6ffc41a3c11 ("x86/retpoline: Cleanup some #ifdefery") 4ad3278df6fe2b08 x86/speculation: Disable RRSBA behavior 26aae8ccbc197223 x86/cpu/amd: Enumerate BTC_NO 9756bba28470722d x86/speculation: Fill RSB on vmexit for IBRS 3ebc170068885b6f x86/bugs: Add retbleed=ibpb 2dbb887e875b1de3 x86/entry: Add kernel IBRS implementation 6b80b59b35557065 x86/bugs: Report AMD retbleed vulnerability a149180fbcf336e9 x86: Add magic AMD return-thunk 15e67227c49a5783 x86: Undo return-thunk damage a883d624aed463c8 x86/cpufeatures: Move RETPOLINE flags to word 11 51802186158c74a0 x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug This only causes these perf files to be rebuilt: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h' diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org Link: https://lore.kernel.org/lkml/YtQM40VmiLTkPND2@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Pawan Gupta
|
4ad3278df6 |
x86/speculation: Disable RRSBA behavior
Some Intel processors may use alternate predictors for RETs on RSB-underflow. This condition may be vulnerable to Branch History Injection (BHI) and intramode-BTI. Kernel earlier added spectre_v2 mitigation modes (eIBRS+Retpolines, eIBRS+LFENCE, Retpolines) which protect indirect CALLs and JMPs against such attacks. However, on RSB-underflow, RET target prediction may fallback to alternate predictors. As a result, RET's predicted target may get influenced by branch history. A new MSR_IA32_SPEC_CTRL bit (RRSBA_DIS_S) controls this fallback behavior when in kernel mode. When set, RETs will not take predictions from alternate predictors, hence mitigating RETs as well. Support for this is enumerated by CPUID.7.2.EDX[RRSBA_CTRL] (bit2). For spectre v2 mitigation, when a user selects a mitigation that protects indirect CALLs and JMPs against BHI and intramode-BTI, set RRSBA_DIS_S also to protect RETs for RSB-underflow case. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> |
||
Gustavo A. R. Silva
|
94dfc73e7c |
treewide: uapi: Replace zero-length arrays with flexible-array members
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. This code was transformed with the help of Coccinelle: (linux-5.19-rc2$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch) @@ identifier S, member, array; type T1, T2; @@ struct S { ... T1 member; T2 array[ - 0 ]; }; -fstrict-flex-arrays=3 is coming and we need to land these changes to prevent issues like these in the short future: ../fs/minix/dir.c:337:3: warning: 'strcpy' will always overflow; destination buffer has size 0, but the source string has length 2 (including NUL byte) [-Wfortify-source] strcpy(de3->name, "."); ^ Since these are all [0] to [] changes, the risk to UAPI is nearly zero. If this breaks anything, we can use a union with a new member name. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/78 Build-tested-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/62b675ec.wKX6AOZ6cbE71vtF%25lkp@intel.com/ Acked-by: Dan Williams <dan.j.williams@intel.com> # For ndctl.h Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> |