mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
synced 2025-01-10 15:10:38 +00:00
9 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
bcf5470eb4 |
s390 updates for 6.3 merge window
- Large cleanup of the con3270/tty3270 driver. Among others this fixes: * Background Color Support * ASCII Line Character Support * VT100 Support * Geometries other than 80x24 - Cleanup and improve cmpxchg() code. Also add cmpxchg_user_key() to uaccess functions, which will be used by KVM to access KVM guest memory with a specific storage key. - Add support for user space events counting to CPUMF. - Cleanup the vfio/ccw code, which also allows now to properly support 2K Format-2 IDALs. - Move kernel page table allocation and initialization to decompressor, which finally allows to enter the kernel with dynamic address translation enabled. This in turn allows to get rid of code with special handling in the kernel, which has to distinguish if DAT is on or off. - Replace kretprobe with rethook. - Various improvements to vfio/ap queue resets: * Use TAPQ to verify completion of a reset in progress rather than multiple invocations of ZAPQ. * Check TAPQ response codes when verifying successful completion of ZAPQ. * Fix erroneous handling of some error response codes. * Increase the maximum amount of time to wait for successful completion of ZAPQ. - Rework system call wrappers to get rid of alias functions, which were only left on s390. - Cleanup diag288_wdt watchdog driver. It has been agreed on with Guenter Roeck that this goes upstream via the s390 tree. - Add missing loadparm parameter handling for list-directed ECKD ipl/reipl. - Various improvements to memory detection code. - Remove arch_cpu_idle_time() since the current implementation is broken, and allows user space observable accounted idle times which can temporarily decrease. - Add Reset DAT-Protection support: (only) allow to change PTEs from RO to RW with a new RDP instruction. Unlike the currently used IPTE instruction, this does not necessarily guarantee that TLBs of all CPUs are synchronously flushed; and that remote CPUs can see spurious protection faults. The overall improvement for not requiring an all CPU synchronization, like it is required with IPTE, should be beneficial. - Fix KFENCE page fault reporting. - Smaller cleanups and improvement all over the place. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmPzWhQACgkQIg7DeRsp bsKkdQ//QbCwDMt6T3bmi6gGgs9HRSkLOTHAlOIRuetEzRiBzm/O4Gm8NycvPspl BIcuXmQKt+gBS44tWikKpwuhmWrAtiFUxs/M1uPfRXqjUf+ZFJinPJgtPCBa/3rv tQkh541QxpX4K5Ks71WKv2Kh0RjaTqw5Kj+rlDBYHsxZvb28mDigINRYoVSxNUKi dTVlR0UgdGLecXfezpvWeEAbJu6Q2pbIkOT3tNOumNqRAoUN4cbH3P0agHJdq8oj L/++d4tfVbwL8N/VCwIVBeW/AQzA0B2UCDVz75Pd55+FFrIGVp1hn7QC9QQieomL fzGOTrL4D9U8JkAIJqhioA1NlcN1+QW2svoMVo0N3vBJpIbzX4bZKTDxZZ26dG9H ox7YvhsZtJA7p34X5hetoObzZcmiYJStT+BDao7q1x3oLf4G31HaP+YUDIKBPmNW ieZa+ujYbKor1pD6ysaMVX+c1qhfX6S/V0uBAikoqMWUVUvH/ZeuSxCSfMuvWUrQ KFuc0HnPiiIO1Ux3wN5oN33+pWCSdUcJOeg4aj0jkkFT9Ct3TOBupIGBGyhOAh6r OTp1iqJuQjwOmkWPLyRuGMzRmDDp+hWz9qNF/DFwIV1IMi9AzJjEOg31cMVxx3gG iM25560uqvhhUwQFbyjVgJmj00dmqMX07q8QSVOgg9oUrnRhQrc= =MW9x -----END PGP SIGNATURE----- Merge tag 's390-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - Large cleanup of the con3270/tty3270 driver. Among others this fixes: - Background Color Support - ASCII Line Character Support - VT100 Support - Geometries other than 80x24 - Cleanup and improve cmpxchg() code. Also add cmpxchg_user_key() to uaccess functions, which will be used by KVM to access KVM guest memory with a specific storage key - Add support for user space events counting to CPUMF - Cleanup the vfio/ccw code, which also allows now to properly support 2K Format-2 IDALs - Move kernel page table allocation and initialization to decompressor, which finally allows to enter the kernel with dynamic address translation enabled. This in turn allows to get rid of code with special handling in the kernel, which has to distinguish if DAT is on or off - Replace kretprobe with rethook - Various improvements to vfio/ap queue resets: - Use TAPQ to verify completion of a reset in progress rather than multiple invocations of ZAPQ. - Check TAPQ response codes when verifying successful completion of ZAPQ. - Fix erroneous handling of some error response codes. - Increase the maximum amount of time to wait for successful completion of ZAPQ - Rework system call wrappers to get rid of alias functions, which were only left on s390 - Cleanup diag288_wdt watchdog driver. It has been agreed on with Guenter Roeck that this goes upstream via the s390 tree - Add missing loadparm parameter handling for list-directed ECKD ipl/reipl - Various improvements to memory detection code - Remove arch_cpu_idle_time() since the current implementation is broken, and allows user space observable accounted idle times which can temporarily decrease - Add Reset DAT-Protection support: (only) allow to change PTEs from RO to RW with a new RDP instruction. Unlike the currently used IPTE instruction, this does not necessarily guarantee that TLBs of all CPUs are synchronously flushed; and that remote CPUs can see spurious protection faults. The overall improvement for not requiring an all CPU synchronization, like it is required with IPTE, should be beneficial - Fix KFENCE page fault reporting - Smaller cleanups and improvement all over the place * tag 's390-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (182 commits) s390/irq,idle: simplify idle check s390/processor: add test_and_set_cpu_flag() and test_and_clear_cpu_flag() s390/processor: let cpu helper functions return boolean values s390/kfence: fix page fault reporting s390/zcrypt: introduce ctfm field in struct CPRBX s390: remove confusing comment from uapi types header file vfio/ccw: remove WARN_ON during shutdown s390/entry: remove toolchain dependent micro-optimization s390/mem_detect: do not truncate online memory ranges info s390/vx: remove __uint128_t type from __vector128 struct again s390/mm: add support for RDP (Reset DAT-Protection) s390/mm: define private VM_FAULT_* reasons from top bits Documentation: s390: correct spelling s390/ap: fix status returned by ap_qact() s390/ap: fix status returned by ap_aqic() s390: vfio-ap: tighten the NIB validity check Revert "s390/mem_detect: do not update output parameters on failure" s390/idle: remove arch_cpu_idle_time() and corresponding code s390/vx: use simple assignments to access __vector128 members s390/vx: add 64 and 128 bit members to __vector128 struct ... |
||
Thomas Richter
|
1e99c242ac |
s390/cpum_cf: merge source files for CPU Measurement counter facility
With no in-kernel user, the source files can be merged. Move all functions and the variable definitions to file perf_cpum_cf.c This file now contains all the necessary functions and definitions for the CPU Measurement counter facility device driver. The files cpu_mcf.h and perf_cpum_cf_common.c are deleted. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> |
||
Namhyung Kim
|
0a9081cf0a |
perf/core: Add perf_sample_save_raw_data() helper
When we save the raw_data to the perf sample data, we need to update the sample flags and the dynamic size. To make sure this is done consistently, add the perf_sample_save_raw_data() helper and convert all call sites. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230118060559.615653-4-namhyung@kernel.org |
||
Linus Torvalds
|
add7695957 |
Perf events updates for v6.2:
- Thoroughly rewrite the data structures that implement perf task context handling, with the goal of fixing various quirks and unfeatures both in already merged, and in upcoming proposed code. The old data structure is the per task and per cpu perf_event_contexts: task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context ^ | ^ | ^ `---------------------------------' | `--> pmu ---' v ^ perf_event ------' In this new design this is replaced with a single task context and a single CPU context, plus intermediate data-structures: task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context ^ | ^ ^ `---------------------------' | | | | perf_cpu_pmu_context <--. | `----. ^ | | | | | | v v | | ,--> perf_event_pmu_context | | | | | | | v v | perf_event ---> pmu ----------------' [ See commit bd2756811766 for more details. ] This rewrite was developed by Peter Zijlstra and Ravi Bangoria. - Optimize perf_tp_event() - Update the Intel uncore PMU driver, extending it with UPI topology discovery on various hardware models. - Misc fixes & cleanups Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmOXjuURHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1j+VhAAknimsLwenTHCGQp7yqsWSKfBr9KI2UgD ZgtQuuwRwSzwqAEwC5Mt6zcIkxRNhU1ookFPqQbpY3XA0W4aNakUk8bDF8QIEKW0 MFWxn7PtReWqKcUay2oEGGurqZ5OtfpljJGxigQh5oVeMGc+itIwHF2JefeyoRnu pq7R2qDgOBb7Np4lWTdqXGmKufzp04/nely2IZQBO8x80cGRZiKQIrGrch6vLUf7 3iEz9rwmvPyz0aczYSpa/duEZDMLm4lWNK4oMUEXuUWC8gU7CUzBJsJ3AS5NgxAu yGBXe/s7GHqwtc/F30l5gK/J5WAyK83IF7sckxTj0dBUpyC6wQwwYPm8BaCAMoqN X6mU7Ve938Siih1TyOBZfZsrtDDILhV2N/nku2erb3iqes26u0RcT25rWtu9Yqvn hm4Gm6cmkHWq4EOHSBvAdC7l7lDZ3fyVI5+8nN9ly9Qv867HjG70dvIr9iEEolpX rhFAz8r/NwTXhDY0AmFZcOkrnNV3IuHtibJ/9wJlgJNqDPqN12Wxqdzy0Nj3HH6G EsukBO05cWaDS0gB8MpaO6Q6YtqAr87ZY+afHDBwcfkME50/CyBLr5rd47dTR+Ip B+zreYKcaNHdEMd1A9KULRTTDnEjlXYMwjVVJiPRV0jcmA3dHmM46HN5Ae9NdO6+ R2BAWv9XR6M= =KNaI -----END PGP SIGNATURE----- Merge tag 'perf-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: - Thoroughly rewrite the data structures that implement perf task context handling, with the goal of fixing various quirks and unfeatures both in already merged, and in upcoming proposed code. The old data structure is the per task and per cpu perf_event_contexts: task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context ^ | ^ | ^ `---------------------------------' | `--> pmu ---' v ^ perf_event ------' In this new design this is replaced with a single task context and a single CPU context, plus intermediate data-structures: task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context ^ | ^ ^ `---------------------------' | | | | perf_cpu_pmu_context <--. | `----. ^ | | | | | | v v | | ,--> perf_event_pmu_context | | | | | | | v v | perf_event ---> pmu ----------------' [ See commit bd2756811766 for more details. ] This rewrite was developed by Peter Zijlstra and Ravi Bangoria. - Optimize perf_tp_event() - Update the Intel uncore PMU driver, extending it with UPI topology discovery on various hardware models. - Misc fixes & cleanups * tag 'perf-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) perf/x86/intel/uncore: Fix reference count leak in __uncore_imc_init_box() perf/x86/intel/uncore: Fix reference count leak in snr_uncore_mmio_map() perf/x86/intel/uncore: Fix reference count leak in hswep_has_limit_sbox() perf/x86/intel/uncore: Fix reference count leak in sad_cfg_iio_topology() perf/x86/intel/uncore: Make set_mapping() procedure void perf/x86/intel/uncore: Update sysfs-devices-mapping file perf/x86/intel/uncore: Enable UPI topology discovery for Sapphire Rapids perf/x86/intel/uncore: Enable UPI topology discovery for Icelake Server perf/x86/intel/uncore: Get UPI NodeID and GroupID perf/x86/intel/uncore: Enable UPI topology discovery for Skylake Server perf/x86/intel/uncore: Generalize get_topology() for SKX PMUs perf/x86/intel/uncore: Disable I/O stacks to PMU mapping on ICX-D perf/x86/intel/uncore: Clear attr_update properly perf/x86/intel/uncore: Introduce UPI topology type perf/x86/intel/uncore: Generalize IIO topology support perf/core: Don't allow grouping events from different hw pmus perf/amd/ibs: Make IBS a core pmu perf: Fix function pointer case perf/x86/amd: Remove the repeated declaration perf: Fix possible memleak in pmu_dev_alloc() ... |
||
Linus Torvalds
|
47477c84b8 |
s390 updates for 6.2 merge window
- Factor out handle_write() function and simplify 3215 console write operation. - When 3170 terminal emulator is connected to the 3215 console driver the boot time could be very long due to limited buffer space or missing operator input. Add con3215_drop command line parameter and con3215_drop sysfs attribute file to instruct the kernel drop console data when such conditions are met. - Fix white space errors in 3215 console driver. - Move enum paiext_mode definition to a header file and rename it to paievt_mode to indicate this is now used for several events. Rename PAI_MODE_COUNTER to PAI_MODE_COUNTING to make consistent with PAI_MODE_SAMPLING. - Simplify the logic of PMU pai_crypto mapped buffer reference counter and make it consistent with PMU pai_ext. - Rename PMU pai_crypto mapped buffer structure member users to active_events to make it consistent with PMU pai_ext. - Enable HUGETLB_PAGE_OPTIMIZE_VMEMMAP configuration option. This results in saving of 12K per 1M hugetlb page (~1.2%) and 32764K per 2G hugetlb page (~1.6%). - Use generic serial.h, bugs.h, shmparam.h and vga.h header files and scrap s390-specific versions. - The generic percpu setup code does not expect the s390-like implementation and emits a warning. To get rid of that warning and provide sane CPU-to-node and CPU-to-CPU distance mappings implementat a minimal version of setup_per_cpu_areas(). - Use kstrtobool() instead of strtobool() for re-IPL sysfs device attributes. - Avoid unnecessary lookup of a pointer to MSI descriptor when setting IRQ affinity for a PCI device. - Get rid of "an incompatible function type cast" warning by changing debug_sprintf_format_fn() function prototype so it matches the debug_format_proc_t function type. - Remove unused info_blk_hdr__pcpus() and get_page_state() functions. - Get rid of clang "unused unused insn cache ops function" warning by moving s390_insn definition to a private header. - Get rid of clang "unused function" warning by making function raw3270_state_final() only available if CONFIG_TN3270_CONSOLE is enabled. - Use kstrobool() to parse sclp_con_drop parameter to make it identical to the con3215_drop parameter and allow passing values like "yes" and "true". - Use sysfs_emit() for all SCLP sysfs show functions, which is the current standard way to generate output strings. - Make SCLP con_drop sysfs attribute also writable and allow to change its value during runtime. This makes SCLP console drop handling consistent with the 3215 device driver. - Virtual and physical addresses are indentical on s390. However, there is still a confusion when pointers are directly casted to physical addresses or vice versa. Use correct address converters virt_to_phys() and phys_to_virt() for s390 channel IO drivers. - Support for power managemant has been removed from s390 since quite some time. Remove unused power managemant code from the appldata device driver. - Allow memory tools like KASAN see memory accesses from the checksum code. Switch to GENERIC_CSUM if KASAN is enabled, just like x86 does. - Add support of ECKD DASDs disks so it could be used as boot and dump devices. - Follow checkpatch recommendations and use octal values instead of S_IRUGO and S_IWUSR for dump device attributes in sysfs. - Changes to vx-insn.h do not cause a recompile of C files that use asm(".include \"asm/vx-insn.h\"\n") magic to access vector instruction macros from inline assemblies. Add wrapper include header file to avoid this problem. - Use vector instruction macros instead of byte patterns to increase register validation routine readability. - The current machine check register validation handling does not take into account various scenarios and might lead to killing a wrong user process or potentially ignore corrupted FPU registers. Simplify logic of the machine check handler and stop the whole machine if the previous context was kerenel mode. If the previous context was user mode, kill the current task. - Introduce sclp_emergency_printk() function which can be used to emit a message in emergency cases. It is supposed to be used in cases where regular console device drivers may not work anymore, e.g. unrecoverable machine checks. Keep the early Service-Call Control Block so it can also be used after initdata has been freed to allow sclp_emergency_printk() implementation. - In case a system will be stopped because of an unrecoverable machine check error print the machine check interruption code to give a hint of what went wrong. - Move storage error checking from the assembly entry code to C in order to simplify machine check handling. Enter the handler with DAT turned on, which simplifies the entry code even more. - The machine check extended save areas are allocated using a private "nmi_save_areas" slab cache which guarantees a required power-of-two alignment. Get rid of that cache in favour of kmalloc(). -----BEGIN PGP SIGNATURE----- iI0EABYIADUWIQQrtrZiYVkVzKQcYivNdxKlNrRb8AUCY5ckrhccYWdvcmRlZXZA bGludXguaWJtLmNvbQAKCRDNdxKlNrRb8NlrAQD8NCLeEAkhGCRnzdTyngExCrzV Mw//cEnksUkIPqalJgEArbyFjGh05ecNaiDQduH8Gh94/qOhGE4obMdTgMWq7QY= =3aou -----END PGP SIGNATURE----- Merge tag 's390-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Alexander Gordeev: - Factor out handle_write() function and simplify 3215 console write operation - When 3170 terminal emulator is connected to the 3215 console driver the boot time could be very long due to limited buffer space or missing operator input. Add con3215_drop command line parameter and con3215_drop sysfs attribute file to instruct the kernel drop console data when such conditions are met - Fix white space errors in 3215 console driver - Move enum paiext_mode definition to a header file and rename it to paievt_mode to indicate this is now used for several events. Rename PAI_MODE_COUNTER to PAI_MODE_COUNTING to make consistent with PAI_MODE_SAMPLING - Simplify the logic of PMU pai_crypto mapped buffer reference counter and make it consistent with PMU pai_ext - Rename PMU pai_crypto mapped buffer structure member users to active_events to make it consistent with PMU pai_ext - Enable HUGETLB_PAGE_OPTIMIZE_VMEMMAP configuration option. This results in saving of 12K per 1M hugetlb page (~1.2%) and 32764K per 2G hugetlb page (~1.6%) - Use generic serial.h, bugs.h, shmparam.h and vga.h header files and scrap s390-specific versions - The generic percpu setup code does not expect the s390-like implementation and emits a warning. To get rid of that warning and provide sane CPU-to-node and CPU-to-CPU distance mappings implementat a minimal version of setup_per_cpu_areas() - Use kstrtobool() instead of strtobool() for re-IPL sysfs device attributes - Avoid unnecessary lookup of a pointer to MSI descriptor when setting IRQ affinity for a PCI device - Get rid of "an incompatible function type cast" warning by changing debug_sprintf_format_fn() function prototype so it matches the debug_format_proc_t function type - Remove unused info_blk_hdr__pcpus() and get_page_state() functions - Get rid of clang "unused unused insn cache ops function" warning by moving s390_insn definition to a private header - Get rid of clang "unused function" warning by making function raw3270_state_final() only available if CONFIG_TN3270_CONSOLE is enabled - Use kstrobool() to parse sclp_con_drop parameter to make it identical to the con3215_drop parameter and allow passing values like "yes" and "true" - Use sysfs_emit() for all SCLP sysfs show functions, which is the current standard way to generate output strings - Make SCLP con_drop sysfs attribute also writable and allow to change its value during runtime. This makes SCLP console drop handling consistent with the 3215 device driver - Virtual and physical addresses are indentical on s390. However, there is still a confusion when pointers are directly casted to physical addresses or vice versa. Use correct address converters virt_to_phys() and phys_to_virt() for s390 channel IO drivers - Support for power managemant has been removed from s390 since quite some time. Remove unused power managemant code from the appldata device driver - Allow memory tools like KASAN see memory accesses from the checksum code. Switch to GENERIC_CSUM if KASAN is enabled, just like x86 does - Add support of ECKD DASDs disks so it could be used as boot and dump devices - Follow checkpatch recommendations and use octal values instead of S_IRUGO and S_IWUSR for dump device attributes in sysfs - Changes to vx-insn.h do not cause a recompile of C files that use asm(".include \"asm/vx-insn.h\"\n") magic to access vector instruction macros from inline assemblies. Add wrapper include header file to avoid this problem - Use vector instruction macros instead of byte patterns to increase register validation routine readability - The current machine check register validation handling does not take into account various scenarios and might lead to killing a wrong user process or potentially ignore corrupted FPU registers. Simplify logic of the machine check handler and stop the whole machine if the previous context was kerenel mode. If the previous context was user mode, kill the current task - Introduce sclp_emergency_printk() function which can be used to emit a message in emergency cases. It is supposed to be used in cases where regular console device drivers may not work anymore, e.g. unrecoverable machine checks Keep the early Service-Call Control Block so it can also be used after initdata has been freed to allow sclp_emergency_printk() implementation - In case a system will be stopped because of an unrecoverable machine check error print the machine check interruption code to give a hint of what went wrong - Move storage error checking from the assembly entry code to C in order to simplify machine check handling. Enter the handler with DAT turned on, which simplifies the entry code even more - The machine check extended save areas are allocated using a private "nmi_save_areas" slab cache which guarantees a required power-of-two alignment. Get rid of that cache in favour of kmalloc() * tag 's390-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (38 commits) s390/nmi: get rid of private slab cache s390/nmi: move storage error checking back to C, enter with DAT on s390/nmi: print machine check interruption code before stopping system s390/sclp: introduce sclp_emergency_printk() s390/sclp: keep sclp_early_sccb s390/nmi: rework register validation handling s390/nmi: use vector instruction macros instead of byte patterns s390/vx: add vx-insn.h wrapper include file s390/ipl: use octal values instead of S_* macros s390/ipl: add eckd dump support s390/ipl: add eckd support vfio/ccw: identify CCW data addresses as physical vfio/ccw: sort out physical vs virtual pointers usage s390/checksum: support GENERIC_CSUM, enable it for KASAN s390/appldata: remove power management callbacks s390/cio: sort out physical vs virtual pointers usage s390/sclp: allow to change sclp_console_drop during runtime s390/sclp: convert to use sysfs_emit() s390/sclp: use kstrobool() to parse sclp_con_drop parameter s390/3270: make raw3270_state_final() depend on CONFIG_TN3270_CONSOLE ... |
||
Peter Zijlstra
|
bd27568117 |
perf: Rewrite core context handling
There have been various issues and limitations with the way perf uses (task) contexts to track events. Most notable is the single hardware PMU task context, which has resulted in a number of yucky things (both proposed and merged). Notably: - HW breakpoint PMU - ARM big.little PMU / Intel ADL PMU - Intel Branch Monitoring PMU - AMD IBS PMU - S390 cpum_cf PMU - PowerPC trace_imc PMU *Current design:* Currently we have a per task and per cpu perf_event_contexts: task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context ^ | ^ | ^ `---------------------------------' | `--> pmu ---' v ^ perf_event ------' Each task has an array of pointers to a perf_event_context. Each perf_event_context has a direct relation to a PMU and a group of events for that PMU. The task related perf_event_context's have a pointer back to that task. Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which includes a perf_event_context, which again has a direct relation to that PMU, and a group of events for that PMU. The perf_cpu_context also tracks which task context is currently associated with that CPU and includes a few other things like the hrtimer for rotation etc. Each perf_event is then associated with its PMU and one perf_event_context. *Proposed design:* New design proposed by this patch reduce to a single task context and a single CPU context but adds some intermediate data-structures: task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context ^ | ^ ^ `---------------------------' | | | | perf_cpu_pmu_context <--. | `----. ^ | | | | | | v v | | ,--> perf_event_pmu_context | | | | | | | v v | perf_event ---> pmu ----------------' With the new design, perf_event_context will hold all events for all pmus in the (respective pinned/flexible) rbtrees. This can be achieved by adding pmu to rbtree key: {cpu, pmu, cgroup, group_index} Each perf_event_context carries a list of perf_event_pmu_context which is used to hold per-pmu-per-context state. For example, it keeps track of currently active events for that pmu, a pmu specific task_ctx_data, a flag to tell whether rotation is required or not etc. Additionally, perf_cpu_pmu_context is used to hold per-pmu-per-cpu state like hrtimer details to drive the event rotation, a pointer to perf_event_pmu_context of currently running task and some other ancillary information. Each perf_event is associated to it's pmu, perf_event_context and perf_event_pmu_context. Further optimizations to current implementation are possible. For example, ctx_resched() can be optimized to reschedule only single pmu events. Much thanks to Ravi for picking this up and pushing it towards completion. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221008062424.313-1-ravi.bangoria@amd.com |
||
Thomas Richter
|
8b1e6a3fb3 |
s390/pai: fix raw data collection for PMU pai_ext
Commit 838d9bb62d13 ("perf: Use sample_flags for raw_data") changed the way the raw data of an event is collected. Adjust the PMU pai_ext to the new scheme. Fixes: 838d9bb62d13 ("perf: Use sample_flags for raw_data") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> |
||
Thomas Richter
|
4c78796301 |
s390/pai: move enum definition to header file
Move enum definition to header file. This is done in preparation for a follow on patch where this enum will be used in another source file. Also change the enum name from paiext_mode to paievt_mode to indicate this enum is now used for several events. Make naming consistent and rename PAI_MODE_COUNTER to PAI_MODE_COUNTING. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> |
||
Thomas Richter
|
c432fefe8e |
s390/pai: Add support for PAI Extension 1 NNPA counters
PMU device driver perf_paiext supports Processor Activity Instrumentation Extension (PAIE1), available with IBM z16: - maps a 512 byte block to lowcore address 0x1508 called PAIE1 control block. - maps a 1024 byte block at PAIE1 control block entry with index 2. - uses control register bit 14 to enable PAIE1 control block lookup. - turn PAIE1 nnpa counting on and off by setting bit 63 in PAIE1 control block entry with index 2. - creates a sample with raw data on each context switch out when at context switch some mapped counters have a value of nonzero. This device driver only supports CPU wide context, no task context is allowed. Support for counting: - one or more counters can be specified using perf stat -e pai_ext/xxx/ where xxx stands for the counter event name. Multiple invocation of this command is possible. The counter names are listed in /sys/devices/pai_ext/events directory. - one special counters can be specified using perf stat -e pai_ext/NNPA_ALL/ which returns the sum of all incremented nnpa counters. - multiple counting events can run in parallel. Support for Sampling: - one event pai_ext/NNPA_ALL/ is reserved for sampling. The event collects data at context switch out and saves them in the ring buffer. - no multiple invocations are possible. The PAIE1 nnpa counter events are system wide. No task context is supported. Therefore some restrictions documented in function paiext_busy() apply. Extend qpaci assembly instruction to query supported memory mapped nnpa counters. It returns the number of counters (no holes allowed in that range). PAIE1 nnpa counter events can not be created when a CPU hot plug add is processed. This means a CPU hot plug add does not get the necessary PAIE1 event to record PAIE1 nnpa counter increments on the newly added CPU. CPU hot plug remove removes the event and terminates the counting of PAIE1 counters immediately. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> |