linux-stable/arch/x86/events
Kan Liang 25dfc9e357 perf/x86/intel: Limit the period on Haswell
Running the ltp test cve-2015-3290 concurrently reports the following
warnings.

perfevents: irq loop stuck!
  WARNING: CPU: 31 PID: 32438 at arch/x86/events/intel/core.c:3174
  intel_pmu_handle_irq+0x285/0x370
  Call Trace:
   <NMI>
   ? __warn+0xa4/0x220
   ? intel_pmu_handle_irq+0x285/0x370
   ? __report_bug+0x123/0x130
   ? intel_pmu_handle_irq+0x285/0x370
   ? __report_bug+0x123/0x130
   ? intel_pmu_handle_irq+0x285/0x370
   ? report_bug+0x3e/0xa0
   ? handle_bug+0x3c/0x70
   ? exc_invalid_op+0x18/0x50
   ? asm_exc_invalid_op+0x1a/0x20
   ? irq_work_claim+0x1e/0x40
   ? intel_pmu_handle_irq+0x285/0x370
   perf_event_nmi_handler+0x3d/0x60
   nmi_handle+0x104/0x330

Thanks to Thomas Gleixner's analysis, the issue is caused by the low
initial period (1) of the frequency estimation algorithm, which triggers
the defects of the HW, specifically erratum HSW11 and HSW143. (For the
details, please refer https://lore.kernel.org/lkml/87plq9l5d2.ffs@tglx/)

The HSW11 requires a period larger than 100 for the INST_RETIRED.ALL
event, but the initial period in the freq mode is 1. The erratum is the
same as the BDM11, which has been supported in the kernel. A minimum
period of 128 is enforced as well on HSW.

HSW143 is regarding that the fixed counter 1 may overcount 32 with the
Hyper-Threading is enabled. However, based on the test, the hardware
has more issues than it tells. Besides the fixed counter 1, the message
'interrupt took too long' can be observed on any counter which was armed
with a period < 32 and two events expired in the same NMI. A minimum
period of 32 is enforced for the rest of the events.
The recommended workaround code of the HSW143 is not implemented.
Because it only addresses the issue for the fixed counter. It brings
extra overhead through extra MSR writing. No related overcounting issue
has been reported so far.

Fixes: 3a632cb229 ("perf/x86/intel: Add simple Haswell PMU support")
Reported-by: Li Huafei <lihuafei1@huawei.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240819183004.3132920-1-kan.liang@linux.intel.com
Closes: https://lore.kernel.org/lkml/20240729223328.327835-1-lihuafei1@huawei.com/
2024-08-25 16:49:05 +02:00
..
amd perf/x86/amd/uncore: Fix DF and UMC domain identification 2024-07-04 16:00:41 +02:00
intel perf/x86/intel: Limit the period on Haswell 2024-08-25 16:49:05 +02:00
zhaoxin perf/x86: Support counter mask 2024-07-04 16:00:36 +02:00
core.c perf/x86: Fix smp_processor_id()-in-preemptible warnings 2024-07-31 12:57:39 +02:00
Kconfig perf/x86/Kconfig: Fix indentation in the Kconfig file 2022-05-25 15:54:26 +02:00
Makefile perf/x86: Move branch classifier 2022-08-27 00:05:44 +02:00
msr.c perf/x86/msr: Switch to new Intel CPU model defines 2024-04-29 10:31:04 +02:00
perf_event_flags.h perf/x86/intel: Support branch counters logging 2023-10-27 15:05:11 +02:00
perf_event.h perf/x86/intel: Support Perfmon MSRs aliasing 2024-07-04 16:00:40 +02:00
probe.c perf/x86/rapl: Add msr mask support 2021-02-10 14:44:54 +01:00
probe.h perf/x86/rapl: Add msr mask support 2021-02-10 14:44:54 +01:00
rapl.c - Flip the logic to add feature names to /proc/cpuinfo to having to 2024-07-15 20:25:16 -07:00
utils.c perf/x86/lbr: Filter vsyscall addresses 2023-10-08 12:25:18 +02:00