.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===============================================
``amd-pstate`` CPU Performance Scaling Driver
===============================================

:Copyright: |copy| 2021 Advanced Micro Devices, Inc.

:Author: Huang Rui <ray.huang@amd.com>


Introduction
===================

``amd-pstate`` is the AMD CPU performance scaling driver. It introduces a
new CPU frequency control mechanism for modern AMD APU and CPU series in
the Linux kernel. The new mechanism is based on Collaborative Processor
Performance Control (CPPC), which provides finer-grained frequency
management than legacy ACPI hardware P-States. Current AMD CPU/APU
platforms use the ACPI P-states driver to manage CPU frequency and clocks,
switching among only 3 P-states. CPPC replaces the ACPI P-states controls
and provides a flexible, low-latency interface through which the Linux
kernel can communicate performance hints directly to the hardware.

``amd-pstate`` leverages the Linux kernel governors such as ``schedutil``,
``ondemand``, etc. to manage the performance hints provided by the CPPC
hardware functionality, which internally follows the hardware
specification (for details refer to AMD64 Architecture Programmer's Manual
Volume 2: System Programming [1]_). Currently, ``amd-pstate`` supports
basic frequency control according to the kernel governors on some of the
Zen2 and Zen3 processors. More AMD-specific functions will be implemented
in the future, after they have been verified on the hardware and SBIOS.


AMD CPPC Overview
=======================

The Collaborative Processor Performance Control (CPPC) interface enumerates
a continuous, abstract, and unit-less performance value on a scale that is
not tied to a specific performance state / frequency. This is an ACPI
standard [2]_ through which software can specify application performance
goals and hints as a target relative to the infrastructure limits. AMD
processors provide a low-latency register model (MSR) instead of an AML
code interpreter for performance adjustments. ``amd-pstate`` initializes a
``struct cpufreq_driver`` instance, ``amd_pstate_driver``, with the
callbacks to manage each performance update behavior. ::

 Highest Perf ------>+-----------------------+                         +-----------------------+
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |          Max Perf  ---->|                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
 Nominal Perf ------>+-----------------------+                         +-----------------------+
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |      Desired Perf  ---->|                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
  Lowest non-        |                       |                         |                       |
  linear perf ------>+-----------------------+                         +-----------------------+
                     |                       |                         |                       |
                     |                       |       Lowest perf  ---->|                       |
                     |                       |                         |                       |
  Lowest perf ------>+-----------------------+                         +-----------------------+
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
          0   ------>+-----------------------+                         +-----------------------+

                                     AMD P-States Performance Scale


.. _perf_cap:

AMD CPPC Performance Capability
--------------------------------

Highest Performance (RO)
.........................

It is the absolute maximum performance an individual processor may reach,
assuming ideal conditions. This performance level may not be sustainable
for long durations and may only be achievable if other platform components
are in a specific state; for example, it may require other processors to be
in an idle state. This would be equivalent to the highest frequencies
supported by the processor.

Nominal (Guaranteed) Performance (RO)
......................................

It is the maximum sustained performance level of the processor, assuming
ideal operating conditions. In the absence of an external constraint
(power, thermal, etc.) this is the performance level the processor is
expected to be able to maintain continuously. All cores/processors are
expected to be able to sustain their nominal performance state
simultaneously.

Lowest non-linear Performance (RO)
...................................

It is the lowest performance level at which nonlinear power savings are
achieved, for example, due to the combined effects of voltage and frequency
scaling. Above this threshold, lower performance levels should generally be
more energy efficient than higher performance levels. This register
effectively conveys the most efficient performance level to ``amd-pstate``.

Lowest Performance (RO)
........................

It is the absolute lowest performance level of the processor. Selecting a
performance level lower than the lowest nonlinear performance level may
cause an efficiency penalty but should reduce the instantaneous power
consumption of the processor.

AMD CPPC Performance Control
------------------------------

``amd-pstate`` passes performance goals through these registers. The
registers drive the behavior of the desired performance target.

Minimum requested performance (RW)
...................................

``amd-pstate`` specifies the minimum allowed performance level.

Maximum requested performance (RW)
...................................

``amd-pstate`` specifies a limit on the maximum performance that is
expected to be supplied by the hardware.

Desired performance target (RW)
...................................

``amd-pstate`` specifies a desired target in the CPPC performance scale as
a relative number. This can be expressed as a percentage of nominal
performance (infrastructure max). Below the nominal sustained performance
level, desired performance expresses the average performance level of the
processor subject to hardware. Above the nominal performance level, the
processor must provide at least the nominal performance requested and go
higher if the current operating conditions allow.

Energy Performance Preference (EPP) (RW)
.........................................

Provides a hint to the hardware on whether software wants to bias toward
performance (0x0) or energy efficiency (0xff).

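As a rough illustration of how the three control values relate, the sketch
below clamps a desired performance value into the window given by the
minimum and maximum requested performance, and expresses the result as a
percentage of nominal performance. The helper names are hypothetical and
this is not the driver's code; the numbers are borrowed from the sample
``cpupower`` and trace outputs elsewhere in this document.

```python
def clamp_desired_perf(desired: int, min_perf: int, max_perf: int) -> int:
    """Clamp a desired CPPC value into [min_perf, max_perf] (hypothetical helper)."""
    if min_perf > max_perf:
        raise ValueError("min_perf must not exceed max_perf")
    return max(min_perf, min(desired, max_perf))


def percent_of_nominal(perf: int, nominal_perf: int) -> float:
    """Express an abstract CPPC value as a percentage of nominal performance."""
    return 100.0 * perf / nominal_perf


# Example values: min=85 and max=166 as in the sample trace, nominal=117 as
# in the sample cpupower output. A request of 200 exceeds the maximum.
desired = clamp_desired_perf(200, min_perf=85, max_perf=166)
print(desired, round(percent_of_nominal(desired, 117), 1))
```

A value above 100 percent here corresponds to a request above the nominal
level, which the hardware may or may not be able to honor depending on the
operating conditions described above.
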
Key Governors Support
=======================

``amd-pstate`` can be used with all the (generic) scaling governors listed
by the ``scaling_available_governors`` policy attribute in ``sysfs``. It is
responsible for configuring the policy objects corresponding to CPUs and
provides the ``CPUFreq`` core (and the scaling governors attached to the
policy objects) with accurate information on the maximum and minimum
operating frequencies supported by the hardware. Users can check the
current frequency through the ``scaling_cur_freq`` attribute, whose value
comes from the ``CPUFreq`` core.

``amd-pstate`` mainly supports ``schedutil`` and ``ondemand`` for dynamic
frequency control. With ``schedutil``, the processor configuration is
fine-tuned in cooperation with the CPU CFS scheduler. ``amd-pstate``
registers the ``adjust_perf`` callback to implement performance update
behavior similar to CPPC. The callback path is set up by ``sugov_start``,
which populates the CPU's ``update_util_data`` pointer and assigns
``sugov_update_single_perf`` as the utilization update callback function in
the CPU scheduler. The CPU scheduler then calls ``cpufreq_update_util`` and
assigns the target performance according to the ``struct sugov_cpu`` that
the utilization update belongs to. Finally, ``amd-pstate`` updates the
desired performance to the value assigned by the CPU scheduler.


Processor Support
=======================

``amd-pstate`` initialization fails if the ``_CPC`` object does not exist
in the ACPI SBIOS for the detected processor; the driver uses
``acpi_cpc_valid`` to check for its existence. All Zen-based processors
support the legacy ACPI hardware P-States function, so when ``amd-pstate``
fails to initialize, the kernel falls back to initializing the
``acpi-cpufreq`` driver.

There are two types of hardware implementations for ``amd-pstate``: one is
`Full MSR Support <perf_cap_>`_ and the other is `Shared Memory Support
<perf_cap_>`_. The :c:macro:`X86_FEATURE_CPPC` feature flag (for details
refer to Processor Programming Reference (PPR) for AMD Family 19h Model
51h, Revision A1 Processors [3]_) indicates which type is present.
``amd-pstate`` registers different ``static_call`` instances for the
different hardware implementations.

Currently, some of the Zen2 and Zen3 processors support ``amd-pstate``. In
the future, it will be supported on more AMD processors.

Full MSR Support
-----------------

Some new Zen3 processors such as Cezanne provide the MSR registers directly
when the :c:macro:`X86_FEATURE_CPPC` CPU feature flag is set.
``amd-pstate`` can use the MSR registers to implement the fast switch
function in ``CPUFreq``, which reduces the latency of frequency control in
interrupt context. The functions with the ``pstate_xxx`` prefix represent
the operations on MSR registers.

Shared Memory Support
----------------------

If the :c:macro:`X86_FEATURE_CPPC` CPU feature flag is not set, the
processor supports the shared memory solution. In this case, ``amd-pstate``
uses the ``cppc_acpi`` helper methods to implement the callback functions
that are defined through ``static_call``. The functions with the
``cppc_xxx`` prefix represent the operations of the ACPI CPPC helpers for
the shared memory solution.


AMD P-States and ACPI hardware P-States can always both be supported in one
processor. However, AMD P-States has the higher priority: if it is enabled
with :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, the processor
will respond to the requests from AMD P-States.


User Space Interface in ``sysfs``
==================================

``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to
control its functionality at the system level. They are located in the
``/sys/devices/system/cpu/cpufreq/policyX/`` directory and affect all CPUs. ::

 root@hr-test1:/home/ray# ls /sys/devices/system/cpu/cpufreq/policy0/*amd*
 /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_highest_perf
 /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_nonlinear_freq
 /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_max_freq


``amd_pstate_highest_perf / amd_pstate_max_freq``

Maximum CPPC performance and CPU frequency that the driver is allowed to
set, in percent of the maximum supported CPPC performance level (the highest
performance supported in `AMD CPPC Performance Capability <perf_cap_>`_).
On some ASICs, the highest CPPC performance is not the one in the ``_CPC``
table, so it needs to be exposed in ``sysfs``. If boost is supported but
not active, this maximum frequency will be larger than the one in
``cpuinfo``.
This attribute is read-only.

``amd_pstate_lowest_nonlinear_freq``

The lowest non-linear CPPC CPU frequency that the driver is allowed to set,
in percent of the maximum supported CPPC performance level (see the lowest
non-linear performance in `AMD CPPC Performance Capability
<perf_cap_>`_).
This attribute is read-only.

For other performance and frequency values, we can read them back from
``/sys/devices/system/cpu/cpuX/acpi_cppc/``, see :ref:`cppc_sysfs`.

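The attributes above can also be collected programmatically. The following
is a minimal sketch that reads every ``amd_pstate_*`` file from one policy
directory; the function name is hypothetical, and it assumes each file
holds a single integer, which this document does not state explicitly.

```python
from pathlib import Path


def read_amd_pstate_attrs(policy_dir: str) -> dict:
    """Return {attribute name: integer value} for the amd_pstate_* files in policy_dir."""
    attrs = {}
    for path in sorted(Path(policy_dir).glob("amd_pstate_*")):
        try:
            attrs[path.name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # Skip files that are unreadable or do not hold a plain integer.
            continue
    return attrs


if __name__ == "__main__":
    # policy0 is only an example; other policies live in sibling policyX directories.
    print(read_amd_pstate_attrs("/sys/devices/system/cpu/cpufreq/policy0"))
```

On a system where the driver is not loaded, the directory contains no
matching files and the function simply returns an empty dictionary.
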
``amd-pstate`` vs ``acpi-cpufreq``
======================================

On the majority of AMD platforms supported by ``acpi-cpufreq``, the ACPI
tables provided by the platform firmware are used for CPU performance
scaling, but they provide only 3 P-states on AMD processors.
However, modern AMD APU and CPU series provide Collaborative Processor
Performance Control according to the ACPI protocol, customized for AMD
platforms. It offers a fine-grained, continuous frequency range instead of
the legacy hardware P-states. ``amd-pstate`` is the kernel module which
supports the new AMD P-States mechanism on most future AMD platforms. The
AMD P-States mechanism is intended to be the more performant and
energy-efficient frequency management method on AMD processors.

Kernel Module Options for ``amd-pstate``
=========================================

``shared_mem``
  Use the ``shared_mem`` module parameter to enable the driver manually on
  processors with `Shared Memory Support <perf_cap_>`_, by passing
  **amd_pstate.shared_mem=1**.
  Because of a performance issue on these processors, the driver is disabled
  on them for the moment. It will be enabled by default once the performance
  issue with this solution is addressed.

To check whether the current processor has `Full MSR Support <perf_cap_>`_
or `Shared Memory Support <perf_cap_>`_: ::

 ray@hr-test1:~$ lscpu | grep cppc
 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm

If the CPU flags include ``cppc``, the processor has `Full MSR Support
<perf_cap_>`_. Otherwise it has `Shared Memory Support <perf_cap_>`_.


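The same check can be scripted. The sketch below is an illustration only,
with a hypothetical function name: it looks for ``cppc`` as a whole word in
the flags line of ``/proc/cpuinfo``-style or ``lscpu``-style text. Matching
whole words avoids a false positive from any other flag that merely
contains the substring ``cppc``.

```python
def has_cppc_flag(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in cpuinfo/lscpu-style text lists 'cppc'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Accept both "flags : ..." (/proc/cpuinfo) and "Flags: ..." (lscpu).
        if sep and key.strip().lower() == "flags" and "cppc" in value.split():
            return True
    return False
```

For example, ``has_cppc_flag(open("/proc/cpuinfo").read())`` returning
``True`` would indicate Full MSR Support according to the rule above, on a
processor that ``amd-pstate`` supports at all.
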
``cpupower`` tool support for ``amd-pstate``
===============================================

The ``cpupower`` tool supports ``amd-pstate`` and can be used to dump the
frequency information. Support for more operations on the new
``amd-pstate`` module with this tool is in progress. ::

 root@hr-test1:/home/ray# cpupower frequency-info
 analyzing CPU 0:
   driver: amd-pstate
   CPUs which run at the same hardware frequency: 0
   CPUs which need to have their frequency coordinated by software: 0
   maximum transition latency: 131 us
   hardware limits: 400 MHz - 4.68 GHz
   available cpufreq governors: ondemand conservative powersave userspace performance schedutil
   current policy: frequency should be within 400 MHz and 4.68 GHz.
                   The governor "schedutil" may decide which speed to use
                   within this range.
   current CPU frequency: Unable to call hardware
   current CPU frequency: 4.02 GHz (asserted by call to kernel)
   boost state support:
     Supported: yes
     Active: yes
     AMD PSTATE Highest Performance: 166. Maximum Frequency: 4.68 GHz.
     AMD PSTATE Nominal Performance: 117. Nominal Frequency: 3.30 GHz.
     AMD PSTATE Lowest Non-linear Performance: 39. Lowest Non-linear Frequency: 1.10 GHz.
     AMD PSTATE Lowest Performance: 15. Lowest Frequency: 400 MHz.


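The performance and frequency pairs in this sample output are close to
proportional. As a worked check, the sketch below converts a CPPC
performance value into megahertz by scaling linearly from the nominal pair.
The linear model and the function name are assumptions made here for
illustration; they are not a statement about how the driver derives its
frequencies.

```python
def perf_to_mhz(perf: int, nominal_perf: int, nominal_mhz: int) -> float:
    """Scale an abstract CPPC value linearly from the nominal (perf, MHz) pair."""
    return perf * nominal_mhz / nominal_perf


# Values copied from the sample output above: nominal 117 <-> 3.30 GHz.
for label, perf in (("highest", 166), ("lowest non-linear", 39), ("lowest", 15)):
    print(f"{label}: {perf_to_mhz(perf, 117, 3300):.0f} MHz")
```

With these inputs the highest and lowest non-linear values come out at
about 4682 MHz and 1100 MHz, in line with the 4.68 GHz and 1.10 GHz
reported by the tool. The lowest value comes out near 423 MHz rather than
the reported 400 MHz, so the linear model is only an approximation at the
bottom of the scale.
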
Diagnostics and Tuning
=======================

Trace Events
--------------

There are two static trace events that can be used for ``amd-pstate``
diagnostics. One of them is the ``cpu_frequency`` trace event generally used
by ``CPUFreq``, and the other one is the ``amd_pstate_perf`` trace event
specific to ``amd-pstate``. The following sequence of shell commands can
be used to enable them and see their output (if the kernel is generally
configured to support event tracing). ::

 root@hr-test1:/home/ray# cd /sys/kernel/tracing/
 root@hr-test1:/sys/kernel/tracing# echo 1 > events/amd_cpu/enable
 root@hr-test1:/sys/kernel/tracing# cat trace
 # tracer: nop
 #
 # entries-in-buffer/entries-written: 47827/42233061   #P:2
 #
 #                                _-----=> irqs-off
 #                               / _----=> need-resched
 #                              | / _---=> hardirq/softirq
 #                              || / _--=> preempt-depth
 #                              ||| /     delay
 #           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
 #              | |         |   ||||      |         |
          <idle>-0       [015] dN...  4995.979886: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=15 changed=false fast_switch=true
          <idle>-0       [007] d.h..  4995.979893: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true
             cat-2161    [000] d....  4995.980841: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=0 changed=false fast_switch=true
            sshd-2125    [004] d.s..  4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=4 changed=false fast_switch=true
          <idle>-0       [007] d.s..  4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true
          <idle>-0       [003] d.s..  4995.980971: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=3 changed=false fast_switch=true
          <idle>-0       [011] d.s..  4995.980996: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=11 changed=false fast_switch=true

The ``cpu_frequency`` trace event will be triggered either by the
``schedutil`` scaling governor (for the policies it is attached to), or by
the ``CPUFreq`` core (for the policies with other scaling governors).


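For offline analysis, the ``amd_pstate_perf`` lines can be turned into
structured records. The parser below is a minimal sketch with a
hypothetical function name. It assumes the ``key=value`` layout shown in
the sample output above and does not attempt to handle other trace events.

```python
import re

# Matches key=value pairs such as "amd_min_perf=85" or "changed=false".
_FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_amd_pstate_perf(line: str) -> dict:
    """Parse the key=value fields of one amd_pstate_perf trace line.

    Returns an empty dict for lines that are not amd_pstate_perf events.
    """
    marker = "amd_pstate_perf:"
    if marker not in line:
        return {}
    fields = {}
    for key, value in _FIELD_RE.findall(line.split(marker, 1)[1]):
        if value in ("true", "false"):
            fields[key] = value == "true"
        elif value.isdigit():
            fields[key] = int(value)
        else:
            fields[key] = value
    return fields


# One line copied from the sample trace output above.
sample = ("<idle>-0 [015] dN... 4995.979886: amd_pstate_perf: amd_min_perf=85 "
          "amd_des_perf=85 amd_max_perf=166 cpu_id=15 changed=false fast_switch=true")
print(parse_amd_pstate_perf(sample))
```

Comment lines from the trace header (those starting with ``#``) contain no
event marker and are returned as empty dictionaries, so they can be
filtered out by the caller.
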
Reference
===========

.. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming,
       https://www.amd.com/system/files/TechDocs/24593.pdf

.. [2] Advanced Configuration and Power Interface Specification,
       https://uefi.org/sites/default/files/resources/ACPI_Spec_6_4_Jan22.pdf

.. [3] Processor Programming Reference (PPR) for AMD Family 19h Model 51h, Revision A1 Processors,
       https://www.amd.com/system/files/TechDocs/56569-A1-PUB.zip