In the parse_perf_domain function, if the call to
of_parse_phandle_with_args returns an error, then the reference to the
CPU device node that was acquired at the start of the function would not
be properly decremented.
Address this by declaring the variable with the __free(device_node)
cleanup attribute.
Signed-off-by: Miquel Sabaté Solà <mikisabate@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20240917134246.584026-1-mikisabate@gmail.com
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The current LATENCY_MULTIPLIER which has been around for nearly 20 years
causes rate_limit_us to be always in ms range.
On M1 mac mini I get 50 and 56us transition latency, but due to the 1000
multiplier we end up setting rate_limit_us to 50 and 56ms, which gets
capped into 2ms and was 10ms before e13aa799c2 ("cpufreq: Change
default transition delay to 2ms")
On Intel I5 system transition latency is 20us but due to the multiplier
we end up with 20ms that again is capped to 2ms.
Given how good modern hardware and how modern workloads require systems
to be more responsive to cater for sudden changes in workload (tasks
sleeping/wakeup/migrating, uclamp causing a sudden boost or cap) and
that 2ms is quarter of the time of 120Hz refresh rate system, drop the
old logic in favour of providing 50% headroom.
rate_limit_us = 1.5 * latency.
I considered not adding any headroom which could mean that we can end up
with infinite back-to-back requests.
I also considered providing a constant headroom (e.g: 100us) assuming
that any h/w or f/w dealing with the request shouldn't require a large
headroom when transition_latency is actually high.
But for both cases I wasn't sure if h/w or f/w can end up being
overwhelmed dealing with the freq requests in a potentially busy system.
So I opted for providing 50% breathing room.
This is expected to impact schedutil only as the other user,
dbs_governor, takes the max(2*tick, transition_delay_us) and the former
was at least 2ms on 1ms TICK, which is equivalent to the max_delay_us
before applying this patch. For systems with TICK of 4ms, this value
would have almost always ended up with 8ms sampling rate.
For systems that report 0 transition latency, we still default to
returning 1ms as transition delay.
This helps in eliminating a source of latency for applying requests as
mentioned in [1]. For example if we have a 1ms tick, most systems will
miss sending an update at tick when updating the util_avg for a task/CPU
(rate_limit_us will be 2ms for most systems).
Link: https://lore.kernel.org/lkml/20240724212255.mfr2ybiv2j2uqek7@airbuntu/ # [1]
Link: https://lore.kernel.org/lkml/20240205022500.2232124-1-qyousef@layalina.io/
Signed-off-by: Qais Yousef <qyousef@layalina.io>
Link: https://patch.msgid.link/20240728192659.58115-1-qyousef@layalina.io
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpufreq core doesn't check the return type of the exit() callback
and there is not much the core can do on failures at that point. Just
drop the returned value and make it return void.
Signed-off-by: Lizhe <sensor1010@163.com>
[ Viresh: Reworked the patches to fix all missing changes together. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> # Mediatek
Acked-by: Sudeep Holla <sudeep.holla@arm.com> # scpi, scmi, vexpress
Acked-by: Mario Limonciello <mario.limonciello@amd.com> # amd
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> # bmips
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com> # omap
Since this function is supposed to return boost_enabled which is anyway
a bool type make sure that it's return value is also marked as bool.
This helps maintain better consistency in data types being used.
Signed-off-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/20240627060117.1809477-1-d-gole@ti.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Provide to the scheduler a feedback about the temporary max available
capacity. Unlike arch_update_thermal_pressure(), this doesn't need to be
filtered as the pressure will happen for dozens of ms or more.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20240326091616.3696851-2-vincent.guittot@linaro.org
These are changes that for some reason ended up not making it into the
first four branches but that should still make it into 6.9:
- A rework of the omap clock support that touches both drivers and
device tree files
- The reset controller branch changes that had a dependency on late
bugfixes. Merging them here avoids a backmerge of 6.8-rc5 into the
drivers branch
- The RISC-V/starfive, RISC-V/microchip and ARM/Broadcom devicetree
changes that got delayed and needed some extra time in linux-next
for wider testing.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmX5vYcACgkQYKtH/8kJ
UiemkhAAu2lYNpttx+qVlEzQvPKyID5Y+E0cVRmM5e79/fOumNomSzFwtKztCbz2
PV1CHwmDYANKsI8tl91PAe8PzD+9Er+8xa6YYVSMG5bLC2aGdF4k5hzMnRmfhlDe
uRT/9iNH0w+S1p44+wXI9Y++uZhxJtCqa6kytxybl6YrG2/l3Wm0PVcMAD/MWT1l
OULRg5gv3+7qHLKE0ffd0J7I7zCvKA5cEqnieGSO8+k1jsOE3BvgLttfPUuUsi3x
8yWAJ2cEv293Cao8x8rw39TYIHQOznLMNzK/GCIemL4k9TafbGbuVPUGQZ6oX1SQ
+/biiUV8CMLzanw2Ds7piQ/4J8EoJjh7jCf9pETORlHLaCMQaYUk4I2KnBWmjxuO
QBy6Py68EkyT1zv7YFkpdxeABkwkrObMmVsjfyltd2lCF6oC+xbIw5IOVPgnUiTc
WANL3y+hS5zv+ABmpkRhDPe9KrcoO95sJgGaoMPatwD1/2JkdV7EkvbXWdnipb1w
REYk4xuRlJcAgyjc5nrQXR8FuPX63c08NFkOw+AInFV8ipyH+8nkesb0w54aegsR
Tihhl0WUxk/e9FLFVlPiYRNdyqOb2HKteRwRxsA1LqqcWdpYjplBrkZhHb3+ESnP
lQaQ7AtZRoIjwsImYen3M2W1cFS214BAqoonLLYSd0ponCB05Ng=
=IzoE
-----END PGP SIGNATURE-----
Merge tag 'soc-late-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull more ARM SoC updates from Arnd Bergmann:
"These are changes that for some reason ended up not making it into the
first four branches but that should still make it into 6.9:
- A rework of the omap clock support that touches both drivers and
device tree files
- The reset controller branch changes that had a dependency on late
bugfixes. Merging them here avoids a backmerge of 6.8-rc5 into the
drivers branch
- The RISC-V/starfive, RISC-V/microchip and ARM/Broadcom devicetree
changes that got delayed and needed some extra time in linux-next
for wider testing"
* tag 'soc-late-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (31 commits)
soc: fsl: dpio: fix kcalloc() argument order
bus: ts-nbus: Improve error reporting
bus: ts-nbus: Convert to atomic pwm API
riscv: dts: starfive: jh7110: Add camera subsystem nodes
ARM: bcm: stop selecing CONFIG_TICK_ONESHOT
ARM: dts: omap3: Update clksel clocks to use reg instead of ti,bit-shift
ARM: dts: am3: Update clksel clocks to use reg instead of ti,bit-shift
clk: ti: Improve clksel clock bit parsing for reg property
clk: ti: Handle possible address in the node name
dt-bindings: pwm: opencores: Add compatible for StarFive JH8100
dt-bindings: riscv: cpus: reg matches hart ID
reset: Instantiate reset GPIO controller for shared reset-gpios
reset: gpio: Add GPIO-based reset controller
cpufreq: do not open-code of_phandle_args_equal()
of: Add of_phandle_args_equal() helper
reset: simple: add support for Sophgo SG2042
dt-bindings: reset: sophgo: support SG2042
riscv: dts: microchip: add specific compatible for mpfs pdma
riscv: dts: microchip: add missing CAN bus clocks
ARM: brcmstb: Add debug UART entry for 74165
...
- Fix couple of warnings related to W=1 builds. (Viresh Kumar).
- Move Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h (Viresh Kumar).
- Extend dev_pm_opp_data with turbo support (Sibi Sankar).
- dt-bindings: drop maxItems from inner items (David Heidelberg).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmXulEoACgkQ0rkcPK6B
EhzN4g//Z0HsbM3jB7f43RosiTB7uS6M6Xp4ODTQLwZQvf9Jq9pl2cp8tgczIKFx
1FacJsjq/Fjs9wZeASIs75Rz//w7ngJqJ4sM+uYBHzN1B+V5KQK2RvlHSIpEdu63
puktEQHGAtnCpY+SdjnrI972Qic/cl5qP/ewLeq+WBVcKHNdSbfV1n1rVG9+6ylC
lrnJGJbt6h7yIqL24hwv4rUJDvvdusFGXtAHBOlrRvM8fJPY9OUiA9UME2e6mNlJ
PfJWgyWjnJ1+RAloUJ8YH/96hpl/A7fTegD7BuK7RhMp63b8Qf8oAgPDoI2DFx+0
VD/yDsKv1q+wcB5iPbSdap9zBuvvBJxRUZJ0JKLhXX4SiCtjhfFEkHbdfThal2qW
/XnUoIqkkQQ6MZIA/bKhJzMxsgyhnaQaTSBMScnAXXbVB5VwO9d8GEtknYPrJQxu
5LKXYWmbFkJD5B1pMdVeij3b9irB1KknS7gULphrxYaJ+pxIGBQ2FCORm9n8G5LC
aQ6TWaKz+s//wpd7nYHq1UyFhLC7FY2Q1BR7hflq4h5r9ZyeKbpouOYSJU1hwOri
AlmX/tLS6te9y+mFbmqXdMH/TDZ53IYoMhTDFR0N+zcuKCegoGAOJIb8mJMpH62y
/nL/IB4sXaTLrbPk5JpXnAtjK7wfCtCsyIYwFsdvZy2Xp0bKMCw=
=SvDq
-----END PGP SIGNATURE-----
Merge tag 'opp-updates-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into pm
Merge OPP (operating performance points) updates for 6.9 from Viresh
Kumar:
"- Fix couple of warnings related to W=1 builds. (Viresh Kumar).
- Move Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h (Viresh Kumar).
- Extend dev_pm_opp_data with turbo support (Sibi Sankar).
- dt-bindings: drop maxItems from inner items (David Heidelberg)."
* tag 'opp-updates-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
dt-bindings: opp: drop maxItems from inner items
OPP: debugfs: Fix warning around icc_get_name()
OPP: debugfs: Fix warning with W=1 builds
cpufreq: Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h
OPP: Extend dev_pm_opp_data with turbo support
Move the declaration of functions defined in the OPP core to pm_opp.h.
These were added to cpufreq.h as it was the only user of the APIs, but
that was a mistake perhaps. Fix it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Resolving a frequency to an efficient one should not transgress
policy->max (which can be set for thermal reason) and policy->min.
Currently, there is possibility where scaling_cur_freq can exceed
scaling_max_freq when scaling_max_freq is an inefficient frequency.
Add a check to ensure that resolving a frequency will respect
policy->min/max.
Cc: All applicable <stable@vger.kernel.org>
Fixes: 1f39fa0dcc ("cpufreq: Introducing CPUFREQ_RELATION_E")
Signed-off-by: Shivnandan Kumar <quic_kshivnan@quicinc.com>
[ rjw: Whitespace adjustment, changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
A minimum sampling rate value of 10ms was introduced in:
commit cef9615a85 ("[CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ case")
The use of this value was removed in:
commit ed4676e254 ("cpufreq: Replace "max_transition_latency" with "dynamic_switching"")
Remove:
- a comment referencing this value
- an unused macro associated to this value
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use newly added of_phandle_args_equal() helper to compare two
of_phandle_args.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240129115216.96479-3-krzysztof.kozlowski@linaro.org
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Platform firmware sends notify 0x85 to inform the OS that the highest
performance of a CPU has changed.
This will be used by the AMD P-state driver to update the ranking of
preferred cores and set the priority of cores accordingly.
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Meng Li <li.meng@amd.com>
Link: https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#processor-device-notification-values
[ rjw: New subject, changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpuinfo.max_freq can change at runtime because of boost as an example. This
implies that the value could be different from the frequency that has been
used to compute the capacity of a CPU.
The new arch_scale_freq_ref() returns a fixed and coherent frequency
that can be used to compute the capacity for a given frequency.
[ Also fix a arch_set_freq_scale() newline style wart in <linux/cpufreq.h>. ]
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20231211104855.558096-3-vincent.guittot@linaro.org
The Energy Aware Scheduler (EAS) relies on the schedutil governor.
When moving to/from the schedutil governor, sched domains must be
rebuilt to allow re-evaluating the enablement conditions of EAS.
This is done through sched_cpufreq_governor_change().
Having a cpufreq governor assumes a cpufreq driver is running.
Inserting/removing a cpufreq driver should trigger a re-evaluation
of EAS enablement conditions, avoiding to see EAS enabled when
removing a running cpufreq driver.
Rebuild the sched domains in schedutil's sugov_init()/sugov_exit(),
allowing to check EAS's enablement condition whenever schedutil
governor is initialized/exited from.
Move relevant code up in schedutil.c to avoid a split and conditional
function declaration.
Rename sched_cpufreq_governor_change() to sugov_eas_rebuild_sd().
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The boost control currently applies to the whole system. However, users
may prefer to boost a subset of cores in order to provide prioritized
performance to workloads running on the boosted cores.
Enable per-policy boost by adding a 'boost' sysfs interface under each
policy path. This can be found at:
/sys/devices/system/cpu/cpufreq/policy<*>/boost
Same to the global boost switch, writing 1/0 to the per-policy 'boost'
enables/disables boost on a cpufreq policy respectively.
The user view of global and per-policy boost controls should be:
1. Enabling global boost initially enables boost on all policies, and
per-policy boost can then be enabled or disabled individually, given that
the platform does support so.
2. Disabling global boost makes the per-policy boost interface illegal.
Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Wei Xu <xuwei5@hisilicon.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Migrate various platforms to use remove callback returning void
(Yangtao Li).
- Add online/offline/exit hooks for Tegra driver (Sumit Gupta).
- Explicitly include correct DT includes (Rob Herring).
- Frequency domain updates for qcom-hw driver (Neil Armstrong).
- Modify AMD pstate driver return the highest_perf value (Meng Li).
- Generic cleanups for cppc, mediatek and powernow driver (Liao Chang
and Konrad Dybcio).
- Add more platforms to cpufreq-arm driver's blocklist (AngeloGioacchino
Del Regno and Konrad Dybcio).
- brcmstb-avs-cpufreq: Fix -Warray-bounds bug (Gustavo A. R. Silva).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmTsdAMACgkQ0rkcPK6B
Ehw97RAAveiaWFM9cxEb8rvXKCWkrnBf6ijI7fG+3DJalUiJ0dCDJfxygFSnhypb
cU50NEhUj57OIzGVQZHVJI0usaQwfhohEsJ+KCdj6CcjsKKrbLY7mgTu/NMfZb/+
5bHF4o2WB8zp0PezpRKaRSb3/gRoshMSaE91uceHRYdIhBScSqcOq4CjxFQv+Nxv
yv9e6lKwLZsfU42k1Ja+ZGjyWLaFA6ZwBEp6NulcIF3AK/ki1dpthGtwkd8XE//S
yxCBPRgbCcW+0X/sjJ5CfMs1zw+3Syh35nXDfh3C4aykPT9nxEnsC+tx3Dgne7E/
c849zgOMNs3NNhOQzR8xb+rxSNpcV9uXrVbgLgQkc2EIzng4ha8gm6Nq8V8f792f
qldMgmX7YIz+5PuH1ClCRmRvnCbfA0gp7SO/gTCXE3I3IBKl8MUsq2khCwayy4wM
0JkqVE3DUF729JO5eF1cZh/jRrb+cvkW2IlWUYo6YBHPdcru3X8ceRGQCUoDU208
02OrKI/1nwk9ZrC7mQXOsg+/gZIEZszzV4iU+6Kc9MQT8h1ZyBGp6MA89LdzqH2e
FJaPtbk1lEsPUaT9i6Tk+uUZO3D7n8hj83J+8PV9wptqqxe+7ZL5Z3gxw03L1sJE
SvajjgUBH13twe0cFt9Ho6XFB9rJ2Bg+jYPOrlkeHwpHtyTeDpY=
=o0wB
-----END PGP SIGNATURE-----
Merge tag 'cpufreq-arm-updates-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull ARM cpufreq updates for 6.6 from Viresh Kumar:
"- Migrate various platforms to use remove callback returning void
(Yangtao Li).
- Add online/offline/exit hooks for Tegra driver (Sumit Gupta).
- Explicitly include correct DT includes (Rob Herring).
- Frequency domain updates for qcom-hw driver (Neil Armstrong).
- Modify AMD pstate driver return the highest_perf value (Meng Li).
- Generic cleanups for cppc, mediatek and powernow driver (Liao Chang
and Konrad Dybcio).
- Add more platforms to cpufreq-arm driver's blocklist (AngeloGioacchino
Del Regno and Konrad Dybcio).
- brcmstb-avs-cpufreq: Fix -Warray-bounds bug (Gustavo A. R. Silva)."
* tag 'cpufreq-arm-updates-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (33 commits)
cpufreq: tegra194: remove opp table in exit hook
cpufreq: powernow-k8: Use related_cpus instead of cpus in driver.exit()
cpufreq: tegra194: add online/offline hooks
cpufreq: qcom-cpufreq-hw: add support for 4 freq domains
dt-bindings: cpufreq: qcom-hw: add a 4th frequency domain
cpufreq: cppc: Set fie_disabled to FIE_DISABLED if fails to create kworker_fie
cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases.
cpufreq: Prefer to print cpuid in MIN/MAX QoS register error message
cpufreq: amd-pstate-ut: Modify the function to get the highest_perf value
cpufreq: mediatek-hw: Remove unused define
cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
cpufreq: brcmstb-avs-cpufreq: Fix -Warray-bounds bug
cpufreq: blocklist MSM8998 in cpufreq-dt-platdev
cpufreq: omap: Convert to platform remove callback returning void
cpufreq: qoriq: Convert to platform remove callback returning void
cpufreq: acpi: Convert to platform remove callback returning void
cpufreq: tegra186: Convert to platform remove callback returning void
cpufreq: qcom-nvmem: Convert to platform remove callback returning void
cpufreq: kirkwood: Convert to platform remove callback returning void
cpufreq: pcc-cpufreq: Convert to platform remove callback returning void
...
The valid values of policy.{min, max} should be between 'min' and 'max',
so use clamp() helper macro to makes cpufreq_verify_within_limits() easier
to follow.
Signed-off-by: Liao Chang <liaochang1@huawei.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpufreq framework used to use the zero of return value to reflect
the cppc_cpufreq_get_rate() had failed to get current frequecy and treat
all positive integer to be succeed. Since cppc_get_perf_ctrs() returns a
negative integer in error case, so it is better to convert the value to
zero as the return value of cppc_cpufreq_get_rate().
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
If fast_switch_possible flag is set by the scaling driver, the governor
is free to select fast_switch function even if adjust_perf is set. Some
scaling drivers which use adjust_perf don't set fast_switch thinking
that the governor would never fall back to fast_switch. But the governor
can fall back to fast_switch even in runtime if frequency invariance is
disabled due to some reason. This could crash the kernel if the driver
didn't set the fast_switch function pointer.
Therefore, fail driver registration if it has adjust_perf without
fast_switch.
Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- First part of DT header detangling dropping cpu.h from of_device.h
and replacing some includes with forward declarations. A handful of
drivers needed some adjustment to their includes as a result.
- Refactor of_device.h to be used by bus drivers rather than various
device drivers. This moves non-bus related functions out of
of_device.h. The end goal is for of_platform.h and of_device.h to stop
including each other.
- Refactor open coded parsing of "ranges" in some bus drivers to use DT
address parsing functions
- Add some new address parsing functions of_property_read_reg(),
of_range_count(), and of_range_to_resource() in preparation to convert
more open coded parsing of DT addresses to use them.
- Treewide clean-ups to use of_property_read_bool() and
of_property_present() as appropriate. The ones here are the ones
that didn't get picked up elsewhere.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmRIOrkACgkQ+vtdtY28
YcN9WA//R+QrmSPExhfgio5y+aOJDWucqnAcyAusPctLcF7h7j0CdzpwaSRkdaH4
KiLjeyt6tKn8wt8w7m/+SmCsSYXPn81GH/Y5I2F40x6QMrY3cVOXUsulKQA+6ZjZ
PmW3bMcz0Dw9IhUK3R/WX96+9UdoytKg5qoTzNzPTKpvKA1yHa/ogl2FnHJS5W+8
Rxz+1oJ70VMIWGpBOc0acHuB2S0RHZ46kPKkPTBgFYEwtmJ8qobvV3r3uQapNaIP
2jnamPu0tAaQoSaJKKSulToziT+sd1sNB+9oyu/kP+t3PXzq4qwp2Gr4jzUYKs4A
ZF3DPhMR3YLLN41g/L3rtB0T/YIS287sZRuaLhCqldNpRerSDk4b0HRAksGk1XrI
HqYXjWPbRxqYiIUWkInfregSTYJfGPxeLfLKrawNO34/eEV4JrkSKy8d0AJn04EK
jTRqI3L7o23ZPxs29uH/3+KK90J3emPZkF7GWVJTEAMsM8jYZduGh7EpsttJLaz/
QnxbTBm9295ahIdCfo/OQhqjWnaNhpbTzf31pyrBZ/itXV7gQ0xjwqPwiyFwI+o/
F/r81xqdwQ3Ni8MKt2c7zLyVA95JHPe95KQ3GrDXR68aByJr4RuhKG8Y2Pj1VOb3
V+Hsu5uhwKrK7Yqe+rHDnJBO00OCO8nwbWhMy2xVxoTkSFCjDmo=
=89Zj
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull more devicetree updates from Rob Herring:
- First part of DT header detangling dropping cpu.h from of_device.h
and replacing some includes with forward declarations. A handful of
drivers needed some adjustment to their includes as a result.
- Refactor of_device.h to be used by bus drivers rather than various
device drivers. This moves non-bus related functions out of
of_device.h. The end goal is for of_platform.h and of_device.h to
stop including each other.
- Refactor open coded parsing of "ranges" in some bus drivers to use DT
address parsing functions
- Add some new address parsing functions of_property_read_reg(),
of_range_count(), and of_range_to_resource() in preparation to
convert more open coded parsing of DT addresses to use them.
- Treewide clean-ups to use of_property_read_bool() and
of_property_present() as appropriate. The ones here are the ones that
didn't get picked up elsewhere.
* tag 'devicetree-for-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (34 commits)
bus: tegra-gmi: Replace of_platform.h with explicit includes
hte: Use of_property_present() for testing DT property presence
w1: w1-gpio: Use of_property_read_bool() for boolean properties
virt: fsl: Use of_property_present() for testing DT property presence
soc: fsl: Use of_property_present() for testing DT property presence
sbus: display7seg: Use of_property_read_bool() for boolean properties
sparc: Use of_property_read_bool() for boolean properties
sparc: Use of_property_present() for testing DT property presence
bus: mvebu-mbus: Remove open coded "ranges" parsing
of/address: Add of_property_read_reg() helper
of/address: Add of_range_count() helper
of/address: Add support for 3 address cell bus
of/address: Add of_range_to_resource() helper
of: unittest: Add bus address range parsing tests
of: Drop cpu.h include from of_device.h
OPP: Adjust includes to remove of_device.h
irqchip: loongson-eiointc: Add explicit include for cpuhotplug.h
cpuidle: Adjust includes to remove of_device.h
cpufreq: sun50i: Add explicit include for cpu.h
cpufreq: Adjust includes to remove of_device.h
...
Now that of_cpu_device_node_get() is defined in of.h, of_device.h is just
implicitly including other includes, and is no longer needed. Adjust the
include files with what was implicitly included by of_device.h (cpu.h and
of.h) and drop including of_device.h.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230329-dt-cpu-header-cleanups-v1-14-581e2605fe47@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Since the cpufreq core directly uses freq_table, for cpufreq drivers
that set their target_index() callback, make it mandatory for them to
set the same.
Since this is set per policy and normally from policy->init(), do this
from cpufreq_table_validate_and_sort() which gets called right after
->init().
Reported-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
All but a few drivers ignore the return value of
cpufreq_unregister_driver(). Those few that don't only call it after
cpufreq_register_driver() succeeded, in which case the call doesn't
fail.
Make the function return no value and add a WARN_ON for the case that
the function is called in an invalid situation (i.e. without a previous
successful call to cpufreq_register_driver()).
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com> # brcmstb-avs-cpufreq.c
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
of_perf_domain_get_sharing_cpumask currently assumes a 1-argument
phandle format, and directly returns the argument. Generalize this to
return the full of_phandle_args, so it can be used by drivers which use
other phandle styles (e.g. separate nodes). This also requires changing
the CPU sharing match to compare the full args structure.
Also, make sure to of_node_put(args.np) (the original code was leaking a
reference).
Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The frequency invariance infrastructure provides the APERF/MPERF samples
already. Utilize them for the cpu frequency display in /proc/cpuinfo.
The sample is considered valid for 20ms. So for idle or isolated NOHZ full
CPUs the function returns 0, which is matching the previous behaviour.
This gets rid of the mass IPIs and a delay of 20ms for stabilizing observed
by Eric when reading /proc/cpuinfo.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20220415161206.875029458@linutronix.de
This effectively revert '4bf8e582119e ("cpufreq: Remove ready()
callback")', in order to reintroduce the ready callback.
This is needed in order to be able to leave the thermal pressure
interrupts in the Qualcomm CPUfreq driver disabled during
initialization, so that it doesn't fire while related_cpus are still 0.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[ Viresh: Added the Chinese translation as well and updated commit msg ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Merge Energy Model and power capping updates for 5.16-rc1:
- Add support for inefficient operating performance points to the
Energy Model and modify cpufreq to use them properly (Vincent
Donnefort).
- Rearrange the DTPM framework code to simplify it and make it easier
to follow (Daniel Lezcano).
- Fix power intialization in DTPM (Daniel Lezcano).
- Add CPU load consideration when estimating the instaneous power
consumption in DTPM (Daniel Lezcano).
* pm-em:
cpufreq: mediatek-hw: Fix cpufreq_table_find_index_dl() call
PM: EM: Mark inefficiencies in CPUFreq
cpufreq: Use CPUFREQ_RELATION_E in DVFS governors
cpufreq: Introducing CPUFREQ_RELATION_E
cpufreq: Add an interface to mark inefficient frequencies
cpufreq: Make policy min/max hard requirements
PM: EM: Allow skipping inefficient states
PM: EM: Extend em_perf_domain with a flag field
PM: EM: Mark inefficient states
PM: EM: Fix inefficient states detection
* powercap:
powercap/drivers/dtpm: Fix power limit initialization
powercap/drivers/dtpm: Scale the power with the load
powercap/drivers/dtpm: Use container_of instead of a private data field
powercap/drivers/dtpm: Simplify the dtpm table
powercap/drivers/dtpm: Encapsulate even more the code
Pull ARM cpufreq updates for 5.16-rc1 from Viresh Kumar:
"- Fix tegra driver to handle BPMP errors properly (Mikko Perttunen).
- Fix the parameter usage of the newly added perf-domain API (Hector
Yuan).
- Minor cleanups to cppc, vexpress and s3c244x drivers (Han Wang,
Guenter Roeck, and Arnd Bergmann)."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: Fix parameter in parse_perf_domain()
cpufreq: tegra186/tegra194: Handle errors in BPMP response
cpufreq: remove useless INIT_LIST_HEAD()
cpufreq: s3c244x: add fallthrough comments for switch
cpufreq: vexpress: Drop unused variable
Let the governors schedutil, conservative and ondemand to work, if possible
on efficient frequencies only.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This newly introduced flag can be applied by a governor to a CPUFreq
relation, when looking for a frequency within the policy table. The
resolution would then only walk through efficient frequencies.
Even with the flag set, the policy max limit will still be honoured. If no
efficient frequencies can be found within the limits of the policy, an
inefficient one would be returned.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some SoCs such as the sd855 have OPPs within the same policy whose cost is
higher than others with a higher frequency. Those OPPs are inefficients
and it might be interesting for a governor to not use them.
cpufreq_table_set_inefficient() allows the caller to identify a specified
frequency as being inefficient. Inefficient frequencies are only supported
on sorted tables.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull more ARM cpufreq changes for v5.15-rc1 from Viresh Kumar:
"This adds a new cpufreq driver for Mediatek, which had been going
through reviews since last one year."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: mediatek-hw: Add support for CPUFREQ HW
cpufreq: Add of_perf_domain_get_sharing_cpumask
dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
Add of_perf_domain_get_sharing_cpumask function to group cpu
to specific performance domain.
Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com>
[ Viresh: create separate routine parse_perf_domain() and always set the
cpumask. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This isn't used anymore, get rid of it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Many cpufreq drivers register with the energy model for each policy and
do exactly the same thing. Follow the footsteps of thermal-cooling, to
get it done from the cpufreq core itself.
Provide a new callback, which will be called, if present, by the cpufreq
core at the right moment (more on that in the code's comment). Also
provide a generic implementation that uses dev_pm_opp_of_register_em().
This also allows us to register with the EM at a later point of time,
compared to ->init(), from where the EM core can access cpufreq policy
directly using cpufreq_cpu_get() type of helpers and perform other work,
like marking few frequencies inefficient, this will be done separately.
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Commit e3c0623608 ("cpufreq: add cpufreq_driver_resolve_freq()")
introduced this callback, back in 2016, for drivers that provide the
->target() callback.
The kernel hasn't seen a single user of it in the past 5 years and
it is not likely to be used any time soon.
Remove it for now.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Now that all users of ->stop_cpu() have been migrated to using other
callbacks, drop it from the core.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Minor edits in the subject and changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This flag is set by one of the drivers but it isn't used in the code
otherwise. Remove the unused flag and update the driver.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
During cpufreq driver's registration, if the ->init() callback for all
the CPUs fail then there is not much point in keeping the driver around
as it will only account for more of unnecessary noise, for example
cpufreq core will try to suspend/resume the driver which never got
registered properly.
The removal of such a driver is avoided if the driver carries the
CPUFREQ_STICKY flag. This was added way back [1] in 2004 and perhaps no
one should ever need it now. A lot of drivers do set this flag, probably
because they just copied it from other drivers.
This was added earlier for some platforms [2] because their cpufreq
drivers were getting registered before the CPUs were registered with
subsys framework. And hence they used to fail.
The same isn't true anymore though. The current code flow in the kernel
is:
start_kernel()
-> kernel_init()
-> kernel_init_freeable()
-> do_basic_setup()
-> driver_init()
-> cpu_dev_init()
-> subsys_system_register() //For CPUs
-> do_initcalls()
-> cpufreq_register_driver()
Clearly, the CPUs will always get registered with subsys framework
before any cpufreq driver can get probed. Remove the flag and update the
relevant drivers.
Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/include/linux/cpufreq.h?id=7cc9f0d9a1ab04cedc60d64fd8dcf7df224a3b4d # [1]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/arch/arm/mach-sa1100/cpu-sa1100.c?id=f59d3bbe35f6268d729f51be82af8325d62f20f5 # [2]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
First off, some cpufreq drivers (eg. intel_pstate) can pass hints
beyond the current target frequency to the hardware and there are no
provisions for doing that in the cpufreq framework. In particular,
today the driver has to assume that it should not allow the frequency
to fall below the one requested by the governor (or the required
capacity may not be provided) which may not be the case and which may
lead to excessive energy usage in some scenarios.
Second, the hints passed by these drivers to the hardware need not be
in terms of the frequency, so representing the utilization numbers
coming from the scheduler as frequency before passing them to those
drivers is not really useful.
Address the two points above by adding a special-purpose replacement
for the ->fast_switch callback, called ->adjust_perf, allowing the
governor to pass abstract performance level (rather than frequency)
values for the minimum (required) and target (desired) performance
along with the CPU capacity to compare them to.
Also update the schedutil governor to use the new callback instead
of ->fast_switch if present and if the utilization mertics are
frequency-invariant (that is requisite for the direct mapping
between the utilization and the CPU performance levels to be a
reasonable approximation).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Add a new field to be set when the CPUFREQ_GOV_STRICT_TARGET flag is
set for the current governor to struct cpufreq_policy, so that the
drivers needing to check CPUFREQ_GOV_STRICT_TARGET do not have to
access the governor object during every frequency transition.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Introduce a new governor flag, CPUFREQ_GOV_STRICT_TARGET, for the
governors that want the target frequency to be set exactly to the
given value without leaving any room for adjustments on the hardware
side and set this flag for the powersave and performance governors.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
A new cpufreq governor flag will be added subsequently, so replace
the bool dynamic_switching fleid in struct cpufreq_governor with a
flags field and introduce CPUFREQ_GOV_DYNAMIC_SWITCHING to set for
the "dynamic switching" governors instead of it.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
The restore_freq field in struct cpufreq_policy is only used by
__target_index() in one place and a local variable in that function
may as well be used instead of it, so drop it and modify
__target_index() accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>