There are only two callers of thermal_zone_device_is_enabled()
and one of them call is under the zone lock and the other one uses
lockdep_assert_held() on that lock. Thus the lockdep_assert_held()
in thermal_zone_device_is_enabled() is redundant and it could be
dropped, but then the function would merely become a wrapper around
a simple tz->mode check that is more convenient to do directly.
Accordingly, drop thermal_zone_device_is_enabled() altogether and update
its callers to check tz->mode directly as appropriate.
While at it, combine the tz->mode and tz->suspended checks in
__thermal_zone_device_update() because they are of a similar category
and if any of them evaluates to "true", the outcome is the same.
No intentinal functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/9353673.CDJkKcVGEf@rjwysocki.net
The only case in which thermal_zone_device_set_polling() is called
with its second argument equal to zero is when passive cooling is
under way and passive_delay_jiffies is 0, which only happens when
the given thermal zone is not polled at all.
If monitor_thermal_zone() is modified to check passive_delay_jiffies
directly, the check of the thermal_zone_device_set_polling() second
argument against 0 can be dropped and a passive_delay check can be
dropped from thermal_zone_device_register_with_trips(), so change the
code accordingly.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2004353.PYKUYFuaPT@rjwysocki.net
Since monitor_thermal_zone() is only called when the given thermal zone
has been enabled, as per the thermal_zone_device_is_enabled() check in
__thermal_zone_device_update(), the tz->mode check in it always
evaluates to "false" and the thermal_zone_device_set_polling()
invocation depending on it is dead code, so drop it.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/10547425.nUPlyArG6x@rjwysocki.net
Along the lines of commit 24aad192c671 ("thermal: core: Drop
redundant checks from thermal_bind_cdev_to_trip()") notice that
thermal_unbind_cdev_from_trip() is only called by
thermal_zone_cdev_unbind() under the thermal zone lock, so it
need not use lockdep_assert_held() for that lock.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3341369.44csPzL39Z@rjwysocki.net
Since thermal_bind_cdev_to_trip() is only called by
thermal_zone_cdev_binding() under the thermal zone lock and the latter
is only called by thermal_zone_device_register_with_trips() and
__thermal_cooling_device_register(), under thermal_list_lock in both
cases, both lockdep_assert_held() assertions can be dropped from it.
Moreover, in both cases thermal_zone_cdev_binding() is called after
both tz and cdev have been added to the global lists of thermal zones
and cooling device, respectively, so the check against their list nodes
in thermal_bind_cdev_to_trip() is redundant and can be dropped either.
Link: https://lore.kernel.org/linux-pm/CAJZ5v0jwkc2PB+osSkkYF9vJ1Vpp3MFE=cGQmQ2Xzjb3yjVfJg@mail.gmail.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/4607540.LvFx2qVVIh@rjwysocki.net
Using round_jiffies() in thermal_set_delay_jiffies() is invalid because
its argument should be time in the future in absolute jiffies and it
computes the result with respect to the current jiffies value at the
invocation time. Fortunately, in the majority of cases it does not
make any difference due to the time_is_after_jiffies() check in
round_jiffies_common().
While using round_jiffies_relative() instead of round_jiffies() might
reflect the intent a bit better, it still would not be defensible
because that function should be called when the timer is about to be
set and it is not suitable for pre-computation of delay values.
Accordingly, drop thermal_set_delay_jiffies() altogether, simply
convert polling_delay and passive_delay to jiffies during thermal
zone initialization and make thermal_zone_device_set_polling() call
round_jiffies_relative() on the delay if it is greather than 1 second.
Fixes: 17d399cd9c89 ("thermal/core: Precompute the delays from msecs to jiffies")
Fixes: e5f2cda61d06 ("thermal/core: Move thermal_set_delay_jiffies to static")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/1994438.PYKUYFuaPT@rjwysocki.net
Make thermal_bind_cdev_to_trip() take a struct cooling_spec pointer
to reduce the number of its arguments, change the return type of
thermal_unbind_cdev_from_trip() to void and rearrange the code in
thermal_zone_cdev_binding() to reduce the indentation level.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Huisong Li <lihuisong@huawei.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/1831773.TLkxdtWsSY@rjwysocki.net
[ rjw: Changed the name of a new function argument ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are no more callers of thermal_zone_bind_cooling_device() and
thermal_zone_unbind_cooling_device(), so drop them along with all of
the corresponding headers, code and documentation.
Moreover, because the .bind() and .unbind() thermal zone callbacks would
only be used when the above functions, respectively, were called, drop
them as well along with all of the code related to them.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Huisong Li <lihuisong@huawei.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/4251116.1IzOArtZ34@rjwysocki.net
Since thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
are only called locally in the thermal core now, they can be static,
so change their definitions accordingly and drop their headers from
the global thermal header file.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Huisong Li <lihuisong@huawei.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/3512161.QJadu78ljV@rjwysocki.net
The current design of the code binding cooling devices to trip points in
thermal zones is convoluted and hard to follow.
Namely, a driver that registers a thermal zone can provide .bind()
and .unbind() operations for it, which are required to call either
thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip(),
respectively, or thermal_zone_bind_cooling_device() and
thermal_zone_unbind_cooling_device(), respectively, for every relevant
trip point and the given cooling device. Moreover, if .bind() is
provided and .unbind() is not, the cleanup necessary during the removal
of a thermal zone or a cooling device may not be carried out.
In other words, the core relies on the thermal zone owners to do the
right thing, which is error prone and far from obvious, even though all
of that is not really necessary. Specifically, if the core could ask
the thermal zone owner, through a special thermal zone callback, whether
or not a given cooling device should be bound to a given trip point in
the given thermal zone, it might as well carry out all of the binding
and unbinding by itself. In particular, the unbinding can be done
automatically without involving the thermal zone owner at all because
all of the thermal instances associated with a thermal zone or cooling
device going away must be deleted regardless.
Accordingly, introduce a new thermal zone operation, .should_bind(),
that can be invoked by the thermal core for a given thermal zone,
trip point and cooling device combination in order to check whether
or not the cooling device should be bound to the trip point at hand.
It takes an additional cooling_spec argument allowing the thermal
zone owner to specify the highest and lowest cooling states of the
cooling device and its weight for the given trip point binding.
Make the thermal core use this operation, if present, in the absence of
.bind() and .unbind(). Note that .should_bind() will be called under
the thermal zone lock.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Huisong Li <lihuisong@huawei.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/9334403.CDJkKcVGEf@rjwysocki.net
Since thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
acquire the thermal zone lock, the locking rules for their callers get
complicated. In particular, the thermal zone lock cannot be acquired
in any code path leading to one of these functions even though it might
be useful to do so.
To address this, remove the thermal zone locking from both these
functions, add lockdep assertions for the thermal zone lock to both
of them and make their callers acquire the thermal zone lock instead.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/3837835.kQq0lBPeGt@rjwysocki.net
Because the trip and cdev pointers are sufficient to identify a thermal
instance holding them unambiguously, drop the additional thermal zone
checks from two loops walking the list of thermal instances in a
thermal zone.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/10527734.nUPlyArG6x@rjwysocki.net
It is not necessary to look up the thermal zone and the cooling device
in the respective global lists to check whether or not they are
registered. It is sufficient to check whether or not their respective
list nodes are empty for this purpose.
Use the above observation to simplify thermal_bind_cdev_to_trip(). In
addition, eliminate an unnecessary ternary operator from it.
Moreover, add lockdep_assert_held() for thermal_list_lock to it because
that lock must be held by its callers when it is running.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/3324214.44csPzL39Z@rjwysocki.net
Fold bind_cdev() into __thermal_cooling_device_register() and bind_tz()
into thermal_zone_device_register_with_trips() to reduce code bloat and
make it somewhat easier to follow the code flow.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/2962184.e9J7NaK4W3@rjwysocki.net
In order to set the scene for the thresholds support which have to
manipulate the low and high temperature boundaries for the interrupt
support, we must pass the low and high values to the incoming
thresholds routine.
The variables are set from the thermal_zone_set_trips() where the
function loops the thermal trips to figure out the next and the
previous temperatures to set the interrupt to be triggered when they
are crossed.
These variables will be needed by the function in charge of handling
the thresholds in the incoming changes but they are local to the
aforementioned function thermal_zone_set_trips().
Move the low and high boundaries computation out of the function in
thermal_zone_device_update() so they are accessible from there.
The positive side effect is they are computed in the same loop as
handle_thermal_trip(), so we remove one loop.
Co-developed-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20240816081241.1925221-2-daniel.lezcano@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After running once, the for_each_trip_desc() loop in
bang_bang_manage() is pure needless overhead because it is not going to
make any changes unless a new cooling device has been bound to one of
the trips in the thermal zone or the system is resuming from sleep.
For this reason, make bang_bang_manage() set governor_data for the
thermal zone and check it upfront to decide whether or not it needs to
do anything.
However, governor_data needs to be reset in some cases to let
bang_bang_manage() know that it should walk the trips again, so add an
.update_tz() callback to the governor and make the core additionally
invoke it during system resume.
To avoid affecting the other users of that callback unnecessarily, add
a special notification reason for system resume, THERMAL_TZ_RESUME, and
also pass it to __thermal_zone_device_update() called during system
resume for consistency.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Kästle <peter@piie.net>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
Link: https://patch.msgid.link/2285575.iZASKD2KPV@rjwysocki.net
Commit a8a261774466 ("thermal: core: Call monitor_thermal_zone() if zone
temperature is invalid") introduced a polling mechanism by which the
thermal core attampts to get a valid temperature value for thermal zones
where the .get_temp() callback returns errors to start with (for
example, due to initialization ordering woes). However, this polling is
carried out periodically ad infinitum and every iteration of it causes
a message to be printed to the kernel log which means a lot of log noise
on systems where there are thermal zones that never get ready for some
reason. It is also not really useful to continuously poll thermal zones
that never respond.
To address this, modify the thermal core to increase the delay between
consecutive thermal zone temperature checks after every check that fails
until it reaches a certain maximum value. At that point, the thermal
zone in question will be disabled, but user space will be able to
reenable it if it believes that the failure is transient.
Also change the code to print messages regarding failed temperature
checks to the kernel log only twice, once when the thermal zone's
.get_temp() callback returns an error for the first time and once when
disabling the given thermal zone. In addition, a dev_crit() message
will be printed at that point if the given thermal zone contains a
critical trip point to notify the system operator about the situation.
Fixes: a8a261774466 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid")
Link: https://lore.kernel.org/linux-acpi/CAGnHSE=RyPK++UG0-wAtVKgeJxe0uzFYgLxm+RUOKKoQquW=Ow@mail.gmail.com/
Reported-by: Tom Yan <tom.ty89@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2962033.e9J7NaK4W3@rjwysocki.net
Pull a wrapper around thermal zone .change_mode() callback out of
thermal_zone_device_set_mode() because it will be used elsewhere
subsequently.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2206793.irdbgypaU6@rjwysocki.net
The iwlwifi wireless driver registers a thermal zone that is only needed
when the network interface handled by it is up and it wants that thermal
zone to be effectively ignored by the core otherwise.
Before commit a8a261774466 ("thermal: core: Call monitor_thermal_zone()
if zone temperature is invalid") that could be achieved by returning
an error code from the thermal zone's .get_temp() callback because the
core did not really handle errors returned by it almost at all.
However, commit a8a261774466 made the core attempt to recover from the
situation in which the temperature of a thermal zone cannot be
determined due to errors returned by its .get_temp() and is always
invalid from the core's perspective.
That was done because there are thermal zones in which .get_temp()
returns errors to start with due to some difficulties related to the
initialization ordering, but then it will start to produce valid
temperature values at one point.
Unfortunately, the simple approach taken by commit a8a261774466,
which is to poll the thermal zone periodically until its .get_temp()
callback starts to return valid temperature values, is at odds with
the special thermal zone in iwlwifi in which .get_temp() may always
return an error because its network interface may always be down. If
that happens, every attempt to invoke the thermal zone's .get_temp()
callback resulting in an error causes the thermal core to print a
dev_warn() message to the kernel log which is super-noisy.
To address this problem, make the core handle the case in which
.get_temp() returns 0, but the temperature value returned by it
is not actually valid, in a special way. Namely, make the core
completely ignore the invalid temperature value coming from
.get_temp() in that case, which requires folding in
update_temperature() into its caller and a few related changes.
On the iwlwifi side, modify iwl_mvm_tzone_get_temp() to return 0
and put THERMAL_TEMP_INVALID into the temperature return memory
location instead of returning an error when the firmware is not
running or it is not of the right type.
Also, to clearly separate the handling of invalid temperature
values from the thermal zone initialization, introduce a special
THERMAL_TEMP_INIT value specifically for the latter purpose.
Fixes: a8a261774466 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid")
Closes: https://lore.kernel.org/linux-pm/20240715044527.GA1544@sol.localdomain/
Reported-by: Eric Biggers <ebiggers@kernel.org>
Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201761
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/4950004.31r3eYUQgx@rjwysocki.net
[ rjw: Rebased on top of the current mainline ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge updates related to the thermal core for 6.11-rc1:
- Redesign the .set_trip_temp() thermal zone callback to take a trip
pointer instead of a trip ID and update its users (Rafael Wysocki).
- Avoid using invalid combinations of polling_delay and passive_delay
thermal zone parameters (Rafael Wysocki).
- Update a cooling device registration function to take a const
argument (Krzysztof Kozlowski).
- Make the uniphier thermal driver use thermal_zone_for_each_trip() for
walking trip points (Rafael Wysocki).
* thermal-core:
thermal: core: Add sanity checks for polling_delay and passive_delay
thermal: trip: Fold __thermal_zone_get_trip() into its caller
thermal: trip: Pass trip pointer to .set_trip_temp() thermal zone callback
thermal: imx: Drop critical trip check from imx_set_trip_temp()
thermal: trip: Add conversion macros for thermal trip priv field
thermal: helpers: Introduce thermal_trip_is_bound_to_cdev()
thermal: core: Change passive_delay and polling_delay data type
thermal: core: constify 'type' in devm_thermal_of_cooling_device_register()
thermal: uniphier: Use thermal_zone_for_each_trip() for walking trip points
If polling_delay is nonzero and passive_delay is greater than
polling_delay, the thermal zone temperature will be updated less
often when tz->passive is nonzero, which is not as expected. Make
the thermal zone registration fail with -EINVAL in that case as
this is a clear thermal zone configuration mistake.
If polling_delay is nonzero and passive_delay is 0, which is regarded
as a valid thermal zone configuration, the thermal zone will use polling
except when tz->passive is nonzero. However, the expected behavior in
that case is to continue temperature polling with the same delay value
regardless of tz->passive, so set passive_delay to the polling_delay
value then.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/5802156.DvuYhMxLoT@rjwysocki.net
It is better to use unsigned int as the data type for the passive_delay
and polling_delay arguments of thermal_zone_device_register_with_trips()
because they are implicitly cast to unsigned int anyway in
thermal_set_delay_jiffies() and if they happen to be negative at that
point, the resulting behavior may not be as desired.
Update the thermal_zone_device_register_with_trips() definition
accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/5803791.DvuYhMxLoT@rjwysocki.net
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The order in which lists are sorted in __thermal_zone_device_update()
is reverse with respect to what it should be due to a mistake in
thermal_trip_notify_cmp().
Fix it and observe that it is not necessary to sort the lists in
different orders. They can both be sorted in ascending order if
way_down_list is walked in reverse order which allows the code to
be slightly more straightforward (and less prone to silly mistakes).
Fixes: 7454f2c42cce ("thermal: core: Sort trip point crossing notifications by temperature")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/12481676.O9o76ZdvQC@rjwysocki.net
Commit 202aa0d4bb53 ("thermal: core: Do not call handle_thermal_trip()
if zone temperature is invalid") caused __thermal_zone_device_update()
to return early if the current thermal zone temperature was invalid.
This was done to avoid running handle_thermal_trip() and governor
callbacks in that case which led to confusion. However, it went too
far because monitor_thermal_zone() still needs to be called even when
the zone temperature is invalid to ensure that it will be updated
eventually in case thermal polling is enabled and the driver has no
other means to notify the core of zone temperature changes (for example,
it does not register an interrupt handler or ACPI notifier).
Also if the .set_trips() zone callback is expected to set up monitoring
interrupts for a thermal zone, it has to be provided with valid
boundaries and that can only happen if the zone temperature is known.
Accordingly, to ensure that __thermal_zone_device_update() will
run again after a failing zone temperature check, make it call
monitor_thermal_zone() regardless of whether or not the zone
temperature is valid and make the latter schedule a thermal zone
temperature update if the zone temperature is invalid even if
polling is not enabled for the thermal zone.
Fixes: 202aa0d4bb53 ("thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid")
Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2764814.mvXUDI8C0e@rjwysocki.net
[ rjw: Changed THERMAL_RECHECK_DELAY_MS to 250 ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The 'type' string passed to thermal_of_cooling_device_register() is a
'const char *', so do the same in the devm interface.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240703083141.96013-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge thermal core changes for v6.11:
- Fix and clean up several minor shortcomings in thermal debug (Rafael
Wysocki).
- Rename __thermal_zone_set_trips() to thermal_zone_set_trips() and
make it use trip thresholds (Rafael Wysocki).
- Use READ_ONCE() for lockless access to trip temperature and
hysteresis (Rafael Wysocki).
- Drop unnecessary cooling device target state checks from the
Bang-Bang thermal governor (Rafael Wysocki).
- Avoid invoking thermal governor .trip_crossed() callback for critical
and hot trip points (Rafael Wysocki).
* thermal-core:
thermal: core: Avoid calling .trip_crossed() for critical and hot trips
thermal: gov_bang_bang: Drop unnecessary cooling device target state checks
thermal: trip: Use READ_ONCE() for lockless access to trip properties
thermal: trip: Make thermal_zone_set_trips() use trip thresholds
thermal: trip: Rename __thermal_zone_set_trips() to thermal_zone_set_trips()
thermal: trip: Use common set of trip type names
thermal/debugfs: Move some statements from under thermal_dbg->lock
thermal/debugfs: Compute maximum temperature for mitigation episode as a whole
thermal/debugfs: Adjust check for trips without statistics in tze_seq_show()
thermal/debugfs: Fix up units in "mitigations" files
thermal/debugfs: Print mitigation timestamp value in milliseconds
thermal/debugfs: Do not extend mitigation episodes beyond system resume
thermal/debugfs: Use helper to update trip point overstepping duration
- Remove the filtered mode for mt8188 from lvts_thermal as it is not
supported on this platform and fail the lvts_thermal initialization
when the golden temperature is zero as that means the efuse data is
not correctly set (Julien Panis).
- Update the processor_thermal part of the Intel int340x driver to
support shared interrupts as the processor thermal device interrupt
may in fact be shared with PCI devices (Srinivas Pandruvada).
- Synchronize the suspend-prepare and post-suspend actions of the
thermal PM notifier to avoid a destructive race condition and
change the priority of that notifier to the minimum to avoid
interference between the work items spawned by it and the other
PM notifiers during system resume (Rafael Wysocki).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmZ1clgSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxftEQAKE9MJHvo1zTgGq3jU198pYj6/oIf4F8
GR8wTSIhmSO+YgUINbjcGaUoCbwROeaLzLDyAXsrG0vS1t1/c5TGGujvQnMCnrdz
rSwGoXtJeRK2yYNUWz/Mt1e/ai04sn/i9/9gZJOagRQaas45SdUIxLqpQs17R5cP
R/8tFAyjMVyQ1MvI4mP3zK1yvE4fBeC18ZGKXNM57tzGBr0dWcSLrVJH/vS1/3Zx
947LCzngfw8rPx1v+1LSIaCja62StUHBfmkTHnXDaegCFaWCuyUGgHV7yV3RW5sc
+jQfMo6WH+O10TWvP28tc9dwamB0kfGr7oZXDH0ucpTwg621tThQZL/urxcAX1BS
i5HQlATfUD/qWq1c5KUEZMRVje/rU8+SnSbf/xZeQKF4albOoZUWAk6VWqyJiwrr
VTnMsVVCEZ62Vawyk7tchdMu233GMoMndRPu/Xq+ourKf8cDzzXdzn678cofahpn
5u+droa/n9O7gkku8Mz4zTmgVyXyJtLi1tMGHYT/M7XM/OrKlwvqmpAnUSuAfaQ8
C8ywxyrk1GJQoy8LLStaSvUFcAtrdmBpv5hSFPDXkbmy/xHaKw7+vccjL9CybmXo
P4u2psBQ4opeg0x5I+dS5Tp0WcFXcdo5UGSaw2dRT+7WDxA9ALk3hs/s+vHzv1zO
Fe7iya1SwbZB
=qXau
-----END PGP SIGNATURE-----
Merge tag 'thermal-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These fix the Mediatek lvts_thermal driver, the Intel int340x driver,
and the thermal core (two issues related to system suspend).
Specifics:
- Remove the filtered mode for mt8188 from lvts_thermal as it is not
supported on this platform and fail the lvts_thermal initialization
when the golden temperature is zero as that means the efuse data is
not correctly set (Julien Panis)
- Update the processor_thermal part of the Intel int340x driver to
support shared interrupts as the processor thermal device interrupt
may in fact be shared with PCI devices (Srinivas Pandruvada)
- Synchronize the suspend-prepare and post-suspend actions of the
thermal PM notifier to avoid a destructive race condition and
change the priority of that notifier to the minimum to avoid
interference between the work items spawned by it and the other
PM notifiers during system resume (Rafael Wysocki)"
* tag 'thermal-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: int340x: processor_thermal: Support shared interrupts
thermal: core: Change PM notifier priority to the minimum
thermal: core: Synchronize suspend-prepare and post-suspend actions
thermal/drivers/mediatek/lvts_thermal: Return error in case of invalid efuse data
thermal/drivers/mediatek/lvts_thermal: Remove filtered mode for mt8188
It is reported that commit 5a5efdaffda5 ("thermal: core: Resume thermal
zones asynchronously") causes battery data in sysfs on Thinkpad P1 Gen2
to become invalid after a resume from S3 (and it is necessary to reboot
the machine to restore correct battery data). Some investigation into
the problem indicated that it happened because, after the commit in
question, the ACPI battery PM notifier ran in parallel with
thermal_zone_device_resume() for one of the thermal zones which
apparently confused the platform firmware on the affected system.
While the exact reason for the firmware confusion remains unclear, it
is arguably not particularly relevant, and the expected behavior of the
affected system can be restored by making the thermal PM notifier run
at the lowest priority which avoids interference between work items
spawned by it and the other PM notifiers (that will run before those
work items now).
Fixes: 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218881
Reported-by: fhortner@yahoo.de
Tested-by: fhortner@yahoo.de
Cc: 6.8+ <stable@vger.kernel.org> # 6.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After commit 5a5efdaffda5 ("thermal: core: Resume thermal zones
asynchronously") it is theoretically possible that, if a system suspend
starts immediately after a system resume, thermal_zone_device_resume()
spawned by the thermal PM notifier for one of the thermal zones at the
end of the system resume will run after the PM thermal notifier for the
suspend-prepare action. If that happens, tz->suspended set by the latter
will be reset by the former which may lead to unexpected consequences.
To avoid that race, synchronize thermal_zone_device_resume() with the
suspend-prepare thermal PM notifier with the help of additional bool
field and completion in struct thermal_zone_device.
Note that this also ensures running __thermal_zone_device_update() at
least once for each thermal zone between system resume and the following
system suspend in case it is needed to start thermal mitigation.
Fixes: 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Invoking the governor .trip_crossed() callback for critical and hot
trips is pointless because they are handled directly by the core,
so make thermal_governor_trip_crossed() avoid doing that.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Modify thermal_zone_set_trips() to use trip thresholds instead of
computing the low temperature for each trip to avoid deriving both
the high and low temperature levels from the same trip (which may
happen if the zone temperature falls into the hysteresis range of
one trip).
Accordingly, make __thermal_zone_device_update() call
thermal_zone_set_trips() later, when threshold values have been
updated for all trips.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Drop the pointless double underline prefix from the function name as per
the subject.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Because thermal zone handling by the thermal core is started from
scratch during resume from system-wide suspend, prevent the debug
code from extending mitigation episodes beyond that point by ending
the mitigation episode currently in progress, if any, for each thermal
zone.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
It is reported that commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling
device state to thermal_debug_cdev_add()") causes the ACPI fan driver
to fail probing on some systems which turns out to be due to the _FST
control method returning an invalid value until _FSL is first evaluated
for the given fan. If this happens, the .get_cur_state() cooling device
callback returns an error and __thermal_cooling_device_register() fails
as uses that callback after commit 31a0fa0019b0.
Arguably, _FST should not return an invalid value even if it is
evaluated before _FSL, so this may be regarded as a platform firmware
issue, but at the same time it is not a good enough reason for failing
the cooling device registration where the initial cooling device state
is only needed to initialize a thermal debug facility.
Accordingly, modify __thermal_cooling_device_register() to avoid
calling thermal_debug_cdev_add() instead of returning an error if the
initial .get_cur_state() callback invocation fails.
Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()")
Closes: https://lore.kernel.org/linux-acpi/20240530153727.843378-1-laura.nao@collabora.com
Reported-by: Laura Nao <laura.nao@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Laura Nao <laura.nao@collabora.com>
When a trip point becomes invalid after being crossed on the way up,
it is involved in a mitigation episode that needs to be adjusted to
compensate for the trip going away.
For this reason, introduce thermal_zone_trip_down() as a wrapper
around thermal_trip_crossed() and make thermal_zone_set_trip_temp()
call it if the new temperature of the trip at hand is equal to
THERMAL_TEMP_INVALID and it has been crossed on the way up to trigger
all of the necessary adjustments in user space, the thermal debug
code and the zone governor.
Fixes: 8c69a777e480 ("thermal: core: Fix the handling of invalid trip points")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a helper function called thermal_trip_crossed() to be invoked by
__thermal_zone_device_update() in order to notify user space, the
thermal debug code and the zone governor about trip crossing.
Subsequently, this will also be used in the case when a trip point
becomes invalid after being crossed on the way up.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 9ad18043fb35 ("thermal: core: Send trip crossing notifications
at init time if needed") overlooked the case when a trip point that
has started as invalid is set to a valid temperature later. Namely,
the initial threshold value for all trips is zero, so if a previously
invalid trip becomes valid and its (new) low temperature is above the
zone temperature, a spurious trip crossing notification will occur and
it may trigger the WARN_ON() in handle_thermal_trip().
To address this, set the initial threshold for all trips to INT_MAX.
There is also the case when a valid writable trip becomes invalid that
requires special handling. First, in accordance with the change
mentioned above, the trip's threshold needs to be set to INT_MAX to
avoid the same issue. Second, if the trip in question is passive and
it has been crossed by the thermal zone temperature on the way up, the
zone's passive count has been incremented and it is in the passive
polling mode, so its passive count needs to be adjusted to allow the
passive polling to be turned off eventually.
Fixes: 9ad18043fb35 ("thermal: core: Send trip crossing notifications at init time if needed")
Fixes: 042a3d80f118 ("thermal: core: Move passive polling management to the core")
Reported-by: Zhang Rui <zhang.rui@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Wendy Wang <wendy.wang@intel.com>
Passive polling is enabled by setting the 'passive' field in
struct thermal_zone_device to a positive value so long as the
'passive_delay_jiffies' field is greater than zero. It causes
the thermal core to actively check the thermal zone temperature
periodically which in theory should be done after crossing a
passive trip point on the way up in order to allow governors to
react more rapidly to temperature changes and adjust mitigation
more precisely.
However, the 'passive' field in struct thermal_zone_device is currently
managed by governors which is quite problematic. First of all, only
two governors, Step-Wise and Power Allocator, update that field at
all, so the other governors do not benefit from passive polling,
although in principle they should. Moreover, if the zone governor is
changed from, say, Step-Wise to Fair-Share after 'passive' has been
incremented by the former, it is not going to be reset back to zero by
the latter even if the zone temperature falls down below all passive
trip points.
For this reason, make handle_thermal_trip() increment 'passive'
to enable passive polling for the given thermal zone whenever a
passive trip point is crossed on the way up and decrement it
whenever a passive trip point is crossed on the way down. Also
remove the 'passive' field updates from governors and additionally
clear it in thermal_zone_device_init() to prevent passive polling
from being enabled after a system resume just beacuse it was enabled
before suspending the system.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Make __thermal_zone_device_update() bail out if update_temperature()
fails to update the zone temperature because __thermal_zone_get_temp()
has returned an error and the current zone temperature is
THERMAL_TEMP_INVALID (user space receiving netlink thermal messages,
thermal debug code and thermal governors may get confused otherwise).
Fixes: 9ad18043fb35 ("thermal: core: Send trip crossing notifications at init time if needed")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
If cdev_dt_seq_show() runs before the first state transition of a cooling
device, it will not print any state residency information for it, even
though it might be reasonably expected to print residency information for
the initial state of the cooling device.
For this reason, rearrange the code to get the initial state of a cooling
device at the registration time and pass it to thermal_debug_cdev_add(),
so that the latter can create a duration record for that state which will
allow cdev_dt_seq_show() to print its residency information.
Fixes: 755113d76786 ("thermal/debugfs: Add thermal cooling device debugfs information")
Reported-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Add a wrapper around the .trip_crossed() governor callback invocation
to reduce code duplications slightly and improve the code layout in
__thermal_zone_device_update().
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Rename thermal_debug_update_temp() to thermal_debug_update_trip_stats()
which is a better match for the purpose of the function.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Since thermal_debug_update_temp() is called before invoking
thermal_debug_tz_trip_down() for the trips that were crossed by the
zone temperature on the way up, it updates the statistics for them
as though the current zone temperature was above the low temperature
of each of them. However, if a given trip has just been crossed on the
way down, the zone temperature is in fact below its low temperature,
but this is handled by thermal_debug_tz_trip_down() running after the
update of the trip statistics.
The remedy is to call thermal_debug_update_temp() after
thermal_debug_tz_trip_down() has been invoked for all of the
trips in question, but then thermal_debug_tz_trip_up() needs to
be adjusted, so it does not update the statistics for the trips
that has just been crossed on the way up, as that will be taken
care of by thermal_debug_update_temp() down the road.
Modify the code accordingly.
Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Modify handle_thermal_trip() to call handle_critical_trips() only after
finding that the trip temperature has been crossed on the way up and
remove the redundant temperature check from the latter.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Since all of the governors in the tree have been switched over to using
the new callbacks, either .trip_crossed() or .manage(), the .throttle()
governor callback is not used any more, so drop it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Introduce a new thermal governor callback called .manage() that will be
invoked once per thermal zone update after processing all of the trip
points in the core.
This will allow governors that look at multiple trip points together
to check all of them in a consistent configuration, so they don't need
to play tricks with skipping .throttle() invocations that they are not
interested in and they can avoid carrying out the same computations for
multiple times in one cycle.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>