Bit 1 of sem->owner (RWSEM_ANONYMOUSLY_OWNED) is used to designate an
anonymous owner - readers or an anonymous writer. The setting of this
anonymous bit is used as an indicator that optimistic spinning cannot
be done on this rwsem.
With the upcoming reader optimistic spinning patches, a reader-owned
rwsem can be spinned on for a limit period of time. We still need
this bit to indicate a rwsem is nonspinnable, but not setting this
bit loses its meaning that the owner is known. So rename the bit
to RWSEM_NONSPINNABLE to clarify its meaning.
This patch also fixes a DEBUG_RWSEMS_WARN_ON() bug in __up_write().
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Link: https://lkml.kernel.org/r/20190520205918.22251-12-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When the front of the wait queue is a reader, other readers
immediately following the first reader will also be woken up at the
same time. However, if there is a writer in between. Those readers
behind the writer will not be woken up.
Because of optimistic spinning, the lock acquisition order is not FIFO
anyway. The lock handoff mechanism will ensure that lock starvation
will not happen.
Assuming that the lock hold times of the other readers still in the
queue will be about the same as the readers that are being woken up,
there is really not much additional cost other than the additional
latency due to the wakeup of additional tasks by the waker. Therefore
all the readers up to a maximum of 256 in the queue are woken up when
the first waiter is a reader to improve reader throughput. This is
somewhat similar in concept to a phase-fair R/W lock.
With a locking microbenchmark running on 5.1 based kernel, the total
locking rates (in kops/s) on a 8-socket IvyBridge-EX system with
equal numbers of readers and writers before and after this patch were
as follows:
# of Threads Pre-Patch Post-patch
------------ --------- ----------
4 1,641 1,674
8 731 1,062
16 564 924
32 78 300
64 38 195
240 50 149
There is no performance gain at low contention level. At high contention
level, however, this patch gives a pretty decent performance boost.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Link: https://lkml.kernel.org/r/20190520205918.22251-11-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
An RT task can do optimistic spinning only if the lock holder is
actually running. If the state of the lock holder isn't known, there
is a possibility that high priority of the RT task may block forward
progress of the lock holder if it happens to reside on the same CPU.
This will lead to deadlock. So we have to make sure that an RT task
will not spin on a reader-owned rwsem.
When the owner is temporarily set to NULL, there are two cases
where we may want to continue spinning:
1) The lock owner is in the process of releasing the lock, sem->owner
is cleared but the lock has not been released yet.
2) The lock was free and owner cleared, but another task just comes
in and acquire the lock before we try to get it. The new owner may
be a spinnable writer.
So an RT task is now made to retry one more time to see if it can
acquire the lock or continue spinning on the new owning writer.
When testing on a 8-socket IvyBridge-EX system, the one additional retry
seems to improve locking performance of RT write locking threads under
heavy contentions. The table below shows the locking rates (in kops/s)
with various write locking threads before and after the patch.
Locking threads Pre-patch Post-patch
--------------- --------- -----------
4 2,753 2,608
8 2,529 2,520
16 1,727 1,918
32 1,263 1,956
64 889 1,343
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Link: https://lkml.kernel.org/r/20190520205918.22251-10-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With the use of wake_q, we can do task wakeups without holding the
wait_lock. There is one exception in the rwsem code, though. It is
when the writer in the slowpath detects that there are waiters ahead
but the rwsem is not held by a writer. This can lead to a long wait_lock
hold time especially when a large number of readers are to be woken up.
Remediate this situation by releasing the wait_lock before waking
up tasks and re-acquiring it afterward. The rwsem_try_write_lock()
function is also modified to read the rwsem count directly to avoid
stale count value.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Link: https://lkml.kernel.org/r/20190520205918.22251-9-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Because of writer lock stealing, it is possible that a constant
stream of incoming writers will cause a waiting writer or reader to
wait indefinitely leading to lock starvation.
This patch implements a lock handoff mechanism to disable lock stealing
and force lock handoff to the first waiter or waiters (for readers)
in the queue after at least a 4ms waiting period unless it is a RT
writer task which doesn't need to wait. The waiting period is used to
avoid discouraging lock stealing too much to affect performance.
The setting and clearing of the handoff bit is serialized by the
wait_lock. So racing is not possible.
A rwsem microbenchmark was run for 5 seconds on a 2-socket 40-core
80-thread Skylake system with a v5.1 based kernel and 240 write_lock
threads with 5us sleep critical section.
Before the patch, the min/mean/max numbers of locking operations for
the locking threads were 1/7,792/173,696. After the patch, the figures
became 5,842/6,542/7,458. It can be seen that the rwsem became much
more fair, though there was a drop of about 16% in the mean locking
operations done which was a tradeoff of having better fairness.
Making the waiter set the handoff bit right after the first wakeup can
impact performance especially with a mixed reader/writer workload. With
the same microbenchmark with short critical section and equal number of
reader and writer threads (40/40), the reader/writer locking operation
counts with the current patch were:
40 readers, Iterations Min/Mean/Max = 1,793/1,794/1,796
40 writers, Iterations Min/Mean/Max = 1,793/34,956/86,081
By making waiter set handoff bit immediately after wakeup:
40 readers, Iterations Min/Mean/Max = 43/44/46
40 writers, Iterations Min/Mean/Max = 43/1,263/3,191
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Link: https://lkml.kernel.org/r/20190520205918.22251-8-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch modifies rwsem_spin_on_owner() to return four possible
values to better reflect the state of lock holder which enables us to
make a better decision of what to do next.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Link: https://lkml.kernel.org/r/20190520205918.22251-7-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After merging all the relevant rwsem code into one single file, there
are a number of optimizations and cleanups that can be done:
1) Remove all the EXPORT_SYMBOL() calls for functions that are not
accessed elsewhere.
2) Remove all the __visible tags as none of the functions will be
called from assembly code anymore.
3) Make all the internal functions static.
4) Remove some unneeded blank lines.
5) Remove the intermediate rwsem_down_{read|write}_failed*() functions
and rename __rwsem_down_{read|write}_failed_common() to
rwsem_down_{read|write}_slowpath().
6) Remove "__" prefix of __rwsem_mark_wake().
7) Use atomic_long_try_cmpxchg_acquire() as much as possible.
8) Remove the rwsem_rtrylock and rwsem_wtrylock lock events as they
are not that useful.
That enables the compiler to do better optimization and reduce code
size. The text+data size of rwsem.o on an x86-64 machine with gcc8 was
reduced from 10237 bytes to 5030 bytes with this change.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Link: https://lkml.kernel.org/r/20190520205918.22251-6-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now we only have one implementation of rwsem. Even though we still use
xadd to handle reader locking, we use cmpxchg for writer instead. So
the filename rwsem-xadd.c is not strictly correct. Also no one outside
of the rwsem code need to know the internal implementation other than
function prototypes for two internal functions that are called directly
from percpu-rwsem.c.
So the rwsem-xadd.c and rwsem.h files are now merged into rwsem.c in
the following order:
<upper part of rwsem.h>
<rwsem-xadd.c>
<lower part of rwsem.h>
<rwsem.c>
The rwsem.h file now contains only 2 function declarations for
__up_read() and __down_read().
This is a code relocation patch with no code change at all except
making __up_read() and __down_read() non-static functions so they
can be used by percpu-rwsem.c.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Link: https://lkml.kernel.org/r/20190520205918.22251-5-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The current way of using various reader, writer and waiting biases
in the rwsem code are confusing and hard to understand. I have to
reread the rwsem count guide in the rwsem-xadd.c file from time to
time to remind myself how this whole thing works. It also makes the
rwsem code harder to be optimized.
To make rwsem more sane, a new locking scheme similar to the one in
qrwlock is now being used. The atomic long count has the following
bit definitions:
Bit 0 - writer locked bit
Bit 1 - waiters present bit
Bits 2-7 - reserved for future extension
Bits 8-X - reader count (24/56 bits)
The cmpxchg instruction is now used to acquire the write lock. The read
lock is still acquired with xadd instruction, so there is no change here.
This scheme will allow up to 16M/64P active readers which should be
more than enough. We can always use some more reserved bits if necessary.
With that change, we can deterministically know if a rwsem has been
write-locked. Looking at the count alone, however, one cannot determine
for certain if a rwsem is owned by readers or not as the readers that
set the reader count bits may be in the process of backing out. So we
still need the reader-owned bit in the owner field to be sure.
With a locking microbenchmark running on 5.1 based kernel, the total
locking rates (in kops/s) of the benchmark on a 8-socket 120-core
IvyBridge-EX system before and after the patch were as follows:
Before Patch After Patch
# of Threads wlock rlock wlock rlock
------------ ----- ----- ----- -----
1 30,659 31,341 31,055 31,283
2 8,909 16,457 9,884 17,659
4 9,028 15,823 8,933 20,233
8 8,410 14,212 7,230 17,140
16 8,217 25,240 7,479 24,607
The locking rates of the benchmark on a Power8 system were as follows:
Before Patch After Patch
# of Threads wlock rlock wlock rlock
------------ ----- ----- ----- -----
1 12,963 13,647 13,275 13,601
2 7,570 11,569 7,902 10,829
4 5,232 5,516 5,466 5,435
8 5,233 3,386 5,467 3,168
The locking rates of the benchmark on a 2-socket ARM64 system were
as follows:
Before Patch After Patch
# of Threads wlock rlock wlock rlock
------------ ----- ----- ----- -----
1 21,495 21,046 21,524 21,074
2 5,293 10,502 5,333 10,504
4 5,325 11,463 5,358 11,631
8 5,391 11,712 5,470 11,680
The performance are roughly the same before and after the patch. There
are run-to-run variations in performance. Runs with higher variances
usually have higher throughput.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Link: https://lkml.kernel.org/r/20190520205918.22251-4-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After the following commit:
59aabfc7e959 ("locking/rwsem: Reduce spinlock contention in wakeup after up_read()/up_write()")
the rwsem_wake() forgoes doing a wakeup if the wait_lock cannot be directly
acquired and an optimistic spinning locker is present. This can help performance
by avoiding spinning on the wait_lock when it is contended.
With the later commit:
133e89ef5ef3 ("locking/rwsem: Enable lockless waiter wakeup(s)")
the performance advantage of the above optimization diminishes as the average
wait_lock hold time become much shorter.
With a later patch that supports rwsem lock handoff, we can no
longer relies on the fact that the presence of an optimistic spinning
locker will ensure that the lock will be acquired by a task soon and
rwsem_wake() will be called later on to wake up waiters. This can lead
to missed wakeup and application hang.
So the original 59aabfc7e959 commit has to be reverted.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Link: https://lkml.kernel.org/r/20190520205918.22251-3-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The owner field in the rw_semaphore structure is used primarily for
optimistic spinning. However, identifying the rwsem owner can also be
helpful in debugging as well as tracing locking related issues when
analyzing crash dump. The owner field may also store state information
that can be important to the operation of the rwsem.
So the owner field is now made a permanent member of the rw_semaphore
structure irrespective of CONFIG_RWSEM_SPIN_ON_OWNER.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Link: https://lkml.kernel.org/r/20190520205918.22251-2-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When a cfs_rq sleeps and returns its quota, we delay for 5ms before
waking any throttled cfs_rqs to coalesce with other cfs_rqs going to
sleep, as this has to be done outside of the rq lock we hold.
The current code waits for 5ms without any sleeps, instead of waiting
for 5ms from the first sleep, which can delay the unthrottle more than
we want. Switch this around so that we can't push this forward forever.
This requires an extra flag rather than using hrtimer_active, since we
need to start a new timer if the current one is in the process of
finishing.
Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
Acked-by: Phil Auld <pauld@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/xm26a7euy6iq.fsf_-_@bsegall-linux.svl.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jens reported that significant performance can be had on some block
workloads by special casing local wakeups. That is, wakeups on the
current task before it schedules out.
Given something like the normal wait pattern:
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
if (cond)
break;
schedule();
}
__set_current_state(TASK_RUNNING);
Any wakeup (on this CPU) after set_current_state() and before
schedule() would benefit from this.
Normal wakeups take p->pi_lock, which serializes wakeups to the same
task. By eliding that we gain concurrency on:
- ttwu_stat(); we already had concurrency on rq stats, this now also
brings it to task stats. -ENOCARE
- tracepoints; it is now possible to get multiple instances of
trace_sched_waking() (and possibly trace_sched_wakeup()) for the
same task. Tracers will have to learn to cope.
Furthermore, p->pi_lock is used by set_special_state(), to order
against TASK_RUNNING stores from other CPUs. But since this is
strictly CPU local, we don't need the lock, and set_special_state()'s
disabling of IRQs is sufficient.
After the normal wakeup takes p->pi_lock it issues
smp_mb__after_spinlock(), in order to ensure the woken task must
observe prior stores before we observe the p->state. If this is CPU
local, this will be satisfied with a compiler barrier, and we rely on
try_to_wake_up() being a funcation call, which implies such.
Since, when 'p == current', 'p->on_rq' must be true, the normal wakeup
would continue into the ttwu_remote() branch, which normally is
concerned with exactly this wakeup scenario, except from a remote CPU.
IOW we're waking a task that is still running. In this case, we can
trivially avoid taking rq->lock, all that's left from this is to set
p->state.
This then yields an extremely simple and fast path for 'p == current'.
Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: gkohli@codeaurora.org
Cc: hch@lst.de
Cc: oleg@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
runnable_avg_yN_inv[] is only used in kernel/sched/pelt.c but was
included in several other places because they need other macros all
came from kernel/sched/sched-pelt.h which was generated by
Documentation/scheduler/sched-pelt. As the result, it causes compilation
a lot of warnings,
kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
...
Silence it by appending the __maybe_unused attribute for it, so all
generated variables and macros can still be kept in the same file.
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1559596304-31581-1-git-send-email-cai@lca.pw
Signed-off-by: Ingo Molnar <mingo@kernel.org>
cfs_rq_has_blocked() and others_have_blocked() are only used within
update_blocked_averages(). The !CONFIG_FAIR_GROUP_SCHED version of the
latter calls them within a #define CONFIG_NO_HZ_COMMON block, whereas
the CONFIG_FAIR_GROUP_SCHED one calls them unconditionnally.
As reported by Qian, the above leads to this warning in
!CONFIG_NO_HZ_COMMON configs:
kernel/sched/fair.c: In function 'update_blocked_averages':
kernel/sched/fair.c:7750:7: warning: variable 'done' set but not used [-Wunused-but-set-variable]
It wouldn't be wrong to keep cfs_rq_has_blocked() and
others_have_blocked() as they are, but since their only current use is
to figure out when we can stop calling update_blocked_averages() on
fully decayed NOHZ idle CPUs, we can give them a new definition for
!CONFIG_NO_HZ_COMMON.
Change the definition of cfs_rq_has_blocked() and
others_have_blocked() for !CONFIG_NO_HZ_COMMON so that the
NOHZ-specific blocks of update_blocked_averages() become no-ops and
the 'done' variable gets optimised out.
While at it, remove the CONFIG_NO_HZ_COMMON block from the
!CONFIG_FAIR_GROUP_SCHED definition of update_blocked_averages() by
using the newly-introduced update_blocked_load_status() helper.
No change in functionality intended.
[ Additions by Peter Zijlstra. ]
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190603115424.7951-1-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Non-inline io_schedule() was introduced in:
commit 10ab56434f2f ("sched/core: Separate out io_schedule_prepare() and io_schedule_finish()")
Keep in line with io_schedule_timeout(), otherwise "/proc/<pid>/wchan" will
report io_schedule() rather than its callers when waiting for IO.
Reported-by: Jilong Kou <koujilong@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miao Xie <miaoxie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 10ab56434f2f ("sched/core: Separate out io_schedule_prepare() and io_schedule_finish()")
Link: https://lkml.kernel.org/r/20190603091338.2695-1-gaoxiang25@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf_sample_regs_user() uses 'current->mm' to test for the presence of
userspace, but this is insufficient, consider use_mm().
A better test is: '!(current->flags & PF_KTHREAD)', exec() clears
PF_KTHREAD after it sets the new ->mm but before it drops to userspace
for the first time.
Possibly obsoletes: bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs user process")
Reported-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reported-by: Young Xiao <92siuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 4018994f3d87 ("perf: Add ability to attach user level registers dump to sample")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If the architecture supports the batching of jump label updates, use it!
An easy way to see the benefits of this patch is switching the
schedstats on and off. For instance:
-------------------------- %< ----------------------------
#!/bin/sh
while [ true ]; do
sysctl -w kernel.sched_schedstats=1
sleep 2
sysctl -w kernel.sched_schedstats=0
sleep 2
done
-------------------------- >% ----------------------------
while watching the IPI count:
-------------------------- %< ----------------------------
# watch -n1 "cat /proc/interrupts | grep Function"
-------------------------- >% ----------------------------
With the current mode, it is possible to see +- 168 IPIs each 2 seconds,
while with this patch the number of IPIs goes to 3 each 2 seconds.
Regarding the performance impact of this patch set, I made two measurements:
The time to update a key (the task that is causing the change)
The time to run the int3 handler (the side effect on a thread that
hits the code being changed)
The schedstats static key was chosen as the key to being switched on and off.
The reason being is that it is used in more than 56 places, in a hot path. The
change in the schedstats static key will be done with the following command:
while [ true ]; do
sysctl -w kernel.sched_schedstats=1
usleep 500000
sysctl -w kernel.sched_schedstats=0
usleep 500000
done
In this way, they key will be updated twice per second. To force the hit of the
int3 handler, the system will also run a kernel compilation with two jobs per
CPU. The test machine is a two nodes/24 CPUs box with an Intel Xeon processor
@2.27GHz.
Regarding the update part, on average, the regular kernel takes 57 ms to update
the schedstats key, while the kernel with the batch updates takes just 1.4 ms
on average. Although it seems to be too good to be true, it makes sense: the
schedstats key is used in 56 places, so it was expected that it would take
around 56 times to update the keys with the current implementation, as the
IPIs are the most expensive part of the update.
Regarding the int3 handler, the non-batch handler takes 45 ns on average, while
the batch version takes around 180 ns. At first glance, it seems to be a high
value. But it is not, considering that it is doing 56 updates, rather than one!
It is taking four times more, only. This gain is possible because the patch
uses a binary search in the vector: log2(56)=5.8. So, it was expected to have
an overhead within four times.
(voice of tv propaganda) But, that is not all! As the int3 handler keeps on for
a shorter period (because the update part is on for a shorter time), the number
of hits in the int3 handler decreased by 10%.
The question then is: Is it worth paying the price of "135 ns" more in the int3
handler?
Considering that, in this test case, we are saving the handling of 53 IPIs,
that takes more than these 135 ns, it seems to be a meager price to be paid.
Moreover, the test case was forcing the hit of the int3, in practice, it
does not take that often. While the IPI takes place on all CPUs, hitting
the int3 handler or not!
For instance, in an isolated CPU with a process running in user-space
(nohz_full use-case), the chances of hitting the int3 handler is barely zero,
while there is no way to avoid the IPIs. By bounding the IPIs, we are improving
a lot this scenario.
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Wood <swood@redhat.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/acc891dbc2dbc9fd616dd680529a2337b1d1274c.1560325897.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In the batching mode, all the entries of a given key are updated at once.
During the update of a key, a hit in the int3 handler will check if the
hitting code address belongs to one of these keys.
To optimize the search of a given code in the vector of entries being
updated, a binary search is used. The binary search relies on the order
of the entries of a key by its code. Hence the keys need to be sorted
by the code too, so sort the entries of a given key by the code.
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Wood <swood@redhat.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/f57ae83e0592418ba269866bb7ade570fc8632e0.1560325897.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move the check if a jump_entry is valid to a function. No functional
change.
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Wood <swood@redhat.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/56b69bd3f8e644ed64f2dbde7c088030b8cbe76b.1560325897.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull timer fixes from Thomas Gleixner:
"A set of small fixes:
- Repair the ktime_get_coarse() functions so they actually deliver
what they are supposed to: tick granular time stamps. The current
code missed to add the accumulated nanoseconds part of the
timekeeper so the resulting granularity was 1 second.
- Prevent the tracer from infinitely recursing into time getter
functions in the arm architectured timer by marking these functions
notrace
- Fix a trivial compiler warning caused by wrong qualifier ordering"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Repair ktime_get_coarse*() granularity
clocksource/drivers/arm_arch_timer: Don't trace count reader functions
clocksource/drivers/timer-ti-dm: Change to new style declaration
Alexei Starovoitov says:
====================
pull-request: bpf 2019-06-15
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) fix stack layout of JITed x64 bpf code, from Alexei.
2) fix out of bounds memory access in bpf_sk_storage, from Arthur.
3) fix lpm trie walk, from Jonathan.
4) fix nested bpf_perf_event_output, from Matt.
5) and several other fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
BPF_PROG_TYPE_RAW_TRACEPOINTs can be executed nested on the same CPU, as
they do not increment bpf_prog_active while executing.
This enables three levels of nesting, to support
- a kprobe or raw tp or perf event,
- another one of the above that irq context happens to call, and
- another one in nmi context
(at most one of which may be a kprobe or perf event).
Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data")
Signed-off-by: Matt Mullins <mmullins@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- Out of range read of stack trace output
- Fix for NULL pointer dereference in trace_uprobe_create()
- Fix to a livepatching / ftrace permission race in the module code
- Fix for NULL pointer dereference in free_ftrace_func_mapper()
- A couple of build warning clean ups
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXQToxhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qusmAP4/mmJPgsDchnu5ui0wB8BByzJlsPn8
luXFDuqI4f34zgD+JCmeYbj5LLh98D9XkaaEgP4yz3yKsWeSdwWPCU0vTgo=
=M/+E
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Out of range read of stack trace output
- Fix for NULL pointer dereference in trace_uprobe_create()
- Fix to a livepatching / ftrace permission race in the module code
- Fix for NULL pointer dereference in free_ftrace_func_mapper()
- A couple of build warning clean ups
* tag 'trace-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper()
module: Fix livepatch/ftrace module text permissions race
tracing/uprobe: Fix obsolete comment on trace_uprobe_create()
tracing/uprobe: Fix NULL pointer dereference in trace_uprobe_create()
tracing: Make two symbols static
tracing: avoid build warning with HAVE_NOP_MCOUNT
tracing: Fix out-of-range read in trace_stack_print()
stop_machine is the only user left of cpu_relax_yield. Given that it
now has special semantics which are tied to stop_machine introduce a
weak stop_machine_yield function which architectures can override, and
get rid of the generic cpu_relax_yield implementation.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The stop_machine loop to advance the state machine and to wait for all
affected CPUs to check-in calls cpu_relax_yield in a tight loop until
the last missing CPUs acknowledged the state transition.
On a virtual system where not all logical CPUs are backed by real CPUs
all the time it can take a while for all CPUs to check-in. With the
current definition of cpu_relax_yield a diagnose 0x44 is done which
tells the hypervisor to schedule *some* other CPU. That can be any
CPU and not necessarily one of the CPUs that need to run in order to
advance the state machine. This can lead to a pretty bad diagnose 0x44
storm until the last missing CPU finally checked-in.
Replace the undirected cpu_relax_yield based on diagnose 0x44 with a
directed yield. Each CPU in the wait loop will pick up the next CPU
in the cpumask of stop_machine. The diagnose 0x9c is used to tell the
hypervisor to run this next CPU instead of the current one. If there
is only a limited number of real CPUs backing the virtual CPUs we
end up with the real CPUs passed around in a round-robin fashion.
[heiko.carstens@de.ibm.com]:
Use cpumask_next_wrap as suggested by Peter Zijlstra.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Pull cgroup fixes from Tejun Heo:
"This has an unusually high density of tricky fixes:
- task_get_css() could deadlock when it races against a dying cgroup.
- cgroup.procs didn't list thread group leaders with live threads.
This could mislead readers to think that a cgroup is empty when
it's not. Fixed by making PROCS iterator include dead tasks. I made
a couple mistakes making this change and this pull request contains
a couple follow-up patches.
- When cpusets run out of online cpus, it updates cpusmasks of member
tasks in bizarre ways. Joel improved the behavior significantly"
* 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: restore sanity to cpuset_cpus_allowed_fallback()
cgroup: Fix css_task_iter_advance_css_set() cset skip condition
cgroup: css_task_iter_skip()'d iterators must be advanced before accessed
cgroup: Include dying leaders with live threads in PROCS iterations
cgroup: Implement css_task_iter_skip()
cgroup: Call cgroup_release() before __exit_signal()
docs cgroups: add another example size for hugetlb
cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()
Convert proc_dointvec_minmax_bpf_stats() into a more generic
helper, since we are going to use jump labels more often.
Note that sysctl_bpf_stats_enabled is removed, since
it is no longer needed/used.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
.ndo_xdp_xmit() assumes it is called under RCU. For example virtio_net
uses RCU to detect it has setup the resources for tx. The assumption
accidentally broke when introducing bulk queue in devmap.
Fixes: 5d053f9da431 ("bpf: devmap prepare xdp frames for bulking")
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
dev_map_free() waits for flush_needed bitmap to be empty in order to
ensure all flush operations have completed before freeing its entries.
However the corresponding clear_bit() was called before using the
entries, so the entries could be used after free.
All access to the entries needs to be done before clearing the bit.
It seems commit a5e2da6e9787 ("bpf: netdev is never null in
__dev_map_flush") accidentally changed the clear_bit() and memory access
order.
Note that the problem happens only in __dev_map_flush(), not in
dev_map_flush_old(). dev_map_flush_old() is called only after nulling
out the corresponding netdev_map entry, so dev_map_free() never frees
the entry thus no such race happens there.
Fixes: a5e2da6e9787 ("bpf: netdev is never null in __dev_map_flush")
Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Convert the PM documents to ReST, in order to allow them to
build with Sphinx.
The conversion is actually:
- add blank lines and indentation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Commit 0597c49c69d5 ("tracing/uprobes: Use dyn_event framework for
uprobe events") cleaned up the usage of trace_uprobe_create(), and the
function has been no longer used for removing uprobe/uretprobe.
Link: http://lkml.kernel.org/r/20190614074026.8045-2-devel@etsukata.com
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fix sparse warnings:
kernel/trace/trace.c:6927:24: warning:
symbol 'get_tracing_log_err' was not declared. Should it be static?
kernel/trace/trace.c:8196:15: warning:
symbol 'trace_instance_dir' was not declared. Should it be static?
Link: http://lkml.kernel.org/r/20190614153210.24424-1-yuehaibing@huawei.com
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Selecting HAVE_NOP_MCOUNT enables -mnop-mcount (if gcc supports it)
and sets CC_USING_NOP_MCOUNT. Reuse __is_defined (which is suitable for
testing CC_USING_* defines) to avoid conditional compilation and fix
the following gcc 9 warning on s390:
kernel/trace/ftrace.c:2514:1: warning: ‘ftrace_code_disable’ defined
but not used [-Wunused-function]
Link: http://lkml.kernel.org/r/patch.git-1a82d13f33ac.your-ad-here.call-01559732716-ext-6629@work.hours
Fixes: 2f4df0017baed ("tracing: Add -mcount-nop option support")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
In order to prepare to add them to the Kernel API book,
convert the files to ReST format.
The conversion is actually:
- add blank lines and identation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Convert the cgroup-v1 files to ReST format, in order to
allow a later addition to the admin-guide.
The conversion is actually:
- add blank lines and identation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
The inline keyword was not at the beginning of the function declarations.
Fix the following warnings triggered when using W=1:
kernel/time/clocksource.c:108:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
kernel/time/clocksource.c:113:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: trivial@kernel.org
Cc: kernel-janitors@vger.kernel.org
Cc: John Stultz <john.stultz@linaro.org>
Cc: Stephen Boyd <sboyd@kernel.org>
Link: https://lkml.kernel.org/r/20190524103339.28787-1-malat@debian.org
Jason reported that the coarse ktime based time getters advance only once
per second and not once per tick as advertised.
The code reads only the monotonic base time, which advances once per
second. The nanoseconds are accumulated on every tick in xtime_nsec up to
a second and the regular time getters take this nanoseconds offset into
account, but the ktime_get_coarse*() implementation fails to do so.
Add the accumulated xtime_nsec value to the monotonic base time to get the
proper per tick advancing coarse tinme.
Fixes: b9ff604cff11 ("timekeeping: Add ktime_get_coarse_with_offset")
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: Waiman Long <longman@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1906132136280.1791@nanos.tec.linutronix.de
The declaration for pfn_is_nosave is only available in
kernel/power/power.h. Since this function can be override in arch,
expose it globally. Having a prototype will make sure to avoid warning
(sometime treated as error with W=1) such as:
arch/powerpc/kernel/suspend.c:18:5: error: no previous prototype for 'pfn_is_nosave' [-Werror=missing-prototypes]
This moves the declaration into a globally visible header file and add
missing include to avoid a warning on powerpc.
Also remove the duplicated prototypes since not required anymore.
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>