License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side-by-side results of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
   SPDX license identifier                               # files
   ----------------------------------------------------|--------
   GPL-2.0                                                 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note"; otherwise it was "GPL-2.0". The results were:
   SPDX license identifier                               # files
   ----------------------------------------------------|--------
   GPL-2.0 WITH Linux-syscall-note                           930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
   SPDX license identifier                               # files
   ----------------------------------------------------|--------
   GPL-2.0 WITH Linux-syscall-note                           270
   GPL-2.0+ WITH Linux-syscall-note                          169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)        21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)        17
   LGPL-2.1+ WITH Linux-syscall-note                          15
   GPL-1.0+ WITH Linux-syscall-note                           14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)        5
   LGPL-2.0+ WITH Linux-syscall-note                           4
   LGPL-2.1 WITH Linux-syscall-note                            3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)                  3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)                 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 15:07:57 +01:00
/* SPDX-License-Identifier: GPL-2.0 */
2015-04-24 14:56:37 -04:00
#ifndef _GEN_PV_LOCK_SLOWPATH
#error "do not include this file"
#endif

#include <linux/hash.h>
2018-10-30 15:09:49 -07:00
#include <linux/memblock.h>
2015-07-11 21:19:19 -04:00
#include <linux/debug_locks.h>
2015-04-24 14:56:37 -04:00

/*
 * Implement paravirt qspinlocks; the general idea is to halt the vcpus instead
 * of spinning them.
 *
 * This relies on the architecture to provide two paravirt hypercalls:
 *
 *   pv_wait(u8 *ptr, u8 val) -- suspends the vcpu if *ptr == val
 *   pv_kick(cpu)             -- wakes a suspended vcpu
 *
 * Using these we implement __pv_queued_spin_lock_slowpath() and
 * __pv_queued_spin_unlock() to replace native_queued_spin_lock_slowpath() and
 * native_queued_spin_unlock().
 */

#define _Q_SLOW_VAL (3U << _Q_LOCKED_OFFSET)
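The `_Q_SLOW_VAL` definition above places the value 3 in the locked byte of the lock word, so it still reads as "locked" under the locked-byte mask while remaining distinguishable from `_Q_LOCKED_VAL` (1). The sketch below makes that layout concrete. It is a minimal userspace illustration that assumes the `_Q_PENDING_BITS == 8` layout (locked byte in bits 0-7, pending byte in bits 8-15, tail in bits 16-31); the `SK_`-prefixed macros and `sk_locked_byte()` are hypothetical mirrors, not the kernel's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the qspinlock word layout, assuming
 * _Q_PENDING_BITS == 8:
 *   bits  0- 7: locked byte
 *   bits  8-15: pending byte
 *   bits 16-31: tail (queue node index + CPU number)
 */
#define SK_LOCKED_OFFSET	0
#define SK_LOCKED_BITS		8
#define SK_PENDING_OFFSET	8
#define SK_PENDING_BITS		8
#define SK_TAIL_OFFSET		16

#define SK_MASK(off, bits)	(((1U << (bits)) - 1) << (off))
#define SK_LOCKED_MASK		SK_MASK(SK_LOCKED_OFFSET, SK_LOCKED_BITS)
#define SK_PENDING_MASK		SK_MASK(SK_PENDING_OFFSET, SK_PENDING_BITS)
#define SK_TAIL_MASK		(~0U << SK_TAIL_OFFSET)

#define SK_LOCKED_VAL		(1U << SK_LOCKED_OFFSET)
#define SK_PENDING_VAL		(1U << SK_PENDING_OFFSET)
#define SK_SLOW_VAL		(3U << SK_LOCKED_OFFSET)	/* same form as _Q_SLOW_VAL */

/* Extract the locked byte from a full 32-bit lock word. */
static inline uint8_t sk_locked_byte(uint32_t val)
{
	return (uint8_t)((val & SK_LOCKED_MASK) >> SK_LOCKED_OFFSET);
}
```

Because the slow value lives entirely in the locked byte, the pending and tail fields are unaffected by it.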

2015-11-09 19:09:27 -05:00
/*
 * Queue Node Adaptive Spinning
 *
 * A queue node vCPU will stop spinning if the vCPU in the previous node is
 * not running. The one lock stealing attempt allowed at slowpath entry
 * mitigates the slight slowdown for non-overcommitted guests with this
 * aggressive wait-early mechanism.
 *
 * The status of the previous node will be checked at a fixed interval
 * controlled by PV_PREV_CHECK_MASK. This is to ensure that we won't
 * pound on the cacheline of the previous node too heavily.
 */
#define PV_PREV_CHECK_MASK 0xff
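The fixed check interval described in the comment above can be illustrated with a small, single-threaded model: the previous node's state is read only when the low 8 bits of the loop counter are zero, that is, once every 256 iterations. The function `sk_count_prev_checks()` is hypothetical and models only this rate limiting, not the kernel's actual wait-early decision or the direction in which its loop counter runs.

```c
#include <assert.h>

#define SK_PREV_CHECK_MASK	0xff	/* same value as PV_PREV_CHECK_MASK */

/*
 * Hypothetical model: spin for `loops` iterations and count how often the
 * previous node's state would be read. A read happens only when
 * (loop & SK_PREV_CHECK_MASK) == 0, so the previous node's cacheline is
 * touched once per 256 iterations instead of on every iteration.
 */
static unsigned int sk_count_prev_checks(unsigned int loops)
{
	unsigned int loop, checks = 0;

	for (loop = 0; loop < loops; loop++) {
		if ((loop & SK_PREV_CHECK_MASK) == 0)
			checks++;	/* the real code would read prev->state here */
	}
	return checks;
}
```

For example, 1000 iterations produce reads at loop values 0, 256, 512 and 768 only.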

2015-07-11 16:36:52 -04:00
/*
 * Queue node uses: vcpu_running & vcpu_halted.
 * Queue head uses: vcpu_running & vcpu_hashed.
 */
2015-04-24 14:56:37 -04:00
enum vcpu_state {
	vcpu_running = 0,
2015-07-11 16:36:52 -04:00
	vcpu_halted,		/* Used only in pv_wait_node */
	vcpu_hashed,		/* = pv_hash'ed + vcpu_halted */
2015-04-24 14:56:37 -04:00
};

struct pv_node {
	struct mcs_spinlock	mcs;
	int			cpu;
	u8			state;
};
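The state comments above say a queue node moves between vcpu_running and vcpu_halted, while the queue head uses vcpu_hashed. One way to make a halted-to-hashed transition race-free is a compare-and-swap that only succeeds if the node is still halted, so a concurrent wake-up that already reset the state is not overwritten. The C11 sketch below illustrates that idea; `struct sk_node` and `sk_mark_hashed()` are hypothetical stand-ins, not the kernel's `struct pv_node` code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror of enum vcpu_state above. */
enum sk_vcpu_state {
	sk_vcpu_running = 0,
	sk_vcpu_halted,
	sk_vcpu_hashed,
};

/* Hypothetical, trimmed-down stand-in for struct pv_node. */
struct sk_node {
	int cpu;
	_Atomic uint8_t state;
};

/*
 * Move a node from halted to hashed only if it is still halted. If the
 * state was already changed (e.g. back to running), the CAS fails and the
 * state is left untouched.
 */
static bool sk_mark_hashed(struct sk_node *pn)
{
	uint8_t expected = sk_vcpu_halted;

	return atomic_compare_exchange_strong(&pn->state, &expected,
					      (uint8_t)sk_vcpu_hashed);
}
```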

2015-11-10 16:18:56 -05:00
/*
locking/pvqspinlock: Implement hybrid PV queued/unfair locks
Currently, all the lock waiters entering the slowpath will do one
lock stealing attempt to acquire the lock. That helps performance,
especially in VMs with over-committed vCPUs. However, the current
pvqspinlocks still don't perform as well as unfair locks in many cases.
On the other hand, unfair locks do have the problem of lock starvation
that pvqspinlocks don't have.
This patch combines the best attributes of an unfair lock and a
pvqspinlock into a hybrid lock with 2 modes - queued mode & unfair
mode. A lock waiter goes into the unfair mode when there are waiters
in the wait queue but the pending bit isn't set. Otherwise, it will
go into the queued mode waiting in the queue for its turn.
On a 2-socket 36-core E5-2699 v3 system (HT off), a kernel build
(make -j<n>) was done in a VM with unpinned vCPUs 3 times with the
best time selected and <n> is the number of vCPUs available. The build
times of the original pvqspinlock, hybrid pvqspinlock and unfair lock
with various numbers of vCPUs are as follows:
  vCPUs    pvqlock    hybrid pvqlock    unfair lock
  -----    -------    --------------    -----------
     30     342.1s            329.1s         329.1s
     36     314.1s            305.3s         307.3s
     45     345.0s            302.1s         306.6s
     54     365.4s            308.6s         307.8s
     72     358.9s            293.6s         303.9s
    108     343.0s            285.9s         304.2s
The hybrid pvqspinlock performs better than or comparably to the unfair
lock.
By turning on QUEUED_LOCK_STAT, the table below shows the number
of lock acquisitions in unfair mode and queue mode after a kernel
build with various numbers of vCPUs.
  vCPUs    queued mode    unfair mode
  -----    -----------    -----------
     30      9,130,518        294,954
     36     10,856,614        386,809
     45      8,467,264     11,475,373
     54      6,409,987     19,670,855
     72      4,782,063     25,712,180
It can be seen that as the VM became more and more over-committed,
the ratio of locks acquired in unfair mode increases. This is all
done automatically to get the best overall performance possible.
Using a kernel locking microbenchmark with the number of locking
threads equal to the number of vCPUs available on the same machine,
the minimum, average and maximum (min/avg/max) numbers of locking
operations done per thread in a 5-second testing interval are shown
below:
  vCPUs    hybrid pvqlock             unfair lock
  -----    -----------------------    -----------------------
     36    822,135/881,063/950,363    75,570/313,496/ 690,465
     54    542,435/581,664/625,937    35,460/204,280/ 457,172
     72    397,500/428,177/499,299    17,933/150,679/ 708,001
    108    257,898/288,150/340,871    3,085/181,176/1,257,109
It can be seen that the hybrid pvqspinlocks are more fair and
performant than the unfair locks in this test.
The table below shows the kernel build times on a smaller 2-socket
16-core 32-thread E5-2620 v4 system.
  vCPUs    pvqlock    hybrid pvqlock    unfair lock
  -----    -------    --------------    -----------
     16     436.8s            433.4s         435.6s
     36     366.2s            364.8s         364.5s
     48     423.6s            376.3s         370.2s
     64     433.1s            376.6s         376.8s
Again, the performance of the hybrid pvqspinlock was comparable to
that of the unfair lock.
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Eduardo Valentin <eduval@amazon.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1510089486-3466-1-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-07 16:18:06 -05:00
 * Hybrid PV queued/unfair lock
 *
2015-11-10 16:18:56 -05:00
 * By replacing the regular queued_spin_trylock() with the function below,
 * it will be called once when a lock waiter enters the PV slowpath before
2017-11-07 16:18:06 -05:00
 * being queued.
 *
 * The pending bit is set by the queue head vCPU of the MCS wait queue in
 * pv_wait_head_or_lock() to signal that it is ready to spin on the lock.
 * When that bit becomes visible to the incoming waiters, no lock stealing
 * is allowed. The function will return immediately to make the waiters
 * enter the MCS wait queue. So lock starvation shouldn't happen as long
 * as the queued mode vCPUs are actively running to set the pending bit
 * and hence disable lock stealing.
 *
 * When the pending bit isn't set, the lock waiters will stay in the unfair
 * mode spinning on the lock unless the MCS wait queue is empty. In this
 * case, the lock waiters will enter the queued mode slowpath trying to
 * become the queue head and set the pending bit.
 *
 * This hybrid PV queued/unfair lock combines the best attributes of a
 * queued lock (no lock starvation) and an unfair lock (good performance
 * on not heavily contended locks).
2015-11-10 16:18:56 -05:00
 */
2017-11-07 16:18:06 -05:00
#define queued_spin_trylock(l)	pv_hybrid_queued_unfair_trylock(l)
static inline bool pv_hybrid_queued_unfair_trylock(struct qspinlock *lock)
2015-11-10 16:18:56 -05:00
{
2017-11-07 16:18:06 -05:00
	/*
	 * Stay in unfair lock mode as long as queued mode waiters are
	 * present in the MCS wait queue but the pending bit isn't set.
	 */
	for (;;) {
		int val = atomic_read(&lock->val);

		if (!(val & _Q_LOCKED_PENDING_MASK) &&
2018-04-26 11:34:16 +01:00
		   (cmpxchg_acquire(&lock->locked, 0, _Q_LOCKED_VAL) == 0)) {
2017-11-07 16:18:06 -05:00
			qstat_inc(qstat_pv_lock_stealing, true);
			return true;
		}
		if (!(val & _Q_TAIL_MASK) || (val & _Q_PENDING_MASK))
			break;

		cpu_relax();
2016-07-14 14:26:11 +02:00
	}

	return false;
2015-11-10 16:18:56 -05:00
}
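The decision logic of `pv_hybrid_queued_unfair_trylock()` can be modeled in userspace with C11 atomics. This is a sketch under stated assumptions: the lock word uses the `_Q_PENDING_BITS == 8` layout, and because portable C cannot do a byte-wide cmpxchg on part of a 32-bit atomic, the steal is done with a whole-word compare-and-swap (which, unlike the kernel's `cmpxchg_acquire(&lock->locked, ...)`, also fails if the tail changes concurrently). All `sk_`/`SK_` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout mirror (_Q_PENDING_BITS == 8). */
#define SK_LOCKED_MASK		0x000000ffU
#define SK_PENDING_MASK		0x0000ff00U
#define SK_TAIL_MASK		0xffff0000U
#define SK_LOCKED_PENDING_MASK	(SK_LOCKED_MASK | SK_PENDING_MASK)
#define SK_LOCKED_VAL		0x00000001U
#define SK_PENDING_VAL		0x00000100U

struct sk_qspinlock {
	_Atomic uint32_t val;
};

/*
 * Sketch of the hybrid queued/unfair trylock:
 *  - steal the lock while it is neither locked nor pending;
 *  - keep spinning (unfair mode) only while there are queued waiters
 *    (tail != 0) and the queue head has not set the pending bit;
 *  - otherwise give up so the caller joins the MCS queue (queued mode).
 */
static bool sk_hybrid_trylock(struct sk_qspinlock *lock)
{
	for (;;) {
		uint32_t val = atomic_load_explicit(&lock->val,
						    memory_order_relaxed);

		/* On CAS failure, val is refreshed with the current word. */
		if (!(val & SK_LOCKED_PENDING_MASK) &&
		    atomic_compare_exchange_strong_explicit(&lock->val, &val,
				val | SK_LOCKED_VAL,
				memory_order_acquire, memory_order_relaxed))
			return true;

		if (!(val & SK_TAIL_MASK) || (val & SK_PENDING_MASK))
			break;
		/* The kernel version calls cpu_relax() here. */
	}
	return false;
}
```

Note that a word which is locked, has queued waiters and has no pending bit keeps the loop spinning, just as the kernel function keeps spinning until another CPU changes the lock word; a single-threaded test should therefore avoid that state.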

/*
 * The pending bit is used by the queue head vCPU to indicate that it
 * is actively spinning on the lock and no lock stealing is allowed.
 */
#if _Q_PENDING_BITS == 8
static __always_inline void set_pending(struct qspinlock *lock)
{
2018-04-26 11:34:16 +01:00
	WRITE_ONCE(lock->pending, 1);
2015-11-10 16:18:56 -05:00
}

/*
 * The pending bit check in pv_hybrid_queued_unfair_trylock() isn't a memory
2017-08-14 16:07:02 -04:00
 * barrier. Therefore, an atomic cmpxchg_acquire() is used to acquire the
 * lock just to be sure that it will get it.
2015-11-10 16:18:56 -05:00
 */
static __always_inline int trylock_clear_pending(struct qspinlock *lock)
{
2018-04-26 11:34:16 +01:00
	return !READ_ONCE(lock->locked) &&
	       (cmpxchg_acquire(&lock->locked_pending, _Q_PENDING_VAL,
2017-08-14 16:07:02 -04:00
				_Q_LOCKED_VAL) == _Q_PENDING_VAL);
2015-11-10 16:18:56 -05:00
}
#else /* _Q_PENDING_BITS == 8 */
static __always_inline void set_pending(struct qspinlock *lock)
{
2016-04-18 01:01:27 +02:00
	atomic_or(_Q_PENDING_VAL, &lock->val);
2015-11-10 16:18:56 -05:00
}

static __always_inline int trylock_clear_pending(struct qspinlock *lock)
{
	int val = atomic_read(&lock->val);

	for (;;) {
		int old, new;

		if (val & _Q_LOCKED_MASK)
			break;

		/*
		 * Try to clear pending bit & set locked bit
		 */
		old = val;
		new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL;
2017-08-14 16:07:02 -04:00
		val = atomic_cmpxchg_acquire(&lock->val, old, new);
2015-11-10 16:18:56 -05:00

		if (val == old)
			return 1;
	}
	return 0;
}
#endif /* _Q_PENDING_BITS == 8 */
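Both `trylock_clear_pending()` variants aim at the same transition: provided the lock is not currently held, clear the pending bit and set the locked byte in a single atomic step. The C11 sketch below models the generic compare-and-swap loop from the `#else` branch. The mask values and the `sk_trylock_clear_pending()` name are hypothetical and chosen only for illustration; the kernel derives its masks from its own layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative mask values (assumed, not the kernel's definitions). */
#define SK_LOCKED_MASK	0x000000ffU
#define SK_PENDING_MASK	0x00000100U
#define SK_LOCKED_VAL	0x00000001U

/*
 * Sketch of the generic trylock_clear_pending(): retry the compare-and-swap
 * until it either succeeds or observes that the lock is already held.
 * Returns 1 if the caller now owns the lock, 0 otherwise.
 */
static int sk_trylock_clear_pending(_Atomic uint32_t *lockval)
{
	uint32_t val = atomic_load_explicit(lockval, memory_order_relaxed);

	for (;;) {
		uint32_t new;

		if (val & SK_LOCKED_MASK)
			return 0;

		/* Clear pending, set locked; tail bits are preserved. */
		new = (val & ~SK_PENDING_MASK) | SK_LOCKED_VAL;

		/* On failure (including spurious ones), val is refreshed. */
		if (atomic_compare_exchange_weak_explicit(lockval, &val, new,
				memory_order_acquire, memory_order_relaxed))
			return 1;
	}
}
```

A weak compare-and-swap is acceptable here because the surrounding loop already retries on failure.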

2015-04-24 14:56:37 -04:00
/*
 * Lock and MCS node addresses hash table for fast lookup
 *
 * Hashing is done on a per-cacheline basis to minimize the need to access
 * more than one cacheline.
 *
 * Dynamically allocate a hash table big enough to hold at least 4X the
 * number of possible cpus in the system. Allocation is done on page
 * granularity. So the minimum number of hash buckets should be at least
 * 256 (64-bit) or 512 (32-bit) to fully utilize a 4k page.
 *
 * Since we should not be holding locks from NMI context (very rare indeed),
 * at most 3 of the 4 per-CPU nesting levels (task, softirq, hardirq, NMI)
 * are normally occupied, so the max load factor is 0.75, which is around
 * the point where open addressing breaks down.
 *
 */
struct pv_hash_entry {
	struct qspinlock *lock;
	struct pv_node   *node;
};

#define PV_HE_PER_LINE	(SMP_CACHE_BYTES / sizeof(struct pv_hash_entry))
#define PV_HE_MIN	(PAGE_SIZE / sizeof(struct pv_hash_entry))

static struct pv_hash_entry *pv_lock_hash;
static unsigned int pv_lock_hash_bits __read_mostly;

/*
 * Allocate memory for the PV qspinlock hash buckets
 *
 * This function should be called from the paravirt spinlock initialization
 * routine.
 */
void __init __pv_init_lock_hash(void)
{
	int pv_hash_size = ALIGN(4 * num_possible_cpus(), PV_HE_PER_LINE);

	if (pv_hash_size < PV_HE_MIN)
		pv_hash_size = PV_HE_MIN;

	/*
	 * Allocate space from bootmem which should be page-size aligned
	 * and hence cacheline aligned.
	 */
	pv_lock_hash = alloc_large_system_hash("PV qspinlock",
					       sizeof(struct pv_hash_entry),
2017-07-06 15:39:11 -07:00
					       pv_hash_size, 0,
					       HASH_EARLY | HASH_ZERO,
2015-04-24 14:56:37 -04:00
					       &pv_lock_hash_bits, NULL,
					       pv_hash_size, pv_hash_size);
}
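The sizing rule in __pv_init_lock_hash() is simple arithmetic. It asks for 4 entries per possible CPU, rounds that up to whole cachelines, and never asks for less than one page worth of entries. The sketch below reproduces only that requested count. The cacheline size, page size, and entry size are parameters of this sketch, not values read from any real system. The final table size is chosen by alloc_large_system_hash() and recorded as a power of two in pv_lock_hash_bits. It is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of a (a must be a power of two), like ALIGN(). */
static size_t sk_align(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Requested number of hash entries: 4 per CPU, rounded up to full
 * cachelines, with a floor of one page of entries (PV_HE_MIN).
 */
static size_t sk_pv_hash_size(size_t ncpus, size_t cache_bytes,
			      size_t page_size, size_t entry_size)
{
	size_t per_line = cache_bytes / entry_size;	/* PV_HE_PER_LINE */
	size_t min = page_size / entry_size;		/* PV_HE_MIN */
	size_t size = sk_align(4 * ncpus, per_line);

	return size < min ? min : size;
}
```

With 64-byte cachelines, 4 KiB pages and 16-byte entries (two pointers on a 64-bit build), 8 CPUs hit the 256-entry floor. 100 CPUs request 400 entries.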

#define for_each_hash_entry(he, offset, hash)	\
	for (hash &= ~(PV_HE_PER_LINE - 1), he = &pv_lock_hash[hash], offset = 0;	\
	     offset < (1 << pv_lock_hash_bits);	\
	     offset++, he = &pv_lock_hash[(hash + offset) & ((1 << pv_lock_hash_bits) - 1)])

static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node)
{
	unsigned long offset, hash = hash_ptr(lock, pv_lock_hash_bits);
	struct pv_hash_entry *he;
	int hopcnt = 0;

	for_each_hash_entry(he, offset, hash) {
		hopcnt++;
		if (!cmpxchg(&he->lock, NULL, lock)) {
			WRITE_ONCE(he->node, node);
			qstat_hop(hopcnt);
			return &he->lock;
		}
	}
	/*
	 * Hard assume there is a free entry for us.
	 *
	 * This is guaranteed by ensuring every blocked lock only ever consumes
	 * a single entry, and since we only have 4 nesting levels per CPU
	 * and allocated 4*num_possible_cpus(), this must be so.
	 *
	 * The single entry is guaranteed by having the lock owner unhash
	 * before it releases.
	 */
	BUG();
}

static struct pv_node *pv_unhash(struct qspinlock *lock)
{
	unsigned long offset, hash = hash_ptr(lock, pv_lock_hash_bits);
	struct pv_hash_entry *he;
	struct pv_node *node;

	for_each_hash_entry(he, offset, hash) {
		if (READ_ONCE(he->lock) == lock) {
			node = READ_ONCE(he->node);
			WRITE_ONCE(he->lock, NULL);
			return node;
		}
	}
	/*
	 * Hard assume we'll find an entry.
	 *
	 * This guarantees a limited lookup time and is itself guaranteed by
	 * having the lock owner do the unhash -- IFF the unlock sees the
	 * SLOW flag, there MUST be a hash entry.
	 */
	BUG();
}
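pv_hash() and pv_unhash() implement open addressing with linear probing. The probe starts at the first entry of the cacheline that the hash falls into, and it wraps around the power-of-two table. Insertion claims a free slot with a compare-and-swap on the key. The following is a minimal user-space sketch of the same probe sequence with C11 atomics. `sk_hash()` is a hypothetical stand-in for hash_ptr(). The 16-entry table with 4 entries per "cacheline" is an illustrative assumption. The sketch returns NULL when the table is full, whereas the kernel code BUG()s.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch parameters (hypothetical): 16 entries, 4 entries per cacheline. */
#define SK_HASH_BITS	4
#define SK_HASH_SIZE	(1U << SK_HASH_BITS)
#define SK_PER_LINE	4U

struct sk_entry {
	_Atomic(void *) lock;	/* key: lock address, NULL when the slot is free */
	void *node;		/* value: the waiter's node */
};

static struct sk_entry sk_table[SK_HASH_SIZE];

/* Stand-in for hash_ptr(): multiplicative hash keeping the top bits. */
static unsigned int sk_hash(const void *p)
{
	uint64_t x = (uint64_t)(uintptr_t)p * 0x9E3779B97F4A7C15ULL;

	return (unsigned int)(x >> (64 - SK_HASH_BITS));
}

/* Claim the first free slot, probing from the cacheline-aligned bucket. */
static struct sk_entry *sk_hash_insert(void *lock, void *node)
{
	unsigned int start = sk_hash(lock) & ~(SK_PER_LINE - 1);

	for (unsigned int off = 0; off < SK_HASH_SIZE; off++) {
		struct sk_entry *he = &sk_table[(start + off) & (SK_HASH_SIZE - 1)];
		void *expected = NULL;

		if (atomic_compare_exchange_strong(&he->lock, &expected, lock)) {
			he->node = node;
			return he;
		}
	}
	return NULL;
}

/* Find the entry for @lock, free the slot, and return its node. */
static void *sk_hash_remove(void *lock)
{
	unsigned int start = sk_hash(lock) & ~(SK_PER_LINE - 1);

	for (unsigned int off = 0; off < SK_HASH_SIZE; off++) {
		struct sk_entry *he = &sk_table[(start + off) & (SK_HASH_SIZE - 1)];

		if (atomic_load(&he->lock) == lock) {
			void *node = he->node;

			atomic_store(&he->lock, NULL);
			return node;
		}
	}
	return NULL;
}
```

Aligning the start of the probe to a cacheline means that the first few probes usually touch a single cacheline, which is the point of the "per-cacheline" hashing described in the comment above.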

/*
 * Return true when it is time to check the previous node and that node is
 * not in a running state or its vCPU has been preempted.
 */
static inline bool
pv_wait_early(struct pv_node *prev, int loop)
{
	if ((loop & PV_PREV_CHECK_MASK) != 0)
		return false;

	return READ_ONCE(prev->state) != vcpu_running || vcpu_is_preempted(prev->cpu);
}
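pv_wait_early() rate-limits the look at the predecessor. It only checks on spin iterations whose low bits (selected by PV_PREV_CHECK_MASK) are all zero, so the check runs once every (mask + 1) iterations. The sketch below shows that gating in isolation. `SK_PREV_CHECK_MASK = 0xff` is a hypothetical value. The two boolean inputs stand in for the `prev->state` read and vcpu_is_preempted().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for PV_PREV_CHECK_MASK: check every 256 spins. */
#define SK_PREV_CHECK_MASK 0xff

/*
 * Same shape as pv_wait_early(): cheap early-out on most iterations, and
 * only on masked iterations decide whether to stop spinning early.
 */
static bool sk_wait_early(int loop, bool prev_running, bool prev_preempted)
{
	if ((loop & SK_PREV_CHECK_MASK) != 0)
		return false;
	return !prev_running || prev_preempted;
}

/* Count how many iterations of loop = threshold..1 would perform the check. */
static int sk_count_checks(int threshold)
{
	int checks = 0;

	for (int loop = threshold; loop; loop--)
		if ((loop & SK_PREV_CHECK_MASK) == 0)
			checks++;
	return checks;
}
```

With an illustrative threshold of 1 << 15 iterations, the predecessor is examined 128 times rather than on every spin.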

/*
 * Initialize the PV part of the mcs_spinlock node.
 */
static void pv_init_node(struct mcs_spinlock *node)
{
	struct pv_node *pn = (struct pv_node *)node;

	BUILD_BUG_ON(sizeof(struct pv_node) > sizeof(struct qnode));

	pn->cpu = smp_processor_id();
	pn->state = vcpu_running;
}

/*
 * Wait for node->locked to become true, halt the vcpu after a short spin.
 * pv_kick_node() is used to set _Q_SLOW_VAL and fill in hash table on its
 * behalf.
 */
static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
{
	struct pv_node *pn = (struct pv_node *)node;
	struct pv_node *pp = (struct pv_node *)prev;
	int loop;
	bool wait_early;

	for (;;) {
		for (wait_early = false, loop = SPIN_THRESHOLD; loop; loop--) {
			if (READ_ONCE(node->locked))
				return;
			if (pv_wait_early(pp, loop)) {
				wait_early = true;
				break;
			}
			cpu_relax();
		}

		/*
		 * Order pn->state vs pn->locked thusly:
		 *
		 * [S] pn->state = vcpu_halted    [S] next->locked = 1
		 *     MB                             MB
		 * [L] pn->locked               [RmW] pn->state = vcpu_hashed
		 *
		 * Matches the cmpxchg() from pv_kick_node().
		 */
		smp_store_mb(pn->state, vcpu_halted);

		if (!READ_ONCE(node->locked)) {
			qstat_inc(qstat_pv_wait_node, true);
			qstat_inc(qstat_pv_wait_early, wait_early);
			pv_wait(&pn->state, vcpu_halted);
		}

		/*
		 * If pv_kick_node() changed us to vcpu_hashed, retain that
		 * value so that pv_wait_head_or_lock() knows to not also try
		 * to hash this lock.
		 */
		cmpxchg(&pn->state, vcpu_halted, vcpu_running);

		/*
		 * If the locked flag is still not set after wakeup, it is a
		 * spurious wakeup and the vCPU should wait again. However,
		 * there is a pretty high overhead for CPU halting and kicking.
		 * So it is better to spin for a while in the hope that the
		 * MCS lock will be released soon.
		 */
		qstat_inc(qstat_pv_spurious_wakeup, !READ_ONCE(node->locked));
	}

	/*
	 * By now our node->locked should be 1 and our caller will not actually
	 * spin-wait for it. We do however rely on our caller to do a
	 * load-acquire for us.
	 */
}
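The ordering diagram in pv_wait_node() is a store-buffering handshake. The waiter publishes "halted" and then re-reads `locked`. The kicker publishes `locked` and then tries to move the waiter from "halted" to "hashed". With a full barrier on both sides, at least one party sees the other's store, so a waiter cannot halt after missing a lock hand-off that the kicker also missed. Below is a single-threaded sketch of the three steps with C11 sequentially consistent atomics standing in for smp_store_mb() and the barrier plus cmpxchg(). The `SK_*` state names and `struct sk_node` are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-ins for vcpu_running / vcpu_halted / vcpu_hashed. */
enum sk_state { SK_RUNNING, SK_HALTED, SK_HASHED };

struct sk_node {
	atomic_int state;
	atomic_int locked;
};

static void sk_node_init(struct sk_node *n)
{
	atomic_init(&n->state, SK_RUNNING);
	atomic_init(&n->locked, 0);
}

/*
 * Waiter, after the spin phase gives up: publish "halted" (seq_cst store,
 * the role of smp_store_mb()), then re-check locked. Returns true if the
 * waiter should actually halt.
 */
static bool sk_waiter_prepare_halt(struct sk_node *n)
{
	atomic_store_explicit(&n->state, SK_HALTED, memory_order_seq_cst);
	return !atomic_load_explicit(&n->locked, memory_order_seq_cst);
}

/*
 * Kicker: hand over the MCS lock, then try to advance a halted waiter
 * straight to "hashed" instead of waking it. Returns true on advance.
 */
static bool sk_kick_node(struct sk_node *n)
{
	int expected = SK_HALTED;

	atomic_store_explicit(&n->locked, 1, memory_order_seq_cst);
	return atomic_compare_exchange_strong(&n->state, &expected, SK_HASHED);
}

/* Waiter, after waking: return to "running" unless already "hashed". */
static void sk_waiter_after_wake(struct sk_node *n)
{
	int expected = SK_HALTED;

	atomic_compare_exchange_strong(&n->state, &expected, SK_RUNNING);
}
```

If the kicker runs while the waiter is still spinning, the CAS fails, and the waiter notices `locked` on its own. This matches the "If OTOH this fails" case described in pv_kick_node().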

/*
 * Called after setting next->locked = 1 when we're the lock owner.
 *
 * Instead of waking the waiters stuck in pv_wait_node(), advance their state
 * such that they're waiting in pv_wait_head_or_lock(); this avoids a
 * wake/sleep cycle.
 */
static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
{
	struct pv_node *pn = (struct pv_node *)node;

	/*
	 * If the vCPU is indeed halted, advance its state to match that of
	 * pv_wait_node(). If OTOH this fails, the vCPU was running and will
	 * observe its next->locked value and advance itself.
	 *
	 * Matches with smp_store_mb() and cmpxchg() in pv_wait_node().
	 *
	 * The write to next->locked in arch_mcs_spin_unlock_contended()
	 * must be ordered before the read of pn->state in the cmpxchg()
	 * below for the code to work correctly. To guarantee full ordering
	 * irrespective of the success or failure of the cmpxchg(),
	 * a relaxed version with explicit barrier is used. The control
	 * dependency will order the reading of pn->state before any
	 * subsequent writes.
	 */
	smp_mb__before_atomic();
	if (cmpxchg_relaxed(&pn->state, vcpu_halted, vcpu_hashed)
	    != vcpu_halted)
		return;

	/*
	 * Put the lock into the hash table and set the _Q_SLOW_VAL.
	 *
	 * As this is the same vCPU that will check the _Q_SLOW_VAL value and
	 * the hash table later on at unlock time, no atomic instruction is
	 * needed.
	 */
	WRITE_ONCE(lock->locked, _Q_SLOW_VAL);
	(void)pv_hash(lock, pn);
}
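The comment in pv_kick_node() explains why a relaxed cmpxchg is preceded by an explicit barrier. A full barrier orders the earlier `next->locked` store before the read of `pn->state` whether the cmpxchg succeeds or fails. A value-returning cmpxchg would only guarantee full ordering on success. The following is a rough C11 analogue, assuming a seq_cst fence as the stand-in for smp_mb__before_atomic() and a plain store for the owner's `_Q_SLOW_VAL` write. `SK_SLOW_VAL = 3` is assumed to mirror the kernel's `_Q_SLOW_VAL`. The pv_hash() call is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

enum { SK_RUNNING, SK_HALTED, SK_HASHED };	/* assumed state names */
#define SK_LOCKED_VAL	1U	/* assumed to mirror _Q_LOCKED_VAL */
#define SK_SLOW_VAL	3U	/* assumed to mirror _Q_SLOW_VAL */

struct sk_lock { _Atomic uint8_t locked; };
struct sk_pv_node { atomic_int state; };

/*
 * Full fence, then a relaxed CAS: the fence provides the ordering whether
 * or not the CAS succeeds. On success the caller, who still owns the lock,
 * marks it slow with a plain store (no RmW needed).
 */
static bool sk_kick_node(struct sk_lock *lock, struct sk_pv_node *pn)
{
	int expected = SK_HALTED;

	atomic_thread_fence(memory_order_seq_cst);
	if (!atomic_compare_exchange_strong_explicit(&pn->state, &expected,
			SK_HASHED, memory_order_relaxed, memory_order_relaxed))
		return false;	/* waiter was running; it will see locked itself */

	atomic_store_explicit(&lock->locked, SK_SLOW_VAL, memory_order_relaxed);
	/* The kernel also calls pv_hash(lock, pn) here; omitted in this sketch. */
	return true;
}
```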

/*
 * Wait for l->locked to become clear and acquire the lock;
 * halt the vcpu after a short spin.
 * __pv_queued_spin_unlock() will wake us.
 *
 * The current value of the lock will be returned for additional processing.
 */
static u32
pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
{
	struct pv_node *pn = (struct pv_node *)node;
	struct qspinlock **lp = NULL;
	int waitcnt = 0;
	int loop;

	/*
	 * If pv_kick_node() already advanced our state, we don't need to
	 * insert ourselves into the hash table anymore.
	 */
	if (READ_ONCE(pn->state) == vcpu_hashed)
		lp = (struct qspinlock **)1;

	/*
	 * Tracking # of slowpath locking operations
	 */
	qstat_inc(qstat_lock_slowpath, true);

	for (;; waitcnt++) {
		/*
		 * Set correct vCPU state to be used by queue node wait-early
		 * mechanism.
		 */
		WRITE_ONCE(pn->state, vcpu_running);

		/*
		 * Set the pending bit in the active lock spinning loop to
		 * disable lock stealing before attempting to acquire the lock.
		 */
		set_pending(lock);
		for (loop = SPIN_THRESHOLD; loop; loop--) {
			if (trylock_clear_pending(lock))
				goto gotlock;
			cpu_relax();
		}
		clear_pending(lock);

		if (!lp) { /* ONCE */
			lp = pv_hash(lock, pn);

			/*
			 * We must hash before setting _Q_SLOW_VAL, such that
			 * when we observe _Q_SLOW_VAL in __pv_queued_spin_unlock()
			 * we'll be sure to be able to observe our hash entry.
			 *
			 * [S] <hash>                   [RmW] l->locked == _Q_SLOW_VAL
			 *     MB                             RMB
			 * [RmW] l->locked = _Q_SLOW_VAL  [L] <unhash>
			 *
			 * Matches the smp_rmb() in __pv_queued_spin_unlock_slowpath().
			 */
			if (xchg(&lock->locked, _Q_SLOW_VAL) == 0) {
				/*
				 * The lock was free and now we own the lock.
				 * Change the lock value back to _Q_LOCKED_VAL
				 * and unhash the table.
				 */
				WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
				WRITE_ONCE(*lp, NULL);
				goto gotlock;
			}
		}
		WRITE_ONCE(pn->state, vcpu_hashed);
		qstat_inc(qstat_pv_wait_head, true);
		qstat_inc(qstat_pv_wait_again, waitcnt);
		pv_wait(&lock->locked, _Q_SLOW_VAL);

		/*
		 * Because of lock stealing, the queue head vCPU may not be
		 * able to acquire the lock before it has to wait again.
		 */
	}

	/*
	 * The cmpxchg() or xchg() call before coming here provides the
	 * acquire semantics for locking. The dummy ORing of _Q_LOCKED_VAL
	 * here is to indicate to the compiler that the value will always
	 * be nonzero to enable better code optimization.
	 */
gotlock:
	return (u32)(atomic_read(&lock->val) | _Q_LOCKED_VAL);
}

/*
 * PV versions of the unlock fastpath and slowpath functions to be used
 * instead of queued_spin_unlock().
 */
__visible void
__pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
{
	struct pv_node *node;

	if (unlikely(locked != _Q_SLOW_VAL)) {
		WARN(!debug_locks_silent,
		     "pvqspinlock: lock 0x%lx has corrupted value 0x%x!\n",
		     (unsigned long)lock, atomic_read(&lock->val));
		return;
	}

	/*
	 * A failed cmpxchg doesn't provide any memory-ordering guarantees,
	 * so we need a barrier to order the read of the node data in
	 * pv_unhash *after* we've read the lock being _Q_SLOW_VAL.
	 *
	 * Matches the xchg() in pv_wait_head_or_lock() setting _Q_SLOW_VAL.
	 */
	smp_rmb();

	/*
	 * Since the fastpath cmpxchg failed to release the lock, this must be
	 * the SLOW path. Therefore start by looking up the blocked node and
	 * unhashing it.
	 */
	node = pv_unhash(lock);

	/*
	 * Now that we have a reference to the (likely) blocked pv_node,
	 * release the lock.
	 */
	smp_store_release(&lock->locked, 0);

	/*
	 * At this point the memory pointed at by lock can be freed/reused,
	 * however we can still use the pv_node to kick the CPU.
	 * The other vCPU may not really be halted, but kicking an active
	 * vCPU is harmless other than the additional latency in completing
	 * the unlock.
	 */
	qstat_inc(qstat_pv_kick_unlock, true);
	pv_kick(node->cpu);
}

/*
 * Include the architecture-specific callee-save thunk of
 * __pv_queued_spin_unlock(). This thunk is put together with
 * __pv_queued_spin_unlock() so that the callee-save thunk and the real unlock
 * function sit close to each other, sharing consecutive instruction
 * cachelines. Alternatively, an architecture-specific version of
 * __pv_queued_spin_unlock() can be defined.
 */
#include <asm/qspinlock_paravirt.h>

#ifndef __pv_queued_spin_unlock
__visible void __pv_queued_spin_unlock(struct qspinlock *lock)
{
	u8 locked;

	/*
	 * We must not unlock if SLOW, because in that case we must first
	 * unhash. Otherwise it would be possible to have multiple @lock
	 * entries, which would be BAD.
	 */
	locked = cmpxchg_release(&lock->locked, _Q_LOCKED_VAL, 0);
	if (likely(locked == _Q_LOCKED_VAL))
		return;

	__pv_queued_spin_unlock_slowpath(lock, locked);
}
#endif /* __pv_queued_spin_unlock */
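The unlock path is split so that the common case costs one release CAS. The CAS succeeds only if the locked byte is exactly LOCKED. Any other value, normally SLOW, is left untouched and handed to the slowpath, which unhashes the waiter and only then releases with a store-release. The following is a minimal C11 sketch of that split. `SK_SLOW_VAL = 3` is assumed to mirror `_Q_SLOW_VAL`. The unhash and kick steps are omitted. The acquire fence only loosely plays the role of smp_rmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_LOCKED_VAL	1U	/* assumed to mirror _Q_LOCKED_VAL */
#define SK_SLOW_VAL	3U	/* assumed to mirror _Q_SLOW_VAL */

struct sk_lock { _Atomic uint8_t locked; };

/*
 * Fastpath: release-CAS LOCKED -> 0. On failure the byte is left as is,
 * and *seen receives the value actually found (for the slowpath check).
 */
static bool sk_unlock_fast(struct sk_lock *lock, uint8_t *seen)
{
	uint8_t expected = SK_LOCKED_VAL;
	bool ok = atomic_compare_exchange_strong_explicit(&lock->locked,
			&expected, 0, memory_order_release, memory_order_relaxed);

	*seen = expected;	/* unchanged (LOCKED) on success */
	return ok;
}

/* Slowpath: after unhashing (omitted here), release with a store-release. */
static void sk_unlock_slow(struct sk_lock *lock)
{
	atomic_thread_fence(memory_order_acquire);	/* loosely, smp_rmb() */
	atomic_store_explicit(&lock->locked, 0, memory_order_release);
}
```

Keeping the SLOW byte in place until the waiter is unhashed is what prevents the "multiple @lock entries" problem that the comment in __pv_queued_spin_unlock() warns about.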