2019-04-04 13:43:16 -04:00
|
|
|
/* SPDX-License-Identifier: GPL-2.0 */
|
|
|
|
/*
|
|
|
|
* This program is free software; you can redistribute it and/or modify
|
|
|
|
* it under the terms of the GNU General Public License as published by
|
|
|
|
* the Free Software Foundation; either version 2 of the License, or
|
|
|
|
* (at your option) any later version.
|
|
|
|
*
|
|
|
|
* This program is distributed in the hope that it will be useful,
|
|
|
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
* GNU General Public License for more details.
|
|
|
|
*
|
|
|
|
* Authors: Waiman Long <longman@redhat.com>
|
|
|
|
*/
|
|
|
|
|
|
|
|
#ifndef LOCK_EVENT
|
|
|
|
#define LOCK_EVENT(name) LOCKEVENT_ ## name,
|
|
|
|
#endif
|
|
|
|
|
|
|
|
#ifdef CONFIG_QUEUED_SPINLOCKS
|
|
|
|
#ifdef CONFIG_PARAVIRT_SPINLOCKS
|
|
|
|
/*
|
|
|
|
* Locking events for PV qspinlock.
|
|
|
|
*/
|
|
|
|
LOCK_EVENT(pv_hash_hops) /* Average # of hops per hashing operation */
|
|
|
|
LOCK_EVENT(pv_kick_unlock) /* # of vCPU kicks issued at unlock time */
|
|
|
|
LOCK_EVENT(pv_kick_wake) /* # of vCPU kicks for pv_latency_wake */
|
|
|
|
LOCK_EVENT(pv_latency_kick) /* Average latency (ns) of vCPU kick */
|
|
|
|
LOCK_EVENT(pv_latency_wake) /* Average latency (ns) of kick-to-wakeup */
|
|
|
|
LOCK_EVENT(pv_lock_stealing) /* # of lock stealing operations */
|
|
|
|
LOCK_EVENT(pv_spurious_wakeup) /* # of spurious wakeups in non-head vCPUs */
|
|
|
|
LOCK_EVENT(pv_wait_again) /* # of wait's after queue head vCPU kick */
|
|
|
|
LOCK_EVENT(pv_wait_early) /* # of early vCPU wait's */
|
|
|
|
LOCK_EVENT(pv_wait_head) /* # of vCPU wait's at the queue head */
|
|
|
|
LOCK_EVENT(pv_wait_node) /* # of vCPU wait's at non-head queue node */
|
|
|
|
#endif /* CONFIG_PARAVIRT_SPINLOCKS */
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Locking events for qspinlock
|
|
|
|
*
|
|
|
|
* Subtracting lock_use_node[234] from lock_slowpath will give you
|
|
|
|
* lock_use_node1.
|
|
|
|
*/
|
|
|
|
LOCK_EVENT(lock_pending) /* # of locking ops via pending code */
|
|
|
|
LOCK_EVENT(lock_slowpath) /* # of locking ops via MCS lock queue */
|
|
|
|
LOCK_EVENT(lock_use_node2) /* # of locking ops that use 2nd percpu node */
|
|
|
|
LOCK_EVENT(lock_use_node3) /* # of locking ops that use 3rd percpu node */
|
|
|
|
LOCK_EVENT(lock_use_node4) /* # of locking ops that use 4th percpu node */
|
|
|
|
LOCK_EVENT(lock_no_node) /* # of locking ops w/o using percpu node */
|
|
|
|
#endif /* CONFIG_QUEUED_SPINLOCKS */
|
2019-04-04 13:43:19 -04:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Locking events for rwsem
|
|
|
|
*/
|
|
|
|
LOCK_EVENT(rwsem_sleep_reader) /* # of reader sleeps */
|
|
|
|
LOCK_EVENT(rwsem_sleep_writer) /* # of writer sleeps */
|
|
|
|
LOCK_EVENT(rwsem_wake_reader) /* # of reader wakeups */
|
|
|
|
LOCK_EVENT(rwsem_wake_writer) /* # of writer wakeups */
|
locking/rwsem: Adaptive disabling of reader optimistic spinning
Reader optimistic spinning is helpful when the reader critical section
is short and there aren't that many readers around. It makes readers
relatively more preferred than writers. When a writer times out spinning
on a reader-owned lock and set the nospinnable bits, there are two main
reasons for that.
1) The reader critical section is long, perhaps the task sleeps after
acquiring the read lock.
2) There are just too many readers contending the lock causing it to
take a while to service all of them.
In the former case, long reader critical section will impede the progress
of writers which is usually more important for system performance.
In the later case, reader optimistic spinning tends to make the reader
groups that contain readers that acquire the lock together smaller
leading to more of them. That may hurt performance in some cases. In
other words, the setting of nonspinnable bits indicates that reader
optimistic spinning may not be helpful for those workloads that cause it.
Therefore, any writers that have observed the setting of the writer
nonspinnable bit for a given rwsem after they fail to acquire the lock
via optimistic spinning will set the reader nonspinnable bit once they
acquire the write lock. Similarly, readers that observe the setting
of reader nonspinnable bit at slowpath entry will also set the reader
nonspinnable bit when they acquire the read lock via the wakeup path.
Once the reader nonspinnable bit is on, it will only be reset when
a writer is able to acquire the rwsem in the fast path or somehow a
reader or writer in the slowpath doesn't observe the nonspinable bit.
This is to discourage reader optmistic spinning on that particular
rwsem and make writers more preferred. This adaptive disabling of reader
optimistic spinning will alleviate some of the negative side effect of
this feature.
In addition, this patch tries to make readers in the spinning queue
follow the phase-fair principle after quitting optimistic spinning
by checking if another reader has somehow acquired a read lock after
this reader enters the optimistic spinning queue. If so and the rwsem
is still reader-owned, this reader is in the right read-phase and can
attempt to acquire the lock.
On a 2-socket 40-core 80-thread Skylake system, the page_fault1 test of
the will-it-scale benchmark was run with various number of threads. The
number of operations done before reader optimistic spinning patches,
this patch and after this patch were:
Threads Before rspin Before patch After patch %change
------- ------------ ------------ ----------- -------
20 5541068 5345484 5455667 -3.5%/ +2.1%
40 10185150 7292313 9219276 -28.5%/+26.4%
60 8196733 6460517 7181209 -21.2%/+11.2%
80 9508864 6739559 8107025 -29.1%/+20.3%
This patch doesn't recover all the lost performance, but it is more
than half. Given the fact that reader optimistic spinning does benefit
some workloads, this is a good compromise.
Using the rwsem locking microbenchmark with very short critical section,
this patch doesn't have too much impact on locking performance as shown
by the locking rates (kops/s) below with equal numbers of readers and
writers before and after this patch:
# of Threads Pre-patch Post-patch
------------ --------- ----------
2 4,730 4,969
4 4,814 4,786
8 4,866 4,815
16 4,715 4,511
32 3,338 3,500
64 3,212 3,389
80 3,110 3,044
When running the locking microbenchmark with 40 dedicated reader and writer
threads, however, the reader performance is curtailed to favor the writer.
Before patch:
40 readers, Iterations Min/Mean/Max = 204,026/234,309/254,816
40 writers, Iterations Min/Mean/Max = 88,515/95,884/115,644
After patch:
40 readers, Iterations Min/Mean/Max = 33,813/35,260/36,791
40 writers, Iterations Min/Mean/Max = 95,368/96,565/97,798
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Link: https://lkml.kernel.org/r/20190520205918.22251-16-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-20 16:59:14 -04:00
|
|
|
LOCK_EVENT(rwsem_opt_rlock) /* # of opt-acquired read locks */
|
|
|
|
LOCK_EVENT(rwsem_opt_wlock) /* # of opt-acquired write locks */
|
|
|
|
LOCK_EVENT(rwsem_opt_fail) /* # of failed optspins */
|
|
|
|
LOCK_EVENT(rwsem_opt_nospin) /* # of disabled optspins */
|
|
|
|
LOCK_EVENT(rwsem_opt_norspin) /* # of disabled reader-only optspins */
|
|
|
|
LOCK_EVENT(rwsem_opt_rlock2) /* # of opt-acquired 2ndary read locks */
|
2019-04-04 13:43:19 -04:00
|
|
|
LOCK_EVENT(rwsem_rlock) /* # of read locks acquired */
|
2020-11-20 23:14:14 -05:00
|
|
|
LOCK_EVENT(rwsem_rlock_steal) /* # of read locks by lock stealing */
|
2019-04-04 13:43:19 -04:00
|
|
|
LOCK_EVENT(rwsem_rlock_fast) /* # of fast read locks acquired */
|
|
|
|
LOCK_EVENT(rwsem_rlock_fail) /* # of failed read lock acquisitions */
|
locking/rwsem: Implement lock handoff to prevent lock starvation
Because of writer lock stealing, it is possible that a constant
stream of incoming writers will cause a waiting writer or reader to
wait indefinitely leading to lock starvation.
This patch implements a lock handoff mechanism to disable lock stealing
and force lock handoff to the first waiter or waiters (for readers)
in the queue after at least a 4ms waiting period unless it is a RT
writer task which doesn't need to wait. The waiting period is used to
avoid discouraging lock stealing too much to affect performance.
The setting and clearing of the handoff bit is serialized by the
wait_lock. So racing is not possible.
A rwsem microbenchmark was run for 5 seconds on a 2-socket 40-core
80-thread Skylake system with a v5.1 based kernel and 240 write_lock
threads with 5us sleep critical section.
Before the patch, the min/mean/max numbers of locking operations for
the locking threads were 1/7,792/173,696. After the patch, the figures
became 5,842/6,542/7,458. It can be seen that the rwsem became much
more fair, though there was a drop of about 16% in the mean locking
operations done which was a tradeoff of having better fairness.
Making the waiter set the handoff bit right after the first wakeup can
impact performance especially with a mixed reader/writer workload. With
the same microbenchmark with short critical section and equal number of
reader and writer threads (40/40), the reader/writer locking operation
counts with the current patch were:
40 readers, Iterations Min/Mean/Max = 1,793/1,794/1,796
40 writers, Iterations Min/Mean/Max = 1,793/34,956/86,081
By making waiter set handoff bit immediately after wakeup:
40 readers, Iterations Min/Mean/Max = 43/44/46
40 writers, Iterations Min/Mean/Max = 43/1,263/3,191
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Link: https://lkml.kernel.org/r/20190520205918.22251-8-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-20 16:59:06 -04:00
|
|
|
LOCK_EVENT(rwsem_rlock_handoff) /* # of read lock handoffs */
|
2019-04-04 13:43:19 -04:00
|
|
|
LOCK_EVENT(rwsem_wlock) /* # of write locks acquired */
|
|
|
|
LOCK_EVENT(rwsem_wlock_fail) /* # of failed write lock acquisitions */
|
locking/rwsem: Implement lock handoff to prevent lock starvation
Because of writer lock stealing, it is possible that a constant
stream of incoming writers will cause a waiting writer or reader to
wait indefinitely leading to lock starvation.
This patch implements a lock handoff mechanism to disable lock stealing
and force lock handoff to the first waiter or waiters (for readers)
in the queue after at least a 4ms waiting period unless it is a RT
writer task which doesn't need to wait. The waiting period is used to
avoid discouraging lock stealing too much to affect performance.
The setting and clearing of the handoff bit is serialized by the
wait_lock. So racing is not possible.
A rwsem microbenchmark was run for 5 seconds on a 2-socket 40-core
80-thread Skylake system with a v5.1 based kernel and 240 write_lock
threads with 5us sleep critical section.
Before the patch, the min/mean/max numbers of locking operations for
the locking threads were 1/7,792/173,696. After the patch, the figures
became 5,842/6,542/7,458. It can be seen that the rwsem became much
more fair, though there was a drop of about 16% in the mean locking
operations done which was a tradeoff of having better fairness.
Making the waiter set the handoff bit right after the first wakeup can
impact performance especially with a mixed reader/writer workload. With
the same microbenchmark with short critical section and equal number of
reader and writer threads (40/40), the reader/writer locking operation
counts with the current patch were:
40 readers, Iterations Min/Mean/Max = 1,793/1,794/1,796
40 writers, Iterations Min/Mean/Max = 1,793/34,956/86,081
By making waiter set handoff bit immediately after wakeup:
40 readers, Iterations Min/Mean/Max = 43/44/46
40 writers, Iterations Min/Mean/Max = 43/1,263/3,191
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Link: https://lkml.kernel.org/r/20190520205918.22251-8-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-20 16:59:06 -04:00
|
|
|
LOCK_EVENT(rwsem_wlock_handoff) /* # of write lock handoffs */
|