docs/mm: document latest changes to vm_lock

Update the documentation to reflect that the vm_lock has been integrated into
the vma and subsequently replaced with the vm_refcnt reference counter.

Document the newly introduced vma_start_read_locked{_nested} functions.

Link: https://lkml.kernel.org/r/20250111042604.3230628-18-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kernel test robot <oliver.sang@intel.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

@@ -716,9 +716,14 @@ calls :c:func:`!rcu_read_lock` to ensure that the VMA is looked up in an RCU
 critical section, then attempts to VMA lock it via :c:func:`!vma_start_read`,
 before releasing the RCU lock via :c:func:`!rcu_read_unlock`.
 
-VMA read locks hold the read lock on the :c:member:`!vma->vm_lock` semaphore for
-their duration and the caller of :c:func:`!lock_vma_under_rcu` must release it
-via :c:func:`!vma_end_read`.
+In cases where the user already holds the mmap read lock, :c:func:`!vma_start_read_locked`
+and :c:func:`!vma_start_read_locked_nested` can be used. These functions do not
+fail due to lock contention but the caller should still check their return values
+in case they fail for other reasons.
+
+VMA read locks increment the :c:member:`!vma.vm_refcnt` reference counter for their
+duration and the caller of :c:func:`!lock_vma_under_rcu` must drop it via
+:c:func:`!vma_end_read`.
 
 VMA **write** locks are acquired via :c:func:`!vma_start_write` in instances where a
 VMA is about to be modified, unlike :c:func:`!vma_start_read` the lock is always
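
As a rough usage sketch of the read-lock API described above (not part of the
patch; the functions do_fault_under_vma(), handle_fault_locked() and
read_lock_vma_under_mmap_lock() are hypothetical names used only for
illustration, and the fallback path is heavily simplified):

.. code-block:: c

    /* Sketch of a lock_vma_under_rcu() caller, loosely modelled on the page fault path. */
    static vm_fault_t do_fault_under_vma(struct mm_struct *mm, unsigned long addr)
    {
            struct vm_area_struct *vma;
            vm_fault_t ret;

            /* Looks up the VMA under RCU and read-locks it (raises vma->vm_refcnt). */
            vma = lock_vma_under_rcu(mm, addr);
            if (!vma)
                    return VM_FAULT_RETRY; /* fall back to taking the mmap lock */

            ret = handle_fault_locked(vma, addr);   /* hypothetical helper */

            /* Drop the reference taken by lock_vma_under_rcu(). */
            vma_end_read(vma);
            return ret;
    }

    /*
     * With the mmap read lock already held, the locked variant can be used
     * instead; it does not fail due to contention, but the return value is
     * still checked for other failure modes.
     */
    static bool read_lock_vma_under_mmap_lock(struct vm_area_struct *vma)
    {
            mmap_assert_locked(vma->vm_mm);
            return vma_start_read_locked(vma);      /* paired with vma_end_read() */
    }
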
@@ -726,9 +731,9 @@ acquired. An mmap write lock **must** be held for the duration of the VMA write
 lock, releasing or downgrading the mmap write lock also releases the VMA write
 lock so there is no :c:func:`!vma_end_write` function.
 
-Note that a semaphore write lock is not held across a VMA lock. Rather, a
-sequence number is used for serialisation, and the write semaphore is only
-acquired at the point of write lock to update this.
+Note that when write-locking a VMA lock, the :c:member:`!vma.vm_refcnt` is temporarily
+modified so that readers can detect the presence of a writer. The reference counter is
+restored once the VMA sequence number used for serialisation is updated.
 
 This ensures the semantics we require - VMA write locks provide exclusive write
 access to the VMA.
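
A minimal sketch of the write side (illustrative only; update_vma_flags() is a
made-up helper and the actual VMA modification is elided):

.. code-block:: c

    /* Sketch: write-locking a VMA before modifying it. */
    static void update_vma_flags(struct vm_area_struct *vma)
    {
            /* The mmap write lock must already be held by the caller. */
            mmap_assert_write_locked(vma->vm_mm);

            /* Waits for readers to drain, then marks the VMA write-locked. */
            vma_start_write(vma);

            /* ... modify the VMA here, e.g. change vma->vm_flags ... */

            /*
             * There is no vma_end_write(): the VMA write lock is dropped when
             * the mmap write lock is released or downgraded.
             */
    }
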
@@ -738,7 +743,7 @@ Implementation details
 
 The VMA lock mechanism is designed to be a lightweight means of avoiding the use
 of the heavily contended mmap lock. It is implemented using a combination of a
-read/write semaphore and sequence numbers belonging to the containing
+reference counter and sequence numbers belonging to the containing
 :c:struct:`!struct mm_struct` and the VMA.
 
 Read locks are acquired via :c:func:`!vma_start_read`, which is an optimistic
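
For reference, a simplified sketch of the fields involved; the layout is
abridged and the exact types may not match the mainline definitions:

.. code-block:: c

    /* Abridged sketch of the fields the mechanism relies on. */
    struct vm_area_struct {
            /* ... */
            unsigned int    vm_lock_seq;    /* equals the mm sequence count while write-locked */
            refcount_t      vm_refcnt;      /* readers hold references; a high bit marks a writer */
            /* ... */
    };

    struct mm_struct {
            /* ... */
            seqcount_t      mm_lock_seq;    /* advanced when the mmap write lock is released */
            /* ... */
    };
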
@@ -779,28 +784,31 @@ release of any VMA locks on its release makes sense, as you would never want to
 keep VMAs locked across entirely separate write operations. It also maintains
 correct lock ordering.
 
-Each time a VMA read lock is acquired, we acquire a read lock on the
-:c:member:`!vma->vm_lock` read/write semaphore and hold it, while checking that
-the sequence count of the VMA does not match that of the mm.
+Each time a VMA read lock is acquired, we increment the :c:member:`!vma.vm_refcnt`
+reference counter and check that the sequence count of the VMA does not match
+that of the mm.
 
-If it does, the read lock fails. If it does not, we hold the lock, excluding
-writers, but permitting other readers, who will also obtain this lock under RCU.
+If it does, the read lock fails and :c:member:`!vma.vm_refcnt` is dropped.
+If it does not, we keep the reference counter raised, excluding writers, but
+permitting other readers, who can also obtain this lock under RCU.
 
 Importantly, maple tree operations performed in :c:func:`!lock_vma_under_rcu`
 are also RCU safe, so the whole read lock operation is guaranteed to function
 correctly.
 
-On the write side, we acquire a write lock on the :c:member:`!vma->vm_lock`
-read/write semaphore, before setting the VMA's sequence number under this lock,
-also simultaneously holding the mmap write lock.
+On the write side, we set a bit in :c:member:`!vma.vm_refcnt` which can't be
+modified by readers and wait for all readers to drop their reference count.
+Once there are no readers, the VMA's sequence number is set to match that of
+the mm. During this entire operation the mmap write lock is held.
 
 This way, if any read locks are in effect, :c:func:`!vma_start_write` will sleep
 until these are finished and mutual exclusion is achieved.
 
-After setting the VMA's sequence number, the lock is released, avoiding
-complexity with a long-term held write lock.
+After setting the VMA's sequence number, the bit in :c:member:`!vma.vm_refcnt`
+indicating a writer is cleared. From this point on, the VMA's sequence number will
+indicate the VMA's write-locked state until the mmap write lock is dropped or downgraded.
 
-This clever combination of a read/write semaphore and sequence count allows for
+This clever combination of a reference counter and sequence count allows for
 fast RCU-based per-VMA lock acquisition (especially on page fault, though
 utilised elsewhere) with minimal complexity around lock ordering.
 
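
Putting both sides together, the scheme can be summarised in pseudocode. This
is a simplified sketch: the helper names below (raise_refcnt_unless_writer(),
drop_refcnt(), mm_seqcount(), set_writer_bit(), wait_for_readers(),
clear_writer_bit()) are hypothetical, and the real logic lives in
vma_start_read() and vma_start_write():

.. code-block:: c

    /* Read side, entered under RCU (roughly what vma_start_read() does). */
    static struct vm_area_struct *vma_read_lock_sketch(struct vm_area_struct *vma)
    {
            if (!raise_refcnt_unless_writer(&vma->vm_refcnt))       /* hypothetical */
                    return NULL;                    /* a writer is present */

            if (vma->vm_lock_seq == mm_seqcount(vma->vm_mm)) {      /* hypothetical accessor */
                    drop_refcnt(&vma->vm_refcnt);   /* VMA is write-locked: back off */
                    return NULL;
            }
            return vma;     /* read-locked; vma_end_read() drops the reference later */
    }

    /* Write side, mmap write lock held (roughly what vma_start_write() does). */
    static void vma_write_lock_sketch(struct vm_area_struct *vma)
    {
            set_writer_bit(&vma->vm_refcnt);        /* hypothetical: excludes new readers */
            wait_for_readers(&vma->vm_refcnt);      /* hypothetical: existing readers drain */

            /* Making the sequence numbers match is what marks the VMA write-locked. */
            vma->vm_lock_seq = mm_seqcount(vma->vm_mm);

            clear_writer_bit(&vma->vm_refcnt);      /* readers now see the sequence match instead */
    }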