License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                             # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note                         270
GPL-2.0+ WITH Linux-syscall-note                        169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
LGPL-2.1+ WITH Linux-syscall-note                        15
GPL-1.0+ WITH Linux-syscall-note                         14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
LGPL-2.0+ WITH Linux-syscall-note                         4
LGPL-2.1 WITH Linux-syscall-note                          3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
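The distinction between header and source comment types mentioned above can be sketched as follows. This is not the script from the commit, which is not shown here; spdx_line() is a hypothetical helper that only illustrates the kernel convention of a line comment for .c files and a block comment for .h files.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: format the SPDX tag line
 * for a file, picking the comment style from the file extension.
 * Headers (.h) get a block comment, everything else a line comment.
 */
static int spdx_line(const char *path, const char *license,
		     char *buf, size_t len)
{
	const char *dot = strrchr(path, '.');
	int is_header = dot && strcmp(dot, ".h") == 0;

	assert(buf && len > 0);

	if (is_header)
		return snprintf(buf, len,
				"/* SPDX-License-Identifier: %s */", license);
	return snprintf(buf, len, "// SPDX-License-Identifier: %s", license);
}
```

Calling spdx_line("mm/highmem.c", "GPL-2.0", buf, sizeof(buf)) would produce the same first line that this file carries.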
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 15:07:57 +01:00
// SPDX-License-Identifier: GPL-2.0
2005-04-16 15:20:36 -07:00
/*
 * High memory handling common code and variables.
 *
 * (C) 1999 Andrea Arcangeli, SuSE GmbH, andrea@suse.de
 *          Gerhard Wichert, Siemens AG, Gerhard.Wichert@pdb.siemens.de
 *
 *
 * Redesigned the x86 32-bit VM architecture to deal with
 * 64-bit physical space. With current x86 CPUs this
 * means up to 64 Gigabytes physical RAM.
 *
 * Rewrote high memory support to move the page cache into
 * high memory. Implemented permanent (schedulable) kmaps
 * based on Linus' idea.
 *
 * Copyright (C) 1999 Ingo Molnar <mingo@redhat.com>
 */

#include <linux/mm.h>
2011-10-16 02:01:52 -04:00
#include <linux/export.h>
2005-04-16 15:20:36 -07:00
#include <linux/swap.h>
#include <linux/bio.h>
#include <linux/pagemap.h>
#include <linux/mempool.h>
#include <linux/init.h>
#include <linux/hash.h>
#include <linux/highmem.h>
2010-08-05 09:22:24 -05:00
#include <linux/kgdb.h>
2005-04-16 15:20:36 -07:00
#include <asm/tlbflush.h>
2019-11-29 08:17:25 +01:00
#include <linux/vmalloc.h>
2010-10-27 15:32:57 -07:00

2022-10-05 21:05:55 -07:00
#ifdef CONFIG_KMAP_LOCAL
static inline int kmap_local_calc_idx(int idx)
{
	return idx + KM_MAX_IDX * smp_processor_id();
}

#ifndef arch_kmap_local_map_idx
#define arch_kmap_local_map_idx(idx, pfn)	kmap_local_calc_idx(idx)
#endif
#endif /* CONFIG_KMAP_LOCAL */
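A minimal userspace sketch of the index arithmetic in kmap_local_calc_idx(): each CPU owns a contiguous window of KM_MAX_IDX slots. The value 16 for KM_MAX_IDX is an assumption made for the example, and the CPU number is passed as a parameter because smp_processor_id() is not available outside the kernel. The model_ name is hypothetical.

```c
#include <assert.h>

/* Assumed value for illustration; the kernel picks KM_MAX_IDX per config. */
#define KM_MAX_IDX 16

/*
 * Model of kmap_local_calc_idx(): slot 'idx' of CPU 'cpu' lives at
 * idx + KM_MAX_IDX * cpu, so the windows of two CPUs never overlap.
 */
static int model_kmap_local_calc_idx(int idx, int cpu)
{
	assert(idx >= 0 && idx < KM_MAX_IDX);
	assert(cpu >= 0);
	return idx + KM_MAX_IDX * cpu;
}
```

With these assumptions, the last slot of CPU 0 (index 15) is immediately followed by the first slot of CPU 1 (index 16).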

2005-04-16 15:20:36 -07:00
/*
 * Virtual_count is not a pure "count".
 *  0 means that it is not mapped, and has not been mapped
 *    since a TLB flush - it is usable.
 *  1 means that there are no users, but it has been mapped
 *    since the last TLB flush - so we can't use it.
 *  n means that there are (n-1) current users of it.
 */
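The three meanings of the count described in the comment above can be written down as a small classifier. This is a userspace sketch only; the enum and function names are hypothetical and do not exist in the kernel.

```c
#include <assert.h>

/* Hypothetical names for the three cases in the comment above. */
enum model_pkmap_state {
	MODEL_PKMAP_USABLE,	/* 0: not mapped since the last TLB flush */
	MODEL_PKMAP_STALE,	/* 1: no users, but mapped since the last flush */
	MODEL_PKMAP_IN_USE,	/* n > 1: (n - 1) current users */
};

static enum model_pkmap_state model_pkmap_state(int count)
{
	assert(count >= 0);
	if (count == 0)
		return MODEL_PKMAP_USABLE;
	if (count == 1)
		return MODEL_PKMAP_STALE;
	return MODEL_PKMAP_IN_USE;
}

/* Number of current users encoded in a count value. */
static int model_pkmap_users(int count)
{
	assert(count >= 0);
	return count > 1 ? count - 1 : 0;
}
```

A slot in the stale state has no users but cannot be handed out again until a TLB flush has returned its count to 0.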
#ifdef CONFIG_HIGHMEM
2005-10-21 03:22:44 -04:00

2014-08-06 16:08:23 -07:00
/*
 * Architecture with aliasing data cache may define the following family of
 * helper functions in its asm/highmem.h to control cache color of virtual
 * addresses where physical memory pages are mapped by kmap.
 */
#ifndef get_pkmap_color

/*
 * Determine color of virtual address where the page should be mapped.
 */
static inline unsigned int get_pkmap_color(struct page *page)
{
	return 0;
}
#define get_pkmap_color get_pkmap_color

/*
 * Get next index for mapping inside PKMAP region for page with given color.
 */
static inline unsigned int get_next_pkmap_nr(unsigned int color)
{
	static unsigned int last_pkmap_nr;

	last_pkmap_nr = (last_pkmap_nr + 1) & LAST_PKMAP_MASK;
	return last_pkmap_nr;
}

/*
 * Determine if page index inside PKMAP region (pkmap_nr) of given color
 * has wrapped around PKMAP region end. When this happens an attempt to
 * flush all unused PKMAP slots is made.
 */
static inline int no_more_pkmaps(unsigned int pkmap_nr, unsigned int color)
{
	return pkmap_nr == 0;
}

/*
 * Get the number of PKMAP entries of the given color. If no free slot is
 * found after checking that many entries, kmap will sleep waiting for
 * someone to call kunmap and free PKMAP slot.
 */
static inline int get_pkmap_entries_count(unsigned int color)
{
	return LAST_PKMAP;
}

/*
 * Get head of a wait queue for PKMAP entries of the given color.
 * Wait queues for different mapping colors should be independent to avoid
 * unnecessary wakeups caused by freeing of slots of other colors.
 */
static inline wait_queue_head_t *get_pkmap_wait_queue_head(unsigned int color)
{
	static DECLARE_WAIT_QUEUE_HEAD(pkmap_map_wait);

	return &pkmap_map_wait;
}
#endif
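The default get_next_pkmap_nr() and no_more_pkmaps() helpers above cooperate through a masked increment: the index wraps to 0 after the last slot, and a result of 0 is the wrap signal. A userspace sketch of that behaviour follows. MODEL_LAST_PKMAP is set to 8 only to keep the example small, the cursor is passed in explicitly instead of being a function-local static, and all model_ names are hypothetical.

```c
#include <assert.h>

/* Assumed size for illustration; the mask trick needs a power of two. */
#define MODEL_LAST_PKMAP	8
#define MODEL_LAST_PKMAP_MASK	(MODEL_LAST_PKMAP - 1)

/* Advance the cursor by one slot, wrapping to 0 after the last slot. */
static unsigned int model_next_pkmap_nr(unsigned int *last)
{
	assert(last);
	*last = (*last + 1) & MODEL_LAST_PKMAP_MASK;
	return *last;
}

/* An index of 0 means the cursor has just wrapped around the region. */
static int model_no_more_pkmaps(unsigned int pkmap_nr)
{
	return pkmap_nr == 0;
}
```

Starting from a cursor of 0, the first seven calls yield 1 through 7 and the eighth yields 0, which is the point where the real code attempts a flush of unused slots.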

2024-06-07 10:37:11 +02:00
unsigned long __nr_free_highpages(void)
2006-09-25 23:31:11 -07:00
{
2024-06-07 10:37:11 +02:00
	unsigned long pages = 0;
2016-05-19 17:12:26 -07:00
	struct zone *zone;
2006-09-25 23:31:11 -07:00

2016-05-19 17:12:26 -07:00
	for_each_populated_zone(zone) {
		if (is_highmem(zone))
			pages += zone_page_state(zone, NR_FREE_PAGES);
2007-07-17 04:03:12 -07:00
	}
2006-09-25 23:31:11 -07:00

	return pages;
}

mm/highmem: reimplement totalhigh_pages() by walking zones
Patch series "mm/highmem: don't track highmem pages manually".
Let's remove highmem special-casing from adjust_managed_page_count(), to
result in less confusion why memblock manually adjusts totalram_pages, and
__free_pages_core() only adjusts the zone's managed pages -- what about
the highmem pages that adjust_managed_page_count() updates?
Now, we only maintain totalram_pages and a zone's managed pages
independent of highmem support. We can derive the number of highmem pages
simply by looking at the relevant zone's managed pages. I don't think
there is any particular fast path that needs a maximum-efficient
totalhigh_pages() implementation.
Note that highmem memory is currently initialized using
free_highmem_page()->free_reserved_page(), not __free_pages_core(). In
the future we might want to also use __free_pages_core() to initialize
highmem memory, to make that less special, and consider moving
totalram_pages updates into __free_pages_core() [1], so we can just use
adjust_managed_page_count() in there as well.
Booting a simple kernel in QEMU reveals no highmem accounting change:
Before:
Memory: 3095448K/3145208K available (14802K kernel code, 2073K rwdata,
5000K rodata, 740K init, 556K bss, 49760K reserved, 0K cma-reserved,
2244488K highmem)
After:
Memory: 3095276K/3145208K available (14802K kernel code, 2073K rwdata,
5000K rodata, 740K init, 556K bss, 49932K reserved, 0K cma-reserved,
2244488K highmem)
[1] https://lkml.kernel.org/r/20240601133402.2675-1-richard.weiyang@gmail.com
This patch (of 2):
Can we get rid of the highmem ifdef in adjust_managed_page_count()?
Likely yes: we don't have that many totalhigh_pages() users, and they all
don't seem to be very performance critical.
So let's implement totalhigh_pages() like nr_free_highpages(), collecting
information from all zones. This is now similar to what we do in
si_meminfo_node() to collect the per-node highmem page count.
In the common case (single node, 3-4 zones), we really shouldn't care. We
could optimize a bit further (only walk ZONE_HIGHMEM and ZONE_MOVABLE if
required), but there doesn't seem a real need for that.
[david@redhat.com: fix build bot complaint]
Link: https://lkml.kernel.org/r/b57e5bc4-eb72-40e3-add4-57dfa6e03df6@redhat.com
Link: https://lkml.kernel.org/r/20240607083711.62833-1-david@redhat.com
Link: https://lkml.kernel.org/r/20240607083711.62833-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-07 10:37:10 +02:00
unsigned long __totalhigh_pages(void)
{
	unsigned long pages = 0;
	struct zone *zone;

	for_each_populated_zone(zone) {
		if (is_highmem(zone))
			pages += zone_managed_pages(zone);
	}

	return pages;
}
EXPORT_SYMBOL(__totalhigh_pages);
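The zone walk in __totalhigh_pages() derives the highmem total on demand instead of maintaining a separate counter. A simplified userspace model of that idea is sketched below. struct model_zone and its fields are stand-ins invented for the example, not the kernel's struct zone, and the model has no notion of populated versus unpopulated zones.

```c
#include <assert.h>
#include <stddef.h>

/* Invented stand-in for struct zone; field names are illustrative only. */
struct model_zone {
	int highmem;			/* nonzero if this is a highmem zone */
	unsigned long managed_pages;	/* pages managed by the allocator */
};

/* Sum the managed pages of every highmem zone in the array. */
static unsigned long model_totalhigh_pages(const struct model_zone *zones,
					   size_t nr_zones)
{
	unsigned long pages = 0;
	size_t i;

	assert(zones || nr_zones == 0);

	for (i = 0; i < nr_zones; i++) {
		if (zones[i].highmem)
			pages += zones[i].managed_pages;
	}

	return pages;
}
```

The cost is one pass over a handful of zones per call, which matches the commit message's point that no caller is performance critical.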

2005-04-16 15:20:36 -07:00
static int pkmap_count[LAST_PKMAP];
static  __cacheline_aligned_in_smp DEFINE_SPINLOCK(kmap_lock);

2021-05-04 18:40:09 -07:00
pte_t *pkmap_page_table;
2005-04-16 15:20:36 -07:00

highmem: atomic highmem kmap page pinning
Most ARM machines have a non IO coherent cache, meaning that the
dma_map_*() set of functions must clean and/or invalidate the affected
memory manually before DMA occurs. And because the majority of those
machines have a VIVT cache, the cache maintenance operations must be
performed using virtual addresses.
When a highmem page is kunmap'd, its mapping (and cache) remains in place
in case it is kmap'd again. However if dma_map_page() is then called with
such a page, some cache maintenance on the remaining mapping must be
performed. In that case, page_address(page) is non null and we can use
that to synchronize the cache.
It is unlikely but still possible for kmap() to race and recycle the
virtual address obtained above, and use it for another page before some
on-going cache invalidation loop in dma_map_page() is done. In that case,
the new mapping could end up with dirty cache lines for another page,
and the unsuspecting cache invalidation loop in dma_map_page() might
simply discard those dirty cache lines resulting in data loss.
For example, let's consider this sequence of events:
- dma_map_page(..., DMA_FROM_DEVICE) is called on a highmem page.
--> - vaddr = page_address(page) is non null. In this case
it is likely that the page has valid cache lines
associated with vaddr. Remember that the cache is VIVT.
--> for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32)
invalidate_cache_line(i);
*** preemption occurs in the middle of the loop above ***
- kmap_high() is called for a different page.
--> - last_pkmap_nr wraps to zero and flush_all_zero_pkmaps()
is called. The pkmap_count value for the page passed
to dma_map_page() above happens to be 1, so the page
is unmapped. But prior to that, flush_cache_kmaps()
cleared the cache for it. So far so good.
- A fresh pkmap entry is assigned for this kmap request.
The Murphy law says this pkmap entry will eventually
happen to use the same vaddr as the one which used to
belong to the other page being processed by
dma_map_page() in the preempted thread above.
- The kmap_high() caller starts dirtying the cache using the
just assigned virtual mapping for its page.
*** the first thread is rescheduled ***
- The for(...) loop is resumed, but now cached
data belonging to a different physical page is
being discarded !
And this is not only a preemption issue as ARM can be SMP as well,
making the above scenario just as likely. Hence the need for some kind
of pkmap page pinning which can be used in any context, primarily for
the benefit of dma_map_page() on ARM.
This provides the necessary interface to cope with the above issue if
ARCH_NEEDS_KMAP_HIGH_GET is defined, otherwise the resulting code is
unchanged.
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Reviewed-by: MinChan Kim <minchan.kim@gmail.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
2009-03-04 22:49:41 -05:00
/*
 * Most architectures have no use for kmap_high_get(), so let's abstract
 * the disabling of IRQ out of the locking in that case to save on a
 * potential useless overhead.
 */
#ifdef ARCH_NEEDS_KMAP_HIGH_GET
#define lock_kmap()             spin_lock_irq(&kmap_lock)
#define unlock_kmap()           spin_unlock_irq(&kmap_lock)
#define lock_kmap_any(flags)    spin_lock_irqsave(&kmap_lock, flags)
#define unlock_kmap_any(flags)  spin_unlock_irqrestore(&kmap_lock, flags)
#else
#define lock_kmap()             spin_lock(&kmap_lock)
#define unlock_kmap()           spin_unlock(&kmap_lock)
#define lock_kmap_any(flags)    \
		do { spin_lock(&kmap_lock); (void)(flags); } while (0)
#define unlock_kmap_any(flags)  \
		do { spin_unlock(&kmap_lock); (void)(flags); } while (0)
#endif
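The non-IRQ variants of lock_kmap_any() and unlock_kmap_any() use two common macro idioms: a do { ... } while (0) wrapper so the macro behaves as one statement, and a (void)(flags) cast so the otherwise unused argument does not trigger a warning. A userspace sketch of both idioms is given below. The counter model_lock_depth merely stands in for kmap_lock and provides no real locking; all model_ names are hypothetical.

```c
#include <assert.h>

/* Stand-in for kmap_lock: counts nesting depth, does no real locking. */
static int model_lock_depth;

#define model_lock_any(flags) \
	do { model_lock_depth++; (void)(flags); } while (0)
#define model_unlock_any(flags) \
	do { model_lock_depth--; (void)(flags); } while (0)

/*
 * Because of the do/while wrapper, the macro can sit in an unbraced
 * if/else without breaking the else branch.
 */
static int model_locked_increment(int *value, int enable)
{
	unsigned long flags = 0;	/* unused here, kept for API symmetry */

	assert(value);

	if (enable)
		model_lock_any(flags);
	else
		return -1;

	(*value)++;
	model_unlock_any(flags);
	return model_lock_depth;
}
```

Without the wrapper, the two statements in the macro body would split the if from its else and the function would not compile.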

2020-11-03 10:27:34 +01:00
struct page *__kmap_to_page(void *vaddr)
2012-07-31 16:45:02 -07:00
{
2022-10-05 21:05:55 -07:00
	unsigned long base = (unsigned long) vaddr & PAGE_MASK;
	struct kmap_ctrl *kctrl = &current->kmap_ctrl;
2012-07-31 16:45:02 -07:00
	unsigned long addr = (unsigned long)vaddr;
2022-10-05 21:05:55 -07:00
	int i;

	/* kmap() mappings */
	if (WARN_ON_ONCE(addr >= PKMAP_ADDR(0) &&
			 addr < PKMAP_ADDR(LAST_PKMAP)))
mm: ptep_get() conversion
Convert all instances of direct pte_t* dereferencing to instead use
ptep_get() helper. This means that by default, the accesses change from a
C dereference to a READ_ONCE(). This is technically the correct thing to
do since where pgtables are modified by HW (for access/dirty) they are
volatile and therefore we should always ensure READ_ONCE() semantics.
But more importantly, by always using the helper, it can be overridden by
the architecture to fully encapsulate the contents of the pte. Arch code
is deliberately not converted, as the arch code knows best. It is
intended that arch code (arm64) will override the default with its own
implementation that can (e.g.) hide certain bits from the core code, or
determine young/dirty status by mixing in state from another source.
Conversion was done using Coccinelle:
----
// $ make coccicheck \
// COCCI=ptepget.cocci \
// SPFLAGS="--include-headers" \
// MODE=patch
virtual patch
@ depends on patch @
pte_t *v;
@@
- *v
+ ptep_get(v)
----
Then reviewed and hand-edited to avoid multiple unnecessary calls to
ptep_get(), instead opting to store the result of a single call in a
variable, where it is correct to do so. This aims to negate any cost of
READ_ONCE() and will benefit arch-overrides that may be more complex.
Included is a fix for an issue in an earlier version of this patch that
was pointed out by kernel test robot. The issue arose because config
MMU=n elides definition of the ptep helper functions, including
ptep_get(). HUGETLB_PAGE=n configs still define a simple
huge_ptep_clear_flush() for linking purposes, which dereferences the ptep.
So when both configs are disabled, this caused a build error because
ptep_get() is not defined. Fix by continuing to do a direct dereference
when MMU=n. This is safe because for this config the arch code cannot be
trying to virtualize the ptes because none of the ptep helpers are
defined.
Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12 16:15:45 +01:00
		return pte_page(ptep_get(&pkmap_page_table[PKMAP_NR(addr)]));
2012-07-31 16:45:02 -07:00

2022-10-05 21:05:55 -07:00
	/* kmap_local_page() mappings */
	if (WARN_ON_ONCE(base >= __fix_to_virt(FIX_KMAP_END) &&
			 base < __fix_to_virt(FIX_KMAP_BEGIN))) {
		for (i = 0; i < kctrl->idx; i++) {
			unsigned long base_addr;
			int idx;
2021-05-04 18:40:09 -07:00

2022-10-05 21:05:55 -07:00
			idx = arch_kmap_local_map_idx(i, pte_pfn(pteval));
			base_addr = __fix_to_virt(FIX_KMAP_BEGIN + idx);

			if (base_addr == base)
				return pte_page(kctrl->pteval[i]);
		}
2012-07-31 16:45:02 -07:00
	}

2022-06-30 10:41:21 +02:00
	return virt_to_page(vaddr);
2012-07-31 16:45:02 -07:00
}
2020-11-03 10:27:34 +01:00
EXPORT_SYMBOL(__kmap_to_page);
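The ptep_get() conversion used in __kmap_to_page() reads a page table entry through a helper and then works on the returned copy. A userspace sketch of that pattern is shown below, assuming a one-word entry and a page shift of 12. model_pte_t and the model_ functions are invented for the example and are not the kernel's types or helpers; the volatile access only approximates what READ_ONCE() does.

```c
#include <assert.h>

/* Invented one-word stand-in for pte_t. */
typedef struct {
	unsigned long val;
} model_pte_t;

#define MODEL_PAGE_SHIFT 12	/* assumed page size of 4 KiB */

/* Load the entry through a volatile access and return a snapshot. */
static model_pte_t model_ptep_get(const model_pte_t *ptep)
{
	model_pte_t pte;

	pte.val = *(const volatile unsigned long *)&ptep->val;
	return pte;
}

static int model_pte_none(model_pte_t pte)
{
	return pte.val == 0;
}

static unsigned long model_pte_pfn(model_pte_t pte)
{
	return pte.val >> MODEL_PAGE_SHIFT;
}

/*
 * Read the entry once, then test and decode the same snapshot, so both
 * steps see one consistent value. Returns -1 for an empty entry.
 */
static long model_lookup_pfn(const model_pte_t *table, unsigned int nr)
{
	model_pte_t pte = model_ptep_get(&table[nr]);

	if (model_pte_none(pte))
		return -1;
	return (long)model_pte_pfn(pte);
}
```

Storing the result in a local variable, rather than calling the helper twice, is the hand-edit the commit message describes for avoiding repeated reads.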
2012-07-31 16:45:02 -07:00

2005-04-16 15:20:36 -07:00
static void flush_all_zero_pkmaps(void)
{
	int i;
2008-08-01 03:15:21 +02:00
	int need_flush = 0;
2005-04-16 15:20:36 -07:00

	flush_cache_kmaps();

	for (i = 0; i < LAST_PKMAP; i++) {
		struct page *page;
2023-06-12 16:15:45 +01:00
		pte_t ptent;
2005-04-16 15:20:36 -07:00

		/*
		 * zero means we don't have anything to do,
		 * >1 means that it is still in use. Only
		 * a count of 1 means that it is free but
		 * needs to be unmapped
		 */
		if (pkmap_count[i] != 1)
			continue;
		pkmap_count[i] = 0;

		/* sanity check */
2023-06-12 16:15:45 +01:00
		ptent = ptep_get(&pkmap_page_table[i]);
		BUG_ON(pte_none(ptent));
2005-04-16 15:20:36 -07:00

		/*
		 * Don't need an atomic fetch-and-clear op here;
		 * no-one has the page mapped, and cannot get at
		 * its virtual address (and hence PTE) without first
		 * getting the kmap_lock (which is held here).
		 * So no dangers, even with speculative execution.
		 */
mm: ptep_get() conversion
2023-06-12 16:15:45 +01:00
		page = pte_page(ptent);
2012-12-11 16:01:24 -08:00
		pte_clear(&init_mm, PKMAP_ADDR(i), &pkmap_page_table[i]);
2005-04-16 15:20:36 -07:00

		set_page_address(page, NULL);
2008-08-01 03:15:21 +02:00
		need_flush = 1;
2005-04-16 15:20:36 -07:00
	}
2008-08-01 03:15:21 +02:00
	if (need_flush)
		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
2005-04-16 15:20:36 -07:00
}

2020-11-03 10:27:34 +01:00
void __kmap_flush_unused(void)
2007-05-02 19:27:15 +02:00
{
highmem: atomic highmem kmap page pinning
Most ARM machines have a non IO coherent cache, meaning that the
dma_map_*() set of functions must clean and/or invalidate the affected
memory manually before DMA occurs. And because the majority of those
machines have a VIVT cache, the cache maintenance operations must be
performed using virtual addresses.
When a highmem page is kunmap'd, its mapping (and cache) remains in place
in case it is kmap'd again. However if dma_map_page() is then called with
such a page, some cache maintenance on the remaining mapping must be
performed. In that case, page_address(page) is non null and we can use
that to synchronize the cache.
It is unlikely but still possible for kmap() to race and recycle the
virtual address obtained above, and use it for another page before some
on-going cache invalidation loop in dma_map_page() is done. In that case,
the new mapping could end up with dirty cache lines for another page,
and the unsuspecting cache invalidation loop in dma_map_page() might
simply discard those dirty cache lines resulting in data loss.
For example, let's consider this sequence of events:
    - dma_map_page(..., DMA_FROM_DEVICE) is called on a highmem page.
    -->  - vaddr = page_address(page) is non null. In this case
           it is likely that the page has valid cache lines
           associated with vaddr. Remember that the cache is VIVT.
         -->  for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32)
                      invalidate_cache_line(i);
    *** preemption occurs in the middle of the loop above ***
    - kmap_high() is called for a different page.
    -->  - last_pkmap_nr wraps to zero and flush_all_zero_pkmaps()
           is called. The pkmap_count value for the page passed
           to dma_map_page() above happens to be 1, so the page
           is unmapped. But prior to that, flush_cache_kmaps()
           cleared the cache for it. So far so good.
         - A fresh pkmap entry is assigned for this kmap request.
           Murphy's law says this pkmap entry will eventually
           happen to use the same vaddr as the one which used to
           belong to the other page being processed by
           dma_map_page() in the preempted thread above.
    - The kmap_high() caller starts dirtying the cache using the
      just assigned virtual mapping for its page.
    *** the first thread is rescheduled ***
              - The for(...) loop is resumed, but now cached
                data belonging to a different physical page is
                being discarded!
And this is not only a preemption issue as ARM can be SMP as well,
making the above scenario just as likely. Hence the need for some kind
of pkmap page pinning which can be used in any context, primarily for
the benefit of dma_map_page() on ARM.
This provides the necessary interface to cope with the above issue if
ARCH_NEEDS_KMAP_HIGH_GET is defined, otherwise the resulting code is
unchanged.
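The effect of pinning can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel code: the table size, the sketch_ prefixed names, and the integer return values are invented for illustration, and all locking is omitted. It models only the counting rule that makes the scheme work: a flush recycles slots whose count is exactly 1, so taking an extra reference on an already mapped slot keeps its virtual address from being reassigned.

```c
#include <assert.h>

#define SKETCH_LAST_PKMAP 4	/* tiny table; the real LAST_PKMAP is arch-defined */

/*
 * Count semantics modelled here:
 *   0     - slot is free
 *   1     - slot is mapped but has no users (a flush may recycle it)
 *   n > 1 - slot has n - 1 current users
 */
static int sketch_pkmap_count[SKETCH_LAST_PKMAP];

/* Recycle every mapped-but-unused slot; return how many were freed. */
static int sketch_flush_all_zero_pkmaps(void)
{
	int i, freed = 0;

	for (i = 0; i < SKETCH_LAST_PKMAP; i++) {
		if (sketch_pkmap_count[i] != 1)
			continue;
		sketch_pkmap_count[i] = 0;
		freed++;
	}
	return freed;
}

/* Pin a slot only if it is already mapped; never creates a mapping. */
static int sketch_kmap_high_get(int slot)
{
	if (sketch_pkmap_count[slot] < 1)
		return 0;	/* no mapping exists, nothing to pin */
	sketch_pkmap_count[slot]++;
	return 1;
}

/* Drop one reference taken by sketch_kmap_high_get(). */
static void sketch_kunmap_high(int slot)
{
	assert(sketch_pkmap_count[slot] > 1);
	sketch_pkmap_count[slot]--;
}
```

In this model, a cache maintenance loop would call sketch_kmap_high_get() first, run only if it returned nonzero, and call sketch_kunmap_high() afterwards, so a concurrent flush cannot hand the slot to another page in between.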
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Reviewed-by: MinChan Kim <minchan.kim@gmail.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
2009-03-04 22:49:41 -05:00
	lock_kmap();
2007-05-02 19:27:15 +02:00
	flush_all_zero_pkmaps();
highmem: atomic highmem kmap page pinning
2009-03-04 22:49:41 -05:00
	unlock_kmap();
2007-05-02 19:27:15 +02:00
}

2005-04-16 15:20:36 -07:00
static inline unsigned long map_new_virtual(struct page *page)
{
	unsigned long vaddr;
	int count;
2014-08-06 16:08:23 -07:00
	unsigned int last_pkmap_nr;
	unsigned int color = get_pkmap_color(page);
2005-04-16 15:20:36 -07:00

start:
2014-08-06 16:08:23 -07:00
	count = get_pkmap_entries_count(color);
2005-04-16 15:20:36 -07:00
	/* Find an empty entry */
	for (;;) {
2014-08-06 16:08:23 -07:00
		last_pkmap_nr = get_next_pkmap_nr(color);
		if (no_more_pkmaps(last_pkmap_nr, color)) {
2005-04-16 15:20:36 -07:00
			flush_all_zero_pkmaps();
2014-08-06 16:08:23 -07:00
			count = get_pkmap_entries_count(color);
2005-04-16 15:20:36 -07:00
		}
		if (!pkmap_count[last_pkmap_nr])
			break;	/* Found a usable entry */
		if (--count)
			continue;

		/*
		 * Sleep for somebody else to unmap their entries
		 */
		{
			DECLARE_WAITQUEUE(wait, current);
2014-08-06 16:08:23 -07:00
			wait_queue_head_t *pkmap_map_wait =
				get_pkmap_wait_queue_head(color);
2005-04-16 15:20:36 -07:00

			__set_current_state(TASK_UNINTERRUPTIBLE);
2014-08-06 16:08:23 -07:00
			add_wait_queue(pkmap_map_wait, &wait);
highmem: atomic highmem kmap page pinning
2009-03-04 22:49:41 -05:00
			unlock_kmap();
2005-04-16 15:20:36 -07:00
			schedule();
2014-08-06 16:08:23 -07:00
			remove_wait_queue(pkmap_map_wait, &wait);
highmem: atomic highmem kmap page pinning
2009-03-04 22:49:41 -05:00
			lock_kmap();
2005-04-16 15:20:36 -07:00

			/* Somebody else might have mapped it while we slept */
			if (page_address(page))
				return (unsigned long)page_address(page);

			/* Re-start */
			goto start;
		}
	}
	vaddr = PKMAP_ADDR(last_pkmap_nr);
	set_pte_at(&init_mm, vaddr,
		   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));

	pkmap_count[last_pkmap_nr] = 1;
	set_page_address(page, (void *)vaddr);

	return vaddr;
}
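The scan in map_new_virtual() can be modelled in userspace as follows. This is a simplified sketch, not the kernel implementation: it assumes the non-colored case where advancing to the next slot is a masked increment and "no more pkmaps" means the index wrapped to zero, it returns -1 at the point where the kernel would sleep on the wait queue, and every sketch_ name is invented for illustration.

```c
#include <assert.h>

#define SKETCH_LAST_PKMAP 4	/* power of two, so masking implements the wrap */
#define SKETCH_PKMAP_MASK (SKETCH_LAST_PKMAP - 1)

static int sketch_count[SKETCH_LAST_PKMAP];	/* 0 free, 1 unused, >1 in use */
static unsigned int sketch_last_nr;		/* last slot handed out */

/* Recycle slots that are mapped but have no users. */
static void sketch_flush(void)
{
	for (int i = 0; i < SKETCH_LAST_PKMAP; i++)
		if (sketch_count[i] == 1)
			sketch_count[i] = 0;
}

/* Return a free slot, marked mapped (count 1), or -1 where the kernel would sleep. */
static int sketch_map_new_virtual(void)
{
	int count = SKETCH_LAST_PKMAP;

	for (;;) {
		sketch_last_nr = (sketch_last_nr + 1) & SKETCH_PKMAP_MASK;
		if (sketch_last_nr == 0) {
			/* Wrapped: recycle unused slots and reset the scan budget. */
			sketch_flush();
			count = SKETCH_LAST_PKMAP;
		}
		if (!sketch_count[sketch_last_nr])
			break;		/* found a usable entry */
		if (--count)
			continue;
		return -1;		/* every slot is in use */
	}
	sketch_count[sketch_last_nr] = 1;
	return (int)sketch_last_nr;
}
```

The count budget bounds the scan to one full pass over the table after the most recent flush, which is why the loop terminates even when every slot is busy.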
2008-03-19 17:00:42 -07:00
/**
 * kmap_high - map a highmem page into memory
 * @page: &struct page to map
 *
 * Returns the page's virtual memory address.
 *
 * We cannot call this from interrupts, as it may block.
 */
2008-02-04 22:29:26 -08:00
void *kmap_high(struct page *page)
2005-04-16 15:20:36 -07:00
{
	unsigned long vaddr;

	/*
	 * For highmem pages, we can't trust "virtual" until
	 * after we have the lock.
	 */
highmem: atomic highmem kmap page pinning
2009-03-04 22:49:41 -05:00
	lock_kmap();
2005-04-16 15:20:36 -07:00
	vaddr = (unsigned long)page_address(page);
	if (!vaddr)
		vaddr = map_new_virtual(page);
	pkmap_count[PKMAP_NR(vaddr)]++;
2006-04-02 13:47:35 +02:00
	BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 2);
highmem: atomic highmem kmap page pinning
2009-03-04 22:49:41 -05:00
	unlock_kmap();
2021-05-04 18:40:09 -07:00
	return (void *) vaddr;
2005-04-16 15:20:36 -07:00
}
EXPORT_SYMBOL(kmap_high);
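The reuse-or-map logic in kmap_high() can be sketched in userspace as below. This is an illustrative model only: struct sketch_page, sketch_refs[], and sketch_new_slot() are hypothetical stand-ins for struct page, pkmap_count[], and map_new_virtual(), locking is left out, and a slot index replaces the virtual address.

```c
#include <assert.h>

#define SKETCH_SLOTS 4

struct sketch_page { int slot; };	/* -1 means "no virtual address yet" */

static int sketch_refs[SKETCH_SLOTS];	/* same meaning as pkmap_count[] */

/* Stand-in for map_new_virtual(): take the first free slot, count becomes 1. */
static int sketch_new_slot(struct sketch_page *page)
{
	for (int i = 0; i < SKETCH_SLOTS; i++) {
		if (sketch_refs[i] == 0) {
			sketch_refs[i] = 1;
			page->slot = i;
			return i;
		}
	}
	return -1;	/* the kernel would sleep here instead */
}

/* Reuse an existing mapping if present, otherwise create one, then take a reference. */
static int sketch_kmap_high(struct sketch_page *page)
{
	int slot = page->slot;

	if (slot < 0)
		slot = sketch_new_slot(page);
	if (slot < 0)
		return -1;
	sketch_refs[slot]++;
	assert(sketch_refs[slot] >= 2);	/* mirrors BUG_ON(pkmap_count[...] < 2) */
	return slot;
}
```

A freshly mapped page therefore ends at a count of 2 (mapped plus one user), and each further call on the same page adds one more user without allocating a new slot.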
highmem: atomic highmem kmap page pinning
2009-03-04 22:49:41 -05:00
#ifdef ARCH_NEEDS_KMAP_HIGH_GET
/**
 * kmap_high_get - pin a highmem page into memory
 * @page: &struct page to pin
 *
 * Returns the page's current virtual memory address, or NULL if no mapping
2010-01-25 21:38:09 +01:00
 * exists. If and only if a non null address is returned then a
highmem: atomic highmem kmap page pinning
2009-03-04 22:49:41 -05:00
 * matching call to kunmap_high() is necessary.
 *
 * This can be called from any context.
 */
void *kmap_high_get(struct page *page)
{
	unsigned long vaddr, flags;

	lock_kmap_any(flags);
	vaddr = (unsigned long)page_address(page);
	if (vaddr) {
		BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 1);
		pkmap_count[PKMAP_NR(vaddr)]++;
	}
	unlock_kmap_any(flags);
2021-05-04 18:40:09 -07:00
	return (void *) vaddr;
}
#endif
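
The pinning idea behind kmap_high_get() can be sketched in userspace. This is only a toy model under stated assumptions: the table size, the `pkmap_owner` array and all `model_*` names are hypothetical, locking is omitted, and a failed allocation returns -1 where the kernel would sleep. It shows that a slot whose count was raised by the "get" path is skipped when unused slots are recycled, which is what closes the race described in the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define LAST_PKMAP 4	/* tiny table for illustration only */

/* 0 = slot free, 1 = mapped but unused, n > 1 = (n - 1) active users */
static int pkmap_count[LAST_PKMAP];
/* Hypothetical stand-in for "which page owns this slot" (-1 = none) */
static int pkmap_owner[LAST_PKMAP] = { -1, -1, -1, -1 };

/* Model of kmap_high(): reuse the page's slot or claim a free one. */
static int model_kmap(int page)
{
	int nr;

	for (nr = 0; nr < LAST_PKMAP; nr++) {
		if (pkmap_owner[nr] == page) {
			pkmap_count[nr]++;
			return nr;
		}
	}
	for (nr = 0; nr < LAST_PKMAP; nr++) {
		if (pkmap_count[nr] == 0) {
			pkmap_owner[nr] = page;
			pkmap_count[nr] = 2;	/* mapped (1) plus one user */
			return nr;
		}
	}
	return -1;	/* the kernel would sleep here; the model just fails */
}

/* Model of kunmap_high(): drop one user, never below 1. */
static void model_kunmap(int nr)
{
	assert(pkmap_count[nr] >= 2);
	pkmap_count[nr]--;
}

/* Model of kmap_high_get(): pin only if a mapping already exists. */
static int model_kmap_get(int page)
{
	int nr;

	for (nr = 0; nr < LAST_PKMAP; nr++) {
		if (pkmap_owner[nr] == page) {
			assert(pkmap_count[nr] >= 1);
			pkmap_count[nr]++;
			return nr;
		}
	}
	return -1;	/* no lingering mapping, nothing to pin */
}

/* Model of flush_all_zero_pkmaps(): recycle slots whose count is exactly 1. */
static int model_flush_unused(void)
{
	int nr, freed = 0;

	for (nr = 0; nr < LAST_PKMAP; nr++) {
		if (pkmap_count[nr] == 1) {
			pkmap_count[nr] = 0;
			pkmap_owner[nr] = -1;
			freed++;
		}
	}
	return freed;
}
```

In this model a caller that wants to do cache maintenance on a lingering mapping would call `model_kmap_get()`, work on the slot, then call `model_kunmap()`; while the pin is held, `model_flush_unused()` leaves the slot alone.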

/**
 * kunmap_high - unmap a highmem page
 * @page: &struct page to unmap
 *
 * If ARCH_NEEDS_KMAP_HIGH_GET is not defined then this may be called
 * only from user context.
 */
void kunmap_high(struct page *page)
{
	unsigned long vaddr;
	unsigned long nr;
	unsigned long flags;
	int need_wakeup;
	unsigned int color = get_pkmap_color(page);
	wait_queue_head_t *pkmap_map_wait;

	lock_kmap_any(flags);
	vaddr = (unsigned long)page_address(page);
	BUG_ON(!vaddr);
	nr = PKMAP_NR(vaddr);

	/*
	 * A count must never go down to zero
	 * without a TLB flush!
	 */
	need_wakeup = 0;
	switch (--pkmap_count[nr]) {
	case 0:
		BUG();
	case 1:
		/*
		 * Avoid an unnecessary wake_up() function call.
		 * The common case is pkmap_count[] == 1, but
		 * no waiters.
		 * The tasks queued in the wait-queue are guarded
		 * by both the lock in the wait-queue-head and by
		 * the kmap_lock. As the kmap_lock is held here,
		 * no need for the wait-queue-head's lock. Simply
		 * test if the queue is empty.
		 */
		pkmap_map_wait = get_pkmap_wait_queue_head(color);
		need_wakeup = waitqueue_active(pkmap_map_wait);
	}
	unlock_kmap_any(flags);

	/* do wake-up, if needed, race-free outside of the spin lock */
	if (need_wakeup)
		wake_up(pkmap_map_wait);
}
EXPORT_SYMBOL(kunmap_high);
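
The decrement logic in kunmap_high() can be isolated as a small decision function. The sketch below is a userspace model, not kernel code: `model_kunmap_needs_wakeup` and its `waiters` flag (a stand-in for waitqueue_active()) are hypothetical names, and the kernel's BUG() on reaching zero is modelled with assert().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the decrement in kunmap_high().  `count` points at the slot's
 * use count and `waiters` stands in for waitqueue_active().  Returns true
 * when a wake-up should be issued after the lock is dropped.
 */
static bool model_kunmap_needs_wakeup(int *count, bool waiters)
{
	assert(*count >= 2);	/* dropping to 0 here is BUG() in the kernel */
	if (--*count == 1)
		return waiters;	/* slot just became reclaimable */
	return false;		/* other users still hold the mapping */
}
```

The point of the split is that the count is changed under the lock, while the actual wake-up happens afterwards and only when someone is waiting for a slot.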

void zero_user_segments(struct page *page, unsigned start1, unsigned end1,
		unsigned start2, unsigned end2)
{
	unsigned int i;

	BUG_ON(end1 > page_size(page) || end2 > page_size(page));

	if (start1 >= end1)
		start1 = end1 = 0;
	if (start2 >= end2)
		start2 = end2 = 0;

	for (i = 0; i < compound_nr(page); i++) {
		void *kaddr = NULL;

		if (start1 >= PAGE_SIZE) {
			start1 -= PAGE_SIZE;
			end1 -= PAGE_SIZE;
		} else {
			unsigned this_end = min_t(unsigned, end1, PAGE_SIZE);

			if (end1 > start1) {
				kaddr = kmap_local_page(page + i);
				memset(kaddr + start1, 0, this_end - start1);
			}
			end1 -= this_end;
			start1 = 0;
		}

		if (start2 >= PAGE_SIZE) {
			start2 -= PAGE_SIZE;
			end2 -= PAGE_SIZE;
		} else {
			unsigned this_end = min_t(unsigned, end2, PAGE_SIZE);

			if (end2 > start2) {
				if (!kaddr)
					kaddr = kmap_local_page(page + i);
				memset(kaddr + start2, 0, this_end - start2);
			}
			end2 -= this_end;
			start2 = 0;
		}

		if (kaddr) {
			kunmap_local(kaddr);
			flush_dcache_page(page + i);
		}

		if (!end1 && !end2)
			break;
	}

	BUG_ON((start1 | start2 | end1 | end2) != 0);
}
EXPORT_SYMBOL(zero_user_segments);
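
The per-subpage bookkeeping in zero_user_segments() is easier to follow with small numbers. The following is a userspace model under stated assumptions: `MODEL_PAGE_SIZE` is a hypothetical 8-byte "page", a flat buffer stands in for the compound page, and the kmap_local_page()/kunmap_local()/flush_dcache_page() calls are replaced by plain pointer arithmetic. The offset arithmetic mirrors the function above.

```c
#include <assert.h>
#include <string.h>

#define MODEL_PAGE_SIZE 8u	/* hypothetical tiny page size for illustration */

/*
 * Userspace model of zero_user_segments(): `buf` stands in for a compound
 * page made of `nr_pages` subpages.  Both ranges are half-open byte ranges
 * relative to the start of the compound page.
 */
static void model_zero_segments(unsigned char *buf, unsigned nr_pages,
				unsigned start1, unsigned end1,
				unsigned start2, unsigned end2)
{
	unsigned i;

	assert(end1 <= nr_pages * MODEL_PAGE_SIZE);
	assert(end2 <= nr_pages * MODEL_PAGE_SIZE);

	/* An empty or inverted range is normalised to "nothing to do". */
	if (start1 >= end1)
		start1 = end1 = 0;
	if (start2 >= end2)
		start2 = end2 = 0;

	for (i = 0; i < nr_pages; i++) {
		unsigned char *kaddr = buf + i * MODEL_PAGE_SIZE;

		if (start1 >= MODEL_PAGE_SIZE) {
			/* Range starts in a later subpage: rebase and move on. */
			start1 -= MODEL_PAGE_SIZE;
			end1 -= MODEL_PAGE_SIZE;
		} else {
			unsigned this_end = end1 < MODEL_PAGE_SIZE ?
					    end1 : MODEL_PAGE_SIZE;

			if (end1 > start1)
				memset(kaddr + start1, 0, this_end - start1);
			end1 -= this_end;
			start1 = 0;
		}

		if (start2 >= MODEL_PAGE_SIZE) {
			start2 -= MODEL_PAGE_SIZE;
			end2 -= MODEL_PAGE_SIZE;
		} else {
			unsigned this_end = end2 < MODEL_PAGE_SIZE ?
					    end2 : MODEL_PAGE_SIZE;

			if (end2 > start2)
				memset(kaddr + start2, 0, this_end - start2);
			end2 -= this_end;
			start2 = 0;
		}

		if (!end1 && !end2)
			break;
	}

	/* Everything requested must have been consumed. */
	assert((start1 | start2 | end1 | end2) == 0);
}
```

For example, with three 8-byte subpages, zeroing [2, 11) and [20, 22) clears bytes 2 to 10 across the first two subpages and bytes 20 and 21 in the third, leaving the rest untouched.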
#endif /* CONFIG_HIGHMEM */

#ifdef CONFIG_KMAP_LOCAL

#include <asm/kmap_size.h>

/*
 * With DEBUG_KMAP_LOCAL the stack depth is doubled and every second
 * slot is unused which acts as a guard page
 */
#ifdef CONFIG_DEBUG_KMAP_LOCAL
# define KM_INCR 2
#else
# define KM_INCR 1
#endif

static inline int kmap_local_idx_push(void)
{
	WARN_ON_ONCE(in_hardirq() && !irqs_disabled());
	current->kmap_ctrl.idx += KM_INCR;
	BUG_ON(current->kmap_ctrl.idx >= KM_MAX_IDX);
	return current->kmap_ctrl.idx - 1;
}

static inline int kmap_local_idx(void)
{
	return current->kmap_ctrl.idx - 1;
}

static inline void kmap_local_idx_pop(void)
{
	current->kmap_ctrl.idx -= KM_INCR;
	BUG_ON(current->kmap_ctrl.idx < 0);
}
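
The three index helpers above form a small per-task stack. A userspace sketch of the same arithmetic, assuming the debug stride of 2 and a hypothetical limit of 8 (`model_idx` stands in for current->kmap_ctrl.idx, and assert() replaces BUG_ON()), shows which slots are handed out and which stay unused as guards:

```c
#include <assert.h>

#define KM_INCR    2	/* as with CONFIG_DEBUG_KMAP_LOCAL: every second slot is a guard */
#define KM_MAX_IDX 8	/* hypothetical small limit for the model */

static int model_idx;	/* stands in for current->kmap_ctrl.idx */

/* Reserve the next slot and return its index. */
static int model_idx_push(void)
{
	model_idx += KM_INCR;
	assert(model_idx < KM_MAX_IDX);
	return model_idx - 1;
}

/* Index of the most recently reserved slot. */
static int model_idx_cur(void)
{
	return model_idx - 1;
}

/* Release the most recently reserved slot. */
static void model_idx_pop(void)
{
	model_idx -= KM_INCR;
	assert(model_idx >= 0);
}
```

With a stride of 2 the slots handed out are 1, 3, 5, so slots 0, 2, 4 are never mapped; with a stride of 1 every slot is used.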

#ifndef arch_kmap_local_post_map
# define arch_kmap_local_post_map(vaddr, pteval) do { } while (0)
#endif

#ifndef arch_kmap_local_pre_unmap
# define arch_kmap_local_pre_unmap(vaddr) do { } while (0)
#endif

#ifndef arch_kmap_local_post_unmap
# define arch_kmap_local_post_unmap(vaddr) do { } while (0)
#endif

#ifndef arch_kmap_local_unmap_idx
#define arch_kmap_local_unmap_idx(idx, vaddr) kmap_local_calc_idx(idx)
#endif

#ifndef arch_kmap_local_high_get
static inline void *arch_kmap_local_high_get(struct page *page)
{
	return NULL;
}
#endif

#ifndef arch_kmap_local_set_pte
#define arch_kmap_local_set_pte(mm, vaddr, ptep, ptev) \
	set_pte_at(mm, vaddr, ptep, ptev)
#endif

/* Unmap a local mapping which was obtained by kmap_high_get() */
static inline bool kmap_high_unmap_local(unsigned long vaddr)
{
#ifdef ARCH_NEEDS_KMAP_HIGH_GET
	if (vaddr >= PKMAP_ADDR(0) && vaddr < PKMAP_ADDR(LAST_PKMAP)) {
mm: ptep_get() conversion
Convert all instances of direct pte_t* dereferencing to instead use
ptep_get() helper. This means that by default, the accesses change from a
C dereference to a READ_ONCE(). This is technically the correct thing to
do since where pgtables are modified by HW (for access/dirty) they are
volatile and therefore we should always ensure READ_ONCE() semantics.
But more importantly, by always using the helper, it can be overridden by
the architecture to fully encapsulate the contents of the pte. Arch code
is deliberately not converted, as the arch code knows best. It is
intended that arch code (arm64) will override the default with its own
implementation that can (e.g.) hide certain bits from the core code, or
determine young/dirty status by mixing in state from another source.
Conversion was done using Coccinelle:
----
// $ make coccicheck \
// COCCI=ptepget.cocci \
// SPFLAGS="--include-headers" \
// MODE=patch
virtual patch
@ depends on patch @
pte_t *v;
@@
- *v
+ ptep_get(v)
----
Then reviewed and hand-edited to avoid multiple unnecessary calls to
ptep_get(), instead opting to store the result of a single call in a
variable, where it is correct to do so. This aims to negate any cost of
READ_ONCE() and will benefit arch-overrides that may be more complex.
Included is a fix for an issue in an earlier version of this patch that
was pointed out by kernel test robot. The issue arose because config
MMU=n elides definition of the ptep helper functions, including
ptep_get(). HUGETLB_PAGE=n configs still define a simple
huge_ptep_clear_flush() for linking purposes, which dereferences the ptep.
So when both configs are disabled, this caused a build error because
ptep_get() is not defined. Fix by continuing to do a direct dereference
when MMU=n. This is safe because for this config the arch code cannot be
trying to virtualize the ptes because none of the ptep helpers are
defined.
Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12 16:15:45 +01:00
		kunmap_high(pte_page(ptep_get(&pkmap_page_table[PKMAP_NR(vaddr)])));
		return true;
	}
#endif
	return false;
}
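
The ptep_get() conversion used in the function above replaces a plain `*ptep` dereference with a helper that reads the entry once. A rough userspace illustration follows; it is a sketch only: the `pte_t` layout, the use of bit 0 as a "present" bit, and every `model_*` name are assumptions made for the example, and the volatile load merely approximates what READ_ONCE() provides in the kernel.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's pte_t. */
typedef struct { unsigned long pte; } pte_t;

/* Model of the default ptep_get(): exactly one load of the entry. */
static inline pte_t model_ptep_get(pte_t *ptep)
{
	pte_t v;

	v.pte = *(const volatile unsigned long *)&ptep->pte;
	return v;
}

static inline int model_pte_none(pte_t pte)
{
	return pte.pte == 0;
}

/* Assumption for the model: bit 0 marks the entry as present. */
static inline int model_pte_present(pte_t pte)
{
	return (int)(pte.pte & 1ul);
}

/*
 * Instead of dereferencing *ptep for each check, read the entry once
 * into a local and test the snapshot.
 */
static int model_entry_is_mapped(pte_t *ptep)
{
	pte_t pte = model_ptep_get(ptep);

	return !model_pte_none(pte) && model_pte_present(pte);
}
```

Storing the result of a single call in a local, as `model_entry_is_mapped()` does, is the pattern the commit message describes for avoiding repeated calls to the helper.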

static pte_t *__kmap_pte;

static pte_t *kmap_get_pte(unsigned long vaddr, int idx)
{
	if (IS_ENABLED(CONFIG_KMAP_LOCAL_NON_LINEAR_PTE_ARRAY))
		/*
		 * Set by the arch if __kmap_pte[-idx] does not produce
		 * the correct entry.
		 */
		return virt_to_kpte(vaddr);
	if (!__kmap_pte)
		__kmap_pte = virt_to_kpte(__fix_to_virt(FIX_KMAP_BEGIN));
	return &__kmap_pte[-idx];
}
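
The negative index in `&__kmap_pte[-idx]` follows from fixmap indices growing upwards while their virtual addresses grow downwards. The sketch below models that relationship in userspace; all constants and `model_*` names are hypothetical, and it assumes a linear PTE array ordered by ascending address, which is the case the non-linear config option exists to opt out of.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT     12
#define MODEL_FIXADDR_TOP    0xfffff000ul	/* hypothetical top of the fixmap area */
#define MODEL_FIX_KMAP_BEGIN 4			/* hypothetical first kmap fixmap index */
#define MODEL_NR_PTES        16

/* Linear PTE array covering the MODEL_NR_PTES pages ending at MODEL_FIXADDR_TOP. */
static unsigned long model_ptes[MODEL_NR_PTES];

/* Fixmap indices grow upwards while their virtual addresses grow downwards. */
static unsigned long model_fix_to_virt(unsigned idx)
{
	return MODEL_FIXADDR_TOP - ((unsigned long)idx << MODEL_PAGE_SHIFT);
}

/* Stand-in for virt_to_kpte(): PTEs are laid out by ascending address. */
static unsigned long *model_virt_to_kpte(unsigned long vaddr)
{
	unsigned long lowest = MODEL_FIXADDR_TOP -
		((unsigned long)(MODEL_NR_PTES - 1) << MODEL_PAGE_SHIFT);

	return &model_ptes[(vaddr - lowest) >> MODEL_PAGE_SHIFT];
}

/* Model of the linear path of kmap_get_pte(): step backwards from the base PTE. */
static unsigned long *model_kmap_get_pte(int idx)
{
	unsigned long *base =
		model_virt_to_kpte(model_fix_to_virt(MODEL_FIX_KMAP_BEGIN));

	return &base[-idx];
}
```

In this model, looking up the PTE by address and stepping back `idx` entries from the cached base give the same pointer, which is why the base can be computed once and reused.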

void *__kmap_local_pfn_prot(unsigned long pfn, pgprot_t prot)
{
	pte_t pteval, *kmap_pte;
	unsigned long vaddr;
	int idx;

	/*
	 * Disable migration so resulting virtual address is stable
	 * across preemption.
	 */
	migrate_disable();
	preempt_disable();
	idx = arch_kmap_local_map_idx(kmap_local_idx_push(), pfn);
	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
	kmap_pte = kmap_get_pte(vaddr, idx);
	BUG_ON(!pte_none(ptep_get(kmap_pte)));
	pteval = pfn_pte(pfn, prot);
	arch_kmap_local_set_pte(&init_mm, vaddr, kmap_pte, pteval);
	arch_kmap_local_post_map(vaddr, pteval);
	current->kmap_ctrl.pteval[kmap_local_idx()] = pteval;
	preempt_enable();

	return (void *)vaddr;
}
EXPORT_SYMBOL_GPL(__kmap_local_pfn_prot);

void *__kmap_local_page_prot(struct page *page, pgprot_t prot)
{
	void *kmap;

	/*
	 * To broaden the usage of the actual kmap_local() machinery always map
	 * pages when debugging is enabled and the architecture has no problems
	 * with alias mappings.
	 */
	if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
		return page_address(page);

	/* Try kmap_high_get() if architecture has it enabled */
	kmap = arch_kmap_local_high_get(page);
	if (kmap)
		return kmap;

	return __kmap_local_pfn_prot(page_to_pfn(page), prot);
}
EXPORT_SYMBOL(__kmap_local_page_prot);
|
|
|
|
|
2022-07-06 13:15:19 +02:00
|
|
|
void kunmap_local_indexed(const void *vaddr)
|
2020-11-03 10:27:18 +01:00
|
|
|
{
|
|
|
|
unsigned long addr = (unsigned long) vaddr & PAGE_MASK;
|
2021-11-19 16:43:55 -08:00
|
|
|
pte_t *kmap_pte;
|
2020-11-03 10:27:18 +01:00
|
|
|
int idx;
|
|
|
|
|
|
|
|
if (addr < __fix_to_virt(FIX_KMAP_END) ||
|
|
|
|
addr > __fix_to_virt(FIX_KMAP_BEGIN)) {
|
2020-11-18 20:48:40 +01:00
|
|
|
if (IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP)) {
|
|
|
|
/* This _should_ never happen! See above. */
|
|
|
|
WARN_ON_ONCE(1);
|
|
|
|
return;
|
|
|
|
}
|
2020-11-12 11:59:32 +01:00
|
|
|
/*
|
|
|
|
* Handle mappings which were obtained by kmap_high_get()
|
|
|
|
* first as the virtual address of such mappings is below
|
|
|
|
* PAGE_OFFSET. Warn for all other addresses which are in
|
|
|
|
* the user space part of the virtual address space.
|
|
|
|
*/
|
|
|
|
if (!kmap_high_unmap_local(addr))
|
|
|
|
WARN_ON_ONCE(addr < PAGE_OFFSET);
|
2020-11-03 10:27:18 +01:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
preempt_disable();
|
|
|
|
idx = arch_kmap_local_unmap_idx(kmap_local_idx(), addr);
|
|
|
|
WARN_ON_ONCE(addr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
|
|
|
|
|
2021-11-19 16:43:55 -08:00
|
|
|
kmap_pte = kmap_get_pte(addr, idx);
|
2020-11-03 10:27:18 +01:00
|
|
|
arch_kmap_local_pre_unmap(addr);
|
2021-11-19 16:43:55 -08:00
|
|
|
pte_clear(&init_mm, addr, kmap_pte);
|
2020-11-03 10:27:18 +01:00
|
|
|
arch_kmap_local_post_unmap(addr);
|
2020-11-18 20:48:43 +01:00
|
|
|
current->kmap_ctrl.pteval[kmap_local_idx()] = __pte(0);
|
2020-11-03 10:27:18 +01:00
|
|
|
kmap_local_idx_pop();
|
|
|
|
preempt_enable();
|
2020-11-18 20:48:44 +01:00
|
|
|
migrate_enable();
|
2020-11-03 10:27:18 +01:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(kunmap_local_indexed);
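The range check at the top of kunmap_local_indexed() looks inverted at first glance (FIX_KMAP_END is the lower bound). The following user-space sketch shows why, assuming the generic fixmap layout in which the virtual address shrinks as the fixmap index grows. All DEMO_* constants and demo_* helpers are hypothetical stand-ins chosen for illustration, not kernel values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout values, for illustration only. */
#define DEMO_PAGE_SHIFT   12
#define DEMO_PAGE_SIZE    (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK    (~(DEMO_PAGE_SIZE - 1))
#define DEMO_FIXADDR_TOP  0xfffff000UL
#define DEMO_KMAP_BEGIN   8U   /* first kmap fixmap index */
#define DEMO_KMAP_END     23U  /* last kmap fixmap index */

/* Assumed model of __fix_to_virt(): a higher index maps to a lower address. */
static unsigned long demo_fix_to_virt(unsigned int idx)
{
	/* Sanity check so the subtraction cannot wrap around. */
	assert(((unsigned long)idx << DEMO_PAGE_SHIFT) <= DEMO_FIXADDR_TOP);
	return DEMO_FIXADDR_TOP - ((unsigned long)idx << DEMO_PAGE_SHIFT);
}

/* True if vaddr lies in a page of the kmap_local fixmap window. */
static bool demo_in_kmap_window(unsigned long vaddr)
{
	unsigned long addr = vaddr & DEMO_PAGE_MASK;

	/* The END index yields the lowest address, BEGIN the highest. */
	return addr >= demo_fix_to_virt(DEMO_KMAP_END) &&
	       addr <= demo_fix_to_virt(DEMO_KMAP_BEGIN);
}
```

Any address for which demo_in_kmap_window() is false corresponds to the early-return branch above, where the address is either a kmap_high_get() mapping or a lowmem address returned by page_address().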
|
2020-11-18 20:48:43 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Invoked before switch_to(). This is safe even if an interrupt which
|
|
|
|
* needs a kmap_local happens during or after clearing the maps, because
|
|
|
|
* the task::kmap_ctrl.idx is not modified by the unmapping code so a
|
|
|
|
* nested kmap_local will use the next unused index and restore the index
|
|
|
|
* on unmap. The already cleared kmaps of the outgoing task are irrelevant
|
|
|
|
* because the interrupt context does not know about them. The same applies
|
|
|
|
* when scheduling back in for an interrupt which happens before the
|
|
|
|
* restore is complete.
|
|
|
|
*/
|
|
|
|
void __kmap_local_sched_out(void)
|
|
|
|
{
|
|
|
|
struct task_struct *tsk = current;
|
2021-11-19 16:43:55 -08:00
|
|
|
pte_t *kmap_pte;
|
2020-11-18 20:48:43 +01:00
|
|
|
int i;
|
|
|
|
|
|
|
|
/* Clear kmaps */
|
|
|
|
for (i = 0; i < tsk->kmap_ctrl.idx; i++) {
|
|
|
|
pte_t pteval = tsk->kmap_ctrl.pteval[i];
|
|
|
|
unsigned long addr;
|
|
|
|
int idx;
|
|
|
|
|
|
|
|
/* With debug all even slots are unmapped and act as guard */
|
2021-03-24 21:37:53 -07:00
|
|
|
if (IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL) && !(i & 0x01)) {
|
2022-04-08 13:08:55 -07:00
|
|
|
WARN_ON_ONCE(pte_val(pteval) != 0);
|
2020-11-18 20:48:43 +01:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
if (WARN_ON_ONCE(pte_none(pteval)))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This is a horrible hack for XTENSA to calculate the
|
|
|
|
* coloured PTE index. Uses the PFN encoded into the pteval
|
|
|
|
* and the map index calculation because the actual mapped
|
|
|
|
* virtual address is not stored in task::kmap_ctrl.
|
|
|
|
* For any sane architecture this is optimized out.
|
|
|
|
*/
|
|
|
|
idx = arch_kmap_local_map_idx(i, pte_pfn(pteval));
|
|
|
|
|
|
|
|
addr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
|
2021-11-19 16:43:55 -08:00
|
|
|
kmap_pte = kmap_get_pte(addr, idx);
|
2020-11-18 20:48:43 +01:00
|
|
|
arch_kmap_local_pre_unmap(addr);
|
2021-11-19 16:43:55 -08:00
|
|
|
pte_clear(&init_mm, addr, kmap_pte);
|
2020-11-18 20:48:43 +01:00
|
|
|
arch_kmap_local_post_unmap(addr);
|
|
|
|
}
|
|
|
|
}
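The guard-slot skip in the loop above can be illustrated with a small user-space model. It assumes, as a simplification, that pushing a mapping advances the per-task index by 2 when CONFIG_DEBUG_KMAP_LOCAL is enabled and by 1 otherwise, and that the slot in use is idx - 1. The demo_* names and the slot count are hypothetical; the real helpers are not part of this section.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_SLOTS 16  /* hypothetical size, not the kernel's limit */

struct demo_kmap_ctrl {
	int idx;                              /* number of slots consumed */
	unsigned long pteval[DEMO_MAX_SLOTS]; /* 0 means "no mapping" */
};

/* Record a mapping; incr is 2 in the assumed debug mode, 1 otherwise. */
static int demo_push(struct demo_kmap_ctrl *c, int incr, unsigned long pteval)
{
	c->idx += incr;
	assert(c->idx <= DEMO_MAX_SLOTS);
	c->pteval[c->idx - 1] = pteval;
	return c->idx - 1;
}

/* Count the slots a sched-out pass would clear, skipping guard slots. */
static int demo_sched_out(const struct demo_kmap_ctrl *c, bool debug)
{
	int i, cleared = 0;

	for (i = 0; i < c->idx; i++) {
		/* With debug, even slots are guards and hold no mapping. */
		if (debug && !(i & 0x01))
			continue;
		if (c->pteval[i] == 0)
			continue;
		cleared++;
	}
	return cleared;
}
```

With two mappings pushed in the assumed debug mode, the slots used are 1 and 3, and slots 0 and 2 stay empty as guards, which is what the WARN_ON_ONCE(pte_val(pteval) != 0) check in the loop verifies.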
|
|
|
|
|
|
|
|
void __kmap_local_sched_in(void)
|
|
|
|
{
|
|
|
|
struct task_struct *tsk = current;
|
2021-11-19 16:43:55 -08:00
|
|
|
pte_t *kmap_pte;
|
2020-11-18 20:48:43 +01:00
|
|
|
int i;
|
|
|
|
|
|
|
|
/* Restore kmaps */
|
|
|
|
for (i = 0; i < tsk->kmap_ctrl.idx; i++) {
|
|
|
|
pte_t pteval = tsk->kmap_ctrl.pteval[i];
|
|
|
|
unsigned long addr;
|
|
|
|
int idx;
|
|
|
|
|
|
|
|
/* With debug all even slots are unmapped and act as guard */
|
2021-03-24 21:37:53 -07:00
|
|
|
if (IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL) && !(i & 0x01)) {
|
2022-04-08 13:08:55 -07:00
|
|
|
WARN_ON_ONCE(pte_val(pteval) != 0);
|
2020-11-18 20:48:43 +01:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
if (WARN_ON_ONCE(pte_none(pteval)))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/* See comment in __kmap_local_sched_out() */
|
|
|
|
idx = arch_kmap_local_map_idx(i, pte_pfn(pteval));
|
|
|
|
addr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
|
2021-11-19 16:43:55 -08:00
|
|
|
kmap_pte = kmap_get_pte(addr, idx);
|
|
|
|
set_pte_at(&init_mm, addr, kmap_pte, pteval);
|
2020-11-18 20:48:43 +01:00
|
|
|
arch_kmap_local_post_map(addr, pteval);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
void kmap_local_fork(struct task_struct *tsk)
|
|
|
|
{
|
|
|
|
if (WARN_ON_ONCE(tsk->kmap_ctrl.idx))
|
|
|
|
memset(&tsk->kmap_ctrl, 0, sizeof(tsk->kmap_ctrl));
|
|
|
|
}
|
|
|
|
|
2020-11-03 10:27:18 +01:00
|
|
|
#endif
|
2005-04-16 15:20:36 -07:00
|
|
|
|
|
|
|
#if defined(HASHED_PAGE_VIRTUAL)
|
|
|
|
|
|
|
|
#define PA_HASH_ORDER 7
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Describes one page->virtual association
|
|
|
|
*/
|
|
|
|
struct page_address_map {
|
|
|
|
struct page *page;
|
|
|
|
void *virtual;
|
|
|
|
struct list_head list;
|
|
|
|
};
|
|
|
|
|
2012-12-11 16:01:23 -08:00
|
|
|
static struct page_address_map page_address_maps[LAST_PKMAP];
|
2005-04-16 15:20:36 -07:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Hash table bucket
|
|
|
|
*/
|
|
|
|
static struct page_address_slot {
|
|
|
|
struct list_head lh; /* List of page_address_maps */
|
|
|
|
spinlock_t lock; /* Protect this bucket's list */
|
|
|
|
} ____cacheline_aligned_in_smp page_address_htable[1<<PA_HASH_ORDER];
|
|
|
|
|
2011-08-17 13:45:09 +01:00
|
|
|
static struct page_address_slot *page_slot(const struct page *page)
|
2005-04-16 15:20:36 -07:00
|
|
|
{
|
|
|
|
return &page_address_htable[hash_ptr(page, PA_HASH_ORDER)];
|
|
|
|
}
|
|
|
|
|
2008-03-19 17:00:42 -07:00
|
|
|
/**
|
|
|
|
* page_address - get the mapped virtual address of a page
|
|
|
|
* @page: &struct page to get the virtual address of
|
|
|
|
*
|
|
|
|
* Returns the page's virtual address, or NULL if a highmem page is not
* currently mapped.
|
|
|
|
*/
|
2011-08-17 13:45:09 +01:00
|
|
|
void *page_address(const struct page *page)
|
2005-04-16 15:20:36 -07:00
|
|
|
{
|
|
|
|
unsigned long flags;
|
|
|
|
void *ret;
|
|
|
|
struct page_address_slot *pas;
|
|
|
|
|
|
|
|
if (!PageHighMem(page))
|
|
|
|
return lowmem_page_address(page);
|
|
|
|
|
|
|
|
pas = page_slot(page);
|
|
|
|
ret = NULL;
|
|
|
|
spin_lock_irqsave(&pas->lock, flags);
|
|
|
|
if (!list_empty(&pas->lh)) {
|
|
|
|
struct page_address_map *pam;
|
|
|
|
|
|
|
|
list_for_each_entry(pam, &pas->lh, list) {
|
|
|
|
if (pam->page == page) {
|
|
|
|
ret = pam->virtual;
|
2022-03-22 14:48:01 -07:00
|
|
|
break;
|
2005-04-16 15:20:36 -07:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
2022-03-22 14:48:01 -07:00
|
|
|
|
2005-04-16 15:20:36 -07:00
|
|
|
spin_unlock_irqrestore(&pas->lock, flags);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(page_address);
|
|
|
|
|
2008-03-19 17:00:42 -07:00
|
|
|
/**
|
|
|
|
* set_page_address - set a page's virtual address
|
|
|
|
* @page: &struct page to set
|
|
|
|
* @virtual: virtual address to use, or NULL to remove the association
|
|
|
|
*/
|
2005-04-16 15:20:36 -07:00
|
|
|
void set_page_address(struct page *page, void *virtual)
|
|
|
|
{
|
|
|
|
unsigned long flags;
|
|
|
|
struct page_address_slot *pas;
|
|
|
|
struct page_address_map *pam;
|
|
|
|
|
|
|
|
BUG_ON(!PageHighMem(page));
|
|
|
|
|
|
|
|
pas = page_slot(page);
|
|
|
|
if (virtual) { /* Add */
|
2012-12-11 16:01:23 -08:00
|
|
|
pam = &page_address_maps[PKMAP_NR((unsigned long)virtual)];
|
2005-04-16 15:20:36 -07:00
|
|
|
pam->page = page;
|
|
|
|
pam->virtual = virtual;
|
|
|
|
|
|
|
|
spin_lock_irqsave(&pas->lock, flags);
|
|
|
|
list_add_tail(&pam->list, &pas->lh);
|
|
|
|
spin_unlock_irqrestore(&pas->lock, flags);
|
|
|
|
} else { /* Remove */
|
|
|
|
spin_lock_irqsave(&pas->lock, flags);
|
|
|
|
list_for_each_entry(pam, &pas->lh, list) {
|
|
|
|
if (pam->page == page) {
|
|
|
|
list_del(&pam->list);
|
2022-03-22 14:48:01 -07:00
|
|
|
break;
|
2005-04-16 15:20:36 -07:00
|
|
|
}
|
|
|
|
}
|
|
|
|
spin_unlock_irqrestore(&pas->lock, flags);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
void __init page_address_init(void)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < ARRAY_SIZE(page_address_htable); i++) {
|
|
|
|
INIT_LIST_HEAD(&page_address_htable[i].lh);
|
|
|
|
spin_lock_init(&page_address_htable[i].lock);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2020-10-15 20:09:52 -07:00
|
|
|
#endif /* defined(HASHED_PAGE_VIRTUAL) */
|
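The hashed page->virtual bookkeeping above can be summarised with a user-space sketch. It is a simplified model under stated assumptions: the multiplicative hash stands in for hash_ptr(), a singly linked list with head insertion replaces the kernel's list_head with tail insertion, the caller supplies the entry instead of indexing page_address_maps[], and there is no locking. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HASH_ORDER 7
#define DEMO_NR_SLOTS   (1u << DEMO_HASH_ORDER)

/* One page->virtual association, chained per hash bucket. */
struct demo_map {
	const void *page;
	void *virt;
	struct demo_map *next;
};

static struct demo_map *demo_htable[DEMO_NR_SLOTS];

/* Stand-in for hash_ptr(): multiplicative hash, keep the top bits. */
static unsigned int demo_slot(const void *page)
{
	uint64_t h = (uint64_t)(uintptr_t)page * 0x9E3779B97F4A7C15ull;

	return (unsigned int)(h >> (64 - DEMO_HASH_ORDER));
}

/* Add the association when virt is non-NULL, remove it when virt is NULL. */
static void demo_set_page_address(struct demo_map *entry, const void *page,
				  void *virt)
{
	struct demo_map **pp = &demo_htable[demo_slot(page)];

	if (virt) {
		assert(entry != NULL);
		entry->page = page;
		entry->virt = virt;
		entry->next = *pp;
		*pp = entry;
		return;
	}
	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->page == page) {
			*pp = (*pp)->next;
			break;
		}
	}
}

/* Look up the virtual address; NULL if the page has no association. */
static void *demo_page_address(const void *page)
{
	struct demo_map *m;

	for (m = demo_htable[demo_slot(page)]; m; m = m->next)
		if (m->page == page)
			return m->virt;
	return NULL;
}
```

As in set_page_address(), passing a NULL virtual address removes the entry, so a later lookup for that page returns NULL.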