Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "The biggest change here is eliminating the awful idea that KVM had of
  essentially guessing which pfns are refcounted pages.

  The reason to do so was that KVM needs to map both non-refcounted
   pages (for example BARs of VFIO devices) and VM_PFNMAP/VM_MIXEDMAP
  VMAs that contain refcounted pages.

  However, the result was security issues in the past, and more recently
  the inability to map VM_IO and VM_PFNMAP memory that _is_ backed by
  struct page but is not refcounted. In particular this broke virtio-gpu
  blob resources (which directly map host graphics buffers into the
  guest as "vram" for the virtio-gpu device) with the amdgpu driver,
  because amdgpu allocates non-compound higher order pages and the tail
  pages could not be mapped into KVM.

  This requires adjusting all uses of struct page in the
  per-architecture code, to always work on the pfn whenever possible.
  The large series that did this, from David Stevens and Sean
   Christopherson, also substantially cleaned up the set of functions
   that provide arch code with the pfn for a host virtual address.

  The previous maze of twisty little passages, all different, is
  replaced by five functions (__gfn_to_page, __kvm_faultin_pfn, the
  non-__ versions of these two, and kvm_prefetch_pages) saving almost
  200 lines of code.
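
   As a rough sketch of how architecture code now obtains and releases a pfn
   with the new helpers (the prototypes below are paraphrased from memory and
   should be treated as assumptions, and kvm_release_faultin_page is likewise
   recalled from the series rather than quoted from the tree):

       /* Illustrative fault path, not any architecture's actual code. */
       struct page *refcounted_page = NULL;
       bool writable;
       kvm_pfn_t pfn;

       pfn = kvm_faultin_pfn(vcpu, gfn, write_fault, &writable,
                             &refcounted_page);
       if (is_error_noslot_pfn(pfn))
               return -EFAULT;

       /* ... install the translation in the secondary MMU ... */

       /*
        * Only pfns backed by a refcounted struct page hand one back;
        * bare pfns (e.g. VFIO BARs) are simply not released.
        */
       kvm_release_faultin_page(vcpu->kvm, refcounted_page, false, dirty);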

  ARM:

   - Support for stage-1 permission indirection (FEAT_S1PIE) and
     permission overlays (FEAT_S1POE), including nested virt + the
     emulated page table walker

   - Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This
     call was introduced in PSCIv1.3 as a mechanism to request
     hibernation, similar to the S4 state in ACPI

   - Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As
     part of it, introduce trivial initialization of the host's MPAM
     context so KVM can use the corresponding traps

   - PMU support under nested virtualization, honoring the guest
     hypervisor's trap configuration and event filtering when running a
     nested guest

   - Fixes to vgic ITS serialization where stale device/interrupt table
     entries are not zeroed when the mapping is invalidated by the VM

   - Avoid emulated MMIO completion if userspace has requested
     synchronous external abort injection

   - Various fixes and cleanups affecting pKVM, vCPU initialization, and
     selftests

  LoongArch:

   - Add iocsr and mmio bus simulation in kernel.

   - Add in-kernel interrupt controller emulation.

   - Add support for virtualization extensions to the eiointc irqchip.

  PPC:

   - Drop lingering and utterly obsolete references to PPC970 KVM, which
     was removed 10 years ago.

   - Fix incorrect documentation references to non-existing ioctls

  RISC-V:

   - Accelerate KVM RISC-V when running as a guest

   - Perf support to collect KVM guest statistics from host side

  s390:

   - New selftests: more ucontrol selftests and CPU model sanity checks

   - Support for the gen17 CPU model

   - List registers supported by KVM_GET/SET_ONE_REG in the
     documentation

  x86:

   - Cleanup KVM's handling of Accessed and Dirty bits to dedup code,
     improve documentation, harden against unexpected changes.

     Even if the hardware A/D tracking is disabled, it is possible to
     use the hardware-defined A/D bits to track if a PFN is Accessed
     and/or Dirty, and that removes a lot of special cases.

   - Elide TLB flushes when aging secondary PTEs, as has been done in
     x86's primary MMU for over 10 years.

   - Recover huge pages in-place in the TDP MMU when dirty page logging
     is toggled off, instead of zapping them and waiting until the page
     is re-accessed to create a huge mapping. This reduces vCPU jitter.

   - Batch TLB flushes when dirty page logging is toggled off. This
     reduces the time it takes to disable dirty logging by ~3x.

   - Remove the shrinker that was (poorly) attempting to reclaim shadow
     page tables in low-memory situations.

   - Clean up and optimize KVM's handling of writes to
     MSR_IA32_APICBASE.

   - Advertise CPUIDs for new instructions in Clearwater Forest

    - Quirk KVM's misguided behavior of initializing certain feature MSRs
     to their maximum supported feature set, which can result in KVM
     creating invalid vCPU state. E.g. initializing PERF_CAPABILITIES to
     a non-zero value results in the vCPU having invalid state if
     userspace hides PDCM from the guest, which in turn can lead to
     save/restore failures.

   - Fix KVM's handling of non-canonical checks for vCPUs that support
     LA57 to better follow the "architecture", in quotes because the
     actual behavior is poorly documented. E.g. most MSR writes and
     descriptor table loads ignore CR4.LA57 and operate purely on
     whether the CPU supports LA57.

   - Bypass the register cache when querying CPL from kvm_sched_out(),
     as filling the cache from IRQ context is generally unsafe; harden
      the cache accessors to try to prevent similar issues from occurring
     in the future. The issue that triggered this change was already
     fixed in 6.12, but was still kinda latent.

   - Advertise AMD_IBPB_RET to userspace, and fix a related bug where
     KVM over-advertises SPEC_CTRL when trying to support cross-vendor
     VMs.

   - Minor cleanups

   - Switch hugepage recovery thread to use vhost_task.

     These kthreads can consume significant amounts of CPU time on
     behalf of a VM or in response to how the VM behaves (for example
     how it accesses its memory); therefore KVM tried to place the
     thread in the VM's cgroups and charge the CPU time consumed by that
     work to the VM's container.

     However the kthreads did not process SIGSTOP/SIGCONT, and therefore
     cgroups which had KVM instances inside could not complete freezing.

     Fix this by replacing the kthread with a PF_USER_WORKER thread, via
     the vhost_task abstraction. Another 100+ lines removed, with
     generally better behavior too like having these threads properly
     parented in the process tree.

   - Revert a workaround for an old CPU erratum (Nehalem/Westmere) that
     didn't really work; there was really nothing to work around anyway:
     the broken patch was meant to fix nested virtualization, but the
     PERF_GLOBAL_CTRL MSR is virtualized and therefore unaffected by the
     erratum.

    - Fix a 6.12 regression where CONFIG_KVM would be built as a module even
      if asked to be built-in, as long as neither KVM_INTEL nor KVM_AMD is
     'y'.

  x86 selftests:

   - x86 selftests can now use AVX.

  Documentation:

   - Use rST internal links

   - Reorganize the introduction to the API document

  Generic:

   - Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock
      instead of RCU, so that running a vCPU on a different task doesn't
      encounter long delays from having to wait for all CPUs to become
      quiescent.

     In general both reads and writes are rare, but userspace that
     supports confidential computing is introducing the use of "helper"
     vCPUs that may jump from one host processor to another. Those will
     be very happy to trigger a synchronize_rcu(), and the effect on
     performance is quite the disaster"
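
A schematic before/after of that vcpu->pid change (the pid_lock field name and
the surrounding code are illustrative assumptions, not necessarily KVM's exact
implementation):

    /* Before: RCU-protected pointer; the writer pays for a grace period. */
    rcu_assign_pointer(vcpu->pid, new_pid);
    synchronize_rcu();                      /* can take a long time */

    /* After: a per-vCPU rwlock; the writer only excludes concurrent readers. */
    write_lock(&vcpu->pid_lock);
    vcpu->pid = new_pid;
    write_unlock(&vcpu->pid_lock);

    /* Readers outside vcpu->mutex take the lock for reading. */
    read_lock(&vcpu->pid_lock);
    pid = get_pid(vcpu->pid);
    read_unlock(&vcpu->pid_lock);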

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (298 commits)
  KVM: x86: Break CONFIG_KVM_X86's direct dependency on KVM_INTEL || KVM_AMD
  KVM: x86: add back X86_LOCAL_APIC dependency
  Revert "KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()"
  KVM: x86: switch hugepage recovery thread to vhost_task
  KVM: x86: expose MSR_PLATFORM_INFO as a feature MSR
  x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest
  Documentation: KVM: fix malformed table
  irqchip/loongson-eiointc: Add virt extension support
  LoongArch: KVM: Add irqfd support
  LoongArch: KVM: Add PCHPIC user mode read and write functions
  LoongArch: KVM: Add PCHPIC read and write functions
  LoongArch: KVM: Add PCHPIC device support
  LoongArch: KVM: Add EIOINTC user mode read and write functions
  LoongArch: KVM: Add EIOINTC read and write functions
  LoongArch: KVM: Add EIOINTC device support
  LoongArch: KVM: Add IPI user mode read and write function
  LoongArch: KVM: Add IPI read and write function
  LoongArch: KVM: Add IPI device support
  LoongArch: KVM: Add iocsr and mmio bus simulation in kernel
  KVM: arm64: Pass on SVE mapping failures
  ...
Commit 9f16d5e6f2 by Linus Torvalds, 2024-11-23 16:00:50 -08:00
183 changed files with 9183 additions and 2654 deletions


@ -152,6 +152,8 @@ infrastructure:
+------------------------------+---------+---------+
| DIT | [51-48] | y |
+------------------------------+---------+---------+
| MPAM | [43-40] | n |
+------------------------------+---------+---------+
| SVE | [35-32] | y |
+------------------------------+---------+---------+
| GIC | [27-24] | n |


@ -85,6 +85,70 @@ to CPUINTC directly::
| Devices |
+---------+
Virtual Extended IRQ model
==========================
In this model, IPIs (Inter-Processor Interrupts) and CPU local timer interrupts
go to CPUINTC directly, CPU UART interrupts go to PCH-PIC, while all other
device interrupts go to PCH-PIC/PCH-MSI, are gathered by V-EIOINTC (Virtual
Extended I/O Interrupt Controller), and then go to CPUINTC directly::
+-----+ +-------------------+ +-------+
| IPI |--> | CPUINTC(0-255vcpu)| <-- | Timer |
+-----+ +-------------------+ +-------+
^
|
+-----------+
| V-EIOINTC |
+-----------+
^ ^
| |
+---------+ +---------+
| PCH-PIC | | PCH-MSI |
+---------+ +---------+
^ ^ ^
| | |
+--------+ +---------+ +---------+
| UARTs | | Devices | | Devices |
+--------+ +---------+ +---------+
Description
-----------
V-EIOINTC (Virtual Extended I/O Interrupt Controller) is an extension of
EIOINTC; it only works in VM mode, i.e. in a guest running under the KVM
hypervisor. Interrupts can be routed to at most four vCPUs via the standard
EIOINTC, whereas with V-EIOINTC interrupts can be routed to up to 256 virtual
CPUs.
With the standard EIOINTC, an interrupt routing setting consists of two parts:
eight bits for CPU selection and four bits for CPU IP (Interrupt Pin)
selection. Of the CPU-selection bits, four select the EIOINTC node and four
select the CPU within that node. A bitmap encoding is used for both CPU
selection and CPU IP selection, so an interrupt can only be routed to
CPU0 - CPU3 and IP0 - IP3 within one EIOINTC node.
V-EIOINTC can route interrupts to more CPUs and CPU IPs (Interrupt Pins); two
newly added registers control this.
EXTIOI_VIRT_FEATURES
--------------------
This is a read-only register that indicates the features supported by
V-EIOINTC. The features EXTIOI_HAS_INT_ENCODE and EXTIOI_HAS_CPU_ENCODE are
added.
EXTIOI_HAS_INT_ENCODE is part of the standard EIOINTC. If it is 1, the CPU
Interrupt Pin selection can use normal encoding rather than the bitmap method,
so an interrupt can be routed to IP0 - IP15.
EXTIOI_HAS_CPU_ENCODE is an extension of V-EIOINTC. If it is 1, the CPU
selection can use normal encoding rather than the bitmap method, so an
interrupt can be routed to CPU0 - CPU255.
EXTIOI_VIRT_CONFIG
------------------
This is a read-write register. For compatibility, interrupt routing defaults
to the same method as the standard EIOINTC. If the corresponding bit is set to
1, it directs the hardware to use normal encoding rather than the bitmap
method.
Advanced Extended IRQ model
===========================


@ -87,6 +87,61 @@ PCH-LPC/PCH-MSI, then gathered by EIOINTC and delivered directly to CPUINTC::
| Devices |
+---------+
Virtual Extended IRQ model
==========================
In this model, IPIs (Inter-Processor Interrupts) and CPU local timer interrupts
are sent directly to CPUINTC, CPU UART interrupts are sent to PCH-PIC, while
all other device interrupts are sent to the PCH-PIC/PCH-MSI they are attached
to, then gathered by V-EIOINTC and delivered directly to CPUINTC::
+-----+ +-------------------+ +-------+
| IPI |--> | CPUINTC(0-255vcpu)| <-- | Timer |
+-----+ +-------------------+ +-------+
^
|
+-----------+
| V-EIOINTC |
+-----------+
^ ^
| |
+---------+ +---------+
| PCH-PIC | | PCH-MSI |
+---------+ +---------+
^ ^ ^
| | |
+--------+ +---------+ +---------+
| UARTs | | Devices | | Devices |
+--------+ +---------+ +---------+
V-EIOINTC is an extension of EIOINTC and only works in virtual machine mode.
Through EIOINTC an interrupt can be routed to at most four virtual CPUs, but
through V-EIOINTC an interrupt can be routed to up to 256 virtual CPUs.
In the traditional EIOINTC interrupt controller, interrupt routing consists of
two parts: 8 bits select which CPU to route to and 4 bits select which
interrupt pin of that CPU. Of the 8 CPU-routing bits, the first 4 select the
EIOINTC node and the last 4 select the CPU within that node. Both CPU routing
and CPU interrupt-pin routing use bitmap encoding rather than normal encoding,
so within one EIOINTC node an interrupt can only be routed to CPU0 - CPU3 and
interrupt pins IP0 - IP3.
V-EIOINTC adds two registers that allow interrupts to be routed to more CPUs
and interrupt pins.
V-EIOINTC feature register
--------------------------
The feature register is a read-only register showing the features supported by
V-EIOINTC; currently two features are supported: EXTIOI_HAS_INT_ENCODE and
EXTIOI_HAS_CPU_ENCODE.
EXTIOI_HAS_INT_ENCODE is a feature of the traditional EIOINTC interrupt
controller. If this bit is 1, CPU interrupt-pin routing supports normal
encoding instead of bitmap encoding, so an interrupt can be routed to pins
IP0 - IP15.
EXTIOI_HAS_CPU_ENCODE is a new V-EIOINTC feature. If this bit is 1, CPU routing
supports normal encoding instead of bitmap encoding, so an interrupt can be
routed to CPU0 - CPU255.
V-EIOINTC configuration register
--------------------------------
The configuration register is a read-write register. For compatibility, if
this register is not written, interrupt routing uses the same settings as the
traditional EIOINTC. If the corresponding bit is set to 1, normal routing
encoding is used instead of bitmap encoding.
Advanced Extended IRQ model
===========================


@ -7,8 +7,19 @@ The Definitive KVM (Kernel-based Virtual Machine) API Documentation
1. General description
======================
The kvm API is a set of ioctls that are issued to control various aspects
of a virtual machine. The ioctls belong to the following classes:
The kvm API is centered around different kinds of file descriptors
and ioctls that can be issued to these file descriptors. An initial
open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
handle will create a VM file descriptor which can be used to issue VM
ioctls. A KVM_CREATE_VCPU or KVM_CREATE_DEVICE ioctl on a VM fd will
create a virtual cpu or device and return a file descriptor pointing to
the new resource.
In other words, the kvm API is a set of ioctls that are issued to
different kinds of file descriptor in order to control various aspects of
a virtual machine. Depending on the file descriptor that accepts them,
ioctls belong to the following classes:
- System ioctls: These query and set global attributes which affect the
whole kvm subsystem. In addition a system ioctl is used to create
@ -35,18 +46,19 @@ of a virtual machine. The ioctls belong to the following classes:
device ioctls must be issued from the same process (address space) that
was used to create the VM.
2. File descriptors
===================
While most ioctls are specific to one kind of file descriptor, in some
cases the same ioctl can belong to more than one class.
The kvm API is centered around file descriptors. An initial
open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
handle will create a VM file descriptor which can be used to issue VM
ioctls. A KVM_CREATE_VCPU or KVM_CREATE_DEVICE ioctl on a VM fd will
create a virtual cpu or device and return a file descriptor pointing to
the new resource. Finally, ioctls on a vcpu or device fd can be used
to control the vcpu or device. For vcpus, this includes the important
task of actually running guest code.
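For example, a minimal userspace sequence following this fd hierarchy could
look like the following (error handling omitted; illustrative only)::

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC); /* system fd */
        int api = ioctl(kvm, KVM_GET_API_VERSION, 0);   /* always 12 */
        int vm  = ioctl(kvm, KVM_CREATE_VM, 0);         /* VM fd */
        int cpu = ioctl(vm, KVM_CREATE_VCPU, 0);        /* vCPU fd, id 0 */

        printf("API %d, vm fd %d, vcpu fd %d\n", api, vm, cpu);
        return 0;
    }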
The KVM API grew over time. For this reason, KVM defines many constants
of the form ``KVM_CAP_*``, each corresponding to a set of functionality
provided by one or more ioctls. Availability of these "capabilities" can
be checked with :ref:`KVM_CHECK_EXTENSION <KVM_CHECK_EXTENSION>`. Some
capabilities also need to be enabled for VMs or VCPUs where their
functionality is desired (see :ref:`cap_enable` and :ref:`cap_enable_vm`).
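A capability is typically probed on the fd it applies to and, if needed,
enabled with KVM_ENABLE_CAP; a sketch using one VM capability as an example
(the ring size argument here is only illustrative)::

    struct kvm_enable_cap cap = {
        .cap = KVM_CAP_DIRTY_LOG_RING,
        .args = { 65536 },      /* ring size in bytes, for example */
    };

    if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_DIRTY_LOG_RING) > 0)
        ioctl(vm_fd, KVM_ENABLE_CAP, &cap);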
2. Restrictions
===============
In general file descriptors can be migrated among processes by means
of fork() and the SCM_RIGHTS facility of unix domain socket. These
@ -96,12 +108,9 @@ description:
Capability:
which KVM extension provides this ioctl. Can be 'basic',
which means that it will be provided by any kernel that supports
API version 12 (see section 4.1), a KVM_CAP_xyz constant, which
means availability needs to be checked with KVM_CHECK_EXTENSION
(see section 4.4), or 'none' which means that while not all kernels
support this ioctl, there's no capability bit to check its
availability: for kernels that don't support the ioctl,
the ioctl returns -ENOTTY.
API version 12 (see :ref:`KVM_GET_API_VERSION <KVM_GET_API_VERSION>`),
or a KVM_CAP_xyz constant that can be checked with
:ref:`KVM_CHECK_EXTENSION <KVM_CHECK_EXTENSION>`.
Architectures:
which instruction set architectures provide this ioctl.
@ -118,6 +127,8 @@ description:
are not detailed, but errors with specific meanings are.
.. _KVM_GET_API_VERSION:
4.1 KVM_GET_API_VERSION
-----------------------
@ -246,6 +257,8 @@ This list also varies by kvm version and host processor, but does not change
otherwise.
.. _KVM_CHECK_EXTENSION:
4.4 KVM_CHECK_EXTENSION
-----------------------
@ -288,7 +301,7 @@ the VCPU file descriptor can be mmap-ed, including:
- if KVM_CAP_DIRTY_LOG_RING is available, a number of pages at
KVM_DIRTY_LOG_PAGE_OFFSET * PAGE_SIZE. For more information on
KVM_CAP_DIRTY_LOG_RING, see section 8.3.
KVM_CAP_DIRTY_LOG_RING, see :ref:`KVM_CAP_DIRTY_LOG_RING`.
4.7 KVM_CREATE_VCPU
@ -338,8 +351,8 @@ KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual
cpu's hardware control block.
4.8 KVM_GET_DIRTY_LOG (vm ioctl)
--------------------------------
4.8 KVM_GET_DIRTY_LOG
---------------------
:Capability: basic
:Architectures: all
@ -1298,7 +1311,7 @@ See KVM_GET_VCPU_EVENTS for the data structure.
:Capability: KVM_CAP_DEBUGREGS
:Architectures: x86
:Type: vm ioctl
:Type: vcpu ioctl
:Parameters: struct kvm_debugregs (out)
:Returns: 0 on success, -1 on error
@ -1320,7 +1333,7 @@ Reads debug registers from the vcpu.
:Capability: KVM_CAP_DEBUGREGS
:Architectures: x86
:Type: vm ioctl
:Type: vcpu ioctl
:Parameters: struct kvm_debugregs (in)
:Returns: 0 on success, -1 on error
@ -1429,6 +1442,8 @@ because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).
.. _KVM_ENABLE_CAP:
4.37 KVM_ENABLE_CAP
-------------------
@ -2116,8 +2131,8 @@ TLB, prior to calling KVM_RUN on the associated vcpu.
The "bitmap" field is the userspace address of an array. This array
consists of a number of bits, equal to the total number of TLB entries as
determined by the last successful call to KVM_CONFIG_TLB, rounded up to the
nearest multiple of 64.
determined by the last successful call to ``KVM_ENABLE_CAP(KVM_CAP_SW_TLB)``,
rounded up to the nearest multiple of 64.
Each bit corresponds to one TLB entry, ordered the same as in the shared TLB
array.
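For instance, userspace sizing that bitmap could compute (illustrative only)::

    /* One bit per TLB entry, rounded up to a multiple of 64 bits. */
    size_t words = (num_tlb_entries + 63) / 64;
    uint64_t *bitmap = calloc(words, sizeof(uint64_t));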
@ -2170,42 +2185,6 @@ userspace update the TCE table directly which is useful in some
circumstances.
4.63 KVM_ALLOCATE_RMA
---------------------
:Capability: KVM_CAP_PPC_RMA
:Architectures: powerpc
:Type: vm ioctl
:Parameters: struct kvm_allocate_rma (out)
:Returns: file descriptor for mapping the allocated RMA
This allocates a Real Mode Area (RMA) from the pool allocated at boot
time by the kernel. An RMA is a physically-contiguous, aligned region
of memory used on older POWER processors to provide the memory which
will be accessed by real-mode (MMU off) accesses in a KVM guest.
POWER processors support a set of sizes for the RMA that usually
includes 64MB, 128MB, 256MB and some larger powers of two.
::
/* for KVM_ALLOCATE_RMA */
struct kvm_allocate_rma {
__u64 rma_size;
};
The return value is a file descriptor which can be passed to mmap(2)
to map the allocated RMA into userspace. The mapped area can then be
passed to the KVM_SET_USER_MEMORY_REGION ioctl to establish it as the
RMA for a virtual machine. The size of the RMA in bytes (which is
fixed at host kernel boot time) is returned in the rma_size field of
the argument structure.
The KVM_CAP_PPC_RMA capability is 1 or 2 if the KVM_ALLOCATE_RMA ioctl
is supported; 2 if the processor requires all virtual machines to have
an RMA, or 1 if the processor can use an RMA but doesn't require it,
because it supports the Virtual RMA (VRMA) facility.
4.64 KVM_NMI
------------
@ -2602,7 +2581,7 @@ Specifically:
======================= ========= ===== =======================================
.. [1] These encodings are not accepted for SVE-enabled vcpus. See
KVM_ARM_VCPU_INIT.
:ref:`KVM_ARM_VCPU_INIT`.
The equivalent register content can be accessed via bits [127:0] of
the corresponding SVE Zn registers instead for vcpus that have SVE
@ -3593,6 +3572,27 @@ Errors:
This ioctl returns the guest registers that are supported for the
KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
Note that s390 does not support KVM_GET_REG_LIST for historical reasons
(read: nobody cared). The set of registers in kernels 4.x and newer is:
- KVM_REG_S390_TODPR
- KVM_REG_S390_EPOCHDIFF
- KVM_REG_S390_CPU_TIMER
- KVM_REG_S390_CLOCK_COMP
- KVM_REG_S390_PFTOKEN
- KVM_REG_S390_PFCOMPARE
- KVM_REG_S390_PFSELECT
- KVM_REG_S390_PP
- KVM_REG_S390_GBEA
4.85 KVM_ARM_SET_DEVICE_ADDR (deprecated)
-----------------------------------------
@ -4956,8 +4956,8 @@ Coalesced pio is based on coalesced mmio. There is little difference
between coalesced mmio and pio except that coalesced pio records accesses
to I/O ports.
4.117 KVM_CLEAR_DIRTY_LOG (vm ioctl)
------------------------------------
4.117 KVM_CLEAR_DIRTY_LOG
-------------------------
:Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
:Architectures: x86, arm64, mips
@ -5093,8 +5093,8 @@ Recognised values for feature:
Finalizes the configuration of the specified vcpu feature.
The vcpu must already have been initialised, enabling the affected feature, by
means of a successful KVM_ARM_VCPU_INIT call with the appropriate flag set in
features[].
means of a successful :ref:`KVM_ARM_VCPU_INIT <KVM_ARM_VCPU_INIT>` call with the
appropriate flag set in features[].
For affected vcpu features, this is a mandatory step that must be performed
before the vcpu is fully usable.
@ -5266,7 +5266,7 @@ the cpu reset definition in the POP (Principles Of Operation).
4.123 KVM_S390_INITIAL_RESET
----------------------------
:Capability: none
:Capability: basic
:Architectures: s390
:Type: vcpu ioctl
:Parameters: none
@ -6205,7 +6205,7 @@ applied.
.. _KVM_ARM_GET_REG_WRITABLE_MASKS:
4.139 KVM_ARM_GET_REG_WRITABLE_MASKS
-------------------------------------------
------------------------------------
:Capability: KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES
:Architectures: arm64
@ -6443,6 +6443,8 @@ the capability to be present.
`flags` must currently be zero.
.. _kvm_run:
5. The kvm_run structure
========================
@ -6855,6 +6857,10 @@ the first `ndata` items (possibly zero) of the data array are valid.
the guest issued a SYSTEM_RESET2 call according to v1.1 of the PSCI
specification.
- for arm64, data[0] is set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2
if the guest issued a SYSTEM_OFF2 call according to v1.3 of the PSCI
specification.
- for RISC-V, data[0] is set to the value of the second argument of the
``sbi_system_reset`` call.
@ -6888,6 +6894,12 @@ either:
- Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2
"Caller responsibilities" for possible return values.
Hibernation using the PSCI SYSTEM_OFF2 call is enabled when PSCI v1.3
is enabled. If a guest invokes the PSCI SYSTEM_OFF2 function, KVM will
exit to userspace with the KVM_SYSTEM_EVENT_SHUTDOWN event type and with
data[0] set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2. The only
supported hibernate type for the SYSTEM_OFF2 function is HIBERNATE_OFF.
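Userspace handling of this exit might look like the following sketch (field
names follow the kvm_run system_event layout described in this section)::

    if (run->exit_reason == KVM_EXIT_SYSTEM_EVENT &&
        run->system_event.type == KVM_SYSTEM_EVENT_SHUTDOWN) {
            bool hibernate = run->system_event.ndata > 0 &&
                             (run->system_event.data[0] &
                              KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2);
            /* hibernate the VM if requested, otherwise power it off */
    }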
::
/* KVM_EXIT_IOAPIC_EOI */
@ -7162,11 +7174,15 @@ primary storage for certain register types. Therefore, the kernel may use the
values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
.. _cap_enable:
6. Capabilities that can be enabled on vCPUs
============================================
There are certain capabilities that change the behavior of the virtual CPU or
the virtual machine when enabled. To enable them, please see section 4.37.
the virtual machine when enabled. To enable them, please see
:ref:`KVM_ENABLE_CAP`.
Below you can find a list of capabilities and what their effect on the vCPU or
the virtual machine is when enabling them.
@ -7375,7 +7391,7 @@ KVM API and also from the guest.
sets are supported
(bitfields defined in arch/x86/include/uapi/asm/kvm.h).
As described above in the kvm_sync_regs struct info in section 5 (kvm_run):
As described above in the kvm_sync_regs struct info in section :ref:`kvm_run`,
KVM_CAP_SYNC_REGS "allow[s] userspace to access certain guest registers
without having to call SET/GET_*REGS". This reduces overhead by eliminating
repeated ioctl calls for setting and/or getting register values. This is
@ -7421,13 +7437,15 @@ Unused bitfields in the bitarrays must be set to zero.
This capability connects the vcpu to an in-kernel XIVE device.
.. _cap_enable_vm:
7. Capabilities that can be enabled on VMs
==========================================
There are certain capabilities that change the behavior of the virtual
machine when enabled. To enable them, please see section 4.37. Below
you can find a list of capabilities and what their effect on the VM
is when enabling them.
machine when enabled. To enable them, please see section
:ref:`KVM_ENABLE_CAP`. Below you can find a list of capabilities and
what their effect on the VM is when enabling them.
The following information is provided along with the description:
@ -8107,6 +8125,28 @@ KVM_X86_QUIRK_SLOT_ZAP_ALL By default, for KVM_X86_DEFAULT_VM VMs, KVM
or moved memslot isn't reachable, i.e KVM
_may_ invalidate only SPTEs related to the
memslot.
KVM_X86_QUIRK_STUFF_FEATURE_MSRS By default, at vCPU creation, KVM sets the
vCPU's MSR_IA32_PERF_CAPABILITIES (0x345),
MSR_IA32_ARCH_CAPABILITIES (0x10a),
MSR_PLATFORM_INFO (0xce), and all VMX MSRs
(0x480..0x492) to the maximal capabilities
supported by KVM. KVM also sets
MSR_IA32_UCODE_REV (0x8b) to an arbitrary
value (which is different for Intel vs.
AMD). Lastly, when guest CPUID is set (by
userspace), KVM modifies select VMX MSR
fields to force consistency between guest
CPUID and L2's effective ISA. When this
quirk is disabled, KVM zeroes the vCPU's MSR
values (with two exceptions, see below),
i.e. treats the feature MSRs like CPUID
leaves and gives userspace full control of
the vCPU model definition. This quirk does
not affect VMX MSRs CR0/CR4_FIXED1 (0x487
and 0x489), as KVM does not allow them to
be set by userspace (KVM sets them based on
guest CPUID, for safety purposes).
=================================== ============================================
7.32 KVM_CAP_MAX_VCPU_ID
@ -8588,6 +8628,8 @@ guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
(0x40000001). Otherwise, a guest may use the paravirtual features
regardless of what has actually been exposed through the CPUID leaf.
.. _KVM_CAP_DIRTY_LOG_RING:
8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
----------------------------------------------------------


@ -135,8 +135,8 @@ We dirty-log for gfn1, that means gfn2 is lost in dirty-bitmap.
For direct sp, we can easily avoid it since the spte of direct sp is fixed
to gfn. For indirect sp, we disabled fast page fault for simplicity.
A solution for indirect sp could be to pin the gfn, for example via
gfn_to_pfn_memslot_atomic, before the cmpxchg. After the pinning:
A solution for indirect sp could be to pin the gfn before the cmpxchg. After
the pinning:
- We have held the refcount of pfn; that means the pfn can not be freed and
be reused for another gfn.
@ -147,49 +147,51 @@ Then, we can ensure the dirty bitmaps is correctly set for a gfn.
2) Dirty bit tracking
In the origin code, the spte can be fast updated (non-atomically) if the
In the original code, the spte can be fast updated (non-atomically) if the
spte is read-only and the Accessed bit has already been set since the
Accessed bit and Dirty bit can not be lost.
But it is not true after fast page fault since the spte can be marked
writable between reading spte and updating spte. Like below case:
+------------------------------------------------------------------------+
| At the beginning:: |
| |
| spte.W = 0 |
| spte.Accessed = 1 |
+------------------------------------+-----------------------------------+
| CPU 0: | CPU 1: |
+------------------------------------+-----------------------------------+
| In mmu_spte_clear_track_bits():: | |
| | |
| old_spte = *spte; | |
| | |
| | |
| /* 'if' condition is satisfied. */| |
| if (old_spte.Accessed == 1 && | |
| old_spte.W == 0) | |
| spte = 0ull; | |
+------------------------------------+-----------------------------------+
| | on fast page fault path:: |
| | |
| | spte.W = 1 |
| | |
| | memory write on the spte:: |
| | |
| | spte.Dirty = 1 |
+------------------------------------+-----------------------------------+
| :: | |
| | |
| else | |
| old_spte = xchg(spte, 0ull) | |
| if (old_spte.Accessed == 1) | |
| kvm_set_pfn_accessed(spte.pfn);| |
| if (old_spte.Dirty == 1) | |
| kvm_set_pfn_dirty(spte.pfn); | |
| OOPS!!! | |
+------------------------------------+-----------------------------------+
+-------------------------------------------------------------------------+
| At the beginning:: |
| |
| spte.W = 0 |
| spte.Accessed = 1 |
+-------------------------------------+-----------------------------------+
| CPU 0: | CPU 1: |
+-------------------------------------+-----------------------------------+
| In mmu_spte_update():: | |
| | |
| old_spte = *spte; | |
| | |
| | |
| /* 'if' condition is satisfied. */ | |
| if (old_spte.Accessed == 1 && | |
| old_spte.W == 0) | |
| spte = new_spte; | |
+-------------------------------------+-----------------------------------+
| | on fast page fault path:: |
| | |
| | spte.W = 1 |
| | |
| | memory write on the spte:: |
| | |
| | spte.Dirty = 1 |
+-------------------------------------+-----------------------------------+
| :: | |
| | |
| else | |
| old_spte = xchg(spte, new_spte);| |
| if (old_spte.Accessed && | |
| !new_spte.Accessed) | |
| flush = true; | |
| if (old_spte.Dirty && | |
| !new_spte.Dirty) | |
| flush = true; | |
| OOPS!!! | |
+-------------------------------------+-----------------------------------+
The Dirty bit is lost in this case.


@ -33,6 +33,18 @@ Note however that any software (e.g ``WIN87EM.DLL``) expecting these features
to be present likely predates these CPUID feature bits, and therefore
doesn't know to check for them anyway.
``KVM_SET_VCPU_EVENTS`` issue
-----------------------------
Invalid KVM_SET_VCPU_EVENTS input with respect to error codes *may* result in
failed VM-Entry on Intel CPUs. Pre-CET Intel CPUs require that exception
injection through the VMCS correctly set the "error code valid" flag, e.g.
require the flag be set when injecting a #GP, clear when injecting a #UD,
clear when injecting a soft exception, etc. Intel CPUs that enumerate
IA32_VMX_BASIC[56] as '1' relax VMX's consistency checks, and AMD CPUs have no
restrictions whatsoever. KVM_SET_VCPU_EVENTS doesn't sanity check the vector
versus "has_error_code", i.e. KVM's ABI follows AMD behavior.
Nested virtualization features
------------------------------


@ -46,6 +46,7 @@ struct cpuinfo_arm64 {
u64 reg_revidr;
u64 reg_gmid;
u64 reg_smidr;
u64 reg_mpamidr;
u64 reg_id_aa64dfr0;
u64 reg_id_aa64dfr1;


@ -62,6 +62,11 @@ cpucap_is_possible(const unsigned int cap)
return IS_ENABLED(CONFIG_ARM64_WORKAROUND_REPEAT_TLBI);
case ARM64_WORKAROUND_SPECULATIVE_SSBS:
return IS_ENABLED(CONFIG_ARM64_ERRATUM_3194386);
case ARM64_MPAM:
/*
* KVM MPAM support doesn't rely on the host kernel supporting MPAM.
*/
return true;
}
return true;


@ -613,6 +613,13 @@ static inline bool id_aa64pfr1_sme(u64 pfr1)
return val > 0;
}
static inline bool id_aa64pfr0_mpam(u64 pfr0)
{
u32 val = cpuid_feature_extract_unsigned_field(pfr0, ID_AA64PFR0_EL1_MPAM_SHIFT);
return val > 0;
}
static inline bool id_aa64pfr1_mte(u64 pfr1)
{
u32 val = cpuid_feature_extract_unsigned_field(pfr1, ID_AA64PFR1_EL1_MTE_SHIFT);
@ -850,6 +857,16 @@ static inline bool system_supports_haft(void)
cpus_have_final_cap(ARM64_HAFT);
}
static __always_inline bool system_supports_mpam(void)
{
return alternative_has_cap_unlikely(ARM64_MPAM);
}
static __always_inline bool system_supports_mpam_hcr(void)
{
return alternative_has_cap_unlikely(ARM64_MPAM_HCR);
}
int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
bool try_emulate_mrs(struct pt_regs *regs, u32 isn);


@ -249,6 +249,19 @@
msr spsr_el2, x0
.endm
.macro __init_el2_mpam
/* Memory Partitioning And Monitoring: disable EL2 traps */
mrs x1, id_aa64pfr0_el1
ubfx x0, x1, #ID_AA64PFR0_EL1_MPAM_SHIFT, #4
cbz x0, .Lskip_mpam_\@ // skip if no MPAM
msr_s SYS_MPAM2_EL2, xzr // use the default partition
// and disable lower traps
mrs_s x0, SYS_MPAMIDR_EL1
tbz x0, #MPAMIDR_EL1_HAS_HCR_SHIFT, .Lskip_mpam_\@ // skip if no MPAMHCR reg
msr_s SYS_MPAMHCR_EL2, xzr // clear TRAP_MPAMIDR_EL1 -> EL2
.Lskip_mpam_\@:
.endm
/**
* Initialize EL2 registers to sane values. This should be called early on all
* cores that were booted in EL2. Note that everything gets initialised as
@ -266,6 +279,7 @@
__init_el2_stage2
__init_el2_gicv3
__init_el2_hstr
__init_el2_mpam
__init_el2_nvhe_idregs
__init_el2_cptr
__init_el2_fgt


@ -103,6 +103,7 @@
#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
#define HCRX_HOST_FLAGS (HCRX_EL2_MSCEn | HCRX_EL2_TCR2En | HCRX_EL2_EnFPM)
#define MPAMHCR_HOST_FLAGS 0
/* TCR_EL2 Registers bits */
#define TCR_EL2_DS (1UL << 32)
@ -311,35 +312,6 @@
GENMASK(19, 18) | \
GENMASK(15, 0))
/* Hyp Debug Configuration Register bits */
#define MDCR_EL2_E2TB_MASK (UL(0x3))
#define MDCR_EL2_E2TB_SHIFT (UL(24))
#define MDCR_EL2_HPMFZS (UL(1) << 36)
#define MDCR_EL2_HPMFZO (UL(1) << 29)
#define MDCR_EL2_MTPME (UL(1) << 28)
#define MDCR_EL2_TDCC (UL(1) << 27)
#define MDCR_EL2_HLP (UL(1) << 26)
#define MDCR_EL2_HCCD (UL(1) << 23)
#define MDCR_EL2_TTRF (UL(1) << 19)
#define MDCR_EL2_HPMD (UL(1) << 17)
#define MDCR_EL2_TPMS (UL(1) << 14)
#define MDCR_EL2_E2PB_MASK (UL(0x3))
#define MDCR_EL2_E2PB_SHIFT (UL(12))
#define MDCR_EL2_TDRA (UL(1) << 11)
#define MDCR_EL2_TDOSA (UL(1) << 10)
#define MDCR_EL2_TDA (UL(1) << 9)
#define MDCR_EL2_TDE (UL(1) << 8)
#define MDCR_EL2_HPME (UL(1) << 7)
#define MDCR_EL2_TPM (UL(1) << 6)
#define MDCR_EL2_TPMCR (UL(1) << 5)
#define MDCR_EL2_HPMN_MASK (UL(0x1F))
#define MDCR_EL2_RES0 (GENMASK(63, 37) | \
GENMASK(35, 30) | \
GENMASK(25, 24) | \
GENMASK(22, 20) | \
BIT(18) | \
GENMASK(16, 15))
/*
* FGT register definitions
*


@ -76,7 +76,6 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff,
__KVM_HOST_SMCCC_FUNC___vgic_v3_save_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_init_traps,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
__KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,


@ -225,6 +225,11 @@ static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
return vcpu_has_nv(vcpu) && __is_hyp_ctxt(&vcpu->arch.ctxt);
}
static inline bool vcpu_is_host_el0(const struct kvm_vcpu *vcpu)
{
return is_hyp_ctxt(vcpu) && !vcpu_is_el2(vcpu);
}
/*
* The layout of SPSR for an AArch32 state is different when observed from an
* AArch64 SPSR_ELx or an AArch32 SPSR_*. This function generates the AArch32
@ -693,4 +698,8 @@ static inline bool guest_hyp_sve_traps_enabled(const struct kvm_vcpu *vcpu)
return __guest_hyp_cptr_xen_trap_enabled(vcpu, ZEN);
}
static inline void kvm_vcpu_enable_ptrauth(struct kvm_vcpu *vcpu)
{
vcpu_set_flag(vcpu, GUEST_HAS_PTRAUTH);
}
#endif /* __ARM64_KVM_EMULATE_H__ */


@ -74,8 +74,6 @@ enum kvm_mode kvm_get_mode(void);
static inline enum kvm_mode kvm_get_mode(void) { return KVM_MODE_NONE; };
#endif
DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
extern unsigned int __ro_after_init kvm_sve_max_vl;
extern unsigned int __ro_after_init kvm_host_sve_max_vl;
int __init kvm_arm_init_sve(void);
@ -374,7 +372,7 @@ struct kvm_arch {
u64 ctr_el0;
/* Masks for VNCR-baked sysregs */
/* Masks for VNCR-backed and general EL2 sysregs */
struct kvm_sysreg_masks *sysreg_masks;
/*
@ -408,6 +406,9 @@ struct kvm_vcpu_fault_info {
r = __VNCR_START__ + ((VNCR_ ## r) / 8), \
__after_##r = __MAX__(__before_##r - 1, r)
#define MARKER(m) \
m, __after_##m = m - 1
enum vcpu_sysreg {
__INVALID_SYSREG__, /* 0 is reserved as an invalid value */
MPIDR_EL1, /* MultiProcessor Affinity Register */
@ -468,13 +469,15 @@ enum vcpu_sysreg {
/* EL2 registers */
SCTLR_EL2, /* System Control Register (EL2) */
ACTLR_EL2, /* Auxiliary Control Register (EL2) */
MDCR_EL2, /* Monitor Debug Configuration Register (EL2) */
CPTR_EL2, /* Architectural Feature Trap Register (EL2) */
HACR_EL2, /* Hypervisor Auxiliary Control Register */
ZCR_EL2, /* SVE Control Register (EL2) */
TTBR0_EL2, /* Translation Table Base Register 0 (EL2) */
TTBR1_EL2, /* Translation Table Base Register 1 (EL2) */
TCR_EL2, /* Translation Control Register (EL2) */
PIRE0_EL2, /* Permission Indirection Register 0 (EL2) */
PIR_EL2, /* Permission Indirection Register 1 (EL2) */
POR_EL2, /* Permission Overlay Register 2 (EL2) */
SPSR_EL2, /* EL2 saved program status register */
ELR_EL2, /* EL2 exception link register */
AFSR0_EL2, /* Auxiliary Fault Status Register 0 (EL2) */
@ -494,7 +497,13 @@ enum vcpu_sysreg {
CNTHV_CTL_EL2,
CNTHV_CVAL_EL2,
__VNCR_START__, /* Any VNCR-capable reg goes after this point */
/* Anything from this can be RES0/RES1 sanitised */
MARKER(__SANITISED_REG_START__),
TCR2_EL2, /* Extended Translation Control Register (EL2) */
MDCR_EL2, /* Monitor Debug Configuration Register (EL2) */
/* Any VNCR-capable reg goes after this point */
MARKER(__VNCR_START__),
VNCR(SCTLR_EL1),/* System Control Register */
VNCR(ACTLR_EL1),/* Auxiliary Control Register */
@ -554,7 +563,7 @@ struct kvm_sysreg_masks {
struct {
u64 res0;
u64 res1;
} mask[NR_SYS_REGS - __VNCR_START__];
} mask[NR_SYS_REGS - __SANITISED_REG_START__];
};
struct kvm_cpu_context {
@ -1002,13 +1011,13 @@ static inline u64 *___ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
#define ctxt_sys_reg(c,r) (*__ctxt_sys_reg(c,r))
u64 kvm_vcpu_sanitise_vncr_reg(const struct kvm_vcpu *, enum vcpu_sysreg);
u64 kvm_vcpu_apply_reg_masks(const struct kvm_vcpu *, enum vcpu_sysreg, u64);
#define __vcpu_sys_reg(v,r) \
(*({ \
const struct kvm_cpu_context *ctxt = &(v)->arch.ctxt; \
u64 *__r = __ctxt_sys_reg(ctxt, (r)); \
if (vcpu_has_nv((v)) && (r) >= __VNCR_START__) \
*__r = kvm_vcpu_sanitise_vncr_reg((v), (r)); \
if (vcpu_has_nv((v)) && (r) >= __SANITISED_REG_START__) \
*__r = kvm_vcpu_apply_reg_masks((v), (r), *__r);\
__r; \
}))
@ -1037,6 +1046,10 @@ static inline bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
case TTBR0_EL1: *val = read_sysreg_s(SYS_TTBR0_EL12); break;
case TTBR1_EL1: *val = read_sysreg_s(SYS_TTBR1_EL12); break;
case TCR_EL1: *val = read_sysreg_s(SYS_TCR_EL12); break;
case TCR2_EL1: *val = read_sysreg_s(SYS_TCR2_EL12); break;
case PIR_EL1: *val = read_sysreg_s(SYS_PIR_EL12); break;
case PIRE0_EL1: *val = read_sysreg_s(SYS_PIRE0_EL12); break;
case POR_EL1: *val = read_sysreg_s(SYS_POR_EL12); break;
case ESR_EL1: *val = read_sysreg_s(SYS_ESR_EL12); break;
case AFSR0_EL1: *val = read_sysreg_s(SYS_AFSR0_EL12); break;
case AFSR1_EL1: *val = read_sysreg_s(SYS_AFSR1_EL12); break;
@ -1083,6 +1096,10 @@ static inline bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
case TTBR0_EL1: write_sysreg_s(val, SYS_TTBR0_EL12); break;
case TTBR1_EL1: write_sysreg_s(val, SYS_TTBR1_EL12); break;
case TCR_EL1: write_sysreg_s(val, SYS_TCR_EL12); break;
case TCR2_EL1: write_sysreg_s(val, SYS_TCR2_EL12); break;
case PIR_EL1: write_sysreg_s(val, SYS_PIR_EL12); break;
case PIRE0_EL1: write_sysreg_s(val, SYS_PIRE0_EL12); break;
case POR_EL1: write_sysreg_s(val, SYS_POR_EL12); break;
case ESR_EL1: write_sysreg_s(val, SYS_ESR_EL12); break;
case AFSR0_EL1: write_sysreg_s(val, SYS_AFSR0_EL12); break;
case AFSR1_EL1: write_sysreg_s(val, SYS_AFSR1_EL12); break;
@ -1140,7 +1157,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
void kvm_arm_halt_guest(struct kvm *kvm);
void kvm_arm_resume_guest(struct kvm *kvm);
#define vcpu_has_run_once(vcpu) !!rcu_access_pointer((vcpu)->pid)
#define vcpu_has_run_once(vcpu) (!!READ_ONCE((vcpu)->pid))
#ifndef __KVM_NVHE_HYPERVISOR__
#define kvm_call_hyp_nvhe(f, ...) \
@ -1503,4 +1520,13 @@ void kvm_set_vm_id_reg(struct kvm *kvm, u32 reg, u64 val);
(system_supports_fpmr() && \
kvm_has_feat((k), ID_AA64PFR2_EL1, FPMR, IMP))
#define kvm_has_tcr2(k) \
(kvm_has_feat((k), ID_AA64MMFR3_EL1, TCRX, IMP))
#define kvm_has_s1pie(k) \
(kvm_has_feat((k), ID_AA64MMFR3_EL1, S1PIE, IMP))
#define kvm_has_s1poe(k) \
(kvm_has_feat((k), ID_AA64MMFR3_EL1, S1POE, IMP))
#endif /* __ARM64_KVM_HOST_H__ */


@ -674,10 +674,8 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
*
* If there is a valid, leaf page-table entry used to translate @addr, then
* set the access flag in that entry.
*
* Return: The old page-table entry prior to setting the flag, 0 on failure.
*/
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
void kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
/**
* kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access


@ -542,18 +542,6 @@
#define SYS_MAIR_EL2 sys_reg(3, 4, 10, 2, 0)
#define SYS_AMAIR_EL2 sys_reg(3, 4, 10, 3, 0)
#define SYS_MPAMHCR_EL2 sys_reg(3, 4, 10, 4, 0)
#define SYS_MPAMVPMV_EL2 sys_reg(3, 4, 10, 4, 1)
#define SYS_MPAM2_EL2 sys_reg(3, 4, 10, 5, 0)
#define __SYS__MPAMVPMx_EL2(x) sys_reg(3, 4, 10, 6, x)
#define SYS_MPAMVPM0_EL2 __SYS__MPAMVPMx_EL2(0)
#define SYS_MPAMVPM1_EL2 __SYS__MPAMVPMx_EL2(1)
#define SYS_MPAMVPM2_EL2 __SYS__MPAMVPMx_EL2(2)
#define SYS_MPAMVPM3_EL2 __SYS__MPAMVPMx_EL2(3)
#define SYS_MPAMVPM4_EL2 __SYS__MPAMVPMx_EL2(4)
#define SYS_MPAMVPM5_EL2 __SYS__MPAMVPMx_EL2(5)
#define SYS_MPAMVPM6_EL2 __SYS__MPAMVPMx_EL2(6)
#define SYS_MPAMVPM7_EL2 __SYS__MPAMVPMx_EL2(7)
#define SYS_VBAR_EL2 sys_reg(3, 4, 12, 0, 0)
#define SYS_RVBAR_EL2 sys_reg(3, 4, 12, 0, 1)


@ -50,7 +50,6 @@
#define VNCR_VBAR_EL1 0x250
#define VNCR_TCR2_EL1 0x270
#define VNCR_PIRE0_EL1 0x290
#define VNCR_PIRE0_EL2 0x298
#define VNCR_PIR_EL1 0x2A0
#define VNCR_POR_EL1 0x2A8
#define VNCR_ICH_LR0_EL2 0x400


@ -484,6 +484,12 @@ enum {
*/
#define KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2 (1ULL << 0)
/*
* Shutdown caused by a PSCI v1.3 SYSTEM_OFF2 call.
* Valid only when the system event has a type of KVM_SYSTEM_EVENT_SHUTDOWN.
*/
#define KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2 (1ULL << 0)
/* run->fail_entry.hardware_entry_failure_reason codes. */
#define KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED (1ULL << 0)


@ -688,6 +688,14 @@ static const struct arm64_ftr_bits ftr_id_dfr1[] = {
ARM64_FTR_END,
};
static const struct arm64_ftr_bits ftr_mpamidr[] = {
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, MPAMIDR_EL1_PMG_MAX_SHIFT, MPAMIDR_EL1_PMG_MAX_WIDTH, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, MPAMIDR_EL1_VPMR_MAX_SHIFT, MPAMIDR_EL1_VPMR_MAX_WIDTH, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, MPAMIDR_EL1_HAS_HCR_SHIFT, 1, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, MPAMIDR_EL1_PARTID_MAX_SHIFT, MPAMIDR_EL1_PARTID_MAX_WIDTH, 0),
ARM64_FTR_END,
};
/*
* Common ftr bits for a 32bit register with all hidden, strict
* attributes, with 4bit feature fields and a default safe value of
@ -808,6 +816,9 @@ static const struct __ftr_reg_entry {
ARM64_FTR_REG(SYS_ID_AA64MMFR3_EL1, ftr_id_aa64mmfr3),
ARM64_FTR_REG(SYS_ID_AA64MMFR4_EL1, ftr_id_aa64mmfr4),
/* Op1 = 0, CRn = 10, CRm = 4 */
ARM64_FTR_REG(SYS_MPAMIDR_EL1, ftr_mpamidr),
/* Op1 = 1, CRn = 0, CRm = 0 */
ARM64_FTR_REG(SYS_GMID_EL1, ftr_gmid),
@ -1167,6 +1178,9 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
cpacr_restore(cpacr);
}
if (id_aa64pfr0_mpam(info->reg_id_aa64pfr0))
init_cpu_ftr_reg(SYS_MPAMIDR_EL1, info->reg_mpamidr);
if (id_aa64pfr1_mte(info->reg_id_aa64pfr1))
init_cpu_ftr_reg(SYS_GMID_EL1, info->reg_gmid);
}
@ -1423,6 +1437,11 @@ void update_cpu_features(int cpu,
cpacr_restore(cpacr);
}
if (id_aa64pfr0_mpam(info->reg_id_aa64pfr0)) {
taint |= check_update_ftr_reg(SYS_MPAMIDR_EL1, cpu,
info->reg_mpamidr, boot->reg_mpamidr);
}
/*
* The kernel uses the LDGM/STGM instructions and the number of tags
* they read/write depends on the GMID_EL1.BS field. Check that the
@ -2389,6 +2408,36 @@ cpucap_panic_on_conflict(const struct arm64_cpu_capabilities *cap)
return !!(cap->type & ARM64_CPUCAP_PANIC_ON_CONFLICT);
}
static bool
test_has_mpam(const struct arm64_cpu_capabilities *entry, int scope)
{
if (!has_cpuid_feature(entry, scope))
return false;
/* Check firmware actually enabled MPAM on this cpu. */
return (read_sysreg_s(SYS_MPAM1_EL1) & MPAM1_EL1_MPAMEN);
}
static void
cpu_enable_mpam(const struct arm64_cpu_capabilities *entry)
{
/*
* Access by the kernel (at EL1) should use the reserved PARTID
* which is configured unrestricted. This avoids priority-inversion
* where latency sensitive tasks have to wait for a task that has
* been throttled to release the lock.
*/
write_sysreg_s(0, SYS_MPAM1_EL1);
}
static bool
test_has_mpam_hcr(const struct arm64_cpu_capabilities *entry, int scope)
{
u64 idr = read_sanitised_ftr_reg(SYS_MPAMIDR_EL1);
return idr & MPAMIDR_EL1_HAS_HCR;
}
static const struct arm64_cpu_capabilities arm64_features[] = {
{
.capability = ARM64_ALWAYS_BOOT,
@ -2900,6 +2949,20 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
#endif
},
#endif
{
.desc = "Memory Partitioning And Monitoring",
.type = ARM64_CPUCAP_SYSTEM_FEATURE,
.capability = ARM64_MPAM,
.matches = test_has_mpam,
.cpu_enable = cpu_enable_mpam,
ARM64_CPUID_FIELDS(ID_AA64PFR0_EL1, MPAM, 1)
},
{
.desc = "Memory Partitioning And Monitoring Virtualisation",
.type = ARM64_CPUCAP_SYSTEM_FEATURE,
.capability = ARM64_MPAM_HCR,
.matches = test_has_mpam_hcr,
},
{
.desc = "NV1",
.capability = ARM64_HAS_HCR_NV1,
@ -3436,6 +3499,36 @@ static void verify_hyp_capabilities(void)
}
}
static void verify_mpam_capabilities(void)
{
u64 cpu_idr = read_cpuid(ID_AA64PFR0_EL1);
u64 sys_idr = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
u16 cpu_partid_max, cpu_pmg_max, sys_partid_max, sys_pmg_max;
if (FIELD_GET(ID_AA64PFR0_EL1_MPAM_MASK, cpu_idr) !=
FIELD_GET(ID_AA64PFR0_EL1_MPAM_MASK, sys_idr)) {
pr_crit("CPU%d: MPAM version mismatch\n", smp_processor_id());
cpu_die_early();
}
cpu_idr = read_cpuid(MPAMIDR_EL1);
sys_idr = read_sanitised_ftr_reg(SYS_MPAMIDR_EL1);
if (FIELD_GET(MPAMIDR_EL1_HAS_HCR, cpu_idr) !=
FIELD_GET(MPAMIDR_EL1_HAS_HCR, sys_idr)) {
pr_crit("CPU%d: Missing MPAM HCR\n", smp_processor_id());
cpu_die_early();
}
cpu_partid_max = FIELD_GET(MPAMIDR_EL1_PARTID_MAX, cpu_idr);
cpu_pmg_max = FIELD_GET(MPAMIDR_EL1_PMG_MAX, cpu_idr);
sys_partid_max = FIELD_GET(MPAMIDR_EL1_PARTID_MAX, sys_idr);
sys_pmg_max = FIELD_GET(MPAMIDR_EL1_PMG_MAX, sys_idr);
if (cpu_partid_max < sys_partid_max || cpu_pmg_max < sys_pmg_max) {
pr_crit("CPU%d: MPAM PARTID/PMG max values are mismatched\n", smp_processor_id());
cpu_die_early();
}
}
/*
* Run through the enabled system capabilities and enable() it on this CPU.
* The capabilities were decided based on the available CPUs at the boot time.
@ -3462,6 +3555,9 @@ static void verify_local_cpu_capabilities(void)
if (is_hyp_mode_available())
verify_hyp_capabilities();
if (system_supports_mpam())
verify_mpam_capabilities();
}
void check_local_cpu_capabilities(void)


@ -479,6 +479,9 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0))
__cpuinfo_store_cpu_32bit(&info->aarch32);
if (id_aa64pfr0_mpam(info->reg_id_aa64pfr0))
info->reg_mpamidr = read_cpuid(MPAMIDR_EL1);
cpuinfo_detect_icache_policy(info);
}


@ -206,8 +206,7 @@ void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
static inline bool userspace_irqchip(struct kvm *kvm)
{
return static_branch_unlikely(&userspace_irqchip_in_use) &&
unlikely(!irqchip_in_kernel(kvm));
return unlikely(!irqchip_in_kernel(kvm));
}
static void soft_timer_start(struct hrtimer *hrt, u64 ns)


@ -69,7 +69,6 @@ DECLARE_KVM_NVHE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
static bool vgic_present, kvm_arm_initialised;
static DEFINE_PER_CPU(unsigned char, kvm_hyp_initialized);
DEFINE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
bool is_kvm_arm_initialised(void)
{
@ -503,9 +502,6 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
if (vcpu_has_run_once(vcpu) && unlikely(!irqchip_in_kernel(vcpu->kvm)))
static_branch_dec(&userspace_irqchip_in_use);
kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
kvm_timer_vcpu_terminate(vcpu);
kvm_pmu_vcpu_destroy(vcpu);
@ -848,22 +844,6 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
return ret;
}
if (!irqchip_in_kernel(kvm)) {
/*
* Tell the rest of the code that there are userspace irqchip
* VMs in the wild.
*/
static_branch_inc(&userspace_irqchip_in_use);
}
/*
* Initialize traps for protected VMs.
* NOTE: Move to run in EL2 directly, rather than via a hypercall, once
* the code is in place for first run initialization at EL2.
*/
if (kvm_vm_is_protected(kvm))
kvm_call_hyp_nvhe(__pkvm_vcpu_init_traps, vcpu);
mutex_lock(&kvm->arch.config_lock);
set_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags);
mutex_unlock(&kvm->arch.config_lock);
@ -1077,7 +1057,7 @@ static bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu, int *ret)
* state gets updated in kvm_timer_update_run and
* kvm_pmu_update_run below).
*/
if (static_branch_unlikely(&userspace_irqchip_in_use)) {
if (unlikely(!irqchip_in_kernel(vcpu->kvm))) {
if (kvm_timer_should_notify_user(vcpu) ||
kvm_pmu_should_notify_user(vcpu)) {
*ret = -EINTR;
@ -1199,7 +1179,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
vcpu->mode = OUTSIDE_GUEST_MODE;
isb(); /* Ensure work in x_flush_hwstate is committed */
kvm_pmu_sync_hwstate(vcpu);
if (static_branch_unlikely(&userspace_irqchip_in_use))
if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
kvm_timer_sync_user(vcpu);
kvm_vgic_sync_hwstate(vcpu);
local_irq_enable();
@ -1245,7 +1225,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
* we don't want vtimer interrupts to race with syncing the
* timer virtual interrupt state.
*/
if (static_branch_unlikely(&userspace_irqchip_in_use))
if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
kvm_timer_sync_user(vcpu);
kvm_arch_vcpu_ctxsync_fp(vcpu);

View File

@ -24,6 +24,9 @@ struct s1_walk_info {
unsigned int txsz;
int sl;
bool hpd;
bool e0poe;
bool poe;
bool pan;
bool be;
bool s2;
};
@ -37,6 +40,16 @@ struct s1_walk_result {
u8 APTable;
bool UXNTable;
bool PXNTable;
bool uwxn;
bool uov;
bool ur;
bool uw;
bool ux;
bool pwxn;
bool pov;
bool pr;
bool pw;
bool px;
};
struct {
u8 fst;
@ -87,6 +100,51 @@ static enum trans_regime compute_translation_regime(struct kvm_vcpu *vcpu, u32 o
}
}
static bool s1pie_enabled(struct kvm_vcpu *vcpu, enum trans_regime regime)
{
if (!kvm_has_s1pie(vcpu->kvm))
return false;
switch (regime) {
case TR_EL2:
case TR_EL20:
return vcpu_read_sys_reg(vcpu, TCR2_EL2) & TCR2_EL2_PIE;
case TR_EL10:
return (__vcpu_sys_reg(vcpu, HCRX_EL2) & HCRX_EL2_TCR2En) &&
(__vcpu_sys_reg(vcpu, TCR2_EL1) & TCR2_EL1x_PIE);
default:
BUG();
}
}
static void compute_s1poe(struct kvm_vcpu *vcpu, struct s1_walk_info *wi)
{
u64 val;
if (!kvm_has_s1poe(vcpu->kvm)) {
wi->poe = wi->e0poe = false;
return;
}
switch (wi->regime) {
case TR_EL2:
case TR_EL20:
val = vcpu_read_sys_reg(vcpu, TCR2_EL2);
wi->poe = val & TCR2_EL2_POE;
wi->e0poe = (wi->regime == TR_EL20) && (val & TCR2_EL2_E0POE);
break;
case TR_EL10:
if (!(__vcpu_sys_reg(vcpu, HCRX_EL2) & HCRX_EL2_TCR2En)) {
wi->poe = wi->e0poe = false;
return;
}
val = __vcpu_sys_reg(vcpu, TCR2_EL1);
wi->poe = val & TCR2_EL1x_POE;
wi->e0poe = val & TCR2_EL1x_E0POE;
}
}
static int setup_s1_walk(struct kvm_vcpu *vcpu, u32 op, struct s1_walk_info *wi,
struct s1_walk_result *wr, u64 va)
{
@ -98,6 +156,8 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, u32 op, struct s1_walk_info *wi,
wi->regime = compute_translation_regime(vcpu, op);
as_el0 = (op == OP_AT_S1E0R || op == OP_AT_S1E0W);
wi->pan = (op == OP_AT_S1E1RP || op == OP_AT_S1E1WP) &&
(*vcpu_cpsr(vcpu) & PSR_PAN_BIT);
va55 = va & BIT(55);
@ -180,6 +240,14 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, u32 op, struct s1_walk_info *wi,
(va55 ?
FIELD_GET(TCR_HPD1, tcr) :
FIELD_GET(TCR_HPD0, tcr)));
/* R_JHSVW */
wi->hpd |= s1pie_enabled(vcpu, wi->regime);
/* Do we have POE? */
compute_s1poe(vcpu, wi);
/* R_BVXDG */
wi->hpd |= (wi->poe || wi->e0poe);
/* Someone was silly enough to encode TG0/TG1 differently */
if (va55) {
@ -412,6 +480,11 @@ struct mmu_config {
u64 ttbr1;
u64 tcr;
u64 mair;
u64 tcr2;
u64 pir;
u64 pire0;
u64 por_el0;
u64 por_el1;
u64 sctlr;
u64 vttbr;
u64 vtcr;
@ -424,6 +497,17 @@ static void __mmu_config_save(struct mmu_config *config)
config->ttbr1 = read_sysreg_el1(SYS_TTBR1);
config->tcr = read_sysreg_el1(SYS_TCR);
config->mair = read_sysreg_el1(SYS_MAIR);
if (cpus_have_final_cap(ARM64_HAS_TCR2)) {
config->tcr2 = read_sysreg_el1(SYS_TCR2);
if (cpus_have_final_cap(ARM64_HAS_S1PIE)) {
config->pir = read_sysreg_el1(SYS_PIR);
config->pire0 = read_sysreg_el1(SYS_PIRE0);
}
if (system_supports_poe()) {
config->por_el1 = read_sysreg_el1(SYS_POR);
config->por_el0 = read_sysreg_s(SYS_POR_EL0);
}
}
config->sctlr = read_sysreg_el1(SYS_SCTLR);
config->vttbr = read_sysreg(vttbr_el2);
config->vtcr = read_sysreg(vtcr_el2);
@ -444,6 +528,17 @@ static void __mmu_config_restore(struct mmu_config *config)
write_sysreg_el1(config->ttbr1, SYS_TTBR1);
write_sysreg_el1(config->tcr, SYS_TCR);
write_sysreg_el1(config->mair, SYS_MAIR);
if (cpus_have_final_cap(ARM64_HAS_TCR2)) {
write_sysreg_el1(config->tcr2, SYS_TCR2);
if (cpus_have_final_cap(ARM64_HAS_S1PIE)) {
write_sysreg_el1(config->pir, SYS_PIR);
write_sysreg_el1(config->pire0, SYS_PIRE0);
}
if (system_supports_poe()) {
write_sysreg_el1(config->por_el1, SYS_POR);
write_sysreg_s(config->por_el0, SYS_POR_EL0);
}
}
write_sysreg_el1(config->sctlr, SYS_SCTLR);
write_sysreg(config->vttbr, vttbr_el2);
write_sysreg(config->vtcr, vtcr_el2);
@ -739,6 +834,9 @@ static bool pan3_enabled(struct kvm_vcpu *vcpu, enum trans_regime regime)
if (!kvm_has_feat(vcpu->kvm, ID_AA64MMFR1_EL1, PAN, PAN3))
return false;
if (s1pie_enabled(vcpu, regime))
return true;
if (regime == TR_EL10)
sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
else
@ -747,11 +845,307 @@ static bool pan3_enabled(struct kvm_vcpu *vcpu, enum trans_regime regime)
return sctlr & SCTLR_EL1_EPAN;
}
static void compute_s1_direct_permissions(struct kvm_vcpu *vcpu,
struct s1_walk_info *wi,
struct s1_walk_result *wr)
{
bool wxn;
/* Non-hierarchical part of AArch64.S1DirectBasePermissions() */
if (wi->regime != TR_EL2) {
switch (FIELD_GET(PTE_USER | PTE_RDONLY, wr->desc)) {
case 0b00:
wr->pr = wr->pw = true;
wr->ur = wr->uw = false;
break;
case 0b01:
wr->pr = wr->pw = wr->ur = wr->uw = true;
break;
case 0b10:
wr->pr = true;
wr->pw = wr->ur = wr->uw = false;
break;
case 0b11:
wr->pr = wr->ur = true;
wr->pw = wr->uw = false;
break;
}
/* We don't use px for anything yet, but hey... */
wr->px = !((wr->desc & PTE_PXN) || wr->uw);
wr->ux = !(wr->desc & PTE_UXN);
} else {
wr->ur = wr->uw = wr->ux = false;
if (!(wr->desc & PTE_RDONLY)) {
wr->pr = wr->pw = true;
} else {
wr->pr = true;
wr->pw = false;
}
/* XN maps to UXN */
wr->px = !(wr->desc & PTE_UXN);
}
switch (wi->regime) {
case TR_EL2:
case TR_EL20:
wxn = (vcpu_read_sys_reg(vcpu, SCTLR_EL2) & SCTLR_ELx_WXN);
break;
case TR_EL10:
wxn = (__vcpu_sys_reg(vcpu, SCTLR_EL1) & SCTLR_ELx_WXN);
break;
}
wr->pwxn = wr->uwxn = wxn;
wr->pov = wi->poe;
wr->uov = wi->e0poe;
}
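/*
 * Standalone worked example (not kernel code) of the direct-permission
 * decode above for a non-EL2 regime: AP[2:1] lives in the PTE_USER and
 * PTE_RDONLY descriptor bits, and a user-writable page is never
 * privileged-executable.  The bit positions are the usual VMSAv8-64
 * layout, stated here as assumptions.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define EX_PTE_USER     (1ULL << 6)     /* AP[1] */
#define EX_PTE_RDONLY   (1ULL << 7)     /* AP[2] */
#define EX_PTE_PXN      (1ULL << 53)
#define EX_PTE_UXN      (1ULL << 54)

struct ex_perms { bool pr, pw, px, ur, uw, ux; };

static struct ex_perms ex_decode_direct(uint64_t desc)
{
        bool user = desc & EX_PTE_USER;
        bool ro   = desc & EX_PTE_RDONLY;
        struct ex_perms p = {
                .pr = true,             /* privileged read always granted */
                .pw = !ro,
                .ur = user,
                .uw = user && !ro,
                .ux = !(desc & EX_PTE_UXN),
        };

        /* Same rule as above: PXN set, or user-writable, kills priv-exec */
        p.px = !((desc & EX_PTE_PXN) || p.uw);
        return p;
}

int main(void)
{
        /* AP[2:1] = 0b01: read/write at both EL1 and EL0 */
        struct ex_perms p = ex_decode_direct(EX_PTE_USER);

        printf("pr=%d pw=%d px=%d ur=%d uw=%d ux=%d\n",
               p.pr, p.pw, p.px, p.ur, p.uw, p.ux);     /* 1 1 0 1 1 1 */
        return 0;
}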
static void compute_s1_hierarchical_permissions(struct kvm_vcpu *vcpu,
struct s1_walk_info *wi,
struct s1_walk_result *wr)
{
/* Hierarchical part of AArch64.S1DirectBasePermissions() */
if (wi->regime != TR_EL2) {
switch (wr->APTable) {
case 0b00:
break;
case 0b01:
wr->ur = wr->uw = false;
break;
case 0b10:
wr->pw = wr->uw = false;
break;
case 0b11:
wr->pw = wr->ur = wr->uw = false;
break;
}
wr->px &= !wr->PXNTable;
wr->ux &= !wr->UXNTable;
} else {
if (wr->APTable & BIT(1))
wr->pw = false;
/* XN maps to UXN */
wr->px &= !wr->UXNTable;
}
}
#define perm_idx(v, r, i) ((vcpu_read_sys_reg((v), (r)) >> ((i) * 4)) & 0xf)
#define set_priv_perms(wr, r, w, x) \
do { \
(wr)->pr = (r); \
(wr)->pw = (w); \
(wr)->px = (x); \
} while (0)
#define set_unpriv_perms(wr, r, w, x) \
do { \
(wr)->ur = (r); \
(wr)->uw = (w); \
(wr)->ux = (x); \
} while (0)
#define set_priv_wxn(wr, v) \
do { \
(wr)->pwxn = (v); \
} while (0)
#define set_unpriv_wxn(wr, v) \
do { \
(wr)->uwxn = (v); \
} while (0)
/* Similar to AArch64.S1IndirectBasePermissions(), without GCS */
#define set_perms(w, wr, ip) \
do { \
/* R_LLZDZ */ \
switch ((ip)) { \
case 0b0000: \
set_ ## w ## _perms((wr), false, false, false); \
break; \
case 0b0001: \
set_ ## w ## _perms((wr), true , false, false); \
break; \
case 0b0010: \
set_ ## w ## _perms((wr), false, false, true ); \
break; \
case 0b0011: \
set_ ## w ## _perms((wr), true , false, true ); \
break; \
case 0b0100: \
set_ ## w ## _perms((wr), false, false, false); \
break; \
case 0b0101: \
set_ ## w ## _perms((wr), true , true , false); \
break; \
case 0b0110: \
set_ ## w ## _perms((wr), true , true , true ); \
break; \
case 0b0111: \
set_ ## w ## _perms((wr), true , true , true ); \
break; \
case 0b1000: \
set_ ## w ## _perms((wr), true , false, false); \
break; \
case 0b1001: \
set_ ## w ## _perms((wr), true , false, false); \
break; \
case 0b1010: \
set_ ## w ## _perms((wr), true , false, true ); \
break; \
case 0b1011: \
set_ ## w ## _perms((wr), false, false, false); \
break; \
case 0b1100: \
set_ ## w ## _perms((wr), true , true , false); \
break; \
case 0b1101: \
set_ ## w ## _perms((wr), false, false, false); \
break; \
case 0b1110: \
set_ ## w ## _perms((wr), true , true , true ); \
break; \
case 0b1111: \
set_ ## w ## _perms((wr), false, false, false); \
break; \
} \
\
/* R_HJYGR */ \
set_ ## w ## _wxn((wr), ((ip) == 0b0110)); \
\
} while (0)
static void compute_s1_indirect_permissions(struct kvm_vcpu *vcpu,
struct s1_walk_info *wi,
struct s1_walk_result *wr)
{
u8 up, pp, idx;
idx = pte_pi_index(wr->desc);
switch (wi->regime) {
case TR_EL10:
pp = perm_idx(vcpu, PIR_EL1, idx);
up = perm_idx(vcpu, PIRE0_EL1, idx);
break;
case TR_EL20:
pp = perm_idx(vcpu, PIR_EL2, idx);
up = perm_idx(vcpu, PIRE0_EL2, idx);
break;
case TR_EL2:
pp = perm_idx(vcpu, PIR_EL2, idx);
up = 0;
break;
}
set_perms(priv, wr, pp);
if (wi->regime != TR_EL2)
set_perms(unpriv, wr, up);
else
set_unpriv_perms(wr, false, false, false);
wr->pov = wi->poe && !(pp & BIT(3));
wr->uov = wi->e0poe && !(up & BIT(3));
/* R_VFPJF */
if (wr->px && wr->uw) {
set_priv_perms(wr, false, false, false);
set_unpriv_perms(wr, false, false, false);
}
}
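/*
 * Standalone sketch (not kernel code) of the S1PIE decode performed by
 * compute_s1_indirect_permissions() above: the descriptor selects a 4-bit
 * index, the index selects a nibble of PIR_ELx/PIRE0_ELx, and that nibble
 * encodes R/W/X directly, following the same table as set_perms() (GCS
 * encodings left out).
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct ex_rwx { bool r, w, x, wxn; };

/* Nibble 'idx' of a PIR_ELx/PIRE0_ELx image, as in the perm_idx() macro */
static uint8_t ex_pir_nibble(uint64_t pir, unsigned int idx)
{
        return (pir >> (idx * 4)) & 0xf;
}

static struct ex_rwx ex_decode_s1pie(uint8_t enc)
{
        /* One entry per encoding; unlisted encodings mean "no access" */
        static const struct ex_rwx table[16] = {
                [0b0001] = { .r = 1 },
                [0b0010] = { .x = 1 },
                [0b0011] = { .r = 1, .x = 1 },
                [0b0101] = { .r = 1, .w = 1 },
                [0b0110] = { .r = 1, .w = 1, .x = 1, .wxn = 1 },
                [0b0111] = { .r = 1, .w = 1, .x = 1 },
                [0b1000] = { .r = 1 },
                [0b1001] = { .r = 1 },
                [0b1010] = { .r = 1, .x = 1 },
                [0b1100] = { .r = 1, .w = 1 },
                [0b1110] = { .r = 1, .w = 1, .x = 1 },
        };

        return table[enc & 0xf];
}

int main(void)
{
        /* Hypothetical PIR_EL1 image with index 5 programmed as 0b0101 (RW) */
        uint64_t pir_el1 = 0x5ULL << (5 * 4);
        struct ex_rwx p = ex_decode_s1pie(ex_pir_nibble(pir_el1, 5));

        printf("r=%d w=%d x=%d wxn=%d\n", p.r, p.w, p.x, p.wxn);  /* 1 1 0 0 */
        return 0;
}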
static void compute_s1_overlay_permissions(struct kvm_vcpu *vcpu,
struct s1_walk_info *wi,
struct s1_walk_result *wr)
{
u8 idx, pov_perms, uov_perms;
idx = FIELD_GET(PTE_PO_IDX_MASK, wr->desc);
switch (wi->regime) {
case TR_EL10:
pov_perms = perm_idx(vcpu, POR_EL1, idx);
uov_perms = perm_idx(vcpu, POR_EL0, idx);
break;
case TR_EL20:
pov_perms = perm_idx(vcpu, POR_EL2, idx);
uov_perms = perm_idx(vcpu, POR_EL0, idx);
break;
case TR_EL2:
pov_perms = perm_idx(vcpu, POR_EL2, idx);
uov_perms = 0;
break;
}
if (pov_perms & ~POE_RXW)
pov_perms = POE_NONE;
if (wi->poe && wr->pov) {
wr->pr &= pov_perms & POE_R;
wr->px &= pov_perms & POE_X;
wr->pw &= pov_perms & POE_W;
}
if (uov_perms & ~POE_RXW)
uov_perms = POE_NONE;
if (wi->e0poe && wr->uov) {
wr->ur &= uov_perms & POE_R;
wr->ux &= uov_perms & POE_X;
wr->uw &= uov_perms & POE_W;
}
}
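/*
 * Standalone sketch (not kernel code) of the overlay step above: the
 * descriptor's PO index selects a nibble of POR_ELx, and its R/W/X bits
 * can only remove permissions already granted by the base decode.
 * Encodings with bits outside R/W/X are clamped to "no access", mirroring
 * the POE_RXW check above.  The EX_POE_* bit values are assumptions here.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define EX_POE_R        0x1     /* assumed POR_ELx nibble layout */
#define EX_POE_X        0x2
#define EX_POE_W        0x4
#define EX_POE_RXW      (EX_POE_R | EX_POE_X | EX_POE_W)

struct ex_s1_perms { bool r, w, x; };

static void ex_apply_overlay(struct ex_s1_perms *p, uint64_t por, unsigned int idx)
{
        uint8_t ov = (por >> (idx * 4)) & 0xf;

        if (ov & ~EX_POE_RXW)
                ov = 0;

        p->r &= !!(ov & EX_POE_R);
        p->w &= !!(ov & EX_POE_W);
        p->x &= !!(ov & EX_POE_X);
}

int main(void)
{
        /* Base decode granted RWX; overlay index 2 is programmed read-only */
        struct ex_s1_perms p = { .r = true, .w = true, .x = true };
        uint64_t por_el1 = (uint64_t)EX_POE_R << (2 * 4);

        ex_apply_overlay(&p, por_el1, 2);
        printf("r=%d w=%d x=%d\n", p.r, p.w, p.x);      /* 1 0 0 */
        return 0;
}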
static void compute_s1_permissions(struct kvm_vcpu *vcpu,
struct s1_walk_info *wi,
struct s1_walk_result *wr)
{
bool pan;
if (!s1pie_enabled(vcpu, wi->regime))
compute_s1_direct_permissions(vcpu, wi, wr);
else
compute_s1_indirect_permissions(vcpu, wi, wr);
if (!wi->hpd)
compute_s1_hierarchical_permissions(vcpu, wi, wr);
if (wi->poe || wi->e0poe)
compute_s1_overlay_permissions(vcpu, wi, wr);
/* R_QXXPC */
if (wr->pwxn) {
if (!wr->pov && wr->pw)
wr->px = false;
if (wr->pov && wr->px)
wr->pw = false;
}
/* R_NPBXC */
if (wr->uwxn) {
if (!wr->uov && wr->uw)
wr->ux = false;
if (wr->uov && wr->ux)
wr->uw = false;
}
pan = wi->pan && (wr->ur || wr->uw ||
(pan3_enabled(vcpu, wi->regime) && wr->ux));
wr->pw &= !pan;
wr->pr &= !pan;
}
static u64 handle_at_slow(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
{
bool perm_fail, ur, uw, ux, pr, pw, px;
struct s1_walk_result wr = {};
struct s1_walk_info wi = {};
bool perm_fail = false;
int ret, idx;
ret = setup_s1_walk(vcpu, op, &wi, &wr, vaddr);
@ -770,88 +1164,24 @@ static u64 handle_at_slow(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
if (ret)
goto compute_par;
/* FIXME: revisit when adding indirect permission support */
/* AArch64.S1DirectBasePermissions() */
if (wi.regime != TR_EL2) {
switch (FIELD_GET(PTE_USER | PTE_RDONLY, wr.desc)) {
case 0b00:
pr = pw = true;
ur = uw = false;
break;
case 0b01:
pr = pw = ur = uw = true;
break;
case 0b10:
pr = true;
pw = ur = uw = false;
break;
case 0b11:
pr = ur = true;
pw = uw = false;
break;
}
switch (wr.APTable) {
case 0b00:
break;
case 0b01:
ur = uw = false;
break;
case 0b10:
pw = uw = false;
break;
case 0b11:
pw = ur = uw = false;
break;
}
/* We don't use px for anything yet, but hey... */
px = !((wr.desc & PTE_PXN) || wr.PXNTable || uw);
ux = !((wr.desc & PTE_UXN) || wr.UXNTable);
if (op == OP_AT_S1E1RP || op == OP_AT_S1E1WP) {
bool pan;
pan = *vcpu_cpsr(vcpu) & PSR_PAN_BIT;
pan &= ur || uw || (pan3_enabled(vcpu, wi.regime) && ux);
pw &= !pan;
pr &= !pan;
}
} else {
ur = uw = ux = false;
if (!(wr.desc & PTE_RDONLY)) {
pr = pw = true;
} else {
pr = true;
pw = false;
}
if (wr.APTable & BIT(1))
pw = false;
/* XN maps to UXN */
px = !((wr.desc & PTE_UXN) || wr.UXNTable);
}
perm_fail = false;
compute_s1_permissions(vcpu, &wi, &wr);
switch (op) {
case OP_AT_S1E1RP:
case OP_AT_S1E1R:
case OP_AT_S1E2R:
perm_fail = !pr;
perm_fail = !wr.pr;
break;
case OP_AT_S1E1WP:
case OP_AT_S1E1W:
case OP_AT_S1E2W:
perm_fail = !pw;
perm_fail = !wr.pw;
break;
case OP_AT_S1E0R:
perm_fail = !ur;
perm_fail = !wr.ur;
break;
case OP_AT_S1E0W:
perm_fail = !uw;
perm_fail = !wr.uw;
break;
case OP_AT_S1E1A:
case OP_AT_S1E2A:
@ -914,6 +1244,17 @@ static u64 __kvm_at_s1e01_fast(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
write_sysreg_el1(vcpu_read_sys_reg(vcpu, TTBR1_EL1), SYS_TTBR1);
write_sysreg_el1(vcpu_read_sys_reg(vcpu, TCR_EL1), SYS_TCR);
write_sysreg_el1(vcpu_read_sys_reg(vcpu, MAIR_EL1), SYS_MAIR);
if (kvm_has_tcr2(vcpu->kvm)) {
write_sysreg_el1(vcpu_read_sys_reg(vcpu, TCR2_EL1), SYS_TCR2);
if (kvm_has_s1pie(vcpu->kvm)) {
write_sysreg_el1(vcpu_read_sys_reg(vcpu, PIR_EL1), SYS_PIR);
write_sysreg_el1(vcpu_read_sys_reg(vcpu, PIRE0_EL1), SYS_PIRE0);
}
if (kvm_has_s1poe(vcpu->kvm)) {
write_sysreg_el1(vcpu_read_sys_reg(vcpu, POR_EL1), SYS_POR);
write_sysreg_s(vcpu_read_sys_reg(vcpu, POR_EL0), SYS_POR_EL0);
}
}
write_sysreg_el1(vcpu_read_sys_reg(vcpu, SCTLR_EL1), SYS_SCTLR);
__load_stage2(mmu, mmu->arch);
@ -992,12 +1333,9 @@ void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
* switching context behind everybody's back, disable interrupts...
*/
scoped_guard(write_lock_irqsave, &vcpu->kvm->mmu_lock) {
struct kvm_s2_mmu *mmu;
u64 val, hcr;
bool fail;
mmu = &vcpu->kvm->arch.mmu;
val = hcr = read_sysreg(hcr_el2);
val &= ~HCR_TGE;
val |= HCR_VM;

View File

@ -16,9 +16,13 @@
enum trap_behaviour {
BEHAVE_HANDLE_LOCALLY = 0,
BEHAVE_FORWARD_READ = BIT(0),
BEHAVE_FORWARD_WRITE = BIT(1),
BEHAVE_FORWARD_ANY = BEHAVE_FORWARD_READ | BEHAVE_FORWARD_WRITE,
BEHAVE_FORWARD_RW = BEHAVE_FORWARD_READ | BEHAVE_FORWARD_WRITE,
/* Traps that take effect in Host EL0, this is rare! */
BEHAVE_FORWARD_IN_HOST_EL0 = BIT(2),
};
struct trap_bits {
@ -79,7 +83,6 @@ enum cgt_group_id {
CGT_MDCR_E2TB,
CGT_MDCR_TDCC,
CGT_CPACR_E0POE,
CGT_CPTR_TAM,
CGT_CPTR_TCPAC,
@ -106,6 +109,7 @@ enum cgt_group_id {
CGT_HCR_TPU_TOCU,
CGT_HCR_NV1_nNV2_ENSCXT,
CGT_MDCR_TPM_TPMCR,
CGT_MDCR_TPM_HPMN,
CGT_MDCR_TDE_TDA,
CGT_MDCR_TDE_TDOSA,
CGT_MDCR_TDE_TDRA,
@ -122,6 +126,7 @@ enum cgt_group_id {
CGT_CNTHCTL_EL1PTEN,
CGT_CPTR_TTA,
CGT_MDCR_HPMN,
/* Must be last */
__NR_CGT_GROUP_IDS__
@ -138,7 +143,7 @@ static const struct trap_bits coarse_trap_bits[] = {
.index = HCR_EL2,
.value = HCR_TID2,
.mask = HCR_TID2,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_TID3] = {
.index = HCR_EL2,
@ -162,37 +167,37 @@ static const struct trap_bits coarse_trap_bits[] = {
.index = HCR_EL2,
.value = HCR_TIDCP,
.mask = HCR_TIDCP,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_TACR] = {
.index = HCR_EL2,
.value = HCR_TACR,
.mask = HCR_TACR,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_TSW] = {
.index = HCR_EL2,
.value = HCR_TSW,
.mask = HCR_TSW,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_TPC] = { /* Also called TCPC when FEAT_DPB is implemented */
.index = HCR_EL2,
.value = HCR_TPC,
.mask = HCR_TPC,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_TPU] = {
.index = HCR_EL2,
.value = HCR_TPU,
.mask = HCR_TPU,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_TTLB] = {
.index = HCR_EL2,
.value = HCR_TTLB,
.mask = HCR_TTLB,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_TVM] = {
.index = HCR_EL2,
@ -204,7 +209,7 @@ static const struct trap_bits coarse_trap_bits[] = {
.index = HCR_EL2,
.value = HCR_TDZ,
.mask = HCR_TDZ,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_TRVM] = {
.index = HCR_EL2,
@ -216,205 +221,201 @@ static const struct trap_bits coarse_trap_bits[] = {
.index = HCR_EL2,
.value = HCR_TLOR,
.mask = HCR_TLOR,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_TERR] = {
.index = HCR_EL2,
.value = HCR_TERR,
.mask = HCR_TERR,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_APK] = {
.index = HCR_EL2,
.value = 0,
.mask = HCR_APK,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_NV] = {
.index = HCR_EL2,
.value = HCR_NV,
.mask = HCR_NV,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_NV_nNV2] = {
.index = HCR_EL2,
.value = HCR_NV,
.mask = HCR_NV | HCR_NV2,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_NV1_nNV2] = {
.index = HCR_EL2,
.value = HCR_NV | HCR_NV1,
.mask = HCR_NV | HCR_NV1 | HCR_NV2,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_AT] = {
.index = HCR_EL2,
.value = HCR_AT,
.mask = HCR_AT,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_nFIEN] = {
.index = HCR_EL2,
.value = 0,
.mask = HCR_FIEN,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_TID4] = {
.index = HCR_EL2,
.value = HCR_TID4,
.mask = HCR_TID4,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_TICAB] = {
.index = HCR_EL2,
.value = HCR_TICAB,
.mask = HCR_TICAB,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_TOCU] = {
.index = HCR_EL2,
.value = HCR_TOCU,
.mask = HCR_TOCU,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_ENSCXT] = {
.index = HCR_EL2,
.value = 0,
.mask = HCR_ENSCXT,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_TTLBIS] = {
.index = HCR_EL2,
.value = HCR_TTLBIS,
.mask = HCR_TTLBIS,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_TTLBOS] = {
.index = HCR_EL2,
.value = HCR_TTLBOS,
.mask = HCR_TTLBOS,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_MDCR_TPMCR] = {
.index = MDCR_EL2,
.value = MDCR_EL2_TPMCR,
.mask = MDCR_EL2_TPMCR,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW |
BEHAVE_FORWARD_IN_HOST_EL0,
},
[CGT_MDCR_TPM] = {
.index = MDCR_EL2,
.value = MDCR_EL2_TPM,
.mask = MDCR_EL2_TPM,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW |
BEHAVE_FORWARD_IN_HOST_EL0,
},
[CGT_MDCR_TDE] = {
.index = MDCR_EL2,
.value = MDCR_EL2_TDE,
.mask = MDCR_EL2_TDE,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_MDCR_TDA] = {
.index = MDCR_EL2,
.value = MDCR_EL2_TDA,
.mask = MDCR_EL2_TDA,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_MDCR_TDOSA] = {
.index = MDCR_EL2,
.value = MDCR_EL2_TDOSA,
.mask = MDCR_EL2_TDOSA,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_MDCR_TDRA] = {
.index = MDCR_EL2,
.value = MDCR_EL2_TDRA,
.mask = MDCR_EL2_TDRA,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_MDCR_E2PB] = {
.index = MDCR_EL2,
.value = 0,
.mask = BIT(MDCR_EL2_E2PB_SHIFT),
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_MDCR_TPMS] = {
.index = MDCR_EL2,
.value = MDCR_EL2_TPMS,
.mask = MDCR_EL2_TPMS,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_MDCR_TTRF] = {
.index = MDCR_EL2,
.value = MDCR_EL2_TTRF,
.mask = MDCR_EL2_TTRF,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_MDCR_E2TB] = {
.index = MDCR_EL2,
.value = 0,
.mask = BIT(MDCR_EL2_E2TB_SHIFT),
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_MDCR_TDCC] = {
.index = MDCR_EL2,
.value = MDCR_EL2_TDCC,
.mask = MDCR_EL2_TDCC,
.behaviour = BEHAVE_FORWARD_ANY,
},
[CGT_CPACR_E0POE] = {
.index = CPTR_EL2,
.value = CPACR_ELx_E0POE,
.mask = CPACR_ELx_E0POE,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_CPTR_TAM] = {
.index = CPTR_EL2,
.value = CPTR_EL2_TAM,
.mask = CPTR_EL2_TAM,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_CPTR_TCPAC] = {
.index = CPTR_EL2,
.value = CPTR_EL2_TCPAC,
.mask = CPTR_EL2_TCPAC,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCRX_EnFPM] = {
.index = HCRX_EL2,
.value = 0,
.mask = HCRX_EL2_EnFPM,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCRX_TCR2En] = {
.index = HCRX_EL2,
.value = 0,
.mask = HCRX_EL2_TCR2En,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_ICH_HCR_TC] = {
.index = ICH_HCR_EL2,
.value = ICH_HCR_TC,
.mask = ICH_HCR_TC,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_ICH_HCR_TALL0] = {
.index = ICH_HCR_EL2,
.value = ICH_HCR_TALL0,
.mask = ICH_HCR_TALL0,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_ICH_HCR_TALL1] = {
.index = ICH_HCR_EL2,
.value = ICH_HCR_TALL1,
.mask = ICH_HCR_TALL1,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_ICH_HCR_TDIR] = {
.index = ICH_HCR_EL2,
.value = ICH_HCR_TDIR,
.mask = ICH_HCR_TDIR,
.behaviour = BEHAVE_FORWARD_ANY,
.behaviour = BEHAVE_FORWARD_RW,
},
};
@ -435,6 +436,7 @@ static const enum cgt_group_id *coarse_control_combo[] = {
MCB(CGT_HCR_TPU_TOCU, CGT_HCR_TPU, CGT_HCR_TOCU),
MCB(CGT_HCR_NV1_nNV2_ENSCXT, CGT_HCR_NV1_nNV2, CGT_HCR_ENSCXT),
MCB(CGT_MDCR_TPM_TPMCR, CGT_MDCR_TPM, CGT_MDCR_TPMCR),
MCB(CGT_MDCR_TPM_HPMN, CGT_MDCR_TPM, CGT_MDCR_HPMN),
MCB(CGT_MDCR_TDE_TDA, CGT_MDCR_TDE, CGT_MDCR_TDA),
MCB(CGT_MDCR_TDE_TDOSA, CGT_MDCR_TDE, CGT_MDCR_TDOSA),
MCB(CGT_MDCR_TDE_TDRA, CGT_MDCR_TDE, CGT_MDCR_TDRA),
@ -474,7 +476,7 @@ static enum trap_behaviour check_cnthctl_el1pcten(struct kvm_vcpu *vcpu)
if (get_sanitized_cnthctl(vcpu) & (CNTHCTL_EL1PCTEN << 10))
return BEHAVE_HANDLE_LOCALLY;
return BEHAVE_FORWARD_ANY;
return BEHAVE_FORWARD_RW;
}
static enum trap_behaviour check_cnthctl_el1pten(struct kvm_vcpu *vcpu)
@ -482,7 +484,7 @@ static enum trap_behaviour check_cnthctl_el1pten(struct kvm_vcpu *vcpu)
if (get_sanitized_cnthctl(vcpu) & (CNTHCTL_EL1PCEN << 10))
return BEHAVE_HANDLE_LOCALLY;
return BEHAVE_FORWARD_ANY;
return BEHAVE_FORWARD_RW;
}
static enum trap_behaviour check_cptr_tta(struct kvm_vcpu *vcpu)
@ -493,7 +495,35 @@ static enum trap_behaviour check_cptr_tta(struct kvm_vcpu *vcpu)
val = translate_cptr_el2_to_cpacr_el1(val);
if (val & CPACR_ELx_TTA)
return BEHAVE_FORWARD_ANY;
return BEHAVE_FORWARD_RW;
return BEHAVE_HANDLE_LOCALLY;
}
static enum trap_behaviour check_mdcr_hpmn(struct kvm_vcpu *vcpu)
{
u32 sysreg = esr_sys64_to_sysreg(kvm_vcpu_get_esr(vcpu));
unsigned int idx;
switch (sysreg) {
case SYS_PMEVTYPERn_EL0(0) ... SYS_PMEVTYPERn_EL0(30):
case SYS_PMEVCNTRn_EL0(0) ... SYS_PMEVCNTRn_EL0(30):
idx = (sys_reg_CRm(sysreg) & 0x3) << 3 | sys_reg_Op2(sysreg);
break;
case SYS_PMXEVTYPER_EL0:
case SYS_PMXEVCNTR_EL0:
idx = SYS_FIELD_GET(PMSELR_EL0, SEL,
__vcpu_sys_reg(vcpu, PMSELR_EL0));
break;
default:
/* Someone used this trap helper for something else... */
KVM_BUG_ON(1, vcpu->kvm);
return BEHAVE_HANDLE_LOCALLY;
}
if (kvm_pmu_counter_is_hyp(vcpu, idx))
return BEHAVE_FORWARD_RW | BEHAVE_FORWARD_IN_HOST_EL0;
return BEHAVE_HANDLE_LOCALLY;
}
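/*
 * Standalone sketch (not kernel code) of the index computation in
 * check_mdcr_hpmn() above: for PMEVCNTR<n>_EL0 / PMEVTYPER<n>_EL0 the
 * counter number is rebuilt from the low two bits of CRm and the three
 * Op2 bits of the trapped encoding.
 */
#include <stdio.h>

static unsigned int ex_pmu_counter_index(unsigned int crm, unsigned int op2)
{
        return ((crm & 0x3) << 3) | (op2 & 0x7);
}

int main(void)
{
        /* PMEVCNTR10_EL0 encodes n=10 as CRm=0b1001, Op2=0b010 */
        printf("idx = %u\n", ex_pmu_counter_index(0x9, 0x2));   /* idx = 10 */
        return 0;
}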
@ -505,6 +535,7 @@ static const complex_condition_check ccc[] = {
CCC(CGT_CNTHCTL_EL1PCTEN, check_cnthctl_el1pcten),
CCC(CGT_CNTHCTL_EL1PTEN, check_cnthctl_el1pten),
CCC(CGT_CPTR_TTA, check_cptr_tta),
CCC(CGT_MDCR_HPMN, check_mdcr_hpmn),
};
/*
@ -711,6 +742,10 @@ static const struct encoding_to_trap_config encoding_to_cgt[] __initconst = {
SR_TRAP(SYS_MAIR_EL1, CGT_HCR_TVM_TRVM),
SR_TRAP(SYS_AMAIR_EL1, CGT_HCR_TVM_TRVM),
SR_TRAP(SYS_CONTEXTIDR_EL1, CGT_HCR_TVM_TRVM),
SR_TRAP(SYS_PIR_EL1, CGT_HCR_TVM_TRVM),
SR_TRAP(SYS_PIRE0_EL1, CGT_HCR_TVM_TRVM),
SR_TRAP(SYS_POR_EL0, CGT_HCR_TVM_TRVM),
SR_TRAP(SYS_POR_EL1, CGT_HCR_TVM_TRVM),
SR_TRAP(SYS_TCR2_EL1, CGT_HCR_TVM_TRVM_HCRX_TCR2En),
SR_TRAP(SYS_DC_ZVA, CGT_HCR_TDZ),
SR_TRAP(SYS_DC_GVA, CGT_HCR_TDZ),
@ -919,77 +954,77 @@ static const struct encoding_to_trap_config encoding_to_cgt[] __initconst = {
SR_TRAP(SYS_PMOVSCLR_EL0, CGT_MDCR_TPM),
SR_TRAP(SYS_PMCEID0_EL0, CGT_MDCR_TPM),
SR_TRAP(SYS_PMCEID1_EL0, CGT_MDCR_TPM),
SR_TRAP(SYS_PMXEVTYPER_EL0, CGT_MDCR_TPM),
SR_TRAP(SYS_PMXEVTYPER_EL0, CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMSWINC_EL0, CGT_MDCR_TPM),
SR_TRAP(SYS_PMSELR_EL0, CGT_MDCR_TPM),
SR_TRAP(SYS_PMXEVCNTR_EL0, CGT_MDCR_TPM),
SR_TRAP(SYS_PMXEVCNTR_EL0, CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMCCNTR_EL0, CGT_MDCR_TPM),
SR_TRAP(SYS_PMUSERENR_EL0, CGT_MDCR_TPM),
SR_TRAP(SYS_PMINTENSET_EL1, CGT_MDCR_TPM),
SR_TRAP(SYS_PMINTENCLR_EL1, CGT_MDCR_TPM),
SR_TRAP(SYS_PMMIR_EL1, CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(0), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(1), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(2), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(3), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(4), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(5), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(6), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(7), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(8), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(9), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(10), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(11), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(12), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(13), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(14), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(15), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(16), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(17), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(18), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(19), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(20), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(21), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(22), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(23), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(24), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(25), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(26), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(27), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(28), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(29), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(30), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(0), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(1), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(2), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(3), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(4), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(5), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(6), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(7), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(8), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(9), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(10), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(11), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(12), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(13), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(14), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(15), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(16), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(17), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(18), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(19), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(20), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(21), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(22), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(23), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(24), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(25), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(26), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(27), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(28), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(29), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVTYPERn_EL0(30), CGT_MDCR_TPM),
SR_TRAP(SYS_PMEVCNTRn_EL0(0), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(1), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(2), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(3), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(4), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(5), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(6), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(7), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(8), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(9), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(10), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(11), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(12), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(13), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(14), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(15), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(16), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(17), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(18), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(19), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(20), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(21), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(22), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(23), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(24), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(25), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(26), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(27), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(28), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(29), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVCNTRn_EL0(30), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(0), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(1), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(2), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(3), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(4), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(5), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(6), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(7), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(8), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(9), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(10), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(11), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(12), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(13), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(14), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(15), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(16), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(17), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(18), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(19), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(20), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(21), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(22), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(23), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(24), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(25), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(26), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(27), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(28), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(29), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMEVTYPERn_EL0(30), CGT_MDCR_TPM_HPMN),
SR_TRAP(SYS_PMCCFILTR_EL0, CGT_MDCR_TPM),
SR_TRAP(SYS_MDCCSR_EL0, CGT_MDCR_TDCC_TDE_TDA),
SR_TRAP(SYS_MDCCINT_EL1, CGT_MDCR_TDCC_TDE_TDA),
@ -1141,7 +1176,6 @@ static const struct encoding_to_trap_config encoding_to_cgt[] __initconst = {
SR_TRAP(SYS_AMEVTYPER1_EL0(13), CGT_CPTR_TAM),
SR_TRAP(SYS_AMEVTYPER1_EL0(14), CGT_CPTR_TAM),
SR_TRAP(SYS_AMEVTYPER1_EL0(15), CGT_CPTR_TAM),
SR_TRAP(SYS_POR_EL0, CGT_CPACR_E0POE),
/* op0=2, op1=1, and CRn<0b1000 */
SR_RANGE_TRAP(sys_reg(2, 1, 0, 0, 0),
sys_reg(2, 1, 7, 15, 7), CGT_CPTR_TTA),
@ -2021,7 +2055,8 @@ int __init populate_nv_trap_config(void)
cgids = coarse_control_combo[id - __MULTIPLE_CONTROL_BITS__];
for (int i = 0; cgids[i] != __RESERVED__; i++) {
if (cgids[i] >= __MULTIPLE_CONTROL_BITS__) {
if (cgids[i] >= __MULTIPLE_CONTROL_BITS__ &&
cgids[i] < __COMPLEX_CONDITIONS__) {
kvm_err("Recursive MCB %d/%d\n", id, cgids[i]);
ret = -EINVAL;
}
@ -2126,11 +2161,19 @@ static u64 kvm_get_sysreg_res0(struct kvm *kvm, enum vcpu_sysreg sr)
return masks->mask[sr - __VNCR_START__].res0;
}
static bool check_fgt_bit(struct kvm *kvm, bool is_read,
static bool check_fgt_bit(struct kvm_vcpu *vcpu, bool is_read,
u64 val, const union trap_config tc)
{
struct kvm *kvm = vcpu->kvm;
enum vcpu_sysreg sr;
/*
* KVM doesn't know about any FGTs that apply to the host, and hopefully
* that'll remain the case.
*/
if (is_hyp_ctxt(vcpu))
return false;
if (tc.pol)
return (val & BIT(tc.bit));
@ -2207,7 +2250,15 @@ bool triage_sysreg_trap(struct kvm_vcpu *vcpu, int *sr_index)
* If we're not nesting, immediately return to the caller, with the
* sysreg index, should we have it.
*/
if (!vcpu_has_nv(vcpu) || is_hyp_ctxt(vcpu))
if (!vcpu_has_nv(vcpu))
goto local;
/*
* There are a few traps that take effect InHost, but are constrained
* to EL0. Don't bother with computing the trap behaviour if the vCPU
* isn't in EL0.
*/
if (is_hyp_ctxt(vcpu) && !vcpu_is_host_el0(vcpu))
goto local;
switch ((enum fgt_group_id)tc.fgt) {
@ -2253,12 +2304,14 @@ bool triage_sysreg_trap(struct kvm_vcpu *vcpu, int *sr_index)
goto local;
}
if (tc.fgt != __NO_FGT_GROUP__ && check_fgt_bit(vcpu->kvm, is_read,
val, tc))
if (tc.fgt != __NO_FGT_GROUP__ && check_fgt_bit(vcpu, is_read, val, tc))
goto inject;
b = compute_trap_behaviour(vcpu, tc);
if (!(b & BEHAVE_FORWARD_IN_HOST_EL0) && vcpu_is_host_el0(vcpu))
goto local;
if (((b & BEHAVE_FORWARD_READ) && is_read) ||
((b & BEHAVE_FORWARD_WRITE) && !is_read))
goto inject;
@ -2393,6 +2446,8 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
kvm_arch_vcpu_load(vcpu, smp_processor_id());
preempt_enable();
kvm_pmu_nested_transition(vcpu);
}
static void kvm_inject_el2_exception(struct kvm_vcpu *vcpu, u64 esr_el2,
@ -2475,6 +2530,8 @@ static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
kvm_arch_vcpu_load(vcpu, smp_processor_id());
preempt_enable();
kvm_pmu_nested_transition(vcpu);
return 1;
}

View File

@ -1051,21 +1051,19 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
}
while (length > 0) {
kvm_pfn_t pfn = gfn_to_pfn_prot(kvm, gfn, write, NULL);
struct page *page = __gfn_to_page(kvm, gfn, write);
void *maddr;
unsigned long num_tags;
struct page *page;
struct folio *folio;
if (is_error_noslot_pfn(pfn)) {
if (!page) {
ret = -EFAULT;
goto out;
}
page = pfn_to_online_page(pfn);
if (!page) {
if (!pfn_to_online_page(page_to_pfn(page))) {
/* Reject ZONE_DEVICE memory */
kvm_release_pfn_clean(pfn);
kvm_release_page_unused(page);
ret = -EFAULT;
goto out;
}
@ -1082,7 +1080,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
/* No tags in memory, so write zeros */
num_tags = MTE_GRANULES_PER_PAGE -
clear_user(tags, MTE_GRANULES_PER_PAGE);
kvm_release_pfn_clean(pfn);
kvm_release_page_clean(page);
} else {
/*
* Only locking to serialise with a concurrent
@ -1104,7 +1102,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
else
set_page_mte_tagged(page);
kvm_release_pfn_dirty(pfn);
kvm_release_page_dirty(page);
}
if (num_tags != MTE_GRANULES_PER_PAGE) {

View File

@ -204,6 +204,35 @@ static inline void __deactivate_traps_hfgxtr(struct kvm_vcpu *vcpu)
__deactivate_fgt(hctxt, vcpu, kvm, HAFGRTR_EL2);
}
static inline void __activate_traps_mpam(struct kvm_vcpu *vcpu)
{
u64 r = MPAM2_EL2_TRAPMPAM0EL1 | MPAM2_EL2_TRAPMPAM1EL1;
if (!system_supports_mpam())
return;
/* trap guest access to MPAMIDR_EL1 */
if (system_supports_mpam_hcr()) {
write_sysreg_s(MPAMHCR_EL2_TRAP_MPAMIDR_EL1, SYS_MPAMHCR_EL2);
} else {
/* From v1.1 TIDR can trap MPAMIDR, set it unconditionally */
r |= MPAM2_EL2_TIDR;
}
write_sysreg_s(r, SYS_MPAM2_EL2);
}
static inline void __deactivate_traps_mpam(void)
{
if (!system_supports_mpam())
return;
write_sysreg_s(0, SYS_MPAM2_EL2);
if (system_supports_mpam_hcr())
write_sysreg_s(MPAMHCR_HOST_FLAGS, SYS_MPAMHCR_EL2);
}
static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
{
/* Trap on AArch32 cp15 c15 (impdef sysregs) accesses (EL1 or EL0) */
@ -244,6 +273,7 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
}
__activate_traps_hfgxtr(vcpu);
__activate_traps_mpam(vcpu);
}
static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
@ -263,6 +293,7 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
write_sysreg_s(HCRX_HOST_FLAGS, SYS_HCRX_EL2);
__deactivate_traps_hfgxtr(vcpu);
__deactivate_traps_mpam();
}
static inline void ___activate_traps(struct kvm_vcpu *vcpu, u64 hcr)

View File

@ -58,7 +58,7 @@ static inline bool ctxt_has_s1pie(struct kvm_cpu_context *ctxt)
return false;
vcpu = ctxt_to_vcpu(ctxt);
return kvm_has_feat(kern_hyp_va(vcpu->kvm), ID_AA64MMFR3_EL1, S1PIE, IMP);
return kvm_has_s1pie(kern_hyp_va(vcpu->kvm));
}
static inline bool ctxt_has_tcrx(struct kvm_cpu_context *ctxt)
@ -69,7 +69,7 @@ static inline bool ctxt_has_tcrx(struct kvm_cpu_context *ctxt)
return false;
vcpu = ctxt_to_vcpu(ctxt);
return kvm_has_feat(kern_hyp_va(vcpu->kvm), ID_AA64MMFR3_EL1, TCRX, IMP);
return kvm_has_tcr2(kern_hyp_va(vcpu->kvm));
}
static inline bool ctxt_has_s1poe(struct kvm_cpu_context *ctxt)
@ -80,7 +80,7 @@ static inline bool ctxt_has_s1poe(struct kvm_cpu_context *ctxt)
return false;
vcpu = ctxt_to_vcpu(ctxt);
return kvm_has_feat(kern_hyp_va(vcpu->kvm), ID_AA64MMFR3_EL1, S1POE, IMP);
return kvm_has_s1poe(kern_hyp_va(vcpu->kvm));
}
static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
@ -152,9 +152,10 @@ static inline void __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
write_sysreg(ctxt_sys_reg(ctxt, TPIDRRO_EL0), tpidrro_el0);
}
static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt,
u64 mpidr)
{
write_sysreg(ctxt_sys_reg(ctxt, MPIDR_EL1), vmpidr_el2);
write_sysreg(mpidr, vmpidr_el2);
if (has_vhe() ||
!cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {

View File

@ -15,6 +15,4 @@
#define DECLARE_REG(type, name, ctxt, reg) \
type name = (type)cpu_reg(ctxt, (reg))
void __pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu);
#endif /* __ARM64_KVM_NVHE_TRAP_HANDLER_H__ */

View File

@ -105,8 +105,10 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
hyp_vcpu->vcpu.arch.hw_mmu = host_vcpu->arch.hw_mmu;
hyp_vcpu->vcpu.arch.hcr_el2 = host_vcpu->arch.hcr_el2;
hyp_vcpu->vcpu.arch.mdcr_el2 = host_vcpu->arch.mdcr_el2;
hyp_vcpu->vcpu.arch.hcr_el2 &= ~(HCR_TWI | HCR_TWE);
hyp_vcpu->vcpu.arch.hcr_el2 |= READ_ONCE(host_vcpu->arch.hcr_el2) &
(HCR_TWI | HCR_TWE);
hyp_vcpu->vcpu.arch.iflags = host_vcpu->arch.iflags;
@ -349,13 +351,6 @@ static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __pkvm_prot_finalize();
}
static void handle___pkvm_vcpu_init_traps(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
__pkvm_vcpu_init_traps(kern_hyp_va(vcpu));
}
static void handle___pkvm_init_vm(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct kvm *, host_kvm, host_ctxt, 1);
@ -411,7 +406,6 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__kvm_timer_set_cntvoff),
HANDLE_FUNC(__vgic_v3_save_vmcr_aprs),
HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
HANDLE_FUNC(__pkvm_vcpu_init_traps),
HANDLE_FUNC(__pkvm_init_vm),
HANDLE_FUNC(__pkvm_init_vcpu),
HANDLE_FUNC(__pkvm_teardown_vm),

View File

@ -6,6 +6,9 @@
#include <linux/kvm_host.h>
#include <linux/mm.h>
#include <asm/kvm_emulate.h>
#include <nvhe/fixed_config.h>
#include <nvhe/mem_protect.h>
#include <nvhe/memory.h>
@ -201,11 +204,46 @@ static void pvm_init_trap_regs(struct kvm_vcpu *vcpu)
}
}
static void pkvm_vcpu_reset_hcr(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
if (has_hvhe())
vcpu->arch.hcr_el2 |= HCR_E2H;
if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN)) {
/* route synchronous external abort exceptions to EL2 */
vcpu->arch.hcr_el2 |= HCR_TEA;
/* trap error record accesses */
vcpu->arch.hcr_el2 |= HCR_TERR;
}
if (cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
vcpu->arch.hcr_el2 |= HCR_FWB;
if (cpus_have_final_cap(ARM64_HAS_EVT) &&
!cpus_have_final_cap(ARM64_MISMATCHED_CACHE_TYPE))
vcpu->arch.hcr_el2 |= HCR_TID4;
else
vcpu->arch.hcr_el2 |= HCR_TID2;
if (vcpu_has_ptrauth(vcpu))
vcpu->arch.hcr_el2 |= (HCR_API | HCR_APK);
}
/*
* Initialize trap register values in protected mode.
*/
void __pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu)
static void pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu)
{
vcpu->arch.cptr_el2 = kvm_get_reset_cptr_el2(vcpu);
vcpu->arch.mdcr_el2 = 0;
pkvm_vcpu_reset_hcr(vcpu);
if (!vcpu_is_protected(vcpu))
return;
pvm_init_trap_regs(vcpu);
pvm_init_traps_aa64pfr0(vcpu);
pvm_init_traps_aa64pfr1(vcpu);
@ -289,6 +327,65 @@ void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
hyp_spin_unlock(&vm_table_lock);
}
static void pkvm_init_features_from_host(struct pkvm_hyp_vm *hyp_vm, const struct kvm *host_kvm)
{
struct kvm *kvm = &hyp_vm->kvm;
DECLARE_BITMAP(allowed_features, KVM_VCPU_MAX_FEATURES);
/* No restrictions for non-protected VMs. */
if (!kvm_vm_is_protected(kvm)) {
bitmap_copy(kvm->arch.vcpu_features,
host_kvm->arch.vcpu_features,
KVM_VCPU_MAX_FEATURES);
return;
}
bitmap_zero(allowed_features, KVM_VCPU_MAX_FEATURES);
/*
* For protected VMs, always allow:
* - CPU starting in poweroff state
* - PSCI v0.2
*/
set_bit(KVM_ARM_VCPU_POWER_OFF, allowed_features);
set_bit(KVM_ARM_VCPU_PSCI_0_2, allowed_features);
/*
* Check if remaining features are allowed:
* - Performance Monitoring
* - Scalable Vectors
* - Pointer Authentication
*/
if (FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_PMUVer), PVM_ID_AA64DFR0_ALLOW))
set_bit(KVM_ARM_VCPU_PMU_V3, allowed_features);
if (FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_SVE), PVM_ID_AA64PFR0_ALLOW))
set_bit(KVM_ARM_VCPU_SVE, allowed_features);
if (FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_API), PVM_ID_AA64ISAR1_RESTRICT_UNSIGNED) &&
FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_APA), PVM_ID_AA64ISAR1_RESTRICT_UNSIGNED))
set_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, allowed_features);
if (FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_GPI), PVM_ID_AA64ISAR1_ALLOW) &&
FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_GPA), PVM_ID_AA64ISAR1_ALLOW))
set_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, allowed_features);
bitmap_and(kvm->arch.vcpu_features, host_kvm->arch.vcpu_features,
allowed_features, KVM_VCPU_MAX_FEATURES);
}
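/*
 * Minimal sketch (not kernel code) of the policy implemented by
 * pkvm_init_features_from_host() above: build an allow-list, then AND it
 * with what the host requested, so a protected VM can never end up with a
 * vCPU feature the hypervisor did not explicitly allow.  The feature bit
 * numbers are made up for the example.
 */
#include <stdint.h>
#include <stdio.h>

enum { EX_FEAT_POWER_OFF, EX_FEAT_PSCI_0_2, EX_FEAT_PMU, EX_FEAT_SVE };

int main(void)
{
        uint64_t allowed = 0, requested;

        /* Always allowed for protected VMs */
        allowed |= 1ULL << EX_FEAT_POWER_OFF;
        allowed |= 1ULL << EX_FEAT_PSCI_0_2;
        /* Conditionally allowed, depending on the fixed pVM configuration */
        allowed |= 1ULL << EX_FEAT_PMU;

        /* Host asked for PSCI 0.2 and SVE; SVE gets filtered out here */
        requested = (1ULL << EX_FEAT_PSCI_0_2) | (1ULL << EX_FEAT_SVE);
        printf("effective features: %#llx\n",
               (unsigned long long)(requested & allowed));
        return 0;
}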
static void pkvm_vcpu_init_ptrauth(struct pkvm_hyp_vcpu *hyp_vcpu)
{
struct kvm_vcpu *vcpu = &hyp_vcpu->vcpu;
if (vcpu_has_feature(vcpu, KVM_ARM_VCPU_PTRAUTH_ADDRESS) ||
vcpu_has_feature(vcpu, KVM_ARM_VCPU_PTRAUTH_GENERIC)) {
kvm_vcpu_enable_ptrauth(vcpu);
} else {
vcpu_clear_flag(&hyp_vcpu->vcpu, GUEST_HAS_PTRAUTH);
}
}
static void unpin_host_vcpu(struct kvm_vcpu *host_vcpu)
{
if (host_vcpu)
@ -310,6 +407,18 @@ static void init_pkvm_hyp_vm(struct kvm *host_kvm, struct pkvm_hyp_vm *hyp_vm,
hyp_vm->host_kvm = host_kvm;
hyp_vm->kvm.created_vcpus = nr_vcpus;
hyp_vm->kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr;
hyp_vm->kvm.arch.pkvm.enabled = READ_ONCE(host_kvm->arch.pkvm.enabled);
pkvm_init_features_from_host(hyp_vm, host_kvm);
}
static void pkvm_vcpu_init_sve(struct pkvm_hyp_vcpu *hyp_vcpu, struct kvm_vcpu *host_vcpu)
{
struct kvm_vcpu *vcpu = &hyp_vcpu->vcpu;
if (!vcpu_has_feature(vcpu, KVM_ARM_VCPU_SVE)) {
vcpu_clear_flag(vcpu, GUEST_HAS_SVE);
vcpu_clear_flag(vcpu, VCPU_SVE_FINALIZED);
}
}
static int init_pkvm_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu,
@ -335,6 +444,11 @@ static int init_pkvm_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu,
hyp_vcpu->vcpu.arch.hw_mmu = &hyp_vm->kvm.arch.mmu;
hyp_vcpu->vcpu.arch.cflags = READ_ONCE(host_vcpu->arch.cflags);
hyp_vcpu->vcpu.arch.mp_state.mp_state = KVM_MP_STATE_STOPPED;
pkvm_vcpu_init_sve(hyp_vcpu, host_vcpu);
pkvm_vcpu_init_ptrauth(hyp_vcpu);
pkvm_vcpu_init_traps(&hyp_vcpu->vcpu);
done:
if (ret)
unpin_host_vcpu(host_vcpu);

View File

@ -265,6 +265,8 @@ static unsigned long psci_1_0_handler(u64 func_id, struct kvm_cpu_context *host_
case PSCI_1_0_FN_PSCI_FEATURES:
case PSCI_1_0_FN_SET_SUSPEND_MODE:
case PSCI_1_1_FN64_SYSTEM_RESET2:
case PSCI_1_3_FN_SYSTEM_OFF2:
case PSCI_1_3_FN64_SYSTEM_OFF2:
return psci_forward(host_ctxt);
case PSCI_1_0_FN64_SYSTEM_SUSPEND:
return psci_system_suspend(func_id, host_ctxt);

View File

@ -95,7 +95,6 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
{
void *start, *end, *virt = hyp_phys_to_virt(phys);
unsigned long pgt_size = hyp_s1_pgtable_pages() << PAGE_SHIFT;
enum kvm_pgtable_prot prot;
int ret, i;
/* Recreate the hyp page-table using the early page allocator */
@ -147,24 +146,7 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
return ret;
}
pkvm_create_host_sve_mappings();
/*
* Map the host sections RO in the hypervisor, but transfer the
* ownership from the host to the hypervisor itself to make sure they
* can't be donated or shared with another entity.
*
* The ownership transition requires matching changes in the host
* stage-2. This will be done later (see finalize_host_mappings()) once
* the hyp_vmemmap is addressable.
*/
prot = pkvm_mkstate(PAGE_HYP_RO, PKVM_PAGE_SHARED_OWNED);
ret = pkvm_create_mappings(&kvm_vgic_global_state,
&kvm_vgic_global_state + 1, prot);
if (ret)
return ret;
return 0;
return pkvm_create_host_sve_mappings();
}
static void update_nvhe_init_params(void)

View File

@ -28,7 +28,7 @@ void __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt)
void __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt)
{
__sysreg_restore_el1_state(ctxt);
__sysreg_restore_el1_state(ctxt, ctxt_sys_reg(ctxt, MPIDR_EL1));
__sysreg_restore_common_state(ctxt);
__sysreg_restore_user_state(ctxt);
__sysreg_restore_el2_return_state(ctxt);

View File

@ -1245,19 +1245,16 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
NULL, NULL, 0);
}
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
void kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
{
kvm_pte_t pte = 0;
int ret;
ret = stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0,
&pte, NULL,
NULL, NULL,
KVM_PGTABLE_WALK_HANDLE_FAULT |
KVM_PGTABLE_WALK_SHARED);
if (!ret)
dsb(ishst);
return pte;
}
struct stage2_age_data {

View File

@ -1012,9 +1012,6 @@ static void __vgic_v3_read_ctlr(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
val = ((vtr >> 29) & 7) << ICC_CTLR_EL1_PRI_BITS_SHIFT;
/* IDbits */
val |= ((vtr >> 23) & 7) << ICC_CTLR_EL1_ID_BITS_SHIFT;
/* SEIS */
if (kvm_vgic_global_state.ich_vtr_el2 & ICH_VTR_SEIS_MASK)
val |= BIT(ICC_CTLR_EL1_SEIS_SHIFT);
/* A3V */
val |= ((vtr >> 21) & 1) << ICC_CTLR_EL1_A3V_SHIFT;
/* EOImode */

View File

@ -15,6 +15,131 @@
#include <asm/kvm_hyp.h>
#include <asm/kvm_nested.h>
static void __sysreg_save_vel2_state(struct kvm_vcpu *vcpu)
{
/* These registers are common with EL1 */
__vcpu_sys_reg(vcpu, PAR_EL1) = read_sysreg(par_el1);
__vcpu_sys_reg(vcpu, TPIDR_EL1) = read_sysreg(tpidr_el1);
__vcpu_sys_reg(vcpu, ESR_EL2) = read_sysreg_el1(SYS_ESR);
__vcpu_sys_reg(vcpu, AFSR0_EL2) = read_sysreg_el1(SYS_AFSR0);
__vcpu_sys_reg(vcpu, AFSR1_EL2) = read_sysreg_el1(SYS_AFSR1);
__vcpu_sys_reg(vcpu, FAR_EL2) = read_sysreg_el1(SYS_FAR);
__vcpu_sys_reg(vcpu, MAIR_EL2) = read_sysreg_el1(SYS_MAIR);
__vcpu_sys_reg(vcpu, VBAR_EL2) = read_sysreg_el1(SYS_VBAR);
__vcpu_sys_reg(vcpu, CONTEXTIDR_EL2) = read_sysreg_el1(SYS_CONTEXTIDR);
__vcpu_sys_reg(vcpu, AMAIR_EL2) = read_sysreg_el1(SYS_AMAIR);
/*
* In VHE mode those registers are compatible between EL1 and EL2,
* and the guest uses the _EL1 versions on the CPU naturally.
* So we save them into their _EL2 versions here.
* For nVHE mode we trap accesses to those registers, so our
* _EL2 copy in sys_regs[] is always up-to-date and we don't need
* to save anything here.
*/
if (vcpu_el2_e2h_is_set(vcpu)) {
u64 val;
/*
* We don't save CPTR_EL2, as accesses to CPACR_EL1
* are always trapped, ensuring that the in-memory
* copy is always up-to-date. A small blessing...
*/
__vcpu_sys_reg(vcpu, SCTLR_EL2) = read_sysreg_el1(SYS_SCTLR);
__vcpu_sys_reg(vcpu, TTBR0_EL2) = read_sysreg_el1(SYS_TTBR0);
__vcpu_sys_reg(vcpu, TTBR1_EL2) = read_sysreg_el1(SYS_TTBR1);
__vcpu_sys_reg(vcpu, TCR_EL2) = read_sysreg_el1(SYS_TCR);
if (ctxt_has_tcrx(&vcpu->arch.ctxt)) {
__vcpu_sys_reg(vcpu, TCR2_EL2) = read_sysreg_el1(SYS_TCR2);
if (ctxt_has_s1pie(&vcpu->arch.ctxt)) {
__vcpu_sys_reg(vcpu, PIRE0_EL2) = read_sysreg_el1(SYS_PIRE0);
__vcpu_sys_reg(vcpu, PIR_EL2) = read_sysreg_el1(SYS_PIR);
}
if (ctxt_has_s1poe(&vcpu->arch.ctxt))
__vcpu_sys_reg(vcpu, POR_EL2) = read_sysreg_el1(SYS_POR);
}
/*
* The EL1 view of CNTKCTL_EL1 has a bunch of RES0 bits where
* the interesting CNTHCTL_EL2 bits live. So preserve these
* bits when reading back the guest-visible value.
*/
val = read_sysreg_el1(SYS_CNTKCTL);
val &= CNTKCTL_VALID_BITS;
__vcpu_sys_reg(vcpu, CNTHCTL_EL2) &= ~CNTKCTL_VALID_BITS;
__vcpu_sys_reg(vcpu, CNTHCTL_EL2) |= val;
}
__vcpu_sys_reg(vcpu, SP_EL2) = read_sysreg(sp_el1);
__vcpu_sys_reg(vcpu, ELR_EL2) = read_sysreg_el1(SYS_ELR);
__vcpu_sys_reg(vcpu, SPSR_EL2) = read_sysreg_el1(SYS_SPSR);
}
static void __sysreg_restore_vel2_state(struct kvm_vcpu *vcpu)
{
u64 val;
/* These registers are common with EL1 */
write_sysreg(__vcpu_sys_reg(vcpu, PAR_EL1), par_el1);
write_sysreg(__vcpu_sys_reg(vcpu, TPIDR_EL1), tpidr_el1);
write_sysreg(__vcpu_sys_reg(vcpu, MPIDR_EL1), vmpidr_el2);
write_sysreg_el1(__vcpu_sys_reg(vcpu, MAIR_EL2), SYS_MAIR);
write_sysreg_el1(__vcpu_sys_reg(vcpu, VBAR_EL2), SYS_VBAR);
write_sysreg_el1(__vcpu_sys_reg(vcpu, CONTEXTIDR_EL2), SYS_CONTEXTIDR);
write_sysreg_el1(__vcpu_sys_reg(vcpu, AMAIR_EL2), SYS_AMAIR);
if (vcpu_el2_e2h_is_set(vcpu)) {
/*
* In VHE mode those registers are compatible between
* EL1 and EL2.
*/
write_sysreg_el1(__vcpu_sys_reg(vcpu, SCTLR_EL2), SYS_SCTLR);
write_sysreg_el1(__vcpu_sys_reg(vcpu, CPTR_EL2), SYS_CPACR);
write_sysreg_el1(__vcpu_sys_reg(vcpu, TTBR0_EL2), SYS_TTBR0);
write_sysreg_el1(__vcpu_sys_reg(vcpu, TTBR1_EL2), SYS_TTBR1);
write_sysreg_el1(__vcpu_sys_reg(vcpu, TCR_EL2), SYS_TCR);
write_sysreg_el1(__vcpu_sys_reg(vcpu, CNTHCTL_EL2), SYS_CNTKCTL);
} else {
/*
* CNTHCTL_EL2 only affects EL1 when running nVHE, so
* no need to restore it.
*/
val = translate_sctlr_el2_to_sctlr_el1(__vcpu_sys_reg(vcpu, SCTLR_EL2));
write_sysreg_el1(val, SYS_SCTLR);
val = translate_cptr_el2_to_cpacr_el1(__vcpu_sys_reg(vcpu, CPTR_EL2));
write_sysreg_el1(val, SYS_CPACR);
val = translate_ttbr0_el2_to_ttbr0_el1(__vcpu_sys_reg(vcpu, TTBR0_EL2));
write_sysreg_el1(val, SYS_TTBR0);
val = translate_tcr_el2_to_tcr_el1(__vcpu_sys_reg(vcpu, TCR_EL2));
write_sysreg_el1(val, SYS_TCR);
}
if (ctxt_has_tcrx(&vcpu->arch.ctxt)) {
write_sysreg_el1(__vcpu_sys_reg(vcpu, TCR2_EL2), SYS_TCR2);
if (ctxt_has_s1pie(&vcpu->arch.ctxt)) {
write_sysreg_el1(__vcpu_sys_reg(vcpu, PIR_EL2), SYS_PIR);
write_sysreg_el1(__vcpu_sys_reg(vcpu, PIRE0_EL2), SYS_PIRE0);
}
if (ctxt_has_s1poe(&vcpu->arch.ctxt))
write_sysreg_el1(__vcpu_sys_reg(vcpu, POR_EL2), SYS_POR);
}
write_sysreg_el1(__vcpu_sys_reg(vcpu, ESR_EL2), SYS_ESR);
write_sysreg_el1(__vcpu_sys_reg(vcpu, AFSR0_EL2), SYS_AFSR0);
write_sysreg_el1(__vcpu_sys_reg(vcpu, AFSR1_EL2), SYS_AFSR1);
write_sysreg_el1(__vcpu_sys_reg(vcpu, FAR_EL2), SYS_FAR);
write_sysreg(__vcpu_sys_reg(vcpu, SP_EL2), sp_el1);
write_sysreg_el1(__vcpu_sys_reg(vcpu, ELR_EL2), SYS_ELR);
write_sysreg_el1(__vcpu_sys_reg(vcpu, SPSR_EL2), SYS_SPSR);
}
/*
* VHE: Host and guest must save mdscr_el1 and sp_el0 (and the PC and
* pstate, which are handled as part of the el2 return state) on every
@ -66,6 +191,7 @@ void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
struct kvm_cpu_context *host_ctxt;
u64 mpidr;
host_ctxt = host_data_ptr(host_ctxt);
__sysreg_save_user_state(host_ctxt);
@ -89,7 +215,29 @@ void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu)
*/
__sysreg32_restore_state(vcpu);
__sysreg_restore_user_state(guest_ctxt);
__sysreg_restore_el1_state(guest_ctxt);
if (unlikely(__is_hyp_ctxt(guest_ctxt))) {
__sysreg_restore_vel2_state(vcpu);
} else {
if (vcpu_has_nv(vcpu)) {
/*
* Use the guest hypervisor's VPIDR_EL2 when in a
* nested state. The hardware value of MIDR_EL1 gets
* restored on put.
*/
write_sysreg(ctxt_sys_reg(guest_ctxt, VPIDR_EL2), vpidr_el2);
/*
* As we're restoring a nested guest, set the value
* provided by the guest hypervisor.
*/
mpidr = ctxt_sys_reg(guest_ctxt, VMPIDR_EL2);
} else {
mpidr = ctxt_sys_reg(guest_ctxt, MPIDR_EL1);
}
__sysreg_restore_el1_state(guest_ctxt, mpidr);
}
vcpu_set_flag(vcpu, SYSREGS_ON_CPU);
}
@ -112,12 +260,20 @@ void __vcpu_put_switch_sysregs(struct kvm_vcpu *vcpu)
host_ctxt = host_data_ptr(host_ctxt);
__sysreg_save_el1_state(guest_ctxt);
if (unlikely(__is_hyp_ctxt(guest_ctxt)))
__sysreg_save_vel2_state(vcpu);
else
__sysreg_save_el1_state(guest_ctxt);
__sysreg_save_user_state(guest_ctxt);
__sysreg32_save_state(vcpu);
/* Restore host user state */
__sysreg_restore_user_state(host_ctxt);
/* If leaving a nesting guest, restore MIDR_EL1 default view */
if (vcpu_has_nv(vcpu))
write_sysreg(read_cpuid_id(), vpidr_el2);
vcpu_clear_flag(vcpu, SYSREGS_ON_CPU);
}

View File

@ -575,6 +575,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
case KVM_ARM_PSCI_0_2:
case KVM_ARM_PSCI_1_0:
case KVM_ARM_PSCI_1_1:
case KVM_ARM_PSCI_1_2:
case KVM_ARM_PSCI_1_3:
if (!wants_02)
return -EINVAL;
vcpu->kvm->arch.psci_version = val;

View File

@ -72,6 +72,31 @@ unsigned long kvm_mmio_read_buf(const void *buf, unsigned int len)
return data;
}
static bool kvm_pending_sync_exception(struct kvm_vcpu *vcpu)
{
if (!vcpu_get_flag(vcpu, PENDING_EXCEPTION))
return false;
if (vcpu_el1_is_32bit(vcpu)) {
switch (vcpu_get_flag(vcpu, EXCEPT_MASK)) {
case unpack_vcpu_flag(EXCEPT_AA32_UND):
case unpack_vcpu_flag(EXCEPT_AA32_IABT):
case unpack_vcpu_flag(EXCEPT_AA32_DABT):
return true;
default:
return false;
}
} else {
switch (vcpu_get_flag(vcpu, EXCEPT_MASK)) {
case unpack_vcpu_flag(EXCEPT_AA64_EL1_SYNC):
case unpack_vcpu_flag(EXCEPT_AA64_EL2_SYNC):
return true;
default:
return false;
}
}
}
/**
* kvm_handle_mmio_return -- Handle MMIO loads after user space emulation
* or in-kernel IO emulation
@ -84,8 +109,11 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu)
unsigned int len;
int mask;
/* Detect an already handled MMIO return */
if (unlikely(!vcpu->mmio_needed))
/*
* Detect if the MMIO return was already handled or if userspace aborted
* the MMIO access.
*/
if (unlikely(!vcpu->mmio_needed || kvm_pending_sync_exception(vcpu)))
return 1;
vcpu->mmio_needed = 0;

View File

@ -1451,6 +1451,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
long vma_pagesize, fault_granule;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
struct kvm_pgtable *pgt;
struct page *page;
if (fault_is_perm)
fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
@ -1572,7 +1573,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
/*
* Read mmu_invalidate_seq so that KVM can detect if the results of
* vma_lookup() or __gfn_to_pfn_memslot() become stale prior to
* vma_lookup() or __kvm_faultin_pfn() become stale prior to
* acquiring kvm->mmu_lock.
*
* Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
@ -1581,8 +1582,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
mmu_seq = vcpu->kvm->mmu_invalidate_seq;
mmap_read_unlock(current->mm);
pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
write_fault, &writable, NULL);
pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
&writable, &page);
if (pfn == KVM_PFN_ERR_HWPOISON) {
kvm_send_hwpoison_signal(hva, vma_shift);
return 0;
@ -1595,7 +1596,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* If the page was identified as device early by looking at
* the VMA flags, vma_pagesize is already representing the
* largest quantity we can map. If instead it was mapped
* via gfn_to_pfn_prot(), vma_pagesize is set to PAGE_SIZE
* via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
* and must not be upgraded.
*
* In both cases, we don't let transparent_hugepage_adjust()
@ -1704,33 +1705,27 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
}
out_unlock:
kvm_release_faultin_page(kvm, page, !!ret, writable);
read_unlock(&kvm->mmu_lock);
/* Mark the page dirty only if the fault is handled successfully */
if (writable && !ret) {
kvm_set_pfn_dirty(pfn);
if (writable && !ret)
mark_page_dirty_in_slot(kvm, memslot, gfn);
}
kvm_release_pfn_clean(pfn);
return ret != -EAGAIN ? ret : 0;
}
/* Resolve the access fault by making the page young again. */
static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
{
kvm_pte_t pte;
struct kvm_s2_mmu *mmu;
trace_kvm_access_fault(fault_ipa);
read_lock(&vcpu->kvm->mmu_lock);
mmu = vcpu->arch.hw_mmu;
pte = kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa);
kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa);
read_unlock(&vcpu->kvm->mmu_lock);
if (kvm_pte_valid(pte))
kvm_set_pfn_accessed(kvm_pte_to_pfn(pte));
}
/**

@ -917,12 +917,13 @@ static void limit_nv_id_regs(struct kvm *kvm)
ID_AA64MMFR4_EL1_E2H0_NI_NV1);
kvm_set_vm_id_reg(kvm, SYS_ID_AA64MMFR4_EL1, val);
/* Only limited support for PMU, Debug, BPs and WPs */
/* Only limited support for PMU, Debug, BPs, WPs, and HPMN0 */
val = kvm_read_vm_id_reg(kvm, SYS_ID_AA64DFR0_EL1);
val &= (NV_FTR(DFR0, PMUVer) |
NV_FTR(DFR0, WRPs) |
NV_FTR(DFR0, BRPs) |
NV_FTR(DFR0, DebugVer));
NV_FTR(DFR0, DebugVer) |
NV_FTR(DFR0, HPMN0));
/* Cap Debug to ARMv8.1 */
tmp = FIELD_GET(NV_FTR(DFR0, DebugVer), val);
@ -933,15 +934,15 @@ static void limit_nv_id_regs(struct kvm *kvm)
kvm_set_vm_id_reg(kvm, SYS_ID_AA64DFR0_EL1, val);
}
u64 kvm_vcpu_sanitise_vncr_reg(const struct kvm_vcpu *vcpu, enum vcpu_sysreg sr)
u64 kvm_vcpu_apply_reg_masks(const struct kvm_vcpu *vcpu,
enum vcpu_sysreg sr, u64 v)
{
u64 v = ctxt_sys_reg(&vcpu->arch.ctxt, sr);
struct kvm_sysreg_masks *masks;
masks = vcpu->kvm->arch.sysreg_masks;
if (masks) {
sr -= __VNCR_START__;
sr -= __SANITISED_REG_START__;
v &= ~masks->mask[sr].res0;
v |= masks->mask[sr].res1;
@ -952,7 +953,11 @@ u64 kvm_vcpu_sanitise_vncr_reg(const struct kvm_vcpu *vcpu, enum vcpu_sysreg sr)
static void set_sysreg_masks(struct kvm *kvm, int sr, u64 res0, u64 res1)
{
int i = sr - __VNCR_START__;
int i = sr - __SANITISED_REG_START__;
BUILD_BUG_ON(!__builtin_constant_p(sr));
BUILD_BUG_ON(sr < __SANITISED_REG_START__);
BUILD_BUG_ON(sr >= NR_SYS_REGS);
kvm->arch.sysreg_masks->mask[i].res0 = res0;
kvm->arch.sysreg_masks->mask[i].res1 = res1;
@ -1050,7 +1055,7 @@ int kvm_init_nv_sysregs(struct kvm *kvm)
res0 |= HCRX_EL2_PTTWI;
if (!kvm_has_feat(kvm, ID_AA64MMFR3_EL1, SCTLRX, IMP))
res0 |= HCRX_EL2_SCTLR2En;
if (!kvm_has_feat(kvm, ID_AA64MMFR3_EL1, TCRX, IMP))
if (!kvm_has_tcr2(kvm))
res0 |= HCRX_EL2_TCR2En;
if (!kvm_has_feat(kvm, ID_AA64ISAR2_EL1, MOPS, IMP))
res0 |= (HCRX_EL2_MSCEn | HCRX_EL2_MCE2);
@ -1101,9 +1106,9 @@ int kvm_init_nv_sysregs(struct kvm *kvm)
res0 |= (HFGxTR_EL2_nSMPRI_EL1 | HFGxTR_EL2_nTPIDR2_EL0);
if (!kvm_has_feat(kvm, ID_AA64PFR1_EL1, THE, IMP))
res0 |= HFGxTR_EL2_nRCWMASK_EL1;
if (!kvm_has_feat(kvm, ID_AA64MMFR3_EL1, S1PIE, IMP))
if (!kvm_has_s1pie(kvm))
res0 |= (HFGxTR_EL2_nPIRE0_EL1 | HFGxTR_EL2_nPIR_EL1);
if (!kvm_has_feat(kvm, ID_AA64MMFR3_EL1, S1POE, IMP))
if (!kvm_has_s1poe(kvm))
res0 |= (HFGxTR_EL2_nPOR_EL0 | HFGxTR_EL2_nPOR_EL1);
if (!kvm_has_feat(kvm, ID_AA64MMFR3_EL1, S2POE, IMP))
res0 |= HFGxTR_EL2_nS2POR_EL1;
@ -1200,6 +1205,28 @@ int kvm_init_nv_sysregs(struct kvm *kvm)
res0 |= ~(res0 | res1);
set_sysreg_masks(kvm, HAFGRTR_EL2, res0, res1);
/* TCR2_EL2 */
res0 = TCR2_EL2_RES0;
res1 = TCR2_EL2_RES1;
if (!kvm_has_feat(kvm, ID_AA64MMFR3_EL1, D128, IMP))
res0 |= (TCR2_EL2_DisCH0 | TCR2_EL2_DisCH1 | TCR2_EL2_D128);
if (!kvm_has_feat(kvm, ID_AA64MMFR3_EL1, MEC, IMP))
res0 |= TCR2_EL2_AMEC1 | TCR2_EL2_AMEC0;
if (!kvm_has_feat(kvm, ID_AA64MMFR1_EL1, HAFDBS, HAFT))
res0 |= TCR2_EL2_HAFT;
if (!kvm_has_feat(kvm, ID_AA64PFR1_EL1, THE, IMP))
res0 |= TCR2_EL2_PTTWI | TCR2_EL2_PnCH;
if (!kvm_has_feat(kvm, ID_AA64MMFR3_EL1, AIE, IMP))
res0 |= TCR2_EL2_AIE;
if (!kvm_has_s1poe(kvm))
res0 |= TCR2_EL2_POE | TCR2_EL2_E0POE;
if (!kvm_has_s1pie(kvm))
res0 |= TCR2_EL2_PIE;
if (!kvm_has_feat(kvm, ID_AA64MMFR1_EL1, VH, IMP))
res0 |= (TCR2_EL2_E0POE | TCR2_EL2_D128 |
TCR2_EL2_AMEC1 | TCR2_EL2_DisCH0 | TCR2_EL2_DisCH1);
set_sysreg_masks(kvm, TCR2_EL2, res0, res1);
/* SCTLR_EL1 */
res0 = SCTLR_EL1_RES0;
res1 = SCTLR_EL1_RES1;
@ -1207,6 +1234,43 @@ int kvm_init_nv_sysregs(struct kvm *kvm)
res0 |= SCTLR_EL1_EPAN;
set_sysreg_masks(kvm, SCTLR_EL1, res0, res1);
/* MDCR_EL2 */
res0 = MDCR_EL2_RES0;
res1 = MDCR_EL2_RES1;
if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, PMUVer, IMP))
res0 |= (MDCR_EL2_HPMN | MDCR_EL2_TPMCR |
MDCR_EL2_TPM | MDCR_EL2_HPME);
if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, PMSVer, IMP))
res0 |= MDCR_EL2_E2PB | MDCR_EL2_TPMS;
if (!kvm_has_feat(kvm, ID_AA64DFR1_EL1, SPMU, IMP))
res0 |= MDCR_EL2_EnSPM;
if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, PMUVer, V3P1))
res0 |= MDCR_EL2_HPMD;
if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, TraceFilt, IMP))
res0 |= MDCR_EL2_TTRF;
if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, PMUVer, V3P5))
res0 |= MDCR_EL2_HCCD | MDCR_EL2_HLP;
if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, TraceBuffer, IMP))
res0 |= MDCR_EL2_E2TB;
if (!kvm_has_feat(kvm, ID_AA64MMFR0_EL1, FGT, IMP))
res0 |= MDCR_EL2_TDCC;
if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, MTPMU, IMP) ||
kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL3, IMP))
res0 |= MDCR_EL2_MTPME;
if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, PMUVer, V3P7))
res0 |= MDCR_EL2_HPMFZO;
if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, PMSS, IMP))
res0 |= MDCR_EL2_PMSSE;
if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, PMSVer, V1P2))
res0 |= MDCR_EL2_HPMFZS;
if (!kvm_has_feat(kvm, ID_AA64DFR1_EL1, EBEP, IMP))
res0 |= MDCR_EL2_PMEE;
if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, DebugVer, V8P9))
res0 |= MDCR_EL2_EBWE;
if (!kvm_has_feat(kvm, ID_AA64DFR2_EL1, STEP, IMP))
res0 |= MDCR_EL2_EnSTEPOP;
set_sysreg_masks(kvm, MDCR_EL2, res0, res1);
return 0;
}

@ -89,7 +89,11 @@ static bool kvm_pmc_is_64bit(struct kvm_pmc *pmc)
static bool kvm_pmc_has_64bit_overflow(struct kvm_pmc *pmc)
{
u64 val = kvm_vcpu_read_pmcr(kvm_pmc_to_vcpu(pmc));
struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
u64 val = kvm_vcpu_read_pmcr(vcpu);
if (kvm_pmu_counter_is_hyp(vcpu, pmc->idx))
return __vcpu_sys_reg(vcpu, MDCR_EL2) & MDCR_EL2_HLP;
return (pmc->idx < ARMV8_PMU_CYCLE_IDX && (val & ARMV8_PMU_PMCR_LP)) ||
(pmc->idx == ARMV8_PMU_CYCLE_IDX && (val & ARMV8_PMU_PMCR_LC));
@ -111,6 +115,11 @@ static u32 counter_index_to_evtreg(u64 idx)
return (idx == ARMV8_PMU_CYCLE_IDX) ? PMCCFILTR_EL0 : PMEVTYPER0_EL0 + idx;
}
static u64 kvm_pmc_read_evtreg(const struct kvm_pmc *pmc)
{
return __vcpu_sys_reg(kvm_pmc_to_vcpu(pmc), counter_index_to_evtreg(pmc->idx));
}
static u64 kvm_pmu_get_pmc_value(struct kvm_pmc *pmc)
{
struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
@ -244,7 +253,7 @@ void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu)
*/
void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu)
{
unsigned long mask = kvm_pmu_valid_counter_mask(vcpu);
unsigned long mask = kvm_pmu_implemented_counter_mask(vcpu);
int i;
for_each_set_bit(i, &mask, 32)
@ -265,7 +274,37 @@ void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu)
irq_work_sync(&vcpu->arch.pmu.overflow_work);
}
u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
bool kvm_pmu_counter_is_hyp(struct kvm_vcpu *vcpu, unsigned int idx)
{
unsigned int hpmn;
if (!vcpu_has_nv(vcpu) || idx == ARMV8_PMU_CYCLE_IDX)
return false;
/*
* Programming HPMN=0 is CONSTRAINED UNPREDICTABLE if FEAT_HPMN0 isn't
* implemented. Since KVM's ability to emulate HPMN=0 does not directly
* depend on hardware (all PMU registers are trapped), make the
* implementation choice that all counters are included in the second
* range reserved for EL2/EL3.
*/
hpmn = SYS_FIELD_GET(MDCR_EL2, HPMN, __vcpu_sys_reg(vcpu, MDCR_EL2));
return idx >= hpmn;
}
u64 kvm_pmu_accessible_counter_mask(struct kvm_vcpu *vcpu)
{
u64 mask = kvm_pmu_implemented_counter_mask(vcpu);
u64 hpmn;
if (!vcpu_has_nv(vcpu) || vcpu_is_el2(vcpu))
return mask;
hpmn = SYS_FIELD_GET(MDCR_EL2, HPMN, __vcpu_sys_reg(vcpu, MDCR_EL2));
return mask & ~GENMASK(vcpu->kvm->arch.pmcr_n - 1, hpmn);
}
u64 kvm_pmu_implemented_counter_mask(struct kvm_vcpu *vcpu)
{
u64 val = FIELD_GET(ARMV8_PMU_PMCR_N, kvm_vcpu_read_pmcr(vcpu));
@ -574,7 +613,7 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
kvm_pmu_set_counter_value(vcpu, ARMV8_PMU_CYCLE_IDX, 0);
if (val & ARMV8_PMU_PMCR_P) {
unsigned long mask = kvm_pmu_valid_counter_mask(vcpu);
unsigned long mask = kvm_pmu_accessible_counter_mask(vcpu);
mask &= ~BIT(ARMV8_PMU_CYCLE_IDX);
for_each_set_bit(i, &mask, 32)
kvm_pmu_set_pmc_value(kvm_vcpu_idx_to_pmc(vcpu, i), 0, true);
@ -585,8 +624,44 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
{
struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
return (kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E) &&
(__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & BIT(pmc->idx));
unsigned int mdcr = __vcpu_sys_reg(vcpu, MDCR_EL2);
if (!(__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & BIT(pmc->idx)))
return false;
if (kvm_pmu_counter_is_hyp(vcpu, pmc->idx))
return mdcr & MDCR_EL2_HPME;
return kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E;
}
static bool kvm_pmc_counts_at_el0(struct kvm_pmc *pmc)
{
u64 evtreg = kvm_pmc_read_evtreg(pmc);
bool nsu = evtreg & ARMV8_PMU_EXCLUDE_NS_EL0;
bool u = evtreg & ARMV8_PMU_EXCLUDE_EL0;
return u == nsu;
}
static bool kvm_pmc_counts_at_el1(struct kvm_pmc *pmc)
{
u64 evtreg = kvm_pmc_read_evtreg(pmc);
bool nsk = evtreg & ARMV8_PMU_EXCLUDE_NS_EL1;
bool p = evtreg & ARMV8_PMU_EXCLUDE_EL1;
return p == nsk;
}
static bool kvm_pmc_counts_at_el2(struct kvm_pmc *pmc)
{
struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
u64 mdcr = __vcpu_sys_reg(vcpu, MDCR_EL2);
if (!kvm_pmu_counter_is_hyp(vcpu, pmc->idx) && (mdcr & MDCR_EL2_HPMD))
return false;
return kvm_pmc_read_evtreg(pmc) & ARMV8_PMU_INCLUDE_EL2;
}
/**
@ -599,17 +674,15 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
struct arm_pmu *arm_pmu = vcpu->kvm->arch.arm_pmu;
struct perf_event *event;
struct perf_event_attr attr;
u64 eventsel, reg, data;
bool p, u, nsk, nsu;
u64 eventsel, evtreg;
reg = counter_index_to_evtreg(pmc->idx);
data = __vcpu_sys_reg(vcpu, reg);
evtreg = kvm_pmc_read_evtreg(pmc);
kvm_pmu_stop_counter(pmc);
if (pmc->idx == ARMV8_PMU_CYCLE_IDX)
eventsel = ARMV8_PMUV3_PERFCTR_CPU_CYCLES;
else
eventsel = data & kvm_pmu_event_mask(vcpu->kvm);
eventsel = evtreg & kvm_pmu_event_mask(vcpu->kvm);
/*
* Neither SW increment nor chained events need to be backed
@ -627,22 +700,25 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
!test_bit(eventsel, vcpu->kvm->arch.pmu_filter))
return;
p = data & ARMV8_PMU_EXCLUDE_EL1;
u = data & ARMV8_PMU_EXCLUDE_EL0;
nsk = data & ARMV8_PMU_EXCLUDE_NS_EL1;
nsu = data & ARMV8_PMU_EXCLUDE_NS_EL0;
memset(&attr, 0, sizeof(struct perf_event_attr));
attr.type = arm_pmu->pmu.type;
attr.size = sizeof(attr);
attr.pinned = 1;
attr.disabled = !kvm_pmu_counter_is_enabled(pmc);
attr.exclude_user = (u != nsu);
attr.exclude_kernel = (p != nsk);
attr.exclude_user = !kvm_pmc_counts_at_el0(pmc);
attr.exclude_hv = 1; /* Don't count EL2 events */
attr.exclude_host = 1; /* Don't count host events */
attr.config = eventsel;
/*
* Filter events at EL1 (i.e. vEL2) when in a hyp context based on the
* guest's EL2 filter.
*/
if (unlikely(is_hyp_ctxt(vcpu)))
attr.exclude_kernel = !kvm_pmc_counts_at_el2(pmc);
else
attr.exclude_kernel = !kvm_pmc_counts_at_el1(pmc);
/*
* If counting with a 64bit counter, advertise it to the perf
* code, carefully dealing with the initial sample period
@ -804,7 +880,7 @@ u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1)
void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu)
{
u64 mask = kvm_pmu_valid_counter_mask(vcpu);
u64 mask = kvm_pmu_implemented_counter_mask(vcpu);
kvm_pmu_handle_pmcr(vcpu, kvm_vcpu_read_pmcr(vcpu));
@ -1139,3 +1215,32 @@ u64 kvm_vcpu_read_pmcr(struct kvm_vcpu *vcpu)
return u64_replace_bits(pmcr, vcpu->kvm->arch.pmcr_n, ARMV8_PMU_PMCR_N);
}
void kvm_pmu_nested_transition(struct kvm_vcpu *vcpu)
{
bool reprogrammed = false;
unsigned long mask;
int i;
if (!kvm_vcpu_has_pmu(vcpu))
return;
mask = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
for_each_set_bit(i, &mask, 32) {
struct kvm_pmc *pmc = kvm_vcpu_idx_to_pmc(vcpu, i);
/*
* We only need to reconfigure events where the filter is
* different at EL1 vs. EL2, as we're multiplexing the true EL1
* event filter bit for nested.
*/
if (kvm_pmc_counts_at_el1(pmc) == kvm_pmc_counts_at_el2(pmc))
continue;
kvm_pmu_create_perf_event(pmc);
reprogrammed = true;
}
if (reprogrammed)
kvm_vcpu_pmu_restore_guest(vcpu);
}
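
As a worked example of the HPMN-based partitioning in kvm_pmu_counter_is_hyp() and kvm_pmu_accessible_counter_mask() above (the numbers are hypothetical, not taken from the patch): with 8 implemented event counters (pmcr_n = 8) and the guest hypervisor programming MDCR_EL2.HPMN = 4, counters 4-7 fall in the EL2 range while the cycle counter is never partitioned:

    /* illustrative only -- mirrors the mask arithmetic above */
    implemented = GENMASK(7, 0) | BIT(ARMV8_PMU_CYCLE_IDX);
    /* at vEL1 the EL2-reserved range [HPMN, N-1] is hidden */
    accessible  = implemented & ~GENMASK(7, 4);   /* counters 0-3 + cycle */
    /* at vEL2 (or without NV) the full implemented mask is returned */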

@ -194,6 +194,12 @@ static void kvm_psci_system_off(struct kvm_vcpu *vcpu)
kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN, 0);
}
static void kvm_psci_system_off2(struct kvm_vcpu *vcpu)
{
kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN,
KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2);
}
static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
{
kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET, 0);
@ -322,7 +328,7 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
switch(psci_fn) {
case PSCI_0_2_FN_PSCI_VERSION:
val = minor == 0 ? KVM_ARM_PSCI_1_0 : KVM_ARM_PSCI_1_1;
val = PSCI_VERSION(1, minor);
break;
case PSCI_1_0_FN_PSCI_FEATURES:
arg = smccc_get_arg1(vcpu);
@ -358,6 +364,11 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
if (minor >= 1)
val = 0;
break;
case PSCI_1_3_FN_SYSTEM_OFF2:
case PSCI_1_3_FN64_SYSTEM_OFF2:
if (minor >= 3)
val = PSCI_1_3_OFF_TYPE_HIBERNATE_OFF;
break;
}
break;
case PSCI_1_0_FN_SYSTEM_SUSPEND:
@ -392,6 +403,33 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
break;
}
break;
case PSCI_1_3_FN_SYSTEM_OFF2:
kvm_psci_narrow_to_32bit(vcpu);
fallthrough;
case PSCI_1_3_FN64_SYSTEM_OFF2:
if (minor < 3)
break;
arg = smccc_get_arg1(vcpu);
/*
* SYSTEM_OFF2 defaults to HIBERNATE_OFF if arg1 is zero. arg2
* must be zero.
*/
if ((arg && arg != PSCI_1_3_OFF_TYPE_HIBERNATE_OFF) ||
smccc_get_arg2(vcpu) != 0) {
val = PSCI_RET_INVALID_PARAMS;
break;
}
kvm_psci_system_off2(vcpu);
/*
* We shouldn't be going back to the guest after receiving a
* SYSTEM_OFF2 request. Preload a return value of
* INTERNAL_FAILURE should userspace ignore the exit and resume
* the vCPU.
*/
val = PSCI_RET_INTERNAL_FAILURE;
ret = 0;
break;
default:
return kvm_psci_0_2_call(vcpu);
}
@ -449,6 +487,10 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
}
switch (version) {
case KVM_ARM_PSCI_1_3:
return kvm_psci_1_x_call(vcpu, 3);
case KVM_ARM_PSCI_1_2:
return kvm_psci_1_x_call(vcpu, 2);
case KVM_ARM_PSCI_1_1:
return kvm_psci_1_x_call(vcpu, 1);
case KVM_ARM_PSCI_1_0:

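On the VMM side, the SYSTEM_OFF2 path above surfaces as a KVM_EXIT_SYSTEM_EVENT shutdown carrying the new flag defined earlier in this hunk. A hedged sketch of the run-loop handling (save_vm_state_then_poweroff() and poweroff_vm() are hypothetical placeholders):

    case KVM_EXIT_SYSTEM_EVENT:
        if (run->system_event.type == KVM_SYSTEM_EVENT_SHUTDOWN &&
            (run->system_event.flags &
             KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2))
            /* the guest requested hibernation, not a plain power-off */
            save_vm_state_then_poweroff();
        else
            poweroff_vm();
        break;
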
@ -167,11 +167,6 @@ static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
memset(vcpu->arch.sve_state, 0, vcpu_sve_state_size(vcpu));
}
static void kvm_vcpu_enable_ptrauth(struct kvm_vcpu *vcpu)
{
vcpu_set_flag(vcpu, GUEST_HAS_PTRAUTH);
}
/**
* kvm_reset_vcpu - sets core registers and sys_regs to reset value
* @vcpu: The VCPU pointer

@ -110,6 +110,14 @@ static bool get_el2_to_el1_mapping(unsigned int reg,
PURE_EL2_SYSREG( RVBAR_EL2 );
PURE_EL2_SYSREG( TPIDR_EL2 );
PURE_EL2_SYSREG( HPFAR_EL2 );
PURE_EL2_SYSREG( HCRX_EL2 );
PURE_EL2_SYSREG( HFGRTR_EL2 );
PURE_EL2_SYSREG( HFGWTR_EL2 );
PURE_EL2_SYSREG( HFGITR_EL2 );
PURE_EL2_SYSREG( HDFGRTR_EL2 );
PURE_EL2_SYSREG( HDFGWTR_EL2 );
PURE_EL2_SYSREG( HAFGRTR_EL2 );
PURE_EL2_SYSREG( CNTVOFF_EL2 );
PURE_EL2_SYSREG( CNTHCTL_EL2 );
MAPPED_EL2_SYSREG(SCTLR_EL2, SCTLR_EL1,
translate_sctlr_el2_to_sctlr_el1 );
@ -126,10 +134,15 @@ static bool get_el2_to_el1_mapping(unsigned int reg,
MAPPED_EL2_SYSREG(ESR_EL2, ESR_EL1, NULL );
MAPPED_EL2_SYSREG(FAR_EL2, FAR_EL1, NULL );
MAPPED_EL2_SYSREG(MAIR_EL2, MAIR_EL1, NULL );
MAPPED_EL2_SYSREG(TCR2_EL2, TCR2_EL1, NULL );
MAPPED_EL2_SYSREG(PIR_EL2, PIR_EL1, NULL );
MAPPED_EL2_SYSREG(PIRE0_EL2, PIRE0_EL1, NULL );
MAPPED_EL2_SYSREG(POR_EL2, POR_EL1, NULL );
MAPPED_EL2_SYSREG(AMAIR_EL2, AMAIR_EL1, NULL );
MAPPED_EL2_SYSREG(ELR_EL2, ELR_EL1, NULL );
MAPPED_EL2_SYSREG(SPSR_EL2, SPSR_EL1, NULL );
MAPPED_EL2_SYSREG(ZCR_EL2, ZCR_EL1, NULL );
MAPPED_EL2_SYSREG(CONTEXTIDR_EL2, CONTEXTIDR_EL1, NULL );
default:
return false;
}
@ -148,6 +161,21 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
if (!is_hyp_ctxt(vcpu))
goto memory_read;
/*
* CNTHCTL_EL2 requires some special treatment to
* account for the bits that can be set via CNTKCTL_EL1.
*/
switch (reg) {
case CNTHCTL_EL2:
if (vcpu_el2_e2h_is_set(vcpu)) {
val = read_sysreg_el1(SYS_CNTKCTL);
val &= CNTKCTL_VALID_BITS;
val |= __vcpu_sys_reg(vcpu, reg) & ~CNTKCTL_VALID_BITS;
return val;
}
break;
}
/*
* If this register does not have an EL1 counterpart,
* then read the stored EL2 version.
@ -165,6 +193,9 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
/* Get the current version of the EL1 counterpart. */
WARN_ON(!__vcpu_read_sys_reg_from_cpu(el1r, &val));
if (reg >= __SANITISED_REG_START__)
val = kvm_vcpu_apply_reg_masks(vcpu, reg, val);
return val;
}
@ -198,6 +229,19 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
*/
__vcpu_sys_reg(vcpu, reg) = val;
switch (reg) {
case CNTHCTL_EL2:
/*
* If E2H=0, CNTHCTL_EL2 is a pure shadow register.
* Otherwise, some of the bits are backed by
* CNTKCTL_EL1, while the rest is kept in memory.
* Yes, this is fun stuff.
*/
if (vcpu_el2_e2h_is_set(vcpu))
write_sysreg_el1(val, SYS_CNTKCTL);
return;
}
/* No EL1 counterpart? We're done here. */
if (reg == el1r)
return;
@ -390,10 +434,6 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
bool was_enabled = vcpu_has_cache_enabled(vcpu);
u64 val, mask, shift;
if (reg_to_encoding(r) == SYS_TCR2_EL1 &&
!kvm_has_feat(vcpu->kvm, ID_AA64MMFR3_EL1, TCRX, IMP))
return undef_access(vcpu, p, r);
BUG_ON(!p->is_write);
get_access_mask(r, &mask, &shift);
@ -1128,7 +1168,7 @@ static int set_pmreg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r, u64 va
{
bool set;
val &= kvm_pmu_valid_counter_mask(vcpu);
val &= kvm_pmu_accessible_counter_mask(vcpu);
switch (r->reg) {
case PMOVSSET_EL0:
@ -1151,7 +1191,7 @@ static int set_pmreg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r, u64 va
static int get_pmreg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r, u64 *val)
{
u64 mask = kvm_pmu_valid_counter_mask(vcpu);
u64 mask = kvm_pmu_accessible_counter_mask(vcpu);
*val = __vcpu_sys_reg(vcpu, r->reg) & mask;
return 0;
@ -1165,7 +1205,7 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (pmu_access_el0_disabled(vcpu))
return false;
mask = kvm_pmu_valid_counter_mask(vcpu);
mask = kvm_pmu_accessible_counter_mask(vcpu);
if (p->is_write) {
val = p->regval & mask;
if (r->Op2 & 0x1) {
@ -1188,7 +1228,7 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
static bool access_pminten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
u64 mask = kvm_pmu_valid_counter_mask(vcpu);
u64 mask = kvm_pmu_accessible_counter_mask(vcpu);
if (check_pmu_access_disabled(vcpu, 0))
return false;
@ -1212,7 +1252,7 @@ static bool access_pminten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
static bool access_pmovs(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
u64 mask = kvm_pmu_valid_counter_mask(vcpu);
u64 mask = kvm_pmu_accessible_counter_mask(vcpu);
if (pmu_access_el0_disabled(vcpu))
return false;
@ -1242,7 +1282,7 @@ static bool access_pmswinc(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (pmu_write_swinc_el0_disabled(vcpu))
return false;
mask = kvm_pmu_valid_counter_mask(vcpu);
mask = kvm_pmu_accessible_counter_mask(vcpu);
kvm_pmu_software_increment(vcpu, p->regval & mask);
return true;
}
@ -1509,6 +1549,9 @@ static u8 pmuver_to_perfmon(u8 pmuver)
}
}
static u64 sanitise_id_aa64pfr0_el1(const struct kvm_vcpu *vcpu, u64 val);
static u64 sanitise_id_aa64dfr0_el1(const struct kvm_vcpu *vcpu, u64 val);
/* Read a sanitised cpufeature ID register by sys_reg_desc */
static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *r)
@ -1522,6 +1565,12 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu,
val = read_sanitised_ftr_reg(id);
switch (id) {
case SYS_ID_AA64DFR0_EL1:
val = sanitise_id_aa64dfr0_el1(vcpu, val);
break;
case SYS_ID_AA64PFR0_EL1:
val = sanitise_id_aa64pfr0_el1(vcpu, val);
break;
case SYS_ID_AA64PFR1_EL1:
if (!kvm_has_mte(vcpu->kvm))
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE);
@ -1535,6 +1584,7 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu,
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTEX);
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_DF2);
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_PFAR);
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MPAM_frac);
break;
case SYS_ID_AA64PFR2_EL1:
/* We only expose FPMR */
@ -1692,11 +1742,8 @@ static unsigned int fp8_visibility(const struct kvm_vcpu *vcpu,
return REG_HIDDEN;
}
static u64 read_sanitised_id_aa64pfr0_el1(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd)
static u64 sanitise_id_aa64pfr0_el1(const struct kvm_vcpu *vcpu, u64 val)
{
u64 val = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
if (!vcpu_has_sve(vcpu))
val &= ~ID_AA64PFR0_EL1_SVE_MASK;
@ -1724,6 +1771,13 @@ static u64 read_sanitised_id_aa64pfr0_el1(struct kvm_vcpu *vcpu,
val &= ~ID_AA64PFR0_EL1_AMU_MASK;
/*
* MPAM is disabled by default as KVM also needs a set of PARTID to
* program the MPAMVPMx_EL2 PARTID remapping registers with. But some
* older kernels let the guest see the ID bit.
*/
val &= ~ID_AA64PFR0_EL1_MPAM_MASK;
return val;
}
@ -1737,11 +1791,8 @@ static u64 read_sanitised_id_aa64pfr0_el1(struct kvm_vcpu *vcpu,
(val); \
})
static u64 read_sanitised_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd)
static u64 sanitise_id_aa64dfr0_el1(const struct kvm_vcpu *vcpu, u64 val)
{
u64 val = read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1);
val = ID_REG_LIMIT_FIELD_ENUM(val, ID_AA64DFR0_EL1, DebugVer, V8P8);
/*
@ -1834,6 +1885,70 @@ static int set_id_dfr0_el1(struct kvm_vcpu *vcpu,
return set_id_reg(vcpu, rd, val);
}
static int set_id_aa64pfr0_el1(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd, u64 user_val)
{
u64 hw_val = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
u64 mpam_mask = ID_AA64PFR0_EL1_MPAM_MASK;
/*
* Commit 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits
* in ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to
* guests, but didn't add trap handling. KVM doesn't support MPAM and
* always returns an UNDEF for these registers. The guest must see 0
* for this field.
*
* But KVM must also accept values from user-space that were provided
* by KVM. On CPUs that support MPAM, permit user-space to write
* the sanitised value to ID_AA64PFR0_EL1.MPAM, but ignore this field.
*/
if ((hw_val & mpam_mask) == (user_val & mpam_mask))
user_val &= ~ID_AA64PFR0_EL1_MPAM_MASK;
return set_id_reg(vcpu, rd, user_val);
}
static int set_id_aa64pfr1_el1(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd, u64 user_val)
{
u64 hw_val = read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1);
u64 mpam_mask = ID_AA64PFR1_EL1_MPAM_frac_MASK;
/* See set_id_aa64pfr0_el1 for comment about MPAM */
if ((hw_val & mpam_mask) == (user_val & mpam_mask))
user_val &= ~ID_AA64PFR1_EL1_MPAM_frac_MASK;
return set_id_reg(vcpu, rd, user_val);
}
static int set_ctr_el0(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd, u64 user_val)
{
u8 user_L1Ip = SYS_FIELD_GET(CTR_EL0, L1Ip, user_val);
/*
* Both AIVIVT (0b01) and VPIPT (0b00) are documented as reserved.
* Hence only allow to set VIPT(0b10) or PIPT(0b11) for L1Ip based
* on what hardware reports.
*
* Using a VIPT software model on PIPT leads to over-invalidation but is
* still correct. Hence, downgrading PIPT to VIPT is allowed, but not the
* other way around. This is handled via arm64_ftr_safe_value(), as the
* CTR_EL0 ftr_bits describe the L1Ip field as FTR_EXACT with a safe
* value of VIPT.
*/
switch (user_L1Ip) {
case CTR_EL0_L1Ip_RESERVED_VPIPT:
case CTR_EL0_L1Ip_RESERVED_AIVIVT:
return -EINVAL;
case CTR_EL0_L1Ip_VIPT:
case CTR_EL0_L1Ip_PIPT:
return set_id_reg(vcpu, rd, user_val);
default:
return -ENOENT;
}
}
/*
* cpufeature ID register user accessors
*
@ -2104,6 +2219,15 @@ static bool bad_redir_trap(struct kvm_vcpu *vcpu,
.val = v, \
}
#define EL2_REG_FILTERED(name, acc, rst, v, filter) { \
SYS_DESC(SYS_##name), \
.access = acc, \
.reset = rst, \
.reg = name, \
.visibility = filter, \
.val = v, \
}
#define EL2_REG_VNCR(name, rst, v) EL2_REG(name, bad_vncr_trap, rst, v)
#define EL2_REG_REDIR(name, rst, v) EL2_REG(name, bad_redir_trap, rst, v)
@ -2150,6 +2274,15 @@ static bool bad_redir_trap(struct kvm_vcpu *vcpu,
.val = mask, \
}
/* sys_reg_desc initialiser for cpufeature ID registers that need filtering */
#define ID_FILTERED(sysreg, name, mask) { \
ID_DESC(sysreg), \
.set_user = set_##name, \
.visibility = id_visibility, \
.reset = kvm_read_sanitised_id_reg, \
.val = (mask), \
}
/*
* sys_reg_desc initialiser for architecturally unallocated cpufeature ID
* register with encoding Op0=3, Op1=0, CRn=0, CRm=crm, Op2=op2
@ -2236,16 +2369,18 @@ static u64 reset_hcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
return __vcpu_sys_reg(vcpu, r->reg) = val;
}
static unsigned int __el2_visibility(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd,
unsigned int (*fn)(const struct kvm_vcpu *,
const struct sys_reg_desc *))
{
return el2_visibility(vcpu, rd) ?: fn(vcpu, rd);
}
static unsigned int sve_el2_visibility(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd)
{
unsigned int r;
r = el2_visibility(vcpu, rd);
if (r)
return r;
return sve_visibility(vcpu, rd);
return __el2_visibility(vcpu, rd, sve_visibility);
}
static bool access_zcr_el2(struct kvm_vcpu *vcpu,
@ -2273,12 +2408,48 @@ static bool access_zcr_el2(struct kvm_vcpu *vcpu,
static unsigned int s1poe_visibility(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd)
{
if (kvm_has_feat(vcpu->kvm, ID_AA64MMFR3_EL1, S1POE, IMP))
if (kvm_has_s1poe(vcpu->kvm))
return 0;
return REG_HIDDEN;
}
static unsigned int s1poe_el2_visibility(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd)
{
return __el2_visibility(vcpu, rd, s1poe_visibility);
}
static unsigned int tcr2_visibility(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd)
{
if (kvm_has_tcr2(vcpu->kvm))
return 0;
return REG_HIDDEN;
}
static unsigned int tcr2_el2_visibility(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd)
{
return __el2_visibility(vcpu, rd, tcr2_visibility);
}
static unsigned int s1pie_visibility(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd)
{
if (kvm_has_s1pie(vcpu->kvm))
return 0;
return REG_HIDDEN;
}
static unsigned int s1pie_el2_visibility(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd)
{
return __el2_visibility(vcpu, rd, s1pie_visibility);
}
/*
* Architected system registers.
* Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@ -2374,18 +2545,15 @@ static const struct sys_reg_desc sys_reg_descs[] = {
/* AArch64 ID registers */
/* CRm=4 */
{ SYS_DESC(SYS_ID_AA64PFR0_EL1),
.access = access_id_reg,
.get_user = get_id_reg,
.set_user = set_id_reg,
.reset = read_sanitised_id_aa64pfr0_el1,
.val = ~(ID_AA64PFR0_EL1_AMU |
ID_AA64PFR0_EL1_MPAM |
ID_AA64PFR0_EL1_SVE |
ID_AA64PFR0_EL1_RAS |
ID_AA64PFR0_EL1_AdvSIMD |
ID_AA64PFR0_EL1_FP), },
ID_WRITABLE(ID_AA64PFR1_EL1, ~(ID_AA64PFR1_EL1_PFAR |
ID_FILTERED(ID_AA64PFR0_EL1, id_aa64pfr0_el1,
~(ID_AA64PFR0_EL1_AMU |
ID_AA64PFR0_EL1_MPAM |
ID_AA64PFR0_EL1_SVE |
ID_AA64PFR0_EL1_RAS |
ID_AA64PFR0_EL1_AdvSIMD |
ID_AA64PFR0_EL1_FP)),
ID_FILTERED(ID_AA64PFR1_EL1, id_aa64pfr1_el1,
~(ID_AA64PFR1_EL1_PFAR |
ID_AA64PFR1_EL1_DF2 |
ID_AA64PFR1_EL1_MTEX |
ID_AA64PFR1_EL1_THE |
@ -2406,11 +2574,6 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_WRITABLE(ID_AA64FPFR0_EL1, ~ID_AA64FPFR0_EL1_RES0),
/* CRm=5 */
{ SYS_DESC(SYS_ID_AA64DFR0_EL1),
.access = access_id_reg,
.get_user = get_id_reg,
.set_user = set_id_aa64dfr0_el1,
.reset = read_sanitised_id_aa64dfr0_el1,
/*
* Prior to FEAT_Debugv8.9, the architecture defines context-aware
* breakpoints (CTX_CMPs) as the highest numbered breakpoints (BRPs).
@ -2423,10 +2586,11 @@ static const struct sys_reg_desc sys_reg_descs[] = {
* See DDI0487K.a, section D2.8.3 Breakpoint types and linking
* of breakpoints for more details.
*/
.val = ID_AA64DFR0_EL1_DoubleLock_MASK |
ID_AA64DFR0_EL1_WRPs_MASK |
ID_AA64DFR0_EL1_PMUVer_MASK |
ID_AA64DFR0_EL1_DebugVer_MASK, },
ID_FILTERED(ID_AA64DFR0_EL1, id_aa64dfr0_el1,
ID_AA64DFR0_EL1_DoubleLock_MASK |
ID_AA64DFR0_EL1_WRPs_MASK |
ID_AA64DFR0_EL1_PMUVer_MASK |
ID_AA64DFR0_EL1_DebugVer_MASK),
ID_SANITISED(ID_AA64DFR1_EL1),
ID_UNALLOCATED(5,2),
ID_UNALLOCATED(5,3),
@ -2489,7 +2653,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 },
{ SYS_DESC(SYS_TTBR1_EL1), access_vm_reg, reset_unknown, TTBR1_EL1 },
{ SYS_DESC(SYS_TCR_EL1), access_vm_reg, reset_val, TCR_EL1, 0 },
{ SYS_DESC(SYS_TCR2_EL1), access_vm_reg, reset_val, TCR2_EL1, 0 },
{ SYS_DESC(SYS_TCR2_EL1), access_vm_reg, reset_val, TCR2_EL1, 0,
.visibility = tcr2_visibility },
PTRAUTH_KEY(APIA),
PTRAUTH_KEY(APIB),
@ -2543,8 +2708,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_PMMIR_EL1), trap_raz_wi },
{ SYS_DESC(SYS_MAIR_EL1), access_vm_reg, reset_unknown, MAIR_EL1 },
{ SYS_DESC(SYS_PIRE0_EL1), NULL, reset_unknown, PIRE0_EL1 },
{ SYS_DESC(SYS_PIR_EL1), NULL, reset_unknown, PIR_EL1 },
{ SYS_DESC(SYS_PIRE0_EL1), NULL, reset_unknown, PIRE0_EL1,
.visibility = s1pie_visibility },
{ SYS_DESC(SYS_PIR_EL1), NULL, reset_unknown, PIR_EL1,
.visibility = s1pie_visibility },
{ SYS_DESC(SYS_POR_EL1), NULL, reset_unknown, POR_EL1,
.visibility = s1poe_visibility },
{ SYS_DESC(SYS_AMAIR_EL1), access_vm_reg, reset_amair_el1, AMAIR_EL1 },
@ -2553,8 +2720,11 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_LOREA_EL1), trap_loregion },
{ SYS_DESC(SYS_LORN_EL1), trap_loregion },
{ SYS_DESC(SYS_LORC_EL1), trap_loregion },
{ SYS_DESC(SYS_MPAMIDR_EL1), undef_access },
{ SYS_DESC(SYS_LORID_EL1), trap_loregion },
{ SYS_DESC(SYS_MPAM1_EL1), undef_access },
{ SYS_DESC(SYS_MPAM0_EL1), undef_access },
{ SYS_DESC(SYS_VBAR_EL1), access_rw, reset_val, VBAR_EL1, 0 },
{ SYS_DESC(SYS_DISR_EL1), NULL, reset_val, DISR_EL1, 0 },
@ -2599,10 +2769,12 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_CCSIDR2_EL1), undef_access },
{ SYS_DESC(SYS_SMIDR_EL1), undef_access },
{ SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 },
ID_WRITABLE(CTR_EL0, CTR_EL0_DIC_MASK |
CTR_EL0_IDC_MASK |
CTR_EL0_DminLine_MASK |
CTR_EL0_IminLine_MASK),
ID_FILTERED(CTR_EL0, ctr_el0,
CTR_EL0_DIC_MASK |
CTR_EL0_IDC_MASK |
CTR_EL0_DminLine_MASK |
CTR_EL0_L1Ip_MASK |
CTR_EL0_IminLine_MASK),
{ SYS_DESC(SYS_SVCR), undef_access, reset_val, SVCR, 0, .visibility = sme_visibility },
{ SYS_DESC(SYS_FPMR), undef_access, reset_val, FPMR, 0, .visibility = fp8_visibility },
@ -2818,14 +2990,16 @@ static const struct sys_reg_desc sys_reg_descs[] = {
EL2_REG_VNCR(HFGITR_EL2, reset_val, 0),
EL2_REG_VNCR(HACR_EL2, reset_val, 0),
{ SYS_DESC(SYS_ZCR_EL2), .access = access_zcr_el2, .reset = reset_val,
.visibility = sve_el2_visibility, .reg = ZCR_EL2 },
EL2_REG_FILTERED(ZCR_EL2, access_zcr_el2, reset_val, 0,
sve_el2_visibility),
EL2_REG_VNCR(HCRX_EL2, reset_val, 0),
EL2_REG(TTBR0_EL2, access_rw, reset_val, 0),
EL2_REG(TTBR1_EL2, access_rw, reset_val, 0),
EL2_REG(TCR_EL2, access_rw, reset_val, TCR_EL2_RES1),
EL2_REG_FILTERED(TCR2_EL2, access_rw, reset_val, TCR2_EL2_RES1,
tcr2_el2_visibility),
EL2_REG_VNCR(VTTBR_EL2, reset_val, 0),
EL2_REG_VNCR(VTCR_EL2, reset_val, 0),
@ -2853,7 +3027,24 @@ static const struct sys_reg_desc sys_reg_descs[] = {
EL2_REG(HPFAR_EL2, access_rw, reset_val, 0),
EL2_REG(MAIR_EL2, access_rw, reset_val, 0),
EL2_REG_FILTERED(PIRE0_EL2, access_rw, reset_val, 0,
s1pie_el2_visibility),
EL2_REG_FILTERED(PIR_EL2, access_rw, reset_val, 0,
s1pie_el2_visibility),
EL2_REG_FILTERED(POR_EL2, access_rw, reset_val, 0,
s1poe_el2_visibility),
EL2_REG(AMAIR_EL2, access_rw, reset_val, 0),
{ SYS_DESC(SYS_MPAMHCR_EL2), undef_access },
{ SYS_DESC(SYS_MPAMVPMV_EL2), undef_access },
{ SYS_DESC(SYS_MPAM2_EL2), undef_access },
{ SYS_DESC(SYS_MPAMVPM0_EL2), undef_access },
{ SYS_DESC(SYS_MPAMVPM1_EL2), undef_access },
{ SYS_DESC(SYS_MPAMVPM2_EL2), undef_access },
{ SYS_DESC(SYS_MPAMVPM3_EL2), undef_access },
{ SYS_DESC(SYS_MPAMVPM4_EL2), undef_access },
{ SYS_DESC(SYS_MPAMVPM5_EL2), undef_access },
{ SYS_DESC(SYS_MPAMVPM6_EL2), undef_access },
{ SYS_DESC(SYS_MPAMVPM7_EL2), undef_access },
EL2_REG(VBAR_EL2, access_rw, reset_val, 0),
EL2_REG(RVBAR_EL2, access_rw, reset_val, 0),
@ -4719,7 +4910,7 @@ void kvm_calculate_traps(struct kvm_vcpu *vcpu)
if (kvm_has_feat(kvm, ID_AA64ISAR2_EL1, MOPS, IMP))
vcpu->arch.hcrx_el2 |= (HCRX_EL2_MSCEn | HCRX_EL2_MCE2);
if (kvm_has_feat(kvm, ID_AA64MMFR3_EL1, TCRX, IMP))
if (kvm_has_tcr2(kvm))
vcpu->arch.hcrx_el2 |= HCRX_EL2_TCR2En;
if (kvm_has_fpmr(kvm))
@ -4769,11 +4960,11 @@ void kvm_calculate_traps(struct kvm_vcpu *vcpu)
kvm->arch.fgu[HFGITR_GROUP] |= (HFGITR_EL2_ATS1E1RP |
HFGITR_EL2_ATS1E1WP);
if (!kvm_has_feat(kvm, ID_AA64MMFR3_EL1, S1PIE, IMP))
if (!kvm_has_s1pie(kvm))
kvm->arch.fgu[HFGxTR_GROUP] |= (HFGxTR_EL2_nPIRE0_EL1 |
HFGxTR_EL2_nPIR_EL1);
if (!kvm_has_feat(kvm, ID_AA64MMFR3_EL1, S1POE, IMP))
if (!kvm_has_s1poe(kvm))
kvm->arch.fgu[HFGxTR_GROUP] |= (HFGxTR_EL2_nPOR_EL1 |
HFGxTR_EL2_nPOR_EL0);

@ -782,6 +782,9 @@ static int vgic_its_cmd_handle_discard(struct kvm *kvm, struct vgic_its *its,
ite = find_ite(its, device_id, event_id);
if (ite && its_is_collection_mapped(ite->collection)) {
struct its_device *device = find_its_device(its, device_id);
int ite_esz = vgic_its_get_abi(its)->ite_esz;
gpa_t gpa = device->itt_addr + ite->event_id * ite_esz;
/*
* Though the spec talks about removing the pending state, we
* don't bother here since we clear the ITTE anyway and the
@ -790,7 +793,8 @@ static int vgic_its_cmd_handle_discard(struct kvm *kvm, struct vgic_its *its,
vgic_its_invalidate_cache(its);
its_free_ite(kvm, ite);
return 0;
return vgic_its_write_entry_lock(its, gpa, 0, ite_esz);
}
return E_ITS_DISCARD_UNMAPPED_INTERRUPT;
@ -1139,9 +1143,11 @@ static int vgic_its_cmd_handle_mapd(struct kvm *kvm, struct vgic_its *its,
bool valid = its_cmd_get_validbit(its_cmd);
u8 num_eventid_bits = its_cmd_get_size(its_cmd);
gpa_t itt_addr = its_cmd_get_ittaddr(its_cmd);
int dte_esz = vgic_its_get_abi(its)->dte_esz;
struct its_device *device;
gpa_t gpa;
if (!vgic_its_check_id(its, its->baser_device_table, device_id, NULL))
if (!vgic_its_check_id(its, its->baser_device_table, device_id, &gpa))
return E_ITS_MAPD_DEVICE_OOR;
if (valid && num_eventid_bits > VITS_TYPER_IDBITS)
@ -1162,7 +1168,7 @@ static int vgic_its_cmd_handle_mapd(struct kvm *kvm, struct vgic_its *its,
* is an error, so we are done in any case.
*/
if (!valid)
return 0;
return vgic_its_write_entry_lock(its, gpa, 0, dte_esz);
device = vgic_its_alloc_device(its, device_id, itt_addr,
num_eventid_bits);
@ -2086,7 +2092,6 @@ static int scan_its_table(struct vgic_its *its, gpa_t base, int size, u32 esz,
static int vgic_its_save_ite(struct vgic_its *its, struct its_device *dev,
struct its_ite *ite, gpa_t gpa, int ite_esz)
{
struct kvm *kvm = its->dev->kvm;
u32 next_offset;
u64 val;
@ -2095,7 +2100,8 @@ static int vgic_its_save_ite(struct vgic_its *its, struct its_device *dev,
((u64)ite->irq->intid << KVM_ITS_ITE_PINTID_SHIFT) |
ite->collection->collection_id;
val = cpu_to_le64(val);
return vgic_write_guest_lock(kvm, gpa, &val, ite_esz);
return vgic_its_write_entry_lock(its, gpa, val, ite_esz);
}
/**
@ -2239,7 +2245,6 @@ static int vgic_its_restore_itt(struct vgic_its *its, struct its_device *dev)
static int vgic_its_save_dte(struct vgic_its *its, struct its_device *dev,
gpa_t ptr, int dte_esz)
{
struct kvm *kvm = its->dev->kvm;
u64 val, itt_addr_field;
u32 next_offset;
@ -2250,7 +2255,8 @@ static int vgic_its_save_dte(struct vgic_its *its, struct its_device *dev,
(itt_addr_field << KVM_ITS_DTE_ITTADDR_SHIFT) |
(dev->num_eventid_bits - 1));
val = cpu_to_le64(val);
return vgic_write_guest_lock(kvm, ptr, &val, dte_esz);
return vgic_its_write_entry_lock(its, ptr, val, dte_esz);
}
/**
@ -2437,7 +2443,8 @@ static int vgic_its_save_cte(struct vgic_its *its,
((u64)collection->target_addr << KVM_ITS_CTE_RDBASE_SHIFT) |
collection->collection_id);
val = cpu_to_le64(val);
return vgic_write_guest_lock(its->dev->kvm, gpa, &val, esz);
return vgic_its_write_entry_lock(its, gpa, val, esz);
}
/*
@ -2453,8 +2460,7 @@ static int vgic_its_restore_cte(struct vgic_its *its, gpa_t gpa, int esz)
u64 val;
int ret;
BUG_ON(esz > sizeof(val));
ret = kvm_read_guest_lock(kvm, gpa, &val, esz);
ret = vgic_its_read_entry_lock(its, gpa, &val, esz);
if (ret)
return ret;
val = le64_to_cpu(val);
@ -2492,7 +2498,6 @@ static int vgic_its_save_collection_table(struct vgic_its *its)
u64 baser = its->baser_coll_table;
gpa_t gpa = GITS_BASER_ADDR_48_to_52(baser);
struct its_collection *collection;
u64 val;
size_t max_size, filled = 0;
int ret, cte_esz = abi->cte_esz;
@ -2516,10 +2521,7 @@ static int vgic_its_save_collection_table(struct vgic_its *its)
* table is not fully filled, add a last dummy element
* with valid bit unset
*/
val = 0;
BUG_ON(cte_esz > sizeof(val));
ret = vgic_write_guest_lock(its->dev->kvm, gpa, &val, cte_esz);
return ret;
return vgic_its_write_entry_lock(its, gpa, 0, cte_esz);
}
/*

@ -146,6 +146,29 @@ static inline int vgic_write_guest_lock(struct kvm *kvm, gpa_t gpa,
return ret;
}
static inline int vgic_its_read_entry_lock(struct vgic_its *its, gpa_t eaddr,
u64 *eval, unsigned long esize)
{
struct kvm *kvm = its->dev->kvm;
if (KVM_BUG_ON(esize != sizeof(*eval), kvm))
return -EINVAL;
return kvm_read_guest_lock(kvm, eaddr, eval, esize);
}
static inline int vgic_its_write_entry_lock(struct vgic_its *its, gpa_t eaddr,
u64 eval, unsigned long esize)
{
struct kvm *kvm = its->dev->kvm;
if (KVM_BUG_ON(esize != sizeof(eval), kvm))
return -EINVAL;
return vgic_write_guest_lock(kvm, eaddr, &eval, esize);
}
/*
* This struct provides an intermediate representation of the fields contained
* in the GICH_VMCR and ICH_VMCR registers, such that code exporting the GIC

@ -62,6 +62,8 @@ HW_DBM
KVM_HVHE
KVM_PROTECTED_MODE
MISMATCHED_CACHE_TYPE
MPAM
MPAM_HCR
MTE
MTE_ASYMM
SME

@ -1200,7 +1200,7 @@ UnsignedEnum 55:52 BRBE
0b0001 IMP
0b0010 BRBE_V1P1
EndEnum
Enum 51:48 MTPMU
SignedEnum 51:48 MTPMU
0b0000 NI_IMPDEF
0b0001 IMP
0b1111 NI
@ -1208,6 +1208,7 @@ EndEnum
UnsignedEnum 47:44 TraceBuffer
0b0000 NI
0b0001 IMP
0b0010 TRBE_V1P1
EndEnum
UnsignedEnum 43:40 TraceFilt
0b0000 NI
@ -1224,11 +1225,18 @@ UnsignedEnum 35:32 PMSVer
0b0011 V1P2
0b0100 V1P3
0b0101 V1P4
0b0110 V1P5
EndEnum
Field 31:28 CTX_CMPs
Res0 27:24
UnsignedEnum 27:24 SEBEP
0b0000 NI
0b0001 IMP
EndEnum
Field 23:20 WRPs
Res0 19:16
UnsignedEnum 19:16 PMSS
0b0000 NI
0b0001 IMP
EndEnum
Field 15:12 BRPs
UnsignedEnum 11:8 PMUVer
0b0000 NI
@ -1288,6 +1296,32 @@ Field 15:8 BRPs
Field 7:0 SYSPMUID
EndSysreg
Sysreg ID_AA64DFR2_EL1 3 0 0 5 2
Res0 63:28
UnsignedEnum 27:24 TRBE_EXC
0b0000 NI
0b0001 IMP
EndEnum
UnsignedEnum 23:20 SPE_nVM
0b0000 NI
0b0001 IMP
EndEnum
UnsignedEnum 19:16 SPE_EXC
0b0000 NI
0b0001 IMP
EndEnum
Res0 15:8
UnsignedEnum 7:4 BWE
0b0000 NI
0b0001 FEAT_BWE
0b0010 FEAT_BWE2
EndEnum
UnsignedEnum 3:0 STEP
0b0000 NI
0b0001 IMP
EndEnum
EndSysreg
Sysreg ID_AA64AFR0_EL1 3 0 0 5 4
Res0 63:32
Field 31:28 IMPDEF7
@ -2400,6 +2434,41 @@ Field 1 AFSR1_EL1
Field 0 AFSR0_EL1
EndSysregFields
Sysreg MDCR_EL2 3 4 1 1 1
Res0 63:51
Field 50 EnSTEPOP
Res0 49:44
Field 43 EBWE
Res0 42
Field 41:40 PMEE
Res0 39:37
Field 36 HPMFZS
Res0 35:32
Field 31:30 PMSSE
Field 29 HPMFZO
Field 28 MTPME
Field 27 TDCC
Field 26 HLP
Field 25:24 E2TB
Field 23 HCCD
Res0 22:20
Field 19 TTRF
Res0 18
Field 17 HPMD
Res0 16
Field 15 EnSPM
Field 14 TPMS
Field 13:12 E2PB
Field 11 TDRA
Field 10 TDOSA
Field 9 TDA
Field 8 TDE
Field 7 HPME
Field 6 TPM
Field 5 TPMCR
Field 4:0 HPMN
EndSysreg
Sysreg HFGRTR_EL2 3 4 1 1 4
Fields HFGxTR_EL2
EndSysreg
@ -2749,6 +2818,126 @@ Field 1 E2SPE
Field 0 E0HSPE
EndSysreg
Sysreg MPAMHCR_EL2 3 4 10 4 0
Res0 63:32
Field 31 TRAP_MPAMIDR_EL1
Res0 30:9
Field 8 GSTAPP_PLK
Res0 7:2
Field 1 EL1_VPMEN
Field 0 EL0_VPMEN
EndSysreg
Sysreg MPAMVPMV_EL2 3 4 10 4 1
Res0 63:32
Field 31 VPM_V31
Field 30 VPM_V30
Field 29 VPM_V29
Field 28 VPM_V28
Field 27 VPM_V27
Field 26 VPM_V26
Field 25 VPM_V25
Field 24 VPM_V24
Field 23 VPM_V23
Field 22 VPM_V22
Field 21 VPM_V21
Field 20 VPM_V20
Field 19 VPM_V19
Field 18 VPM_V18
Field 17 VPM_V17
Field 16 VPM_V16
Field 15 VPM_V15
Field 14 VPM_V14
Field 13 VPM_V13
Field 12 VPM_V12
Field 11 VPM_V11
Field 10 VPM_V10
Field 9 VPM_V9
Field 8 VPM_V8
Field 7 VPM_V7
Field 6 VPM_V6
Field 5 VPM_V5
Field 4 VPM_V4
Field 3 VPM_V3
Field 2 VPM_V2
Field 1 VPM_V1
Field 0 VPM_V0
EndSysreg
Sysreg MPAM2_EL2 3 4 10 5 0
Field 63 MPAMEN
Res0 62:59
Field 58 TIDR
Res0 57
Field 56 ALTSP_HFC
Field 55 ALTSP_EL2
Field 54 ALTSP_FRCD
Res0 53:51
Field 50 EnMPAMSM
Field 49 TRAPMPAM0EL1
Field 48 TRAPMPAM1EL1
Field 47:40 PMG_D
Field 39:32 PMG_I
Field 31:16 PARTID_D
Field 15:0 PARTID_I
EndSysreg
Sysreg MPAMVPM0_EL2 3 4 10 6 0
Field 63:48 PhyPARTID3
Field 47:32 PhyPARTID2
Field 31:16 PhyPARTID1
Field 15:0 PhyPARTID0
EndSysreg
Sysreg MPAMVPM1_EL2 3 4 10 6 1
Field 63:48 PhyPARTID7
Field 47:32 PhyPARTID6
Field 31:16 PhyPARTID5
Field 15:0 PhyPARTID4
EndSysreg
Sysreg MPAMVPM2_EL2 3 4 10 6 2
Field 63:48 PhyPARTID11
Field 47:32 PhyPARTID10
Field 31:16 PhyPARTID9
Field 15:0 PhyPARTID8
EndSysreg
Sysreg MPAMVPM3_EL2 3 4 10 6 3
Field 63:48 PhyPARTID15
Field 47:32 PhyPARTID14
Field 31:16 PhyPARTID13
Field 15:0 PhyPARTID12
EndSysreg
Sysreg MPAMVPM4_EL2 3 4 10 6 4
Field 63:48 PhyPARTID19
Field 47:32 PhyPARTID18
Field 31:16 PhyPARTID17
Field 15:0 PhyPARTID16
EndSysreg
Sysreg MPAMVPM5_EL2 3 4 10 6 5
Field 63:48 PhyPARTID23
Field 47:32 PhyPARTID22
Field 31:16 PhyPARTID21
Field 15:0 PhyPARTID20
EndSysreg
Sysreg MPAMVPM6_EL2 3 4 10 6 6
Field 63:48 PhyPARTID27
Field 47:32 PhyPARTID26
Field 31:16 PhyPARTID25
Field 15:0 PhyPARTID24
EndSysreg
Sysreg MPAMVPM7_EL2 3 4 10 6 7
Field 63:48 PhyPARTID31
Field 47:32 PhyPARTID30
Field 31:16 PhyPARTID29
Field 15:0 PhyPARTID28
EndSysreg
Sysreg CONTEXTIDR_EL2 3 4 13 0 1
Fields CONTEXTIDR_ELx
EndSysreg
@ -2781,6 +2970,10 @@ Sysreg FAR_EL12 3 5 6 0 0
Field 63:0 ADDR
EndSysreg
Sysreg MPAM1_EL12 3 5 10 5 0
Fields MPAM1_ELx
EndSysreg
Sysreg CONTEXTIDR_EL12 3 5 13 0 1
Fields CONTEXTIDR_ELx
EndSysreg
@ -2831,8 +3024,7 @@ Field 13 AMEC1
Field 12 AMEC0
Field 11 HAFT
Field 10 PTTWI
Field 9:8 SKL1
Field 7:6 SKL0
Res0 9:6
Field 5 D128
Field 4 AIE
Field 3 POE
@ -2895,6 +3087,10 @@ Sysreg PIRE0_EL12 3 5 10 2 2
Fields PIRx_ELx
EndSysreg
Sysreg PIRE0_EL2 3 4 10 2 2
Fields PIRx_ELx
EndSysreg
Sysreg PIR_EL1 3 0 10 2 3
Fields PIRx_ELx
EndSysreg
@ -2915,6 +3111,10 @@ Sysreg POR_EL1 3 0 10 2 4
Fields PIRx_ELx
EndSysreg
Sysreg POR_EL2 3 4 10 2 4
Fields PIRx_ELx
EndSysreg
Sysreg POR_EL12 3 5 10 2 4
Fields PIRx_ELx
EndSysreg
@ -2953,6 +3153,22 @@ Res0 1
Field 0 EN
EndSysreg
Sysreg MPAMIDR_EL1 3 0 10 4 4
Res0 63:62
Field 61 HAS_SDEFLT
Field 60 HAS_FORCE_NS
Field 59 SP4
Field 58 HAS_TIDR
Field 57 HAS_ALTSP
Res0 56:40
Field 39:32 PMG_MAX
Res0 31:21
Field 20:18 VPMR_MAX
Field 17 HAS_HCR
Res0 16
Field 15:0 PARTID_MAX
EndSysreg
Sysreg LORID_EL1 3 0 10 4 7
Res0 63:24
Field 23:16 LD
@ -2960,6 +3176,27 @@ Res0 15:8
Field 7:0 LR
EndSysreg
Sysreg MPAM1_EL1 3 0 10 5 0
Field 63 MPAMEN
Res0 62:61
Field 60 FORCED_NS
Res0 59:55
Field 54 ALTSP_FRCD
Res0 53:48
Field 47:40 PMG_D
Field 39:32 PMG_I
Field 31:16 PARTID_D
Field 15:0 PARTID_I
EndSysreg
Sysreg MPAM0_EL1 3 0 10 5 1
Res0 63:48
Field 47:40 PMG_D
Field 39:32 PMG_I
Field 31:16 PARTID_D
Field 15:0 PARTID_I
EndSysreg
Sysreg ISR_EL1 3 0 12 1 0
Res0 63:11
Field 10 IS

@ -65,6 +65,7 @@ extern struct acpi_vector_group pch_group[MAX_IO_PICS];
extern struct acpi_vector_group msi_group[MAX_IO_PICS];
#define CORES_PER_EIO_NODE 4
#define CORES_PER_VEIO_NODE 256
#define LOONGSON_CPU_UART0_VEC 10 /* CPU UART0 */
#define LOONGSON_CPU_THSENS_VEC 14 /* CPU Thsens */

@ -0,0 +1,123 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright (C) 2024 Loongson Technology Corporation Limited
*/
#ifndef __ASM_KVM_EIOINTC_H
#define __ASM_KVM_EIOINTC_H
#include <kvm/iodev.h>
#define EIOINTC_IRQS 256
#define EIOINTC_ROUTE_MAX_VCPUS 256
#define EIOINTC_IRQS_U8_NUMS (EIOINTC_IRQS / 8)
#define EIOINTC_IRQS_U16_NUMS (EIOINTC_IRQS_U8_NUMS / 2)
#define EIOINTC_IRQS_U32_NUMS (EIOINTC_IRQS_U8_NUMS / 4)
#define EIOINTC_IRQS_U64_NUMS (EIOINTC_IRQS_U8_NUMS / 8)
/* map to ipnum per 32 irqs */
#define EIOINTC_IRQS_NODETYPE_COUNT 16
#define EIOINTC_BASE 0x1400
#define EIOINTC_SIZE 0x900
#define EIOINTC_NODETYPE_START 0xa0
#define EIOINTC_NODETYPE_END 0xbf
#define EIOINTC_IPMAP_START 0xc0
#define EIOINTC_IPMAP_END 0xc7
#define EIOINTC_ENABLE_START 0x200
#define EIOINTC_ENABLE_END 0x21f
#define EIOINTC_BOUNCE_START 0x280
#define EIOINTC_BOUNCE_END 0x29f
#define EIOINTC_ISR_START 0x300
#define EIOINTC_ISR_END 0x31f
#define EIOINTC_COREISR_START 0x400
#define EIOINTC_COREISR_END 0x41f
#define EIOINTC_COREMAP_START 0x800
#define EIOINTC_COREMAP_END 0x8ff
#define EIOINTC_VIRT_BASE (0x40000000)
#define EIOINTC_VIRT_SIZE (0x1000)
#define EIOINTC_VIRT_FEATURES (0x0)
#define EIOINTC_HAS_VIRT_EXTENSION (0)
#define EIOINTC_HAS_ENABLE_OPTION (1)
#define EIOINTC_HAS_INT_ENCODE (2)
#define EIOINTC_HAS_CPU_ENCODE (3)
#define EIOINTC_VIRT_HAS_FEATURES ((1U << EIOINTC_HAS_VIRT_EXTENSION) \
| (1U << EIOINTC_HAS_ENABLE_OPTION) \
| (1U << EIOINTC_HAS_INT_ENCODE) \
| (1U << EIOINTC_HAS_CPU_ENCODE))
#define EIOINTC_VIRT_CONFIG (0x4)
#define EIOINTC_ENABLE (1)
#define EIOINTC_ENABLE_INT_ENCODE (2)
#define EIOINTC_ENABLE_CPU_ENCODE (3)
#define LOONGSON_IP_NUM 8
struct loongarch_eiointc {
spinlock_t lock;
struct kvm *kvm;
struct kvm_io_device device;
struct kvm_io_device device_vext;
uint32_t num_cpu;
uint32_t features;
uint32_t status;
/* hardware state */
union nodetype {
u64 reg_u64[EIOINTC_IRQS_NODETYPE_COUNT / 4];
u32 reg_u32[EIOINTC_IRQS_NODETYPE_COUNT / 2];
u16 reg_u16[EIOINTC_IRQS_NODETYPE_COUNT];
u8 reg_u8[EIOINTC_IRQS_NODETYPE_COUNT * 2];
} nodetype;
/* one bit shows the state of one irq */
union bounce {
u64 reg_u64[EIOINTC_IRQS_U64_NUMS];
u32 reg_u32[EIOINTC_IRQS_U32_NUMS];
u16 reg_u16[EIOINTC_IRQS_U16_NUMS];
u8 reg_u8[EIOINTC_IRQS_U8_NUMS];
} bounce;
union isr {
u64 reg_u64[EIOINTC_IRQS_U64_NUMS];
u32 reg_u32[EIOINTC_IRQS_U32_NUMS];
u16 reg_u16[EIOINTC_IRQS_U16_NUMS];
u8 reg_u8[EIOINTC_IRQS_U8_NUMS];
} isr;
union coreisr {
u64 reg_u64[EIOINTC_ROUTE_MAX_VCPUS][EIOINTC_IRQS_U64_NUMS];
u32 reg_u32[EIOINTC_ROUTE_MAX_VCPUS][EIOINTC_IRQS_U32_NUMS];
u16 reg_u16[EIOINTC_ROUTE_MAX_VCPUS][EIOINTC_IRQS_U16_NUMS];
u8 reg_u8[EIOINTC_ROUTE_MAX_VCPUS][EIOINTC_IRQS_U8_NUMS];
} coreisr;
union enable {
u64 reg_u64[EIOINTC_IRQS_U64_NUMS];
u32 reg_u32[EIOINTC_IRQS_U32_NUMS];
u16 reg_u16[EIOINTC_IRQS_U16_NUMS];
u8 reg_u8[EIOINTC_IRQS_U8_NUMS];
} enable;
/* use one byte to config ipmap for 32 irqs at once */
union ipmap {
u64 reg_u64;
u32 reg_u32[EIOINTC_IRQS_U32_NUMS / 4];
u16 reg_u16[EIOINTC_IRQS_U16_NUMS / 4];
u8 reg_u8[EIOINTC_IRQS_U8_NUMS / 4];
} ipmap;
/* use one byte to config coremap for one irq */
union coremap {
u64 reg_u64[EIOINTC_IRQS / 8];
u32 reg_u32[EIOINTC_IRQS / 4];
u16 reg_u16[EIOINTC_IRQS / 2];
u8 reg_u8[EIOINTC_IRQS];
} coremap;
DECLARE_BITMAP(sw_coreisr[EIOINTC_ROUTE_MAX_VCPUS][LOONGSON_IP_NUM], EIOINTC_IRQS);
uint8_t sw_coremap[EIOINTC_IRQS];
};
int kvm_loongarch_register_eiointc_device(void);
void eiointc_set_irq(struct loongarch_eiointc *s, int irq, int level);
#endif /* __ASM_KVM_EIOINTC_H */
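
A quick consistency sketch of the sizes implied by the macros above, assuming only the definitions in this header: 256 interrupt lines give 32 bytes of state per register bank, i.e. 4 u64 / 8 u32 / 16 u16 / 32 u8 views of the same bitmap, which is exactly what the unions encode:

    /* illustrative compile-time checks, not part of the patch */
    _Static_assert(EIOINTC_IRQS_U8_NUMS == 32, "256 irqs -> 32 bytes");
    _Static_assert(EIOINTC_IRQS_U64_NUMS == 4, "256 irqs -> 4 u64 words");
    _Static_assert(sizeof(((struct loongarch_eiointc *)0)->isr) ==
                   EIOINTC_IRQS / 8, "one bit of ISR state per irq");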

@ -18,8 +18,13 @@
#include <asm/inst.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_ipi.h>
#include <asm/kvm_eiointc.h>
#include <asm/kvm_pch_pic.h>
#include <asm/loongarch.h>
#define __KVM_HAVE_ARCH_INTC_INITIALIZED
/* Loongarch KVM register ids */
#define KVM_GET_IOC_CSR_IDX(id) ((id & KVM_CSR_IDX_MASK) >> LOONGARCH_REG_SHIFT)
#define KVM_GET_IOC_CPUCFG_IDX(id) ((id & KVM_CPUCFG_IDX_MASK) >> LOONGARCH_REG_SHIFT)
@ -44,6 +49,12 @@ struct kvm_vm_stat {
struct kvm_vm_stat_generic generic;
u64 pages;
u64 hugepages;
u64 ipi_read_exits;
u64 ipi_write_exits;
u64 eiointc_read_exits;
u64 eiointc_write_exits;
u64 pch_pic_read_exits;
u64 pch_pic_write_exits;
};
struct kvm_vcpu_stat {
@ -84,7 +95,7 @@ struct kvm_world_switch {
*
* For LOONGARCH_CSR_CPUID register, max CPUID size is 512
* For IPI hardware, max destination CPUID size 1024
* For extioi interrupt controller, max destination CPUID size is 256
* For eiointc interrupt controller, max destination CPUID size is 256
* For msgint interrupt controller, max supported CPUID size is 65536
*
* Currently max CPUID is defined as 256 for KVM hypervisor, in future
@ -117,6 +128,9 @@ struct kvm_arch {
s64 time_offset;
struct kvm_context __percpu *vmcs;
struct loongarch_ipi *ipi;
struct loongarch_eiointc *eiointc;
struct loongarch_pch_pic *pch_pic;
};
#define CSR_MAX_NUMS 0x800
@ -221,6 +235,8 @@ struct kvm_vcpu_arch {
int last_sched_cpu;
/* mp state */
struct kvm_mp_state mp_state;
/* ipi state */
struct ipi_state ipi_state;
/* cpucfg */
u32 cpucfg[KVM_MAX_CPUCFG_REGS];

@ -0,0 +1,45 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright (C) 2024 Loongson Technology Corporation Limited
*/
#ifndef __ASM_KVM_IPI_H
#define __ASM_KVM_IPI_H
#include <kvm/iodev.h>
#define LARCH_INT_IPI 12
struct loongarch_ipi {
spinlock_t lock;
struct kvm *kvm;
struct kvm_io_device device;
};
struct ipi_state {
spinlock_t lock;
uint32_t status;
uint32_t en;
uint32_t set;
uint32_t clear;
uint64_t buf[4];
};
#define IOCSR_IPI_BASE 0x1000
#define IOCSR_IPI_SIZE 0x160
#define IOCSR_IPI_STATUS 0x000
#define IOCSR_IPI_EN 0x004
#define IOCSR_IPI_SET 0x008
#define IOCSR_IPI_CLEAR 0x00c
#define IOCSR_IPI_BUF_20 0x020
#define IOCSR_IPI_BUF_28 0x028
#define IOCSR_IPI_BUF_30 0x030
#define IOCSR_IPI_BUF_38 0x038
#define IOCSR_IPI_SEND 0x040
#define IOCSR_MAIL_SEND 0x048
#define IOCSR_ANY_SEND 0x158
int kvm_loongarch_register_ipi_device(void);
#endif

@ -0,0 +1,62 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright (C) 2024 Loongson Technology Corporation Limited
*/
#ifndef __ASM_KVM_PCH_PIC_H
#define __ASM_KVM_PCH_PIC_H
#include <kvm/iodev.h>
#define PCH_PIC_SIZE 0x3e8
#define PCH_PIC_INT_ID_START 0x0
#define PCH_PIC_INT_ID_END 0x7
#define PCH_PIC_MASK_START 0x20
#define PCH_PIC_MASK_END 0x27
#define PCH_PIC_HTMSI_EN_START 0x40
#define PCH_PIC_HTMSI_EN_END 0x47
#define PCH_PIC_EDGE_START 0x60
#define PCH_PIC_EDGE_END 0x67
#define PCH_PIC_CLEAR_START 0x80
#define PCH_PIC_CLEAR_END 0x87
#define PCH_PIC_AUTO_CTRL0_START 0xc0
#define PCH_PIC_AUTO_CTRL0_END 0xc7
#define PCH_PIC_AUTO_CTRL1_START 0xe0
#define PCH_PIC_AUTO_CTRL1_END 0xe7
#define PCH_PIC_ROUTE_ENTRY_START 0x100
#define PCH_PIC_ROUTE_ENTRY_END 0x13f
#define PCH_PIC_HTMSI_VEC_START 0x200
#define PCH_PIC_HTMSI_VEC_END 0x23f
#define PCH_PIC_INT_IRR_START 0x380
#define PCH_PIC_INT_IRR_END 0x38f
#define PCH_PIC_INT_ISR_START 0x3a0
#define PCH_PIC_INT_ISR_END 0x3af
#define PCH_PIC_POLARITY_START 0x3e0
#define PCH_PIC_POLARITY_END 0x3e7
#define PCH_PIC_INT_ID_VAL 0x7000000UL
#define PCH_PIC_INT_ID_VER 0x1UL
struct loongarch_pch_pic {
spinlock_t lock;
struct kvm *kvm;
struct kvm_io_device device;
uint64_t mask; /* 1:disable irq, 0:enable irq */
uint64_t htmsi_en; /* 1:msi */
uint64_t edge; /* 1:edge triggered, 0:level triggered */
uint64_t auto_ctrl0; /* only use default value 00b */
uint64_t auto_ctrl1; /* only use default value 00b */
uint64_t last_intirr; /* edge detection */
uint64_t irr; /* interrupt request register */
uint64_t isr; /* interrupt service register */
uint64_t polarity; /* 0: high level trigger, 1: low level trigger */
uint8_t route_entry[64]; /* default value 0, route to int0: eiointc */
uint8_t htmsi_vector[64]; /* irq route table for routing to eiointc */
uint64_t pch_pic_base;
};
int kvm_loongarch_register_pch_pic_device(void);
void pch_pic_set_irq(struct loongarch_pch_pic *s, int irq, int level);
void pch_msi_set_irq(struct kvm *kvm, int irq, int level);
#endif /* __ASM_KVM_PCH_PIC_H */

@ -8,6 +8,8 @@
#include <linux/types.h>
#define __KVM_HAVE_IRQ_LINE
/*
* KVM LoongArch specific structures and definitions.
*
@ -132,4 +134,22 @@ struct kvm_iocsr_entry {
#define KVM_IRQCHIP_NUM_PINS 64
#define KVM_MAX_CORES 256
#define KVM_DEV_LOONGARCH_IPI_GRP_REGS 0x40000001
#define KVM_DEV_LOONGARCH_EXTIOI_GRP_REGS 0x40000002
#define KVM_DEV_LOONGARCH_EXTIOI_GRP_SW_STATUS 0x40000003
#define KVM_DEV_LOONGARCH_EXTIOI_SW_STATUS_NUM_CPU 0x0
#define KVM_DEV_LOONGARCH_EXTIOI_SW_STATUS_FEATURE 0x1
#define KVM_DEV_LOONGARCH_EXTIOI_SW_STATUS_STATE 0x2
#define KVM_DEV_LOONGARCH_EXTIOI_GRP_CTRL 0x40000004
#define KVM_DEV_LOONGARCH_EXTIOI_CTRL_INIT_NUM_CPU 0x0
#define KVM_DEV_LOONGARCH_EXTIOI_CTRL_INIT_FEATURE 0x1
#define KVM_DEV_LOONGARCH_EXTIOI_CTRL_LOAD_FINISHED 0x3
#define KVM_DEV_LOONGARCH_PCH_PIC_GRP_REGS 0x40000005
#define KVM_DEV_LOONGARCH_PCH_PIC_GRP_CTRL 0x40000006
#define KVM_DEV_LOONGARCH_PCH_PIC_CTRL_INIT 0
#endif /* __UAPI_ASM_LOONGARCH_KVM_H */
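
For illustration, a hedged sketch of how a VMM might use the new attribute groups above; the device type name KVM_DEV_TYPE_LOONGARCH_EIOINTC is assumed to be the one paired with these attributes, and the exact payload expected by the CTRL attributes may differ:

    struct kvm_create_device cd = {
        .type = KVM_DEV_TYPE_LOONGARCH_EIOINTC,   /* assumed device type */
    };
    uint64_t num_cpu = 4;
    struct kvm_device_attr attr = {
        .group = KVM_DEV_LOONGARCH_EXTIOI_GRP_CTRL,
        .attr  = KVM_DEV_LOONGARCH_EXTIOI_CTRL_INIT_NUM_CPU,
        .addr  = (uint64_t)(unsigned long)&num_cpu,
    };

    if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) == 0)
        ioctl(cd.fd, KVM_SET_DEVICE_ATTR, &attr);   /* configure vCPU count */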

@ -21,13 +21,16 @@ config KVM
tristate "Kernel-based Virtual Machine (KVM) support"
depends on AS_HAS_LVZ_EXTENSION
select HAVE_KVM_DIRTY_RING_ACQ_REL
select HAVE_KVM_IRQ_ROUTING
select HAVE_KVM_IRQCHIP
select HAVE_KVM_MSI
select HAVE_KVM_READONLY_MEM
select HAVE_KVM_VCPU_ASYNC_IOCTL
select KVM_COMMON
select KVM_GENERIC_DIRTYLOG_READ_PROTECT
select KVM_GENERIC_HARDWARE_ENABLING
select KVM_GENERIC_MMU_NOTIFIER
select KVM_MMIO
select HAVE_KVM_READONLY_MEM
select KVM_XFER_TO_GUEST_WORK
select SCHED_INFO
help

@ -18,5 +18,9 @@ kvm-y += timer.o
kvm-y += tlb.o
kvm-y += vcpu.o
kvm-y += vm.o
kvm-y += intc/ipi.o
kvm-y += intc/eiointc.o
kvm-y += intc/pch_pic.o
kvm-y += irqfd.o
CFLAGS_exit.o += $(call cc-option,-Wno-override-init,)

@ -157,7 +157,7 @@ static int kvm_handle_csr(struct kvm_vcpu *vcpu, larch_inst inst)
int kvm_emu_iocsr(larch_inst inst, struct kvm_run *run, struct kvm_vcpu *vcpu)
{
int ret;
unsigned long val;
unsigned long *val;
u32 addr, rd, rj, opcode;
/*
@ -170,6 +170,7 @@ int kvm_emu_iocsr(larch_inst inst, struct kvm_run *run, struct kvm_vcpu *vcpu)
ret = EMULATE_DO_IOCSR;
run->iocsr_io.phys_addr = addr;
run->iocsr_io.is_write = 0;
val = &vcpu->arch.gprs[rd];
/* LoongArch is Little endian */
switch (opcode) {
@ -202,16 +203,25 @@ int kvm_emu_iocsr(larch_inst inst, struct kvm_run *run, struct kvm_vcpu *vcpu)
run->iocsr_io.is_write = 1;
break;
default:
ret = EMULATE_FAIL;
break;
return EMULATE_FAIL;
}
if (ret == EMULATE_DO_IOCSR) {
if (run->iocsr_io.is_write) {
val = vcpu->arch.gprs[rd];
memcpy(run->iocsr_io.data, &val, run->iocsr_io.len);
}
vcpu->arch.io_gpr = rd;
if (run->iocsr_io.is_write) {
if (!kvm_io_bus_write(vcpu, KVM_IOCSR_BUS, addr, run->iocsr_io.len, val))
ret = EMULATE_DONE;
else
/* Save data and let user space to write it */
memcpy(run->iocsr_io.data, val, run->iocsr_io.len);
trace_kvm_iocsr(KVM_TRACE_IOCSR_WRITE, run->iocsr_io.len, addr, val);
} else {
if (!kvm_io_bus_read(vcpu, KVM_IOCSR_BUS, addr, run->iocsr_io.len, val))
ret = EMULATE_DONE;
else
/* Save register id for iocsr read completion */
vcpu->arch.io_gpr = rd;
trace_kvm_iocsr(KVM_TRACE_IOCSR_READ, run->iocsr_io.len, addr, NULL);
}
return ret;
@ -447,19 +457,33 @@ int kvm_emu_mmio_read(struct kvm_vcpu *vcpu, larch_inst inst)
}
if (ret == EMULATE_DO_MMIO) {
trace_kvm_mmio(KVM_TRACE_MMIO_READ, run->mmio.len, run->mmio.phys_addr, NULL);
/*
* If an MMIO device such as the PCH-PIC is emulated in KVM,
* there is no need to return to user space to handle the
* MMIO access.
*/
ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, vcpu->arch.badv,
run->mmio.len, &vcpu->arch.gprs[rd]);
if (!ret) {
update_pc(&vcpu->arch);
vcpu->mmio_needed = 0;
return EMULATE_DONE;
}
/* Set for kvm_complete_mmio_read() use */
vcpu->arch.io_gpr = rd;
run->mmio.is_write = 0;
vcpu->mmio_is_write = 0;
trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, run->mmio.len,
run->mmio.phys_addr, NULL);
} else {
kvm_err("Read not supported Inst=0x%08x @%lx BadVaddr:%#lx\n",
inst.word, vcpu->arch.pc, vcpu->arch.badv);
kvm_arch_vcpu_dump_regs(vcpu);
vcpu->mmio_needed = 0;
return EMULATE_DO_MMIO;
}
kvm_err("Read not supported Inst=0x%08x @%lx BadVaddr:%#lx\n",
inst.word, vcpu->arch.pc, vcpu->arch.badv);
kvm_arch_vcpu_dump_regs(vcpu);
vcpu->mmio_needed = 0;
return ret;
}
@ -600,19 +624,29 @@ int kvm_emu_mmio_write(struct kvm_vcpu *vcpu, larch_inst inst)
}
if (ret == EMULATE_DO_MMIO) {
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, run->mmio.len, run->mmio.phys_addr, data);
/*
* If an MMIO device such as the PCH-PIC is emulated in KVM,
* there is no need to return to user space to handle the
* MMIO access.
*/
ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, vcpu->arch.badv, run->mmio.len, data);
if (!ret)
return EMULATE_DONE;
run->mmio.is_write = 1;
vcpu->mmio_needed = 1;
vcpu->mmio_is_write = 1;
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, run->mmio.len,
run->mmio.phys_addr, data);
} else {
vcpu->arch.pc = curr_pc;
kvm_err("Write not supported Inst=0x%08x @%lx BadVaddr:%#lx\n",
inst.word, vcpu->arch.pc, vcpu->arch.badv);
kvm_arch_vcpu_dump_regs(vcpu);
/* Rollback PC if emulation was unsuccessful */
return EMULATE_DO_MMIO;
}
vcpu->arch.pc = curr_pc;
kvm_err("Write not supported Inst=0x%08x @%lx BadVaddr:%#lx\n",
inst.word, vcpu->arch.pc, vcpu->arch.badv);
kvm_arch_vcpu_dump_regs(vcpu);
/* Rollback PC if emulation was unsuccessful */
return ret;
}

File diff suppressed because it is too large.

@ -0,0 +1,475 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (C) 2024 Loongson Technology Corporation Limited
*/
#include <linux/kvm_host.h>
#include <asm/kvm_ipi.h>
#include <asm/kvm_vcpu.h>
static void ipi_send(struct kvm *kvm, uint64_t data)
{
int cpu, action;
uint32_t status;
struct kvm_vcpu *vcpu;
struct kvm_interrupt irq;
cpu = ((data & 0xffffffff) >> 16) & 0x3ff;
vcpu = kvm_get_vcpu_by_cpuid(kvm, cpu);
if (unlikely(vcpu == NULL)) {
kvm_err("%s: invalid target cpu: %d\n", __func__, cpu);
return;
}
action = BIT(data & 0x1f);
spin_lock(&vcpu->arch.ipi_state.lock);
status = vcpu->arch.ipi_state.status;
vcpu->arch.ipi_state.status |= action;
spin_unlock(&vcpu->arch.ipi_state.lock);
if (status == 0) {
irq.irq = LARCH_INT_IPI;
kvm_vcpu_ioctl_interrupt(vcpu, &irq);
}
}
static void ipi_clear(struct kvm_vcpu *vcpu, uint64_t data)
{
uint32_t status;
struct kvm_interrupt irq;
spin_lock(&vcpu->arch.ipi_state.lock);
vcpu->arch.ipi_state.status &= ~data;
status = vcpu->arch.ipi_state.status;
spin_unlock(&vcpu->arch.ipi_state.lock);
if (status == 0) {
irq.irq = -LARCH_INT_IPI;
kvm_vcpu_ioctl_interrupt(vcpu, &irq);
}
}
static uint64_t read_mailbox(struct kvm_vcpu *vcpu, int offset, int len)
{
uint64_t data = 0;
spin_lock(&vcpu->arch.ipi_state.lock);
data = *(ulong *)((void *)vcpu->arch.ipi_state.buf + (offset - 0x20));
spin_unlock(&vcpu->arch.ipi_state.lock);
switch (len) {
case 1:
return data & 0xff;
case 2:
return data & 0xffff;
case 4:
return data & 0xffffffff;
case 8:
return data;
default:
kvm_err("%s: unknown data len: %d\n", __func__, len);
return 0;
}
}
static void write_mailbox(struct kvm_vcpu *vcpu, int offset, uint64_t data, int len)
{
void *pbuf;
spin_lock(&vcpu->arch.ipi_state.lock);
pbuf = (void *)vcpu->arch.ipi_state.buf + (offset - 0x20);
switch (len) {
case 1:
*(unsigned char *)pbuf = (unsigned char)data;
break;
case 2:
*(unsigned short *)pbuf = (unsigned short)data;
break;
case 4:
*(unsigned int *)pbuf = (unsigned int)data;
break;
case 8:
*(unsigned long *)pbuf = (unsigned long)data;
break;
default:
kvm_err("%s: unknown data len: %d\n", __func__, len);
}
spin_unlock(&vcpu->arch.ipi_state.lock);
}
static int send_ipi_data(struct kvm_vcpu *vcpu, gpa_t addr, uint64_t data)
{
int i, ret;
uint32_t val = 0, mask = 0;
/*
* Bits 27-30 are the mask for byte writing.
* If the mask is 0, we need not do anything.
*/
if ((data >> 27) & 0xf) {
/* Read the old val */
ret = kvm_io_bus_read(vcpu, KVM_IOCSR_BUS, addr, sizeof(val), &val);
if (unlikely(ret)) {
kvm_err("%s: : read date from addr %llx failed\n", __func__, addr);
return ret;
}
/* Construct the mask by scanning bits 27-30 */
for (i = 0; i < 4; i++) {
if (data & (BIT(27 + i)))
mask |= (0xff << (i * 8));
}
/* Save the old part of val */
val &= mask;
}
val |= ((uint32_t)(data >> 32) & ~mask);
ret = kvm_io_bus_write(vcpu, KVM_IOCSR_BUS, addr, sizeof(val), &val);
if (unlikely(ret))
kvm_err("%s: : write date to addr %llx failed\n", __func__, addr);
return ret;
}
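
send_ipi_data() above implements a partial-word write: bits 27-30 of the request select bytes whose old value is preserved, and the 32-bit payload in bits 32-63 is written only to the remaining bytes. A standalone illustration of the mask expansion (a sketch that mirrors the loop above, not kernel code):

#include <stdint.h>

/* Expand the 4-bit byte mask in bits 27-30 of an IPI mail/any send request
 * into a 32-bit byte mask, as send_ipi_data() does. A set bit means the
 * corresponding byte of the old value is kept. */
static uint32_t ipi_byte_mask(uint64_t data)
{
	uint32_t mask = 0;
	int i;

	for (i = 0; i < 4; i++) {
		if (data & (1ULL << (27 + i)))
			mask |= 0xffu << (i * 8);
	}
	return mask;
}

For example, a request with bits 27 and 28 set yields mask 0x0000ffff, so the low two bytes keep their old value and only the upper two bytes are taken from bits 32-63.
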
static int mail_send(struct kvm *kvm, uint64_t data)
{
int cpu, mailbox, offset;
struct kvm_vcpu *vcpu;
cpu = ((data & 0xffffffff) >> 16) & 0x3ff;
vcpu = kvm_get_vcpu_by_cpuid(kvm, cpu);
if (unlikely(vcpu == NULL)) {
kvm_err("%s: invalid target cpu: %d\n", __func__, cpu);
return -EINVAL;
}
mailbox = ((data & 0xffffffff) >> 2) & 0x7;
offset = IOCSR_IPI_BASE + IOCSR_IPI_BUF_20 + mailbox * 4;
return send_ipi_data(vcpu, offset, data);
}
static int any_send(struct kvm *kvm, uint64_t data)
{
int cpu, offset;
struct kvm_vcpu *vcpu;
cpu = ((data & 0xffffffff) >> 16) & 0x3ff;
vcpu = kvm_get_vcpu_by_cpuid(kvm, cpu);
if (unlikely(vcpu == NULL)) {
kvm_err("%s: invalid target cpu: %d\n", __func__, cpu);
return -EINVAL;
}
offset = data & 0xffff;
return send_ipi_data(vcpu, offset, data);
}
static int loongarch_ipi_readl(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *val)
{
int ret = 0;
uint32_t offset;
uint64_t res = 0;
offset = (uint32_t)(addr & 0x1ff);
WARN_ON_ONCE(offset & (len - 1));
switch (offset) {
case IOCSR_IPI_STATUS:
spin_lock(&vcpu->arch.ipi_state.lock);
res = vcpu->arch.ipi_state.status;
spin_unlock(&vcpu->arch.ipi_state.lock);
break;
case IOCSR_IPI_EN:
spin_lock(&vcpu->arch.ipi_state.lock);
res = vcpu->arch.ipi_state.en;
spin_unlock(&vcpu->arch.ipi_state.lock);
break;
case IOCSR_IPI_SET:
res = 0;
break;
case IOCSR_IPI_CLEAR:
res = 0;
break;
case IOCSR_IPI_BUF_20 ... IOCSR_IPI_BUF_38 + 7:
if (offset + len > IOCSR_IPI_BUF_38 + 8) {
kvm_err("%s: invalid offset or len: offset = %d, len = %d\n",
__func__, offset, len);
ret = -EINVAL;
break;
}
res = read_mailbox(vcpu, offset, len);
break;
default:
kvm_err("%s: unknown addr: %llx\n", __func__, addr);
ret = -EINVAL;
break;
}
*(uint64_t *)val = res;
return ret;
}
static int loongarch_ipi_writel(struct kvm_vcpu *vcpu, gpa_t addr, int len, const void *val)
{
int ret = 0;
uint64_t data;
uint32_t offset;
data = *(uint64_t *)val;
offset = (uint32_t)(addr & 0x1ff);
WARN_ON_ONCE(offset & (len - 1));
switch (offset) {
case IOCSR_IPI_STATUS:
ret = -EINVAL;
break;
case IOCSR_IPI_EN:
spin_lock(&vcpu->arch.ipi_state.lock);
vcpu->arch.ipi_state.en = data;
spin_unlock(&vcpu->arch.ipi_state.lock);
break;
case IOCSR_IPI_SET:
ret = -EINVAL;
break;
case IOCSR_IPI_CLEAR:
/* Just clear the status of the current vcpu */
ipi_clear(vcpu, data);
break;
case IOCSR_IPI_BUF_20 ... IOCSR_IPI_BUF_38 + 7:
if (offset + len > IOCSR_IPI_BUF_38 + 8) {
kvm_err("%s: invalid offset or len: offset = %d, len = %d\n",
__func__, offset, len);
ret = -EINVAL;
break;
}
write_mailbox(vcpu, offset, data, len);
break;
case IOCSR_IPI_SEND:
ipi_send(vcpu->kvm, data);
break;
case IOCSR_MAIL_SEND:
ret = mail_send(vcpu->kvm, *(uint64_t *)val);
break;
case IOCSR_ANY_SEND:
ret = any_send(vcpu->kvm, *(uint64_t *)val);
break;
default:
kvm_err("%s: unknown addr: %llx\n", __func__, addr);
ret = -EINVAL;
break;
}
return ret;
}
static int kvm_ipi_read(struct kvm_vcpu *vcpu,
struct kvm_io_device *dev,
gpa_t addr, int len, void *val)
{
int ret;
struct loongarch_ipi *ipi;
ipi = vcpu->kvm->arch.ipi;
if (!ipi) {
kvm_err("%s: ipi irqchip not valid!\n", __func__);
return -EINVAL;
}
ipi->kvm->stat.ipi_read_exits++;
ret = loongarch_ipi_readl(vcpu, addr, len, val);
return ret;
}
static int kvm_ipi_write(struct kvm_vcpu *vcpu,
struct kvm_io_device *dev,
gpa_t addr, int len, const void *val)
{
int ret;
struct loongarch_ipi *ipi;
ipi = vcpu->kvm->arch.ipi;
if (!ipi) {
kvm_err("%s: ipi irqchip not valid!\n", __func__);
return -EINVAL;
}
ipi->kvm->stat.ipi_write_exits++;
ret = loongarch_ipi_writel(vcpu, addr, len, val);
return ret;
}
static const struct kvm_io_device_ops kvm_ipi_ops = {
.read = kvm_ipi_read,
.write = kvm_ipi_write,
};
static int kvm_ipi_regs_access(struct kvm_device *dev,
struct kvm_device_attr *attr,
bool is_write)
{
int len = 4;
int cpu, addr;
uint64_t val;
void *p = NULL;
struct kvm_vcpu *vcpu;
cpu = (attr->attr >> 16) & 0x3ff;
addr = attr->attr & 0xff;
vcpu = kvm_get_vcpu(dev->kvm, cpu);
if (unlikely(vcpu == NULL)) {
kvm_err("%s: invalid target cpu: %d\n", __func__, cpu);
return -EINVAL;
}
switch (addr) {
case IOCSR_IPI_STATUS:
p = &vcpu->arch.ipi_state.status;
break;
case IOCSR_IPI_EN:
p = &vcpu->arch.ipi_state.en;
break;
case IOCSR_IPI_SET:
p = &vcpu->arch.ipi_state.set;
break;
case IOCSR_IPI_CLEAR:
p = &vcpu->arch.ipi_state.clear;
break;
case IOCSR_IPI_BUF_20:
p = &vcpu->arch.ipi_state.buf[0];
len = 8;
break;
case IOCSR_IPI_BUF_28:
p = &vcpu->arch.ipi_state.buf[1];
len = 8;
break;
case IOCSR_IPI_BUF_30:
p = &vcpu->arch.ipi_state.buf[2];
len = 8;
break;
case IOCSR_IPI_BUF_38:
p = &vcpu->arch.ipi_state.buf[3];
len = 8;
break;
default:
kvm_err("%s: unknown ipi register, addr = %d\n", __func__, addr);
return -EINVAL;
}
if (is_write) {
if (len == 4) {
if (get_user(val, (uint32_t __user *)attr->addr))
return -EFAULT;
*(uint32_t *)p = (uint32_t)val;
} else if (len == 8) {
if (get_user(val, (uint64_t __user *)attr->addr))
return -EFAULT;
*(uint64_t *)p = val;
}
} else {
if (len == 4) {
val = *(uint32_t *)p;
return put_user(val, (uint32_t __user *)attr->addr);
} else if (len == 8) {
val = *(uint64_t *)p;
return put_user(val, (uint64_t __user *)attr->addr);
}
}
return 0;
}
static int kvm_ipi_get_attr(struct kvm_device *dev,
struct kvm_device_attr *attr)
{
switch (attr->group) {
case KVM_DEV_LOONGARCH_IPI_GRP_REGS:
return kvm_ipi_regs_access(dev, attr, false);
default:
kvm_err("%s: unknown group (%d)\n", __func__, attr->group);
return -EINVAL;
}
}
static int kvm_ipi_set_attr(struct kvm_device *dev,
struct kvm_device_attr *attr)
{
switch (attr->group) {
case KVM_DEV_LOONGARCH_IPI_GRP_REGS:
return kvm_ipi_regs_access(dev, attr, true);
default:
kvm_err("%s: unknown group (%d)\n", __func__, attr->group);
return -EINVAL;
}
}
static int kvm_ipi_create(struct kvm_device *dev, u32 type)
{
int ret;
struct kvm *kvm;
struct kvm_io_device *device;
struct loongarch_ipi *s;
if (!dev) {
kvm_err("%s: kvm_device ptr is invalid!\n", __func__);
return -EINVAL;
}
kvm = dev->kvm;
if (kvm->arch.ipi) {
kvm_err("%s: LoongArch IPI has already been created!\n", __func__);
return -EINVAL;
}
s = kzalloc(sizeof(struct loongarch_ipi), GFP_KERNEL);
if (!s)
return -ENOMEM;
spin_lock_init(&s->lock);
s->kvm = kvm;
/*
* Initialize IOCSR device
*/
device = &s->device;
kvm_iodevice_init(device, &kvm_ipi_ops);
mutex_lock(&kvm->slots_lock);
ret = kvm_io_bus_register_dev(kvm, KVM_IOCSR_BUS, IOCSR_IPI_BASE, IOCSR_IPI_SIZE, device);
mutex_unlock(&kvm->slots_lock);
if (ret < 0) {
kvm_err("%s: Initialize IOCSR dev failed, ret = %d\n", __func__, ret);
goto err;
}
kvm->arch.ipi = s;
return 0;
err:
kfree(s);
return -EFAULT;
}
static void kvm_ipi_destroy(struct kvm_device *dev)
{
struct kvm *kvm;
struct loongarch_ipi *ipi;
if (!dev || !dev->kvm || !dev->kvm->arch.ipi)
return;
kvm = dev->kvm;
ipi = kvm->arch.ipi;
kvm_io_bus_unregister_dev(kvm, KVM_IOCSR_BUS, &ipi->device);
kfree(ipi);
}
static struct kvm_device_ops kvm_ipi_dev_ops = {
.name = "kvm-loongarch-ipi",
.create = kvm_ipi_create,
.destroy = kvm_ipi_destroy,
.set_attr = kvm_ipi_set_attr,
.get_attr = kvm_ipi_get_attr,
};
int kvm_loongarch_register_ipi_device(void)
{
return kvm_register_device_ops(&kvm_ipi_dev_ops, KVM_DEV_TYPE_LOONGARCH_IPI);
}
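
For migration and debugging, kvm_ipi_regs_access() above exposes the per-vCPU IPI state through KVM_GET/SET_DEVICE_ATTR, encoding the target vCPU in bits 16-25 of attr->attr and the register offset in the low byte. A hedged user-space sketch of reading one register; the device fd is assumed to come from KVM_CREATE_DEVICE with KVM_DEV_TYPE_LOONGARCH_IPI, and IPI_EN_OFFSET stands in for the numeric offset of IOCSR_IPI_EN:

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

#define IPI_EN_OFFSET	0x4	/* assumed offset of IOCSR_IPI_EN */

/* Read the IPI enable register of vCPU 'cpu' via the IPI device fd. */
static int read_ipi_en(int ipi_dev_fd, unsigned int cpu, uint32_t *en)
{
	struct kvm_device_attr attr = {
		.group = KVM_DEV_LOONGARCH_IPI_GRP_REGS,
		.attr  = ((uint64_t)cpu << 16) | IPI_EN_OFFSET,
		.addr  = (uint64_t)(unsigned long)en,
	};

	return ioctl(ipi_dev_fd, KVM_GET_DEVICE_ATTR, &attr);
}
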


@@ -0,0 +1,519 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (C) 2024 Loongson Technology Corporation Limited
*/
#include <asm/kvm_eiointc.h>
#include <asm/kvm_pch_pic.h>
#include <asm/kvm_vcpu.h>
#include <linux/count_zeros.h>
/* update the isr according to irq level and route irq to eiointc */
static void pch_pic_update_irq(struct loongarch_pch_pic *s, int irq, int level)
{
u64 mask = BIT(irq);
/*
* set the isr and route the irq to the eiointc;
* the route table is in htmsi_vector[]
*/
if (level) {
if (mask & s->irr & ~s->mask) {
s->isr |= mask;
irq = s->htmsi_vector[irq];
eiointc_set_irq(s->kvm->arch.eiointc, irq, level);
}
} else {
if (mask & s->isr & ~s->irr) {
s->isr &= ~mask;
irq = s->htmsi_vector[irq];
eiointc_set_irq(s->kvm->arch.eiointc, irq, level);
}
}
}
/* update batch irqs, the irq_mask is a bitmap of irqs */
static void pch_pic_update_batch_irqs(struct loongarch_pch_pic *s, u64 irq_mask, int level)
{
int irq, bits;
/* find each irq by irqs bitmap and update each irq */
bits = sizeof(irq_mask) * 8;
irq = find_first_bit((void *)&irq_mask, bits);
while (irq < bits) {
pch_pic_update_irq(s, irq, level);
bitmap_clear((void *)&irq_mask, irq, 1);
irq = find_first_bit((void *)&irq_mask, bits);
}
}
/* called when an irq is triggered in the pch pic */
void pch_pic_set_irq(struct loongarch_pch_pic *s, int irq, int level)
{
u64 mask = BIT(irq);
spin_lock(&s->lock);
if (level)
s->irr |= mask; /* set irr */
else {
/*
* In edge-triggered mode, 0 does not mean to clear the irq.
* The irr register variable is cleared when the cpu writes to the
* PCH_PIC_CLEAR_START address area
*/
if (s->edge & mask) {
spin_unlock(&s->lock);
return;
}
s->irr &= ~mask;
}
pch_pic_update_irq(s, irq, level);
spin_unlock(&s->lock);
}
/* msi irq handler */
void pch_msi_set_irq(struct kvm *kvm, int irq, int level)
{
eiointc_set_irq(kvm->arch.eiointc, irq, level);
}
/*
* pch pic registers are 64-bit, but they are accessed 32 bits at
* a time; 'high' selects whether the low or the high 32 bits are
* read.
*/
static u32 pch_pic_read_reg(u64 *s, int high)
{
u64 val = *s;
/* read the high 32 bits when high is 1 */
return high ? (u32)(val >> 32) : (u32)val;
}
/*
* pch pic registers are 64-bit, but they are accessed 32 bits at
* a time; 'high' selects whether the low or the high 32 bits are
* written.
*/
static u32 pch_pic_write_reg(u64 *s, int high, u32 v)
{
u64 val = *s, data = v;
if (high) {
/*
* high is 1: keep the low 32 bits of val and write the
* new value into the high 32 bits
*/
*s = (val << 32 >> 32) | (data << 32);
val >>= 32;
} else
/*
* high is 0: keep the high 32 bits of val and write the
* new value into the low 32 bits
*/
*s = (val >> 32 << 32) | v;
return (u32)val;
}
static int loongarch_pch_pic_read(struct loongarch_pch_pic *s, gpa_t addr, int len, void *val)
{
int offset, index, ret = 0;
u32 data = 0;
u64 int_id = 0;
offset = addr - s->pch_pic_base;
spin_lock(&s->lock);
switch (offset) {
case PCH_PIC_INT_ID_START ... PCH_PIC_INT_ID_END:
/* int id version */
int_id |= (u64)PCH_PIC_INT_ID_VER << 32;
/* irq number */
int_id |= (u64)31 << (32 + 16);
/* int id value */
int_id |= PCH_PIC_INT_ID_VAL;
*(u64 *)val = int_id;
break;
case PCH_PIC_MASK_START ... PCH_PIC_MASK_END:
offset -= PCH_PIC_MASK_START;
index = offset >> 2;
/* read mask reg */
data = pch_pic_read_reg(&s->mask, index);
*(u32 *)val = data;
break;
case PCH_PIC_HTMSI_EN_START ... PCH_PIC_HTMSI_EN_END:
offset -= PCH_PIC_HTMSI_EN_START;
index = offset >> 2;
/* read htmsi enable reg */
data = pch_pic_read_reg(&s->htmsi_en, index);
*(u32 *)val = data;
break;
case PCH_PIC_EDGE_START ... PCH_PIC_EDGE_END:
offset -= PCH_PIC_EDGE_START;
index = offset >> 2;
/* read edge enable reg */
data = pch_pic_read_reg(&s->edge, index);
*(u32 *)val = data;
break;
case PCH_PIC_AUTO_CTRL0_START ... PCH_PIC_AUTO_CTRL0_END:
case PCH_PIC_AUTO_CTRL1_START ... PCH_PIC_AUTO_CTRL1_END:
/* we only use default mode: fixed interrupt distribution mode */
*(u32 *)val = 0;
break;
case PCH_PIC_ROUTE_ENTRY_START ... PCH_PIC_ROUTE_ENTRY_END:
/* only route to int0: eiointc */
*(u8 *)val = 1;
break;
case PCH_PIC_HTMSI_VEC_START ... PCH_PIC_HTMSI_VEC_END:
offset -= PCH_PIC_HTMSI_VEC_START;
/* read htmsi vector */
data = s->htmsi_vector[offset];
*(u8 *)val = data;
break;
case PCH_PIC_POLARITY_START ... PCH_PIC_POLARITY_END:
/* we only use the default value 0: high level triggered */
*(u32 *)val = 0;
break;
default:
ret = -EINVAL;
}
spin_unlock(&s->lock);
return ret;
}
static int kvm_pch_pic_read(struct kvm_vcpu *vcpu,
struct kvm_io_device *dev,
gpa_t addr, int len, void *val)
{
int ret;
struct loongarch_pch_pic *s = vcpu->kvm->arch.pch_pic;
if (!s) {
kvm_err("%s: pch pic irqchip not valid!\n", __func__);
return -EINVAL;
}
/* statistics of pch pic reading */
vcpu->kvm->stat.pch_pic_read_exits++;
ret = loongarch_pch_pic_read(s, addr, len, val);
return ret;
}
static int loongarch_pch_pic_write(struct loongarch_pch_pic *s, gpa_t addr,
int len, const void *val)
{
int ret;
u32 old, data, offset, index;
u64 irq;
ret = 0;
data = *(u32 *)val;
offset = addr - s->pch_pic_base;
spin_lock(&s->lock);
switch (offset) {
case PCH_PIC_MASK_START ... PCH_PIC_MASK_END:
offset -= PCH_PIC_MASK_START;
/* select whether the high or the low 32 bits are written */
index = offset >> 2;
old = pch_pic_write_reg(&s->mask, index, data);
/* enable irq when mask value change to 0 */
irq = (old & ~data) << (32 * index);
pch_pic_update_batch_irqs(s, irq, 1);
/* disable irq when mask value change to 1 */
irq = (~old & data) << (32 * index);
pch_pic_update_batch_irqs(s, irq, 0);
break;
case PCH_PIC_HTMSI_EN_START ... PCH_PIC_HTMSI_EN_END:
offset -= PCH_PIC_HTMSI_EN_START;
index = offset >> 2;
pch_pic_write_reg(&s->htmsi_en, index, data);
break;
case PCH_PIC_EDGE_START ... PCH_PIC_EDGE_END:
offset -= PCH_PIC_EDGE_START;
index = offset >> 2;
/* 1: edge triggered, 0: level triggered */
pch_pic_write_reg(&s->edge, index, data);
break;
case PCH_PIC_CLEAR_START ... PCH_PIC_CLEAR_END:
offset -= PCH_PIC_CLEAR_START;
index = offset >> 2;
/* write 1 to clear edge irq */
old = pch_pic_read_reg(&s->irr, index);
/*
* get the irq bitmap which is edge triggered and
* already set and to be cleared
*/
irq = old & pch_pic_read_reg(&s->edge, index) & data;
/* write irr to the new state where irqs have been cleared */
pch_pic_write_reg(&s->irr, index, old & ~irq);
/* update cleared irqs */
pch_pic_update_batch_irqs(s, irq, 0);
break;
case PCH_PIC_AUTO_CTRL0_START ... PCH_PIC_AUTO_CTRL0_END:
offset -= PCH_PIC_AUTO_CTRL0_START;
index = offset >> 2;
/* we only use default mode: fixed interrupt distribution mode */
pch_pic_write_reg(&s->auto_ctrl0, index, 0);
break;
case PCH_PIC_AUTO_CTRL1_START ... PCH_PIC_AUTO_CTRL1_END:
offset -= PCH_PIC_AUTO_CTRL1_START;
index = offset >> 2;
/* we only use default mode: fixed interrupt distribution mode */
pch_pic_write_reg(&s->auto_ctrl1, index, 0);
break;
case PCH_PIC_ROUTE_ENTRY_START ... PCH_PIC_ROUTE_ENTRY_END:
offset -= PCH_PIC_ROUTE_ENTRY_START;
/* only route to int0: eiointc */
s->route_entry[offset] = 1;
break;
case PCH_PIC_HTMSI_VEC_START ... PCH_PIC_HTMSI_VEC_END:
/* route table to eiointc */
offset -= PCH_PIC_HTMSI_VEC_START;
s->htmsi_vector[offset] = (u8)data;
break;
case PCH_PIC_POLARITY_START ... PCH_PIC_POLARITY_END:
offset -= PCH_PIC_POLARITY_START;
index = offset >> 2;
/* we only use the default value 0: high level triggered */
pch_pic_write_reg(&s->polarity, index, 0);
break;
default:
ret = -EINVAL;
break;
}
spin_unlock(&s->lock);
return ret;
}
static int kvm_pch_pic_write(struct kvm_vcpu *vcpu,
struct kvm_io_device *dev,
gpa_t addr, int len, const void *val)
{
int ret;
struct loongarch_pch_pic *s = vcpu->kvm->arch.pch_pic;
if (!s) {
kvm_err("%s: pch pic irqchip not valid!\n", __func__);
return -EINVAL;
}
/* statistics of pch pic writing */
vcpu->kvm->stat.pch_pic_write_exits++;
ret = loongarch_pch_pic_write(s, addr, len, val);
return ret;
}
static const struct kvm_io_device_ops kvm_pch_pic_ops = {
.read = kvm_pch_pic_read,
.write = kvm_pch_pic_write,
};
static int kvm_pch_pic_init(struct kvm_device *dev, u64 addr)
{
int ret;
struct kvm *kvm = dev->kvm;
struct kvm_io_device *device;
struct loongarch_pch_pic *s = dev->kvm->arch.pch_pic;
s->pch_pic_base = addr;
device = &s->device;
/* init device by pch pic writing and reading ops */
kvm_iodevice_init(device, &kvm_pch_pic_ops);
mutex_lock(&kvm->slots_lock);
/* register pch pic device */
ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, addr, PCH_PIC_SIZE, device);
mutex_unlock(&kvm->slots_lock);
return (ret < 0) ? -EFAULT : 0;
}
/* used by user space to get or set pch pic registers */
static int kvm_pch_pic_regs_access(struct kvm_device *dev,
struct kvm_device_attr *attr,
bool is_write)
{
int addr, offset, len = 8, ret = 0;
void __user *data;
void *p = NULL;
struct loongarch_pch_pic *s;
s = dev->kvm->arch.pch_pic;
addr = attr->attr;
data = (void __user *)attr->addr;
/* get pointer to pch pic register by addr */
switch (addr) {
case PCH_PIC_MASK_START:
p = &s->mask;
break;
case PCH_PIC_HTMSI_EN_START:
p = &s->htmsi_en;
break;
case PCH_PIC_EDGE_START:
p = &s->edge;
break;
case PCH_PIC_AUTO_CTRL0_START:
p = &s->auto_ctrl0;
break;
case PCH_PIC_AUTO_CTRL1_START:
p = &s->auto_ctrl1;
break;
case PCH_PIC_ROUTE_ENTRY_START ... PCH_PIC_ROUTE_ENTRY_END:
offset = addr - PCH_PIC_ROUTE_ENTRY_START;
p = &s->route_entry[offset];
len = 1;
break;
case PCH_PIC_HTMSI_VEC_START ... PCH_PIC_HTMSI_VEC_END:
offset = addr - PCH_PIC_HTMSI_VEC_START;
p = &s->htmsi_vector[offset];
len = 1;
break;
case PCH_PIC_INT_IRR_START:
p = &s->irr;
break;
case PCH_PIC_INT_ISR_START:
p = &s->isr;
break;
case PCH_PIC_POLARITY_START:
p = &s->polarity;
break;
default:
return -EINVAL;
}
spin_lock(&s->lock);
/* write or read value according to is_write */
if (is_write) {
if (copy_from_user(p, data, len))
ret = -EFAULT;
} else {
if (copy_to_user(data, p, len))
ret = -EFAULT;
}
spin_unlock(&s->lock);
return ret;
}
static int kvm_pch_pic_get_attr(struct kvm_device *dev,
struct kvm_device_attr *attr)
{
switch (attr->group) {
case KVM_DEV_LOONGARCH_PCH_PIC_GRP_REGS:
return kvm_pch_pic_regs_access(dev, attr, false);
default:
return -EINVAL;
}
}
static int kvm_pch_pic_set_attr(struct kvm_device *dev,
struct kvm_device_attr *attr)
{
u64 addr;
void __user *uaddr = (void __user *)(long)attr->addr;
switch (attr->group) {
case KVM_DEV_LOONGARCH_PCH_PIC_GRP_CTRL:
switch (attr->attr) {
case KVM_DEV_LOONGARCH_PCH_PIC_CTRL_INIT:
if (copy_from_user(&addr, uaddr, sizeof(addr)))
return -EFAULT;
if (!dev->kvm->arch.pch_pic) {
kvm_err("%s: please create pch_pic irqchip first!\n", __func__);
return -ENODEV;
}
return kvm_pch_pic_init(dev, addr);
default:
kvm_err("%s: unknown group (%d) attr (%lld)\n", __func__, attr->group,
attr->attr);
return -EINVAL;
}
case KVM_DEV_LOONGARCH_PCH_PIC_GRP_REGS:
return kvm_pch_pic_regs_access(dev, attr, true);
default:
return -EINVAL;
}
}
static int kvm_setup_default_irq_routing(struct kvm *kvm)
{
int i, ret;
u32 nr = KVM_IRQCHIP_NUM_PINS;
struct kvm_irq_routing_entry *entries;
entries = kcalloc(nr, sizeof(*entries), GFP_KERNEL);
if (!entries)
return -ENOMEM;
for (i = 0; i < nr; i++) {
entries[i].gsi = i;
entries[i].type = KVM_IRQ_ROUTING_IRQCHIP;
entries[i].u.irqchip.irqchip = 0;
entries[i].u.irqchip.pin = i;
}
ret = kvm_set_irq_routing(kvm, entries, nr, 0);
kfree(entries);
return ret;
}
static int kvm_pch_pic_create(struct kvm_device *dev, u32 type)
{
int ret;
struct kvm *kvm = dev->kvm;
struct loongarch_pch_pic *s;
/* pch pic should not have been created */
if (kvm->arch.pch_pic)
return -EINVAL;
ret = kvm_setup_default_irq_routing(kvm);
if (ret)
return -ENOMEM;
s = kzalloc(sizeof(struct loongarch_pch_pic), GFP_KERNEL);
if (!s)
return -ENOMEM;
spin_lock_init(&s->lock);
s->kvm = kvm;
kvm->arch.pch_pic = s;
return 0;
}
static void kvm_pch_pic_destroy(struct kvm_device *dev)
{
struct kvm *kvm;
struct loongarch_pch_pic *s;
if (!dev || !dev->kvm || !dev->kvm->arch.pch_pic)
return;
kvm = dev->kvm;
s = kvm->arch.pch_pic;
/* unregister pch pic device and free its memory */
kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS, &s->device);
kfree(s);
}
static struct kvm_device_ops kvm_pch_pic_dev_ops = {
.name = "kvm-loongarch-pch-pic",
.create = kvm_pch_pic_create,
.destroy = kvm_pch_pic_destroy,
.set_attr = kvm_pch_pic_set_attr,
.get_attr = kvm_pch_pic_get_attr,
};
int kvm_loongarch_register_pch_pic_device(void)
{
return kvm_register_device_ops(&kvm_pch_pic_dev_ops, KVM_DEV_TYPE_LOONGARCH_PCHPIC);
}
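
Putting the pieces together, user space creates the in-kernel PCH-PIC with KVM_CREATE_DEVICE and then programs its MMIO base address through the KVM_DEV_LOONGARCH_PCH_PIC_CTRL_INIT attribute handled by kvm_pch_pic_set_attr() above. A sketch of that sequence with error handling trimmed (the base address itself is VMM policy):

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Create the in-kernel PCH-PIC and register its MMIO window at 'base'. */
static int create_pch_pic(int vm_fd, uint64_t base)
{
	struct kvm_create_device cd = {
		.type = KVM_DEV_TYPE_LOONGARCH_PCHPIC,
	};
	struct kvm_device_attr attr;

	if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0)
		return -1;

	attr = (struct kvm_device_attr) {
		.group = KVM_DEV_LOONGARCH_PCH_PIC_GRP_CTRL,
		.attr  = KVM_DEV_LOONGARCH_PCH_PIC_CTRL_INIT,
		.addr  = (uint64_t)(unsigned long)&base,
	};

	return ioctl(cd.fd, KVM_SET_DEVICE_ATTR, &attr);
}
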


@@ -0,0 +1,89 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (C) 2024 Loongson Technology Corporation Limited
*/
#include <linux/kvm_host.h>
#include <trace/events/kvm.h>
#include <asm/kvm_pch_pic.h>
static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e,
struct kvm *kvm, int irq_source_id, int level, bool line_status)
{
/* PCH-PIC pin (0 ~ 64) <---> GSI (0 ~ 64) */
pch_pic_set_irq(kvm->arch.pch_pic, e->irqchip.pin, level);
return 0;
}
/*
* kvm_set_msi: inject the MSI corresponding to the
* MSI routing entry
*
* This is the entry point for irqfd MSI injection
* and userspace MSI injection.
*/
int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
struct kvm *kvm, int irq_source_id, int level, bool line_status)
{
if (!level)
return -1;
pch_msi_set_irq(kvm, e->msi.data, level);
return 0;
}
/*
* kvm_set_routing_entry: populate a kvm routing entry
* from a user routing entry
*
* @kvm: the VM this entry is applied to
* @e: kvm kernel routing entry handle
* @ue: user api routing entry handle
* return 0 on success, -EINVAL on errors.
*/
int kvm_set_routing_entry(struct kvm *kvm,
struct kvm_kernel_irq_routing_entry *e,
const struct kvm_irq_routing_entry *ue)
{
switch (ue->type) {
case KVM_IRQ_ROUTING_IRQCHIP:
e->set = kvm_set_pic_irq;
e->irqchip.irqchip = ue->u.irqchip.irqchip;
e->irqchip.pin = ue->u.irqchip.pin;
if (e->irqchip.pin >= KVM_IRQCHIP_NUM_PINS)
return -EINVAL;
return 0;
case KVM_IRQ_ROUTING_MSI:
e->set = kvm_set_msi;
e->msi.address_lo = ue->u.msi.address_lo;
e->msi.address_hi = ue->u.msi.address_hi;
e->msi.data = ue->u.msi.data;
return 0;
default:
return -EINVAL;
}
}
int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
struct kvm *kvm, int irq_source_id, int level, bool line_status)
{
switch (e->type) {
case KVM_IRQ_ROUTING_IRQCHIP:
pch_pic_set_irq(kvm->arch.pch_pic, e->irqchip.pin, level);
return 0;
case KVM_IRQ_ROUTING_MSI:
pch_msi_set_irq(kvm, e->msi.data, level);
return 0;
default:
return -EWOULDBLOCK;
}
}
bool kvm_arch_intc_initialized(struct kvm *kvm)
{
return kvm_arch_irqchip_in_kernel(kvm);
}
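
kvm_set_routing_entry() above accepts both KVM_IRQ_ROUTING_IRQCHIP and KVM_IRQ_ROUTING_MSI entries, so a VMM can replace the identity GSI-to-pin table installed by kvm_setup_default_irq_routing() with its own. A hedged sketch of installing a single MSI route; note that KVM_SET_GSI_ROUTING replaces the whole table, so a real VMM would include the irqchip entries as well:

#include <linux/kvm.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Route 'gsi' to the given MSI message (illustrative values only). */
static int add_msi_route(int vm_fd, uint32_t gsi, uint32_t addr_lo,
			 uint32_t addr_hi, uint32_t data)
{
	struct kvm_irq_routing *table;
	int ret;

	table = calloc(1, sizeof(*table) + sizeof(struct kvm_irq_routing_entry));
	if (!table)
		return -1;

	table->nr = 1;
	table->entries[0].gsi = gsi;
	table->entries[0].type = KVM_IRQ_ROUTING_MSI;
	table->entries[0].u.msi.address_lo = addr_lo;
	table->entries[0].u.msi.address_hi = addr_hi;
	table->entries[0].u.msi.data = data;

	ret = ioctl(vm_fd, KVM_SET_GSI_ROUTING, table);
	free(table);
	return ret;
}
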


@@ -9,6 +9,8 @@
#include <asm/cacheflush.h>
#include <asm/cpufeature.h>
#include <asm/kvm_csr.h>
#include <asm/kvm_eiointc.h>
#include <asm/kvm_pch_pic.h>
#include "trace.h"
unsigned long vpid_mask;
@@ -313,7 +315,7 @@ void kvm_arch_disable_virtualization_cpu(void)
static int kvm_loongarch_env_init(void)
{
int cpu, order;
int cpu, order, ret;
void *addr;
struct kvm_context *context;
@@ -368,7 +370,20 @@ static int kvm_loongarch_env_init(void)
kvm_init_gcsr_flag();
return 0;
/* Register LoongArch IPI interrupt controller interface. */
ret = kvm_loongarch_register_ipi_device();
if (ret)
return ret;
/* Register LoongArch EIOINTC interrupt controller interface. */
ret = kvm_loongarch_register_eiointc_device();
if (ret)
return ret;
/* Register LoongArch PCH-PIC interrupt controller interface. */
ret = kvm_loongarch_register_pch_pic_device();
return ret;
}
static void kvm_loongarch_env_exit(void)


@@ -552,12 +552,10 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
{
int ret = 0;
kvm_pfn_t pfn = 0;
kvm_pte_t *ptep, changed, new;
gfn_t gfn = gpa >> PAGE_SHIFT;
struct kvm *kvm = vcpu->kvm;
struct kvm_memory_slot *slot;
struct page *page;
spin_lock(&kvm->mmu_lock);
@@ -570,8 +568,6 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
/* Track access to pages marked old */
new = kvm_pte_mkyoung(*ptep);
/* call kvm_set_pfn_accessed() after unlock */
if (write && !kvm_pte_dirty(new)) {
if (!kvm_pte_write(new)) {
ret = -EFAULT;
@@ -595,26 +591,14 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
}
changed = new ^ (*ptep);
if (changed) {
if (changed)
kvm_set_pte(ptep, new);
pfn = kvm_pte_pfn(new);
page = kvm_pfn_to_refcounted_page(pfn);
if (page)
get_page(page);
}
spin_unlock(&kvm->mmu_lock);
if (changed) {
if (kvm_pte_young(changed))
kvm_set_pfn_accessed(pfn);
if (kvm_pte_dirty(changed))
mark_page_dirty(kvm, gfn);
if (kvm_pte_dirty(changed)) {
mark_page_dirty(kvm, gfn);
kvm_set_pfn_dirty(pfn);
}
if (page)
put_page(page);
}
return ret;
out:
spin_unlock(&kvm->mmu_lock);
@@ -796,6 +780,7 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
struct kvm *kvm = vcpu->kvm;
struct kvm_memory_slot *memslot;
struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
struct page *page;
/* Try the fast path to handle old / clean pages */
srcu_idx = srcu_read_lock(&kvm->srcu);
@@ -823,7 +808,7 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
mmu_seq = kvm->mmu_invalidate_seq;
/*
* Ensure the read of mmu_invalidate_seq isn't reordered with PTE reads in
* gfn_to_pfn_prot() (which calls get_user_pages()), so that we don't
* kvm_faultin_pfn() (which calls get_user_pages()), so that we don't
* risk the page we get a reference to getting unmapped before we have a
* chance to grab the mmu_lock without mmu_invalidate_retry() noticing.
*
@@ -835,7 +820,7 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
smp_rmb();
/* Slow path - ask KVM core whether we can access this GPA */
pfn = gfn_to_pfn_prot(kvm, gfn, write, &writeable);
pfn = kvm_faultin_pfn(vcpu, gfn, write, &writeable, &page);
if (is_error_noslot_pfn(pfn)) {
err = -EFAULT;
goto out;
@@ -847,10 +832,10 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
/*
* This can happen when mappings are changed asynchronously, but
* also synchronously if a COW is triggered by
* gfn_to_pfn_prot().
* kvm_faultin_pfn().
*/
spin_unlock(&kvm->mmu_lock);
kvm_release_pfn_clean(pfn);
kvm_release_page_unused(page);
if (retry_no > 100) {
retry_no = 0;
schedule();
@@ -915,14 +900,13 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
else
++kvm->stat.pages;
kvm_set_pte(ptep, new_pte);
kvm_release_faultin_page(kvm, page, false, writeable);
spin_unlock(&kvm->mmu_lock);
if (prot_bits & _PAGE_DIRTY) {
if (prot_bits & _PAGE_DIRTY)
mark_page_dirty_in_slot(kvm, memslot, gfn);
kvm_set_pfn_dirty(pfn);
}
kvm_release_pfn_clean(pfn);
out:
srcu_read_unlock(&kvm->srcu, srcu_idx);
return err;
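
The LoongArch hunks above, and the MIPS and PowerPC conversions that follow, all settle on the same fault-handling shape: snapshot mmu_invalidate_seq, resolve the pfn with kvm_faultin_pfn() outside the mmu_lock, retry if an invalidation raced, and hand the page back with kvm_release_faultin_page() once the PTE is installed. A condensed sketch of that flow (the arch-specific PTE construction is elided):

#include <linux/kvm_host.h>

/* Condensed from the converted fault paths above; not a complete function. */
static int faultin_and_map(struct kvm_vcpu *vcpu, gfn_t gfn, bool write)
{
	struct kvm *kvm = vcpu->kvm;
	unsigned long mmu_seq;
	struct page *page;
	bool writable;
	kvm_pfn_t pfn;

retry:
	mmu_seq = kvm->mmu_invalidate_seq;
	smp_rmb();

	pfn = kvm_faultin_pfn(vcpu, gfn, write, &writable, &page);
	if (is_error_noslot_pfn(pfn))
		return -EFAULT;

	spin_lock(&kvm->mmu_lock);
	if (mmu_invalidate_retry(kvm, mmu_seq)) {
		/* The mapping changed under us; drop the page untouched and retry. */
		spin_unlock(&kvm->mmu_lock);
		kvm_release_page_unused(page);
		goto retry;
	}

	/* ... build and install the stage-2 / TLB entry for pfn here ... */

	kvm_release_faultin_page(kvm, page, false, writable);
	spin_unlock(&kvm->mmu_lock);
	return 0;
}
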


@@ -1475,6 +1475,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
/* Init */
vcpu->arch.last_sched_cpu = -1;
/* Init ipi_state lock */
spin_lock_init(&vcpu->arch.ipi_state.lock);
/*
* Initialize guest register state to valid architectural reset state.
*/


@@ -6,6 +6,8 @@
#include <linux/kvm_host.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_vcpu.h>
#include <asm/kvm_eiointc.h>
#include <asm/kvm_pch_pic.h>
const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS(),
@@ -76,6 +78,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
int r;
switch (ext) {
case KVM_CAP_IRQCHIP:
case KVM_CAP_ONE_REG:
case KVM_CAP_ENABLE_CAP:
case KVM_CAP_READONLY_MEM:
@@ -161,6 +164,8 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
struct kvm_device_attr attr;
switch (ioctl) {
case KVM_CREATE_IRQCHIP:
return 0;
case KVM_HAS_DEVICE_ATTR:
if (copy_from_user(&attr, argp, sizeof(attr)))
return -EFAULT;
@@ -170,3 +175,19 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
return -ENOIOCTLCMD;
}
}
int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event, bool line_status)
{
if (!kvm_arch_irqchip_in_kernel(kvm))
return -ENXIO;
irq_event->status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
irq_event->irq, irq_event->level, line_status);
return 0;
}
bool kvm_arch_irqchip_in_kernel(struct kvm *kvm)
{
return (kvm->arch.ipi && kvm->arch.eiointc && kvm->arch.pch_pic);
}
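
With all three controllers present, kvm_vm_ioctl_irq_line() above lets user space drive interrupt lines through the generic KVM_IRQ_LINE ioctl, which goes through the routing table and ends up in pch_pic_set_irq(). A small user-space sketch (the pin number is illustrative):

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Assert (level = 1) or deassert (level = 0) PCH-PIC input 'pin'. */
static int set_irq_line(int vm_fd, unsigned int pin, int level)
{
	struct kvm_irq_level irq = {
		.irq = pin,
		.level = level,
	};

	return ioctl(vm_fd, KVM_IRQ_LINE, &irq);
}
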


@@ -484,8 +484,6 @@ static int _kvm_mips_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa,
struct kvm *kvm = vcpu->kvm;
gfn_t gfn = gpa >> PAGE_SHIFT;
pte_t *ptep;
kvm_pfn_t pfn = 0; /* silence bogus GCC warning */
bool pfn_valid = false;
int ret = 0;
spin_lock(&kvm->mmu_lock);
@@ -498,12 +496,9 @@ static int _kvm_mips_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa,
}
/* Track access to pages marked old */
if (!pte_young(*ptep)) {
if (!pte_young(*ptep))
set_pte(ptep, pte_mkyoung(*ptep));
pfn = pte_pfn(*ptep);
pfn_valid = true;
/* call kvm_set_pfn_accessed() after unlock */
}
if (write_fault && !pte_dirty(*ptep)) {
if (!pte_write(*ptep)) {
ret = -EFAULT;
@@ -512,9 +507,7 @@ static int _kvm_mips_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa,
/* Track dirtying of writeable pages */
set_pte(ptep, pte_mkdirty(*ptep));
pfn = pte_pfn(*ptep);
mark_page_dirty(kvm, gfn);
kvm_set_pfn_dirty(pfn);
}
if (out_entry)
@@ -524,8 +517,6 @@ static int _kvm_mips_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa,
out:
spin_unlock(&kvm->mmu_lock);
if (pfn_valid)
kvm_set_pfn_accessed(pfn);
return ret;
}
@@ -566,6 +557,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
bool writeable;
unsigned long prot_bits;
unsigned long mmu_seq;
struct page *page;
/* Try the fast path to handle old / clean pages */
srcu_idx = srcu_read_lock(&kvm->srcu);
@@ -587,7 +579,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
mmu_seq = kvm->mmu_invalidate_seq;
/*
* Ensure the read of mmu_invalidate_seq isn't reordered with PTE reads
* in gfn_to_pfn_prot() (which calls get_user_pages()), so that we don't
* in kvm_faultin_pfn() (which calls get_user_pages()), so that we don't
* risk the page we get a reference to getting unmapped before we have a
* chance to grab the mmu_lock without mmu_invalidate_retry() noticing.
*
@@ -599,7 +591,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
smp_rmb();
/* Slow path - ask KVM core whether we can access this GPA */
pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writeable);
pfn = kvm_faultin_pfn(vcpu, gfn, write_fault, &writeable, &page);
if (is_error_noslot_pfn(pfn)) {
err = -EFAULT;
goto out;
@@ -611,10 +603,10 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
/*
* This can happen when mappings are changed asynchronously, but
* also synchronously if a COW is triggered by
* gfn_to_pfn_prot().
* kvm_faultin_pfn().
*/
spin_unlock(&kvm->mmu_lock);
kvm_release_pfn_clean(pfn);
kvm_release_page_unused(page);
goto retry;
}
@@ -628,7 +620,6 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
if (write_fault) {
prot_bits |= __WRITEABLE;
mark_page_dirty(kvm, gfn);
kvm_set_pfn_dirty(pfn);
}
}
entry = pfn_pte(pfn, __pgprot(prot_bits));
@@ -642,9 +633,8 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
if (out_buddy)
*out_buddy = *ptep_buddy(ptep);
kvm_release_faultin_page(kvm, page, false, writeable);
spin_unlock(&kvm->mmu_lock);
kvm_release_pfn_clean(pfn);
kvm_set_pfn_accessed(pfn);
out:
srcu_read_unlock(&kvm->srcu, srcu_idx);
return err;


@@ -203,7 +203,7 @@ extern bool kvmppc_hv_handle_set_rc(struct kvm *kvm, bool nested,
extern int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
unsigned long gpa,
struct kvm_memory_slot *memslot,
bool writing, bool kvm_ro,
bool writing,
pte_t *inserted_pte, unsigned int *levelp);
extern int kvmppc_init_vm_radix(struct kvm *kvm);
extern void kvmppc_free_radix(struct kvm *kvm);
@@ -235,7 +235,7 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat,
extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr);
extern int kvmppc_emulate_paired_single(struct kvm_vcpu *vcpu);
extern kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa,
bool writing, bool *writable);
bool writing, bool *writable, struct page **page);
extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
unsigned long *rmap, long pte_index, int realmode);
extern void kvmppc_update_dirty_map(const struct kvm_memory_slot *memslot,


@@ -422,7 +422,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
EXPORT_SYMBOL_GPL(kvmppc_core_prepare_to_enter);
kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
bool *writable)
bool *writable, struct page **page)
{
ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM;
gfn_t gfn = gpa >> PAGE_SHIFT;
@@ -437,13 +437,14 @@ kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
kvm_pfn_t pfn;
pfn = (kvm_pfn_t)virt_to_phys((void*)shared_page) >> PAGE_SHIFT;
get_page(pfn_to_page(pfn));
*page = pfn_to_page(pfn);
get_page(*page);
if (writable)
*writable = true;
return pfn;
}
return gfn_to_pfn_prot(vcpu->kvm, gfn, writing, writable);
return kvm_faultin_pfn(vcpu, gfn, writing, writable, page);
}
EXPORT_SYMBOL_GPL(kvmppc_gpa_to_pfn);


@@ -130,6 +130,7 @@ extern char etext[];
int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
bool iswrite)
{
struct page *page;
kvm_pfn_t hpaddr;
u64 vpn;
u64 vsid;
@@ -145,7 +146,7 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
bool writable;
/* Get host physical address for gpa */
hpaddr = kvmppc_gpa_to_pfn(vcpu, orig_pte->raddr, iswrite, &writable);
hpaddr = kvmppc_gpa_to_pfn(vcpu, orig_pte->raddr, iswrite, &writable, &page);
if (is_error_noslot_pfn(hpaddr)) {
printk(KERN_INFO "Couldn't get guest page for gpa %lx!\n",
orig_pte->raddr);
@@ -232,7 +233,7 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
pte = kvmppc_mmu_hpte_cache_next(vcpu);
if (!pte) {
kvm_release_pfn_clean(hpaddr >> PAGE_SHIFT);
kvm_release_page_unused(page);
r = -EAGAIN;
goto out;
}
@@ -250,7 +251,7 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
kvmppc_mmu_hpte_cache_map(vcpu, pte);
kvm_release_pfn_clean(hpaddr >> PAGE_SHIFT);
kvm_release_page_clean(page);
out:
return r;
}


@@ -88,13 +88,14 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
struct hpte_cache *cpte;
unsigned long gfn = orig_pte->raddr >> PAGE_SHIFT;
unsigned long pfn;
struct page *page;
/* used to check for invalidations in progress */
mmu_seq = kvm->mmu_invalidate_seq;
smp_rmb();
/* Get host physical address for gpa */
pfn = kvmppc_gpa_to_pfn(vcpu, orig_pte->raddr, iswrite, &writable);
pfn = kvmppc_gpa_to_pfn(vcpu, orig_pte->raddr, iswrite, &writable, &page);
if (is_error_noslot_pfn(pfn)) {
printk(KERN_INFO "Couldn't get guest page for gpa %lx!\n",
orig_pte->raddr);
@@ -121,13 +122,10 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
vpn = hpt_vpn(orig_pte->eaddr, map->host_vsid, MMU_SEGSIZE_256M);
kvm_set_pfn_accessed(pfn);
if (!orig_pte->may_write || !writable)
rflags |= PP_RXRX;
else {
else
mark_page_dirty(vcpu->kvm, gfn);
kvm_set_pfn_dirty(pfn);
}
if (!orig_pte->may_execute)
rflags |= HPTE_R_N;
@@ -202,8 +200,10 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
}
out_unlock:
/* FIXME: Don't unconditionally pass unused=false. */
kvm_release_faultin_page(kvm, page, false,
orig_pte->may_write && writable);
spin_unlock(&kvm->mmu_lock);
kvm_release_pfn_clean(pfn);
if (cpte)
kvmppc_mmu_hpte_cache_free(cpte);


@@ -603,27 +603,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
write_ok = writing;
hva = gfn_to_hva_memslot(memslot, gfn);
/*
* Do a fast check first, since __gfn_to_pfn_memslot doesn't
* do it with !atomic && !async, which is how we call it.
* We always ask for write permission since the common case
* is that the page is writable.
*/
if (get_user_page_fast_only(hva, FOLL_WRITE, &page)) {
write_ok = true;
} else {
/* Call KVM generic code to do the slow-path check */
pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
writing, &write_ok, NULL);
if (is_error_noslot_pfn(pfn))
return -EFAULT;
page = NULL;
if (pfn_valid(pfn)) {
page = pfn_to_page(pfn);
if (PageReserved(page))
page = NULL;
}
}
pfn = __kvm_faultin_pfn(memslot, gfn, writing ? FOLL_WRITE : 0,
&write_ok, &page);
if (is_error_noslot_pfn(pfn))
return -EFAULT;
/*
* Read the PTE from the process' radix tree and use that


@@ -821,7 +821,7 @@ bool kvmppc_hv_handle_set_rc(struct kvm *kvm, bool nested, bool writing,
int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
unsigned long gpa,
struct kvm_memory_slot *memslot,
bool writing, bool kvm_ro,
bool writing,
pte_t *inserted_pte, unsigned int *levelp)
{
struct kvm *kvm = vcpu->kvm;
@@ -829,40 +829,21 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
unsigned long mmu_seq;
unsigned long hva, gfn = gpa >> PAGE_SHIFT;
bool upgrade_write = false;
bool *upgrade_p = &upgrade_write;
pte_t pte, *ptep;
unsigned int shift, level;
int ret;
bool large_enable;
kvm_pfn_t pfn;
/* used to check for invalidations in progress */
mmu_seq = kvm->mmu_invalidate_seq;
smp_rmb();
/*
* Do a fast check first, since __gfn_to_pfn_memslot doesn't
* do it with !atomic && !async, which is how we call it.
* We always ask for write permission since the common case
* is that the page is writable.
*/
hva = gfn_to_hva_memslot(memslot, gfn);
if (!kvm_ro && get_user_page_fast_only(hva, FOLL_WRITE, &page)) {
upgrade_write = true;
} else {
unsigned long pfn;
/* Call KVM generic code to do the slow-path check */
pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
writing, upgrade_p, NULL);
if (is_error_noslot_pfn(pfn))
return -EFAULT;
page = NULL;
if (pfn_valid(pfn)) {
page = pfn_to_page(pfn);
if (PageReserved(page))
page = NULL;
}
}
pfn = __kvm_faultin_pfn(memslot, gfn, writing ? FOLL_WRITE : 0,
&upgrade_write, &page);
if (is_error_noslot_pfn(pfn))
return -EFAULT;
/*
* Read the PTE from the process' radix tree and use that
@@ -950,7 +931,6 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
struct kvm_memory_slot *memslot;
long ret;
bool writing = !!(dsisr & DSISR_ISSTORE);
bool kvm_ro = false;
/* Check for unusual errors */
if (dsisr & DSISR_UNSUPP_MMU) {
@@ -1003,7 +983,6 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
ea, DSISR_ISSTORE | DSISR_PROTFAULT);
return RESUME_GUEST;
}
kvm_ro = true;
}
/* Failed to set the reference/change bits */
@@ -1021,7 +1000,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
/* Try to insert a pte */
ret = kvmppc_book3s_instantiate_page(vcpu, gpa, memslot, writing,
kvm_ro, NULL, NULL);
NULL, NULL);
if (ret == 0 || ret == -EAGAIN)
ret = RESUME_GUEST;


@@ -1535,7 +1535,6 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
unsigned long n_gpa, gpa, gfn, perm = 0UL;
unsigned int shift, l1_shift, level;
bool writing = !!(dsisr & DSISR_ISSTORE);
bool kvm_ro = false;
long int ret;
if (!gp->l1_gr_to_hr) {
@@ -1615,7 +1614,6 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
ea, DSISR_ISSTORE | DSISR_PROTFAULT);
return RESUME_GUEST;
}
kvm_ro = true;
}
/* 2. Find the host pte for this L1 guest real address */
@@ -1637,7 +1635,7 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
if (!pte_present(pte) || (writing && !(pte_val(pte) & _PAGE_WRITE))) {
/* No suitable pte found -> try to insert a mapping */
ret = kvmppc_book3s_instantiate_page(vcpu, gpa, memslot,
writing, kvm_ro, &pte, &level);
writing, &pte, &level);
if (ret == -EAGAIN)
return RESUME_GUEST;
else if (ret)


@@ -879,9 +879,8 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
{
int ret = H_PARAMETER;
struct page *uvmem_page;
struct page *page, *uvmem_page;
struct kvmppc_uvmem_page_pvt *pvt;
unsigned long pfn;
unsigned long gfn = gpa >> page_shift;
int srcu_idx;
unsigned long uvmem_pfn;
@@ -901,8 +900,8 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
retry:
mutex_unlock(&kvm->arch.uvmem_lock);
pfn = gfn_to_pfn(kvm, gfn);
if (is_error_noslot_pfn(pfn))
page = gfn_to_page(kvm, gfn);
if (!page)
goto out;
mutex_lock(&kvm->arch.uvmem_lock);
@@ -911,16 +910,16 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
pvt = uvmem_page->zone_device_data;
pvt->skip_page_out = true;
pvt->remove_gfn = false; /* it continues to be a valid GFN */
kvm_release_pfn_clean(pfn);
kvm_release_page_unused(page);
goto retry;
}
if (!uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0,
if (!uv_page_in(kvm->arch.lpid, page_to_pfn(page) << page_shift, gpa, 0,
page_shift)) {
kvmppc_gfn_shared(gfn, kvm);
ret = H_SUCCESS;
}
kvm_release_pfn_clean(pfn);
kvm_release_page_clean(page);
mutex_unlock(&kvm->arch.uvmem_lock);
out:
srcu_read_unlock(&kvm->srcu, srcu_idx);
@@ -1083,21 +1082,21 @@ kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gpa,
int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
{
unsigned long pfn;
struct page *page;
int ret = U_SUCCESS;
pfn = gfn_to_pfn(kvm, gfn);
if (is_error_noslot_pfn(pfn))
page = gfn_to_page(kvm, gfn);
if (!page)
return -EFAULT;
mutex_lock(&kvm->arch.uvmem_lock);
if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL))
goto out;
ret = uv_page_in(kvm->arch.lpid, pfn << PAGE_SHIFT, gfn << PAGE_SHIFT,
0, PAGE_SHIFT);
ret = uv_page_in(kvm->arch.lpid, page_to_pfn(page) << PAGE_SHIFT,
gfn << PAGE_SHIFT, 0, PAGE_SHIFT);
out:
kvm_release_pfn_clean(pfn);
kvm_release_page_clean(page);
mutex_unlock(&kvm->arch.uvmem_lock);
return (ret == U_SUCCESS) ? RESUME_GUEST : -EFAULT;
}


@@ -639,29 +639,27 @@ static void kvmppc_set_pvr_pr(struct kvm_vcpu *vcpu, u32 pvr)
*/
static void kvmppc_patch_dcbz(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
{
struct page *hpage;
struct kvm_host_map map;
u64 hpage_offset;
u32 *page;
int i;
int i, r;
hpage = gfn_to_page(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
if (is_error_page(hpage))
r = kvm_vcpu_map(vcpu, pte->raddr >> PAGE_SHIFT, &map);
if (r)
return;
hpage_offset = pte->raddr & ~PAGE_MASK;
hpage_offset &= ~0xFFFULL;
hpage_offset /= 4;
get_page(hpage);
page = kmap_atomic(hpage);
page = map.hva;
/* patch dcbz into reserved instruction, so we trap */
for (i=hpage_offset; i < hpage_offset + (HW_PAGE_SIZE / 4); i++)
if ((be32_to_cpu(page[i]) & 0xff0007ff) == INS_DCBZ)
page[i] &= cpu_to_be32(0xfffffff7);
kunmap_atomic(page);
put_page(hpage);
kvm_vcpu_unmap(vcpu, &map);
}
static bool kvmppc_visible_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)


@@ -654,7 +654,7 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
}
page = gfn_to_page(kvm, gfn);
if (is_error_page(page)) {
if (!page) {
srcu_read_unlock(&kvm->srcu, srcu_idx);
pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr);
return -EINVAL;


@@ -242,7 +242,7 @@ static inline int tlbe_is_writable(struct kvm_book3e_206_tlb_entry *tlbe)
return tlbe->mas7_3 & (MAS3_SW|MAS3_UW);
}
static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref,
static inline bool kvmppc_e500_ref_setup(struct tlbe_ref *ref,
struct kvm_book3e_206_tlb_entry *gtlbe,
kvm_pfn_t pfn, unsigned int wimg)
{
@@ -252,11 +252,7 @@ static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref,
/* Use guest supplied MAS2_G and MAS2_E */
ref->flags |= (gtlbe->mas2 & MAS2_ATTRIB_MASK) | wimg;
/* Mark the page accessed */
kvm_set_pfn_accessed(pfn);
if (tlbe_is_writable(gtlbe))
kvm_set_pfn_dirty(pfn);
return tlbe_is_writable(gtlbe);
}
static inline void kvmppc_e500_ref_release(struct tlbe_ref *ref)
@@ -326,6 +322,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
{
struct kvm_memory_slot *slot;
unsigned long pfn = 0; /* silence GCC warning */
struct page *page = NULL;
unsigned long hva;
int pfnmap = 0;
int tsize = BOOK3E_PAGESZ_4K;
@@ -337,6 +334,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
unsigned int wimg = 0;
pgd_t *pgdir;
unsigned long flags;
bool writable = false;
/* used to check for invalidations in progress */
mmu_seq = kvm->mmu_invalidate_seq;
@@ -446,7 +444,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
if (likely(!pfnmap)) {
tsize_pages = 1UL << (tsize + 10 - PAGE_SHIFT);
pfn = gfn_to_pfn_memslot(slot, gfn);
pfn = __kvm_faultin_pfn(slot, gfn, FOLL_WRITE, NULL, &page);
if (is_error_noslot_pfn(pfn)) {
if (printk_ratelimit())
pr_err("%s: real page not found for gfn %lx\n",
@@ -490,7 +488,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
goto out;
}
}
kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
ref, gvaddr, stlbe);
@@ -499,11 +497,8 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
kvmppc_mmu_flush_icache(pfn);
out:
kvm_release_faultin_page(kvm, page, !!ret, writable);
spin_unlock(&kvm->mmu_lock);
/* Drop refcount on page, so that mmu notifiers can clear it */
kvm_release_pfn_clean(pfn);
return ret;
}


@@ -612,9 +612,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = 8 | 4 | 2 | 1;
}
break;
case KVM_CAP_PPC_RMA:
r = 0;
break;
case KVM_CAP_PPC_HWRNG:
r = kvmppc_hwrng_present();
break;


@@ -286,6 +286,16 @@ struct kvm_vcpu_arch {
} sta;
};
/*
* Returns true if a Performance Monitoring Interrupt (PMI), a.k.a. perf event,
* arrived in guest context. For riscv, any event that arrives while a vCPU is
* loaded is considered to be "in guest".
*/
static inline bool kvm_arch_pmi_in_guest(struct kvm_vcpu *vcpu)
{
return IS_ENABLED(CONFIG_GUEST_PERF_EVENTS) && !!vcpu;
}
static inline void kvm_arch_sync_events(struct kvm *kvm) {}
#define KVM_RISCV_GSTAGE_TLB_MIN_ORDER 12


@@ -0,0 +1,245 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright (c) 2024 Ventana Micro Systems Inc.
*/
#ifndef __KVM_NACL_H
#define __KVM_NACL_H
#include <linux/jump_label.h>
#include <linux/percpu.h>
#include <asm/byteorder.h>
#include <asm/csr.h>
#include <asm/sbi.h>
struct kvm_vcpu_arch;
DECLARE_STATIC_KEY_FALSE(kvm_riscv_nacl_available);
#define kvm_riscv_nacl_available() \
static_branch_unlikely(&kvm_riscv_nacl_available)
DECLARE_STATIC_KEY_FALSE(kvm_riscv_nacl_sync_csr_available);
#define kvm_riscv_nacl_sync_csr_available() \
static_branch_unlikely(&kvm_riscv_nacl_sync_csr_available)
DECLARE_STATIC_KEY_FALSE(kvm_riscv_nacl_sync_hfence_available);
#define kvm_riscv_nacl_sync_hfence_available() \
static_branch_unlikely(&kvm_riscv_nacl_sync_hfence_available)
DECLARE_STATIC_KEY_FALSE(kvm_riscv_nacl_sync_sret_available);
#define kvm_riscv_nacl_sync_sret_available() \
static_branch_unlikely(&kvm_riscv_nacl_sync_sret_available)
DECLARE_STATIC_KEY_FALSE(kvm_riscv_nacl_autoswap_csr_available);
#define kvm_riscv_nacl_autoswap_csr_available() \
static_branch_unlikely(&kvm_riscv_nacl_autoswap_csr_available)
struct kvm_riscv_nacl {
void *shmem;
phys_addr_t shmem_phys;
};
DECLARE_PER_CPU(struct kvm_riscv_nacl, kvm_riscv_nacl);
void __kvm_riscv_nacl_hfence(void *shmem,
unsigned long control,
unsigned long page_num,
unsigned long page_count);
void __kvm_riscv_nacl_switch_to(struct kvm_vcpu_arch *vcpu_arch,
unsigned long sbi_ext_id,
unsigned long sbi_func_id);
int kvm_riscv_nacl_enable(void);
void kvm_riscv_nacl_disable(void);
void kvm_riscv_nacl_exit(void);
int kvm_riscv_nacl_init(void);
#ifdef CONFIG_32BIT
#define lelong_to_cpu(__x) le32_to_cpu(__x)
#define cpu_to_lelong(__x) cpu_to_le32(__x)
#else
#define lelong_to_cpu(__x) le64_to_cpu(__x)
#define cpu_to_lelong(__x) cpu_to_le64(__x)
#endif
#define nacl_shmem() \
this_cpu_ptr(&kvm_riscv_nacl)->shmem
#define nacl_scratch_read_long(__shmem, __offset) \
({ \
unsigned long *__p = (__shmem) + \
SBI_NACL_SHMEM_SCRATCH_OFFSET + \
(__offset); \
lelong_to_cpu(*__p); \
})
#define nacl_scratch_write_long(__shmem, __offset, __val) \
do { \
unsigned long *__p = (__shmem) + \
SBI_NACL_SHMEM_SCRATCH_OFFSET + \
(__offset); \
*__p = cpu_to_lelong(__val); \
} while (0)
#define nacl_scratch_write_longs(__shmem, __offset, __array, __count) \
do { \
unsigned int __i; \
unsigned long *__p = (__shmem) + \
SBI_NACL_SHMEM_SCRATCH_OFFSET + \
(__offset); \
for (__i = 0; __i < (__count); __i++) \
__p[__i] = cpu_to_lelong((__array)[__i]); \
} while (0)
#define nacl_sync_hfence(__e) \
sbi_ecall(SBI_EXT_NACL, SBI_EXT_NACL_SYNC_HFENCE, \
(__e), 0, 0, 0, 0, 0)
#define nacl_hfence_mkconfig(__type, __order, __vmid, __asid) \
({ \
unsigned long __c = SBI_NACL_SHMEM_HFENCE_CONFIG_PEND; \
__c |= ((__type) & SBI_NACL_SHMEM_HFENCE_CONFIG_TYPE_MASK) \
<< SBI_NACL_SHMEM_HFENCE_CONFIG_TYPE_SHIFT; \
__c |= (((__order) - SBI_NACL_SHMEM_HFENCE_ORDER_BASE) & \
SBI_NACL_SHMEM_HFENCE_CONFIG_ORDER_MASK) \
<< SBI_NACL_SHMEM_HFENCE_CONFIG_ORDER_SHIFT; \
__c |= ((__vmid) & SBI_NACL_SHMEM_HFENCE_CONFIG_VMID_MASK) \
<< SBI_NACL_SHMEM_HFENCE_CONFIG_VMID_SHIFT; \
__c |= ((__asid) & SBI_NACL_SHMEM_HFENCE_CONFIG_ASID_MASK); \
__c; \
})
#define nacl_hfence_mkpnum(__order, __addr) \
((__addr) >> (__order))
#define nacl_hfence_mkpcount(__order, __size) \
((__size) >> (__order))
#define nacl_hfence_gvma(__shmem, __gpa, __gpsz, __order) \
__kvm_riscv_nacl_hfence(__shmem, \
nacl_hfence_mkconfig(SBI_NACL_SHMEM_HFENCE_TYPE_GVMA, \
__order, 0, 0), \
nacl_hfence_mkpnum(__order, __gpa), \
nacl_hfence_mkpcount(__order, __gpsz))
#define nacl_hfence_gvma_all(__shmem) \
__kvm_riscv_nacl_hfence(__shmem, \
nacl_hfence_mkconfig(SBI_NACL_SHMEM_HFENCE_TYPE_GVMA_ALL, \
0, 0, 0), 0, 0)
#define nacl_hfence_gvma_vmid(__shmem, __vmid, __gpa, __gpsz, __order) \
__kvm_riscv_nacl_hfence(__shmem, \
nacl_hfence_mkconfig(SBI_NACL_SHMEM_HFENCE_TYPE_GVMA_VMID, \
__order, __vmid, 0), \
nacl_hfence_mkpnum(__order, __gpa), \
nacl_hfence_mkpcount(__order, __gpsz))
#define nacl_hfence_gvma_vmid_all(__shmem, __vmid) \
__kvm_riscv_nacl_hfence(__shmem, \
nacl_hfence_mkconfig(SBI_NACL_SHMEM_HFENCE_TYPE_GVMA_VMID_ALL, \
0, __vmid, 0), 0, 0)
#define nacl_hfence_vvma(__shmem, __vmid, __gva, __gvsz, __order) \
__kvm_riscv_nacl_hfence(__shmem, \
nacl_hfence_mkconfig(SBI_NACL_SHMEM_HFENCE_TYPE_VVMA, \
__order, __vmid, 0), \
nacl_hfence_mkpnum(__order, __gva), \
nacl_hfence_mkpcount(__order, __gvsz))
#define nacl_hfence_vvma_all(__shmem, __vmid) \
__kvm_riscv_nacl_hfence(__shmem, \
nacl_hfence_mkconfig(SBI_NACL_SHMEM_HFENCE_TYPE_VVMA_ALL, \
0, __vmid, 0), 0, 0)
#define nacl_hfence_vvma_asid(__shmem, __vmid, __asid, __gva, __gvsz, __order)\
__kvm_riscv_nacl_hfence(__shmem, \
nacl_hfence_mkconfig(SBI_NACL_SHMEM_HFENCE_TYPE_VVMA_ASID, \
__order, __vmid, __asid), \
nacl_hfence_mkpnum(__order, __gva), \
nacl_hfence_mkpcount(__order, __gvsz))
#define nacl_hfence_vvma_asid_all(__shmem, __vmid, __asid) \
__kvm_riscv_nacl_hfence(__shmem, \
nacl_hfence_mkconfig(SBI_NACL_SHMEM_HFENCE_TYPE_VVMA_ASID_ALL, \
0, __vmid, __asid), 0, 0)
#define nacl_csr_read(__shmem, __csr) \
({ \
unsigned long *__a = (__shmem) + SBI_NACL_SHMEM_CSR_OFFSET; \
lelong_to_cpu(__a[SBI_NACL_SHMEM_CSR_INDEX(__csr)]); \
})
#define nacl_csr_write(__shmem, __csr, __val) \
do { \
void *__s = (__shmem); \
unsigned int __i = SBI_NACL_SHMEM_CSR_INDEX(__csr); \
unsigned long *__a = (__s) + SBI_NACL_SHMEM_CSR_OFFSET; \
u8 *__b = (__s) + SBI_NACL_SHMEM_DBITMAP_OFFSET; \
__a[__i] = cpu_to_lelong(__val); \
__b[__i >> 3] |= 1U << (__i & 0x7); \
} while (0)
#define nacl_csr_swap(__shmem, __csr, __val) \
({ \
void *__s = (__shmem); \
unsigned int __i = SBI_NACL_SHMEM_CSR_INDEX(__csr); \
unsigned long *__a = (__s) + SBI_NACL_SHMEM_CSR_OFFSET; \
u8 *__b = (__s) + SBI_NACL_SHMEM_DBITMAP_OFFSET; \
unsigned long __r = lelong_to_cpu(__a[__i]); \
__a[__i] = cpu_to_lelong(__val); \
__b[__i >> 3] |= 1U << (__i & 0x7); \
__r; \
})
#define nacl_sync_csr(__csr) \
sbi_ecall(SBI_EXT_NACL, SBI_EXT_NACL_SYNC_CSR, \
(__csr), 0, 0, 0, 0, 0)
/*
* Each ncsr_xyz() macro defined below has its own static branch, so every
* use of an ncsr_xyz() macro emits a patchable direct jump. This means that
* multiple back-to-back ncsr_xyz() uses emit multiple patchable direct
* jumps, which is sub-optimal.
*
* Based on the above, it is recommended to avoid multiple back-to-back
* ncsr_xyz() uses.
*/
#define ncsr_read(__csr) \
({ \
unsigned long __r; \
if (kvm_riscv_nacl_available()) \
__r = nacl_csr_read(nacl_shmem(), __csr); \
else \
__r = csr_read(__csr); \
__r; \
})
#define ncsr_write(__csr, __val) \
do { \
if (kvm_riscv_nacl_sync_csr_available()) \
nacl_csr_write(nacl_shmem(), __csr, __val); \
else \
csr_write(__csr, __val); \
} while (0)
#define ncsr_swap(__csr, __val) \
({ \
unsigned long __r; \
if (kvm_riscv_nacl_sync_csr_available()) \
__r = nacl_csr_swap(nacl_shmem(), __csr, __val); \
else \
__r = csr_swap(__csr, __val); \
__r; \
})
#define nsync_csr(__csr) \
do { \
if (kvm_riscv_nacl_sync_csr_available()) \
nacl_sync_csr(__csr); \
} while (0)
#endif
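
The comment above the ncsr_*() macros notes that each use carries its own static branch, so back-to-back ncsr_xyz() calls emit several patchable jumps. When a few CSRs are accessed together, the availability check can be hoisted and the shared-memory accessors used directly; a sketch under the assumption that the caller runs on the CPU whose NACL shared memory is current:

#include <asm/csr.h>
/* plus the kvm_nacl.h definitions above */

/* Illustrative only: read two CSRs behind a single availability check
 * instead of two independent ncsr_read() uses. */
static void read_hstatus_hedeleg(unsigned long *hstatus, unsigned long *hedeleg)
{
	if (kvm_riscv_nacl_available()) {
		void *shmem = nacl_shmem();

		*hstatus = nacl_csr_read(shmem, CSR_HSTATUS);
		*hedeleg = nacl_csr_read(shmem, CSR_HEDELEG);
	} else {
		*hstatus = csr_read(CSR_HSTATUS);
		*hedeleg = csr_read(CSR_HEDELEG);
	}
}
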


@@ -8,6 +8,7 @@
#ifndef _ASM_RISCV_PERF_EVENT_H
#define _ASM_RISCV_PERF_EVENT_H
#ifdef CONFIG_PERF_EVENTS
#include <linux/perf_event.h>
#define perf_arch_bpf_user_pt_regs(regs) (struct user_regs_struct *)regs
@@ -17,4 +18,6 @@
(regs)->sp = current_stack_pointer; \
(regs)->status = SR_PP; \
}
#endif
#endif /* _ASM_RISCV_PERF_EVENT_H */


@@ -34,6 +34,7 @@ enum sbi_ext_id {
SBI_EXT_PMU = 0x504D55,
SBI_EXT_DBCN = 0x4442434E,
SBI_EXT_STA = 0x535441,
SBI_EXT_NACL = 0x4E41434C,
/* Experimentals extensions must lie within this range */
SBI_EXT_EXPERIMENTAL_START = 0x08000000,
@@ -281,6 +282,125 @@ struct sbi_sta_struct {
#define SBI_SHMEM_DISABLE -1
enum sbi_ext_nacl_fid {
SBI_EXT_NACL_PROBE_FEATURE = 0x0,
SBI_EXT_NACL_SET_SHMEM = 0x1,
SBI_EXT_NACL_SYNC_CSR = 0x2,
SBI_EXT_NACL_SYNC_HFENCE = 0x3,
SBI_EXT_NACL_SYNC_SRET = 0x4,
};
enum sbi_ext_nacl_feature {
SBI_NACL_FEAT_SYNC_CSR = 0x0,
SBI_NACL_FEAT_SYNC_HFENCE = 0x1,
SBI_NACL_FEAT_SYNC_SRET = 0x2,
SBI_NACL_FEAT_AUTOSWAP_CSR = 0x3,
};
#define SBI_NACL_SHMEM_ADDR_SHIFT 12
#define SBI_NACL_SHMEM_SCRATCH_OFFSET 0x0000
#define SBI_NACL_SHMEM_SCRATCH_SIZE 0x1000
#define SBI_NACL_SHMEM_SRET_OFFSET 0x0000
#define SBI_NACL_SHMEM_SRET_SIZE 0x0200
#define SBI_NACL_SHMEM_AUTOSWAP_OFFSET (SBI_NACL_SHMEM_SRET_OFFSET + \
SBI_NACL_SHMEM_SRET_SIZE)
#define SBI_NACL_SHMEM_AUTOSWAP_SIZE 0x0080
#define SBI_NACL_SHMEM_UNUSED_OFFSET (SBI_NACL_SHMEM_AUTOSWAP_OFFSET + \
SBI_NACL_SHMEM_AUTOSWAP_SIZE)
#define SBI_NACL_SHMEM_UNUSED_SIZE 0x0580
#define SBI_NACL_SHMEM_HFENCE_OFFSET (SBI_NACL_SHMEM_UNUSED_OFFSET + \
SBI_NACL_SHMEM_UNUSED_SIZE)
#define SBI_NACL_SHMEM_HFENCE_SIZE 0x0780
#define SBI_NACL_SHMEM_DBITMAP_OFFSET (SBI_NACL_SHMEM_HFENCE_OFFSET + \
SBI_NACL_SHMEM_HFENCE_SIZE)
#define SBI_NACL_SHMEM_DBITMAP_SIZE 0x0080
#define SBI_NACL_SHMEM_CSR_OFFSET (SBI_NACL_SHMEM_DBITMAP_OFFSET + \
SBI_NACL_SHMEM_DBITMAP_SIZE)
#define SBI_NACL_SHMEM_CSR_SIZE ((__riscv_xlen / 8) * 1024)
#define SBI_NACL_SHMEM_SIZE (SBI_NACL_SHMEM_CSR_OFFSET + \
SBI_NACL_SHMEM_CSR_SIZE)
#define SBI_NACL_SHMEM_CSR_INDEX(__csr_num) \
((((__csr_num) & 0xc00) >> 2) | ((__csr_num) & 0xff))
#define SBI_NACL_SHMEM_HFENCE_ENTRY_SZ ((__riscv_xlen / 8) * 4)
#define SBI_NACL_SHMEM_HFENCE_ENTRY_MAX \
(SBI_NACL_SHMEM_HFENCE_SIZE / \
SBI_NACL_SHMEM_HFENCE_ENTRY_SZ)
#define SBI_NACL_SHMEM_HFENCE_ENTRY(__num) \
(SBI_NACL_SHMEM_HFENCE_OFFSET + \
(__num) * SBI_NACL_SHMEM_HFENCE_ENTRY_SZ)
#define SBI_NACL_SHMEM_HFENCE_ENTRY_CONFIG(__num) \
SBI_NACL_SHMEM_HFENCE_ENTRY(__num)
#define SBI_NACL_SHMEM_HFENCE_ENTRY_PNUM(__num)\
(SBI_NACL_SHMEM_HFENCE_ENTRY(__num) + (__riscv_xlen / 8))
#define SBI_NACL_SHMEM_HFENCE_ENTRY_PCOUNT(__num)\
(SBI_NACL_SHMEM_HFENCE_ENTRY(__num) + \
((__riscv_xlen / 8) * 3))
#define SBI_NACL_SHMEM_HFENCE_CONFIG_PEND_BITS 1
#define SBI_NACL_SHMEM_HFENCE_CONFIG_PEND_SHIFT \
(__riscv_xlen - SBI_NACL_SHMEM_HFENCE_CONFIG_PEND_BITS)
#define SBI_NACL_SHMEM_HFENCE_CONFIG_PEND_MASK \
((1UL << SBI_NACL_SHMEM_HFENCE_CONFIG_PEND_BITS) - 1)
#define SBI_NACL_SHMEM_HFENCE_CONFIG_PEND \
(SBI_NACL_SHMEM_HFENCE_CONFIG_PEND_MASK << \
SBI_NACL_SHMEM_HFENCE_CONFIG_PEND_SHIFT)
#define SBI_NACL_SHMEM_HFENCE_CONFIG_RSVD1_BITS 3
#define SBI_NACL_SHMEM_HFENCE_CONFIG_RSVD1_SHIFT \
(SBI_NACL_SHMEM_HFENCE_CONFIG_PEND_SHIFT - \
SBI_NACL_SHMEM_HFENCE_CONFIG_RSVD1_BITS)
#define SBI_NACL_SHMEM_HFENCE_CONFIG_TYPE_BITS 4
#define SBI_NACL_SHMEM_HFENCE_CONFIG_TYPE_SHIFT \
(SBI_NACL_SHMEM_HFENCE_CONFIG_RSVD1_SHIFT - \
SBI_NACL_SHMEM_HFENCE_CONFIG_TYPE_BITS)
#define SBI_NACL_SHMEM_HFENCE_CONFIG_TYPE_MASK \
((1UL << SBI_NACL_SHMEM_HFENCE_CONFIG_TYPE_BITS) - 1)
#define SBI_NACL_SHMEM_HFENCE_TYPE_GVMA 0x0
#define SBI_NACL_SHMEM_HFENCE_TYPE_GVMA_ALL 0x1
#define SBI_NACL_SHMEM_HFENCE_TYPE_GVMA_VMID 0x2
#define SBI_NACL_SHMEM_HFENCE_TYPE_GVMA_VMID_ALL 0x3
#define SBI_NACL_SHMEM_HFENCE_TYPE_VVMA 0x4
#define SBI_NACL_SHMEM_HFENCE_TYPE_VVMA_ALL 0x5
#define SBI_NACL_SHMEM_HFENCE_TYPE_VVMA_ASID 0x6
#define SBI_NACL_SHMEM_HFENCE_TYPE_VVMA_ASID_ALL 0x7
#define SBI_NACL_SHMEM_HFENCE_CONFIG_RSVD2_BITS 1
#define SBI_NACL_SHMEM_HFENCE_CONFIG_RSVD2_SHIFT \
(SBI_NACL_SHMEM_HFENCE_CONFIG_TYPE_SHIFT - \
SBI_NACL_SHMEM_HFENCE_CONFIG_RSVD2_BITS)
#define SBI_NACL_SHMEM_HFENCE_CONFIG_ORDER_BITS 7
#define SBI_NACL_SHMEM_HFENCE_CONFIG_ORDER_SHIFT \
(SBI_NACL_SHMEM_HFENCE_CONFIG_RSVD2_SHIFT - \
SBI_NACL_SHMEM_HFENCE_CONFIG_ORDER_BITS)
#define SBI_NACL_SHMEM_HFENCE_CONFIG_ORDER_MASK \
((1UL << SBI_NACL_SHMEM_HFENCE_CONFIG_ORDER_BITS) - 1)
#define SBI_NACL_SHMEM_HFENCE_ORDER_BASE 12
#if __riscv_xlen == 32
#define SBI_NACL_SHMEM_HFENCE_CONFIG_ASID_BITS 9
#define SBI_NACL_SHMEM_HFENCE_CONFIG_VMID_BITS 7
#else
#define SBI_NACL_SHMEM_HFENCE_CONFIG_ASID_BITS 16
#define SBI_NACL_SHMEM_HFENCE_CONFIG_VMID_BITS 14
#endif
#define SBI_NACL_SHMEM_HFENCE_CONFIG_VMID_SHIFT \
SBI_NACL_SHMEM_HFENCE_CONFIG_ASID_BITS
#define SBI_NACL_SHMEM_HFENCE_CONFIG_ASID_MASK \
((1UL << SBI_NACL_SHMEM_HFENCE_CONFIG_ASID_BITS) - 1)
#define SBI_NACL_SHMEM_HFENCE_CONFIG_VMID_MASK \
((1UL << SBI_NACL_SHMEM_HFENCE_CONFIG_VMID_BITS) - 1)
#define SBI_NACL_SHMEM_AUTOSWAP_FLAG_HSTATUS BIT(0)
#define SBI_NACL_SHMEM_AUTOSWAP_HSTATUS ((__riscv_xlen / 8) * 1)
#define SBI_NACL_SHMEM_SRET_X(__i) ((__riscv_xlen / 8) * (__i))
#define SBI_NACL_SHMEM_SRET_X_LAST 31
/* SBI spec version fields */
#define SBI_SPEC_VERSION_DEFAULT 0x1
#define SBI_SPEC_VERSION_MAJOR_SHIFT 24
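For orientation, the SBI_NACL_SHMEM_* offset and size defines above pin down the shared-memory layout. The following standalone sketch (an editor's illustration with local variable names, not kernel code) recomputes the numbers for __riscv_xlen == 64: the CSR space starts at offset 0x1000, the whole area is 0x3000 bytes (12 KiB, which nacl.c below rounds up to an order-2, 16 KiB page allocation via get_order()), and the HFENCE region holds 0x780 / 32 = 60 entries.

/* Editor's sketch (standalone, not kernel code): recompute the RV64 NACL
 * shared-memory layout from the SBI_NACL_SHMEM_* sizes above. */
#include <stdio.h>

int main(void)
{
	const unsigned long xlen = 64;
	const unsigned long sret_sz     = 0x0200;
	const unsigned long autoswap_sz = 0x0080;
	const unsigned long unused_sz   = 0x0580;
	const unsigned long hfence_sz   = 0x0780;
	const unsigned long dbitmap_sz  = 0x0080;
	const unsigned long csr_off = sret_sz + autoswap_sz + unused_sz +
				      hfence_sz + dbitmap_sz;
	const unsigned long csr_sz  = (xlen / 8) * 1024;
	const unsigned long entry_sz = (xlen / 8) * 4;

	/* Prints: csr_off=0x1000 total=0x3000 hfence_entries=60 */
	printf("csr_off=0x%lx total=0x%lx hfence_entries=%lu\n",
	       csr_off, csr_off + csr_sz, hfence_sz / entry_sz);
	return 0;
}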


@ -28,11 +28,21 @@ static bool fill_callchain(void *entry, unsigned long pc)
void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
struct pt_regs *regs)
{
if (perf_guest_state()) {
/* TODO: We don't support guest os callchain now */
return;
}
arch_stack_walk_user(fill_callchain, entry, regs);
}
void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
struct pt_regs *regs)
{
if (perf_guest_state()) {
/* TODO: We don't support guest os callchain now */
return;
}
walk_stackframe(NULL, regs, fill_callchain, entry);
}


@ -32,6 +32,7 @@ config KVM
select KVM_XFER_TO_GUEST_WORK
select KVM_GENERIC_MMU_NOTIFIER
select SCHED_INFO
select GUEST_PERF_EVENTS if PERF_EVENTS
help
Support hosting virtualized guest machines.
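The new GUEST_PERF_EVENTS selection is what the rest of this series relies on for host-side profiling of guests: it lets the kvm_register_perf_callbacks() call added in arch/riscv/kvm/main.c and the CONFIG_GUEST_PERF_EVENTS-guarded kvm_arch_vcpu_get_ip() helper in vcpu.c build, and it pairs with the perf_guest_state() checks added to the RISC-V callchain code above.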


@ -9,27 +9,30 @@ include $(srctree)/virt/kvm/Makefile.kvm
obj-$(CONFIG_KVM) += kvm.o
# Ordered alphabetically
kvm-y += aia.o
kvm-y += aia_aplic.o
kvm-y += aia_device.o
kvm-y += aia_imsic.o
kvm-y += main.o
kvm-y += vm.o
kvm-y += vmid.o
kvm-y += tlb.o
kvm-y += mmu.o
kvm-y += nacl.o
kvm-y += tlb.o
kvm-y += vcpu.o
kvm-y += vcpu_exit.o
kvm-y += vcpu_fp.o
kvm-y += vcpu_vector.o
kvm-y += vcpu_insn.o
kvm-y += vcpu_onereg.o
kvm-y += vcpu_switch.o
kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o
kvm-y += vcpu_sbi.o
kvm-$(CONFIG_RISCV_SBI_V01) += vcpu_sbi_v01.o
kvm-y += vcpu_sbi_base.o
kvm-y += vcpu_sbi_replace.o
kvm-y += vcpu_sbi_hsm.o
kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_sbi_pmu.o
kvm-y += vcpu_sbi_replace.o
kvm-y += vcpu_sbi_sta.o
kvm-$(CONFIG_RISCV_SBI_V01) += vcpu_sbi_v01.o
kvm-y += vcpu_switch.o
kvm-y += vcpu_timer.o
kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o vcpu_sbi_pmu.o
kvm-y += aia.o
kvm-y += aia_device.o
kvm-y += aia_aplic.o
kvm-y += aia_imsic.o
kvm-y += vcpu_vector.o
kvm-y += vm.o
kvm-y += vmid.o


@ -16,6 +16,7 @@
#include <linux/percpu.h>
#include <linux/spinlock.h>
#include <asm/cpufeature.h>
#include <asm/kvm_nacl.h>
struct aia_hgei_control {
raw_spinlock_t lock;
@ -51,7 +52,7 @@ static int aia_find_hgei(struct kvm_vcpu *owner)
return hgei;
}
static void aia_set_hvictl(bool ext_irq_pending)
static inline unsigned long aia_hvictl_value(bool ext_irq_pending)
{
unsigned long hvictl;
@ -62,7 +63,7 @@ static void aia_set_hvictl(bool ext_irq_pending)
hvictl = (IRQ_S_EXT << HVICTL_IID_SHIFT) & HVICTL_IID;
hvictl |= ext_irq_pending;
csr_write(CSR_HVICTL, hvictl);
return hvictl;
}
#ifdef CONFIG_32BIT
@ -88,7 +89,7 @@ void kvm_riscv_vcpu_aia_sync_interrupts(struct kvm_vcpu *vcpu)
struct kvm_vcpu_aia_csr *csr = &vcpu->arch.aia_context.guest_csr;
if (kvm_riscv_aia_available())
csr->vsieh = csr_read(CSR_VSIEH);
csr->vsieh = ncsr_read(CSR_VSIEH);
}
#endif
@ -115,7 +116,7 @@ bool kvm_riscv_vcpu_aia_has_interrupts(struct kvm_vcpu *vcpu, u64 mask)
hgei = aia_find_hgei(vcpu);
if (hgei > 0)
return !!(csr_read(CSR_HGEIP) & BIT(hgei));
return !!(ncsr_read(CSR_HGEIP) & BIT(hgei));
return false;
}
@ -128,45 +129,73 @@ void kvm_riscv_vcpu_aia_update_hvip(struct kvm_vcpu *vcpu)
return;
#ifdef CONFIG_32BIT
csr_write(CSR_HVIPH, vcpu->arch.aia_context.guest_csr.hviph);
ncsr_write(CSR_HVIPH, vcpu->arch.aia_context.guest_csr.hviph);
#endif
aia_set_hvictl(!!(csr->hvip & BIT(IRQ_VS_EXT)));
ncsr_write(CSR_HVICTL, aia_hvictl_value(!!(csr->hvip & BIT(IRQ_VS_EXT))));
}
void kvm_riscv_vcpu_aia_load(struct kvm_vcpu *vcpu, int cpu)
{
struct kvm_vcpu_aia_csr *csr = &vcpu->arch.aia_context.guest_csr;
void *nsh;
if (!kvm_riscv_aia_available())
return;
csr_write(CSR_VSISELECT, csr->vsiselect);
csr_write(CSR_HVIPRIO1, csr->hviprio1);
csr_write(CSR_HVIPRIO2, csr->hviprio2);
if (kvm_riscv_nacl_sync_csr_available()) {
nsh = nacl_shmem();
nacl_csr_write(nsh, CSR_VSISELECT, csr->vsiselect);
nacl_csr_write(nsh, CSR_HVIPRIO1, csr->hviprio1);
nacl_csr_write(nsh, CSR_HVIPRIO2, csr->hviprio2);
#ifdef CONFIG_32BIT
csr_write(CSR_VSIEH, csr->vsieh);
csr_write(CSR_HVIPH, csr->hviph);
csr_write(CSR_HVIPRIO1H, csr->hviprio1h);
csr_write(CSR_HVIPRIO2H, csr->hviprio2h);
nacl_csr_write(nsh, CSR_VSIEH, csr->vsieh);
nacl_csr_write(nsh, CSR_HVIPH, csr->hviph);
nacl_csr_write(nsh, CSR_HVIPRIO1H, csr->hviprio1h);
nacl_csr_write(nsh, CSR_HVIPRIO2H, csr->hviprio2h);
#endif
} else {
csr_write(CSR_VSISELECT, csr->vsiselect);
csr_write(CSR_HVIPRIO1, csr->hviprio1);
csr_write(CSR_HVIPRIO2, csr->hviprio2);
#ifdef CONFIG_32BIT
csr_write(CSR_VSIEH, csr->vsieh);
csr_write(CSR_HVIPH, csr->hviph);
csr_write(CSR_HVIPRIO1H, csr->hviprio1h);
csr_write(CSR_HVIPRIO2H, csr->hviprio2h);
#endif
}
}
void kvm_riscv_vcpu_aia_put(struct kvm_vcpu *vcpu)
{
struct kvm_vcpu_aia_csr *csr = &vcpu->arch.aia_context.guest_csr;
void *nsh;
if (!kvm_riscv_aia_available())
return;
csr->vsiselect = csr_read(CSR_VSISELECT);
csr->hviprio1 = csr_read(CSR_HVIPRIO1);
csr->hviprio2 = csr_read(CSR_HVIPRIO2);
if (kvm_riscv_nacl_available()) {
nsh = nacl_shmem();
csr->vsiselect = nacl_csr_read(nsh, CSR_VSISELECT);
csr->hviprio1 = nacl_csr_read(nsh, CSR_HVIPRIO1);
csr->hviprio2 = nacl_csr_read(nsh, CSR_HVIPRIO2);
#ifdef CONFIG_32BIT
csr->vsieh = csr_read(CSR_VSIEH);
csr->hviph = csr_read(CSR_HVIPH);
csr->hviprio1h = csr_read(CSR_HVIPRIO1H);
csr->hviprio2h = csr_read(CSR_HVIPRIO2H);
csr->vsieh = nacl_csr_read(nsh, CSR_VSIEH);
csr->hviph = nacl_csr_read(nsh, CSR_HVIPH);
csr->hviprio1h = nacl_csr_read(nsh, CSR_HVIPRIO1H);
csr->hviprio2h = nacl_csr_read(nsh, CSR_HVIPRIO2H);
#endif
} else {
csr->vsiselect = csr_read(CSR_VSISELECT);
csr->hviprio1 = csr_read(CSR_HVIPRIO1);
csr->hviprio2 = csr_read(CSR_HVIPRIO2);
#ifdef CONFIG_32BIT
csr->vsieh = csr_read(CSR_VSIEH);
csr->hviph = csr_read(CSR_HVIPH);
csr->hviprio1h = csr_read(CSR_HVIPRIO1H);
csr->hviprio2h = csr_read(CSR_HVIPRIO2H);
#endif
}
}
int kvm_riscv_vcpu_aia_get_csr(struct kvm_vcpu *vcpu,
@ -250,20 +279,20 @@ static u8 aia_get_iprio8(struct kvm_vcpu *vcpu, unsigned int irq)
switch (bitpos / BITS_PER_LONG) {
case 0:
hviprio = csr_read(CSR_HVIPRIO1);
hviprio = ncsr_read(CSR_HVIPRIO1);
break;
case 1:
#ifndef CONFIG_32BIT
hviprio = csr_read(CSR_HVIPRIO2);
hviprio = ncsr_read(CSR_HVIPRIO2);
break;
#else
hviprio = csr_read(CSR_HVIPRIO1H);
hviprio = ncsr_read(CSR_HVIPRIO1H);
break;
case 2:
hviprio = csr_read(CSR_HVIPRIO2);
hviprio = ncsr_read(CSR_HVIPRIO2);
break;
case 3:
hviprio = csr_read(CSR_HVIPRIO2H);
hviprio = ncsr_read(CSR_HVIPRIO2H);
break;
#endif
default:
@ -283,20 +312,20 @@ static void aia_set_iprio8(struct kvm_vcpu *vcpu, unsigned int irq, u8 prio)
switch (bitpos / BITS_PER_LONG) {
case 0:
hviprio = csr_read(CSR_HVIPRIO1);
hviprio = ncsr_read(CSR_HVIPRIO1);
break;
case 1:
#ifndef CONFIG_32BIT
hviprio = csr_read(CSR_HVIPRIO2);
hviprio = ncsr_read(CSR_HVIPRIO2);
break;
#else
hviprio = csr_read(CSR_HVIPRIO1H);
hviprio = ncsr_read(CSR_HVIPRIO1H);
break;
case 2:
hviprio = csr_read(CSR_HVIPRIO2);
hviprio = ncsr_read(CSR_HVIPRIO2);
break;
case 3:
hviprio = csr_read(CSR_HVIPRIO2H);
hviprio = ncsr_read(CSR_HVIPRIO2H);
break;
#endif
default:
@ -308,20 +337,20 @@ static void aia_set_iprio8(struct kvm_vcpu *vcpu, unsigned int irq, u8 prio)
switch (bitpos / BITS_PER_LONG) {
case 0:
csr_write(CSR_HVIPRIO1, hviprio);
ncsr_write(CSR_HVIPRIO1, hviprio);
break;
case 1:
#ifndef CONFIG_32BIT
csr_write(CSR_HVIPRIO2, hviprio);
ncsr_write(CSR_HVIPRIO2, hviprio);
break;
#else
csr_write(CSR_HVIPRIO1H, hviprio);
ncsr_write(CSR_HVIPRIO1H, hviprio);
break;
case 2:
csr_write(CSR_HVIPRIO2, hviprio);
ncsr_write(CSR_HVIPRIO2, hviprio);
break;
case 3:
csr_write(CSR_HVIPRIO2H, hviprio);
ncsr_write(CSR_HVIPRIO2H, hviprio);
break;
#endif
default:
@ -377,7 +406,7 @@ int kvm_riscv_vcpu_aia_rmw_ireg(struct kvm_vcpu *vcpu, unsigned int csr_num,
return KVM_INSN_ILLEGAL_TRAP;
/* First try to emulate in kernel space */
isel = csr_read(CSR_VSISELECT) & ISELECT_MASK;
isel = ncsr_read(CSR_VSISELECT) & ISELECT_MASK;
if (isel >= ISELECT_IPRIO0 && isel <= ISELECT_IPRIO15)
return aia_rmw_iprio(vcpu, isel, val, new_val, wr_mask);
else if (isel >= IMSIC_FIRST && isel <= IMSIC_LAST &&
@ -499,6 +528,10 @@ static int aia_hgei_init(void)
hgctrl->free_bitmap = 0;
}
/* Skip SGEI interrupt setup for zero guest external interrupts */
if (!kvm_riscv_aia_nr_hgei)
goto skip_sgei_interrupt;
/* Find INTC irq domain */
domain = irq_find_matching_fwnode(riscv_get_intc_hwnode(),
DOMAIN_BUS_ANY);
@ -522,11 +555,16 @@ static int aia_hgei_init(void)
return rc;
}
skip_sgei_interrupt:
return 0;
}
static void aia_hgei_exit(void)
{
/* Do nothing for zero guest external interrupts */
if (!kvm_riscv_aia_nr_hgei)
return;
/* Free per-CPU SGEI interrupt */
free_percpu_irq(hgei_parent_irq, &aia_hgei);
}
@ -536,7 +574,7 @@ void kvm_riscv_aia_enable(void)
if (!kvm_riscv_aia_available())
return;
aia_set_hvictl(false);
csr_write(CSR_HVICTL, aia_hvictl_value(false));
csr_write(CSR_HVIPRIO1, 0x0);
csr_write(CSR_HVIPRIO2, 0x0);
#ifdef CONFIG_32BIT
@ -572,7 +610,7 @@ void kvm_riscv_aia_disable(void)
csr_clear(CSR_HIE, BIT(IRQ_S_GEXT));
disable_percpu_irq(hgei_parent_irq);
aia_set_hvictl(false);
csr_write(CSR_HVICTL, aia_hvictl_value(false));
raw_spin_lock_irqsave(&hgctrl->lock, flags);


@ -143,7 +143,7 @@ static void aplic_write_pending(struct aplic *aplic, u32 irq, bool pending)
if (sm == APLIC_SOURCECFG_SM_LEVEL_HIGH ||
sm == APLIC_SOURCECFG_SM_LEVEL_LOW) {
if (!pending)
goto skip_write_pending;
goto noskip_write_pending;
if ((irqd->state & APLIC_IRQ_STATE_INPUT) &&
sm == APLIC_SOURCECFG_SM_LEVEL_LOW)
goto skip_write_pending;
@ -152,6 +152,7 @@ static void aplic_write_pending(struct aplic *aplic, u32 irq, bool pending)
goto skip_write_pending;
}
noskip_write_pending:
if (pending)
irqd->state |= APLIC_IRQ_STATE_PENDING;
else


@ -10,8 +10,8 @@
#include <linux/err.h>
#include <linux/module.h>
#include <linux/kvm_host.h>
#include <asm/csr.h>
#include <asm/cpufeature.h>
#include <asm/kvm_nacl.h>
#include <asm/sbi.h>
long kvm_arch_dev_ioctl(struct file *filp,
@ -22,6 +22,12 @@ long kvm_arch_dev_ioctl(struct file *filp,
int kvm_arch_enable_virtualization_cpu(void)
{
int rc;
rc = kvm_riscv_nacl_enable();
if (rc)
return rc;
csr_write(CSR_HEDELEG, KVM_HEDELEG_DEFAULT);
csr_write(CSR_HIDELEG, KVM_HIDELEG_DEFAULT);
@ -49,11 +55,21 @@ void kvm_arch_disable_virtualization_cpu(void)
csr_write(CSR_HVIP, 0);
csr_write(CSR_HEDELEG, 0);
csr_write(CSR_HIDELEG, 0);
kvm_riscv_nacl_disable();
}
static void kvm_riscv_teardown(void)
{
kvm_riscv_aia_exit();
kvm_riscv_nacl_exit();
kvm_unregister_perf_callbacks();
}
static int __init riscv_kvm_init(void)
{
int rc;
char slist[64];
const char *str;
if (!riscv_isa_extension_available(NULL, h)) {
@ -71,16 +87,53 @@ static int __init riscv_kvm_init(void)
return -ENODEV;
}
rc = kvm_riscv_nacl_init();
if (rc && rc != -ENODEV)
return rc;
kvm_riscv_gstage_mode_detect();
kvm_riscv_gstage_vmid_detect();
rc = kvm_riscv_aia_init();
if (rc && rc != -ENODEV)
if (rc && rc != -ENODEV) {
kvm_riscv_nacl_exit();
return rc;
}
kvm_info("hypervisor extension available\n");
if (kvm_riscv_nacl_available()) {
rc = 0;
slist[0] = '\0';
if (kvm_riscv_nacl_sync_csr_available()) {
if (rc)
strcat(slist, ", ");
strcat(slist, "sync_csr");
rc++;
}
if (kvm_riscv_nacl_sync_hfence_available()) {
if (rc)
strcat(slist, ", ");
strcat(slist, "sync_hfence");
rc++;
}
if (kvm_riscv_nacl_sync_sret_available()) {
if (rc)
strcat(slist, ", ");
strcat(slist, "sync_sret");
rc++;
}
if (kvm_riscv_nacl_autoswap_csr_available()) {
if (rc)
strcat(slist, ", ");
strcat(slist, "autoswap_csr");
rc++;
}
kvm_info("using SBI nested acceleration with %s\n",
(rc) ? slist : "no features");
}
switch (kvm_riscv_gstage_mode()) {
case HGATP_MODE_SV32X4:
str = "Sv32x4";
@ -105,9 +158,11 @@ static int __init riscv_kvm_init(void)
kvm_info("AIA available with %d guest external interrupts\n",
kvm_riscv_aia_nr_hgei);
kvm_register_perf_callbacks(NULL);
rc = kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE);
if (rc) {
kvm_riscv_aia_exit();
kvm_riscv_teardown();
return rc;
}
@ -117,7 +172,7 @@ module_init(riscv_kvm_init);
static void __exit riscv_kvm_exit(void)
{
kvm_riscv_aia_exit();
kvm_riscv_teardown();
kvm_exit();
}


@ -15,7 +15,7 @@
#include <linux/vmalloc.h>
#include <linux/kvm_host.h>
#include <linux/sched/signal.h>
#include <asm/csr.h>
#include <asm/kvm_nacl.h>
#include <asm/page.h>
#include <asm/pgtable.h>
@ -601,6 +601,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
bool logging = (memslot->dirty_bitmap &&
!(memslot->flags & KVM_MEM_READONLY)) ? true : false;
unsigned long vma_pagesize, mmu_seq;
struct page *page;
/* We need minimum second+third level pages */
ret = kvm_mmu_topup_memory_cache(pcache, gstage_pgd_levels);
@ -631,7 +632,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
/*
* Read mmu_invalidate_seq so that KVM can detect if the results of
* vma_lookup() or gfn_to_pfn_prot() become stale priort to acquiring
* vma_lookup() or __kvm_faultin_pfn() become stale prior to acquiring
* kvm->mmu_lock.
*
* Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
@ -647,7 +648,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
return -EFAULT;
}
hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writable);
hfn = kvm_faultin_pfn(vcpu, gfn, is_write, &writable, &page);
if (hfn == KVM_PFN_ERR_HWPOISON) {
send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva,
vma_pageshift, current);
@ -669,7 +670,6 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
goto out_unlock;
if (writable) {
kvm_set_pfn_dirty(hfn);
mark_page_dirty(kvm, gfn);
ret = gstage_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
vma_pagesize, false, true);
@ -682,9 +682,8 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
kvm_err("Failed to map in G-stage\n");
out_unlock:
kvm_release_faultin_page(kvm, page, ret && ret != -EEXIST, writable);
spin_unlock(&kvm->mmu_lock);
kvm_set_pfn_accessed(hfn);
kvm_release_pfn_clean(hfn);
return ret;
}
@ -732,7 +731,7 @@ void kvm_riscv_gstage_update_hgatp(struct kvm_vcpu *vcpu)
hgatp |= (READ_ONCE(k->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID;
hgatp |= (k->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;
csr_write(CSR_HGATP, hgatp);
ncsr_write(CSR_HGATP, hgatp);
if (!kvm_riscv_gstage_vmid_bits())
kvm_riscv_local_hfence_gvma_all();

arch/riscv/kvm/nacl.c (new file, 152 lines)

@ -0,0 +1,152 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2024 Ventana Micro Systems Inc.
*/
#include <linux/kvm_host.h>
#include <linux/vmalloc.h>
#include <asm/kvm_nacl.h>
DEFINE_STATIC_KEY_FALSE(kvm_riscv_nacl_available);
DEFINE_STATIC_KEY_FALSE(kvm_riscv_nacl_sync_csr_available);
DEFINE_STATIC_KEY_FALSE(kvm_riscv_nacl_sync_hfence_available);
DEFINE_STATIC_KEY_FALSE(kvm_riscv_nacl_sync_sret_available);
DEFINE_STATIC_KEY_FALSE(kvm_riscv_nacl_autoswap_csr_available);
DEFINE_PER_CPU(struct kvm_riscv_nacl, kvm_riscv_nacl);
void __kvm_riscv_nacl_hfence(void *shmem,
unsigned long control,
unsigned long page_num,
unsigned long page_count)
{
int i, ent = -1, try_count = 5;
unsigned long *entp;
again:
for (i = 0; i < SBI_NACL_SHMEM_HFENCE_ENTRY_MAX; i++) {
entp = shmem + SBI_NACL_SHMEM_HFENCE_ENTRY_CONFIG(i);
if (lelong_to_cpu(*entp) & SBI_NACL_SHMEM_HFENCE_CONFIG_PEND)
continue;
ent = i;
break;
}
if (ent < 0) {
if (try_count) {
nacl_sync_hfence(-1UL);
goto again;
} else {
pr_warn("KVM: No free entry in NACL shared memory\n");
return;
}
}
entp = shmem + SBI_NACL_SHMEM_HFENCE_ENTRY_CONFIG(i);
*entp = cpu_to_lelong(control);
entp = shmem + SBI_NACL_SHMEM_HFENCE_ENTRY_PNUM(i);
*entp = cpu_to_lelong(page_num);
entp = shmem + SBI_NACL_SHMEM_HFENCE_ENTRY_PCOUNT(i);
*entp = cpu_to_lelong(page_count);
}
int kvm_riscv_nacl_enable(void)
{
int rc;
struct sbiret ret;
struct kvm_riscv_nacl *nacl;
if (!kvm_riscv_nacl_available())
return 0;
nacl = this_cpu_ptr(&kvm_riscv_nacl);
ret = sbi_ecall(SBI_EXT_NACL, SBI_EXT_NACL_SET_SHMEM,
nacl->shmem_phys, 0, 0, 0, 0, 0);
rc = sbi_err_map_linux_errno(ret.error);
if (rc)
return rc;
return 0;
}
void kvm_riscv_nacl_disable(void)
{
if (!kvm_riscv_nacl_available())
return;
sbi_ecall(SBI_EXT_NACL, SBI_EXT_NACL_SET_SHMEM,
SBI_SHMEM_DISABLE, SBI_SHMEM_DISABLE, 0, 0, 0, 0);
}
void kvm_riscv_nacl_exit(void)
{
int cpu;
struct kvm_riscv_nacl *nacl;
if (!kvm_riscv_nacl_available())
return;
/* Free per-CPU shared memory */
for_each_possible_cpu(cpu) {
nacl = per_cpu_ptr(&kvm_riscv_nacl, cpu);
if (!nacl->shmem)
continue;
free_pages((unsigned long)nacl->shmem,
get_order(SBI_NACL_SHMEM_SIZE));
nacl->shmem = NULL;
nacl->shmem_phys = 0;
}
}
static long nacl_probe_feature(long feature_id)
{
struct sbiret ret;
if (!kvm_riscv_nacl_available())
return 0;
ret = sbi_ecall(SBI_EXT_NACL, SBI_EXT_NACL_PROBE_FEATURE,
feature_id, 0, 0, 0, 0, 0);
return ret.value;
}
int kvm_riscv_nacl_init(void)
{
int cpu;
struct page *shmem_page;
struct kvm_riscv_nacl *nacl;
if (sbi_spec_version < sbi_mk_version(1, 0) ||
sbi_probe_extension(SBI_EXT_NACL) <= 0)
return -ENODEV;
/* Enable NACL support */
static_branch_enable(&kvm_riscv_nacl_available);
/* Probe NACL features */
if (nacl_probe_feature(SBI_NACL_FEAT_SYNC_CSR))
static_branch_enable(&kvm_riscv_nacl_sync_csr_available);
if (nacl_probe_feature(SBI_NACL_FEAT_SYNC_HFENCE))
static_branch_enable(&kvm_riscv_nacl_sync_hfence_available);
if (nacl_probe_feature(SBI_NACL_FEAT_SYNC_SRET))
static_branch_enable(&kvm_riscv_nacl_sync_sret_available);
if (nacl_probe_feature(SBI_NACL_FEAT_AUTOSWAP_CSR))
static_branch_enable(&kvm_riscv_nacl_autoswap_csr_available);
/* Allocate per-CPU shared memory */
for_each_possible_cpu(cpu) {
nacl = per_cpu_ptr(&kvm_riscv_nacl, cpu);
shmem_page = alloc_pages(GFP_KERNEL | __GFP_ZERO,
get_order(SBI_NACL_SHMEM_SIZE));
if (!shmem_page) {
kvm_riscv_nacl_exit();
return -ENOMEM;
}
nacl->shmem = page_to_virt(shmem_page);
nacl->shmem_phys = page_to_phys(shmem_page);
}
return 0;
}


@ -14,6 +14,7 @@
#include <asm/csr.h>
#include <asm/cpufeature.h>
#include <asm/insn-def.h>
#include <asm/kvm_nacl.h>
#define has_svinval() riscv_has_extension_unlikely(RISCV_ISA_EXT_SVINVAL)
@ -186,18 +187,24 @@ void kvm_riscv_fence_i_process(struct kvm_vcpu *vcpu)
void kvm_riscv_hfence_gvma_vmid_all_process(struct kvm_vcpu *vcpu)
{
struct kvm_vmid *vmid;
struct kvm_vmid *v = &vcpu->kvm->arch.vmid;
unsigned long vmid = READ_ONCE(v->vmid);
vmid = &vcpu->kvm->arch.vmid;
kvm_riscv_local_hfence_gvma_vmid_all(READ_ONCE(vmid->vmid));
if (kvm_riscv_nacl_available())
nacl_hfence_gvma_vmid_all(nacl_shmem(), vmid);
else
kvm_riscv_local_hfence_gvma_vmid_all(vmid);
}
void kvm_riscv_hfence_vvma_all_process(struct kvm_vcpu *vcpu)
{
struct kvm_vmid *vmid;
struct kvm_vmid *v = &vcpu->kvm->arch.vmid;
unsigned long vmid = READ_ONCE(v->vmid);
vmid = &vcpu->kvm->arch.vmid;
kvm_riscv_local_hfence_vvma_all(READ_ONCE(vmid->vmid));
if (kvm_riscv_nacl_available())
nacl_hfence_vvma_all(nacl_shmem(), vmid);
else
kvm_riscv_local_hfence_vvma_all(vmid);
}
static bool vcpu_hfence_dequeue(struct kvm_vcpu *vcpu,
@ -251,6 +258,7 @@ static bool vcpu_hfence_enqueue(struct kvm_vcpu *vcpu,
void kvm_riscv_hfence_process(struct kvm_vcpu *vcpu)
{
unsigned long vmid;
struct kvm_riscv_hfence d = { 0 };
struct kvm_vmid *v = &vcpu->kvm->arch.vmid;
@ -259,26 +267,41 @@ void kvm_riscv_hfence_process(struct kvm_vcpu *vcpu)
case KVM_RISCV_HFENCE_UNKNOWN:
break;
case KVM_RISCV_HFENCE_GVMA_VMID_GPA:
kvm_riscv_local_hfence_gvma_vmid_gpa(
READ_ONCE(v->vmid),
d.addr, d.size, d.order);
vmid = READ_ONCE(v->vmid);
if (kvm_riscv_nacl_available())
nacl_hfence_gvma_vmid(nacl_shmem(), vmid,
d.addr, d.size, d.order);
else
kvm_riscv_local_hfence_gvma_vmid_gpa(vmid, d.addr,
d.size, d.order);
break;
case KVM_RISCV_HFENCE_VVMA_ASID_GVA:
kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_HFENCE_VVMA_ASID_RCVD);
kvm_riscv_local_hfence_vvma_asid_gva(
READ_ONCE(v->vmid), d.asid,
d.addr, d.size, d.order);
vmid = READ_ONCE(v->vmid);
if (kvm_riscv_nacl_available())
nacl_hfence_vvma_asid(nacl_shmem(), vmid, d.asid,
d.addr, d.size, d.order);
else
kvm_riscv_local_hfence_vvma_asid_gva(vmid, d.asid, d.addr,
d.size, d.order);
break;
case KVM_RISCV_HFENCE_VVMA_ASID_ALL:
kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_HFENCE_VVMA_ASID_RCVD);
kvm_riscv_local_hfence_vvma_asid_all(
READ_ONCE(v->vmid), d.asid);
vmid = READ_ONCE(v->vmid);
if (kvm_riscv_nacl_available())
nacl_hfence_vvma_asid_all(nacl_shmem(), vmid, d.asid);
else
kvm_riscv_local_hfence_vvma_asid_all(vmid, d.asid);
break;
case KVM_RISCV_HFENCE_VVMA_GVA:
kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_HFENCE_VVMA_RCVD);
kvm_riscv_local_hfence_vvma_gva(
READ_ONCE(v->vmid),
d.addr, d.size, d.order);
vmid = READ_ONCE(v->vmid);
if (kvm_riscv_nacl_available())
nacl_hfence_vvma(nacl_shmem(), vmid,
d.addr, d.size, d.order);
else
kvm_riscv_local_hfence_vvma_gva(vmid, d.addr,
d.size, d.order);
break;
default:
break;


@ -17,8 +17,8 @@
#include <linux/sched/signal.h>
#include <linux/fs.h>
#include <linux/kvm_host.h>
#include <asm/csr.h>
#include <asm/cacheflush.h>
#include <asm/kvm_nacl.h>
#include <asm/kvm_vcpu_vector.h>
#define CREATE_TRACE_POINTS
@ -226,6 +226,13 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
return (vcpu->arch.guest_context.sstatus & SR_SPP) ? true : false;
}
#ifdef CONFIG_GUEST_PERF_EVENTS
unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu)
{
return vcpu->arch.guest_context.sepc;
}
#endif
vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
{
return VM_FAULT_SIGBUS;
@ -361,10 +368,10 @@ void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu)
struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
/* Read current HVIP and VSIE CSRs */
csr->vsie = csr_read(CSR_VSIE);
csr->vsie = ncsr_read(CSR_VSIE);
/* Sync up the HVIP.VSSIP bit changes done by the Guest */
hvip = csr_read(CSR_HVIP);
hvip = ncsr_read(CSR_HVIP);
if ((csr->hvip ^ hvip) & (1UL << IRQ_VS_SOFT)) {
if (hvip & (1UL << IRQ_VS_SOFT)) {
if (!test_and_set_bit(IRQ_VS_SOFT,
@ -561,26 +568,49 @@ static void kvm_riscv_vcpu_setup_config(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
void *nsh;
struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
struct kvm_vcpu_config *cfg = &vcpu->arch.cfg;
csr_write(CSR_VSSTATUS, csr->vsstatus);
csr_write(CSR_VSIE, csr->vsie);
csr_write(CSR_VSTVEC, csr->vstvec);
csr_write(CSR_VSSCRATCH, csr->vsscratch);
csr_write(CSR_VSEPC, csr->vsepc);
csr_write(CSR_VSCAUSE, csr->vscause);
csr_write(CSR_VSTVAL, csr->vstval);
csr_write(CSR_HEDELEG, cfg->hedeleg);
csr_write(CSR_HVIP, csr->hvip);
csr_write(CSR_VSATP, csr->vsatp);
csr_write(CSR_HENVCFG, cfg->henvcfg);
if (IS_ENABLED(CONFIG_32BIT))
csr_write(CSR_HENVCFGH, cfg->henvcfg >> 32);
if (riscv_has_extension_unlikely(RISCV_ISA_EXT_SMSTATEEN)) {
csr_write(CSR_HSTATEEN0, cfg->hstateen0);
if (kvm_riscv_nacl_sync_csr_available()) {
nsh = nacl_shmem();
nacl_csr_write(nsh, CSR_VSSTATUS, csr->vsstatus);
nacl_csr_write(nsh, CSR_VSIE, csr->vsie);
nacl_csr_write(nsh, CSR_VSTVEC, csr->vstvec);
nacl_csr_write(nsh, CSR_VSSCRATCH, csr->vsscratch);
nacl_csr_write(nsh, CSR_VSEPC, csr->vsepc);
nacl_csr_write(nsh, CSR_VSCAUSE, csr->vscause);
nacl_csr_write(nsh, CSR_VSTVAL, csr->vstval);
nacl_csr_write(nsh, CSR_HEDELEG, cfg->hedeleg);
nacl_csr_write(nsh, CSR_HVIP, csr->hvip);
nacl_csr_write(nsh, CSR_VSATP, csr->vsatp);
nacl_csr_write(nsh, CSR_HENVCFG, cfg->henvcfg);
if (IS_ENABLED(CONFIG_32BIT))
csr_write(CSR_HSTATEEN0H, cfg->hstateen0 >> 32);
nacl_csr_write(nsh, CSR_HENVCFGH, cfg->henvcfg >> 32);
if (riscv_has_extension_unlikely(RISCV_ISA_EXT_SMSTATEEN)) {
nacl_csr_write(nsh, CSR_HSTATEEN0, cfg->hstateen0);
if (IS_ENABLED(CONFIG_32BIT))
nacl_csr_write(nsh, CSR_HSTATEEN0H, cfg->hstateen0 >> 32);
}
} else {
csr_write(CSR_VSSTATUS, csr->vsstatus);
csr_write(CSR_VSIE, csr->vsie);
csr_write(CSR_VSTVEC, csr->vstvec);
csr_write(CSR_VSSCRATCH, csr->vsscratch);
csr_write(CSR_VSEPC, csr->vsepc);
csr_write(CSR_VSCAUSE, csr->vscause);
csr_write(CSR_VSTVAL, csr->vstval);
csr_write(CSR_HEDELEG, cfg->hedeleg);
csr_write(CSR_HVIP, csr->hvip);
csr_write(CSR_VSATP, csr->vsatp);
csr_write(CSR_HENVCFG, cfg->henvcfg);
if (IS_ENABLED(CONFIG_32BIT))
csr_write(CSR_HENVCFGH, cfg->henvcfg >> 32);
if (riscv_has_extension_unlikely(RISCV_ISA_EXT_SMSTATEEN)) {
csr_write(CSR_HSTATEEN0, cfg->hstateen0);
if (IS_ENABLED(CONFIG_32BIT))
csr_write(CSR_HSTATEEN0H, cfg->hstateen0 >> 32);
}
}
kvm_riscv_gstage_update_hgatp(vcpu);
@ -603,6 +633,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
void *nsh;
struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
vcpu->cpu = -1;
@ -618,15 +649,28 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
vcpu->arch.isa);
kvm_riscv_vcpu_host_vector_restore(&vcpu->arch.host_context);
csr->vsstatus = csr_read(CSR_VSSTATUS);
csr->vsie = csr_read(CSR_VSIE);
csr->vstvec = csr_read(CSR_VSTVEC);
csr->vsscratch = csr_read(CSR_VSSCRATCH);
csr->vsepc = csr_read(CSR_VSEPC);
csr->vscause = csr_read(CSR_VSCAUSE);
csr->vstval = csr_read(CSR_VSTVAL);
csr->hvip = csr_read(CSR_HVIP);
csr->vsatp = csr_read(CSR_VSATP);
if (kvm_riscv_nacl_available()) {
nsh = nacl_shmem();
csr->vsstatus = nacl_csr_read(nsh, CSR_VSSTATUS);
csr->vsie = nacl_csr_read(nsh, CSR_VSIE);
csr->vstvec = nacl_csr_read(nsh, CSR_VSTVEC);
csr->vsscratch = nacl_csr_read(nsh, CSR_VSSCRATCH);
csr->vsepc = nacl_csr_read(nsh, CSR_VSEPC);
csr->vscause = nacl_csr_read(nsh, CSR_VSCAUSE);
csr->vstval = nacl_csr_read(nsh, CSR_VSTVAL);
csr->hvip = nacl_csr_read(nsh, CSR_HVIP);
csr->vsatp = nacl_csr_read(nsh, CSR_VSATP);
} else {
csr->vsstatus = csr_read(CSR_VSSTATUS);
csr->vsie = csr_read(CSR_VSIE);
csr->vstvec = csr_read(CSR_VSTVEC);
csr->vsscratch = csr_read(CSR_VSSCRATCH);
csr->vsepc = csr_read(CSR_VSEPC);
csr->vscause = csr_read(CSR_VSCAUSE);
csr->vstval = csr_read(CSR_VSTVAL);
csr->hvip = csr_read(CSR_HVIP);
csr->vsatp = csr_read(CSR_VSATP);
}
}
static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
@ -681,7 +725,7 @@ static void kvm_riscv_update_hvip(struct kvm_vcpu *vcpu)
{
struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
csr_write(CSR_HVIP, csr->hvip);
ncsr_write(CSR_HVIP, csr->hvip);
kvm_riscv_vcpu_aia_update_hvip(vcpu);
}
@ -691,6 +735,7 @@ static __always_inline void kvm_riscv_vcpu_swap_in_guest_state(struct kvm_vcpu *
struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
struct kvm_vcpu_config *cfg = &vcpu->arch.cfg;
vcpu->arch.host_scounteren = csr_swap(CSR_SCOUNTEREN, csr->scounteren);
vcpu->arch.host_senvcfg = csr_swap(CSR_SENVCFG, csr->senvcfg);
if (riscv_has_extension_unlikely(RISCV_ISA_EXT_SMSTATEEN) &&
(cfg->hstateen0 & SMSTATEEN0_SSTATEEN0))
@ -704,6 +749,7 @@ static __always_inline void kvm_riscv_vcpu_swap_in_host_state(struct kvm_vcpu *v
struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
struct kvm_vcpu_config *cfg = &vcpu->arch.cfg;
csr->scounteren = csr_swap(CSR_SCOUNTEREN, vcpu->arch.host_scounteren);
csr->senvcfg = csr_swap(CSR_SENVCFG, vcpu->arch.host_senvcfg);
if (riscv_has_extension_unlikely(RISCV_ISA_EXT_SMSTATEEN) &&
(cfg->hstateen0 & SMSTATEEN0_SSTATEEN0))
@ -718,11 +764,81 @@ static __always_inline void kvm_riscv_vcpu_swap_in_host_state(struct kvm_vcpu *v
* This must be noinstr as instrumentation may make use of RCU, and this is not
* safe during the EQS.
*/
static void noinstr kvm_riscv_vcpu_enter_exit(struct kvm_vcpu *vcpu)
static void noinstr kvm_riscv_vcpu_enter_exit(struct kvm_vcpu *vcpu,
struct kvm_cpu_trap *trap)
{
void *nsh;
struct kvm_cpu_context *gcntx = &vcpu->arch.guest_context;
struct kvm_cpu_context *hcntx = &vcpu->arch.host_context;
/*
* We save trap CSRs (such as SEPC, SCAUSE, STVAL, HTVAL, and
* HTINST) here because we do local_irq_enable() after this
* function in kvm_arch_vcpu_ioctl_run() which can result in
* an interrupt immediately after local_irq_enable() and can
* potentially change trap CSRs.
*/
kvm_riscv_vcpu_swap_in_guest_state(vcpu);
guest_state_enter_irqoff();
__kvm_riscv_switch_to(&vcpu->arch);
if (kvm_riscv_nacl_sync_sret_available()) {
nsh = nacl_shmem();
if (kvm_riscv_nacl_autoswap_csr_available()) {
hcntx->hstatus =
nacl_csr_read(nsh, CSR_HSTATUS);
nacl_scratch_write_long(nsh,
SBI_NACL_SHMEM_AUTOSWAP_OFFSET +
SBI_NACL_SHMEM_AUTOSWAP_HSTATUS,
gcntx->hstatus);
nacl_scratch_write_long(nsh,
SBI_NACL_SHMEM_AUTOSWAP_OFFSET,
SBI_NACL_SHMEM_AUTOSWAP_FLAG_HSTATUS);
} else if (kvm_riscv_nacl_sync_csr_available()) {
hcntx->hstatus = nacl_csr_swap(nsh,
CSR_HSTATUS, gcntx->hstatus);
} else {
hcntx->hstatus = csr_swap(CSR_HSTATUS, gcntx->hstatus);
}
nacl_scratch_write_longs(nsh,
SBI_NACL_SHMEM_SRET_OFFSET +
SBI_NACL_SHMEM_SRET_X(1),
&gcntx->ra,
SBI_NACL_SHMEM_SRET_X_LAST);
__kvm_riscv_nacl_switch_to(&vcpu->arch, SBI_EXT_NACL,
SBI_EXT_NACL_SYNC_SRET);
if (kvm_riscv_nacl_autoswap_csr_available()) {
nacl_scratch_write_long(nsh,
SBI_NACL_SHMEM_AUTOSWAP_OFFSET,
0);
gcntx->hstatus = nacl_scratch_read_long(nsh,
SBI_NACL_SHMEM_AUTOSWAP_OFFSET +
SBI_NACL_SHMEM_AUTOSWAP_HSTATUS);
} else {
gcntx->hstatus = csr_swap(CSR_HSTATUS, hcntx->hstatus);
}
trap->htval = nacl_csr_read(nsh, CSR_HTVAL);
trap->htinst = nacl_csr_read(nsh, CSR_HTINST);
} else {
hcntx->hstatus = csr_swap(CSR_HSTATUS, gcntx->hstatus);
__kvm_riscv_switch_to(&vcpu->arch);
gcntx->hstatus = csr_swap(CSR_HSTATUS, hcntx->hstatus);
trap->htval = csr_read(CSR_HTVAL);
trap->htinst = csr_read(CSR_HTINST);
}
trap->sepc = gcntx->sepc;
trap->scause = csr_read(CSR_SCAUSE);
trap->stval = csr_read(CSR_STVAL);
vcpu->arch.last_exit_cpu = vcpu->cpu;
guest_state_exit_irqoff();
kvm_riscv_vcpu_swap_in_host_state(vcpu);
@ -839,22 +955,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
guest_timing_enter_irqoff();
kvm_riscv_vcpu_enter_exit(vcpu);
kvm_riscv_vcpu_enter_exit(vcpu, &trap);
vcpu->mode = OUTSIDE_GUEST_MODE;
vcpu->stat.exits++;
/*
* Save SCAUSE, STVAL, HTVAL, and HTINST because we might
* get an interrupt between __kvm_riscv_switch_to() and
* local_irq_enable() which can potentially change CSRs.
*/
trap.sepc = vcpu->arch.guest_context.sepc;
trap.scause = csr_read(CSR_SCAUSE);
trap.stval = csr_read(CSR_STVAL);
trap.htval = csr_read(CSR_HTVAL);
trap.htinst = csr_read(CSR_HTINST);
/* Syncup interrupts state with HW */
kvm_riscv_vcpu_sync_interrupts(vcpu);


@ -486,19 +486,22 @@ void kvm_riscv_vcpu_sbi_init(struct kvm_vcpu *vcpu)
struct kvm_vcpu_sbi_context *scontext = &vcpu->arch.sbi_context;
const struct kvm_riscv_sbi_extension_entry *entry;
const struct kvm_vcpu_sbi_extension *ext;
int i;
int idx, i;
for (i = 0; i < ARRAY_SIZE(sbi_ext); i++) {
entry = &sbi_ext[i];
ext = entry->ext_ptr;
idx = entry->ext_idx;
if (idx < 0 || idx >= ARRAY_SIZE(scontext->ext_status))
continue;
if (ext->probe && !ext->probe(vcpu)) {
scontext->ext_status[entry->ext_idx] =
KVM_RISCV_SBI_EXT_STATUS_UNAVAILABLE;
scontext->ext_status[idx] = KVM_RISCV_SBI_EXT_STATUS_UNAVAILABLE;
continue;
}
scontext->ext_status[entry->ext_idx] = ext->default_disabled ?
scontext->ext_status[idx] = ext->default_disabled ?
KVM_RISCV_SBI_EXT_STATUS_DISABLED :
KVM_RISCV_SBI_EXT_STATUS_ENABLED;
}


@ -11,11 +11,7 @@
#include <asm/asm-offsets.h>
#include <asm/csr.h>
.text
.altmacro
.option norelax
SYM_FUNC_START(__kvm_riscv_switch_to)
.macro SAVE_HOST_GPRS
/* Save Host GPRs (except A0 and T0-T6) */
REG_S ra, (KVM_ARCH_HOST_RA)(a0)
REG_S sp, (KVM_ARCH_HOST_SP)(a0)
@ -40,39 +36,33 @@ SYM_FUNC_START(__kvm_riscv_switch_to)
REG_S s9, (KVM_ARCH_HOST_S9)(a0)
REG_S s10, (KVM_ARCH_HOST_S10)(a0)
REG_S s11, (KVM_ARCH_HOST_S11)(a0)
.endm
.macro SAVE_HOST_AND_RESTORE_GUEST_CSRS __resume_addr
/* Load Guest CSR values */
REG_L t0, (KVM_ARCH_GUEST_SSTATUS)(a0)
REG_L t1, (KVM_ARCH_GUEST_HSTATUS)(a0)
REG_L t2, (KVM_ARCH_GUEST_SCOUNTEREN)(a0)
la t4, .Lkvm_switch_return
REG_L t5, (KVM_ARCH_GUEST_SEPC)(a0)
la t1, \__resume_addr
REG_L t2, (KVM_ARCH_GUEST_SEPC)(a0)
/* Save Host and Restore Guest SSTATUS */
csrrw t0, CSR_SSTATUS, t0
/* Save Host and Restore Guest HSTATUS */
csrrw t1, CSR_HSTATUS, t1
/* Save Host and Restore Guest SCOUNTEREN */
csrrw t2, CSR_SCOUNTEREN, t2
/* Save Host STVEC and change it to return path */
csrrw t4, CSR_STVEC, t4
csrrw t1, CSR_STVEC, t1
/* Restore Guest SEPC */
csrw CSR_SEPC, t2
/* Save Host SSCRATCH and change it to struct kvm_vcpu_arch pointer */
csrrw t3, CSR_SSCRATCH, a0
/* Restore Guest SEPC */
csrw CSR_SEPC, t5
/* Store Host CSR values */
REG_S t0, (KVM_ARCH_HOST_SSTATUS)(a0)
REG_S t1, (KVM_ARCH_HOST_HSTATUS)(a0)
REG_S t2, (KVM_ARCH_HOST_SCOUNTEREN)(a0)
REG_S t1, (KVM_ARCH_HOST_STVEC)(a0)
REG_S t3, (KVM_ARCH_HOST_SSCRATCH)(a0)
REG_S t4, (KVM_ARCH_HOST_STVEC)(a0)
.endm
.macro RESTORE_GUEST_GPRS
/* Restore Guest GPRs (except A0) */
REG_L ra, (KVM_ARCH_GUEST_RA)(a0)
REG_L sp, (KVM_ARCH_GUEST_SP)(a0)
@ -107,13 +97,9 @@ SYM_FUNC_START(__kvm_riscv_switch_to)
/* Restore Guest A0 */
REG_L a0, (KVM_ARCH_GUEST_A0)(a0)
.endm
/* Resume Guest */
sret
/* Back to Host */
.align 2
.Lkvm_switch_return:
.macro SAVE_GUEST_GPRS
/* Swap Guest A0 with SSCRATCH */
csrrw a0, CSR_SSCRATCH, a0
@ -148,39 +134,33 @@ SYM_FUNC_START(__kvm_riscv_switch_to)
REG_S t4, (KVM_ARCH_GUEST_T4)(a0)
REG_S t5, (KVM_ARCH_GUEST_T5)(a0)
REG_S t6, (KVM_ARCH_GUEST_T6)(a0)
.endm
.macro SAVE_GUEST_AND_RESTORE_HOST_CSRS
/* Load Host CSR values */
REG_L t1, (KVM_ARCH_HOST_STVEC)(a0)
REG_L t2, (KVM_ARCH_HOST_SSCRATCH)(a0)
REG_L t3, (KVM_ARCH_HOST_SCOUNTEREN)(a0)
REG_L t4, (KVM_ARCH_HOST_HSTATUS)(a0)
REG_L t5, (KVM_ARCH_HOST_SSTATUS)(a0)
/* Save Guest SEPC */
csrr t0, CSR_SEPC
REG_L t0, (KVM_ARCH_HOST_STVEC)(a0)
REG_L t1, (KVM_ARCH_HOST_SSCRATCH)(a0)
REG_L t2, (KVM_ARCH_HOST_SSTATUS)(a0)
/* Save Guest A0 and Restore Host SSCRATCH */
csrrw t2, CSR_SSCRATCH, t2
csrrw t1, CSR_SSCRATCH, t1
/* Save Guest SEPC */
csrr t3, CSR_SEPC
/* Restore Host STVEC */
csrw CSR_STVEC, t1
/* Save Guest and Restore Host SCOUNTEREN */
csrrw t3, CSR_SCOUNTEREN, t3
/* Save Guest and Restore Host HSTATUS */
csrrw t4, CSR_HSTATUS, t4
csrw CSR_STVEC, t0
/* Save Guest and Restore Host SSTATUS */
csrrw t5, CSR_SSTATUS, t5
csrrw t2, CSR_SSTATUS, t2
/* Store Guest CSR values */
REG_S t0, (KVM_ARCH_GUEST_SEPC)(a0)
REG_S t2, (KVM_ARCH_GUEST_A0)(a0)
REG_S t3, (KVM_ARCH_GUEST_SCOUNTEREN)(a0)
REG_S t4, (KVM_ARCH_GUEST_HSTATUS)(a0)
REG_S t5, (KVM_ARCH_GUEST_SSTATUS)(a0)
REG_S t1, (KVM_ARCH_GUEST_A0)(a0)
REG_S t2, (KVM_ARCH_GUEST_SSTATUS)(a0)
REG_S t3, (KVM_ARCH_GUEST_SEPC)(a0)
.endm
.macro RESTORE_HOST_GPRS
/* Restore Host GPRs (except A0 and T0-T6) */
REG_L ra, (KVM_ARCH_HOST_RA)(a0)
REG_L sp, (KVM_ARCH_HOST_SP)(a0)
@ -205,11 +185,68 @@ SYM_FUNC_START(__kvm_riscv_switch_to)
REG_L s9, (KVM_ARCH_HOST_S9)(a0)
REG_L s10, (KVM_ARCH_HOST_S10)(a0)
REG_L s11, (KVM_ARCH_HOST_S11)(a0)
.endm
.text
.altmacro
.option norelax
/*
* Parameters:
* A0 <= Pointer to struct kvm_vcpu_arch
*/
SYM_FUNC_START(__kvm_riscv_switch_to)
SAVE_HOST_GPRS
SAVE_HOST_AND_RESTORE_GUEST_CSRS .Lkvm_switch_return
RESTORE_GUEST_GPRS
/* Resume Guest using SRET */
sret
/* Back to Host */
.align 2
.Lkvm_switch_return:
SAVE_GUEST_GPRS
SAVE_GUEST_AND_RESTORE_HOST_CSRS
RESTORE_HOST_GPRS
/* Return to C code */
ret
SYM_FUNC_END(__kvm_riscv_switch_to)
/*
* Parameters:
* A0 <= Pointer to struct kvm_vcpu_arch
* A1 <= SBI extension ID
* A2 <= SBI function ID
*/
SYM_FUNC_START(__kvm_riscv_nacl_switch_to)
SAVE_HOST_GPRS
SAVE_HOST_AND_RESTORE_GUEST_CSRS .Lkvm_nacl_switch_return
/* Resume Guest using SBI nested acceleration */
add a6, a2, zero
add a7, a1, zero
ecall
/* Back to Host */
.align 2
.Lkvm_nacl_switch_return:
SAVE_GUEST_GPRS
SAVE_GUEST_AND_RESTORE_HOST_CSRS
RESTORE_HOST_GPRS
/* Return to C code */
ret
SYM_FUNC_END(__kvm_riscv_nacl_switch_to)
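The add a6, a2, zero / add a7, a1, zero pair above places the SBI function ID in a6 and the extension ID in a7, as the SBI calling convention requires, so the ecall issues SBI_EXT_NACL / SBI_EXT_NACL_SYNC_SRET and lets the SBI implementation enter the guest using the register state that kvm_riscv_vcpu_enter_exit() staged into the shared-memory SRET area with nacl_scratch_write_longs().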
SYM_CODE_START(__kvm_riscv_unpriv_trap)
/*
* We assume that faulting unpriv load/store instruction is


@ -11,8 +11,8 @@
#include <linux/kvm_host.h>
#include <linux/uaccess.h>
#include <clocksource/timer-riscv.h>
#include <asm/csr.h>
#include <asm/delay.h>
#include <asm/kvm_nacl.h>
#include <asm/kvm_vcpu_timer.h>
static u64 kvm_riscv_current_cycles(struct kvm_guest_timer *gt)
@ -72,12 +72,12 @@ static int kvm_riscv_vcpu_timer_cancel(struct kvm_vcpu_timer *t)
static int kvm_riscv_vcpu_update_vstimecmp(struct kvm_vcpu *vcpu, u64 ncycles)
{
#if defined(CONFIG_32BIT)
csr_write(CSR_VSTIMECMP, ncycles & 0xFFFFFFFF);
csr_write(CSR_VSTIMECMPH, ncycles >> 32);
ncsr_write(CSR_VSTIMECMP, ncycles & 0xFFFFFFFF);
ncsr_write(CSR_VSTIMECMPH, ncycles >> 32);
#else
csr_write(CSR_VSTIMECMP, ncycles);
ncsr_write(CSR_VSTIMECMP, ncycles);
#endif
return 0;
return 0;
}
static int kvm_riscv_vcpu_update_hrtimer(struct kvm_vcpu *vcpu, u64 ncycles)
@ -289,10 +289,10 @@ static void kvm_riscv_vcpu_update_timedelta(struct kvm_vcpu *vcpu)
struct kvm_guest_timer *gt = &vcpu->kvm->arch.timer;
#if defined(CONFIG_32BIT)
csr_write(CSR_HTIMEDELTA, (u32)(gt->time_delta));
csr_write(CSR_HTIMEDELTAH, (u32)(gt->time_delta >> 32));
ncsr_write(CSR_HTIMEDELTA, (u32)(gt->time_delta));
ncsr_write(CSR_HTIMEDELTAH, (u32)(gt->time_delta >> 32));
#else
csr_write(CSR_HTIMEDELTA, gt->time_delta);
ncsr_write(CSR_HTIMEDELTA, gt->time_delta);
#endif
}
@ -306,10 +306,10 @@ void kvm_riscv_vcpu_timer_restore(struct kvm_vcpu *vcpu)
return;
#if defined(CONFIG_32BIT)
csr_write(CSR_VSTIMECMP, (u32)t->next_cycles);
csr_write(CSR_VSTIMECMPH, (u32)(t->next_cycles >> 32));
ncsr_write(CSR_VSTIMECMP, (u32)t->next_cycles);
ncsr_write(CSR_VSTIMECMPH, (u32)(t->next_cycles >> 32));
#else
csr_write(CSR_VSTIMECMP, t->next_cycles);
ncsr_write(CSR_VSTIMECMP, t->next_cycles);
#endif
/* timer should be enabled for the remaining operations */
@ -327,10 +327,10 @@ void kvm_riscv_vcpu_timer_sync(struct kvm_vcpu *vcpu)
return;
#if defined(CONFIG_32BIT)
t->next_cycles = csr_read(CSR_VSTIMECMP);
t->next_cycles |= (u64)csr_read(CSR_VSTIMECMPH) << 32;
t->next_cycles = ncsr_read(CSR_VSTIMECMP);
t->next_cycles |= (u64)ncsr_read(CSR_VSTIMECMPH) << 32;
#else
t->next_cycles = csr_read(CSR_VSTIMECMP);
t->next_cycles = ncsr_read(CSR_VSTIMECMP);
#endif
}


@ -356,6 +356,7 @@ struct kvm_s390_sie_block {
#define ECD_MEF 0x08000000
#define ECD_ETOKENF 0x02000000
#define ECD_ECC 0x00200000
#define ECD_HMAC 0x00004000
__u32 ecd; /* 0x01c8 */
__u8 reserved1cc[18]; /* 0x01cc */
__u64 pp; /* 0x01de */


@ -469,7 +469,8 @@ struct kvm_s390_vm_cpu_subfunc {
__u8 kdsa[16]; /* with MSA9 */
__u8 sortl[32]; /* with STFLE.150 */
__u8 dfltcc[32]; /* with STFLE.151 */
__u8 reserved[1728];
__u8 pfcr[16]; /* with STFLE.201 */
__u8 reserved[1712];
};
#define KVM_S390_VM_CPU_PROCESSOR_UV_FEAT_GUEST 6


@ -348,6 +348,16 @@ static inline int plo_test_bit(unsigned char nr)
return CC_TRANSFORM(cc) == 0;
}
static __always_inline void pfcr_query(u8 (*query)[16])
{
asm volatile(
" lghi 0,0\n"
" .insn rsy,0xeb0000000016,0,0,%[query]\n"
: [query] "=QS" (*query)
:
: "cc", "0");
}
static __always_inline void __sortl_query(u8 (*query)[32])
{
asm volatile(
@ -429,6 +439,9 @@ static void __init kvm_s390_cpu_feat_init(void)
if (test_facility(151)) /* DFLTCC */
__dfltcc_query(&kvm_s390_available_subfunc.dfltcc);
if (test_facility(201)) /* PFCR */
pfcr_query(&kvm_s390_available_subfunc.pfcr);
if (MACHINE_HAS_ESOP)
allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);
/*
@ -799,6 +812,14 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
set_kvm_facility(kvm->arch.model.fac_mask, 192);
set_kvm_facility(kvm->arch.model.fac_list, 192);
}
if (test_facility(198)) {
set_kvm_facility(kvm->arch.model.fac_mask, 198);
set_kvm_facility(kvm->arch.model.fac_list, 198);
}
if (test_facility(199)) {
set_kvm_facility(kvm->arch.model.fac_mask, 199);
set_kvm_facility(kvm->arch.model.fac_list, 199);
}
r = 0;
} else
r = -EINVAL;
@ -1543,6 +1564,9 @@ static int kvm_s390_set_processor_subfunc(struct kvm *kvm,
((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[1],
((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[2],
((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[3]);
VM_EVENT(kvm, 3, "GET: guest PFCR subfunc 0x%16.16lx.%16.16lx",
((unsigned long *) &kvm_s390_available_subfunc.pfcr)[0],
((unsigned long *) &kvm_s390_available_subfunc.pfcr)[1]);
return 0;
}
@ -1757,6 +1781,9 @@ static int kvm_s390_get_processor_subfunc(struct kvm *kvm,
((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[1],
((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[2],
((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[3]);
VM_EVENT(kvm, 3, "GET: guest PFCR subfunc 0x%16.16lx.%16.16lx",
((unsigned long *) &kvm_s390_available_subfunc.pfcr)[0],
((unsigned long *) &kvm_s390_available_subfunc.pfcr)[1]);
return 0;
}
@ -1825,6 +1852,9 @@ static int kvm_s390_get_machine_subfunc(struct kvm *kvm,
((unsigned long *) &kvm_s390_available_subfunc.dfltcc)[1],
((unsigned long *) &kvm_s390_available_subfunc.dfltcc)[2],
((unsigned long *) &kvm_s390_available_subfunc.dfltcc)[3]);
VM_EVENT(kvm, 3, "GET: host PFCR subfunc 0x%16.16lx.%16.16lx",
((unsigned long *) &kvm_s390_available_subfunc.pfcr)[0],
((unsigned long *) &kvm_s390_available_subfunc.pfcr)[1]);
return 0;
}
@ -3769,6 +3799,13 @@ static bool kvm_has_pckmo_ecc(struct kvm *kvm)
}
static bool kvm_has_pckmo_hmac(struct kvm *kvm)
{
/* At least one HMAC subfunction must be present */
return kvm_has_pckmo_subfunc(kvm, 118) ||
kvm_has_pckmo_subfunc(kvm, 122);
}
static void kvm_s390_vcpu_crypto_setup(struct kvm_vcpu *vcpu)
{
/*
@ -3781,7 +3818,7 @@ static void kvm_s390_vcpu_crypto_setup(struct kvm_vcpu *vcpu)
vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd;
vcpu->arch.sie_block->ecb3 &= ~(ECB3_AES | ECB3_DEA);
vcpu->arch.sie_block->eca &= ~ECA_APIE;
vcpu->arch.sie_block->ecd &= ~ECD_ECC;
vcpu->arch.sie_block->ecd &= ~(ECD_ECC | ECD_HMAC);
if (vcpu->kvm->arch.crypto.apie)
vcpu->arch.sie_block->eca |= ECA_APIE;
@ -3789,9 +3826,11 @@ static void kvm_s390_vcpu_crypto_setup(struct kvm_vcpu *vcpu)
/* Set up protected key support */
if (vcpu->kvm->arch.crypto.aes_kw) {
vcpu->arch.sie_block->ecb3 |= ECB3_AES;
/* ecc is also wrapped with AES key */
/* ecc/hmac is also wrapped with AES key */
if (kvm_has_pckmo_ecc(vcpu->kvm))
vcpu->arch.sie_block->ecd |= ECD_ECC;
if (kvm_has_pckmo_hmac(vcpu->kvm))
vcpu->arch.sie_block->ecd |= ECD_HMAC;
}
if (vcpu->kvm->arch.crypto.dea_kw)


@ -335,7 +335,8 @@ static int shadow_crycb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
/* we may only allow it if enabled for guest 2 */
ecb3_flags = scb_o->ecb3 & vcpu->arch.sie_block->ecb3 &
(ECB3_AES | ECB3_DEA);
ecd_flags = scb_o->ecd & vcpu->arch.sie_block->ecd & ECD_ECC;
ecd_flags = scb_o->ecd & vcpu->arch.sie_block->ecd &
(ECD_ECC | ECD_HMAC);
if (!ecb3_flags && !ecd_flags)
goto end;
@ -661,7 +662,7 @@ static int pin_guest_page(struct kvm *kvm, gpa_t gpa, hpa_t *hpa)
struct page *page;
page = gfn_to_page(kvm, gpa_to_gfn(gpa));
if (is_error_page(page))
if (!page)
return -EINVAL;
*hpa = (hpa_t)page_to_phys(page) + (gpa & ~PAGE_MASK);
return 0;
@ -670,7 +671,7 @@ static int pin_guest_page(struct kvm *kvm, gpa_t gpa, hpa_t *hpa)
/* Unpins a page previously pinned via pin_guest_page, marking it as dirty. */
static void unpin_guest_page(struct kvm *kvm, gpa_t gpa, hpa_t hpa)
{
kvm_release_pfn_dirty(hpa >> PAGE_SHIFT);
kvm_release_page_dirty(pfn_to_page(hpa >> PAGE_SHIFT));
/* mark the page always as dirty for migration */
mark_page_dirty(kvm, gpa_to_gfn(gpa));
}


@ -109,10 +109,12 @@ static struct facility_def facility_defs[] = {
15, /* AP Facilities Test */
156, /* etoken facility */
165, /* nnpa facility */
170, /* ineffective-nonconstrained-transaction facility */
193, /* bear enhancement facility */
194, /* rdp enhancement facility */
196, /* processor activity instrumentation facility */
197, /* processor activity instrumentation extension 1 */
201, /* concurrent-functions facility */
-1 /* END */
}
},

Some files were not shown because too many files have changed in this diff.