Pass @opaque to kvm_arch_hardware_setup() and
kvm_arch_check_processor_compat() to allow architecture specific code to
reference @opaque without having to stash it away in a temporary global
variable. This will enable x86 to separate its vendor specific callback
ops, which are passed via @opaque, into "init" and "runtime" ops without
having to stash away the "init" ops.
No functional change intended.
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Cornelia Huck <cohuck@redhat.com> #s390
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-2-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the implementations of KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG
for CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT into common KVM code.
The arch specific implemenations are extremely similar, differing
only in whether the dirty log needs to be sync'd from hardware (x86)
and how the TLBs are flushed. Add new arch hooks to handle sync
and TLB flush; the sync will also be used for non-generic dirty log
support in a future patch (s390).
The ulterior motive for providing a common implementation is to
eliminate the dependency between arch and common code with respect to
the memslot referenced by the dirty log, i.e. to make it obvious in the
code that the validity of the memslot is guaranteed, as a future patch
will rework memslot handling such that id_to_memslot() can return NULL.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Drop the "const" attribute from @old in kvm_arch_commit_memory_region()
to allow arch specific code to free arch specific resources in the old
memslot without having to cast away the attribute. Freeing resources in
kvm_arch_commit_memory_region() paves the way for simplifying
kvm_free_memslot() by eliminating the last usage of its @dont param.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove kvm_arch_create_memslot() now that all arch implementations are
effectively nops. Removing kvm_arch_create_memslot() eliminates the
possibility for arch specific code to allocate memory prior to setting
a memslot, which sets the stage for simplifying kvm_free_memslot().
Cc: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fold kvm_mips_comparecount_func() into kvm_mips_comparecount_wakeup() to
eliminate the nondescript function name as well as its unnecessary cast
of a vcpu to "unsigned long" and back to a vcpu. Presumably func() was
used as a callback at some point during pre-upstream development, as
wakeup() is the only user of func() and has been the only user since
both with introduced by commit 669e846e6c ("KVM/MIPS32: MIPS arch
specific APIs for KVM").
Cc: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hoist kvm_mips_comparecount_wakeup() above its only user,
kvm_arch_vcpu_create() to fix a compilation error due to referencing an
undefined function.
Fixes: d11dfed5d7 ("KVM: MIPS: Move all vcpu init code into kvm_arch_vcpu_create()")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove kvm_arch_vcpu_init() and kvm_arch_vcpu_uninit() now that all
arch specific implementations are nops.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fold init() into create() now that the two are called back-to-back by
common KVM code (kvm_vcpu_init() calls kvm_arch_vcpu_init() as its last
action, and kvm_vm_ioctl_create_vcpu() calls kvm_arch_vcpu_create()
immediately thereafter). Rinse and repeat for kvm_arch_vcpu_uninit()
and kvm_arch_vcpu_destroy(). This paves the way for removing
kvm_arch_vcpu_{un}init() entirely.
No functional change intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove kvm_arch_vcpu_setup() now that all arch specific implementations
are nops.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fold setup() into create() now that the two are called back-to-back by
common KVM code. This paves the way for removing kvm_arch_vcpu_setup().
Note, there is no unwind function associated with kvm_arch_vcpu_setup(),
i.e. no teardown path that also needs to be moved.
No functional change intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that all architectures tightly couple vcpu allocation/free with the
mandatory calls to kvm_{un}init_vcpu(), move the sequences verbatim to
common KVM code.
Move both allocation and initialization in a single patch to eliminate
thrash in arch specific code. The bisection benefits of moving the two
pieces in separate patches is marginal at best, whereas the odds of
introducing a transient arch specific bug are non-zero.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add kvm_vcpu_destroy() and wire up all architectures to call the common
function instead of their arch specific implementation. The common
destruction function will be used by future patches to move allocation
and initialization of vCPUs to common KVM code, i.e. to free resources
that are allocated by arch agnostic code.
No functional change intended.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the call to kvm_vcpu_uninit() in kvm_arch_vcpu_destroy() down a few
lines so that it is invoked immediately prior to freeing the vCPU. This
paves the way for moving the uninit and free sequence to common KVM code
without an associated functional change.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a pre-allocation arch hook to handle checks that are currently done
by arch specific code prior to allocating the vCPU object. This paves
the way for moving the allocation to common KVM code.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove the superfluous kvm_arch_vcpu_free() as it is no longer called
from commmon KVM code. Note, kvm_arch_vcpu_destroy() *is* called from
common code, i.e. choosing which function to whack is not completely
arbitrary.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For reasons unknown, MIPS configures the vCPU allocation cache but
allocates vCPUs via kzalloc(). Allocate from the vCPU cache in
preparation for moving vCPU allocation to common KVM code.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is no need for this function as all arches have to implement
kvm_arch_create_vcpu_debugfs() no matter what. A #define symbol
let us actually simplify the code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a wrapper to invoke kvm_arch_check_processor_compat() so that the
boilerplate ugliness of checking virtualization support on all CPUs is
hidden from the arch specific code. x86's implementation in particular
is quite heinous, as it unnecessarily propagates the out-param pattern
into kvm_x86_ops.
While the x86 specific issue could be resolved solely by changing
kvm_x86_ops, make the change for all architectures as returning a value
directly is prettier and technically more robust, e.g. s390 doesn't set
the out param, which could lead to subtle breakage in the (highly
unlikely) scenario where the out-param was not pre-initialized by the
caller.
Opportunistically annotate svm_check_processor_compat() with __init.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM_CAP_MAX_VCPU_ID is currently always reporting KVM_MAX_VCPU_ID on all
architectures. However, on s390x, the amount of usable CPUs is determined
during runtime - it is depending on the features of the machine the code
is running on. Since we are using the vcpu_id as an index into the SCA
structures that are defined by the hardware (see e.g. the sca_add_vcpu()
function), it is not only the amount of CPUs that is limited by the hard-
ware, but also the range of IDs that we can use.
Thus KVM_CAP_MAX_VCPU_ID must be determined during runtime on s390x, too.
So the handling of KVM_CAP_MAX_VCPU_ID has to be moved from the common
code into the architecture specific code, and on s390x we have to return
the same value here as for KVM_CAP_MAX_VCPUS.
This problem has been discovered with the kvm_create_max_vcpus selftest.
With this change applied, the selftest now passes on s390x, too.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20190523164309.13345-9-thuth@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Introduce support for using MemoryMapIDs (MMIDs) as an alternative to
Address Space IDs (ASIDs). The major difference between the two is that
MMIDs are global - ie. an MMID uniquely identifies an address space
across all coherent CPUs. In contrast ASIDs are non-global per-CPU IDs,
wherein each address space is allocated a separate ASID for each CPU
upon which it is used. This global namespace allows a new GINVT
instruction be used to globally invalidate TLB entries associated with a
particular MMID across all coherent CPUs in the system, removing the
need for IPIs to invalidate entries with separate ASIDs on each CPU.
The allocation scheme used here is largely borrowed from arm64 (see
arch/arm64/mm/context.c). In essence we maintain a bitmap to track
available MMIDs, and MMIDs in active use at the time of a rollover to a
new MMID version are preserved in the new version. The allocation scheme
requires efficient 64 bit atomics in order to perform reasonably, so
this support depends upon CONFIG_GENERIC_ATOMIC64=n (ie. currently it
will only be included in MIPS64 kernels).
The first, and currently only, available CPU with support for MMIDs is
the MIPS I6500. This CPU supports 16 bit MMIDs, and so for now we cap
our MMIDs to 16 bits wide in order to prevent the bitmap growing to
absurd sizes if any future CPU does implement 32 bit MMIDs as the
architecture manuals suggest is recommended.
When MMIDs are in use we also make use of GINVT instruction which is
available due to the global nature of MMIDs. By executing a sequence of
GINVT & SYNC 0x14 instructions we can avoid the overhead of an IPI to
each remote CPU in many cases. One complication is that GINVT will
invalidate wired entries (in all cases apart from type 0, which targets
the entire TLB). In order to avoid GINVT invalidating any wired TLB
entries we set up, we make sure to create those entries using a reserved
MMID (0) that we never associate with any address space.
Also of note is that KVM will require further work in order to support
MMIDs & GINVT, since KVM is involved in allocating IDs for guests & in
configuring the MMU. That work is not part of this patch, so for now
when MMIDs are in use KVM is disabled.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When manual dirty log reprotect will be enabled, kvm_get_dirty_log_protect's
pointer argument will always be false on exit, because no TLB flush is needed
until the manual re-protection operation. Rename it from "is_dirty" to "flush",
which more accurately tells the caller what they have to do with it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.
The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include <linux/memblock.h>
@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>
[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ARM: lazy context-switching of FPSIMD registers on arm64, "split"
regions for vGIC redistributor
* s390: cleanups for nested, clock handling, crypto, storage keys and
control register bits
* x86: many bugfixes, implement more Hyper-V super powers,
implement lapic_timer_advance_ns even when the LAPIC timer
is emulated using the processor's VMX preemption timer. Two
security-related bugfixes at the top of the branch.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJbH8Z/AAoJEL/70l94x66DF+UIAJeOuTp6LGasT/9uAb2OovaN
+5kGmOPGFwkTcmg8BQHI2fXT4vhxMXWPFcQnyig9eXJVxhuwluXDOH4P9IMay0yw
VDCBsWRdMvZDQad2hn6Z5zR4Jx01XrSaG/KqvXbbDKDCy96mWG7SYAY2m3ZwmeQi
3Pa3O3BTijr7hBYnMhdXGkSn4ZyU8uPaAgIJ8795YKeOJ2JmioGYk6fj6y2WCxA3
ztJymBjTmIoZ/F8bjuVouIyP64xH4q9roAyw4rpu7vnbWGqx1fjPYJoB8yddluWF
JqCPsPzhKDO7mjZJy+lfaxIlzz2BN7tKBNCm88s5GefGXgZwk3ByAq/0GQ2M3rk=
=H5zI
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Small update for KVM:
ARM:
- lazy context-switching of FPSIMD registers on arm64
- "split" regions for vGIC redistributor
s390:
- cleanups for nested
- clock handling
- crypto
- storage keys
- control register bits
x86:
- many bugfixes
- implement more Hyper-V super powers
- implement lapic_timer_advance_ns even when the LAPIC timer is
emulated using the processor's VMX preemption timer.
- two security-related bugfixes at the top of the branch"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (79 commits)
kvm: fix typo in flag name
kvm: x86: use correct privilege level for sgdt/sidt/fxsave/fxrstor access
KVM: x86: pass kvm_vcpu to kvm_read_guest_virt and kvm_write_guest_virt_system
KVM: x86: introduce linear_{read,write}_system
kvm: nVMX: Enforce cpl=0 for VMX instructions
kvm: nVMX: Add support for "VMWRITE to any supported field"
kvm: nVMX: Restrict VMX capability MSR changes
KVM: VMX: Optimize tscdeadline timer latency
KVM: docs: nVMX: Remove known limitations as they do not exist now
KVM: docs: mmu: KVM support exposing SLAT to guests
kvm: no need to check return value of debugfs_create functions
kvm: Make VM ioctl do valloc for some archs
kvm: Change return type to vm_fault_t
KVM: docs: mmu: Fix link to NPT presentation from KVM Forum 2008
kvm: x86: Amend the KVM_GET_SUPPORTED_CPUID API documentation
KVM: x86: hyperv: declare KVM_CAP_HYPERV_TLBFLUSH capability
KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX implementation
KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} implementation
KVM: introduce kvm_make_vcpus_request_mask() API
KVM: x86: hyperv: do rep check for each hypercall separately
...
Use new return type vm_fault_t for fault handler. For
now, this is just documenting that the function returns
a VM_FAULT value rather than an errno. Once all instances
are converted, vm_fault_t will become a distinct type.
commit 1c8f422059 ("mm: change return type to vm_fault_t")
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Trivial fix to spelling mistake in debugfs_entries text.
Fixes: 669e846e6c ("KVM/MIPS32: MIPS arch specific APIs for KVM")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kernel-janitors@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: James Hogan <jhogan@kernel.org>
After the vcpu_load/vcpu_put pushdown, the handling of asynchronous VCPU
ioctl is already much clearer in that it is obvious that they bypass
vcpu_load and vcpu_put.
However, it is still not perfect in that the different state of the VCPU
mutex is still hidden in the caller. Separate those ioctls into a new
function kvm_arch_vcpu_async_ioctl that returns -ENOIOCTLCMD for more
"traditional" synchronous ioctls.
Cc: James Hogan <jhogan@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Suggested-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the calls to vcpu_load() and vcpu_put() in to the architecture
specific implementations of kvm_arch_vcpu_ioctl() which dispatches
further architecture-specific ioctls on to other functions.
Some architectures support asynchronous vcpu ioctls which cannot call
vcpu_load() or take the vcpu->mutex, because that would prevent
concurrent execution with a running VCPU, which is the intended purpose
of these ioctls, for example because they inject interrupts.
We repeat the separate checks for these specifics in the architecture
code for MIPS, S390 and PPC, and avoid taking the vcpu->mutex and
calling vcpu_load for these ioctls.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_set_regs().
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_get_regs().
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_run().
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390 parts
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[Rebased. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that
"any unblocked signal received [...] will cause KVM_RUN to return with
-EINTR" and that "the signal will only be delivered if not blocked by
the original signal mask".
This, however, is only true, when the calling task has a signal handler
registered for a signal. If not, signal evaluation is short-circuited for
SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN
returning or the whole process is terminated.
Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar
to that in do_sigtimedwait() to avoid short-circuiting of signals.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For example, the following could occur, making us miss a wakeup:
CPU0 CPU1
kvm_vcpu_block kvm_mips_comparecount_func
[L] swait_active(&vcpu->wq)
[S] prepare_to_swait(&vcpu->wq)
[L] if (!kvm_vcpu_has_pending_timer(vcpu))
schedule() [S] queue_timer_int(vcpu)
Ensure that the swait_active() check is not hoisted over the interrupt.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If a vcpu exits due to request a user mode spinlock, then
the spinlock-holder may be preempted in user mode or kernel mode.
(Note that not all architectures trap spin loops in user mode,
only AMD x86 and ARM/ARM64 currently do).
But if a vcpu exits in kernel mode, then the holder must be
preempted in kernel mode, so we should choose a vcpu in kernel mode
as a more likely candidate for the lock holder.
This introduces kvm_arch_vcpu_in_kernel() to decide whether the
vcpu is in kernel-mode when it's preempted. kvm_vcpu_on_spin's
new argument says the same of the spinning VCPU.
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove code from architecture files that can be moved to virt/kvm, since there
is already common code for coalesced MMIO.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[Removed a pointless 'break' after 'return'.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Create a trace event for guest mode changes, and enable VZ's
GuestCtl0.MC bit after the trace event is enabled to trap all guest mode
changes.
The MC bit causes Guest Hardware Field Change (GHFC) exceptions whenever
a guest mode change occurs (such as an exception entry or return from
exception), so we need to handle this exception now. The MC bit is only
enabled when restoring register state, so enabling the trace event won't
take immediate effect.
Tracing guest mode changes can be particularly handy when trying to work
out what a guest OS gets up to before something goes wrong, especially
if the problem occurs as a result of some previous guest userland
exception which would otherwise be invisible in the trace.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Transfer timer state to the VZ guest context (CP0_GTOffset & guest
CP0_Count) when entering guest mode, enabling direct guest access to it,
and transfer back to soft timer when saving guest register state.
This usually allows guest code to directly read CP0_Count (via MFC0 and
RDHWR) and read/write CP0_Compare, without trapping to the hypervisor
for it to emulate the guest timer. Writing to CP0_Count or CP0_Cause.DC
is much less common and still triggers a hypervisor GPSI exception, in
which case the timer state is transferred back to an hrtimer before
emulating the write.
We are careful to prevent small amounts of drift from building up due to
undeterministic time intervals between reading of the ktime and reading
of CP0_Count. Some drift is expected however, since the system
clocksource may use a different timer to the local CP0_Count timer used
by VZ. This is permitted to prevent guest CP0_Count from appearing to go
backwards.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add the main support for the MIPS Virtualization ASE (A.K.A. VZ) to MIPS
KVM. The bulk of this work is in vz.c, with various new state and
definitions elsewhere.
Enough is implemented to be able to run on a minimal VZ core. Further
patches will fill out support for guest features which are optional or
can be disabled.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: linux-doc@vger.kernel.org
The general guest exit handler needs a few tweaks for VZ compared to
trap & emulate, which for now are made directly depending on
CONFIG_KVM_MIPS_VZ:
- There is no need to re-enable the hardware page table walker (HTW), as
it can be left enabled during guest mode operation with VZ.
- There is no need to perform a privilege check, as any guest privilege
violations should have already been detected by the hardware and
triggered the appropriate guest exception.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Update MIPS KVM entry code to support VZ:
- We need to set GuestCtl0.GM while in guest mode.
- For cores supporting GuestID, we need to set the root GuestID to
match the main GuestID while in guest mode so that the root TLB
refill handler writes the correct GuestID into the TLB.
- For cores without GuestID where the root ASID dealiases RVA/GPA
mappings, we need to load that ASID from the gpa_mm rather than the
per-VCPU guest_kernel_mm or guest_user_mm, since the root TLB maps
guest physical addresses. We also need to restore the normal process
ASID on exit.
- The normal linux process pgd needs restoring on exit, as we can't
leave the GPA mappings active for kernel code.
- GuestCtl0 needs saving on exit for the GExcCode field, as it may be
clobbered if a preemption occurs.
We also need to move the TLB refill handler to the XTLB vector at offset
0x80 on 64-bit VZ kernels, as hardware will use Root.Status.KX to
determine whether a TLB refill or XTLB Refill exception is to be taken
on a root TLB miss from guest mode, and KX needs to be set for kernel
code to be able to access the 64-bit segments.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Abstract the MIPS KVM guest CP0 register access macros into inline
functions which are generated by macros. This allows them to be
generated differently for VZ, where they will usually need to access the
hardware guest CP0 context rather than the saved values in RAM.
Accessors for each individual register are generated using these macros:
- __BUILD_KVM_*_SW() for registers which are not present in the VZ
hardware guest context, so kvm_{read,write}_c0_guest_##name() will
access the saved value in RAM regardless of whether VZ is enabled.
- __BUILD_KVM_*_HW() for registers which are present in the VZ hardware
guest context, so kvm_{read,write}_c0_guest_##name() will access the
hardware register when VZ is enabled.
These build the underlying accessors using further macros:
- __BUILD_KVM_*_SAVED() builds e.g. kvm_{read,write}_sw_gc0_##name()
functions for accessing the saved versions of the registers in RAM.
This is used for implementing the common
kvm_{read,write}_c0_guest_##name() accessors with T&E where registers
are always stored in RAM, but are also available with VZ HW registers
to allow them to be accessed while saved.
- __BUILD_KVM_*_VZ() builds e.g. kvm_{read,write}_vz_gc0_##name()
functions for accessing the VZ hardware guest context registers
directly. This is used for implementing the common
kvm_{read,write}_c0_guest_##name() accessors with VZ.
- __BUILD_KVM_*_WRAP() builds wrappers with different names, which
allows the common kvm_{read,write}_c0_guest_##name() functions to be
implemented using the VZ accessors while still having the SAVED
accessors available too.
- __BUILD_KVM_SAVE_VZ() builds functions for saving and restoring VZ
hardware guest context register state to RAM, improving conciseness
of VZ context saving and restoring.
Similar macros exist for generating modifiers (set, clear, change),
either with a normal unlocked read/modify/write, or using atomic LL/SC
sequences.
These changes change the types of 32-bit registers to u32 instead of
unsigned long, which requires some changes to printk() functions in MIPS
KVM.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add a callback for MIPS KVM implementations to handle the VZ guest
exit exception. Currently the trap & emulate implementation contains a
stub which reports an internal error, but the callback will be used
properly by the VZ implementation.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add an implementation callback for the kvm_arch_hardware_enable() and
kvm_arch_hardware_disable() architecture functions, with simple stubs
for trap & emulate. This is in preparation for VZ which will make use of
them.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add an implementation callback for checking presence of KVM extensions.
This allows implementation specific extensions to be provided without
ifdefs in mips.c.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Currently the software emulated timer is initialised to a frequency of
100MHz by kvm_mips_init_count(), but this isn't suitable for VZ where
the frequency of the guest timer matches that of the host.
Add a count_hz argument so the caller can specify the default frequency,
and move the call from kvm_arch_vcpu_create() to the implementation
specific vcpu_setup() callback, so that VZ can specify a different
frequency.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add new KVM_CAP_MIPS_VZ and KVM_CAP_MIPS_TE capabilities, and in order
to allow MIPS KVM to support VZ without confusing old users (which
expect the trap & emulate implementation), define and start checking
KVM_CREATE_VM type codes.
The codes available are:
- KVM_VM_MIPS_TE = 0
This is the current value expected from the user, and will create a
VM using trap & emulate in user mode, confined to the user mode
address space. This may in future become unavailable if the kernel is
only configured to support VZ, in which case the EINVAL error will be
returned and KVM_CAP_MIPS_TE won't be available even though
KVM_CAP_MIPS_VZ is.
- KVM_VM_MIPS_VZ = 1
This can be provided when the KVM_CAP_MIPS_VZ capability is available
to create a VM using VZ, with a fully virtualized guest virtual
address space. If VZ support is unavailable in the kernel, the EINVAL
error will be returned (although old kernels without the
KVM_CAP_MIPS_VZ capability may well succeed and create a trap &
emulate VM).
This is designed to allow the desired implementation (T&E vs VZ) to be
potentially chosen at runtime rather than being fixed in the kernel
configuration.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Extend MIPS KVM stats counters and kvm_transition trace event codes to
cover hypervisor exceptions, which have their own GExcCode field in
CP0_GuestCtl0 with up to 32 hypervisor exception cause codes.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Update the implementation of kvm_lose_fpu() for VZ, where there is no
need to enable the FPU/MSA in the root context if the FPU/MSA state is
loaded but disabled in the guest context.
The trap & emulate implementation needs to disable FPU/MSA in the root
context when the guest disables them in order to catch the COP1 unusable
or MSA disabled exception when they're used and pass it on to the guest.
For VZ however as long as the context is loaded and enabled in the root
context, the guest can enable and disable it in the guest context
without the hypervisor having to do much, and will take guest exceptions
without hypervisor intervention if used without being enabled in the
guest context.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Fix up affected files that include this signal functionality via sched.h.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
a VCPU out of KVM_RUN through a POSIX signal. A signal is attached
to a dummy signal handler; by blocking the signal outside KVM_RUN and
unblocking it inside, this possible race is closed:
VCPU thread service thread
--------------------------------------------------------------
check flag
set flag
raise signal
(signal handler does nothing)
KVM_RUN
However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
tsk->sighand->siglock on every KVM_RUN. This lock is often on a
remote NUMA node, because it is on the node of a thread's creator.
Taking this lock can be very expensive if there are many userspace
exits (as is the case for SMP Windows VMs without Hyper-V reference
time counter).
As an alternative, we can put the flag directly in kvm_run so that
KVM can see it:
VCPU thread service thread
--------------------------------------------------------------
raise signal
signal handler
set run->immediate_exit
KVM_RUN
check run->immediate_exit
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Increase the maximum number of MIPS KVM VCPUs to 8, and implement the
KVM_CAP_NR_VCPUS and KVM_CAP_MAX_CPUS capabilities which expose the
recommended and maximum number of VCPUs to userland. The previous
maximum of 1 didn't allow for any form of SMP guests.
We calculate the values similarly to ARM, recommending as many VCPUs as
there are CPUs online in the system. This will allow userland to know
how many VCPUs it is possible to create.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Access to various CP0 registers via the KVM register access API needs to
be implementation specific to allow restrictions to be made on changes,
for example when VZ guest registers aren't present, so move them all
into trap_emul.c in preparation for VZ.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Now that load/store faults due to read only memory regions are treated
as MMIO accesses it is safe to claim support for read only memory
regions (KVM_CAP_READONLY_MEM).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Implement the SYNC_MMU capability for KVM MIPS, allowing changes in the
underlying user host virtual address (HVA) mappings to be promptly
reflected in the corresponding guest physical address (GPA) mappings.
This allows for several features to work with guest RAM which require
mappings to be altered or protected, such as copy-on-write, KSM (Kernel
Samepage Merging), idle page tracking, memory swapping, and guest memory
ballooning.
There are two main aspects of this change, described below.
The KVM MMU notifier architecture callbacks are implemented so we can be
notified of changes in the HVA mappings. These arrange for the guest
physical address (GPA) page tables to be modified and possibly for
derived mappings (GVA page tables and TLBs) to be flushed.
- kvm_unmap_hva[_range]() - These deal with HVA mappings being removed,
for example before a copy-on-write takes place, which requires the
corresponding GPA page table mappings to be removed too.
- kvm_set_spte_hva() - These update a GPA page table entry to match the
new HVA entry, but must be careful to respect KVM specific
configuration such as not dirtying a clean guest page which is dirty
to the host, and write protecting writable pages in read only
memslots (which will soon be supported).
- kvm[_test]_age_hva() - These update GPA page table entries to be old
(invalid) so that access can be tracked, making them young again.
The GPA page fault handling (kvm_mips_map_page) is updated to use
gfn_to_pfn_prot() (which may provide read-only pages), to handle
asynchronous page table invalidation from MMU notifier callbacks, and to
handle more cases in the fast path.
- mmu_notifier_seq is used to detect asynchronous page table
invalidations while we're holding a pfn from gfn_to_pfn_prot()
outside of kvm->mmu_lock, retrying if invalidations have taken place,
e.g. a COW or a KSM page merge.
- The fast path (_kvm_mips_map_page_fast) now handles marking old pages
as young / accessed, and disallowing dirtying of clean pages that
aren't actually writable (e.g. shared pages that should COW, and
read-only memory regions when they are enabled in a future patch).
- Due to the use of MMU notifications we no longer need to keep the
page references after we've updated the GPA page tables.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
When an existing memory region has dirty page logging enabled, make the
entire slot clean (read only) so that writes will immediately start
logging dirty pages (once the dirty bit is transferred from GPA to GVA
page tables in an upcoming patch).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
MIPS hasn't up to this point properly supported dirty page logging, as
pages in slots with dirty logging enabled aren't made clean, and tlbmod
exceptions from writes to clean pages have been assumed to be due to
guest TLB protection and unconditionally passed to the guest.
Use the generic dirty logging helper kvm_get_dirty_log_protect() to
properly implement kvm_vm_ioctl_get_dirty_log(), similar to how ARM
does. This uses xchg to clear the dirty bits when reading them, rather
than wiping them out afterwards with a memset, which would potentially
wipe recently set bits that weren't caught by kvm_get_dirty_log(). It
also makes the pages clean again using the
kvm_arch_mmu_enable_log_dirty_pt_masked() architecture callback so that
further writes after the shadow memslot is flushed will trigger tlbmod
exceptions and dirty handling.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Implement the kvm_arch_flush_shadow_all() and
kvm_arch_flush_shadow_memslot() KVM functions for MIPS to allow guest
physical mappings to be safely changed.
The general MIPS KVM code takes care of flushing of GPA page table
entries. kvm_arch_flush_shadow_all() flushes the whole GPA page table,
and is always called on the cleanup path so there is no need to acquire
the kvm->mmu_lock. kvm_arch_flush_shadow_memslot() flushes only the
range of mappings in the GPA page table corresponding to the slot being
flushed, and happens when memory regions are moved or deleted.
MIPS KVM implementation callbacks are added for handling the
implementation specific flushing of mappings derived from the GPA page
tables. These are implemented for trap_emul.c using
kvm_flush_remote_tlbs() which should now be functional, and will flush
the per-VCPU GVA page tables and ASIDS synchronously (before next
entering guest mode or directly accessing GVA space).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Keep the vcpu->mode and vcpu->cpu variables up to date so that
kvm_make_all_cpus_request() has a chance of functioning correctly. This
will soon need to be used for kvm_flush_remote_tlbs().
We can easily update vcpu->cpu when the VCPU context is loaded or saved,
which will happen when accessing guest context and when the guest is
scheduled in and out.
We need to be a little careful with vcpu->mode though, as we will in
future be checking for outstanding VCPU requests, and this must be done
after the value of IN_GUEST_MODE in vcpu->mode is visible to other CPUs.
Otherwise the other CPU could fail to trigger an IPI to wait for
completion dispite the VCPU request not being seen.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Current guest physical memory is mapped to host physical addresses using
a single linear array (guest_pmap of length guest_pmap_npages). This was
only really meant to be temporary, and isn't sparse, so its wasteful of
memory. A small amount of RAM at GPA 0 and a small boot exception vector
at GPA 0x1fc00000 cannot be represented without a full 128KiB guest_pmap
allocation (MIPS32 with 16KiB pages), which is one reason why QEMU
currently runs its boot code at the top of RAM instead of the usual boot
exception vector address.
Instead use the existing infrastructure for host virtual page table
management to allocate a page table for guest physical memory too. This
should be sufficient for now, assuming the size of physical memory
doesn't exceed the size of virtual memory. It may need extending in
future to handle XPA (eXtended Physical Addressing) in 32-bit guests, as
supported by VZ guests on P5600.
Some of this code is based loosely on Cavium's VZ KVM implementation.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
When exiting from the guest, store the values of the CP0_BadInstr and
CP0_BadInstrP registers if they exist, which contain the encodings of
the instructions which caused the last synchronous exception.
When the instruction is needed for emulation, kvm_get_badinstr() and
kvm_get_badinstrp() are used instead of calling kvm_get_inst() directly,
to decide whether to read the saved CP0_BadInstr/CP0_BadInstrP registers
(if they exist), or read the instruction from memory (if not).
The use of these registers should be more robust than using
kvm_get_inst(), as it actually gives the instruction encoding seen by
the hardware rather than relying on user accessors after the fact, which
can be fooled by incoherent icache or a racing code modification. It
will also work with VZ, where the guest virtual memory isn't directly
accessible by the host with user accessors.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Currently kvm_get_inst() returns KVM_INVALID_INST in the event of a
fault reading the guest instruction. This has the rather arbitrary magic
value 0xdeadbeef. This API isn't very robust, and in fact 0xdeadbeef is
a valid MIPS64 instruction encoding, namely "ld t1,-16657(s5)".
Therefore change the kvm_get_inst() API to return 0 or -EFAULT, and to
return the instruction via a u32 *out argument. We can then drop the
KVM_INVALID_INST definition entirely.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Now that the commpage doesn't use wired TLB entries, the per-CPU
vm_init() callback is the only work done by kvm_mips_init_vm_percpu().
The trap & emulate implementation doesn't actually need to do anything
from vm_init(), and the future VZ implementation would be better served
by a kvm_arch_hardware_enable callback anyway.
Therefore drop the vm_init() callback entirely, allowing the
kvm_mips_init_vm_percpu() function to also be dropped, along with the
kvm_mips_instance atomic counter.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Now that we have GVA page tables and an optimised TLB refill handler in
place, convert the handling of commpage faults from the guest kernel to
fill the GVA page table and invalidate the TLB entry, rather than
filling the wired TLB entry directly.
For simplicity we no longer use a wired entry for the commpage (refill
should be much cheaper with the fast-path handler anyway). Since we
don't need to manipulate the TLB directly any longer, move the function
from tlb.c to mmu.c. This puts it closer to the similar functions
handling KSeg0 and TLB mapped page faults from the guest.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Implement invalidation of specific pairs of GVA page table entries in
one or both of the GVA page tables. This is used when existing mappings
are replaced in the guest TLB by emulated TLBWI/TLBWR instructions. Due
to the sharing of page tables in the host kernel range, we should be
careful not to allow host pages to be invalidated.
Add a helper kvm_mips_walk_pgd() which can be used when walking of
either GPA (future patches) or GVA page tables is needed, optionally
with allocation of page tables along the way when they don't exist.
GPA page table walking will need to be protected by the kvm->mmu_lock,
so we also add a small MMU page cache in each KVM VCPU, like that found
for other architectures but smaller. This allows enough pages to be
pre-allocated to handle a single fault without holding the lock,
allowing the helper to run with the lock held without having to handle
allocation failures.
Using the same mechanism for GVA allows the same code to be used, and
allows it to use the same cache of allocated pages if the GPA walk
didn't need to allocate any new tables.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Use functions from the general MIPS TLB exception vector generation code
(tlbex.c) to construct a fast path TLB refill handler similar to the
general one, but cut down and capable of preserving K0 and K1.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Allocate GVA -> HPA page tables for guest kernel and guest user mode on
each VCPU, to allow for fast path TLB refill handling to be added later.
In the process kvm_arch_vcpu_init() needs updating to pass on any error
from the vcpu_init() callback.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Wire up a vcpu uninit implementation callback. This will be used for the
clean up of GVA->HPA page tables.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add implementation callbacks for entering the guest (vcpu_run()) and
reentering the guest (vcpu_reenter()), allowing implementation specific
operations to be performed before entering the guest or after returning
to the host without cluttering kvm_arch_vcpu_ioctl_run().
This allows the T&E specific lazy user GVA flush to be moved into
trap_emul.c, along with disabling of the HTW. We also move
kvm_mips_deliver_interrupts() as VZ will need to restore the guest timer
state prior to delivering interrupts.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
The kvm_vcpu_arch structure contains both mm_structs for allocating MMU
contexts (primarily the ASID) but it also copies the resulting ASIDs
into guest_{user,kernel}_asid[] arrays which are referenced from uasm
generated code.
This duplication doesn't seem to serve any purpose, and it gets in the
way of generalising the ASID handling across guest kernel/user modes, so
lets just extract the ASID straight out of the mm_struct on demand, and
in fact there are convenient cpu_context() and cpu_asid() macros for
doing so.
To reduce the verbosity of this code we do also add kern_mm and user_mm
local variables where the kernel and user mm_structs are used.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
MIPS incompletely implements the KVM_NMI ioctl to supposedly perform a
CPU reset, but all it actually does is invalidate the ASIDs. It doesn't
expose the KVM_CAP_USER_NMI capability which is supposed to indicate the
presence of the KVM_NMI ioctl, and no user software actually uses it on
MIPS.
Since this is dead code that would technically need updating for GVA
page table handling in upcoming patches, remove it now. If we wanted to
implement NMI injection later it can always be done properly along with
the KVM_CAP_USER_NMI capability, and if we wanted to implement a proper
CPU reset it would be better done with a separate ioctl.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
* Return directly after a call of the function "copy_from_user" failed
in a case block.
* Delete the jump label "out" which became unnecessary with
this refactoring.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Flush the KVM entry code from the icache on all CPUs, not just the one
that built the entry code.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.16.x-
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
kvm_mips_check_asids() runs before entering the guest and performs lazy
regeneration of host ASID for guest usermode, using last_user_gasid to
track the last guest ASID in the VCPU that was used by guest usermode on
any host CPU.
last_user_gasid is reset after performing the lazy ASID regeneration on
the current CPU, and by kvm_arch_vcpu_load() if the host ASID for guest
usermode is regenerated due to staleness (to cancel outstanding lazy
ASID regenerations). Unfortunately neither case handles SMP hosts
correctly:
- When the lazy ASID regeneration is performed it should apply to all
CPUs (as last_user_gasid does), so reset the ASID on other CPUs to
zero to trigger regeneration when the VCPU is next loaded on those
CPUs.
- When the ASID is found to be stale on the current CPU, we should not
cancel lazy ASID regenerations globally, so drop the reset of
last_user_gasid altogether here.
Both cases would require a guest ASID change and two host CPU migrations
(and in the latter case one of the CPUs to start a new ASID cycle)
before guest usermode could potentially access stale user pages from a
previously running ASID in the same VCPU.
Fixes: 25b08c7fb0 ("KVM: MIPS: Invalidate TLB by regenerating ASIDs")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
MIPS KVM uses user memory accessors but mips.c doesn't directly include
uaccess.h, so include it now.
This wasn't too much of a problem before v4.9-rc1 as asm/module.h
included asm/uaccess.h, however since commit 29abfbd9cb ("mips:
separate extable.h, switch module.h to it") this is no longer the case.
This resulted in build failures when trace points were disabled, as
trace/define_trace.h includes trace/trace_events.h only ifdef
TRACEPOINTS_ENABLED, which goes on to include asm/uaccess.h via a couple
of other headers.
Fixes: 29abfbd9cb ("mips: separate extable.h, switch module.h to it")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Invalidate host TLB mappings when the guest ASID is changed by
regenerating ASIDs, rather than flushing the entire host TLB except
entries in the guest KSeg0 range.
For the guest kernel mode ASID we regenerate on the spot when the guest
ASID is changed, as that will always take place while the guest is in
kernel mode.
However when the guest invalidates TLB entries the ASID will often by
changed temporarily as part of writing EntryHi without the guest
returning to user mode in between. We therefore regenerate the user mode
ASID lazily before entering the guest in user mode, if and only if the
guest ASID has actually changed since the last guest user mode entry.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Two stubs are added:
o kvm_arch_has_vcpu_debugfs(): must return true if the arch
supports creating debugfs entries in the vcpu debugfs dir
(which will be implemented by the next commit)
o kvm_arch_create_vcpu_debugfs(): code that creates debugfs
entries in the vcpu debugfs dir
For x86, this commit introduces a new file to avoid growing
arch/x86/kvm/x86.c even more.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fail if the address of the allocated exception base doesn't fit into the
CP0_EBase register. This can happen on MIPS64 if CP0_EBase.WG isn't
implemented but RAM is available outside of the range of KSeg0.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
MIPSr6 doesn't have lo/hi registers, so don't bother saving or
restoring them, and don't expose them to userland with the KVM ioctl
interface either.
In fact the lo/hi registers aren't callee saved in the MIPS ABIs anyway,
so there is no need to preserve the host lo/hi values at all when
transitioning to and from the guest (which happens via a function call).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use a relative branch to get from the individual exception vectors to
the common guest exit handler, rather than loading the address of the
exit handler and jumping to it.
This is made easier due to the fact we are now generating the entry code
dynamically. This will also allow the exception code to be further
reduced in future patches.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Scratch cop0 registers are needed by KVM to be able to save/restore all
the GPRs, including k0/k1, and for storing the VCPU pointer. However no
registers are universally suitable for these purposes, so the decision
should be made at runtime.
Until now, we've used DDATA_LO to store the VCPU pointer, and ErrorEPC
as a temporary. It could be argued that this is abuse of those
registers, and DDATA_LO is known not to be usable on certain
implementations (Cavium Octeon). If KScratch registers are present, use
them instead.
We save & restore the temporary register in addition to the VCPU pointer
register when using a KScratch register for it, as it may be used for
normal host TLB handling too.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Dump the generated entry code with pr_debug(), similar to how it is done
in tlbex.c, so it can be more easily debugged.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Convert the whole of locore.S (assembly to enter guest and handle
exception entry) to be generated dynamically with uasm. This is done
with minimal changes to the resulting code.
The main changes are:
- Some constants are generated by uasm using LUI+ADDIU instead of
LUI+ORI.
- Loading of lo and hi are swapped around in vcpu_run but not when
resuming the guest after an exit. Both bits of logic are now generated
by the same code.
- Register MOVEs in uasm use different ADDU operand ordering to GNU as,
putting zero register into rs instead of rt.
- The JALR.HB to call the C exit handler is switched to JALR, since the
hazard barrier would appear to be unnecessary.
This will allow further optimisation in the future to dynamically handle
the capabilities of the CPU.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use the functions from context_tracking.h directly.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Allow up to 6 KVM guest KScratch registers to be enabled and accessed
via the KVM guest register API and from the guest itself (the fallback
reading and writing of commpage registers is sufficient for KScratch
registers to work as expected).
User mode can expose the registers by setting the appropriate bits of
the guest Config4.KScrExist field. KScratch registers that aren't usable
won't be writeable via the KVM Ioctl API.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make KVM_GET_REG_LIST list FPU & MSA registers. Specifically we list all
32 vector registers when MSA can be enabled, 32 single-precision FP
registers when FPU can be enabled, and either 16 or 32 double-precision
FP registers when FPU can be enabled depending on whether FR mode is
supported (which provides 32 doubles instead of 16 even doubles).
Note, these registers may still be inaccessible depending on the current
FP mode of the guest.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make the implementation of KVM_GET_REG_LIST more dynamic so that only
the subset of registers actually available can be exposed to user mode.
This is important for VZ where some of the guest register state may not
be possible to prevent the guest from accessing, therefore the user
process may need to be aware of the state even if it doesn't understand
what the state is for.
This also allows different MIPS KVM implementations to provide different
registers to one another, by way of new num_regs(vcpu) and
copy_reg_indices(vcpu, indices) callback functions, currently just
stubbed for trap & emulate.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pass all unrecognised register IDs through to the set_one_reg() and
get_one_reg() callbacks, not just select ones. This allows
implementation specific registers to be more easily added without having
to modify arch/mips/kvm/mips.c.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a few trace events for entering and coming out of guest mode, as well
as re-entering it from a guest exit exception.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Clean up the MIPS kvm_exit trace event so that the exit reasons are
specified in a trace friendly way (via __print_symbolic), and so that
the exit reasons that derive straight from Cause.ExcCode values map
directly, allowing a single trace_kvm_exit() call to replace a bunch of
individual ones.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a MIPS specific trace event for auxiliary context operations
(notably FPU and MSA). Unfortunately the generic kvm_fpu trace event
isn't flexible enough to handle the range of interesting things that can
happen with FPU and MSA context.
The type of state being operated on is traced:
- FPU: Just the FPU registers.
- MSA: Just the upper half of the MSA vector registers (low half already
loaded with FPU state).
- FPU & MSA: Full MSA vector state (includes FPU state).
As is the type of operation:
- Restore: State was enabled and restored.
- Save: State was saved and disabled.
- Enable: State was enabled (already loaded).
- Disable: State was disabled (kept loaded).
- Discard: State was discarded and disabled.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
[Fix remaining occurrence of "fpu_msa", change to "aux". - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rename fpu_inuse and the related definitions to aux_inuse so it can be
used for lazy context management of other auxiliary processor state too,
such as VZ guest timer, watchpoints and performance counters.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The host kernel's exception vector base address is currently saved in
the VCPU structure at creation time, and restored on a guest exit.
However it doesn't change and can already be easily accessed from the
'ebase' variable (arch/mips/kernel/traps.c), so drop the host_ebase
member of kvm_vcpu_arch, export the 'ebase' variable to modules and load
from there instead.
This does result in a single extra instruction (lui) on the guest exit
path, but simplifies the code a bit and removes the redundant storage of
the host exception base address.
Credit for the idea goes to Cavium's VZ KVM implementation.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Several KVM module functions are indirected so that they can be accessed
from tlb.c which is statically built into the kernel. This is no longer
necessary as the relevant bits of code have moved into mmu.c which is
part of the KVM module, so drop the indirections.
Note: is_error_pfn() is defined inline in kvm_host.h, so didn't actually
require the KVM module to be loaded for it to work anyway.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Convert the MIPS KVM C code to use standard kernel sized types (e.g.
u32) instead of inttypes.h style ones (e.g. uint32_t) or other types as
appropriate.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Copy __kvm_mips_vcpu_run() into unmapped memory, so that we can never
get a TLB refill exception in it when KVM is built as a module.
This was observed to happen with the host MIPS kernel running under
QEMU, due to a not entirely transparent optimisation in the QEMU TLB
handling where TLB entries replaced with TLBWR are copied to a separate
part of the TLB array. Code in those pages continue to be executable,
but those mappings persist only until the next ASID switch, even if they
are marked global.
An ASID switch happens in __kvm_mips_vcpu_run() at exception level after
switching to the guest exception base. Subsequent TLB mapped kernel
instructions just prior to switching to the guest trigger a TLB refill
exception, which enters the guest exception handlers without updating
EPC. This appears as a guest triggered TLB refill on a host kernel
mapped (host KSeg2) address, which is not handled correctly as user
(guest) mode accesses to kernel (host) segments always generate address
error exceptions.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.10.x-
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some wakeups should not be considered a sucessful poll. For example on
s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
would be considered runnable - letting all vCPUs poll all the time for
transactional like workload, even if one vCPU would be enough.
This can result in huge CPU usage for large guests.
This patch lets architectures provide a way to qualify wakeups if they
should be considered a good/bad wakeups in regard to polls.
For s390 the implementation will fence of halt polling for anything but
known good, single vCPU events. The s390 implementation for floating
interrupts does a wakeup for one vCPU, but the interrupt will be delivered
by whatever CPU checks first for a pending interrupt. We prefer the
woken up CPU by marking the poll of this CPU as "good" poll.
This code will also mark several other wakeup reasons like IPI or
expired timers as "good". This will of course also mark some events as
not sucessful. As KVM on z runs always as a 2nd level hypervisor,
we prefer to not poll, unless we are really sure, though.
This patch successfully limits the CPU usage for cases like uperf 1byte
transactional ping pong workload or wakeup heavy workload like OLTP
while still providing a proper speedup.
This also introduced a new vcpu stat "halt_poll_no_tuning" that marks
wakeups that are considered not good for polling.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
Cc: David Matlack <dmatlack@google.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
[Rename config symbol. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add the necessary hazard barriers after disabling the FPU in
kvm_lose_fpu(), just to be safe.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim KrÄmář" <rkrcmar@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reading the KVM_CAP_MIPS_FPU capability returns cpu_has_fpu, however
this uses smp_processor_id() to read the current CPU capabilities (since
some old MIPS systems could have FPUs present on only a subset of CPUs).
We don't support any such systems, so work around the warning by using
raw_cpu_has_fpu instead.
We should probably instead claim not to support FPU at all if any one
CPU is lacking an FPU, but this should do for now.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim KrÄmář" <rkrcmar@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>