- Fix a relatively benign off-by-one error when splitting huge pages during
CLEAR_DIRTY_LOG.
- Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf
TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE.
- Relax the TDP MMU's lockdep assertions related to holding mmu_lock for read
versus write so that KVM doesn't pass "bool shared" all over the place just
to have precise assertions in paths that don't actually care about whether
the caller is a reader or a writer.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmWW/W4SHHNlYW5qY0Bn
b29nbGUuY29tAAoJEGCRIgFNDBL5pAQP/27OuHVhtcao9m8GAsm55tMEIDh2n6Ih
Yb74SOQhprbaGWyx/J6QdKABzcQX86L3CvCMSy6Ssoz2FeDM8Qs2a80/0jkK0qtI
TPXnOsvxh/MgEsC55LOSJZeWM+xwEqaTA2wuPAbvAcJLdAZsSLQQV1XwH6nl4l0l
LhcdkUeAKqQ8DbeBvTDyz80zuWaKhQMm5DJOBM6HbqaAdmMpw/hPDtu9b654jC2R
Z6Rs5590OFrR13cBAvgCBayvb54pjW9dbm8lfZ4Grcq1I+C0I7Y5jF+DYFjKuTUi
N7t+3HrpAM54DWwD95qkTO73g8tMSvwKGKq+OhylZrbPjeqgyYApcUvbxgKmhgGu
hBRoAS9D2goJQtFv4TehpAiJIoR6YGX5cCyIIFI50EfsWkifED/b7zUBf2jg0UnV
Z7IzvGLQtMAHj6daDcgXxaLVAXQ3v4aQK7DiPuCIiFGcXvedgcGIFjZH4NaVvK2D
Y9ygO8PSbN0MbGUSkaS8Urrxmx7l1tv05cardxjSVxnKEx01nYugB0vTM0/o7gSq
Bs2gnf+UJrazeabqFuVBYzzuHH7QyjsiQQ0HWoQiq04v8+C5VldwQKk+g8vxLXG1
iYNXW7qgueLxYxRNleLiyfw1HIyspR09d6e67n2729Xxtay0cvL8dC68AhEenb+N
fZ2lKFhnZ2Vs
=3f2t
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-mmu-6.8' of https://github.com/kvm-x86/linux into HEAD
KVM x86 MMU changes for 6.8:
- Fix a relatively benign off-by-one error when splitting huge pages during
CLEAR_DIRTY_LOG.
- Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf
TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE.
- Relax the TDP MMU's lockdep assertions related to holding mmu_lock for read
versus write so that KVM doesn't pass "bool shared" all over the place just
to have precise assertions in paths that don't actually care about whether
the caller is a reader or a writer.
To workaround Xen guests that don't expect Xen PV clocks to be marked as being
based on a stable TSC, add a Xen config knob to allow userspace to opt out of
KVM setting the "TSC stable" bit in Xen PV clocks. Note, the "TSC stable" bit
was added to the PVCLOCK ABI by KVM without an ack from Xen, i.e. KVM isn't
entirely blameless for the buggy guest behavior.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmWXASsSHHNlYW5qY0Bn
b29nbGUuY29tAAoJEGCRIgFNDBL5R54P/iQPQBs4dJmNkPiA6uSq1O5/8hN4P59z
aapJNgDiny/D9/zPbOxGWR31W7lvCgiES/lp3KcHZmwbeAwJpdT6a0cJWGRlGuov
gccK8AoYcnwSU98sPisnFv7dJ66ogJfXVkPKKaWo+zVW53XUq2XpIie4eWaOweBt
QsXpTGYpGajv1Bf/MgRtNtlkVAo1w8XL1L0NWRugzCk2CAYezz8IT1874GNZoJbd
GJfVP+76FdNw+4/CxiaBwxP0gHfBIiAsJzGqbmMPhGG2xJn+KGs5FTEf37Pta8cl
aMHAq6/JAoabJfP39MexVkopMaFlPbDwIWfkLWf6wSP86KHei+t9kLC0E4/R2NJ+
GKlrBB6Gj+gzFR4fZ75hIwS/4REMt6zVCbS7uSRrCduqrlEFcY5ED2NesoL9wZrB
WMDIxIGIVDdRxc9WLypKmBj7KTgL0qXBxnsAcPiDRf1sk6SGajkesWxA1C1Nzo/H
yNfqq0gjdPZVB2RIGN6DpWQFu3d+ZQnG2ToKIBW7OkvJ5USYiDSo4VozhESgYHRZ
UJDhJ73QYESynClP6ST+9cxNof3FXCEPDeKr5NcmjVZxlJcdeUDNRqv0LUxQ56BI
FvHMHtSs4WLYHZZVzsdh+Yhnc9rEGfoL0NwDPBCcOXjuNMvNQmuzSldc/VDGm/qt
sCtxYMms5n7u
=3v8F
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-xen-6.8' of https://github.com/kvm-x86/linux into HEAD
KVM Xen change for 6.8:
To workaround Xen guests that don't expect Xen PV clocks to be marked as being
based on a stable TSC, add a Xen config knob to allow userspace to opt out of
KVM setting the "TSC stable" bit in Xen PV clocks. Note, the "TSC stable" bit
was added to the PVCLOCK ABI by KVM without an ack from Xen, i.e. KVM isn't
entirely blameless for the buggy guest behavior.
The deprecated interfaces were removed 15 years ago. KVM's
device assignment was deprecated in 4.2 and removed 6.5 years
ago; the only interest might be in compiling ancient versions
of QEMU, but QEMU has been using its own imported copy of the
kernel headers since June 2011. So again we go into archaeology
territory; just remove the cruft.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Unless explicitly told to do so (by passing 'clocksource=tsc' and
'tsc=stable:socket', and then jumping through some hoops concerning
potential CPU hotplug) Xen will never use TSC as its clocksource.
Hence, by default, a Xen guest will not see PVCLOCK_TSC_STABLE_BIT set
in either the primary or secondary pvclock memory areas. This has
led to bugs in some guest kernels which only become evident if
PVCLOCK_TSC_STABLE_BIT *is* set in the pvclocks. Hence, to support
such guests, give the VMM a new Xen HVM config flag to tell KVM to
forcibly clear the bit in the Xen pvclocks.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20231102162128.2353459-1-paul@xen.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
It is cheap to take tdp_mmu_pages_lock in all write-side critical sections.
We already do it all the time when zapping with read_lock(), so it is not
a problem to do it from the kvm_tdp_mmu_zap_all() path (aka
kvm_arch_flush_shadow_all(), aka VM destruction and MMU notifier release).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231125083400.1399197-4-pbonzini@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add a new x86 VM type, KVM_X86_SW_PROTECTED_VM, to serve as a development
and testing vehicle for Confidential (CoCo) VMs, and potentially to even
become a "real" product in the distant future, e.g. a la pKVM.
The private memory support in KVM x86 is aimed at AMD's SEV-SNP and
Intel's TDX, but those technologies are extremely complex (understatement),
difficult to debug, don't support running as nested guests, and require
hardware that's isn't universally accessible. I.e. relying SEV-SNP or TDX
for maintaining guest private memory isn't a realistic option.
At the very least, KVM_X86_SW_PROTECTED_VM will enable a variety of
selftests for guest_memfd and private memory support without requiring
unique hardware.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20231027182217.3615211-24-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add support for resolving page faults on guest private memory for VMs
that differentiate between "shared" and "private" memory. For such VMs,
KVM_MEM_GUEST_MEMFD memslots can include both fd-based private memory and
hva-based shared memory, and KVM needs to map in the "correct" variant,
i.e. KVM needs to map the gfn shared/private as appropriate based on the
current state of the gfn's KVM_MEMORY_ATTRIBUTE_PRIVATE flag.
For AMD's SEV-SNP and Intel's TDX, the guest effectively gets to request
shared vs. private via a bit in the guest page tables, i.e. what the guest
wants may conflict with the current memory attributes. To support such
"implicit" conversion requests, exit to user with KVM_EXIT_MEMORY_FAULT
to forward the request to userspace. Add a new flag for memory faults,
KVM_MEMORY_EXIT_FLAG_PRIVATE, to communicate whether the guest wants to
map memory as shared vs. private.
Like KVM_MEMORY_ATTRIBUTE_PRIVATE, use bit 3 for flagging private memory
so that KVM can use bits 0-2 for capturing RWX behavior if/when userspace
needs such information, e.g. a likely user of KVM_EXIT_MEMORY_FAULT is to
exit on missing mappings when handling guest page fault VM-Exits. In
that case, userspace will want to know RWX information in order to
correctly/precisely resolve the fault.
Note, private memory *must* be backed by guest_memfd, i.e. shared mappings
always come from the host userspace page tables, and private mappings
always come from a guest_memfd instance.
Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Message-Id: <20231027182217.3615211-21-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce an ioctl(), KVM_CREATE_GUEST_MEMFD, to allow creating file-based
memory that is tied to a specific KVM virtual machine and whose primary
purpose is to serve guest memory.
A guest-first memory subsystem allows for optimizations and enhancements
that are kludgy or outright infeasible to implement/support in a generic
memory subsystem. With guest_memfd, guest protections and mapping sizes
are fully decoupled from host userspace mappings. E.g. KVM currently
doesn't support mapping memory as writable in the guest without it also
being writable in host userspace, as KVM's ABI uses VMA protections to
define the allow guest protection. Userspace can fudge this by
establishing two mappings, a writable mapping for the guest and readable
one for itself, but that’s suboptimal on multiple fronts.
Similarly, KVM currently requires the guest mapping size to be a strict
subset of the host userspace mapping size, e.g. KVM doesn’t support
creating a 1GiB guest mapping unless userspace also has a 1GiB guest
mapping. Decoupling the mappings sizes would allow userspace to precisely
map only what is needed without impacting guest performance, e.g. to
harden against unintentional accesses to guest memory.
Decoupling guest and userspace mappings may also allow for a cleaner
alternative to high-granularity mappings for HugeTLB, which has reached a
bit of an impasse and is unlikely to ever be merged.
A guest-first memory subsystem also provides clearer line of sight to
things like a dedicated memory pool (for slice-of-hardware VMs) and
elimination of "struct page" (for offload setups where userspace _never_
needs to mmap() guest memory).
More immediately, being able to map memory into KVM guests without mapping
said memory into the host is critical for Confidential VMs (CoCo VMs), the
initial use case for guest_memfd. While AMD's SEV and Intel's TDX prevent
untrusted software from reading guest private data by encrypting guest
memory with a key that isn't usable by the untrusted host, projects such
as Protected KVM (pKVM) provide confidentiality and integrity *without*
relying on memory encryption. And with SEV-SNP and TDX, accessing guest
private memory can be fatal to the host, i.e. KVM must be prevent host
userspace from accessing guest memory irrespective of hardware behavior.
Attempt #1 to support CoCo VMs was to add a VMA flag to mark memory as
being mappable only by KVM (or a similarly enlightened kernel subsystem).
That approach was abandoned largely due to it needing to play games with
PROT_NONE to prevent userspace from accessing guest memory.
Attempt #2 to was to usurp PG_hwpoison to prevent the host from mapping
guest private memory into userspace, but that approach failed to meet
several requirements for software-based CoCo VMs, e.g. pKVM, as the kernel
wouldn't easily be able to enforce a 1:1 page:guest association, let alone
a 1:1 pfn:gfn mapping. And using PG_hwpoison does not work for memory
that isn't backed by 'struct page', e.g. if devices gain support for
exposing encrypted memory regions to guests.
Attempt #3 was to extend the memfd() syscall and wrap shmem to provide
dedicated file-based guest memory. That approach made it as far as v10
before feedback from Hugh Dickins and Christian Brauner (and others) led
to it demise.
Hugh's objection was that piggybacking shmem made no sense for KVM's use
case as KVM didn't actually *want* the features provided by shmem. I.e.
KVM was using memfd() and shmem to avoid having to manage memory directly,
not because memfd() and shmem were the optimal solution, e.g. things like
read/write/mmap in shmem were dead weight.
Christian pointed out flaws with implementing a partial overlay (wrapping
only _some_ of shmem), e.g. poking at inode_operations or super_operations
would show shmem stuff, but address_space_operations and file_operations
would show KVM's overlay. Paraphrashing heavily, Christian suggested KVM
stop being lazy and create a proper API.
Link: https://lore.kernel.org/all/20201020061859.18385-1-kirill.shutemov@linux.intel.com
Link: https://lore.kernel.org/all/20210416154106.23721-1-kirill.shutemov@linux.intel.com
Link: https://lore.kernel.org/all/20210824005248.200037-1-seanjc@google.com
Link: https://lore.kernel.org/all/20211111141352.26311-1-chao.p.peng@linux.intel.com
Link: https://lore.kernel.org/all/20221202061347.1070246-1-chao.p.peng@linux.intel.com
Link: https://lore.kernel.org/all/ff5c5b97-acdf-9745-ebe5-c6609dd6322e@google.com
Link: https://lore.kernel.org/all/20230418-anfallen-irdisch-6993a61be10b@brauner
Link: https://lore.kernel.org/all/ZEM5Zq8oo+xnApW9@google.com
Link: https://lore.kernel.org/linux-mm/20230306191944.GA15773@monkey
Link: https://lore.kernel.org/linux-mm/ZII1p8ZHlHaQ3dDl@casper.infradead.org
Cc: Fuad Tabba <tabba@google.com>
Cc: Vishal Annapurve <vannapurve@google.com>
Cc: Ackerley Tng <ackerleytng@google.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Maciej Szmigiero <mail@maciej.szmigiero.name>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Wang <wei.w.wang@intel.com>
Cc: Liam Merwick <liam.merwick@oracle.com>
Cc: Isaku Yamahata <isaku.yamahata@gmail.com>
Co-developed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-17-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In confidential computing usages, whether a page is private or shared is
necessary information for KVM to perform operations like page fault
handling, page zapping etc. There are other potential use cases for
per-page memory attributes, e.g. to make memory read-only (or no-exec,
or exec-only, etc.) without having to modify memslots.
Introduce the KVM_SET_MEMORY_ATTRIBUTES ioctl, advertised by
KVM_CAP_MEMORY_ATTRIBUTES, to allow userspace to set the per-page memory
attributes to a guest memory range.
Use an xarray to store the per-page attributes internally, with a naive,
not fully optimized implementation, i.e. prioritize correctness over
performance for the initial implementation.
Use bit 3 for the PRIVATE attribute so that KVM can use bits 0-2 for RWX
attributes/protections in the future, e.g. to give userspace fine-grained
control over read, write, and execute protections for guest memory.
Provide arch hooks for handling attribute changes before and after common
code sets the new attributes, e.g. x86 will use the "pre" hook to zap all
relevant mappings, and the "post" hook to track whether or not hugepages
can be used to map the range.
To simplify the implementation wrap the entire sequence with
kvm_mmu_invalidate_{begin,end}() even though the operation isn't strictly
guaranteed to be an invalidation. For the initial use case, x86 *will*
always invalidate memory, and preventing arch code from creating new
mappings while the attributes are in flux makes it much easier to reason
about the correctness of consuming attributes.
It's possible that future usages may not require an invalidation, e.g.
if KVM ends up supporting RWX protections and userspace grants _more_
protections, but again opt for simplicity and punt optimizations to
if/when they are needed.
Suggested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/Y2WB48kD0J4VGynX@google.com
Cc: Fuad Tabba <tabba@google.com>
Cc: Xu Yilun <yilun.xu@intel.com>
Cc: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a new KVM exit type to allow userspace to handle memory faults that
KVM cannot resolve, but that userspace *may* be able to handle (without
terminating the guest).
KVM will initially use KVM_EXIT_MEMORY_FAULT to report implicit
conversions between private and shared memory. With guest private memory,
there will be two kind of memory conversions:
- explicit conversion: happens when the guest explicitly calls into KVM
to map a range (as private or shared)
- implicit conversion: happens when the guest attempts to access a gfn
that is configured in the "wrong" state (private vs. shared)
On x86 (first architecture to support guest private memory), explicit
conversions will be reported via KVM_EXIT_HYPERCALL+KVM_HC_MAP_GPA_RANGE,
but reporting KVM_EXIT_HYPERCALL for implicit conversions is undesriable
as there is (obviously) no hypercall, and there is no guarantee that the
guest actually intends to convert between private and shared, i.e. what
KVM thinks is an implicit conversion "request" could actually be the
result of a guest code bug.
KVM_EXIT_MEMORY_FAULT will be used to report memory faults that appear to
be implicit conversions.
Note! To allow for future possibilities where KVM reports
KVM_EXIT_MEMORY_FAULT and fills run->memory_fault on _any_ unresolved
fault, KVM returns "-EFAULT" (-1 with errno == EFAULT from userspace's
perspective), not '0'! Due to historical baggage within KVM, exiting to
userspace with '0' from deep callstacks, e.g. in emulation paths, is
infeasible as doing so would require a near-complete overhaul of KVM,
whereas KVM already propagates -errno return codes to userspace even when
the -errno originated in a low level helper.
Report the gpa+size instead of a single gfn even though the initial usage
is expected to always report single pages. It's entirely possible, likely
even, that KVM will someday support sub-page granularity faults, e.g.
Intel's sub-page protection feature allows for additional protections at
128-byte granularity.
Link: https://lore.kernel.org/all/20230908222905.1321305-5-amoorthy@google.com
Link: https://lore.kernel.org/all/ZQ3AmLO2SYv3DszH@google.com
Cc: Anish Moorthy <amoorthy@google.com>
Cc: David Matlack <dmatlack@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20231027182217.3615211-10-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce a "version 2" of KVM_SET_USER_MEMORY_REGION so that additional
information can be supplied without setting userspace up to fail. The
padding in the new kvm_userspace_memory_region2 structure will be used to
pass a file descriptor in addition to the userspace_addr, i.e. allow
userspace to point at a file descriptor and map memory into a guest that
is NOT mapped into host userspace.
Alternatively, KVM could simply add "struct kvm_userspace_memory_region2"
without a new ioctl(), but as Paolo pointed out, adding a new ioctl()
makes detection of bad flags a bit more robust, e.g. if the new fd field
is guarded only by a flag and not a new ioctl(), then a userspace bug
(setting a "bad" flag) would generate out-of-bounds access instead of an
-EINVAL error.
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Message-Id: <20231027182217.3615211-9-seanjc@google.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Generalized infrastructure for 'writable' ID registers, effectively
allowing userspace to opt-out of certain vCPU features for its guest
- Optimization for vSGI injection, opportunistically compressing MPIDR
to vCPU mapping into a table
- Improvements to KVM's PMU emulation, allowing userspace to select
the number of PMCs available to a VM
- Guest support for memory operation instructions (FEAT_MOPS)
- Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
bugs and getting rid of useless code
- Changes to the way the SMCCC filter is constructed, avoiding wasted
memory allocations when not in use
- Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
the overhead of errata mitigations
- Miscellaneous kernel and selftest fixes
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZUFJRgAKCRCivnWIJHzd
FtgYAP9cMsc1Mhlw3jNQnTc6q0cbTulD/SoEDPUat1dXMqjs+gEAnskwQTrTX834
fgGQeCAyp7Gmar+KeP64H0xm8kPSpAw=
=R4M7
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.7
- Generalized infrastructure for 'writable' ID registers, effectively
allowing userspace to opt-out of certain vCPU features for its guest
- Optimization for vSGI injection, opportunistically compressing MPIDR
to vCPU mapping into a table
- Improvements to KVM's PMU emulation, allowing userspace to select
the number of PMCs available to a VM
- Guest support for memory operation instructions (FEAT_MOPS)
- Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
bugs and getting rid of useless code
- Changes to the way the SMCCC filter is constructed, avoiding wasted
memory allocations when not in use
- Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
the overhead of errata mitigations
- Miscellaneous kernel and selftest fixes
- Fix various typos, notably a confusing reference to the non-existent
"struct kvm_vcpu_event" (the actual structure is kvm_vcpu_events, plural).
- Update x86's kvm_mmu_page documentation to bring it closer to the code
(this raced with the removal of async zapping and so the documentation is
already stale; my bad).
- Document the behavior of x86 PMU filters on fixed counters.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmU8D4kSHHNlYW5qY0Bn
b29nbGUuY29tAAoJEGCRIgFNDBL5D4wQAKp05OrsZS2ABGu6DEDG/WnSrEjq6gKZ
/jACooXABBxb5c83iVrDlx+VJY/gxJoT5lrpE9YU9Y1ZhVk1CK09ADalW3OelmkG
L9NA3QB7iHrVwqQhTbKvLguLNNDiCyQd2dyzMGVWf+aUERvGN2D6mZvB/PeBvLHV
rJ4tNmOVtx+Ge4OabHxf94mnGU2ioVCXinwh36DjVkC+Cavq+CHpHdD5NSBcn9Ax
/nmAf4p6o3utIXAk/7iEPiiFoq6WTj6NwCqhCXpB50/DcAi15lzXhtp0lLaTVD20
wvepeNsk451HXE5MYsUwhAndBTBr3/N1+aBaLTTtK3TPnUf40M+UY5FDXG7Bs3kQ
k5pSFobodUrd4vfXyu+Vpl6Q6VBpPo74aklmYr2VwfLPAzi8k1zazaBYij/fueQa
Xrl2Y3gT3Pl2KUnKK7BX4l0TYSWM0E3zdCDs6sLdOAsDeEPo0sW3ZQHG8nAXY/EH
eWp7AAMjooPshCjASslaZDn5lxQSIoauhwfWOoD4xMTIGpSTw8EUJK1GoUgx2Un8
pirNWNIJuv1DwdKti4uJnGwQhwol7pIa0thbx4JS+oSe16buR2yYSeXXwEKviImy
nB0fmdwslpbkjfpjPfSDcHl+XF6HwPszQ6f6WXiNsmv3n2tG701F8lhD1y/fKmo2
x3zpc+DpKUw4
=W12P
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-docs-6.7' of https://github.com/kvm-x86/linux into HEAD
KVM x86 Documentation updates for 6.7:
- Fix various typos, notably a confusing reference to the non-existent
"struct kvm_vcpu_event" (the actual structure is kvm_vcpu_events, plural).
- Update x86's kvm_mmu_page documentation to bring it closer to the code
(this raced with the removal of async zapping and so the documentation is
already stale; my bad).
- Document the behavior of x86 PMU filters on fixed counters.
* kvm-arm64/writable-id-regs:
: Writable ID registers, courtesy of Jing Zhang
:
: This series significantly expands the architectural feature set that
: userspace can manipulate via the ID registers. A new ioctl is defined
: that makes the mutable fields in the ID registers discoverable to
: userspace.
KVM: selftests: Avoid using forced target for generating arm64 headers
tools headers arm64: Fix references to top srcdir in Makefile
KVM: arm64: selftests: Test for setting ID register from usersapce
tools headers arm64: Update sysreg.h with kernel sources
KVM: selftests: Generate sysreg-defs.h and add to include path
perf build: Generate arm64's sysreg-defs.h and add to include path
tools: arm64: Add a Makefile for generating sysreg-defs.h
KVM: arm64: Document vCPU feature selection UAPIs
KVM: arm64: Allow userspace to change ID_AA64ZFR0_EL1
KVM: arm64: Allow userspace to change ID_AA64PFR0_EL1
KVM: arm64: Allow userspace to change ID_AA64MMFR{0-2}_EL1
KVM: arm64: Allow userspace to change ID_AA64ISAR{0-2}_EL1
KVM: arm64: Bump up the default KVM sanitised debug version to v8p8
KVM: arm64: Reject attempts to set invalid debug arch version
KVM: arm64: Advertise selected DebugVer in DBGDIDR.Version
KVM: arm64: Use guest ID register values for the sake of emulation
KVM: arm64: Document KVM_ARM_GET_REG_WRITABLE_MASKS
KVM: arm64: Allow userspace to get the writable masks for feature ID registers
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
KVM/arm64 has a couple schemes for handling vCPU feature selection now,
which is a lot to put on userspace. Add some documentation about how
these interact and provide some recommendations for how to use the
writable ID register scheme.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231003230408.3405722-11-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Add some basic documentation on how to get feature ID register writable
masks from userspace.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231003230408.3405722-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Supplement kvm document about LoongArch-specific part, such as add
api introduction for GET/SET_ONE_REG, GET/SET_FPU, GET/SET_MP_STATE,
etc.
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
It goes without saying, but it is probably better to spell it out:
If userspace tries to restore and VM, but creates vcpus and/or RDs
in a different order, the vcpu/RD mapping will be different. Yes,
our API is an ugly piece of crap and I can't believe that we missed
this.
If we want to relax the above, we'll need to define a new userspace
API that allows the mapping to be specified, rather than relying
on the kernel to perform the mapping on its own.
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230927090911.3355209-12-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Update the documentation for the KVM_SET_PMU_EVENT_FILTER ioctl
to include a detailed description of how fixed performance events
are handled in the pmu filter. The action and fixed_counter_bitmap
members of the pmu filter to determine whether fixed performance
events can be programmed by the guest. This information is helpful
for correctly configuring the fixed_counter_bitmap and action fields
to filter fixed performance events.
Suggested-by: Like Xu <likexu@tencent.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304150850.rx4UDDsB-lkp@intel.com
Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Link: https://lore.kernel.org/r/20230531075052.43239-1-cloudliang@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add the description for tdp_mmu_page into kvm_mmu_page description.
tdp_mmu_page is a field to differentiate shadow pages from TDP MMU and
non-TDP MMU.
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230912184553.1887764-7-mizhang@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add the description for mmu_valid_gen into kvm_mmu_page description.
mmu_valid_gen is used in shadow MMU for fast zapping. Update the doc to
reflect that.
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230912184553.1887764-6-mizhang@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add the description of tdp_mmu_root_count into kvm_mmu_page description and
combine it with the description of root_count. tdp_mmu_root_count is an
atomic counter used only in TDP MMU. Update the doc.
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230912184553.1887764-5-mizhang@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add the missing description for ptep in kvm_mmu_page description. ptep is
used when TDP MMU is enabled and it shares the storage with parent_ptes.
Update the doc to help readers to get up-to-date info.
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230912184553.1887764-4-mizhang@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Update the field 'gfns' in kvm_mmu_page to 'shadowed_translation' to be
consistent with the code. Also update the corresponding 'gfns' in the
comments. The more detailed description of 'shadowed_translation' is
already inlined in the data structure definition, so no need to duplicate
the text but simply just update the names.
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230912184553.1887764-3-mizhang@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add the missing description for guest_mode in kvm_mmu_page_role
description. guest_mode tells KVM whether a shadow page is used for the L1
or an L2. Update the missing field in documentation.
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230912184553.1887764-2-mizhang@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Set KVM_GET_VCPU_EVENTS and KVM_SET_VCPU_EVENTS parameter type to
`struct kvm_vcpu_events`. Events, plural.
Opportunistically fix few other typos.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20230814222358.707877-4-mhal@rbox.co
Signed-off-by: Sean Christopherson <seanjc@google.com>
* Clean up vCPU targets, always returning generic v8 as the preferred target
* Trap forwarding infrastructure for nested virtualization (used for traps
that are taken from an L2 guest and are needed by the L1 hypervisor)
* FEAT_TLBIRANGE support to only invalidate specific ranges of addresses
when collapsing a table PTE to a block PTE. This avoids that the guest
refills the TLBs again for addresses that aren't covered by the table PTE.
* Fix vPMU issues related to handling of PMUver.
* Don't unnecessary align non-stack allocations in the EL2 VA space
* Drop HCR_VIRT_EXCP_MASK, which was never used...
* Don't use smp_processor_id() in kvm_arch_vcpu_load(),
but the cpu parameter instead
* Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort()
* Remove prototypes without implementations
RISC-V:
* Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest
* Added ONE_REG interface for SATP mode
* Added ONE_REG interface to enable/disable multiple ISA extensions
* Improved error codes returned by ONE_REG interfaces
* Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V
* Added get-reg-list selftest for KVM RISC-V
s390:
* PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch)
Allows a PV guest to use crypto cards. Card access is governed by
the firmware and once a crypto queue is "bound" to a PV VM every
other entity (PV or not) looses access until it is not bound
anymore. Enablement is done via flags when creating the PV VM.
* Guest debug fixes (Ilya)
x86:
* Clean up KVM's handling of Intel architectural events
* Intel bugfixes
* Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use debug
registers and generate/handle #DBs
* Clean up LBR virtualization code
* Fix a bug where KVM fails to set the target pCPU during an IRTE update
* Fix fatal bugs in SEV-ES intrahost migration
* Fix a bug where the recent (architecturally correct) change to reinject
#BP and skip INT3 broke SEV guests (can't decode INT3 to skip it)
* Retry APIC map recalculation if a vCPU is added/enabled
* Overhaul emergency reboot code to bring SVM up to par with VMX, tie the
"emergency disabling" behavior to KVM actually being loaded, and move all of
the logic within KVM
* Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC
ratio MSR cannot diverge from the default when TSC scaling is disabled
up related code
* Add a framework to allow "caching" feature flags so that KVM can check if
the guest can use a feature without needing to search guest CPUID
* Rip out the ancient MMU_DEBUG crud and replace the useful bits with
CONFIG_KVM_PROVE_MMU
* Fix KVM's handling of !visible guest roots to avoid premature triple fault
injection
* Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the API surface
that is needed by external users (currently only KVMGT), and fix a variety
of issues in the process
This last item had a silly one-character bug in the topic branch that
was sent to me. Because it caused pretty bad selftest failures in
some configurations, I decided to squash in the fix. So, while the
exact commit ids haven't been in linux-next, the code has (from the
kvm-x86 tree).
Generic:
* Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass
action specific data without needing to constantly update the main handlers.
* Drop unused function declarations
Selftests:
* Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs
* Add support for printf() in guest code and covert all guest asserts to use
printf-based reporting
* Clean up the PMU event filter test and add new testcases
* Include x86 selftests in the KVM x86 MAINTAINERS entry
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmT1m0kUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMNgggAiN7nz6UC423FznuI+yO3TLm8tkx1
CpKh5onqQogVtchH+vrngi97cfOzZb1/AtifY90OWQi31KEWhehkeofcx7G6ERhj
5a9NFADY1xGBsX4exca/VHDxhnzsbDWaWYPXw5vWFWI6erft9Mvy3tp1LwTvOzqM
v8X4aWz+g5bmo/DWJf4Wu32tEU6mnxzkrjKU14JmyqQTBawVmJ3RYvHVJ/Agpw+n
hRtPAy7FU6XTdkmq/uCT+KUHuJEIK0E/l1js47HFAqSzwdW70UDg14GGo1o4ETxu
RjZQmVNvL57yVgi6QU38/A0FWIsWQm5IlaX1Ug6x8pjZPnUKNbo9BY4T1g==
=W+4p
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Clean up vCPU targets, always returning generic v8 as the preferred
target
- Trap forwarding infrastructure for nested virtualization (used for
traps that are taken from an L2 guest and are needed by the L1
hypervisor)
- FEAT_TLBIRANGE support to only invalidate specific ranges of
addresses when collapsing a table PTE to a block PTE. This avoids
that the guest refills the TLBs again for addresses that aren't
covered by the table PTE.
- Fix vPMU issues related to handling of PMUver.
- Don't unnecessary align non-stack allocations in the EL2 VA space
- Drop HCR_VIRT_EXCP_MASK, which was never used...
- Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu
parameter instead
- Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort()
- Remove prototypes without implementations
RISC-V:
- Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest
- Added ONE_REG interface for SATP mode
- Added ONE_REG interface to enable/disable multiple ISA extensions
- Improved error codes returned by ONE_REG interfaces
- Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V
- Added get-reg-list selftest for KVM RISC-V
s390:
- PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch)
Allows a PV guest to use crypto cards. Card access is governed by
the firmware and once a crypto queue is "bound" to a PV VM every
other entity (PV or not) looses access until it is not bound
anymore. Enablement is done via flags when creating the PV VM.
- Guest debug fixes (Ilya)
x86:
- Clean up KVM's handling of Intel architectural events
- Intel bugfixes
- Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use
debug registers and generate/handle #DBs
- Clean up LBR virtualization code
- Fix a bug where KVM fails to set the target pCPU during an IRTE
update
- Fix fatal bugs in SEV-ES intrahost migration
- Fix a bug where the recent (architecturally correct) change to
reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to
skip it)
- Retry APIC map recalculation if a vCPU is added/enabled
- Overhaul emergency reboot code to bring SVM up to par with VMX, tie
the "emergency disabling" behavior to KVM actually being loaded,
and move all of the logic within KVM
- Fix user triggerable WARNs in SVM where KVM incorrectly assumes the
TSC ratio MSR cannot diverge from the default when TSC scaling is
disabled up related code
- Add a framework to allow "caching" feature flags so that KVM can
check if the guest can use a feature without needing to search
guest CPUID
- Rip out the ancient MMU_DEBUG crud and replace the useful bits with
CONFIG_KVM_PROVE_MMU
- Fix KVM's handling of !visible guest roots to avoid premature
triple fault injection
- Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the
API surface that is needed by external users (currently only
KVMGT), and fix a variety of issues in the process
Generic:
- Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier
events to pass action specific data without needing to constantly
update the main handlers.
- Drop unused function declarations
Selftests:
- Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs
- Add support for printf() in guest code and covert all guest asserts
to use printf-based reporting
- Clean up the PMU event filter test and add new testcases
- Include x86 selftests in the KVM x86 MAINTAINERS entry"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits)
KVM: x86/mmu: Include mmu.h in spte.h
KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots
KVM: x86/mmu: Disallow guest from using !visible slots for page tables
KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page
KVM: x86/mmu: Harden new PGD against roots without shadow pages
KVM: x86/mmu: Add helper to convert root hpa to shadow page
drm/i915/gvt: Drop final dependencies on KVM internal details
KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
KVM: x86/mmu: Assert that correct locks are held for page write-tracking
KVM: x86/mmu: Rename page-track APIs to reflect the new reality
KVM: x86/mmu: Drop infrastructure for multiple page-track modes
KVM: x86/mmu: Use page-track notifiers iff there are external users
KVM: x86/mmu: Move KVM-only page-track declarations to internal header
KVM: x86: Remove the unused page-track hook track_flush_slot()
drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
KVM: x86: Add a new page-track hook to handle memslot deletion
drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
KVM: x86: Reject memslot MOVE operations if KVMGT is attached
...
- VFIO direct character device (cdev) interface support. This extracts
the vfio device fd from the container and group model, and is intended
to be the native uAPI for use with IOMMUFD. (Yi Liu)
- Enhancements to the PCI hot reset interface in support of cdev usage.
(Yi Liu)
- Fix a potential race between registering and unregistering vfio files
in the kvm-vfio interface and extend use of a lock to avoid extra
drop and acquires. (Dmitry Torokhov)
- A new vfio-pci variant driver for the AMD/Pensando Distributed Services
Card (PDS) Ethernet device, supporting live migration. (Brett Creeley)
- Cleanups to remove redundant owner setup in cdx and fsl bus drivers,
and simplify driver init/exit in fsl code. (Li Zetao)
- Fix uninitialized hole in data structure and pad capability structures
for alignment. (Stefan Hajnoczi)
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmTvnDUbHGFsZXgud2ls
bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsimEEP/AzG+VRcu5LfYbLGLe0z
zB8ts6G7S78wXlmfN/LYi3v92XWvMMcm+vYF8oNAMfr1YL5sibWN6UtQfY1KCr7h
nWKdQdqjajJ5yDDZnOFdhqHJGNfmZw6+fey8Z0j8zRI2oymK4DncWWX3g/7L1SNr
9tIexGJef+mOdAmC94yOut3YviAaZ+f95T/xrdXHzzoNr50DD0+PD6AJdKJfKggP
vhiC/DAYH3Fofaa6tRasgWuKCYWdjZLR/kxgNpeEmW6kZnbq/dnzZ+kgn4HH1f9G
8p7UKVARR6FfG5aLheWu6Y9PDaKnfnqu8y/hobuE/ivXcmqqK+a6xSxrjgbVs8WJ
94SYnTBRoTlDJaKWa7GxqdgzJnV+s5ZyAgPhjzdi6mLTPWGzkuLhFWGtYL+LZAQ6
pNeZSM6CFBk+bva/xT0nNPCXxPh+/j/Y0G18FREj8aPFc03HrJQqz0RLydvTnoDz
nX/by5KdzMSVSVLPr4uDMtAsgxsGqWiFcp7QMw1HhhlLWxqmYbA+mLZaqyMZUUOx
6b/P8WXT9P2I+qPVKWQ5CWyqpsEqm6P+72yg6LOM9kINvgwDhOa7cagMXIuMWYMH
Rf97FL+K8p1eIy6AnvRHgFBMM5185uG+0YcJyVqtucDr/k8T/Om6ujAI6JbWtNe6
cLgaVAqKOYqCR4HC9bfVGSbd
=eKSR
-----END PGP SIGNATURE-----
Merge tag 'vfio-v6.6-rc1' of https://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- VFIO direct character device (cdev) interface support. This extracts
the vfio device fd from the container and group model, and is
intended to be the native uAPI for use with IOMMUFD (Yi Liu)
- Enhancements to the PCI hot reset interface in support of cdev usage
(Yi Liu)
- Fix a potential race between registering and unregistering vfio files
in the kvm-vfio interface and extend use of a lock to avoid extra
drop and acquires (Dmitry Torokhov)
- A new vfio-pci variant driver for the AMD/Pensando Distributed
Services Card (PDS) Ethernet device, supporting live migration (Brett
Creeley)
- Cleanups to remove redundant owner setup in cdx and fsl bus drivers,
and simplify driver init/exit in fsl code (Li Zetao)
- Fix uninitialized hole in data structure and pad capability
structures for alignment (Stefan Hajnoczi)
* tag 'vfio-v6.6-rc1' of https://github.com/awilliam/linux-vfio: (53 commits)
vfio/pds: Send type for SUSPEND_STATUS command
vfio/pds: fix return value in pds_vfio_get_lm_file()
pds_core: Fix function header descriptions
vfio: align capability structures
vfio/type1: fix cap_migration information leak
vfio/fsl-mc: Use module_fsl_mc_driver macro to simplify the code
vfio/cdx: Remove redundant initialization owner in vfio_cdx_driver
vfio/pds: Add Kconfig and documentation
vfio/pds: Add support for firmware recovery
vfio/pds: Add support for dirty page tracking
vfio/pds: Add VFIO live migration support
vfio/pds: register with the pds_core PF
pds_core: Require callers of register/unregister to pass PF drvdata
vfio/pds: Initial support for pds VFIO driver
vfio: Commonize combine_ranges for use in other VFIO drivers
kvm/vfio: avoid bouncing the mutex when adding and deleting groups
kvm/vfio: ensure kvg instance stays around in kvm_vfio_group_add()
docs: vfio: Add vfio device cdev description
vfio: Compile vfio_group infrastructure optionally
vfio: Move the IOMMU_CAP_CACHE_COHERENCY check in __vfio_register_dev()
...
KVM_GET_REG_LIST API will return all registers that are available to
KVM_GET/SET_ONE_REG APIs. It's very useful to identify some platform
regression issue during VM migration.
Since this API was already supported on arm64, it is straightforward
to enable it on riscv with similar code structure.
Signed-off-by: Haibo Xu <haibo1.xu@intel.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
The EBUSY errno is being used for KVM_SET_ONE_REG as a way to tell
userspace that a given reg can't be changed after the vcpu started.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
This defines KVM_DEV_VFIO_FILE* and make alias with KVM_DEV_VFIO_GROUP*.
Old userspace uses KVM_DEV_VFIO_GROUP* works as well.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-6-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
also have a kernel-doc tweak for enums and the long-overdue removal of the
outdated and redundant patch-submission comments at the top of the
MAINTAINERS file.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmSnR4EPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YfWIH/0b+gYD0PftjpG1MfPTlvsxm3yiO2IkZR1rX
ZEvzMIk3cqDsZuhv8g4Xh3qrn7QHW9JE8XbOdkMDw+Hd1kkmYeweVhsLhcar44ai
KPBXCbnd6bU6HcjT/o/AEkYVJzZDmKbt8ALi5C81xu8bWn2iybKgnJv1a3M1PFAx
Dr5ne14HTEau5ewYeYPhkC2n1XRIE1BV0k4PdZlQE/67uwhplh9J2P/DiXh3I9DT
0oxh8cZHRVheCkXNYseMWEC5V+xFfh3jP/fvIefuNGCb7AGDSE4s+Wx8I9CbduIN
SwFtsqXRm2cQ8aj950T0E4JQZLVY0DJKrIJo0qh3LrfUYTinQx0=
=tuod
-----END PGP SIGNATURE-----
Merge tag 'docs-6.5-2' of git://git.lwn.net/linux
Pull mode documentation updates from Jonathan Corbet:
"A half-dozen late arriving docs patches. They are mostly fixes, but we
also have a kernel-doc tweak for enums and the long-overdue removal of
the outdated and redundant patch-submission comments at the top of the
MAINTAINERS file"
* tag 'docs-6.5-2' of git://git.lwn.net/linux:
scripts: kernel-doc: support private / public marking for enums
Documentation: KVM: SEV: add a missing backtick
Documentation: ACPI: fix typo in ssdt-overlays.rst
Fix documentation of panic_on_warn
docs: remove the tips on how to submit patches from MAINTAINERS
docs: fix typo in zh_TW and zh_CN translation
* Eager page splitting optimization for dirty logging, optionally
allowing for a VM to avoid the cost of hugepage splitting in the stage-2
fault path.
* Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact with
services that live in the Secure world. pKVM intervenes on FF-A calls
to guarantee the host doesn't misuse memory donated to the hyp or a
pKVM guest.
* Support for running the split hypervisor with VHE enabled, known as
'hVHE' mode. This is extremely useful for testing the split
hypervisor on VHE-only systems, and paves the way for new use cases
that depend on having two TTBRs available at EL2.
* Generalized framework for configurable ID registers from userspace.
KVM/arm64 currently prevents arbitrary CPU feature set configuration
from userspace, but the intent is to relax this limitation and allow
userspace to select a feature set consistent with the CPU.
* Enable the use of Branch Target Identification (FEAT_BTI) in the
hypervisor.
* Use a separate set of pointer authentication keys for the hypervisor
when running in protected mode, as the host is untrusted at runtime.
* Ensure timer IRQs are consistently released in the init failure
paths.
* Avoid trapping CTR_EL0 on systems with Enhanced Virtualization Traps
(FEAT_EVT), as it is a register commonly read from userspace.
* Erratum workaround for the upcoming AmpereOne part, which has broken
hardware A/D state management.
RISC-V:
* Redirect AMO load/store misaligned traps to KVM guest
* Trap-n-emulate AIA in-kernel irqchip for KVM guest
* Svnapot support for KVM Guest
s390:
* New uvdevice secret API
* CMM selftest and fixes
* fix racy access to target CPU for diag 9c
x86:
* Fix missing/incorrect #GP checks on ENCLS
* Use standard mmu_notifier hooks for handling APIC access page
* Drop now unnecessary TR/TSS load after VM-Exit on AMD
* Print more descriptive information about the status of SEV and SEV-ES during
module load
* Add a test for splitting and reconstituting hugepages during and after
dirty logging
* Add support for CPU pinning in demand paging test
* Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes
included along the way
* Add a "nx_huge_pages=never" option to effectively avoid creating NX hugepage
recovery threads (because nx_huge_pages=off can be toggled at runtime)
* Move handling of PAT out of MTRR code and dedup SVM+VMX code
* Fix output of PIC poll command emulation when there's an interrupt
* Add a maintainer's handbook to document KVM x86 processes, preferred coding
style, testing expectations, etc.
* Misc cleanups, fixes and comments
Generic:
* Miscellaneous bugfixes and cleanups
Selftests:
* Generate dependency files so that partial rebuilds work as expected
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmSgHrIUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroORcAf+KkBlXwQMf+Q0Hy6Mfe0OtkKmh0Ae
6HJ6dsuMfOHhWv5kgukh+qvuGUGzHq+gpVKmZg2yP3h3cLHOLUAYMCDm+rjXyjsk
F4DbnJLfxq43Pe9PHRKFxxSecRcRYCNox0GD5UYL4PLKcH0FyfQrV+HVBK+GI8L3
FDzUcyJkR12Lcj1qf++7fsbzfOshL0AJPmidQCoc6wkLJpUEr/nYUqlI1Kx3YNuQ
LKmxFHS4l4/O/px3GKNDrLWDbrVlwciGIa3GZLS52PZdW3mAqT+cqcPcYK6SW71P
m1vE80VbNELX5q3YSRoOXtedoZ3Pk97LEmz/xQAsJ/jri0Z5Syk0Ok0m/Q==
=AMXp
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM64:
- Eager page splitting optimization for dirty logging, optionally
allowing for a VM to avoid the cost of hugepage splitting in the
stage-2 fault path.
- Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact
with services that live in the Secure world. pKVM intervenes on
FF-A calls to guarantee the host doesn't misuse memory donated to
the hyp or a pKVM guest.
- Support for running the split hypervisor with VHE enabled, known as
'hVHE' mode. This is extremely useful for testing the split
hypervisor on VHE-only systems, and paves the way for new use cases
that depend on having two TTBRs available at EL2.
- Generalized framework for configurable ID registers from userspace.
KVM/arm64 currently prevents arbitrary CPU feature set
configuration from userspace, but the intent is to relax this
limitation and allow userspace to select a feature set consistent
with the CPU.
- Enable the use of Branch Target Identification (FEAT_BTI) in the
hypervisor.
- Use a separate set of pointer authentication keys for the
hypervisor when running in protected mode, as the host is untrusted
at runtime.
- Ensure timer IRQs are consistently released in the init failure
paths.
- Avoid trapping CTR_EL0 on systems with Enhanced Virtualization
Traps (FEAT_EVT), as it is a register commonly read from userspace.
- Erratum workaround for the upcoming AmpereOne part, which has
broken hardware A/D state management.
RISC-V:
- Redirect AMO load/store misaligned traps to KVM guest
- Trap-n-emulate AIA in-kernel irqchip for KVM guest
- Svnapot support for KVM Guest
s390:
- New uvdevice secret API
- CMM selftest and fixes
- fix racy access to target CPU for diag 9c
x86:
- Fix missing/incorrect #GP checks on ENCLS
- Use standard mmu_notifier hooks for handling APIC access page
- Drop now unnecessary TR/TSS load after VM-Exit on AMD
- Print more descriptive information about the status of SEV and
SEV-ES during module load
- Add a test for splitting and reconstituting hugepages during and
after dirty logging
- Add support for CPU pinning in demand paging test
- Add support for AMD PerfMonV2, with a variety of cleanups and minor
fixes included along the way
- Add a "nx_huge_pages=never" option to effectively avoid creating NX
hugepage recovery threads (because nx_huge_pages=off can be toggled
at runtime)
- Move handling of PAT out of MTRR code and dedup SVM+VMX code
- Fix output of PIC poll command emulation when there's an interrupt
- Add a maintainer's handbook to document KVM x86 processes,
preferred coding style, testing expectations, etc.
- Misc cleanups, fixes and comments
Generic:
- Miscellaneous bugfixes and cleanups
Selftests:
- Generate dependency files so that partial rebuilds work as
expected"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (153 commits)
Documentation/process: Add a maintainer handbook for KVM x86
Documentation/process: Add a label for the tip tree handbook's coding style
KVM: arm64: Fix misuse of KVM_ARM_VCPU_POWER_OFF bit index
RISC-V: KVM: Remove unneeded semicolon
RISC-V: KVM: Allow Svnapot extension for Guest/VM
riscv: kvm: define vcpu_sbi_ext_pmu in header
RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip
RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC
RISC-V: KVM: Expose APLIC registers as attributes of AIA irqchip
RISC-V: KVM: Add in-kernel emulation of AIA APLIC
RISC-V: KVM: Implement device interface for AIA irqchip
RISC-V: KVM: Skeletal in-kernel AIA irqchip support
RISC-V: KVM: Set kvm_riscv_aia_nr_hgei to zero
RISC-V: KVM: Add APLIC related defines
RISC-V: KVM: Add IMSIC related defines
RISC-V: KVM: Implement guest external interrupt line management
KVM: x86: Remove PRIx* definitions as they are solely for user space
s390/uv: Update query for secret-UVCs
s390/uv: replace scnprintf with sysfs_emit
s390/uvdevice: Add 'Lock Secret Store' UVC
...
- Move handling of PAT out of MTRR code and dedup SVM+VMX code
- Fix output of PIC poll command emulation when there's an interrupt
- Fix a longstanding bug in the reporting of the number of entries returned by
KVM_GET_CPUID2
- Add a maintainer's handbook to document KVM x86 processes, preferred coding
style, testing expectations, etc.
- Misc cleanups
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmSaGMMSHHNlYW5qY0Bn
b29nbGUuY29tAAoJEGCRIgFNDBL5iDIP/0PwY3J5odTEUTnAyuDFPimd5PBt9k/O
B414wdpSKVgzq+0An4qM9mKRnklVIh2p8QqQTvDhcBUg3xb6CX9xZ4ery7hp/T5O
tr5bAXs2AYX6jpxvsopt+w+E9j6fvkJhcJCRU9im3QbrqwUE+ecyU5OHvmv2n/GO
syVZJbPOYuoLPKDjlSMrScE6fWEl9UOvHc5BK/vafTeyisMG3vv1BSmJj6GuiNNk
TS1RRIg//cOZghQyDfdXt0azTmakNZyNn35xnoX9x8SRmdRykyUjQeHmeqWxPDso
kiGO+CGancfS57S6ZtCkJjqEWZ1o/zKdOxr8MMf/3nJhv4kY7/5XtlVoACv5soW9
bZEmNiXIaSbvKNMwAlLJxHFbLa1sMdSCb345CIuMdt5QiWJ53ZiTyIAJX6+eL+Zf
8nkeekgPf5VUs6Zt0RdRPyvo+W7Vp9BtI87yDXm1nQKpbys2pt6CD3YB/oF4QViG
a5cyGoFuqRQbS3nmbshIlR7EanTuxbhLZKrNrFnolZ5e624h3Cnk2hVsfTznVGiX
vNHWM80phk1CWB9McErrZVkGfjlyVyBL13CBB2XF7Dl6PfF6/N22a9bOuTJD3tvk
PlNx4hvZm3esvvyGpjfbSajTKYE8O7rxiE1KrF0BpZ5IUl5WSiTr6XCy/yI/mIeM
hay2IWhPOF2z
=D0BH
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-misc-6.5' of https://github.com/kvm-x86/linux into HEAD
KVM x86 changes for 6.5:
* Move handling of PAT out of MTRR code and dedup SVM+VMX code
* Fix output of PIC poll command emulation when there's an interrupt
* Add a maintainer's handbook to document KVM x86 processes, preferred coding
style, testing expectations, etc.
* Misc cleanups
brings some order to the documentation directory, declutters the top-level
directory, and makes the documentation organization more closely match that
of the source.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmSbDsIPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5Y+ksH/2Xqun1ipPvu66+bBdPIf8N9AVFatl2q3mt4
tgX3A4RH3Ejklb4GbRLOIP23PmCxt7LRv4P05ttw8VpTP3A+Cw1d1s2RxiXGvfDE
j7IW6hrpUmVoDdiDCRGtjdIa7MVI5aAsj8CCTjEFywGi5CQe0Uzq4aTUKoxJDEnu
GYVy2CwDNEt4GTQ6ClPpFx2rc4UZf/H2XqXsnod9ef8A5Nkt3EtgoS1hh3o1QZGA
Mqx2HAOVS1tb6GUVUbVLCdj40+YjBLjXFlsH4dA+wsFFdUlZLKuTesdiAMg2X6eT
E8C/6oRT+OiWbrnXUTJEn8z98Ds8VHn7D4n97O9bIQ+R9AFtmPI=
=H/+D
-----END PGP SIGNATURE-----
Merge tag 'docs-arm64-move' of git://git.lwn.net/linux
Pull arm64 documentation move from Jonathan Corbet:
"Move the arm64 architecture documentation under Documentation/arch/.
This brings some order to the documentation directory, declutters the
top-level directory, and makes the documentation organization more
closely match that of the source"
* tag 'docs-arm64-move' of git://git.lwn.net/linux:
perf arm-spe: Fix a dangling Documentation/arm64 reference
mm: Fix a dangling Documentation/arm64 reference
arm64: Fix dangling references to Documentation/arm64
dt-bindings: fix dangling Documentation/arm64 reference
docs: arm64: Move arm64 documentation under Documentation/arch/
- Some initial page-table documentation from Linus (the other Linus)
- Regression-handling documentation improvements from Thorsten
- Addition of kerneldoc documentation for the ERR_PTR() and related
macros from James Seo
...and the usual collection of fixes and updates.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmSbC9wPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5Yw7YH/Rcd2oVQ/B8ui9TYcXTQid0ly5GvLl/ot0zf
pml725bZSKodcdtmLvQ6CzMGRdzxhQpVfzy21zHAlQWiBMdheWeu0Etmpspn8fCI
wnJIlUbGdp5Aq4ZtoJPTtE3vXvWEQ32gVytGjbTVNtSLRLXQ1bc+A/IvmRj3jdkV
dwPfN7hPLVhBt5770pHMywlFVBQ9FUjUNC+uX0JkcNZJ3598c4ZzndBEaLdqfPHC
DtWucRdnHubTncKECgYbspsfH6zuntFk8FgsD1gZ1K9izMAwVBsKSS+MeOz8oxx8
rPq4Tscqs/9mpist/PqxEu0fvTC3xsyMbxLA4hAORmgpdnbWIaQ=
=q2B4
-----END PGP SIGNATURE-----
Merge tag 'docs-6.5' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"It's been a relatively calm cycle in docsland. We do have:
- Some initial page-table documentation from Linus (the other Linus)
- Regression-handling documentation improvements from Thorsten
- Addition of kerneldoc documentation for the ERR_PTR() and related
macros from James Seo
... and the usual collection of fixes and updates"
* tag 'docs-6.5' of git://git.lwn.net/linux:
docs: consolidate storage interfaces
Documentation: update git configuration for Link: tag
Documentation: KVM: make corrections to vcpu-requests.rst
Documentation: KVM: make corrections to ppc-pv.rst
Documentation: KVM: make corrections to locking.rst
Documentation: KVM: make corrections to halt-polling.rst
Documentation: virt: correct location of haltpoll module params
Documentation/mm: Initial page table documentation
docs: crypto: async-tx-api: fix typo in struct name
docs/doc-guide: Clarify how to write tables
docs: handling-regressions: rework section about fixing procedures
docs: process: fix a typoed cross-reference
docs: submitting-patches: Discuss interleaved replies
MAINTAINERS: direct process doc changes to a dedicated ML
Documentation: core-api: Add error pointer functions to kernel-api
err.h: Add missing kerneldocs for error pointer functions
Documentation: conf.py: Add __force to c_id_attributes
docs: clarify KVM related kernel parameters' descriptions
docs: consolidate human interface subsystems
docs: admin-guide: Add information about intel_pstate active mode
Architecture-specific documentation is being moved into Documentation/arch/
as a way of cleaning up the top-level documentation directory and making
the docs hierarchy more closely match the source hierarchy. Move
Documentation/arm64 into arch/ (along with the Chinese equvalent
translations) and fix up documentation references.
Cc: Will Deacon <will@kernel.org>
Cc: Alex Shi <alexs@kernel.org>
Cc: Hu Haowen <src.res@email.cn>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Yantengsi <siyanteng@loongson.cn>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Clarify language. Clean up grammar. Hyphenate some words.
Change "low-ops" to "low-level" since "low-ops" isn't defined or even
mentioned anywhere else in the kernel source tree.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230610054310.6242-1-rdunlap@infradead.org
Make corrections to punctuation and grammar.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Andrew Jones <drjones@redhat.com>
Cc: Christoffer Dall <cdall@linaro.org>
Cc: kvm@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20230612030810.23376-5-rdunlap@infradead.org
Correct the path of a header file.
Change "guest to ... guest" to "guest to ... host" in one place.
Hyphenate "32-bit" systems.
Add a comma at one parenthetical phrase.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org
Cc: Alexander Graf <agraf@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20230612030810.23376-4-rdunlap@infradead.org
Module parameters are in sysfs, not debugfs, so change that.
Remove superfluous "that" following "Note:".
Hyphenate "system-wide" values.
Hyphenate "trade-off".
Don't treat "denial of service" as a verb.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20230612030810.23376-2-rdunlap@infradead.org
Module parameters are located in sysfs, not debugfs, so correct the
statement.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230610054302.6223-1-rdunlap@infradead.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Add a capability for userspace to specify the eager split chunk size.
The chunk size specifies how many pages to break at a time, using a
single allocation. Bigger the chunk size, more pages need to be
allocated ahead of time.
Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Link: https://lore.kernel.org/r/20230426172330.1439644-6-ricarkol@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
- Expose and allow R/W access to the PCIe DVSEC capability through
vfio-pci, as we already do with the legacy vendor capability.
(K V P Satyanarayana)
- Fix kernel-doc issues with structure definitions. (Simon Horman)
- Clarify ordering of operations relative to the kvm-vfio device for
driver dependencies against the kvm pointer. (Yi Liu)
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmRRPjAbHGFsZXgud2ls
bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsiePoQAJkUBO4bN3BvTG3iOkh1
vGrD3llhoUeD3AetT+5Y7pX0ml6wx9SLjcuFIUyYDMomLebpFXQAlI3gjE5eGbnf
PVfn6zJDrcA5w2tFwkJHUnN+RgcaPW5tiviP6XkRFpS95HCQRsFdp9GhI2U3X7q7
1G1pTV4Fb6BPhU0zAOueW1HKovTFDi3qbzaG5JjosdZc8V32/USQ6ekkXSGMbuzN
HsqLi2ujrtwAYluNKGxsuiDKXBFT8z8Mji7S9Z8RqoR0IF9q8JMUC0UMImn/XhKN
4XgCsibPt4o+RSStAf3vUPoP1l3I5J+3dE4ynLiHN8mNmbJ2gCLzQx18Dfww3V9n
Ao8wdwZvvskfc+gkEJAh47OFf9GvCvvHtg3uZS9KPDidS3uY8K4ooIr4Pv1F+QMh
rcnjZRFCakPB2tijmtg+1/0dY5/YEAn0mYWQ8ph5Ips36q30WBrbhBLJ47wE6PZ0
bsChcCU+4wK9e20lzMewa9QYnrRkvjGbmIo3GnjyPt4aG0VM08orIePy7ZVen3S3
+ieXBbuW61BLN/fZBFrDB4IxmekWq2TkkxSQCe8rn+jgs7qUUlFzn34Iqna6QTQV
pG+O3pyflULZGZHkij4lOn1i2SqqbLxZofYose9hHq5r6ALD5EFNg3QVLuiAc7Tb
3ctmfePHowrzKMyHMlXb8DE7
=Ylh5
-----END PGP SIGNATURE-----
Merge tag 'vfio-v6.4-rc1' of https://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- Expose and allow R/W access to the PCIe DVSEC capability through
vfio-pci, as we already do with the legacy vendor capability
(K V P Satyanarayana)
- Fix kernel-doc issues with structure definitions (Simon Horman)
- Clarify ordering of operations relative to the kvm-vfio device for
driver dependencies against the kvm pointer (Yi Liu)
* tag 'vfio-v6.4-rc1' of https://github.com/awilliam/linux-vfio:
docs: kvm: vfio: Suggest KVM_DEV_VFIO_GROUP_ADD vs VFIO_GROUP_GET_DEVICE_FD ordering
vfio: correct kdoc for ops structures
vfio/pci: Add DVSEC PCI Extended Config Capability to user visible list.
* More phys_to_virt conversions
* Improvement of AP management for VSIE (nested virtualization)
ARM64:
* Numerous fixes for the pathological lock inversion issue that
plagued KVM/arm64 since... forever.
* New framework allowing SMCCC-compliant hypercalls to be forwarded
to userspace, hopefully paving the way for some more features
being moved to VMMs rather than be implemented in the kernel.
* Large rework of the timer code to allow a VM-wide offset to be
applied to both virtual and physical counters as well as a
per-timer, per-vcpu offset that complements the global one.
This last part allows the NV timer code to be implemented on
top.
* A small set of fixes to make sure that we don't change anything
affecting the EL1&0 translation regime just after having having
taken an exception to EL2 until we have executed a DSB. This
ensures that speculative walks started in EL1&0 have completed.
* The usual selftest fixes and improvements.
KVM x86 changes for 6.4:
* Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled,
and by giving the guest control of CR0.WP when EPT is enabled on VMX
(VMX-only because SVM doesn't support per-bit controls)
* Add CR0/CR4 helpers to query single bits, and clean up related code
where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return
as a bool
* Move AMD_PSFD to cpufeatures.h and purge KVM's definition
* Avoid unnecessary writes+flushes when the guest is only adding new PTEs
* Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations
when emulating invalidations
* Clean up the range-based flushing APIs
* Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single
A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle
changed SPTE" overhead associated with writing the entire entry
* Track the number of "tail" entries in a pte_list_desc to avoid having
to walk (potentially) all descriptors during insertion and deletion,
which gets quite expensive if the guest is spamming fork()
* Disallow virtualizing legacy LBRs if architectural LBRs are available,
the two are mutually exclusive in hardware
* Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES)
after KVM_RUN, similar to CPUID features
* Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES
* Apply PMU filters to emulated events and add test coverage to the
pmu_event_filter selftest
x86 AMD:
* Add support for virtual NMIs
* Fixes for edge cases related to virtual interrupts
x86 Intel:
* Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is
not being reported due to userspace not opting in via prctl()
* Fix a bug in emulation of ENCLS in compatibility mode
* Allow emulation of NOP and PAUSE for L2
* AMX selftests improvements
* Misc cleanups
MIPS:
* Constify MIPS's internal callbacks (a leftover from the hardware enabling
rework that landed in 6.3)
Generic:
* Drop unnecessary casts from "void *" throughout kvm_main.c
* Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct
size by 8 bytes on 64-bit kernels by utilizing a padding hole
Documentation:
* Fix goof introduced by the conversion to rST
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmRNExkUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNyjwf+MkzDael9y9AsOZoqhEZ5OsfQYJ32
Im5ZVYsPRU2K5TuoWql6meIihgclCj1iIU32qYHa2F1WYt2rZ72rJp+HoY8b+TaI
WvF0pvNtqQyg3iEKUBKPA4xQ6mj7RpQBw86qqiCHmlfNt0zxluEGEPxH8xrWcfhC
huDQ+NUOdU7fmJ3rqGitCvkUbCuZNkw3aNPR8dhU8RAWrwRzP2hBOmdxIeo81WWY
XMEpJSijbGpXL9CvM0Jz9nOuMJwZwCCBGxg1vSQq0xTfLySNMxzvWZC2GFaBjucb
j0UOQ7yE0drIZDVhd3sdNslubXXU6FcSEzacGQb9aigMUon3Tem9SHi7Kw==
=S2Hq
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"s390:
- More phys_to_virt conversions
- Improvement of AP management for VSIE (nested virtualization)
ARM64:
- Numerous fixes for the pathological lock inversion issue that
plagued KVM/arm64 since... forever.
- New framework allowing SMCCC-compliant hypercalls to be forwarded
to userspace, hopefully paving the way for some more features being
moved to VMMs rather than be implemented in the kernel.
- Large rework of the timer code to allow a VM-wide offset to be
applied to both virtual and physical counters as well as a
per-timer, per-vcpu offset that complements the global one. This
last part allows the NV timer code to be implemented on top.
- A small set of fixes to make sure that we don't change anything
affecting the EL1&0 translation regime just after having having
taken an exception to EL2 until we have executed a DSB. This
ensures that speculative walks started in EL1&0 have completed.
- The usual selftest fixes and improvements.
x86:
- Optimize CR0.WP toggling by avoiding an MMU reload when TDP is
enabled, and by giving the guest control of CR0.WP when EPT is
enabled on VMX (VMX-only because SVM doesn't support per-bit
controls)
- Add CR0/CR4 helpers to query single bits, and clean up related code
where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long"
return as a bool
- Move AMD_PSFD to cpufeatures.h and purge KVM's definition
- Avoid unnecessary writes+flushes when the guest is only adding new
PTEs
- Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s
optimizations when emulating invalidations
- Clean up the range-based flushing APIs
- Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a
single A/D bit using a LOCK AND instead of XCHG, and skip all of
the "handle changed SPTE" overhead associated with writing the
entire entry
- Track the number of "tail" entries in a pte_list_desc to avoid
having to walk (potentially) all descriptors during insertion and
deletion, which gets quite expensive if the guest is spamming
fork()
- Disallow virtualizing legacy LBRs if architectural LBRs are
available, the two are mutually exclusive in hardware
- Disallow writes to immutable feature MSRs (notably
PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features
- Overhaul the vmx_pmu_caps selftest to better validate
PERF_CAPABILITIES
- Apply PMU filters to emulated events and add test coverage to the
pmu_event_filter selftest
- AMD SVM:
- Add support for virtual NMIs
- Fixes for edge cases related to virtual interrupts
- Intel AMX:
- Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if
XTILE_DATA is not being reported due to userspace not opting in
via prctl()
- Fix a bug in emulation of ENCLS in compatibility mode
- Allow emulation of NOP and PAUSE for L2
- AMX selftests improvements
- Misc cleanups
MIPS:
- Constify MIPS's internal callbacks (a leftover from the hardware
enabling rework that landed in 6.3)
Generic:
- Drop unnecessary casts from "void *" throughout kvm_main.c
- Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the
struct size by 8 bytes on 64-bit kernels by utilizing a padding
hole
Documentation:
- Fix goof introduced by the conversion to rST"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits)
KVM: s390: pci: fix virtual-physical confusion on module unload/load
KVM: s390: vsie: clarifications on setting the APCB
KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA
KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state
KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init()
KVM: selftests: Test the PMU event "Instructions retired"
KVM: selftests: Copy full counter values from guest in PMU event filter test
KVM: selftests: Use error codes to signal errors in PMU event filter test
KVM: selftests: Print detailed info in PMU event filter asserts
KVM: selftests: Add helpers for PMC asserts in PMU event filter test
KVM: selftests: Add a common helper for the PMU event filter guest code
KVM: selftests: Fix spelling mistake "perrmited" -> "permitted"
KVM: arm64: vhe: Drop extra isb() on guest exit
KVM: arm64: vhe: Synchronise with page table walker on MMU update
KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc()
KVM: arm64: nvhe: Synchronise with page table walker on TLBI
KVM: arm64: Handle 32bit CNTPCTSS traps
KVM: arm64: nvhe: Synchronise with page table walker on vcpu run
KVM: arm64: vgic: Don't acquire its_lock before config_lock
KVM: selftests: Add test to verify KVM's supported XCR0
...
- Drop unnecessary casts from "void *" throughout kvm_main.c
- Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct
size by 8 bytes on 64-bit kernels by utilizing a padding hole
- Fix a documentation format goof that was introduced when the KVM docs
were converted to ReST
- Constify MIPS's internal callbacks (a leftover from the hardware enabling
rework that landed in 6.3)
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmRGrVkSHHNlYW5qY0Bn
b29nbGUuY29tAAoJEGCRIgFNDBL52ZAP/0/6KOa6ZSvkRh+7MwQDkfeXkkbRIyyY
ItPspXCqCmD9X79m2r/5PCfpLgWDizROzOxLXb2bMhh7DqPczWWMvwEfZxBRK9LN
5zpHRdiiJJLR0HMdQtWkM5tdDCw/v37aQPkWyaZC/zDi2Zv6YPtPJVEBd38Squoh
vJ8zQp3c1qxHJWKvNaS6JY7NQ1B1sI3e7H9VEldR2d3RAinuAnIMgi+I8WqU6RT1
IdIYkemKrgquO9OPGeBxMV4ri5Km9FBdzb8LRkzzfYaELzVsrRxhXBOc9zaasgYK
YVbJSINeq5dIpwoXI9tqDBJTUIAPJ3NOwK/4E6qc6YEIZoT7euKGgGAqI879TSKm
zNR8b1ijVu5DquJbDFP8AR2UZnqCEIQ/EuuJdkHxFE5wQnNjgNJtSHZVJX/cKqW9
wnXCqK6wQoAUq7pUgyqTsy3SCiRQddEtwsMcf/CdWRPXcgDqQ1P3UmVupLcEtL0I
B+I7S+L64/KOHGeQsEKrohAOFBsMFVEkSkthyflg6/RFv1heHo2lx3njFKYm9lCW
LDCd70+iHD8e5/X4RCWAjB0EaqM3MYpAU2UtD8Pbjx/DiZDLWEjDD0B2LkI0uinS
+Ebdc5M9zGrNawiAzvF+MhZfDWut4Cr0tS5cPttXX3lg8aPl3nZL2G3nlk4vgpec
jgNvjwQ5hUyv
=Qw05
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-generic-6.4' of https://github.com/kvm-x86/linux into HEAD
Common KVM changes for 6.4:
- Drop unnecessary casts from "void *" throughout kvm_main.c
- Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct
size by 8 bytes on 64-bit kernels by utilizing a padding hole
- Fix a documentation format goof that was introduced when the KVM docs
were converted to ReST
- Constify MIPS's internal callbacks (a leftover from the hardware enabling
rework that landed in 6.3)