* Fix the guest view of the ID registers, making the relevant fields
   writable from userspace (affecting ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1)
 
 * Correcly expose S1PIE to guests, fixing a regression introduced
   in 6.12-rc1 with the S1POE support
 
 * Fix the recycling of stage-2 shadow MMUs by tracking the context
   (are we allowed to block or not) as well as the recycling state
 
 * Address a couple of issues with the vgic when userspace misconfigures
   the emulation, resulting in various splats. Headaches courtesy
   of our Syzkaller friends
 
 * Stop wasting space in the HYP idmap, as we are dangerously close
   to the 4kB limit, and this has already exploded in -next
 
 * Fix another race in vgic_init()
 
 * Fix a UBSAN error when faking the cache topology with MTE
   enabled
 
 RISCV:
 
 * RISCV: KVM: use raw_spinlock for critical section in imsic
 
 x86:
 
 * A bandaid for lack of XCR0 setup in selftests, which causes trouble
   if the compiler is configured to have x86-64-v3 (with AVX) as the
   default ISA.  Proper XCR0 setup will come in the next merge window.
 
 * Fix an issue where KVM would not ignore low bits of the nested CR3
   and potentially leak up to 31 bytes out of the guest memory's bounds
 
 * Fix case in which an out-of-date cached value for the segments could
   by returned by KVM_GET_SREGS.
 
 * More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL
 
 * Override MTRR state for KVM confidential guests, making it WB by
   default as is already the case for Hyper-V guests.
 
 Generic:
 
 * Remove a couple of unused functions
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmcVK54UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOfrgf7BRyihd28OGaqVuv2BqGYrxqfOkd6
 ZqpJDOy+X7UE3iG5NhTxw4mghCJFhOwIL7gDSZwPLe6D2k01oqPSP2pLMqXb5oOv
 /EkltRvzG0YIH3sjZY5PROrMMxnvSKkJKxETFxFQQzMKRym2v/T5LAzrium58YIT
 vWZXxo2HTPXOw/U5upAqqMYJMeeJEL3kurVHtOsPytUFjrIOl0BfeKvgjOwonDIh
 Awm4JZwk0+1d8sYfkuzsSrTQmtshDCx1jkFN1juirt90s1EwgmOvVKiHo3gMsVP9
 veDRoLTx2fM/r7TrhoHo46DTA2vbfmCltWcT0cn5x8P24BFGXXe/IDJIHA==
 =IVlI
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Fix the guest view of the ID registers, making the relevant fields
     writable from userspace (affecting ID_AA64DFR0_EL1 and
     ID_AA64PFR1_EL1)

   - Correcly expose S1PIE to guests, fixing a regression introduced in
     6.12-rc1 with the S1POE support

   - Fix the recycling of stage-2 shadow MMUs by tracking the context
     (are we allowed to block or not) as well as the recycling state

   - Address a couple of issues with the vgic when userspace
     misconfigures the emulation, resulting in various splats. Headaches
     courtesy of our Syzkaller friends

   - Stop wasting space in the HYP idmap, as we are dangerously close to
     the 4kB limit, and this has already exploded in -next

   - Fix another race in vgic_init()

   - Fix a UBSAN error when faking the cache topology with MTE enabled

  RISCV:

   - RISCV: KVM: use raw_spinlock for critical section in imsic

  x86:

   - A bandaid for lack of XCR0 setup in selftests, which causes trouble
     if the compiler is configured to have x86-64-v3 (with AVX) as the
     default ISA. Proper XCR0 setup will come in the next merge window.

   - Fix an issue where KVM would not ignore low bits of the nested CR3
     and potentially leak up to 31 bytes out of the guest memory's
     bounds

   - Fix case in which an out-of-date cached value for the segments
     could by returned by KVM_GET_SREGS.

   - More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL

   - Override MTRR state for KVM confidential guests, making it WB by
     default as is already the case for Hyper-V guests.

  Generic:

   - Remove a couple of unused functions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits)
  RISCV: KVM: use raw_spinlock for critical section in imsic
  KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups
  KVM: selftests: x86: Avoid using SSE/AVX instructions
  KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory
  KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()
  KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL
  KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()
  KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot
  x86/kvm: Override default caching mode for SEV-SNP and TDX
  KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic
  KVM: Remove unused kvm_vcpu_gfn_to_pfn
  KVM: arm64: Ensure vgic_ready() is ordered against MMIO registration
  KVM: arm64: vgic: Don't check for vgic_ready() when setting NR_IRQS
  KVM: arm64: Fix shift-out-of-bounds bug
  KVM: arm64: Shave a few bytes from the EL2 idmap code
  KVM: arm64: Don't eagerly teardown the vgic on init error
  KVM: arm64: Expose S1PIE to guests
  KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to reschedule
  KVM: arm64: nv: Punt stage-2 recycling to a vCPU request
  KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed
  ...
This commit is contained in:
Linus Torvalds 2024-10-21 11:22:04 -07:00
commit d129377639
25 changed files with 277 additions and 103 deletions

View File

@ -8098,13 +8098,15 @@ KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS By default, KVM emulates MONITOR/MWAIT (if
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT is KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT is
disabled. disabled.
KVM_X86_QUIRK_SLOT_ZAP_ALL By default, KVM invalidates all SPTEs in KVM_X86_QUIRK_SLOT_ZAP_ALL By default, for KVM_X86_DEFAULT_VM VMs, KVM
fast way for memslot deletion when VM type invalidates all SPTEs in all memslots and
is KVM_X86_DEFAULT_VM. address spaces when a memslot is deleted or
When this quirk is disabled or when VM type moved. When this quirk is disabled (or the
is other than KVM_X86_DEFAULT_VM, KVM zaps VM type isn't KVM_X86_DEFAULT_VM), KVM only
only leaf SPTEs that are within the range of ensures the backing memory of the deleted
the memslot being deleted. or moved memslot isn't reachable, i.e KVM
_may_ invalidate only SPTEs related to the
memslot.
=================================== ============================================ =================================== ============================================
7.32 KVM_CAP_MAX_VCPU_ID 7.32 KVM_CAP_MAX_VCPU_ID

View File

@ -136,7 +136,7 @@ For direct sp, we can easily avoid it since the spte of direct sp is fixed
to gfn. For indirect sp, we disabled fast page fault for simplicity. to gfn. For indirect sp, we disabled fast page fault for simplicity.
A solution for indirect sp could be to pin the gfn, for example via A solution for indirect sp could be to pin the gfn, for example via
kvm_vcpu_gfn_to_pfn_atomic, before the cmpxchg. After the pinning: gfn_to_pfn_memslot_atomic, before the cmpxchg. After the pinning:
- We have held the refcount of pfn; that means the pfn can not be freed and - We have held the refcount of pfn; that means the pfn can not be freed and
be reused for another gfn. be reused for another gfn.

View File

@ -178,6 +178,7 @@ struct kvm_nvhe_init_params {
unsigned long hcr_el2; unsigned long hcr_el2;
unsigned long vttbr; unsigned long vttbr;
unsigned long vtcr; unsigned long vtcr;
unsigned long tmp;
}; };
/* /*

View File

@ -51,6 +51,7 @@
#define KVM_REQ_RELOAD_PMU KVM_ARCH_REQ(5) #define KVM_REQ_RELOAD_PMU KVM_ARCH_REQ(5)
#define KVM_REQ_SUSPEND KVM_ARCH_REQ(6) #define KVM_REQ_SUSPEND KVM_ARCH_REQ(6)
#define KVM_REQ_RESYNC_PMU_EL0 KVM_ARCH_REQ(7) #define KVM_REQ_RESYNC_PMU_EL0 KVM_ARCH_REQ(7)
#define KVM_REQ_NESTED_S2_UNMAP KVM_ARCH_REQ(8)
#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \ #define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
KVM_DIRTY_LOG_INITIALLY_SET) KVM_DIRTY_LOG_INITIALLY_SET)
@ -211,6 +212,12 @@ struct kvm_s2_mmu {
*/ */
bool nested_stage2_enabled; bool nested_stage2_enabled;
/*
* true when this MMU needs to be unmapped before being used for a new
* purpose.
*/
bool pending_unmap;
/* /*
* 0: Nobody is currently using this, check vttbr for validity * 0: Nobody is currently using this, check vttbr for validity
* >0: Somebody is actively using this. * >0: Somebody is actively using this.

View File

@ -166,7 +166,8 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
int create_hyp_stack(phys_addr_t phys_addr, unsigned long *haddr); int create_hyp_stack(phys_addr_t phys_addr, unsigned long *haddr);
void __init free_hyp_pgds(void); void __init free_hyp_pgds(void);
void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size); void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
u64 size, bool may_block);
void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end); void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end);
void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end); void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end);

View File

@ -78,6 +78,8 @@ extern void kvm_s2_mmu_iterate_by_vmid(struct kvm *kvm, u16 vmid,
extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu); extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu); extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
extern void check_nested_vcpu_requests(struct kvm_vcpu *vcpu);
struct kvm_s2_trans { struct kvm_s2_trans {
phys_addr_t output; phys_addr_t output;
unsigned long block_size; unsigned long block_size;
@ -124,7 +126,7 @@ extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
struct kvm_s2_trans *trans); struct kvm_s2_trans *trans);
extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2); extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
extern void kvm_nested_s2_wp(struct kvm *kvm); extern void kvm_nested_s2_wp(struct kvm *kvm);
extern void kvm_nested_s2_unmap(struct kvm *kvm); extern void kvm_nested_s2_unmap(struct kvm *kvm, bool may_block);
extern void kvm_nested_s2_flush(struct kvm *kvm); extern void kvm_nested_s2_flush(struct kvm *kvm);
unsigned long compute_tlb_inval_range(struct kvm_s2_mmu *mmu, u64 val); unsigned long compute_tlb_inval_range(struct kvm_s2_mmu *mmu, u64 val);

View File

@ -146,6 +146,7 @@ int main(void)
DEFINE(NVHE_INIT_HCR_EL2, offsetof(struct kvm_nvhe_init_params, hcr_el2)); DEFINE(NVHE_INIT_HCR_EL2, offsetof(struct kvm_nvhe_init_params, hcr_el2));
DEFINE(NVHE_INIT_VTTBR, offsetof(struct kvm_nvhe_init_params, vttbr)); DEFINE(NVHE_INIT_VTTBR, offsetof(struct kvm_nvhe_init_params, vttbr));
DEFINE(NVHE_INIT_VTCR, offsetof(struct kvm_nvhe_init_params, vtcr)); DEFINE(NVHE_INIT_VTCR, offsetof(struct kvm_nvhe_init_params, vtcr));
DEFINE(NVHE_INIT_TMP, offsetof(struct kvm_nvhe_init_params, tmp));
#endif #endif
#ifdef CONFIG_CPU_PM #ifdef CONFIG_CPU_PM
DEFINE(CPU_CTX_SP, offsetof(struct cpu_suspend_ctx, sp)); DEFINE(CPU_CTX_SP, offsetof(struct cpu_suspend_ctx, sp));

View File

@ -997,6 +997,9 @@ static int kvm_vcpu_suspend(struct kvm_vcpu *vcpu)
static int check_vcpu_requests(struct kvm_vcpu *vcpu) static int check_vcpu_requests(struct kvm_vcpu *vcpu)
{ {
if (kvm_request_pending(vcpu)) { if (kvm_request_pending(vcpu)) {
if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu))
return -EIO;
if (kvm_check_request(KVM_REQ_SLEEP, vcpu)) if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
kvm_vcpu_sleep(vcpu); kvm_vcpu_sleep(vcpu);
@ -1031,6 +1034,8 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
if (kvm_dirty_ring_check_request(vcpu)) if (kvm_dirty_ring_check_request(vcpu))
return 0; return 0;
check_nested_vcpu_requests(vcpu);
} }
return 1; return 1;

View File

@ -24,28 +24,25 @@
.align 11 .align 11
SYM_CODE_START(__kvm_hyp_init) SYM_CODE_START(__kvm_hyp_init)
ventry __invalid // Synchronous EL2t ventry . // Synchronous EL2t
ventry __invalid // IRQ EL2t ventry . // IRQ EL2t
ventry __invalid // FIQ EL2t ventry . // FIQ EL2t
ventry __invalid // Error EL2t ventry . // Error EL2t
ventry __invalid // Synchronous EL2h ventry . // Synchronous EL2h
ventry __invalid // IRQ EL2h ventry . // IRQ EL2h
ventry __invalid // FIQ EL2h ventry . // FIQ EL2h
ventry __invalid // Error EL2h ventry . // Error EL2h
ventry __do_hyp_init // Synchronous 64-bit EL1 ventry __do_hyp_init // Synchronous 64-bit EL1
ventry __invalid // IRQ 64-bit EL1 ventry . // IRQ 64-bit EL1
ventry __invalid // FIQ 64-bit EL1 ventry . // FIQ 64-bit EL1
ventry __invalid // Error 64-bit EL1 ventry . // Error 64-bit EL1
ventry __invalid // Synchronous 32-bit EL1 ventry . // Synchronous 32-bit EL1
ventry __invalid // IRQ 32-bit EL1 ventry . // IRQ 32-bit EL1
ventry __invalid // FIQ 32-bit EL1 ventry . // FIQ 32-bit EL1
ventry __invalid // Error 32-bit EL1 ventry . // Error 32-bit EL1
__invalid:
b .
/* /*
* Only uses x0..x3 so as to not clobber callee-saved SMCCC registers. * Only uses x0..x3 so as to not clobber callee-saved SMCCC registers.
@ -76,6 +73,13 @@ __do_hyp_init:
eret eret
SYM_CODE_END(__kvm_hyp_init) SYM_CODE_END(__kvm_hyp_init)
SYM_CODE_START_LOCAL(__kvm_init_el2_state)
/* Initialize EL2 CPU state to sane values. */
init_el2_state // Clobbers x0..x2
finalise_el2_state
ret
SYM_CODE_END(__kvm_init_el2_state)
/* /*
* Initialize the hypervisor in EL2. * Initialize the hypervisor in EL2.
* *
@ -102,9 +106,12 @@ SYM_CODE_START_LOCAL(___kvm_hyp_init)
// TPIDR_EL2 is used to preserve x0 across the macro maze... // TPIDR_EL2 is used to preserve x0 across the macro maze...
isb isb
msr tpidr_el2, x0 msr tpidr_el2, x0
init_el2_state str lr, [x0, #NVHE_INIT_TMP]
finalise_el2_state
bl __kvm_init_el2_state
mrs x0, tpidr_el2 mrs x0, tpidr_el2
ldr lr, [x0, #NVHE_INIT_TMP]
1: 1:
ldr x1, [x0, #NVHE_INIT_TPIDR_EL2] ldr x1, [x0, #NVHE_INIT_TPIDR_EL2]
@ -199,9 +206,8 @@ SYM_CODE_START_LOCAL(__kvm_hyp_init_cpu)
2: msr SPsel, #1 // We want to use SP_EL{1,2} 2: msr SPsel, #1 // We want to use SP_EL{1,2}
/* Initialize EL2 CPU state to sane values. */ bl __kvm_init_el2_state
init_el2_state // Clobbers x0..x2
finalise_el2_state
__init_el2_nvhe_prepare_eret __init_el2_nvhe_prepare_eret
/* Enable MMU, set vectors and stack. */ /* Enable MMU, set vectors and stack. */

View File

@ -317,7 +317,7 @@ int kvm_smccc_call_handler(struct kvm_vcpu *vcpu)
* to the guest, and hide SSBS so that the * to the guest, and hide SSBS so that the
* guest stays protected. * guest stays protected.
*/ */
if (cpus_have_final_cap(ARM64_SSBS)) if (kvm_has_feat(vcpu->kvm, ID_AA64PFR1_EL1, SSBS, IMP))
break; break;
fallthrough; fallthrough;
case SPECTRE_UNAFFECTED: case SPECTRE_UNAFFECTED:
@ -428,7 +428,7 @@ int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
* Convert the workaround level into an easy-to-compare number, where higher * Convert the workaround level into an easy-to-compare number, where higher
* values mean better protection. * values mean better protection.
*/ */
static int get_kernel_wa_level(u64 regid) static int get_kernel_wa_level(struct kvm_vcpu *vcpu, u64 regid)
{ {
switch (regid) { switch (regid) {
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1: case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
@ -449,7 +449,7 @@ static int get_kernel_wa_level(u64 regid)
* don't have any FW mitigation if SSBS is there at * don't have any FW mitigation if SSBS is there at
* all times. * all times.
*/ */
if (cpus_have_final_cap(ARM64_SSBS)) if (kvm_has_feat(vcpu->kvm, ID_AA64PFR1_EL1, SSBS, IMP))
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL; return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
fallthrough; fallthrough;
case SPECTRE_UNAFFECTED: case SPECTRE_UNAFFECTED:
@ -486,7 +486,7 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1: case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2: case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3: case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK; val = get_kernel_wa_level(vcpu, reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
break; break;
case KVM_REG_ARM_STD_BMAP: case KVM_REG_ARM_STD_BMAP:
val = READ_ONCE(smccc_feat->std_bmap); val = READ_ONCE(smccc_feat->std_bmap);
@ -588,7 +588,7 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
if (val & ~KVM_REG_FEATURE_LEVEL_MASK) if (val & ~KVM_REG_FEATURE_LEVEL_MASK)
return -EINVAL; return -EINVAL;
if (get_kernel_wa_level(reg->id) < val) if (get_kernel_wa_level(vcpu, reg->id) < val)
return -EINVAL; return -EINVAL;
return 0; return 0;
@ -624,7 +624,7 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
* We can deal with NOT_AVAIL on NOT_REQUIRED, but not the * We can deal with NOT_AVAIL on NOT_REQUIRED, but not the
* other way around. * other way around.
*/ */
if (get_kernel_wa_level(reg->id) < wa_level) if (get_kernel_wa_level(vcpu, reg->id) < wa_level)
return -EINVAL; return -EINVAL;
return 0; return 0;

View File

@ -328,9 +328,10 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
may_block)); may_block));
} }
void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size) void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
u64 size, bool may_block)
{ {
__unmap_stage2_range(mmu, start, size, true); __unmap_stage2_range(mmu, start, size, may_block);
} }
void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end) void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
@ -1015,7 +1016,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
if (!(vma->vm_flags & VM_PFNMAP)) { if (!(vma->vm_flags & VM_PFNMAP)) {
gpa_t gpa = addr + (vm_start - memslot->userspace_addr); gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
kvm_stage2_unmap_range(&kvm->arch.mmu, gpa, vm_end - vm_start); kvm_stage2_unmap_range(&kvm->arch.mmu, gpa, vm_end - vm_start, true);
} }
hva = vm_end; hva = vm_end;
} while (hva < reg_end); } while (hva < reg_end);
@ -1042,7 +1043,7 @@ void stage2_unmap_vm(struct kvm *kvm)
kvm_for_each_memslot(memslot, bkt, slots) kvm_for_each_memslot(memslot, bkt, slots)
stage2_unmap_memslot(kvm, memslot); stage2_unmap_memslot(kvm, memslot);
kvm_nested_s2_unmap(kvm); kvm_nested_s2_unmap(kvm, true);
write_unlock(&kvm->mmu_lock); write_unlock(&kvm->mmu_lock);
mmap_read_unlock(current->mm); mmap_read_unlock(current->mm);
@ -1912,7 +1913,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
(range->end - range->start) << PAGE_SHIFT, (range->end - range->start) << PAGE_SHIFT,
range->may_block); range->may_block);
kvm_nested_s2_unmap(kvm); kvm_nested_s2_unmap(kvm, range->may_block);
return false; return false;
} }
@ -2179,8 +2180,8 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
phys_addr_t size = slot->npages << PAGE_SHIFT; phys_addr_t size = slot->npages << PAGE_SHIFT;
write_lock(&kvm->mmu_lock); write_lock(&kvm->mmu_lock);
kvm_stage2_unmap_range(&kvm->arch.mmu, gpa, size); kvm_stage2_unmap_range(&kvm->arch.mmu, gpa, size, true);
kvm_nested_s2_unmap(kvm); kvm_nested_s2_unmap(kvm, true);
write_unlock(&kvm->mmu_lock); write_unlock(&kvm->mmu_lock);
} }

View File

@ -632,9 +632,9 @@ static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
/* Set the scene for the next search */ /* Set the scene for the next search */
kvm->arch.nested_mmus_next = (i + 1) % kvm->arch.nested_mmus_size; kvm->arch.nested_mmus_next = (i + 1) % kvm->arch.nested_mmus_size;
/* Clear the old state */ /* Make sure we don't forget to do the laundry */
if (kvm_s2_mmu_valid(s2_mmu)) if (kvm_s2_mmu_valid(s2_mmu))
kvm_stage2_unmap_range(s2_mmu, 0, kvm_phys_size(s2_mmu)); s2_mmu->pending_unmap = true;
/* /*
* The virtual VMID (modulo CnP) will be used as a key when matching * The virtual VMID (modulo CnP) will be used as a key when matching
@ -650,6 +650,16 @@ static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
out: out:
atomic_inc(&s2_mmu->refcnt); atomic_inc(&s2_mmu->refcnt);
/*
* Set the vCPU request to perform an unmap, even if the pending unmap
* originates from another vCPU. This guarantees that the MMU has been
* completely unmapped before any vCPU actually uses it, and allows
* multiple vCPUs to lend a hand with completing the unmap.
*/
if (s2_mmu->pending_unmap)
kvm_make_request(KVM_REQ_NESTED_S2_UNMAP, vcpu);
return s2_mmu; return s2_mmu;
} }
@ -663,6 +673,13 @@ void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu)
void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu) void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
{ {
/*
* The vCPU kept its reference on the MMU after the last put, keep
* rolling with it.
*/
if (vcpu->arch.hw_mmu)
return;
if (is_hyp_ctxt(vcpu)) { if (is_hyp_ctxt(vcpu)) {
vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu; vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
} else { } else {
@ -674,11 +691,19 @@ void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu) void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
{ {
if (kvm_is_nested_s2_mmu(vcpu->kvm, vcpu->arch.hw_mmu)) { /*
* Keep a reference on the associated stage-2 MMU if the vCPU is
* scheduling out and not in WFI emulation, suggesting it is likely to
* reuse the MMU sometime soon.
*/
if (vcpu->scheduled_out && !vcpu_get_flag(vcpu, IN_WFI))
return;
if (kvm_is_nested_s2_mmu(vcpu->kvm, vcpu->arch.hw_mmu))
atomic_dec(&vcpu->arch.hw_mmu->refcnt); atomic_dec(&vcpu->arch.hw_mmu->refcnt);
vcpu->arch.hw_mmu = NULL; vcpu->arch.hw_mmu = NULL;
} }
}
/* /*
* Returns non-zero if permission fault is handled by injecting it to the next * Returns non-zero if permission fault is handled by injecting it to the next
@ -730,7 +755,7 @@ void kvm_nested_s2_wp(struct kvm *kvm)
} }
} }
void kvm_nested_s2_unmap(struct kvm *kvm) void kvm_nested_s2_unmap(struct kvm *kvm, bool may_block)
{ {
int i; int i;
@ -740,7 +765,7 @@ void kvm_nested_s2_unmap(struct kvm *kvm)
struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i]; struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
if (kvm_s2_mmu_valid(mmu)) if (kvm_s2_mmu_valid(mmu))
kvm_stage2_unmap_range(mmu, 0, kvm_phys_size(mmu)); kvm_stage2_unmap_range(mmu, 0, kvm_phys_size(mmu), may_block);
} }
} }
@ -1184,3 +1209,17 @@ int kvm_init_nv_sysregs(struct kvm *kvm)
return 0; return 0;
} }
void check_nested_vcpu_requests(struct kvm_vcpu *vcpu)
{
if (kvm_check_request(KVM_REQ_NESTED_S2_UNMAP, vcpu)) {
struct kvm_s2_mmu *mmu = vcpu->arch.hw_mmu;
write_lock(&vcpu->kvm->mmu_lock);
if (mmu->pending_unmap) {
kvm_stage2_unmap_range(mmu, 0, kvm_phys_size(mmu), true);
mmu->pending_unmap = false;
}
write_unlock(&vcpu->kvm->mmu_lock);
}
}

View File

@ -1527,6 +1527,14 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu,
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE); val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE);
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_SME); val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_SME);
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_RNDR_trap);
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_NMI);
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE_frac);
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_GCS);
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_THE);
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTEX);
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_DF2);
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_PFAR);
break; break;
case SYS_ID_AA64PFR2_EL1: case SYS_ID_AA64PFR2_EL1:
/* We only expose FPMR */ /* We only expose FPMR */
@ -1550,7 +1558,8 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu,
val &= ~ID_AA64MMFR2_EL1_CCIDX_MASK; val &= ~ID_AA64MMFR2_EL1_CCIDX_MASK;
break; break;
case SYS_ID_AA64MMFR3_EL1: case SYS_ID_AA64MMFR3_EL1:
val &= ID_AA64MMFR3_EL1_TCRX | ID_AA64MMFR3_EL1_S1POE; val &= ID_AA64MMFR3_EL1_TCRX | ID_AA64MMFR3_EL1_S1POE |
ID_AA64MMFR3_EL1_S1PIE;
break; break;
case SYS_ID_MMFR4_EL1: case SYS_ID_MMFR4_EL1:
val &= ~ARM64_FEATURE_MASK(ID_MMFR4_EL1_CCIDX); val &= ~ARM64_FEATURE_MASK(ID_MMFR4_EL1_CCIDX);
@ -1985,7 +1994,7 @@ static u64 reset_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
* one cache line. * one cache line.
*/ */
if (kvm_has_mte(vcpu->kvm)) if (kvm_has_mte(vcpu->kvm))
clidr |= 2 << CLIDR_TTYPE_SHIFT(loc); clidr |= 2ULL << CLIDR_TTYPE_SHIFT(loc);
__vcpu_sys_reg(vcpu, r->reg) = clidr; __vcpu_sys_reg(vcpu, r->reg) = clidr;
@ -2376,7 +2385,19 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_AA64PFR0_EL1_RAS | ID_AA64PFR0_EL1_RAS |
ID_AA64PFR0_EL1_AdvSIMD | ID_AA64PFR0_EL1_AdvSIMD |
ID_AA64PFR0_EL1_FP), }, ID_AA64PFR0_EL1_FP), },
ID_SANITISED(ID_AA64PFR1_EL1), ID_WRITABLE(ID_AA64PFR1_EL1, ~(ID_AA64PFR1_EL1_PFAR |
ID_AA64PFR1_EL1_DF2 |
ID_AA64PFR1_EL1_MTEX |
ID_AA64PFR1_EL1_THE |
ID_AA64PFR1_EL1_GCS |
ID_AA64PFR1_EL1_MTE_frac |
ID_AA64PFR1_EL1_NMI |
ID_AA64PFR1_EL1_RNDR_trap |
ID_AA64PFR1_EL1_SME |
ID_AA64PFR1_EL1_RES0 |
ID_AA64PFR1_EL1_MPAM_frac |
ID_AA64PFR1_EL1_RAS_frac |
ID_AA64PFR1_EL1_MTE)),
ID_WRITABLE(ID_AA64PFR2_EL1, ID_AA64PFR2_EL1_FPMR), ID_WRITABLE(ID_AA64PFR2_EL1, ID_AA64PFR2_EL1_FPMR),
ID_UNALLOCATED(4,3), ID_UNALLOCATED(4,3),
ID_WRITABLE(ID_AA64ZFR0_EL1, ~ID_AA64ZFR0_EL1_RES0), ID_WRITABLE(ID_AA64ZFR0_EL1, ~ID_AA64ZFR0_EL1_RES0),
@ -2390,7 +2411,21 @@ static const struct sys_reg_desc sys_reg_descs[] = {
.get_user = get_id_reg, .get_user = get_id_reg,
.set_user = set_id_aa64dfr0_el1, .set_user = set_id_aa64dfr0_el1,
.reset = read_sanitised_id_aa64dfr0_el1, .reset = read_sanitised_id_aa64dfr0_el1,
.val = ID_AA64DFR0_EL1_PMUVer_MASK | /*
* Prior to FEAT_Debugv8.9, the architecture defines context-aware
* breakpoints (CTX_CMPs) as the highest numbered breakpoints (BRPs).
* KVM does not trap + emulate the breakpoint registers, and as such
* cannot support a layout that misaligns with the underlying hardware.
* While it may be possible to describe a subset that aligns with
* hardware, just prevent changes to BRPs and CTX_CMPs altogether for
* simplicity.
*
* See DDI0487K.a, section D2.8.3 Breakpoint types and linking
* of breakpoints for more details.
*/
.val = ID_AA64DFR0_EL1_DoubleLock_MASK |
ID_AA64DFR0_EL1_WRPs_MASK |
ID_AA64DFR0_EL1_PMUVer_MASK |
ID_AA64DFR0_EL1_DebugVer_MASK, }, ID_AA64DFR0_EL1_DebugVer_MASK, },
ID_SANITISED(ID_AA64DFR1_EL1), ID_SANITISED(ID_AA64DFR1_EL1),
ID_UNALLOCATED(5,2), ID_UNALLOCATED(5,2),
@ -2433,6 +2468,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_AA64MMFR2_EL1_NV | ID_AA64MMFR2_EL1_NV |
ID_AA64MMFR2_EL1_CCIDX)), ID_AA64MMFR2_EL1_CCIDX)),
ID_WRITABLE(ID_AA64MMFR3_EL1, (ID_AA64MMFR3_EL1_TCRX | ID_WRITABLE(ID_AA64MMFR3_EL1, (ID_AA64MMFR3_EL1_TCRX |
ID_AA64MMFR3_EL1_S1PIE |
ID_AA64MMFR3_EL1_S1POE)), ID_AA64MMFR3_EL1_S1POE)),
ID_SANITISED(ID_AA64MMFR4_EL1), ID_SANITISED(ID_AA64MMFR4_EL1),
ID_UNALLOCATED(7,5), ID_UNALLOCATED(7,5),
@ -2903,7 +2939,7 @@ static bool handle_alle1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
* Drop all shadow S2s, resulting in S1/S2 TLBIs for each of the * Drop all shadow S2s, resulting in S1/S2 TLBIs for each of the
* corresponding VMIDs. * corresponding VMIDs.
*/ */
kvm_nested_s2_unmap(vcpu->kvm); kvm_nested_s2_unmap(vcpu->kvm, true);
write_unlock(&vcpu->kvm->mmu_lock); write_unlock(&vcpu->kvm->mmu_lock);
@ -2955,7 +2991,30 @@ union tlbi_info {
static void s2_mmu_unmap_range(struct kvm_s2_mmu *mmu, static void s2_mmu_unmap_range(struct kvm_s2_mmu *mmu,
const union tlbi_info *info) const union tlbi_info *info)
{ {
kvm_stage2_unmap_range(mmu, info->range.start, info->range.size); /*
* The unmap operation is allowed to drop the MMU lock and block, which
* means that @mmu could be used for a different context than the one
* currently being invalidated.
*
* This behavior is still safe, as:
*
* 1) The vCPU(s) that recycled the MMU are responsible for invalidating
* the entire MMU before reusing it, which still honors the intent
* of a TLBI.
*
* 2) Until the guest TLBI instruction is 'retired' (i.e. increment PC
* and ERET to the guest), other vCPUs are allowed to use stale
* translations.
*
* 3) Accidentally unmapping an unrelated MMU context is nonfatal, and
* at worst may cause more aborts for shadow stage-2 fills.
*
* Dropping the MMU lock also implies that shadow stage-2 fills could
* happen behind the back of the TLBI. This is still safe, though, as
* the L1 needs to put its stage-2 in a consistent state before doing
* the TLBI.
*/
kvm_stage2_unmap_range(mmu, info->range.start, info->range.size, true);
} }
static bool handle_vmalls12e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p, static bool handle_vmalls12e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
@ -3050,7 +3109,11 @@ static void s2_mmu_unmap_ipa(struct kvm_s2_mmu *mmu,
max_size = compute_tlb_inval_range(mmu, info->ipa.addr); max_size = compute_tlb_inval_range(mmu, info->ipa.addr);
base_addr &= ~(max_size - 1); base_addr &= ~(max_size - 1);
kvm_stage2_unmap_range(mmu, base_addr, max_size); /*
* See comment in s2_mmu_unmap_range() for why this is allowed to
* reschedule.
*/
kvm_stage2_unmap_range(mmu, base_addr, max_size, true);
} }
static bool handle_ipas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p, static bool handle_ipas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,

View File

@ -417,9 +417,29 @@ static void __kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
kfree(vgic_cpu->private_irqs); kfree(vgic_cpu->private_irqs);
vgic_cpu->private_irqs = NULL; vgic_cpu->private_irqs = NULL;
if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
/*
* If this vCPU is being destroyed because of a failed creation
* then unregister the redistributor to avoid leaving behind a
* dangling pointer to the vCPU struct.
*
* vCPUs that have been successfully created (i.e. added to
* kvm->vcpu_array) get unregistered in kvm_vgic_destroy(), as
* this function gets called while holding kvm->arch.config_lock
* in the VM teardown path and would otherwise introduce a lock
* inversion w.r.t. kvm->srcu.
*
* vCPUs that failed creation are torn down outside of the
* kvm->arch.config_lock and do not get unregistered in
* kvm_vgic_destroy(), meaning it is both safe and necessary to
* do so here.
*/
if (kvm_get_vcpu_by_id(vcpu->kvm, vcpu->vcpu_id) != vcpu)
vgic_unregister_redist_iodev(vcpu);
vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF; vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
} }
}
void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu) void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
{ {
@ -524,22 +544,31 @@ int kvm_vgic_map_resources(struct kvm *kvm)
if (ret) if (ret)
goto out; goto out;
dist->ready = true;
dist_base = dist->vgic_dist_base; dist_base = dist->vgic_dist_base;
mutex_unlock(&kvm->arch.config_lock); mutex_unlock(&kvm->arch.config_lock);
ret = vgic_register_dist_iodev(kvm, dist_base, type); ret = vgic_register_dist_iodev(kvm, dist_base, type);
if (ret) if (ret) {
kvm_err("Unable to register VGIC dist MMIO regions\n"); kvm_err("Unable to register VGIC dist MMIO regions\n");
goto out_slots;
}
/*
* kvm_io_bus_register_dev() guarantees all readers see the new MMIO
* registration before returning through synchronize_srcu(), which also
* implies a full memory barrier. As such, marking the distributor as
* 'ready' here is guaranteed to be ordered after all vCPUs having seen
* a completely configured distributor.
*/
dist->ready = true;
goto out_slots; goto out_slots;
out: out:
mutex_unlock(&kvm->arch.config_lock); mutex_unlock(&kvm->arch.config_lock);
out_slots: out_slots:
mutex_unlock(&kvm->slots_lock);
if (ret) if (ret)
kvm_vgic_destroy(kvm); kvm_vm_dead(kvm);
mutex_unlock(&kvm->slots_lock);
return ret; return ret;
} }

View File

@ -236,7 +236,12 @@ static int vgic_set_common_attr(struct kvm_device *dev,
mutex_lock(&dev->kvm->arch.config_lock); mutex_lock(&dev->kvm->arch.config_lock);
if (vgic_ready(dev->kvm) || dev->kvm->arch.vgic.nr_spis) /*
* Either userspace has already configured NR_IRQS or
* the vgic has already been initialized and vgic_init()
* supplied a default amount of SPIs.
*/
if (dev->kvm->arch.vgic.nr_spis)
ret = -EBUSY; ret = -EBUSY;
else else
dev->kvm->arch.vgic.nr_spis = dev->kvm->arch.vgic.nr_spis =

View File

@ -55,7 +55,7 @@ struct imsic {
/* IMSIC SW-file */ /* IMSIC SW-file */
struct imsic_mrif *swfile; struct imsic_mrif *swfile;
phys_addr_t swfile_pa; phys_addr_t swfile_pa;
spinlock_t swfile_extirq_lock; raw_spinlock_t swfile_extirq_lock;
}; };
#define imsic_vs_csr_read(__c) \ #define imsic_vs_csr_read(__c) \
@ -622,7 +622,7 @@ static void imsic_swfile_extirq_update(struct kvm_vcpu *vcpu)
* interruptions between reading topei and updating pending status. * interruptions between reading topei and updating pending status.
*/ */
spin_lock_irqsave(&imsic->swfile_extirq_lock, flags); raw_spin_lock_irqsave(&imsic->swfile_extirq_lock, flags);
if (imsic_mrif_atomic_read(mrif, &mrif->eidelivery) && if (imsic_mrif_atomic_read(mrif, &mrif->eidelivery) &&
imsic_mrif_topei(mrif, imsic->nr_eix, imsic->nr_msis)) imsic_mrif_topei(mrif, imsic->nr_eix, imsic->nr_msis))
@ -630,7 +630,7 @@ static void imsic_swfile_extirq_update(struct kvm_vcpu *vcpu)
else else
kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_VS_EXT); kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_VS_EXT);
spin_unlock_irqrestore(&imsic->swfile_extirq_lock, flags); raw_spin_unlock_irqrestore(&imsic->swfile_extirq_lock, flags);
} }
static void imsic_swfile_read(struct kvm_vcpu *vcpu, bool clear, static void imsic_swfile_read(struct kvm_vcpu *vcpu, bool clear,
@ -1051,7 +1051,7 @@ int kvm_riscv_vcpu_aia_imsic_init(struct kvm_vcpu *vcpu)
} }
imsic->swfile = page_to_virt(swfile_page); imsic->swfile = page_to_virt(swfile_page);
imsic->swfile_pa = page_to_phys(swfile_page); imsic->swfile_pa = page_to_phys(swfile_page);
spin_lock_init(&imsic->swfile_extirq_lock); raw_spin_lock_init(&imsic->swfile_extirq_lock);
/* Setup IO device */ /* Setup IO device */
kvm_iodevice_init(&imsic->iodev, &imsic_iodoev_ops); kvm_iodevice_init(&imsic->iodev, &imsic_iodoev_ops);

View File

@ -37,6 +37,7 @@
#include <asm/apic.h> #include <asm/apic.h>
#include <asm/apicdef.h> #include <asm/apicdef.h>
#include <asm/hypervisor.h> #include <asm/hypervisor.h>
#include <asm/mtrr.h>
#include <asm/tlb.h> #include <asm/tlb.h>
#include <asm/cpuidle_haltpoll.h> #include <asm/cpuidle_haltpoll.h>
#include <asm/ptrace.h> #include <asm/ptrace.h>
@ -980,6 +981,9 @@ static void __init kvm_init_platform(void)
} }
kvmclock_init(); kvmclock_init();
x86_platform.apic_post_init = kvm_apic_init; x86_platform.apic_post_init = kvm_apic_init;
/* Set WB as the default cache mode for SEV-SNP and TDX */
mtrr_overwrite_state(NULL, 0, MTRR_TYPE_WRBACK);
} }
#if defined(CONFIG_AMD_MEM_ENCRYPT) #if defined(CONFIG_AMD_MEM_ENCRYPT)

View File

@ -1556,6 +1556,17 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
{ {
bool flush = false; bool flush = false;
/*
* To prevent races with vCPUs faulting in a gfn using stale data,
* zapping a gfn range must be protected by mmu_invalidate_in_progress
* (and mmu_invalidate_seq). The only exception is memslot deletion;
* in that case, SRCU synchronization ensures that SPTEs are zapped
* after all vCPUs have unlocked SRCU, guaranteeing that vCPUs see the
* invalid slot.
*/
lockdep_assert_once(kvm->mmu_invalidate_in_progress ||
lockdep_is_held(&kvm->slots_lock));
if (kvm_memslots_have_rmaps(kvm)) if (kvm_memslots_have_rmaps(kvm))
flush = __kvm_rmap_zap_gfn_range(kvm, range->slot, flush = __kvm_rmap_zap_gfn_range(kvm, range->slot,
range->start, range->end, range->start, range->end,
@ -1884,14 +1895,10 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp)
if (is_obsolete_sp((_kvm), (_sp))) { \ if (is_obsolete_sp((_kvm), (_sp))) { \
} else } else
#define for_each_gfn_valid_sp(_kvm, _sp, _gfn) \ #define for_each_gfn_valid_sp_with_gptes(_kvm, _sp, _gfn) \
for_each_valid_sp(_kvm, _sp, \ for_each_valid_sp(_kvm, _sp, \
&(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)]) \ &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)]) \
if ((_sp)->gfn != (_gfn)) {} else if ((_sp)->gfn != (_gfn) || !sp_has_gptes(_sp)) {} else
#define for_each_gfn_valid_sp_with_gptes(_kvm, _sp, _gfn) \
for_each_gfn_valid_sp(_kvm, _sp, _gfn) \
if (!sp_has_gptes(_sp)) {} else
static bool kvm_sync_page_check(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) static bool kvm_sync_page_check(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
{ {
@ -7063,15 +7070,15 @@ static void kvm_mmu_zap_memslot_pages_and_flush(struct kvm *kvm,
/* /*
* Since accounting information is stored in struct kvm_arch_memory_slot, * Since accounting information is stored in struct kvm_arch_memory_slot,
* shadow pages deletion (e.g. unaccount_shadowed()) requires that all * all MMU pages that are shadowing guest PTEs must be zapped before the
* gfns with a shadow page have a corresponding memslot. Do so before * memslot is deleted, as freeing such pages after the memslot is freed
* the memslot goes away. * will result in use-after-free, e.g. in unaccount_shadowed().
*/ */
for (i = 0; i < slot->npages; i++) { for (i = 0; i < slot->npages; i++) {
struct kvm_mmu_page *sp; struct kvm_mmu_page *sp;
gfn_t gfn = slot->base_gfn + i; gfn_t gfn = slot->base_gfn + i;
for_each_gfn_valid_sp(kvm, sp, gfn) for_each_gfn_valid_sp_with_gptes(kvm, sp, gfn)
kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list); kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) { if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {

View File

@ -63,8 +63,12 @@ static u64 nested_svm_get_tdp_pdptr(struct kvm_vcpu *vcpu, int index)
u64 pdpte; u64 pdpte;
int ret; int ret;
/*
* Note, nCR3 is "assumed" to be 32-byte aligned, i.e. the CPU ignores
* nCR3[4:0] when loading PDPTEs from memory.
*/
ret = kvm_vcpu_read_guest_page(vcpu, gpa_to_gfn(cr3), &pdpte, ret = kvm_vcpu_read_guest_page(vcpu, gpa_to_gfn(cr3), &pdpte,
offset_in_page(cr3) + index * 8, 8); (cr3 & GENMASK(11, 5)) + index * 8, 8);
if (ret) if (ret)
return 0; return 0;
return pdpte; return pdpte;

View File

@ -4888,9 +4888,6 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vmx->hv_deadline_tsc = -1; vmx->hv_deadline_tsc = -1;
kvm_set_cr8(vcpu, 0); kvm_set_cr8(vcpu, 0);
vmx_segment_cache_clear(vmx);
kvm_register_mark_available(vcpu, VCPU_EXREG_SEGMENTS);
seg_setup(VCPU_SREG_CS); seg_setup(VCPU_SREG_CS);
vmcs_write16(GUEST_CS_SELECTOR, 0xf000); vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
vmcs_writel(GUEST_CS_BASE, 0xffff0000ul); vmcs_writel(GUEST_CS_BASE, 0xffff0000ul);
@ -4917,6 +4914,9 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vmcs_writel(GUEST_IDTR_BASE, 0); vmcs_writel(GUEST_IDTR_BASE, 0);
vmcs_write32(GUEST_IDTR_LIMIT, 0xffff); vmcs_write32(GUEST_IDTR_LIMIT, 0xffff);
vmx_segment_cache_clear(vmx);
kvm_register_mark_available(vcpu, VCPU_EXREG_SEGMENTS);
vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE); vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0); vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0);
vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, 0); vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, 0);

View File

@ -1313,8 +1313,6 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu); struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn); struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map); int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map);
void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty); void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty);
unsigned long kvm_vcpu_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn); unsigned long kvm_vcpu_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn);

View File

@ -244,6 +244,7 @@ CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 \
-fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \ -fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \
-I$(LINUX_TOOL_ARCH_INCLUDE) -I$(LINUX_HDR_PATH) -Iinclude \ -I$(LINUX_TOOL_ARCH_INCLUDE) -I$(LINUX_HDR_PATH) -Iinclude \
-I$(<D) -Iinclude/$(ARCH_DIR) -I ../rseq -I.. $(EXTRA_CFLAGS) \ -I$(<D) -Iinclude/$(ARCH_DIR) -I ../rseq -I.. $(EXTRA_CFLAGS) \
-march=x86-64-v2 \
$(KHDR_INCLUDES) $(KHDR_INCLUDES)
ifeq ($(ARCH),s390) ifeq ($(ARCH),s390)
CFLAGS += -march=z10 CFLAGS += -march=z10

View File

@ -68,6 +68,8 @@ struct test_feature_reg {
} }
static const struct reg_ftr_bits ftr_id_aa64dfr0_el1[] = { static const struct reg_ftr_bits ftr_id_aa64dfr0_el1[] = {
S_REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64DFR0_EL1, DoubleLock, 0),
REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64DFR0_EL1, WRPs, 0),
S_REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64DFR0_EL1, PMUVer, 0), S_REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64DFR0_EL1, PMUVer, 0),
REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64DFR0_EL1, DebugVer, ID_AA64DFR0_EL1_DebugVer_IMP), REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64DFR0_EL1, DebugVer, ID_AA64DFR0_EL1_DebugVer_IMP),
REG_FTR_END, REG_FTR_END,
@ -134,6 +136,13 @@ static const struct reg_ftr_bits ftr_id_aa64pfr0_el1[] = {
REG_FTR_END, REG_FTR_END,
}; };
static const struct reg_ftr_bits ftr_id_aa64pfr1_el1[] = {
REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64PFR1_EL1, CSV2_frac, 0),
REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64PFR1_EL1, SSBS, ID_AA64PFR1_EL1_SSBS_NI),
REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64PFR1_EL1, BT, 0),
REG_FTR_END,
};
static const struct reg_ftr_bits ftr_id_aa64mmfr0_el1[] = { static const struct reg_ftr_bits ftr_id_aa64mmfr0_el1[] = {
REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64MMFR0_EL1, ECV, 0), REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64MMFR0_EL1, ECV, 0),
REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64MMFR0_EL1, EXS, 0), REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64MMFR0_EL1, EXS, 0),
@ -200,6 +209,7 @@ static struct test_feature_reg test_regs[] = {
TEST_REG(SYS_ID_AA64ISAR1_EL1, ftr_id_aa64isar1_el1), TEST_REG(SYS_ID_AA64ISAR1_EL1, ftr_id_aa64isar1_el1),
TEST_REG(SYS_ID_AA64ISAR2_EL1, ftr_id_aa64isar2_el1), TEST_REG(SYS_ID_AA64ISAR2_EL1, ftr_id_aa64isar2_el1),
TEST_REG(SYS_ID_AA64PFR0_EL1, ftr_id_aa64pfr0_el1), TEST_REG(SYS_ID_AA64PFR0_EL1, ftr_id_aa64pfr0_el1),
TEST_REG(SYS_ID_AA64PFR1_EL1, ftr_id_aa64pfr1_el1),
TEST_REG(SYS_ID_AA64MMFR0_EL1, ftr_id_aa64mmfr0_el1), TEST_REG(SYS_ID_AA64MMFR0_EL1, ftr_id_aa64mmfr0_el1),
TEST_REG(SYS_ID_AA64MMFR1_EL1, ftr_id_aa64mmfr1_el1), TEST_REG(SYS_ID_AA64MMFR1_EL1, ftr_id_aa64mmfr1_el1),
TEST_REG(SYS_ID_AA64MMFR2_EL1, ftr_id_aa64mmfr2_el1), TEST_REG(SYS_ID_AA64MMFR2_EL1, ftr_id_aa64mmfr2_el1),
@ -569,9 +579,9 @@ int main(void)
test_cnt = ARRAY_SIZE(ftr_id_aa64dfr0_el1) + ARRAY_SIZE(ftr_id_dfr0_el1) + test_cnt = ARRAY_SIZE(ftr_id_aa64dfr0_el1) + ARRAY_SIZE(ftr_id_dfr0_el1) +
ARRAY_SIZE(ftr_id_aa64isar0_el1) + ARRAY_SIZE(ftr_id_aa64isar1_el1) + ARRAY_SIZE(ftr_id_aa64isar0_el1) + ARRAY_SIZE(ftr_id_aa64isar1_el1) +
ARRAY_SIZE(ftr_id_aa64isar2_el1) + ARRAY_SIZE(ftr_id_aa64pfr0_el1) + ARRAY_SIZE(ftr_id_aa64isar2_el1) + ARRAY_SIZE(ftr_id_aa64pfr0_el1) +
ARRAY_SIZE(ftr_id_aa64mmfr0_el1) + ARRAY_SIZE(ftr_id_aa64mmfr1_el1) + ARRAY_SIZE(ftr_id_aa64pfr1_el1) + ARRAY_SIZE(ftr_id_aa64mmfr0_el1) +
ARRAY_SIZE(ftr_id_aa64mmfr2_el1) + ARRAY_SIZE(ftr_id_aa64zfr0_el1) - ARRAY_SIZE(ftr_id_aa64mmfr1_el1) + ARRAY_SIZE(ftr_id_aa64mmfr2_el1) +
ARRAY_SIZE(test_regs) + 2; ARRAY_SIZE(ftr_id_aa64zfr0_el1) - ARRAY_SIZE(test_regs) + 2;
ksft_set_plan(test_cnt); ksft_set_plan(test_cnt);

View File

@ -60,7 +60,7 @@ static bool is_cpuid_mangled(const struct kvm_cpuid_entry2 *entrie)
{ {
int i; int i;
for (i = 0; i < sizeof(mangled_cpuids); i++) { for (i = 0; i < ARRAY_SIZE(mangled_cpuids); i++) {
if (mangled_cpuids[i].function == entrie->function && if (mangled_cpuids[i].function == entrie->function &&
mangled_cpuids[i].index == entrie->index) mangled_cpuids[i].index == entrie->index)
return true; return true;

View File

@ -3035,24 +3035,12 @@ kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gf
} }
EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic); EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn)
{
return gfn_to_pfn_memslot_atomic(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
}
EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn_atomic);
kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
{ {
return gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn); return gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn);
} }
EXPORT_SYMBOL_GPL(gfn_to_pfn); EXPORT_SYMBOL_GPL(gfn_to_pfn);
kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
{
return gfn_to_pfn_memslot(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
}
EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn);
int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn, int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
struct page **pages, int nr_pages) struct page **pages, int nr_pages)
{ {