[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
/*
|
|
|
|
* Kernel-based Virtual Machine driver for Linux
|
|
|
|
*
|
|
|
|
* This module enables machines with Intel VT-x extensions to run virtual
|
|
|
|
* machines without emulation or binary translation.
|
|
|
|
*
|
|
|
|
* MMU support
|
|
|
|
*
|
|
|
|
* Copyright (C) 2006 Qumranet, Inc.
|
|
|
|
*
|
|
|
|
* Authors:
|
|
|
|
* Yaniv Kamay <yaniv@qumranet.com>
|
|
|
|
* Avi Kivity <avi@qumranet.com>
|
|
|
|
*
|
|
|
|
* This work is licensed under the terms of the GNU GPL, version 2. See
|
|
|
|
* the COPYING file in the top-level directory.
|
|
|
|
*
|
|
|
|
*/
|
2007-06-28 14:15:57 -04:00
|
|
|
|
2007-12-14 09:35:10 +08:00
|
|
|
#include "mmu.h"
|
2009-05-31 22:58:47 +03:00
|
|
|
#include "kvm_cache_regs.h"
|
2007-06-28 14:15:57 -04:00
|
|
|
|
2007-12-16 11:02:48 +02:00
|
|
|
#include <linux/kvm_host.h>
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
#include <linux/types.h>
|
|
|
|
#include <linux/string.h>
|
|
|
|
#include <linux/mm.h>
|
|
|
|
#include <linux/highmem.h>
|
|
|
|
#include <linux/module.h>
|
2007-11-26 14:08:14 +02:00
|
|
|
#include <linux/swap.h>
|
2008-02-23 11:44:30 -03:00
|
|
|
#include <linux/hugetlb.h>
|
2008-02-22 12:21:37 -05:00
|
|
|
#include <linux/compiler.h>
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2007-06-28 14:15:57 -04:00
|
|
|
#include <asm/page.h>
|
|
|
|
#include <asm/cmpxchg.h>
|
2007-11-21 14:08:40 +02:00
|
|
|
#include <asm/io.h>
|
2008-11-17 19:03:13 -02:00
|
|
|
#include <asm/vmx.h>
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2008-02-07 13:47:41 +01:00
|
|
|
/*
|
|
|
|
* When setting this variable to true it enables Two-Dimensional-Paging
|
|
|
|
* where the hardware walks 2 page tables:
|
|
|
|
* 1. the guest-virtual to guest-physical
|
|
|
|
* 2. while doing 1. it walks guest-physical to host-physical
|
|
|
|
* If the hardware supports that we don't need to do shadow paging.
|
|
|
|
*/
|
2008-02-22 12:21:37 -05:00
|
|
|
bool tdp_enabled = false;
|
2008-02-07 13:47:41 +01:00
|
|
|
|
2007-01-05 16:36:56 -08:00
|
|
|
#undef MMU_DEBUG
|
|
|
|
|
|
|
|
#undef AUDIT
|
|
|
|
|
|
|
|
#ifdef AUDIT
|
|
|
|
static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg);
|
|
|
|
#else
|
|
|
|
static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg) {}
|
|
|
|
#endif
|
|
|
|
|
|
|
|
#ifdef MMU_DEBUG
|
|
|
|
|
|
|
|
#define pgprintk(x...) do { if (dbg) printk(x); } while (0)
|
|
|
|
#define rmap_printk(x...) do { if (dbg) printk(x); } while (0)
|
|
|
|
|
|
|
|
#else
|
|
|
|
|
|
|
|
#define pgprintk(x...) do { } while (0)
|
|
|
|
#define rmap_printk(x...) do { } while (0)
|
|
|
|
|
|
|
|
#endif
|
|
|
|
|
|
|
|
#if defined(MMU_DEBUG) || defined(AUDIT)
|
2008-06-22 16:45:24 +03:00
|
|
|
static int dbg = 0;
|
|
|
|
module_param(dbg, bool, 0644);
|
2007-01-05 16:36:56 -08:00
|
|
|
#endif
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2008-09-23 13:18:41 -03:00
|
|
|
static int oos_shadow = 1;
|
|
|
|
module_param(oos_shadow, bool, 0644);
|
|
|
|
|
2007-04-25 14:17:25 +08:00
|
|
|
#ifndef MMU_DEBUG
|
|
|
|
#define ASSERT(x) do { } while (0)
|
|
|
|
#else
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
#define ASSERT(x) \
|
|
|
|
if (!(x)) { \
|
|
|
|
printk(KERN_WARNING "assertion failed %s:%d: %s\n", \
|
|
|
|
__FILE__, __LINE__, #x); \
|
|
|
|
}
|
2007-04-25 14:17:25 +08:00
|
|
|
#endif
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
|
|
|
#define PT_FIRST_AVAIL_BITS_SHIFT 9
|
|
|
|
#define PT64_SECOND_AVAIL_BITS_SHIFT 52
|
|
|
|
|
|
|
|
#define VALID_PAGE(x) ((x) != INVALID_PAGE)
|
|
|
|
|
|
|
|
#define PT64_LEVEL_BITS 9
|
|
|
|
|
|
|
|
#define PT64_LEVEL_SHIFT(level) \
|
2007-10-08 09:02:08 -04:00
|
|
|
(PAGE_SHIFT + (level - 1) * PT64_LEVEL_BITS)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
|
|
|
#define PT64_LEVEL_MASK(level) \
|
|
|
|
(((1ULL << PT64_LEVEL_BITS) - 1) << PT64_LEVEL_SHIFT(level))
|
|
|
|
|
|
|
|
#define PT64_INDEX(address, level)\
|
|
|
|
(((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1))
|
|
|
|
|
|
|
|
|
|
|
|
#define PT32_LEVEL_BITS 10
|
|
|
|
|
|
|
|
#define PT32_LEVEL_SHIFT(level) \
|
2007-10-08 09:02:08 -04:00
|
|
|
(PAGE_SHIFT + (level - 1) * PT32_LEVEL_BITS)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
|
|
|
#define PT32_LEVEL_MASK(level) \
|
|
|
|
(((1ULL << PT32_LEVEL_BITS) - 1) << PT32_LEVEL_SHIFT(level))
|
2009-07-27 16:30:45 +02:00
|
|
|
#define PT32_LVL_OFFSET_MASK(level) \
|
|
|
|
(PT32_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
|
|
|
|
* PT32_LEVEL_BITS))) - 1))
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
|
|
|
#define PT32_INDEX(address, level)\
|
|
|
|
(((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1))
|
|
|
|
|
|
|
|
|
2007-03-09 13:04:31 +02:00
|
|
|
#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
#define PT64_DIR_BASE_ADDR_MASK \
|
|
|
|
(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))
|
2009-07-27 16:30:45 +02:00
|
|
|
#define PT64_LVL_ADDR_MASK(level) \
|
|
|
|
(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
|
|
|
|
* PT64_LEVEL_BITS))) - 1))
|
|
|
|
#define PT64_LVL_OFFSET_MASK(level) \
|
|
|
|
(PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
|
|
|
|
* PT64_LEVEL_BITS))) - 1))
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
|
|
|
#define PT32_BASE_ADDR_MASK PAGE_MASK
|
|
|
|
#define PT32_DIR_BASE_ADDR_MASK \
|
|
|
|
(PAGE_MASK & ~((1ULL << (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1))
|
2009-07-27 16:30:45 +02:00
|
|
|
#define PT32_LVL_ADDR_MASK(level) \
|
|
|
|
(PAGE_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
|
|
|
|
* PT32_LEVEL_BITS))) - 1))
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2007-11-21 02:06:21 +02:00
|
|
|
#define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | PT_USER_MASK \
|
|
|
|
| PT64_NX_MASK)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
|
|
|
#define PFERR_PRESENT_MASK (1U << 0)
|
|
|
|
#define PFERR_WRITE_MASK (1U << 1)
|
|
|
|
#define PFERR_USER_MASK (1U << 2)
|
2009-03-30 16:21:08 +08:00
|
|
|
#define PFERR_RSVD_MASK (1U << 3)
|
2007-01-26 00:56:41 -08:00
|
|
|
#define PFERR_FETCH_MASK (1U << 4)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2009-07-27 16:30:45 +02:00
|
|
|
#define PT_PDPE_LEVEL 3
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
#define PT_DIRECTORY_LEVEL 2
|
|
|
|
#define PT_PAGE_TABLE_LEVEL 1
|
|
|
|
|
2007-01-05 16:36:38 -08:00
|
|
|
#define RMAP_EXT 4
|
|
|
|
|
2007-12-09 16:15:46 +02:00
|
|
|
#define ACC_EXEC_MASK 1
|
|
|
|
#define ACC_WRITE_MASK PT_WRITABLE_MASK
|
|
|
|
#define ACC_USER_MASK PT_USER_MASK
|
|
|
|
#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
|
|
|
|
|
2009-07-06 12:21:32 +03:00
|
|
|
#define CREATE_TRACE_POINTS
|
|
|
|
#include "mmutrace.h"
|
|
|
|
|
2009-09-23 21:47:17 +03:00
|
|
|
#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
|
|
|
|
|
2008-08-21 17:49:56 +03:00
|
|
|
#define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
|
|
|
|
|
2007-01-05 16:36:38 -08:00
|
|
|
struct kvm_rmap_desc {
|
2009-06-10 14:24:23 +03:00
|
|
|
u64 *sptes[RMAP_EXT];
|
2007-01-05 16:36:38 -08:00
|
|
|
struct kvm_rmap_desc *more;
|
|
|
|
};
|
|
|
|
|
2008-12-25 14:39:47 +02:00
|
|
|
struct kvm_shadow_walk_iterator {
|
|
|
|
u64 addr;
|
|
|
|
hpa_t shadow_addr;
|
|
|
|
int level;
|
|
|
|
u64 *sptep;
|
|
|
|
unsigned index;
|
|
|
|
};
|
|
|
|
|
|
|
|
#define for_each_shadow_entry(_vcpu, _addr, _walker) \
|
|
|
|
for (shadow_walk_init(&(_walker), _vcpu, _addr); \
|
|
|
|
shadow_walk_okay(&(_walker)); \
|
|
|
|
shadow_walk_next(&(_walker)))
|
|
|
|
|
|
|
|
|
2008-09-23 13:18:39 -03:00
|
|
|
struct kvm_unsync_walk {
|
|
|
|
int (*entry) (struct kvm_mmu_page *sp, struct kvm_unsync_walk *walk);
|
|
|
|
};
|
|
|
|
|
2008-09-23 13:18:36 -03:00
|
|
|
typedef int (*mmu_parent_walk_fn) (struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp);
|
|
|
|
|
2007-04-15 16:31:09 +03:00
|
|
|
static struct kmem_cache *pte_chain_cache;
|
|
|
|
static struct kmem_cache *rmap_desc_cache;
|
2007-05-30 12:34:53 +03:00
|
|
|
static struct kmem_cache *mmu_page_header_cache;
|
2007-04-15 16:31:09 +03:00
|
|
|
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-16 18:58:32 +02:00
|
|
|
static u64 __read_mostly shadow_trap_nonpresent_pte;
|
|
|
|
static u64 __read_mostly shadow_notrap_nonpresent_pte;
|
2008-04-25 21:13:50 +08:00
|
|
|
static u64 __read_mostly shadow_base_present_pte;
|
|
|
|
static u64 __read_mostly shadow_nx_mask;
|
|
|
|
static u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
|
|
|
|
static u64 __read_mostly shadow_user_mask;
|
|
|
|
static u64 __read_mostly shadow_accessed_mask;
|
|
|
|
static u64 __read_mostly shadow_dirty_mask;
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-16 18:58:32 +02:00
|
|
|
|
2009-03-30 16:21:08 +08:00
|
|
|
static inline u64 rsvd_bits(int s, int e)
|
|
|
|
{
|
|
|
|
return ((1ULL << (e - s + 1)) - 1) << s;
|
|
|
|
}
|
|
|
|
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-16 18:58:32 +02:00
|
|
|
void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte)
|
|
|
|
{
|
|
|
|
shadow_trap_nonpresent_pte = trap_pte;
|
|
|
|
shadow_notrap_nonpresent_pte = notrap_pte;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_mmu_set_nonpresent_ptes);
|
|
|
|
|
2008-04-25 21:13:50 +08:00
|
|
|
void kvm_mmu_set_base_ptes(u64 base_pte)
|
|
|
|
{
|
|
|
|
shadow_base_present_pte = base_pte;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_mmu_set_base_ptes);
|
|
|
|
|
|
|
|
void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
|
2009-04-27 20:35:42 +08:00
|
|
|
u64 dirty_mask, u64 nx_mask, u64 x_mask)
|
2008-04-25 21:13:50 +08:00
|
|
|
{
|
|
|
|
shadow_user_mask = user_mask;
|
|
|
|
shadow_accessed_mask = accessed_mask;
|
|
|
|
shadow_dirty_mask = dirty_mask;
|
|
|
|
shadow_nx_mask = nx_mask;
|
|
|
|
shadow_x_mask = x_mask;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
|
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
static int is_write_protection(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2007-12-13 23:50:52 +08:00
|
|
|
return vcpu->arch.cr0 & X86_CR0_WP;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int is_cpuid_PSE36(void)
|
|
|
|
{
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2007-01-26 00:56:41 -08:00
|
|
|
static int is_nx(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2007-12-13 23:50:52 +08:00
|
|
|
return vcpu->arch.shadow_efer & EFER_NX;
|
2007-01-26 00:56:41 -08:00
|
|
|
}
|
|
|
|
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-16 18:58:32 +02:00
|
|
|
static int is_shadow_present_pte(u64 pte)
|
|
|
|
{
|
|
|
|
return pte != shadow_trap_nonpresent_pte
|
|
|
|
&& pte != shadow_notrap_nonpresent_pte;
|
|
|
|
}
|
|
|
|
|
2008-02-23 11:44:30 -03:00
|
|
|
static int is_large_pte(u64 pte)
|
|
|
|
{
|
|
|
|
return pte & PT_PAGE_SIZE_MASK;
|
|
|
|
}
|
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
static int is_writeble_pte(unsigned long pte)
|
|
|
|
{
|
|
|
|
return pte & PT_WRITABLE_MASK;
|
|
|
|
}
|
|
|
|
|
2009-06-10 14:12:05 +03:00
|
|
|
static int is_dirty_gpte(unsigned long pte)
|
2007-10-11 12:32:30 +02:00
|
|
|
{
|
2009-06-10 12:56:54 +03:00
|
|
|
return pte & PT_DIRTY_MASK;
|
2007-10-11 12:32:30 +02:00
|
|
|
}
|
|
|
|
|
2009-06-10 14:12:05 +03:00
|
|
|
static int is_rmap_spte(u64 pte)
|
2007-01-05 16:36:38 -08:00
|
|
|
{
|
2008-03-23 12:18:19 +02:00
|
|
|
return is_shadow_present_pte(pte);
|
2007-01-05 16:36:38 -08:00
|
|
|
}
|
|
|
|
|
2009-06-10 12:27:03 -03:00
|
|
|
static int is_last_spte(u64 pte, int level)
|
|
|
|
{
|
|
|
|
if (level == PT_PAGE_TABLE_LEVEL)
|
|
|
|
return 1;
|
2009-07-27 16:30:44 +02:00
|
|
|
if (is_large_pte(pte))
|
2009-06-10 12:27:03 -03:00
|
|
|
return 1;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2008-04-02 14:46:56 -05:00
|
|
|
static pfn_t spte_to_pfn(u64 pte)
|
2008-03-23 15:06:23 +02:00
|
|
|
{
|
2008-04-02 14:46:56 -05:00
|
|
|
return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
|
2008-03-23 15:06:23 +02:00
|
|
|
}
|
|
|
|
|
2007-11-21 13:54:47 +02:00
|
|
|
static gfn_t pse36_gfn_delta(u32 gpte)
|
|
|
|
{
|
|
|
|
int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;
|
|
|
|
|
|
|
|
return (gpte & PT32_DIR_PSE36_MASK) << shift;
|
|
|
|
}
|
|
|
|
|
2009-06-10 14:24:23 +03:00
|
|
|
static void __set_spte(u64 *sptep, u64 spte)
|
2007-05-31 15:46:04 +03:00
|
|
|
{
|
|
|
|
#ifdef CONFIG_X86_64
|
|
|
|
set_64bit((unsigned long *)sptep, spte);
|
|
|
|
#else
|
|
|
|
set_64bit((unsigned long long *)sptep, spte);
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2007-01-05 16:36:54 -08:00
|
|
|
static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
|
2007-09-10 11:28:17 +03:00
|
|
|
struct kmem_cache *base_cache, int min)
|
2007-01-05 16:36:53 -08:00
|
|
|
{
|
|
|
|
void *obj;
|
|
|
|
|
|
|
|
if (cache->nobjs >= min)
|
2007-01-05 16:36:54 -08:00
|
|
|
return 0;
|
2007-01-05 16:36:53 -08:00
|
|
|
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
|
2007-09-10 11:28:17 +03:00
|
|
|
obj = kmem_cache_zalloc(base_cache, GFP_KERNEL);
|
2007-01-05 16:36:53 -08:00
|
|
|
if (!obj)
|
2007-01-05 16:36:54 -08:00
|
|
|
return -ENOMEM;
|
2007-01-05 16:36:53 -08:00
|
|
|
cache->objects[cache->nobjs++] = obj;
|
|
|
|
}
|
2007-01-05 16:36:54 -08:00
|
|
|
return 0;
|
2007-01-05 16:36:53 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
|
|
|
|
{
|
|
|
|
while (mc->nobjs)
|
|
|
|
kfree(mc->objects[--mc->nobjs]);
|
|
|
|
}
|
|
|
|
|
2007-07-20 08:18:27 +03:00
|
|
|
static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
|
2007-09-10 11:28:17 +03:00
|
|
|
int min)
|
2007-07-20 08:18:27 +03:00
|
|
|
{
|
|
|
|
struct page *page;
|
|
|
|
|
|
|
|
if (cache->nobjs >= min)
|
|
|
|
return 0;
|
|
|
|
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
|
2007-09-10 11:28:17 +03:00
|
|
|
page = alloc_page(GFP_KERNEL);
|
2007-07-20 08:18:27 +03:00
|
|
|
if (!page)
|
|
|
|
return -ENOMEM;
|
|
|
|
set_page_private(page, 0);
|
|
|
|
cache->objects[cache->nobjs++] = page_address(page);
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void mmu_free_memory_cache_page(struct kvm_mmu_memory_cache *mc)
|
|
|
|
{
|
|
|
|
while (mc->nobjs)
|
2007-07-21 09:06:46 +03:00
|
|
|
free_page((unsigned long)mc->objects[--mc->nobjs]);
|
2007-07-20 08:18:27 +03:00
|
|
|
}
|
|
|
|
|
2007-09-10 11:28:17 +03:00
|
|
|
static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
|
2007-01-05 16:36:53 -08:00
|
|
|
{
|
2007-01-05 16:36:54 -08:00
|
|
|
int r;
|
|
|
|
|
2007-12-13 23:50:52 +08:00
|
|
|
r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_chain_cache,
|
2007-09-10 11:28:17 +03:00
|
|
|
pte_chain_cache, 4);
|
2007-01-05 16:36:54 -08:00
|
|
|
if (r)
|
|
|
|
goto out;
|
2007-12-13 23:50:52 +08:00
|
|
|
r = mmu_topup_memory_cache(&vcpu->arch.mmu_rmap_desc_cache,
|
2008-10-28 18:16:58 -02:00
|
|
|
rmap_desc_cache, 4);
|
2007-05-30 12:34:53 +03:00
|
|
|
if (r)
|
|
|
|
goto out;
|
2007-12-13 23:50:52 +08:00
|
|
|
r = mmu_topup_memory_cache_page(&vcpu->arch.mmu_page_cache, 8);
|
2007-05-30 12:34:53 +03:00
|
|
|
if (r)
|
|
|
|
goto out;
|
2007-12-13 23:50:52 +08:00
|
|
|
r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
|
2007-09-10 11:28:17 +03:00
|
|
|
mmu_page_header_cache, 4);
|
2007-01-05 16:36:54 -08:00
|
|
|
out:
|
|
|
|
return r;
|
2007-01-05 16:36:53 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2007-12-13 23:50:52 +08:00
|
|
|
mmu_free_memory_cache(&vcpu->arch.mmu_pte_chain_cache);
|
|
|
|
mmu_free_memory_cache(&vcpu->arch.mmu_rmap_desc_cache);
|
|
|
|
mmu_free_memory_cache_page(&vcpu->arch.mmu_page_cache);
|
|
|
|
mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
|
2007-01-05 16:36:53 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc,
|
|
|
|
size_t size)
|
|
|
|
{
|
|
|
|
void *p;
|
|
|
|
|
|
|
|
BUG_ON(!mc->nobjs);
|
|
|
|
p = mc->objects[--mc->nobjs];
|
|
|
|
return p;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct kvm_pte_chain *mmu_alloc_pte_chain(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2007-12-13 23:50:52 +08:00
|
|
|
return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_chain_cache,
|
2007-01-05 16:36:53 -08:00
|
|
|
sizeof(struct kvm_pte_chain));
|
|
|
|
}
|
|
|
|
|
2007-07-17 13:04:56 +03:00
|
|
|
static void mmu_free_pte_chain(struct kvm_pte_chain *pc)
|
2007-01-05 16:36:53 -08:00
|
|
|
{
|
2007-07-17 13:04:56 +03:00
|
|
|
kfree(pc);
|
2007-01-05 16:36:53 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
static struct kvm_rmap_desc *mmu_alloc_rmap_desc(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2007-12-13 23:50:52 +08:00
|
|
|
return mmu_memory_cache_alloc(&vcpu->arch.mmu_rmap_desc_cache,
|
2007-01-05 16:36:53 -08:00
|
|
|
sizeof(struct kvm_rmap_desc));
|
|
|
|
}
|
|
|
|
|
2007-07-17 13:04:56 +03:00
|
|
|
static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd)
|
2007-01-05 16:36:53 -08:00
|
|
|
{
|
2007-07-17 13:04:56 +03:00
|
|
|
kfree(rd);
|
2007-01-05 16:36:53 -08:00
|
|
|
}
|
|
|
|
|
2008-02-23 11:44:30 -03:00
|
|
|
/*
|
|
|
|
* Return the pointer to the largepage write count for a given
|
|
|
|
* gfn, handling slots that are not large page aligned.
|
|
|
|
*/
|
2009-07-27 16:30:43 +02:00
|
|
|
static int *slot_largepage_idx(gfn_t gfn,
|
|
|
|
struct kvm_memory_slot *slot,
|
|
|
|
int level)
|
2008-02-23 11:44:30 -03:00
|
|
|
{
|
|
|
|
unsigned long idx;
|
|
|
|
|
2009-07-27 16:30:43 +02:00
|
|
|
idx = (gfn / KVM_PAGES_PER_HPAGE(level)) -
|
|
|
|
(slot->base_gfn / KVM_PAGES_PER_HPAGE(level));
|
|
|
|
return &slot->lpage_info[level - 2][idx].write_count;
|
2008-02-23 11:44:30 -03:00
|
|
|
}
|
|
|
|
|
|
|
|
static void account_shadowed(struct kvm *kvm, gfn_t gfn)
|
|
|
|
{
|
2009-07-27 16:30:43 +02:00
|
|
|
struct kvm_memory_slot *slot;
|
2008-02-23 11:44:30 -03:00
|
|
|
int *write_count;
|
2009-07-27 16:30:43 +02:00
|
|
|
int i;
|
2008-02-23 11:44:30 -03:00
|
|
|
|
2008-10-03 17:40:32 +03:00
|
|
|
gfn = unalias_gfn(kvm, gfn);
|
2009-07-27 16:30:43 +02:00
|
|
|
|
|
|
|
slot = gfn_to_memslot_unaliased(kvm, gfn);
|
|
|
|
for (i = PT_DIRECTORY_LEVEL;
|
|
|
|
i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
|
|
|
|
write_count = slot_largepage_idx(gfn, slot, i);
|
|
|
|
*write_count += 1;
|
|
|
|
}
|
2008-02-23 11:44:30 -03:00
|
|
|
}
|
|
|
|
|
|
|
|
static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
|
|
|
|
{
|
2009-07-27 16:30:43 +02:00
|
|
|
struct kvm_memory_slot *slot;
|
2008-02-23 11:44:30 -03:00
|
|
|
int *write_count;
|
2009-07-27 16:30:43 +02:00
|
|
|
int i;
|
2008-02-23 11:44:30 -03:00
|
|
|
|
2008-10-03 17:40:32 +03:00
|
|
|
gfn = unalias_gfn(kvm, gfn);
|
2009-07-27 16:30:43 +02:00
|
|
|
for (i = PT_DIRECTORY_LEVEL;
|
|
|
|
i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
|
|
|
|
slot = gfn_to_memslot_unaliased(kvm, gfn);
|
|
|
|
write_count = slot_largepage_idx(gfn, slot, i);
|
|
|
|
*write_count -= 1;
|
|
|
|
WARN_ON(*write_count < 0);
|
|
|
|
}
|
2008-02-23 11:44:30 -03:00
|
|
|
}
|
|
|
|
|
2009-07-27 16:30:43 +02:00
|
|
|
static int has_wrprotected_page(struct kvm *kvm,
|
|
|
|
gfn_t gfn,
|
|
|
|
int level)
|
2008-02-23 11:44:30 -03:00
|
|
|
{
|
2008-10-03 17:40:32 +03:00
|
|
|
struct kvm_memory_slot *slot;
|
2008-02-23 11:44:30 -03:00
|
|
|
int *largepage_idx;
|
|
|
|
|
2008-10-03 17:40:32 +03:00
|
|
|
gfn = unalias_gfn(kvm, gfn);
|
|
|
|
slot = gfn_to_memslot_unaliased(kvm, gfn);
|
2008-02-23 11:44:30 -03:00
|
|
|
if (slot) {
|
2009-07-27 16:30:43 +02:00
|
|
|
largepage_idx = slot_largepage_idx(gfn, slot, level);
|
2008-02-23 11:44:30 -03:00
|
|
|
return *largepage_idx;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2009-07-27 16:30:43 +02:00
|
|
|
static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
|
2008-02-23 11:44:30 -03:00
|
|
|
{
|
2009-07-27 16:30:43 +02:00
|
|
|
unsigned long page_size = PAGE_SIZE;
|
2008-02-23 11:44:30 -03:00
|
|
|
struct vm_area_struct *vma;
|
|
|
|
unsigned long addr;
|
2009-07-27 16:30:43 +02:00
|
|
|
int i, ret = 0;
|
2008-02-23 11:44:30 -03:00
|
|
|
|
|
|
|
addr = gfn_to_hva(kvm, gfn);
|
|
|
|
if (kvm_is_error_hva(addr))
|
2009-07-27 16:30:43 +02:00
|
|
|
return page_size;
|
2008-02-23 11:44:30 -03:00
|
|
|
|
2008-09-16 20:54:47 -03:00
|
|
|
down_read(¤t->mm->mmap_sem);
|
2008-02-23 11:44:30 -03:00
|
|
|
vma = find_vma(current->mm, addr);
|
2009-07-27 16:30:43 +02:00
|
|
|
if (!vma)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
page_size = vma_kernel_pagesize(vma);
|
|
|
|
|
|
|
|
out:
|
2008-09-16 20:54:47 -03:00
|
|
|
up_read(¤t->mm->mmap_sem);
|
2008-02-23 11:44:30 -03:00
|
|
|
|
2009-07-27 16:30:43 +02:00
|
|
|
for (i = PT_PAGE_TABLE_LEVEL;
|
|
|
|
i < (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES); ++i) {
|
|
|
|
if (page_size >= KVM_HPAGE_SIZE(i))
|
|
|
|
ret = i;
|
|
|
|
else
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2008-09-16 20:54:47 -03:00
|
|
|
return ret;
|
2008-02-23 11:44:30 -03:00
|
|
|
}
|
|
|
|
|
2009-07-27 16:30:43 +02:00
|
|
|
static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
|
2008-02-23 11:44:30 -03:00
|
|
|
{
|
|
|
|
struct kvm_memory_slot *slot;
|
2009-07-27 16:30:43 +02:00
|
|
|
int host_level;
|
|
|
|
int level = PT_PAGE_TABLE_LEVEL;
|
2008-02-23 11:44:30 -03:00
|
|
|
|
|
|
|
slot = gfn_to_memslot(vcpu->kvm, large_gfn);
|
|
|
|
if (slot && slot->dirty_bitmap)
|
2009-07-27 16:30:43 +02:00
|
|
|
return PT_PAGE_TABLE_LEVEL;
|
2008-02-23 11:44:30 -03:00
|
|
|
|
2009-07-27 16:30:43 +02:00
|
|
|
host_level = host_mapping_level(vcpu->kvm, large_gfn);
|
|
|
|
|
|
|
|
if (host_level == PT_PAGE_TABLE_LEVEL)
|
|
|
|
return host_level;
|
|
|
|
|
|
|
|
for (level = PT_DIRECTORY_LEVEL; level <= host_level; ++level) {
|
|
|
|
|
|
|
|
if (has_wrprotected_page(vcpu->kvm, large_gfn, level))
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
return level - 1;
|
2008-02-23 11:44:30 -03:00
|
|
|
}
|
|
|
|
|
2007-09-27 14:11:22 +02:00
|
|
|
/*
|
|
|
|
* Take gfn and return the reverse mapping to it.
|
|
|
|
* Note: gfn must be unaliased before this function get called
|
|
|
|
*/
|
|
|
|
|
2009-07-27 16:30:42 +02:00
|
|
|
static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level)
|
2007-09-27 14:11:22 +02:00
|
|
|
{
|
|
|
|
struct kvm_memory_slot *slot;
|
2008-02-23 11:44:30 -03:00
|
|
|
unsigned long idx;
|
2007-09-27 14:11:22 +02:00
|
|
|
|
|
|
|
slot = gfn_to_memslot(kvm, gfn);
|
2009-07-27 16:30:42 +02:00
|
|
|
if (likely(level == PT_PAGE_TABLE_LEVEL))
|
2008-02-23 11:44:30 -03:00
|
|
|
return &slot->rmap[gfn - slot->base_gfn];
|
|
|
|
|
2009-07-27 16:30:42 +02:00
|
|
|
idx = (gfn / KVM_PAGES_PER_HPAGE(level)) -
|
|
|
|
(slot->base_gfn / KVM_PAGES_PER_HPAGE(level));
|
2008-02-23 11:44:30 -03:00
|
|
|
|
2009-07-27 16:30:42 +02:00
|
|
|
return &slot->lpage_info[level - 2][idx].rmap_pde;
|
2007-09-27 14:11:22 +02:00
|
|
|
}
|
|
|
|
|
2007-01-05 16:36:38 -08:00
|
|
|
/*
|
|
|
|
* Reverse mapping data structures:
|
|
|
|
*
|
2007-09-27 14:11:22 +02:00
|
|
|
* If rmapp bit zero is zero, then rmapp point to the shadw page table entry
|
|
|
|
* that points to page_address(page).
|
2007-01-05 16:36:38 -08:00
|
|
|
*
|
2007-09-27 14:11:22 +02:00
|
|
|
* If rmapp bit zero is one, (then rmap & ~1) points to a struct kvm_rmap_desc
|
|
|
|
* containing more mappings.
|
2009-08-05 15:43:58 -03:00
|
|
|
*
|
|
|
|
* Returns the number of rmap entries before the spte was added or zero if
|
|
|
|
* the spte was not added.
|
|
|
|
*
|
2007-01-05 16:36:38 -08:00
|
|
|
*/
|
2009-07-27 16:30:42 +02:00
|
|
|
static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
|
2007-01-05 16:36:38 -08:00
|
|
|
{
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp;
|
2007-01-05 16:36:38 -08:00
|
|
|
struct kvm_rmap_desc *desc;
|
2007-09-27 14:11:22 +02:00
|
|
|
unsigned long *rmapp;
|
2009-08-05 15:43:58 -03:00
|
|
|
int i, count = 0;
|
2007-01-05 16:36:38 -08:00
|
|
|
|
2009-06-10 14:12:05 +03:00
|
|
|
if (!is_rmap_spte(*spte))
|
2009-08-05 15:43:58 -03:00
|
|
|
return count;
|
2007-09-27 14:11:22 +02:00
|
|
|
gfn = unalias_gfn(vcpu->kvm, gfn);
|
2007-11-21 15:28:32 +02:00
|
|
|
sp = page_header(__pa(spte));
|
|
|
|
sp->gfns[spte - sp->spt] = gfn;
|
2009-07-27 16:30:42 +02:00
|
|
|
rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
|
2007-09-27 14:11:22 +02:00
|
|
|
if (!*rmapp) {
|
2007-01-05 16:36:38 -08:00
|
|
|
rmap_printk("rmap_add: %p %llx 0->1\n", spte, *spte);
|
2007-09-27 14:11:22 +02:00
|
|
|
*rmapp = (unsigned long)spte;
|
|
|
|
} else if (!(*rmapp & 1)) {
|
2007-01-05 16:36:38 -08:00
|
|
|
rmap_printk("rmap_add: %p %llx 1->many\n", spte, *spte);
|
2007-01-05 16:36:53 -08:00
|
|
|
desc = mmu_alloc_rmap_desc(vcpu);
|
2009-06-10 14:24:23 +03:00
|
|
|
desc->sptes[0] = (u64 *)*rmapp;
|
|
|
|
desc->sptes[1] = spte;
|
2007-09-27 14:11:22 +02:00
|
|
|
*rmapp = (unsigned long)desc | 1;
|
2007-01-05 16:36:38 -08:00
|
|
|
} else {
|
|
|
|
rmap_printk("rmap_add: %p %llx many->many\n", spte, *spte);
|
2007-09-27 14:11:22 +02:00
|
|
|
desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
|
2009-06-10 14:24:23 +03:00
|
|
|
while (desc->sptes[RMAP_EXT-1] && desc->more) {
|
2007-01-05 16:36:38 -08:00
|
|
|
desc = desc->more;
|
2009-08-05 15:43:58 -03:00
|
|
|
count += RMAP_EXT;
|
|
|
|
}
|
2009-06-10 14:24:23 +03:00
|
|
|
if (desc->sptes[RMAP_EXT-1]) {
|
2007-01-05 16:36:53 -08:00
|
|
|
desc->more = mmu_alloc_rmap_desc(vcpu);
|
2007-01-05 16:36:38 -08:00
|
|
|
desc = desc->more;
|
|
|
|
}
|
2009-06-10 14:24:23 +03:00
|
|
|
for (i = 0; desc->sptes[i]; ++i)
|
2007-01-05 16:36:38 -08:00
|
|
|
;
|
2009-06-10 14:24:23 +03:00
|
|
|
desc->sptes[i] = spte;
|
2007-01-05 16:36:38 -08:00
|
|
|
}
|
2009-08-05 15:43:58 -03:00
|
|
|
return count;
|
2007-01-05 16:36:38 -08:00
|
|
|
}
|
|
|
|
|
2007-09-27 14:11:22 +02:00
|
|
|
static void rmap_desc_remove_entry(unsigned long *rmapp,
|
2007-01-05 16:36:38 -08:00
|
|
|
struct kvm_rmap_desc *desc,
|
|
|
|
int i,
|
|
|
|
struct kvm_rmap_desc *prev_desc)
|
|
|
|
{
|
|
|
|
int j;
|
|
|
|
|
2009-06-10 14:24:23 +03:00
|
|
|
for (j = RMAP_EXT - 1; !desc->sptes[j] && j > i; --j)
|
2007-01-05 16:36:38 -08:00
|
|
|
;
|
2009-06-10 14:24:23 +03:00
|
|
|
desc->sptes[i] = desc->sptes[j];
|
|
|
|
desc->sptes[j] = NULL;
|
2007-01-05 16:36:38 -08:00
|
|
|
if (j != 0)
|
|
|
|
return;
|
|
|
|
if (!prev_desc && !desc->more)
|
2009-06-10 14:24:23 +03:00
|
|
|
*rmapp = (unsigned long)desc->sptes[0];
|
2007-01-05 16:36:38 -08:00
|
|
|
else
|
|
|
|
if (prev_desc)
|
|
|
|
prev_desc->more = desc->more;
|
|
|
|
else
|
2007-09-27 14:11:22 +02:00
|
|
|
*rmapp = (unsigned long)desc->more | 1;
|
2007-07-17 13:04:56 +03:00
|
|
|
mmu_free_rmap_desc(desc);
|
2007-01-05 16:36:38 -08:00
|
|
|
}
|
|
|
|
|
2007-09-27 14:11:22 +02:00
|
|
|
static void rmap_remove(struct kvm *kvm, u64 *spte)
|
2007-01-05 16:36:38 -08:00
|
|
|
{
|
|
|
|
struct kvm_rmap_desc *desc;
|
|
|
|
struct kvm_rmap_desc *prev_desc;
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp;
|
2008-04-02 14:46:56 -05:00
|
|
|
pfn_t pfn;
|
2007-09-27 14:11:22 +02:00
|
|
|
unsigned long *rmapp;
|
2007-01-05 16:36:38 -08:00
|
|
|
int i;
|
|
|
|
|
2009-06-10 14:12:05 +03:00
|
|
|
if (!is_rmap_spte(*spte))
|
2007-01-05 16:36:38 -08:00
|
|
|
return;
|
2007-11-21 15:28:32 +02:00
|
|
|
sp = page_header(__pa(spte));
|
2008-04-02 14:46:56 -05:00
|
|
|
pfn = spte_to_pfn(*spte);
|
2008-04-25 21:13:50 +08:00
|
|
|
if (*spte & shadow_accessed_mask)
|
2008-04-02 14:46:56 -05:00
|
|
|
kvm_set_pfn_accessed(pfn);
|
2007-11-20 11:49:33 +02:00
|
|
|
if (is_writeble_pte(*spte))
|
2009-09-23 21:47:16 +03:00
|
|
|
kvm_set_pfn_dirty(pfn);
|
2009-07-27 16:30:42 +02:00
|
|
|
rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], sp->role.level);
|
2007-09-27 14:11:22 +02:00
|
|
|
if (!*rmapp) {
|
2007-01-05 16:36:38 -08:00
|
|
|
printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
|
|
|
|
BUG();
|
2007-09-27 14:11:22 +02:00
|
|
|
} else if (!(*rmapp & 1)) {
|
2007-01-05 16:36:38 -08:00
|
|
|
rmap_printk("rmap_remove: %p %llx 1->0\n", spte, *spte);
|
2007-09-27 14:11:22 +02:00
|
|
|
if ((u64 *)*rmapp != spte) {
|
2007-01-05 16:36:38 -08:00
|
|
|
printk(KERN_ERR "rmap_remove: %p %llx 1->BUG\n",
|
|
|
|
spte, *spte);
|
|
|
|
BUG();
|
|
|
|
}
|
2007-09-27 14:11:22 +02:00
|
|
|
*rmapp = 0;
|
2007-01-05 16:36:38 -08:00
|
|
|
} else {
|
|
|
|
rmap_printk("rmap_remove: %p %llx many->many\n", spte, *spte);
|
2007-09-27 14:11:22 +02:00
|
|
|
desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
|
2007-01-05 16:36:38 -08:00
|
|
|
prev_desc = NULL;
|
|
|
|
while (desc) {
|
2009-06-10 14:24:23 +03:00
|
|
|
for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i)
|
|
|
|
if (desc->sptes[i] == spte) {
|
2007-09-27 14:11:22 +02:00
|
|
|
rmap_desc_remove_entry(rmapp,
|
2007-01-05 16:36:53 -08:00
|
|
|
desc, i,
|
2007-01-05 16:36:38 -08:00
|
|
|
prev_desc);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
prev_desc = desc;
|
|
|
|
desc = desc->more;
|
|
|
|
}
|
|
|
|
BUG();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-10-16 14:42:30 +02:00
|
|
|
static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte)
|
2007-01-05 16:36:43 -08:00
|
|
|
{
|
|
|
|
struct kvm_rmap_desc *desc;
|
2007-10-16 14:42:30 +02:00
|
|
|
struct kvm_rmap_desc *prev_desc;
|
|
|
|
u64 *prev_spte;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!*rmapp)
|
|
|
|
return NULL;
|
|
|
|
else if (!(*rmapp & 1)) {
|
|
|
|
if (!spte)
|
|
|
|
return (u64 *)*rmapp;
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
|
|
|
|
prev_desc = NULL;
|
|
|
|
prev_spte = NULL;
|
|
|
|
while (desc) {
|
2009-06-10 14:24:23 +03:00
|
|
|
for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i) {
|
2007-10-16 14:42:30 +02:00
|
|
|
if (prev_spte == spte)
|
2009-06-10 14:24:23 +03:00
|
|
|
return desc->sptes[i];
|
|
|
|
prev_spte = desc->sptes[i];
|
2007-10-16 14:42:30 +02:00
|
|
|
}
|
|
|
|
desc = desc->more;
|
|
|
|
}
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2008-12-01 22:32:03 -02:00
|
|
|
static int rmap_write_protect(struct kvm *kvm, u64 gfn)
|
2007-10-16 14:42:30 +02:00
|
|
|
{
|
2007-09-27 14:11:22 +02:00
|
|
|
unsigned long *rmapp;
|
2007-01-05 16:36:43 -08:00
|
|
|
u64 *spte;
|
2009-07-27 16:30:42 +02:00
|
|
|
int i, write_protected = 0;
|
2007-01-05 16:36:43 -08:00
|
|
|
|
2007-10-10 20:08:41 -05:00
|
|
|
gfn = unalias_gfn(kvm, gfn);
|
2009-07-27 16:30:42 +02:00
|
|
|
rmapp = gfn_to_rmap(kvm, gfn, PT_PAGE_TABLE_LEVEL);
|
2007-01-05 16:36:43 -08:00
|
|
|
|
2007-10-16 14:42:30 +02:00
|
|
|
spte = rmap_next(kvm, rmapp, NULL);
|
|
|
|
while (spte) {
|
2007-01-05 16:36:43 -08:00
|
|
|
BUG_ON(!spte);
|
|
|
|
BUG_ON(!(*spte & PT_PRESENT_MASK));
|
|
|
|
rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte);
|
2007-12-18 06:08:27 +08:00
|
|
|
if (is_writeble_pte(*spte)) {
|
2009-06-10 14:24:23 +03:00
|
|
|
__set_spte(spte, *spte & ~PT_WRITABLE_MASK);
|
2007-12-18 06:08:27 +08:00
|
|
|
write_protected = 1;
|
|
|
|
}
|
2007-10-16 14:43:46 +02:00
|
|
|
spte = rmap_next(kvm, rmapp, spte);
|
2007-01-05 16:36:43 -08:00
|
|
|
}
|
2008-03-20 18:17:24 +02:00
|
|
|
if (write_protected) {
|
2008-04-02 14:46:56 -05:00
|
|
|
pfn_t pfn;
|
2008-03-20 18:17:24 +02:00
|
|
|
|
|
|
|
spte = rmap_next(kvm, rmapp, NULL);
|
2008-04-02 14:46:56 -05:00
|
|
|
pfn = spte_to_pfn(*spte);
|
|
|
|
kvm_set_pfn_dirty(pfn);
|
2008-03-20 18:17:24 +02:00
|
|
|
}
|
|
|
|
|
2008-02-23 11:44:30 -03:00
|
|
|
/* check for huge page mappings */
|
2009-07-27 16:30:42 +02:00
|
|
|
for (i = PT_DIRECTORY_LEVEL;
|
|
|
|
i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
|
|
|
|
rmapp = gfn_to_rmap(kvm, gfn, i);
|
|
|
|
spte = rmap_next(kvm, rmapp, NULL);
|
|
|
|
while (spte) {
|
|
|
|
BUG_ON(!spte);
|
|
|
|
BUG_ON(!(*spte & PT_PRESENT_MASK));
|
|
|
|
BUG_ON((*spte & (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)) != (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK));
|
|
|
|
pgprintk("rmap_write_protect(large): spte %p %llx %lld\n", spte, *spte, gfn);
|
|
|
|
if (is_writeble_pte(*spte)) {
|
|
|
|
rmap_remove(kvm, spte);
|
|
|
|
--kvm->stat.lpages;
|
|
|
|
__set_spte(spte, shadow_trap_nonpresent_pte);
|
|
|
|
spte = NULL;
|
|
|
|
write_protected = 1;
|
|
|
|
}
|
|
|
|
spte = rmap_next(kvm, rmapp, spte);
|
2008-02-23 11:44:30 -03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2008-12-01 22:32:03 -02:00
|
|
|
return write_protected;
|
2007-01-05 16:36:43 -08:00
|
|
|
}
|
|
|
|
|
2009-10-09 11:42:56 +00:00
|
|
|
static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
|
|
|
|
unsigned long data)
|
2008-07-25 16:24:52 +02:00
|
|
|
{
|
|
|
|
u64 *spte;
|
|
|
|
int need_tlb_flush = 0;
|
|
|
|
|
|
|
|
while ((spte = rmap_next(kvm, rmapp, NULL))) {
|
|
|
|
BUG_ON(!(*spte & PT_PRESENT_MASK));
|
|
|
|
rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte);
|
|
|
|
rmap_remove(kvm, spte);
|
2009-06-10 14:24:23 +03:00
|
|
|
__set_spte(spte, shadow_trap_nonpresent_pte);
|
2008-07-25 16:24:52 +02:00
|
|
|
need_tlb_flush = 1;
|
|
|
|
}
|
|
|
|
return need_tlb_flush;
|
|
|
|
}
|
|
|
|
|
2009-10-09 11:42:56 +00:00
|
|
|
static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
|
|
|
|
unsigned long data)
|
2009-09-23 21:47:18 +03:00
|
|
|
{
|
|
|
|
int need_flush = 0;
|
|
|
|
u64 *spte, new_spte;
|
|
|
|
pte_t *ptep = (pte_t *)data;
|
|
|
|
pfn_t new_pfn;
|
|
|
|
|
|
|
|
WARN_ON(pte_huge(*ptep));
|
|
|
|
new_pfn = pte_pfn(*ptep);
|
|
|
|
spte = rmap_next(kvm, rmapp, NULL);
|
|
|
|
while (spte) {
|
|
|
|
BUG_ON(!is_shadow_present_pte(*spte));
|
|
|
|
rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", spte, *spte);
|
|
|
|
need_flush = 1;
|
|
|
|
if (pte_write(*ptep)) {
|
|
|
|
rmap_remove(kvm, spte);
|
|
|
|
__set_spte(spte, shadow_trap_nonpresent_pte);
|
|
|
|
spte = rmap_next(kvm, rmapp, NULL);
|
|
|
|
} else {
|
|
|
|
new_spte = *spte &~ (PT64_BASE_ADDR_MASK);
|
|
|
|
new_spte |= (u64)new_pfn << PAGE_SHIFT;
|
|
|
|
|
|
|
|
new_spte &= ~PT_WRITABLE_MASK;
|
|
|
|
new_spte &= ~SPTE_HOST_WRITEABLE;
|
|
|
|
if (is_writeble_pte(*spte))
|
|
|
|
kvm_set_pfn_dirty(spte_to_pfn(*spte));
|
|
|
|
__set_spte(spte, new_spte);
|
|
|
|
spte = rmap_next(kvm, rmapp, spte);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (need_flush)
|
|
|
|
kvm_flush_remote_tlbs(kvm);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2009-10-09 11:42:56 +00:00
|
|
|
static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
|
|
|
|
unsigned long data,
|
2009-09-23 21:47:18 +03:00
|
|
|
int (*handler)(struct kvm *kvm, unsigned long *rmapp,
|
2009-10-09 11:42:56 +00:00
|
|
|
unsigned long data))
|
2008-07-25 16:24:52 +02:00
|
|
|
{
|
2009-07-27 16:30:44 +02:00
|
|
|
int i, j;
|
2008-07-25 16:24:52 +02:00
|
|
|
int retval = 0;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If mmap_sem isn't taken, we can look the memslots with only
|
|
|
|
* the mmu_lock by skipping over the slots with userspace_addr == 0.
|
|
|
|
*/
|
|
|
|
for (i = 0; i < kvm->nmemslots; i++) {
|
|
|
|
struct kvm_memory_slot *memslot = &kvm->memslots[i];
|
|
|
|
unsigned long start = memslot->userspace_addr;
|
|
|
|
unsigned long end;
|
|
|
|
|
|
|
|
/* mmu_lock protects userspace_addr */
|
|
|
|
if (!start)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
end = start + (memslot->npages << PAGE_SHIFT);
|
|
|
|
if (hva >= start && hva < end) {
|
|
|
|
gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
|
2009-07-27 16:30:44 +02:00
|
|
|
|
2009-09-23 21:47:18 +03:00
|
|
|
retval |= handler(kvm, &memslot->rmap[gfn_offset],
|
|
|
|
data);
|
2009-07-27 16:30:44 +02:00
|
|
|
|
|
|
|
for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
|
|
|
|
int idx = gfn_offset;
|
|
|
|
idx /= KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL + j);
|
|
|
|
retval |= handler(kvm,
|
2009-09-23 21:47:18 +03:00
|
|
|
&memslot->lpage_info[j][idx].rmap_pde,
|
|
|
|
data);
|
2009-07-27 16:30:44 +02:00
|
|
|
}
|
2008-07-25 16:24:52 +02:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return retval;
|
|
|
|
}
|
|
|
|
|
|
|
|
int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
|
|
|
|
{
|
2009-09-23 21:47:18 +03:00
|
|
|
return kvm_handle_hva(kvm, hva, 0, kvm_unmap_rmapp);
|
|
|
|
}
|
|
|
|
|
|
|
|
void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
|
|
|
|
{
|
2009-10-09 11:42:56 +00:00
|
|
|
kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp);
|
2008-07-25 16:24:52 +02:00
|
|
|
}
|
|
|
|
|
2009-10-09 11:42:56 +00:00
|
|
|
static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
|
|
|
|
unsigned long data)
|
2008-07-25 16:24:52 +02:00
|
|
|
{
|
|
|
|
u64 *spte;
|
|
|
|
int young = 0;
|
|
|
|
|
2008-09-08 15:12:30 +08:00
|
|
|
/* always return old for EPT */
|
|
|
|
if (!shadow_accessed_mask)
|
|
|
|
return 0;
|
|
|
|
|
2008-07-25 16:24:52 +02:00
|
|
|
spte = rmap_next(kvm, rmapp, NULL);
|
|
|
|
while (spte) {
|
|
|
|
int _young;
|
|
|
|
u64 _spte = *spte;
|
|
|
|
BUG_ON(!(_spte & PT_PRESENT_MASK));
|
|
|
|
_young = _spte & PT_ACCESSED_MASK;
|
|
|
|
if (_young) {
|
|
|
|
young = 1;
|
|
|
|
clear_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte);
|
|
|
|
}
|
|
|
|
spte = rmap_next(kvm, rmapp, spte);
|
|
|
|
}
|
|
|
|
return young;
|
|
|
|
}
|
|
|
|
|
2009-08-05 15:43:58 -03:00
|
|
|
#define RMAP_RECYCLE_THRESHOLD 1000
|
|
|
|
|
2009-07-27 16:30:44 +02:00
|
|
|
static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
|
2009-08-05 15:43:58 -03:00
|
|
|
{
|
|
|
|
unsigned long *rmapp;
|
2009-07-27 16:30:44 +02:00
|
|
|
struct kvm_mmu_page *sp;
|
|
|
|
|
|
|
|
sp = page_header(__pa(spte));
|
2009-08-05 15:43:58 -03:00
|
|
|
|
|
|
|
gfn = unalias_gfn(vcpu->kvm, gfn);
|
2009-07-27 16:30:44 +02:00
|
|
|
rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
|
2009-08-05 15:43:58 -03:00
|
|
|
|
2009-09-23 21:47:18 +03:00
|
|
|
kvm_unmap_rmapp(vcpu->kvm, rmapp, 0);
|
2009-08-05 15:43:58 -03:00
|
|
|
kvm_flush_remote_tlbs(vcpu->kvm);
|
|
|
|
}
|
|
|
|
|
2008-07-25 16:24:52 +02:00
|
|
|
int kvm_age_hva(struct kvm *kvm, unsigned long hva)
|
|
|
|
{
|
2009-09-23 21:47:18 +03:00
|
|
|
return kvm_handle_hva(kvm, hva, 0, kvm_age_rmapp);
|
2008-07-25 16:24:52 +02:00
|
|
|
}
|
|
|
|
|
2007-04-25 14:17:25 +08:00
|
|
|
#ifdef MMU_DEBUG
|
2007-05-06 15:50:58 +03:00
|
|
|
static int is_empty_shadow_page(u64 *spt)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
{
|
2007-01-05 16:36:50 -08:00
|
|
|
u64 *pos;
|
|
|
|
u64 *end;
|
|
|
|
|
2007-05-06 15:50:58 +03:00
|
|
|
for (pos = spt, end = pos + PAGE_SIZE / sizeof(u64); pos != end; pos++)
|
2008-05-20 16:21:13 +03:00
|
|
|
if (is_shadow_present_pte(*pos)) {
|
2008-03-03 12:59:56 -08:00
|
|
|
printk(KERN_ERR "%s: %p %llx\n", __func__,
|
2007-01-05 16:36:50 -08:00
|
|
|
pos, *pos);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
return 0;
|
2007-01-05 16:36:50 -08:00
|
|
|
}
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
return 1;
|
|
|
|
}
|
2007-04-25 14:17:25 +08:00
|
|
|
#endif
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2007-11-21 15:28:32 +02:00
|
|
|
static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
|
2007-01-05 16:36:49 -08:00
|
|
|
{
|
2007-11-21 15:28:32 +02:00
|
|
|
ASSERT(is_empty_shadow_page(sp->spt));
|
|
|
|
list_del(&sp->link);
|
|
|
|
__free_page(virt_to_page(sp->spt));
|
|
|
|
__free_page(virt_to_page(sp->gfns));
|
|
|
|
kfree(sp);
|
2007-12-14 10:01:48 +08:00
|
|
|
++kvm->arch.n_free_mmu_pages;
|
2007-01-05 16:36:49 -08:00
|
|
|
}
|
|
|
|
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
static unsigned kvm_page_table_hashfn(gfn_t gfn)
|
|
|
|
{
|
2008-01-07 13:20:25 +02:00
|
|
|
return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1);
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
}
|
|
|
|
|
2007-01-05 16:36:42 -08:00
|
|
|
static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
|
|
|
|
u64 *parent_pte)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
{
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2007-12-13 23:50:52 +08:00
|
|
|
sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache, sizeof *sp);
|
|
|
|
sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
|
|
|
|
sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
|
2007-11-21 15:28:32 +02:00
|
|
|
set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
|
2007-12-14 10:01:48 +08:00
|
|
|
list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
|
2008-12-01 22:32:04 -02:00
|
|
|
INIT_LIST_HEAD(&sp->oos_link);
|
2008-10-16 17:30:57 +08:00
|
|
|
bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
|
2007-11-21 15:28:32 +02:00
|
|
|
sp->multimapped = 0;
|
|
|
|
sp->parent_pte = parent_pte;
|
2007-12-14 10:01:48 +08:00
|
|
|
--vcpu->kvm->arch.n_free_mmu_pages;
|
2007-11-21 15:28:32 +02:00
|
|
|
return sp;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
|
|
|
|
2007-01-05 16:36:53 -08:00
|
|
|
static void mmu_page_add_parent_pte(struct kvm_vcpu *vcpu,
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp, u64 *parent_pte)
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
{
|
|
|
|
struct kvm_pte_chain *pte_chain;
|
|
|
|
struct hlist_node *node;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!parent_pte)
|
|
|
|
return;
|
2007-11-21 15:28:32 +02:00
|
|
|
if (!sp->multimapped) {
|
|
|
|
u64 *old = sp->parent_pte;
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
|
|
|
|
if (!old) {
|
2007-11-21 15:28:32 +02:00
|
|
|
sp->parent_pte = parent_pte;
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
return;
|
|
|
|
}
|
2007-11-21 15:28:32 +02:00
|
|
|
sp->multimapped = 1;
|
2007-01-05 16:36:53 -08:00
|
|
|
pte_chain = mmu_alloc_pte_chain(vcpu);
|
2007-11-21 15:28:32 +02:00
|
|
|
INIT_HLIST_HEAD(&sp->parent_ptes);
|
|
|
|
hlist_add_head(&pte_chain->link, &sp->parent_ptes);
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
pte_chain->parent_ptes[0] = old;
|
|
|
|
}
|
2007-11-21 15:28:32 +02:00
|
|
|
hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link) {
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
if (pte_chain->parent_ptes[NR_PTE_CHAIN_ENTRIES-1])
|
|
|
|
continue;
|
|
|
|
for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i)
|
|
|
|
if (!pte_chain->parent_ptes[i]) {
|
|
|
|
pte_chain->parent_ptes[i] = parent_pte;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
}
|
2007-01-05 16:36:53 -08:00
|
|
|
pte_chain = mmu_alloc_pte_chain(vcpu);
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
BUG_ON(!pte_chain);
|
2007-11-21 15:28:32 +02:00
|
|
|
hlist_add_head(&pte_chain->link, &sp->parent_ptes);
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
pte_chain->parent_ptes[0] = parent_pte;
|
|
|
|
}
|
|
|
|
|
2007-11-21 15:28:32 +02:00
|
|
|
static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp,
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
u64 *parent_pte)
|
|
|
|
{
|
|
|
|
struct kvm_pte_chain *pte_chain;
|
|
|
|
struct hlist_node *node;
|
|
|
|
int i;
|
|
|
|
|
2007-11-21 15:28:32 +02:00
|
|
|
if (!sp->multimapped) {
|
|
|
|
BUG_ON(sp->parent_pte != parent_pte);
|
|
|
|
sp->parent_pte = NULL;
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
return;
|
|
|
|
}
|
2007-11-21 15:28:32 +02:00
|
|
|
hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) {
|
|
|
|
if (!pte_chain->parent_ptes[i])
|
|
|
|
break;
|
|
|
|
if (pte_chain->parent_ptes[i] != parent_pte)
|
|
|
|
continue;
|
2007-01-05 16:36:46 -08:00
|
|
|
while (i + 1 < NR_PTE_CHAIN_ENTRIES
|
|
|
|
&& pte_chain->parent_ptes[i + 1]) {
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
pte_chain->parent_ptes[i]
|
|
|
|
= pte_chain->parent_ptes[i + 1];
|
|
|
|
++i;
|
|
|
|
}
|
|
|
|
pte_chain->parent_ptes[i] = NULL;
|
2007-01-05 16:36:46 -08:00
|
|
|
if (i == 0) {
|
|
|
|
hlist_del(&pte_chain->link);
|
2007-07-17 13:04:56 +03:00
|
|
|
mmu_free_pte_chain(pte_chain);
|
2007-11-21 15:28:32 +02:00
|
|
|
if (hlist_empty(&sp->parent_ptes)) {
|
|
|
|
sp->multimapped = 0;
|
|
|
|
sp->parent_pte = NULL;
|
2007-01-05 16:36:46 -08:00
|
|
|
}
|
|
|
|
}
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
BUG();
|
|
|
|
}
|
|
|
|
|
2008-09-23 13:18:36 -03:00
|
|
|
|
|
|
|
static void mmu_parent_walk(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
|
|
|
|
mmu_parent_walk_fn fn)
|
|
|
|
{
|
|
|
|
struct kvm_pte_chain *pte_chain;
|
|
|
|
struct hlist_node *node;
|
|
|
|
struct kvm_mmu_page *parent_sp;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!sp->multimapped && sp->parent_pte) {
|
|
|
|
parent_sp = page_header(__pa(sp->parent_pte));
|
|
|
|
fn(vcpu, parent_sp);
|
|
|
|
mmu_parent_walk(vcpu, parent_sp, fn);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
|
|
|
|
for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) {
|
|
|
|
if (!pte_chain->parent_ptes[i])
|
|
|
|
break;
|
|
|
|
parent_sp = page_header(__pa(pte_chain->parent_ptes[i]));
|
|
|
|
fn(vcpu, parent_sp);
|
|
|
|
mmu_parent_walk(vcpu, parent_sp, fn);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2008-09-23 13:18:40 -03:00
|
|
|
static void kvm_mmu_update_unsync_bitmap(u64 *spte)
|
|
|
|
{
|
|
|
|
unsigned int index;
|
|
|
|
struct kvm_mmu_page *sp = page_header(__pa(spte));
|
|
|
|
|
|
|
|
index = spte - sp->spt;
|
2008-12-01 22:32:02 -02:00
|
|
|
if (!__test_and_set_bit(index, sp->unsync_child_bitmap))
|
|
|
|
sp->unsync_children++;
|
|
|
|
WARN_ON(!sp->unsync_children);
|
2008-09-23 13:18:40 -03:00
|
|
|
}
|
|
|
|
|
|
|
|
static void kvm_mmu_update_parents_unsync(struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
struct kvm_pte_chain *pte_chain;
|
|
|
|
struct hlist_node *node;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!sp->parent_pte)
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (!sp->multimapped) {
|
|
|
|
kvm_mmu_update_unsync_bitmap(sp->parent_pte);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
|
|
|
|
for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) {
|
|
|
|
if (!pte_chain->parent_ptes[i])
|
|
|
|
break;
|
|
|
|
kvm_mmu_update_unsync_bitmap(pte_chain->parent_ptes[i]);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static int unsync_walk_fn(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
kvm_mmu_update_parents_unsync(sp);
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void kvm_mmu_mark_parents_unsync(struct kvm_vcpu *vcpu,
|
|
|
|
struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
mmu_parent_walk(vcpu, sp, unsync_walk_fn);
|
|
|
|
kvm_mmu_update_parents_unsync(sp);
|
|
|
|
}
|
|
|
|
|
2008-05-29 14:55:03 +03:00
|
|
|
static void nonpaging_prefetch_page(struct kvm_vcpu *vcpu,
|
|
|
|
struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
|
|
|
|
sp->spt[i] = shadow_trap_nonpresent_pte;
|
|
|
|
}
|
|
|
|
|
2008-09-23 13:18:33 -03:00
|
|
|
static int nonpaging_sync_page(struct kvm_vcpu *vcpu,
|
|
|
|
struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2008-09-23 13:18:35 -03:00
|
|
|
static void nonpaging_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2008-12-01 22:32:02 -02:00
|
|
|
#define KVM_PAGE_ARRAY_NR 16
|
|
|
|
|
|
|
|
struct kvm_mmu_pages {
|
|
|
|
struct mmu_page_and_offset {
|
|
|
|
struct kvm_mmu_page *sp;
|
|
|
|
unsigned int idx;
|
|
|
|
} page[KVM_PAGE_ARRAY_NR];
|
|
|
|
unsigned int nr;
|
|
|
|
};
|
|
|
|
|
2008-09-23 13:18:40 -03:00
|
|
|
#define for_each_unsync_children(bitmap, idx) \
|
|
|
|
for (idx = find_first_bit(bitmap, 512); \
|
|
|
|
idx < 512; \
|
|
|
|
idx = find_next_bit(bitmap, 512, idx+1))
|
|
|
|
|
2009-02-21 02:19:13 +01:00
|
|
|
static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
|
|
|
|
int idx)
|
2008-09-23 13:18:39 -03:00
|
|
|
{
|
2008-12-01 22:32:02 -02:00
|
|
|
int i;
|
2008-09-23 13:18:39 -03:00
|
|
|
|
2008-12-01 22:32:02 -02:00
|
|
|
if (sp->unsync)
|
|
|
|
for (i=0; i < pvec->nr; i++)
|
|
|
|
if (pvec->page[i].sp == sp)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
pvec->page[pvec->nr].sp = sp;
|
|
|
|
pvec->page[pvec->nr].idx = idx;
|
|
|
|
pvec->nr++;
|
|
|
|
return (pvec->nr == KVM_PAGE_ARRAY_NR);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
|
|
|
|
struct kvm_mmu_pages *pvec)
|
|
|
|
{
|
|
|
|
int i, ret, nr_unsync_leaf = 0;
|
2008-09-23 13:18:39 -03:00
|
|
|
|
2008-09-23 13:18:40 -03:00
|
|
|
for_each_unsync_children(sp->unsync_child_bitmap, i) {
|
2008-09-23 13:18:39 -03:00
|
|
|
u64 ent = sp->spt[i];
|
|
|
|
|
2008-12-22 18:49:30 -02:00
|
|
|
if (is_shadow_present_pte(ent) && !is_large_pte(ent)) {
|
2008-09-23 13:18:39 -03:00
|
|
|
struct kvm_mmu_page *child;
|
|
|
|
child = page_header(ent & PT64_BASE_ADDR_MASK);
|
|
|
|
|
|
|
|
if (child->unsync_children) {
|
2008-12-01 22:32:02 -02:00
|
|
|
if (mmu_pages_add(pvec, child, i))
|
|
|
|
return -ENOSPC;
|
|
|
|
|
|
|
|
ret = __mmu_unsync_walk(child, pvec);
|
|
|
|
if (!ret)
|
|
|
|
__clear_bit(i, sp->unsync_child_bitmap);
|
|
|
|
else if (ret > 0)
|
|
|
|
nr_unsync_leaf += ret;
|
|
|
|
else
|
2008-09-23 13:18:39 -03:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (child->unsync) {
|
2008-12-01 22:32:02 -02:00
|
|
|
nr_unsync_leaf++;
|
|
|
|
if (mmu_pages_add(pvec, child, i))
|
|
|
|
return -ENOSPC;
|
2008-09-23 13:18:39 -03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2008-09-23 13:18:40 -03:00
|
|
|
if (find_first_bit(sp->unsync_child_bitmap, 512) == 512)
|
2008-09-23 13:18:39 -03:00
|
|
|
sp->unsync_children = 0;
|
|
|
|
|
2008-12-01 22:32:02 -02:00
|
|
|
return nr_unsync_leaf;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int mmu_unsync_walk(struct kvm_mmu_page *sp,
|
|
|
|
struct kvm_mmu_pages *pvec)
|
|
|
|
{
|
|
|
|
if (!sp->unsync_children)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
mmu_pages_add(pvec, sp, 0);
|
|
|
|
return __mmu_unsync_walk(sp, pvec);
|
2008-09-23 13:18:39 -03:00
|
|
|
}
|
|
|
|
|
2007-11-21 15:28:32 +02:00
|
|
|
static struct kvm_mmu_page *kvm_mmu_lookup_page(struct kvm *kvm, gfn_t gfn)
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
{
|
|
|
|
unsigned index;
|
|
|
|
struct hlist_head *bucket;
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp;
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
struct hlist_node *node;
|
|
|
|
|
2008-03-03 12:59:56 -08:00
|
|
|
pgprintk("%s: looking for gfn %lx\n", __func__, gfn);
|
2008-01-07 13:20:25 +02:00
|
|
|
index = kvm_page_table_hashfn(gfn);
|
2007-12-14 10:01:48 +08:00
|
|
|
bucket = &kvm->arch.mmu_page_hash[index];
|
2007-11-21 15:28:32 +02:00
|
|
|
hlist_for_each_entry(sp, node, bucket, hash_link)
|
2009-01-11 13:02:10 +02:00
|
|
|
if (sp->gfn == gfn && !sp->role.direct
|
2008-02-20 14:47:24 -05:00
|
|
|
&& !sp->role.invalid) {
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
pgprintk("%s: found role %x\n",
|
2008-03-03 12:59:56 -08:00
|
|
|
__func__, sp->role.word);
|
2007-11-21 15:28:32 +02:00
|
|
|
return sp;
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
}
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2008-09-23 13:18:39 -03:00
|
|
|
static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
WARN_ON(!sp->unsync);
|
|
|
|
sp->unsync = 0;
|
|
|
|
--kvm->stat.mmu_unsync;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp);
|
|
|
|
|
|
|
|
static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
if (sp->role.glevels != vcpu->arch.mmu.root_level) {
|
|
|
|
kvm_mmu_zap_page(vcpu->kvm, sp);
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2009-07-06 15:58:14 +03:00
|
|
|
trace_kvm_mmu_sync_page(sp);
|
2008-12-01 22:32:03 -02:00
|
|
|
if (rmap_write_protect(vcpu->kvm, sp->gfn))
|
|
|
|
kvm_flush_remote_tlbs(vcpu->kvm);
|
2008-11-21 19:13:58 +01:00
|
|
|
kvm_unlink_unsync_page(vcpu->kvm, sp);
|
2008-09-23 13:18:39 -03:00
|
|
|
if (vcpu->arch.mmu.sync_page(vcpu, sp)) {
|
|
|
|
kvm_mmu_zap_page(vcpu->kvm, sp);
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
kvm_mmu_flush_tlb(vcpu);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2008-12-01 22:32:02 -02:00
|
|
|
struct mmu_page_path {
|
|
|
|
struct kvm_mmu_page *parent[PT64_ROOT_LEVEL-1];
|
|
|
|
unsigned int idx[PT64_ROOT_LEVEL-1];
|
2008-09-23 13:18:39 -03:00
|
|
|
};
|
|
|
|
|
2008-12-01 22:32:02 -02:00
|
|
|
#define for_each_sp(pvec, sp, parents, i) \
|
|
|
|
for (i = mmu_pages_next(&pvec, &parents, -1), \
|
|
|
|
sp = pvec.page[i].sp; \
|
|
|
|
i < pvec.nr && ({ sp = pvec.page[i].sp; 1;}); \
|
|
|
|
i = mmu_pages_next(&pvec, &parents, i))
|
|
|
|
|
2009-02-21 02:19:13 +01:00
|
|
|
static int mmu_pages_next(struct kvm_mmu_pages *pvec,
|
|
|
|
struct mmu_page_path *parents,
|
|
|
|
int i)
|
2008-12-01 22:32:02 -02:00
|
|
|
{
|
|
|
|
int n;
|
|
|
|
|
|
|
|
for (n = i+1; n < pvec->nr; n++) {
|
|
|
|
struct kvm_mmu_page *sp = pvec->page[n].sp;
|
|
|
|
|
|
|
|
if (sp->role.level == PT_PAGE_TABLE_LEVEL) {
|
|
|
|
parents->idx[0] = pvec->page[n].idx;
|
|
|
|
return n;
|
|
|
|
}
|
|
|
|
|
|
|
|
parents->parent[sp->role.level-2] = sp;
|
|
|
|
parents->idx[sp->role.level-1] = pvec->page[n].idx;
|
|
|
|
}
|
|
|
|
|
|
|
|
return n;
|
|
|
|
}
|
|
|
|
|
2009-02-21 02:19:13 +01:00
|
|
|
static void mmu_pages_clear_parents(struct mmu_page_path *parents)
|
2008-09-23 13:18:39 -03:00
|
|
|
{
|
2008-12-01 22:32:02 -02:00
|
|
|
struct kvm_mmu_page *sp;
|
|
|
|
unsigned int level = 0;
|
|
|
|
|
|
|
|
do {
|
|
|
|
unsigned int idx = parents->idx[level];
|
2008-09-23 13:18:39 -03:00
|
|
|
|
2008-12-01 22:32:02 -02:00
|
|
|
sp = parents->parent[level];
|
|
|
|
if (!sp)
|
|
|
|
return;
|
|
|
|
|
|
|
|
--sp->unsync_children;
|
|
|
|
WARN_ON((int)sp->unsync_children < 0);
|
|
|
|
__clear_bit(idx, sp->unsync_child_bitmap);
|
|
|
|
level++;
|
|
|
|
} while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
|
2008-09-23 13:18:39 -03:00
|
|
|
}
|
|
|
|
|
2008-12-01 22:32:02 -02:00
|
|
|
static void kvm_mmu_pages_init(struct kvm_mmu_page *parent,
|
|
|
|
struct mmu_page_path *parents,
|
|
|
|
struct kvm_mmu_pages *pvec)
|
2008-09-23 13:18:39 -03:00
|
|
|
{
|
2008-12-01 22:32:02 -02:00
|
|
|
parents->parent[parent->role.level-1] = NULL;
|
|
|
|
pvec->nr = 0;
|
|
|
|
}
|
2008-09-23 13:18:39 -03:00
|
|
|
|
2008-12-01 22:32:02 -02:00
|
|
|
static void mmu_sync_children(struct kvm_vcpu *vcpu,
|
|
|
|
struct kvm_mmu_page *parent)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
struct kvm_mmu_page *sp;
|
|
|
|
struct mmu_page_path parents;
|
|
|
|
struct kvm_mmu_pages pages;
|
|
|
|
|
|
|
|
kvm_mmu_pages_init(parent, &parents, &pages);
|
|
|
|
while (mmu_unsync_walk(parent, &pages)) {
|
2008-12-01 22:32:03 -02:00
|
|
|
int protected = 0;
|
|
|
|
|
|
|
|
for_each_sp(pages, sp, parents, i)
|
|
|
|
protected |= rmap_write_protect(vcpu->kvm, sp->gfn);
|
|
|
|
|
|
|
|
if (protected)
|
|
|
|
kvm_flush_remote_tlbs(vcpu->kvm);
|
|
|
|
|
2008-12-01 22:32:02 -02:00
|
|
|
for_each_sp(pages, sp, parents, i) {
|
|
|
|
kvm_sync_page(vcpu, sp);
|
|
|
|
mmu_pages_clear_parents(&parents);
|
|
|
|
}
|
2008-09-23 13:18:39 -03:00
|
|
|
cond_resched_lock(&vcpu->kvm->mmu_lock);
|
2008-12-01 22:32:02 -02:00
|
|
|
kvm_mmu_pages_init(parent, &parents, &pages);
|
|
|
|
}
|
2008-09-23 13:18:39 -03:00
|
|
|
}
|
|
|
|
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
|
|
|
|
gfn_t gfn,
|
|
|
|
gva_t gaddr,
|
|
|
|
unsigned level,
|
2009-01-11 13:02:10 +02:00
|
|
|
int direct,
|
2007-12-09 17:00:02 +02:00
|
|
|
unsigned access,
|
2008-02-26 22:12:10 +02:00
|
|
|
u64 *parent_pte)
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
{
|
|
|
|
union kvm_mmu_page_role role;
|
|
|
|
unsigned index;
|
|
|
|
unsigned quadrant;
|
|
|
|
struct hlist_head *bucket;
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp;
|
2008-09-23 13:18:39 -03:00
|
|
|
struct hlist_node *node, *tmp;
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
|
2008-12-21 19:20:09 +02:00
|
|
|
role = vcpu->arch.mmu.base_role;
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
role.level = level;
|
2009-01-11 13:02:10 +02:00
|
|
|
role.direct = direct;
|
2007-12-09 17:00:02 +02:00
|
|
|
role.access = access;
|
2007-12-13 23:50:52 +08:00
|
|
|
if (vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
|
|
|
|
quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
|
|
|
|
role.quadrant = quadrant;
|
|
|
|
}
|
2008-01-07 13:20:25 +02:00
|
|
|
index = kvm_page_table_hashfn(gfn);
|
2007-12-14 10:01:48 +08:00
|
|
|
bucket = &vcpu->kvm->arch.mmu_page_hash[index];
|
2008-09-23 13:18:39 -03:00
|
|
|
hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
|
|
|
|
if (sp->gfn == gfn) {
|
|
|
|
if (sp->unsync)
|
|
|
|
if (kvm_sync_page(vcpu, sp))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (sp->role.word != role.word)
|
|
|
|
continue;
|
|
|
|
|
2007-11-21 15:28:32 +02:00
|
|
|
mmu_page_add_parent_pte(vcpu, sp, parent_pte);
|
2008-09-23 13:18:40 -03:00
|
|
|
if (sp->unsync_children) {
|
|
|
|
set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
|
|
|
|
kvm_mmu_mark_parents_unsync(vcpu, sp);
|
|
|
|
}
|
2009-07-06 15:58:14 +03:00
|
|
|
trace_kvm_mmu_get_page(sp, false);
|
2007-11-21 15:28:32 +02:00
|
|
|
return sp;
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
}
|
2007-12-18 19:47:18 +02:00
|
|
|
++vcpu->kvm->stat.mmu_cache_miss;
|
2007-11-21 15:28:32 +02:00
|
|
|
sp = kvm_mmu_alloc_page(vcpu, parent_pte);
|
|
|
|
if (!sp)
|
|
|
|
return sp;
|
|
|
|
sp->gfn = gfn;
|
|
|
|
sp->role = role;
|
|
|
|
hlist_add_head(&sp->hash_link, bucket);
|
2009-01-11 13:02:10 +02:00
|
|
|
if (!direct) {
|
2008-12-01 22:32:03 -02:00
|
|
|
if (rmap_write_protect(vcpu->kvm, gfn))
|
|
|
|
kvm_flush_remote_tlbs(vcpu->kvm);
|
2008-09-23 13:18:39 -03:00
|
|
|
account_shadowed(vcpu->kvm, gfn);
|
|
|
|
}
|
2008-05-29 14:56:28 +03:00
|
|
|
if (shadow_trap_nonpresent_pte != shadow_notrap_nonpresent_pte)
|
|
|
|
vcpu->arch.mmu.prefetch_page(vcpu, sp);
|
|
|
|
else
|
|
|
|
nonpaging_prefetch_page(vcpu, sp);
|
2009-07-06 15:58:14 +03:00
|
|
|
trace_kvm_mmu_get_page(sp, true);
|
2007-11-21 15:28:32 +02:00
|
|
|
return sp;
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
}
|
|
|
|
|
2008-12-25 14:39:47 +02:00
|
|
|
static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
|
|
|
|
struct kvm_vcpu *vcpu, u64 addr)
|
|
|
|
{
|
|
|
|
iterator->addr = addr;
|
|
|
|
iterator->shadow_addr = vcpu->arch.mmu.root_hpa;
|
|
|
|
iterator->level = vcpu->arch.mmu.shadow_root_level;
|
|
|
|
if (iterator->level == PT32E_ROOT_LEVEL) {
|
|
|
|
iterator->shadow_addr
|
|
|
|
= vcpu->arch.mmu.pae_root[(addr >> 30) & 3];
|
|
|
|
iterator->shadow_addr &= PT64_BASE_ADDR_MASK;
|
|
|
|
--iterator->level;
|
|
|
|
if (!iterator->shadow_addr)
|
|
|
|
iterator->level = 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
|
|
|
|
{
|
|
|
|
if (iterator->level < PT_PAGE_TABLE_LEVEL)
|
|
|
|
return false;
|
2009-06-11 12:07:41 -03:00
|
|
|
|
|
|
|
if (iterator->level == PT_PAGE_TABLE_LEVEL)
|
|
|
|
if (is_large_pte(*iterator->sptep))
|
|
|
|
return false;
|
|
|
|
|
2008-12-25 14:39:47 +02:00
|
|
|
iterator->index = SHADOW_PT_INDEX(iterator->addr, iterator->level);
|
|
|
|
iterator->sptep = ((u64 *)__va(iterator->shadow_addr)) + iterator->index;
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
|
|
|
|
{
|
|
|
|
iterator->shadow_addr = *iterator->sptep & PT64_BASE_ADDR_MASK;
|
|
|
|
--iterator->level;
|
|
|
|
}
|
|
|
|
|
2007-07-17 13:04:56 +03:00
|
|
|
static void kvm_mmu_page_unlink_children(struct kvm *kvm,
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp)
|
2007-01-05 16:36:45 -08:00
|
|
|
{
|
2007-01-05 16:36:46 -08:00
|
|
|
unsigned i;
|
|
|
|
u64 *pt;
|
|
|
|
u64 ent;
|
|
|
|
|
2007-11-21 15:28:32 +02:00
|
|
|
pt = sp->spt;
|
2007-01-05 16:36:46 -08:00
|
|
|
|
|
|
|
for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
|
|
|
|
ent = pt[i];
|
|
|
|
|
2008-02-23 11:44:30 -03:00
|
|
|
if (is_shadow_present_pte(ent)) {
|
2009-06-10 12:27:03 -03:00
|
|
|
if (!is_last_spte(ent, sp->role.level)) {
|
2008-02-23 11:44:30 -03:00
|
|
|
ent &= PT64_BASE_ADDR_MASK;
|
|
|
|
mmu_page_remove_parent_pte(page_header(ent),
|
|
|
|
&pt[i]);
|
|
|
|
} else {
|
2009-06-10 12:27:03 -03:00
|
|
|
if (is_large_pte(ent))
|
|
|
|
--kvm->stat.lpages;
|
2008-02-23 11:44:30 -03:00
|
|
|
rmap_remove(kvm, &pt[i]);
|
|
|
|
}
|
|
|
|
}
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-16 18:58:32 +02:00
|
|
|
pt[i] = shadow_trap_nonpresent_pte;
|
2007-01-05 16:36:46 -08:00
|
|
|
}
|
2007-01-05 16:36:45 -08:00
|
|
|
}
|
|
|
|
|
2007-11-21 15:28:32 +02:00
|
|
|
static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
{
|
2007-11-21 15:28:32 +02:00
|
|
|
mmu_page_remove_parent_pte(sp, parent_pte);
|
2007-01-05 16:36:45 -08:00
|
|
|
}
|
|
|
|
|
2007-09-23 14:10:49 +02:00
|
|
|
static void kvm_mmu_reset_last_pte_updated(struct kvm *kvm)
|
|
|
|
{
|
|
|
|
int i;
|
2009-06-09 15:56:29 +03:00
|
|
|
struct kvm_vcpu *vcpu;
|
2007-09-23 14:10:49 +02:00
|
|
|
|
2009-06-09 15:56:29 +03:00
|
|
|
kvm_for_each_vcpu(i, vcpu, kvm)
|
|
|
|
vcpu->arch.last_pte_updated = NULL;
|
2007-09-23 14:10:49 +02:00
|
|
|
}
|
|
|
|
|
2008-07-11 17:59:46 +03:00
|
|
|
static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
|
2007-01-05 16:36:45 -08:00
|
|
|
{
|
|
|
|
u64 *parent_pte;
|
|
|
|
|
2007-11-21 15:28:32 +02:00
|
|
|
while (sp->multimapped || sp->parent_pte) {
|
|
|
|
if (!sp->multimapped)
|
|
|
|
parent_pte = sp->parent_pte;
|
2007-01-05 16:36:45 -08:00
|
|
|
else {
|
|
|
|
struct kvm_pte_chain *chain;
|
|
|
|
|
2007-11-21 15:28:32 +02:00
|
|
|
chain = container_of(sp->parent_ptes.first,
|
2007-01-05 16:36:45 -08:00
|
|
|
struct kvm_pte_chain, link);
|
|
|
|
parent_pte = chain->parent_ptes[0];
|
|
|
|
}
|
2007-01-05 16:36:46 -08:00
|
|
|
BUG_ON(!parent_pte);
|
2007-11-21 15:28:32 +02:00
|
|
|
kvm_mmu_put_page(sp, parent_pte);
|
2009-06-10 14:24:23 +03:00
|
|
|
__set_spte(parent_pte, shadow_trap_nonpresent_pte);
|
2007-01-05 16:36:45 -08:00
|
|
|
}
|
2008-07-11 17:59:46 +03:00
|
|
|
}
|
|
|
|
|
2008-12-01 22:32:02 -02:00
|
|
|
static int mmu_zap_unsync_children(struct kvm *kvm,
|
|
|
|
struct kvm_mmu_page *parent)
|
2008-09-23 13:18:39 -03:00
|
|
|
{
|
2008-12-01 22:32:02 -02:00
|
|
|
int i, zapped = 0;
|
|
|
|
struct mmu_page_path parents;
|
|
|
|
struct kvm_mmu_pages pages;
|
2008-09-23 13:18:39 -03:00
|
|
|
|
2008-12-01 22:32:02 -02:00
|
|
|
if (parent->role.level == PT_PAGE_TABLE_LEVEL)
|
2008-09-23 13:18:39 -03:00
|
|
|
return 0;
|
2008-12-01 22:32:02 -02:00
|
|
|
|
|
|
|
kvm_mmu_pages_init(parent, &parents, &pages);
|
|
|
|
while (mmu_unsync_walk(parent, &pages)) {
|
|
|
|
struct kvm_mmu_page *sp;
|
|
|
|
|
|
|
|
for_each_sp(pages, sp, parents, i) {
|
|
|
|
kvm_mmu_zap_page(kvm, sp);
|
|
|
|
mmu_pages_clear_parents(&parents);
|
|
|
|
}
|
|
|
|
zapped += pages.nr;
|
|
|
|
kvm_mmu_pages_init(parent, &parents, &pages);
|
|
|
|
}
|
|
|
|
|
|
|
|
return zapped;
|
2008-09-23 13:18:39 -03:00
|
|
|
}
|
|
|
|
|
2008-09-23 13:18:37 -03:00
|
|
|
static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
|
2008-07-11 17:59:46 +03:00
|
|
|
{
|
2008-09-23 13:18:39 -03:00
|
|
|
int ret;
|
2009-07-06 15:58:14 +03:00
|
|
|
|
|
|
|
trace_kvm_mmu_zap_page(sp);
|
2008-07-11 17:59:46 +03:00
|
|
|
++kvm->stat.mmu_shadow_zapped;
|
2008-09-23 13:18:39 -03:00
|
|
|
ret = mmu_zap_unsync_children(kvm, sp);
|
2007-11-21 15:28:32 +02:00
|
|
|
kvm_mmu_page_unlink_children(kvm, sp);
|
2008-07-11 17:59:46 +03:00
|
|
|
kvm_mmu_unlink_parents(kvm, sp);
|
2008-07-11 18:07:26 +03:00
|
|
|
kvm_flush_remote_tlbs(kvm);
|
2009-01-11 13:02:10 +02:00
|
|
|
if (!sp->role.invalid && !sp->role.direct)
|
2008-07-11 18:07:26 +03:00
|
|
|
unaccount_shadowed(kvm, sp->gfn);
|
2008-09-23 13:18:39 -03:00
|
|
|
if (sp->unsync)
|
|
|
|
kvm_unlink_unsync_page(kvm, sp);
|
2007-11-21 15:28:32 +02:00
|
|
|
if (!sp->root_count) {
|
|
|
|
hlist_del(&sp->hash_link);
|
|
|
|
kvm_mmu_free_page(kvm, sp);
|
2008-02-20 14:47:24 -05:00
|
|
|
} else {
|
|
|
|
sp->role.invalid = 1;
|
2008-07-11 18:07:26 +03:00
|
|
|
list_move(&sp->link, &kvm->arch.active_mmu_pages);
|
2008-02-20 14:47:24 -05:00
|
|
|
kvm_reload_remote_mmus(kvm);
|
|
|
|
}
|
2007-09-23 14:10:49 +02:00
|
|
|
kvm_mmu_reset_last_pte_updated(kvm);
|
2008-09-23 13:18:39 -03:00
|
|
|
return ret;
|
2007-01-05 16:36:45 -08:00
|
|
|
}
|
|
|
|
|
2007-10-02 18:52:55 +02:00
|
|
|
/*
|
|
|
|
* Changing the number of mmu pages allocated to the vm
|
|
|
|
* Note: if kvm_nr_mmu_pages is too small, you will get dead lock
|
|
|
|
*/
|
|
|
|
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages)
|
|
|
|
{
|
2009-07-22 13:05:49 -03:00
|
|
|
int used_pages;
|
|
|
|
|
|
|
|
used_pages = kvm->arch.n_alloc_mmu_pages - kvm->arch.n_free_mmu_pages;
|
|
|
|
used_pages = max(0, used_pages);
|
|
|
|
|
2007-10-02 18:52:55 +02:00
|
|
|
/*
|
|
|
|
* If we set the number of mmu pages to be smaller be than the
|
|
|
|
* number of actived pages , we must to free some mmu pages before we
|
|
|
|
* change the value
|
|
|
|
*/
|
|
|
|
|
2009-07-22 13:05:49 -03:00
|
|
|
if (used_pages > kvm_nr_mmu_pages) {
|
|
|
|
while (used_pages > kvm_nr_mmu_pages) {
|
2007-10-02 18:52:55 +02:00
|
|
|
struct kvm_mmu_page *page;
|
|
|
|
|
2007-12-14 10:01:48 +08:00
|
|
|
page = container_of(kvm->arch.active_mmu_pages.prev,
|
2007-10-02 18:52:55 +02:00
|
|
|
struct kvm_mmu_page, link);
|
|
|
|
kvm_mmu_zap_page(kvm, page);
|
2009-07-22 13:05:49 -03:00
|
|
|
used_pages--;
|
2007-10-02 18:52:55 +02:00
|
|
|
}
|
2007-12-14 10:01:48 +08:00
|
|
|
kvm->arch.n_free_mmu_pages = 0;
|
2007-10-02 18:52:55 +02:00
|
|
|
}
|
|
|
|
else
|
2007-12-14 10:01:48 +08:00
|
|
|
kvm->arch.n_free_mmu_pages += kvm_nr_mmu_pages
|
|
|
|
- kvm->arch.n_alloc_mmu_pages;
|
2007-10-02 18:52:55 +02:00
|
|
|
|
2007-12-14 10:01:48 +08:00
|
|
|
kvm->arch.n_alloc_mmu_pages = kvm_nr_mmu_pages;
|
2007-10-02 18:52:55 +02:00
|
|
|
}
|
|
|
|
|
2007-10-10 19:25:50 -05:00
|
|
|
static int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
|
2007-01-05 16:36:45 -08:00
|
|
|
{
|
|
|
|
unsigned index;
|
|
|
|
struct hlist_head *bucket;
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp;
|
2007-01-05 16:36:45 -08:00
|
|
|
struct hlist_node *node, *n;
|
|
|
|
int r;
|
|
|
|
|
2008-03-03 12:59:56 -08:00
|
|
|
pgprintk("%s: looking for gfn %lx\n", __func__, gfn);
|
2007-01-05 16:36:45 -08:00
|
|
|
r = 0;
|
2008-01-07 13:20:25 +02:00
|
|
|
index = kvm_page_table_hashfn(gfn);
|
2007-12-14 10:01:48 +08:00
|
|
|
bucket = &kvm->arch.mmu_page_hash[index];
|
2007-11-21 15:28:32 +02:00
|
|
|
hlist_for_each_entry_safe(sp, node, n, bucket, hash_link)
|
2009-01-11 13:02:10 +02:00
|
|
|
if (sp->gfn == gfn && !sp->role.direct) {
|
2008-03-03 12:59:56 -08:00
|
|
|
pgprintk("%s: gfn %lx role %x\n", __func__, gfn,
|
2007-11-21 15:28:32 +02:00
|
|
|
sp->role.word);
|
2007-01-05 16:36:45 -08:00
|
|
|
r = 1;
|
2008-09-23 13:18:37 -03:00
|
|
|
if (kvm_mmu_zap_page(kvm, sp))
|
|
|
|
n = bucket->first;
|
2007-01-05 16:36:45 -08:00
|
|
|
}
|
|
|
|
return r;
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
}
|
|
|
|
|
2007-10-10 19:25:50 -05:00
|
|
|
static void mmu_unshadow(struct kvm *kvm, gfn_t gfn)
|
2007-05-31 15:08:29 +03:00
|
|
|
{
|
2009-01-06 13:00:27 +02:00
|
|
|
unsigned index;
|
|
|
|
struct hlist_head *bucket;
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp;
|
2009-01-06 13:00:27 +02:00
|
|
|
struct hlist_node *node, *nn;
|
2007-05-31 15:08:29 +03:00
|
|
|
|
2009-01-06 13:00:27 +02:00
|
|
|
index = kvm_page_table_hashfn(gfn);
|
|
|
|
bucket = &kvm->arch.mmu_page_hash[index];
|
|
|
|
hlist_for_each_entry_safe(sp, node, nn, bucket, hash_link) {
|
2009-01-11 13:02:10 +02:00
|
|
|
if (sp->gfn == gfn && !sp->role.direct
|
2009-01-06 13:00:27 +02:00
|
|
|
&& !sp->role.invalid) {
|
|
|
|
pgprintk("%s: zap %lx %x\n",
|
|
|
|
__func__, gfn, sp->role.word);
|
|
|
|
kvm_mmu_zap_page(kvm, sp);
|
|
|
|
}
|
2007-05-31 15:08:29 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-11-21 14:20:22 +02:00
|
|
|
static void page_header_update_slot(struct kvm *kvm, void *pte, gfn_t gfn)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
{
|
2007-11-21 14:20:22 +02:00
|
|
|
int slot = memslot_id(kvm, gfn_to_memslot(kvm, gfn));
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp = page_header(__pa(pte));
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2008-10-16 17:30:57 +08:00
|
|
|
__set_bit(slot, sp->slot_bitmap);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
|
|
|
|
2008-09-23 13:18:38 -03:00
|
|
|
static void mmu_convert_notrap(struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
u64 *pt = sp->spt;
|
|
|
|
|
|
|
|
if (shadow_trap_nonpresent_pte == shadow_notrap_nonpresent_pte)
|
|
|
|
return;
|
|
|
|
|
|
|
|
for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
|
|
|
|
if (pt[i] == shadow_notrap_nonpresent_pte)
|
2009-06-10 14:24:23 +03:00
|
|
|
__set_spte(&pt[i], shadow_trap_nonpresent_pte);
|
2008-09-23 13:18:38 -03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-03-20 12:46:50 +02:00
|
|
|
struct page *gva_to_page(struct kvm_vcpu *vcpu, gva_t gva)
|
|
|
|
{
|
2008-02-10 18:04:15 +02:00
|
|
|
struct page *page;
|
|
|
|
|
2007-12-13 23:50:52 +08:00
|
|
|
gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, gva);
|
2007-03-20 12:46:50 +02:00
|
|
|
|
|
|
|
if (gpa == UNMAPPED_GVA)
|
|
|
|
return NULL;
|
2008-02-10 18:04:15 +02:00
|
|
|
|
|
|
|
page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
|
|
|
|
|
|
|
|
return page;
|
2007-03-20 12:46:50 +02:00
|
|
|
}
|
|
|
|
|
2008-10-09 16:01:56 +08:00
|
|
|
/*
|
|
|
|
* The function is based on mtrr_type_lookup() in
|
|
|
|
* arch/x86/kernel/cpu/mtrr/generic.c
|
|
|
|
*/
|
|
|
|
static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
|
|
|
|
u64 start, u64 end)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
u64 base, mask;
|
|
|
|
u8 prev_match, curr_match;
|
|
|
|
int num_var_ranges = KVM_NR_VAR_MTRR;
|
|
|
|
|
|
|
|
if (!mtrr_state->enabled)
|
|
|
|
return 0xFF;
|
|
|
|
|
|
|
|
/* Make end inclusive end, instead of exclusive */
|
|
|
|
end--;
|
|
|
|
|
|
|
|
/* Look in fixed ranges. Just return the type as per start */
|
|
|
|
if (mtrr_state->have_fixed && (start < 0x100000)) {
|
|
|
|
int idx;
|
|
|
|
|
|
|
|
if (start < 0x80000) {
|
|
|
|
idx = 0;
|
|
|
|
idx += (start >> 16);
|
|
|
|
return mtrr_state->fixed_ranges[idx];
|
|
|
|
} else if (start < 0xC0000) {
|
|
|
|
idx = 1 * 8;
|
|
|
|
idx += ((start - 0x80000) >> 14);
|
|
|
|
return mtrr_state->fixed_ranges[idx];
|
|
|
|
} else if (start < 0x1000000) {
|
|
|
|
idx = 3 * 8;
|
|
|
|
idx += ((start - 0xC0000) >> 12);
|
|
|
|
return mtrr_state->fixed_ranges[idx];
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Look in variable ranges
|
|
|
|
* Look of multiple ranges matching this address and pick type
|
|
|
|
* as per MTRR precedence
|
|
|
|
*/
|
|
|
|
if (!(mtrr_state->enabled & 2))
|
|
|
|
return mtrr_state->def_type;
|
|
|
|
|
|
|
|
prev_match = 0xFF;
|
|
|
|
for (i = 0; i < num_var_ranges; ++i) {
|
|
|
|
unsigned short start_state, end_state;
|
|
|
|
|
|
|
|
if (!(mtrr_state->var_ranges[i].mask_lo & (1 << 11)))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
base = (((u64)mtrr_state->var_ranges[i].base_hi) << 32) +
|
|
|
|
(mtrr_state->var_ranges[i].base_lo & PAGE_MASK);
|
|
|
|
mask = (((u64)mtrr_state->var_ranges[i].mask_hi) << 32) +
|
|
|
|
(mtrr_state->var_ranges[i].mask_lo & PAGE_MASK);
|
|
|
|
|
|
|
|
start_state = ((start & mask) == (base & mask));
|
|
|
|
end_state = ((end & mask) == (base & mask));
|
|
|
|
if (start_state != end_state)
|
|
|
|
return 0xFE;
|
|
|
|
|
|
|
|
if ((start & mask) != (base & mask))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
curr_match = mtrr_state->var_ranges[i].base_lo & 0xff;
|
|
|
|
if (prev_match == 0xFF) {
|
|
|
|
prev_match = curr_match;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (prev_match == MTRR_TYPE_UNCACHABLE ||
|
|
|
|
curr_match == MTRR_TYPE_UNCACHABLE)
|
|
|
|
return MTRR_TYPE_UNCACHABLE;
|
|
|
|
|
|
|
|
if ((prev_match == MTRR_TYPE_WRBACK &&
|
|
|
|
curr_match == MTRR_TYPE_WRTHROUGH) ||
|
|
|
|
(prev_match == MTRR_TYPE_WRTHROUGH &&
|
|
|
|
curr_match == MTRR_TYPE_WRBACK)) {
|
|
|
|
prev_match = MTRR_TYPE_WRTHROUGH;
|
|
|
|
curr_match = MTRR_TYPE_WRTHROUGH;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (prev_match != curr_match)
|
|
|
|
return MTRR_TYPE_UNCACHABLE;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (prev_match != 0xFF)
|
|
|
|
return prev_match;
|
|
|
|
|
|
|
|
return mtrr_state->def_type;
|
|
|
|
}
|
|
|
|
|
2009-04-27 20:35:42 +08:00
|
|
|
u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
|
2008-10-09 16:01:56 +08:00
|
|
|
{
|
|
|
|
u8 mtrr;
|
|
|
|
|
|
|
|
mtrr = get_mtrr_type(&vcpu->arch.mtrr_state, gfn << PAGE_SHIFT,
|
|
|
|
(gfn << PAGE_SHIFT) + PAGE_SIZE);
|
|
|
|
if (mtrr == 0xfe || mtrr == 0xff)
|
|
|
|
mtrr = MTRR_TYPE_WRBACK;
|
|
|
|
return mtrr;
|
|
|
|
}
|
2009-04-27 20:35:42 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_get_guest_memory_type);
|
2008-10-09 16:01:56 +08:00
|
|
|
|
2008-09-23 13:18:39 -03:00
|
|
|
static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
unsigned index;
|
|
|
|
struct hlist_head *bucket;
|
|
|
|
struct kvm_mmu_page *s;
|
|
|
|
struct hlist_node *node, *n;
|
|
|
|
|
2009-07-06 15:58:14 +03:00
|
|
|
trace_kvm_mmu_unsync_page(sp);
|
2008-09-23 13:18:39 -03:00
|
|
|
index = kvm_page_table_hashfn(sp->gfn);
|
|
|
|
bucket = &vcpu->kvm->arch.mmu_page_hash[index];
|
|
|
|
/* don't unsync if pagetable is shadowed with multiple roles */
|
|
|
|
hlist_for_each_entry_safe(s, node, n, bucket, hash_link) {
|
2009-01-11 13:02:10 +02:00
|
|
|
if (s->gfn != sp->gfn || s->role.direct)
|
2008-09-23 13:18:39 -03:00
|
|
|
continue;
|
|
|
|
if (s->role.word != sp->role.word)
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
++vcpu->kvm->stat.mmu_unsync;
|
|
|
|
sp->unsync = 1;
|
2008-12-01 22:32:04 -02:00
|
|
|
|
2009-04-05 14:54:47 -03:00
|
|
|
kvm_mmu_mark_parents_unsync(vcpu, sp);
|
2008-12-01 22:32:04 -02:00
|
|
|
|
2008-09-23 13:18:39 -03:00
|
|
|
mmu_convert_notrap(sp);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
|
|
|
|
bool can_unsync)
|
|
|
|
{
|
|
|
|
struct kvm_mmu_page *shadow;
|
|
|
|
|
|
|
|
shadow = kvm_mmu_lookup_page(vcpu->kvm, gfn);
|
|
|
|
if (shadow) {
|
|
|
|
if (shadow->role.level != PT_PAGE_TABLE_LEVEL)
|
|
|
|
return 1;
|
|
|
|
if (shadow->unsync)
|
|
|
|
return 0;
|
2008-09-23 13:18:41 -03:00
|
|
|
if (can_unsync && oos_shadow)
|
2008-09-23 13:18:39 -03:00
|
|
|
return kvm_unsync_page(vcpu, shadow);
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2009-06-10 14:24:23 +03:00
|
|
|
static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
|
2008-09-23 13:18:30 -03:00
|
|
|
unsigned pte_access, int user_fault,
|
2009-07-27 16:30:44 +02:00
|
|
|
int write_fault, int dirty, int level,
|
2009-04-05 14:54:47 -03:00
|
|
|
gfn_t gfn, pfn_t pfn, bool speculative,
|
2009-09-23 21:47:17 +03:00
|
|
|
bool can_unsync, bool reset_host_protection)
|
2007-12-09 17:40:31 +02:00
|
|
|
{
|
|
|
|
u64 spte;
|
2008-09-23 13:18:30 -03:00
|
|
|
int ret = 0;
|
2008-10-09 16:01:57 +08:00
|
|
|
|
2007-12-09 17:40:31 +02:00
|
|
|
/*
|
|
|
|
* We don't set the accessed bit, since we sometimes want to see
|
|
|
|
* whether the guest actually used the pte (in order to detect
|
|
|
|
* demand paging).
|
|
|
|
*/
|
2008-04-25 21:13:50 +08:00
|
|
|
spte = shadow_base_present_pte | shadow_dirty_mask;
|
2008-03-18 11:05:52 +02:00
|
|
|
if (!speculative)
|
2008-08-27 20:01:04 +03:00
|
|
|
spte |= shadow_accessed_mask;
|
2007-12-09 17:40:31 +02:00
|
|
|
if (!dirty)
|
|
|
|
pte_access &= ~ACC_WRITE_MASK;
|
2008-04-25 21:13:50 +08:00
|
|
|
if (pte_access & ACC_EXEC_MASK)
|
|
|
|
spte |= shadow_x_mask;
|
|
|
|
else
|
|
|
|
spte |= shadow_nx_mask;
|
2007-12-09 17:40:31 +02:00
|
|
|
if (pte_access & ACC_USER_MASK)
|
2008-04-25 21:13:50 +08:00
|
|
|
spte |= shadow_user_mask;
|
2009-07-27 16:30:44 +02:00
|
|
|
if (level > PT_PAGE_TABLE_LEVEL)
|
2008-02-23 11:44:30 -03:00
|
|
|
spte |= PT_PAGE_SIZE_MASK;
|
2009-04-27 20:35:42 +08:00
|
|
|
if (tdp_enabled)
|
|
|
|
spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
|
|
|
|
kvm_is_mmio_pfn(pfn));
|
2007-12-09 17:40:31 +02:00
|
|
|
|
2009-09-23 21:47:17 +03:00
|
|
|
if (reset_host_protection)
|
|
|
|
spte |= SPTE_HOST_WRITEABLE;
|
|
|
|
|
2008-04-02 14:46:56 -05:00
|
|
|
spte |= (u64)pfn << PAGE_SHIFT;
|
2007-12-09 17:40:31 +02:00
|
|
|
|
|
|
|
if ((pte_access & ACC_WRITE_MASK)
|
|
|
|
|| (write_fault && !is_write_protection(vcpu) && !user_fault)) {
|
|
|
|
|
2009-07-27 16:30:44 +02:00
|
|
|
if (level > PT_PAGE_TABLE_LEVEL &&
|
|
|
|
has_wrprotected_page(vcpu->kvm, gfn, level)) {
|
2008-09-23 13:18:32 -03:00
|
|
|
ret = 1;
|
|
|
|
spte = shadow_trap_nonpresent_pte;
|
|
|
|
goto set_pte;
|
|
|
|
}
|
|
|
|
|
2007-12-09 17:40:31 +02:00
|
|
|
spte |= PT_WRITABLE_MASK;
|
|
|
|
|
2008-11-25 15:58:07 +01:00
|
|
|
/*
|
|
|
|
* Optimization: for pte sync, if spte was writable the hash
|
|
|
|
* lookup is unnecessary (and expensive). Write protection
|
|
|
|
* is responsibility of mmu_get_page / kvm_sync_page.
|
|
|
|
* Same reasoning can be applied to dirty page accounting.
|
|
|
|
*/
|
2009-06-10 14:24:23 +03:00
|
|
|
if (!can_unsync && is_writeble_pte(*sptep))
|
2008-11-25 15:58:07 +01:00
|
|
|
goto set_pte;
|
|
|
|
|
2008-09-23 13:18:39 -03:00
|
|
|
if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
|
2007-12-09 17:40:31 +02:00
|
|
|
pgprintk("%s: found shadow page for %lx, marking ro\n",
|
2008-03-03 12:59:56 -08:00
|
|
|
__func__, gfn);
|
2008-09-23 13:18:30 -03:00
|
|
|
ret = 1;
|
2007-12-09 17:40:31 +02:00
|
|
|
pte_access &= ~ACC_WRITE_MASK;
|
2008-09-23 13:18:31 -03:00
|
|
|
if (is_writeble_pte(spte))
|
2007-12-09 17:40:31 +02:00
|
|
|
spte &= ~PT_WRITABLE_MASK;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (pte_access & ACC_WRITE_MASK)
|
|
|
|
mark_page_dirty(vcpu->kvm, gfn);
|
|
|
|
|
2008-09-23 13:18:32 -03:00
|
|
|
set_pte:
|
2009-06-10 14:24:23 +03:00
|
|
|
__set_spte(sptep, spte);
|
2008-09-23 13:18:30 -03:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2009-06-10 14:24:23 +03:00
|
|
|
static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
|
2008-09-23 13:18:30 -03:00
|
|
|
unsigned pt_access, unsigned pte_access,
|
|
|
|
int user_fault, int write_fault, int dirty,
|
2009-07-27 16:30:44 +02:00
|
|
|
int *ptwrite, int level, gfn_t gfn,
|
2009-09-23 21:47:17 +03:00
|
|
|
pfn_t pfn, bool speculative,
|
|
|
|
bool reset_host_protection)
|
2008-09-23 13:18:30 -03:00
|
|
|
{
|
|
|
|
int was_rmapped = 0;
|
2009-06-10 14:24:23 +03:00
|
|
|
int was_writeble = is_writeble_pte(*sptep);
|
2009-08-05 15:43:58 -03:00
|
|
|
int rmap_count;
|
2008-09-23 13:18:30 -03:00
|
|
|
|
|
|
|
pgprintk("%s: spte %llx access %x write_fault %d"
|
|
|
|
" user_fault %d gfn %lx\n",
|
2009-06-10 14:24:23 +03:00
|
|
|
__func__, *sptep, pt_access,
|
2008-09-23 13:18:30 -03:00
|
|
|
write_fault, user_fault, gfn);
|
|
|
|
|
2009-06-10 14:24:23 +03:00
|
|
|
if (is_rmap_spte(*sptep)) {
|
2008-09-23 13:18:30 -03:00
|
|
|
/*
|
|
|
|
* If we overwrite a PTE page pointer with a 2MB PMD, unlink
|
|
|
|
* the parent of the now unreachable PTE.
|
|
|
|
*/
|
2009-07-27 16:30:44 +02:00
|
|
|
if (level > PT_PAGE_TABLE_LEVEL &&
|
|
|
|
!is_large_pte(*sptep)) {
|
2008-09-23 13:18:30 -03:00
|
|
|
struct kvm_mmu_page *child;
|
2009-06-10 14:24:23 +03:00
|
|
|
u64 pte = *sptep;
|
2008-09-23 13:18:30 -03:00
|
|
|
|
|
|
|
child = page_header(pte & PT64_BASE_ADDR_MASK);
|
2009-06-10 14:24:23 +03:00
|
|
|
mmu_page_remove_parent_pte(child, sptep);
|
|
|
|
} else if (pfn != spte_to_pfn(*sptep)) {
|
2008-09-23 13:18:30 -03:00
|
|
|
pgprintk("hfn old %lx new %lx\n",
|
2009-06-10 14:24:23 +03:00
|
|
|
spte_to_pfn(*sptep), pfn);
|
|
|
|
rmap_remove(vcpu->kvm, sptep);
|
2009-02-18 14:08:59 +01:00
|
|
|
} else
|
|
|
|
was_rmapped = 1;
|
2008-09-23 13:18:30 -03:00
|
|
|
}
|
2009-07-27 16:30:44 +02:00
|
|
|
|
2009-06-10 14:24:23 +03:00
|
|
|
if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
|
2009-09-23 21:47:17 +03:00
|
|
|
dirty, level, gfn, pfn, speculative, true,
|
|
|
|
reset_host_protection)) {
|
2008-09-23 13:18:30 -03:00
|
|
|
if (write_fault)
|
|
|
|
*ptwrite = 1;
|
2008-09-23 13:18:31 -03:00
|
|
|
kvm_x86_ops->tlb_flush(vcpu);
|
|
|
|
}
|
2008-09-23 13:18:30 -03:00
|
|
|
|
2009-06-10 14:24:23 +03:00
|
|
|
pgprintk("%s: setting spte %llx\n", __func__, *sptep);
|
2008-09-23 13:18:30 -03:00
|
|
|
pgprintk("instantiating %s PTE (%s) at %ld (%llx) addr %p\n",
|
2009-06-10 14:24:23 +03:00
|
|
|
is_large_pte(*sptep)? "2MB" : "4kB",
|
2009-07-09 16:36:01 +02:00
|
|
|
*sptep & PT_PRESENT_MASK ?"RW":"R", gfn,
|
|
|
|
*sptep, sptep);
|
2009-06-10 14:24:23 +03:00
|
|
|
if (!was_rmapped && is_large_pte(*sptep))
|
2008-02-23 11:44:30 -03:00
|
|
|
++vcpu->kvm->stat.lpages;
|
|
|
|
|
2009-06-10 14:24:23 +03:00
|
|
|
page_header_update_slot(vcpu->kvm, sptep, gfn);
|
2007-12-09 17:40:31 +02:00
|
|
|
if (!was_rmapped) {
|
2009-07-27 16:30:42 +02:00
|
|
|
rmap_count = rmap_add(vcpu, sptep, gfn);
|
2009-09-23 21:47:16 +03:00
|
|
|
kvm_release_pfn_clean(pfn);
|
2009-08-05 15:43:58 -03:00
|
|
|
if (rmap_count > RMAP_RECYCLE_THRESHOLD)
|
2009-07-27 16:30:44 +02:00
|
|
|
rmap_recycle(vcpu, sptep, gfn);
|
2008-01-12 23:49:09 +02:00
|
|
|
} else {
|
|
|
|
if (was_writeble)
|
2008-04-02 14:46:56 -05:00
|
|
|
kvm_release_pfn_dirty(pfn);
|
2008-01-12 23:49:09 +02:00
|
|
|
else
|
2008-04-02 14:46:56 -05:00
|
|
|
kvm_release_pfn_clean(pfn);
|
2007-12-09 17:40:31 +02:00
|
|
|
}
|
2008-05-15 13:51:35 +03:00
|
|
|
if (speculative) {
|
2009-06-10 14:24:23 +03:00
|
|
|
vcpu->arch.last_pte_updated = sptep;
|
2008-05-15 13:51:35 +03:00
|
|
|
vcpu->arch.last_pte_gfn = gfn;
|
|
|
|
}
|
2007-12-09 17:40:31 +02:00
|
|
|
}
|
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2008-12-25 14:54:25 +02:00
|
|
|
static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
|
2009-07-27 16:30:44 +02:00
|
|
|
int level, gfn_t gfn, pfn_t pfn)
|
2008-08-22 19:28:04 +03:00
|
|
|
{
|
2008-12-25 14:54:25 +02:00
|
|
|
struct kvm_shadow_walk_iterator iterator;
|
2008-08-22 19:28:04 +03:00
|
|
|
struct kvm_mmu_page *sp;
|
2008-12-25 14:54:25 +02:00
|
|
|
int pt_write = 0;
|
2008-08-22 19:28:04 +03:00
|
|
|
gfn_t pseudo_gfn;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2008-12-25 14:54:25 +02:00
|
|
|
for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
|
2009-07-27 16:30:44 +02:00
|
|
|
if (iterator.level == level) {
|
2008-12-25 14:54:25 +02:00
|
|
|
mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL,
|
|
|
|
0, write, 1, &pt_write,
|
2009-09-23 21:47:17 +03:00
|
|
|
level, gfn, pfn, false, true);
|
2008-12-25 14:54:25 +02:00
|
|
|
++vcpu->stat.pf_fixed;
|
|
|
|
break;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
|
|
|
|
2008-12-25 14:54:25 +02:00
|
|
|
if (*iterator.sptep == shadow_trap_nonpresent_pte) {
|
|
|
|
pseudo_gfn = (iterator.addr & PT64_DIR_BASE_ADDR_MASK) >> PAGE_SHIFT;
|
|
|
|
sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
|
|
|
|
iterator.level - 1,
|
|
|
|
1, ACC_ALL, iterator.sptep);
|
|
|
|
if (!sp) {
|
|
|
|
pgprintk("nonpaging_map: ENOMEM\n");
|
|
|
|
kvm_release_pfn_clean(pfn);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
2008-08-22 19:28:04 +03:00
|
|
|
|
2009-06-10 14:24:23 +03:00
|
|
|
__set_spte(iterator.sptep,
|
|
|
|
__pa(sp->spt)
|
|
|
|
| PT_PRESENT_MASK | PT_WRITABLE_MASK
|
|
|
|
| shadow_user_mask | shadow_x_mask);
|
2008-12-25 14:54:25 +02:00
|
|
|
}
|
|
|
|
}
|
|
|
|
return pt_write;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
|
|
|
|
2007-12-20 19:18:22 -05:00
|
|
|
static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
|
|
|
|
{
|
|
|
|
int r;
|
2009-07-27 16:30:44 +02:00
|
|
|
int level;
|
2008-04-02 14:46:56 -05:00
|
|
|
pfn_t pfn;
|
2008-07-25 16:24:52 +02:00
|
|
|
unsigned long mmu_seq;
|
2007-12-20 19:18:26 -05:00
|
|
|
|
2009-07-27 16:30:44 +02:00
|
|
|
level = mapping_level(vcpu, gfn);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This path builds a PAE pagetable - so we can map 2mb pages at
|
|
|
|
* maximum. Therefore check if the level is larger than that.
|
|
|
|
*/
|
|
|
|
if (level > PT_DIRECTORY_LEVEL)
|
|
|
|
level = PT_DIRECTORY_LEVEL;
|
|
|
|
|
|
|
|
gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
|
2008-02-23 11:44:30 -03:00
|
|
|
|
2008-07-25 16:24:52 +02:00
|
|
|
mmu_seq = vcpu->kvm->mmu_notifier_seq;
|
2008-09-16 20:54:47 -03:00
|
|
|
smp_rmb();
|
2008-04-02 14:46:56 -05:00
|
|
|
pfn = gfn_to_pfn(vcpu->kvm, gfn);
|
2007-12-20 19:18:26 -05:00
|
|
|
|
2008-01-24 11:44:11 +02:00
|
|
|
/* mmio */
|
2008-04-02 14:46:56 -05:00
|
|
|
if (is_error_pfn(pfn)) {
|
|
|
|
kvm_release_pfn_clean(pfn);
|
2008-01-24 11:44:11 +02:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2007-12-20 19:18:26 -05:00
|
|
|
spin_lock(&vcpu->kvm->mmu_lock);
|
2008-07-25 16:24:52 +02:00
|
|
|
if (mmu_notifier_retry(vcpu, mmu_seq))
|
|
|
|
goto out_unlock;
|
2007-12-31 15:27:49 +02:00
|
|
|
kvm_mmu_free_some_pages(vcpu);
|
2009-07-27 16:30:44 +02:00
|
|
|
r = __direct_map(vcpu, v, write, level, gfn, pfn);
|
2007-12-20 19:18:26 -05:00
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
|
|
|
|
|
|
|
|
2007-12-20 19:18:22 -05:00
|
|
|
return r;
|
2008-07-25 16:24:52 +02:00
|
|
|
|
|
|
|
out_unlock:
|
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
|
|
|
kvm_release_pfn_clean(pfn);
|
|
|
|
return 0;
|
2007-12-20 19:18:22 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-01-05 16:36:40 -08:00
|
|
|
static void mmu_free_roots(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
int i;
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp;
|
2007-01-05 16:36:40 -08:00
|
|
|
|
2007-12-13 23:50:52 +08:00
|
|
|
if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
|
2007-06-05 12:17:03 +03:00
|
|
|
return;
|
2007-12-20 19:18:26 -05:00
|
|
|
spin_lock(&vcpu->kvm->mmu_lock);
|
2007-12-13 23:50:52 +08:00
|
|
|
if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
|
|
|
|
hpa_t root = vcpu->arch.mmu.root_hpa;
|
2007-01-05 16:36:40 -08:00
|
|
|
|
2007-11-21 15:28:32 +02:00
|
|
|
sp = page_header(root);
|
|
|
|
--sp->root_count;
|
2008-02-20 14:47:24 -05:00
|
|
|
if (!sp->root_count && sp->role.invalid)
|
|
|
|
kvm_mmu_zap_page(vcpu->kvm, sp);
|
2007-12-13 23:50:52 +08:00
|
|
|
vcpu->arch.mmu.root_hpa = INVALID_PAGE;
|
2007-12-20 19:18:26 -05:00
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
2007-01-05 16:36:40 -08:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
for (i = 0; i < 4; ++i) {
|
2007-12-13 23:50:52 +08:00
|
|
|
hpa_t root = vcpu->arch.mmu.pae_root[i];
|
2007-01-05 16:36:40 -08:00
|
|
|
|
2007-04-12 17:35:58 +03:00
|
|
|
if (root) {
|
|
|
|
root &= PT64_BASE_ADDR_MASK;
|
2007-11-21 15:28:32 +02:00
|
|
|
sp = page_header(root);
|
|
|
|
--sp->root_count;
|
2008-02-20 14:47:24 -05:00
|
|
|
if (!sp->root_count && sp->role.invalid)
|
|
|
|
kvm_mmu_zap_page(vcpu->kvm, sp);
|
2007-04-12 17:35:58 +03:00
|
|
|
}
|
2007-12-13 23:50:52 +08:00
|
|
|
vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;
|
2007-01-05 16:36:40 -08:00
|
|
|
}
|
2007-12-20 19:18:26 -05:00
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
2007-12-13 23:50:52 +08:00
|
|
|
vcpu->arch.mmu.root_hpa = INVALID_PAGE;
|
2007-01-05 16:36:40 -08:00
|
|
|
}
|
|
|
|
|
2009-05-12 18:55:45 -03:00
|
|
|
static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
|
|
|
|
{
|
|
|
|
int ret = 0;
|
|
|
|
|
|
|
|
if (!kvm_is_visible_gfn(vcpu->kvm, root_gfn)) {
|
|
|
|
set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
|
|
|
|
ret = 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
|
2007-01-05 16:36:40 -08:00
|
|
|
{
|
|
|
|
int i;
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
gfn_t root_gfn;
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp;
|
2009-01-11 13:02:10 +02:00
|
|
|
int direct = 0;
|
2009-05-31 22:58:47 +03:00
|
|
|
u64 pdptr;
|
2007-01-05 16:36:51 -08:00
|
|
|
|
2007-12-13 23:50:52 +08:00
|
|
|
root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
|
2007-01-05 16:36:40 -08:00
|
|
|
|
2007-12-13 23:50:52 +08:00
|
|
|
if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
|
|
|
|
hpa_t root = vcpu->arch.mmu.root_hpa;
|
2007-01-05 16:36:40 -08:00
|
|
|
|
|
|
|
ASSERT(!VALID_PAGE(root));
|
2008-02-07 13:47:44 +01:00
|
|
|
if (tdp_enabled)
|
2009-01-11 13:02:10 +02:00
|
|
|
direct = 1;
|
2009-05-12 18:55:45 -03:00
|
|
|
if (mmu_check_root(vcpu, root_gfn))
|
|
|
|
return 1;
|
2007-11-21 15:28:32 +02:00
|
|
|
sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
|
2009-01-11 13:02:10 +02:00
|
|
|
PT64_ROOT_LEVEL, direct,
|
2008-02-07 13:47:44 +01:00
|
|
|
ACC_ALL, NULL);
|
2007-11-21 15:28:32 +02:00
|
|
|
root = __pa(sp->spt);
|
|
|
|
++sp->root_count;
|
2007-12-13 23:50:52 +08:00
|
|
|
vcpu->arch.mmu.root_hpa = root;
|
2009-05-12 18:55:45 -03:00
|
|
|
return 0;
|
2007-01-05 16:36:40 -08:00
|
|
|
}
|
2009-01-11 13:02:10 +02:00
|
|
|
direct = !is_paging(vcpu);
|
2008-02-07 13:47:44 +01:00
|
|
|
if (tdp_enabled)
|
2009-01-11 13:02:10 +02:00
|
|
|
direct = 1;
|
2007-01-05 16:36:40 -08:00
|
|
|
for (i = 0; i < 4; ++i) {
|
2007-12-13 23:50:52 +08:00
|
|
|
hpa_t root = vcpu->arch.mmu.pae_root[i];
|
2007-01-05 16:36:40 -08:00
|
|
|
|
|
|
|
ASSERT(!VALID_PAGE(root));
|
2007-12-13 23:50:52 +08:00
|
|
|
if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
|
2009-05-31 22:58:47 +03:00
|
|
|
pdptr = kvm_pdptr_read(vcpu, i);
|
2009-06-10 14:12:05 +03:00
|
|
|
if (!is_present_gpte(pdptr)) {
|
2007-12-13 23:50:52 +08:00
|
|
|
vcpu->arch.mmu.pae_root[i] = 0;
|
2007-04-12 17:35:58 +03:00
|
|
|
continue;
|
|
|
|
}
|
2009-05-31 22:58:47 +03:00
|
|
|
root_gfn = pdptr >> PAGE_SHIFT;
|
2007-12-13 23:50:52 +08:00
|
|
|
} else if (vcpu->arch.mmu.root_level == 0)
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
root_gfn = 0;
|
2009-05-12 18:55:45 -03:00
|
|
|
if (mmu_check_root(vcpu, root_gfn))
|
|
|
|
return 1;
|
2007-11-21 15:28:32 +02:00
|
|
|
sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
|
2009-01-11 13:02:10 +02:00
|
|
|
PT32_ROOT_LEVEL, direct,
|
2008-02-26 22:12:10 +02:00
|
|
|
ACC_ALL, NULL);
|
2007-11-21 15:28:32 +02:00
|
|
|
root = __pa(sp->spt);
|
|
|
|
++sp->root_count;
|
2007-12-13 23:50:52 +08:00
|
|
|
vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
|
2007-01-05 16:36:40 -08:00
|
|
|
}
|
2007-12-13 23:50:52 +08:00
|
|
|
vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
|
2009-05-12 18:55:45 -03:00
|
|
|
return 0;
|
2007-01-05 16:36:40 -08:00
|
|
|
}
|
|
|
|
|
2008-09-23 13:18:34 -03:00
|
|
|
static void mmu_sync_roots(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
struct kvm_mmu_page *sp;
|
|
|
|
|
|
|
|
if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
|
|
|
|
return;
|
|
|
|
if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
|
|
|
|
hpa_t root = vcpu->arch.mmu.root_hpa;
|
|
|
|
sp = page_header(root);
|
|
|
|
mmu_sync_children(vcpu, sp);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
for (i = 0; i < 4; ++i) {
|
|
|
|
hpa_t root = vcpu->arch.mmu.pae_root[i];
|
|
|
|
|
2009-05-12 18:55:45 -03:00
|
|
|
if (root && VALID_PAGE(root)) {
|
2008-09-23 13:18:34 -03:00
|
|
|
root &= PT64_BASE_ADDR_MASK;
|
|
|
|
sp = page_header(root);
|
|
|
|
mmu_sync_children(vcpu, sp);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
spin_lock(&vcpu->kvm->mmu_lock);
|
|
|
|
mmu_sync_roots(vcpu);
|
2008-12-01 22:32:04 -02:00
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
2008-09-23 13:18:34 -03:00
|
|
|
}
|
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr)
|
|
|
|
{
|
|
|
|
return vaddr;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
|
2007-11-21 14:54:16 +02:00
|
|
|
u32 error_code)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
{
|
2007-12-09 18:43:00 +02:00
|
|
|
gfn_t gfn;
|
2007-01-05 16:36:54 -08:00
|
|
|
int r;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2008-03-03 12:59:56 -08:00
|
|
|
pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);
|
2007-01-05 16:36:54 -08:00
|
|
|
r = mmu_topup_memory_caches(vcpu);
|
|
|
|
if (r)
|
|
|
|
return r;
|
2007-01-05 16:36:53 -08:00
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
ASSERT(vcpu);
|
2007-12-13 23:50:52 +08:00
|
|
|
ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2007-12-09 18:43:00 +02:00
|
|
|
gfn = gva >> PAGE_SHIFT;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2007-12-09 18:43:00 +02:00
|
|
|
return nonpaging_map(vcpu, gva & PAGE_MASK,
|
|
|
|
error_code & PFERR_WRITE_MASK, gfn);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
|
|
|
|
2008-02-07 13:47:44 +01:00
|
|
|
static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
|
|
|
|
u32 error_code)
|
|
|
|
{
|
2008-04-02 14:46:56 -05:00
|
|
|
pfn_t pfn;
|
2008-02-07 13:47:44 +01:00
|
|
|
int r;
|
2009-07-27 16:30:44 +02:00
|
|
|
int level;
|
2008-02-23 11:44:30 -03:00
|
|
|
gfn_t gfn = gpa >> PAGE_SHIFT;
|
2008-07-25 16:24:52 +02:00
|
|
|
unsigned long mmu_seq;
|
2008-02-07 13:47:44 +01:00
|
|
|
|
|
|
|
ASSERT(vcpu);
|
|
|
|
ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
|
|
|
|
|
|
|
|
r = mmu_topup_memory_caches(vcpu);
|
|
|
|
if (r)
|
|
|
|
return r;
|
|
|
|
|
2009-07-27 16:30:44 +02:00
|
|
|
level = mapping_level(vcpu, gfn);
|
|
|
|
|
|
|
|
gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
|
|
|
|
|
2008-07-25 16:24:52 +02:00
|
|
|
mmu_seq = vcpu->kvm->mmu_notifier_seq;
|
2008-09-16 20:54:47 -03:00
|
|
|
smp_rmb();
|
2008-04-02 14:46:56 -05:00
|
|
|
pfn = gfn_to_pfn(vcpu->kvm, gfn);
|
|
|
|
if (is_error_pfn(pfn)) {
|
|
|
|
kvm_release_pfn_clean(pfn);
|
2008-02-07 13:47:44 +01:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
spin_lock(&vcpu->kvm->mmu_lock);
|
2008-07-25 16:24:52 +02:00
|
|
|
if (mmu_notifier_retry(vcpu, mmu_seq))
|
|
|
|
goto out_unlock;
|
2008-02-07 13:47:44 +01:00
|
|
|
kvm_mmu_free_some_pages(vcpu);
|
|
|
|
r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK,
|
2009-07-27 16:30:44 +02:00
|
|
|
level, gfn, pfn);
|
2008-02-07 13:47:44 +01:00
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
|
|
|
|
|
|
|
return r;
|
2008-07-25 16:24:52 +02:00
|
|
|
|
|
|
|
out_unlock:
|
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
|
|
|
kvm_release_pfn_clean(pfn);
|
|
|
|
return 0;
|
2008-02-07 13:47:44 +01:00
|
|
|
}
|
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
static void nonpaging_free(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2007-01-05 16:36:40 -08:00
|
|
|
mmu_free_roots(vcpu);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int nonpaging_init_context(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2007-12-13 23:50:52 +08:00
|
|
|
struct kvm_mmu *context = &vcpu->arch.mmu;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
|
|
|
context->new_cr3 = nonpaging_new_cr3;
|
|
|
|
context->page_fault = nonpaging_page_fault;
|
|
|
|
context->gva_to_gpa = nonpaging_gva_to_gpa;
|
|
|
|
context->free = nonpaging_free;
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-16 18:58:32 +02:00
|
|
|
context->prefetch_page = nonpaging_prefetch_page;
|
2008-09-23 13:18:33 -03:00
|
|
|
context->sync_page = nonpaging_sync_page;
|
2008-09-23 13:18:35 -03:00
|
|
|
context->invlpg = nonpaging_invlpg;
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
context->root_level = 0;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
context->shadow_root_level = PT32E_ROOT_LEVEL;
|
2007-06-04 15:58:30 +03:00
|
|
|
context->root_hpa = INVALID_PAGE;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2007-11-21 02:57:59 +02:00
|
|
|
void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
{
|
2007-04-19 17:27:43 +03:00
|
|
|
++vcpu->stat.tlb_flush;
|
2007-09-09 15:41:59 +03:00
|
|
|
kvm_x86_ops->tlb_flush(vcpu);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void paging_new_cr3(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2008-03-03 12:59:56 -08:00
|
|
|
pgprintk("%s: cr3 %lx\n", __func__, vcpu->arch.cr3);
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table table
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 16:36:43 -08:00
|
|
|
mmu_free_roots(vcpu);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void inject_page_fault(struct kvm_vcpu *vcpu,
|
|
|
|
u64 addr,
|
|
|
|
u32 err_code)
|
|
|
|
{
|
2007-11-25 14:04:58 +02:00
|
|
|
kvm_inject_page_fault(vcpu, addr, err_code);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void paging_free(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
nonpaging_free(vcpu);
|
|
|
|
}
|
|
|
|
|
2009-03-30 16:21:08 +08:00
|
|
|
static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
|
|
|
|
{
|
|
|
|
int bit7;
|
|
|
|
|
|
|
|
bit7 = (gpte >> 7) & 1;
|
|
|
|
return (gpte & vcpu->arch.mmu.rsvd_bits_mask[bit7][level-1]) != 0;
|
|
|
|
}
|
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
#define PTTYPE 64
|
|
|
|
#include "paging_tmpl.h"
|
|
|
|
#undef PTTYPE
|
|
|
|
|
|
|
|
#define PTTYPE 32
|
|
|
|
#include "paging_tmpl.h"
|
|
|
|
#undef PTTYPE
|
|
|
|
|
2009-03-30 16:21:08 +08:00
|
|
|
static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
|
|
|
|
{
|
|
|
|
struct kvm_mmu *context = &vcpu->arch.mmu;
|
|
|
|
int maxphyaddr = cpuid_maxphyaddr(vcpu);
|
|
|
|
u64 exb_bit_rsvd = 0;
|
|
|
|
|
|
|
|
if (!is_nx(vcpu))
|
|
|
|
exb_bit_rsvd = rsvd_bits(63, 63);
|
|
|
|
switch (level) {
|
|
|
|
case PT32_ROOT_LEVEL:
|
|
|
|
/* no rsvd bits for 2 level 4K page table entries */
|
|
|
|
context->rsvd_bits_mask[0][1] = 0;
|
|
|
|
context->rsvd_bits_mask[0][0] = 0;
|
|
|
|
if (is_cpuid_PSE36())
|
|
|
|
/* 36bits PSE 4MB page */
|
|
|
|
context->rsvd_bits_mask[1][1] = rsvd_bits(17, 21);
|
|
|
|
else
|
|
|
|
/* 32 bits PSE 4MB page */
|
|
|
|
context->rsvd_bits_mask[1][1] = rsvd_bits(13, 21);
|
2009-05-19 13:29:27 +03:00
|
|
|
context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[1][0];
|
2009-03-30 16:21:08 +08:00
|
|
|
break;
|
|
|
|
case PT32E_ROOT_LEVEL:
|
2009-03-31 23:03:45 +08:00
|
|
|
context->rsvd_bits_mask[0][2] =
|
|
|
|
rsvd_bits(maxphyaddr, 63) |
|
|
|
|
rsvd_bits(7, 8) | rsvd_bits(1, 2); /* PDPTE */
|
2009-03-30 16:21:08 +08:00
|
|
|
context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
|
2009-04-02 10:28:37 +08:00
|
|
|
rsvd_bits(maxphyaddr, 62); /* PDE */
|
2009-03-30 16:21:08 +08:00
|
|
|
context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
|
|
|
|
rsvd_bits(maxphyaddr, 62); /* PTE */
|
|
|
|
context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
|
|
|
|
rsvd_bits(maxphyaddr, 62) |
|
|
|
|
rsvd_bits(13, 20); /* large page */
|
2009-05-19 13:29:27 +03:00
|
|
|
context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[1][0];
|
2009-03-30 16:21:08 +08:00
|
|
|
break;
|
|
|
|
case PT64_ROOT_LEVEL:
|
|
|
|
context->rsvd_bits_mask[0][3] = exb_bit_rsvd |
|
|
|
|
rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
|
|
|
|
context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
|
|
|
|
rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
|
|
|
|
context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
|
2009-04-02 10:28:37 +08:00
|
|
|
rsvd_bits(maxphyaddr, 51);
|
2009-03-30 16:21:08 +08:00
|
|
|
context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
|
|
|
|
rsvd_bits(maxphyaddr, 51);
|
|
|
|
context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
|
2009-07-27 16:30:45 +02:00
|
|
|
context->rsvd_bits_mask[1][2] = exb_bit_rsvd |
|
|
|
|
rsvd_bits(maxphyaddr, 51) |
|
|
|
|
rsvd_bits(13, 29);
|
2009-03-30 16:21:08 +08:00
|
|
|
context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
|
2009-04-02 10:28:37 +08:00
|
|
|
rsvd_bits(maxphyaddr, 51) |
|
|
|
|
rsvd_bits(13, 20); /* large page */
|
2009-05-19 13:29:27 +03:00
|
|
|
context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[1][0];
|
2009-03-30 16:21:08 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-01-05 16:36:40 -08:00
|
|
|
static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
{
|
2007-12-13 23:50:52 +08:00
|
|
|
struct kvm_mmu *context = &vcpu->arch.mmu;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
|
|
|
ASSERT(is_pae(vcpu));
|
|
|
|
context->new_cr3 = paging_new_cr3;
|
|
|
|
context->page_fault = paging64_page_fault;
|
|
|
|
context->gva_to_gpa = paging64_gva_to_gpa;
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-16 18:58:32 +02:00
|
|
|
context->prefetch_page = paging64_prefetch_page;
|
2008-09-23 13:18:33 -03:00
|
|
|
context->sync_page = paging64_sync_page;
|
2008-09-23 13:18:35 -03:00
|
|
|
context->invlpg = paging64_invlpg;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
context->free = paging_free;
|
2007-01-05 16:36:40 -08:00
|
|
|
context->root_level = level;
|
|
|
|
context->shadow_root_level = level;
|
2007-06-04 15:58:30 +03:00
|
|
|
context->root_hpa = INVALID_PAGE;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2007-01-05 16:36:40 -08:00
|
|
|
static int paging64_init_context(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2009-03-30 16:21:08 +08:00
|
|
|
reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL);
|
2007-01-05 16:36:40 -08:00
|
|
|
return paging64_init_context_common(vcpu, PT64_ROOT_LEVEL);
|
|
|
|
}
|
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
static int paging32_init_context(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2007-12-13 23:50:52 +08:00
|
|
|
struct kvm_mmu *context = &vcpu->arch.mmu;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2009-03-30 16:21:08 +08:00
|
|
|
reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
context->new_cr3 = paging_new_cr3;
|
|
|
|
context->page_fault = paging32_page_fault;
|
|
|
|
context->gva_to_gpa = paging32_gva_to_gpa;
|
|
|
|
context->free = paging_free;
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-16 18:58:32 +02:00
|
|
|
context->prefetch_page = paging32_prefetch_page;
|
2008-09-23 13:18:33 -03:00
|
|
|
context->sync_page = paging32_sync_page;
|
2008-09-23 13:18:35 -03:00
|
|
|
context->invlpg = paging32_invlpg;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
context->root_level = PT32_ROOT_LEVEL;
|
|
|
|
context->shadow_root_level = PT32E_ROOT_LEVEL;
|
2007-06-04 15:58:30 +03:00
|
|
|
context->root_hpa = INVALID_PAGE;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int paging32E_init_context(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2009-03-30 16:21:08 +08:00
|
|
|
reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL);
|
2007-01-05 16:36:40 -08:00
|
|
|
return paging64_init_context_common(vcpu, PT32E_ROOT_LEVEL);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
|
|
|
|
2008-02-07 13:47:44 +01:00
|
|
|
static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
struct kvm_mmu *context = &vcpu->arch.mmu;
|
|
|
|
|
|
|
|
context->new_cr3 = nonpaging_new_cr3;
|
|
|
|
context->page_fault = tdp_page_fault;
|
|
|
|
context->free = nonpaging_free;
|
|
|
|
context->prefetch_page = nonpaging_prefetch_page;
|
2008-09-23 13:18:33 -03:00
|
|
|
context->sync_page = nonpaging_sync_page;
|
2008-09-23 13:18:35 -03:00
|
|
|
context->invlpg = nonpaging_invlpg;
|
2008-04-25 10:20:22 +08:00
|
|
|
context->shadow_root_level = kvm_x86_ops->get_tdp_level();
|
2008-02-07 13:47:44 +01:00
|
|
|
context->root_hpa = INVALID_PAGE;
|
|
|
|
|
|
|
|
if (!is_paging(vcpu)) {
|
|
|
|
context->gva_to_gpa = nonpaging_gva_to_gpa;
|
|
|
|
context->root_level = 0;
|
|
|
|
} else if (is_long_mode(vcpu)) {
|
2009-03-30 16:21:08 +08:00
|
|
|
reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL);
|
2008-02-07 13:47:44 +01:00
|
|
|
context->gva_to_gpa = paging64_gva_to_gpa;
|
|
|
|
context->root_level = PT64_ROOT_LEVEL;
|
|
|
|
} else if (is_pae(vcpu)) {
|
2009-03-30 16:21:08 +08:00
|
|
|
reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL);
|
2008-02-07 13:47:44 +01:00
|
|
|
context->gva_to_gpa = paging64_gva_to_gpa;
|
|
|
|
context->root_level = PT32E_ROOT_LEVEL;
|
|
|
|
} else {
|
2009-03-30 16:21:08 +08:00
|
|
|
reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL);
|
2008-02-07 13:47:44 +01:00
|
|
|
context->gva_to_gpa = paging32_gva_to_gpa;
|
|
|
|
context->root_level = PT32_ROOT_LEVEL;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
{
|
2008-12-21 19:20:09 +02:00
|
|
|
int r;
|
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
ASSERT(vcpu);
|
2007-12-13 23:50:52 +08:00
|
|
|
ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
|
|
|
if (!is_paging(vcpu))
|
2008-12-21 19:20:09 +02:00
|
|
|
r = nonpaging_init_context(vcpu);
|
2006-12-29 16:49:37 -08:00
|
|
|
else if (is_long_mode(vcpu))
|
2008-12-21 19:20:09 +02:00
|
|
|
r = paging64_init_context(vcpu);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
else if (is_pae(vcpu))
|
2008-12-21 19:20:09 +02:00
|
|
|
r = paging32E_init_context(vcpu);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
else
|
2008-12-21 19:20:09 +02:00
|
|
|
r = paging32_init_context(vcpu);
|
|
|
|
|
|
|
|
vcpu->arch.mmu.base_role.glevels = vcpu->arch.mmu.root_level;
|
|
|
|
|
|
|
|
return r;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
|
|
|
|
2008-02-07 13:47:44 +01:00
|
|
|
static int init_kvm_mmu(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2008-04-02 14:46:56 -05:00
|
|
|
vcpu->arch.update_pte.pfn = bad_pfn;
|
|
|
|
|
2008-02-07 13:47:44 +01:00
|
|
|
if (tdp_enabled)
|
|
|
|
return init_kvm_tdp_mmu(vcpu);
|
|
|
|
else
|
|
|
|
return init_kvm_softmmu(vcpu);
|
|
|
|
}
|
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
static void destroy_kvm_mmu(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
ASSERT(vcpu);
|
2007-12-13 23:50:52 +08:00
|
|
|
if (VALID_PAGE(vcpu->arch.mmu.root_hpa)) {
|
|
|
|
vcpu->arch.mmu.free(vcpu);
|
|
|
|
vcpu->arch.mmu.root_hpa = INVALID_PAGE;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
int kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
|
2007-06-04 15:58:30 +03:00
|
|
|
{
|
|
|
|
destroy_kvm_mmu(vcpu);
|
|
|
|
return init_kvm_mmu(vcpu);
|
|
|
|
}
|
2007-10-10 14:26:45 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_mmu_reset_context);
|
2007-06-04 15:58:30 +03:00
|
|
|
|
|
|
|
int kvm_mmu_load(struct kvm_vcpu *vcpu)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
{
|
2007-01-05 16:36:53 -08:00
|
|
|
int r;
|
|
|
|
|
2007-01-05 16:36:54 -08:00
|
|
|
r = mmu_topup_memory_caches(vcpu);
|
2007-06-04 15:58:30 +03:00
|
|
|
if (r)
|
|
|
|
goto out;
|
2007-12-20 19:18:26 -05:00
|
|
|
spin_lock(&vcpu->kvm->mmu_lock);
|
2007-12-31 15:27:49 +02:00
|
|
|
kvm_mmu_free_some_pages(vcpu);
|
2009-05-12 18:55:45 -03:00
|
|
|
r = mmu_alloc_roots(vcpu);
|
2008-09-23 13:18:34 -03:00
|
|
|
mmu_sync_roots(vcpu);
|
2007-12-20 19:18:26 -05:00
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
2009-05-12 18:55:45 -03:00
|
|
|
if (r)
|
|
|
|
goto out;
|
2009-07-09 17:00:42 +08:00
|
|
|
/* set_cr3() should ensure TLB has been flushed */
|
2007-12-13 23:50:52 +08:00
|
|
|
kvm_x86_ops->set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
|
2007-01-05 16:36:53 -08:00
|
|
|
out:
|
|
|
|
return r;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
2007-06-04 15:58:30 +03:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_mmu_load);
|
|
|
|
|
|
|
|
void kvm_mmu_unload(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
mmu_free_roots(vcpu);
|
|
|
|
}
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2007-05-01 14:16:52 +03:00
|
|
|
static void mmu_pte_write_zap_pte(struct kvm_vcpu *vcpu,
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp,
|
2007-03-08 17:13:32 +02:00
|
|
|
u64 *spte)
|
|
|
|
{
|
|
|
|
u64 pte;
|
|
|
|
struct kvm_mmu_page *child;
|
|
|
|
|
|
|
|
pte = *spte;
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-16 18:58:32 +02:00
|
|
|
if (is_shadow_present_pte(pte)) {
|
2009-06-10 12:27:03 -03:00
|
|
|
if (is_last_spte(pte, sp->role.level))
|
2007-09-27 14:11:22 +02:00
|
|
|
rmap_remove(vcpu->kvm, spte);
|
2007-03-08 17:13:32 +02:00
|
|
|
else {
|
|
|
|
child = page_header(pte & PT64_BASE_ADDR_MASK);
|
2007-07-17 13:04:56 +03:00
|
|
|
mmu_page_remove_parent_pte(child, spte);
|
2007-03-08 17:13:32 +02:00
|
|
|
}
|
|
|
|
}
|
2009-06-10 14:24:23 +03:00
|
|
|
__set_spte(spte, shadow_trap_nonpresent_pte);
|
2008-02-23 11:44:30 -03:00
|
|
|
if (is_large_pte(pte))
|
|
|
|
--vcpu->kvm->stat.lpages;
|
2007-03-08 17:13:32 +02:00
|
|
|
}
|
|
|
|
|
2007-05-01 16:53:31 +03:00
|
|
|
static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu,
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp,
|
2007-05-01 16:53:31 +03:00
|
|
|
u64 *spte,
|
2008-01-07 11:14:20 +02:00
|
|
|
const void *new)
|
2007-05-01 16:53:31 +03:00
|
|
|
{
|
2008-06-11 20:32:40 -03:00
|
|
|
if (sp->role.level != PT_PAGE_TABLE_LEVEL) {
|
2009-07-27 16:30:46 +02:00
|
|
|
++vcpu->kvm->stat.mmu_pde_zapped;
|
|
|
|
return;
|
2008-06-11 20:32:40 -03:00
|
|
|
}
|
2007-05-01 16:53:31 +03:00
|
|
|
|
2007-11-18 16:37:07 +02:00
|
|
|
++vcpu->kvm->stat.mmu_pte_updated;
|
2007-11-21 15:28:32 +02:00
|
|
|
if (sp->role.glevels == PT32_ROOT_LEVEL)
|
2008-01-07 11:14:20 +02:00
|
|
|
paging32_update_pte(vcpu, sp, spte, new);
|
2007-05-01 16:53:31 +03:00
|
|
|
else
|
2008-01-07 11:14:20 +02:00
|
|
|
paging64_update_pte(vcpu, sp, spte, new);
|
2007-05-01 16:53:31 +03:00
|
|
|
}
|
|
|
|
|
2007-11-21 02:06:21 +02:00
|
|
|
static bool need_remote_flush(u64 old, u64 new)
|
|
|
|
{
|
|
|
|
if (!is_shadow_present_pte(old))
|
|
|
|
return false;
|
|
|
|
if (!is_shadow_present_pte(new))
|
|
|
|
return true;
|
|
|
|
if ((old ^ new) & PT64_BASE_ADDR_MASK)
|
|
|
|
return true;
|
|
|
|
old ^= PT64_NX_MASK;
|
|
|
|
new ^= PT64_NX_MASK;
|
|
|
|
return (old & ~new & PT64_PERM_MASK) != 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, u64 old, u64 new)
|
|
|
|
{
|
|
|
|
if (need_remote_flush(old, new))
|
|
|
|
kvm_flush_remote_tlbs(vcpu->kvm);
|
|
|
|
else
|
|
|
|
kvm_mmu_flush_tlb(vcpu);
|
|
|
|
}
|
|
|
|
|
2007-09-23 14:10:49 +02:00
|
|
|
static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2007-12-13 23:50:52 +08:00
|
|
|
u64 *spte = vcpu->arch.last_pte_updated;
|
2007-09-23 14:10:49 +02:00
|
|
|
|
2008-04-25 21:13:50 +08:00
|
|
|
return !!(spte && (*spte & shadow_accessed_mask));
|
2007-09-23 14:10:49 +02:00
|
|
|
}
|
|
|
|
|
2007-12-30 12:29:05 +02:00
|
|
|
static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
|
|
|
|
const u8 *new, int bytes)
|
|
|
|
{
|
|
|
|
gfn_t gfn;
|
|
|
|
int r;
|
|
|
|
u64 gpte = 0;
|
2008-04-02 14:46:56 -05:00
|
|
|
pfn_t pfn;
|
2007-12-30 12:29:05 +02:00
|
|
|
|
|
|
|
if (bytes != 4 && bytes != 8)
|
|
|
|
return;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Assume that the pte write on a page table of the same type
|
|
|
|
* as the current vcpu paging mode. This is nearly always true
|
|
|
|
* (might be false while changing modes). Note it is verified later
|
|
|
|
* by update_pte().
|
|
|
|
*/
|
|
|
|
if (is_pae(vcpu)) {
|
|
|
|
/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
|
|
|
|
if ((bytes == 4) && (gpa % 4 == 0)) {
|
|
|
|
r = kvm_read_guest(vcpu->kvm, gpa & ~(u64)7, &gpte, 8);
|
|
|
|
if (r)
|
|
|
|
return;
|
|
|
|
memcpy((void *)&gpte + (gpa % 8), new, 4);
|
|
|
|
} else if ((bytes == 8) && (gpa % 8 == 0)) {
|
|
|
|
memcpy((void *)&gpte, new, 8);
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
if ((bytes == 4) && (gpa % 4 == 0))
|
|
|
|
memcpy((void *)&gpte, new, 4);
|
|
|
|
}
|
2009-06-10 14:12:05 +03:00
|
|
|
if (!is_present_gpte(gpte))
|
2007-12-30 12:29:05 +02:00
|
|
|
return;
|
|
|
|
gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
|
2008-02-10 18:04:15 +02:00
|
|
|
|
2008-07-25 16:24:52 +02:00
|
|
|
vcpu->arch.update_pte.mmu_seq = vcpu->kvm->mmu_notifier_seq;
|
2008-09-16 20:54:47 -03:00
|
|
|
smp_rmb();
|
2008-04-02 14:46:56 -05:00
|
|
|
pfn = gfn_to_pfn(vcpu->kvm, gfn);
|
2008-02-10 18:04:15 +02:00
|
|
|
|
2008-04-02 14:46:56 -05:00
|
|
|
if (is_error_pfn(pfn)) {
|
|
|
|
kvm_release_pfn_clean(pfn);
|
2008-01-24 11:44:11 +02:00
|
|
|
return;
|
|
|
|
}
|
2007-12-30 12:29:05 +02:00
|
|
|
vcpu->arch.update_pte.gfn = gfn;
|
2008-04-02 14:46:56 -05:00
|
|
|
vcpu->arch.update_pte.pfn = pfn;
|
2007-12-30 12:29:05 +02:00
|
|
|
}
|
|
|
|
|
2008-05-15 13:51:35 +03:00
|
|
|
static void kvm_mmu_access_page(struct kvm_vcpu *vcpu, gfn_t gfn)
|
|
|
|
{
|
|
|
|
u64 *spte = vcpu->arch.last_pte_updated;
|
|
|
|
|
|
|
|
if (spte
|
|
|
|
&& vcpu->arch.last_pte_gfn == gfn
|
|
|
|
&& shadow_accessed_mask
|
|
|
|
&& !(*spte & shadow_accessed_mask)
|
|
|
|
&& is_shadow_present_pte(*spte))
|
|
|
|
set_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte);
|
|
|
|
}
|
|
|
|
|
2007-05-01 14:16:52 +03:00
|
|
|
void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
|
2008-12-01 22:32:05 -02:00
|
|
|
const u8 *new, int bytes,
|
|
|
|
bool guest_initiated)
|
2007-01-05 16:36:44 -08:00
|
|
|
{
|
2007-01-05 16:36:45 -08:00
|
|
|
gfn_t gfn = gpa >> PAGE_SHIFT;
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp;
|
2007-01-05 16:36:48 -08:00
|
|
|
struct hlist_node *node, *n;
|
2007-01-05 16:36:45 -08:00
|
|
|
struct hlist_head *bucket;
|
|
|
|
unsigned index;
|
2008-01-07 11:14:20 +02:00
|
|
|
u64 entry, gentry;
|
2007-01-05 16:36:45 -08:00
|
|
|
u64 *spte;
|
|
|
|
unsigned offset = offset_in_page(gpa);
|
2007-01-05 16:36:48 -08:00
|
|
|
unsigned pte_size;
|
2007-01-05 16:36:45 -08:00
|
|
|
unsigned page_offset;
|
2007-01-05 16:36:48 -08:00
|
|
|
unsigned misaligned;
|
KVM: MMU: Respect nonpae pagetable quadrant when zapping ptes
When a guest writes to a page that has an mmu shadow, we have to clear
the shadow pte corresponding to the memory location touched by the guest.
Now, in nonpae mode, a single guest page may have two or four shadow
pages (because a nonpae page maps 4MB or 4GB, whereas the pae shadow maps
2MB or 1GB), so we when we look up the page we find up to three additional
aliases for the page. Since we _clear_ the shadow pte, it doesn't matter
except for a slight performance penalty, but if we want to _update_ the
shadow pte instead of clearing it, it is vital that we don't modify the
aliases.
Fortunately, exactly which page is needed (the "quadrant") is easily
computed, and is accessible in the shadow page header. All we need is
to ignore shadow pages from the wrong quadrants.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-01 16:44:05 +03:00
|
|
|
unsigned quadrant;
|
2007-01-05 16:36:45 -08:00
|
|
|
int level;
|
2007-01-05 16:36:50 -08:00
|
|
|
int flooded = 0;
|
2007-03-08 17:13:32 +02:00
|
|
|
int npte;
|
2008-01-07 11:14:20 +02:00
|
|
|
int r;
|
2007-01-05 16:36:45 -08:00
|
|
|
|
2008-03-03 12:59:56 -08:00
|
|
|
pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
|
2007-12-30 12:29:05 +02:00
|
|
|
mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes);
|
2007-12-20 19:18:26 -05:00
|
|
|
spin_lock(&vcpu->kvm->mmu_lock);
|
2008-05-15 13:51:35 +03:00
|
|
|
kvm_mmu_access_page(vcpu, gfn);
|
2007-12-31 15:27:49 +02:00
|
|
|
kvm_mmu_free_some_pages(vcpu);
|
2007-11-18 16:37:07 +02:00
|
|
|
++vcpu->kvm->stat.mmu_pte_write;
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-16 18:58:32 +02:00
|
|
|
kvm_mmu_audit(vcpu, "pre pte write");
|
2008-12-01 22:32:05 -02:00
|
|
|
if (guest_initiated) {
|
|
|
|
if (gfn == vcpu->arch.last_pt_write_gfn
|
|
|
|
&& !last_updated_pte_accessed(vcpu)) {
|
|
|
|
++vcpu->arch.last_pt_write_count;
|
|
|
|
if (vcpu->arch.last_pt_write_count >= 3)
|
|
|
|
flooded = 1;
|
|
|
|
} else {
|
|
|
|
vcpu->arch.last_pt_write_gfn = gfn;
|
|
|
|
vcpu->arch.last_pt_write_count = 1;
|
|
|
|
vcpu->arch.last_pte_updated = NULL;
|
|
|
|
}
|
2007-01-05 16:36:50 -08:00
|
|
|
}
|
2008-01-07 13:20:25 +02:00
|
|
|
index = kvm_page_table_hashfn(gfn);
|
2007-12-14 10:01:48 +08:00
|
|
|
bucket = &vcpu->kvm->arch.mmu_page_hash[index];
|
2007-11-21 15:28:32 +02:00
|
|
|
hlist_for_each_entry_safe(sp, node, n, bucket, hash_link) {
|
2009-01-11 13:02:10 +02:00
|
|
|
if (sp->gfn != gfn || sp->role.direct || sp->role.invalid)
|
2007-01-05 16:36:45 -08:00
|
|
|
continue;
|
2007-11-21 15:28:32 +02:00
|
|
|
pte_size = sp->role.glevels == PT32_ROOT_LEVEL ? 4 : 8;
|
2007-01-05 16:36:48 -08:00
|
|
|
misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);
|
2007-04-30 14:47:02 +03:00
|
|
|
misaligned |= bytes < 4;
|
2007-01-05 16:36:50 -08:00
|
|
|
if (misaligned || flooded) {
|
2007-01-05 16:36:48 -08:00
|
|
|
/*
|
|
|
|
* Misaligned accesses are too much trouble to fix
|
|
|
|
* up; also, they usually indicate a page is not used
|
|
|
|
* as a page table.
|
2007-01-05 16:36:50 -08:00
|
|
|
*
|
|
|
|
* If we're seeing too many writes to a page,
|
|
|
|
* it may no longer be a page table, or we may be
|
|
|
|
* forking, in which case it is better to unmap the
|
|
|
|
* page.
|
2007-01-05 16:36:48 -08:00
|
|
|
*/
|
|
|
|
pgprintk("misaligned: gpa %llx bytes %d role %x\n",
|
2007-11-21 15:28:32 +02:00
|
|
|
gpa, bytes, sp->role.word);
|
2008-09-23 13:18:37 -03:00
|
|
|
if (kvm_mmu_zap_page(vcpu->kvm, sp))
|
|
|
|
n = bucket->first;
|
2007-11-18 16:37:07 +02:00
|
|
|
++vcpu->kvm->stat.mmu_flooded;
|
2007-01-05 16:36:48 -08:00
|
|
|
continue;
|
|
|
|
}
|
2007-01-05 16:36:45 -08:00
|
|
|
page_offset = offset;
|
2007-11-21 15:28:32 +02:00
|
|
|
level = sp->role.level;
|
2007-03-08 17:13:32 +02:00
|
|
|
npte = 1;
|
2007-11-21 15:28:32 +02:00
|
|
|
if (sp->role.glevels == PT32_ROOT_LEVEL) {
|
2007-03-08 17:13:32 +02:00
|
|
|
page_offset <<= 1; /* 32->64 */
|
|
|
|
/*
|
|
|
|
* A 32-bit pde maps 4MB while the shadow pdes map
|
|
|
|
* only 2MB. So we need to double the offset again
|
|
|
|
* and zap two pdes instead of one.
|
|
|
|
*/
|
|
|
|
if (level == PT32_ROOT_LEVEL) {
|
2007-04-18 11:18:18 +03:00
|
|
|
page_offset &= ~7; /* kill rounding error */
|
2007-03-08 17:13:32 +02:00
|
|
|
page_offset <<= 1;
|
|
|
|
npte = 2;
|
|
|
|
}
|
KVM: MMU: Respect nonpae pagetable quadrant when zapping ptes
When a guest writes to a page that has an mmu shadow, we have to clear
the shadow pte corresponding to the memory location touched by the guest.
Now, in nonpae mode, a single guest page may have two or four shadow
pages (because a nonpae page maps 4MB or 4GB, whereas the pae shadow maps
2MB or 1GB), so we when we look up the page we find up to three additional
aliases for the page. Since we _clear_ the shadow pte, it doesn't matter
except for a slight performance penalty, but if we want to _update_ the
shadow pte instead of clearing it, it is vital that we don't modify the
aliases.
Fortunately, exactly which page is needed (the "quadrant") is easily
computed, and is accessible in the shadow page header. All we need is
to ignore shadow pages from the wrong quadrants.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-01 16:44:05 +03:00
|
|
|
quadrant = page_offset >> PAGE_SHIFT;
|
2007-01-05 16:36:45 -08:00
|
|
|
page_offset &= ~PAGE_MASK;
|
2007-11-21 15:28:32 +02:00
|
|
|
if (quadrant != sp->role.quadrant)
|
KVM: MMU: Respect nonpae pagetable quadrant when zapping ptes
When a guest writes to a page that has an mmu shadow, we have to clear
the shadow pte corresponding to the memory location touched by the guest.
Now, in nonpae mode, a single guest page may have two or four shadow
pages (because a nonpae page maps 4MB or 4GB, whereas the pae shadow maps
2MB or 1GB), so we when we look up the page we find up to three additional
aliases for the page. Since we _clear_ the shadow pte, it doesn't matter
except for a slight performance penalty, but if we want to _update_ the
shadow pte instead of clearing it, it is vital that we don't modify the
aliases.
Fortunately, exactly which page is needed (the "quadrant") is easily
computed, and is accessible in the shadow page header. All we need is
to ignore shadow pages from the wrong quadrants.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-01 16:44:05 +03:00
|
|
|
continue;
|
2007-01-05 16:36:45 -08:00
|
|
|
}
|
2007-11-21 15:28:32 +02:00
|
|
|
spte = &sp->spt[page_offset / sizeof(*spte)];
|
2008-01-07 11:14:20 +02:00
|
|
|
if ((gpa & (pte_size - 1)) || (bytes < pte_size)) {
|
|
|
|
gentry = 0;
|
|
|
|
r = kvm_read_guest_atomic(vcpu->kvm,
|
|
|
|
gpa & ~(u64)(pte_size - 1),
|
|
|
|
&gentry, pte_size);
|
|
|
|
new = (const void *)&gentry;
|
|
|
|
if (r < 0)
|
|
|
|
new = NULL;
|
|
|
|
}
|
2007-03-08 17:13:32 +02:00
|
|
|
while (npte--) {
|
2007-11-21 02:06:21 +02:00
|
|
|
entry = *spte;
|
2007-11-21 15:28:32 +02:00
|
|
|
mmu_pte_write_zap_pte(vcpu, sp, spte);
|
2008-01-07 11:14:20 +02:00
|
|
|
if (new)
|
|
|
|
mmu_pte_write_new_pte(vcpu, sp, spte, new);
|
2007-11-21 02:06:21 +02:00
|
|
|
mmu_pte_write_flush_tlb(vcpu, entry, *spte);
|
2007-03-08 17:13:32 +02:00
|
|
|
++spte;
|
2007-01-05 16:36:45 -08:00
|
|
|
}
|
|
|
|
}
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-16 18:58:32 +02:00
|
|
|
kvm_mmu_audit(vcpu, "post pte write");
|
2007-12-20 19:18:26 -05:00
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
2008-04-02 14:46:56 -05:00
|
|
|
if (!is_error_pfn(vcpu->arch.update_pte.pfn)) {
|
|
|
|
kvm_release_pfn_clean(vcpu->arch.update_pte.pfn);
|
|
|
|
vcpu->arch.update_pte.pfn = bad_pfn;
|
2007-12-30 12:29:05 +02:00
|
|
|
}
|
2007-01-05 16:36:44 -08:00
|
|
|
}
|
|
|
|
|
2007-01-05 16:36:45 -08:00
|
|
|
int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
|
|
|
|
{
|
2007-12-20 19:18:22 -05:00
|
|
|
gpa_t gpa;
|
|
|
|
int r;
|
2007-01-05 16:36:45 -08:00
|
|
|
|
2009-08-27 13:37:06 +03:00
|
|
|
if (tdp_enabled)
|
|
|
|
return 0;
|
|
|
|
|
2007-12-20 19:18:22 -05:00
|
|
|
gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, gva);
|
|
|
|
|
2007-12-20 19:18:26 -05:00
|
|
|
spin_lock(&vcpu->kvm->mmu_lock);
|
2007-12-20 19:18:22 -05:00
|
|
|
r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
|
2007-12-20 19:18:26 -05:00
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
2007-12-20 19:18:22 -05:00
|
|
|
return r;
|
2007-01-05 16:36:45 -08:00
|
|
|
}
|
2008-07-19 08:57:05 +03:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page_virt);
|
2007-01-05 16:36:45 -08:00
|
|
|
|
2007-09-14 20:26:06 +03:00
|
|
|
void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
|
2007-01-05 16:36:47 -08:00
|
|
|
{
|
2009-07-28 15:26:58 -03:00
|
|
|
while (vcpu->kvm->arch.n_free_mmu_pages < KVM_REFILL_PAGES &&
|
|
|
|
!list_empty(&vcpu->kvm->arch.active_mmu_pages)) {
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp;
|
2007-01-05 16:36:47 -08:00
|
|
|
|
2007-12-14 10:01:48 +08:00
|
|
|
sp = container_of(vcpu->kvm->arch.active_mmu_pages.prev,
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page, link);
|
|
|
|
kvm_mmu_zap_page(vcpu->kvm, sp);
|
2007-11-18 16:37:07 +02:00
|
|
|
++vcpu->kvm->stat.mmu_recycled;
|
2007-01-05 16:36:47 -08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-10-28 18:48:59 +02:00
|
|
|
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code)
|
|
|
|
{
|
|
|
|
int r;
|
|
|
|
enum emulation_result er;
|
|
|
|
|
2007-12-13 23:50:52 +08:00
|
|
|
r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code);
|
2007-10-28 18:48:59 +02:00
|
|
|
if (r < 0)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
if (!r) {
|
|
|
|
r = 1;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2007-10-28 18:52:05 +02:00
|
|
|
r = mmu_topup_memory_caches(vcpu);
|
|
|
|
if (r)
|
|
|
|
goto out;
|
|
|
|
|
2007-10-28 18:48:59 +02:00
|
|
|
er = emulate_instruction(vcpu, vcpu->run, cr2, error_code, 0);
|
|
|
|
|
|
|
|
switch (er) {
|
|
|
|
case EMULATE_DONE:
|
|
|
|
return 1;
|
|
|
|
case EMULATE_DO_MMIO:
|
|
|
|
++vcpu->stat.mmio_exits;
|
|
|
|
return 0;
|
|
|
|
case EMULATE_FAIL:
|
2009-06-11 15:43:28 +03:00
|
|
|
vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
|
|
|
|
vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
|
|
|
|
return 0;
|
2007-10-28 18:48:59 +02:00
|
|
|
default:
|
|
|
|
BUG();
|
|
|
|
}
|
|
|
|
out:
|
|
|
|
return r;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);
|
|
|
|
|
2008-09-23 13:18:35 -03:00
|
|
|
void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
|
|
|
|
{
|
|
|
|
vcpu->arch.mmu.invlpg(vcpu, gva);
|
|
|
|
kvm_mmu_flush_tlb(vcpu);
|
|
|
|
++vcpu->stat.invlpg;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
|
|
|
|
|
2008-02-07 13:47:41 +01:00
|
|
|
void kvm_enable_tdp(void)
|
|
|
|
{
|
|
|
|
tdp_enabled = true;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_enable_tdp);
|
|
|
|
|
2008-07-14 20:36:36 +02:00
|
|
|
void kvm_disable_tdp(void)
|
|
|
|
{
|
|
|
|
tdp_enabled = false;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_disable_tdp);
|
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
static void free_mmu_pages(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2007-12-13 23:50:52 +08:00
|
|
|
free_page((unsigned long)vcpu->arch.mmu.pae_root);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int alloc_mmu_pages(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2007-01-05 16:36:40 -08:00
|
|
|
struct page *page;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
int i;
|
|
|
|
|
|
|
|
ASSERT(vcpu);
|
|
|
|
|
2007-01-05 16:36:40 -08:00
|
|
|
/*
|
|
|
|
* When emulating 32-bit mode, cr3 is only 32 bits even on x86_64.
|
|
|
|
* Therefore we need to allocate shadow page tables in the first
|
|
|
|
* 4GB of memory, which happens to fit the DMA32 zone.
|
|
|
|
*/
|
|
|
|
page = alloc_page(GFP_KERNEL | __GFP_DMA32);
|
|
|
|
if (!page)
|
|
|
|
goto error_1;
|
2007-12-13 23:50:52 +08:00
|
|
|
vcpu->arch.mmu.pae_root = page_address(page);
|
2007-01-05 16:36:40 -08:00
|
|
|
for (i = 0; i < 4; ++i)
|
2007-12-13 23:50:52 +08:00
|
|
|
vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;
|
2007-01-05 16:36:40 -08:00
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
error_1:
|
|
|
|
free_mmu_pages(vcpu);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
2006-12-29 16:50:01 -08:00
|
|
|
int kvm_mmu_create(struct kvm_vcpu *vcpu)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
{
|
|
|
|
ASSERT(vcpu);
|
2007-12-13 23:50:52 +08:00
|
|
|
ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2006-12-29 16:50:01 -08:00
|
|
|
return alloc_mmu_pages(vcpu);
|
|
|
|
}
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2006-12-29 16:50:01 -08:00
|
|
|
int kvm_mmu_setup(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
ASSERT(vcpu);
|
2007-12-13 23:50:52 +08:00
|
|
|
ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
|
2006-12-22 01:05:28 -08:00
|
|
|
|
2006-12-29 16:50:01 -08:00
|
|
|
return init_kvm_mmu(vcpu);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
ASSERT(vcpu);
|
|
|
|
|
|
|
|
destroy_kvm_mmu(vcpu);
|
|
|
|
free_mmu_pages(vcpu);
|
2007-01-05 16:36:53 -08:00
|
|
|
mmu_free_memory_caches(vcpu);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
|
|
|
|
2007-07-17 13:04:56 +03:00
|
|
|
void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
{
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
|
2007-12-14 10:01:48 +08:00
|
|
|
list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
int i;
|
|
|
|
u64 *pt;
|
|
|
|
|
2008-10-16 17:30:57 +08:00
|
|
|
if (!test_bit(slot, sp->slot_bitmap))
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
continue;
|
|
|
|
|
2007-11-21 15:28:32 +02:00
|
|
|
pt = sp->spt;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
|
|
|
|
/* avoid RMW */
|
2007-10-16 14:43:46 +02:00
|
|
|
if (pt[i] & PT_WRITABLE_MASK)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
pt[i] &= ~PT_WRITABLE_MASK;
|
|
|
|
}
|
2008-08-27 16:40:51 +03:00
|
|
|
kvm_flush_remote_tlbs(kvm);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 02:21:36 -08:00
|
|
|
}
|
2007-01-05 16:36:56 -08:00
|
|
|
|
2007-07-17 13:04:56 +03:00
|
|
|
void kvm_mmu_zap_all(struct kvm *kvm)
|
2007-03-30 13:06:33 +03:00
|
|
|
{
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp, *node;
|
2007-03-30 13:06:33 +03:00
|
|
|
|
2007-12-20 19:18:26 -05:00
|
|
|
spin_lock(&kvm->mmu_lock);
|
2007-12-14 10:01:48 +08:00
|
|
|
list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link)
|
2008-09-23 13:18:37 -03:00
|
|
|
if (kvm_mmu_zap_page(kvm, sp))
|
|
|
|
node = container_of(kvm->arch.active_mmu_pages.next,
|
|
|
|
struct kvm_mmu_page, link);
|
2007-12-20 19:18:26 -05:00
|
|
|
spin_unlock(&kvm->mmu_lock);
|
2007-03-30 13:06:33 +03:00
|
|
|
|
2007-07-17 13:04:56 +03:00
|
|
|
kvm_flush_remote_tlbs(kvm);
|
2007-03-30 13:06:33 +03:00
|
|
|
}
|
|
|
|
|
2008-04-27 12:14:13 -07:00
|
|
|
static void kvm_mmu_remove_one_alloc_mmu_page(struct kvm *kvm)
|
2008-03-30 15:17:21 +03:00
|
|
|
{
|
|
|
|
struct kvm_mmu_page *page;
|
|
|
|
|
|
|
|
page = container_of(kvm->arch.active_mmu_pages.prev,
|
|
|
|
struct kvm_mmu_page, link);
|
|
|
|
kvm_mmu_zap_page(kvm, page);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
|
|
|
|
{
|
|
|
|
struct kvm *kvm;
|
|
|
|
struct kvm *kvm_freed = NULL;
|
|
|
|
int cache_count = 0;
|
|
|
|
|
|
|
|
spin_lock(&kvm_lock);
|
|
|
|
|
|
|
|
list_for_each_entry(kvm, &vm_list, vm_list) {
|
|
|
|
int npages;
|
|
|
|
|
2008-07-03 18:33:02 -03:00
|
|
|
if (!down_read_trylock(&kvm->slots_lock))
|
|
|
|
continue;
|
2008-03-30 15:17:21 +03:00
|
|
|
spin_lock(&kvm->mmu_lock);
|
|
|
|
npages = kvm->arch.n_alloc_mmu_pages -
|
|
|
|
kvm->arch.n_free_mmu_pages;
|
|
|
|
cache_count += npages;
|
|
|
|
if (!kvm_freed && nr_to_scan > 0 && npages > 0) {
|
|
|
|
kvm_mmu_remove_one_alloc_mmu_page(kvm);
|
|
|
|
cache_count--;
|
|
|
|
kvm_freed = kvm;
|
|
|
|
}
|
|
|
|
nr_to_scan--;
|
|
|
|
|
|
|
|
spin_unlock(&kvm->mmu_lock);
|
2008-07-03 18:33:02 -03:00
|
|
|
up_read(&kvm->slots_lock);
|
2008-03-30 15:17:21 +03:00
|
|
|
}
|
|
|
|
if (kvm_freed)
|
|
|
|
list_move_tail(&kvm_freed->vm_list, &vm_list);
|
|
|
|
|
|
|
|
spin_unlock(&kvm_lock);
|
|
|
|
|
|
|
|
return cache_count;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct shrinker mmu_shrinker = {
|
|
|
|
.shrink = mmu_shrink,
|
|
|
|
.seeks = DEFAULT_SEEKS * 10,
|
|
|
|
};
|
|
|
|
|
2008-05-22 10:37:48 +02:00
|
|
|
static void mmu_destroy_caches(void)
|
2007-04-15 16:31:09 +03:00
|
|
|
{
|
|
|
|
if (pte_chain_cache)
|
|
|
|
kmem_cache_destroy(pte_chain_cache);
|
|
|
|
if (rmap_desc_cache)
|
|
|
|
kmem_cache_destroy(rmap_desc_cache);
|
2007-05-30 12:34:53 +03:00
|
|
|
if (mmu_page_header_cache)
|
|
|
|
kmem_cache_destroy(mmu_page_header_cache);
|
2007-04-15 16:31:09 +03:00
|
|
|
}
|
|
|
|
|
2008-03-30 15:17:21 +03:00
|
|
|
void kvm_mmu_module_exit(void)
|
|
|
|
{
|
|
|
|
mmu_destroy_caches();
|
|
|
|
unregister_shrinker(&mmu_shrinker);
|
|
|
|
}
|
|
|
|
|
2007-04-15 16:31:09 +03:00
|
|
|
int kvm_mmu_module_init(void)
|
|
|
|
{
|
|
|
|
pte_chain_cache = kmem_cache_create("kvm_pte_chain",
|
|
|
|
sizeof(struct kvm_pte_chain),
|
2007-07-20 10:11:58 +09:00
|
|
|
0, 0, NULL);
|
2007-04-15 16:31:09 +03:00
|
|
|
if (!pte_chain_cache)
|
|
|
|
goto nomem;
|
|
|
|
rmap_desc_cache = kmem_cache_create("kvm_rmap_desc",
|
|
|
|
sizeof(struct kvm_rmap_desc),
|
2007-07-20 10:11:58 +09:00
|
|
|
0, 0, NULL);
|
2007-04-15 16:31:09 +03:00
|
|
|
if (!rmap_desc_cache)
|
|
|
|
goto nomem;
|
|
|
|
|
2007-05-30 12:34:53 +03:00
|
|
|
mmu_page_header_cache = kmem_cache_create("kvm_mmu_page_header",
|
|
|
|
sizeof(struct kvm_mmu_page),
|
2007-07-20 10:11:58 +09:00
|
|
|
0, 0, NULL);
|
2007-05-30 12:34:53 +03:00
|
|
|
if (!mmu_page_header_cache)
|
|
|
|
goto nomem;
|
|
|
|
|
2008-03-30 15:17:21 +03:00
|
|
|
register_shrinker(&mmu_shrinker);
|
|
|
|
|
2007-04-15 16:31:09 +03:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
nomem:
|
2008-03-30 15:17:21 +03:00
|
|
|
mmu_destroy_caches();
|
2007-04-15 16:31:09 +03:00
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
2007-11-20 13:11:38 +08:00
|
|
|
/*
|
|
|
|
* Caculate mmu pages needed for kvm.
|
|
|
|
*/
|
|
|
|
unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
unsigned int nr_mmu_pages;
|
|
|
|
unsigned int nr_pages = 0;
|
|
|
|
|
|
|
|
for (i = 0; i < kvm->nmemslots; i++)
|
|
|
|
nr_pages += kvm->memslots[i].npages;
|
|
|
|
|
|
|
|
nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000;
|
|
|
|
nr_mmu_pages = max(nr_mmu_pages,
|
|
|
|
(unsigned int) KVM_MIN_ALLOC_MMU_PAGES);
|
|
|
|
|
|
|
|
return nr_mmu_pages;
|
|
|
|
}
|
|
|
|
|
2008-02-22 12:21:37 -05:00
|
|
|
static void *pv_mmu_peek_buffer(struct kvm_pv_mmu_op_buffer *buffer,
|
|
|
|
unsigned len)
|
|
|
|
{
|
|
|
|
if (len > buffer->len)
|
|
|
|
return NULL;
|
|
|
|
return buffer->ptr;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void *pv_mmu_read_buffer(struct kvm_pv_mmu_op_buffer *buffer,
|
|
|
|
unsigned len)
|
|
|
|
{
|
|
|
|
void *ret;
|
|
|
|
|
|
|
|
ret = pv_mmu_peek_buffer(buffer, len);
|
|
|
|
if (!ret)
|
|
|
|
return ret;
|
|
|
|
buffer->ptr += len;
|
|
|
|
buffer->len -= len;
|
|
|
|
buffer->processed += len;
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int kvm_pv_mmu_write(struct kvm_vcpu *vcpu,
|
|
|
|
gpa_t addr, gpa_t value)
|
|
|
|
{
|
|
|
|
int bytes = 8;
|
|
|
|
int r;
|
|
|
|
|
|
|
|
if (!is_long_mode(vcpu) && !is_pae(vcpu))
|
|
|
|
bytes = 4;
|
|
|
|
|
|
|
|
r = mmu_topup_memory_caches(vcpu);
|
|
|
|
if (r)
|
|
|
|
return r;
|
|
|
|
|
2008-03-29 20:17:59 -03:00
|
|
|
if (!emulator_write_phys(vcpu, addr, &value, bytes))
|
2008-02-22 12:21:37 -05:00
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int kvm_pv_mmu_flush_tlb(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2009-05-24 22:15:25 +03:00
|
|
|
kvm_set_cr3(vcpu, vcpu->arch.cr3);
|
2008-02-22 12:21:37 -05:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int kvm_pv_mmu_release_pt(struct kvm_vcpu *vcpu, gpa_t addr)
|
|
|
|
{
|
|
|
|
spin_lock(&vcpu->kvm->mmu_lock);
|
|
|
|
mmu_unshadow(vcpu->kvm, addr >> PAGE_SHIFT);
|
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int kvm_pv_mmu_op_one(struct kvm_vcpu *vcpu,
|
|
|
|
struct kvm_pv_mmu_op_buffer *buffer)
|
|
|
|
{
|
|
|
|
struct kvm_mmu_op_header *header;
|
|
|
|
|
|
|
|
header = pv_mmu_peek_buffer(buffer, sizeof *header);
|
|
|
|
if (!header)
|
|
|
|
return 0;
|
|
|
|
switch (header->op) {
|
|
|
|
case KVM_MMU_OP_WRITE_PTE: {
|
|
|
|
struct kvm_mmu_op_write_pte *wpte;
|
|
|
|
|
|
|
|
wpte = pv_mmu_read_buffer(buffer, sizeof *wpte);
|
|
|
|
if (!wpte)
|
|
|
|
return 0;
|
|
|
|
return kvm_pv_mmu_write(vcpu, wpte->pte_phys,
|
|
|
|
wpte->pte_val);
|
|
|
|
}
|
|
|
|
case KVM_MMU_OP_FLUSH_TLB: {
|
|
|
|
struct kvm_mmu_op_flush_tlb *ftlb;
|
|
|
|
|
|
|
|
ftlb = pv_mmu_read_buffer(buffer, sizeof *ftlb);
|
|
|
|
if (!ftlb)
|
|
|
|
return 0;
|
|
|
|
return kvm_pv_mmu_flush_tlb(vcpu);
|
|
|
|
}
|
|
|
|
case KVM_MMU_OP_RELEASE_PT: {
|
|
|
|
struct kvm_mmu_op_release_pt *rpt;
|
|
|
|
|
|
|
|
rpt = pv_mmu_read_buffer(buffer, sizeof *rpt);
|
|
|
|
if (!rpt)
|
|
|
|
return 0;
|
|
|
|
return kvm_pv_mmu_release_pt(vcpu, rpt->pt_phys);
|
|
|
|
}
|
|
|
|
default: return 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
|
|
|
|
gpa_t addr, unsigned long *ret)
|
|
|
|
{
|
|
|
|
int r;
|
2008-08-11 10:01:49 -07:00
|
|
|
struct kvm_pv_mmu_op_buffer *buffer = &vcpu->arch.mmu_op_buffer;
|
2008-02-22 12:21:37 -05:00
|
|
|
|
2008-08-11 10:01:49 -07:00
|
|
|
buffer->ptr = buffer->buf;
|
|
|
|
buffer->len = min_t(unsigned long, bytes, sizeof buffer->buf);
|
|
|
|
buffer->processed = 0;
|
2008-02-22 12:21:37 -05:00
|
|
|
|
2008-08-11 10:01:49 -07:00
|
|
|
r = kvm_read_guest(vcpu->kvm, addr, buffer->buf, buffer->len);
|
2008-02-22 12:21:37 -05:00
|
|
|
if (r)
|
|
|
|
goto out;
|
|
|
|
|
2008-08-11 10:01:49 -07:00
|
|
|
while (buffer->len) {
|
|
|
|
r = kvm_pv_mmu_op_one(vcpu, buffer);
|
2008-02-22 12:21:37 -05:00
|
|
|
if (r < 0)
|
|
|
|
goto out;
|
|
|
|
if (r == 0)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
r = 1;
|
|
|
|
out:
|
2008-08-11 10:01:49 -07:00
|
|
|
*ret = buffer->processed;
|
2008-02-22 12:21:37 -05:00
|
|
|
return r;
|
|
|
|
}
|
|
|
|
|
2009-06-11 12:07:42 -03:00
|
|
|
int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4])
|
|
|
|
{
|
|
|
|
struct kvm_shadow_walk_iterator iterator;
|
|
|
|
int nr_sptes = 0;
|
|
|
|
|
|
|
|
spin_lock(&vcpu->kvm->mmu_lock);
|
|
|
|
for_each_shadow_entry(vcpu, addr, iterator) {
|
|
|
|
sptes[iterator.level-1] = *iterator.sptep;
|
|
|
|
nr_sptes++;
|
|
|
|
if (!is_shadow_present_pte(*iterator.sptep))
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
|
|
|
|
|
|
|
return nr_sptes;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_mmu_get_spte_hierarchy);
|
|
|
|
|
2007-01-05 16:36:56 -08:00
|
|
|
#ifdef AUDIT
|
|
|
|
|
|
|
|
static const char *audit_msg;
|
|
|
|
|
|
|
|
static gva_t canonicalize(gva_t gva)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_X86_64
|
|
|
|
gva = (long long)(gva << 16) >> 16;
|
|
|
|
#endif
|
|
|
|
return gva;
|
|
|
|
}
|
|
|
|
|
2009-06-10 12:27:04 -03:00
|
|
|
|
|
|
|
typedef void (*inspect_spte_fn) (struct kvm *kvm, struct kvm_mmu_page *sp,
|
|
|
|
u64 *sptep);
|
|
|
|
|
|
|
|
static void __mmu_spte_walk(struct kvm *kvm, struct kvm_mmu_page *sp,
|
|
|
|
inspect_spte_fn fn)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
|
|
|
|
u64 ent = sp->spt[i];
|
|
|
|
|
|
|
|
if (is_shadow_present_pte(ent)) {
|
2009-06-10 12:27:08 -03:00
|
|
|
if (!is_last_spte(ent, sp->role.level)) {
|
2009-06-10 12:27:04 -03:00
|
|
|
struct kvm_mmu_page *child;
|
|
|
|
child = page_header(ent & PT64_BASE_ADDR_MASK);
|
|
|
|
__mmu_spte_walk(kvm, child, fn);
|
2009-06-10 12:27:08 -03:00
|
|
|
} else
|
2009-06-10 12:27:04 -03:00
|
|
|
fn(kvm, sp, &sp->spt[i]);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
struct kvm_mmu_page *sp;
|
|
|
|
|
|
|
|
if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
|
|
|
|
return;
|
|
|
|
if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
|
|
|
|
hpa_t root = vcpu->arch.mmu.root_hpa;
|
|
|
|
sp = page_header(root);
|
|
|
|
__mmu_spte_walk(vcpu->kvm, sp, fn);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
for (i = 0; i < 4; ++i) {
|
|
|
|
hpa_t root = vcpu->arch.mmu.pae_root[i];
|
|
|
|
|
|
|
|
if (root && VALID_PAGE(root)) {
|
|
|
|
root &= PT64_BASE_ADDR_MASK;
|
|
|
|
sp = page_header(root);
|
|
|
|
__mmu_spte_walk(vcpu->kvm, sp, fn);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2007-01-05 16:36:56 -08:00
|
|
|
static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte,
|
|
|
|
gva_t va, int level)
|
|
|
|
{
|
|
|
|
u64 *pt = __va(page_pte & PT64_BASE_ADDR_MASK);
|
|
|
|
int i;
|
|
|
|
gva_t va_delta = 1ul << (PAGE_SHIFT + 9 * (level - 1));
|
|
|
|
|
|
|
|
for (i = 0; i < PT64_ENT_PER_PAGE; ++i, va += va_delta) {
|
|
|
|
u64 ent = pt[i];
|
|
|
|
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-16 18:58:32 +02:00
|
|
|
if (ent == shadow_trap_nonpresent_pte)
|
2007-01-05 16:36:56 -08:00
|
|
|
continue;
|
|
|
|
|
|
|
|
va = canonicalize(va);
|
2009-06-10 12:27:08 -03:00
|
|
|
if (is_shadow_present_pte(ent) && !is_last_spte(ent, level))
|
|
|
|
audit_mappings_page(vcpu, ent, va, level - 1);
|
|
|
|
else {
|
2007-12-13 23:50:52 +08:00
|
|
|
gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, va);
|
2009-04-25 12:43:21 +02:00
|
|
|
gfn_t gfn = gpa >> PAGE_SHIFT;
|
|
|
|
pfn_t pfn = gfn_to_pfn(vcpu->kvm, gfn);
|
|
|
|
hpa_t hpa = (hpa_t)pfn << PAGE_SHIFT;
|
2007-01-05 16:36:56 -08:00
|
|
|
|
2009-06-10 12:27:07 -03:00
|
|
|
if (is_error_pfn(pfn)) {
|
|
|
|
kvm_release_pfn_clean(pfn);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-16 18:58:32 +02:00
|
|
|
if (is_shadow_present_pte(ent)
|
2007-01-05 16:36:56 -08:00
|
|
|
&& (ent & PT64_BASE_ADDR_MASK) != hpa)
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-16 18:58:32 +02:00
|
|
|
printk(KERN_ERR "xx audit error: (%s) levels %d"
|
|
|
|
" gva %lx gpa %llx hpa %llx ent %llx %d\n",
|
2007-12-13 23:50:52 +08:00
|
|
|
audit_msg, vcpu->arch.mmu.root_level,
|
2007-10-08 09:02:08 -04:00
|
|
|
va, gpa, hpa, ent,
|
|
|
|
is_shadow_present_pte(ent));
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-16 18:58:32 +02:00
|
|
|
else if (ent == shadow_notrap_nonpresent_pte
|
|
|
|
&& !is_error_hpa(hpa))
|
|
|
|
printk(KERN_ERR "audit: (%s) notrap shadow,"
|
|
|
|
" valid guest gva %lx\n", audit_msg, va);
|
2008-04-02 14:46:56 -05:00
|
|
|
kvm_release_pfn_clean(pfn);
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-16 18:58:32 +02:00
|
|
|
|
2007-01-05 16:36:56 -08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void audit_mappings(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2007-03-08 11:48:09 +02:00
|
|
|
unsigned i;
|
2007-01-05 16:36:56 -08:00
|
|
|
|
2007-12-13 23:50:52 +08:00
|
|
|
if (vcpu->arch.mmu.root_level == 4)
|
|
|
|
audit_mappings_page(vcpu, vcpu->arch.mmu.root_hpa, 0, 4);
|
2007-01-05 16:36:56 -08:00
|
|
|
else
|
|
|
|
for (i = 0; i < 4; ++i)
|
2007-12-13 23:50:52 +08:00
|
|
|
if (vcpu->arch.mmu.pae_root[i] & PT_PRESENT_MASK)
|
2007-01-05 16:36:56 -08:00
|
|
|
audit_mappings_page(vcpu,
|
2007-12-13 23:50:52 +08:00
|
|
|
vcpu->arch.mmu.pae_root[i],
|
2007-01-05 16:36:56 -08:00
|
|
|
i << 30,
|
|
|
|
2);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int count_rmaps(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
int nmaps = 0;
|
|
|
|
int i, j, k;
|
|
|
|
|
|
|
|
for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
|
|
|
|
struct kvm_memory_slot *m = &vcpu->kvm->memslots[i];
|
|
|
|
struct kvm_rmap_desc *d;
|
|
|
|
|
|
|
|
for (j = 0; j < m->npages; ++j) {
|
2007-09-27 14:11:22 +02:00
|
|
|
unsigned long *rmapp = &m->rmap[j];
|
2007-01-05 16:36:56 -08:00
|
|
|
|
2007-09-27 14:11:22 +02:00
|
|
|
if (!*rmapp)
|
2007-01-05 16:36:56 -08:00
|
|
|
continue;
|
2007-09-27 14:11:22 +02:00
|
|
|
if (!(*rmapp & 1)) {
|
2007-01-05 16:36:56 -08:00
|
|
|
++nmaps;
|
|
|
|
continue;
|
|
|
|
}
|
2007-09-27 14:11:22 +02:00
|
|
|
d = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
|
2007-01-05 16:36:56 -08:00
|
|
|
while (d) {
|
|
|
|
for (k = 0; k < RMAP_EXT; ++k)
|
2009-06-10 14:24:23 +03:00
|
|
|
if (d->sptes[k])
|
2007-01-05 16:36:56 -08:00
|
|
|
++nmaps;
|
|
|
|
else
|
|
|
|
break;
|
|
|
|
d = d->more;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return nmaps;
|
|
|
|
}
|
|
|
|
|
2009-06-10 12:27:04 -03:00
|
|
|
void inspect_spte_has_rmap(struct kvm *kvm, struct kvm_mmu_page *sp, u64 *sptep)
|
|
|
|
{
|
|
|
|
unsigned long *rmapp;
|
|
|
|
struct kvm_mmu_page *rev_sp;
|
|
|
|
gfn_t gfn;
|
|
|
|
|
|
|
|
if (*sptep & PT_WRITABLE_MASK) {
|
|
|
|
rev_sp = page_header(__pa(sptep));
|
|
|
|
gfn = rev_sp->gfns[sptep - rev_sp->spt];
|
|
|
|
|
|
|
|
if (!gfn_to_memslot(kvm, gfn)) {
|
|
|
|
if (!printk_ratelimit())
|
|
|
|
return;
|
|
|
|
printk(KERN_ERR "%s: no memslot for gfn %ld\n",
|
|
|
|
audit_msg, gfn);
|
|
|
|
printk(KERN_ERR "%s: index %ld of sp (gfn=%lx)\n",
|
|
|
|
audit_msg, sptep - rev_sp->spt,
|
|
|
|
rev_sp->gfn);
|
|
|
|
dump_stack();
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2009-06-10 12:27:08 -03:00
|
|
|
rmapp = gfn_to_rmap(kvm, rev_sp->gfns[sptep - rev_sp->spt],
|
|
|
|
is_large_pte(*sptep));
|
2009-06-10 12:27:04 -03:00
|
|
|
if (!*rmapp) {
|
|
|
|
if (!printk_ratelimit())
|
|
|
|
return;
|
|
|
|
printk(KERN_ERR "%s: no rmap for writable spte %llx\n",
|
|
|
|
audit_msg, *sptep);
|
|
|
|
dump_stack();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
void audit_writable_sptes_have_rmaps(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
mmu_spte_walk(vcpu, inspect_spte_has_rmap);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void check_writable_mappings_rmap(struct kvm_vcpu *vcpu)
|
2007-01-05 16:36:56 -08:00
|
|
|
{
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp;
|
2007-01-05 16:36:56 -08:00
|
|
|
int i;
|
|
|
|
|
2007-12-14 10:01:48 +08:00
|
|
|
list_for_each_entry(sp, &vcpu->kvm->arch.active_mmu_pages, link) {
|
2007-11-21 15:28:32 +02:00
|
|
|
u64 *pt = sp->spt;
|
2007-01-05 16:36:56 -08:00
|
|
|
|
2007-11-21 15:28:32 +02:00
|
|
|
if (sp->role.level != PT_PAGE_TABLE_LEVEL)
|
2007-01-05 16:36:56 -08:00
|
|
|
continue;
|
|
|
|
|
|
|
|
for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
|
|
|
|
u64 ent = pt[i];
|
|
|
|
|
|
|
|
if (!(ent & PT_PRESENT_MASK))
|
|
|
|
continue;
|
|
|
|
if (!(ent & PT_WRITABLE_MASK))
|
|
|
|
continue;
|
2009-06-10 12:27:04 -03:00
|
|
|
inspect_spte_has_rmap(vcpu->kvm, sp, &pt[i]);
|
2007-01-05 16:36:56 -08:00
|
|
|
}
|
|
|
|
}
|
2009-06-10 12:27:04 -03:00
|
|
|
return;
|
2007-01-05 16:36:56 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void audit_rmap(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2009-06-10 12:27:04 -03:00
|
|
|
check_writable_mappings_rmap(vcpu);
|
|
|
|
count_rmaps(vcpu);
|
2007-01-05 16:36:56 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void audit_write_protection(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2007-11-21 15:28:32 +02:00
|
|
|
struct kvm_mmu_page *sp;
|
2007-09-27 14:11:22 +02:00
|
|
|
struct kvm_memory_slot *slot;
|
|
|
|
unsigned long *rmapp;
|
2009-06-10 12:27:05 -03:00
|
|
|
u64 *spte;
|
2007-09-27 14:11:22 +02:00
|
|
|
gfn_t gfn;
|
2007-01-05 16:36:56 -08:00
|
|
|
|
2007-12-14 10:01:48 +08:00
|
|
|
list_for_each_entry(sp, &vcpu->kvm->arch.active_mmu_pages, link) {
|
2009-01-11 13:02:10 +02:00
|
|
|
if (sp->role.direct)
|
2007-01-05 16:36:56 -08:00
|
|
|
continue;
|
2009-06-10 12:27:05 -03:00
|
|
|
if (sp->unsync)
|
|
|
|
continue;
|
2007-01-05 16:36:56 -08:00
|
|
|
|
2007-11-21 15:28:32 +02:00
|
|
|
gfn = unalias_gfn(vcpu->kvm, sp->gfn);
|
2008-10-03 17:40:32 +03:00
|
|
|
slot = gfn_to_memslot_unaliased(vcpu->kvm, sp->gfn);
|
2007-09-27 14:11:22 +02:00
|
|
|
rmapp = &slot->rmap[gfn - slot->base_gfn];
|
2009-06-10 12:27:05 -03:00
|
|
|
|
|
|
|
spte = rmap_next(vcpu->kvm, rmapp, NULL);
|
|
|
|
while (spte) {
|
|
|
|
if (*spte & PT_WRITABLE_MASK)
|
|
|
|
printk(KERN_ERR "%s: (%s) shadow page has "
|
|
|
|
"writable mappings: gfn %lx role %x\n",
|
2008-03-03 12:59:56 -08:00
|
|
|
__func__, audit_msg, sp->gfn,
|
2007-11-21 15:28:32 +02:00
|
|
|
sp->role.word);
|
2009-06-10 12:27:05 -03:00
|
|
|
spte = rmap_next(vcpu->kvm, rmapp, spte);
|
|
|
|
}
|
2007-01-05 16:36:56 -08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg)
|
|
|
|
{
|
|
|
|
int olddbg = dbg;
|
|
|
|
|
|
|
|
dbg = 0;
|
|
|
|
audit_msg = msg;
|
|
|
|
audit_rmap(vcpu);
|
|
|
|
audit_write_protection(vcpu);
|
2009-06-10 12:27:07 -03:00
|
|
|
if (strcmp("pre pte write", audit_msg) != 0)
|
|
|
|
audit_mappings(vcpu);
|
2009-06-10 12:27:04 -03:00
|
|
|
audit_writable_sptes_have_rmaps(vcpu);
|
2007-01-05 16:36:56 -08:00
|
|
|
dbg = olddbg;
|
|
|
|
}
|
|
|
|
|
|
|
|
#endif
|