mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-01-08 14:23:19 +00:00
OK, this has the big virtio 1.0 implementation, as specified by OASIS.

On top of that is the major rework of lguest, to use PCI and virtio 1.0, to
double-check the implementation.

Then comes the inevitable fixes and cleanups from that work.

Thanks,
Rusty.

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "OK, this has the big virtio 1.0 implementation, as specified by OASIS.

  On top of that is the major rework of lguest, to use PCI and virtio
  1.0, to double-check the implementation.

  Then comes the inevitable fixes and cleanups from that work"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (80 commits)
  virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice.
  virtio_net: unconditionally define struct virtio_net_hdr_v1.
  tools/lguest: don't use legacy definitions for net device in example launcher.
  virtio: Don't expose legacy net features when VIRTIO_NET_NO_LEGACY defined.
  tools/lguest: use common error macros in the example launcher.
  tools/lguest: give virtqueues names for better error messages
  tools/lguest: more documentation and checking of virtio 1.0 compliance.
  lguest: don't look in console features to find emerg_wr.
  tools/lguest: don't start devices until DRIVER_OK status set.
  tools/lguest: handle indirect partway through chain.
  tools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over PCI)
  tools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over PCI)
  tools/lguest: rename virtio_pci_cfg_cap field to match spec.
  tools/lguest: fix features_accepted logic in example launcher.
  tools/lguest: handle device reset correctly in example launcher.
  virtual: Documentation: simplify and generalize paravirt_ops.txt
  lguest: remove NOTIFY call and eventfd facility.
  lguest: remove NOTIFY facility from demonstration launcher.
  lguest: use the PCI console device's emerg_wr for early boot messages.
  lguest: always put console in PCI slot #1.
  ...
commit 53861af9a1
@ -1,137 +0,0 @@
-Paravirt_ops on IA64
-====================
-                          21 May 2008, Isaku Yamahata <yamahata@valinux.co.jp>
-
-
-Introduction
-------------
-The aim of this documentation is to help with maintainability and/or to
-encourage people to use paravirt_ops/IA64.
-
-paravirt_ops (pv_ops in short) is a way for virtualization support of
-Linux kernel on x86. Several ways for virtualization support were
-proposed, paravirt_ops is the winner.
-On the other hand, now there are also several IA64 virtualization
-technologies like kvm/IA64, xen/IA64 and many other academic IA64
-hypervisors so that it is good to add generic virtualization
-infrastructure on Linux/IA64.
-
-
-What is paravirt_ops?
----------------------
-It has been developed on x86 as virtualization support via API, not ABI.
-It allows each hypervisor to override operations which are important for
-hypervisors at API level. And it allows a single kernel binary to run on
-all supported execution environments including native machine.
-Essentially paravirt_ops is a set of function pointers which represent
-operations corresponding to low level sensitive instructions and high
-level functionalities in various area. But one significant difference
-from usual function pointer table is that it allows optimization with
-binary patch. It is because some of these operations are very
-performance sensitive and indirect call overhead is not negligible.
-With binary patch, indirect C function call can be transformed into
-direct C function call or in-place execution to eliminate the overhead.
-
-Thus, operations of paravirt_ops are classified into three categories.
-- simple indirect call
-  These operations correspond to high level functionality so that the
-  overhead of indirect call isn't very important.
-
-- indirect call which allows optimization with binary patch
-  Usually these operations correspond to low level instructions. They
-  are called frequently and performance critical. So the overhead is
-  very important.
-
-- a set of macros for hand written assembly code
-  Hand written assembly codes (.S files) also need paravirtualization
-  because they include sensitive instructions or some of code paths in
-  them are very performance critical.
-
-
-The relation to the IA64 machine vector
----------------------------------------
-Linux/IA64 has the IA64 machine vector functionality which allows the
-kernel to switch implementations (e.g. initialization, ipi, dma api...)
-depending on executing platform.
-We can replace some implementations very easily defining a new machine
-vector. Thus another approach for virtualization support would be
-enhancing the machine vector functionality.
-But paravirt_ops approach was taken because
-- virtualization support needs wider support than machine vector does.
-  e.g. low level instruction paravirtualization. It must be
-       initialized very early before platform detection.
-
-- virtualization support needs more functionality like binary patch.
-  Probably the calling overhead might not be very large compared to the
-  emulation overhead of virtualization. However in the native case, the
-  overhead should be eliminated completely.
-  A single kernel binary should run on each environment including native,
-  and the overhead of paravirt_ops on native environment should be as
-  small as possible.
-
-- for full virtualization technology, e.g. KVM/IA64 or
-  Xen/IA64 HVM domain, the result would be
-  (the emulated platform machine vector. probably dig) + (pv_ops).
-  This means that the virtualization support layer should be under
-  the machine vector layer.
-
-Possibly it might be better to move some function pointers from
-paravirt_ops to machine vector. In fact, Xen domU case utilizes both
-pv_ops and machine vector.
-
-
-IA64 paravirt_ops
------------------
-In this section, the concrete paravirt_ops will be discussed.
-Because of the architecture difference between ia64 and x86, the
-resulting set of functions is very different from x86 pv_ops.
-
-- C function pointer tables
-  They are not very performance critical so that simple C indirect
-  function call is acceptable. The following structures are defined at
-  this moment. For details see linux/include/asm-ia64/paravirt.h
-    - struct pv_info
-      This structure describes the execution environment.
-    - struct pv_init_ops
-      This structure describes the various initialization hooks.
-    - struct pv_iosapic_ops
-      This structure describes hooks to iosapic operations.
-    - struct pv_irq_ops
-      This structure describes hooks to irq related operations
-    - struct pv_time_op
-      This structure describes hooks to steal time accounting.
-
-- a set of indirect calls which need optimization
-  Currently this class of functions correspond to a subset of IA64
-  intrinsics. At this moment the optimization with binary patch isn't
-  implemented yet.
-  struct pv_cpu_op is defined. For details see
-  linux/include/asm-ia64/paravirt_privop.h
-  Mostly they correspond to ia64 intrinsics 1-to-1.
-  Caveat: Now they are defined as C indirect function pointers, but in
-          order to support binary patch optimization, they will be changed
-          using GCC extended inline assembly code.
-
-- a set of macros for hand written assembly code (.S files)
-  For maintenance purpose, the taken approach for .S files is single
-  source code and compile multiple times with different macros definitions.
-  Each pv_ops instance must define those macros to compile.
-  The important thing here is that sensitive, but non-privileged
-  instructions must be paravirtualized and that some privileged
-  instructions also need paravirtualization for reasonable performance.
-  Developers who modify .S files must be aware of that. At this moment
-  an easy checker is implemented to detect paravirtualization breakage.
-  But it doesn't cover all the cases.
-
-  Sometimes this set of macros is called pv_cpu_asm_op. But there is no
-  corresponding structure in the source code.
-  Those macros mostly 1:1 correspond to a subset of privileged
-  instructions. See linux/include/asm-ia64/native/inst.h.
-  And some functions written in assembly also need to be overrided so
-  that each pv_ops instance have to define some macros. Again see
-  linux/include/asm-ia64/native/inst.h.
-
-
-Those structures must be initialized very early before start_kernel.
-Probably initialized in head.S using multi entry point or some other trick.
-For native case implementation see linux/arch/ia64/kernel/paravirt.c.
@ -2,6 +2,9 @@ Virtualization support in the Linux kernel.
 
 00-INDEX
 	- this file.
+
+paravirt_ops.txt
+	- Describes the Linux kernel pv_ops to support different hypervisors
 kvm/
 	- Kernel Virtual Machine.  See also http://linux-kvm.org
 uml/
32
Documentation/virtual/paravirt_ops.txt
Normal file
@ -0,0 +1,32 @@
+Paravirt_ops
+============
+
+Linux provides support for different hypervisor virtualization technologies.
+Historically different binary kernels would be required in order to support
+different hypervisors, this restriction was removed with pv_ops.
+Linux pv_ops is a virtualization API which enables support for different
+hypervisors. It allows each hypervisor to override critical operations and
+allows a single kernel binary to run on all supported execution environments
+including native machine -- without any hypervisors.
+
+pv_ops provides a set of function pointers which represent operations
+corresponding to low level critical instructions and high level
+functionalities in various areas. pv-ops allows for optimizations at run
+time by enabling binary patching of the low-ops critical operations
+at boot time.
+
+pv_ops operations are classified into three categories:
+
+- simple indirect call
+  These operations correspond to high level functionality where it is
+  known that the overhead of indirect call isn't very important.
+
+- indirect call which allows optimization with binary patch
+  Usually these operations correspond to low level critical instructions. They
+  are called frequently and are performance critical. The overhead is
+  very important.
+
+- a set of macros for hand written assembly code
+  Hand written assembly codes (.S files) also need paravirtualization
+  because they include sensitive instructions or some of code paths in
+  them are very performance critical.
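Note (illustration only, not part of this commit): the function-pointer idea described in paravirt_ops.txt above can be sketched in plain user-space C. This is a minimal sketch under stated assumptions: every `demo_*` name, the flag word, and the toy "hypervisor" back end are hypothetical, and the real kernel structures differ. The sketch shows only the "simple indirect call" category; it does not attempt the binary patching that the text mentions.

```c
#include <assert.h>

/* Hypothetical, much-reduced ops table; not the kernel's real layout. */
struct demo_pv_ops {
	const char *name;
	unsigned long (*read_flags)(void);
	void (*write_flags)(unsigned long flags);
};

/* "Native" back end: operates directly on the (simulated) hardware state. */
static unsigned long demo_native_flags;

static unsigned long demo_native_read_flags(void)
{
	return demo_native_flags;
}

static void demo_native_write_flags(unsigned long flags)
{
	demo_native_flags = flags;
}

/* Toy hypervisor back end: keeps a shadow copy and counts the calls. */
static unsigned long demo_hv_shadow_flags;
static unsigned int demo_hv_calls;

static unsigned long demo_hv_read_flags(void)
{
	demo_hv_calls++;
	return demo_hv_shadow_flags;
}

static void demo_hv_write_flags(unsigned long flags)
{
	demo_hv_calls++;
	demo_hv_shadow_flags = flags;
}

/* A single binary starts with the native operations installed. */
static struct demo_pv_ops demo_ops = {
	.name = "native",
	.read_flags = demo_native_read_flags,
	.write_flags = demo_native_write_flags,
};

/* A hypervisor-specific init overrides the operations it cares about. */
static void demo_hypervisor_init(void)
{
	demo_ops.name = "demo-hv";
	demo_ops.read_flags = demo_hv_read_flags;
	demo_ops.write_flags = demo_hv_write_flags;
}

/* Callers go through the table and never name a back end directly. */
static unsigned long demo_toggle_flag(unsigned long bit)
{
	unsigned long flags = demo_ops.read_flags();

	demo_ops.write_flags(flags ^ bit);
	return demo_ops.read_flags();
}
```

Calling `demo_toggle_flag()` before and after `demo_hypervisor_init()` exercises the same caller code against two back ends, which is the property the document describes as one kernel binary running on several execution environments.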
@ -7302,7 +7302,7 @@ M: Alok Kataria <akataria@vmware.com>
 M:	Rusty Russell <rusty@rustcorp.com.au>
 L:	virtualization@lists.linux-foundation.org
 S:	Supported
-F:	Documentation/ia64/paravirt_ops.txt
+F:	Documentation/virtual/paravirt_ops.txt
 F:	arch/*/kernel/paravirt*
 F:	arch/*/include/asm/paravirt.h
 
@ -1,35 +0,0 @@
-/* ASB2305 PCI I/O mapping handler
- *
- * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public Licence
- * as published by the Free Software Foundation; either version
- * 2 of the Licence, or (at your option) any later version.
- */
-#include <linux/pci.h>
-#include <linux/module.h>
-
-/*
- * Create a virtual mapping cookie for a PCI BAR (memory or IO)
- */
-void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
-{
-	resource_size_t start = pci_resource_start(dev, bar);
-	resource_size_t len = pci_resource_len(dev, bar);
-	unsigned long flags = pci_resource_flags(dev, bar);
-
-	if (!len || !start)
-		return NULL;
-
-	if ((flags & IORESOURCE_IO) || (flags & IORESOURCE_MEM)) {
-		if (flags & IORESOURCE_CACHEABLE && !(flags & IORESOURCE_IO))
-			return ioremap(start, len);
-		else
-			return ioremap_nocache(start, len);
-	}
-
-	return NULL;
-}
-EXPORT_SYMBOL(pci_iomap);
@ -16,6 +16,7 @@
 struct zpci_iomap_entry {
 	u32 fh;
 	u8 bar;
+	u16 count;
 };
 
 extern struct zpci_iomap_entry *zpci_iomap_start;
@ -259,7 +259,10 @@ void __iowrite64_copy(void __iomem *to, const void *from, size_t count)
 }
 
 /* Create a virtual mapping cookie for a PCI BAR */
-void __iomem *pci_iomap(struct pci_dev *pdev, int bar, unsigned long max)
+void __iomem *pci_iomap_range(struct pci_dev *pdev,
+			      int bar,
+			      unsigned long offset,
+			      unsigned long max)
 {
 	struct zpci_dev *zdev = get_zdev(pdev);
 	u64 addr;
@ -270,14 +273,27 @@ void __iomem *pci_iomap(struct pci_dev *pdev, int bar, unsigned long max)
 
 	idx = zdev->bars[bar].map_idx;
 	spin_lock(&zpci_iomap_lock);
-	zpci_iomap_start[idx].fh = zdev->fh;
-	zpci_iomap_start[idx].bar = bar;
+	if (zpci_iomap_start[idx].count++) {
+		BUG_ON(zpci_iomap_start[idx].fh != zdev->fh ||
+		       zpci_iomap_start[idx].bar != bar);
+	} else {
+		zpci_iomap_start[idx].fh = zdev->fh;
+		zpci_iomap_start[idx].bar = bar;
+	}
+	/* Detect overrun */
+	BUG_ON(!zpci_iomap_start[idx].count);
 	spin_unlock(&zpci_iomap_lock);
 
 	addr = ZPCI_IOMAP_ADDR_BASE | ((u64) idx << 48);
-	return (void __iomem *) addr;
+	return (void __iomem *) addr + offset;
 }
-EXPORT_SYMBOL_GPL(pci_iomap);
+EXPORT_SYMBOL_GPL(pci_iomap_range);
+
+void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
+{
+	return pci_iomap_range(dev, bar, 0, maxlen);
+}
+EXPORT_SYMBOL(pci_iomap);
 
 void pci_iounmap(struct pci_dev *pdev, void __iomem *addr)
 {
@ -285,8 +301,12 @@ void pci_iounmap(struct pci_dev *pdev, void __iomem *addr)
 
 	idx = (((__force u64) addr) & ~ZPCI_IOMAP_ADDR_BASE) >> 48;
 	spin_lock(&zpci_iomap_lock);
-	zpci_iomap_start[idx].fh = 0;
-	zpci_iomap_start[idx].bar = 0;
+	/* Detect underrun */
+	BUG_ON(!zpci_iomap_start[idx].count);
+	if (!--zpci_iomap_start[idx].count) {
+		zpci_iomap_start[idx].fh = 0;
+		zpci_iomap_start[idx].bar = 0;
+	}
 	spin_unlock(&zpci_iomap_lock);
 }
 EXPORT_SYMBOL_GPL(pci_iounmap);
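Note (illustration only, not part of this commit): the hunks above turn each iomap table entry into a reference-counted slot, so that several mappings of the same BAR can share one entry and the entry is cleared only when the last mapping goes away. A minimal user-space sketch of that counting scheme follows. The assumptions are loud ones: all `demo_*` names are hypothetical, the spinlock is omitted, and error returns stand in for the kernel's `BUG_ON()` checks.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SLOTS 4

/* Same three fields as the entry in the hunk above, with local names. */
struct demo_iomap_entry {
	uint32_t fh;
	uint8_t bar;
	uint16_t count;
};

static struct demo_iomap_entry demo_table[DEMO_SLOTS];

/*
 * Take a reference on slot idx for (fh, bar).
 * Returns 0 on success, -1 on a bad index, an owner mismatch, or when the
 * 16-bit counter would overflow (the "overrun" case).
 */
static int demo_map(unsigned int idx, uint32_t fh, uint8_t bar)
{
	struct demo_iomap_entry *e;

	if (idx >= DEMO_SLOTS)
		return -1;
	e = &demo_table[idx];

	if (e->count == UINT16_MAX)
		return -1;

	if (e->count) {
		/* Slot already in use: it must describe the same mapping. */
		if (e->fh != fh || e->bar != bar)
			return -1;
	} else {
		/* First user fills in the slot. */
		e->fh = fh;
		e->bar = bar;
	}
	e->count++;
	return 0;
}

/*
 * Drop a reference on slot idx.
 * Returns 0 on success, -1 on a bad index or when the slot is not mapped
 * (the "underrun" case). The slot is cleared when the last user leaves.
 */
static int demo_unmap(unsigned int idx)
{
	struct demo_iomap_entry *e;

	if (idx >= DEMO_SLOTS)
		return -1;
	e = &demo_table[idx];

	if (!e->count)
		return -1;

	if (--e->count == 0) {
		e->fh = 0;
		e->bar = 0;
	}
	return 0;
}
```

Returning an error instead of halting is a choice made for the sketch so that the failure paths can be exercised; the commit itself treats these conditions as fatal.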
@ -16,7 +16,6 @@
 #define LHCALL_SET_PTE		14
 #define LHCALL_SET_PGD		15
 #define LHCALL_LOAD_TLS		16
-#define LHCALL_NOTIFY		17
 #define LHCALL_LOAD_GDT_ENTRY	18
 #define LHCALL_SEND_INTERRUPTS	19
 
@ -56,6 +56,9 @@
 #include <linux/virtio_console.h>
 #include <linux/pm.h>
 #include <linux/export.h>
+#include <linux/pci.h>
+#include <linux/virtio_pci.h>
+#include <asm/acpi.h>
 #include <asm/apic.h>
 #include <asm/lguest.h>
 #include <asm/paravirt.h>
@ -71,6 +74,8 @@
 #include <asm/stackprotector.h>
 #include <asm/reboot.h>		/* for struct machine_ops */
 #include <asm/kvm_para.h>
+#include <asm/pci_x86.h>
+#include <asm/pci-direct.h>
 
 /*G:010
  * Welcome to the Guest!
@ -831,6 +836,24 @@ static struct irq_chip lguest_irq_controller = {
 	.irq_unmask	= enable_lguest_irq,
 };
 
+static int lguest_enable_irq(struct pci_dev *dev)
+{
+	u8 line = 0;
+
+	/* We literally use the PCI interrupt line as the irq number. */
+	pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &line);
+	irq_set_chip_and_handler_name(line, &lguest_irq_controller,
+				      handle_level_irq, "level");
+	dev->irq = line;
+	return 0;
+}
+
+/* We don't do hotplug PCI, so this shouldn't be called. */
+static void lguest_disable_irq(struct pci_dev *dev)
+{
+	WARN_ON(1);
+}
+
 /*
  * This sets up the Interrupt Descriptor Table (IDT) entry for each hardware
  * interrupt (except 128, which is used for system calls), and then tells the
@ -1181,25 +1204,136 @@ static __init char *lguest_memory_setup(void)
 	return "LGUEST";
 }
 
+/* Offset within PCI config space of BAR access capability. */
+static int console_cfg_offset = 0;
+static int console_access_cap;
+
+/* Set up so that we access off in bar0 (on bus 0, device 1, function 0) */
+static void set_cfg_window(u32 cfg_offset, u32 off)
+{
+	write_pci_config_byte(0, 1, 0,
+			      cfg_offset + offsetof(struct virtio_pci_cap, bar),
+			      0);
+	write_pci_config(0, 1, 0,
+			 cfg_offset + offsetof(struct virtio_pci_cap, length),
+			 4);
+	write_pci_config(0, 1, 0,
+			 cfg_offset + offsetof(struct virtio_pci_cap, offset),
+			 off);
+}
+
+static void write_bar_via_cfg(u32 cfg_offset, u32 off, u32 val)
+{
+	/*
+	 * We could set this up once, then leave it; nothing else in the *
+	 * kernel should touch these registers.  But if it went wrong, that
+	 * would be a horrible bug to find.
+	 */
+	set_cfg_window(cfg_offset, off);
+	write_pci_config(0, 1, 0,
+			 cfg_offset + sizeof(struct virtio_pci_cap), val);
+}
+
+static void probe_pci_console(void)
+{
+	u8 cap, common_cap = 0, device_cap = 0;
+	/* Offset within BAR0 */
+	u32 device_offset;
+	u32 device_len;
+
+	/* Avoid recursive printk into here. */
+	console_cfg_offset = -1;
+
+	if (!early_pci_allowed()) {
+		printk(KERN_ERR "lguest: early PCI access not allowed!\n");
+		return;
+	}
+
+	/* We expect a console PCI device at BUS0, slot 1. */
+	if (read_pci_config(0, 1, 0, 0) != 0x10431AF4) {
+		printk(KERN_ERR "lguest: PCI device is %#x!\n",
+		       read_pci_config(0, 1, 0, 0));
+		return;
+	}
+
+	/* Find the capabilities we need (must be in bar0) */
+	cap = read_pci_config_byte(0, 1, 0, PCI_CAPABILITY_LIST);
+	while (cap) {
+		u8 vndr = read_pci_config_byte(0, 1, 0, cap);
+		if (vndr == PCI_CAP_ID_VNDR) {
+			u8 type, bar;
+			u32 offset, length;
+
+			type = read_pci_config_byte(0, 1, 0,
+			    cap + offsetof(struct virtio_pci_cap, cfg_type));
+			bar = read_pci_config_byte(0, 1, 0,
+			    cap + offsetof(struct virtio_pci_cap, bar));
+			offset = read_pci_config(0, 1, 0,
+			    cap + offsetof(struct virtio_pci_cap, offset));
+			length = read_pci_config(0, 1, 0,
+			    cap + offsetof(struct virtio_pci_cap, length));
+
+			switch (type) {
+			case VIRTIO_PCI_CAP_DEVICE_CFG:
+				if (bar == 0) {
+					device_cap = cap;
+					device_offset = offset;
+					device_len = length;
+				}
+				break;
+			case VIRTIO_PCI_CAP_PCI_CFG:
+				console_access_cap = cap;
+				break;
+			}
+		}
+		cap = read_pci_config_byte(0, 1, 0, cap + PCI_CAP_LIST_NEXT);
+	}
+	if (!device_cap || !console_access_cap) {
+		printk(KERN_ERR "lguest: No caps (%u/%u/%u) in console!\n",
+		       common_cap, device_cap, console_access_cap);
+		return;
+	}
+
+	/*
+	 * Note that we can't check features, until we've set the DRIVER
+	 * status bit.  We don't want to do that until we have a real driver,
+	 * so we just check that the device-specific config has room for
+	 * emerg_wr.  If it doesn't support VIRTIO_CONSOLE_F_EMERG_WRITE
+	 * it should ignore the access.
+	 */
+	if (device_len < (offsetof(struct virtio_console_config, emerg_wr)
+			  + sizeof(u32))) {
+		printk(KERN_ERR "lguest: console missing emerg_wr field\n");
+		return;
+	}
+
+	console_cfg_offset = device_offset;
+	printk(KERN_INFO "lguest: Console via virtio-pci emerg_wr\n");
+}
+
 /*
  * We will eventually use the virtio console device to produce console output,
- * but before that is set up we use LHCALL_NOTIFY on normal memory to produce
- * console output.
+ * but before that is set up we use the virtio PCI console's backdoor mmio
+ * access and the "emergency" write facility (which is legal even before the
+ * device is configured).
  */
 static __init int early_put_chars(u32 vtermno, const char *buf, int count)
 {
-	char scratch[17];
-	unsigned int len = count;
+	/* If we couldn't find PCI console, forget it. */
+	if (console_cfg_offset < 0)
+		return count;
 
-	/* We use a nul-terminated string, so we make a copy.  Icky, huh? */
-	if (len > sizeof(scratch) - 1)
-		len = sizeof(scratch) - 1;
-	scratch[len] = '\0';
-	memcpy(scratch, buf, len);
-	hcall(LHCALL_NOTIFY, __pa(scratch), 0, 0, 0);
+	if (unlikely(!console_cfg_offset)) {
+		probe_pci_console();
+		if (console_cfg_offset < 0)
+			return count;
+	}
 
-	/* This routine returns the number of bytes actually written. */
-	return len;
+	write_bar_via_cfg(console_access_cap,
+			  console_cfg_offset
+			  + offsetof(struct virtio_console_config, emerg_wr),
+			  buf[0]);
+	return 1;
 }
 
 /*
@ -1399,14 +1533,6 @@ __init void lguest_init(void)
 	/* Hook in our special panic hypercall code. */
 	atomic_notifier_chain_register(&panic_notifier_list, &paniced);
 
-	/*
-	 * The IDE code spends about 3 seconds probing for disks: if we reserve
-	 * all the I/O ports up front it can't get them and so doesn't probe.
-	 * Other device drivers are similar (but less severe).  This cuts the
-	 * kernel boot time on my machine from 4.1 seconds to 0.45 seconds.
-	 */
-	paravirt_disable_iospace();
-
 	/*
 	 * This is messy CPU setup stuff which the native boot code does before
 	 * start_kernel, so we have to do, too:
@ -1436,6 +1562,13 @@ __init void lguest_init(void)
 	/* Register our very early console. */
 	virtio_cons_early_init(early_put_chars);
 
+	/* Don't let ACPI try to control our PCI interrupts. */
+	disable_acpi();
+
+	/* We control them ourselves, by overriding these two hooks. */
+	pcibios_enable_irq = lguest_enable_irq;
+	pcibios_disable_irq = lguest_disable_irq;
+
 	/*
 	 * Last of all, we set the power management poweroff hook to point to
 	 * the Guest routine to power off, and the reboot hook to our restart
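Note (illustration only, not part of this commit): `probe_pci_console()` above walks the PCI capability list looking for vendor-specific capabilities and then inspects their virtio `cfg_type`. The walk can be sketched in user space over a simulated 256-byte configuration space. Assumptions, stated plainly: every `demo_*` and `DEMO_*` name is hypothetical; the numeric constants are local copies of the standard PCI and virtio 1.0 values as I understand them and should be checked against the real headers; the struct layout follows `struct virtio_pci_cap` from the virtio 1.0 specification; and the sketch returns the first match, whereas the kernel loop keeps the last one it sees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of standard PCI values (assumed; check the real headers). */
#define DEMO_PCI_CAPABILITY_LIST 0x34 /* config offset of first capability */
#define DEMO_PCI_CAP_LIST_NEXT   1    /* offset of "next" inside a capability */
#define DEMO_PCI_CAP_ID_MSI      0x05 /* used here only as a non-vendor ID */
#define DEMO_PCI_CAP_ID_VNDR     0x09 /* vendor-specific capability ID */

/* Local copies of the virtio 1.0 cfg_type values used by the hunk above. */
#define DEMO_VIRTIO_PCI_CAP_DEVICE_CFG 4
#define DEMO_VIRTIO_PCI_CAP_PCI_CFG    5

/* Assumed layout, following struct virtio_pci_cap in the virtio 1.0 spec. */
struct demo_virtio_pci_cap {
	uint8_t cap_vndr;
	uint8_t cap_next;
	uint8_t cap_len;
	uint8_t cfg_type;
	uint8_t bar;
	uint8_t padding[3];
	uint32_t offset;
	uint32_t length;
};

#define DEMO_CFG_SIZE 256

/* Simulated config space, standing in for the read_pci_config*() helpers. */
static uint8_t demo_cfg[DEMO_CFG_SIZE];

/* Write a capability header at offset `at` in the simulated space. */
static void demo_add_cap(uint8_t at, uint8_t next, uint8_t id, uint8_t cfg_type)
{
	demo_cfg[at + offsetof(struct demo_virtio_pci_cap, cap_vndr)] = id;
	demo_cfg[at + offsetof(struct demo_virtio_pci_cap, cap_next)] = next;
	demo_cfg[at + offsetof(struct demo_virtio_pci_cap, cfg_type)] = cfg_type;
}

/* Build a three-entry list: one non-vendor cap, then two virtio caps. */
static void demo_build_console_cfg(void)
{
	memset(demo_cfg, 0, sizeof(demo_cfg));
	demo_cfg[DEMO_PCI_CAPABILITY_LIST] = 0x40;
	demo_add_cap(0x40, 0x50, DEMO_PCI_CAP_ID_MSI, 0);
	demo_add_cap(0x50, 0x60, DEMO_PCI_CAP_ID_VNDR,
		     DEMO_VIRTIO_PCI_CAP_DEVICE_CFG);
	demo_add_cap(0x60, 0x00, DEMO_PCI_CAP_ID_VNDR,
		     DEMO_VIRTIO_PCI_CAP_PCI_CFG);
}

/*
 * Return the config offset of the first vendor capability whose cfg_type
 * matches, or 0 if there is none. The guard bounds the walk in case the
 * "next" pointers form a cycle.
 */
static uint8_t demo_find_virtio_cap(uint8_t cfg_type)
{
	uint8_t cap = demo_cfg[DEMO_PCI_CAPABILITY_LIST];
	unsigned int guard = DEMO_CFG_SIZE;

	while (cap && guard--) {
		/* Stop if a full capability would not fit in the space. */
		if (cap > DEMO_CFG_SIZE - sizeof(struct demo_virtio_pci_cap))
			break;
		if (demo_cfg[cap] == DEMO_PCI_CAP_ID_VNDR &&
		    demo_cfg[cap + offsetof(struct demo_virtio_pci_cap,
					    cfg_type)] == cfg_type)
			return cap;
		cap = demo_cfg[cap + DEMO_PCI_CAP_LIST_NEXT];
	}
	return 0;
}
```

After `demo_build_console_cfg()`, looking up the device-config and PCI-config types yields the two vendor capability offsets, and a type that is absent from the list yields 0, which mirrors the "No caps" failure path in the hunk.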
@ -28,8 +28,7 @@ struct virtio_blk_vq {
 	char name[VQ_NAME_LEN];
 } ____cacheline_aligned_in_smp;
 
-struct virtio_blk
-{
+struct virtio_blk {
 	struct virtio_device *vdev;
 
 	/* The disk structure for the kernel. */
@ -52,8 +51,7 @@ struct virtio_blk
 	struct virtio_blk_vq *vqs;
 };
 
-struct virtblk_req
-{
+struct virtblk_req {
 	struct request *req;
 	struct virtio_blk_outhdr out_hdr;
 	struct virtio_scsi_inhdr in_hdr;
@ -575,6 +573,12 @@ static int virtblk_probe(struct virtio_device *vdev)
 	u16 min_io_size;
 	u8 physical_block_exp, alignment_offset;
 
+	if (!vdev->config->get) {
+		dev_err(&vdev->dev, "%s failure: config access disabled\n",
+			__func__);
+		return -EINVAL;
+	}
+
 	err = ida_simple_get(&vd_index_ida, 0, minor_to_index(1 << MINORBITS),
 			     GFP_KERNEL);
 	if (err < 0)
@ -1986,7 +1986,10 @@ static int virtcons_probe(struct virtio_device *vdev)
 	bool multiport;
 	bool early = early_put_chars != NULL;
 
-	if (!vdev->config->get) {
+	/* We only need a config space if features are offered */
+	if (!vdev->config->get &&
+	    (virtio_has_feature(vdev, VIRTIO_CONSOLE_F_SIZE)
+	     || virtio_has_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT))) {
 		dev_err(&vdev->dev, "%s failure: config access disabled\n",
 			__func__);
 		return -EINVAL;
@ -1,6 +1,3 @@
-# Guest requires the device configuration and probing code.
-obj-$(CONFIG_LGUEST_GUEST) += lguest_device.o
-
 # Host requires the other files, which can be a module.
 obj-$(CONFIG_LGUEST)	+= lg.o
 lg-y = core.o hypercalls.o page_tables.o interrupts_and_traps.o \
@ -208,6 +208,14 @@ void __lgwrite(struct lg_cpu *cpu, unsigned long addr, const void *b,
  */
 int run_guest(struct lg_cpu *cpu, unsigned long __user *user)
 {
+	/* If the launcher asked for a register with LHREQ_GETREG */
+	if (cpu->reg_read) {
+		if (put_user(*cpu->reg_read, user))
+			return -EFAULT;
+		cpu->reg_read = NULL;
+		return sizeof(*cpu->reg_read);
+	}
+
 	/* We stop running once the Guest is dead. */
 	while (!cpu->lg->dead) {
 		unsigned int irq;
@ -217,21 +225,12 @@ int run_guest(struct lg_cpu *cpu, unsigned long __user *user)
 		if (cpu->hcall)
 			do_hypercalls(cpu);
 
-		/*
-		 * It's possible the Guest did a NOTIFY hypercall to the
-		 * Launcher.
-		 */
-		if (cpu->pending_notify) {
-			/*
-			 * Does it just needs to write to a registered
-			 * eventfd (ie. the appropriate virtqueue thread)?
-			 */
-			if (!send_notify_to_eventfd(cpu)) {
-				/* OK, we tell the main Launcher. */
-				if (put_user(cpu->pending_notify, user))
-					return -EFAULT;
-				return sizeof(cpu->pending_notify);
-			}
+		/* Do we have to tell the Launcher about a trap? */
+		if (cpu->pending.trap) {
+			if (copy_to_user(user, &cpu->pending,
+					 sizeof(cpu->pending)))
+				return -EFAULT;
+			return sizeof(cpu->pending);
 		}
 
 		/*
@ -117,9 +117,6 @@ static void do_hcall(struct lg_cpu *cpu, struct hcall_args *args)
 		/* Similarly, this sets the halted flag for run_guest(). */
 		cpu->halted = 1;
 		break;
-	case LHCALL_NOTIFY:
-		cpu->pending_notify = args->arg1;
-		break;
 	default:
 		/* It should be an architecture-specific hypercall. */
 		if (lguest_arch_do_hcall(cpu, args))
@ -189,7 +186,7 @@ static void do_async_hcalls(struct lg_cpu *cpu)
 		 * Stop doing hypercalls if they want to notify the Launcher:
 		 * it needs to service this first.
 		 */
-		if (cpu->pending_notify)
+		if (cpu->pending.trap)
 			break;
 	}
 }
@ -280,7 +277,7 @@ void do_hypercalls(struct lg_cpu *cpu)
 	 * NOTIFY to the Launcher, we want to return now.  Otherwise we do
 	 * the hypercall.
 	 */
-	if (!cpu->pending_notify) {
+	if (!cpu->pending.trap) {
 		do_hcall(cpu, cpu->hcall);
 		/*
 		 * Tricky point: we reset the hcall pointer to mark the
@ -50,7 +50,10 @@ struct lg_cpu {
 	/* Bitmap of what has changed: see CHANGED_* above. */
 	int changed;
 
-	unsigned long pending_notify; /* pfn from LHCALL_NOTIFY */
+	/* Pending operation. */
+	struct lguest_pending pending;
+
+	unsigned long *reg_read; /* register from LHREQ_GETREG */
 
 	/* At end of a page shared mapped over lguest_pages in guest. */
 	unsigned long regs_page;
@ -78,24 +81,18 @@ struct lg_cpu {
 	struct lg_cpu_arch arch;
 };
 
-struct lg_eventfd {
-	unsigned long addr;
-	struct eventfd_ctx *event;
-};
-
-struct lg_eventfd_map {
-	unsigned int num;
-	struct lg_eventfd map[];
-};
-
 /* The private info the thread maintains about the guest. */
 struct lguest {
 	struct lguest_data __user *lguest_data;
 	struct lg_cpu cpus[NR_CPUS];
 	unsigned int nr_cpus;
 
+	/* Valid guest memory pages must be < this. */
 	u32 pfn_limit;
 
+	/* Device memory is >= pfn_limit and < device_limit. */
+	u32 device_limit;
+
 	/*
 	 * This provides the offset to the base of guest-physical memory in the
 	 * Launcher.
@ -110,8 +107,6 @@ struct lguest {
 	unsigned int stack_pages;
 	u32 tsc_khz;
 
-	struct lg_eventfd_map *eventfds;
-
 	/* Dead? */
 	const char *dead;
 };
@ -197,8 +192,10 @@ void guest_pagetable_flush_user(struct lg_cpu *cpu);
 void guest_set_pte(struct lg_cpu *cpu, unsigned long gpgdir,
 		   unsigned long vaddr, pte_t val);
 void map_switcher_in_guest(struct lg_cpu *cpu, struct lguest_pages *pages);
-bool demand_page(struct lg_cpu *cpu, unsigned long cr2, int errcode);
+bool demand_page(struct lg_cpu *cpu, unsigned long cr2, int errcode,
+		 unsigned long *iomem);
 void pin_page(struct lg_cpu *cpu, unsigned long vaddr);
+bool __guest_pa(struct lg_cpu *cpu, unsigned long vaddr, unsigned long *paddr);
 unsigned long guest_pa(struct lg_cpu *cpu, unsigned long vaddr);
 void page_table_guest_data_init(struct lg_cpu *cpu);
 
@ -210,6 +207,7 @@ void lguest_arch_handle_trap(struct lg_cpu *cpu);
 int lguest_arch_init_hypercalls(struct lg_cpu *cpu);
 int lguest_arch_do_hcall(struct lg_cpu *cpu, struct hcall_args *args);
 void lguest_arch_setup_regs(struct lg_cpu *cpu, unsigned long start);
+unsigned long *lguest_arch_regptr(struct lg_cpu *cpu, size_t reg_off, bool any);
 
 /* <arch>/switcher.S: */
 extern char start_switcher_text[], end_switcher_text[], switch_to_guest[];
@ -1,540 +0,0 @@
|
||||
/*P:050
|
||||
* Lguest guests use a very simple method to describe devices. It's a
|
||||
* series of device descriptors contained just above the top of normal Guest
|
||||
* memory.
|
||||
*
|
||||
* We use the standard "virtio" device infrastructure, which provides us with a
|
||||
* console, a network and a block driver. Each one expects some configuration
|
||||
* information and a "virtqueue" or two to send and receive data.
|
||||
:*/
|
||||
#include <linux/init.h>
|
||||
#include <linux/bootmem.h>
|
||||
#include <linux/lguest_launcher.h>
|
||||
#include <linux/virtio.h>
|
||||
#include <linux/virtio_config.h>
|
||||
#include <linux/interrupt.h>
|
||||
#include <linux/virtio_ring.h>
|
||||
#include <linux/err.h>
|
||||
#include <linux/export.h>
|
||||
#include <linux/slab.h>
|
||||
#include <asm/io.h>
|
||||
#include <asm/paravirt.h>
|
||||
#include <asm/lguest_hcall.h>
|
||||
|
||||
/* The pointer to our (page) of device descriptions. */
|
||||
static void *lguest_devices;
|
||||
|
||||
/*
|
||||
* For Guests, device memory can be used as normal memory, so we cast away the
|
||||
* __iomem to quieten sparse.
|
||||
*/
|
||||
static inline void *lguest_map(unsigned long phys_addr, unsigned long pages)
|
||||
{
|
||||
return (__force void *)ioremap_cache(phys_addr, PAGE_SIZE*pages);
|
||||
}
|
||||
|
||||
static inline void lguest_unmap(void *addr)
|
||||
{
|
||||
iounmap((__force void __iomem *)addr);
|
||||
}
|
||||
|
||||
/*D:100
|
||||
* Each lguest device is just a virtio device plus a pointer to its entry
|
||||
* in the lguest_devices page.
|
||||
*/
|
||||
struct lguest_device {
|
||||
struct virtio_device vdev;
|
||||
|
||||
/* The entry in the lguest_devices page for this device. */
|
||||
struct lguest_device_desc *desc;
|
||||
};
|
||||
|
||||
/*
|
||||
* Since the virtio infrastructure hands us a pointer to the virtio_device all
|
||||
* the time, it helps to have a curt macro to get a pointer to the struct
|
||||
* lguest_device it's enclosed in.
|
||||
*/
|
||||
#define to_lgdev(vd) container_of(vd, struct lguest_device, vdev)
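As an illustration of what to_lgdev() relies on, here is a minimal userspace sketch of the container_of() idiom. The structures with a _sketch suffix and the pad field are hypothetical stand-ins rather than the kernel's definitions, and the macro is a simplified version without the kernel's type checking.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified stand-in for the kernel's container_of(): step back from a
 * pointer to a member to the structure that contains it.
 */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, cut-down versions of the two structures. */
struct virtio_device_sketch {
	int index;
};

struct lguest_device_sketch {
	int pad;				/* puts vdev at a nonzero offset */
	struct virtio_device_sketch vdev;
	void *desc;
};

#define to_lgdev_sketch(vd) \
	container_of(vd, struct lguest_device_sketch, vdev)

/* Recover the enclosing device from a pointer to the embedded vdev. */
static struct lguest_device_sketch *
enclosing_device(struct virtio_device_sketch *vd)
{
	assert(vd != NULL);
	return to_lgdev_sketch(vd);
}
```

In struct lguest_device the vdev member comes first, so the subtraction is by zero there; the pad field in the sketch only shows that the idiom does not depend on that.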
|
||||
|
||||
/*D:130
|
||||
* Device configurations
|
||||
*
|
||||
* The configuration information for a device consists of one or more
|
||||
* virtqueues, a feature bitmap, and some configuration bytes. The
|
||||
* configuration bytes don't really matter to us: the Launcher sets them up, and
|
||||
* the driver will look at them during setup.
|
||||
*
|
||||
* A convenient routine to return the device's virtqueue config array:
|
||||
* immediately after the descriptor.
|
||||
*/
|
||||
static struct lguest_vqconfig *lg_vq(const struct lguest_device_desc *desc)
|
||||
{
|
||||
return (void *)(desc + 1);
|
||||
}
|
||||
|
||||
/* The features come immediately after the virtqueues. */
|
||||
static u8 *lg_features(const struct lguest_device_desc *desc)
|
||||
{
|
||||
return (void *)(lg_vq(desc) + desc->num_vq);
|
||||
}
|
||||
|
||||
/* The config space comes after the two feature bitmasks. */
|
||||
static u8 *lg_config(const struct lguest_device_desc *desc)
|
||||
{
|
||||
return lg_features(desc) + desc->feature_len * 2;
|
||||
}
|
||||
|
||||
/* The total size of the config page used by this device (incl. desc) */
|
||||
static unsigned desc_size(const struct lguest_device_desc *desc)
|
||||
{
|
||||
return sizeof(*desc)
|
||||
+ desc->num_vq * sizeof(struct lguest_vqconfig)
|
||||
+ desc->feature_len * 2
|
||||
+ desc->config_len;
|
||||
}
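The layout arithmetic in lg_vq(), lg_features(), lg_config() and desc_size() can be sketched in userspace as plain byte offsets. The field layouts below are assumptions made for illustration (the real definitions live in the lguest launcher header), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed descriptor header: five single-byte fields. */
struct desc_sketch {
	uint8_t type;
	uint8_t num_vq;
	uint8_t feature_len;
	uint8_t config_len;
	uint8_t status;
};

/* Assumed per-virtqueue configuration entry. */
struct vqconfig_sketch {
	uint16_t num;
	uint16_t irq;
	uint32_t pfn;
};

/* The virtqueue array starts immediately after the descriptor. */
static size_t vq_offset(const struct desc_sketch *d)
{
	return sizeof(*d);
}

/* The two feature bitmaps follow the virtqueue array. */
static size_t features_offset(const struct desc_sketch *d)
{
	return vq_offset(d) + d->num_vq * sizeof(struct vqconfig_sketch);
}

/* The config bytes follow both feature bitmaps (offered, then accepted). */
static size_t config_offset(const struct desc_sketch *d)
{
	return features_offset(d) + (size_t)d->feature_len * 2;
}

/* Total bytes used by this device's entry, descriptor included. */
static size_t total_size(const struct desc_sketch *d)
{
	assert(d != NULL);
	return config_offset(d) + d->config_len;
}
```

The sketch returns offsets rather than typed pointers because entries packed this way are not guaranteed to be aligned for direct structure access, which is also why lg_find_vq() copies the virtqueue entry with memcpy() before using it.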
|
||||
|
||||
/* This gets the device's feature bits. */
|
||||
static u64 lg_get_features(struct virtio_device *vdev)
|
||||
{
|
||||
unsigned int i;
|
||||
u32 features = 0;
|
||||
struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
|
||||
u8 *in_features = lg_features(desc);
|
||||
|
||||
/* We do this the slow but generic way. */
|
||||
for (i = 0; i < min(desc->feature_len * 8, 32); i++)
|
||||
if (in_features[i / 8] & (1 << (i % 8)))
|
||||
features |= (1 << i);
|
||||
|
||||
return features;
|
||||
}
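The "slow but generic" bit walk can be tried outside the kernel. The following is a minimal sketch, assuming a little-endian byte-array bitmap capped at 32 bits as in lg_get_features(); the function names are hypothetical, and the second helper mirrors the reverse direction used when the accepted features are written back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Gather up to 32 feature bits from a byte array, bit 0 of byte 0 first. */
static uint32_t features_from_bytes(const uint8_t *bytes, size_t len)
{
	size_t nbits = len * 8 < 32 ? len * 8 : 32;
	uint32_t features = 0;
	size_t i;

	assert(bytes != NULL);
	for (i = 0; i < nbits; i++)
		if (bytes[i / 8] & (1u << (i % 8)))
			features |= UINT32_C(1) << i;

	return features;
}

/* Scatter feature bits back into a zeroed byte array of the given length. */
static void features_to_bytes(uint32_t features, uint8_t *bytes, size_t len)
{
	size_t nbits = len * 8 < 32 ? len * 8 : 32;
	size_t i;

	assert(bytes != NULL);
	memset(bytes, 0, len);
	for (i = 0; i < nbits; i++)
		if (features & (UINT32_C(1) << i))
			bytes[i / 8] |= (uint8_t)(1u << (i % 8));
}
```

The sketch shifts an unsigned constant so that setting bit 31 stays well defined in portable C; that is a choice made for the example, not a claim about the kernel code.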
|
||||
|
||||
/*
|
||||
* To notify on reset or feature finalization, we (ab)use the NOTIFY
|
||||
* hypercall, with the descriptor address of the device.
|
||||
*/
|
||||
static void status_notify(struct virtio_device *vdev)
|
||||
{
|
||||
unsigned long offset = (void *)to_lgdev(vdev)->desc - lguest_devices;
|
||||
|
||||
hcall(LHCALL_NOTIFY, (max_pfn << PAGE_SHIFT) + offset, 0, 0, 0);
|
||||
}
|
||||
|
||||
/*
|
||||
* The virtio core takes the features the Host offers, and copies the ones
|
||||
* supported by the driver into the vdev->features array. Once that's all
|
||||
* sorted out, this routine is called so we can tell the Host which features we
|
||||
* understand and accept.
|
||||
*/
|
||||
static int lg_finalize_features(struct virtio_device *vdev)
|
||||
{
|
||||
unsigned int i, bits;
|
||||
struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
|
||||
/* Second half of bitmap is features we accept. */
|
||||
u8 *out_features = lg_features(desc) + desc->feature_len;
|
||||
|
||||
/* Give virtio_ring a chance to accept features. */
|
||||
vring_transport_features(vdev);
|
||||
|
||||
/* Make sure we don't have any features > 32 bits! */
|
||||
BUG_ON((u32)vdev->features != vdev->features);
|
||||
|
||||
/*
|
||||
* Since lguest is currently x86-only, we're little-endian. That
|
||||
* means we could just memcpy. But it's not time critical, and in
|
||||
* case someone copies this code, we do it the slow, obvious way.
|
||||
*/
|
||||
memset(out_features, 0, desc->feature_len);
|
||||
bits = min_t(unsigned, desc->feature_len, sizeof(vdev->features)) * 8;
|
||||
for (i = 0; i < bits; i++) {
|
||||
if (__virtio_test_bit(vdev, i))
|
||||
out_features[i / 8] |= (1 << (i % 8));
|
||||
}
|
||||
|
||||
/* Tell Host we've finished with this device's feature negotiation */
|
||||
status_notify(vdev);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* Once they've found a field, getting a copy of it is easy. */
|
||||
static void lg_get(struct virtio_device *vdev, unsigned int offset,
|
||||
void *buf, unsigned len)
|
||||
{
|
||||
struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
|
||||
|
||||
/* Check they didn't ask for more than the length of the config! */
|
||||
BUG_ON(offset + len > desc->config_len);
|
||||
memcpy(buf, lg_config(desc) + offset, len);
|
||||
}
|
||||
|
||||
/* Setting the contents is also trivial. */
|
||||
static void lg_set(struct virtio_device *vdev, unsigned int offset,
|
||||
const void *buf, unsigned len)
|
||||
{
|
||||
struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
|
||||
|
||||
/* Check they didn't ask for more than the length of the config! */
|
||||
BUG_ON(offset + len > desc->config_len);
|
||||
memcpy(lg_config(desc) + offset, buf, len);
|
||||
}
|
||||
|
||||
/*
|
||||
* The operations to get and set the status word just access the status field
|
||||
* of the device descriptor.
|
||||
*/
|
||||
static u8 lg_get_status(struct virtio_device *vdev)
|
||||
{
|
||||
return to_lgdev(vdev)->desc->status;
|
||||
}
|
||||
|
||||
static void lg_set_status(struct virtio_device *vdev, u8 status)
|
||||
{
|
||||
BUG_ON(!status);
|
||||
to_lgdev(vdev)->desc->status = status;
|
||||
|
||||
/* Tell Host immediately if we failed. */
|
||||
if (status & VIRTIO_CONFIG_S_FAILED)
|
||||
status_notify(vdev);
|
||||
}
|
||||
|
||||
static void lg_reset(struct virtio_device *vdev)
|
||||
{
|
||||
/* 0 status means "reset" */
|
||||
to_lgdev(vdev)->desc->status = 0;
|
||||
status_notify(vdev);
|
||||
}
|
||||
|
||||
/*
|
||||
* Virtqueues
|
||||
*
|
||||
* The other piece of infrastructure virtio needs is a "virtqueue": a way of
|
||||
* the Guest device registering buffers for the other side to read from or
|
||||
* write into (ie. send and receive buffers). Each device can have multiple
|
||||
* virtqueues: for example the console driver uses one queue for sending and
|
||||
* another for receiving.
|
||||
*
|
||||
* Fortunately for us, a very fast shared-memory-plus-descriptors virtqueue
|
||||
* already exists in virtio_ring.c. We just need to connect it up.
|
||||
*
|
||||
* We start with the information we need to keep about each virtqueue.
|
||||
*/
|
||||
|
||||
/*D:140 This is the information we remember about each virtqueue. */
|
||||
struct lguest_vq_info {
|
||||
/* A copy of the information contained in the device config. */
|
||||
struct lguest_vqconfig config;
|
||||
|
||||
/* The address where we mapped the virtio ring, so we can unmap it. */
|
||||
void *pages;
|
||||
};
|
||||
|
||||
/*
|
||||
* When the virtio_ring code wants to prod the Host, it calls us here and we
|
||||
* make a hypercall. We hand the physical address of the virtqueue so the Host
|
||||
* knows which virtqueue we're talking about.
|
||||
*/
|
||||
static bool lg_notify(struct virtqueue *vq)
|
||||
{
|
||||
/*
|
||||
* We store our virtqueue information in the "priv" pointer of the
|
||||
* virtqueue structure.
|
||||
*/
|
||||
struct lguest_vq_info *lvq = vq->priv;
|
||||
|
||||
hcall(LHCALL_NOTIFY, lvq->config.pfn << PAGE_SHIFT, 0, 0, 0);
|
||||
return true;
|
||||
}
|
||||
|
||||
/* An extern declaration inside a C file is bad form. Don't do it. */
|
||||
extern int lguest_setup_irq(unsigned int irq);
|
||||
|
||||
/*
|
||||
* This routine finds the Nth virtqueue described in the configuration of
|
||||
* this device and sets it up.
|
||||
*
|
||||
* This is kind of an ugly duckling. It'd be nicer to have a standard
|
||||
* representation of a virtqueue in the configuration space, but it seems that
|
||||
* everyone wants to do it differently. The KVM coders want the Guest to
|
||||
* allocate its own pages and tell the Host where they are, but for lguest it's
|
||||
* simpler for the Host to simply tell us where the pages are.
|
||||
*/
|
||||
static struct virtqueue *lg_find_vq(struct virtio_device *vdev,
|
||||
unsigned index,
|
||||
void (*callback)(struct virtqueue *vq),
|
||||
const char *name)
|
||||
{
|
||||
struct lguest_device *ldev = to_lgdev(vdev);
|
||||
struct lguest_vq_info *lvq;
|
||||
struct virtqueue *vq;
|
||||
int err;
|
||||
|
||||
if (!name)
|
||||
return NULL;
|
||||
|
||||
/* We must have this many virtqueues. */
|
||||
if (index >= ldev->desc->num_vq)
|
||||
return ERR_PTR(-ENOENT);
|
||||
|
||||
lvq = kmalloc(sizeof(*lvq), GFP_KERNEL);
|
||||
if (!lvq)
|
||||
return ERR_PTR(-ENOMEM);
|
||||
|
||||
/*
|
||||
* Make a copy of the "struct lguest_vqconfig" entry, which sits after
|
||||
* the descriptor. We need a copy because the config space might not
|
||||
* be aligned correctly.
|
||||
*/
|
||||
memcpy(&lvq->config, lg_vq(ldev->desc)+index, sizeof(lvq->config));
|
||||
|
||||
printk("Mapping virtqueue %i addr %lx\n", index,
|
||||
(unsigned long)lvq->config.pfn << PAGE_SHIFT);
|
||||
/* Figure out how many pages the ring will take, and map that memory */
|
||||
lvq->pages = lguest_map((unsigned long)lvq->config.pfn << PAGE_SHIFT,
|
||||
DIV_ROUND_UP(vring_size(lvq->config.num,
|
||||
LGUEST_VRING_ALIGN),
|
||||
PAGE_SIZE));
|
||||
if (!lvq->pages) {
|
||||
err = -ENOMEM;
|
||||
goto free_lvq;
|
||||
}
|
||||
|
||||
/*
|
||||
* OK, tell virtio_ring.c to set up a virtqueue now we know its size
|
||||
* and we've got a pointer to its pages. Note that we set weak_barriers
|
||||
* to 'true': the host is just a(nother) SMP CPU, so we only need inter-cpu
|
||||
* barriers.
|
||||
*/
|
||||
vq = vring_new_virtqueue(index, lvq->config.num, LGUEST_VRING_ALIGN, vdev,
|
||||
true, lvq->pages, lg_notify, callback, name);
|
||||
if (!vq) {
|
||||
err = -ENOMEM;
|
||||
goto unmap;
|
||||
}
|
||||
|
||||
/* Make sure the interrupt is allocated. */
|
||||
err = lguest_setup_irq(lvq->config.irq);
|
||||
if (err)
|
||||
goto destroy_vring;
|
||||
|
||||
/*
|
||||
* Tell the interrupt for this virtqueue to go to the virtio_ring
|
||||
* interrupt handler.
|
||||
*
|
||||
* FIXME: We used to have a flag for the Host to tell us we could use
|
||||
* the interrupt as a source of randomness: it'd be nice to have that
|
||||
* back.
|
||||
*/
|
||||
err = request_irq(lvq->config.irq, vring_interrupt, IRQF_SHARED,
|
||||
dev_name(&vdev->dev), vq);
|
||||
if (err)
|
||||
goto free_desc;
|
||||
|
||||
/*
|
||||
* Last of all we hook up our "struct lguest_vq_info" to the
|
||||
* virtqueue's priv pointer.
|
||||
*/
|
||||
vq->priv = lvq;
|
||||
return vq;
|
||||
|
||||
free_desc:
|
||||
irq_free_desc(lvq->config.irq);
|
||||
destroy_vring:
|
||||
vring_del_virtqueue(vq);
|
||||
unmap:
|
||||
lguest_unmap(lvq->pages);
|
||||
free_lvq:
|
||||
kfree(lvq);
|
||||
return ERR_PTR(err);
|
||||
}
|
||||
/*:*/
|
||||
|
||||
/* Cleaning up a virtqueue is easy */
|
||||
static void lg_del_vq(struct virtqueue *vq)
|
||||
{
|
||||
struct lguest_vq_info *lvq = vq->priv;
|
||||
|
||||
/* Release the interrupt */
|
||||
free_irq(lvq->config.irq, vq);
|
||||
/* Tell virtio_ring.c to free the virtqueue. */
|
||||
vring_del_virtqueue(vq);
|
||||
/* Unmap the pages containing the ring. */
|
||||
lguest_unmap(lvq->pages);
|
||||
/* Free our own queue information. */
|
||||
kfree(lvq);
|
||||
}
|
||||
|
||||
static void lg_del_vqs(struct virtio_device *vdev)
|
||||
{
|
||||
struct virtqueue *vq, *n;
|
||||
|
||||
list_for_each_entry_safe(vq, n, &vdev->vqs, list)
|
||||
lg_del_vq(vq);
|
||||
}
|
||||
|
||||
static int lg_find_vqs(struct virtio_device *vdev, unsigned nvqs,
|
||||
struct virtqueue *vqs[],
|
||||
vq_callback_t *callbacks[],
|
||||
const char *names[])
|
||||
{
|
||||
struct lguest_device *ldev = to_lgdev(vdev);
|
||||
int i;
|
||||
|
||||
/* We must have this many virtqueues. */
|
||||
if (nvqs > ldev->desc->num_vq)
|
||||
return -ENOENT;
|
||||
|
||||
for (i = 0; i < nvqs; ++i) {
|
||||
vqs[i] = lg_find_vq(vdev, i, callbacks[i], names[i]);
|
||||
if (IS_ERR(vqs[i]))
|
||||
goto error;
|
||||
}
|
||||
return 0;
|
||||
|
||||
error:
|
||||
lg_del_vqs(vdev);
|
||||
return PTR_ERR(vqs[i]);
|
||||
}
|
||||
|
||||
static const char *lg_bus_name(struct virtio_device *vdev)
|
||||
{
|
||||
return "";
|
||||
}
|
||||
|
||||
/* The ops structure which hooks everything together. */
|
||||
static const struct virtio_config_ops lguest_config_ops = {
|
||||
.get_features = lg_get_features,
|
||||
.finalize_features = lg_finalize_features,
|
||||
.get = lg_get,
|
||||
.set = lg_set,
|
||||
.get_status = lg_get_status,
|
||||
.set_status = lg_set_status,
|
||||
.reset = lg_reset,
|
||||
.find_vqs = lg_find_vqs,
|
||||
.del_vqs = lg_del_vqs,
|
||||
.bus_name = lg_bus_name,
|
||||
};
|
||||
|
||||
/*
|
||||
* The root device for the lguest virtio devices. This makes them appear as
|
||||
* /sys/devices/lguest/0,1,2 not /sys/devices/0,1,2.
|
||||
*/
|
||||
static struct device *lguest_root;
|
||||
|
||||
/*D:120
|
||||
* This is the core of the lguest bus: actually adding a new device.
|
||||
* It's a separate function because it's neater that way, and because an
|
||||
* earlier version of the code supported hotplug and unplug. They were removed
|
||||
* early on because they were never used.
|
||||
*
|
||||
* As Andrew Tridgell says, "Untested code is buggy code".
|
||||
*
|
||||
* It's worth reading this carefully: we start with a pointer to the new device
|
||||
* descriptor in the "lguest_devices" page, and the offset into the device
|
||||
* descriptor page so we can uniquely identify it if things go badly wrong.
|
||||
*/
|
||||
static void add_lguest_device(struct lguest_device_desc *d,
|
||||
unsigned int offset)
|
||||
{
|
||||
struct lguest_device *ldev;
|
||||
|
||||
/* Start with zeroed memory; Linux's device layer counts on it. */
|
||||
ldev = kzalloc(sizeof(*ldev), GFP_KERNEL);
|
||||
if (!ldev) {
|
||||
printk(KERN_EMERG "Cannot allocate lguest dev %u type %u\n",
|
||||
offset, d->type);
|
||||
return;
|
||||
}
|
||||
|
||||
/* This device's parent is the lguest/ dir. */
|
||||
ldev->vdev.dev.parent = lguest_root;
|
||||
/*
|
||||
* The device type comes straight from the descriptor. There's also a
|
||||
* device vendor field in the virtio_device struct, which we leave as
|
||||
* 0.
|
||||
*/
|
||||
ldev->vdev.id.device = d->type;
|
||||
/*
|
||||
* We have a simple set of routines for querying the device's
|
||||
* configuration information and setting its status.
|
||||
*/
|
||||
ldev->vdev.config = &lguest_config_ops;
|
||||
/* And we remember the device's descriptor for lguest_config_ops. */
|
||||
ldev->desc = d;
|
||||
|
||||
/*
|
||||
* register_virtio_device() sets up the generic fields for the struct
|
||||
* virtio_device and calls device_register(). This makes the bus
|
||||
* infrastructure look for a matching driver.
|
||||
*/
|
||||
if (register_virtio_device(&ldev->vdev) != 0) {
|
||||
printk(KERN_ERR "Failed to register lguest dev %u type %u\n",
|
||||
offset, d->type);
|
||||
kfree(ldev);
|
||||
}
|
||||
}
|
||||
|
||||
/*D:110
|
||||
* scan_devices() simply iterates through the device page. The type 0 is
|
||||
* reserved to mean "end of devices".
|
||||
*/
|
||||
static void scan_devices(void)
|
||||
{
|
||||
unsigned int i;
|
||||
struct lguest_device_desc *d;
|
||||
|
||||
/* We start at the page beginning, and skip over each entry. */
|
||||
for (i = 0; i < PAGE_SIZE; i += desc_size(d)) {
|
||||
d = lguest_devices + i;
|
||||
|
||||
/* Once we hit a zero, stop. */
|
||||
if (d->type == 0)
|
||||
break;
|
||||
|
||||
printk("Device at %i has size %u\n", i, desc_size(d));
|
||||
add_lguest_device(d, i);
|
||||
}
|
||||
}
|
||||
|
||||
/*D:105
|
||||
* Fairly early in boot, lguest_devices_init() is called to set up the
|
||||
* lguest device infrastructure. We check that we are a Guest by checking
|
||||
* pv_info.name: there are other ways of checking, but this seems most
|
||||
* obvious to me.
|
||||
*
|
||||
* So we can access the "struct lguest_device_desc"s easily, we map that memory
|
||||
* and store the pointer in the global "lguest_devices". Then we register a
|
||||
* root device from which all our devices will hang (this seems to be the
|
||||
* correct sysfs incantation).
|
||||
*
|
||||
* Finally we call scan_devices() which adds all the devices found in the
|
||||
* lguest_devices page.
|
||||
*/
|
||||
static int __init lguest_devices_init(void)
|
||||
{
|
||||
if (strcmp(pv_info.name, "lguest") != 0)
|
||||
return 0;
|
||||
|
||||
lguest_root = root_device_register("lguest");
|
||||
if (IS_ERR(lguest_root))
|
||||
panic("Could not register lguest root");
|
||||
|
||||
/* Devices are in a single page above top of "normal" mem */
|
||||
lguest_devices = lguest_map(max_pfn<<PAGE_SHIFT, 1);
|
||||
|
||||
scan_devices();
|
||||
return 0;
|
||||
}
|
||||
/* We do this after core stuff, but before the drivers. */
|
||||
postcore_initcall(lguest_devices_init);
|
||||
|
||||
/*D:150
|
||||
* At this point in the journey we used to now wade through the lguest
|
||||
* devices themselves: net, block and console. Since they're all now virtio
|
||||
* devices rather than lguest-specific, I've decided to ignore them. Mostly,
|
||||
* they're kind of boring. But this does mean you'll never experience the
|
||||
* thrill of reading the forbidden love scene buried deep in the block driver.
|
||||
*
|
||||
* "make Launcher" beckons, where we answer questions like "Where do Guests
|
||||
* come from?", and "What do you do when someone asks for optimization?".
|
||||
*/
|
@@ -2,175 +2,62 @@
|
||||
* launcher controls and communicates with the Guest. For example,
|
||||
* the first write will tell us the Guest's memory layout and entry
|
||||
* point. A read will run the Guest until something happens, such as
|
||||
* a signal or the Guest doing a NOTIFY out to the Launcher. There is
|
||||
* also a way for the Launcher to attach eventfds to particular NOTIFY
|
||||
* values instead of returning from the read() call.
|
||||
* a signal or the Guest accessing a device.
|
||||
:*/
|
||||
#include <linux/uaccess.h>
|
||||
#include <linux/miscdevice.h>
|
||||
#include <linux/fs.h>
|
||||
#include <linux/sched.h>
|
||||
#include <linux/eventfd.h>
|
||||
#include <linux/file.h>
|
||||
#include <linux/slab.h>
|
||||
#include <linux/export.h>
|
||||
#include "lg.h"
|
||||
|
||||
/*L:056
|
||||
* Before we move on, let's jump ahead and look at what the kernel does when
|
||||
* it needs to look up the eventfds. That will complete our picture of how we
|
||||
* use RCU.
|
||||
*
|
||||
* The notification value is in cpu->pending_notify: we return true if it went
|
||||
* to an eventfd.
|
||||
*/
|
||||
bool send_notify_to_eventfd(struct lg_cpu *cpu)
|
||||
{
|
||||
unsigned int i;
|
||||
struct lg_eventfd_map *map;
|
||||
|
||||
/*
|
||||
* This "rcu_read_lock()" helps track when someone is still looking at
|
||||
* the (RCU-using) eventfds array. It's not actually a lock at all;
|
||||
* indeed it's a noop in many configurations. (You didn't expect me to
|
||||
* explain all the RCU secrets here, did you?)
|
||||
*/
|
||||
rcu_read_lock();
|
||||
/*
|
||||
* rcu_dereference is the counter-side of rcu_assign_pointer(); it
|
||||
* makes sure we don't access the memory pointed to by
|
||||
* cpu->lg->eventfds before cpu->lg->eventfds is set. Sounds crazy,
|
||||
* but Alpha allows this! Paul McKenney points out that a really
|
||||
* aggressive compiler could have the same effect:
|
||||
* http://lists.ozlabs.org/pipermail/lguest/2009-July/001560.html
|
||||
*
|
||||
* So play safe, use rcu_dereference to get the rcu-protected pointer:
|
||||
*/
|
||||
map = rcu_dereference(cpu->lg->eventfds);
|
||||
/*
|
||||
* Simple array search: even if they add an eventfd while we do this,
|
||||
* we'll continue to use the old array and just won't see the new one.
|
||||
*/
|
||||
for (i = 0; i < map->num; i++) {
|
||||
if (map->map[i].addr == cpu->pending_notify) {
|
||||
eventfd_signal(map->map[i].event, 1);
|
||||
cpu->pending_notify = 0;
|
||||
break;
|
||||
}
|
||||
}
|
||||
/* We're done with the rcu-protected variable cpu->lg->eventfds. */
|
||||
rcu_read_unlock();
|
||||
|
||||
/* If we cleared the notification, it's because we found a match. */
|
||||
return cpu->pending_notify == 0;
|
||||
}
|
||||
|
||||
/*L:055
|
||||
* One of the more tricksy tricks in the Linux Kernel is a technique called
|
||||
* Read Copy Update. Since one point of lguest is to teach lguest journeyers
|
||||
* about kernel coding, I use it here. (In case you're curious, other purposes
|
||||
* include learning about virtualization and instilling a deep appreciation for
|
||||
* simplicity and puppies).
|
||||
*
|
||||
* We keep a simple array which maps LHCALL_NOTIFY values to eventfds, but we
|
||||
* add new eventfds without ever blocking readers from accessing the array.
|
||||
* The current Launcher only does this during boot, so that never happens. But
|
||||
* Read Copy Update is cool, and adding a lock risks damaging even more puppies
|
||||
* than this code does.
|
||||
*
|
||||
* We allocate a brand new one-larger array, copy the old one and add our new
|
||||
* element. Then we make the lg eventfd pointer point to the new array.
|
||||
* That's the easy part: now we need to free the old one, but we need to make
|
||||
* sure no slow CPU somewhere is still looking at it. That's what
|
||||
* synchronize_rcu does for us: waits until every CPU has indicated that it has
|
||||
* moved on to know it's no longer using the old one.
|
||||
*
|
||||
* If that's unclear, see http://en.wikipedia.org/wiki/Read-copy-update.
|
||||
*/
|
||||
static int add_eventfd(struct lguest *lg, unsigned long addr, int fd)
|
||||
{
|
||||
struct lg_eventfd_map *new, *old = lg->eventfds;
|
||||
|
||||
/*
|
||||
* We don't allow notifications on value 0 anyway (pending_notify of
|
||||
* 0 means "nothing pending").
|
||||
*/
|
||||
if (!addr)
|
||||
return -EINVAL;
|
||||
|
||||
/*
|
||||
* Replace the old array with the new one, carefully: others can
|
||||
* be accessing it at the same time.
|
||||
*/
|
||||
new = kmalloc(sizeof(*new) + sizeof(new->map[0]) * (old->num + 1),
|
||||
GFP_KERNEL);
|
||||
if (!new)
|
||||
return -ENOMEM;
|
||||
|
||||
/* First make identical copy. */
|
||||
memcpy(new->map, old->map, sizeof(old->map[0]) * old->num);
|
||||
new->num = old->num;
|
||||
|
||||
/* Now append new entry. */
|
||||
new->map[new->num].addr = addr;
|
||||
new->map[new->num].event = eventfd_ctx_fdget(fd);
|
||||
if (IS_ERR(new->map[new->num].event)) {
|
||||
int err = PTR_ERR(new->map[new->num].event);
|
||||
kfree(new);
|
||||
return err;
|
||||
}
|
||||
new->num++;
|
||||
|
||||
/*
|
||||
* Now put new one in place: rcu_assign_pointer() is a fancy way of
|
||||
* doing "lg->eventfds = new", but it uses memory barriers to make
|
||||
* absolutely sure that the contents of "new" written above is nailed
|
||||
* down before we actually do the assignment.
|
||||
*
|
||||
* We have to think about these kinds of things when we're operating on
|
||||
* live data without locks.
|
||||
*/
|
||||
rcu_assign_pointer(lg->eventfds, new);
|
||||
|
||||
/*
|
||||
* We're not in a big hurry. Wait until no one's looking at old
|
||||
* version, then free it.
|
||||
*/
|
||||
synchronize_rcu();
|
||||
kfree(old);
|
||||
|
||||
return 0;
|
||||
}
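The copy-then-publish half of the scheme in add_eventfd() can be sketched in userspace with C11 atomics. This is only an approximation under stated assumptions: the types and function names are hypothetical, a release store stands in for rcu_assign_pointer(), an acquire load stands in for rcu_dereference(), and there is no equivalent of synchronize_rcu(), so the old array is handed back to the caller instead of being freed here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct entry_sketch {
	unsigned long addr;
	int fd;
};

struct map_sketch {
	size_t num;
	struct entry_sketch map[];
};

/* The published array; NULL means "no entries yet" in this sketch. */
static _Atomic(struct map_sketch *) current_map;

/*
 * Build a one-larger copy, append the entry and publish the copy.
 * The previous array is returned through old_out: deciding when it is
 * safe to free is exactly the job synchronize_rcu() does in the kernel.
 */
static int add_entry(unsigned long addr, int fd, struct map_sketch **old_out)
{
	struct map_sketch *old, *copy;
	size_t n;

	assert(old_out != NULL);
	if (!addr)
		return -1;	/* 0 is reserved, as in the original */

	old = atomic_load_explicit(&current_map, memory_order_relaxed);
	n = old ? old->num : 0;

	copy = malloc(sizeof(*copy) + sizeof(copy->map[0]) * (n + 1));
	if (!copy)
		return -1;

	if (n)
		memcpy(copy->map, old->map, sizeof(old->map[0]) * n);
	copy->map[n].addr = addr;
	copy->map[n].fd = fd;
	copy->num = n + 1;

	/* Everything written above must be visible before the pointer is. */
	atomic_store_explicit(&current_map, copy, memory_order_release);
	*old_out = old;
	return 0;
}

/* Reader side: take one snapshot of the pointer and search that array. */
static bool lookup(unsigned long addr, int *fd_out)
{
	struct map_sketch *m =
		atomic_load_explicit(&current_map, memory_order_acquire);
	size_t i;

	for (i = 0; m && i < m->num; i++) {
		if (m->map[i].addr == addr) {
			*fd_out = m->map[i].fd;
			return true;
		}
	}
	return false;
}
```

Like the original, the sketch assumes writers are serialized by some outer lock, and a reader that took its snapshot before a publish simply keeps using the old array.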
|
||||
|
||||
/*L:052
|
||||
* Receiving notifications from the Guest is usually done by attaching a
|
||||
* particular LHCALL_NOTIFY value to an event file descriptor. The eventfd will
|
||||
* become readable when the Guest does an LHCALL_NOTIFY with that value.
|
||||
*
|
||||
* This is really convenient for processing each virtqueue in a separate
|
||||
* thread.
|
||||
*/
|
||||
static int attach_eventfd(struct lguest *lg, const unsigned long __user *input)
|
||||
The Launcher can get the registers, and also set some of them.
|
||||
*/
|
||||
static int getreg_setup(struct lg_cpu *cpu, const unsigned long __user *input)
|
||||
{
|
||||
unsigned long addr, fd;
|
||||
int err;
|
||||
unsigned long which;
|
||||
|
||||
if (get_user(addr, input) != 0)
|
||||
/* We re-use the ptrace structure to specify which register to read. */
|
||||
if (get_user(which, input) != 0)
|
||||
return -EFAULT;
|
||||
|
||||
/*
|
||||
* We set up the cpu register pointer, and their next read will
|
||||
* actually get the value (instead of running the guest).
|
||||
*
|
||||
* The last argument 'true' says we can access any register.
|
||||
*/
|
||||
cpu->reg_read = lguest_arch_regptr(cpu, which, true);
|
||||
if (!cpu->reg_read)
|
||||
return -ENOENT;
|
||||
|
||||
/* And because this is a write() call, we return the length used. */
|
||||
return sizeof(unsigned long) * 2;
|
||||
}
|
||||
|
||||
static int setreg(struct lg_cpu *cpu, const unsigned long __user *input)
|
||||
{
|
||||
unsigned long which, value, *reg;
|
||||
|
||||
/* We re-use the ptrace structure to specify which register to read. */
|
||||
if (get_user(which, input) != 0)
|
||||
return -EFAULT;
|
||||
input++;
|
||||
if (get_user(fd, input) != 0)
|
||||
if (get_user(value, input) != 0)
|
||||
return -EFAULT;
|
||||
|
||||
/*
|
||||
* Just make sure two callers don't add eventfds at once. We really
|
||||
* only need to lock against callers adding to the same Guest, so using
|
||||
* the Big Lguest Lock is overkill. But this is setup, not a fast path.
|
||||
*/
|
||||
mutex_lock(&lguest_lock);
|
||||
err = add_eventfd(lg, addr, fd);
|
||||
mutex_unlock(&lguest_lock);
|
||||
/* The last argument 'false' means we can't access all registers. */
|
||||
reg = lguest_arch_regptr(cpu, which, false);
|
||||
if (!reg)
|
||||
return -ENOENT;
|
||||
|
||||
return err;
|
||||
*reg = value;
|
||||
|
||||
/* And because this is a write() call, we return the length used. */
|
||||
return sizeof(unsigned long) * 3;
|
||||
}
|
||||
|
||||
/*L:050
|
||||
@@ -194,6 +81,23 @@ static int user_send_irq(struct lg_cpu *cpu, const unsigned long __user *input)
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*L:053
|
||||
* Deliver a trap: this is used by the Launcher if it can't emulate
|
||||
* an instruction.
|
||||
*/
|
||||
static int trap(struct lg_cpu *cpu, const unsigned long __user *input)
|
||||
{
|
||||
unsigned long trapnum;
|
||||
|
||||
if (get_user(trapnum, input) != 0)
|
||||
return -EFAULT;
|
||||
|
||||
if (!deliver_trap(cpu, trapnum))
|
||||
return -EINVAL;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*L:040
|
||||
* Once our Guest is initialized, the Launcher makes it run by reading
|
||||
* from /dev/lguest.
|
||||
@@ -237,8 +141,8 @@ static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)
|
||||
* If we returned from read() last time because the Guest sent I/O,
|
||||
* clear the flag.
|
||||
*/
|
||||
if (cpu->pending_notify)
|
||||
cpu->pending_notify = 0;
|
||||
if (cpu->pending.trap)
|
||||
cpu->pending.trap = 0;
|
||||
|
||||
/* Run the Guest until something interesting happens. */
|
||||
return run_guest(cpu, (unsigned long __user *)user);
|
||||
@@ -319,7 +223,7 @@ static int initialize(struct file *file, const unsigned long __user *input)
|
||||
/* "struct lguest" contains all we (the Host) know about a Guest. */
|
||||
struct lguest *lg;
|
||||
int err;
|
||||
unsigned long args[3];
|
||||
unsigned long args[4];
|
||||
|
||||
/*
|
||||
* We grab the Big Lguest lock, which protects against multiple
|
||||
@@ -343,21 +247,15 @@ static int initialize(struct file *file, const unsigned long __user *input)
|
||||
goto unlock;
|
||||
}
|
||||
|
||||
lg->eventfds = kmalloc(sizeof(*lg->eventfds), GFP_KERNEL);
|
||||
if (!lg->eventfds) {
|
||||
err = -ENOMEM;
|
||||
goto free_lg;
|
||||
}
|
||||
lg->eventfds->num = 0;
|
||||
|
||||
/* Populate the easy fields of our "struct lguest" */
|
||||
lg->mem_base = (void __user *)args[0];
|
||||
lg->pfn_limit = args[1];
|
||||
lg->device_limit = args[3];
|
||||
|
||||
/* This is the first cpu (cpu 0) and it will start booting at args[2] */
|
||||
err = lg_cpu_start(&lg->cpus[0], 0, args[2]);
|
||||
if (err)
|
||||
goto free_eventfds;
|
||||
goto free_lg;
|
||||
|
||||
/*
|
||||
* Initialize the Guest's shadow page tables. This allocates
|
||||
@@ -378,8 +276,6 @@ static int initialize(struct file *file, const unsigned long __user *input)
|
||||
free_regs:
|
||||
/* FIXME: This should be in free_vcpu */
|
||||
free_page(lg->cpus[0].regs_page);
|
||||
free_eventfds:
|
||||
kfree(lg->eventfds);
|
||||
free_lg:
|
||||
kfree(lg);
|
||||
unlock:
|
||||
@@ -432,8 +328,12 @@ static ssize_t write(struct file *file, const char __user *in,
|
||||
return initialize(file, input);
|
||||
case LHREQ_IRQ:
|
||||
return user_send_irq(cpu, input);
|
||||
case LHREQ_EVENTFD:
|
||||
return attach_eventfd(lg, input);
|
||||
case LHREQ_GETREG:
|
||||
return getreg_setup(cpu, input);
|
||||
case LHREQ_SETREG:
|
||||
return setreg(cpu, input);
|
||||
case LHREQ_TRAP:
|
||||
return trap(cpu, input);
|
||||
default:
|
||||
return -EINVAL;
|
||||
}
|
||||
@@ -478,11 +378,6 @@ static int close(struct inode *inode, struct file *file)
|
||||
mmput(lg->cpus[i].mm);
|
||||
}
|
||||
|
||||
/* Release any eventfds they registered. */
|
||||
for (i = 0; i < lg->eventfds->num; i++)
|
||||
eventfd_ctx_put(lg->eventfds->map[i].event);
|
||||
kfree(lg->eventfds);
|
||||
|
||||
/*
|
||||
* If lg->dead doesn't contain an error code it will be NULL or a
|
||||
* kmalloc()ed string, either of which is ok to hand to kfree().
|
||||
|
@@ -250,6 +250,16 @@ static void release_pte(pte_t pte)
|
||||
}
|
||||
/*:*/
|
||||
|
||||
static bool gpte_in_iomem(struct lg_cpu *cpu, pte_t gpte)
|
||||
{
|
||||
/* We don't handle large pages. */
|
||||
if (pte_flags(gpte) & _PAGE_PSE)
|
||||
return false;
|
||||
|
||||
return (pte_pfn(gpte) >= cpu->lg->pfn_limit
|
||||
&& pte_pfn(gpte) < cpu->lg->device_limit);
|
||||
}
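The I/O window test, and the address that demand_page() reports for a hit, come down to a little page arithmetic. Here is a minimal sketch, assuming 4 KiB pages; the SKETCH_ constants and function names are hypothetical, and the large-page (_PAGE_PSE) check is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))

/* True if the frame lies in the window [pfn_limit, device_limit). */
static bool pfn_in_iomem(unsigned long pfn, unsigned long pfn_limit,
			 unsigned long device_limit)
{
	assert(pfn_limit <= device_limit);
	return pfn >= pfn_limit && pfn < device_limit;
}

/* Frame number from the guest PTE plus the offset within the page. */
static unsigned long iomem_address(unsigned long pfn, unsigned long vaddr)
{
	return (pfn << SKETCH_PAGE_SHIFT) | (vaddr & ~SKETCH_PAGE_MASK);
}
```

For example, with pfn_limit 0x100 and device_limit 0x110, a guest PTE pointing at frame 0x100 and a faulting address ending in 0x234 would be reported as I/O address 0x100234.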
|
||||
|
||||
static bool check_gpte(struct lg_cpu *cpu, pte_t gpte)
|
||||
{
|
||||
if ((pte_flags(gpte) & _PAGE_PSE) ||
|
||||
@@ -374,8 +384,14 @@ static pte_t *find_spte(struct lg_cpu *cpu, unsigned long vaddr, bool allocate,
|
||||
*
|
||||
* If we fixed up the fault (ie. we mapped the address), this routine returns
|
||||
* true. Otherwise, it was a real fault and we need to tell the Guest.
|
||||
*
|
||||
* There's a corner case: they're trying to access memory between
|
||||
* pfn_limit and device_limit, which is I/O memory. In this case, we
|
||||
* return false and set @iomem to the physical address, so the
|
||||
* Launcher can handle the instruction manually.
|
||||
*/
|
||||
bool demand_page(struct lg_cpu *cpu, unsigned long vaddr, int errcode)
|
||||
bool demand_page(struct lg_cpu *cpu, unsigned long vaddr, int errcode,
|
||||
unsigned long *iomem)
|
||||
{
|
||||
unsigned long gpte_ptr;
|
||||
pte_t gpte;
|
||||
@@ -383,6 +399,8 @@ bool demand_page(struct lg_cpu *cpu, unsigned long vaddr, int errcode)
|
||||
pmd_t gpmd;
|
||||
pgd_t gpgd;
|
||||
|
||||
*iomem = 0;
|
||||
|
||||
/* We never demand page the Switcher, so trying is a mistake. */
|
||||
if (vaddr >= switcher_addr)
|
||||
return false;
|
||||
@ -459,6 +477,12 @@ bool demand_page(struct lg_cpu *cpu, unsigned long vaddr, int errcode)
|
||||
if ((errcode & 4) && !(pte_flags(gpte) & _PAGE_USER))
|
||||
return false;
|
||||
|
||||
/* If they're accessing io memory, we expect a fault. */
|
||||
if (gpte_in_iomem(cpu, gpte)) {
|
||||
*iomem = (pte_pfn(gpte) << PAGE_SHIFT) | (vaddr & ~PAGE_MASK);
|
||||
return false;
|
||||
}
|
||||
|
||||
/*
|
||||
* Check that the Guest PTE flags are OK, and the page number is below
|
||||
* the pfn_limit (ie. not mapping the Launcher binary).
|
||||
@ -553,7 +577,9 @@ static bool page_writable(struct lg_cpu *cpu, unsigned long vaddr)
|
||||
*/
|
||||
void pin_page(struct lg_cpu *cpu, unsigned long vaddr)
|
||||
{
|
||||
if (!page_writable(cpu, vaddr) && !demand_page(cpu, vaddr, 2))
|
||||
unsigned long iomem;
|
||||
|
||||
if (!page_writable(cpu, vaddr) && !demand_page(cpu, vaddr, 2, &iomem))
|
||||
kill_guest(cpu, "bad stack page %#lx", vaddr);
|
||||
}
|
||||
/*:*/
|
||||
@ -647,7 +673,7 @@ void guest_pagetable_flush_user(struct lg_cpu *cpu)
|
||||
/*:*/
|
||||
|
||||
/* We walk down the guest page tables to get a guest-physical address */
|
||||
unsigned long guest_pa(struct lg_cpu *cpu, unsigned long vaddr)
|
||||
bool __guest_pa(struct lg_cpu *cpu, unsigned long vaddr, unsigned long *paddr)
|
||||
{
|
||||
pgd_t gpgd;
|
||||
pte_t gpte;
|
||||
@ -656,31 +682,47 @@ unsigned long guest_pa(struct lg_cpu *cpu, unsigned long vaddr)
|
||||
#endif
|
||||
|
||||
/* Still not set up? Just map 1:1. */
|
||||
if (unlikely(cpu->linear_pages))
|
||||
return vaddr;
|
||||
if (unlikely(cpu->linear_pages)) {
|
||||
*paddr = vaddr;
|
||||
return true;
|
||||
}
|
||||
|
||||
/* First step: get the top-level Guest page table entry. */
|
||||
gpgd = lgread(cpu, gpgd_addr(cpu, vaddr), pgd_t);
|
||||
/* Toplevel not present? We can't map it in. */
|
||||
if (!(pgd_flags(gpgd) & _PAGE_PRESENT)) {
|
||||
kill_guest(cpu, "Bad address %#lx", vaddr);
|
||||
return -1UL;
|
||||
}
|
||||
if (!(pgd_flags(gpgd) & _PAGE_PRESENT))
|
||||
goto fail;
|
||||
|
||||
#ifdef CONFIG_X86_PAE
|
||||
gpmd = lgread(cpu, gpmd_addr(gpgd, vaddr), pmd_t);
|
||||
if (!(pmd_flags(gpmd) & _PAGE_PRESENT)) {
|
||||
kill_guest(cpu, "Bad address %#lx", vaddr);
|
||||
return -1UL;
|
||||
}
|
||||
if (!(pmd_flags(gpmd) & _PAGE_PRESENT))
|
||||
goto fail;
|
||||
gpte = lgread(cpu, gpte_addr(cpu, gpmd, vaddr), pte_t);
|
||||
#else
|
||||
gpte = lgread(cpu, gpte_addr(cpu, gpgd, vaddr), pte_t);
|
||||
#endif
|
||||
if (!(pte_flags(gpte) & _PAGE_PRESENT))
|
||||
kill_guest(cpu, "Bad address %#lx", vaddr);
|
||||
goto fail;
|
||||
|
||||
return pte_pfn(gpte) * PAGE_SIZE | (vaddr & ~PAGE_MASK);
|
||||
*paddr = pte_pfn(gpte) * PAGE_SIZE | (vaddr & ~PAGE_MASK);
|
||||
return true;
|
||||
|
||||
fail:
|
||||
*paddr = -1UL;
|
||||
return false;
|
||||
}
|
||||
|
||||
/*
|
||||
* This is the version we normally use: kills the Guest if it uses a
|
||||
* bad address
|
||||
*/
|
||||
unsigned long guest_pa(struct lg_cpu *cpu, unsigned long vaddr)
|
||||
{
|
||||
unsigned long paddr;
|
||||
|
||||
if (!__guest_pa(cpu, vaddr, &paddr))
|
||||
kill_guest(cpu, "Bad address %#lx", vaddr);
|
||||
return paddr;
|
||||
}
|
||||
|
||||
/*
|
||||
@ -912,7 +954,8 @@ static void __guest_set_pte(struct lg_cpu *cpu, int idx,
|
||||
* now. This shaves 10% off a copy-on-write
|
||||
* micro-benchmark.
|
||||
*/
|
||||
if (pte_flags(gpte) & (_PAGE_DIRTY | _PAGE_ACCESSED)) {
|
||||
if ((pte_flags(gpte) & (_PAGE_DIRTY | _PAGE_ACCESSED))
|
||||
&& !gpte_in_iomem(cpu, gpte)) {
|
||||
if (!check_gpte(cpu, gpte))
|
||||
return;
|
||||
set_pte(spte,
|
||||
|
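To make the I/O-memory corner case above concrete, here is a minimal standalone sketch of the range check done by gpte_in_iomem() and of the address that demand_page() stores in *iomem. It uses plain integers instead of pte_t. The 4 KiB page size, the struct limits type and the helper names pfn_in_iomem() and iomem_addr() are assumptions made for illustration; they are not part of lguest.

```c
#include <stdbool.h>

/* Assumed 4 KiB pages, as on 32-bit x86; macro names mirror the kernel's. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Hypothetical stand-in for the two limits kept per Guest. */
struct limits {
	unsigned long pfn_limit;    /* first page frame past Guest RAM */
	unsigned long device_limit; /* first page frame past the I/O window */
};

/* A page frame is I/O memory if it lies in [pfn_limit, device_limit). */
static bool pfn_in_iomem(const struct limits *l, unsigned long pfn)
{
	return pfn >= l->pfn_limit && pfn < l->device_limit;
}

/* Combine the page frame number with the offset of vaddr inside its page. */
static unsigned long iomem_addr(unsigned long pfn, unsigned long vaddr)
{
	return (pfn << PAGE_SHIFT) | (vaddr & ~PAGE_MASK);
}
```

With a pfn_limit of 0x4000, a Guest PTE pointing at frame 0x4000 and a faulting address ending in 0x234 would give an I/O address of 0x4000234, which is the value that would be handed on to the Launcher.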
@ -182,6 +182,52 @@ static void run_guest_once(struct lg_cpu *cpu, struct lguest_pages *pages)
|
||||
}
|
||||
/*:*/
|
||||
|
||||
unsigned long *lguest_arch_regptr(struct lg_cpu *cpu, size_t reg_off, bool any)
|
||||
{
|
||||
switch (reg_off) {
|
||||
case offsetof(struct pt_regs, bx):
|
||||
return &cpu->regs->ebx;
|
||||
case offsetof(struct pt_regs, cx):
|
||||
return &cpu->regs->ecx;
|
||||
case offsetof(struct pt_regs, dx):
|
||||
return &cpu->regs->edx;
|
||||
case offsetof(struct pt_regs, si):
|
||||
return &cpu->regs->esi;
|
||||
case offsetof(struct pt_regs, di):
|
||||
return &cpu->regs->edi;
|
||||
case offsetof(struct pt_regs, bp):
|
||||
return &cpu->regs->ebp;
|
||||
case offsetof(struct pt_regs, ax):
|
||||
return &cpu->regs->eax;
|
||||
case offsetof(struct pt_regs, ip):
|
||||
return &cpu->regs->eip;
|
||||
case offsetof(struct pt_regs, sp):
|
||||
return &cpu->regs->esp;
|
||||
}
|
||||
|
||||
/* Launcher can read these, but we don't allow any setting. */
|
||||
if (any) {
|
||||
switch (reg_off) {
|
||||
case offsetof(struct pt_regs, ds):
|
||||
return &cpu->regs->ds;
|
||||
case offsetof(struct pt_regs, es):
|
||||
return &cpu->regs->es;
|
||||
case offsetof(struct pt_regs, fs):
|
||||
return &cpu->regs->fs;
|
||||
case offsetof(struct pt_regs, gs):
|
||||
return &cpu->regs->gs;
|
||||
case offsetof(struct pt_regs, cs):
|
||||
return &cpu->regs->cs;
|
||||
case offsetof(struct pt_regs, flags):
|
||||
return &cpu->regs->eflags;
|
||||
case offsetof(struct pt_regs, ss):
|
||||
return &cpu->regs->ss;
|
||||
}
|
||||
}
|
||||
|
||||
return NULL;
|
||||
}
|
||||
|
||||
/*M:002
|
||||
* There are hooks in the scheduler which we can register to tell when we
|
||||
* get kicked off the CPU (preempt_notifier_register()). This would allow us
|
||||
@ -269,110 +315,73 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
|
||||
* usually attached to a PC.
|
||||
*
|
||||
* When the Guest uses one of these instructions, we get a trap (General
|
||||
* Protection Fault) and come here. We see if it's one of those troublesome
|
||||
* instructions and skip over it. We return true if we did.
|
||||
* Protection Fault) and come here. We queue this to be sent out to the
|
||||
* Launcher to handle.
|
||||
*/
|
||||
static int emulate_insn(struct lg_cpu *cpu)
|
||||
|
||||
/*
|
||||
* The eip contains the *virtual* address of the Guest's instruction:
|
||||
* we copy the instruction here so the Launcher doesn't have to walk
|
||||
* the page tables to decode it. We handle the case (eg. in a kernel
|
||||
* module) where the instruction is over two pages, and the pages are
|
||||
* virtually but not physically contiguous.
|
||||
*
|
||||
* The longest possible x86 instruction is 15 bytes, but we don't handle
|
||||
* anything that strange.
|
||||
*/
|
||||
static void copy_from_guest(struct lg_cpu *cpu,
|
||||
void *dst, unsigned long vaddr, size_t len)
|
||||
{
|
||||
u8 insn;
|
||||
unsigned int insnlen = 0, in = 0, small_operand = 0;
|
||||
/*
|
||||
* The eip contains the *virtual* address of the Guest's instruction:
|
||||
* walk the Guest's page tables to find the "physical" address.
|
||||
*/
|
||||
unsigned long physaddr = guest_pa(cpu, cpu->regs->eip);
|
||||
size_t to_page_end = PAGE_SIZE - (vaddr % PAGE_SIZE);
|
||||
unsigned long paddr;
|
||||
|
||||
/*
|
||||
* This must be the Guest kernel trying to do something, not userspace!
|
||||
* The bottom two bits of the CS segment register are the privilege
|
||||
* level.
|
||||
*/
|
||||
if ((cpu->regs->cs & 3) != GUEST_PL)
|
||||
return 0;
|
||||
BUG_ON(len > PAGE_SIZE);
|
||||
|
||||
/* Decoding x86 instructions is icky. */
|
||||
insn = lgread(cpu, physaddr, u8);
|
||||
|
||||
/*
|
||||
* Around 2.6.33, the kernel started using an emulation for the
|
||||
* cmpxchg8b instruction in early boot on many configurations. This
|
||||
* code isn't paravirtualized, and it tries to disable interrupts.
|
||||
* Ignore it, which will Mostly Work.
|
||||
*/
|
||||
if (insn == 0xfa) {
|
||||
/* "cli", or Clear Interrupt Enable instruction. Skip it. */
|
||||
cpu->regs->eip++;
|
||||
return 1;
|
||||
/* If it goes over a page, copy in two parts. */
|
||||
if (len > to_page_end) {
|
||||
/* But make sure the next page is mapped! */
|
||||
if (__guest_pa(cpu, vaddr + to_page_end, &paddr))
|
||||
copy_from_guest(cpu, dst + to_page_end,
|
||||
vaddr + to_page_end,
|
||||
len - to_page_end);
|
||||
else
|
||||
/* Otherwise fill with zeroes. */
|
||||
memset(dst + to_page_end, 0, len - to_page_end);
|
||||
len = to_page_end;
|
||||
}
|
||||
|
||||
/*
|
||||
* 0x66 is an "operand prefix". It means a 16, not 32 bit in/out.
|
||||
*/
|
||||
if (insn == 0x66) {
|
||||
small_operand = 1;
|
||||
/* The instruction is 1 byte so far, read the next byte. */
|
||||
insnlen = 1;
|
||||
insn = lgread(cpu, physaddr + insnlen, u8);
|
||||
}
|
||||
/* This will kill the guest if it isn't mapped, but that
|
||||
* shouldn't happen. */
|
||||
__lgread(cpu, dst, guest_pa(cpu, vaddr), len);
|
||||
}
|
||||
|
||||
/*
|
||||
* We can ignore the lower bit for the moment and decode the 4 opcodes
|
||||
* we need to emulate.
|
||||
*/
|
||||
switch (insn & 0xFE) {
|
||||
case 0xE4: /* in <next byte>,%al */
|
||||
insnlen += 2;
|
||||
in = 1;
|
||||
break;
|
||||
case 0xEC: /* in (%dx),%al */
|
||||
insnlen += 1;
|
||||
in = 1;
|
||||
break;
|
||||
case 0xE6: /* out %al,<next byte> */
|
||||
insnlen += 2;
|
||||
break;
|
||||
case 0xEE: /* out %al,(%dx) */
|
||||
insnlen += 1;
|
||||
break;
|
||||
default:
|
||||
/* OK, we don't know what this is, can't emulate. */
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* If it was an "IN" instruction, they expect the result to be read
|
||||
* into %eax, so we change %eax. We always return all-ones, which
|
||||
* traditionally means "there's nothing there".
|
||||
*/
|
||||
if (in) {
|
||||
/* Lower bit means it's a 32/16 bit access */
|
||||
if (insn & 0x1) {
|
||||
if (small_operand)
|
||||
cpu->regs->eax |= 0xFFFF;
|
||||
else
|
||||
cpu->regs->eax = 0xFFFFFFFF;
|
||||
} else
|
||||
cpu->regs->eax |= 0xFF;
|
||||
}
|
||||
/* Finally, we've "done" the instruction, so move past it. */
|
||||
cpu->regs->eip += insnlen;
|
||||
/* Success! */
|
||||
return 1;
|
||||
static void setup_emulate_insn(struct lg_cpu *cpu)
|
||||
{
|
||||
cpu->pending.trap = 13;
|
||||
copy_from_guest(cpu, cpu->pending.insn, cpu->regs->eip,
|
||||
sizeof(cpu->pending.insn));
|
||||
}
|
||||
|
||||
static void setup_iomem_insn(struct lg_cpu *cpu, unsigned long iomem_addr)
|
||||
{
|
||||
cpu->pending.trap = 14;
|
||||
cpu->pending.addr = iomem_addr;
|
||||
copy_from_guest(cpu, cpu->pending.insn, cpu->regs->eip,
|
||||
sizeof(cpu->pending.insn));
|
||||
}
|
||||
|
||||
/*H:050 Once we've re-enabled interrupts, we look at why the Guest exited. */
|
||||
void lguest_arch_handle_trap(struct lg_cpu *cpu)
|
||||
{
|
||||
unsigned long iomem_addr;
|
||||
|
||||
switch (cpu->regs->trapnum) {
|
||||
case 13: /* We've intercepted a General Protection Fault. */
|
||||
/*
|
||||
* Check if this was one of those annoying IN or OUT
|
||||
* instructions which we need to emulate. If so, we just go
|
||||
* back into the Guest after we've done it.
|
||||
*/
|
||||
/* Hand to Launcher to emulate those pesky IN and OUT insns */
|
||||
if (cpu->regs->errcode == 0) {
|
||||
if (emulate_insn(cpu))
|
||||
return;
|
||||
setup_emulate_insn(cpu);
|
||||
return;
|
||||
}
|
||||
break;
|
||||
case 14: /* We've intercepted a Page Fault. */
|
||||
@ -387,9 +396,16 @@ void lguest_arch_handle_trap(struct lg_cpu *cpu)
|
||||
* whether kernel or userspace code.
|
||||
*/
|
||||
if (demand_page(cpu, cpu->arch.last_pagefault,
|
||||
cpu->regs->errcode))
|
||||
cpu->regs->errcode, &iomem_addr))
|
||||
return;
|
||||
|
||||
/* Was this an access to memory mapped IO? */
|
||||
if (iomem_addr) {
|
||||
/* Tell Launcher, let it handle it. */
|
||||
setup_iomem_insn(cpu, iomem_addr);
|
||||
return;
|
||||
}
|
||||
|
||||
/*
|
||||
* OK, it's really not there (or not OK): the Guest needs to
|
||||
* know. We write out the cr2 value so it knows where the
|
||||
|
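The page-splitting logic in copy_from_guest() above can be tried outside the kernel with a toy model. The sketch below is an illustration under stated assumptions, not the lguest code: struct toy_guest and toy_copy() are hypothetical, the Guest has only two virtual pages, the first page touched is assumed to be mapped, and the tail is copied directly instead of by recursion (which is equivalent here because len never exceeds one page).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096UL /* assumed 4 KiB pages */

/* Hypothetical toy Guest: two virtual pages, each optionally mapped. */
struct toy_guest {
	bool mapped[2];
	unsigned char page[2][PAGE_SIZE];
};

/*
 * Copy len bytes (len <= PAGE_SIZE) starting at vaddr, which must lie in
 * page 0 or page 1. If the copy runs into a second page that is unmapped
 * or does not exist, that part of dst is filled with zeroes.
 */
static void toy_copy(const struct toy_guest *g, unsigned char *dst,
		     unsigned long vaddr, size_t len)
{
	size_t to_page_end = PAGE_SIZE - (vaddr % PAGE_SIZE);
	unsigned long pg = vaddr / PAGE_SIZE;

	/* If it goes over a page, handle the second part first. */
	if (len > to_page_end) {
		if (pg + 1 < 2 && g->mapped[pg + 1])
			memcpy(dst + to_page_end, g->page[pg + 1],
			       len - to_page_end);
		else
			memset(dst + to_page_end, 0, len - to_page_end);
		len = to_page_end;
	}

	/* The first page is assumed to be mapped in this toy model. */
	memcpy(dst, &g->page[pg][vaddr % PAGE_SIZE], len);
}
```

Reading four bytes starting one byte before the end of page 0 takes one byte from page 0 and three from page 1, or three zeroes if page 1 is not mapped.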
@ -1710,6 +1710,12 @@ static int virtnet_probe(struct virtio_device *vdev)
|
||||
struct virtnet_info *vi;
|
||||
u16 max_queue_pairs;
|
||||
|
||||
if (!vdev->config->get) {
|
||||
dev_err(&vdev->dev, "%s failure: config access disabled\n",
|
||||
__func__);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
if (!virtnet_validate_features(vdev))
|
||||
return -EINVAL;
|
||||
|
||||
|
@ -950,6 +950,12 @@ static int virtscsi_probe(struct virtio_device *vdev)
|
||||
u32 num_queues;
|
||||
struct scsi_host_template *hostt;
|
||||
|
||||
if (!vdev->config->get) {
|
||||
dev_err(&vdev->dev, "%s failure: config access disabled\n",
|
||||
__func__);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
/* We need to know how many queues before we allocate. */
|
||||
num_queues = virtscsi_config_get(vdev, num_queues) ? : 1;
|
||||
|
||||
|
@ -12,16 +12,32 @@ config VIRTIO_PCI
|
||||
depends on PCI
|
||||
select VIRTIO
|
||||
---help---
|
||||
This drivers provides support for virtio based paravirtual device
|
||||
This driver provides support for virtio based paravirtual device
|
||||
drivers over PCI. This requires that your VMM has appropriate PCI
|
||||
virtio backends. Most QEMU based VMMs should support these devices
|
||||
(like KVM or Xen).
|
||||
|
||||
Currently, the ABI is not considered stable so there is no guarantee
|
||||
that this version of the driver will work with your VMM.
|
||||
|
||||
If unsure, say M.
|
||||
|
||||
config VIRTIO_PCI_LEGACY
|
||||
bool "Support for legacy virtio draft 0.9.X and older devices"
|
||||
default y
|
||||
depends on VIRTIO_PCI
|
||||
---help---
|
||||
Virtio PCI Card 0.9.X Draft (circa 2014) and older device support.
|
||||
|
||||
This option enables building a transitional driver, supporting
|
||||
both devices conforming to Virtio 1 specification, and legacy devices.
|
||||
If disabled, you get a slightly smaller, non-transitional driver,
|
||||
with no legacy compatibility.
|
||||
|
||||
So look out into your driveway. Do you have a flying car? If
|
||||
so, you can happily disable this option and virtio will not
|
||||
break. Otherwise, leave it set. Unless you're testing what
|
||||
life will be like in The Future.
|
||||
|
||||
If unsure, say Y.
|
||||
|
||||
config VIRTIO_BALLOON
|
||||
tristate "Virtio balloon driver"
|
||||
depends on VIRTIO
|
||||
|
@ -1,5 +1,6 @@
|
||||
obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
|
||||
obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
|
||||
obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
|
||||
virtio_pci-y := virtio_pci_legacy.o virtio_pci_common.o
|
||||
virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
|
||||
virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
|
||||
obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
|
||||
|
@ -236,7 +236,10 @@ static int virtio_dev_probe(struct device *_d)
|
||||
if (err)
|
||||
goto err;
|
||||
|
||||
add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
|
||||
/* If probe didn't do it, mark device DRIVER_OK ourselves. */
|
||||
if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
|
||||
virtio_device_ready(dev);
|
||||
|
||||
if (drv->scan)
|
||||
drv->scan(dev);
|
||||
|
||||
|
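The virtio_dev_probe() change above only sets DRIVER_OK when the driver's own probe has not already done so. A minimal sketch of that guard, with a hypothetical struct toy_vdev and helper names standing in for the real virtio_device and config ops, might look like this. The bit value 4 matches VIRTIO_CONFIG_S_DRIVER_OK; everything else is an assumption for illustration.

```c
#include <stdint.h>

#define TOY_S_DRIVER_OK 4 /* same bit value as VIRTIO_CONFIG_S_DRIVER_OK */

/* Hypothetical device model that counts DRIVER_OK writes. */
struct toy_vdev {
	uint8_t status;
	unsigned int ok_writes;
};

/* What a driver (or the core) calls to announce it is ready. */
static void toy_device_ready(struct toy_vdev *d)
{
	d->status |= TOY_S_DRIVER_OK;
	d->ok_writes++;
}

/* Core-side step after the driver's probe returned successfully. */
static void toy_finish_probe(struct toy_vdev *d)
{
	/* If probe didn't do it, mark the device DRIVER_OK ourselves. */
	if (!(d->status & TOY_S_DRIVER_OK))
		toy_device_ready(d);
}
```

Whether or not the driver called toy_device_ready() during its probe, the device sees exactly one DRIVER_OK write.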
@ -44,8 +44,7 @@ static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
|
||||
module_param(oom_pages, int, S_IRUSR | S_IWUSR);
|
||||
MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
|
||||
|
||||
struct virtio_balloon
|
||||
{
|
||||
struct virtio_balloon {
|
||||
struct virtio_device *vdev;
|
||||
struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
|
||||
|
||||
@ -466,6 +465,12 @@ static int virtballoon_probe(struct virtio_device *vdev)
|
||||
struct virtio_balloon *vb;
|
||||
int err;
|
||||
|
||||
if (!vdev->config->get) {
|
||||
dev_err(&vdev->dev, "%s failure: config access disabled\n",
|
||||
__func__);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
vdev->priv = vb = kmalloc(sizeof(*vb), GFP_KERNEL);
|
||||
if (!vb) {
|
||||
err = -ENOMEM;
|
||||
|
@ -1,7 +1,7 @@
|
||||
/*
|
||||
* Virtio memory mapped device driver
|
||||
*
|
||||
* Copyright 2011, ARM Ltd.
|
||||
* Copyright 2011-2014, ARM Ltd.
|
||||
*
|
||||
* This module allows virtio devices to be used over a virtual, memory mapped
|
||||
* platform device.
|
||||
@ -50,36 +50,6 @@
|
||||
*
|
||||
*
|
||||
*
|
||||
* Registers layout (all 32-bit wide):
|
||||
*
|
||||
* offset d. name description
|
||||
* ------ -- ---------------- -----------------
|
||||
*
|
||||
* 0x000 R MagicValue Magic value "virt"
|
||||
* 0x004 R Version Device version (current max. 1)
|
||||
* 0x008 R DeviceID Virtio device ID
|
||||
* 0x00c R VendorID Virtio vendor ID
|
||||
*
|
||||
* 0x010 R HostFeatures Features supported by the host
|
||||
* 0x014 W HostFeaturesSel Set of host features to access via HostFeatures
|
||||
*
|
||||
* 0x020 W GuestFeatures Features activated by the guest
|
||||
* 0x024 W GuestFeaturesSel Set of activated features to set via GuestFeatures
|
||||
* 0x028 W GuestPageSize Size of guest's memory page in bytes
|
||||
*
|
||||
* 0x030 W QueueSel Queue selector
|
||||
* 0x034 R QueueNumMax Maximum size of the currently selected queue
|
||||
* 0x038 W QueueNum Queue size for the currently selected queue
|
||||
* 0x03c W QueueAlign Used Ring alignment for the current queue
|
||||
* 0x040 RW QueuePFN PFN for the currently selected queue
|
||||
*
|
||||
* 0x050 W QueueNotify Queue notifier
|
||||
* 0x060 R InterruptStatus Interrupt status register
|
||||
* 0x064 W InterruptACK Interrupt acknowledge register
|
||||
* 0x070 RW Status Device status register
|
||||
*
|
||||
* 0x100+ RW Device-specific configuration space
|
||||
*
|
||||
* Based on Virtio PCI driver by Anthony Liguori, copyright IBM Corp. 2007
|
||||
*
|
||||
* This work is licensed under the terms of the GNU GPL, version 2 or later.
|
||||
@ -145,11 +115,16 @@ struct virtio_mmio_vq_info {
|
||||
static u64 vm_get_features(struct virtio_device *vdev)
|
||||
{
|
||||
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
|
||||
u64 features;
|
||||
|
||||
/* TODO: Features > 32 bits */
|
||||
writel(0, vm_dev->base + VIRTIO_MMIO_HOST_FEATURES_SEL);
|
||||
writel(1, vm_dev->base + VIRTIO_MMIO_DEVICE_FEATURES_SEL);
|
||||
features = readl(vm_dev->base + VIRTIO_MMIO_DEVICE_FEATURES);
|
||||
features <<= 32;
|
||||
|
||||
return readl(vm_dev->base + VIRTIO_MMIO_HOST_FEATURES);
|
||||
writel(0, vm_dev->base + VIRTIO_MMIO_DEVICE_FEATURES_SEL);
|
||||
features |= readl(vm_dev->base + VIRTIO_MMIO_DEVICE_FEATURES);
|
||||
|
||||
return features;
|
||||
}
|
||||
|
||||
static int vm_finalize_features(struct virtio_device *vdev)
|
||||
@ -159,11 +134,20 @@ static int vm_finalize_features(struct virtio_device *vdev)
|
||||
/* Give virtio_ring a chance to accept features. */
|
||||
vring_transport_features(vdev);
|
||||
|
||||
/* Make sure we don't have any features > 32 bits! */
|
||||
BUG_ON((u32)vdev->features != vdev->features);
|
||||
/* Make sure there are no mixed devices */
|
||||
if (vm_dev->version == 2 &&
|
||||
!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
|
||||
dev_err(&vdev->dev, "New virtio-mmio devices (version 2) must provide VIRTIO_F_VERSION_1 feature!\n");
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
writel(0, vm_dev->base + VIRTIO_MMIO_GUEST_FEATURES_SEL);
|
||||
writel(vdev->features, vm_dev->base + VIRTIO_MMIO_GUEST_FEATURES);
|
||||
writel(1, vm_dev->base + VIRTIO_MMIO_DRIVER_FEATURES_SEL);
|
||||
writel((u32)(vdev->features >> 32),
|
||||
vm_dev->base + VIRTIO_MMIO_DRIVER_FEATURES);
|
||||
|
||||
writel(0, vm_dev->base + VIRTIO_MMIO_DRIVER_FEATURES_SEL);
|
||||
writel((u32)vdev->features,
|
||||
vm_dev->base + VIRTIO_MMIO_DRIVER_FEATURES);
|
||||
|
||||
return 0;
|
||||
}
|
||||
@ -275,7 +259,12 @@ static void vm_del_vq(struct virtqueue *vq)
|
||||
|
||||
/* Select and deactivate the queue */
|
||||
writel(index, vm_dev->base + VIRTIO_MMIO_QUEUE_SEL);
|
||||
writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
|
||||
if (vm_dev->version == 1) {
|
||||
writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
|
||||
} else {
|
||||
writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_READY);
|
||||
WARN_ON(readl(vm_dev->base + VIRTIO_MMIO_QUEUE_READY));
|
||||
}
|
||||
|
||||
size = PAGE_ALIGN(vring_size(info->num, VIRTIO_MMIO_VRING_ALIGN));
|
||||
free_pages_exact(info->queue, size);
|
||||
@ -312,7 +301,8 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
|
||||
writel(index, vm_dev->base + VIRTIO_MMIO_QUEUE_SEL);
|
||||
|
||||
/* Queue shouldn't already be set up. */
|
||||
if (readl(vm_dev->base + VIRTIO_MMIO_QUEUE_PFN)) {
|
||||
if (readl(vm_dev->base + (vm_dev->version == 1 ?
|
||||
VIRTIO_MMIO_QUEUE_PFN : VIRTIO_MMIO_QUEUE_READY))) {
|
||||
err = -ENOENT;
|
||||
goto error_available;
|
||||
}
|
||||
@ -356,13 +346,6 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
|
||||
info->num /= 2;
|
||||
}
|
||||
|
||||
/* Activate the queue */
|
||||
writel(info->num, vm_dev->base + VIRTIO_MMIO_QUEUE_NUM);
|
||||
writel(VIRTIO_MMIO_VRING_ALIGN,
|
||||
vm_dev->base + VIRTIO_MMIO_QUEUE_ALIGN);
|
||||
writel(virt_to_phys(info->queue) >> PAGE_SHIFT,
|
||||
vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
|
||||
|
||||
/* Create the vring */
|
||||
vq = vring_new_virtqueue(index, info->num, VIRTIO_MMIO_VRING_ALIGN, vdev,
|
||||
true, info->queue, vm_notify, callback, name);
|
||||
@ -371,6 +354,33 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
|
||||
goto error_new_virtqueue;
|
||||
}
|
||||
|
||||
/* Activate the queue */
|
||||
writel(info->num, vm_dev->base + VIRTIO_MMIO_QUEUE_NUM);
|
||||
if (vm_dev->version == 1) {
|
||||
writel(PAGE_SIZE, vm_dev->base + VIRTIO_MMIO_QUEUE_ALIGN);
|
||||
writel(virt_to_phys(info->queue) >> PAGE_SHIFT,
|
||||
vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
|
||||
} else {
|
||||
u64 addr;
|
||||
|
||||
addr = virt_to_phys(info->queue);
|
||||
writel((u32)addr, vm_dev->base + VIRTIO_MMIO_QUEUE_DESC_LOW);
|
||||
writel((u32)(addr >> 32),
|
||||
vm_dev->base + VIRTIO_MMIO_QUEUE_DESC_HIGH);
|
||||
|
||||
addr = virt_to_phys(virtqueue_get_avail(vq));
|
||||
writel((u32)addr, vm_dev->base + VIRTIO_MMIO_QUEUE_AVAIL_LOW);
|
||||
writel((u32)(addr >> 32),
|
||||
vm_dev->base + VIRTIO_MMIO_QUEUE_AVAIL_HIGH);
|
||||
|
||||
addr = virt_to_phys(virtqueue_get_used(vq));
|
||||
writel((u32)addr, vm_dev->base + VIRTIO_MMIO_QUEUE_USED_LOW);
|
||||
writel((u32)(addr >> 32),
|
||||
vm_dev->base + VIRTIO_MMIO_QUEUE_USED_HIGH);
|
||||
|
||||
writel(1, vm_dev->base + VIRTIO_MMIO_QUEUE_READY);
|
||||
}
|
||||
|
||||
vq->priv = info;
|
||||
info->vq = vq;
|
||||
|
||||
@ -381,7 +391,12 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
|
||||
return vq;
|
||||
|
||||
error_new_virtqueue:
|
||||
writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
|
||||
if (vm_dev->version == 1) {
|
||||
writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
|
||||
} else {
|
||||
writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_READY);
|
||||
WARN_ON(readl(vm_dev->base + VIRTIO_MMIO_QUEUE_READY));
|
||||
}
|
||||
free_pages_exact(info->queue, size);
|
||||
error_alloc_pages:
|
||||
kfree(info);
|
||||
@ -476,16 +491,32 @@ static int virtio_mmio_probe(struct platform_device *pdev)
|
||||
|
||||
/* Check device version */
|
||||
vm_dev->version = readl(vm_dev->base + VIRTIO_MMIO_VERSION);
|
||||
if (vm_dev->version != 1) {
|
||||
if (vm_dev->version < 1 || vm_dev->version > 2) {
|
||||
dev_err(&pdev->dev, "Version %ld not supported!\n",
|
||||
vm_dev->version);
|
||||
return -ENXIO;
|
||||
}
|
||||
|
||||
vm_dev->vdev.id.device = readl(vm_dev->base + VIRTIO_MMIO_DEVICE_ID);
|
||||
if (vm_dev->vdev.id.device == 0) {
|
||||
/*
|
||||
* virtio-mmio device with an ID 0 is a (dummy) placeholder
|
||||
* with no function. End probing now with no error reported.
|
||||
*/
|
||||
return -ENODEV;
|
||||
}
|
||||
vm_dev->vdev.id.vendor = readl(vm_dev->base + VIRTIO_MMIO_VENDOR_ID);
|
||||
|
||||
writel(PAGE_SIZE, vm_dev->base + VIRTIO_MMIO_GUEST_PAGE_SIZE);
|
||||
/* Reject legacy-only IDs for version 2 devices */
|
||||
if (vm_dev->version == 2 &&
|
||||
virtio_device_is_legacy_only(vm_dev->vdev.id)) {
|
||||
dev_err(&pdev->dev, "Version 2 not supported for devices %u!\n",
|
||||
vm_dev->vdev.id.device);
|
||||
return -ENODEV;
|
||||
}
|
||||
|
||||
if (vm_dev->version == 1)
|
||||
writel(PAGE_SIZE, vm_dev->base + VIRTIO_MMIO_GUEST_PAGE_SIZE);
|
||||
|
||||
platform_set_drvdata(pdev, vm_dev);
|
||||
|
||||
|
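The new vm_get_features() above assembles a 64-bit feature word from two 32-bit reads, selecting the high half first and the low half second. The following sketch models that with an in-memory device instead of MMIO registers. struct toy_dev, write_sel() and read_features() are hypothetical stand-ins for the writel()/readl() accesses, and the model assumes only selector values 0 and 1 are ever written.

```c
#include <stdint.h>

/* Hypothetical device exposing 64 feature bits through a 32-bit window. */
struct toy_dev {
	uint64_t device_features; /* what the device offers */
	uint32_t sel;             /* last value written to the selector */
};

/* Stand-in for writing the DeviceFeaturesSel register. */
static void write_sel(struct toy_dev *d, uint32_t sel)
{
	d->sel = sel;
}

/* Stand-in for reading the DeviceFeatures register. */
static uint32_t read_features(const struct toy_dev *d)
{
	/* Selector 0 exposes bits 0-31, selector 1 exposes bits 32-63. */
	if (d->sel == 1)
		return (uint32_t)(d->device_features >> 32);
	return (uint32_t)d->device_features;
}

/* Same order as the driver: high word first, then the low word. */
static uint64_t get_features(struct toy_dev *d)
{
	uint64_t features;

	write_sel(d, 1);
	features = read_features(d);
	features <<= 32;

	write_sel(d, 0);
	features |= read_features(d);

	return features;
}
```

Bit 32 of the result corresponds to VIRTIO_F_VERSION_1, which is why a driver that read only the low word could not tell a virtio 1.0 device from a legacy one.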
@ -19,6 +19,14 @@
|
||||
|
||||
#include "virtio_pci_common.h"
|
||||
|
||||
static bool force_legacy = false;
|
||||
|
||||
#if IS_ENABLED(CONFIG_VIRTIO_PCI_LEGACY)
|
||||
module_param(force_legacy, bool, 0444);
|
||||
MODULE_PARM_DESC(force_legacy,
|
||||
"Force legacy mode for transitional virtio 1 devices");
|
||||
#endif
|
||||
|
||||
/* wait for pending irq handlers */
|
||||
void vp_synchronize_vectors(struct virtio_device *vdev)
|
||||
{
|
||||
@ -464,15 +472,97 @@ static const struct pci_device_id virtio_pci_id_table[] = {
|
||||
|
||||
MODULE_DEVICE_TABLE(pci, virtio_pci_id_table);
|
||||
|
||||
static void virtio_pci_release_dev(struct device *_d)
|
||||
{
|
||||
struct virtio_device *vdev = dev_to_virtio(_d);
|
||||
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
|
||||
|
||||
/* As struct device is a kobject, it's not safe to
|
||||
* free the memory (including the reference counter itself)
|
||||
* until its release callback. */
|
||||
kfree(vp_dev);
|
||||
}
|
||||
|
||||
static int virtio_pci_probe(struct pci_dev *pci_dev,
|
||||
const struct pci_device_id *id)
|
||||
{
|
||||
return virtio_pci_legacy_probe(pci_dev, id);
|
||||
struct virtio_pci_device *vp_dev;
|
||||
int rc;
|
||||
|
||||
/* allocate our structure and fill it out */
|
||||
vp_dev = kzalloc(sizeof(struct virtio_pci_device), GFP_KERNEL);
|
||||
if (!vp_dev)
|
||||
return -ENOMEM;
|
||||
|
||||
pci_set_drvdata(pci_dev, vp_dev);
|
||||
vp_dev->vdev.dev.parent = &pci_dev->dev;
|
||||
vp_dev->vdev.dev.release = virtio_pci_release_dev;
|
||||
vp_dev->pci_dev = pci_dev;
|
||||
INIT_LIST_HEAD(&vp_dev->virtqueues);
|
||||
spin_lock_init(&vp_dev->lock);
|
||||
|
||||
/* Disable MSI/MSIX to bring device to a known good state. */
|
||||
pci_msi_off(pci_dev);
|
||||
|
||||
/* enable the device */
|
||||
rc = pci_enable_device(pci_dev);
|
||||
if (rc)
|
||||
goto err_enable_device;
|
||||
|
||||
rc = pci_request_regions(pci_dev, "virtio-pci");
|
||||
if (rc)
|
||||
goto err_request_regions;
|
||||
|
||||
if (force_legacy) {
|
||||
rc = virtio_pci_legacy_probe(vp_dev);
|
||||
/* Also try modern mode if we can't map BAR0 (no IO space). */
|
||||
if (rc == -ENODEV || rc == -ENOMEM)
|
||||
rc = virtio_pci_modern_probe(vp_dev);
|
||||
if (rc)
|
||||
goto err_probe;
|
||||
} else {
|
||||
rc = virtio_pci_modern_probe(vp_dev);
|
||||
if (rc == -ENODEV)
|
||||
rc = virtio_pci_legacy_probe(vp_dev);
|
||||
if (rc)
|
||||
goto err_probe;
|
||||
}
|
||||
|
||||
pci_set_master(pci_dev);
|
||||
|
||||
rc = register_virtio_device(&vp_dev->vdev);
|
||||
if (rc)
|
||||
goto err_register;
|
||||
|
||||
return 0;
|
||||
|
||||
err_register:
|
||||
if (vp_dev->ioaddr)
|
||||
virtio_pci_legacy_remove(vp_dev);
|
||||
else
|
||||
virtio_pci_modern_remove(vp_dev);
|
||||
err_probe:
|
||||
pci_release_regions(pci_dev);
|
||||
err_request_regions:
|
||||
pci_disable_device(pci_dev);
|
||||
err_enable_device:
|
||||
kfree(vp_dev);
|
||||
return rc;
|
||||
}
|
||||
|
||||
static void virtio_pci_remove(struct pci_dev *pci_dev)
|
||||
{
|
||||
virtio_pci_legacy_remove(pci_dev);
|
||||
struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
|
||||
|
||||
unregister_virtio_device(&vp_dev->vdev);
|
||||
|
||||
if (vp_dev->ioaddr)
|
||||
virtio_pci_legacy_remove(vp_dev);
|
||||
else
|
||||
virtio_pci_modern_remove(vp_dev);
|
||||
|
||||
pci_release_regions(pci_dev);
|
||||
pci_disable_device(pci_dev);
|
||||
}
|
||||
|
||||
static struct pci_driver virtio_pci_driver = {
|
||||
|
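The probe order in virtio_pci_probe() above (modern first with a legacy fallback, or the reverse when force_legacy is set) can be isolated into a small decision function. This is a sketch under assumptions: probe_with_fallback() and the three stub probes are hypothetical, and the stubs simply return fixed error codes instead of touching a PCI device.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*probe_fn)(void *dev);

/* Hypothetical stub probes that return a fixed result. */
static int probe_ok(void *dev)    { (void)dev; return 0; }
static int probe_nodev(void *dev) { (void)dev; return -ENODEV; }
static int probe_nomem(void *dev) { (void)dev; return -ENOMEM; }

/*
 * Default: try modern, fall back to legacy only on -ENODEV.
 * force_legacy: try legacy, fall back to modern on -ENODEV or -ENOMEM
 * (the latter covering the case where BAR0 cannot be mapped).
 */
static int probe_with_fallback(bool force_legacy, probe_fn modern,
			       probe_fn legacy, void *dev)
{
	int rc;

	if (force_legacy) {
		rc = legacy(dev);
		if (rc == -ENODEV || rc == -ENOMEM)
			rc = modern(dev);
	} else {
		rc = modern(dev);
		if (rc == -ENODEV)
			rc = legacy(dev);
	}
	return rc;
}
```

Note the asymmetry: in the default path a modern probe failing with -ENOMEM is reported as an error, while in the forced-legacy path the same code from the legacy probe still triggers a modern attempt.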
@ -53,12 +53,32 @@ struct virtio_pci_device {
|
||||
struct virtio_device vdev;
|
||||
struct pci_dev *pci_dev;
|
||||
|
||||
/* In legacy mode, these two point to within ->legacy. */
|
||||
/* Where to read and clear interrupt */
|
||||
u8 __iomem *isr;
|
||||
|
||||
/* Modern only fields */
|
||||
/* The IO mapping for the PCI config space (non-legacy mode) */
|
||||
struct virtio_pci_common_cfg __iomem *common;
|
||||
/* Device-specific data (non-legacy mode) */
|
||||
void __iomem *device;
|
||||
/* Base of vq notifications (non-legacy mode). */
|
||||
void __iomem *notify_base;
|
||||
|
||||
/* So we can sanity-check accesses. */
|
||||
size_t notify_len;
|
||||
size_t device_len;
|
||||
|
||||
/* Capability for when we need to map notifications per-vq. */
|
||||
int notify_map_cap;
|
||||
|
||||
/* Multiply queue_notify_off by this value. (non-legacy mode). */
|
||||
u32 notify_offset_multiplier;
|
||||
|
||||
/* Legacy only field */
|
||||
/* the IO mapping for the PCI config space */
|
||||
void __iomem *ioaddr;
|
||||
|
||||
/* the IO mapping for ISR operation */
|
||||
void __iomem *isr;
|
||||
|
||||
/* a list of queues so we can dispatch IRQs */
|
||||
spinlock_t lock;
|
||||
struct list_head virtqueues;
|
||||
@ -127,8 +147,19 @@ const char *vp_bus_name(struct virtio_device *vdev);
|
||||
*/
|
||||
int vp_set_vq_affinity(struct virtqueue *vq, int cpu);
|
||||
|
||||
int virtio_pci_legacy_probe(struct pci_dev *pci_dev,
|
||||
const struct pci_device_id *id);
|
||||
void virtio_pci_legacy_remove(struct pci_dev *pci_dev);
|
||||
#if IS_ENABLED(CONFIG_VIRTIO_PCI_LEGACY)
|
||||
int virtio_pci_legacy_probe(struct virtio_pci_device *);
|
||||
void virtio_pci_legacy_remove(struct virtio_pci_device *);
|
||||
#else
|
||||
static inline int virtio_pci_legacy_probe(struct virtio_pci_device *vp_dev)
|
||||
{
|
||||
return -ENODEV;
|
||||
}
|
||||
static inline void virtio_pci_legacy_remove(struct virtio_pci_device *vp_dev)
|
||||
{
|
||||
}
|
||||
#endif
|
||||
int virtio_pci_modern_probe(struct virtio_pci_device *);
|
||||
void virtio_pci_modern_remove(struct virtio_pci_device *);
|
||||
|
||||
#endif
|
||||
|
@ -211,23 +211,10 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
|
||||
.set_vq_affinity = vp_set_vq_affinity,
|
||||
};
|
||||
|
||||
static void virtio_pci_release_dev(struct device *_d)
|
||||
{
|
||||
struct virtio_device *vdev = dev_to_virtio(_d);
|
||||
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
|
||||
|
||||
/* As struct device is a kobject, it's not safe to
|
||||
* free the memory (including the reference counter itself)
|
||||
* until its release callback. */
|
||||
kfree(vp_dev);
|
||||
}
|
||||
|
||||
/* the PCI probing function */
|
||||
int virtio_pci_legacy_probe(struct pci_dev *pci_dev,
|
||||
const struct pci_device_id *id)
|
||||
int virtio_pci_legacy_probe(struct virtio_pci_device *vp_dev)
|
||||
{
|
||||
struct virtio_pci_device *vp_dev;
|
||||
int err;
|
||||
struct pci_dev *pci_dev = vp_dev->pci_dev;
|
||||
|
||||
/* We only own devices >= 0x1000 and <= 0x103f: leave the rest. */
|
||||
if (pci_dev->device < 0x1000 || pci_dev->device > 0x103f)
|
||||
@ -239,41 +226,12 @@ int virtio_pci_legacy_probe(struct pci_dev *pci_dev,
|
||||
return -ENODEV;
|
||||
}
|
||||
|
||||
/* allocate our structure and fill it out */
|
||||
vp_dev = kzalloc(sizeof(struct virtio_pci_device), GFP_KERNEL);
|
||||
if (vp_dev == NULL)
|
||||
vp_dev->ioaddr = pci_iomap(pci_dev, 0, 0);
|
||||
if (!vp_dev->ioaddr)
|
||||
return -ENOMEM;
|
||||
|
||||
vp_dev->vdev.dev.parent = &pci_dev->dev;
|
||||
vp_dev->vdev.dev.release = virtio_pci_release_dev;
|
||||
vp_dev->vdev.config = &virtio_pci_config_ops;
|
||||
vp_dev->pci_dev = pci_dev;
|
||||
INIT_LIST_HEAD(&vp_dev->virtqueues);
|
||||
spin_lock_init(&vp_dev->lock);
|
||||
|
||||
/* Disable MSI/MSIX to bring device to a known good state. */
|
||||
pci_msi_off(pci_dev);
|
||||
|
||||
/* enable the device */
|
||||
err = pci_enable_device(pci_dev);
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
err = pci_request_regions(pci_dev, "virtio-pci");
|
||||
if (err)
|
||||
goto out_enable_device;
|
||||
|
||||
vp_dev->ioaddr = pci_iomap(pci_dev, 0, 0);
|
||||
if (vp_dev->ioaddr == NULL) {
|
||||
err = -ENOMEM;
|
||||
goto out_req_regions;
|
||||
}
|
||||
|
||||
vp_dev->isr = vp_dev->ioaddr + VIRTIO_PCI_ISR;
|
||||
|
||||
pci_set_drvdata(pci_dev, vp_dev);
|
||||
pci_set_master(pci_dev);
|
||||
|
||||
/* we use the subsystem vendor/device id as the virtio vendor/device
|
||||
* id. this allows us to use the same PCI vendor/device id for all
|
||||
* virtio devices and to identify the particular virtio driver by
|
||||
@ -281,36 +239,18 @@ int virtio_pci_legacy_probe(struct pci_dev *pci_dev,
|
||||
vp_dev->vdev.id.vendor = pci_dev->subsystem_vendor;
|
||||
vp_dev->vdev.id.device = pci_dev->subsystem_device;
|
||||
|
||||
vp_dev->vdev.config = &virtio_pci_config_ops;
|
||||
|
||||
vp_dev->config_vector = vp_config_vector;
|
||||
vp_dev->setup_vq = setup_vq;
|
||||
vp_dev->del_vq = del_vq;
|
||||
|
||||
/* finally register the virtio device */
|
||||
err = register_virtio_device(&vp_dev->vdev);
|
||||
if (err)
|
||||
goto out_set_drvdata;
|
||||
|
||||
return 0;
|
||||
|
||||
out_set_drvdata:
|
||||
pci_iounmap(pci_dev, vp_dev->ioaddr);
|
||||
out_req_regions:
|
||||
pci_release_regions(pci_dev);
|
||||
out_enable_device:
|
||||
pci_disable_device(pci_dev);
|
||||
out:
|
||||
kfree(vp_dev);
|
||||
return err;
|
||||
}
|
||||
|
||||
void virtio_pci_legacy_remove(struct pci_dev *pci_dev)
|
||||
void virtio_pci_legacy_remove(struct virtio_pci_device *vp_dev)
|
||||
{
|
||||
struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
|
||||
struct pci_dev *pci_dev = vp_dev->pci_dev;
|
||||
|
||||
unregister_virtio_device(&vp_dev->vdev);
|
||||
|
||||
vp_del_vqs(&vp_dev->vdev);
|
||||
pci_iounmap(pci_dev, vp_dev->ioaddr);
|
||||
pci_release_regions(pci_dev);
|
||||
pci_disable_device(pci_dev);
|
||||
}
|
||||
|
695
drivers/virtio/virtio_pci_modern.c
Normal file
@ -0,0 +1,695 @@
|
||||
/*
|
||||
* Virtio PCI driver - modern (virtio 1.0) device support
|
||||
*
|
||||
* This module allows virtio devices to be used over a virtual PCI device.
|
||||
* This can be used with QEMU based VMMs like KVM or Xen.
|
||||
*
|
||||
* Copyright IBM Corp. 2007
|
||||
* Copyright Red Hat, Inc. 2014
|
||||
*
|
||||
* Authors:
|
||||
* Anthony Liguori <aliguori@us.ibm.com>
|
||||
* Rusty Russell <rusty@rustcorp.com.au>
|
||||
* Michael S. Tsirkin <mst@redhat.com>
|
||||
*
|
||||
* This work is licensed under the terms of the GNU GPL, version 2 or later.
|
||||
* See the COPYING file in the top-level directory.
|
||||
*
|
||||
*/
|
||||
|
||||
#define VIRTIO_PCI_NO_LEGACY
|
||||
#include "virtio_pci_common.h"
|
||||
|
||||
static void __iomem *map_capability(struct pci_dev *dev, int off,
|
||||
size_t minlen,
|
||||
u32 align,
|
||||
u32 start, u32 size,
|
||||
size_t *len)
|
||||
{
|
||||
u8 bar;
|
||||
u32 offset, length;
|
||||
void __iomem *p;
|
||||
|
||||
pci_read_config_byte(dev, off + offsetof(struct virtio_pci_cap,
|
||||
bar),
|
||||
&bar);
|
||||
pci_read_config_dword(dev, off + offsetof(struct virtio_pci_cap, offset),
|
||||
&offset);
|
||||
pci_read_config_dword(dev, off + offsetof(struct virtio_pci_cap, length),
|
||||
&length);
|
||||
|
||||
if (length <= start) {
|
||||
dev_err(&dev->dev,
|
||||
"virtio_pci: bad capability len %u (>%u expected)\n",
|
||||
length, start);
|
||||
return NULL;
|
||||
}
|
||||
|
||||
if (length - start < minlen) {
|
||||
dev_err(&dev->dev,
|
||||
"virtio_pci: bad capability len %u (>=%zu expected)\n",
|
||||
length, minlen);
|
||||
return NULL;
|
||||
}
|
||||
|
||||
length -= start;
|
||||
|
||||
if (start + offset < offset) {
|
||||
dev_err(&dev->dev,
|
||||
"virtio_pci: map wrap-around %u+%u\n",
|
||||
start, offset);
|
||||
return NULL;
|
||||
}
|
||||
|
||||
offset += start;
|
||||
|
||||
if (offset & (align - 1)) {
|
||||
dev_err(&dev->dev,
|
||||
"virtio_pci: offset %u not aligned to %u\n",
|
||||
offset, align);
|
||||
return NULL;
|
||||
}
|
||||
|
||||
if (length > size)
|
||||
length = size;
|
||||
|
||||
if (len)
|
||||
*len = length;
|
||||
|
||||
if (minlen + offset < minlen ||
|
||||
minlen + offset > pci_resource_len(dev, bar)) {
|
||||
dev_err(&dev->dev,
|
||||
"virtio_pci: map virtio %zu@%u "
|
||||
"out of range on bar %i length %lu\n",
|
||||
minlen, offset,
|
||||
bar, (unsigned long)pci_resource_len(dev, bar));
|
||||
return NULL;
|
||||
}
|
||||
|
||||
p = pci_iomap_range(dev, bar, offset, length);
|
||||
if (!p)
|
||||
dev_err(&dev->dev,
|
||||
"virtio_pci: unable to map virtio %u@%u on bar %i\n",
|
||||
length, offset, bar);
|
||||
return p;
|
||||
}
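The order of the checks in map_capability() above matters, because every addition is tested for unsigned wrap-around before its result is used. Below is a minimal userspace sketch of the same sequence, with the config-space reads replaced by plain arguments. The names `cap_window` and `cap_window_check` and the `bar_len` parameter are hypothetical and not part of the patch. The sketch only models the arithmetic and maps nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the window that would be handed to the mapper. */
struct cap_window {
	uint32_t offset;	/* offset within the BAR */
	uint32_t length;	/* number of bytes to map */
};

/*
 * Userspace model of the range checks (assumption: the capability's
 * offset/length and the BAR length are passed in directly).
 * Returns true and fills *out if the window is acceptable.
 */
static bool cap_window_check(uint32_t cap_offset, uint32_t cap_length,
			     uint32_t start, uint32_t size,
			     size_t minlen, uint32_t align,
			     uint64_t bar_len, struct cap_window *out)
{
	uint32_t offset = cap_offset, length = cap_length;

	/* The mask test below only works for power-of-two alignments. */
	assert(align && !(align & (align - 1)));

	if (length <= start)		/* nothing left after skipping 'start' */
		return false;
	if (length - start < minlen)	/* remaining window is too short */
		return false;
	length -= start;

	if (start + offset < offset)	/* 32-bit sum wrapped around */
		return false;
	offset += start;

	if (offset & (align - 1))	/* misaligned start */
		return false;

	if (length > size)		/* clamp to what the caller wants */
		length = size;

	/* The mandatory part must fit inside the BAR, without wrapping. */
	if (minlen + offset < minlen || minlen + offset > bar_len)
		return false;

	out->offset = offset;
	out->length = length;
	return true;
}
```

A caller in this model would pass values such as `cap_window_check(0x1000, 0x100, 0, 0x38, 0x38, 4, 0x4000, &w)` and only use `w` when the function returns true.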
|
||||
|
||||
static void iowrite64_twopart(u64 val, __le32 __iomem *lo, __le32 __iomem *hi)
|
||||
{
|
||||
iowrite32((u32)val, lo);
|
||||
iowrite32(val >> 32, hi);
|
||||
}
|
||||
|
||||
/* virtio config->get_features() implementation */
|
||||
static u64 vp_get_features(struct virtio_device *vdev)
|
||||
{
|
||||
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
|
||||
u64 features;
|
||||
|
||||
iowrite32(0, &vp_dev->common->device_feature_select);
|
||||
features = ioread32(&vp_dev->common->device_feature);
|
||||
iowrite32(1, &vp_dev->common->device_feature_select);
|
||||
features |= ((u64)ioread32(&vp_dev->common->device_feature) << 32);
|
||||
|
||||
return features;
|
||||
}
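vp_get_features() above assembles a 64-bit feature mask from a 32-bit register by writing a selector (0, then 1) and reading the same window twice. The sketch below models that access pattern in userspace. `fake_common_cfg`, `fake_read_device_feature` and `fake_get_features` are hypothetical names, and the "device" is just a struct holding the full mask.

```c
#include <stdint.h>

/* Hypothetical stand-in for the device side of the common config area. */
struct fake_common_cfg {
	uint32_t device_feature_select;	/* which 32-bit window is visible */
	uint64_t all_features;		/* what the imaginary device offers */
};

/* Models a read of device_feature: returns the currently selected window. */
static uint32_t fake_read_device_feature(const struct fake_common_cfg *cfg)
{
	if (cfg->device_feature_select > 1)
		return 0;	/* assumption: windows beyond 64 bits read as 0 */
	return (uint32_t)(cfg->all_features >>
			  (32 * cfg->device_feature_select));
}

/* Same select/read/select/read sequence as the driver code. */
static uint64_t fake_get_features(struct fake_common_cfg *cfg)
{
	uint64_t features;

	cfg->device_feature_select = 0;
	features = fake_read_device_feature(cfg);
	cfg->device_feature_select = 1;
	/* Widen before shifting so the high half is not lost. */
	features |= (uint64_t)fake_read_device_feature(cfg) << 32;

	return features;
}
```

The cast to a 64-bit type before the shift is the detail worth noting: shifting a 32-bit value left by 32 would not produce the high half.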
|
||||
|
||||
/* virtio config->finalize_features() implementation */
|
||||
static int vp_finalize_features(struct virtio_device *vdev)
|
||||
{
|
||||
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
|
||||
|
||||
/* Give virtio_ring a chance to accept features. */
|
||||
vring_transport_features(vdev);
|
||||
|
||||
if (!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
|
||||
dev_err(&vdev->dev, "virtio: device uses modern interface "
|
||||
"but does not have VIRTIO_F_VERSION_1\n");
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
iowrite32(0, &vp_dev->common->guest_feature_select);
|
||||
iowrite32((u32)vdev->features, &vp_dev->common->guest_feature);
|
||||
iowrite32(1, &vp_dev->common->guest_feature_select);
|
||||
iowrite32(vdev->features >> 32, &vp_dev->common->guest_feature);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* virtio config->get() implementation */
|
||||
static void vp_get(struct virtio_device *vdev, unsigned offset,
|
||||
void *buf, unsigned len)
|
||||
{
|
||||
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
|
||||
u8 b;
|
||||
__le16 w;
|
||||
__le32 l;
|
||||
|
||||
BUG_ON(offset + len > vp_dev->device_len);
|
||||
|
||||
switch (len) {
|
||||
case 1:
|
||||
b = ioread8(vp_dev->device + offset);
|
||||
memcpy(buf, &b, sizeof b);
|
||||
break;
|
||||
case 2:
|
||||
w = cpu_to_le16(ioread16(vp_dev->device + offset));
|
||||
memcpy(buf, &w, sizeof w);
|
||||
break;
|
||||
case 4:
|
||||
l = cpu_to_le32(ioread32(vp_dev->device + offset));
|
||||
memcpy(buf, &l, sizeof l);
|
||||
break;
|
||||
case 8:
|
||||
l = cpu_to_le32(ioread32(vp_dev->device + offset));
|
||||
memcpy(buf, &l, sizeof l);
|
||||
l = cpu_to_le32(ioread32(vp_dev->device + offset + sizeof l));
|
||||
memcpy(buf + sizeof l, &l, sizeof l);
|
||||
break;
|
||||
default:
|
||||
BUG();
|
||||
}
|
||||
}
|
||||
|
||||
/* the config->set() implementation. it's symmetric to the config->get()
|
||||
* implementation */
|
||||
static void vp_set(struct virtio_device *vdev, unsigned offset,
|
||||
const void *buf, unsigned len)
|
||||
{
|
||||
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
|
||||
u8 b;
|
||||
__le16 w;
|
||||
__le32 l;
|
||||
|
||||
BUG_ON(offset + len > vp_dev->device_len);
|
||||
|
||||
switch (len) {
|
||||
case 1:
|
||||
memcpy(&b, buf, sizeof b);
|
||||
iowrite8(b, vp_dev->device + offset);
|
||||
break;
|
||||
case 2:
|
||||
memcpy(&w, buf, sizeof w);
|
||||
iowrite16(le16_to_cpu(w), vp_dev->device + offset);
|
||||
break;
|
||||
case 4:
|
||||
memcpy(&l, buf, sizeof l);
|
||||
iowrite32(le32_to_cpu(l), vp_dev->device + offset);
|
||||
break;
|
||||
case 8:
|
||||
memcpy(&l, buf, sizeof l);
|
||||
iowrite32(le32_to_cpu(l), vp_dev->device + offset);
|
||||
memcpy(&l, buf + sizeof l, sizeof l);
|
||||
iowrite32(le32_to_cpu(l), vp_dev->device + offset + sizeof l);
|
||||
break;
|
||||
default:
|
||||
BUG();
|
||||
}
|
||||
}
|
||||
|
||||
static u32 vp_generation(struct virtio_device *vdev)
|
||||
{
|
||||
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
|
||||
return ioread8(&vp_dev->common->config_generation);
|
||||
}
|
||||
|
||||
/* config->{get,set}_status() implementations */
|
||||
static u8 vp_get_status(struct virtio_device *vdev)
|
||||
{
|
||||
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
|
||||
return ioread8(&vp_dev->common->device_status);
|
||||
}
|
||||
|
||||
static void vp_set_status(struct virtio_device *vdev, u8 status)
|
||||
{
|
||||
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
|
||||
/* We should never be setting status to 0. */
|
||||
BUG_ON(status == 0);
|
||||
iowrite8(status, &vp_dev->common->device_status);
|
||||
}
|
||||
|
||||
static void vp_reset(struct virtio_device *vdev)
|
||||
{
|
||||
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
|
||||
/* 0 status means a reset. */
|
||||
iowrite8(0, &vp_dev->common->device_status);
|
||||
/* Flush out the status write, and flush in device writes,
|
||||
* including MSI-X interrupts, if any. */
|
||||
ioread8(&vp_dev->common->device_status);
|
||||
/* Flush pending VQ/configuration callbacks. */
|
||||
vp_synchronize_vectors(vdev);
|
||||
}
|
||||
|
||||
static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
|
||||
{
|
||||
/* Setup the vector used for configuration events */
|
||||
iowrite16(vector, &vp_dev->common->msix_config);
|
||||
/* Verify we had enough resources to assign the vector */
|
||||
/* Will also flush the write out to device */
|
||||
return ioread16(&vp_dev->common->msix_config);
|
||||
}
|
||||
|
||||
static size_t vring_pci_size(u16 num)
|
||||
{
|
||||
/* We only need a cacheline separation. */
|
||||
return PAGE_ALIGN(vring_size(num, SMP_CACHE_BYTES));
|
||||
}
|
||||
|
||||
static void *alloc_virtqueue_pages(int *num)
|
||||
{
|
||||
void *pages;
|
||||
|
||||
/* TODO: allocate each queue chunk individually */
|
||||
for (; *num && vring_pci_size(*num) > PAGE_SIZE; *num /= 2) {
|
||||
pages = alloc_pages_exact(vring_pci_size(*num),
|
||||
GFP_KERNEL|__GFP_ZERO|__GFP_NOWARN);
|
||||
if (pages)
|
||||
return pages;
|
||||
}
|
||||
|
||||
if (!*num)
|
||||
return NULL;
|
||||
|
||||
/* Try to get a single page. You are my only hope! */
|
||||
return alloc_pages_exact(vring_pci_size(*num), GFP_KERNEL|__GFP_ZERO);
|
||||
}
|
||||
|
||||
static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
|
||||
struct virtio_pci_vq_info *info,
|
||||
unsigned index,
|
||||
void (*callback)(struct virtqueue *vq),
|
||||
const char *name,
|
||||
u16 msix_vec)
|
||||
{
|
||||
struct virtio_pci_common_cfg __iomem *cfg = vp_dev->common;
|
||||
struct virtqueue *vq;
|
||||
u16 num, off;
|
||||
int err;
|
||||
|
||||
if (index >= ioread16(&cfg->num_queues))
|
||||
return ERR_PTR(-ENOENT);
|
||||
|
||||
/* Select the queue we're interested in */
|
||||
iowrite16(index, &cfg->queue_select);
|
||||
|
||||
/* Check if queue is either not available or already active. */
|
||||
num = ioread16(&cfg->queue_size);
|
||||
if (!num || ioread16(&cfg->queue_enable))
|
||||
return ERR_PTR(-ENOENT);
|
||||
|
||||
if (num & (num - 1)) {
|
||||
dev_warn(&vp_dev->pci_dev->dev, "bad queue size %u", num);
|
||||
return ERR_PTR(-EINVAL);
|
||||
}
|
||||
|
||||
/* get offset of notification word for this vq */
|
||||
off = ioread16(&cfg->queue_notify_off);
|
||||
|
||||
info->num = num;
|
||||
info->msix_vector = msix_vec;
|
||||
|
||||
info->queue = alloc_virtqueue_pages(&info->num);
|
||||
if (info->queue == NULL)
|
||||
return ERR_PTR(-ENOMEM);
|
||||
|
||||
/* create the vring */
|
||||
vq = vring_new_virtqueue(index, info->num,
|
||||
SMP_CACHE_BYTES, &vp_dev->vdev,
|
||||
true, info->queue, vp_notify, callback, name);
|
||||
if (!vq) {
|
||||
err = -ENOMEM;
|
||||
goto err_new_queue;
|
||||
}
|
||||
|
||||
/* activate the queue */
|
||||
iowrite16(num, &cfg->queue_size);
|
||||
iowrite64_twopart(virt_to_phys(info->queue),
|
||||
&cfg->queue_desc_lo, &cfg->queue_desc_hi);
|
||||
iowrite64_twopart(virt_to_phys(virtqueue_get_avail(vq)),
|
||||
&cfg->queue_avail_lo, &cfg->queue_avail_hi);
|
||||
iowrite64_twopart(virt_to_phys(virtqueue_get_used(vq)),
|
||||
&cfg->queue_used_lo, &cfg->queue_used_hi);
|
||||
|
||||
if (vp_dev->notify_base) {
|
||||
/* offset should not wrap */
|
||||
if ((u64)off * vp_dev->notify_offset_multiplier + 2
|
||||
> vp_dev->notify_len) {
|
||||
dev_warn(&vp_dev->pci_dev->dev,
|
||||
"bad notification offset %u (x %u) "
|
||||
"for queue %u > %zd",
|
||||
off, vp_dev->notify_offset_multiplier,
|
||||
index, vp_dev->notify_len);
|
||||
err = -EINVAL;
|
||||
goto err_map_notify;
|
||||
}
|
||||
vq->priv = (void __force *)vp_dev->notify_base +
|
||||
off * vp_dev->notify_offset_multiplier;
|
||||
} else {
|
||||
vq->priv = (void __force *)map_capability(vp_dev->pci_dev,
|
||||
vp_dev->notify_map_cap, 2, 2,
|
||||
off * vp_dev->notify_offset_multiplier, 2,
|
||||
NULL);
|
||||
}
|
||||
|
||||
if (!vq->priv) {
|
||||
err = -ENOMEM;
|
||||
goto err_map_notify;
|
||||
}
|
||||
|
||||
if (msix_vec != VIRTIO_MSI_NO_VECTOR) {
|
||||
iowrite16(msix_vec, &cfg->queue_msix_vector);
|
||||
msix_vec = ioread16(&cfg->queue_msix_vector);
|
||||
if (msix_vec == VIRTIO_MSI_NO_VECTOR) {
|
||||
err = -EBUSY;
|
||||
goto err_assign_vector;
|
||||
}
|
||||
}
|
||||
|
||||
return vq;
|
||||
|
||||
err_assign_vector:
|
||||
if (!vp_dev->notify_base)
|
||||
pci_iounmap(vp_dev->pci_dev, (void __iomem __force *)vq->priv);
|
||||
err_map_notify:
|
||||
vring_del_virtqueue(vq);
|
||||
err_new_queue:
|
||||
free_pages_exact(info->queue, vring_pci_size(info->num));
|
||||
return ERR_PTR(err);
|
||||
}
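The bounds test on the notification offset in setup_vq() above widens `off` to 64 bits before multiplying, so the product of a 16-bit offset and a 32-bit multiplier cannot wrap. A minimal sketch of that predicate alone follows. `notify_offset_ok` is a hypothetical helper name, and the constant 2 is the size of the 16-bit notification word, as in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true when a 2-byte notification word at
 * off * multiplier lies entirely inside a mapping of notify_len bytes.
 */
static bool notify_offset_ok(uint16_t off, uint32_t multiplier,
			     size_t notify_len)
{
	/* Widen first: a 16-bit by 32-bit product needs up to 48 bits. */
	return (uint64_t)off * multiplier + 2 <= notify_len;
}
```

With a 32-bit multiply, large values of `off` and `multiplier` could wrap to a small number and pass the check. The 64-bit form rejects them.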
|
||||
|
||||
static int vp_modern_find_vqs(struct virtio_device *vdev, unsigned nvqs,
|
||||
struct virtqueue *vqs[],
|
||||
vq_callback_t *callbacks[],
|
||||
const char *names[])
|
||||
{
|
||||
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
|
||||
struct virtqueue *vq;
|
||||
int rc = vp_find_vqs(vdev, nvqs, vqs, callbacks, names);
|
||||
|
||||
if (rc)
|
||||
return rc;
|
||||
|
||||
/* Select and activate all queues. Has to be done last: once we do
|
||||
* this, there's no way to go back except reset.
|
||||
*/
|
||||
list_for_each_entry(vq, &vdev->vqs, list) {
|
||||
iowrite16(vq->index, &vp_dev->common->queue_select);
|
||||
iowrite16(1, &vp_dev->common->queue_enable);
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void del_vq(struct virtio_pci_vq_info *info)
|
||||
{
|
||||
struct virtqueue *vq = info->vq;
|
||||
struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
|
||||
|
||||
iowrite16(vq->index, &vp_dev->common->queue_select);
|
||||
|
||||
if (vp_dev->msix_enabled) {
|
||||
iowrite16(VIRTIO_MSI_NO_VECTOR,
|
||||
&vp_dev->common->queue_msix_vector);
|
||||
/* Flush the write out to device */
|
||||
ioread16(&vp_dev->common->queue_msix_vector);
|
||||
}
|
||||
|
||||
if (!vp_dev->notify_base)
|
||||
pci_iounmap(vp_dev->pci_dev, (void __force __iomem *)vq->priv);
|
||||
|
||||
vring_del_virtqueue(vq);
|
||||
|
||||
free_pages_exact(info->queue, vring_pci_size(info->num));
|
||||
}
|
||||
|
||||
static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
|
||||
.get = NULL,
|
||||
.set = NULL,
|
||||
.generation = vp_generation,
|
||||
.get_status = vp_get_status,
|
||||
.set_status = vp_set_status,
|
||||
.reset = vp_reset,
|
||||
.find_vqs = vp_modern_find_vqs,
|
||||
.del_vqs = vp_del_vqs,
|
||||
.get_features = vp_get_features,
|
||||
.finalize_features = vp_finalize_features,
|
||||
.bus_name = vp_bus_name,
|
||||
.set_vq_affinity = vp_set_vq_affinity,
|
||||
};
|
||||
|
||||
static const struct virtio_config_ops virtio_pci_config_ops = {
|
||||
.get = vp_get,
|
||||
.set = vp_set,
|
||||
.generation = vp_generation,
|
||||
.get_status = vp_get_status,
|
||||
.set_status = vp_set_status,
|
||||
.reset = vp_reset,
|
||||
.find_vqs = vp_modern_find_vqs,
|
||||
.del_vqs = vp_del_vqs,
|
||||
.get_features = vp_get_features,
|
||||
.finalize_features = vp_finalize_features,
|
||||
.bus_name = vp_bus_name,
|
||||
.set_vq_affinity = vp_set_vq_affinity,
|
||||
};
|
||||
|
||||
/**
|
||||
* virtio_pci_find_capability - walk capabilities to find device info.
|
||||
* @dev: the pci device
|
||||
* @cfg_type: the VIRTIO_PCI_CAP_* value we seek
|
||||
* @ioresource_types: IORESOURCE_MEM and/or IORESOURCE_IO.
|
||||
*
|
||||
* Returns offset of the capability, or 0.
|
||||
*/
|
||||
static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 cfg_type,
|
||||
u32 ioresource_types)
|
||||
{
|
||||
int pos;
|
||||
|
||||
for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
|
||||
pos > 0;
|
||||
pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
|
||||
u8 type, bar;
|
||||
pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
|
||||
cfg_type),
|
||||
&type);
|
||||
pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
|
||||
bar),
|
||||
&bar);
|
||||
|
||||
/* Ignore structures with reserved BAR values */
|
||||
if (bar > 0x5)
|
||||
continue;
|
||||
|
||||
if (type == cfg_type) {
|
||||
if (pci_resource_len(dev, bar) &&
|
||||
pci_resource_flags(dev, bar) & ioresource_types)
|
||||
return pos;
|
||||
}
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* This is part of the ABI. Don't screw with it. */
|
||||
static inline void check_offsets(void)
|
||||
{
|
||||
/* Note: disk space was harmed in compilation of this function. */
|
||||
BUILD_BUG_ON(VIRTIO_PCI_CAP_VNDR !=
|
||||
offsetof(struct virtio_pci_cap, cap_vndr));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_CAP_NEXT !=
|
||||
offsetof(struct virtio_pci_cap, cap_next));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_CAP_LEN !=
|
||||
offsetof(struct virtio_pci_cap, cap_len));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_CAP_CFG_TYPE !=
|
||||
offsetof(struct virtio_pci_cap, cfg_type));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_CAP_BAR !=
|
||||
offsetof(struct virtio_pci_cap, bar));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_CAP_OFFSET !=
|
||||
offsetof(struct virtio_pci_cap, offset));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_CAP_LENGTH !=
|
||||
offsetof(struct virtio_pci_cap, length));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_NOTIFY_CAP_MULT !=
|
||||
offsetof(struct virtio_pci_notify_cap,
|
||||
notify_off_multiplier));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_DFSELECT !=
|
||||
offsetof(struct virtio_pci_common_cfg,
|
||||
device_feature_select));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_DF !=
|
||||
offsetof(struct virtio_pci_common_cfg, device_feature));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_GFSELECT !=
|
||||
offsetof(struct virtio_pci_common_cfg,
|
||||
guest_feature_select));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_GF !=
|
||||
offsetof(struct virtio_pci_common_cfg, guest_feature));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_MSIX !=
|
||||
offsetof(struct virtio_pci_common_cfg, msix_config));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_NUMQ !=
|
||||
offsetof(struct virtio_pci_common_cfg, num_queues));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_STATUS !=
|
||||
offsetof(struct virtio_pci_common_cfg, device_status));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_CFGGENERATION !=
|
||||
offsetof(struct virtio_pci_common_cfg, config_generation));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_SELECT !=
|
||||
offsetof(struct virtio_pci_common_cfg, queue_select));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_SIZE !=
|
||||
offsetof(struct virtio_pci_common_cfg, queue_size));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_MSIX !=
|
||||
offsetof(struct virtio_pci_common_cfg, queue_msix_vector));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_ENABLE !=
|
||||
offsetof(struct virtio_pci_common_cfg, queue_enable));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_NOFF !=
|
||||
offsetof(struct virtio_pci_common_cfg, queue_notify_off));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_DESCLO !=
|
||||
offsetof(struct virtio_pci_common_cfg, queue_desc_lo));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_DESCHI !=
|
||||
offsetof(struct virtio_pci_common_cfg, queue_desc_hi));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_AVAILLO !=
|
||||
offsetof(struct virtio_pci_common_cfg, queue_avail_lo));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_AVAILHI !=
|
||||
offsetof(struct virtio_pci_common_cfg, queue_avail_hi));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_USEDLO !=
|
||||
offsetof(struct virtio_pci_common_cfg, queue_used_lo));
|
||||
BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_USEDHI !=
|
||||
offsetof(struct virtio_pci_common_cfg, queue_used_hi));
|
||||
}
|
||||
|
||||
/* the PCI probing function */
|
||||
int virtio_pci_modern_probe(struct virtio_pci_device *vp_dev)
|
||||
{
|
||||
struct pci_dev *pci_dev = vp_dev->pci_dev;
|
||||
int err, common, isr, notify, device;
|
||||
u32 notify_length;
|
||||
u32 notify_offset;
|
||||
|
||||
check_offsets();
|
||||
|
||||
/* We only own devices >= 0x1000 and <= 0x107f: leave the rest. */
|
||||
if (pci_dev->device < 0x1000 || pci_dev->device > 0x107f)
|
||||
return -ENODEV;
|
||||
|
||||
if (pci_dev->device < 0x1040) {
|
||||
/* Transitional devices: use the PCI subsystem device id as
|
||||
* virtio device id, same as legacy driver always did.
|
||||
*/
|
||||
vp_dev->vdev.id.device = pci_dev->subsystem_device;
|
||||
} else {
|
||||
/* Modern devices: simply use PCI device id, but start from 0x1040. */
|
||||
vp_dev->vdev.id.device = pci_dev->device - 0x1040;
|
||||
}
|
||||
vp_dev->vdev.id.vendor = pci_dev->subsystem_vendor;
|
||||
|
||||
if (virtio_device_is_legacy_only(vp_dev->vdev.id))
|
||||
return -ENODEV;
|
||||
|
||||
/* check for a common config: if not, use legacy mode (bar 0). */
|
||||
common = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_COMMON_CFG,
|
||||
IORESOURCE_IO | IORESOURCE_MEM);
|
||||
if (!common) {
|
||||
dev_info(&pci_dev->dev,
|
||||
"virtio_pci: leaving for legacy driver\n");
|
||||
return -ENODEV;
|
||||
}
|
||||
|
||||
/* If common is there, these should be too... */
|
||||
isr = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_ISR_CFG,
|
||||
IORESOURCE_IO | IORESOURCE_MEM);
|
||||
notify = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_NOTIFY_CFG,
|
||||
IORESOURCE_IO | IORESOURCE_MEM);
|
||||
if (!isr || !notify) {
|
||||
dev_err(&pci_dev->dev,
|
||||
"virtio_pci: missing capabilities %i/%i/%i\n",
|
||||
common, isr, notify);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
/* Device capability is only mandatory for devices that have
|
||||
* device-specific configuration.
|
||||
*/
|
||||
device = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_DEVICE_CFG,
|
||||
IORESOURCE_IO | IORESOURCE_MEM);
|
||||
|
||||
err = -EINVAL;
|
||||
vp_dev->common = map_capability(pci_dev, common,
|
||||
sizeof(struct virtio_pci_common_cfg), 4,
|
||||
0, sizeof(struct virtio_pci_common_cfg),
|
||||
NULL);
|
||||
if (!vp_dev->common)
|
||||
goto err_map_common;
|
||||
vp_dev->isr = map_capability(pci_dev, isr, sizeof(u8), 1,
|
||||
0, 1,
|
||||
NULL);
|
||||
if (!vp_dev->isr)
|
||||
goto err_map_isr;
|
||||
|
||||
/* Read notify_off_multiplier from config space. */
|
||||
pci_read_config_dword(pci_dev,
|
||||
notify + offsetof(struct virtio_pci_notify_cap,
|
||||
notify_off_multiplier),
|
||||
&vp_dev->notify_offset_multiplier);
|
||||
/* Read notify length and offset from config space. */
|
||||
pci_read_config_dword(pci_dev,
|
||||
notify + offsetof(struct virtio_pci_notify_cap,
|
||||
cap.length),
|
||||
&notify_length);
|
||||
|
||||
pci_read_config_dword(pci_dev,
|
||||
notify + offsetof(struct virtio_pci_notify_cap,
|
||||
cap.length),
|
||||
&notify_offset);
|
||||
|
||||
/* We don't know how many VQs we'll map ahead of time.
|
||||
* If notify length is small, map it all now.
|
||||
* Otherwise, map each VQ individually later.
|
||||
*/
|
||||
if ((u64)notify_length + (notify_offset % PAGE_SIZE) <= PAGE_SIZE) {
|
||||
vp_dev->notify_base = map_capability(pci_dev, notify, 2, 2,
|
||||
0, notify_length,
|
||||
&vp_dev->notify_len);
|
||||
if (!vp_dev->notify_base)
|
||||
goto err_map_notify;
|
||||
} else {
|
||||
vp_dev->notify_map_cap = notify;
|
||||
}
|
||||
|
||||
/* Again, we don't know how much we should map, but PAGE_SIZE
|
||||
* is more than enough for all existing devices.
|
||||
*/
|
||||
if (device) {
|
||||
vp_dev->device = map_capability(pci_dev, device, 0, 4,
|
||||
0, PAGE_SIZE,
|
||||
&vp_dev->device_len);
|
||||
if (!vp_dev->device)
|
||||
goto err_map_device;
|
||||
|
||||
vp_dev->vdev.config = &virtio_pci_config_ops;
|
||||
} else {
|
||||
vp_dev->vdev.config = &virtio_pci_config_nodev_ops;
|
||||
}
|
||||
|
||||
vp_dev->config_vector = vp_config_vector;
|
||||
vp_dev->setup_vq = setup_vq;
|
||||
vp_dev->del_vq = del_vq;
|
||||
|
||||
return 0;
|
||||
|
||||
err_map_device:
|
||||
if (vp_dev->notify_base)
|
||||
pci_iounmap(pci_dev, vp_dev->notify_base);
|
||||
err_map_notify:
|
||||
pci_iounmap(pci_dev, vp_dev->isr);
|
||||
err_map_isr:
|
||||
pci_iounmap(pci_dev, vp_dev->common);
|
||||
err_map_common:
|
||||
return err;
|
||||
}
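The ID handling at the top of virtio_pci_modern_probe() above distinguishes transitional devices (PCI device IDs below 0x1040, virtio ID taken from the subsystem device ID) from modern ones (virtio ID is the PCI device ID minus 0x1040). Here is a small sketch of that mapping as a pure function. `virtio_id_from_pci` is a hypothetical name, and returning -1 for IDs outside the owned range is a choice made for the sketch, not the driver's error code.

```c
#include <stdint.h>

/*
 * Returns the virtio device ID for a PCI function, or -1 if the PCI
 * device ID is outside the 0x1000..0x107f range the driver claims.
 */
static int virtio_id_from_pci(uint16_t pci_device, uint16_t subsystem_device)
{
	if (pci_device < 0x1000 || pci_device > 0x107f)
		return -1;

	if (pci_device < 0x1040)
		return subsystem_device;	/* transitional device */

	return pci_device - 0x1040;		/* modern device */
}
```

For example, a transitional device with PCI ID 0x1000 and subsystem ID 1 and a modern device with PCI ID 0x1041 both map to virtio ID 1 in this model.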
|
||||
|
||||
void virtio_pci_modern_remove(struct virtio_pci_device *vp_dev)
|
||||
{
|
||||
struct pci_dev *pci_dev = vp_dev->pci_dev;
|
||||
|
||||
if (vp_dev->device)
|
||||
pci_iounmap(pci_dev, vp_dev->device);
|
||||
if (vp_dev->notify_base)
|
||||
pci_iounmap(pci_dev, vp_dev->notify_base);
|
||||
pci_iounmap(pci_dev, vp_dev->isr);
|
||||
pci_iounmap(pci_dev, vp_dev->common);
|
||||
}
|
@ -54,8 +54,7 @@
|
||||
#define END_USE(vq)
|
||||
#endif
|
||||
|
||||
struct vring_virtqueue
|
||||
{
|
||||
struct vring_virtqueue {
|
||||
struct virtqueue vq;
|
||||
|
||||
/* Actual memory layout for this queue */
|
||||
@ -245,14 +244,14 @@ static inline int virtqueue_add(struct virtqueue *_vq,
|
||||
vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) + 1);
|
||||
vq->num_added++;
|
||||
|
||||
pr_debug("Added buffer head %i to %p\n", head, vq);
|
||||
END_USE(vq);
|
||||
|
||||
/* This is very unlikely, but theoretically possible. Kick
|
||||
* just in case. */
|
||||
if (unlikely(vq->num_added == (1 << 16) - 1))
|
||||
virtqueue_kick(_vq);
|
||||
|
||||
pr_debug("Added buffer head %i to %p\n", head, vq);
|
||||
END_USE(vq);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
@ -15,6 +15,9 @@ struct pci_dev;
|
||||
#ifdef CONFIG_PCI
|
||||
/* Create a virtual mapping cookie for a PCI BAR (memory or IO) */
|
||||
extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
|
||||
extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
|
||||
unsigned long offset,
|
||||
unsigned long maxlen);
|
||||
/* Create a virtual mapping cookie for a port on a given PCI device.
|
||||
* Do not call this directly, it exists to make it easier for architectures
|
||||
* to override */
|
||||
@ -30,6 +33,13 @@ static inline void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned lon
|
||||
{
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static inline void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
|
||||
unsigned long offset,
|
||||
unsigned long maxlen)
|
||||
{
|
||||
return NULL;
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* __ASM_GENERIC_IO_H */
|
||||
|
@ -8,52 +8,13 @@
|
||||
*
|
||||
* The Guest needs devices to do anything useful. Since we don't let it touch
|
||||
* real devices (think of the damage it could do!) we provide virtual devices.
|
||||
* We could emulate a PCI bus with various devices on it, but that is a fairly
|
||||
* complex burden for the Host and suboptimal for the Guest, so we have our own
|
||||
* simple lguest bus and we use "virtio" drivers. These drivers need a set of
|
||||
* routines from us which will actually do the virtual I/O, but they handle all
|
||||
* the net/block/console stuff themselves. This means that if we want to add
|
||||
* a new device, we simply need to write a new virtio driver and create support
|
||||
* for it in the Launcher: this code won't need to change.
|
||||
* We emulate a PCI bus with virtio devices on it; we used to have our own
|
||||
* lguest bus which was far simpler, but this tests the virtio 1.0 standard.
|
||||
*
|
||||
* Virtio devices are also used by kvm, so we can simply reuse their optimized
|
||||
* device drivers. And one day when everyone uses virtio, my plan will be
|
||||
* complete. Bwahahahah!
|
||||
*
|
||||
* Devices are described by a simplified ID, a status byte, and some "config"
|
||||
* bytes which describe this device's configuration. This is placed by the
|
||||
* Launcher just above the top of physical memory:
|
||||
*/
|
||||
struct lguest_device_desc {
|
||||
/* The device type: console, network, disk etc. Type 0 terminates. */
|
||||
__u8 type;
|
||||
/* The number of virtqueues (first in config array) */
|
||||
__u8 num_vq;
|
||||
/*
|
||||
* The number of bytes of feature bits. Multiply by 2: one for host
|
||||
* features and one for Guest acknowledgements.
|
||||
*/
|
||||
__u8 feature_len;
|
||||
/* The number of bytes of the config array after virtqueues. */
|
||||
__u8 config_len;
|
||||
/* A status byte, written by the Guest. */
|
||||
__u8 status;
|
||||
__u8 config[0];
|
||||
};
|
||||
|
||||
/*D:135
|
||||
* This is how we expect the device configuration field for a virtqueue
|
||||
* to be laid out in config space.
|
||||
*/
|
||||
struct lguest_vqconfig {
|
||||
/* The number of entries in the virtio_ring */
|
||||
__u16 num;
|
||||
/* The interrupt we get when something happens. */
|
||||
__u16 irq;
|
||||
/* The page number of the virtio ring for this device. */
|
||||
__u32 pfn;
|
||||
};
|
||||
/*:*/
|
||||
|
||||
/* Write command first word is a request. */
|
||||
enum lguest_req
|
||||
@ -62,12 +23,22 @@ enum lguest_req
|
||||
LHREQ_GETDMA, /* No longer used */
|
||||
LHREQ_IRQ, /* + irq */
|
||||
LHREQ_BREAK, /* No longer used */
|
||||
LHREQ_EVENTFD, /* + address, fd. */
|
||||
LHREQ_EVENTFD, /* No longer used. */
|
||||
LHREQ_GETREG, /* + offset within struct pt_regs (then read value). */
|
||||
LHREQ_SETREG, /* + offset within struct pt_regs, value. */
|
||||
LHREQ_TRAP, /* + trap number to deliver to guest. */
|
||||
};
|
||||
|
||||
/*
|
||||
* The alignment to use between consumer and producer parts of vring.
|
||||
* x86 pagesize for historical reasons.
|
||||
* This is what read() of the lguest fd populates. trap ==
|
||||
* LGUEST_TRAP_ENTRY for an LHCALL_NOTIFY (addr is the
|
||||
* argument), 14 for a page fault in the MMIO region (addr is
|
||||
* the trap address, insn is the instruction), or 13 for a GPF
|
||||
* (insn is the instruction).
|
||||
*/
|
||||
#define LGUEST_VRING_ALIGN 4096
|
||||
struct lguest_pending {
|
||||
__u8 trap;
|
||||
__u8 insn[7];
|
||||
__u32 addr;
|
||||
};
|
||||
#endif /* _LINUX_LGUEST_LAUNCHER */
|
||||
|
@ -51,23 +51,29 @@
|
||||
/* Virtio vendor ID - Read Only */
|
||||
#define VIRTIO_MMIO_VENDOR_ID 0x00c
|
||||
|
||||
/* Bitmask of the features supported by the host
|
||||
/* Bitmask of the features supported by the device (host)
|
||||
* (32 bits per set) - Read Only */
|
||||
#define VIRTIO_MMIO_HOST_FEATURES 0x010
|
||||
#define VIRTIO_MMIO_DEVICE_FEATURES 0x010
|
||||
|
||||
/* Host features set selector - Write Only */
|
||||
#define VIRTIO_MMIO_HOST_FEATURES_SEL 0x014
|
||||
/* Device (host) features set selector - Write Only */
|
||||
#define VIRTIO_MMIO_DEVICE_FEATURES_SEL 0x014
|
||||
|
||||
/* Bitmask of features activated by the guest
|
||||
/* Bitmask of features activated by the driver (guest)
|
||||
* (32 bits per set) - Write Only */
|
||||
#define VIRTIO_MMIO_GUEST_FEATURES 0x020
|
||||
#define VIRTIO_MMIO_DRIVER_FEATURES 0x020
|
||||
|
||||
/* Activated features set selector - Write Only */
|
||||
#define VIRTIO_MMIO_GUEST_FEATURES_SEL 0x024
|
||||
#define VIRTIO_MMIO_DRIVER_FEATURES_SEL 0x024
|
||||
|
||||
|
||||
#ifndef VIRTIO_MMIO_NO_LEGACY /* LEGACY DEVICES ONLY! */
|
||||
|
||||
/* Guest's memory page size in bytes - Write Only */
|
||||
#define VIRTIO_MMIO_GUEST_PAGE_SIZE 0x028
|
||||
|
||||
#endif
|
||||
|
||||
|
||||
/* Queue selector - Write Only */
|
||||
#define VIRTIO_MMIO_QUEUE_SEL 0x030
|
||||
|
||||
@@ -77,12 +83,21 @@
|
||||
/* Queue size for the currently selected queue - Write Only */
|
||||
#define VIRTIO_MMIO_QUEUE_NUM 0x038
|
||||
|
||||
|
||||
#ifndef VIRTIO_MMIO_NO_LEGACY /* LEGACY DEVICES ONLY! */
|
||||
|
||||
/* Used Ring alignment for the currently selected queue - Write Only */
|
||||
#define VIRTIO_MMIO_QUEUE_ALIGN 0x03c
|
||||
|
||||
/* Guest's PFN for the currently selected queue - Read Write */
|
||||
#define VIRTIO_MMIO_QUEUE_PFN 0x040
|
||||
|
||||
#endif
|
||||
|
||||
|
||||
/* Ready bit for the currently selected queue - Read Write */
|
||||
#define VIRTIO_MMIO_QUEUE_READY 0x044
|
||||
|
||||
/* Queue notifier - Write Only */
|
||||
#define VIRTIO_MMIO_QUEUE_NOTIFY 0x050
|
||||
|
||||
@@ -95,6 +110,21 @@
|
||||
/* Device status register - Read Write */
|
||||
#define VIRTIO_MMIO_STATUS 0x070
|
||||
|
||||
/* Selected queue's Descriptor Table address, 64 bits in two halves */
|
||||
#define VIRTIO_MMIO_QUEUE_DESC_LOW 0x080
|
||||
#define VIRTIO_MMIO_QUEUE_DESC_HIGH 0x084
|
||||
|
||||
/* Selected queue's Available Ring address, 64 bits in two halves */
|
||||
#define VIRTIO_MMIO_QUEUE_AVAIL_LOW 0x090
|
||||
#define VIRTIO_MMIO_QUEUE_AVAIL_HIGH 0x094
|
||||
|
||||
/* Selected queue's Used Ring address, 64 bits in two halves */
|
||||
#define VIRTIO_MMIO_QUEUE_USED_LOW 0x0a0
|
||||
#define VIRTIO_MMIO_QUEUE_USED_HIGH 0x0a4
|
||||
|
||||
/* Configuration atomicity value */
|
||||
#define VIRTIO_MMIO_CONFIG_GENERATION 0x0fc
|
||||
|
||||
/* The config space is defined by each driver as
|
||||
* the per-driver configuration space - Read Write */
|
||||
#define VIRTIO_MMIO_CONFIG 0x100
|
||||
|
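The new QUEUE_DESC/AVAIL/USED registers above each carry a 64-bit address as two 32-bit halves. A minimal userspace sketch of that split follows, assuming a plain array stands in for the MMIO window; `addr_low`, `addr_high` and `write_queue_desc` are hypothetical names for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets copied from the hunk above. */
#define VIRTIO_MMIO_QUEUE_DESC_LOW	0x080
#define VIRTIO_MMIO_QUEUE_DESC_HIGH	0x084

static_assert(VIRTIO_MMIO_QUEUE_DESC_HIGH == VIRTIO_MMIO_QUEUE_DESC_LOW + 4,
	      "the two halves are adjacent 32-bit registers");

/* Hypothetical helpers: split a 64-bit address into its two halves. */
static inline uint32_t addr_low(uint64_t addr)
{
	return (uint32_t)(addr & 0xffffffffu);
}

static inline uint32_t addr_high(uint64_t addr)
{
	return (uint32_t)(addr >> 32);
}

/*
 * Stand-in for two 32-bit MMIO writes. "regs" is an ordinary array of
 * 32-bit words indexed by (register offset / 4), not real device memory.
 */
static inline void write_queue_desc(uint32_t *regs, uint64_t addr)
{
	regs[VIRTIO_MMIO_QUEUE_DESC_LOW / 4] = addr_low(addr);
	regs[VIRTIO_MMIO_QUEUE_DESC_HIGH / 4] = addr_high(addr);
}
```

The same pattern would apply to the AVAIL and USED pairs; a real driver would use its platform's MMIO accessors rather than array stores.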
@@ -36,8 +36,7 @@
|
||||
/* Size of a PFN in the balloon interface. */
|
||||
#define VIRTIO_BALLOON_PFN_SHIFT 12
|
||||
|
||||
struct virtio_balloon_config
|
||||
{
|
||||
struct virtio_balloon_config {
|
||||
/* Number of pages host wants Guest to give up. */
|
||||
__le32 num_pages;
|
||||
/* Number of pages we've actually got in balloon. */
|
||||
|
@@ -31,22 +31,25 @@
|
||||
#include <linux/virtio_types.h>
|
||||
|
||||
/* Feature bits */
|
||||
#define VIRTIO_BLK_F_BARRIER 0 /* Does host support barriers? */
|
||||
#define VIRTIO_BLK_F_SIZE_MAX 1 /* Indicates maximum segment size */
|
||||
#define VIRTIO_BLK_F_SEG_MAX 2 /* Indicates maximum # of segments */
|
||||
#define VIRTIO_BLK_F_GEOMETRY 4 /* Legacy geometry available */
|
||||
#define VIRTIO_BLK_F_RO 5 /* Disk is read-only */
|
||||
#define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/
|
||||
#define VIRTIO_BLK_F_SCSI 7 /* Supports scsi command passthru */
|
||||
#define VIRTIO_BLK_F_WCE 9 /* Writeback mode enabled after reset */
|
||||
#define VIRTIO_BLK_F_TOPOLOGY 10 /* Topology information is available */
|
||||
#define VIRTIO_BLK_F_CONFIG_WCE 11 /* Writeback mode available in config */
|
||||
#define VIRTIO_BLK_F_MQ 12 /* support more than one vq */
|
||||
|
||||
/* Legacy feature bits */
|
||||
#ifndef VIRTIO_BLK_NO_LEGACY
|
||||
#define VIRTIO_BLK_F_BARRIER 0 /* Does host support barriers? */
|
||||
#define VIRTIO_BLK_F_SCSI 7 /* Supports scsi command passthru */
|
||||
#define VIRTIO_BLK_F_WCE 9 /* Writeback mode enabled after reset */
|
||||
#define VIRTIO_BLK_F_CONFIG_WCE 11 /* Writeback mode available in config */
|
||||
#ifndef __KERNEL__
|
||||
/* Old (deprecated) name for VIRTIO_BLK_F_WCE. */
|
||||
#define VIRTIO_BLK_F_FLUSH VIRTIO_BLK_F_WCE
|
||||
#endif
|
||||
#endif /* !VIRTIO_BLK_NO_LEGACY */
|
||||
|
||||
#define VIRTIO_BLK_ID_BYTES 20 /* ID string length */
|
||||
|
||||
@@ -100,8 +103,10 @@ struct virtio_blk_config {
|
||||
#define VIRTIO_BLK_T_IN 0
|
||||
#define VIRTIO_BLK_T_OUT 1
|
||||
|
||||
#ifndef VIRTIO_BLK_NO_LEGACY
|
||||
/* This bit says it's a scsi command, not an actual read or write. */
|
||||
#define VIRTIO_BLK_T_SCSI_CMD 2
|
||||
#endif /* VIRTIO_BLK_NO_LEGACY */
|
||||
|
||||
/* Cache flush command */
|
||||
#define VIRTIO_BLK_T_FLUSH 4
|
||||
@@ -109,8 +114,10 @@ struct virtio_blk_config {
|
||||
/* Get device ID command */
|
||||
#define VIRTIO_BLK_T_GET_ID 8
|
||||
|
||||
#ifndef VIRTIO_BLK_NO_LEGACY
|
||||
/* Barrier before this op. */
|
||||
#define VIRTIO_BLK_T_BARRIER 0x80000000
|
||||
#endif /* !VIRTIO_BLK_NO_LEGACY */
|
||||
|
||||
/* This is the first element of the read scatter-gather list. */
|
||||
struct virtio_blk_outhdr {
|
||||
@@ -122,12 +129,14 @@ struct virtio_blk_outhdr {
|
||||
__virtio64 sector;
|
||||
};
|
||||
|
||||
#ifndef VIRTIO_BLK_NO_LEGACY
|
||||
struct virtio_scsi_inhdr {
|
||||
__virtio32 errors;
|
||||
__virtio32 data_len;
|
||||
__virtio32 sense_len;
|
||||
__virtio32 residual;
|
||||
};
|
||||
#endif /* !VIRTIO_BLK_NO_LEGACY */
|
||||
|
||||
/* And this is the final byte of the write scatter-gather list. */
|
||||
#define VIRTIO_BLK_S_OK 0
|
||||
|
@@ -49,12 +49,14 @@
|
||||
#define VIRTIO_TRANSPORT_F_START 28
|
||||
#define VIRTIO_TRANSPORT_F_END 33
|
||||
|
||||
#ifndef VIRTIO_CONFIG_NO_LEGACY
|
||||
/* Do we get callbacks when the ring is completely used, even if we've
|
||||
* suppressed them? */
|
||||
#define VIRTIO_F_NOTIFY_ON_EMPTY 24
|
||||
|
||||
/* Can the device handle any descriptor layout? */
|
||||
#define VIRTIO_F_ANY_LAYOUT 27
|
||||
#endif /* VIRTIO_CONFIG_NO_LEGACY */
|
||||
|
||||
/* v1.0 compliant. */
|
||||
#define VIRTIO_F_VERSION_1 32
|
||||
|
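VIRTIO_F_VERSION_1 is bit 32, so it cannot be represented in a single 32-bit feature word. A minimal sketch of testing it in a 64-bit mask, and of selecting the 32-bit word a "features set selector" would address, might look like this. `has_feature` and `feature_word` are hypothetical helper names, and the sketch assumes `bit < 64` and `sel` is 0 or 1.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers copied from the hunk above. */
#define VIRTIO_F_ANY_LAYOUT	27
#define VIRTIO_F_VERSION_1	32

static_assert(VIRTIO_F_VERSION_1 / 32 == 1,
	      "bit 32 lives in the second 32-bit feature set");

/* Hypothetical helper: is feature "bit" (0..63) set in the 64-bit mask? */
static inline int has_feature(uint64_t features, unsigned int bit)
{
	return (int)((features >> bit) & 1);
}

/* Hypothetical helper: the 32-bit word chosen by selector "sel" (0 or 1). */
static inline uint32_t feature_word(uint64_t features, unsigned int sel)
{
	return (uint32_t)(features >> (32 * sel));
}
```

With a mask that has both bits above set, word 0 carries VIRTIO_F_ANY_LAYOUT and word 1 carries VIRTIO_F_VERSION_1 in its lowest bit.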
@@ -35,7 +35,6 @@
|
||||
#define VIRTIO_NET_F_CSUM 0 /* Host handles pkts w/ partial csum */
|
||||
#define VIRTIO_NET_F_GUEST_CSUM 1 /* Guest handles pkts w/ partial csum */
|
||||
#define VIRTIO_NET_F_MAC 5 /* Host has given MAC address. */
|
||||
#define VIRTIO_NET_F_GSO 6 /* Host handles pkts w/ any GSO type */
|
||||
#define VIRTIO_NET_F_GUEST_TSO4 7 /* Guest can handle TSOv4 in. */
|
||||
#define VIRTIO_NET_F_GUEST_TSO6 8 /* Guest can handle TSOv6 in. */
|
||||
#define VIRTIO_NET_F_GUEST_ECN 9 /* Guest can handle TSO[6] w/ ECN in. */
|
||||
@@ -56,6 +55,10 @@
|
||||
* Steering */
|
||||
#define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
|
||||
|
||||
#ifndef VIRTIO_NET_NO_LEGACY
|
||||
#define VIRTIO_NET_F_GSO 6 /* Host handles pkts w/ any GSO type */
|
||||
#endif /* VIRTIO_NET_NO_LEGACY */
|
||||
|
||||
#define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
|
||||
#define VIRTIO_NET_S_ANNOUNCE 2 /* Announcement is needed */
|
||||
|
||||
@@ -71,19 +74,39 @@ struct virtio_net_config {
|
||||
__u16 max_virtqueue_pairs;
|
||||
} __attribute__((packed));
|
||||
|
||||
/*
|
||||
* This header comes first in the scatter-gather list. If you don't
|
||||
* specify GSO or CSUM features, you can simply ignore the header.
|
||||
*
|
||||
* This is bitwise-equivalent to the legacy struct virtio_net_hdr_mrg_rxbuf,
|
||||
* only flattened.
|
||||
*/
|
||||
struct virtio_net_hdr_v1 {
|
||||
#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 /* Use csum_start, csum_offset */
|
||||
#define VIRTIO_NET_HDR_F_DATA_VALID 2 /* Csum is valid */
|
||||
__u8 flags;
|
||||
#define VIRTIO_NET_HDR_GSO_NONE 0 /* Not a GSO frame */
|
||||
#define VIRTIO_NET_HDR_GSO_TCPV4 1 /* GSO frame, IPv4 TCP (TSO) */
|
||||
#define VIRTIO_NET_HDR_GSO_UDP 3 /* GSO frame, IPv4 UDP (UFO) */
|
||||
#define VIRTIO_NET_HDR_GSO_TCPV6 4 /* GSO frame, IPv6 TCP */
|
||||
#define VIRTIO_NET_HDR_GSO_ECN 0x80 /* TCP has ECN set */
|
||||
__u8 gso_type;
|
||||
__virtio16 hdr_len; /* Ethernet + IP + tcp/udp hdrs */
|
||||
__virtio16 gso_size; /* Bytes to append to hdr_len per frame */
|
||||
__virtio16 csum_start; /* Position to start checksumming from */
|
||||
__virtio16 csum_offset; /* Offset after that to place checksum */
|
||||
__virtio16 num_buffers; /* Number of merged rx buffers */
|
||||
};
|
||||
|
||||
#ifndef VIRTIO_NET_NO_LEGACY
|
||||
/* This header comes first in the scatter-gather list.
|
||||
* If VIRTIO_F_ANY_LAYOUT is not negotiated, it must
|
||||
* For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must
|
||||
* be the first element of the scatter-gather list. If you don't
|
||||
* specify GSO or CSUM features, you can simply ignore the header. */
|
||||
struct virtio_net_hdr {
|
||||
#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 // Use csum_start, csum_offset
|
||||
#define VIRTIO_NET_HDR_F_DATA_VALID 2 // Csum is valid
|
||||
/* See VIRTIO_NET_HDR_F_* */
|
||||
__u8 flags;
|
||||
#define VIRTIO_NET_HDR_GSO_NONE 0 // Not a GSO frame
|
||||
#define VIRTIO_NET_HDR_GSO_TCPV4 1 // GSO frame, IPv4 TCP (TSO)
|
||||
#define VIRTIO_NET_HDR_GSO_UDP 3 // GSO frame, IPv4 UDP (UFO)
|
||||
#define VIRTIO_NET_HDR_GSO_TCPV6 4 // GSO frame, IPv6 TCP
|
||||
#define VIRTIO_NET_HDR_GSO_ECN 0x80 // TCP has ECN set
|
||||
/* See VIRTIO_NET_HDR_GSO_* */
|
||||
__u8 gso_type;
|
||||
__virtio16 hdr_len; /* Ethernet + IP + tcp/udp hdrs */
|
||||
__virtio16 gso_size; /* Bytes to append to hdr_len per frame */
|
||||
@@ -97,6 +120,7 @@ struct virtio_net_hdr_mrg_rxbuf {
|
||||
struct virtio_net_hdr hdr;
|
||||
__virtio16 num_buffers; /* Number of merged rx buffers */
|
||||
};
|
||||
#endif /* ...VIRTIO_NET_NO_LEGACY */
|
||||
|
||||
/*
|
||||
* Control virtqueue data structures
|
||||
|
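The comment on struct virtio_net_hdr_v1 says it is bitwise-equivalent to the legacy struct virtio_net_hdr_mrg_rxbuf, only flattened. The layout claim can be checked at compile time with a userspace sketch such as the one below. The struct names are hypothetical mirrors, fixed-width integers stand in for `__u8`/`__virtio16`, and the legacy header's last two fields (`csum_start`, `csum_offset`) are filled in on the assumption that they match the v1 struct, since the hunk above cuts off before them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the legacy header (field order as in struct virtio_net_hdr). */
struct net_hdr_legacy {
	uint8_t flags;
	uint8_t gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Mirror of the legacy nested form: header plus num_buffers. */
struct net_hdr_mrg_rxbuf {
	struct net_hdr_legacy hdr;
	uint16_t num_buffers;
};

/* Mirror of the flattened virtio 1.0 form. */
struct net_hdr_v1 {
	uint8_t flags;
	uint8_t gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
	uint16_t num_buffers;
};

static_assert(sizeof(struct net_hdr_v1) == sizeof(struct net_hdr_mrg_rxbuf),
	      "flattened and nested headers have the same size");
static_assert(offsetof(struct net_hdr_v1, num_buffers) ==
	      offsetof(struct net_hdr_mrg_rxbuf, num_buffers),
	      "num_buffers sits at the same offset in both layouts");
static_assert(offsetof(struct net_hdr_v1, csum_offset) ==
	      offsetof(struct net_hdr_mrg_rxbuf, hdr.csum_offset),
	      "inner fields line up as well");
```

If any assertion failed, the file would not compile, which is the point: the equivalence is a property of the layout, not of runtime behaviour.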
@@ -39,7 +39,7 @@
|
||||
#ifndef _LINUX_VIRTIO_PCI_H
|
||||
#define _LINUX_VIRTIO_PCI_H
|
||||
|
||||
#include <linux/virtio_config.h>
|
||||
#include <linux/types.h>
|
||||
|
||||
#ifndef VIRTIO_PCI_NO_LEGACY
|
||||
|
||||
@@ -99,4 +99,95 @@
|
||||
/* Vector value used to disable MSI for queue */
|
||||
#define VIRTIO_MSI_NO_VECTOR 0xffff
|
||||
|
||||
#ifndef VIRTIO_PCI_NO_MODERN
|
||||
|
||||
/* IDs for different capabilities. Must all exist. */
|
||||
|
||||
/* Common configuration */
|
||||
#define VIRTIO_PCI_CAP_COMMON_CFG 1
|
||||
/* Notifications */
|
||||
#define VIRTIO_PCI_CAP_NOTIFY_CFG 2
|
||||
/* ISR access */
|
||||
#define VIRTIO_PCI_CAP_ISR_CFG 3
|
||||
/* Device specific configuration */
|
||||
#define VIRTIO_PCI_CAP_DEVICE_CFG 4
|
||||
/* PCI configuration access */
|
||||
#define VIRTIO_PCI_CAP_PCI_CFG 5
|
||||
|
||||
/* This is the PCI capability header: */
|
||||
struct virtio_pci_cap {
|
||||
__u8 cap_vndr; /* Generic PCI field: PCI_CAP_ID_VNDR */
|
||||
__u8 cap_next; /* Generic PCI field: next ptr. */
|
||||
__u8 cap_len; /* Generic PCI field: capability length */
|
||||
__u8 cfg_type; /* Identifies the structure. */
|
||||
__u8 bar; /* Where to find it. */
|
||||
__u8 padding[3]; /* Pad to full dword. */
|
||||
__le32 offset; /* Offset within bar. */
|
||||
__le32 length; /* Length of the structure, in bytes. */
|
||||
};
|
||||
|
||||
struct virtio_pci_notify_cap {
|
||||
struct virtio_pci_cap cap;
|
||||
__le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
|
||||
};
|
||||
|
||||
/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
|
||||
struct virtio_pci_common_cfg {
|
||||
/* About the whole device. */
|
||||
__le32 device_feature_select; /* read-write */
|
||||
__le32 device_feature; /* read-only */
|
||||
__le32 guest_feature_select; /* read-write */
|
||||
__le32 guest_feature; /* read-write */
|
||||
__le16 msix_config; /* read-write */
|
||||
__le16 num_queues; /* read-only */
|
||||
__u8 device_status; /* read-write */
|
||||
__u8 config_generation; /* read-only */
|
||||
|
||||
/* About a specific virtqueue. */
|
||||
__le16 queue_select; /* read-write */
|
||||
__le16 queue_size; /* read-write, power of 2. */
|
||||
__le16 queue_msix_vector; /* read-write */
|
||||
__le16 queue_enable; /* read-write */
|
||||
__le16 queue_notify_off; /* read-only */
|
||||
__le32 queue_desc_lo; /* read-write */
|
||||
__le32 queue_desc_hi; /* read-write */
|
||||
__le32 queue_avail_lo; /* read-write */
|
||||
__le32 queue_avail_hi; /* read-write */
|
||||
__le32 queue_used_lo; /* read-write */
|
||||
__le32 queue_used_hi; /* read-write */
|
||||
};
|
||||
|
||||
/* Macro versions of offsets for the Old Timers! */
|
||||
#define VIRTIO_PCI_CAP_VNDR 0
|
||||
#define VIRTIO_PCI_CAP_NEXT 1
|
||||
#define VIRTIO_PCI_CAP_LEN 2
|
||||
#define VIRTIO_PCI_CAP_CFG_TYPE 3
|
||||
#define VIRTIO_PCI_CAP_BAR 4
|
||||
#define VIRTIO_PCI_CAP_OFFSET 8
|
||||
#define VIRTIO_PCI_CAP_LENGTH 12
|
||||
|
||||
#define VIRTIO_PCI_NOTIFY_CAP_MULT 16
|
||||
|
||||
#define VIRTIO_PCI_COMMON_DFSELECT 0
|
||||
#define VIRTIO_PCI_COMMON_DF 4
|
||||
#define VIRTIO_PCI_COMMON_GFSELECT 8
|
||||
#define VIRTIO_PCI_COMMON_GF 12
|
||||
#define VIRTIO_PCI_COMMON_MSIX 16
|
||||
#define VIRTIO_PCI_COMMON_NUMQ 18
|
||||
#define VIRTIO_PCI_COMMON_STATUS 20
|
||||
#define VIRTIO_PCI_COMMON_CFGGENERATION 21
|
||||
#define VIRTIO_PCI_COMMON_Q_SELECT 22
|
||||
#define VIRTIO_PCI_COMMON_Q_SIZE 24
|
||||
#define VIRTIO_PCI_COMMON_Q_MSIX 26
|
||||
#define VIRTIO_PCI_COMMON_Q_ENABLE 28
|
||||
#define VIRTIO_PCI_COMMON_Q_NOFF 30
|
||||
#define VIRTIO_PCI_COMMON_Q_DESCLO 32
|
||||
#define VIRTIO_PCI_COMMON_Q_DESCHI 36
|
||||
#define VIRTIO_PCI_COMMON_Q_AVAILLO 40
|
||||
#define VIRTIO_PCI_COMMON_Q_AVAILHI 44
|
||||
#define VIRTIO_PCI_COMMON_Q_USEDLO 48
|
||||
#define VIRTIO_PCI_COMMON_Q_USEDHI 52
|
||||
|
||||
#endif /* VIRTIO_PCI_NO_MODERN */
|
||||
|
||||
#endif
|
||||
|
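The "Macro versions of offsets" above are meant to agree with the field layout of struct virtio_pci_cap and struct virtio_pci_common_cfg. A minimal compile-time cross-check, written as a userspace sketch, is shown below. The `*_mirror` struct names and the `CHECK_OFF` macro are hypothetical, and fixed-width integers replace the `__le32`/`__le16`/`__u8` kernel types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct virtio_pci_cap. */
struct pci_cap_mirror {
	uint8_t cap_vndr;
	uint8_t cap_next;
	uint8_t cap_len;
	uint8_t cfg_type;
	uint8_t bar;
	uint8_t padding[3];
	uint32_t offset;
	uint32_t length;
};

/* Userspace mirror of struct virtio_pci_common_cfg. */
struct common_cfg_mirror {
	uint32_t device_feature_select;
	uint32_t device_feature;
	uint32_t guest_feature_select;
	uint32_t guest_feature;
	uint16_t msix_config;
	uint16_t num_queues;
	uint8_t device_status;
	uint8_t config_generation;
	uint16_t queue_select;
	uint16_t queue_size;
	uint16_t queue_msix_vector;
	uint16_t queue_enable;
	uint16_t queue_notify_off;
	uint32_t queue_desc_lo;
	uint32_t queue_desc_hi;
	uint32_t queue_avail_lo;
	uint32_t queue_avail_hi;
	uint32_t queue_used_lo;
	uint32_t queue_used_hi;
};

/* A few of the offset macros, copied from the hunk above. */
#define VIRTIO_PCI_CAP_BAR		4
#define VIRTIO_PCI_CAP_OFFSET		8
#define VIRTIO_PCI_NOTIFY_CAP_MULT	16
#define VIRTIO_PCI_COMMON_STATUS	20
#define VIRTIO_PCI_COMMON_Q_SELECT	22
#define VIRTIO_PCI_COMMON_Q_NOFF	30
#define VIRTIO_PCI_COMMON_Q_USEDHI	52

/* Hypothetical convenience macro: fail the build if an offset disagrees. */
#define CHECK_OFF(type, field, off) \
	static_assert(offsetof(type, field) == (off), #field " offset mismatch")

CHECK_OFF(struct pci_cap_mirror, bar, VIRTIO_PCI_CAP_BAR);
CHECK_OFF(struct pci_cap_mirror, offset, VIRTIO_PCI_CAP_OFFSET);
/* notify_off_multiplier directly follows the 16-byte capability header. */
static_assert(sizeof(struct pci_cap_mirror) == VIRTIO_PCI_NOTIFY_CAP_MULT,
	      "capability header size mismatch");
CHECK_OFF(struct common_cfg_mirror, device_status, VIRTIO_PCI_COMMON_STATUS);
CHECK_OFF(struct common_cfg_mirror, queue_select, VIRTIO_PCI_COMMON_Q_SELECT);
CHECK_OFF(struct common_cfg_mirror, queue_notify_off, VIRTIO_PCI_COMMON_Q_NOFF);
CHECK_OFF(struct common_cfg_mirror, queue_used_hi, VIRTIO_PCI_COMMON_Q_USEDHI);
```

Only a subset of the macros is checked here to keep the sketch short; the remaining ones could be added in the same way.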
@@ -9,6 +9,48 @@
|
||||
#include <linux/export.h>
|
||||
|
||||
#ifdef CONFIG_PCI
|
||||
/**
|
||||
* pci_iomap_range - create a virtual mapping cookie for a PCI BAR
|
||||
* @dev: PCI device that owns the BAR
|
||||
* @bar: BAR number
|
||||
* @offset: map memory at the given offset in BAR
|
||||
* @maxlen: max length of the memory to map
|
||||
*
|
||||
* Using this function you will get a __iomem address to your device BAR.
|
||||
* You can access it using ioread*() and iowrite*(). These functions hide
|
||||
* the details if this is a MMIO or PIO address space and will just do what
|
||||
* you expect from them in the correct way.
|
||||
*
|
||||
* @maxlen specifies the maximum length to map. If you want to get access to
|
||||
* the complete BAR from offset to the end, pass %0 here.
|
||||
* */
|
||||
void __iomem *pci_iomap_range(struct pci_dev *dev,
|
||||
int bar,
|
||||
unsigned long offset,
|
||||
unsigned long maxlen)
|
||||
{
|
||||
resource_size_t start = pci_resource_start(dev, bar);
|
||||
resource_size_t len = pci_resource_len(dev, bar);
|
||||
unsigned long flags = pci_resource_flags(dev, bar);
|
||||
|
||||
if (len <= offset || !start)
|
||||
return NULL;
|
||||
len -= offset;
|
||||
start += offset;
|
||||
if (maxlen && len > maxlen)
|
||||
len = maxlen;
|
||||
if (flags & IORESOURCE_IO)
|
||||
return __pci_ioport_map(dev, start, len);
|
||||
if (flags & IORESOURCE_MEM) {
|
||||
if (flags & IORESOURCE_CACHEABLE)
|
||||
return ioremap(start, len);
|
||||
return ioremap_nocache(start, len);
|
||||
}
|
||||
/* What? */
|
||||
return NULL;
|
||||
}
|
||||
EXPORT_SYMBOL(pci_iomap_range);
|
||||
|
||||
/**
|
||||
* pci_iomap - create a virtual mapping cookie for a PCI BAR
|
||||
* @dev: PCI device that owns the BAR
|
||||
@@ -25,24 +67,7 @@
|
||||
* */
|
||||
void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
|
||||
{
|
||||
resource_size_t start = pci_resource_start(dev, bar);
|
||||
resource_size_t len = pci_resource_len(dev, bar);
|
||||
unsigned long flags = pci_resource_flags(dev, bar);
|
||||
|
||||
if (!len || !start)
|
||||
return NULL;
|
||||
if (maxlen && len > maxlen)
|
||||
len = maxlen;
|
||||
if (flags & IORESOURCE_IO)
|
||||
return __pci_ioport_map(dev, start, len);
|
||||
if (flags & IORESOURCE_MEM) {
|
||||
if (flags & IORESOURCE_CACHEABLE)
|
||||
return ioremap(start, len);
|
||||
return ioremap_nocache(start, len);
|
||||
}
|
||||
/* What? */
|
||||
return NULL;
|
||||
return pci_iomap_range(dev, bar, 0, maxlen);
|
||||
}
|
||||
|
||||
EXPORT_SYMBOL(pci_iomap);
|
||||
#endif /* CONFIG_PCI */
|
||||
|
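The offset and length arithmetic in pci_iomap_range() can be modelled in userspace without any PCI access. The sketch below reproduces only the bounds checks and clamping that precede the actual mapping call. `struct bar_window` and `bar_window_clamp` are hypothetical names, and plain 64-bit integers stand in for `resource_size_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the region that would be handed to the mapper. */
struct bar_window {
	uint64_t start;
	uint64_t len;
};

/*
 * Model of the checks in pci_iomap_range(): reject an unset BAR or an
 * offset at or past its end, skip "offset" bytes, then clamp the remainder
 * to "maxlen" unless maxlen is 0 (meaning "map to the end of the BAR").
 * Returns false where the kernel function would return NULL early.
 */
static inline bool bar_window_clamp(uint64_t bar_start, uint64_t bar_len,
				    uint64_t offset, uint64_t maxlen,
				    struct bar_window *out)
{
	assert(out != NULL);

	if (bar_len <= offset || !bar_start)
		return false;
	bar_len -= offset;
	bar_start += offset;
	if (maxlen && bar_len > maxlen)
		bar_len = maxlen;

	out->start = bar_start;
	out->len = bar_len;
	return true;
}
```

Calling it with `offset == 0` gives the behaviour of the rewritten pci_iomap(), which now simply forwards to pci_iomap_range(dev, bar, 0, maxlen).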
@@ -524,6 +524,12 @@ static int p9_virtio_probe(struct virtio_device *vdev)
|
||||
int err;
|
||||
struct virtio_chan *chan;
|
||||
|
||||
if (!vdev->config->get) {
|
||||
dev_err(&vdev->dev, "%s failure: config access disabled\n",
|
||||
__func__);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
chan = kmalloc(sizeof(struct virtio_chan), GFP_KERNEL);
|
||||
if (!chan) {
|
||||
pr_err("Failed to allocate virtio 9P channel\n");
|
||||
|
@@ -1,7 +1,13 @@
|
||||
# This creates the demonstration utility "lguest" which runs a Linux guest.
|
||||
CFLAGS:=-m32 -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -U_FORTIFY_SOURCE
|
||||
CFLAGS:=-m32 -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -U_FORTIFY_SOURCE -Iinclude
|
||||
|
||||
all: lguest
|
||||
|
||||
include/linux/virtio_types.h: ../../include/uapi/linux/virtio_types.h
|
||||
mkdir -p include/linux 2>&1 || true
|
||||
ln -sf ../../../../include/uapi/linux/virtio_types.h $@
|
||||
|
||||
lguest: include/linux/virtio_types.h
|
||||
|
||||
clean:
|
||||
rm -f lguest
|
||||
|
File diff suppressed because it is too large