Some devices allow an individual function to be reset without affecting
other functions in the same device: that's what pci_reset_function does.
For devices that have this support, expose reset attribite in sysfs.
This is useful e.g. for virtualization, where a qemu userspace
process wants to reset the device when the guest is reset,
to emulate machine reboot as closely as possible.
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We cannot simply call acpi_get_pci_dev() on any random ACPI handle
and hope that it works, because a PCI root bridge may not have
an associated struct pci_dev.
This is allowed per the PCI specification, and is referred to as a
non-materialized bridge.
So, depending on the type of PCI bridge that the handle points to,
use the appropriate interface to return the struct pci_bus correctly.
Reviewed-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
yenta needs this for example.
Acked-by: Matthew Wilcox <willy@linux.intel.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This was #define'd as 0 on all platforms, so let's get rid of it.
This change makes pci_scan_slot() slightly easier to read.
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Luck <tony.luck@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Russell King <linux@arm.linux.org.uk>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Conflicts:
arch/x86/kernel/reboot.c
security/Kconfig
Merge reason: resolve the conflicts, bump up from rc3 to rc8.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move tboot.h from asm to linux to fix the build errors of intel_txt
patch on non-X86 platforms. Remove the tboot code from generic code
init/main.c and kernel/cpu.c.
Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This file needs to include linux/dmi.h directly rather than relying on
it being pulled in from elsewhere.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
DMAR faults are recorded into a ring of "fault recording registers".
fault_index is a 0-based index into the ring. The code allows the
0-based fault_index to be equal to the total number of fault registers
available from the cap_num_fault_regs() macro, which causes access
beyond the last available register.
Signed-off-by Troy Heber <troy.heber@hp.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
An SR-IOV capable device includes an SR-IOV PCIe capability which
describes the Virtual Function (VF) BAR requirements. A typical SR-IOV
device can support multiple VFs whose BARs must be in a contiguous region,
effectively an array of VF BARs. The BAR reports the size requirement
for a single VF. We calculate the full range needed by simply multiplying
the VF BAR size with the number of possible VFs and create a resource
spanning the full range.
This all seems sane enough except it artificially inflates the alignment
requirement for the VF BAR. The VF BAR need only be aligned to the size
of a single BAR not the contiguous range of VF BARs. This can cause us
to fail to allocate resources for the BAR despite the fact that we
actually have enough space.
This patch adds a thin PCI specific layer over the generic
resource_alignment() function which is aware of the special nature of
VF BARs and does sorting and allocation based on the smaller alignment
requirement.
I recognize that while resource_alignment is generic, it's basically a
PCI helper. An alternative to this patch is to add PCI VF BAR specific
information to struct resource. I opted for the extra layer rather than
adding such PCI specific information to struct resource. This does
have the slight downside that we don't cache the BAR size and re-read
for each alignment query (happens a small handful of times during boot
for each VF BAR).
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Yu Zhao <yu.zhao@intel.com>
Cc: stable@kernel.org
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
make it use the node from irq_desc.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4A95C392.5050903@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linux/ACPI core files using internal.h all PREFIX "ACPI: ",
however, not all ACPI drivers use/want it -- and they
should not have to #undef PREFIX to define their own.
Add GPL commment to internal.h while we are there.
This does not change any actual console output,
asside from a whitespace fix.
Signed-off-by: Len Brown <len.brown@intel.com>
Generic support for remapping HPET MSI's by parsing the HPET timer block
device scope in the ACPI DRHD tables. This is needed for platforms
supporting interrupt-remapping and MSI capable HPET timer block.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Jay Fenlason <fenlason@redhat.com>
LKML-Reference: <20090804190729.477649000@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Completed a major update for the acpi_get_object_info external interface.
Changes include:
- Support for variable, unlimited length HID, UID, and CID strings
- Support Processor objects the same as Devices (HID,UID,CID,ADR,STA, etc.)
- Call the _SxW power methods on behalf of a device object
- Determine if a device is a PCI root bridge
- Change the ACPI_BUFFER parameter to ACPI_DEVICE_INFO.
These changes will require an update to all callers of this interface.
See the ACPICA Programmer Reference for details.
Also, update all invocations of acpi_get_object_info interface
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The kcalloc() failure path in iommu_init_domains() calls
free_dmar_iommu(), which assumes that ->domains, ->domain_ids,
and ->lock have been properly initialized.
Add checks in free_[dmar]_iommu to not use ->domains,->domain_ids
if not alloced. Move the lock init to prior to the kcalloc()'s,
so it is valid in free_context_table() when free_dmar_iommu() invokes
it at the end.
Patch based on iommu-2.6,
commit 132032274a594ee9ffb6b9c9e2e9698149a09ea9
Signed-off-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Mark si_domain_init and iommu_prepare_static_identity_mapping with
__init, to eliminate the following warnings:
WARNING: drivers/pci/built-in.o(.text+0xf1f4): Section mismatch in reference from the function si_domain_init() to the function .init.text:si_domain_work_fn()
The function si_domain_init() references
the function __init si_domain_work_fn().
This is often because si_domain_init lacks a __init
annotation or the annotation of si_domain_work_fn is wrong.
WARNING: drivers/pci/built-in.o(.text+0xe340): Section mismatch in reference from the function iommu_prepare_static_identity_mapping() to the function .init.text:si_domain_init()
The function iommu_prepare_static_identity_mapping() references
the function __init si_domain_init().
This is often because iommu_prepare_static_identity_mapping lacks a __init
annotation or the annotation of si_domain_init is wrong.
Signed-off-by: Matt Kraai <kraai@ftbfs.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Without the check, the config space may be filled with zeros. Though
the driver should try to avoid call restoring before saving, but the
pci layer also should check this.
Also removes the existing check in pci_restore_standard_config, since
it's superfluous with the new check in restore_state.
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Alek Du <alek.du@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI hotplug: SGI hotplug: do not use hotplug_slot_attr
PCI hotplug: SGI hotplug: fix build failure
All callers of the former were also calling the latter, in one order or
the other, and failing to correctly clean up if the second returned
failure.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
By the pci slot changes, callbacks of attributes under slot directory
(/sys/bus/pci/slots) had been changed to get the pointer to struct
pci_slot instead of struct hotplug_slot. So the path_show() that
assumes the parameter is a pointer to struct hotplug_slot seems
broken.
Tested-by: Mike Habeck <habeck@sgi.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The commit bd3d99c17039fd05a29587db3f4a180c48da115a ("PCI: Remove
untested Electromechanical Interlock (EMI) support in pciehp."), which
removes the definition of "struct hotplug_slot_attr", broke SGI
hotplug driver. By this commit, we get the following compile error.
drivers/pci/hotplug/sgi_hotplug.c:106: error: variable 'sn_slot_path_attr' has initializer but incomplete type
drivers/pci/hotplug/sgi_hotplug.c:106: error: unknown field 'attr' specified in initializer
drivers/pci/hotplug/sgi_hotplug.c:106: error: extra brace group at end of initializer
drivers/pci/hotplug/sgi_hotplug.c:106: error: (near initialization for 'sn_slot_path_attr')
drivers/pci/hotplug/sgi_hotplug.c:106: warning: excess elements in struct initializer
drivers/pci/hotplug/sgi_hotplug.c:106: warning: (near initialization for 'sn_slot_path_attr')
drivers/pci/hotplug/sgi_hotplug.c:106: error: unknown field 'show' specified in initializer
drivers/pci/hotplug/sgi_hotplug.c:106: warning: excess elements in struct initializer
drivers/pci/hotplug/sgi_hotplug.c:106: warning: (near initialization for 'sn_slot_path_attr')
drivers/pci/hotplug/sgi_hotplug.c: In function 'sn_hp_destroy':
drivers/pci/hotplug/sgi_hotplug.c:203: error: invalid use of undefined type 'struct hotplug_slot_attribute'
drivers/pci/hotplug/sgi_hotplug.c: In function 'sn_hotplug_slot_register':
drivers/pci/hotplug/sgi_hotplug.c:655: error: invalid use of undefined type 'struct hotplug_slot_attribute'
This patch fixes this regression by adding the definition of struct
hotplug_slot_attr into sgi_hotplug.c.
Tested-by: Mike Habeck <habeck@sgi.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Two defects work together result in KVM device passthrough randomly can't
work:
1. iommu_snooping is not initialized to zero when vm_iommu_init() called.
So it is possible to get a random value.
2. One line added by commit 2c2e2c38("IOMMU Identity Mapping Support")
change the code path, let it bypass domain_update_iommu_cap(), as well as
missing the increment of domain iommu reference count.
The latter is also likely to cause a leak of domains on repeated VMM
assignment and deassignment.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The physical address passed to domain_pfn_mapping() should be rounded
down to the start of the MM page, not the VT-d page.
This issue causes kernel panic on PAGE_SIZE>VTD_PAGE_SIZE platforms e.g. ia64
platforms.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In domain_sg_mapping(), use aligned_nrpages() instead of hand-coded
rounding code for calculating the size of each sg elem. This means that
on IA64 we correctly round up to the MM page size, not just to the VT-d
page size.
Also remove the incorrect mm_to_dma_pfn() when intel_map_sg() calls
domain_sg_mapping() -- the 'size' variable is in VT-d pages already.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This makes the hardware passthrough mode work a lot more like the
software version, so that the behaviour of a kernel with 'iommu=pt'
is the same whether the hardware supports passthrough or not.
In particular:
- We use a single si_domain for the pass-through devices.
- 32-bit devices can be taken out of the pass-through domain so that
they don't have to use swiotlb.
- Devices will work again after being removed from a KVM guest.
- A potential oops on OOM (in init_context_pass_through()) is fixed.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Yet another reason why trusting this stuff to the BIOS was a bad idea.
The HP DC7900 BIOS reports an iommu at an address which just returns all
ones, when VT-d is disabled in the BIOS.
Fix up the missing iounmap in the error paths while we're at it.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This function has traditionally used "insert_resource()", because before
commit cebd78a8c5 ("Fix pci_claim_resource") it used to just insert the
resource into whatever root resource tree that was indicated by
"pcibios_select_root()".
So there Matthew fixed it to actually look up the proper parent
resource, which means that now it's actively wrong to then traverse the
resource tree any more: we already know exactly where the new resource
should go.
And when we then did commit a76117dfd6 ("x86: Use pci_claim_resource"),
which changed the x86 PCI code from the open-coded
pr = pci_find_parent_resource(dev, r);
if (!pr || request_resource(pr, r) < 0) {
to using
if (pci_claim_resource(dev, idx) < 0) {
that "insert_resource()" now suddenly became a problem, and causes a
regression covered by
http://bugzilla.kernel.org/show_bug.cgi?id=13891
which this fixes.
Reported-and-tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Andrew Patterson <andrew.patterson@hp.com>
Cc: Linux PCI <linux-pci@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
They are not supposed to be modified by drivers, so make them const.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
The tboot module will DMA protect all of memory in order to ensure the that
kernel will be able to initialize without compromise (from DMA). Consequently,
the kernel must enable Intel Virtualization Technology for Directed I/O
(VT-d or Intel IOMMU) in order to replace this broad protection with the
appropriate page-granular protection. Otherwise DMA devices will be unable
to read or write from memory and the kernel will eventually panic.
Because runtime IOMMU support is configurable by command line options, this
patch will force it to be enabled regardless of the options specified, and will
log a message if it was required to force it on.
dmar.c | 7 +++++++
intel-iommu.c | 17 +++++++++++++++--
2 files changed, 22 insertions(+), 2 deletions(-)
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: Shane Wang <shane.wang@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
g_iommus is freed after we "goto error;".
Found by smatch (http://repo.or.cz/w/smatch.git).
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We only ever obtain this lock immediately before the iova_rbtree_lock,
and release it immediately after the iova_rbtree_lock. So ditch it and
just use iova_rbtree_lock.
[v2: Remove the lockdep bits this time too]
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT
This will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After some API change, intel_iommu_unmap_range() introduced a assumption that
parameter size != 0, otherwise the dma_pte_clean_range() would have a
overflowed argument. But the user like KVM don't have this assumption before,
then some BUG() triggered.
Fix it by ignoring size = 0.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We did before, in the end -- but it was at the bottom of a long stack of
functions. Add an inline wrapper get_valid_domain_for_dev() which will
use the cached one _first_ and only make the out-of-line call if it's
not already set.
This takes the average time taken for a 1-page intel_map_sg() from 5961
cycles to 4812 cycles on my Lenovo x200s test box -- a modest 20%.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: Fix IRQ swizzling for ARI-enabled devices
ia64/PCI: adjust section annotation for pcibios_setup()
x86/PCI: get root CRS before scanning children
x86/PCI: fix boundary checking when using root CRS
PCI MSI: Fix restoration of MSI/MSI-X mask states in suspend/resume
PCI MSI: Unmask MSI if setup failed
PCI MSI: shorten PCI_MSIX_ENTRY_* symbol names
PCI: make pci_name() take const argument
PCI: More PATA quirks for not entering D3
PCI: fix kernel-doc warnings
PCI: check if bus has a proper bridge device before triggering SBR
PCI: remove pci_dac_dma_... APIs on mn10300
PCI ECRC: Remove unnecessary semicolons
PCI MSI: Return if alloc_msi_entry for MSI-X failed
Our current strategy for pass-through mode is to put all devices into
the 1:1 domain at startup (which is before we know what their dma_mask
will be), and only _later_ take them out of that domain, if it turns out
that they really can't address all of memory.
However, when there are a bunch of PCI devices behind a bridge, they all
end up with the same source-id on their DMA transactions, and hence in
the same IOMMU domain. This means that we _can't_ easily move them from
the 1:1 domain into their own domain at runtime, because there might be DMA
in-flight from their siblings.
So we have to adjust our pass-through strategy: For PCI devices not on
the root bus, and for the bridges which will take responsibility for
their transactions, we have to start up _out_ of the 1:1 domain, just in
case.
This fixes the BUG() we see when we have 32-bit-capable devices behind a
PCI-PCI bridge, and use the software identity mapping.
It does mean that we might end up using 'normal' mapping mode for some
devices which could actually live with the faster 1:1 mapping -- but
this is only for PCI devices behind bridges, which presumably aren't the
devices for which people are most concerned about performance.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
At boot time, the dma_mask won't have been set on any devices, so we
assume that all devices will be 64-bit capable (and thus get a 1:1 map).
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This should fix kernel.org bug #11821, where the dcdbas driver makes up
a platform device and then uses dma_alloc_coherent() on it, in an
attempt to get memory < 4GiB.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We need to give people a little more time to fix the broken drivers.
Re-introduce this, but tied in properly with the 'iommu=pt' support this
time. Change the config option name and make it default to 'no' too.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We do this twice, and it's about to get more complicated. This makes the
code slightly clearer about what it's doing, too.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
When we reattach a device to the si_domain (because it's been removed
from a VM), we weren't calling domain_context_mapping() to actually tell
the hardware about that.
We should really put the call to domain_context_mapping() into
domain_add_dev_info() -- we never call the latter without also doing the
former, and we can keep the error paths simple that way. But that's a
cleanup which can wait for 2.6.32 now.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We should check iommu_dummy() _first_, because that means it's attached
to an iommu that we've just disabled completely. At the moment, we might
try to put the device into the identity mapping domain.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The aligned_nrpages() function rounds up to the next VM page, but
returns its result as a number of DMA pages.
Purely theoretical except on IA64, which doesn't boot with VT-d right
now anyway.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* git://git.infradead.org/iommu-2.6: (38 commits)
intel-iommu: Don't keep freeing page zero in dma_pte_free_pagetable()
intel-iommu: Introduce first_pte_in_page() to simplify PTE-setting loops
intel-iommu: Use cmpxchg64_local() for setting PTEs
intel-iommu: Warn about unmatched unmap requests
intel-iommu: Kill superfluous mapping_lock
intel-iommu: Ensure that PTE writes are 64-bit atomic, even on i386
intel-iommu: Make iommu=pt work on i386 too
intel-iommu: Performance improvement for dma_pte_free_pagetable()
intel-iommu: Don't free too much in dma_pte_free_pagetable()
intel-iommu: dump mappings but don't die on pte already set
intel-iommu: Combine domain_pfn_mapping() and domain_sg_mapping()
intel-iommu: Introduce domain_sg_mapping() to speed up intel_map_sg()
intel-iommu: Simplify __intel_alloc_iova()
intel-iommu: Performance improvement for domain_pfn_mapping()
intel-iommu: Performance improvement for dma_pte_clear_range()
intel-iommu: Clean up iommu_domain_identity_map()
intel-iommu: Remove last use of PHYSICAL_PAGE_MASK, for reserving PCI BARs
intel-iommu: Make iommu_flush_iotlb_psi() take pfn as argument
intel-iommu: Change aligned_size() to aligned_nrpages()
intel-iommu: Clean up intel_map_sg(), remove domain_page_mapping()
...
Check dma_pte_present() and only free the page if there _is_ one.
Kind of surprising that there was no warning about this.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
On Wed, 2009-07-01 at 16:59 -0700, Linus Torvalds wrote:
> I also _really_ hate how you do
>
> (unsigned long)pte >> VTD_PAGE_SHIFT ==
> (unsigned long)first_pte >> VTD_PAGE_SHIFT
Kill this, in favour of just looking to see if the incremented pte
pointer has 'wrapped' onto the next page. Which means we have to check
it _after_ incrementing it, not before.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>