When devices are under a p2p bridge, upstream transactions get replaced by the
device id of the bridge as it owns the PCIE transaction. Hence its necessary
to setup translations on behalf of the bridge as well. Due to this limitation
all devices under a p2p share the same domain in a DMAR.
We just cache the type of device, if its a native PCIe device
or not for later use.
[akpm@linux-foundation.org: BUG_ON -> WARN_ON+recover]
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch supports the upcomming Intel IOMMU hardware a.k.a. Intel(R)
Virtualization Technology for Directed I/O Architecture and the hardware spec
for the same can be found here
http://www.intel.com/technology/virtualization/index.htm
FAQ! (questions from akpm, answers from ak)
> So... what's all this code for?
>
> I assume that the intent here is to speed things up under Xen, etc?
Yes in some cases, but not this code. That would be the Xen version of this
code that could potentially assign whole devices to guests. I expect this to
be only useful in some special cases though because most hardware is not
virtualizable and you typically want an own instance for each guest.
Ok at some point KVM might implement this too; i likely would use this code
for this.
> Do we
> have any benchmark results to help us to decide whether a merge would be
> justified?
The main advantage for doing it in the normal kernel is not performance, but
more safety. Broken devices won't be able to corrupt memory by doing random
DMA.
Unfortunately that doesn't work for graphics yet, for that need user space
interfaces for the X server are needed.
There are some potential performance benefits too:
- When you have a device that cannot address the complete address range an
IOMMU can remap its memory instead of bounce buffering. Remapping is likely
cheaper than copying.
- The IOMMU can merge sg lists into a single virtual block. This could
potentially speed up SG IO when the device is slow walking SG lists. [I
long ago benchmarked 5% on some block benchmark with an old MPT Fusion; but
it probably depends a lot on the HBA]
And you get better driver debugging because unexpected memory accesses from
the devices will cause a trappable event.
>
> Does it slow anything down?
It adds more overhead to each IO so yes.
This patch:
Add support for early detection and parsing of DMAR's (DMA Remapping) reported
to OS via ACPI tables.
DMA remapping(DMAR) devices support enables independent address translations
for Direct Memory Access(DMA) from Devices. These DMA remapping devices are
reported via ACPI tables and includes pci device scope covered by these DMA
remapping device.
For detailed info on the specification of "Intel(R) Virtualization Technology
for Directed I/O Architecture" please see
http://www.intel.com/technology/virtualization/index.htm
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With 64KB blocksize, a directory entry can have size 64KB which does not
fit into 16 bits we have for entry length. So we store 0xffff instead and
convert the value when read from / written to disk.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simplify the vfs_cap_data structure.
Also fix get_file_caps which was declaring
__le32 v1caps[XATTR_CAPS_SZ] on the stack, but
XATTR_CAPS_SZ is already * sizeof(__le32).
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Andrew Morgan <morgan@kernel.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a panic due to access NULL pointer of kmem_cache_node at discard_slab()
after memory online.
When memory online is called, kmem_cache_nodes are created for all SLUBs
for new node whose memory are available.
slab_mem_going_online_callback() is called to make kmem_cache_node() in
callback of memory online event. If it (or other callbacks) fails, then
slab_mem_offline_callback() is called for rollback.
In memory offline, slab_mem_going_offline_callback() is called to shrink
all slub cache, then slab_mem_offline_callback() is called later.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: locking fix]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current memory notifier has some defects yet. (Fortunately, nothing uses
it.) This patch is to fix and rearrange for them.
- Add information of start_pfn, nr_pages, and node id if node status is
changes from/to memoryless node for callback functions.
Callbacks can't do anything without those information.
- Add notification going-online status.
It is necessary for creating per node structure before the node's
pages are available.
- Move GOING_OFFLINE status notification after page isolation.
It is good place for return memory like cache for callback,
because returned page is not used again.
- Make CANCEL events for rollingback when error occurs.
- Delete MEM_MAPPING_INVALID notification. It will be not used.
- Fix compile error of (un)register_memory_notifier().
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Updates for version 2.07 of the boot protocol. This includes:
load_flags.KEEP_SEGMENTS- flag to request/inhibit segment reloads
hardware_subarch - what subarchitecture we're booting under
hardware_subarch_data - per-architecture data
The intention of these changes is to make booting a paravirtualized
kernel work via the normal Linux boot protocol.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the obsolete VIDIOC_G_MPEGCOMP and VIDIOC_S_MPEGCOMP ioctls from
the V4L2 API as per the removal schedule (October 2007).
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
struct video_device used to define a .hardware field. While
initialized on severl drivers, this field is never used inside V4L.
However, drivers using it need to include the old V4L1 header.
This seems to cause compilation troubles with some random configs.
Better just to remove it from all drivers.
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
- De-confuse the defines for the address-space-control-elements
and the segment/region table entries.
- Create out of line functions for page table allocation / freeing.
- Simplify get_shadow_xxx functions.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The current tlb flushing code for page table entries violates the
s390 architecture in a small detail. The relevant section from the
principles of operation (SA22-7832-02 page 3-47):
"A valid table entry must not be changed while it is attached
to any CPU and may be used for translation by that CPU except to
(1) invalidate the entry by using INVALIDATE PAGE TABLE ENTRY or
INVALIDATE DAT TABLE ENTRY, (2) alter bits 56-63 of a page-table
entry, or (3) make a change by means of a COMPARE AND SWAP AND
PURGE instruction that purges the TLB."
That means if one thread of a multithreaded applciation uses a vma
while another thread does an unmap on it, the page table entries of
that vma needs to get removed with IPTE, IDTE or CSP. In some strange
and rare situations a cpu could check-stop (die) because a entry has
been pushed out of the TLB that is still needed to complete a
(milli-coded) instruction. I've never seen it happen with the current
code on any of the supported machines, so right now this is a
theoretical problem. But I want to fix it nevertheless, to avoid
headaches in the futures.
To get this implemented correctly without changing common code the
primitives ptep_get_and_clear, ptep_get_and_clear_full and
ptep_set_wrprotect need to use the IPTE instruction to invalidate the
pte before the new pte value gets stored. If IPTE is always used for
the three primitives three important operations will have a performace
hit: fork, mprotect and exit_mmap. Time for some workarounds:
* 1: ptep_get_and_clear_full is used in unmap_vmas to remove page
tables entries in a batched tlb gather operation. If the mmu_gather
context passed to unmap_vmas has been started with full_mm_flush==1
or if only one cpu is online or if the only user of a mm_struct is the
current process then the fullmm indication in the mmu_gather context is
set to one. All TLBs for mm_struct are flushed by the tlb_gather_mmu
call. No new TLBs can be created while the unmap is in progress. In
this case ptep_get_and_clear_full clears the ptes with a simple store.
* 2: ptep_get_and_clear is used in change_protection to clear the
ptes from the page tables before they are reentered with the new
access flags. At the end of the update flush_tlb_range clears the
remaining TLBs. In general the ptep_get_and_clear has to issue IPTE
for each pte and flush_tlb_range is a nop. But if there is only one
user of the mm_struct then ptep_get_and_clear uses simple stores
to do the update and flush_tlb_range will flush the TLBs.
* 3: Similar to 2, ptep_set_wrprotect is used in copy_page_range
for a fork to make all ptes of a cow mapping read-only. At the end of
of copy_page_range dup_mmap will flush the TLBs with a call to
flush_tlb_mm. Check for mm->mm_users and if there is only one user
avoid using IPTE in ptep_set_wrprotect and let flush_tlb_mm clear the
TLBs.
Overall for single threaded programs the tlb flush code now performs
better, for multi threaded programs it is slightly worse. In particular
exit_mmap() now does a single IDTE for the mm and then just frees every
page cache reference and every page table page directly without a delay
over the mmu_gather structure.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add two new sysfs entries per cpu: idle_count and idle_time.
idle_count contains the number of times a cpu went into idle state.
idle_time contains the time a cpu spent in idle state in microseconds.
This can be used e.g. by powertop to tell how often idle state is
entered and left.
# cat /sys/devices/system/cpu/cpu0/idle_count
504
# cat /sys/devices/system/cpu/cpu0/idle_time
469734037 us
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Just hide it behind the #ifdef, because nobody wants
it now.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many places get the queue_mapping field from skb to pass it to the
netif_subqueue_stopped() which will be 0 in any case.
Make the helper that works with sk_buff
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the helper for getting the field, symmetrical to
the "set" one. Return 0 if CONFIG_NETDEVICES_MULTIQUEUE=n
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
By adding module aliases to inet_diag, tcp_diag and dccp_diag, we let
them load automatically as needed. This makes tools like "ss" run
faster.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The calculation in SKB_WITH_OVERHEAD is incorrect in that it can cause
an overflow across a page boundary which is what it's meant to prevent.
In particular, the header length (X) should not be lumped together with
skb_shared_info. The latter needs to be aligned properly while the header
has no choice but to sit in front of wherever the payload is.
Therefore the correct calculation is to take away the aligned size of
skb_shared_info, and then subtract the header length. The resulting
quantity L satisfies the following inequality:
SKB_DATA_ALIGN(L + X) + sizeof(struct skb_shared_info) <= PAGE_SIZE
This is the quantity used by alloc_skb to do the actual allocation.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for upcoming 5723 devices.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Assign the next free socket options level to be used by the Bluetooth
protocol and address family.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
With the Bluetooth 1.2 specification the Extended SCO feature for
better audio connections was introduced. So far the Bluetooth core
wasn't able to handle any eSCO connections correctly. This patch
adds simple eSCO support while keeping backward compatibility with
older devices.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In case the remote entity tries to negogiate retransmission or flow
control mode, reject it and fall back to basic mode.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Bluetooth 1.2 specification introduced a specific features mask
value to interoperate with newer versions of the specification. So far
this piece of information was never needed, but future extensions will
rely on it. This patch adds a generic way to retrieve this information
only once per connection setup.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
After the change to the L2CAP configuration parameter handling the
global conf_mtu variable is no longer needed and so remove it.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Bluetooth HCI commands are divided into logical OGF groups for
easier identification of their purposes. While this still makes sense
for the written specification, its makes the code only more complex
and harder to read. So instead of using separate OGF and OCF values
to identify the commands, use a common 16-bit opcode that combines
both values. As a side effect this also reduces the complexity of
OGF and OCF calculations during command header parsing.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Export the i8042_command() function which manages the mutual
exclusion with the help of the i8042_lock spinlock. This allows
to access i8042 safely from other parts of the kernel.
Signed-off-by: Márton Németh <nm127@freemail.hu>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Add common helper routines: mpc52xx_map_wdt() and mpc52xx_restart().
Signed-off-by: Marian Balakowicz <m8@semihalf.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Add helper routine mpc52xx_find_and_map_path(). Extract common code to
mpc52xx_map_node() and refactor mpc52xx_find_and_map().
Signed-off-by: Jan Wrobel <wrr@semihalf.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6:
Blackfin arch: update boards files
Blackfin arch: dma add some API and cleanup bf54x DMA definition
Blackfin arch: cleanup and promote the general purpose timers api to a core blackfin component
Blackfin arch: add a cheesy install target
Blackfin arch: add functions for converting between sclks and usecs
Blackfin arch: add assembly function for doing 64bit unsigned division
Blackfin arch: -mno-fdpic works
Blackfin arch: use "char bfin_board_name[]" rather than "char *bfin_board_name" per discussion on lkml as the former uses less storage
Blackfin arch: Fixing Bug: balance calls to get_task_mm with corresponding mmput calls
Blackfin serial driver Kconfig: depend on DMA not being enabled rather than a specific DMA size
Blackfin arch: Fix bug: missing CHIPID register field definition of BF54x
Blackfin arch: Fix up /proc/cpuinfo so it is like everyone else
Blackfin arch: Optimization - no need to make additional math here
Blackfin arch: force irq_flags into the .data section
Blackfin arch BF548 defconfig: enable watchdog by default
Blackfin arch: add new processor ADSP-BF52x arch/mach support
New kind of audit rule predicates: "object is visible in given subtree".
The part that can be sanely implemented, that is. Limitations:
* if you have hardlink from outside of tree, you'd better watch
it too (or just watch the object itself, obviously)
* if you mount something under a watched tree, tell audit
that new chunk should be added to watched subtrees
* if you umount something in a watched tree and it's still mounted
elsewhere, you will get matches on events happening there. New command
tells audit to recalculate the trees, trimming such sources of false
positives.
Note that it's _not_ about path - if something mounted in several places
(multiple mount, bindings, different namespaces, etc.), the match does
_not_ depend on which one we are using for access.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Get a snapshot of a subtree, creating private clones of vfsmounts
for all its components and release such snapshot resp.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'master' of hera.kernel.org:/pub/scm/linux/kernel/git/kyle/parisc-2.6: (29 commits)
[PARISC] fix uninitialized variable warning in asm/rtc.h
[PARISC] Port checkstack.pl to parisc
[PARISC] Make palo target work when $obj != $src
[PARISC] Zap unused variable warnings in pci.c
[PARISC] Fix tests in palo target
[PARISC] Fix palo target
[PARISC] Restore palo target
[PARISC] Attempt to clean up parisc/Makefile
[PARISC] Fix infinite loop in /proc/iomem
[PARISC] Quiet sysfs_create_link __must_check warnings in pdc_stable
[PARISC] Squelch pci_enable_device __must_check warning in superio
[PARISC] Kill off broken irqstack code
[PARISC] Remove hardcoded uses of PAGE_SIZE
[PARISC] Clean up pointless ASM_PAGE_SIZE_DIV use
[PARISC] Kill off the last vestiges of ASM_PAGE_SIZE
[PARISC] Kill off ASM_PAGE_SIZE use
[PARISC] Beautify parisc vmlinux.lds.S
[PARISC] Clean up a resource_size_t warning in sba_iommu
[PARISC] Kill incorrect cast warning in unwinder
[PARISC] Kill zone_to_nid printk warning
...
Fixed trivial conflict in include/asm-parisc/tlbflush.h manually
get_rtc_time, in the case that PDC returns that the battery is bad, returns
an unmodified rtc_time arg to the caller, which then uses uninitialized
values. Fix this by memset-ing the arg with zeroes, so it will at least be
cleared if we return failure.
Spotted by John David Anglin.
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (74 commits)
fix do_sys_open() prototype
sysfs: trivial: fix sysfs_create_file kerneldoc spelling mistake
Documentation: Fix typo in SubmitChecklist.
Typo: depricated -> deprecated
Add missing profile=kvm option to Documentation/kernel-parameters.txt
fix typo about TBI in e1000 comment
proc.txt: Add /proc/stat field
small documentation fixes
Fix compiler warning in smount example program from sharedsubtree.txt
docs/sysfs: add missing word to sysfs attribute explanation
documentation/ext3: grammar fixes
Documentation/java.txt: typo and grammar fixes
Documentation/filesystems/vfs.txt: typo fix
include/asm-*/system.h: remove unused set_rmb(), set_wmb() macros
trivial copy_data_pages() tidy up
Fix typo in arch/x86/kernel/tsc_32.c
file link fix for Pegasus USB net driver help
remove unused return within void return function
Typo fixes retrun -> return
x86 hpet.h: remove broken links
...