and it can cause bootup crashes on certain AMD machines.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJQYGDkAAoJEFjIrFwIi8fJrR0H/2iRt3nH3KeG6S2l2UaSvJuH
BqtUSFFtMxKwAc9WC8gx9lc6y9HFig1SThzWuWJulNpF50QnBp38+OuMzEespoUN
JLJtIp/jlYFZL5w2K7soXq7e0elbWTainPWvz5qJE7RifcnclAOGGrf/0LEVf/FQ
xCjn9MLDq5WzbkwuA7TPMDb9RSD3ZJThMShj82kziwpTaniJCpl4YY0lVYUfknXo
t2NTV6Ze6mkzU5QvxSA2ZJt89vsFXQNpsvUK0WzZfLmzugXqnqQHstaVti64ovZc
e64PW61PIRhaZMCDuieclMXWbXFjkp4AsEQJLv3K/kojG20xJ/X08nmolnl64vw=
=t4AN
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.6-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull a Xen fix from Konrad Rzeszutek Wilk:
"It is a bug-fix when we run the initial PV guest on a AMD K8 machine
and have CONFIG_AMD_NUMA enabled and detect the NUMA topology from the
Northbridge.
We end up in the situation where the initial domain gets too much
information and gets confused and crashes - the fix is to restrict the
domain to get the information - and we do it by just disabling NUMA on
the PV guest (the hypervisor is still able to do its proper NUMA
allocations of guests).
It is OK to disable the PV guest from accessing NUMA data as right now
we do not inject any NUMA node information to the PV guests. When we
do get to that point, then this patch will have to be reverted."
* Disable PV NUMA support as we do not do anything with it (yet) and it
can cause bootup crashes on certain AMD machines.
* tag 'stable/for-linus-3.6-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/boot: Disable NUMA for PV guests.
This patch is a follow-up for patch "filter: add XOR instruction for use
with X/K" that implements BPF x86 JIT parts for the BPF XOR operation.
Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
. Convert the trace builtins to use the growing evsel/evlist
tracepoint infrastructure, removing several open coded constructs
like switch like series of strcmp to dispatch events, etc.
Basically what had already been showcased in 'perf sched'.
. Add evsel constructor for tracepoints, that uses libtraceevent
just to parse the /format events file, use it in a new 'perf test'
to make sure the libtraceevent format parsing regressions can
be more readily caught.
. Some strange errors were happening in some builds, but not on the
next, reported by several people, problem was some parser related
files, generated during the build, didn't had proper make deps,
fix from Eric Sandeen.
. Fix some compiling errors on 32-bit, from Feng Tang.
. Don't use sscanf extension %as, not available on bionic, reimplementation
by Irina Tirdea.
. Fix bfd.h/libbfd detection with recent binutils, from Markus Trippelsdorf.
. Introduce struct and cache information about the environment where a
perf.data file was captured, from Namhyung Kim.
. Fix several error paths in libtraceevent, from Namhyung Kim.
Print event causing perf_event_open() to fail in 'perf record',
from Stephane Eranian.
. New 'kvm' analysis tool, from Xiao Guangrong.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJQYIGnAAoJENZQFvNTUqpApNIP/2yUe3kfZ6Sz5rVwVdxyDBQN
Y0sv6IwtT31G6WA8o8fvIKylMiTn0XMt5TG3e57ebw01lHH/xQAkQs7EmpmGclIT
qOE5a1JwD8eyTMhBL92dt9LUHEq103pW/oBS1d6TeQ9XyEh+HThv+wVT/yRKneu6
IHV7b/8fC++v80bpsWLr8S3+p9OXMclsjR7IfDCeJnicxyHb/yWfRVbQiBwGk5/v
IfKc0uat60+1qAy7GwBM/nVDJqzekrPI76krNP8tVdftxBjatpXS+VBKiYfo352g
nr3Gpg2MWEY2oRzkN6jCj5yQAVTxa2OrAYY/I8dSJWkTHXL7UbzNzatNpVpvS0we
0wFP5gEumyuYE1YwjsR14ICSOJdMaO0pBYO4YMnqXsXGiS0JEM+2o+2bL2Nl9EDz
LEGWQGWVC23Tu3P1SaHgdb2YnQLZ18AVBwBljESrLvCf4leC/RWoA3HvJ4NOhook
08ee24r57SLLe7z8W3fBPRsslpmoZnhnLHES/N/qf2y5u4Ig9IkafMS2ARVSwgPX
6u0uVIf48YQGAlv8p3cNsYa2PITB3R04Nm5GB2KvTxplwuRD1jYqPpR4BaKrXe+s
SBVIxgDygglY+husTYLBWMBYswWhu5ECVobdfxtrQbQPt+v9hxf5S8UhAbu3AVcV
IocrCPg3REvuxPwwPmY9
=FgCY
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
* Convert the trace builtins to use the growing evsel/evlist
tracepoint infrastructure, removing several open coded constructs
like switch like series of strcmp to dispatch events, etc.
Basically what had already been showcased in 'perf sched'.
* Add evsel constructor for tracepoints, that uses libtraceevent
just to parse the /format events file, use it in a new 'perf test'
to make sure the libtraceevent format parsing regressions can
be more readily caught.
* Some strange errors were happening in some builds, but not on the
next, reported by several people, problem was some parser related
files, generated during the build, didn't had proper make deps,
fix from Eric Sandeen.
* Fix some compiling errors on 32-bit, from Feng Tang.
* Don't use sscanf extension %as, not available on bionic, reimplementation
by Irina Tirdea.
* Fix bfd.h/libbfd detection with recent binutils, from Markus Trippelsdorf.
* Introduce struct and cache information about the environment where a
perf.data file was captured, from Namhyung Kim.
* Fix several error paths in libtraceevent, from Namhyung Kim.
Print event causing perf_event_open() to fail in 'perf record',
from Stephane Eranian.
* New 'kvm' analysis tool, from Xiao Guangrong.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Switch x86_64 to using sub-ns precise vsyscall
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
To help migrate archtectures over to the new update_vsyscall method,
redfine CONFIG_GENERIC_TIME_VSYSCALL as CONFIG_GENERIC_TIME_VSYSCALL_OLD
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Since users will need to include timekeeper_internal.h, move
update_vsyscall definitions to timekeeper_internal.h.
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
CLOCK_TICK_RATE is used to accurately caclulate exactly how
a tick will be at a given HZ.
This is useful, because while we'd expect NSEC_PER_SEC/HZ,
the underlying hardware will have some granularity limit,
so we won't be able to have exactly HZ ticks per second.
This slight error can cause timekeeping quality problems
when using the jiffies or other jiffies driven clocksources.
Thus we currently use compile time CLOCK_TICK_RATE value to
generate SHIFTED_HZ and NSEC_PER_JIFFIES, which we then use
to adjust the jiffies clocksource to correct this error.
Unfortunately though, since CLOCK_TICK_RATE is a compile
time value, and the jiffies clocksource is registered very
early during boot, there are a number of cases where there
are different possible hardware timers that have different
tick rates. This causes problems in cases like ARM where
there are numerous different types of hardware, each having
their own compile-time CLOCK_TICK_RATE, making it hard to
accurately support different hardware with a single kernel.
For the most part, this doesn't matter all that much, as not
too many systems actually utilize the jiffies or jiffies driven
clocksource. Usually there are other highres clocksources
who's granularity error is negligable.
Even so, we have some complicated calcualtions that we do
everywhere to handle these edge cases.
This patch removes the compile time SHIFTED_HZ value, and
introduces a register_refined_jiffies() function. This results
in the default jiffies clock as being assumed a perfect HZ
freq, and allows archtectures that care about jiffies accuracy
to call register_refined_jiffies() with the tick rate, specified
dynamically at boot.
This allows us, where necessary, to not have a compile time
CLOCK_TICK_RATE constant, simplifies the jiffies code, and
still provides a way to have an accurate jiffies clock.
NOTE: Since this patch does not add register_refinied_jiffies()
calls for every arch, it may cause time quality regressions
in some cases. Its likely these will not be noticable, but
if they are an issue, adding the following to the end of
setup_arch() should resolve the regression:
register_refinied_jiffies(CLOCK_TICK_RATE)
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In order to add xen EFI frambebuffer video support, it is required to add
xen-efi's new video type (XEN_VGATYPE_EFI_LFB) case and handle it in the
function xen_init_vga and set the video type to VIDEO_TYPE_EFI to enable
efi video mode.
The original patch from which this was broken out from:
http://marc.info/?i=4E099AA6020000780004A4C6@nat28.tlf.novell.com
Signed-off-by: Jan Beulich <JBeulich@novell.com>
Signed-off-by: Tang Liang <liang.tang@oracle.com>
[v2: The original author is Jan Beulich and Liang Tang ported it to upstream]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The xen c/s 25873 allows the hypervisor to retrieve the NUMLOCK flag.
With this patch, the Linux kernel can get the state according to the
data in the BIOS.
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The hypervisor is in charge of allocating the proper "NUMA" memory
and dealing with the CPU scheduler to keep them bound to the proper
NUMA node. The PV guests (and PVHVM) have no inkling of where they
run and do not need to know that right now. In the future we will
need to inject NUMA configuration data (if a guest spans two or more
NUMA nodes) so that the kernel can make the right choices. But those
patches are not yet present.
In the meantime, disable the NUMA capability in the PV guest, which
also fixes a bootup issue. Andre says:
"we see Dom0 crashes due to the kernel detecting the NUMA topology not
by ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).
This will detect the actual NUMA config of the physical machine, but
will crash about the mismatch with Dom0's virtual memory. Variation of
the theme: Dom0 sees what it's not supposed to see.
This happens with the said config option enabled and on a machine where
this scanning is still enabled (K8 and Fam10h, not Bulldozer class)
We have this dump then:
NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
Scanning NUMA topology in Northbridge 24
Number of physical nodes 4
Node 0 MemBase 0000000000000000 Limit 0000000040000000
Node 1 MemBase 0000000040000000 Limit 0000000138000000
Node 2 MemBase 0000000138000000 Limit 00000001f8000000
Node 3 MemBase 00000001f8000000 Limit 0000000238000000
Initmem setup node 0 0000000000000000-0000000040000000
NODE_DATA [000000003ffd9000 - 000000003fffffff]
Initmem setup node 1 0000000040000000-0000000138000000
NODE_DATA [0000000137fd9000 - 0000000137ffffff]
Initmem setup node 2 0000000138000000-00000001f8000000
NODE_DATA [00000001f095e000 - 00000001f0984fff]
Initmem setup node 3 00000001f8000000-0000000238000000
Cannot find 159744 bytes in node 3
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
RIP: e030:[<ffffffff81d220e6>] [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
.. snip..
[<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178
[<ffffffff81d23348>] sparse_init+0xe4/0x25a
[<ffffffff81d16840>] paging_init+0x13/0x22
[<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
[<ffffffff81683954>] ? printk+0x3c/0x3e
[<ffffffff81d01a38>] start_kernel+0xe5/0x468
[<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
[<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
[<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
"
so we just disable NUMA scanning by setting numa_off=1.
CC: stable@vger.kernel.org
Reported-and-Tested-by: Andre Przywara <andre.przywara@amd.com>
Acked-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Pull kbuild fixes from Michal Marek:
"There are two more kbuild fixes for 3.6.
One fixes a race between x86's archscripts target and the rule
(re)building scripts/basic/fixdep. The second is a fix for the
previous attempt at fixing make firmware_install with make 3.82.
This new solution should work with any version of GNU make"
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
x86/kbuild: archscripts depends on scripts_basic
firmware: fix directory creation rule matching with make 3.80
If arch/x86/kernel/cpuid.c is a module, a CPU might offline or online
between the for_each_online_cpu() loop and the call to
register_hotcpu_notifier in cpuid_init or the call to
unregister_hotcpu_notifier in cpuid_exit. The potential races can
lead to leaks/duplicates, attempts to destroy non-existant devices, or
random pointer dereferences.
For example, in cpuid_exit if:
for_each_online_cpu(cpu)
cpuid_device_destroy(cpu);
class_destroy(cpuid_class);
__unregister_chrdev(CPUID_MAJOR, 0, NR_CPUS, "cpu/cpuid");
<----- CPU onlines
unregister_hotcpu_notifier(&cpuid_class_cpu_notifier);
the hotcpu notifier will attempt to create a device for the
cpuid_class, which the module already destroyed.
This fix surrounds for_each_online_cpu and register_hotcpu_notifier or
unregister_hotcpu_notifier with get_online_cpus+put_online_cpus.
Tested on a VM.
Signed-off-by: Silas Boyd-Wickizer <sbw@mit.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
If arch/x86/kernel/msr.c is a module, a CPU might offline or online
between the for_each_online_cpu(i) loop and the call to
register_hotcpu_notifier in msr_init or the call to
unregister_hotcpu_notifier in msr_exit. The potential races can lead
to leaks/duplicates, attempts to destroy non-existant devices, or
random pointer dereferences.
For example, in msr_init if:
for_each_online_cpu(i) {
err = msr_device_create(i);
if (err != 0)
goto out_class;
}
<----- CPU offlines
register_hotcpu_notifier(&msr_class_cpu_notifier);
and the CPU never onlines before msr_exit, then the module will never
call msr_device_destroy for the associated CPU.
This fix surrounds for_each_online_cpu and register_hotcpu_notifier or
unregister_hotcpu_notifier with get_online_cpus+put_online_cpus.
Tested on a VM.
Signed-off-by: Silas Boyd-Wickizer <sbw@mit.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
If we reset a vcpu on INIT, we so far overwrote dr7 as provided by
KVM_SET_GUEST_DEBUG, and we also cleared switch_db_regs unconditionally.
Fix this by saving the dr7 used for guest debugging and calculating the
effective register value as well as switch_db_regs on any potential
change. This will change to focus of the set_guest_debug vendor op to
update_dp_bp_intercept.
Found while trying to stop on start_secondary.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
To emulate level triggered interrupts, add a resample option to
KVM_IRQFD. When specified, a new resamplefd is provided that notifies
the user when the irqchip has been resampled by the VM. This may, for
instance, indicate an EOI. Also in this mode, posting of an interrupt
through an irqfd only asserts the interrupt. On resampling, the
interrupt is automatically de-asserted prior to user notification.
This enables level triggered interrupts to be posted and re-enabled
from vfio with no userspace intervention.
All resampling irqfds can make use of a single irq source ID, so we
reserve a new one for this interface.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
* stable/late-swiotlb.v3.3:
xen/swiotlb: Fix compile warnings when using plain integer instead of NULL pointer.
xen/swiotlb: Remove functions not needed anymore.
xen/pcifront: Use Xen-SWIOTLB when initting if required.
xen/swiotlb: For early initialization, return zero on success.
xen/swiotlb: Use the swiotlb_late_init_with_tbl to init Xen-SWIOTLB late when PV PCI is used.
xen/swiotlb: Move the error strings to its own function.
xen/swiotlb: Move the nr_tbl determination in its own function.
swiotlb: add the late swiotlb initialization function with iotlb memory
xen/swiotlb: With more than 4GB on 64-bit, disable the native SWIOTLB.
xen/swiotlb: Simplify the logic.
Conflicts:
arch/x86/xen/pci-swiotlb-xen.c
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reason for merge:
x86/fpu changed the structure of some of the code that x86/smap
changes; mostly fpu-internal.h but also minor changes to the
signal code.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Resolved Conflicts:
arch/x86/ia32/ia32_signal.c
arch/x86/include/asm/fpu-internal.h
arch/x86/kernel/signal.c
Preemption is disabled between kernel_fpu_begin/end() and as such
it is not a good idea to use these routines in kvm_load/put_guest_fpu()
which can be very far apart.
kvm_load/put_guest_fpu() routines are already called with
preemption disabled and KVM already uses the preempt notifier to save
the guest fpu state using kvm_put_guest_fpu().
So introduce __kernel_fpu_begin/end() routines which don't touch
preemption and use them instead of kernel_fpu_begin/end()
for KVM's use model of saving/restoring guest FPU state.
Also with this change (and with eagerFPU model), fix the host cr0.TS vm-exit
state in the case of VMX. For eagerFPU case, host cr0.TS is always clear.
So no need to worry about it. For the traditional lazyFPU restore case,
change the cr0.TS bit for the host state during vm-exit to be always clear
and cr0.TS bit is set in the __vmx_load_host_state() when the FPU
(guest FPU or the host task's FPU) state is not active. This ensures
that the host/guest FPU state is properly saved, restored
during context-switch and with interrupts (using irq_fpu_usable()) not
stomping on the active FPU state.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1348164109.26695.338.camel@sbsiddha-desk.sc.intel.com
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Pull x86 fixes from Ingo Molnar:
"Small fixlets"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/init.c: Fix devmem_is_allowed() off by one
x86/kconfig: Remove outdated reference to Intel CPUs in CONFIG_SWIOTLB
The changes to entry_32.S got missed in checkin:
63bcff2a x86, smap: Add STAC and CLAC instructions to control user space access
The resulting kernel was largely functional but SMAP protection could
have been bypassed.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1348256595-29119-9-git-send-email-hpa@linux.intel.com
Use kzalloc() so the struct resource doesn't contain garbage in
fields we don't initialize.
[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: x86@kernel.org
Signal handling contains a bunch of accesses to individual user space
items, which causes an excessive number of STAC and CLAC
instructions. Instead, let get/put_user_try ... get/put_user_catch()
contain the STAC and CLAC instructions.
This means that get/put_user_try no longer nests, and furthermore that
it is no longer legal to use user space access functions other than
__get/put_user_ex() inside those blocks. However, these macros are
x86-specific anyway and are only used in the signal-handling paths; a
simple reordering of moving the larger subroutine calls out of the
try...catch blocks resolves that problem.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1348256595-29119-12-git-send-email-hpa@linux.intel.com
When Supervisor Mode Access Prevention (SMAP) is enabled, access to
userspace from the kernel is controlled by the AC flag. To make the
performance of manipulating that flag acceptable, there are two new
instructions, STAC and CLAC, to set and clear it.
This patch adds those instructions, via alternative(), when the SMAP
feature is enabled. It also adds X86_EFLAGS_AC unconditionally to the
SYSCALL entry mask; there is simply no reason to make that one
conditional.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1348256595-29119-9-git-send-email-hpa@linux.intel.com
The STAC/CLAC instructions are only available with SMAP, but on the
other hand they aren't needed if SMAP is not available, or before we
start to run userspace, so construct them as alternatives which start
out as noops and are enabled by the alternatives mechanism.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1348256595-29119-7-git-send-email-hpa@linux.intel.com
* Fix M2P batching re-using the incorrect structure field.
* Disable BIOS SMP MP table search.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJQXGfdAAoJEFjIrFwIi8fJbWcH/0FI2d/VyB+ZU0ng3R0Oa7mt
iR/x+Z+mfFdp2dXS6gs6DgJIZVA7i2K9pX4rOXjpDGGGyUeo1xoqjlQfsFWQGjZ/
p49RrDrM93c2GdRXk3iMSWfboQI7BXBs5rnyYZQL7kMxUSR75MxbeONvhPrMSO9I
3EBidWH08qjrn2HVF44F6xh5ONjpclo5AvGIzJ0eU4X0D0eqMnhvlAw8/UYJU2HV
heRvuxWF9l2jNpLhKhZy1730D1X/vKA5qKAcBW8rCOpEijyPpmtKbqapeUJg/9pH
NVquuwGutP5ozrSi7a/23+L+ezvQBmCPm5ZRG44PccBoZ/HVs8haT8UypSWSDzo=
=TwvM
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
- Fix M2P batching re-using the incorrect structure field.
In v3.5 we added batching for M2P override (Machine Frame Number ->
Physical Frame Number), but the original MFN was saved in an
incorrect structure - and we would oops/restore when restoring with
the old MFN.
- Disable BIOS SMP MP table search.
A bootup issue that we had ignored until we found that on DL380 G6 it
was needed.
* tag 'stable/for-linus-3.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/boot: Disable BIOS SMP MP table search.
xen/m2p: do not reuse kmap_op->dev_bus_addr
Exporting KVM exit information to userspace to be consumed by perf.
Signed-off-by: Dong Hao <haodong@linux.vnet.ibm.com>
[ Dong Hao <haodong@linux.vnet.ibm.com>: rebase it on acme's git tree ]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1347870675-31495-2-git-send-email-haodong@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While building the SUSE kernel packages, which build the scripts,
make clean, and then build everything, we have been running into spurious
build failures. We tracked them down to a simple dependency issue:
$ make mrproper
CLEAN arch/x86/tools
CLEAN scripts/basic
$ cp patches/config/x86_64/desktop .config
$ make archscripts
HOSTCC arch/x86/tools/relocs
/bin/sh: scripts/basic/fixdep: No such file or directory
make[3]: *** [arch/x86/tools/relocs] Error 1
make[2]: *** [archscripts] Error 2
make[1]: *** [sub-make] Error 2
make: *** [all] Error 2
This was introduced by commit
6520fe55 (x86, realmode: 16-bit real-mode code support for relocs),
which added the archscripts dependency to archprepare.
This patch adds the scripts_basic dependency to the x86 archscripts.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
we only use that to tell copy_thread() done by syscall from that
done by kernel_thread(). However, it's easier to do simply by
checking PF_KTHREAD in thread flags.
Merge sys_clone() guts for 32bit and 64bit, while we are at it...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
TIF_NOTIFY_RESUME will work in precisely the same way; all that
is achieved by TIF_IRET is appearing that there's some work to be
done, so we end up on the iret exit path. Just use NOTIFY_RESUME.
And for execve() do that in 32bit start_thread(), not sys_execve()
itself.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
I get this warning:
arch/x86/kernel/kprobes.c:544:23: warning: ‘skip_singlestep’ declared ‘static’ but never defined
on tip/auto-latest.
Put the skip_singlestep function declaration up, in
KPROBES_CAN_USE_FTRACE and drop the superfluous forward
declaration.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1348145034-16603-1-git-send-email-bp@amd64.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Most interrupt are delivered to only one vcpu. Use pre-build tables to
find interrupt destination instead of looping through all vcpus. In case
of logical mode loop only through vcpus in a logical cluster irq is sent
to.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
'ac' essentially reconstructs the 'access' variable we already
have, except for the PFERR_PRESENT_MASK and PFERR_RSVD_MASK. As
these are not used by callees, just use 'access' directly.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Keep track of accessed/dirty bits; if they are all set, do not
enter the accessed/dirty update loop.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
'eperm' is no longer used in the walker loop, so we can eliminate it.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of branchy code depending on level, gpte.ps, and mmu configuration,
prepare everything in a bitmap during mode changes and look it up during
runtime.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The page table walk is coded as an infinite loop, with a special
case on the last pte.
Code it as an ordinary loop with a termination condition on the last
pte (large page or walk length exhausted), and put the last pte handling
code after the loop where it belongs.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
walk_addr_generic() permission checks are a maze of branchy code, which is
performed four times per lookup. It depends on the type of access, efer.nxe,
cr0.wp, cr4.smep, and in the near future, cr4.smap.
Optimize this away by precalculating all variants and storing them in a
bitmap. The bitmap is recalculated when rarely-changing variables change
(cr0, cr4) and is indexed by the often-changing variables (page fault error
code, pte access permissions).
The permission check is moved to the end of the loop, otherwise an SMEP
fault could be reported as a false positive, when PDE.U=1 but PTE.U=0.
Noted by Xiao Guangrong.
The result is short, branch-free code.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
While unspecified, the behaviour of Intel processors is to first
perform the page table walk, then, if the walk was successful, to
atomically update the accessed and dirty bits of walked paging elements.
While we are not required to follow this exactly, doing so will allow us
to perform the access permissions check after the walk is complete, rather
than after each walk step.
(the tricky case is SMEP: a zero in any pte's U bit makes the referenced
page a supervisor page, so we can't fault on a one bit during the walk
itself).
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
We no longer rely on paging_tmpl.h defines; so we can move the function
to mmu.c.
Rely on zero extension to 64 bits to get the correct nx behaviour.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
If nx is disabled, then is gpte[63] is set we will hit a reserved
bit set fault before checking permissions; so we can ignore the
setting of efer.nxe.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
gpte_access() computes the access permissions of a guest pte and also
write-protects clean gptes. This is wrong when we are servicing a
write fault (since we'll be setting the dirty bit momentarily) but
correct when instantiating a speculative spte, or when servicing a
read fault (since we'll want to trap a following write in order to
set the dirty bit).
It doesn't seem to hurt in practice, but in order to make the code
readable, push the write protection out of gpte_access() and into
a new protect_clean_gpte() which is called explicitly when needed.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>