Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (146 commits)
  tools, perf: Documentation for the power events API
  perf: Add calls to suspend trace point
  perf script: Make some lists static
  perf script: Use the default lost event handler
  perf session: Warn about errors when processing pipe events too
  perf tools: Fix perf_event.h header usage
  perf test: Clarify some error reports in the open syscall test
  x86, NMI: Add touch_nmi_watchdog to io_check_error delay
  x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time
  x86: Only call smp_processor_id in non-preempt cases
  perf timechart: Adjust perf timechart to the new power events
  perf: Clean up power events by introducing new, more generic ones
  perf: Do not export power_frequency, but power_start event
  perf test: Add test for counting open syscalls
  perf evsel: Auto allocate resources needed for some methods
  perf evsel: Use {cpu,thread}_map to shorten list of parameters
  perf tools: Refactor all_tids to hold nr and the map
  perf tools: Refactor cpumap to hold nr and the map
  perf evsel: Introduce per cpu and per thread open helpers
  perf evsel: Steal the counter reading routines from stat
  ...
This commit is contained in:
Linus Torvalds 2011-01-06 10:17:26 -08:00
commit 28d9bfc37c
207 changed files with 4923 additions and 3453 deletions

View File

@ -2365,8 +2365,6 @@ E: acme@redhat.com
W: http://oops.ghostprotocols.net:81/blog/
P: 1024D/9224DF01 D5DF E3BB E3C8 BCBB F8AD 841A B6AB 4681 9224 DF01
D: IPX, LLC, DCCP, cyc2x, wl3501_cs, net/ hacks
S: R. Brasílio Itiberê, 4270/1010 - Água Verde
S: 80240-060 - Curitiba - Paraná
S: Brazil
N: Karsten Merker

View File

@ -1579,20 +1579,12 @@ and is between 256 and 4096 characters. It is defined in the file
nmi_watchdog= [KNL,BUGS=X86] Debugging features for SMP kernels
Format: [panic,][num]
Valid num: 0,1,2
Valid num: 0
0 - turn nmi_watchdog off
1 - use the IO-APIC timer for the NMI watchdog
2 - use the local APIC for the NMI watchdog using
a performance counter. Note: This will use one
performance counter and the local APIC's performance
vector.
When panic is specified, panic when an NMI watchdog
timeout occurs.
This is useful when you use a panic=... timeout and
need the box quickly up again.
Instead of 1 and 2 it is possible to use the following
symbolic names: lapic and ioapic
Example: nmi_watchdog=2 or nmi_watchdog=panic,lapic
netpoll.carrier_timeout=
[NET] Specifies amount of time (in seconds) that

View File

@ -0,0 +1,90 @@
Subsystem Trace Points: power
The power tracing system captures events related to power transitions
within the kernel. Broadly speaking there are three major subheadings:
o Power state switch which reports events related to suspend (S-states),
cpuidle (C-states) and cpufreq (P-states)
o System clock related changes
o Power domains related changes and transitions
This document describes what each of the tracepoints is and why they
might be useful.
Cf. include/trace/events/power.h for the events definitions.
1. Power state switch events
============================
1.1 New trace API
-----------------
A 'cpu' event class gathers the CPU-related events: cpuidle and
cpufreq.
cpu_idle "state=%lu cpu_id=%lu"
cpu_frequency "state=%lu cpu_id=%lu"
A suspend event is used to indicate the system going in and out of the
suspend mode:
machine_suspend "state=%lu"
Note: the value of '-1' or '4294967295' for state means an exit from the current state,
i.e. trace_cpu_idle(4, smp_processor_id()) means that the system
enters the idle state 4, while trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id())
means that the system exits the previous idle state.
The event which has 'state=4294967295' in the trace is very important to the user
space tools which are using it to detect the end of the current state, and so to
correctly draw the states diagrams and to calculate accurate statistics etc.
1.2 DEPRECATED trace API
------------------------
A new Kconfig option CONFIG_EVENT_POWER_TRACING_DEPRECATED with the default value of
'y' has been created. This allows the legacy trace power API to be used conjointly
with the new trace API.
The Kconfig option, the old trace API (in include/trace/events/power.h) and the
old trace points will disappear in a future release (namely 2.6.41).
power_start "type=%lu state=%lu cpu_id=%lu"
power_frequency "type=%lu state=%lu cpu_id=%lu"
power_end "cpu_id=%lu"
The 'type' parameter takes one of those macros:
. POWER_NONE = 0,
. POWER_CSTATE = 1, /* C-State */
. POWER_PSTATE = 2, /* Fequency change or DVFS */
The 'state' parameter is set depending on the type:
. Target C-state for type=POWER_CSTATE,
. Target frequency for type=POWER_PSTATE,
power_end is used to indicate the exit of a state, corresponding to the latest
power_start event.
2. Clocks events
================
The clock events are used for clock enable/disable and for
clock rate change.
clock_enable "%s state=%lu cpu_id=%lu"
clock_disable "%s state=%lu cpu_id=%lu"
clock_set_rate "%s state=%lu cpu_id=%lu"
The first parameter gives the clock name (e.g. "gpio1_iclk").
The second parameter is '1' for enable, '0' for disable, the target
clock rate for set_rate.
3. Power domains events
=======================
The power domain events are used for power domains transitions
power_domain_target "%s state=%lu cpu_id=%lu"
The first parameter gives the power domain name (e.g. "mpu_pwrdm").
The second parameter is the power domain target state.

View File

@ -4627,7 +4627,7 @@ PERFORMANCE EVENTS SUBSYSTEM
M: Peter Zijlstra <a.p.zijlstra@chello.nl>
M: Paul Mackerras <paulus@samba.org>
M: Ingo Molnar <mingo@elte.hu>
M: Arnaldo Carvalho de Melo <acme@redhat.com>
M: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
S: Supported
F: kernel/perf_event*.c
F: include/linux/perf_event.h

View File

@ -1,10 +1,4 @@
#ifndef __ASM_ALPHA_PERF_EVENT_H
#define __ASM_ALPHA_PERF_EVENT_H
#ifdef CONFIG_PERF_EVENTS
extern void init_hw_perf_events(void);
#else
static inline void init_hw_perf_events(void) { }
#endif
#endif /* __ASM_ALPHA_PERF_EVENT_H */

View File

@ -112,8 +112,6 @@ init_IRQ(void)
wrent(entInt, 0);
alpha_mv.init_irq();
init_hw_perf_events();
}
/*

View File

@ -14,6 +14,7 @@
#include <linux/kernel.h>
#include <linux/kdebug.h>
#include <linux/mutex.h>
#include <linux/init.h>
#include <asm/hwrpb.h>
#include <asm/atomic.h>
@ -863,13 +864,13 @@ static void alpha_perf_event_irq_handler(unsigned long la_ptr,
/*
* Init call to initialise performance events at kernel startup.
*/
void __init init_hw_perf_events(void)
int __init init_hw_perf_events(void)
{
pr_info("Performance events: ");
if (!supported_cpu()) {
pr_cont("No support for your CPU.\n");
return;
return 0;
}
pr_cont("Supported CPU type!\n");
@ -881,6 +882,8 @@ void __init init_hw_perf_events(void)
/* And set up PMU specification */
alpha_pmu = &ev67_pmu;
perf_pmu_register(&pmu);
}
perf_pmu_register(&pmu, "cpu", PERF_TYPE_RAW);
return 0;
}
early_initcall(init_hw_perf_events);

View File

@ -3034,11 +3034,11 @@ init_hw_perf_events(void)
pr_info("no hardware support available\n");
}
perf_pmu_register(&pmu);
perf_pmu_register(&pmu, "cpu", PERF_TYPE_RAW);
return 0;
}
arch_initcall(init_hw_perf_events);
early_initcall(init_hw_perf_events);
/*
* Callchain handling code.

View File

@ -1047,6 +1047,6 @@ init_hw_perf_events(void)
return 0;
}
arch_initcall(init_hw_perf_events);
early_initcall(init_hw_perf_events);
#endif /* defined(CONFIG_CPU_MIPS32)... */

View File

@ -126,4 +126,4 @@ static int init_e500_pmu(void)
return register_fsl_emb_pmu(&e500_pmu);
}
arch_initcall(init_e500_pmu);
early_initcall(init_e500_pmu);

View File

@ -414,4 +414,4 @@ static int init_mpc7450_pmu(void)
return register_power_pmu(&mpc7450_pmu);
}
arch_initcall(init_mpc7450_pmu);
early_initcall(init_mpc7450_pmu);

View File

@ -1379,7 +1379,7 @@ int register_power_pmu(struct power_pmu *pmu)
freeze_events_kernel = MMCR0_FCHV;
#endif /* CONFIG_PPC64 */
perf_pmu_register(&power_pmu);
perf_pmu_register(&power_pmu, "cpu", PERF_TYPE_RAW);
perf_cpu_notifier(power_pmu_notifier);
return 0;

View File

@ -681,7 +681,7 @@ int register_fsl_emb_pmu(struct fsl_emb_pmu *pmu)
pr_info("%s performance monitor hardware support registered\n",
pmu->name);
perf_pmu_register(&fsl_emb_pmu);
perf_pmu_register(&fsl_emb_pmu, "cpu", PERF_TYPE_RAW);
return 0;
}

View File

@ -613,4 +613,4 @@ static int init_power4_pmu(void)
return register_power_pmu(&power4_pmu);
}
arch_initcall(init_power4_pmu);
early_initcall(init_power4_pmu);

View File

@ -682,4 +682,4 @@ static int init_power5p_pmu(void)
return register_power_pmu(&power5p_pmu);
}
arch_initcall(init_power5p_pmu);
early_initcall(init_power5p_pmu);

View File

@ -621,4 +621,4 @@ static int init_power5_pmu(void)
return register_power_pmu(&power5_pmu);
}
arch_initcall(init_power5_pmu);
early_initcall(init_power5_pmu);

View File

@ -544,4 +544,4 @@ static int init_power6_pmu(void)
return register_power_pmu(&power6_pmu);
}
arch_initcall(init_power6_pmu);
early_initcall(init_power6_pmu);

View File

@ -369,4 +369,4 @@ static int init_power7_pmu(void)
return register_power_pmu(&power7_pmu);
}
arch_initcall(init_power7_pmu);
early_initcall(init_power7_pmu);

View File

@ -494,4 +494,4 @@ static int init_ppc970_pmu(void)
return register_power_pmu(&ppc970_pmu);
}
arch_initcall(init_ppc970_pmu);
early_initcall(init_ppc970_pmu);

View File

@ -250,4 +250,4 @@ static int __init sh7750_pmu_init(void)
return register_sh_pmu(&sh7750_pmu);
}
arch_initcall(sh7750_pmu_init);
early_initcall(sh7750_pmu_init);

View File

@ -284,4 +284,4 @@ static int __init sh4a_pmu_init(void)
return register_sh_pmu(&sh4a_pmu);
}
arch_initcall(sh4a_pmu_init);
early_initcall(sh4a_pmu_init);

View File

@ -389,7 +389,7 @@ int __cpuinit register_sh_pmu(struct sh_pmu *_pmu)
WARN_ON(_pmu->num_events > MAX_HWEVENTS);
perf_pmu_register(&pmu);
perf_pmu_register(&pmu, "cpu", PERF_TYPE_RAW);
perf_cpu_notifier(sh_pmu_notifier);
return 0;
}

View File

@ -4,8 +4,6 @@
#ifdef CONFIG_PERF_EVENTS
#include <asm/ptrace.h>
extern void init_hw_perf_events(void);
#define perf_arch_fetch_caller_regs(regs, ip) \
do { \
unsigned long _pstate, _asi, _pil, _i7, _fp; \
@ -26,8 +24,6 @@ do { \
(regs)->u_regs[UREG_I6] = _fp; \
(regs)->u_regs[UREG_I7] = _i7; \
} while (0)
#else
static inline void init_hw_perf_events(void) { }
#endif
#endif

View File

@ -270,8 +270,6 @@ int __init nmi_init(void)
atomic_set(&nmi_active, -1);
}
}
if (!err)
init_hw_perf_events();
return err;
}

View File

@ -1307,20 +1307,23 @@ static bool __init supported_pmu(void)
return false;
}
void __init init_hw_perf_events(void)
int __init init_hw_perf_events(void)
{
pr_info("Performance events: ");
if (!supported_pmu()) {
pr_cont("No support for PMU type '%s'\n", sparc_pmu_type);
return;
return 0;
}
pr_cont("Supported PMU type is '%s'\n", sparc_pmu_type);
perf_pmu_register(&pmu);
perf_pmu_register(&pmu, "cpu", PERF_TYPE_RAW);
register_die_notifier(&perf_event_nmi_notifier);
return 0;
}
early_initcall(init_hw_perf_events);
void perf_callchain_kernel(struct perf_callchain_entry *entry,
struct pt_regs *regs)

View File

@ -180,8 +180,15 @@ extern void *text_poke_early(void *addr, const void *opcode, size_t len);
* On the local CPU you need to be protected again NMI or MCE handlers seeing an
* inconsistent instruction while you patch.
*/
struct text_poke_param {
void *addr;
const void *opcode;
size_t len;
};
extern void *text_poke(void *addr, const void *opcode, size_t len);
extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
extern void text_poke_smp_batch(struct text_poke_param *params, int n);
#if defined(CONFIG_DYNAMIC_FTRACE) || defined(HAVE_JUMP_LABEL)
#define IDEAL_NOP_SIZE_5 5

View File

@ -15,10 +15,6 @@ static inline int irq_canonicalize(int irq)
return ((irq == 2) ? 9 : irq);
}
#ifdef CONFIG_X86_LOCAL_APIC
# define ARCH_HAS_NMI_WATCHDOG
#endif
#ifdef CONFIG_X86_32
extern void irq_ctx_init(int cpu);
#else

View File

@ -28,7 +28,7 @@ extern void die(const char *, struct pt_regs *,long);
extern int __must_check __die(const char *, struct pt_regs *, long);
extern void show_registers(struct pt_regs *regs);
extern void show_trace(struct task_struct *t, struct pt_regs *regs,
unsigned long *sp, unsigned long bp);
unsigned long *sp);
extern void __show_regs(struct pt_regs *regs, int all);
extern void show_regs(struct pt_regs *regs);
extern unsigned long oops_begin(void);

View File

@ -123,6 +123,10 @@
#define MSR_AMD64_IBSCTL 0xc001103a
#define MSR_AMD64_IBSBRTARGET 0xc001103b
/* Fam 15h MSRs */
#define MSR_F15H_PERF_CTL 0xc0010200
#define MSR_F15H_PERF_CTR 0xc0010201
/* Fam 10h MSRs */
#define MSR_FAM10H_MMIO_CONF_BASE 0xc0010058
#define FAM10H_MMIO_CONF_ENABLE (1<<0)

View File

@ -5,41 +5,15 @@
#include <asm/irq.h>
#include <asm/io.h>
#ifdef ARCH_HAS_NMI_WATCHDOG
/**
* do_nmi_callback
*
* Check to see if a callback exists and execute it. Return 1
* if the handler exists and was handled successfully.
*/
int do_nmi_callback(struct pt_regs *regs, int cpu);
#ifdef CONFIG_X86_LOCAL_APIC
extern void die_nmi(char *str, struct pt_regs *regs, int do_panic);
extern int check_nmi_watchdog(void);
#if !defined(CONFIG_LOCKUP_DETECTOR)
extern int nmi_watchdog_enabled;
#endif
extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
extern int reserve_perfctr_nmi(unsigned int);
extern void release_perfctr_nmi(unsigned int);
extern int reserve_evntsel_nmi(unsigned int);
extern void release_evntsel_nmi(unsigned int);
extern void setup_apic_nmi_watchdog(void *);
extern void stop_apic_nmi_watchdog(void *);
extern void disable_timer_nmi_watchdog(void);
extern void enable_timer_nmi_watchdog(void);
extern int nmi_watchdog_tick(struct pt_regs *regs, unsigned reason);
extern void cpu_nmi_set_wd_enabled(void);
extern atomic_t nmi_active;
extern unsigned int nmi_watchdog;
#define NMI_NONE 0
#define NMI_IO_APIC 1
#define NMI_LOCAL_APIC 2
#define NMI_INVALID 3
struct ctl_table;
extern int proc_nmi_enabled(struct ctl_table *, int ,
void __user *, size_t *, loff_t *);
@ -47,33 +21,8 @@ extern int unknown_nmi_panic;
void arch_trigger_all_cpu_backtrace(void);
#define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
static inline void localise_nmi_watchdog(void)
{
if (nmi_watchdog == NMI_IO_APIC)
nmi_watchdog = NMI_LOCAL_APIC;
}
/* check if nmi_watchdog is active (ie was specified at boot) */
static inline int nmi_watchdog_active(void)
{
/*
* actually it should be:
* return (nmi_watchdog == NMI_LOCAL_APIC ||
* nmi_watchdog == NMI_IO_APIC)
* but since they are power of two we could use a
* cheaper way --cvg
*/
return nmi_watchdog & (NMI_LOCAL_APIC | NMI_IO_APIC);
}
#endif
void lapic_watchdog_stop(void);
int lapic_watchdog_init(unsigned nmi_hz);
int lapic_wd_event(unsigned nmi_hz);
unsigned lapic_adjust_nmi_hz(unsigned hz);
void disable_lapic_nmi_watchdog(void);
void enable_lapic_nmi_watchdog(void);
void stop_nmi(void);
void restart_nmi(void);

View File

@ -125,7 +125,6 @@ union cpuid10_edx {
#define IBS_OP_MAX_CNT_EXT 0x007FFFFFULL /* not a register bit mask */
#ifdef CONFIG_PERF_EVENTS
extern void init_hw_perf_events(void);
extern void perf_events_lapic_init(void);
#define PERF_EVENT_INDEX_OFFSET 0
@ -156,7 +155,6 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
}
#else
static inline void init_hw_perf_events(void) { }
static inline void perf_events_lapic_init(void) { }
#endif

View File

@ -744,14 +744,6 @@ enum P4_ESCR_EMASKS {
};
/*
* P4 PEBS specifics (Replay Event only)
*
* Format (bits):
* 0-6: metric from P4_PEBS_METRIC enum
* 7 : reserved
* 8 : reserved
* 9-11 : reserved
*
* Note we have UOP and PEBS bits reserved for now
* just in case if we will need them once
*/
@ -788,5 +780,60 @@ enum P4_PEBS_METRIC {
P4_PEBS_METRIC__max
};
/*
* Notes on internal configuration of ESCR+CCCR tuples
*
* Since P4 has quite the different architecture of
* performance registers in compare with "architectural"
* once and we have on 64 bits to keep configuration
* of performance event, the following trick is used.
*
* 1) Since both ESCR and CCCR registers have only low
* 32 bits valuable, we pack them into a single 64 bit
* configuration. Low 32 bits of such config correspond
* to low 32 bits of CCCR register and high 32 bits
* correspond to low 32 bits of ESCR register.
*
* 2) The meaning of every bit of such config field can
* be found in Intel SDM but it should be noted that
* we "borrow" some reserved bits for own usage and
* clean them or set to a proper value when we do
* a real write to hardware registers.
*
* 3) The format of bits of config is the following
* and should be either 0 or set to some predefined
* values:
*
* Low 32 bits
* -----------
* 0-6: P4_PEBS_METRIC enum
* 7-11: reserved
* 12: reserved (Enable)
* 13-15: reserved (ESCR select)
* 16-17: Active Thread
* 18: Compare
* 19: Complement
* 20-23: Threshold
* 24: Edge
* 25: reserved (FORCE_OVF)
* 26: reserved (OVF_PMI_T0)
* 27: reserved (OVF_PMI_T1)
* 28-29: reserved
* 30: reserved (Cascade)
* 31: reserved (OVF)
*
* High 32 bits
* ------------
* 0: reserved (T1_USR)
* 1: reserved (T1_OS)
* 2: reserved (T0_USR)
* 3: reserved (T0_OS)
* 4: Tag Enable
* 5-8: Tag Value
* 9-24: Event Mask (may use P4_ESCR_EMASK_BIT helper)
* 25-30: enum P4_EVENTS
* 31: reserved (HT thread)
*/
#endif /* PERF_EVENT_P4_H */

View File

@ -48,7 +48,6 @@ static inline void __init smpboot_setup_io_apic(void)
setup_IO_APIC();
else {
nr_ioapics = 0;
localise_nmi_watchdog();
}
#endif
}

View File

@ -7,6 +7,7 @@
#define _ASM_X86_STACKTRACE_H
#include <linux/uaccess.h>
#include <linux/ptrace.h>
extern int kstack_depth_to_print;
@ -46,7 +47,7 @@ struct stacktrace_ops {
};
void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
unsigned long *stack, unsigned long bp,
unsigned long *stack,
const struct stacktrace_ops *ops, void *data);
#ifdef CONFIG_X86_32
@ -57,13 +58,39 @@ void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
#define get_bp(bp) asm("movq %%rbp, %0" : "=r" (bp) :)
#endif
#ifdef CONFIG_FRAME_POINTER
static inline unsigned long
stack_frame(struct task_struct *task, struct pt_regs *regs)
{
unsigned long bp;
if (regs)
return regs->bp;
if (task == current) {
/* Grab bp right from our regs */
get_bp(bp);
return bp;
}
/* bp is the last reg pushed by switch_to */
return *(unsigned long *)task->thread.sp;
}
#else
static inline unsigned long
stack_frame(struct task_struct *task, struct pt_regs *regs)
{
return 0;
}
#endif
extern void
show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
unsigned long *stack, unsigned long bp, char *log_lvl);
unsigned long *stack, char *log_lvl);
extern void
show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
unsigned long *sp, unsigned long bp, char *log_lvl);
unsigned long *sp, char *log_lvl);
extern unsigned int code_bytes;

View File

@ -10,12 +10,6 @@
unsigned long long native_sched_clock(void);
extern int recalibrate_cpu_khz(void);
#if defined(CONFIG_X86_32) && defined(CONFIG_X86_IO_APIC)
extern int timer_ack;
#else
# define timer_ack (0)
#endif
extern int no_timer_check;
/* Accelerators for sched_clock()

View File

@ -591,17 +591,21 @@ static atomic_t stop_machine_first;
static int wrote_text;
struct text_poke_params {
void *addr;
const void *opcode;
size_t len;
struct text_poke_param *params;
int nparams;
};
static int __kprobes stop_machine_text_poke(void *data)
{
struct text_poke_params *tpp = data;
struct text_poke_param *p;
int i;
if (atomic_dec_and_test(&stop_machine_first)) {
text_poke(tpp->addr, tpp->opcode, tpp->len);
for (i = 0; i < tpp->nparams; i++) {
p = &tpp->params[i];
text_poke(p->addr, p->opcode, p->len);
}
smp_wmb(); /* Make sure other cpus see that this has run */
wrote_text = 1;
} else {
@ -610,8 +614,12 @@ static int __kprobes stop_machine_text_poke(void *data)
smp_mb(); /* Load wrote_text before following execution */
}
flush_icache_range((unsigned long)tpp->addr,
(unsigned long)tpp->addr + tpp->len);
for (i = 0; i < tpp->nparams; i++) {
p = &tpp->params[i];
flush_icache_range((unsigned long)p->addr,
(unsigned long)p->addr + p->len);
}
return 0;
}
@ -631,10 +639,13 @@ static int __kprobes stop_machine_text_poke(void *data)
void *__kprobes text_poke_smp(void *addr, const void *opcode, size_t len)
{
struct text_poke_params tpp;
struct text_poke_param p;
tpp.addr = addr;
tpp.opcode = opcode;
tpp.len = len;
p.addr = addr;
p.opcode = opcode;
p.len = len;
tpp.params = &p;
tpp.nparams = 1;
atomic_set(&stop_machine_first, 1);
wrote_text = 0;
/* Use __stop_machine() because the caller already got online_cpus. */
@ -642,6 +653,26 @@ void *__kprobes text_poke_smp(void *addr, const void *opcode, size_t len)
return addr;
}
/**
* text_poke_smp_batch - Update instructions on a live kernel on SMP
* @params: an array of text_poke parameters
* @n: the number of elements in params.
*
* Modify multi-byte instruction by using stop_machine() on SMP. Since the
* stop_machine() is heavy task, it is better to aggregate text_poke requests
* and do it once if possible.
*
* Note: Must be called under get_online_cpus() and text_mutex.
*/
void __kprobes text_poke_smp_batch(struct text_poke_param *params, int n)
{
struct text_poke_params tpp = {.params = params, .nparams = n};
atomic_set(&stop_machine_first, 1);
wrote_text = 0;
stop_machine(stop_machine_text_poke, (void *)&tpp, NULL);
}
#if defined(CONFIG_DYNAMIC_FTRACE) || defined(HAVE_JUMP_LABEL)
#ifdef CONFIG_X86_64

View File

@ -3,10 +3,7 @@
#
obj-$(CONFIG_X86_LOCAL_APIC) += apic.o apic_noop.o probe_$(BITS).o ipi.o
ifneq ($(CONFIG_HARDLOCKUP_DETECTOR),y)
obj-$(CONFIG_X86_LOCAL_APIC) += nmi.o
endif
obj-$(CONFIG_HARDLOCKUP_DETECTOR) += hw_nmi.o
obj-y += hw_nmi.o
obj-$(CONFIG_X86_IO_APIC) += io_apic.o
obj-$(CONFIG_SMP) += ipi.o

View File

@ -31,7 +31,6 @@
#include <linux/init.h>
#include <linux/cpu.h>
#include <linux/dmi.h>
#include <linux/nmi.h>
#include <linux/smp.h>
#include <linux/mm.h>
@ -799,11 +798,7 @@ void __init setup_boot_APIC_clock(void)
* PIT/HPET going. Otherwise register lapic as a dummy
* device.
*/
if (nmi_watchdog != NMI_IO_APIC)
lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
else
pr_warning("APIC timer registered as dummy,"
" due to nmi_watchdog=%d!\n", nmi_watchdog);
lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
/* Setup the lapic or request the broadcast */
setup_APIC_timer();
@ -1387,7 +1382,6 @@ void __cpuinit end_local_APIC_setup(void)
}
#endif
setup_apic_nmi_watchdog(NULL);
apic_pm_activate();
/*
@ -1758,17 +1752,10 @@ int __init APIC_init_uniprocessor(void)
setup_IO_APIC();
else {
nr_ioapics = 0;
localise_nmi_watchdog();
}
#else
localise_nmi_watchdog();
#endif
x86_init.timers.setup_percpu_clockev();
#ifdef CONFIG_X86_64
check_nmi_watchdog();
#endif
return 0;
}

View File

@ -17,20 +17,31 @@
#include <linux/nmi.h>
#include <linux/module.h>
#ifdef CONFIG_HARDLOCKUP_DETECTOR
u64 hw_nmi_get_sample_period(void)
{
return (u64)(cpu_khz) * 1000 * 60;
}
#endif
#ifdef ARCH_HAS_NMI_WATCHDOG
#ifdef arch_trigger_all_cpu_backtrace
/* For reliability, we're prepared to waste bits here. */
static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
/* "in progress" flag of arch_trigger_all_cpu_backtrace */
static unsigned long backtrace_flag;
void arch_trigger_all_cpu_backtrace(void)
{
int i;
if (test_and_set_bit(0, &backtrace_flag))
/*
* If there is already a trigger_all_cpu_backtrace() in progress
* (backtrace_flag == 1), don't output double cpu dump infos.
*/
return;
cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
printk(KERN_INFO "sending NMI to all CPUs:\n");
@ -42,6 +53,9 @@ void arch_trigger_all_cpu_backtrace(void)
break;
mdelay(1);
}
clear_bit(0, &backtrace_flag);
smp_mb__after_clear_bit();
}
static int __kprobes
@ -50,7 +64,7 @@ arch_trigger_all_cpu_backtrace_handler(struct notifier_block *self,
{
struct die_args *args = __args;
struct pt_regs *regs;
int cpu = smp_processor_id();
int cpu;
switch (cmd) {
case DIE_NMI:
@ -62,6 +76,7 @@ arch_trigger_all_cpu_backtrace_handler(struct notifier_block *self,
}
regs = args->regs;
cpu = smp_processor_id();
if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
@ -91,18 +106,3 @@ static int __init register_trigger_all_cpu_backtrace(void)
}
early_initcall(register_trigger_all_cpu_backtrace);
#endif
/* STUB calls to mimic old nmi_watchdog behaviour */
#if defined(CONFIG_X86_LOCAL_APIC)
unsigned int nmi_watchdog = NMI_NONE;
EXPORT_SYMBOL(nmi_watchdog);
void acpi_nmi_enable(void) { return; }
void acpi_nmi_disable(void) { return; }
#endif
atomic_t nmi_active = ATOMIC_INIT(0); /* oprofile uses this */
EXPORT_SYMBOL(nmi_active);
int unknown_nmi_panic;
void cpu_nmi_set_wd_enabled(void) { return; }
void stop_apic_nmi_watchdog(void *unused) { return; }
void setup_apic_nmi_watchdog(void *unused) { return; }
int __init check_nmi_watchdog(void) { return 0; }

View File

@ -54,7 +54,6 @@
#include <asm/dma.h>
#include <asm/timer.h>
#include <asm/i8259.h>
#include <asm/nmi.h>
#include <asm/msidef.h>
#include <asm/hypertransport.h>
#include <asm/setup.h>
@ -2642,24 +2641,6 @@ static void lapic_register_intr(int irq)
"edge");
}
static void __init setup_nmi(void)
{
/*
* Dirty trick to enable the NMI watchdog ...
* We put the 8259A master into AEOI mode and
* unmask on all local APICs LVT0 as NMI.
*
* The idea to use the 8259A in AEOI mode ('8259A Virtual Wire')
* is from Maciej W. Rozycki - so we do not have to EOI from
* the NMI handler or the timer interrupt.
*/
apic_printk(APIC_VERBOSE, KERN_INFO "activating NMI Watchdog ...");
enable_NMI_through_LVT0();
apic_printk(APIC_VERBOSE, " done.\n");
}
/*
* This looks a bit hackish but it's about the only one way of sending
* a few INTA cycles to 8259As and any associated glue logic. ICR does
@ -2765,15 +2746,6 @@ static inline void __init check_timer(void)
*/
apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_EXTINT);
legacy_pic->init(1);
#ifdef CONFIG_X86_32
{
unsigned int ver;
ver = apic_read(APIC_LVR);
ver = GET_APIC_VERSION(ver);
timer_ack = (nmi_watchdog == NMI_IO_APIC && !APIC_INTEGRATED(ver));
}
#endif
pin1 = find_isa_irq_pin(0, mp_INT);
apic1 = find_isa_irq_apic(0, mp_INT);
@ -2821,10 +2793,6 @@ static inline void __init check_timer(void)
unmask_ioapic(cfg);
}
if (timer_irq_works()) {
if (nmi_watchdog == NMI_IO_APIC) {
setup_nmi();
legacy_pic->unmask(0);
}
if (disable_timer_pin_1 > 0)
clear_IO_APIC_pin(0, pin1);
goto out;
@ -2850,11 +2818,6 @@ static inline void __init check_timer(void)
if (timer_irq_works()) {
apic_printk(APIC_QUIET, KERN_INFO "....... works.\n");
timer_through_8259 = 1;
if (nmi_watchdog == NMI_IO_APIC) {
legacy_pic->mask(0);
setup_nmi();
legacy_pic->unmask(0);
}
goto out;
}
/*
@ -2866,15 +2829,6 @@ static inline void __init check_timer(void)
apic_printk(APIC_QUIET, KERN_INFO "....... failed.\n");
}
if (nmi_watchdog == NMI_IO_APIC) {
apic_printk(APIC_QUIET, KERN_WARNING "timer doesn't work "
"through the IO-APIC - disabling NMI Watchdog!\n");
nmi_watchdog = NMI_NONE;
}
#ifdef CONFIG_X86_32
timer_ack = 0;
#endif
apic_printk(APIC_QUIET, KERN_INFO
"...trying to set up timer as Virtual Wire IRQ...\n");

View File

@ -1,567 +0,0 @@
/*
* NMI watchdog support on APIC systems
*
* Started by Ingo Molnar <mingo@redhat.com>
*
* Fixes:
* Mikael Pettersson : AMD K7 support for local APIC NMI watchdog.
* Mikael Pettersson : Power Management for local APIC NMI watchdog.
* Mikael Pettersson : Pentium 4 support for local APIC NMI watchdog.
* Pavel Machek and
* Mikael Pettersson : PM converted to driver model. Disable/enable API.
*/
#include <asm/apic.h>
#include <linux/nmi.h>
#include <linux/mm.h>
#include <linux/delay.h>
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/sysdev.h>
#include <linux/sysctl.h>
#include <linux/percpu.h>
#include <linux/kprobes.h>
#include <linux/cpumask.h>
#include <linux/kernel_stat.h>
#include <linux/kdebug.h>
#include <linux/smp.h>
#include <asm/i8259.h>
#include <asm/io_apic.h>
#include <asm/proto.h>
#include <asm/timer.h>
#include <asm/mce.h>
#include <asm/mach_traps.h>
int unknown_nmi_panic;
int nmi_watchdog_enabled;
/* For reliability, we're prepared to waste bits here. */
static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
/* nmi_active:
* >0: the lapic NMI watchdog is active, but can be disabled
* <0: the lapic NMI watchdog has not been set up, and cannot
* be enabled
* 0: the lapic NMI watchdog is disabled, but can be enabled
*/
atomic_t nmi_active = ATOMIC_INIT(0); /* oprofile uses this */
EXPORT_SYMBOL(nmi_active);
unsigned int nmi_watchdog = NMI_NONE;
EXPORT_SYMBOL(nmi_watchdog);
static int panic_on_timeout;
static unsigned int nmi_hz = HZ;
static DEFINE_PER_CPU(short, wd_enabled);
static int endflag __initdata;
static inline unsigned int get_nmi_count(int cpu)
{
return per_cpu(irq_stat, cpu).__nmi_count;
}
static inline int mce_in_progress(void)
{
#if defined(CONFIG_X86_MCE)
return atomic_read(&mce_entry) > 0;
#endif
return 0;
}
/*
* Take the local apic timer and PIT/HPET into account. We don't
* know which one is active, when we have highres/dyntick on
*/
static inline unsigned int get_timer_irqs(int cpu)
{
return per_cpu(irq_stat, cpu).apic_timer_irqs +
per_cpu(irq_stat, cpu).irq0_irqs;
}
#ifdef CONFIG_SMP
/*
* The performance counters used by NMI_LOCAL_APIC don't trigger when
* the CPU is idle. To make sure the NMI watchdog really ticks on all
* CPUs during the test make them busy.
*/
static __init void nmi_cpu_busy(void *data)
{
local_irq_enable_in_hardirq();
/*
* Intentionally don't use cpu_relax here. This is
* to make sure that the performance counter really ticks,
* even if there is a simulator or similar that catches the
* pause instruction. On a real HT machine this is fine because
* all other CPUs are busy with "useless" delay loops and don't
* care if they get somewhat less cycles.
*/
while (endflag == 0)
mb();
}
#endif
static void report_broken_nmi(int cpu, unsigned int *prev_nmi_count)
{
printk(KERN_CONT "\n");
printk(KERN_WARNING
"WARNING: CPU#%d: NMI appears to be stuck (%d->%d)!\n",
cpu, prev_nmi_count[cpu], get_nmi_count(cpu));
printk(KERN_WARNING
"Please report this to bugzilla.kernel.org,\n");
printk(KERN_WARNING
"and attach the output of the 'dmesg' command.\n");
per_cpu(wd_enabled, cpu) = 0;
atomic_dec(&nmi_active);
}
static void __acpi_nmi_disable(void *__unused)
{
apic_write(APIC_LVT0, APIC_DM_NMI | APIC_LVT_MASKED);
}
int __init check_nmi_watchdog(void)
{
unsigned int *prev_nmi_count;
int cpu;
if (!nmi_watchdog_active() || !atomic_read(&nmi_active))
return 0;
prev_nmi_count = kmalloc(nr_cpu_ids * sizeof(int), GFP_KERNEL);
if (!prev_nmi_count)
goto error;
printk(KERN_INFO "Testing NMI watchdog ... ");
#ifdef CONFIG_SMP
if (nmi_watchdog == NMI_LOCAL_APIC)
smp_call_function(nmi_cpu_busy, (void *)&endflag, 0);
#endif
for_each_possible_cpu(cpu)
prev_nmi_count[cpu] = get_nmi_count(cpu);
local_irq_enable();
mdelay((20 * 1000) / nmi_hz); /* wait 20 ticks */
for_each_online_cpu(cpu) {
if (!per_cpu(wd_enabled, cpu))
continue;
if (get_nmi_count(cpu) - prev_nmi_count[cpu] <= 5)
report_broken_nmi(cpu, prev_nmi_count);
}
endflag = 1;
if (!atomic_read(&nmi_active)) {
kfree(prev_nmi_count);
atomic_set(&nmi_active, -1);
goto error;
}
printk("OK.\n");
/*
* now that we know it works we can reduce NMI frequency to
* something more reasonable; makes a difference in some configs
*/
if (nmi_watchdog == NMI_LOCAL_APIC)
nmi_hz = lapic_adjust_nmi_hz(1);
kfree(prev_nmi_count);
return 0;
error:
if (nmi_watchdog == NMI_IO_APIC) {
if (!timer_through_8259)
legacy_pic->mask(0);
on_each_cpu(__acpi_nmi_disable, NULL, 1);
}
#ifdef CONFIG_X86_32
timer_ack = 0;
#endif
return -1;
}
static int __init setup_nmi_watchdog(char *str)
{
unsigned int nmi;
if (!strncmp(str, "panic", 5)) {
panic_on_timeout = 1;
str = strchr(str, ',');
if (!str)
return 1;
++str;
}
if (!strncmp(str, "lapic", 5))
nmi_watchdog = NMI_LOCAL_APIC;
else if (!strncmp(str, "ioapic", 6))
nmi_watchdog = NMI_IO_APIC;
else {
get_option(&str, &nmi);
if (nmi >= NMI_INVALID)
return 0;
nmi_watchdog = nmi;
}
return 1;
}
__setup("nmi_watchdog=", setup_nmi_watchdog);
/*
* Suspend/resume support
*/
#ifdef CONFIG_PM
static int nmi_pm_active; /* nmi_active before suspend */
static int lapic_nmi_suspend(struct sys_device *dev, pm_message_t state)
{
/* only CPU0 goes here, other CPUs should be offline */
nmi_pm_active = atomic_read(&nmi_active);
stop_apic_nmi_watchdog(NULL);
BUG_ON(atomic_read(&nmi_active) != 0);
return 0;
}
static int lapic_nmi_resume(struct sys_device *dev)
{
/* only CPU0 goes here, other CPUs should be offline */
if (nmi_pm_active > 0) {
setup_apic_nmi_watchdog(NULL);
touch_nmi_watchdog();
}
return 0;
}
static struct sysdev_class nmi_sysclass = {
.name = "lapic_nmi",
.resume = lapic_nmi_resume,
.suspend = lapic_nmi_suspend,
};
static struct sys_device device_lapic_nmi = {
.id = 0,
.cls = &nmi_sysclass,
};
static int __init init_lapic_nmi_sysfs(void)
{
int error;
/*
* should really be a BUG_ON but b/c this is an
* init call, it just doesn't work. -dcz
*/
if (nmi_watchdog != NMI_LOCAL_APIC)
return 0;
if (atomic_read(&nmi_active) < 0)
return 0;
error = sysdev_class_register(&nmi_sysclass);
if (!error)
error = sysdev_register(&device_lapic_nmi);
return error;
}
/* must come after the local APIC's device_initcall() */
late_initcall(init_lapic_nmi_sysfs);
#endif /* CONFIG_PM */
static void __acpi_nmi_enable(void *__unused)
{
apic_write(APIC_LVT0, APIC_DM_NMI);
}
/*
* Enable timer based NMIs on all CPUs:
*/
void acpi_nmi_enable(void)
{
if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC)
on_each_cpu(__acpi_nmi_enable, NULL, 1);
}
/*
* Disable timer based NMIs on all CPUs:
*/
void acpi_nmi_disable(void)
{
if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC)
on_each_cpu(__acpi_nmi_disable, NULL, 1);
}
/*
* This function is called as soon the LAPIC NMI watchdog driver has everything
* in place and it's ready to check if the NMIs belong to the NMI watchdog
*/
void cpu_nmi_set_wd_enabled(void)
{
__get_cpu_var(wd_enabled) = 1;
}
void setup_apic_nmi_watchdog(void *unused)
{
if (__get_cpu_var(wd_enabled))
return;
/* cheap hack to support suspend/resume */
/* if cpu0 is not active neither should the other cpus */
if (smp_processor_id() != 0 && atomic_read(&nmi_active) <= 0)
return;
switch (nmi_watchdog) {
case NMI_LOCAL_APIC:
if (lapic_watchdog_init(nmi_hz) < 0) {
__get_cpu_var(wd_enabled) = 0;
return;
}
/* FALL THROUGH */
case NMI_IO_APIC:
__get_cpu_var(wd_enabled) = 1;
atomic_inc(&nmi_active);
}
}
void stop_apic_nmi_watchdog(void *unused)
{
/* only support LOCAL and IO APICs for now */
if (!nmi_watchdog_active())
return;
if (__get_cpu_var(wd_enabled) == 0)
return;
if (nmi_watchdog == NMI_LOCAL_APIC)
lapic_watchdog_stop();
else
__acpi_nmi_disable(NULL);
__get_cpu_var(wd_enabled) = 0;
atomic_dec(&nmi_active);
}
/*
* the best way to detect whether a CPU has a 'hard lockup' problem
* is to check it's local APIC timer IRQ counts. If they are not
* changing then that CPU has some problem.
*
* as these watchdog NMI IRQs are generated on every CPU, we only
* have to check the current processor.
*
* since NMIs don't listen to _any_ locks, we have to be extremely
* careful not to rely on unsafe variables. The printk might lock
* up though, so we have to break up any console locks first ...
* [when there will be more tty-related locks, break them up here too!]
*/
static DEFINE_PER_CPU(unsigned, last_irq_sum);
static DEFINE_PER_CPU(long, alert_counter);
static DEFINE_PER_CPU(int, nmi_touch);
void touch_nmi_watchdog(void)
{
if (nmi_watchdog_active()) {
unsigned cpu;
/*
* Tell other CPUs to reset their alert counters. We cannot
* do it ourselves because the alert count increase is not
* atomic.
*/
for_each_present_cpu(cpu) {
if (per_cpu(nmi_touch, cpu) != 1)
per_cpu(nmi_touch, cpu) = 1;
}
}
/*
* Tickle the softlockup detector too:
*/
touch_softlockup_watchdog();
}
EXPORT_SYMBOL(touch_nmi_watchdog);
notrace __kprobes int
nmi_watchdog_tick(struct pt_regs *regs, unsigned reason)
{
/*
* Since current_thread_info()-> is always on the stack, and we
* always switch the stack NMI-atomically, it's safe to use
* smp_processor_id().
*/
unsigned int sum;
int touched = 0;
int cpu = smp_processor_id();
int rc = 0;
sum = get_timer_irqs(cpu);
if (__get_cpu_var(nmi_touch)) {
__get_cpu_var(nmi_touch) = 0;
touched = 1;
}
/* We can be called before check_nmi_watchdog, hence NULL check. */
if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
static DEFINE_RAW_SPINLOCK(lock); /* Serialise the printks */
raw_spin_lock(&lock);
printk(KERN_WARNING "NMI backtrace for cpu %d\n", cpu);
show_regs(regs);
dump_stack();
raw_spin_unlock(&lock);
cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
rc = 1;
}
/* Could check oops_in_progress here too, but it's safer not to */
if (mce_in_progress())
touched = 1;
/* if the none of the timers isn't firing, this cpu isn't doing much */
if (!touched && __get_cpu_var(last_irq_sum) == sum) {
/*
* Ayiee, looks like this CPU is stuck ...
* wait a few IRQs (5 seconds) before doing the oops ...
*/
__this_cpu_inc(alert_counter);
if (__this_cpu_read(alert_counter) == 5 * nmi_hz)
/*
* die_nmi will return ONLY if NOTIFY_STOP happens..
*/
die_nmi("BUG: NMI Watchdog detected LOCKUP",
regs, panic_on_timeout);
} else {
__get_cpu_var(last_irq_sum) = sum;
__this_cpu_write(alert_counter, 0);
}
/* see if the nmi watchdog went off */
if (!__get_cpu_var(wd_enabled))
return rc;
switch (nmi_watchdog) {
case NMI_LOCAL_APIC:
rc |= lapic_wd_event(nmi_hz);
break;
case NMI_IO_APIC:
/*
* don't know how to accurately check for this.
* just assume it was a watchdog timer interrupt
* This matches the old behaviour.
*/
rc = 1;
break;
}
return rc;
}
#ifdef CONFIG_SYSCTL
static void enable_ioapic_nmi_watchdog_single(void *unused)
{
__get_cpu_var(wd_enabled) = 1;
atomic_inc(&nmi_active);
__acpi_nmi_enable(NULL);
}
static void enable_ioapic_nmi_watchdog(void)
{
on_each_cpu(enable_ioapic_nmi_watchdog_single, NULL, 1);
touch_nmi_watchdog();
}
static void disable_ioapic_nmi_watchdog(void)
{
on_each_cpu(stop_apic_nmi_watchdog, NULL, 1);
}
static int __init setup_unknown_nmi_panic(char *str)
{
unknown_nmi_panic = 1;
return 1;
}
__setup("unknown_nmi_panic", setup_unknown_nmi_panic);
static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu)
{
unsigned char reason = get_nmi_reason();
char buf[64];
sprintf(buf, "NMI received for unknown reason %02x\n", reason);
die_nmi(buf, regs, 1); /* Always panic here */
return 0;
}
/*
* proc handler for /proc/sys/kernel/nmi
*/
int proc_nmi_enabled(struct ctl_table *table, int write,
void __user *buffer, size_t *length, loff_t *ppos)
{
int old_state;
nmi_watchdog_enabled = (atomic_read(&nmi_active) > 0) ? 1 : 0;
old_state = nmi_watchdog_enabled;
proc_dointvec(table, write, buffer, length, ppos);
if (!!old_state == !!nmi_watchdog_enabled)
return 0;
if (atomic_read(&nmi_active) < 0 || !nmi_watchdog_active()) {
printk(KERN_WARNING
"NMI watchdog is permanently disabled\n");
return -EIO;
}
if (nmi_watchdog == NMI_LOCAL_APIC) {
if (nmi_watchdog_enabled)
enable_lapic_nmi_watchdog();
else
disable_lapic_nmi_watchdog();
} else if (nmi_watchdog == NMI_IO_APIC) {
if (nmi_watchdog_enabled)
enable_ioapic_nmi_watchdog();
else
disable_ioapic_nmi_watchdog();
} else {
printk(KERN_WARNING
"NMI watchdog doesn't know what hardware to touch\n");
return -EIO;
}
return 0;
}
#endif /* CONFIG_SYSCTL */
int do_nmi_callback(struct pt_regs *regs, int cpu)
{
#ifdef CONFIG_SYSCTL
if (unknown_nmi_panic)
return unknown_nmi_panic_callback(regs, cpu);
#endif
return 0;
}
void arch_trigger_all_cpu_backtrace(void)
{
int i;
cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
printk(KERN_INFO "sending NMI to all CPUs:\n");
apic->send_IPI_all(NMI_VECTOR);
/* Wait for up to 10 seconds for all CPUs to do the backtrace */
for (i = 0; i < 10 * 1000; i++) {
if (cpumask_empty(to_cpumask(backtrace_mask)))
break;
mdelay(1);
}
}

View File

@ -894,7 +894,6 @@ void __init identify_boot_cpu(void)
#else
vgetcpu_set_mode();
#endif
init_hw_perf_events();
}
void __cpuinit identify_secondary_cpu(struct cpuinfo_x86 *c)

View File

@ -330,9 +330,6 @@ static bool reserve_pmc_hardware(void)
{
int i;
if (nmi_watchdog == NMI_LOCAL_APIC)
disable_lapic_nmi_watchdog();
for (i = 0; i < x86_pmu.num_counters; i++) {
if (!reserve_perfctr_nmi(x86_pmu.perfctr + i))
goto perfctr_fail;
@ -355,9 +352,6 @@ perfctr_fail:
for (i--; i >= 0; i--)
release_perfctr_nmi(x86_pmu.perfctr + i);
if (nmi_watchdog == NMI_LOCAL_APIC)
enable_lapic_nmi_watchdog();
return false;
}
@ -369,9 +363,6 @@ static void release_pmc_hardware(void)
release_perfctr_nmi(x86_pmu.perfctr + i);
release_evntsel_nmi(x86_pmu.eventsel + i);
}
if (nmi_watchdog == NMI_LOCAL_APIC)
enable_lapic_nmi_watchdog();
}
#else
@ -384,15 +375,53 @@ static void release_pmc_hardware(void) {}
static bool check_hw_exists(void)
{
u64 val, val_new = 0;
int ret = 0;
int i, reg, ret = 0;
/*
* Check to see if the BIOS enabled any of the counters, if so
* complain and bail.
*/
for (i = 0; i < x86_pmu.num_counters; i++) {
reg = x86_pmu.eventsel + i;
ret = rdmsrl_safe(reg, &val);
if (ret)
goto msr_fail;
if (val & ARCH_PERFMON_EVENTSEL_ENABLE)
goto bios_fail;
}
if (x86_pmu.num_counters_fixed) {
reg = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
ret = rdmsrl_safe(reg, &val);
if (ret)
goto msr_fail;
for (i = 0; i < x86_pmu.num_counters_fixed; i++) {
if (val & (0x03 << i*4))
goto bios_fail;
}
}
/*
* Now write a value and read it back to see if it matches,
* this is needed to detect certain hardware emulators (qemu/kvm)
* that don't trap on the MSR access and always return 0s.
*/
val = 0xabcdUL;
ret |= checking_wrmsrl(x86_pmu.perfctr, val);
ret = checking_wrmsrl(x86_pmu.perfctr, val);
ret |= rdmsrl_safe(x86_pmu.perfctr, &val_new);
if (ret || val != val_new)
return false;
goto msr_fail;
return true;
bios_fail:
printk(KERN_CONT "Broken BIOS detected, using software events only.\n");
printk(KERN_ERR FW_BUG "the BIOS has corrupted hw-PMU resources (MSR %x is %Lx)\n", reg, val);
return false;
msr_fail:
printk(KERN_CONT "Broken PMU hardware detected, using software events only.\n");
return false;
}
static void reserve_ds_buffers(void);
@ -451,7 +480,7 @@ static int x86_setup_perfctr(struct perf_event *event)
struct hw_perf_event *hwc = &event->hw;
u64 config;
if (!hwc->sample_period) {
if (!is_sampling_event(event)) {
hwc->sample_period = x86_pmu.max_period;
hwc->last_period = hwc->sample_period;
local64_set(&hwc->period_left, hwc->sample_period);
@ -1362,7 +1391,7 @@ static void __init pmu_check_apic(void)
pr_info("no hardware sampling interrupt available.\n");
}
void __init init_hw_perf_events(void)
int __init init_hw_perf_events(void)
{
struct event_constraint *c;
int err;
@ -1377,20 +1406,18 @@ void __init init_hw_perf_events(void)
err = amd_pmu_init();
break;
default:
return;
return 0;
}
if (err != 0) {
pr_cont("no PMU driver, software events only.\n");
return;
return 0;
}
pmu_check_apic();
/* sanity check that the hardware exists or is emulated */
if (!check_hw_exists()) {
pr_cont("Broken PMU hardware detected, software events only.\n");
return;
}
if (!check_hw_exists())
return 0;
pr_cont("%s PMU driver.\n", x86_pmu.name);
@ -1438,9 +1465,12 @@ void __init init_hw_perf_events(void)
pr_info("... fixed-purpose events: %d\n", x86_pmu.num_counters_fixed);
pr_info("... event mask: %016Lx\n", x86_pmu.intel_ctrl);
perf_pmu_register(&pmu);
perf_pmu_register(&pmu, "cpu", PERF_TYPE_RAW);
perf_cpu_notifier(x86_pmu_notifier);
return 0;
}
early_initcall(init_hw_perf_events);
static inline void x86_pmu_read(struct perf_event *event)
{
@ -1686,7 +1716,7 @@ perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs)
perf_callchain_store(entry, regs->ip);
dump_trace(NULL, regs, NULL, regs->bp, &backtrace_ops, entry);
dump_trace(NULL, regs, NULL, &backtrace_ops, entry);
}
#ifdef CONFIG_COMPAT

View File

@ -1,7 +1,5 @@
#ifdef CONFIG_CPU_SUP_AMD
static DEFINE_RAW_SPINLOCK(amd_nb_lock);
static __initconst const u64 amd_hw_cache_event_ids
[PERF_COUNT_HW_CACHE_MAX]
[PERF_COUNT_HW_CACHE_OP_MAX]
@ -275,7 +273,7 @@ done:
return &emptyconstraint;
}
static struct amd_nb *amd_alloc_nb(int cpu, int nb_id)
static struct amd_nb *amd_alloc_nb(int cpu)
{
struct amd_nb *nb;
int i;
@ -285,7 +283,7 @@ static struct amd_nb *amd_alloc_nb(int cpu, int nb_id)
if (!nb)
return NULL;
nb->nb_id = nb_id;
nb->nb_id = -1;
/*
* initialize all possible NB constraints
@ -306,7 +304,7 @@ static int amd_pmu_cpu_prepare(int cpu)
if (boot_cpu_data.x86_max_cores < 2)
return NOTIFY_OK;
cpuc->amd_nb = amd_alloc_nb(cpu, -1);
cpuc->amd_nb = amd_alloc_nb(cpu);
if (!cpuc->amd_nb)
return NOTIFY_BAD;
@ -325,8 +323,6 @@ static void amd_pmu_cpu_starting(int cpu)
nb_id = amd_get_nb_id(cpu);
WARN_ON_ONCE(nb_id == BAD_APICID);
raw_spin_lock(&amd_nb_lock);
for_each_online_cpu(i) {
nb = per_cpu(cpu_hw_events, i).amd_nb;
if (WARN_ON_ONCE(!nb))
@ -341,8 +337,6 @@ static void amd_pmu_cpu_starting(int cpu)
cpuc->amd_nb->nb_id = nb_id;
cpuc->amd_nb->refcnt++;
raw_spin_unlock(&amd_nb_lock);
}
static void amd_pmu_cpu_dead(int cpu)
@ -354,8 +348,6 @@ static void amd_pmu_cpu_dead(int cpu)
cpuhw = &per_cpu(cpu_hw_events, cpu);
raw_spin_lock(&amd_nb_lock);
if (cpuhw->amd_nb) {
struct amd_nb *nb = cpuhw->amd_nb;
@ -364,8 +356,6 @@ static void amd_pmu_cpu_dead(int cpu)
cpuhw->amd_nb = NULL;
}
raw_spin_unlock(&amd_nb_lock);
}
static __initconst const struct x86_pmu amd_pmu = {

View File

@ -816,6 +816,32 @@ static int intel_pmu_hw_config(struct perf_event *event)
if (ret)
return ret;
if (event->attr.precise_ip &&
(event->hw.config & X86_RAW_EVENT_MASK) == 0x003c) {
/*
* Use an alternative encoding for CPU_CLK_UNHALTED.THREAD_P
* (0x003c) so that we can use it with PEBS.
*
* The regular CPU_CLK_UNHALTED.THREAD_P event (0x003c) isn't
* PEBS capable. However we can use INST_RETIRED.ANY_P
* (0x00c0), which is a PEBS capable event, to get the same
* count.
*
* INST_RETIRED.ANY_P counts the number of cycles that retires
* CNTMASK instructions. By setting CNTMASK to a value (16)
* larger than the maximum number of instructions that can be
* retired per cycle (4) and then inverting the condition, we
* count all cycles that retire 16 or less instructions, which
* is every cycle.
*
* Thereby we gain a PEBS capable cycle counter.
*/
u64 alt_config = 0x108000c0; /* INST_RETIRED.TOTAL_CYCLES */
alt_config |= (event->hw.config & ~X86_RAW_EVENT_MASK);
event->hw.config = alt_config;
}
if (event->attr.type != PERF_TYPE_RAW)
return 0;

View File

@ -16,32 +16,12 @@
#include <linux/kernel.h>
#include <linux/bitops.h>
#include <linux/smp.h>
#include <linux/nmi.h>
#include <asm/nmi.h>
#include <linux/kprobes.h>
#include <asm/apic.h>
#include <asm/perf_event.h>
struct nmi_watchdog_ctlblk {
unsigned int cccr_msr;
unsigned int perfctr_msr; /* the MSR to reset in NMI handler */
unsigned int evntsel_msr; /* the MSR to select the events to handle */
};
/* Interface defining a CPU specific perfctr watchdog */
struct wd_ops {
int (*reserve)(void);
void (*unreserve)(void);
int (*setup)(unsigned nmi_hz);
void (*rearm)(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz);
void (*stop)(void);
unsigned perfctr;
unsigned evntsel;
u64 checkbit;
};
static const struct wd_ops *wd_ops;
/*
* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's
* offset from MSR_P4_BSU_ESCR0.
@ -60,8 +40,6 @@ static const struct wd_ops *wd_ops;
static DECLARE_BITMAP(perfctr_nmi_owner, NMI_MAX_COUNTER_BITS);
static DECLARE_BITMAP(evntsel_nmi_owner, NMI_MAX_COUNTER_BITS);
static DEFINE_PER_CPU(struct nmi_watchdog_ctlblk, nmi_watchdog_ctlblk);
/* converts an msr to an appropriate reservation bit */
static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr)
{
@ -172,623 +150,3 @@ void release_evntsel_nmi(unsigned int msr)
clear_bit(counter, evntsel_nmi_owner);
}
EXPORT_SYMBOL(release_evntsel_nmi);
void disable_lapic_nmi_watchdog(void)
{
BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
if (atomic_read(&nmi_active) <= 0)
return;
on_each_cpu(stop_apic_nmi_watchdog, NULL, 1);
if (wd_ops)
wd_ops->unreserve();
BUG_ON(atomic_read(&nmi_active) != 0);
}
void enable_lapic_nmi_watchdog(void)
{
BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
/* are we already enabled */
if (atomic_read(&nmi_active) != 0)
return;
/* are we lapic aware */
if (!wd_ops)
return;
if (!wd_ops->reserve()) {
printk(KERN_ERR "NMI watchdog: cannot reserve perfctrs\n");
return;
}
on_each_cpu(setup_apic_nmi_watchdog, NULL, 1);
touch_nmi_watchdog();
}
/*
* Activate the NMI watchdog via the local APIC.
*/
static unsigned int adjust_for_32bit_ctr(unsigned int hz)
{
u64 counter_val;
unsigned int retval = hz;
/*
* On Intel CPUs with P6/ARCH_PERFMON only 32 bits in the counter
* are writable, with higher bits sign extending from bit 31.
* So, we can only program the counter with 31 bit values and
* 32nd bit should be 1, for 33.. to be 1.
* Find the appropriate nmi_hz
*/
counter_val = (u64)cpu_khz * 1000;
do_div(counter_val, retval);
if (counter_val > 0x7fffffffULL) {
u64 count = (u64)cpu_khz * 1000;
do_div(count, 0x7fffffffUL);
retval = count + 1;
}
return retval;
}
static void write_watchdog_counter(unsigned int perfctr_msr,
const char *descr, unsigned nmi_hz)
{
u64 count = (u64)cpu_khz * 1000;
do_div(count, nmi_hz);
if (descr)
pr_debug("setting %s to -0x%08Lx\n", descr, count);
wrmsrl(perfctr_msr, 0 - count);
}
static void write_watchdog_counter32(unsigned int perfctr_msr,
const char *descr, unsigned nmi_hz)
{
u64 count = (u64)cpu_khz * 1000;
do_div(count, nmi_hz);
if (descr)
pr_debug("setting %s to -0x%08Lx\n", descr, count);
wrmsr(perfctr_msr, (u32)(-count), 0);
}
/*
* AMD K7/K8/Family10h/Family11h support.
* AMD keeps this interface nicely stable so there is not much variety
*/
#define K7_EVNTSEL_ENABLE (1 << 22)
#define K7_EVNTSEL_INT (1 << 20)
#define K7_EVNTSEL_OS (1 << 17)
#define K7_EVNTSEL_USR (1 << 16)
#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76
#define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
static int setup_k7_watchdog(unsigned nmi_hz)
{
unsigned int perfctr_msr, evntsel_msr;
unsigned int evntsel;
struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
perfctr_msr = wd_ops->perfctr;
evntsel_msr = wd_ops->evntsel;
wrmsrl(perfctr_msr, 0UL);
evntsel = K7_EVNTSEL_INT
| K7_EVNTSEL_OS
| K7_EVNTSEL_USR
| K7_NMI_EVENT;
/* setup the timer */
wrmsr(evntsel_msr, evntsel, 0);
write_watchdog_counter(perfctr_msr, "K7_PERFCTR0", nmi_hz);
/* initialize the wd struct before enabling */
wd->perfctr_msr = perfctr_msr;
wd->evntsel_msr = evntsel_msr;
wd->cccr_msr = 0; /* unused */
/* ok, everything is initialized, announce that we're set */
cpu_nmi_set_wd_enabled();
apic_write(APIC_LVTPC, APIC_DM_NMI);
evntsel |= K7_EVNTSEL_ENABLE;
wrmsr(evntsel_msr, evntsel, 0);
return 1;
}
static void single_msr_stop_watchdog(void)
{
struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
wrmsr(wd->evntsel_msr, 0, 0);
}
static int single_msr_reserve(void)
{
if (!reserve_perfctr_nmi(wd_ops->perfctr))
return 0;
if (!reserve_evntsel_nmi(wd_ops->evntsel)) {
release_perfctr_nmi(wd_ops->perfctr);
return 0;
}
return 1;
}
static void single_msr_unreserve(void)
{
release_evntsel_nmi(wd_ops->evntsel);
release_perfctr_nmi(wd_ops->perfctr);
}
static void __kprobes
single_msr_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
{
/* start the cycle over again */
write_watchdog_counter(wd->perfctr_msr, NULL, nmi_hz);
}
static const struct wd_ops k7_wd_ops = {
.reserve = single_msr_reserve,
.unreserve = single_msr_unreserve,
.setup = setup_k7_watchdog,
.rearm = single_msr_rearm,
.stop = single_msr_stop_watchdog,
.perfctr = MSR_K7_PERFCTR0,
.evntsel = MSR_K7_EVNTSEL0,
.checkbit = 1ULL << 47,
};
/*
* Intel Model 6 (PPro+,P2,P3,P-M,Core1)
*/
#define P6_EVNTSEL0_ENABLE (1 << 22)
#define P6_EVNTSEL_INT (1 << 20)
#define P6_EVNTSEL_OS (1 << 17)
#define P6_EVNTSEL_USR (1 << 16)
#define P6_EVENT_CPU_CLOCKS_NOT_HALTED 0x79
#define P6_NMI_EVENT P6_EVENT_CPU_CLOCKS_NOT_HALTED
static int setup_p6_watchdog(unsigned nmi_hz)
{
unsigned int perfctr_msr, evntsel_msr;
unsigned int evntsel;
struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
perfctr_msr = wd_ops->perfctr;
evntsel_msr = wd_ops->evntsel;
/* KVM doesn't implement this MSR */
if (wrmsr_safe(perfctr_msr, 0, 0) < 0)
return 0;
evntsel = P6_EVNTSEL_INT
| P6_EVNTSEL_OS
| P6_EVNTSEL_USR
| P6_NMI_EVENT;
/* setup the timer */
wrmsr(evntsel_msr, evntsel, 0);
nmi_hz = adjust_for_32bit_ctr(nmi_hz);
write_watchdog_counter32(perfctr_msr, "P6_PERFCTR0", nmi_hz);
/* initialize the wd struct before enabling */
wd->perfctr_msr = perfctr_msr;
wd->evntsel_msr = evntsel_msr;
wd->cccr_msr = 0; /* unused */
/* ok, everything is initialized, announce that we're set */
cpu_nmi_set_wd_enabled();
apic_write(APIC_LVTPC, APIC_DM_NMI);
evntsel |= P6_EVNTSEL0_ENABLE;
wrmsr(evntsel_msr, evntsel, 0);
return 1;
}
static void __kprobes p6_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
{
/*
* P6 based Pentium M need to re-unmask
* the apic vector but it doesn't hurt
* other P6 variant.
* ArchPerfom/Core Duo also needs this
*/
apic_write(APIC_LVTPC, APIC_DM_NMI);
/* P6/ARCH_PERFMON has 32 bit counter write */
write_watchdog_counter32(wd->perfctr_msr, NULL, nmi_hz);
}
static const struct wd_ops p6_wd_ops = {
.reserve = single_msr_reserve,
.unreserve = single_msr_unreserve,
.setup = setup_p6_watchdog,
.rearm = p6_rearm,
.stop = single_msr_stop_watchdog,
.perfctr = MSR_P6_PERFCTR0,
.evntsel = MSR_P6_EVNTSEL0,
.checkbit = 1ULL << 39,
};
/*
* Intel P4 performance counters.
* By far the most complicated of all.
*/
#define MSR_P4_MISC_ENABLE_PERF_AVAIL (1 << 7)
#define P4_ESCR_EVENT_SELECT(N) ((N) << 25)
#define P4_ESCR_OS (1 << 3)
#define P4_ESCR_USR (1 << 2)
#define P4_CCCR_OVF_PMI0 (1 << 26)
#define P4_CCCR_OVF_PMI1 (1 << 27)
#define P4_CCCR_THRESHOLD(N) ((N) << 20)
#define P4_CCCR_COMPLEMENT (1 << 19)
#define P4_CCCR_COMPARE (1 << 18)
#define P4_CCCR_REQUIRED (3 << 16)
#define P4_CCCR_ESCR_SELECT(N) ((N) << 13)
#define P4_CCCR_ENABLE (1 << 12)
#define P4_CCCR_OVF (1 << 31)
#define P4_CONTROLS 18
static unsigned int p4_controls[18] = {
MSR_P4_BPU_CCCR0,
MSR_P4_BPU_CCCR1,
MSR_P4_BPU_CCCR2,
MSR_P4_BPU_CCCR3,
MSR_P4_MS_CCCR0,
MSR_P4_MS_CCCR1,
MSR_P4_MS_CCCR2,
MSR_P4_MS_CCCR3,
MSR_P4_FLAME_CCCR0,
MSR_P4_FLAME_CCCR1,
MSR_P4_FLAME_CCCR2,
MSR_P4_FLAME_CCCR3,
MSR_P4_IQ_CCCR0,
MSR_P4_IQ_CCCR1,
MSR_P4_IQ_CCCR2,
MSR_P4_IQ_CCCR3,
MSR_P4_IQ_CCCR4,
MSR_P4_IQ_CCCR5,
};
/*
* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
* CRU_ESCR0 (with any non-null event selector) through a complemented
* max threshold. [IA32-Vol3, Section 14.9.9]
*/
static int setup_p4_watchdog(unsigned nmi_hz)
{
unsigned int perfctr_msr, evntsel_msr, cccr_msr;
unsigned int evntsel, cccr_val;
unsigned int misc_enable, dummy;
unsigned int ht_num;
struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
rdmsr(MSR_IA32_MISC_ENABLE, misc_enable, dummy);
if (!(misc_enable & MSR_P4_MISC_ENABLE_PERF_AVAIL))
return 0;
#ifdef CONFIG_SMP
/* detect which hyperthread we are on */
if (smp_num_siblings == 2) {
unsigned int ebx, apicid;
ebx = cpuid_ebx(1);
apicid = (ebx >> 24) & 0xff;
ht_num = apicid & 1;
} else
#endif
ht_num = 0;
/*
* performance counters are shared resources
* assign each hyperthread its own set
* (re-use the ESCR0 register, seems safe
* and keeps the cccr_val the same)
*/
if (!ht_num) {
/* logical cpu 0 */
perfctr_msr = MSR_P4_IQ_PERFCTR0;
evntsel_msr = MSR_P4_CRU_ESCR0;
cccr_msr = MSR_P4_IQ_CCCR0;
cccr_val = P4_CCCR_OVF_PMI0 | P4_CCCR_ESCR_SELECT(4);
/*
* If we're on the kdump kernel or other situation, we may
* still have other performance counter registers set to
* interrupt and they'll keep interrupting forever because
* of the P4_CCCR_OVF quirk. So we need to ACK all the
* pending interrupts and disable all the registers here,
* before reenabling the NMI delivery. Refer to p4_rearm()
* about the P4_CCCR_OVF quirk.
*/
if (reset_devices) {
unsigned int low, high;
int i;
for (i = 0; i < P4_CONTROLS; i++) {
rdmsr(p4_controls[i], low, high);
low &= ~(P4_CCCR_ENABLE | P4_CCCR_OVF);
wrmsr(p4_controls[i], low, high);
}
}
} else {
/* logical cpu 1 */
perfctr_msr = MSR_P4_IQ_PERFCTR1;
evntsel_msr = MSR_P4_CRU_ESCR0;
cccr_msr = MSR_P4_IQ_CCCR1;
/* Pentium 4 D processors don't support P4_CCCR_OVF_PMI1 */
if (boot_cpu_data.x86_model == 4 && boot_cpu_data.x86_mask == 4)
cccr_val = P4_CCCR_OVF_PMI0;
else
cccr_val = P4_CCCR_OVF_PMI1;
cccr_val |= P4_CCCR_ESCR_SELECT(4);
}
evntsel = P4_ESCR_EVENT_SELECT(0x3F)
| P4_ESCR_OS
| P4_ESCR_USR;
cccr_val |= P4_CCCR_THRESHOLD(15)
| P4_CCCR_COMPLEMENT
| P4_CCCR_COMPARE
| P4_CCCR_REQUIRED;
wrmsr(evntsel_msr, evntsel, 0);
wrmsr(cccr_msr, cccr_val, 0);
write_watchdog_counter(perfctr_msr, "P4_IQ_COUNTER0", nmi_hz);
wd->perfctr_msr = perfctr_msr;
wd->evntsel_msr = evntsel_msr;
wd->cccr_msr = cccr_msr;
/* ok, everything is initialized, announce that we're set */
cpu_nmi_set_wd_enabled();
apic_write(APIC_LVTPC, APIC_DM_NMI);
cccr_val |= P4_CCCR_ENABLE;
wrmsr(cccr_msr, cccr_val, 0);
return 1;
}
static void stop_p4_watchdog(void)
{
struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
wrmsr(wd->cccr_msr, 0, 0);
wrmsr(wd->evntsel_msr, 0, 0);
}
static int p4_reserve(void)
{
if (!reserve_perfctr_nmi(MSR_P4_IQ_PERFCTR0))
return 0;
#ifdef CONFIG_SMP
if (smp_num_siblings > 1 && !reserve_perfctr_nmi(MSR_P4_IQ_PERFCTR1))
goto fail1;
#endif
if (!reserve_evntsel_nmi(MSR_P4_CRU_ESCR0))
goto fail2;
/* RED-PEN why is ESCR1 not reserved here? */
return 1;
fail2:
#ifdef CONFIG_SMP
if (smp_num_siblings > 1)
release_perfctr_nmi(MSR_P4_IQ_PERFCTR1);
fail1:
#endif
release_perfctr_nmi(MSR_P4_IQ_PERFCTR0);
return 0;
}
static void p4_unreserve(void)
{
#ifdef CONFIG_SMP
if (smp_num_siblings > 1)
release_perfctr_nmi(MSR_P4_IQ_PERFCTR1);
#endif
release_evntsel_nmi(MSR_P4_CRU_ESCR0);
release_perfctr_nmi(MSR_P4_IQ_PERFCTR0);
}
static void __kprobes p4_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
{
unsigned dummy;
/*
* P4 quirks:
* - An overflown perfctr will assert its interrupt
* until the OVF flag in its CCCR is cleared.
* - LVTPC is masked on interrupt and must be
* unmasked by the LVTPC handler.
*/
rdmsrl(wd->cccr_msr, dummy);
dummy &= ~P4_CCCR_OVF;
wrmsrl(wd->cccr_msr, dummy);
apic_write(APIC_LVTPC, APIC_DM_NMI);
/* start the cycle over again */
write_watchdog_counter(wd->perfctr_msr, NULL, nmi_hz);
}
static const struct wd_ops p4_wd_ops = {
.reserve = p4_reserve,
.unreserve = p4_unreserve,
.setup = setup_p4_watchdog,
.rearm = p4_rearm,
.stop = stop_p4_watchdog,
/* RED-PEN this is wrong for the other sibling */
.perfctr = MSR_P4_BPU_PERFCTR0,
.evntsel = MSR_P4_BSU_ESCR0,
.checkbit = 1ULL << 39,
};
/*
* Watchdog using the Intel architected PerfMon.
* Used for Core2 and hopefully all future Intel CPUs.
*/
#define ARCH_PERFMON_NMI_EVENT_SEL ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL
#define ARCH_PERFMON_NMI_EVENT_UMASK ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK
static struct wd_ops intel_arch_wd_ops;
static int setup_intel_arch_watchdog(unsigned nmi_hz)
{
unsigned int ebx;
union cpuid10_eax eax;
unsigned int unused;
unsigned int perfctr_msr, evntsel_msr;
unsigned int evntsel;
struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
/*
* Check whether the Architectural PerfMon supports
* Unhalted Core Cycles Event or not.
* NOTE: Corresponding bit = 0 in ebx indicates event present.
*/
cpuid(10, &(eax.full), &ebx, &unused, &unused);
if ((eax.split.mask_length <
(ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
(ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
return 0;
perfctr_msr = wd_ops->perfctr;
evntsel_msr = wd_ops->evntsel;
wrmsrl(perfctr_msr, 0UL);
evntsel = ARCH_PERFMON_EVENTSEL_INT
| ARCH_PERFMON_EVENTSEL_OS
| ARCH_PERFMON_EVENTSEL_USR
| ARCH_PERFMON_NMI_EVENT_SEL
| ARCH_PERFMON_NMI_EVENT_UMASK;
/* setup the timer */
wrmsr(evntsel_msr, evntsel, 0);
nmi_hz = adjust_for_32bit_ctr(nmi_hz);
write_watchdog_counter32(perfctr_msr, "INTEL_ARCH_PERFCTR0", nmi_hz);
wd->perfctr_msr = perfctr_msr;
wd->evntsel_msr = evntsel_msr;
wd->cccr_msr = 0; /* unused */
/* ok, everything is initialized, announce that we're set */
cpu_nmi_set_wd_enabled();
apic_write(APIC_LVTPC, APIC_DM_NMI);
evntsel |= ARCH_PERFMON_EVENTSEL_ENABLE;
wrmsr(evntsel_msr, evntsel, 0);
intel_arch_wd_ops.checkbit = 1ULL << (eax.split.bit_width - 1);
return 1;
}
static struct wd_ops intel_arch_wd_ops __read_mostly = {
.reserve = single_msr_reserve,
.unreserve = single_msr_unreserve,
.setup = setup_intel_arch_watchdog,
.rearm = p6_rearm,
.stop = single_msr_stop_watchdog,
.perfctr = MSR_ARCH_PERFMON_PERFCTR1,
.evntsel = MSR_ARCH_PERFMON_EVENTSEL1,
};
static void probe_nmi_watchdog(void)
{
switch (boot_cpu_data.x86_vendor) {
case X86_VENDOR_AMD:
if (boot_cpu_data.x86 == 6 ||
(boot_cpu_data.x86 >= 0xf && boot_cpu_data.x86 <= 0x15))
wd_ops = &k7_wd_ops;
return;
case X86_VENDOR_INTEL:
/* Work around where perfctr1 doesn't have a working enable
* bit as described in the following errata:
* AE49 Core Duo and Intel Core Solo 65 nm
* AN49 Intel Pentium Dual-Core
* AF49 Dual-Core Intel Xeon Processor LV
*/
if ((boot_cpu_data.x86 == 6 && boot_cpu_data.x86_model == 14) ||
((boot_cpu_data.x86 == 6 && boot_cpu_data.x86_model == 15 &&
boot_cpu_data.x86_mask == 4))) {
intel_arch_wd_ops.perfctr = MSR_ARCH_PERFMON_PERFCTR0;
intel_arch_wd_ops.evntsel = MSR_ARCH_PERFMON_EVENTSEL0;
}
if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
wd_ops = &intel_arch_wd_ops;
break;
}
switch (boot_cpu_data.x86) {
case 6:
if (boot_cpu_data.x86_model > 13)
return;
wd_ops = &p6_wd_ops;
break;
case 15:
wd_ops = &p4_wd_ops;
break;
default:
return;
}
break;
}
}
/* Interface to nmi.c */
int lapic_watchdog_init(unsigned nmi_hz)
{
if (!wd_ops) {
probe_nmi_watchdog();
if (!wd_ops) {
printk(KERN_INFO "NMI watchdog: CPU not supported\n");
return -1;
}
if (!wd_ops->reserve()) {
printk(KERN_ERR
"NMI watchdog: cannot reserve perfctrs\n");
return -1;
}
}
if (!(wd_ops->setup(nmi_hz))) {
printk(KERN_ERR "Cannot setup NMI watchdog on CPU %d\n",
raw_smp_processor_id());
return -1;
}
return 0;
}
void lapic_watchdog_stop(void)
{
if (wd_ops)
wd_ops->stop();
}
unsigned lapic_adjust_nmi_hz(unsigned hz)
{
struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
if (wd->perfctr_msr == MSR_P6_PERFCTR0 ||
wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR1)
hz = adjust_for_32bit_ctr(hz);
return hz;
}
int __kprobes lapic_wd_event(unsigned nmi_hz)
{
struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
u64 ctr;
rdmsrl(wd->perfctr_msr, ctr);
if (ctr & wd_ops->checkbit) /* perfctr still running? */
return 0;
wd_ops->rearm(wd, nmi_hz);
return 1;
}

View File

@ -175,21 +175,21 @@ static const struct stacktrace_ops print_trace_ops = {
void
show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
unsigned long *stack, unsigned long bp, char *log_lvl)
unsigned long *stack, char *log_lvl)
{
printk("%sCall Trace:\n", log_lvl);
dump_trace(task, regs, stack, bp, &print_trace_ops, log_lvl);
dump_trace(task, regs, stack, &print_trace_ops, log_lvl);
}
void show_trace(struct task_struct *task, struct pt_regs *regs,
unsigned long *stack, unsigned long bp)
unsigned long *stack)
{
show_trace_log_lvl(task, regs, stack, bp, "");
show_trace_log_lvl(task, regs, stack, "");
}
void show_stack(struct task_struct *task, unsigned long *sp)
{
show_stack_log_lvl(task, NULL, sp, 0, "");
show_stack_log_lvl(task, NULL, sp, "");
}
/*
@ -210,7 +210,7 @@ void dump_stack(void)
init_utsname()->release,
(int)strcspn(init_utsname()->version, " "),
init_utsname()->version);
show_trace(NULL, NULL, &stack, bp);
show_trace(NULL, NULL, &stack);
}
EXPORT_SYMBOL(dump_stack);

View File

@ -17,11 +17,12 @@
#include <asm/stacktrace.h>
void dump_trace(struct task_struct *task, struct pt_regs *regs,
unsigned long *stack, unsigned long bp,
void dump_trace(struct task_struct *task,
struct pt_regs *regs, unsigned long *stack,
const struct stacktrace_ops *ops, void *data)
{
int graph = 0;
unsigned long bp;
if (!task)
task = current;
@ -34,18 +35,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
stack = (unsigned long *)task->thread.sp;
}
#ifdef CONFIG_FRAME_POINTER
if (!bp) {
if (task == current) {
/* Grab bp right from our regs */
get_bp(bp);
} else {
/* bp is the last reg pushed by switch_to */
bp = *(unsigned long *) task->thread.sp;
}
}
#endif
bp = stack_frame(task, regs);
for (;;) {
struct thread_info *context;
@ -65,7 +55,7 @@ EXPORT_SYMBOL(dump_trace);
void
show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
unsigned long *sp, unsigned long bp, char *log_lvl)
unsigned long *sp, char *log_lvl)
{
unsigned long *stack;
int i;
@ -87,7 +77,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
touch_nmi_watchdog();
}
printk(KERN_CONT "\n");
show_trace_log_lvl(task, regs, sp, bp, log_lvl);
show_trace_log_lvl(task, regs, sp, log_lvl);
}
@ -112,8 +102,7 @@ void show_registers(struct pt_regs *regs)
u8 *ip;
printk(KERN_EMERG "Stack:\n");
show_stack_log_lvl(NULL, regs, &regs->sp,
0, KERN_EMERG);
show_stack_log_lvl(NULL, regs, &regs->sp, KERN_EMERG);
printk(KERN_EMERG "Code: ");

View File

@ -139,8 +139,8 @@ fixup_bp_irq_link(unsigned long bp, unsigned long *stack,
* severe exception (double fault, nmi, stack fault, debug, mce) hardware stack
*/
void dump_trace(struct task_struct *task, struct pt_regs *regs,
unsigned long *stack, unsigned long bp,
void dump_trace(struct task_struct *task,
struct pt_regs *regs, unsigned long *stack,
const struct stacktrace_ops *ops, void *data)
{
const unsigned cpu = get_cpu();
@ -149,6 +149,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
unsigned used = 0;
struct thread_info *tinfo;
int graph = 0;
unsigned long bp;
if (!task)
task = current;
@ -160,18 +161,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
stack = (unsigned long *)task->thread.sp;
}
#ifdef CONFIG_FRAME_POINTER
if (!bp) {
if (task == current) {
/* Grab bp right from our regs */
get_bp(bp);
} else {
/* bp is the last reg pushed by switch_to */
bp = *(unsigned long *) task->thread.sp;
}
}
#endif
bp = stack_frame(task, regs);
/*
* Print function call entries in all stacks, starting at the
* current stack address. If the stacks consist of nested
@ -235,7 +225,7 @@ EXPORT_SYMBOL(dump_trace);
void
show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
unsigned long *sp, unsigned long bp, char *log_lvl)
unsigned long *sp, char *log_lvl)
{
unsigned long *irq_stack_end;
unsigned long *irq_stack;
@ -279,7 +269,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
preempt_enable();
printk(KERN_CONT "\n");
show_trace_log_lvl(task, regs, sp, bp, log_lvl);
show_trace_log_lvl(task, regs, sp, log_lvl);
}
void show_registers(struct pt_regs *regs)
@ -308,7 +298,7 @@ void show_registers(struct pt_regs *regs)
printk(KERN_EMERG "Stack:\n");
show_stack_log_lvl(NULL, regs, (unsigned long *)sp,
regs->bp, KERN_EMERG);
KERN_EMERG);
printk(KERN_EMERG "Code: ");

View File

@ -1184,6 +1184,10 @@ static void __kprobes optimized_callback(struct optimized_kprobe *op,
{
struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
/* This is possible if op is under delayed unoptimizing */
if (kprobe_disabled(&op->kp))
return;
preempt_disable();
if (kprobe_running()) {
kprobes_inc_nmissed_count(&op->kp);
@ -1401,10 +1405,16 @@ int __kprobes arch_prepare_optimized_kprobe(struct optimized_kprobe *op)
return 0;
}
/* Replace a breakpoint (int3) with a relative jump. */
int __kprobes arch_optimize_kprobe(struct optimized_kprobe *op)
#define MAX_OPTIMIZE_PROBES 256
static struct text_poke_param *jump_poke_params;
static struct jump_poke_buffer {
u8 buf[RELATIVEJUMP_SIZE];
} *jump_poke_bufs;
static void __kprobes setup_optimize_kprobe(struct text_poke_param *tprm,
u8 *insn_buf,
struct optimized_kprobe *op)
{
unsigned char jmp_code[RELATIVEJUMP_SIZE];
s32 rel = (s32)((long)op->optinsn.insn -
((long)op->kp.addr + RELATIVEJUMP_SIZE));
@ -1412,16 +1422,79 @@ int __kprobes arch_optimize_kprobe(struct optimized_kprobe *op)
memcpy(op->optinsn.copied_insn, op->kp.addr + INT3_SIZE,
RELATIVE_ADDR_SIZE);
jmp_code[0] = RELATIVEJUMP_OPCODE;
*(s32 *)(&jmp_code[1]) = rel;
insn_buf[0] = RELATIVEJUMP_OPCODE;
*(s32 *)(&insn_buf[1]) = rel;
tprm->addr = op->kp.addr;
tprm->opcode = insn_buf;
tprm->len = RELATIVEJUMP_SIZE;
}
/*
* Replace breakpoints (int3) with relative jumps.
* Caller must call with locking kprobe_mutex and text_mutex.
*/
void __kprobes arch_optimize_kprobes(struct list_head *oplist)
{
struct optimized_kprobe *op, *tmp;
int c = 0;
list_for_each_entry_safe(op, tmp, oplist, list) {
WARN_ON(kprobe_disabled(&op->kp));
/* Setup param */
setup_optimize_kprobe(&jump_poke_params[c],
jump_poke_bufs[c].buf, op);
list_del_init(&op->list);
if (++c >= MAX_OPTIMIZE_PROBES)
break;
}
/*
* text_poke_smp doesn't support NMI/MCE code modifying.
* However, since kprobes itself also doesn't support NMI/MCE
* code probing, it's not a problem.
*/
text_poke_smp(op->kp.addr, jmp_code, RELATIVEJUMP_SIZE);
return 0;
text_poke_smp_batch(jump_poke_params, c);
}
static void __kprobes setup_unoptimize_kprobe(struct text_poke_param *tprm,
u8 *insn_buf,
struct optimized_kprobe *op)
{
/* Set int3 to first byte for kprobes */
insn_buf[0] = BREAKPOINT_INSTRUCTION;
memcpy(insn_buf + 1, op->optinsn.copied_insn, RELATIVE_ADDR_SIZE);
tprm->addr = op->kp.addr;
tprm->opcode = insn_buf;
tprm->len = RELATIVEJUMP_SIZE;
}
/*
* Recover original instructions and breakpoints from relative jumps.
* Caller must call with locking kprobe_mutex.
*/
extern void arch_unoptimize_kprobes(struct list_head *oplist,
struct list_head *done_list)
{
struct optimized_kprobe *op, *tmp;
int c = 0;
list_for_each_entry_safe(op, tmp, oplist, list) {
/* Setup param */
setup_unoptimize_kprobe(&jump_poke_params[c],
jump_poke_bufs[c].buf, op);
list_move(&op->list, done_list);
if (++c >= MAX_OPTIMIZE_PROBES)
break;
}
/*
* text_poke_smp doesn't support NMI/MCE code modifying.
* However, since kprobes itself also doesn't support NMI/MCE
* code probing, it's not a problem.
*/
text_poke_smp_batch(jump_poke_params, c);
}
/* Replace a relative jump with a breakpoint (int3). */
@ -1453,11 +1526,35 @@ static int __kprobes setup_detour_execution(struct kprobe *p,
}
return 0;
}
static int __kprobes init_poke_params(void)
{
/* Allocate code buffer and parameter array */
jump_poke_bufs = kmalloc(sizeof(struct jump_poke_buffer) *
MAX_OPTIMIZE_PROBES, GFP_KERNEL);
if (!jump_poke_bufs)
return -ENOMEM;
jump_poke_params = kmalloc(sizeof(struct text_poke_param) *
MAX_OPTIMIZE_PROBES, GFP_KERNEL);
if (!jump_poke_params) {
kfree(jump_poke_bufs);
jump_poke_bufs = NULL;
return -ENOMEM;
}
return 0;
}
#else /* !CONFIG_OPTPROBES */
static int __kprobes init_poke_params(void)
{
return 0;
}
#endif
int __init arch_init_kprobes(void)
{
return 0;
return init_poke_params();
}
int __kprobes arch_trampoline_kprobe(struct kprobe *p)

View File

@ -91,8 +91,7 @@ void exit_thread(void)
void show_regs(struct pt_regs *regs)
{
show_registers(regs);
show_trace(NULL, regs, (unsigned long *)kernel_stack_pointer(regs),
regs->bp);
show_trace(NULL, regs, (unsigned long *)kernel_stack_pointer(regs));
}
void show_regs_common(void)
@ -374,6 +373,7 @@ void default_idle(void)
{
if (hlt_use_halt()) {
trace_power_start(POWER_CSTATE, 1, smp_processor_id());
trace_cpu_idle(1, smp_processor_id());
current_thread_info()->status &= ~TS_POLLING;
/*
* TS_POLLING-cleared state must be visible before we
@ -444,6 +444,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
{
trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
trace_cpu_idle((ax>>4)+1, smp_processor_id());
if (!need_resched()) {
if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
clflush((void *)&current_thread_info()->flags);
@ -460,6 +461,7 @@ static void mwait_idle(void)
{
if (!need_resched()) {
trace_power_start(POWER_CSTATE, 1, smp_processor_id());
trace_cpu_idle(1, smp_processor_id());
if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
clflush((void *)&current_thread_info()->flags);
@ -481,10 +483,12 @@ static void mwait_idle(void)
static void poll_idle(void)
{
trace_power_start(POWER_CSTATE, 0, smp_processor_id());
trace_cpu_idle(0, smp_processor_id());
local_irq_enable();
while (!need_resched())
cpu_relax();
trace_power_end(0);
trace_power_end(smp_processor_id());
trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
}
/*

View File

@ -113,8 +113,8 @@ void cpu_idle(void)
stop_critical_timings();
pm_idle();
start_critical_timings();
trace_power_end(smp_processor_id());
trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
}
tick_nohz_restart_sched_tick();
preempt_enable_no_resched();

View File

@ -142,6 +142,8 @@ void cpu_idle(void)
start_critical_timings();
trace_power_end(smp_processor_id());
trace_cpu_idle(PWR_EVENT_EXIT,
smp_processor_id());
/* In many cases the interrupt that ended idle
has already called exit_idle. But some idle

View File

@ -281,6 +281,13 @@ static void __cpuinit smp_callin(void)
*/
smp_store_cpu_info(cpuid);
/*
* This must be done before setting cpu_online_mask
* or calling notify_cpu_starting.
*/
set_cpu_sibling_map(raw_smp_processor_id());
wmb();
notify_cpu_starting(cpuid);
/*
@ -316,16 +323,6 @@ notrace static void __cpuinit start_secondary(void *unused)
*/
check_tsc_sync_target();
if (nmi_watchdog == NMI_IO_APIC) {
legacy_pic->mask(0);
enable_NMI_through_LVT0();
legacy_pic->unmask(0);
}
/* This must be done before setting cpu_online_mask */
set_cpu_sibling_map(raw_smp_processor_id());
wmb();
/*
* We need to hold call_lock, so there is no inconsistency
* between the time smp_call_function() determines number of
@ -1061,8 +1058,6 @@ static int __init smp_sanity_check(unsigned max_cpus)
printk(KERN_INFO "SMP mode deactivated.\n");
smpboot_clear_io_apic();
localise_nmi_watchdog();
connect_bsp_APIC();
setup_local_APIC();
end_local_APIC_setup();
@ -1196,7 +1191,6 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
#ifdef CONFIG_X86_IO_APIC
setup_ioapic_dest();
#endif
check_nmi_watchdog();
mtrr_aps_init();
}
@ -1341,8 +1335,6 @@ int native_cpu_disable(void)
if (cpu == 0)
return -EBUSY;
if (nmi_watchdog == NMI_LOCAL_APIC)
stop_apic_nmi_watchdog(NULL);
clear_local_APIC();
cpu_disable_common();

View File

@ -73,22 +73,22 @@ static const struct stacktrace_ops save_stack_ops_nosched = {
*/
void save_stack_trace(struct stack_trace *trace)
{
dump_trace(current, NULL, NULL, 0, &save_stack_ops, trace);
dump_trace(current, NULL, NULL, &save_stack_ops, trace);
if (trace->nr_entries < trace->max_entries)
trace->entries[trace->nr_entries++] = ULONG_MAX;
}
EXPORT_SYMBOL_GPL(save_stack_trace);
void save_stack_trace_bp(struct stack_trace *trace, unsigned long bp)
void save_stack_trace_regs(struct stack_trace *trace, struct pt_regs *regs)
{
dump_trace(current, NULL, NULL, bp, &save_stack_ops, trace);
dump_trace(current, regs, NULL, &save_stack_ops, trace);
if (trace->nr_entries < trace->max_entries)
trace->entries[trace->nr_entries++] = ULONG_MAX;
}
void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
{
dump_trace(tsk, NULL, NULL, 0, &save_stack_ops_nosched, trace);
dump_trace(tsk, NULL, NULL, &save_stack_ops_nosched, trace);
if (trace->nr_entries < trace->max_entries)
trace->entries[trace->nr_entries++] = ULONG_MAX;
}

View File

@ -22,10 +22,6 @@
#include <asm/hpet.h>
#include <asm/time.h>
#if defined(CONFIG_X86_32) && defined(CONFIG_X86_IO_APIC)
int timer_ack;
#endif
#ifdef CONFIG_X86_64
volatile unsigned long __jiffies __section_jiffies = INITIAL_JIFFIES;
#endif
@ -63,20 +59,6 @@ static irqreturn_t timer_interrupt(int irq, void *dev_id)
/* Keep nmi watchdog up to date */
inc_irq_stat(irq0_irqs);
/* Optimized out for !IO_APIC and x86_64 */
if (timer_ack) {
/*
* Subtle, when I/O APICs are used we have to ack timer IRQ
* manually to deassert NMI lines for the watchdog if run
* on an 82489DX-based system.
*/
raw_spin_lock(&i8259A_lock);
outb(0x0c, PIC_MASTER_OCW3);
/* Ack the IRQ; AEOI will end it automatically. */
inb(PIC_MASTER_POLL);
raw_spin_unlock(&i8259A_lock);
}
global_clock_event->event_handler(global_clock_event);
/* MCA bus quirk: Acknowledge irq0 by setting bit 7 in port 0x61 */

View File

@ -83,6 +83,8 @@ EXPORT_SYMBOL_GPL(used_vectors);
static int ignore_nmis;
int unknown_nmi_panic;
static inline void conditional_sti(struct pt_regs *regs)
{
if (regs->flags & X86_EFLAGS_IF)
@ -300,6 +302,13 @@ gp_in_kernel:
die("general protection fault", regs, error_code);
}
static int __init setup_unknown_nmi_panic(char *str)
{
unknown_nmi_panic = 1;
return 1;
}
__setup("unknown_nmi_panic", setup_unknown_nmi_panic);
static notrace __kprobes void
mem_parity_error(unsigned char reason, struct pt_regs *regs)
{
@ -342,9 +351,11 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
reason = (reason & 0xf) | 8;
outb(reason, 0x61);
i = 2000;
while (--i)
udelay(1000);
i = 20000;
while (--i) {
touch_nmi_watchdog();
udelay(100);
}
reason &= ~8;
outb(reason, 0x61);
@ -371,7 +382,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
reason, smp_processor_id());
printk(KERN_EMERG "Do you have a strange power saving mode enabled?\n");
if (panic_on_unrecovered_nmi)
if (unknown_nmi_panic || panic_on_unrecovered_nmi)
panic("NMI: Not continuing");
printk(KERN_EMERG "Dazed and confused, but trying to continue\n");
@ -397,20 +408,8 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
== NOTIFY_STOP)
return;
#ifndef CONFIG_LOCKUP_DETECTOR
/*
* Ok, so this is none of the documented NMI sources,
* so it must be the NMI watchdog.
*/
if (nmi_watchdog_tick(regs, reason))
return;
if (!do_nmi_callback(regs, cpu))
#endif /* !CONFIG_LOCKUP_DETECTOR */
unknown_nmi_error(reason, regs);
#else
unknown_nmi_error(reason, regs);
#endif
unknown_nmi_error(reason, regs);
return;
}
@ -446,14 +445,12 @@ do_nmi(struct pt_regs *regs, long error_code)
void stop_nmi(void)
{
acpi_nmi_disable();
ignore_nmis++;
}
void restart_nmi(void)
{
ignore_nmis--;
acpi_nmi_enable();
}
/* May run on IST stack. */

View File

@ -185,7 +185,7 @@ void kmemcheck_error_save(enum kmemcheck_shadow state,
e->trace.entries = e->trace_entries;
e->trace.max_entries = ARRAY_SIZE(e->trace_entries);
e->trace.skip = 0;
save_stack_trace_bp(&e->trace, regs->bp);
save_stack_trace_regs(&e->trace, regs);
/* Round address down to nearest 16 bytes */
shadow_copy = kmemcheck_shadow_lookup(address

View File

@ -126,7 +126,7 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
if (!user_mode_vm(regs)) {
unsigned long stack = kernel_stack_pointer(regs);
if (depth)
dump_trace(NULL, regs, (unsigned long *)stack, 0,
dump_trace(NULL, regs, (unsigned long *)stack,
&backtrace_ops, &depth);
return;
}

View File

@ -732,6 +732,9 @@ int __init op_nmi_init(struct oprofile_operations *ops)
case 0x14:
cpu_type = "x86-64/family14h";
break;
case 0x15:
cpu_type = "x86-64/family15h";
break;
default:
return -ENODEV;
}

View File

@ -58,9 +58,6 @@ static void timer_stop(void)
int __init op_nmi_timer_init(struct oprofile_operations *ops)
{
if ((nmi_watchdog != NMI_IO_APIC) || (atomic_read(&nmi_active) <= 0))
return -ENODEV;
ops->start = timer_start;
ops->stop = timer_stop;
ops->cpu_type = "timer";

View File

@ -29,11 +29,12 @@
#include "op_x86_model.h"
#include "op_counter.h"
#define NUM_COUNTERS 4
#define NUM_COUNTERS 4
#define NUM_COUNTERS_F15H 6
#ifdef CONFIG_OPROFILE_EVENT_MULTIPLEX
#define NUM_VIRT_COUNTERS 32
#define NUM_VIRT_COUNTERS 32
#else
#define NUM_VIRT_COUNTERS NUM_COUNTERS
#define NUM_VIRT_COUNTERS 0
#endif
#define OP_EVENT_MASK 0x0FFF
@ -41,7 +42,8 @@
#define MSR_AMD_EVENTSEL_RESERVED ((0xFFFFFCF0ULL<<32)|(1ULL<<21))
static unsigned long reset_value[NUM_VIRT_COUNTERS];
static int num_counters;
static unsigned long reset_value[OP_MAX_COUNTER];
#define IBS_FETCH_SIZE 6
#define IBS_OP_SIZE 12
@ -387,7 +389,7 @@ static void op_mux_switch_ctrl(struct op_x86_model_spec const *model,
int i;
/* enable active counters */
for (i = 0; i < NUM_COUNTERS; ++i) {
for (i = 0; i < num_counters; ++i) {
int virt = op_x86_phys_to_virt(i);
if (!reset_value[virt])
continue;
@ -406,7 +408,7 @@ static void op_amd_shutdown(struct op_msrs const * const msrs)
{
int i;
for (i = 0; i < NUM_COUNTERS; ++i) {
for (i = 0; i < num_counters; ++i) {
if (!msrs->counters[i].addr)
continue;
release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
@ -418,7 +420,7 @@ static int op_amd_fill_in_addresses(struct op_msrs * const msrs)
{
int i;
for (i = 0; i < NUM_COUNTERS; i++) {
for (i = 0; i < num_counters; i++) {
if (!reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i))
goto fail;
if (!reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i)) {
@ -426,8 +428,13 @@ static int op_amd_fill_in_addresses(struct op_msrs * const msrs)
goto fail;
}
/* both registers must be reserved */
msrs->counters[i].addr = MSR_K7_PERFCTR0 + i;
msrs->controls[i].addr = MSR_K7_EVNTSEL0 + i;
if (num_counters == NUM_COUNTERS_F15H) {
msrs->counters[i].addr = MSR_F15H_PERF_CTR + (i << 1);
msrs->controls[i].addr = MSR_F15H_PERF_CTL + (i << 1);
} else {
msrs->controls[i].addr = MSR_K7_EVNTSEL0 + i;
msrs->counters[i].addr = MSR_K7_PERFCTR0 + i;
}
continue;
fail:
if (!counter_config[i].enabled)
@ -447,7 +454,7 @@ static void op_amd_setup_ctrs(struct op_x86_model_spec const *model,
int i;
/* setup reset_value */
for (i = 0; i < NUM_VIRT_COUNTERS; ++i) {
for (i = 0; i < OP_MAX_COUNTER; ++i) {
if (counter_config[i].enabled
&& msrs->counters[op_x86_virt_to_phys(i)].addr)
reset_value[i] = counter_config[i].count;
@ -456,7 +463,7 @@ static void op_amd_setup_ctrs(struct op_x86_model_spec const *model,
}
/* clear all counters */
for (i = 0; i < NUM_COUNTERS; ++i) {
for (i = 0; i < num_counters; ++i) {
if (!msrs->controls[i].addr)
continue;
rdmsrl(msrs->controls[i].addr, val);
@ -472,7 +479,7 @@ static void op_amd_setup_ctrs(struct op_x86_model_spec const *model,
}
/* enable active counters */
for (i = 0; i < NUM_COUNTERS; ++i) {
for (i = 0; i < num_counters; ++i) {
int virt = op_x86_phys_to_virt(i);
if (!reset_value[virt])
continue;
@ -503,7 +510,7 @@ static int op_amd_check_ctrs(struct pt_regs * const regs,
u64 val;
int i;
for (i = 0; i < NUM_COUNTERS; ++i) {
for (i = 0; i < num_counters; ++i) {
int virt = op_x86_phys_to_virt(i);
if (!reset_value[virt])
continue;
@ -526,7 +533,7 @@ static void op_amd_start(struct op_msrs const * const msrs)
u64 val;
int i;
for (i = 0; i < NUM_COUNTERS; ++i) {
for (i = 0; i < num_counters; ++i) {
if (!reset_value[op_x86_phys_to_virt(i)])
continue;
rdmsrl(msrs->controls[i].addr, val);
@ -546,7 +553,7 @@ static void op_amd_stop(struct op_msrs const * const msrs)
* Subtle: stop on all counters to avoid race with setting our
* pm callback
*/
for (i = 0; i < NUM_COUNTERS; ++i) {
for (i = 0; i < num_counters; ++i) {
if (!reset_value[op_x86_phys_to_virt(i)])
continue;
rdmsrl(msrs->controls[i].addr, val);
@ -706,18 +713,29 @@ static int setup_ibs_files(struct super_block *sb, struct dentry *root)
return 0;
}
struct op_x86_model_spec op_amd_spec;
static int op_amd_init(struct oprofile_operations *ops)
{
init_ibs();
create_arch_files = ops->create_files;
ops->create_files = setup_ibs_files;
if (boot_cpu_data.x86 == 0x15) {
num_counters = NUM_COUNTERS_F15H;
} else {
num_counters = NUM_COUNTERS;
}
op_amd_spec.num_counters = num_counters;
op_amd_spec.num_controls = num_counters;
op_amd_spec.num_virt_counters = max(num_counters, NUM_VIRT_COUNTERS);
return 0;
}
struct op_x86_model_spec op_amd_spec = {
.num_counters = NUM_COUNTERS,
.num_controls = NUM_COUNTERS,
.num_virt_counters = NUM_VIRT_COUNTERS,
/* num_counters/num_controls filled in at runtime */
.reserved = MSR_AMD_EVENTSEL_RESERVED,
.event_mask = OP_EVENT_MASK,
.init = op_amd_init,

View File

@ -11,7 +11,7 @@
#include <linux/oprofile.h>
#include <linux/smp.h>
#include <linux/ptrace.h>
#include <linux/nmi.h>
#include <asm/nmi.h>
#include <asm/msr.h>
#include <asm/fixmap.h>
#include <asm/apic.h>

View File

@ -577,9 +577,7 @@ acpi_ns_init_one_device(acpi_handle obj_handle,
* as possible (without an NMI being received in the middle of
* this) - so disable NMIs and initialize the device:
*/
acpi_nmi_disable();
status = acpi_ns_evaluate(info);
acpi_nmi_enable();
if (ACPI_SUCCESS(status)) {
walk_info->num_INI++;

View File

@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
(unsigned long)freqs->cpu);
trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
trace_cpu_frequency(freqs->new, freqs->cpu);
srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
CPUFREQ_POSTCHANGE, freqs);
if (likely(policy) && likely(policy->cpu == freqs->cpu))

View File

@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
if (cpuidle_curr_governor->reflect)
cpuidle_curr_governor->reflect(dev);
trace_power_end(smp_processor_id());
trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
}
/**

View File

@ -220,9 +220,8 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
kt_before = ktime_get_real();
stop_critical_timings();
#ifndef MODULE
trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
#endif
trace_cpu_idle((eax >> 4) + 1, cpu);
if (!need_resched()) {
__monitor((void *)&current_thread_info()->flags, 0, 0);

View File

@ -642,19 +642,14 @@ static struct notifier_block die_notifier = {
*/
#ifdef CONFIG_HPWDT_NMI_DECODING
#ifdef ARCH_HAS_NMI_WATCHDOG
#ifdef CONFIG_X86_LOCAL_APIC
static void __devinit hpwdt_check_nmi_decoding(struct pci_dev *dev)
{
/*
* If nmi_watchdog is turned off then we can turn on
* our nmi decoding capability.
*/
if (!nmi_watchdog_active())
hpwdt_nmi_decoding = 1;
else
dev_warn(&dev->dev, "NMI decoding is disabled. To enable this "
"functionality you must reboot with nmi_watchdog=0 "
"and load the hpwdt driver with priority=1.\n");
hpwdt_nmi_decoding = 1;
}
#else
static void __devinit hpwdt_check_nmi_decoding(struct pci_dev *dev)
@ -662,7 +657,7 @@ static void __devinit hpwdt_check_nmi_decoding(struct pci_dev *dev)
dev_warn(&dev->dev, "NMI decoding is disabled. "
"Your kernel does not support a NMI Watchdog.\n");
}
#endif /* ARCH_HAS_NMI_WATCHDOG */
#endif /* CONFIG_X86_LOCAL_APIC */
static int __devinit hpwdt_init_nmi_decoding(struct pci_dev *dev)
{

View File

@ -154,12 +154,14 @@ enum {
TRACE_EVENT_FL_ENABLED_BIT,
TRACE_EVENT_FL_FILTERED_BIT,
TRACE_EVENT_FL_RECORDED_CMD_BIT,
TRACE_EVENT_FL_CAP_ANY_BIT,
};
enum {
TRACE_EVENT_FL_ENABLED = (1 << TRACE_EVENT_FL_ENABLED_BIT),
TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT),
TRACE_EVENT_FL_RECORDED_CMD = (1 << TRACE_EVENT_FL_RECORDED_CMD_BIT),
TRACE_EVENT_FL_CAP_ANY = (1 << TRACE_EVENT_FL_CAP_ANY_BIT),
};
struct ftrace_event_call {
@ -196,6 +198,14 @@ struct ftrace_event_call {
#endif
};
#define __TRACE_EVENT_FLAGS(name, value) \
static int __init trace_init_flags_##name(void) \
{ \
event_##name.flags = value; \
return 0; \
} \
early_initcall(trace_init_flags_##name);
#define PERF_MAX_TRACE_SIZE 2048
#define MAX_FILTER_PRED 32
@ -215,6 +225,10 @@ enum {
FILTER_PTR_STRING,
};
#define EVENT_STORAGE_SIZE 128
extern struct mutex event_storage_mutex;
extern char event_storage[EVENT_STORAGE_SIZE];
extern int trace_event_raw_init(struct ftrace_event_call *call);
extern int trace_define_field(struct ftrace_event_call *call, const char *type,
const char *name, int offset, int size,

View File

@ -275,7 +275,9 @@ extern int arch_prepared_optinsn(struct arch_optimized_insn *optinsn);
extern int arch_check_optimized_kprobe(struct optimized_kprobe *op);
extern int arch_prepare_optimized_kprobe(struct optimized_kprobe *op);
extern void arch_remove_optimized_kprobe(struct optimized_kprobe *op);
extern int arch_optimize_kprobe(struct optimized_kprobe *op);
extern void arch_optimize_kprobes(struct list_head *oplist);
extern void arch_unoptimize_kprobes(struct list_head *oplist,
struct list_head *done_list);
extern void arch_unoptimize_kprobe(struct optimized_kprobe *op);
extern kprobe_opcode_t *get_optinsn_slot(void);
extern void free_optinsn_slot(kprobe_opcode_t *slot, int dirty);

View File

@ -14,22 +14,14 @@
* may be used to reset the timeout - for code which intentionally
* disables interrupts for a long time. This call is stateless.
*/
#ifdef ARCH_HAS_NMI_WATCHDOG
#if defined(ARCH_HAS_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
#include <asm/nmi.h>
extern void touch_nmi_watchdog(void);
extern void acpi_nmi_disable(void);
extern void acpi_nmi_enable(void);
#else
#ifndef CONFIG_HARDLOCKUP_DETECTOR
static inline void touch_nmi_watchdog(void)
{
touch_softlockup_watchdog();
}
#else
extern void touch_nmi_watchdog(void);
#endif
static inline void acpi_nmi_disable(void) { }
static inline void acpi_nmi_enable(void) { }
#endif
/*

View File

@ -215,8 +215,9 @@ struct perf_event_attr {
*/
precise_ip : 2, /* skid constraint */
mmap_data : 1, /* non-exec mmap data */
sample_id_all : 1, /* sample_type all events */
__reserved_1 : 46;
__reserved_1 : 45;
union {
__u32 wakeup_events; /* wakeup every n events */
@ -327,6 +328,15 @@ struct perf_event_header {
enum perf_event_type {
/*
* If perf_event_attr.sample_id_all is set then all event types will
* have the sample_type selected fields related to where/when
* (identity) an event took place (TID, TIME, ID, CPU, STREAM_ID)
* described in PERF_RECORD_SAMPLE below, it will be stashed just after
* the perf_event_header and the fields already present for the existing
* fields, i.e. at the end of the payload. That way a newer perf.data
* file will be supported by older perf tools, with these new optional
* fields being ignored.
*
* The MMAP events record the PROT_EXEC mappings so that we can
* correlate userspace IPs to code. They have the following structure:
*
@ -578,6 +588,10 @@ struct perf_event;
struct pmu {
struct list_head entry;
struct device *dev;
char *name;
int type;
int * __percpu pmu_disable_count;
struct perf_cpu_context * __percpu pmu_cpu_context;
int task_ctx_nr;
@ -758,6 +772,9 @@ struct perf_event {
u64 shadow_ctx_time;
struct perf_event_attr attr;
u16 header_size;
u16 id_header_size;
u16 read_size;
struct hw_perf_event hw;
struct perf_event_context *ctx;
@ -903,7 +920,7 @@ struct perf_output_handle {
#ifdef CONFIG_PERF_EVENTS
extern int perf_pmu_register(struct pmu *pmu);
extern int perf_pmu_register(struct pmu *pmu, char *name, int type);
extern void perf_pmu_unregister(struct pmu *pmu);
extern int perf_num_counters(void);
@ -970,6 +987,11 @@ extern int perf_event_overflow(struct perf_event *event, int nmi,
struct perf_sample_data *data,
struct pt_regs *regs);
static inline bool is_sampling_event(struct perf_event *event)
{
return event->attr.sample_period != 0;
}
/*
* Return 1 for a software event, 0 for a hardware event
*/

View File

@ -316,6 +316,7 @@ extern int proc_dowatchdog_thresh(struct ctl_table *table, int write,
size_t *lenp, loff_t *ppos);
extern unsigned int softlockup_panic;
extern int softlockup_thresh;
void lockup_detector_init(void);
#else
static inline void touch_softlockup_watchdog(void)
{
@ -326,6 +327,9 @@ static inline void touch_softlockup_watchdog_sync(void)
static inline void touch_all_softlockup_watchdogs(void)
{
}
static inline void lockup_detector_init(void)
{
}
#endif
#ifdef CONFIG_DETECT_HUNG_TASK

View File

@ -2,6 +2,7 @@
#define __LINUX_STACKTRACE_H
struct task_struct;
struct pt_regs;
#ifdef CONFIG_STACKTRACE
struct task_struct;
@ -13,7 +14,8 @@ struct stack_trace {
};
extern void save_stack_trace(struct stack_trace *trace);
extern void save_stack_trace_bp(struct stack_trace *trace, unsigned long bp);
extern void save_stack_trace_regs(struct stack_trace *trace,
struct pt_regs *regs);
extern void save_stack_trace_tsk(struct task_struct *tsk,
struct stack_trace *trace);

View File

@ -127,8 +127,6 @@ extern struct trace_event_functions exit_syscall_print_funcs;
#define SYSCALL_TRACE_ENTER_EVENT(sname) \
static struct syscall_metadata \
__attribute__((__aligned__(4))) __syscall_meta_##sname; \
static struct ftrace_event_call \
__attribute__((__aligned__(4))) event_enter_##sname; \
static struct ftrace_event_call __used \
__attribute__((__aligned__(4))) \
__attribute__((section("_ftrace_events"))) \
@ -137,13 +135,12 @@ extern struct trace_event_functions exit_syscall_print_funcs;
.class = &event_class_syscall_enter, \
.event.funcs = &enter_syscall_print_funcs, \
.data = (void *)&__syscall_meta_##sname,\
}
}; \
__TRACE_EVENT_FLAGS(enter_##sname, TRACE_EVENT_FL_CAP_ANY)
#define SYSCALL_TRACE_EXIT_EVENT(sname) \
static struct syscall_metadata \
__attribute__((__aligned__(4))) __syscall_meta_##sname; \
static struct ftrace_event_call \
__attribute__((__aligned__(4))) event_exit_##sname; \
static struct ftrace_event_call __used \
__attribute__((__aligned__(4))) \
__attribute__((section("_ftrace_events"))) \
@ -152,7 +149,8 @@ extern struct trace_event_functions exit_syscall_print_funcs;
.class = &event_class_syscall_exit, \
.event.funcs = &exit_syscall_print_funcs, \
.data = (void *)&__syscall_meta_##sname,\
}
}; \
__TRACE_EVENT_FLAGS(exit_##sname, TRACE_EVENT_FL_CAP_ANY)
#define SYSCALL_METADATA(sname, nb) \
SYSCALL_TRACE_ENTER_EVENT(sname); \

View File

@ -106,6 +106,7 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
#define TP_PROTO(args...) args
#define TP_ARGS(args...) args
#define TP_CONDITION(args...) args
#ifdef CONFIG_TRACEPOINTS
@ -119,12 +120,14 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
* as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
* "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
*/
#define __DO_TRACE(tp, proto, args) \
#define __DO_TRACE(tp, proto, args, cond) \
do { \
struct tracepoint_func *it_func_ptr; \
void *it_func; \
void *__data; \
\
if (!(cond)) \
return; \
rcu_read_lock_sched_notrace(); \
it_func_ptr = rcu_dereference_sched((tp)->funcs); \
if (it_func_ptr) { \
@ -142,7 +145,7 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
* not add unwanted padding between the beginning of the section and the
* structure. Force alignment to the same alignment as the section start.
*/
#define __DECLARE_TRACE(name, proto, args, data_proto, data_args) \
#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
extern struct tracepoint __tracepoint_##name; \
static inline void trace_##name(proto) \
{ \
@ -151,7 +154,8 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
do_trace: \
__DO_TRACE(&__tracepoint_##name, \
TP_PROTO(data_proto), \
TP_ARGS(data_args)); \
TP_ARGS(data_args), \
TP_CONDITION(cond)); \
} \
static inline int \
register_trace_##name(void (*probe)(data_proto), void *data) \
@ -186,7 +190,7 @@ do_trace: \
EXPORT_SYMBOL(__tracepoint_##name)
#else /* !CONFIG_TRACEPOINTS */
#define __DECLARE_TRACE(name, proto, args, data_proto, data_args) \
#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
static inline void trace_##name(proto) \
{ } \
static inline int \
@ -227,13 +231,20 @@ do_trace: \
* "void *__data, proto" as the callback prototype.
*/
#define DECLARE_TRACE_NOARGS(name) \
__DECLARE_TRACE(name, void, , void *__data, __data)
__DECLARE_TRACE(name, void, , 1, void *__data, __data)
#define DECLARE_TRACE(name, proto, args) \
__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), \
__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), 1, \
PARAMS(void *__data, proto), \
PARAMS(__data, args))
#define DECLARE_TRACE_CONDITION(name, proto, args, cond) \
__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), PARAMS(cond), \
PARAMS(void *__data, proto), \
PARAMS(__data, args))
#define TRACE_EVENT_FLAGS(event, flag)
#endif /* DECLARE_TRACE */
#ifndef TRACE_EVENT
@ -347,11 +358,21 @@ do_trace: \
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
#define DEFINE_EVENT_PRINT(template, name, proto, args, print) \
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
#define DEFINE_EVENT_CONDITION(template, name, proto, \
args, cond) \
DECLARE_TRACE_CONDITION(name, PARAMS(proto), \
PARAMS(args), PARAMS(cond))
#define TRACE_EVENT(name, proto, args, struct, assign, print) \
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
#define TRACE_EVENT_FN(name, proto, args, struct, \
assign, print, reg, unreg) \
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
#define TRACE_EVENT_CONDITION(name, proto, args, cond, \
struct, assign, print) \
DECLARE_TRACE_CONDITION(name, PARAMS(proto), \
PARAMS(args), PARAMS(cond))
#define TRACE_EVENT_FLAGS(event, flag)
#endif /* ifdef TRACE_EVENT (see note above) */

View File

@ -26,6 +26,15 @@
#define TRACE_EVENT(name, proto, args, tstruct, assign, print) \
DEFINE_TRACE(name)
#undef TRACE_EVENT_CONDITION
#define TRACE_EVENT_CONDITION(name, proto, args, cond, tstruct, assign, print) \
TRACE_EVENT(name, \
PARAMS(proto), \
PARAMS(args), \
PARAMS(tstruct), \
PARAMS(assign), \
PARAMS(print))
#undef TRACE_EVENT_FN
#define TRACE_EVENT_FN(name, proto, args, tstruct, \
assign, print, reg, unreg) \
@ -39,6 +48,10 @@
#define DEFINE_EVENT_PRINT(template, name, proto, args, print) \
DEFINE_TRACE(name)
#undef DEFINE_EVENT_CONDITION
#define DEFINE_EVENT_CONDITION(template, name, proto, args, cond) \
DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
#undef DECLARE_TRACE
#define DECLARE_TRACE(name, proto, args) \
DEFINE_TRACE(name)
@ -75,9 +88,11 @@
#undef TRACE_EVENT
#undef TRACE_EVENT_FN
#undef TRACE_EVENT_CONDITION
#undef DECLARE_EVENT_CLASS
#undef DEFINE_EVENT
#undef DEFINE_EVENT_PRINT
#undef DEFINE_EVENT_CONDITION
#undef TRACE_HEADER_MULTI_READ
#undef DECLARE_TRACE

View File

@ -7,16 +7,67 @@
#include <linux/ktime.h>
#include <linux/tracepoint.h>
#ifndef _TRACE_POWER_ENUM_
#define _TRACE_POWER_ENUM_
enum {
POWER_NONE = 0,
POWER_CSTATE = 1, /* C-State */
POWER_PSTATE = 2, /* Fequency change or DVFS */
POWER_SSTATE = 3, /* Suspend */
};
DECLARE_EVENT_CLASS(cpu,
TP_PROTO(unsigned int state, unsigned int cpu_id),
TP_ARGS(state, cpu_id),
TP_STRUCT__entry(
__field( u32, state )
__field( u32, cpu_id )
),
TP_fast_assign(
__entry->state = state;
__entry->cpu_id = cpu_id;
),
TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
(unsigned long)__entry->cpu_id)
);
DEFINE_EVENT(cpu, cpu_idle,
TP_PROTO(unsigned int state, unsigned int cpu_id),
TP_ARGS(state, cpu_id)
);
/* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
#define _PWR_EVENT_AVOID_DOUBLE_DEFINING
#define PWR_EVENT_EXIT -1
#endif
DEFINE_EVENT(cpu, cpu_frequency,
TP_PROTO(unsigned int frequency, unsigned int cpu_id),
TP_ARGS(frequency, cpu_id)
);
TRACE_EVENT(machine_suspend,
TP_PROTO(unsigned int state),
TP_ARGS(state),
TP_STRUCT__entry(
__field( u32, state )
),
TP_fast_assign(
__entry->state = state;
),
TP_printk("state=%lu", (unsigned long)__entry->state)
);
/* This code will be removed after deprecation time exceeded (2.6.41) */
#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
/*
* The power events are used for cpuidle & suspend (power_start, power_end)
* and for cpufreq (power_frequency)
@ -75,6 +126,36 @@ TRACE_EVENT(power_end,
);
/* Deprecated dummy functions must be protected against multi-declartion */
#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
#define _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
enum {
POWER_NONE = 0,
POWER_CSTATE = 1,
POWER_PSTATE = 2,
};
#endif /* _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED */
#else /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
#define _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
enum {
POWER_NONE = 0,
POWER_CSTATE = 1,
POWER_PSTATE = 2,
};
/* These dummy declaration have to be ripped out when the deprecated
events get removed */
static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {};
static inline void trace_power_end(u64 cpuid) {};
static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {};
#endif /* _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED */
#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
/*
* The clock events are used for clock enable/disable and for
* clock rate change
@ -153,7 +234,6 @@ DEFINE_EVENT(power_domain, power_domain_target,
TP_ARGS(name, state, cpu_id)
);
#endif /* _TRACE_POWER_H */
/* This part must be outside protection */

View File

@ -40,6 +40,8 @@ TRACE_EVENT_FN(sys_enter,
syscall_regfunc, syscall_unregfunc
);
TRACE_EVENT_FLAGS(sys_enter, TRACE_EVENT_FL_CAP_ANY)
TRACE_EVENT_FN(sys_exit,
TP_PROTO(struct pt_regs *regs, long ret),
@ -62,6 +64,8 @@ TRACE_EVENT_FN(sys_exit,
syscall_regfunc, syscall_unregfunc
);
TRACE_EVENT_FLAGS(sys_exit, TRACE_EVENT_FL_CAP_ANY)
#endif /* CONFIG_HAVE_SYSCALL_TRACEPOINTS */
#endif /* _TRACE_EVENTS_SYSCALLS_H */

View File

@ -82,6 +82,10 @@
TRACE_EVENT(name, PARAMS(proto), PARAMS(args), \
PARAMS(tstruct), PARAMS(assign), PARAMS(print)) \
#undef TRACE_EVENT_FLAGS
#define TRACE_EVENT_FLAGS(name, value) \
__TRACE_EVENT_FLAGS(name, value)
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
@ -129,6 +133,9 @@
#define DEFINE_EVENT_PRINT(template, name, proto, args, print) \
DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
#undef TRACE_EVENT_FLAGS
#define TRACE_EVENT_FLAGS(event, flag)
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
/*
@ -289,13 +296,19 @@ static struct trace_event_functions ftrace_event_type_funcs_##call = { \
#undef __array
#define __array(type, item, len) \
BUILD_BUG_ON(len > MAX_FILTER_STR_VAL); \
ret = trace_define_field(event_call, #type "[" #len "]", #item, \
do { \
mutex_lock(&event_storage_mutex); \
BUILD_BUG_ON(len > MAX_FILTER_STR_VAL); \
snprintf(event_storage, sizeof(event_storage), \
"%s[%d]", #type, len); \
ret = trace_define_field(event_call, event_storage, #item, \
offsetof(typeof(field), item), \
sizeof(field.item), \
is_signed_type(type), FILTER_OTHER); \
if (ret) \
return ret;
mutex_unlock(&event_storage_mutex); \
if (ret) \
return ret; \
} while (0);
#undef __dynamic_array
#define __dynamic_array(type, item, len) \

View File

@ -67,6 +67,7 @@
#include <linux/sfi.h>
#include <linux/shmem_fs.h>
#include <linux/slab.h>
#include <linux/perf_event.h>
#include <asm/io.h>
#include <asm/bugs.h>
@ -603,6 +604,8 @@ asmlinkage void __init start_kernel(void)
"enabled *very* early, fixing it\n");
local_irq_disable();
}
idr_init_cache();
perf_event_init();
rcu_init();
radix_tree_init();
/* init some links before init_ISA_irqs() */
@ -658,7 +661,6 @@ asmlinkage void __init start_kernel(void)
enable_debug_pagealloc();
kmemleak_init();
debug_objects_mem_init();
idr_init_cache();
setup_per_cpu_pageset();
numa_policy_init();
if (late_time_init)
@ -882,6 +884,7 @@ static int __init kernel_init(void * unused)
smp_prepare_cpus(setup_max_cpus);
do_pre_smp_initcalls();
lockup_detector_init();
smp_init();
sched_init_smp();

View File

@ -641,7 +641,7 @@ int __init init_hw_breakpoint(void)
constraints_initialized = 1;
perf_pmu_register(&perf_breakpoint);
perf_pmu_register(&perf_breakpoint, "breakpoint", PERF_TYPE_BREAKPOINT);
return register_die_notifier(&hw_breakpoint_exceptions_nb);

View File

@ -354,13 +354,20 @@ static inline int kprobe_aggrprobe(struct kprobe *p)
return p->pre_handler == aggr_pre_handler;
}
/* Return true(!0) if the kprobe is unused */
static inline int kprobe_unused(struct kprobe *p)
{
return kprobe_aggrprobe(p) && kprobe_disabled(p) &&
list_empty(&p->list);
}
/*
* Keep all fields in the kprobe consistent
*/
static inline void copy_kprobe(struct kprobe *old_p, struct kprobe *p)
static inline void copy_kprobe(struct kprobe *ap, struct kprobe *p)
{
memcpy(&p->opcode, &old_p->opcode, sizeof(kprobe_opcode_t));
memcpy(&p->ainsn, &old_p->ainsn, sizeof(struct arch_specific_insn));
memcpy(&p->opcode, &ap->opcode, sizeof(kprobe_opcode_t));
memcpy(&p->ainsn, &ap->ainsn, sizeof(struct arch_specific_insn));
}
#ifdef CONFIG_OPTPROBES
@ -384,6 +391,17 @@ void __kprobes opt_pre_handler(struct kprobe *p, struct pt_regs *regs)
}
}
/* Free optimized instructions and optimized_kprobe */
static __kprobes void free_aggr_kprobe(struct kprobe *p)
{
struct optimized_kprobe *op;
op = container_of(p, struct optimized_kprobe, kp);
arch_remove_optimized_kprobe(op);
arch_remove_kprobe(p);
kfree(op);
}
/* Return true(!0) if the kprobe is ready for optimization. */
static inline int kprobe_optready(struct kprobe *p)
{
@ -397,6 +415,33 @@ static inline int kprobe_optready(struct kprobe *p)
return 0;
}
/* Return true(!0) if the kprobe is disarmed. Note: p must be on hash list */
static inline int kprobe_disarmed(struct kprobe *p)
{
struct optimized_kprobe *op;
/* If kprobe is not aggr/opt probe, just return kprobe is disabled */
if (!kprobe_aggrprobe(p))
return kprobe_disabled(p);
op = container_of(p, struct optimized_kprobe, kp);
return kprobe_disabled(p) && list_empty(&op->list);
}
/* Return true(!0) if the probe is queued on (un)optimizing lists */
static int __kprobes kprobe_queued(struct kprobe *p)
{
struct optimized_kprobe *op;
if (kprobe_aggrprobe(p)) {
op = container_of(p, struct optimized_kprobe, kp);
if (!list_empty(&op->list))
return 1;
}
return 0;
}
/*
* Return an optimized kprobe whose optimizing code replaces
* instructions including addr (exclude breakpoint).
@ -422,30 +467,23 @@ static struct kprobe *__kprobes get_optimized_kprobe(unsigned long addr)
/* Optimization staging list, protected by kprobe_mutex */
static LIST_HEAD(optimizing_list);
static LIST_HEAD(unoptimizing_list);
static void kprobe_optimizer(struct work_struct *work);
static DECLARE_DELAYED_WORK(optimizing_work, kprobe_optimizer);
static DECLARE_COMPLETION(optimizer_comp);
#define OPTIMIZE_DELAY 5
/* Kprobe jump optimizer */
static __kprobes void kprobe_optimizer(struct work_struct *work)
/*
* Optimize (replace a breakpoint with a jump) kprobes listed on
* optimizing_list.
*/
static __kprobes void do_optimize_kprobes(void)
{
struct optimized_kprobe *op, *tmp;
/* Lock modules while optimizing kprobes */
mutex_lock(&module_mutex);
mutex_lock(&kprobe_mutex);
if (kprobes_all_disarmed || !kprobes_allow_optimization)
goto end;
/*
* Wait for quiesence period to ensure all running interrupts
* are done. Because optprobe may modify multiple instructions
* there is a chance that Nth instruction is interrupted. In that
* case, running interrupt can return to 2nd-Nth byte of jump
* instruction. This wait is for avoiding it.
*/
synchronize_sched();
/* Optimization never be done when disarmed */
if (kprobes_all_disarmed || !kprobes_allow_optimization ||
list_empty(&optimizing_list))
return;
/*
* The optimization/unoptimization refers online_cpus via
@ -459,17 +497,111 @@ static __kprobes void kprobe_optimizer(struct work_struct *work)
*/
get_online_cpus();
mutex_lock(&text_mutex);
list_for_each_entry_safe(op, tmp, &optimizing_list, list) {
WARN_ON(kprobe_disabled(&op->kp));
if (arch_optimize_kprobe(op) < 0)
op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED;
list_del_init(&op->list);
arch_optimize_kprobes(&optimizing_list);
mutex_unlock(&text_mutex);
put_online_cpus();
}
/*
* Unoptimize (replace a jump with a breakpoint and remove the breakpoint
* if need) kprobes listed on unoptimizing_list.
*/
static __kprobes void do_unoptimize_kprobes(struct list_head *free_list)
{
struct optimized_kprobe *op, *tmp;
/* Unoptimization must be done anytime */
if (list_empty(&unoptimizing_list))
return;
/* Ditto to do_optimize_kprobes */
get_online_cpus();
mutex_lock(&text_mutex);
arch_unoptimize_kprobes(&unoptimizing_list, free_list);
/* Loop free_list for disarming */
list_for_each_entry_safe(op, tmp, free_list, list) {
/* Disarm probes if marked disabled */
if (kprobe_disabled(&op->kp))
arch_disarm_kprobe(&op->kp);
if (kprobe_unused(&op->kp)) {
/*
* Remove unused probes from hash list. After waiting
* for synchronization, these probes are reclaimed.
* (reclaiming is done by do_free_cleaned_kprobes.)
*/
hlist_del_rcu(&op->kp.hlist);
} else
list_del_init(&op->list);
}
mutex_unlock(&text_mutex);
put_online_cpus();
end:
}
/* Reclaim all kprobes on the free_list */
static __kprobes void do_free_cleaned_kprobes(struct list_head *free_list)
{
struct optimized_kprobe *op, *tmp;
list_for_each_entry_safe(op, tmp, free_list, list) {
BUG_ON(!kprobe_unused(&op->kp));
list_del_init(&op->list);
free_aggr_kprobe(&op->kp);
}
}
/* Start optimizer after OPTIMIZE_DELAY passed */
static __kprobes void kick_kprobe_optimizer(void)
{
if (!delayed_work_pending(&optimizing_work))
schedule_delayed_work(&optimizing_work, OPTIMIZE_DELAY);
}
/* Kprobe jump optimizer */
static __kprobes void kprobe_optimizer(struct work_struct *work)
{
LIST_HEAD(free_list);
/* Lock modules while optimizing kprobes */
mutex_lock(&module_mutex);
mutex_lock(&kprobe_mutex);
/*
* Step 1: Unoptimize kprobes and collect cleaned (unused and disarmed)
* kprobes before waiting for quiesence period.
*/
do_unoptimize_kprobes(&free_list);
/*
* Step 2: Wait for quiesence period to ensure all running interrupts
* are done. Because optprobe may modify multiple instructions
* there is a chance that Nth instruction is interrupted. In that
* case, running interrupt can return to 2nd-Nth byte of jump
* instruction. This wait is for avoiding it.
*/
synchronize_sched();
/* Step 3: Optimize kprobes after quiesence period */
do_optimize_kprobes();
/* Step 4: Free cleaned kprobes after quiesence period */
do_free_cleaned_kprobes(&free_list);
mutex_unlock(&kprobe_mutex);
mutex_unlock(&module_mutex);
/* Step 5: Kick optimizer again if needed */
if (!list_empty(&optimizing_list) || !list_empty(&unoptimizing_list))
kick_kprobe_optimizer();
else
/* Wake up all waiters */
complete_all(&optimizer_comp);
}
/* Wait for completing optimization and unoptimization */
static __kprobes void wait_for_kprobe_optimizer(void)
{
if (delayed_work_pending(&optimizing_work))
wait_for_completion(&optimizer_comp);
}
/* Optimize kprobe if p is ready to be optimized */
@ -495,28 +627,85 @@ static __kprobes void optimize_kprobe(struct kprobe *p)
/* Check if it is already optimized. */
if (op->kp.flags & KPROBE_FLAG_OPTIMIZED)
return;
op->kp.flags |= KPROBE_FLAG_OPTIMIZED;
list_add(&op->list, &optimizing_list);
if (!delayed_work_pending(&optimizing_work))
schedule_delayed_work(&optimizing_work, OPTIMIZE_DELAY);
if (!list_empty(&op->list))
/* This is under unoptimizing. Just dequeue the probe */
list_del_init(&op->list);
else {
list_add(&op->list, &optimizing_list);
kick_kprobe_optimizer();
}
}
/* Short cut to direct unoptimizing */
static __kprobes void force_unoptimize_kprobe(struct optimized_kprobe *op)
{
get_online_cpus();
arch_unoptimize_kprobe(op);
put_online_cpus();
if (kprobe_disabled(&op->kp))
arch_disarm_kprobe(&op->kp);
}
/* Unoptimize a kprobe if p is optimized */
static __kprobes void unoptimize_kprobe(struct kprobe *p)
static __kprobes void unoptimize_kprobe(struct kprobe *p, bool force)
{
struct optimized_kprobe *op;
if ((p->flags & KPROBE_FLAG_OPTIMIZED) && kprobe_aggrprobe(p)) {
op = container_of(p, struct optimized_kprobe, kp);
if (!list_empty(&op->list))
/* Dequeue from the optimization queue */
if (!kprobe_aggrprobe(p) || kprobe_disarmed(p))
return; /* This is not an optprobe nor optimized */
op = container_of(p, struct optimized_kprobe, kp);
if (!kprobe_optimized(p)) {
/* Unoptimized or unoptimizing case */
if (force && !list_empty(&op->list)) {
/*
* Only if this is unoptimizing kprobe and forced,
* forcibly unoptimize it. (No need to unoptimize
* unoptimized kprobe again :)
*/
list_del_init(&op->list);
else
/* Replace jump with break */
arch_unoptimize_kprobe(op);
op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED;
force_unoptimize_kprobe(op);
}
return;
}
op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED;
if (!list_empty(&op->list)) {
/* Dequeue from the optimization queue */
list_del_init(&op->list);
return;
}
/* Optimized kprobe case */
if (force)
/* Forcibly update the code: this is a special case */
force_unoptimize_kprobe(op);
else {
list_add(&op->list, &unoptimizing_list);
kick_kprobe_optimizer();
}
}
/* Cancel unoptimizing for reusing */
static void reuse_unused_kprobe(struct kprobe *ap)
{
struct optimized_kprobe *op;
BUG_ON(!kprobe_unused(ap));
/*
* Unused kprobe MUST be on the way of delayed unoptimizing (means
* there is still a relative jump) and disabled.
*/
op = container_of(ap, struct optimized_kprobe, kp);
if (unlikely(list_empty(&op->list)))
printk(KERN_WARNING "Warning: found a stray unused "
"aggrprobe@%p\n", ap->addr);
/* Enable the probe again */
ap->flags &= ~KPROBE_FLAG_DISABLED;
/* Optimize it again (remove from op->list) */
BUG_ON(!kprobe_optready(ap));
optimize_kprobe(ap);
}
/* Remove optimized instructions */
@ -525,12 +714,12 @@ static void __kprobes kill_optimized_kprobe(struct kprobe *p)
struct optimized_kprobe *op;
op = container_of(p, struct optimized_kprobe, kp);
if (!list_empty(&op->list)) {
/* Dequeue from the optimization queue */
if (!list_empty(&op->list))
/* Dequeue from the (un)optimization queue */
list_del_init(&op->list);
op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED;
}
/* Don't unoptimize, because the target code will be freed. */
op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED;
/* Don't touch the code, because it is already freed. */
arch_remove_optimized_kprobe(op);
}
@ -543,16 +732,6 @@ static __kprobes void prepare_optimized_kprobe(struct kprobe *p)
arch_prepare_optimized_kprobe(op);
}
/* Free optimized instructions and optimized_kprobe */
static __kprobes void free_aggr_kprobe(struct kprobe *p)
{
struct optimized_kprobe *op;
op = container_of(p, struct optimized_kprobe, kp);
arch_remove_optimized_kprobe(op);
kfree(op);
}
/* Allocate new optimized_kprobe and try to prepare optimized instructions */
static __kprobes struct kprobe *alloc_aggr_kprobe(struct kprobe *p)
{
@ -587,7 +766,8 @@ static __kprobes void try_to_optimize_kprobe(struct kprobe *p)
op = container_of(ap, struct optimized_kprobe, kp);
if (!arch_prepared_optinsn(&op->optinsn)) {
/* If failed to setup optimizing, fallback to kprobe */
free_aggr_kprobe(ap);
arch_remove_optimized_kprobe(op);
kfree(op);
return;
}
@ -631,21 +811,16 @@ static void __kprobes unoptimize_all_kprobes(void)
return;
kprobes_allow_optimization = false;
printk(KERN_INFO "Kprobes globally unoptimized\n");
get_online_cpus(); /* For avoiding text_mutex deadlock */
mutex_lock(&text_mutex);
for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
head = &kprobe_table[i];
hlist_for_each_entry_rcu(p, node, head, hlist) {
if (!kprobe_disabled(p))
unoptimize_kprobe(p);
unoptimize_kprobe(p, false);
}
}
mutex_unlock(&text_mutex);
put_online_cpus();
/* Allow all currently running kprobes to complete */
synchronize_sched();
/* Wait for unoptimizing completion */
wait_for_kprobe_optimizer();
printk(KERN_INFO "Kprobes globally unoptimized\n");
}
int sysctl_kprobes_optimization;
@ -669,44 +844,60 @@ int proc_kprobes_optimization_handler(struct ctl_table *table, int write,
}
#endif /* CONFIG_SYSCTL */
/* Put a breakpoint for a probe. Must be called with text_mutex locked */
static void __kprobes __arm_kprobe(struct kprobe *p)
{
struct kprobe *old_p;
struct kprobe *_p;
/* Check collision with other optimized kprobes */
old_p = get_optimized_kprobe((unsigned long)p->addr);
if (unlikely(old_p))
unoptimize_kprobe(old_p); /* Fallback to unoptimized kprobe */
_p = get_optimized_kprobe((unsigned long)p->addr);
if (unlikely(_p))
/* Fallback to unoptimized kprobe */
unoptimize_kprobe(_p, true);
arch_arm_kprobe(p);
optimize_kprobe(p); /* Try to optimize (add kprobe to a list) */
}
static void __kprobes __disarm_kprobe(struct kprobe *p)
/* Remove the breakpoint of a probe. Must be called with text_mutex locked */
static void __kprobes __disarm_kprobe(struct kprobe *p, bool reopt)
{
struct kprobe *old_p;
struct kprobe *_p;
unoptimize_kprobe(p); /* Try to unoptimize */
arch_disarm_kprobe(p);
unoptimize_kprobe(p, false); /* Try to unoptimize */
/* If another kprobe was blocked, optimize it. */
old_p = get_optimized_kprobe((unsigned long)p->addr);
if (unlikely(old_p))
optimize_kprobe(old_p);
if (!kprobe_queued(p)) {
arch_disarm_kprobe(p);
/* If another kprobe was blocked, optimize it. */
_p = get_optimized_kprobe((unsigned long)p->addr);
if (unlikely(_p) && reopt)
optimize_kprobe(_p);
}
/* TODO: reoptimize others after unoptimized this probe */
}
#else /* !CONFIG_OPTPROBES */
#define optimize_kprobe(p) do {} while (0)
#define unoptimize_kprobe(p) do {} while (0)
#define unoptimize_kprobe(p, f) do {} while (0)
#define kill_optimized_kprobe(p) do {} while (0)
#define prepare_optimized_kprobe(p) do {} while (0)
#define try_to_optimize_kprobe(p) do {} while (0)
#define __arm_kprobe(p) arch_arm_kprobe(p)
#define __disarm_kprobe(p) arch_disarm_kprobe(p)
#define __disarm_kprobe(p, o) arch_disarm_kprobe(p)
#define kprobe_disarmed(p) kprobe_disabled(p)
#define wait_for_kprobe_optimizer() do {} while (0)
/* There should be no unused kprobes can be reused without optimization */
static void reuse_unused_kprobe(struct kprobe *ap)
{
printk(KERN_ERR "Error: There should be no unused kprobe here.\n");
BUG_ON(kprobe_unused(ap));
}
static __kprobes void free_aggr_kprobe(struct kprobe *p)
{
arch_remove_kprobe(p);
kfree(p);
}
@ -732,11 +923,10 @@ static void __kprobes arm_kprobe(struct kprobe *kp)
/* Disarm a kprobe with text_mutex */
static void __kprobes disarm_kprobe(struct kprobe *kp)
{
get_online_cpus(); /* For avoiding text_mutex deadlock */
/* Ditto */
mutex_lock(&text_mutex);
__disarm_kprobe(kp);
__disarm_kprobe(kp, true);
mutex_unlock(&text_mutex);
put_online_cpus();
}
/*
@ -942,7 +1132,7 @@ static int __kprobes add_new_kprobe(struct kprobe *ap, struct kprobe *p)
BUG_ON(kprobe_gone(ap) || kprobe_gone(p));
if (p->break_handler || p->post_handler)
unoptimize_kprobe(ap); /* Fall back to normal kprobe */
unoptimize_kprobe(ap, true); /* Fall back to normal kprobe */
if (p->break_handler) {
if (ap->break_handler)
@ -993,19 +1183,21 @@ static void __kprobes init_aggr_kprobe(struct kprobe *ap, struct kprobe *p)
* This is the second or subsequent kprobe at the address - handle
* the intricacies
*/
static int __kprobes register_aggr_kprobe(struct kprobe *old_p,
static int __kprobes register_aggr_kprobe(struct kprobe *orig_p,
struct kprobe *p)
{
int ret = 0;
struct kprobe *ap = old_p;
struct kprobe *ap = orig_p;
if (!kprobe_aggrprobe(old_p)) {
/* If old_p is not an aggr_kprobe, create new aggr_kprobe. */
ap = alloc_aggr_kprobe(old_p);
if (!kprobe_aggrprobe(orig_p)) {
/* If orig_p is not an aggr_kprobe, create new aggr_kprobe. */
ap = alloc_aggr_kprobe(orig_p);
if (!ap)
return -ENOMEM;
init_aggr_kprobe(ap, old_p);
}
init_aggr_kprobe(ap, orig_p);
} else if (kprobe_unused(ap))
/* This probe is going to die. Rescue it */
reuse_unused_kprobe(ap);
if (kprobe_gone(ap)) {
/*
@ -1039,23 +1231,6 @@ static int __kprobes register_aggr_kprobe(struct kprobe *old_p,
return add_new_kprobe(ap, p);
}
/* Try to disable aggr_kprobe, and return 1 if succeeded.*/
static int __kprobes try_to_disable_aggr_kprobe(struct kprobe *p)
{
struct kprobe *kp;
list_for_each_entry_rcu(kp, &p->list, list) {
if (!kprobe_disabled(kp))
/*
* There is an active probe on the list.
* We can't disable aggr_kprobe.
*/
return 0;
}
p->flags |= KPROBE_FLAG_DISABLED;
return 1;
}
static int __kprobes in_kprobes_functions(unsigned long addr)
{
struct kprobe_blackpoint *kb;
@ -1098,34 +1273,33 @@ static kprobe_opcode_t __kprobes *kprobe_addr(struct kprobe *p)
/* Check passed kprobe is valid and return kprobe in kprobe_table. */
static struct kprobe * __kprobes __get_valid_kprobe(struct kprobe *p)
{
struct kprobe *old_p, *list_p;
struct kprobe *ap, *list_p;
old_p = get_kprobe(p->addr);
if (unlikely(!old_p))
ap = get_kprobe(p->addr);
if (unlikely(!ap))
return NULL;
if (p != old_p) {
list_for_each_entry_rcu(list_p, &old_p->list, list)
if (p != ap) {
list_for_each_entry_rcu(list_p, &ap->list, list)
if (list_p == p)
/* kprobe p is a valid probe */
goto valid;
return NULL;
}
valid:
return old_p;
return ap;
}
/* Return error if the kprobe is being re-registered */
static inline int check_kprobe_rereg(struct kprobe *p)
{
int ret = 0;
struct kprobe *old_p;
mutex_lock(&kprobe_mutex);
old_p = __get_valid_kprobe(p);
if (old_p)
if (__get_valid_kprobe(p))
ret = -EINVAL;
mutex_unlock(&kprobe_mutex);
return ret;
}
@ -1229,67 +1403,121 @@ fail_with_jump_label:
}
EXPORT_SYMBOL_GPL(register_kprobe);
/* Check if all probes on the aggrprobe are disabled */
static int __kprobes aggr_kprobe_disabled(struct kprobe *ap)
{
struct kprobe *kp;
list_for_each_entry_rcu(kp, &ap->list, list)
if (!kprobe_disabled(kp))
/*
* There is an active probe on the list.
* We can't disable this ap.
*/
return 0;
return 1;
}
/* Disable one kprobe: Make sure called under kprobe_mutex is locked */
static struct kprobe *__kprobes __disable_kprobe(struct kprobe *p)
{
struct kprobe *orig_p;
/* Get an original kprobe for return */
orig_p = __get_valid_kprobe(p);
if (unlikely(orig_p == NULL))
return NULL;
if (!kprobe_disabled(p)) {
/* Disable probe if it is a child probe */
if (p != orig_p)
p->flags |= KPROBE_FLAG_DISABLED;
/* Try to disarm and disable this/parent probe */
if (p == orig_p || aggr_kprobe_disabled(orig_p)) {
disarm_kprobe(orig_p);
orig_p->flags |= KPROBE_FLAG_DISABLED;
}
}
return orig_p;
}
/*
* Unregister a kprobe without a scheduler synchronization.
*/
static int __kprobes __unregister_kprobe_top(struct kprobe *p)
{
struct kprobe *old_p, *list_p;
struct kprobe *ap, *list_p;
old_p = __get_valid_kprobe(p);
if (old_p == NULL)
/* Disable kprobe. This will disarm it if needed. */
ap = __disable_kprobe(p);
if (ap == NULL)
return -EINVAL;
if (old_p == p ||
(kprobe_aggrprobe(old_p) &&
list_is_singular(&old_p->list))) {
if (ap == p)
/*
* Only probe on the hash list. Disarm only if kprobes are
* enabled and not gone - otherwise, the breakpoint would
* already have been removed. We save on flushing icache.
* This probe is an independent(and non-optimized) kprobe
* (not an aggrprobe). Remove from the hash list.
*/
if (!kprobes_all_disarmed && !kprobe_disabled(old_p))
disarm_kprobe(old_p);
hlist_del_rcu(&old_p->hlist);
} else {
goto disarmed;
/* Following process expects this probe is an aggrprobe */
WARN_ON(!kprobe_aggrprobe(ap));
if (list_is_singular(&ap->list) && kprobe_disarmed(ap))
/*
* !disarmed could be happen if the probe is under delayed
* unoptimizing.
*/
goto disarmed;
else {
/* If disabling probe has special handlers, update aggrprobe */
if (p->break_handler && !kprobe_gone(p))
old_p->break_handler = NULL;
ap->break_handler = NULL;
if (p->post_handler && !kprobe_gone(p)) {
list_for_each_entry_rcu(list_p, &old_p->list, list) {
list_for_each_entry_rcu(list_p, &ap->list, list) {
if ((list_p != p) && (list_p->post_handler))
goto noclean;
}
old_p->post_handler = NULL;
ap->post_handler = NULL;
}
noclean:
/*
* Remove from the aggrprobe: this path will do nothing in
* __unregister_kprobe_bottom().
*/
list_del_rcu(&p->list);
if (!kprobe_disabled(old_p)) {
try_to_disable_aggr_kprobe(old_p);
if (!kprobes_all_disarmed) {
if (kprobe_disabled(old_p))
disarm_kprobe(old_p);
else
/* Try to optimize this probe again */
optimize_kprobe(old_p);
}
}
if (!kprobe_disabled(ap) && !kprobes_all_disarmed)
/*
* Try to optimize this probe again, because post
* handler may have been changed.
*/
optimize_kprobe(ap);
}
return 0;
disarmed:
BUG_ON(!kprobe_disarmed(ap));
hlist_del_rcu(&ap->hlist);
return 0;
}
static void __kprobes __unregister_kprobe_bottom(struct kprobe *p)
{
struct kprobe *old_p;
struct kprobe *ap;
if (list_empty(&p->list))
/* This is an independent kprobe */
arch_remove_kprobe(p);
else if (list_is_singular(&p->list)) {
/* "p" is the last child of an aggr_kprobe */
old_p = list_entry(p->list.next, struct kprobe, list);
/* This is the last child of an aggrprobe */
ap = list_entry(p->list.next, struct kprobe, list);
list_del(&p->list);
arch_remove_kprobe(old_p);
free_aggr_kprobe(old_p);
free_aggr_kprobe(ap);
}
/* Otherwise, do nothing. */
}
int __kprobes register_kprobes(struct kprobe **kps, int num)
@ -1607,29 +1835,13 @@ static void __kprobes kill_kprobe(struct kprobe *p)
int __kprobes disable_kprobe(struct kprobe *kp)
{
int ret = 0;
struct kprobe *p;
mutex_lock(&kprobe_mutex);
/* Check whether specified probe is valid. */
p = __get_valid_kprobe(kp);
if (unlikely(p == NULL)) {
/* Disable this kprobe */
if (__disable_kprobe(kp) == NULL)
ret = -EINVAL;
goto out;
}
/* If the probe is already disabled (or gone), just return */
if (kprobe_disabled(kp))
goto out;
kp->flags |= KPROBE_FLAG_DISABLED;
if (p != kp)
/* When kp != p, p is always enabled. */
try_to_disable_aggr_kprobe(p);
if (!kprobes_all_disarmed && kprobe_disabled(p))
disarm_kprobe(p);
out:
mutex_unlock(&kprobe_mutex);
return ret;
}
@ -1927,36 +2139,27 @@ static void __kprobes disarm_all_kprobes(void)
mutex_lock(&kprobe_mutex);
/* If kprobes are already disarmed, just return */
if (kprobes_all_disarmed)
goto already_disabled;
if (kprobes_all_disarmed) {
mutex_unlock(&kprobe_mutex);
return;
}
kprobes_all_disarmed = true;
printk(KERN_INFO "Kprobes globally disabled\n");
/*
* Here we call get_online_cpus() for avoiding text_mutex deadlock,
* because disarming may also unoptimize kprobes.
*/
get_online_cpus();
mutex_lock(&text_mutex);
for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
head = &kprobe_table[i];
hlist_for_each_entry_rcu(p, node, head, hlist) {
if (!arch_trampoline_kprobe(p) && !kprobe_disabled(p))
__disarm_kprobe(p);
__disarm_kprobe(p, false);
}
}
mutex_unlock(&text_mutex);
put_online_cpus();
mutex_unlock(&kprobe_mutex);
/* Allow all currently running kprobes to complete */
synchronize_sched();
return;
already_disabled:
mutex_unlock(&kprobe_mutex);
return;
/* Wait for disarming all kprobes by optimizer */
wait_for_kprobe_optimizer();
}
/*

View File

@ -13,6 +13,7 @@
#include <linux/mm.h>
#include <linux/cpu.h>
#include <linux/smp.h>
#include <linux/idr.h>
#include <linux/file.h>
#include <linux/poll.h>
#include <linux/slab.h>
@ -21,7 +22,9 @@
#include <linux/dcache.h>
#include <linux/percpu.h>
#include <linux/ptrace.h>
#include <linux/reboot.h>
#include <linux/vmstat.h>
#include <linux/device.h>
#include <linux/vmalloc.h>
#include <linux/hardirq.h>
#include <linux/rculist.h>
@ -133,6 +136,28 @@ static void unclone_ctx(struct perf_event_context *ctx)
}
}
static u32 perf_event_pid(struct perf_event *event, struct task_struct *p)
{
/*
* only top level events have the pid namespace they were created in
*/
if (event->parent)
event = event->parent;
return task_tgid_nr_ns(p, event->ns);
}
static u32 perf_event_tid(struct perf_event *event, struct task_struct *p)
{
/*
* only top level events have the pid namespace they were created in
*/
if (event->parent)
event = event->parent;
return task_pid_nr_ns(p, event->ns);
}
/*
* If we inherit events we want to return the parent event id
* to userspace.
@ -312,9 +337,84 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx)
ctx->nr_stat++;
}
/*
* Called at perf_event creation and when events are attached/detached from a
* group.
*/
static void perf_event__read_size(struct perf_event *event)
{
int entry = sizeof(u64); /* value */
int size = 0;
int nr = 1;
if (event->attr.read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
size += sizeof(u64);
if (event->attr.read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
size += sizeof(u64);
if (event->attr.read_format & PERF_FORMAT_ID)
entry += sizeof(u64);
if (event->attr.read_format & PERF_FORMAT_GROUP) {
nr += event->group_leader->nr_siblings;
size += sizeof(u64);
}
size += entry * nr;
event->read_size = size;
}
static void perf_event__header_size(struct perf_event *event)
{
struct perf_sample_data *data;
u64 sample_type = event->attr.sample_type;
u16 size = 0;
perf_event__read_size(event);
if (sample_type & PERF_SAMPLE_IP)
size += sizeof(data->ip);
if (sample_type & PERF_SAMPLE_ADDR)
size += sizeof(data->addr);
if (sample_type & PERF_SAMPLE_PERIOD)
size += sizeof(data->period);
if (sample_type & PERF_SAMPLE_READ)
size += event->read_size;
event->header_size = size;
}
static void perf_event__id_header_size(struct perf_event *event)
{
struct perf_sample_data *data;
u64 sample_type = event->attr.sample_type;
u16 size = 0;
if (sample_type & PERF_SAMPLE_TID)
size += sizeof(data->tid_entry);
if (sample_type & PERF_SAMPLE_TIME)
size += sizeof(data->time);
if (sample_type & PERF_SAMPLE_ID)
size += sizeof(data->id);
if (sample_type & PERF_SAMPLE_STREAM_ID)
size += sizeof(data->stream_id);
if (sample_type & PERF_SAMPLE_CPU)
size += sizeof(data->cpu_entry);
event->id_header_size = size;
}
static void perf_group_attach(struct perf_event *event)
{
struct perf_event *group_leader = event->group_leader;
struct perf_event *group_leader = event->group_leader, *pos;
/*
* We can have double attach due to group movement in perf_event_open.
@ -333,6 +433,11 @@ static void perf_group_attach(struct perf_event *event)
list_add_tail(&event->group_entry, &group_leader->sibling_list);
group_leader->nr_siblings++;
perf_event__header_size(group_leader);
list_for_each_entry(pos, &group_leader->sibling_list, group_entry)
perf_event__header_size(pos);
}
/*
@ -391,7 +496,7 @@ static void perf_group_detach(struct perf_event *event)
if (event->group_leader != event) {
list_del_init(&event->group_entry);
event->group_leader->nr_siblings--;
return;
goto out;
}
if (!list_empty(&event->group_entry))
@ -410,6 +515,12 @@ static void perf_group_detach(struct perf_event *event)
/* Inherit group flags from the previous leader */
sibling->group_flags = event->group_flags;
}
out:
perf_event__header_size(event->group_leader);
list_for_each_entry(tmp, &event->group_leader->sibling_list, group_entry)
perf_event__header_size(tmp);
}
static inline int
@ -1073,7 +1184,7 @@ static int perf_event_refresh(struct perf_event *event, int refresh)
/*
* not supported on inherited events
*/
if (event->attr.inherit)
if (event->attr.inherit || !is_sampling_event(event))
return -EINVAL;
atomic_add(refresh, &event->event_limit);
@ -2289,31 +2400,6 @@ static int perf_release(struct inode *inode, struct file *file)
return perf_event_release_kernel(event);
}
static int perf_event_read_size(struct perf_event *event)
{
int entry = sizeof(u64); /* value */
int size = 0;
int nr = 1;
if (event->attr.read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
size += sizeof(u64);
if (event->attr.read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
size += sizeof(u64);
if (event->attr.read_format & PERF_FORMAT_ID)
entry += sizeof(u64);
if (event->attr.read_format & PERF_FORMAT_GROUP) {
nr += event->group_leader->nr_siblings;
size += sizeof(u64);
}
size += entry * nr;
return size;
}
u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
{
struct perf_event *child;
@ -2428,7 +2514,7 @@ perf_read_hw(struct perf_event *event, char __user *buf, size_t count)
if (event->state == PERF_EVENT_STATE_ERROR)
return 0;
if (count < perf_event_read_size(event))
if (count < event->read_size)
return -ENOSPC;
WARN_ON_ONCE(event->ctx->parent_ctx);
@ -2514,7 +2600,7 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
int ret = 0;
u64 value;
if (!event->attr.sample_period)
if (!is_sampling_event(event))
return -EINVAL;
if (copy_from_user(&value, arg, sizeof(value)))
@ -3305,6 +3391,73 @@ __always_inline void perf_output_copy(struct perf_output_handle *handle,
} while (len);
}
static void __perf_event_header__init_id(struct perf_event_header *header,
struct perf_sample_data *data,
struct perf_event *event)
{
u64 sample_type = event->attr.sample_type;
data->type = sample_type;
header->size += event->id_header_size;
if (sample_type & PERF_SAMPLE_TID) {
/* namespace issues */
data->tid_entry.pid = perf_event_pid(event, current);
data->tid_entry.tid = perf_event_tid(event, current);
}
if (sample_type & PERF_SAMPLE_TIME)
data->time = perf_clock();
if (sample_type & PERF_SAMPLE_ID)
data->id = primary_event_id(event);
if (sample_type & PERF_SAMPLE_STREAM_ID)
data->stream_id = event->id;
if (sample_type & PERF_SAMPLE_CPU) {
data->cpu_entry.cpu = raw_smp_processor_id();
data->cpu_entry.reserved = 0;
}
}
static void perf_event_header__init_id(struct perf_event_header *header,
struct perf_sample_data *data,
struct perf_event *event)
{
if (event->attr.sample_id_all)
__perf_event_header__init_id(header, data, event);
}
static void __perf_event__output_id_sample(struct perf_output_handle *handle,
struct perf_sample_data *data)
{
u64 sample_type = data->type;
if (sample_type & PERF_SAMPLE_TID)
perf_output_put(handle, data->tid_entry);
if (sample_type & PERF_SAMPLE_TIME)
perf_output_put(handle, data->time);
if (sample_type & PERF_SAMPLE_ID)
perf_output_put(handle, data->id);
if (sample_type & PERF_SAMPLE_STREAM_ID)
perf_output_put(handle, data->stream_id);
if (sample_type & PERF_SAMPLE_CPU)
perf_output_put(handle, data->cpu_entry);
}
static void perf_event__output_id_sample(struct perf_event *event,
struct perf_output_handle *handle,
struct perf_sample_data *sample)
{
if (event->attr.sample_id_all)
__perf_event__output_id_sample(handle, sample);
}
int perf_output_begin(struct perf_output_handle *handle,
struct perf_event *event, unsigned int size,
int nmi, int sample)
@ -3312,6 +3465,7 @@ int perf_output_begin(struct perf_output_handle *handle,
struct perf_buffer *buffer;
unsigned long tail, offset, head;
int have_lost;
struct perf_sample_data sample_data;
struct {
struct perf_event_header header;
u64 id;
@ -3338,8 +3492,12 @@ int perf_output_begin(struct perf_output_handle *handle,
goto out;
have_lost = local_read(&buffer->lost);
if (have_lost)
size += sizeof(lost_event);
if (have_lost) {
lost_event.header.size = sizeof(lost_event);
perf_event_header__init_id(&lost_event.header, &sample_data,
event);
size += lost_event.header.size;
}
perf_output_get_handle(handle);
@ -3370,11 +3528,11 @@ int perf_output_begin(struct perf_output_handle *handle,
if (have_lost) {
lost_event.header.type = PERF_RECORD_LOST;
lost_event.header.misc = 0;
lost_event.header.size = sizeof(lost_event);
lost_event.id = event->id;
lost_event.lost = local_xchg(&buffer->lost, 0);
perf_output_put(handle, lost_event);
perf_event__output_id_sample(event, handle, &sample_data);
}
return 0;
@ -3407,28 +3565,6 @@ void perf_output_end(struct perf_output_handle *handle)
rcu_read_unlock();
}
static u32 perf_event_pid(struct perf_event *event, struct task_struct *p)
{
/*
* only top level events have the pid namespace they were created in
*/
if (event->parent)
event = event->parent;
return task_tgid_nr_ns(p, event->ns);
}
static u32 perf_event_tid(struct perf_event *event, struct task_struct *p)
{
/*
* only top level events have the pid namespace they were created in
*/
if (event->parent)
event = event->parent;
return task_pid_nr_ns(p, event->ns);
}
static void perf_output_read_one(struct perf_output_handle *handle,
struct perf_event *event,
u64 enabled, u64 running)
@ -3603,62 +3739,17 @@ void perf_prepare_sample(struct perf_event_header *header,
{
u64 sample_type = event->attr.sample_type;
data->type = sample_type;
header->type = PERF_RECORD_SAMPLE;
header->size = sizeof(*header);
header->size = sizeof(*header) + event->header_size;
header->misc = 0;
header->misc |= perf_misc_flags(regs);
if (sample_type & PERF_SAMPLE_IP) {
__perf_event_header__init_id(header, data, event);
if (sample_type & PERF_SAMPLE_IP)
data->ip = perf_instruction_pointer(regs);
header->size += sizeof(data->ip);
}
if (sample_type & PERF_SAMPLE_TID) {
/* namespace issues */
data->tid_entry.pid = perf_event_pid(event, current);
data->tid_entry.tid = perf_event_tid(event, current);
header->size += sizeof(data->tid_entry);
}
if (sample_type & PERF_SAMPLE_TIME) {
data->time = perf_clock();
header->size += sizeof(data->time);
}
if (sample_type & PERF_SAMPLE_ADDR)
header->size += sizeof(data->addr);
if (sample_type & PERF_SAMPLE_ID) {
data->id = primary_event_id(event);
header->size += sizeof(data->id);
}
if (sample_type & PERF_SAMPLE_STREAM_ID) {
data->stream_id = event->id;
header->size += sizeof(data->stream_id);
}
if (sample_type & PERF_SAMPLE_CPU) {
data->cpu_entry.cpu = raw_smp_processor_id();
data->cpu_entry.reserved = 0;
header->size += sizeof(data->cpu_entry);
}
if (sample_type & PERF_SAMPLE_PERIOD)
header->size += sizeof(data->period);
if (sample_type & PERF_SAMPLE_READ)
header->size += perf_event_read_size(event);
if (sample_type & PERF_SAMPLE_CALLCHAIN) {
int size = 1;
@ -3722,23 +3813,26 @@ perf_event_read_event(struct perf_event *event,
struct task_struct *task)
{
struct perf_output_handle handle;
struct perf_sample_data sample;
struct perf_read_event read_event = {
.header = {
.type = PERF_RECORD_READ,
.misc = 0,
.size = sizeof(read_event) + perf_event_read_size(event),
.size = sizeof(read_event) + event->read_size,
},
.pid = perf_event_pid(event, task),
.tid = perf_event_tid(event, task),
};
int ret;
perf_event_header__init_id(&read_event.header, &sample, event);
ret = perf_output_begin(&handle, event, read_event.header.size, 0, 0);
if (ret)
return;
perf_output_put(&handle, read_event);
perf_output_read(&handle, event);
perf_event__output_id_sample(event, &handle, &sample);
perf_output_end(&handle);
}
@ -3768,14 +3862,16 @@ static void perf_event_task_output(struct perf_event *event,
struct perf_task_event *task_event)
{
struct perf_output_handle handle;
struct perf_sample_data sample;
struct task_struct *task = task_event->task;
int size, ret;
int ret, size = task_event->event_id.header.size;
size = task_event->event_id.header.size;
ret = perf_output_begin(&handle, event, size, 0, 0);
perf_event_header__init_id(&task_event->event_id.header, &sample, event);
ret = perf_output_begin(&handle, event,
task_event->event_id.header.size, 0, 0);
if (ret)
return;
goto out;
task_event->event_id.pid = perf_event_pid(event, task);
task_event->event_id.ppid = perf_event_pid(event, current);
@ -3785,7 +3881,11 @@ static void perf_event_task_output(struct perf_event *event,
perf_output_put(&handle, task_event->event_id);
perf_event__output_id_sample(event, &handle, &sample);
perf_output_end(&handle);
out:
task_event->event_id.header.size = size;
}
static int perf_event_task_match(struct perf_event *event)
@ -3900,11 +4000,16 @@ static void perf_event_comm_output(struct perf_event *event,
struct perf_comm_event *comm_event)
{
struct perf_output_handle handle;
struct perf_sample_data sample;
int size = comm_event->event_id.header.size;
int ret = perf_output_begin(&handle, event, size, 0, 0);
int ret;
perf_event_header__init_id(&comm_event->event_id.header, &sample, event);
ret = perf_output_begin(&handle, event,
comm_event->event_id.header.size, 0, 0);
if (ret)
return;
goto out;
comm_event->event_id.pid = perf_event_pid(event, comm_event->task);
comm_event->event_id.tid = perf_event_tid(event, comm_event->task);
@ -3912,7 +4017,12 @@ static void perf_event_comm_output(struct perf_event *event,
perf_output_put(&handle, comm_event->event_id);
perf_output_copy(&handle, comm_event->comm,
comm_event->comm_size);
perf_event__output_id_sample(event, &handle, &sample);
perf_output_end(&handle);
out:
comm_event->event_id.header.size = size;
}
static int perf_event_comm_match(struct perf_event *event)
@ -3957,7 +4067,6 @@ static void perf_event_comm_event(struct perf_comm_event *comm_event)
comm_event->comm_size = size;
comm_event->event_id.header.size = sizeof(comm_event->event_id) + size;
rcu_read_lock();
list_for_each_entry_rcu(pmu, &pmus, entry) {
cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
@ -4038,11 +4147,15 @@ static void perf_event_mmap_output(struct perf_event *event,
struct perf_mmap_event *mmap_event)
{
struct perf_output_handle handle;
struct perf_sample_data sample;
int size = mmap_event->event_id.header.size;
int ret = perf_output_begin(&handle, event, size, 0, 0);
int ret;
perf_event_header__init_id(&mmap_event->event_id.header, &sample, event);
ret = perf_output_begin(&handle, event,
mmap_event->event_id.header.size, 0, 0);
if (ret)
return;
goto out;
mmap_event->event_id.pid = perf_event_pid(event, current);
mmap_event->event_id.tid = perf_event_tid(event, current);
@ -4050,7 +4163,12 @@ static void perf_event_mmap_output(struct perf_event *event,
perf_output_put(&handle, mmap_event->event_id);
perf_output_copy(&handle, mmap_event->file_name,
mmap_event->file_size);
perf_event__output_id_sample(event, &handle, &sample);
perf_output_end(&handle);
out:
mmap_event->event_id.header.size = size;
}
static int perf_event_mmap_match(struct perf_event *event,
@ -4205,6 +4323,7 @@ void perf_event_mmap(struct vm_area_struct *vma)
static void perf_log_throttle(struct perf_event *event, int enable)
{
struct perf_output_handle handle;
struct perf_sample_data sample;
int ret;
struct {
@ -4226,11 +4345,15 @@ static void perf_log_throttle(struct perf_event *event, int enable)
if (enable)
throttle_event.header.type = PERF_RECORD_UNTHROTTLE;
ret = perf_output_begin(&handle, event, sizeof(throttle_event), 1, 0);
perf_event_header__init_id(&throttle_event.header, &sample, event);
ret = perf_output_begin(&handle, event,
throttle_event.header.size, 1, 0);
if (ret)
return;
perf_output_put(&handle, throttle_event);
perf_event__output_id_sample(event, &handle, &sample);
perf_output_end(&handle);
}
@ -4246,6 +4369,13 @@ static int __perf_event_overflow(struct perf_event *event, int nmi,
struct hw_perf_event *hwc = &event->hw;
int ret = 0;
/*
* Non-sampling counters might still use the PMI to fold short
* hardware counters, ignore those.
*/
if (unlikely(!is_sampling_event(event)))
return 0;
if (!throttle) {
hwc->interrupts++;
} else {
@ -4391,7 +4521,7 @@ static void perf_swevent_event(struct perf_event *event, u64 nr,
if (!regs)
return;
if (!hwc->sample_period)
if (!is_sampling_event(event))
return;
if (nr == 1 && hwc->sample_period == 1 && !event->attr.freq)
@ -4554,7 +4684,7 @@ static int perf_swevent_add(struct perf_event *event, int flags)
struct hw_perf_event *hwc = &event->hw;
struct hlist_head *head;
if (hwc->sample_period) {
if (is_sampling_event(event)) {
hwc->last_period = hwc->sample_period;
perf_swevent_set_period(event);
}
@ -4811,15 +4941,6 @@ static int perf_tp_event_init(struct perf_event *event)
if (event->attr.type != PERF_TYPE_TRACEPOINT)
return -ENOENT;
/*
* Raw tracepoint data is a severe data leak, only allow root to
* have these.
*/
if ((event->attr.sample_type & PERF_SAMPLE_RAW) &&
perf_paranoid_tracepoint_raw() &&
!capable(CAP_SYS_ADMIN))
return -EPERM;
err = perf_trace_init(event);
if (err)
return err;
@ -4842,7 +4963,7 @@ static struct pmu perf_tracepoint = {
static inline void perf_tp_register(void)
{
perf_pmu_register(&perf_tracepoint);
perf_pmu_register(&perf_tracepoint, "tracepoint", PERF_TYPE_TRACEPOINT);
}
static int perf_event_set_filter(struct perf_event *event, void __user *arg)
@ -4932,31 +5053,33 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
static void perf_swevent_start_hrtimer(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
s64 period;
if (!is_sampling_event(event))
return;
hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
hwc->hrtimer.function = perf_swevent_hrtimer;
if (hwc->sample_period) {
s64 period = local64_read(&hwc->period_left);
if (period) {
if (period < 0)
period = 10000;
period = local64_read(&hwc->period_left);
if (period) {
if (period < 0)
period = 10000;
local64_set(&hwc->period_left, 0);
} else {
period = max_t(u64, 10000, hwc->sample_period);
}
__hrtimer_start_range_ns(&hwc->hrtimer,
local64_set(&hwc->period_left, 0);
} else {
period = max_t(u64, 10000, hwc->sample_period);
}
__hrtimer_start_range_ns(&hwc->hrtimer,
ns_to_ktime(period), 0,
HRTIMER_MODE_REL_PINNED, 0);
}
}
static void perf_swevent_cancel_hrtimer(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
if (hwc->sample_period) {
if (is_sampling_event(event)) {
ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
local64_set(&hwc->period_left, ktime_to_ns(remaining));
@ -5184,8 +5307,61 @@ static void free_pmu_context(struct pmu *pmu)
out:
mutex_unlock(&pmus_lock);
}
static struct idr pmu_idr;
int perf_pmu_register(struct pmu *pmu)
static ssize_t
type_show(struct device *dev, struct device_attribute *attr, char *page)
{
struct pmu *pmu = dev_get_drvdata(dev);
return snprintf(page, PAGE_SIZE-1, "%d\n", pmu->type);
}
static struct device_attribute pmu_dev_attrs[] = {
__ATTR_RO(type),
__ATTR_NULL,
};
static int pmu_bus_running;
static struct bus_type pmu_bus = {
.name = "event_source",
.dev_attrs = pmu_dev_attrs,
};
static void pmu_dev_release(struct device *dev)
{
kfree(dev);
}
static int pmu_dev_alloc(struct pmu *pmu)
{
int ret = -ENOMEM;
pmu->dev = kzalloc(sizeof(struct device), GFP_KERNEL);
if (!pmu->dev)
goto out;
device_initialize(pmu->dev);
ret = dev_set_name(pmu->dev, "%s", pmu->name);
if (ret)
goto free_dev;
dev_set_drvdata(pmu->dev, pmu);
pmu->dev->bus = &pmu_bus;
pmu->dev->release = pmu_dev_release;
ret = device_add(pmu->dev);
if (ret)
goto free_dev;
out:
return ret;
free_dev:
put_device(pmu->dev);
goto out;
}
int perf_pmu_register(struct pmu *pmu, char *name, int type)
{
int cpu, ret;
@ -5195,13 +5371,38 @@ int perf_pmu_register(struct pmu *pmu)
if (!pmu->pmu_disable_count)
goto unlock;
pmu->type = -1;
if (!name)
goto skip_type;
pmu->name = name;
if (type < 0) {
int err = idr_pre_get(&pmu_idr, GFP_KERNEL);
if (!err)
goto free_pdc;
err = idr_get_new_above(&pmu_idr, pmu, PERF_TYPE_MAX, &type);
if (err) {
ret = err;
goto free_pdc;
}
}
pmu->type = type;
if (pmu_bus_running) {
ret = pmu_dev_alloc(pmu);
if (ret)
goto free_idr;
}
skip_type:
pmu->pmu_cpu_context = find_pmu_context(pmu->task_ctx_nr);
if (pmu->pmu_cpu_context)
goto got_cpu_context;
pmu->pmu_cpu_context = alloc_percpu(struct perf_cpu_context);
if (!pmu->pmu_cpu_context)
goto free_pdc;
goto free_dev;
for_each_possible_cpu(cpu) {
struct perf_cpu_context *cpuctx;
@ -5245,6 +5446,14 @@ unlock:
return ret;
free_dev:
device_del(pmu->dev);
put_device(pmu->dev);
free_idr:
if (pmu->type >= PERF_TYPE_MAX)
idr_remove(&pmu_idr, pmu->type);
free_pdc:
free_percpu(pmu->pmu_disable_count);
goto unlock;
@ -5264,6 +5473,10 @@ void perf_pmu_unregister(struct pmu *pmu)
synchronize_rcu();
free_percpu(pmu->pmu_disable_count);
if (pmu->type >= PERF_TYPE_MAX)
idr_remove(&pmu_idr, pmu->type);
device_del(pmu->dev);
put_device(pmu->dev);
free_pmu_context(pmu);
}
@ -5273,6 +5486,13 @@ struct pmu *perf_init_event(struct perf_event *event)
int idx;
idx = srcu_read_lock(&pmus_srcu);
rcu_read_lock();
pmu = idr_find(&pmu_idr, event->attr.type);
rcu_read_unlock();
if (pmu)
goto unlock;
list_for_each_entry_rcu(pmu, &pmus, entry) {
int ret = pmu->event_init(event);
if (!ret)
@ -5737,6 +5957,12 @@ SYSCALL_DEFINE5(perf_event_open,
list_add_tail(&event->owner_entry, &current->perf_event_list);
mutex_unlock(&current->perf_event_mutex);
/*
* Precalculate sample_data sizes
*/
perf_event__header_size(event);
perf_event__id_header_size(event);
/*
* Drop the reference on the group_event after placing the
* new event on the sibling_list. This ensures destruction
@ -6089,6 +6315,12 @@ inherit_event(struct perf_event *parent_event,
child_event->ctx = child_ctx;
child_event->overflow_handler = parent_event->overflow_handler;
/*
* Precalculate sample_data sizes
*/
perf_event__header_size(child_event);
perf_event__id_header_size(child_event);
/*
* Link it up in the child's context:
*/
@ -6320,7 +6552,7 @@ static void __cpuinit perf_event_init_cpu(int cpu)
mutex_unlock(&swhash->hlist_mutex);
}
#ifdef CONFIG_HOTPLUG_CPU
#if defined CONFIG_HOTPLUG_CPU || defined CONFIG_KEXEC
static void perf_pmu_rotate_stop(struct pmu *pmu)
{
struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
@ -6374,6 +6606,26 @@ static void perf_event_exit_cpu(int cpu)
static inline void perf_event_exit_cpu(int cpu) { }
#endif
static int
perf_reboot(struct notifier_block *notifier, unsigned long val, void *v)
{
int cpu;
for_each_online_cpu(cpu)
perf_event_exit_cpu(cpu);
return NOTIFY_OK;
}
/*
* Run the perf reboot notifier at the very last possible moment so that
* the generic watchdog code runs as long as possible.
*/
static struct notifier_block perf_reboot_notifier = {
.notifier_call = perf_reboot,
.priority = INT_MIN,
};
static int __cpuinit
perf_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
{
@ -6402,14 +6654,45 @@ void __init perf_event_init(void)
{
int ret;
idr_init(&pmu_idr);
perf_event_init_all_cpus();
init_srcu_struct(&pmus_srcu);
perf_pmu_register(&perf_swevent);
perf_pmu_register(&perf_cpu_clock);
perf_pmu_register(&perf_task_clock);
perf_pmu_register(&perf_swevent, "software", PERF_TYPE_SOFTWARE);
perf_pmu_register(&perf_cpu_clock, NULL, -1);
perf_pmu_register(&perf_task_clock, NULL, -1);
perf_tp_register();
perf_cpu_notifier(perf_cpu_notify);
register_reboot_notifier(&perf_reboot_notifier);
ret = init_hw_breakpoint();
WARN(ret, "hw_breakpoint initialization failed with: %d", ret);
}
static int __init perf_event_sysfs_init(void)
{
struct pmu *pmu;
int ret;
mutex_lock(&pmus_lock);
ret = bus_register(&pmu_bus);
if (ret)
goto unlock;
list_for_each_entry(pmu, &pmus, entry) {
if (!pmu->name || pmu->type < 0)
continue;
ret = pmu_dev_alloc(pmu);
WARN(ret, "Failed to register pmu: %s, reason %d\n", pmu->name, ret);
}
pmu_bus_running = 1;
ret = 0;
unlock:
mutex_unlock(&pmus_lock);
return ret;
}
device_initcall(perf_event_sysfs_init);

View File

@ -22,6 +22,7 @@
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/suspend.h>
#include <trace/events/power.h>
#include "power.h"
@ -201,6 +202,7 @@ int suspend_devices_and_enter(suspend_state_t state)
if (!suspend_ops)
return -ENOSYS;
trace_machine_suspend(state);
if (suspend_ops->begin) {
error = suspend_ops->begin(state);
if (error)
@ -229,6 +231,7 @@ int suspend_devices_and_enter(suspend_state_t state)
Close:
if (suspend_ops->end)
suspend_ops->end();
trace_machine_suspend(PWR_EVENT_EXIT);
return error;
Recover_platform:

View File

@ -8293,8 +8293,6 @@ void __init sched_init(void)
zalloc_cpumask_var(&cpu_isolated_map, GFP_NOWAIT);
#endif /* SMP */
perf_event_init();
scheduler_running = 1;
}

View File

@ -745,8 +745,15 @@ static struct ctl_table kern_table[] = {
.extra1 = &zero,
.extra2 = &one,
},
{
.procname = "nmi_watchdog",
.data = &watchdog_enabled,
.maxlen = sizeof (int),
.mode = 0644,
.proc_handler = proc_dowatchdog_enabled,
},
#endif
#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86) && !defined(CONFIG_LOCKUP_DETECTOR)
#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
{
.procname = "unknown_nmi_panic",
.data = &unknown_nmi_panic,
@ -754,13 +761,6 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "nmi_watchdog",
.data = &nmi_watchdog_enabled,
.maxlen = sizeof (int),
.mode = 0644,
.proc_handler = proc_nmi_enabled,
},
#endif
#if defined(CONFIG_X86)
{

View File

@ -136,7 +136,6 @@ static const struct bin_table bin_kern_table[] = {
{ CTL_INT, KERN_IA64_UNALIGNED, "ignore-unaligned-usertrap" },
{ CTL_INT, KERN_COMPAT_LOG, "compat-log" },
{ CTL_INT, KERN_MAX_LOCK_DEPTH, "max_lock_depth" },
{ CTL_INT, KERN_NMI_WATCHDOG, "nmi_watchdog" },
{ CTL_INT, KERN_PANIC_ON_NMI, "panic_on_unrecovered_nmi" },
{}
};

View File

@ -69,6 +69,21 @@ config EVENT_TRACING
select CONTEXT_SWITCH_TRACER
bool
config EVENT_POWER_TRACING_DEPRECATED
depends on EVENT_TRACING
bool "Deprecated power event trace API, to be removed"
default y
help
Provides old power event types:
C-state/idle accounting events:
power:power_start
power:power_end
and old cpufreq accounting event:
power:power_frequency
This is for userspace compatibility
and will vanish after 5 kernel iterations,
namely 2.6.41.
config CONTEXT_SWITCH_TRACER
bool

View File

@ -13,5 +13,8 @@
#define CREATE_TRACE_POINTS
#include <trace/events/power.h>
EXPORT_TRACEPOINT_SYMBOL_GPL(power_frequency);
#ifdef EVENT_POWER_TRACING_DEPRECATED
EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
#endif
EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);

View File

@ -21,17 +21,46 @@ typedef typeof(unsigned long [PERF_MAX_TRACE_SIZE / sizeof(unsigned long)])
/* Count the events in use (per event id, not per instance) */
static int total_ref_count;
static int perf_trace_event_perm(struct ftrace_event_call *tp_event,
struct perf_event *p_event)
{
/* No tracing, just counting, so no obvious leak */
if (!(p_event->attr.sample_type & PERF_SAMPLE_RAW))
return 0;
/* Some events are ok to be traced by non-root users... */
if (p_event->attach_state == PERF_ATTACH_TASK) {
if (tp_event->flags & TRACE_EVENT_FL_CAP_ANY)
return 0;
}
/*
* ...otherwise raw tracepoint data can be a severe data leak,
* only allow root to have these.
*/
if (perf_paranoid_tracepoint_raw() && !capable(CAP_SYS_ADMIN))
return -EPERM;
return 0;
}
static int perf_trace_event_init(struct ftrace_event_call *tp_event,
struct perf_event *p_event)
{
struct hlist_head __percpu *list;
int ret = -ENOMEM;
int ret;
int cpu;
ret = perf_trace_event_perm(tp_event, p_event);
if (ret)
return ret;
p_event->tp_event = tp_event;
if (tp_event->perf_refcount++ > 0)
return 0;
ret = -ENOMEM;
list = alloc_percpu(struct hlist_head);
if (!list)
goto fail;

View File

@ -27,6 +27,12 @@
DEFINE_MUTEX(event_mutex);
DEFINE_MUTEX(event_storage_mutex);
EXPORT_SYMBOL_GPL(event_storage_mutex);
char event_storage[EVENT_STORAGE_SIZE];
EXPORT_SYMBOL_GPL(event_storage);
LIST_HEAD(ftrace_events);
LIST_HEAD(ftrace_common_fields);

View File

@ -83,13 +83,19 @@ static void __always_unused ____ftrace_check_##name(void) \
#undef __array
#define __array(type, item, len) \
BUILD_BUG_ON(len > MAX_FILTER_STR_VAL); \
ret = trace_define_field(event_call, #type "[" #len "]", #item, \
do { \
BUILD_BUG_ON(len > MAX_FILTER_STR_VAL); \
mutex_lock(&event_storage_mutex); \
snprintf(event_storage, sizeof(event_storage), \
"%s[%d]", #type, len); \
ret = trace_define_field(event_call, event_storage, #item, \
offsetof(typeof(field), item), \
sizeof(field.item), \
is_signed_type(type), FILTER_OTHER); \
if (ret) \
return ret;
mutex_unlock(&event_storage_mutex); \
if (ret) \
return ret; \
} while (0);
#undef __array_desc
#define __array_desc(type, container, item, len) \

View File

@ -57,6 +57,8 @@ static int __init hardlockup_panic_setup(char *str)
{
if (!strncmp(str, "panic", 5))
hardlockup_panic = 1;
else if (!strncmp(str, "0", 1))
no_watchdog = 1;
return 1;
}
__setup("nmi_watchdog=", hardlockup_panic_setup);
@ -548,13 +550,13 @@ static struct notifier_block __cpuinitdata cpu_nfb = {
.notifier_call = cpu_callback
};
static int __init spawn_watchdog_task(void)
void __init lockup_detector_init(void)
{
void *cpu = (void *)(long)smp_processor_id();
int err;
if (no_watchdog)
return 0;
return;
err = cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
WARN_ON(notifier_to_errno(err));
@ -562,6 +564,5 @@ static int __init spawn_watchdog_task(void)
cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
register_cpu_notifier(&cpu_nfb);
return 0;
return;
}
early_initcall(spawn_watchdog_task);

View File

@ -173,7 +173,8 @@ config LOCKUP_DETECTOR
An NMI is generated every 60 seconds or so to check for hardlockups.
config HARDLOCKUP_DETECTOR
def_bool LOCKUP_DETECTOR && PERF_EVENTS && HAVE_PERF_EVENTS_NMI
def_bool LOCKUP_DETECTOR && PERF_EVENTS && HAVE_PERF_EVENTS_NMI && \
!ARCH_HAS_NMI_WATCHDOG
config BOOTPARAM_SOFTLOCKUP_PANIC
bool "Panic (Reboot) On Soft Lockups"

View File

@ -214,17 +214,22 @@ ifdef BUILD_C_RECORDMCOUNT
# The empty.o file is created in the make process in order to determine
# the target endianness and word size. It is made before all other C
# files, including recordmcount.
cmd_record_mcount = if [ $(@) != "scripts/mod/empty.o" ]; then \
$(objtree)/scripts/recordmcount "$(@)"; \
fi;
sub_cmd_record_mcount = \
if [ $(@) != "scripts/mod/empty.o" ]; then \
$(objtree)/scripts/recordmcount "$(@)"; \
fi;
else
cmd_record_mcount = set -e ; perl $(srctree)/scripts/recordmcount.pl "$(ARCH)" \
sub_cmd_record_mcount = set -e ; perl $(srctree)/scripts/recordmcount.pl "$(ARCH)" \
"$(if $(CONFIG_CPU_BIG_ENDIAN),big,little)" \
"$(if $(CONFIG_64BIT),64,32)" \
"$(OBJDUMP)" "$(OBJCOPY)" "$(CC) $(KBUILD_CFLAGS)" \
"$(LD)" "$(NM)" "$(RM)" "$(MV)" \
"$(if $(part-of-module),1,0)" "$(@)";
endif
cmd_record_mcount = \
if [ "$(findstring -pg,$(_c_flags))" = "-pg" ]; then \
$(sub_cmd_record_mcount) \
fi;
endif
define rule_cc_o_c

View File

@ -24,12 +24,47 @@ OPTIONS
--input=::
Input file name. (default: perf.data)
-d::
--dsos=<dso[,dso...]>::
Only consider symbols in these dsos.
-s::
--symbol=<symbol>::
Symbol to annotate.
-f::
--force::
Don't complain, do it.
-v::
--verbose::
Be more verbose. (Show symbol address, etc)
-D::
--dump-raw-trace::
Dump raw trace in ASCII.
-k::
--vmlinux=<file>::
vmlinux pathname.
-m::
--modules::
Load module symbols. WARNING: use only with -k and LIVE kernel.
-l::
--print-line::
Print matching source lines (may be slow).
-P::
--full-paths::
Don't shorten the displayed pathnames.
--stdio:: Use the stdio interface.
--tui:: Use the TUI interface Use of --tui requires a tty, if one is not
present, as when piping to other commands, the stdio interface is
used. This interfaces starts by centering on the line with more
samples, TAB/UNTAB cycles thru the lines with more samples.
samples, TAB/UNTAB cycles through the lines with more samples.
SEE ALSO
--------

View File

@ -18,6 +18,9 @@ perf report.
OPTIONS
-------
-H::
--with-hits::
Show only DSOs with hits.
-i::
--input=::
Input file name. (default: perf.data)

View File

@ -19,6 +19,18 @@ If no parameters are passed it will assume perf.data.old and perf.data.
OPTIONS
-------
-M::
--displacement::
Show position displacement relative to baseline.
-D::
--dump-raw-trace::
Dump raw trace in ASCII.
-m::
--modules::
Load module symbols. WARNING: use only with -k and LIVE kernel
-d::
--dsos=::
Only consider symbols in these dsos. CSV that understands
@ -42,7 +54,7 @@ OPTIONS
--field-separator=::
Use a special separator character and don't pad with spaces, replacing
all occurances of this separator in symbol names (and other output)
all occurrences of this separator in symbol names (and other output)
with a '.' character, that thus it's the only non valid separator.
-v::
@ -50,6 +62,13 @@ OPTIONS
Be verbose, for instance, show the raw counts in addition to the
diff.
-f::
--force::
Don't complain, do it.
--symfs=<directory>::
Look for files with symbols relative to this directory.
SEE ALSO
--------
linkperf:perf-record[1]

View File

@ -22,7 +22,7 @@ There are a couple of variants of perf kvm:
a performance counter profile of guest os in realtime
of an arbitrary workload.
'perf kvm record <command>' to record the performance couinter profile
'perf kvm record <command>' to record the performance counter profile
of an arbitrary workload and save it into a perf data file. If both
--host and --guest are input, the perf data file name is perf.data.kvm.
If there is no --host but --guest, the file name is perf.data.guest.
@ -40,6 +40,12 @@ There are a couple of variants of perf kvm:
OPTIONS
-------
-i::
--input=::
Input file name.
-o::
--output::
Output file name.
--host=::
Collect host side performance profile.
--guest=::

Some files were not shown because too many files have changed in this diff Show More