Pull timer core updates from Thomas Gleixner:
"The time(r) departement provides:
- more infrastructure work on the year 2038 issue
- a few fixes in the Armada SoC timers
- the usual pile of fixlets and improvements"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: armada-370-xp: Use the reference clock on A375 SoC
watchdog: orion: Use the reference clock on Armada 375 SoC
clocksource: armada-370-xp: Add missing clock enable
time: Fix sign bug in NTP mult overflow warning
time: Remove timekeeping_inject_sleeptime()
rtc: Update suspend/resume timing to use 64bit time
rtc/lib: Provide y2038 safe rtc_tm_to_time()/rtc_time_to_tm() replacement
time: Fixup comments to reflect usage of timespec64
time: Expose get_monotonic_coarse64() for in-kernel uses
time: Expose getrawmonotonic64 for in-kernel uses
time: Provide y2038 safe mktime() replacement
time: Provide y2038 safe timekeeping_inject_sleeptime() replacement
time: Provide y2038 safe do_settimeofday() replacement
time: Complete NTP adjustment threshold judging conditions
time: Avoid possible NTP adjustment mult overflow.
time: Rename udelay_test.c to test_udelay.c
clocksource: sirf: Remove hard-coded clock rate
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle are:
- 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y.
This instruments might_sleep() checks to catch places that nest
blocking primitives - such as mutex usage in a wait loop. Such
bugs can result in hard to debug races/hangs.
Another category of invalid nesting that this facility will detect
is the calling of blocking functions from within schedule() ->
sched_submit_work() -> blk_schedule_flush_plug().
There's some potential for false positives (if secondary blocking
primitives themselves are not ready yet for this facility), but the
kernel will warn once about such bugs per bootup, so the warning
isn't much of a nuisance.
This feature comes with a number of fixes, for problems uncovered
with it, so no messages are expected normally.
- Another round of sched/numa optimizations and refinements, for
CONFIG_NUMA_BALANCING=y.
- Another round of sched/dl fixes and refinements.
Plus various smaller fixes and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
sched: Add missing rcu protection to wake_up_all_idle_cpus
sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK
sched/numa: Init numa balancing fields of init_task
sched/deadline: Remove unnecessary definitions in cpudeadline.h
sched/cpupri: Remove unnecessary definitions in cpupri.h
sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task()
sched/fair: Fix stale overloaded status in the busiest group finding logic
sched: Move p->nr_cpus_allowed check to select_task_rq()
sched/completion: Document when to use wait_for_completion_io_*()
sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC
sched/fair: Kill task_struct::numa_entry and numa_group::task_list
sched: Refactor task_struct to use numa_faults instead of numa_* pointers
sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
sched/deadline: Reschedule from switched_from_dl() after a successful pull
sched/deadline: Push task away if the deadline is equal to curr during wakeup
sched/deadline: Add deadline rq status print
sched/deadline: Fix artificial overrun introduced by yield_task_dl()
sched/rt: Clean up check_preempt_equal_prio()
sched/core: Use dl_bw_of() under rcu_read_lock_sched()
sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
...
Pull perf events update from Ingo Molnar:
"On the kernel side there's few changes, the one that stands out is
PEBS machine state sampling support on x86, by Stephane Eranian.
On the tooling side:
User visible tooling changes:
- Don't open the DWARF info multiple times, keeping instead a dwfl
handle in struct dso, greatly speeding up 'perf report' on powerpc.
(Sukadev Bhattiprolu)
- Introduce PARSE_OPT_DISABLED option flag and use it to avoid
showing undersired options in tools that provides frontends to
'perf record', like sched, kvm, etc (Namhyung Kim)
- Fallback to kallsyms when using the minimal 'ELF' loader (Arnaldo
Carvalho de Melo)
- Fix annotation with kcore (Adrian Hunter)
- Support source line numbers in annotate using a hotkey (Andi Kleen)
- Callchain improvements including:
* Enable printing the srcline in the history
* Make get_srcline fall back to sym+offset (Andi Kleen)
- TUI hist_entry browser fixes, including showing missing overhead
value for first level callchain. Detected comparing the output of
--stdio/--gui (that matched) with --tui, that had this problem.
(Namhyung Kim)
- Support handling complete branch stacks as histograms (Andi Kleen)
Tooling infrastructure changes:
- Prep work for supporting per-pkg and snapshot counters in 'perf
stat' (Jiri Olsa)
- 'perf stat' refactorings, moving stuff from it to evsel.c to use in
per-pkg/snapshot format changes (Jiri Olsa)
- Add per-pkg format file parsing (Matt Fleming)
- Clean up libelf feature support code (Namhyung Kim)
- Add gzip decompression support for kernel modules (Namhyung Kim)
- More prep patches for Intel PT, including a a thread stack and more
stuff made available via the database export mechanism (Adrian
Hunter)
- More Intel PT work, including a facility to export sample data
(comms, threads, symbol names, etc) in a database friendly way,
with an script to use this to create a postgresql database.
(Adrian Hunter)
- Make sure that thread->mg->machine points to the machine where the
thread exists (it was being set only for the kmaps kernel modules
case, do it as well for the mmaps) and use it to shorten function
signatures (Arnaldo Carvalho de Melo)
... and lots of other fixes and smaller improvements"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (91 commits)
perf report: In branch stack mode use address history sorting
perf report: Add --branch-history option
perf callchain: Support handling complete branch stacks as histograms
perf stat: Add support for snapshot counters
perf stat: Add support for per-pkg counters
perf tools: Remove perf_evsel__read interface
perf stat: Use read_counter in read_counter_aggr
perf stat: Make read_counter work over the thread dimension
perf stat: Use perf_evsel__read_cb in read_counter
perf tools: Add snapshot format file parsing
perf tools: Add per-pkg format file parsing
perf evsel: Introduce perf_evsel__read_cb function
perf evsel: Introduce perf_counts_values__scale function
perf evsel: Introduce perf_evsel__compute_deltas function
perf tools: Allow to force redirect pr_debug to stderr.
perf tools: Fix segfault due to invalid kernel dso access
perf callchain: Make get_srcline fall back to sym+offset
perf symbols: Move bfd_demangle stubbing to its only user
perf callchain: Enable printing the srcline in the history
perf tools: Collapse first level callchain entry if it has sibling
...
Pull RCU updates from Ingo Molnar:
"These are the main changes in this cycle:
- Streamline RCU's use of per-CPU variables, shifting from "cpu"
arguments to functions to "this_"-style per-CPU variable
accessors.
- signal-handling RCU updates.
- real-time updates.
- torture-test updates.
- miscellaneous fixes.
- documentation updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
rcu: Fix FIXME in rcu_tasks_kthread()
rcu: More info about potential deadlocks with rcu_read_unlock()
rcu: Optimize cond_resched_rcu_qs()
rcu: Add sparse check for RCU_INIT_POINTER()
documentation: memory-barriers.txt: Correct example for reorderings
documentation: Add atomic_long_t to atomic_ops.txt
documentation: Additional restriction for control dependencies
documentation: Document RCU self test boot params
rcutorture: Fix rcu_torture_cbflood() memory leak
rcutorture: Remove obsolete kversion param in kvm.sh
rcutorture: Remove stale test configurations
rcutorture: Enable RCU self test in configs
rcutorture: Add early boot self tests
torture: Run Linux-kernel binary out of results directory
cpu: Avoid puts_pending overflow
rcu: Remove "cpu" argument to rcu_cleanup_after_idle()
rcu: Remove "cpu" argument to rcu_prepare_for_idle()
rcu: Remove "cpu" argument to rcu_needs_cpu()
rcu: Remove "cpu" argument to rcu_note_context_switch()
rcu: Remove "cpu" argument to rcu_preempt_check_callbacks()
...
Locklessly doing is_idle_task(rq->curr) is only okay because of
RCU protection. The older variant of the broken code checked
rq->curr == rq->idle instead and therefore didn't need RCU.
Fixes: f6be8af1c95d ("sched: Add new API wake_up_if_idle() to wake up the idle cpu")
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Chuansheng Liu <chuansheng.liu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/729365dddca178506dfd0a9451006344cd6808bc.1417277372.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It appears that some SCHEDULE_USER (asm for schedule_user) callers
in arch/x86/kernel/entry_64.S are called from RCU kernel context,
and schedule_user will return in RCU user context. This causes RCU
warnings and possible failures.
This is intended to be a minimal fix suitable for 3.18.
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit 6067dc5a8c2b ("time: Avoid possible NTP adjustment
mult overflow") a new check was added to watch for adjustments
that could cause a mult overflow.
Unfortunately the check compares a signed with unsigned value
and ignored the case where the adjustment was negative, which
causes spurious warn-ons on some systems (and seems like it
would result in problematic time adjustments there as well, due
to the early return).
Thus this patch adds a check to make sure the adjustment is
positive before we check for an overflow, and resovles the issue
in my testing.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Debugged-by: pang.xunlei <pang.xunlei@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/1416890145-30048-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but
not on non-paranoid returns. I suspect that this is a mistake and that
the code only works because int3 is paranoid.
Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround
for the x86 bug. With that bug fixed, we can remove _TIF_NOTIFY_RESUME
from the uprobes code.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris bisected a NULL pointer deference in task_sched_runtime() to
commit 6e998916dfe3 'sched/cputime: Fix clock_nanosleep()/clock_gettime()
inconsistency'.
Chris observed crashes in atop or other /proc walking programs when he
started fork bombs on his machine. He assumed that this is a new exit
race, but that does not make any sense when looking at that commit.
What's interesting is that, the commit provides update_curr callbacks
for all scheduling classes except stop_task and idle_task.
While nothing can ever hit that via the clock_nanosleep() and
clock_gettime() interfaces, which have been the target of the commit in
question, the author obviously forgot that there are other code paths
which invoke task_sched_runtime()
do_task_stat(()
thread_group_cputime_adjusted()
thread_group_cputime()
task_cputime()
task_sched_runtime()
if (task_current(rq, p) && task_on_rq_queued(p)) {
update_rq_clock(rq);
up->sched_class->update_curr(rq);
}
If the stats are read for a stomp machine task, aka 'migration/N' and
that task is current on its cpu, this will happily call the NULL pointer
of stop_task->update_curr. Ooops.
Chris observation that this happens faster when he runs the fork bomb
makes sense as the fork bomb will kick migration threads more often so
the probability to hit the issue will increase.
Add the missing update_curr callbacks to the scheduler classes stop_task
and idle_task. While idle tasks cannot be monitored via /proc we have
other means to hit the idle case.
Fixes: 6e998916dfe3 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency'
Reported-by: Chris Mason <clm@fb.com>
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull scheduler fixes from Ingo Molnar:
"Misc fixes: two NUMA fixes, two cputime fixes and an RCU/lockdep fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency
sched/cputime: Fix cpu_timer_sample_group() double accounting
sched/numa: Avoid selecting oneself as swap target
sched/numa: Fix out of bounds read in sched_init_numa()
sched: Remove lockdep check in sched_move_task()
Pull perf fixes from Ingo Molnar:
"Misc fixes: two Intel uncore driver fixes, a CPU-hotplug fix and a
build dependencies fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Fix boot crash on SBOX PMU on Haswell-EP
perf/x86/intel/uncore: Fix IRP uncore register offsets on Haswell EP
perf: Fix corruption of sibling list with hotplug
perf/x86: Fix embarrasing typo
Fix up a few comments that weren't updated when the
functions were converted to use timespec64 structures.
Acked-by: Arnd Bergmann <arnd.bergmann@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Adds a timespec64 based get_monotonic_coarse64() implementation
that can be used as we convert internal users of
get_monotonic_coarse away from using timespecs.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Adds a timespec64 based getrawmonotonic64() implementation
that can be used as we convert internal users of
getrawmonotonic away from using timespecs.
Signed-off-by: John Stultz <john.stultz@linaro.org>
As part of addressing "y2038 problem" for in-kernel uses, this
patch adds safe mktime64() using time64_t.
After this patch, mktime() is deprecated and all its call sites
will be fixed using mktime64(), after that it can be removed.
Signed-off-by: pang.xunlei <pang.xunlei@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
As part of addressing "y2038 problem" for in-kernel uses, this
patch adds timekeeping_inject_sleeptime64() using timespec64.
After this patch, timekeeping_inject_sleeptime() is deprecated
and all its call sites will be fixed using the new interface,
after that it can be removed.
NOTE: timekeeping_inject_sleeptime() is safe actually, but we
want to eliminate timespec eventually, so comes this patch.
Signed-off-by: pang.xunlei <pang.xunlei@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The kernel uses 32-bit signed value(time_t) for seconds elapsed
1970-01-01:00:00:00, thus it will overflow at 2038-01-19 03:14:08
on 32-bit systems. This is widely known as the y2038 problem.
As part of addressing "y2038 problem" for in-kernel uses, this patch
adds safe do_settimeofday64() using timespec64.
After this patch, do_settimeofday() is deprecated and all its call
sites will be fixed using do_settimeofday64(), after that it can be
removed.
Signed-off-by: pang.xunlei <pang.xunlei@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The clocksource mult-adjustment threshold is [mult-maxadj, mult+maxadj],
timekeeping_adjust() only deals with the upper threshold, but misses the
lower threshold.
This patch adds the lower threshold judging condition.
Signed-off-by: pang.xunlei <pang.xunlei@linaro.org>
[jstultz: Minor fix for > 80 char line]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Ideally, __clocksource_updatefreq_scale, selects the largest shift
value possible for a clocksource. This results in the mult memember of
struct clocksource being particularly large, although not so large
that NTP would adjust the clock to cause it to overflow.
That said, nothing actually prohibits an overflow from occuring, its
just that it "shouldn't" occur.
So while very unlikely, and so far never observed, the value of
(cs->mult+cs->maxadj) may have a chance to reach very near 0xFFFFFFFF,
so there is a possibility it may overflow when doing NTP positive
adjustment
See the following detail: When NTP slewes the clock, kernel goes
through update_wall_time()->...->timekeeping_apply_adjustment():
tk->tkr.mult += mult_adj;
Since there is no guard against it, its possible tk->tkr.mult may
overflow during this operation.
This patch avoids any possible mult overflow by judging the overflow
case before adding mult_adj to mult, also adds the WARNING message
when capturing such case.
Signed-off-by: pang.xunlei <pang.xunlei@linaro.org>
[jstultz: Reworded commit message]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Kees requested that this test module be renamed for consistency sake,
so this patch renames the udelay_test.c file (recently added to
tip/timers/core for 3.17) to test_udelay.c
Cc: Kees Cook <keescook@chromium.org>
Cc: Greg KH <greg@kroah.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linux-Next <linux-next@vger.kernel.org>
Cc: David Riley <davidriley@chromium.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
This patch reorders fields in the perf_sample_data struct in order to
minimize the number of cachelines touched in perf_sample_data_init().
It also removes some intializations which are redundant with the code
in kernel/events/core.c
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1411559322-16548-7-git-send-email-eranian@google.com
Cc: cebbert.lkml@gmail.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: jolsa@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Enable capture of interrupted machine state for each sample.
Registers to sample are passed per event in the sample_regs_intr bitmask.
To sample interrupt machine state, the PERF_SAMPLE_INTR_REGS must be passed in
sample_type.
The list of available registers is arch dependent and provided by asm/perf_regs.h
Registers are laid out as u64 in the order of the bit order of sample_intr_regs.
This patch also adds a new ABI version PERF_ATTR_SIZE_VER4 because we extend
the perf_event_attr struct with a new u64 field.
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: cebbert.lkml@gmail.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/1411559322-16548-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Actually, cpudl_set() and cpudl_init() can never be used without
CONFIG_SMP.
Signed-off-by: pang.xunlei <pang.xunlei@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1415260327-30465-4-git-send-email-pang.xunlei@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Actually, cpupri_set() and cpupri_init() can never be used without
CONFIG_SMP.
Signed-off-by: pang.xunlei <pang.xunlei@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: "pang.xunlei" <pang.xunlei@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1415260327-30465-1-git-send-email-pang.xunlei@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Do not call dequeue_pushable_dl_task() when failing to push an eligible
task, as it remains pushable, merely not at this particular moment.
Actually the patch is the same behavior as commit 311e800e16f6 ("sched,
rt: Fix rq->rt.pushable_tasks bug in push_rt_task()" in -rt side.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1415258564-8573-1-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit caeb178c60f4 ("sched/fair: Make update_sd_pick_busiest() return
'true' on a busier sd") changes groups to be ranked in the order of
overloaded > imbalance > other, and busiest group is picked according
to this order.
sgs->group_capacity_factor is used to check if the group is overloaded.
When the child domain prefers tasks to go to siblings first, the
sgs->group_capacity_factor will be set lower than one in order to
move all the excess tasks away.
However, group overloaded status is not updated when
sgs->group_capacity_factor is set to lower than one, which leads to us
missing to find the busiest group.
This patch fixes it by updating group overloaded status when sg capacity
factor is set to one, in order to find the busiest group accurately.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1415144690-25196-1-git-send-email-wanpeng.li@linux.intel.com
[ Fixed the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move the p->nr_cpus_allowed check into kernel/sched/core.c: select_task_rq().
This change will make fair.c, rt.c, and deadline.c all start with the
same logic.
Suggested-and-Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "pang.xunlei" <pang.xunlei@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1415150077-59053-1-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As discussed in [1], accounting IO is meant for blkio only. Document that
so driver authors won't use them for device io.
[1] http://thread.gmane.org/gmane.linux.drivers.i2c/20470
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1415098901-2768-1-git-send-email-wsa@the-dreams.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit d670ec13178d0 "posix-cpu-timers: Cure SMP wobbles" fixes one glibc
test case in cost of breaking another one. After that commit, calling
clock_nanosleep(TIMER_ABSTIME, X) and then clock_gettime(&Y) can result
of Y time being smaller than X time.
Reproducer/tester can be found further below, it can be compiled and ran by:
gcc -o tst-cpuclock2 tst-cpuclock2.c -pthread
while ./tst-cpuclock2 ; do : ; done
This reproducer, when running on a buggy kernel, will complain
about "clock_gettime difference too small".
Issue happens because on start in thread_group_cputimer() we initialize
sum_exec_runtime of cputimer with threads runtime not yet accounted and
then add the threads runtime to running cputimer again on scheduler
tick, making it's sum_exec_runtime bigger than actual threads runtime.
KOSAKI Motohiro posted a fix for this problem, but that patch was never
applied: https://lkml.org/lkml/2013/5/26/191 .
This patch takes different approach to cure the problem. It calls
update_curr() when cputimer starts, that assure we will have updated
stats of running threads and on the next schedule tick we will account
only the runtime that elapsed from cputimer start. That also assure we
have consistent state between cpu times of individual threads and cpu
time of the process consisted by those threads.
Full reproducer (tst-cpuclock2.c):
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <time.h>
#include <pthread.h>
#include <stdint.h>
#include <inttypes.h>
/* Parameters for the Linux kernel ABI for CPU clocks. */
#define CPUCLOCK_SCHED 2
#define MAKE_PROCESS_CPUCLOCK(pid, clock) \
((~(clockid_t) (pid) << 3) | (clockid_t) (clock))
static pthread_barrier_t barrier;
/* Help advance the clock. */
static void *chew_cpu(void *arg)
{
pthread_barrier_wait(&barrier);
while (1) ;
return NULL;
}
/* Don't use the glibc wrapper. */
static int do_nanosleep(int flags, const struct timespec *req)
{
clockid_t clock_id = MAKE_PROCESS_CPUCLOCK(0, CPUCLOCK_SCHED);
return syscall(SYS_clock_nanosleep, clock_id, flags, req, NULL);
}
static int64_t tsdiff(const struct timespec *before, const struct timespec *after)
{
int64_t before_i = before->tv_sec * 1000000000ULL + before->tv_nsec;
int64_t after_i = after->tv_sec * 1000000000ULL + after->tv_nsec;
return after_i - before_i;
}
int main(void)
{
int result = 0;
pthread_t th;
pthread_barrier_init(&barrier, NULL, 2);
if (pthread_create(&th, NULL, chew_cpu, NULL) != 0) {
perror("pthread_create");
return 1;
}
pthread_barrier_wait(&barrier);
/* The test. */
struct timespec before, after, sleeptimeabs;
int64_t sleepdiff, diffabs;
const struct timespec sleeptime = {.tv_sec = 0,.tv_nsec = 100000000 };
/* The relative nanosleep. Not sure why this is needed, but its presence
seems to make it easier to reproduce the problem. */
if (do_nanosleep(0, &sleeptime) != 0) {
perror("clock_nanosleep");
return 1;
}
/* Get the current time. */
if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &before) < 0) {
perror("clock_gettime[2]");
return 1;
}
/* Compute the absolute sleep time based on the current time. */
uint64_t nsec = before.tv_nsec + sleeptime.tv_nsec;
sleeptimeabs.tv_sec = before.tv_sec + nsec / 1000000000;
sleeptimeabs.tv_nsec = nsec % 1000000000;
/* Sleep for the computed time. */
if (do_nanosleep(TIMER_ABSTIME, &sleeptimeabs) != 0) {
perror("absolute clock_nanosleep");
return 1;
}
/* Get the time after the sleep. */
if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &after) < 0) {
perror("clock_gettime[3]");
return 1;
}
/* The time after sleep should always be equal to or after the absolute sleep
time passed to clock_nanosleep. */
sleepdiff = tsdiff(&sleeptimeabs, &after);
if (sleepdiff < 0) {
printf("absolute clock_nanosleep woke too early: %" PRId64 "\n", sleepdiff);
result = 1;
printf("Before %llu.%09llu\n", before.tv_sec, before.tv_nsec);
printf("After %llu.%09llu\n", after.tv_sec, after.tv_nsec);
printf("Sleep %llu.%09llu\n", sleeptimeabs.tv_sec, sleeptimeabs.tv_nsec);
}
/* The difference between the timestamps taken before and after the
clock_nanosleep call should be equal to or more than the duration of the
sleep. */
diffabs = tsdiff(&before, &after);
if (diffabs < sleeptime.tv_nsec) {
printf("clock_gettime difference too small: %" PRId64 "\n", diffabs);
result = 1;
}
pthread_cancel(th);
return result;
}
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141112155843.GA24803@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While looking over the cpu-timer code I found that we appear to add
the delta for the calling task twice, through:
cpu_timer_sample_group()
thread_group_cputimer()
thread_group_cputime()
times->sum_exec_runtime += task_sched_runtime();
*sample = cputime.sum_exec_runtime + task_delta_exec();
Which would make the sample run ahead, making the sleep short.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20141112113737.GI10476@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Because the whole numa task selection stuff runs with preemption
enabled (its long and expensive) we can end up migrating and selecting
oneself as a swap target. This doesn't really work out well -- we end
up trying to acquire the same lock twice for the swap migrate -- so
avoid this.
Reported-and-Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141110100328.GF29390@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When a CPU hotplugged out, we call perf_remove_from_context() (via
perf_event_exit_cpu()) to rip each CPU-bound event out of its PMU's cpu
context, but leave siblings grouped together. Freeing of these events is
left to the mercy of the usual refcounting.
When a CPU-bound event's refcount drops to zero we cross-call to
__perf_remove_from_context() to clean it up, detaching grouped siblings.
This works when the relevant CPU is online, but will fail if the CPU is
currently offline, and we won't detach the event from its siblings
before freeing the event, leaving the sibling list corrupt. If the
sibling list is later walked (e.g. because the CPU cam online again
before a remaining sibling's refcount drops to zero), we will walk the
now corrupted siblings list, potentially dereferencing garbage values.
Given that the events should never be scheduled again (as we removed
them from their context), we can simply detatch siblings when the CPU
goes down in the first place. If the CPU comes back online, the
redundant call to __perf_remove_from_context() is safe.
Reported-by: Drew Richardson <drew.richardson@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: vincent.weaver@maine.edu
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1415203904-25308-2-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Fix a crash in the suspend-to-idle code path introduced by a
recent commit that forgot to check a pointer against NULL before
dereferencing it (Dmitry Eremin-Solenikov).
- Fix a boot crash on Exynos5 introduced by a recent commit making
that platform use generic Device Tree bindings for power domains
which exposed a weakness in the generic power domains framework
leading to that crash (Ulf Hansson).
- Fix a crash during system resume on systems where cpufreq depends
on Operation Performance Points (OPP) for functionality, but
CONFIG_OPP is not set. This leads the cpufreq driver registration
to fail, but the resume code attempts to restore the pre-suspend
cpufreq configuration (which does not exist) nevertheless and
crashes. From Geert Uytterhoeven.
- Add a new ACPI blacklist entry for Dell Vostro 3546 that has
problems if it is reported as Windows 8 compatible to the BIOS
(Adam Lee).
- Fix swapped arguments in an error message in the cpufreq-dt
driver (Abhilash Kesavan).
- Fix up the prototypes of new callbacks in struct generic_pm_domain
to make them more useful. Users of those callbacks will be added
in 3.19 and it's better for them to be based on the correct struct
definition in mainline from the start. From Ulf Hansson and
Kevin Hilman.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJUZhtnAAoJEILEb/54YlRxx1EP/0Rk7pJUHeOMmdyXyY7B+n+f
MlXHVMDhskT370fsdTGbpeYb5ATr5kGatfhr+vyDQmBtxdw7lDJxKq54s6kmmIL3
SEMRRb4NtkPsdDE7zq985JmjsrnHtKxC5NjSUwEGxdyyfAZxll4mrZL6RrqXCu44
L+qdVXRffCCrJDXZl5FZUpSZ3ZUc+xTiaDy7ObjLe2bwmzvBOAwS2flBMKxN9X+e
khlGdQZ0e9T2Y3IXriHxHMui8OVbkPyYZkW1aubCd0HwuTMP7sebosX/2JWdJOmg
q6bGcvPlBwXDRoShlzFO8CN5w5E8fIe0vfPcg9SB3s21S7rJEbYQX/5ytm107aJj
Ysv7mcb2dAHG0V3J7hxhkS+7UNPxfk3G+8frxW2UQ6eIDlZkBORIUhGCzeSbIGYM
aIKiomN4jGuPeaOkEnKl4RwMlzjuzAs2V06viffbq63eyWBvtHDW8M5bdq901pXp
1jOT7yKqLzOZYqcYaLr3z+IBw/+hfuG/FdCp3uGyFqeHPBNIP3BfFnWm6A6E13b+
aC6gvhQHojT7L2gqIBJ+Qn0EiRWNqwoLk6w6DLDYJna/hYyoXq0BKv+/x2OegItU
ENKYVpfmSt3YsEhcTBW4h5IpUvK07o5Oa3nTxen6924Im61dMyaSUDD5DiaqCgXO
bVJTsF983hBZGTy0IMX/
=wQxT
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These are three regression fixes, two recent (generic power domains,
suspend-to-idle) and one older (cpufreq), an ACPI blacklist entry for
one more machine having problems with Windows 8 compatibility, a minor
cpufreq driver fix (cpufreq-dt) and a fixup for new callback
definitions (generic power domains).
Specifics:
- Fix a crash in the suspend-to-idle code path introduced by a recent
commit that forgot to check a pointer against NULL before
dereferencing it (Dmitry Eremin-Solenikov).
- Fix a boot crash on Exynos5 introduced by a recent commit making
that platform use generic Device Tree bindings for power domains
which exposed a weakness in the generic power domains framework
leading to that crash (Ulf Hansson).
- Fix a crash during system resume on systems where cpufreq depends
on Operation Performance Points (OPP) for functionality, but
CONFIG_OPP is not set. This leads the cpufreq driver registration
to fail, but the resume code attempts to restore the pre-suspend
cpufreq configuration (which does not exist) nevertheless and
crashes. From Geert Uytterhoeven.
- Add a new ACPI blacklist entry for Dell Vostro 3546 that has
problems if it is reported as Windows 8 compatible to the BIOS
(Adam Lee).
- Fix swapped arguments in an error message in the cpufreq-dt driver
(Abhilash Kesavan).
- Fix up the prototypes of new callbacks in struct generic_pm_domain
to make them more useful. Users of those callbacks will be added
in 3.19 and it's better for them to be based on the correct struct
definition in mainline from the start. From Ulf Hansson and Kevin
Hilman"
* tag 'pm+acpi-3.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / Domains: Fix initial default state of the need_restore flag
PM / sleep: Fix entering suspend-to-IDLE if no freeze_oops is set
PM / Domains: Change prototype for the attach and detach callbacks
cpufreq: Avoid crash in resume on SMP without OPP
cpufreq: cpufreq-dt: Fix arguments in clock failure error message
ACPI / blacklist: blacklist Win8 OSI for Dell Vostro 3546
Commit 69361eef9056 ("panic: add TAINT_SOFTLOCKUP") added the 'L' flag,
but failed to update the comments for print_tainted(). So, update the
comments.
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit affines rcu_tasks_kthread() to the housekeeping CPUs
in CONFIG_NO_HZ_FULL builds. This is just a default, so systems
administrators are free to put this kthread somewhere else if they wish.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Pull audit fixes from Paul Moore:
"After he sent the initial audit pull request for 3.18, Eric asked me
to take over the management of the audit tree, hence this pull request
to fix a couple of problems with audit.
As you can see below, the changes are minimal: adding some whitespace
to a string so userspace parses it correctly, and fixing a problem
with audit's usage of fsnotify that was causing audit watch rules to
be lost. Neither of these patches were very controversial on the
mailing lists and they fix real problems, getting them into 3.18 would
be a good thing"
* 'stable-3.18' of git://git.infradead.org/users/pcmoore/audit:
audit: keep inode pinned
audit: AUDIT_FEATURE_CHANGE message format missing delimiting space
Audit rules disappear when an inode they watch is evicted from the cache.
This is likely not what we want.
The guilty commit is "fsnotify: allow marks to not pin inodes in core",
which didn't take into account that audit_tree adds watches with a zero
mask.
Adding any mask should fix this.
Fixes: 90b1e7a57880 ("fsnotify: allow marks to not pin inodes in core")
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org # 2.6.36+
Signed-off-by: Paul Moore <pmoore@redhat.com>
If the read loop in trace_buffers_splice_read() keeps failing due to
memory allocation failures without reading even a single page then this
function will keep busy looping.
Remove the risk for that by exiting the function if memory allocation
failures are seen.
Link: http://lkml.kernel.org/r/1415309167-2373-2-git-send-email-rabin@rab.in
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
On a !PREEMPT kernel, attempting to use trace-cmd results in a soft
lockup:
# trace-cmd record -e raw_syscalls:* -F false
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61]
...
Call Trace:
[<ffffffff8105b580>] ? __wake_up_common+0x90/0x90
[<ffffffff81092e25>] wait_on_pipe+0x35/0x40
[<ffffffff810936e3>] tracing_buffers_splice_read+0x2e3/0x3c0
[<ffffffff81093300>] ? tracing_stats_read+0x2a0/0x2a0
[<ffffffff812d10ab>] ? _raw_spin_unlock+0x2b/0x40
[<ffffffff810dc87b>] ? do_read_fault+0x21b/0x290
[<ffffffff810de56a>] ? handle_mm_fault+0x2ba/0xbd0
[<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
[<ffffffff810951e2>] ? trace_buffer_lock_reserve+0x22/0x60
[<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
[<ffffffff8112415d>] do_splice_to+0x6d/0x90
[<ffffffff81126971>] SyS_splice+0x7c1/0x800
[<ffffffff812d1edd>] tracesys_phase2+0xd3/0xd8
The problem is this: tracing_buffers_splice_read() calls
ring_buffer_wait() to wait for data in the ring buffers. The buffers
are not empty so ring_buffer_wait() returns immediately. But
tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1,
meaning it only wants to read a full page. When the full page is not
available, tracing_buffers_splice_read() tries to wait again with
ring_buffer_wait(), which again returns immediately, and so on.
Fix this by adding a "full" argument to ring_buffer_wait() which will
make ring_buffer_wait() wait until the writer has left the reader's
page, i.e. until full-page reads will succeed.
Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in
Cc: stable@vger.kernel.org # 3.16+
Fixes: b1169cc69ba9 ("tracing: Remove mock up poll wait function")
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
On latest mm + KASan patchset I've got this:
==================================================================
BUG: AddressSanitizer: out of bounds access in sched_init_smp+0x3ba/0x62c at addr ffff88006d4bee6c
=============================================================================
BUG kmalloc-8 (Not tainted): kasan error
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Allocated in alloc_vfsmnt+0xb0/0x2c0 age=75 cpu=0 pid=0
__slab_alloc+0x4b4/0x4f0
__kmalloc_track_caller+0x15f/0x1e0
kstrdup+0x44/0x90
alloc_vfsmnt+0xb0/0x2c0
vfs_kern_mount+0x35/0x190
kern_mount_data+0x25/0x50
pid_ns_prepare_proc+0x19/0x50
alloc_pid+0x5e2/0x630
copy_process.part.41+0xdf5/0x2aa0
do_fork+0xf5/0x460
kernel_thread+0x21/0x30
rest_init+0x1e/0x90
start_kernel+0x522/0x531
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0x15b/0x16a
INFO: Slab 0xffffea0001b52f80 objects=24 used=22 fp=0xffff88006d4befc0 flags=0x100000000004080
INFO: Object 0xffff88006d4bed20 @offset=3360 fp=0xffff88006d4bee70
Bytes b4 ffff88006d4bed10: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
Object ffff88006d4bed20: 70 72 6f 63 00 6b 6b a5 proc.kk.
Redzone ffff88006d4bed28: cc cc cc cc cc cc cc cc ........
Padding ffff88006d4bee68: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G B 3.18.0-rc3-mm1+ #108
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
ffff88006d4be000 0000000000000000 ffff88006d4bed20 ffff88006c86fd18
ffffffff81cd0a59 0000000000000058 ffff88006d404240 ffff88006c86fd48
ffffffff811fa3a8 ffff88006d404240 ffffea0001b52f80 ffff88006d4bed20
Call Trace:
dump_stack (lib/dump_stack.c:52)
print_trailer (mm/slub.c:645)
object_err (mm/slub.c:652)
? sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063)
kasan_report_error (mm/kasan/report.c:102 mm/kasan/report.c:178)
? kasan_poison_shadow (mm/kasan/kasan.c:48)
? kasan_unpoison_shadow (mm/kasan/kasan.c:54)
? kasan_poison_shadow (mm/kasan/kasan.c:48)
? kasan_kmalloc (mm/kasan/kasan.c:311)
__asan_load4 (mm/kasan/kasan.c:371)
? sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063)
sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063)
kernel_init_freeable (init/main.c:869 init/main.c:997)
? finish_task_switch (kernel/sched/sched.h:1036 kernel/sched/core.c:2248)
? rest_init (init/main.c:924)
kernel_init (init/main.c:929)
? rest_init (init/main.c:924)
ret_from_fork (arch/x86/kernel/entry_64.S:348)
? rest_init (init/main.c:924)
Read of size 4 by task swapper/0:
Memory state around the buggy address:
ffff88006d4beb80: fc fc fc fc fc fc fc fc fc fc 00 fc fc fc fc fc
ffff88006d4bec00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88006d4bec80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88006d4bed00: fc fc fc fc 00 fc fc fc fc fc fc fc fc fc fc fc
ffff88006d4bed80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88006d4bee00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc 04 fc
^
ffff88006d4bee80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88006d4bef00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88006d4bef80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ffff88006d4bf000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88006d4bf080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Zero 'level' (e.g. on non-NUMA system) causing out of bounds
access in this line:
sched_max_numa_distance = sched_domains_numa_distance[level - 1];
Fix this by exiting from sched_init_numa() earlier.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Fixes: 9942f79ba ("sched/numa: Export info needed for NUMA balancing on complex topologies")
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1415372020-1871-1-git-send-email-a.ryabinin@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If no freeze_ops is set, trying to enter suspend-to-IDLE will cause a
nice oops in platform_suspend_prepare_late(). Add respective checks to
platform_suspend_prepare_late() and platform_resume_early() functions.
Fixes: a8d46b9e4e48 (ACPI / sleep: Rework the handling of ACPI GPE wakeup ...)
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch simplifies task_struct by removing the four numa_* pointers
in the same array and replacing them with the array pointer. By doing this,
on x86_64, the size of task_struct is reduced by 3 ulong pointers (24 bytes on
x86_64).
A new parameter is added to the task_faults_idx function so that it can return
an index to the correct offset, corresponding with the old precalculated
pointers.
All of the code in sched/ that depended on task_faults_idx and numa_* was
changed in order to match the new logic.
Signed-off-by: Iulia Manda <iulia.manda21@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: dave@stgolabs.net
Cc: riel@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141031001331.GA30662@winterfell
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are both UP and SMP version of pull_dl_task(), so don't need
to check CONFIG_SMP in switched_from_dl();
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414708776-124078-6-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In switched_from_dl() we have to issue a resched if we successfully
pulled some task from other cpus. This patch also aligns the behavior
with -rt.
Suggested-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414708776-124078-5-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch pushes task away if the dealine of the task is equal
to current during wake up. The same behavior as rt class.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414708776-124078-4-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>