linux-next

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git synced 2025-01-14 17:53:39 +00:00

Author	SHA1	Message	Date
Paul Turner	f269ae0469	sched: Update_cfs_shares at period edge Now that our measurement intervals are small (~1ms) we can amortize the posting of update_shares() to be about each period overflow. This is a large cost saving for frequently switching tasks. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141507.200772172@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-24 10:27:29 +02:00
Paul Turner	48a1675323	sched: Refactor update_shares_cpu() -> update_blocked_avgs() Now that running entities maintain their own load-averages the work we must do in update_shares() is largely restricted to the periodic decay of blocked entities. This allows us to be a little less pessimistic regarding our occupancy on rq->lock and the associated rq->clock updates required. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141507.133999170@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-24 10:27:28 +02:00
Paul Turner	82958366cf	sched: Replace update_shares weight distribution with per-entity computation Now that the machinery is in place to compute contributed load in a bottom-up fashion, replace the shares distribution code within update_shares() accordingly. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141507.061208672@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-24 10:27:28 +02:00
Paul Turner	f1b17280ef	sched: Maintain runnable averages across throttled periods With bandwidth control, tracked entities may cease execution according to user-specified bandwidth limits. Charging this time as either throttled or blocked, however, is incorrect and would falsely skew the average in either direction. What we actually want is for any throttled periods to be "invisible" to load-tracking as they are removed from the system for that interval and contribute normally otherwise. Do this by moderating the progression of time to omit any periods in which the entity belonged to a throttled hierarchy. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.998912151@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-24 10:27:27 +02:00
Paul Turner	bb17f65571	sched: Normalize tg load contributions against runnable time Entities of equal weight should receive equitable distribution of cpu time. This is challenging in the case of a task_group's shares as execution may be occurring on multiple cpus simultaneously. To handle this we divide up the shares into weights proportional to the load on each cfs_rq. This does not, however, account for the fact that the sum of the parts may be less than one cpu and so we need to normalize: load(tg) = min(runnable_avg(tg), 1) * tg->shares Where runnable_avg is the aggregate time in which the task_group had runnable children. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.930124292@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-24 10:27:26 +02:00
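A minimal sketch of the normalization formula above, using doubles rather than the kernel's fixed-point arithmetic; the function name `tg_effective_load` is hypothetical:

```c
/* load(tg) = min(runnable_avg(tg), 1) * tg->shares
 * runnable_avg is the fraction of time the group had runnable children,
 * summed over cpus, so it can exceed 1 when several cpus run the group. */
static double tg_effective_load(double runnable_avg, double shares)
{
    double frac = runnable_avg < 1.0 ? runnable_avg : 1.0;

    return frac * shares;
}
```

A group that keeps only a quarter of one cpu busy contributes a quarter of its shares; a group saturating several cpus is capped at its full shares.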
Paul Turner	8165e145ce	sched: Compute load contribution by a group entity Unlike task entities who have a fixed weight, group entities instead own a fraction of their parenting task_group's shares as their contributed weight. Compute this fraction so that we can correctly account hierarchies and shared entity nodes. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.855074415@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-24 10:27:25 +02:00
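A sketch of the fraction described above, assuming the group entity's weight is tg->shares scaled by this cfs_rq's share of the group's total load. `group_se_weight` and the zero-load fallback are illustrative assumptions, not the kernel's code:

```c
/* Weight of the group entity that represents a task_group on one cpu:
 * the cpu's fraction of the group's total load, times the group's shares. */
static double group_se_weight(double shares, double cfs_rq_load,
                              double tg_load_sum)
{
    if (tg_load_sum <= 0.0)
        return 0.0; /* assumption of this sketch: no load, no weight */
    return shares * cfs_rq_load / tg_load_sum;
}
```

With shares of 1024 split across two cpus carrying loads 300 and 100, the two group entities receive weights 768 and 256, which together add back up to the group's shares.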
Paul Turner	c566e8e9e4	sched: Aggregate total task_group load Maintain a global running sum of the average load seen on each cfs_rq belonging to each task group so that it may be used in calculating an appropriate shares:weight distribution. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.792901086@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-24 10:27:24 +02:00
Paul Turner	aff3e49884	sched: Account for blocked load waking back up When a running entity blocks we migrate its tracked load to cfs_rq->blocked_runnable_avg. In the sleep case this occurs while holding rq->lock and so is a natural transition. Wake-ups however, are potentially asynchronous in the presence of migration and so special care must be taken. We use an atomic counter to track such migrated load, taking care to match this with the previously introduced decay counters so that we don't migrate too much load. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.726077467@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-24 10:27:23 +02:00
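A sketch of the lock-free hand-off described above using C11 atomics: the waking path adds to an atomic counter, and the owner folds it into the blocked sum while holding its lock (the lock itself is omitted here). Struct and function names are hypothetical, and the clamp at zero is a defensive choice of this sketch:

```c
#include <stdatomic.h>

struct removed_sketch {
    long blocked_load;        /* owned by the cpu, updated under its lock */
    atomic_long removed_load; /* may be added to from any cpu, no lock */
};

/* Waking path, possibly on another cpu: record load to remove later. */
static void note_removed_load(struct removed_sketch *rq, long contrib)
{
    atomic_fetch_add(&rq->removed_load, contrib);
}

/* Owner, under its lock: take all pending removals in one exchange. */
static void fold_removed_load(struct removed_sketch *rq)
{
    long removed = atomic_exchange(&rq->removed_load, 0);

    rq->blocked_load -= removed;
    if (rq->blocked_load < 0)
        rq->blocked_load = 0;
}
```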
Paul Turner	0a74bef8be	sched: Add an rq migration call-back to sched_class Since we are now doing bottom up load accumulation we need explicit notification when a task has been re-parented so that the old hierarchy can be updated. Adds: migrate_task_rq(struct task_struct *p, int next_cpu) (The alternative is to do this out of __set_task_cpu, but it was suggested that this would be a cleaner encapsulation.) Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.660023400@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-24 10:27:23 +02:00
Paul Turner	9ee474f556	sched: Maintain the load contribution of blocked entities We are currently maintaining: runnable_load(cfs_rq) = \Sum task_load(t), for all running children t of cfs_rq. While this can be naturally updated for tasks in a runnable state (as they are scheduled), this does not account for the load contributed by blocked task entities. This can be solved by introducing a separate accounting for blocked load: blocked_load(cfs_rq) = \Sum runnable(b) * weight(b) Obviously we do not want to iterate over all blocked entities to account for their decay; we instead observe that: runnable_load(t) = \Sum p_i * y^i and that to account for an additional idle period we only need to compute: y * runnable_load(t). This means that we can decay the contributions of all blocked entities at once by evaluating: blocked_load'(cfs_rq) = y * blocked_load(cfs_rq) Finally, we maintain a decay counter so that when a sleeping entity re-awakens we can determine how much of its load should be removed from the blocked sum. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.585389902@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-24 10:27:22 +02:00
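The observation above can be sketched with doubles: one multiply per idle period decays the whole blocked sum, and a per-cfs_rq decay counter tells a waking entity how many periods its own contribution has decayed. Names are hypothetical, and y is taken as 2^(-1/32) to match the k = 32 half-life used by the per-entity tracking:

```c
#define LOAD_AVG_Y 0.978572062 /* y ~= 2^(-1/32), so y^32 ~= 1/2 */

struct blocked_sketch {
    double blocked_load;         /* \Sum runnable(b) * weight(b) */
    unsigned long decay_counter; /* idle periods decayed so far */
};

/* One period passes: scale the whole sum instead of each entity. */
static void decay_blocked(struct blocked_sketch *rq)
{
    rq->blocked_load *= LOAD_AVG_Y;
    rq->decay_counter++;
}

/* Decay a single contribution by y^periods. */
static double decay_n(double contrib, unsigned long periods)
{
    while (periods--)
        contrib *= LOAD_AVG_Y;
    return contrib;
}

/* Entity blocks: add its load, remember when it joined the sum. */
static unsigned long block_entity(struct blocked_sketch *rq, double contrib)
{
    rq->blocked_load += contrib;
    return rq->decay_counter;
}

/* Entity wakes: remove exactly what is left of its contribution. */
static double wake_entity(struct blocked_sketch *rq, double contrib,
                          unsigned long blocked_at)
{
    double remaining = decay_n(contrib, rq->decay_counter - blocked_at);

    rq->blocked_load -= remaining;
    return remaining;
}
```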
Paul Turner	2dac754e10	sched: Aggregate load contributed by task entities on parenting cfs_rq For a given task t, we can compute its contribution to load as: task_load(t) = runnable_avg(t) * weight(t) On a parenting cfs_rq we can then aggregate: runnable_load(cfs_rq) = \Sum task_load(t), for all runnable children t Maintain this bottom up, with task entities adding their contributed load to the parenting cfs_rq sum. When a task entity's load changes we add the same delta to the maintained sum. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.514678907@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-24 10:27:21 +02:00
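A sketch of the delta-based bottom-up maintenance described above; names are hypothetical, and loads are doubles rather than fixed-point:

```c
struct task_sketch {
    double runnable_avg; /* fraction of time runnable, 0..1 */
    double weight;       /* task weight */
    double contrib;      /* contribution last added to the parent */
};

struct cfs_rq_load {
    double runnable_load; /* \Sum task_load(t) over runnable children */
};

/* task_load(t) = runnable_avg(t) * weight(t); push only the change. */
static void update_task_contrib(struct cfs_rq_load *rq, struct task_sketch *t)
{
    double new_contrib = t->runnable_avg * t->weight;

    rq->runnable_load += new_contrib - t->contrib;
    t->contrib = new_contrib;
}
```

Propagating the delta keeps each update O(1) in the number of siblings, since the parent never has to re-sum its children.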
Ben Segall	18bf2805d9	sched: Maintain per-rq runnable averages Since runqueues do not have a corresponding sched_entity we instead embed a sched_avg structure directly. Signed-off-by: Ben Segall <bsegall@google.com> Reviewed-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.442637130@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-24 10:27:20 +02:00
Paul Turner	9d85f21c94	sched: Track the runnable average on a per-task entity basis Instead of tracking an averaged load for everything parented by a cfs_rq, we can track entity load directly, with the load for a given cfs_rq then being the sum of its children. To do this we represent the historical contribution to runnable average within each trailing 1024us of execution as the coefficients of a geometric series. We can express this for a given task t as: runnable_sum(t) = \Sum u_i * y^i, runnable_avg_period(t) = \Sum 1024 * y^i load(t) = weight_t * runnable_sum(t) / runnable_avg_period(t) Where: u_i is the usage in the i'th most recent 1024us period (approximately 1ms), and y is chosen such that y^k = 1/2. We currently choose k to be 32 which roughly translates to about a sched period. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.372695337@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-24 10:27:18 +02:00
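A floating-point sketch of the geometric series above (the kernel uses fixed-point arithmetic). `LOAD_AVG_Y` approximates 2^(-1/32) so that y^32 = 1/2, and the struct and function names are hypothetical:

```c
#define LOAD_AVG_Y 0.978572062 /* y ~= 2^(-1/32), so y^32 ~= 1/2 */

struct entity_avg {
    double runnable_sum; /* \Sum u_i * y^i */
    double avg_period;   /* \Sum 1024 * y^i */
};

/* Close one 1024us period in which the entity was runnable for
 * runnable_us microseconds (0..1024); older periods decay by y. */
static void avg_update(struct entity_avg *a, unsigned runnable_us)
{
    a->runnable_sum = a->runnable_sum * LOAD_AVG_Y + runnable_us;
    a->avg_period = a->avg_period * LOAD_AVG_Y + 1024;
}

/* load(t) = weight_t * runnable_sum(t) / runnable_avg_period(t) */
static double entity_load(const struct entity_avg *a, double weight)
{
    if (a->avg_period == 0)
        return 0;
    return weight * a->runnable_sum / a->avg_period;
}
```

An always-runnable entity converges to its full weight, and a single period of usage is worth half as much 32 periods later.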
Chuansheng Liu	351f181f91	timers, sched: Correct the comments for tick_sched_timer() In the comments of function tick_sched_timer(), the sentence "timer->base->cpu_base->lock held" is not right. In function __run_hrtimer(), before calling timer->function(), cpu_base->lock has already been unlocked. Signed-off-by: liu chuansheng <chuansheng.liu@intel.com> Cc: fei.li@intel.com Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1351098455.15558.1421.camel@cliu38-desktop-build Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-24 10:16:51 +02:00
Daniel Vetter	6b898c07cb	console: use might_sleep in console_lock Instead of BUG_ON(in_interrupt()), since that doesn't check for all the newfangled stuff like preempt. Note that this is valid since the console_sem is essentially used like a real mutex with only two twists: - we allow trylock from hardirq context - across suspend/resume we lock the logical console_lock, but drop the semaphore protecting the locking state. Now that doesn't guarantee that no one is playing tricks in single-thread atomic contexts at suspend/resume/boot time, but - I couldn't find anything suspicious with some grepping, - might_sleep shouldn't die, - and I think the upside of catching more potential issues is worth the risk of getting a might_sleep backtrace that would have been safe (and then dealing with that fallout). Cc: Dave Airlie <airlied@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-10-23 20:14:55 -07:00
Linus Torvalds	e17b131583	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Most of these are uprobes race fixes from Oleg, and their preparatory cleanups. (It's larger than what I'd normally send for an -rc kernel, but they looked significant enough to not delay them.) There's also an oprofile fix and an uncore PMU fix." * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) perf/x86: Disable uncore on virtualized CPUs oprofile, x86: Fix wrapping bug in op_x86_get_ctrl() ring-buffer: Check for uninitialized cpu buffer before resizing uprobes: Fix the racy uprobe->flags manipulation uprobes: Fix prepare_uprobe() race with itself uprobes: Introduce prepare_uprobe() uprobes: Fix handle_swbp() vs unregister() + register() race uprobes: Do not delete uprobe if uprobe_unregister() fails uprobes: Don't return success if alloc_uprobe() fails uprobes/x86: Only rep+nop can be emulated correctly uprobes: Simplify is_swbp_at_addr(), remove stale comments uprobes: Kill set_orig_insn()->is_swbp_at_addr() uprobes: Introduce copy_opcode(), kill read_opcode() uprobes: Kill set_swbp()->is_swbp_at_addr() uprobes: Restrict valid_vma(false) to skip VM_SHARED vmas uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC uprobes: Change write_opcode() to use FOLL_FORCE uprobes: Move clear_thread_flag(TIF_UPROBE) to uprobe_notify_resume() uprobes: Kill UTASK_BP_HIT state uprobes: Fix UPROBE_SKIP_SSTEP checks in handle_swbp() ...	2012-10-24 04:07:51 +03:00
Paul E. McKenney	53bb857c37	rcu: Dump number of callbacks in stall warning messages In theory, if a grace period manages to get started despite there being no callbacks on any of the CPUs, all CPUs could go into dyntick-idle mode, so that the grace period would never end. This commit updates the RCU CPU stall warning messages to detect this condition by summing up the number of callbacks on all CPUs. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-10-23 14:55:27 -07:00
Paul E. McKenney	eee0588261	rcu: Add grace-period information to RCU CPU stall warnings This commit causes the last grace period started and completed to be printed on RCU CPU stall warning messages in order to aid diagnosis. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-10-23 14:55:26 -07:00
Paul E. McKenney	b637a328bd	rcu: Print remote CPU's stacks in stall warnings The RCU CPU stall warnings rely on trigger_all_cpu_backtrace() to do NMI-based dump of the stack traces of all CPUs. Unfortunately, a number of architectures do not implement trigger_all_cpu_backtrace(), in which case RCU falls back to just dumping the stack of the running CPU. This is unhelpful in the case where the running CPU has detected that some other CPU has stalled. This commit therefore makes the running CPU dump the stacks of the tasks running on the stalled CPUs. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-10-23 14:55:25 -07:00
Lai Jiangshan	f2ebfbc991	srcu: Export process_srcu() Because process_srcu() will be used in DEFINE_SRCU(), which is a macro that could be expanded pretty much anywhere, it can no longer be static. Note that process_srcu() is still internal to srcu.h. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-10-23 14:54:42 -07:00
Lai Jiangshan	4e87b2d7e8	srcu: Credit Lai Jiangshan with SRCU rewrite Lai Jiangshan rewrote SRCU, so this commit ensures that he gets his proper share of blame^Wcredit. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-10-23 14:54:41 -07:00
Paul E. McKenney	340f588bba	rcu: Fix precedence error in cpu_needs_another_gp() The fix introduced by a10d206e (rcu: Fix day-one dyntick-idle stall-warning bug) has a C-language precedence error. It turns out that this error is harmless in that the same result is computed for all inputs, but the code is nevertheless a potential source of confusion. This commit therefore introduces parentheses in order to force the execution of the code to reflect the intent. Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-10-23 14:54:09 -07:00
Antti P Miettinen	3705b88db0	rcu: Add a module parameter to force use of expedited RCU primitives There have been some embedded applications that would benefit from use of expedited grace-period primitives. In some ways, this is similar to synchronize_net() doing either a normal or an expedited grace period depending on lock state, but with control outside of the kernel. This commit therefore adds rcu_expedited boot and sysfs parameters that cause the kernel to substitute expedited primitives for the normal grace-period primitives. [ paulmck: Add trace/event/rcu.h to kernel/srcu.c to avoid build error. Get rid of infinite loop through contention path.] Signed-off-by: Antti P Miettinen <amiettinen@nvidia.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-10-23 14:54:08 -07:00
Frederic Weisbecker	4d9a5d4319	rcu: Remove rcu_switch() It's only there to call rcu_user_hooks_switch(). Let's just call rcu_user_hooks_switch() directly, we don't need this function in the middle. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-10-23 14:54:06 -07:00
Paul E. McKenney	489832609a	rcu: Make rcutorture give diagnostics if CPU offline fails This commit causes rcutorture to print the errno if cpu_down() fails when the rcutorture "verbose" module parameter is specified. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-10-23 14:46:47 -07:00
Paul E. McKenney	abfd6e58ae	rcu: Fix comment about _rcu_barrier()/orphanage exclusion In the old days, _rcu_barrier() acquired ->onofflock to exclude rcu_send_cbs_to_orphanage(), which allowed the latter to avoid memory barriers in callback handling. However, _rcu_barrier() recently started doing get_online_cpus() to lock out CPU-hotplug operations entirely, which means that the comment in rcu_send_cbs_to_orphanage() that talks about ->onofflock is now obsolete. This commit therefore fixes the comment. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-10-23 14:46:47 -07:00
Daniel Vetter	daee779718	console: implement lockdep support for console_lock Dave Airlie recently discovered a locking bug in the fbcon layer, where a timer_del_sync (for the blinking cursor) deadlocks with the timer itself, since both (want to) hold the console_lock: https://lkml.org/lkml/2012/8/21/36 Unfortunately the console_lock isn't a plain mutex and hence has no lockdep support. This resulted in a few days wasted tracking down this bug (complicated by the fact that printk doesn't show anything when the console is locked) instead of noticing the bug much earlier with the lockdep splat. Hence I've figured I need to fix that for the next deadlock involving console_lock - and with kms/drm growing ever more complex locking that'll eventually happen. Now the console_lock has rather funky semantics, so after a quick irc discussion with Thomas Gleixner and Dave Airlie I've quickly ditched the original idea of switching to a real mutex (since it won't work) and instead opted to annotate the console_lock with lockdep information manually. There are a few special cases: - The console_lock state is protected by the console_sem, and usually grabbed/dropped at _lock/_unlock time. But the suspend/resume code drops the semaphore without dropping the console_lock (see suspend_console/resume_console). But since the same thread that did the suspend will do the resume, we don't need to fix up anything. - In the printk code there's a special trylock, only used to kick off the logbuffer printk'ing in console_unlock. But all that happens while lockdep is disabled (since printk does a few other evil tricks). So no issue there, either. - The console_lock can also be acquired from irq context (but only with a trylock). lockdep already handles that. This all leaves us with annotating the normal console_lock, _unlock and _trylock functions. 
And yes, it works - simply unloading a drm kms driver resulted in lockdep complaining about the deadlock in fbcon_deinit: ====================================================== [ INFO: possible circular locking dependency detected ] 3.6.0-rc2+ #552 Not tainted ------------------------------------------------------- kms-reload/3577 is trying to acquire lock: ((&info->queue)){+.+...}, at: [<ffffffff81058c70>] wait_on_work+0x0/0xa7 but task is already holding lock: (console_lock){+.+.+.}, at: [<ffffffff81264686>] bind_con_driver+0x38/0x263 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (console_lock){+.+.+.}: [<ffffffff81087440>] lock_acquire+0x95/0x105 [<ffffffff81040190>] console_lock+0x59/0x5b [<ffffffff81209cb6>] fb_flashcursor+0x2e/0x12c [<ffffffff81057c3e>] process_one_work+0x1d9/0x3b4 [<ffffffff810584a2>] worker_thread+0x1a7/0x24b [<ffffffff8105ca29>] kthread+0x7f/0x87 [<ffffffff813b1204>] kernel_thread_helper+0x4/0x10 -> #0 ((&info->queue)){+.+...}: [<ffffffff81086cb3>] __lock_acquire+0x999/0xcf6 [<ffffffff81087440>] lock_acquire+0x95/0x105 [<ffffffff81058cab>] wait_on_work+0x3b/0xa7 [<ffffffff81058dd6>] __cancel_work_timer+0xbf/0x102 [<ffffffff81058e33>] cancel_work_sync+0xb/0xd [<ffffffff8120a3b3>] fbcon_deinit+0x11c/0x1dc [<ffffffff81264793>] bind_con_driver+0x145/0x263 [<ffffffff81264a45>] unbind_con_driver+0x14f/0x195 [<ffffffff8126540c>] store_bind+0x1ad/0x1c1 [<ffffffff8127cbb7>] dev_attr_store+0x13/0x1f [<ffffffff8116d884>] sysfs_write_file+0xe9/0x121 [<ffffffff811145b2>] vfs_write+0x9b/0xfd [<ffffffff811147b7>] sys_write+0x3e/0x6b [<ffffffff813b0039>] system_call_fastpath+0x16/0x1b other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(console_lock); lock((&info->queue)); lock(console_lock); lock((&info->queue)); * DEADLOCK * v2: Mark the lockdep_map static, noticed by Jani Nikula. 
Cc: Dave Airlie <airlied@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-10-22 16:12:20 -07:00
Rafael J. Wysocki	5efbe4279f	PM / QoS: Introduce request and constraint data types for PM QoS flags Introduce struct pm_qos_flags_request and struct pm_qos_flags representing PM QoS flags request type and PM QoS flags constraint type, respectively. With these definitions the data structures will be arranged so that the list member of a struct pm_qos_flags object will contain the head of a list of struct pm_qos_flags_request objects representing all of the "flags" requests present for the given device. Then, the effective_flags member of a struct pm_qos_flags object will contain the bitwise OR of the flags members of all the struct pm_qos_flags_request objects in the list. Additionally, introduce helper function pm_qos_update_flags() allowing the caller to manage the list of struct pm_qos_flags_request pointed to by the list member of struct pm_qos_flags. The flags are of type s32 so that the request's "value" field is always of the same type regardless of what kind of request it is (latency requests already have value fields of type s32). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Jean Pihet <j-pihet@ti.com> Acked-by: mark gross <markgross@thegnar.org>	2012-10-23 01:07:46 +02:00
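A simplified sketch of the arrangement described above, using a singly linked list instead of the kernel's list types. The struct and function names are hypothetical stand-ins for struct pm_qos_flags_request, struct pm_qos_flags and pm_qos_update_flags():

```c
#include <stddef.h>
#include <stdint.h>

struct flags_request {
    int32_t flags;              /* s32, like latency request values */
    struct flags_request *next;
};

struct flags_constraint {
    struct flags_request *list; /* all "flags" requests for a device */
    int32_t effective_flags;    /* bitwise OR of every request's flags */
};

static void recompute_effective(struct flags_constraint *c)
{
    int32_t val = 0;

    for (struct flags_request *r = c->list; r; r = r->next)
        val |= r->flags;
    c->effective_flags = val;
}

static void add_request(struct flags_constraint *c, struct flags_request *r)
{
    r->next = c->list;
    c->list = r;
    recompute_effective(c);
}

static void remove_request(struct flags_constraint *c, struct flags_request *r)
{
    struct flags_request **pp = &c->list;

    while (*pp && *pp != r)
        pp = &(*pp)->next;
    if (*pp)
        *pp = r->next;
    recompute_effective(c);
}
```

Because OR has no inverse, removing a request requires recomputing from the remaining list rather than subtracting.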
Randy Dunlap	0390c88356	module_signing: fix printk format warning Fix the warning: kernel/module_signing.c:195:2: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'size_t' by using the proper 'z' modifier for printing a size_t. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-22 08:56:34 +03:00
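A userspace illustration of the fix: `%zu` is the C99 length modifier for size_t, so the format matches whether size_t is unsigned int (typical 32-bit targets) or unsigned long. The helper name and message are hypothetical:

```c
#include <stddef.h>
#include <stdio.h>

/* "%lu" would mismatch on targets where size_t is unsigned int;
 * the 'z' modifier always matches size_t. */
static int format_size(char *buf, size_t len, size_t n)
{
    return snprintf(buf, len, "sig size %zu bytes", n);
}
```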
Ingo Molnar	ef8ff74ed8	Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent Pull ftrace ring-buffer resizing fix from Steve Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-21 19:53:34 +02:00
Ingo Molnar	f38787f4f9	Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/urgent Pull various uprobes bugfixes from Oleg Nesterov - mostly race and failure path fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-21 18:18:17 +02:00
Ingo Molnar	0acfd009be	Merge branch 'nohz/core' of git://github.com/fweisbec/linux-dynticks into timers/core Pull uncontroversial cleanup/refactoring nohz patches from Frederic Weisbecker. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-21 18:14:02 +02:00
Tejun Heo	ead5c47371	cgroup_freezer: don't use cgroup_lock_live_group() freezer_read/write() used cgroup_lock_live_group() to synchronize against task migration into and out of the target cgroup. cgroup_lock_live_group() grabs the internal cgroup lock and using it from outside cgroup core leads to complex and fragile locking dependency issues which are difficult to resolve. Now that freezer_can_attach() is replaced with freezer_attach() and update_if_frozen() updated, nothing requires excluding migration against freezer state reads and changes. This patch removes cgroup_lock_live_group() and the matching cgroup_unlock() usages. The prone-to-bitrot, already outdated and unnecessary global lock hierarchy documentation is replaced with documentation in local scope. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Li Zefan <lizefan@huawei.com>	2012-10-20 16:33:12 -07:00
Tejun Heo	b4d18311d3	cgroup_freezer: prepare update_if_frozen() for locking change Locking will change such that migration can happen while freezer_read/write() is in progress. This means that update_if_frozen() can no longer assume that all tasks in the cgroup conform to the current freezer state - newly migrated tasks which haven't finished freezer_attach() yet might be in any state. This patch updates update_if_frozen() such that it no longer verifies task states against freezer state. It now simply decides whether FREEZING stage is complete. This removal of verification makes it meaningless to call from freezer_change_state(). Drop it and move the fast exit test from freezer_read() - the only left caller - to update_if_frozen(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Li Zefan <lizefan@huawei.com>	2012-10-20 16:33:08 -07:00
Tejun Heo	8755ade683	cgroup_freezer: allow moving tasks in and out of a frozen cgroup cgroup_freezer is one of the few users of cgroup_subsys->can_attach() and uses it to prevent tasks from being migrated into or out of a frozen cgroup. This makes cgroup_freezer cumbersome to use especially when co-mounted with other controllers. ->can_attach() is problematic in general as it can make co-mounting multiple cgroups difficult - migrating tasks may fail for reasons completely irrelevant for other controllers. freezer_can_attach() in particular is more problematic because it messes with cgroup internal locking to ensure that the state verification performed at freezer_can_attach() stays valid until migration is complete. This patch replaces freezer_can_attach() with freezer_attach() so that tasks are always allowed to migrate - they are nudged into the conforming state from freezer_attach(). This means that there can be tasks which are being migrated which don't conform to the current cgroup_freezer state until freezer_attach() is complete. Under the current locking scheme, the only such place is freezer_fork() which is updated to handle such window. While this patch doesn't remove the use of internal cgroup locking from freezer_read/write() paths, it removes the requirement to keep the freezer state constant while migrating and enables such change. Note that this creates a userland visible behavior change - FROZEN cgroup can no longer be used to lock migrations in and out of the cgroup. This behavior change is intended. I don't think the feature is necessary - userland should coordinate accesses to cgroup fs anyway - and even if the feature is needed cgroup_freezer is the completely wrong place to implement it. Signed-off-by: Tejun Heo <tj@kernel.org> LKML-Reference: <1350426526-14254-1-git-send-email-tj@kernel.org> Cc: Matt Helsley <matthltc@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rafael J. 
Wysocki <rjw@sisk.pl> Cc: Li Zefan <lizefan@huawei.com>	2012-10-20 16:28:56 -07:00
Paul E. McKenney	62da192129	rcu: Accelerate callbacks for CPU initiating a grace period Because grace-period initialization is carried out by a separate kthread, it might happen on a different CPU than the one that had the callback needing a grace period -- which is where the callback acceleration needs to happen. Fortunately, rcu_start_gp() holds the root rcu_node structure's ->lock, which prevents a new grace period from starting. This allows this function to safely determine that a grace period has not yet started, which in turn allows it to fully accelerate any callbacks that it has pending. This commit adds this acceleration. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-10-20 13:47:10 -07:00
Kees Cook	31fd84b95e	use clamp_t in UNAME26 fix The min/max call needed to have explicit types on some architectures (e.g. mn10300). Use clamp_t instead to avoid the warning: kernel/sys.c: In function 'override_release': kernel/sys.c:1287:10: warning: comparison of distinct pointer types lacks a cast [enabled by default] Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-19 18:51:17 -07:00
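A portable sketch of what a clamp_t-style macro does: every operand is first converted to one explicit type, so comparing, say, an int length against a size_t bound no longer mixes types. `CLAMP_T` is a hypothetical name, and this portable version evaluates its arguments more than once:

```c
#include <stddef.h>

/* Convert val, lo and hi to `type`, then clamp val into [lo, hi].
 * Note: a negative value cast to an unsigned type becomes huge and
 * therefore clamps to hi. Arguments are evaluated more than once. */
#define CLAMP_T(type, val, lo, hi)                     \
    ((type)(val) < (type)(lo) ? (type)(lo) :           \
     ((type)(val) > (type)(hi) ? (type)(hi) : (type)(val)))
```

For example, `CLAMP_T(size_t, len, 1, sizeof(buf))` compares an int `len` against a size_t bound with both sides already converted to size_t.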
David Howells	caabe24057	MODSIGN: Move the magic string to the end of a module and eliminate the search Emit the magic string that indicates a module has a signature after the signature data instead of before it. This allows module_sig_check() to be made simpler and faster by the elimination of the search for the magic string. Instead we just need to do a single memcmp(). This works because at the end of the signature data there is the fixed-length signature information block. This block then falls immediately prior to the magic number. From the contents of the information block, it is trivial to calculate the size of the signature data and thus the size of the actual module data. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-19 17:30:40 -07:00
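A sketch of why the trailing marker makes the check cheap: with the magic at a fixed offset from the end, one memcmp() decides whether a signature is present, and a fixed-length block just before it yields the signature size. The marker below is assumed to match the kernel's MODULE_SIG_STRING; the 4-byte big-endian length trailer is a hypothetical simplification of the real information block:

```c
#include <stddef.h>
#include <string.h>

/* Assumed to match the kernel's MODULE_SIG_STRING. */
#define SIG_MAGIC "~Module signature appended~\n"

/* One memcmp at a known offset; on success, drop the magic from *len. */
static int has_sig_magic(const unsigned char *mod, size_t *len)
{
    const size_t mlen = sizeof(SIG_MAGIC) - 1;

    if (*len < mlen || memcmp(mod + *len - mlen, SIG_MAGIC, mlen) != 0)
        return 0;
    *len -= mlen;
    return 1;
}

/* Hypothetical layout: [module][signature][4-byte BE sig_len][magic]. */
static int split_module(const unsigned char *mod, size_t len,
                        size_t *mod_len, size_t *sig_len)
{
    size_t s;

    if (!has_sig_magic(mod, &len) || len < 4)
        return 0;
    len -= 4;
    s = ((size_t)mod[len] << 24) | ((size_t)mod[len + 1] << 16) |
        ((size_t)mod[len + 2] << 8) | (size_t)mod[len + 3];
    if (s > len)
        return 0;
    *sig_len = s;
    *mod_len = len - s;
    return 1;
}
```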
Tejun Heo	d878383211	Revert "cgroup: Remove task_lock() from cgroup_post_fork()" This reverts commit 7e3aa30ac8c904a706518b725c451bb486daaae9. The commit incorrectly assumed that fork path always performed threadgroup_change_begin/end() and depended on that for synchronization against task exit and cgroup migration paths instead of explicitly grabbing task_lock(). threadgroup_change is not locked when forking a new process (as opposed to a new thread in the same process) and even if it were it wouldn't be effective as different processes use different threadgroup locks. Revert the incorrect optimization. Signed-off-by: Tejun Heo <tj@kernel.org> LKML-Reference: <20121008020000.GB2575@localhost> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: stable@vger.kernel.org	2012-10-19 14:09:35 -07:00
Tejun Heo	9bb71308b8	Revert "cgroup: Drop task_lock(parent) on cgroup_fork()" This reverts commit 7e381b0eb1e1a9805c37335562e8dc02e7d7848c. The commit incorrectly assumed that fork path always performed threadgroup_change_begin/end() and depended on that for synchronization against task exit and cgroup migration paths instead of explicitly grabbing task_lock(). threadgroup_change is not locked when forking a new process (as opposed to a new thread in the same process) and even if it were it wouldn't be effective as different processes use different threadgroup locks. Revert the incorrect optimization. Signed-off-by: Tejun Heo <tj@kernel.org> LKML-Reference: <20121008020000.GB2575@localhost> Acked-by: Li Zefan <lizefan@huawei.com> Bitterly-Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: stable@vger.kernel.org	2012-10-19 14:08:49 -07:00
Cyrill Gorcunov	bbc2e3ef87	pidns: remove recursion from free_pid_ns() free_pid_ns() operates in a recursive fashion: free_pid_ns(parent) -> put_pid_ns(parent) -> kref_put(&ns->kref, free_pid_ns) -> free_pid_ns(). Thus, with deeply nested namespaces, userspace may trigger an avalanche of free_pid_ns() calls, exhausting the kernel stack and eventually causing a panic. This patch turns the recursion into an iterative loop. Based on a patch by Andrew Vagin. [akpm@linux-foundation.org: export put_pid_ns() to modules] Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-19 14:07:47 -07:00
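The recursion-to-iteration change can be illustrated with a toy reference-counted parent chain. This is a sketch under stated assumptions. `struct ns`, `ns_create()` and `put_ns()` are hypothetical stand-ins for struct pid_namespace and its helpers, and the reference count is a plain non-atomic int rather than a kref. Each child holds one reference on its parent, so dropping the last reference on a leaf walks upward in a loop instead of recursing. Stack usage therefore stays constant regardless of nesting depth.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pid_namespace. */
struct ns {
    int refcount;       /* simplified, non-atomic reference count */
    struct ns *parent;  /* each child holds one reference on its parent */
};

static size_t freed_count;  /* instrumentation for this example only */

/* Create a child namespace, taking a reference on the parent. */
static struct ns *ns_create(struct ns *parent)
{
    struct ns *ns = malloc(sizeof(*ns));

    if (!ns)
        return NULL;
    ns->refcount = 1;
    ns->parent = parent;
    if (parent)
        parent->refcount++;
    return ns;
}

/* Drop a reference; walk upward in a loop instead of recursing. */
static void put_ns(struct ns *ns)
{
    while (ns) {
        struct ns *parent = ns->parent;

        if (--ns->refcount > 0)
            break;      /* still in use: ancestors stay alive */
        free(ns);
        freed_count++;
        ns = parent;    /* release the reference ns held on parent */
    }
}
```

For example, if you build a chain 100000 levels deep and then drop the leaf, every level is freed without the call stack growing.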
Kees Cook	2702b1526c	kernel/sys.c: fix stack memory content leak via UNAME26 Calling uname() with the UNAME26 personality set allows a leak of kernel stack contents. This fixes it by defensively calculating the length of copy_to_user() call, making the len argument unsigned, and initializing the stack buffer to zero (now technically unneeded, but hey, overkill). CVE-2012-0957 Reported-by: PaX Team <pageexec@freemail.hu> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: PaX Team <pageexec@freemail.hu> Cc: Brad Spengler <spender@grsecurity.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-19 14:07:47 -07:00
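The defensive-length pattern from the UNAME26 fix can be sketched in userspace C. This is not the kernel's override_release(). `emit_fake_release()` is a hypothetical helper, `dst` and memcpy() stand in for the user buffer and copy_to_user(), and the version arguments are placeholders. The key points follow the commit message. The stack buffer is zero-initialized, the length is unsigned and clamped to the buffer size, and only the formatted string plus its terminator is copied out, rather than blindly copying `len` bytes of a partly filled buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "2.6.<v><rest>" into dst (capacity len).
 * Returns the number of bytes copied, including the terminator. */
static size_t emit_fake_release(char *dst, size_t len, unsigned v,
                                const char *rest)
{
    char buf[65] = { 0 };   /* zeroed: no stale stack bytes can leak */
    size_t copy;
    int n;

    if (len == 0)
        return 0;
    copy = len < sizeof(buf) ? len : sizeof(buf);  /* clamp to buffer */
    n = snprintf(buf, copy, "2.6.%u%s", v, rest);
    if (n < 0)
        return 0;
    /* snprintf() returns the untruncated length; compute what was stored. */
    if ((size_t)n >= copy)
        n = (int)(copy - 1);
    memcpy(dst, buf, (size_t)n + 1);  /* string plus terminator only */
    return (size_t)n + 1;
}
```

The clamp is what matters. Even if `len` is larger than the formatted string, the bytes copied out are bounded by what was actually written into `buf`.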
Paul E. McKenney	85eae82a08	printk: Fix scheduling-while-atomic problem in console_cpu_notify() The console_cpu_notify() function runs with interrupts disabled in the CPU_DYING case. It therefore cannot block, for example, as will happen when it calls console_lock(). Therefore, remove the CPU_DYING leg of the switch statement to avoid this problem. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-16 18:17:44 -07:00
Daisuke Nishimura	1f5320d597	cgroup: notify_on_release may not be triggered in some cases notify_on_release must be triggered when the last process in a cgroup is moved to another. But if the first (and only) process in a cgroup is moved to another, notify_on_release is not triggered. # mkdir /cgroup/cpu/SRC # mkdir /cgroup/cpu/DST # # echo 1 >/cgroup/cpu/SRC/notify_on_release # echo 1 >/cgroup/cpu/DST/notify_on_release # # sleep 300 & [1] 8629 # # echo 8629 >/cgroup/cpu/SRC/tasks # echo 8629 >/cgroup/cpu/DST/tasks -> notify_on_release for /SRC must be triggered at this point, but it isn't. This is because put_css_set() is called before setting CGRP_RELEASABLE in cgroup_task_migrate(). It is a regression introduced by commit 74a1166d (cgroups: make procs file writable), which was merged into v3.0. Cc: Ben Blum <bblum@andrew.cmu.edu> Cc: <stable@vger.kernel.org> # v3.0.x and later Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-10-16 17:09:36 -07:00
Tejun Heo	3c426d5e11	cgroup_freezer: don't stall transition to FROZEN for PF_NOFREEZE or PF_FREEZER_SKIP tasks cgroup_freezer doesn't transition from FREEZING to FROZEN if the cgroup contains PF_NOFREEZE tasks or tasks sleeping with PF_FREEZER_SKIP set. Only kernel tasks can be non-freezable (PF_NOFREEZE) and there's nothing cgroup_freezer or userland can do about or to it. It's pointless to stall the transition for PF_NOFREEZE tasks. PF_FREEZER_SKIP indicates that the task can be skipped when determining whether frozen state is reached. A task with PF_FREEZER_SKIP is guaranteed to perform try_to_freeze() after it wakes up and can be considered frozen much like stopped or traced tasks. Note that a vfork parent uses PF_FREEZER_SKIP while waiting for the child. This updates update_if_frozen() such that it only considers freezable tasks and treats %true freezer_should_skip() tasks as frozen. This allows cgroups with kthreads and vfork parents to successfully reach the FROZEN state. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rafael J. Wysocki <rjw@sisk.pl>	2012-10-16 15:03:14 -07:00
Tejun Heo	51f246ed95	cgroup_freezer: make it official that writes to freezer.state don't fail try_to_freeze_cgroup() has condition checks which are intended to fail the write operation to freezer.state if there are tasks which can't be frozen. The condition checks have been broken for quite some time now. freeze_task() returns %false if the target task can't be frozen, so num_cant_freeze_now is never incremented. In addition, strangely, cgroup freezing proceeds even after the write fails, which is rather broken. This patch rips out the non-working code intended to fail the write to freezer.state when the cgroup contains non-freezable tasks and makes it official that writes to freezer.state succeed whether there are non-freezable tasks in the cgroup or not. This leaves is_task_frozen_enough() with only one user - update_if_frozen(). Collapse it into the caller. Note that this removes an extra call to freezing(). This doesn't cause any userland behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rafael J. Wysocki <rjw@sisk.pl>	2012-10-16 15:03:14 -07:00
Tejun Heo	5edee61ede	cgroup: cgroup_subsys->fork() should be called after the task is added to css_set cgroup core has a bug which violates a basic rule about event notifications - when a new entity needs to be added, you add that to the notification list first and then make the new entity conform to the current state. If done in the reverse order, an event happening in between will be lost. cgroup_subsys->fork() is invoked way before the new task is added to the css_set. Currently, cgroup_freezer is the only user of ->fork() and uses it to make new tasks conform to the current state of the freezer. If FROZEN state is requested while fork is in progress between cgroup_fork_callbacks() and cgroup_post_fork(), the child could escape freezing - the cgroup isn't frozen when ->fork() is called and the freezer couldn't see the new task on the css_set. This patch moves cgroup_subsys->fork() invocation to cgroup_post_fork() after the new task is added to the css_set. cgroup_fork_callbacks() is removed. Because now a task may be migrated during cgroup_subsys->fork(), freezer_fork() is updated so that it adheres to the usual RCU locking and the rather pointless comment on why locking can be different there is removed (if it doesn't make anything simpler, why even bother?). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: stable@vger.kernel.org	2012-10-16 15:03:14 -07:00
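The ordering rule in this commit is to join the notification list first and then conform to the current state. A deterministic, single-threaded toy model can show why it matters. Everything here is hypothetical and is not cgroup code: `struct freezer`, `freeze_all()` and the two fork functions are illustrative only. The `mid` callback injects a freeze request between the two steps to model an event arriving while fork is in progress.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8

struct task { bool frozen; };

struct freezer {
    bool freezing;                 /* current requested state */
    struct task *tasks[MAX_TASKS]; /* the "notification list" */
    size_t ntasks;
};

/* A freeze request: set the state, then freeze every listed task. */
static void freeze_all(struct freezer *f)
{
    f->freezing = true;
    for (size_t i = 0; i < f->ntasks; i++)
        f->tasks[i]->frozen = true;
}

/* Hook run between the two fork steps to model a concurrent event. */
typedef void (*interleave_fn)(struct freezer *);

/* Buggy order: conform to the current state, then join the list. */
static bool fork_conform_then_add(struct freezer *f, struct task *t,
                                  interleave_fn mid)
{
    if (f->ntasks == MAX_TASKS)
        return false;
    t->frozen = f->freezing;
    if (mid)
        mid(f);                    /* the freeze walk cannot see t yet */
    f->tasks[f->ntasks++] = t;
    return true;
}

/* Fixed order: join the list first, then conform to the current state. */
static bool fork_add_then_conform(struct freezer *f, struct task *t,
                                  interleave_fn mid)
{
    if (f->ntasks == MAX_TASKS)
        return false;
    f->tasks[f->ntasks++] = t;
    if (mid)
        mid(f);                    /* the freeze walk now reaches t */
    if (f->freezing)
        t->frozen = true;
    return true;
}
```

With the reverse order, the injected freeze misses the child, which is the escape the commit describes. With list-first ordering, either the freeze walk finds the task, or the task observes the new state when it conforms.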
Ingo Molnar	8ed92e51f9	sched: Add WAKEUP_PREEMPTION feature flag, on by default As per the recent discussion with Mike and Linus, make it easier to test with/without this feature. No change in default behavior. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-izoxq4haeg4mTognnDbwcevt@git.kernel.org	2012-10-16 10:05:27 +02:00
Frederic Weisbecker	94a5714020	tick: Conditionally build nohz specific code in tick handler This slightly optimizes the high-res tick sched handler. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org>	2012-10-15 18:51:08 +02:00
Frederic Weisbecker	9e8f559b08	tick: Consolidate tick handling for high and low res handlers Besides unifying code, this also adds the idle check before processing idle accounting specifics on the low res handler. This way we also generalize this part of the nohz code for !CONFIG_HIGH_RES_TIMERS to prepare for the adaptive tickless features. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org>	2012-10-15 18:42:25 +02:00

14641 Commits