23430 Commits

Author SHA1 Message Date
Thomas Gleixner
464b5847e6 Merge branch 'irq/urgent' into irq/core
Merge urgent fixes so pending patches for 4.9 can be applied.
2016-09-20 23:20:32 +02:00
Ingo Molnar
b2c16e1efd Merge branch 'linus' into x86/asm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-20 08:29:21 +02:00
Johannes Weiner
d979a39d72 cgroup: duplicate cgroup reference when cloning sockets
When a socket is cloned, the associated sock_cgroup_data is duplicated
but not its reference on the cgroup.  As a result, the cgroup reference
count will underflow when both sockets are destroyed later on.

Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
Link: http://lkml.kernel.org/r/20160914194846.11153-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: <stable@vger.kernel.org>	[4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-19 15:36:17 -07:00
Sebastian Andrzej Siewior
30e92153b4 padata: Convert to hotplug state machine
Install the callbacks via the state machine. CPU-hotplug multinstance support
is used with the nocalls() version. Maybe parts of padata_alloc() could be
moved into the online callback so that we could invoke ->startup callback for
instance and drop get_online_cpus().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-crypto@vger.kernel.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160906170457.32393-14-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-19 21:44:30 +02:00
Marc Zyngier
1984e07591 genirq: Skip chained interrupt trigger setup if type is IRQ_TYPE_NONE
There is no point in trying to configure the trigger of a chained
interrupt if no trigger information has been configured. At best
this is ignored, and at the worse this confuses the underlying
irqchip (which is likely not to handle such a thing), and
unnecessarily alarms the user.

Only apply the configuration if type is not IRQ_TYPE_NONE.

Fixes: 1e12c4a9393b ("genirq: Correctly configure the trigger on chained interrupts")
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/CAMuHMdVW1eTn20=EtYcJ8hkVwohaSuH_yQXrY2MGBEvZ8fpFOg@mail.gmail.com
Link: http://lkml.kernel.org/r/1474274967-15984-1-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-19 11:31:36 +02:00
Wei Yongjun
8a15b81741 cpuset: fix non static symbol warning
Fixes the following sparse warning:

kernel/cpuset.c:2088:6: warning:
 symbol 'cpuset_fork' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-09-16 11:31:17 -04:00
Andy Lutomirski
ac496bf48d fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y
vmalloc() is a bit slow, and pounding vmalloc()/vfree() will eventually
force a global TLB flush.

To reduce pressure on them, if CONFIG_VMAP_STACK=y, cache two thread
stacks per CPU.  This will let us quickly allocate a hopefully
cache-hot, TLB-hot stack under heavy forking workloads (shell script style).

On my silly pthread_create() benchmark, it saves about 2 µs per
pthread_create()+join() with CONFIG_VMAP_STACK=y.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/94811d8e3994b2e962f88866290017d498eb069c.1474003868.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 09:18:54 +02:00
Andy Lutomirski
68f24b08ee sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK
We currently keep every task's stack around until the task_struct
itself is freed.  This means that we keep the stack allocation alive
for longer than necessary and that, under load, we free stacks in
big batches whenever RCU drops the last task reference.  Neither of
these is good for reuse of cache-hot memory, and freeing in batches
prevents us from usefully caching small numbers of vmalloced stacks.

On architectures that have thread_info on the stack, we can't easily
change this, but on architectures that set THREAD_INFO_IN_TASK, we
can free it as soon as the task is dead.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/08ca06cde00ebed0046c5d26cbbf3fbb7ef5b812.1474003868.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 09:18:54 +02:00
Oleg Nesterov
23196f2e5f kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function
get_task_struct(tsk) no longer pins tsk->stack so all users of
to_live_kthread() should do try_get_task_stack/put_task_stack to protect
"struct kthread" which lives on kthread's stack.

TODO: Kill to_live_kthread(), perhaps we can even kill "struct kthread" too,
and rework kthread_stop(), it can use task_work_add() to sync with the exiting
kernel thread.

Message-Id: <20160629180357.GA7178@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/cb9b16bbc19d4aea4507ab0552e4644c1211d130.1474003868.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 09:18:53 +02:00
Ingo Molnar
2d8fbcd13e Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

 - Expedited grace-period changes, most notably avoiding having
   user threads drive expedited grace periods, using a workqueue
   instead.

 - Miscellaneous fixes, including a performance fix for lists
   that was sent with the lists modifications (second URL below).

 - CPU hotplug updates, most notably providing exact CPU-online
   tracking for RCU.  This will in turn allow removal of the
   checks supporting RCU's prior heuristic that was based on the
   assumption that CPUs would take no longer than one jiffy to
   come online.

 - Torture-test updates.

 - Documentation updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 09:08:43 +02:00
Thomas Gleixner
0a30d69195 Merge branch 'irq/for-block' into irq/core
Add the new irq spreading infrastructure.
2016-09-15 20:54:40 +02:00
Andy Lutomirski
c65eacbe29 sched/core: Allow putting thread_info into task_struct
If an arch opts in by setting CONFIG_THREAD_INFO_IN_TASK_STRUCT,
then thread_info is defined as a single 'u32 flags' and is the first
entry of task_struct.  thread_info::task is removed (it serves no
purpose if thread_info is embedded in task_struct), and
thread_info::cpu gets its own slot in task_struct.

This is heavily based on a patch written by Linus.

Originally-from: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a0898196f0476195ca02713691a5037a14f2aac5.1473801993.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-15 08:25:13 +02:00
Ingo Molnar
d4b80afbba Merge branch 'linus' into x86/asm, to pick up recent fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-15 08:24:53 +02:00
Thomas Gleixner
44082fd670 genirq/affinity: Remove old irq spread infrastructure
No more users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: axboe@fb.com
Cc: keith.busch@intel.com
Cc: agordeev@redhat.com
Cc: linux-block@vger.kernel.org
Link: http://lkml.kernel.org/r/1473862739-15032-5-git-send-email-hch@lst.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-14 22:11:09 +02:00
Thomas Gleixner
e75eafb9b0 genirq/msi: Switch to new irq spreading infrastructure
Switch MSI over to the new spreading code. If a pci device contains a valid
pointer to a cpumask, then this mask is used for spreading otherwise the
online cpu mask is used. This allows a driver to restrict the spread to a
subset of CPUs, e.g. cpus on a particular node.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: axboe@fb.com
Cc: keith.busch@intel.com
Cc: agordeev@redhat.com
Cc: linux-block@vger.kernel.org
Link: http://lkml.kernel.org/r/1473862739-15032-4-git-send-email-hch@lst.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-14 22:11:09 +02:00
Thomas Gleixner
34c3d9819f genirq/affinity: Provide smarter irq spreading infrastructure
The current irq spreading infrastructure is just looking at a cpumask and
tries to spread the interrupts over the mask. Thats suboptimal as it does
not take numa nodes into account.

Change the logic so the interrupts are spread across numa nodes and inside
the nodes. If there are more cpus than vectors per node, then we set the
affinity to several cpus. If HT siblings are available we take that into
account and try to set all siblings to a single vector.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: axboe@fb.com
Cc: keith.busch@intel.com
Cc: agordeev@redhat.com
Cc: linux-block@vger.kernel.org
Link: http://lkml.kernel.org/r/1473862739-15032-3-git-send-email-hch@lst.de
2016-09-14 22:11:08 +02:00
Thomas Gleixner
28f4b04143 genirq/msi: Add cpumask allocation to alloc_msi_entry
For irq spreading want to store affinity masks in the msi_entry. Add the
infrastructure for it.

We allocate an array of cpumasks with an array size of the number of used
vectors in the entry, so we can hand in the information per linux interrupt
later.

As we hand in the number of used vectors, we assign them right
away. Convert all the call sites.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: axboe@fb.com
Cc: keith.busch@intel.com
Cc: agordeev@redhat.com
Cc: linux-block@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/1473862739-15032-2-git-send-email-hch@lst.de
2016-09-14 22:11:08 +02:00
Paul E. McKenney
d74b62bc32 Merge branches 'doc.2016.08.22c', 'exp.2016.08.22c', 'fixes.2016.09.14a', 'hotplug.2016.08.22c' and 'torture.2016.08.22c' into HEAD
doc.2016.08.22c: Documentation updates
exp.2016.08.22c: Expedited grace-period updates
fixes.2016.09.14a: Miscellaneous fixes
hotplug.2016.08.22c: CPU-hotplug changes
torture.2016.08.22c: Torture-test changes
2016-09-14 12:58:49 -07:00
Dmitry Safonov
6846351052 x86/signal: Add SA_{X32,IA32}_ABI sa_flags
Introduce new flags that defines which ABI to use on creating sigframe.
Those flags kernel will set according to sigaction syscall ABI,
which set handler for the signal being delivered.

So that will drop the dependency on TIF_IA32/TIF_X32 flags on signal deliver.
Those flags will be used only under CONFIG_COMPAT.

Similar way ARM uses sa_flags to differ in which mode deliver signal
for 26-bit applications (look at SA_THIRYTWO).

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: oleg@redhat.com
Cc: linux-mm@kvack.org
Cc: gorcunov@openvz.org
Cc: xemul@virtuozzo.com
Link: http://lkml.kernel.org/r/20160905133308.28234-7-dsafonov@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-14 21:28:11 +02:00
Thomas Gleixner
16217dc79d First drop of irqchip updates for 4.9
- ACPI IORT core code
 - IORT support for the GICv3 ITS
 - A few of GIC cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJX2VxlAAoJECPQ0LrRPXpD7tMP/i696FakX2xYknlVNaBy5vww
 eH1TLVpItkcfUCHbOyNzJcLcVVRnC4JuaiCphtgDQ0Zm4HJsl8LRFoKF9iBF/Wmt
 CvL5YLfkl6ziMZyPa23a9ApH5gKTY2QzirDAl+noF+N6tSsmz5JCXArW0YFawZJs
 o1tmh/XX0xb9cqB5f/jISxTNF7rPw/Dc1sDY3/p7DUch2TDjuTLOQljnJ2EFb8Mh
 QltBs9EbklYCKaSBVIHXmhAOBCaW8Nwm2BNscgNEQAH1EtjBjGK/aqmFqGzxBWms
 wXr8GHSNhnVsgpOzalG6yJzhtWcj4KNf1utZaNc0L8dT1bVc0yJEEUkxOEbi4pIO
 sst+BWo2FAe/cDyWWxwiSBLaO7M5SCTvsBN25AYMfZw9AU6pw6SMCdbm6BCWSPmh
 YUW7khrXObtNTl+0lroi/mlmIE3sq+UVpQWDzfwsvfWb1kgqIif2Fd6TqU+Jodl5
 pK/UzMHP3YQnAZjcn6Yz4sEWkAGgK8RrIPje1Th0mxi20pyx8VbAucwfSCUi5hza
 9mbaaidnLRvDVX38TNs/LzOejwfW3JKDJUPy9jLg3l07Fgron+jfLtCZOEm9XfJH
 nj/MN3g+yil9OeosJ+6NfzZKonJrfccpkVKkZrNT+ReeM5YFQylgX0OHRQT9eUDo
 z3P5/VhsoTcyqynxMcRp
 =Xhg0
 -----END PGP SIGNATURE-----

Merge tag 'irqchip-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core

Merge the first drop of irqchip updates for 4.9 from Marc Zyngier:

- ACPI IORT core code
- IORT support for the GICv3 ITS
- A few of GIC cleanups
2016-09-14 20:53:26 +02:00
Craig Gallek
ecb3f394c5 genirq: Expose interrupt information through sysfs
Information about interrupts is exposed via /proc/interrupts, but the
format of that file has changed over kernel versions and differs across
architectures. It also has varying column numbers depending on hardware.

That all makes it hard for tools to parse.

To solve this, expose the information through sysfs so each irq attribute
is in a separate file in a consistent, machine parsable way.

This feature is only available when both CONFIG_SPARSE_IRQ and
CONFIG_SYSFS are enabled.

Examples:
  /sys/kernel/irq/18/actions:	i801_smbus,ehci_hcd:usb1,uhci_hcd:usb7
  /sys/kernel/irq/18/chip_name:	IR-IO-APIC
  /sys/kernel/irq/18/hwirq:		18
  /sys/kernel/irq/18/name:		fasteoi
  /sys/kernel/irq/18/per_cpu_count:	0,0
  /sys/kernel/irq/18/type:		level

  /sys/kernel/irq/25/actions:	ahci0
  /sys/kernel/irq/25/chip_name:	IR-PCI-MSI
  /sys/kernel/irq/25/hwirq:		512000
  /sys/kernel/irq/25/name:		edge
  /sys/kernel/irq/25/per_cpu_count:	29036,0
  /sys/kernel/irq/25/type:		edge

[ tglx: Moved kobject_del() under sparse_irq_lock, massaged code comments
  	and changelog ]

Signed-off-by: Craig Gallek <kraig@google.com>
Cc: David Decotigny <decot@google.com>
Link: http://lkml.kernel.org/r/1473783291-122873-1-git-send-email-kraigatgoog@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-14 15:28:15 +02:00
Rafael J. Wysocki
21ca6d2c52 cpufreq: schedutil: Add iowait boosting
Modify the schedutil cpufreq governor to boost the CPU
frequency if the SCHED_CPUFREQ_IOWAIT flag is passed to
it via cpufreq_update_util().

If that happens, the frequency is set to the maximum during
the first update after receiving the SCHED_CPUFREQ_IOWAIT flag
and then the boost is reduced by half during each following update.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Looks-good-to: Steve Muckle <smuckle@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-09-13 23:36:01 +02:00
Rafael J. Wysocki
8c34ab1910 cpufreq / sched: SCHED_CPUFREQ_IOWAIT flag to indicate iowait condition
Testing indicates that it is possible to improve performace
significantly without increasing energy consumption too much by
teaching cpufreq governors to bump up the CPU performance level if
the in_iowait flag is set for the task in enqueue_task_fair().

For this purpose, define a new cpufreq_update_util() flag
SCHED_CPUFREQ_IOWAIT and modify enqueue_task_fair() to pass that
flag to cpufreq_update_util() in the in_iowait case.  That generally
requires cpufreq_update_util() to be called directly from there,
because update_load_avg() may not be invoked in that case.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Looks-good-to: Steve Muckle <smuckle@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-09-13 23:36:01 +02:00
Linus Torvalds
fda67514e4 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
 "A try_to_wake_up() memory ordering race fix causing a busy-loop in
  ttwu()"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Fix a race between try_to_wake_up() and a woken up task
2016-09-13 12:49:40 -07:00
Linus Torvalds
ee319d5834 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "This contains:

   - a set of fixes found by directed-random perf fuzzing efforts by
     Vince Weaver, Alexander Shishkin and Peter Zijlstra

   - a cqm driver crash fix

   - an AMD uncore driver use after free fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix PEBSv3 record drain
  perf/x86/intel/bts: Kill a silly warning
  perf/x86/intel/bts: Fix BTS PMI detection
  perf/x86/intel/bts: Fix confused ordering of PMU callbacks
  perf/core: Fix aux_mmap_count vs aux_refcount order
  perf/core: Fix a race between mmap_close() and set_output() of AUX events
  perf/x86/amd/uncore: Prevent use after free
  perf/x86/intel/cqm: Check cqm/mbm enabled state in event init
  perf/core: Remove WARN from perf_event_read()
2016-09-13 12:47:29 -07:00
Wanpeng Li
57ccdf449f tick/nohz: Prevent stopping the tick on an offline CPU
can_stop_full_tick() has no check for offline cpus. So it allows to stop
the tick on an offline cpu from the interrupt return path, which is wrong
and subsequently makes irq_work_needs_cpu() warn about being called for an
offline cpu.

Commit f7ea0fd639c2c4 ("tick: Don't invoke tick_nohz_stop_sched_tick() if
the cpu is offline") added prevention for can_stop_idle_tick(), but forgot
to do the same in can_stop_full_tick(). Add it.

[ tglx: Massaged changelog ]

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1473245473-4463-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 17:53:52 +02:00
Joonwoo Park
28b89b9e6f cpuset: handle race between CPU hotplug and cpuset_hotplug_work
A discrepancy between cpu_online_mask and cpuset's effective_cpus
mask is inevitable during hotplug since cpuset defers updating of
effective_cpus mask using a workqueue, during which time nothing
prevents the system from more hotplug operations.  For that reason
guarantee_online_cpus() walks up the cpuset hierarchy until it finds
an intersection under the assumption that top cpuset's effective_cpus
mask intersects with cpu_online_mask even with such a race occurring.

However a sequence of CPU hotplugs can open a time window, during which
none of the effective CPUs in the top cpuset intersect with
cpu_online_mask.

For example when there are 4 possible CPUs 0-3 and only CPU0 is online:

  ========================  ===========================
   cpu_online_mask           top_cpuset.effective_cpus
  ========================  ===========================
   echo 1 > cpu2/online.
   CPU hotplug notifier woke up hotplug work but not yet scheduled.
      [0,2]                     [0]

   echo 0 > cpu0/online.
   The workqueue is still runnable.
      [2]                       [0]
  ========================  ===========================

  Now there is no intersection between cpu_online_mask and
  top_cpuset.effective_cpus.  Thus invoking sys_sched_setaffinity() at
  this moment can cause following:

   Unable to handle kernel NULL pointer dereference at virtual address 000000d0
   ------------[ cut here ]------------
   Kernel BUG at ffffffc0001389b0 [verbose debug info unavailable]
   Internal error: Oops - BUG: 96000005 [#1] PREEMPT SMP
   Modules linked in:
   CPU: 2 PID: 1420 Comm: taskset Tainted: G        W       4.4.8+ #98
   task: ffffffc06a5c4880 ti: ffffffc06e124000 task.ti: ffffffc06e124000
   PC is at guarantee_online_cpus+0x2c/0x58
   LR is at cpuset_cpus_allowed+0x4c/0x6c
   <snip>
   Process taskset (pid: 1420, stack limit = 0xffffffc06e124020)
   Call trace:
   [<ffffffc0001389b0>] guarantee_online_cpus+0x2c/0x58
   [<ffffffc00013b208>] cpuset_cpus_allowed+0x4c/0x6c
   [<ffffffc0000d61f0>] sched_setaffinity+0xc0/0x1ac
   [<ffffffc0000d6374>] SyS_sched_setaffinity+0x98/0xac
   [<ffffffc000085cb0>] el0_svc_naked+0x24/0x28

The top cpuset's effective_cpus are guaranteed to be identical to
cpu_online_mask eventually.  Hence fall back to cpu_online_mask when
there is no intersection between top cpuset's effective_cpus and
cpu_online_mask.

Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-09-13 11:26:27 -04:00
Dave Hansen
e2753293ac x86/pkeys: Fix pkeys build breakage for some non-x86 arches
Guenter Roeck reported breakage on the h8300 and c6x architectures (among
others) caused by the new memory protection keys syscalls.  This patch does
what Arnd suggested and adds them to kernel/sys_ni.c.

Fixes: a60f7b69d92c ("generic syscalls: Wire up memory protection keys syscalls")
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/20160912203842.48E7AC50@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 14:41:36 +02:00
Anisse Astier
1ad1410f63 PM / Hibernate: allow hibernation with PAGE_POISONING_ZERO
PAGE_POISONING_ZERO disables zeroing new pages on alloc, they are
poisoned (zeroed) as they become available.
In the hibernate use case, free pages will appear in the system without
being cleared, left there by the loading kernel.

This patch will make sure free pages are cleared on resume when
PAGE_POISONING_ZERO is enabled. We free the pages just after resume
because we can't do it later: going through any device resume code might
allocate some memory and invalidate the free pages bitmap.

Thus we don't need to disable hibernation when PAGE_POISONING_ZERO is
enabled.

Signed-off-by: Anisse Astier <anisse@astier.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-13 02:35:27 +02:00
Sudeep Holla
fa7fd6fa38 PM / sleep: enable suspend-to-idle even without registered suspend_ops
Suspend-to-idle (aka the "freeze" sleep state) is a system sleep state
in which all of the processors enter deepest possible idle state and
wait for interrupts right after suspending all the devices.

There is no hard requirement for a platform to support and register
platform specific suspend_ops to enter suspend-to-idle/freeze state.
Only deeper system sleep states like PM_SUSPEND_STANDBY and
PM_SUSPEND_MEM rely on such low level support/implementation.

suspend-to-idle can be entered as along as all the devices can be
suspended. This patch enables the support for suspend-to-idle even on
systems that don't have any low level support for deeper system sleep
states and/or don't register any platform specific suspend_ops.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-13 02:17:19 +02:00
Chen Yu
5b3f249c94 PM / sleep: Increase default DPM watchdog timeout to 120
Recently we have a new report that, the harddisk can not
resume on time due to firmware issues, and got a kernel
panic because of DPM watchdog timeout. So adjust the
default timeout from 60 to 120 to survive on this platform,
and make DPM_WATCHDOG depending on EXPERT.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=117971
Suggested-by: Pavel Machek <pavel@ucw.cz>
Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Reported-by: Higuita <higuita@gmx.net>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-13 02:15:58 +02:00
David S. Miller
b20b378d49 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mediatek/mtk_eth_soc.c
	drivers/net/ethernet/qlogic/qed/qed_dcbx.c
	drivers/net/phy/Kconfig

All conflicts were cases of overlapping commits.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12 15:52:44 -07:00
Steven Rostedt (Red Hat)
f971cc9aab tracing: Have max_latency be defined for HWLAT_TRACER as well
The hwlat tracer uses tr->max_latency, and if it's the only tracer enabled
that uses it, the build will fail. Add max_latency and its file when the
hwlat tracer is enabled.

Link: http://lkml.kernel.org/r/d6c3b7eb-ba95-1ffa-0453-464e1e24262a@infradead.org

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-12 09:59:46 -04:00
Linus Torvalds
98ac9a608d Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
 "nvdimm fixes for v4.8, two of them are tagged for -stable:

   - Fix devm_memremap_pages() to use track_pfn_insert().  Otherwise,
     DAX pmd mappings end up with an uncached pgprot, and unusable
     performance for the device-dax interface.  The device-dax interface
     appeared in 4.7 so this is tagged for -stable.

   - Fix a couple VM_BUG_ON() checks in the show_smaps() path to
     understand DAX pmd entries.  This fix is tagged for -stable.

   - Fix a mis-merge of the nfit machine-check handler to flip the
     polarity of an if() to match the final version of the patch that
     Vishal sent for 4.8-rc1.  Without this the nfit machine check
     handler never detects / inserts new 'badblocks' entries which
     applications use to identify lost portions of files.

   - For test purposes, fix the nvdimm_clear_poison() path to operate on
     legacy / simulated nvdimm memory ranges.  Without this fix a test
     can set badblocks, but never clear them on these ranges.

   - Fix the range checking done by dax_dev_pmd_fault().  This is not
     tagged for -stable since this problem is mitigated by specifying
     aligned resources at device-dax setup time.

  These patches have appeared in a next release over the past week.  The
  recent rebase you can see in the timestamps was to drop an invalid fix
  as identified by the updated device-dax unit tests [1].  The -mm
  touches have an ack from Andrew"

[1]: "[ndctl PATCH 0/3] device-dax test for recent kernel bugs"
   https://lists.01.org/pipermail/linux-nvdimm/2016-September/006855.html

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm: allow legacy (e820) pmem region to clear bad blocks
  nfit, mce: Fix SPA matching logic in MCE handler
  mm: fix cache mode of dax pmd mappings
  mm: fix show_smap() for zone_device-pmd ranges
  dax: fix mapping size check
2016-09-10 09:58:52 -07:00
Ingo Molnar
5006921837 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10 11:17:54 +02:00
Peter Zijlstra
de58af878d Revert "sched/fair: Make update_min_vruntime() more readable"
There's a bug in this commit:

   97a7142f157a ("sched/fair: Make update_min_vruntime() more readable")

... when !rb_leftmost && curr we fail to advance min_vruntime.

So revert it.

Reported-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10 11:17:40 +02:00
Alexander Shishkin
b79ccadd6b perf/core: Fix aux_mmap_count vs aux_refcount order
The order of accesses to ring buffer's aux_mmap_count and aux_refcount
has to be preserved across the users, namely perf_mmap_close() and
perf_aux_output_begin(), otherwise the inversion can result in the latter
holding the last reference to the aux buffer and subsequently free'ing
it in atomic context, triggering a warning.

> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 257 at kernel/events/ring_buffer.c:541 __rb_free_aux+0x11a/0x130
> CPU: 0 PID: 257 Comm: stopbug Not tainted 4.8.0-rc1+ #2596
> Call Trace:
>  [<ffffffff810f3e0b>] __warn+0xcb/0xf0
>  [<ffffffff810f3f3d>] warn_slowpath_null+0x1d/0x20
>  [<ffffffff8121182a>] __rb_free_aux+0x11a/0x130
>  [<ffffffff812127a8>] rb_free_aux+0x18/0x20
>  [<ffffffff81212913>] perf_aux_output_begin+0x163/0x1e0
>  [<ffffffff8100c33a>] bts_event_start+0x3a/0xd0
>  [<ffffffff8100c42d>] bts_event_add+0x5d/0x80
>  [<ffffffff81203646>] event_sched_in.isra.104+0xf6/0x2f0
>  [<ffffffff8120652e>] group_sched_in+0x6e/0x190
>  [<ffffffff8120694e>] ctx_sched_in+0x2fe/0x5f0
>  [<ffffffff81206ca0>] perf_event_sched_in+0x60/0x80
>  [<ffffffff81206d1b>] ctx_resched+0x5b/0x90
>  [<ffffffff81207281>] __perf_event_enable+0x1e1/0x240
>  [<ffffffff81200639>] event_function+0xa9/0x180
>  [<ffffffff81202000>] ? perf_cgroup_attach+0x70/0x70
>  [<ffffffff8120203f>] remote_function+0x3f/0x50
>  [<ffffffff811971f3>] flush_smp_call_function_queue+0x83/0x150
>  [<ffffffff81197bd3>] generic_smp_call_function_single_interrupt+0x13/0x60
>  [<ffffffff810a6477>] smp_call_function_single_interrupt+0x27/0x40
>  [<ffffffff81a26ea9>] call_function_single_interrupt+0x89/0x90
>  [<ffffffff81120056>] finish_task_switch+0xa6/0x210
>  [<ffffffff81120017>] ? finish_task_switch+0x67/0x210
>  [<ffffffff81a1e83d>] __schedule+0x3dd/0xb50
>  [<ffffffff81a1efe5>] schedule+0x35/0x80
>  [<ffffffff81128031>] sys_sched_yield+0x61/0x70
>  [<ffffffff81a25be5>] entry_SYSCALL_64_fastpath+0x18/0xa8
> ---[ end trace 6235f556f5ea83a9 ]---

This patch puts the checks in perf_aux_output_begin() in the same order
as that of perf_mmap_close().

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-3-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10 11:15:36 +02:00
Alexander Shishkin
767ae08678 perf/core: Fix a race between mmap_close() and set_output() of AUX events
In the mmap_close() path we need to stop all the AUX events that are
writing data to the AUX area that we are unmapping, before we can
safely free the pages. To determine if an event needs to be stopped,
we're comparing its ->rb against the one that's getting unmapped.
However, a SET_OUTPUT ioctl may turn up inside an AUX transaction
and swizzle event::rb to some other ring buffer, but the transaction
will keep writing data to the old ring buffer until the event gets
scheduled out. At this point, mmap_close() will skip over such an
event and will proceed to free the AUX area, while it's still being
used by this event, which will set off a warning in the mmap_close()
path and cause a memory corruption.

To avoid this, always stop an AUX event before its ->rb is updated;
this will release the (potentially) last reference on the AUX area
of the buffer. If the event gets restarted, its new ring buffer will
be used. If another SET_OUTPUT comes and switches it back to the
old ring buffer that's getting unmapped, it's also fine: this
ring buffer's aux_mmap_count will be zero and AUX transactions won't
start any more.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-2-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10 11:15:36 +02:00
Daniel Borkmann
f3694e0012 bpf: add BPF_CALL_x macros for declaring helpers
This work adds BPF_CALL_<n>() macros and converts all the eBPF helper functions
to use them, in a similar fashion like we do with SYSCALL_DEFINE<n>() macros
that are used today. Motivation for this is to hide all the register handling
and all necessary casts from the user, so that it is done automatically in the
background when adding a BPF_CALL_<n>() call.

This makes current helpers easier to review, eases to write future helpers,
avoids getting the casting mess wrong, and allows for extending all helpers at
once (f.e. build time checks, etc). It also helps detecting more easily in
code reviews that unused registers are not instrumented in the code by accident,
breaking compatibility with existing programs.

BPF_CALL_<n>() internals are quite similar to SYSCALL_DEFINE<n>() ones with some
fundamental differences, for example, for generating the actual helper function
that carries all u64 regs, we need to fill unused regs, so that we always end up
with 5 u64 regs as an argument.

I reviewed several 0-5 generated BPF_CALL_<n>() variants of the .i results and
they look all as expected. No sparse issue spotted. We let this also sit for a
few days with Fengguang's kbuild test robot, and there were no issues seen. On
s390, it barked on the "uses dynamic stack allocation" notice, which is an old
one from bpf_perf_event_output{,_tp}() reappearing here due to the conversion
to the call wrapper, just telling that the perf raw record/frag sits on stack
(gcc with s390's -mwarn-dynamicstack), but that's all. Did various runtime tests
and they were fine as well. All eBPF helpers are now converted to use these
macros, getting rid of a good chunk of all the raw castings.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09 19:36:04 -07:00
Daniel Borkmann
f035a51536 bpf: add BPF_SIZEOF and BPF_FIELD_SIZEOF macros
Add BPF_SIZEOF() and BPF_FIELD_SIZEOF() macros to improve the code a bit
which otherwise often result in overly long bytes_to_bpf_size(sizeof())
and bytes_to_bpf_size(FIELD_SIZEOF()) lines. So place them into a macro
helper instead. Moreover, we currently have a BUILD_BUG_ON(BPF_FIELD_SIZEOF())
check in convert_bpf_extensions(), but we should rather make that generic
as well and add a BUILD_BUG_ON() test in all BPF_SIZEOF()/BPF_FIELD_SIZEOF()
users to detect any rewriter size issues at compile time. Note, there are
currently none, but we want to assert that it stays this way.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09 19:36:04 -07:00
Daniel Borkmann
6088b5823b bpf: minor cleanups in helpers
Some minor misc cleanups, f.e. use sizeof(__u32) instead of hardcoding
and in __bpf_skb_max_len(), I missed that we always have skb->dev valid
anyway, so we can drop the unneeded test for dev; also few more other
misc bits addressed here.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09 19:36:03 -07:00
Dan Williams
9049771f7d mm: fix cache mode of dax pmd mappings
track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as
uncacheable rendering them impractical for application usage.  DAX-pte
mappings are cached and the goal of establishing DAX-pmd mappings is to
attain more performance, not dramatically less (3 orders of magnitude).

track_pfn_insert() relies on a previous call to reserve_memtype() to
establish the expected page_cache_mode for the range.  While memremap()
arranges for reserve_memtype() to be called, devm_memremap_pages() does
not.  So, teach track_pfn_insert() and untrack_pfn() how to handle
tracking without a vma, and arrange for devm_memremap_pages() to
establish the write-back-cache reservation in the memtype tree.

Cc: <stable@vger.kernel.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Toshi Kani <toshi.kani@hpe.com>
Reported-by: Kai Zhang <kai.ka.zhang@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-09 17:34:46 -07:00
Daniel Borkmann
2d2be8cab2 bpf: fix range propagation on direct packet access
LLVM can generate code that tests for direct packet access via
skb->data/data_end in a way that currently gets rejected by the
verifier, example:

  [...]
   7: (61) r3 = *(u32 *)(r6 +80)
   8: (61) r9 = *(u32 *)(r6 +76)
   9: (bf) r2 = r9
  10: (07) r2 += 54
  11: (3d) if r3 >= r2 goto pc+12
   R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
   R9=pkt(id=0,off=0,r=0) R10=fp
  12: (18) r4 = 0xffffff7a
  14: (05) goto pc+430
  [...]

  from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv
                 R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp
  24: (7b) *(u64 *)(r10 -40) = r1
  25: (b7) r1 = 0
  26: (63) *(u32 *)(r6 +56) = r1
  27: (b7) r2 = 40
  28: (71) r8 = *(u8 *)(r9 +20)
  invalid access to packet, off=20 size=1, R9(id=0,off=0,r=0)

The reason why this gets rejected despite a proper test is that we
currently call find_good_pkt_pointers() only in case where we detect
tests like rX > pkt_end, where rX is of type pkt(id=Y,off=Z,r=0) and
derived, for example, from a register of type pkt(id=Y,off=0,r=0)
pointing to skb->data. find_good_pkt_pointers() then fills the range
in the current branch to pkt(id=Y,off=0,r=Z) on success.

For above case, we need to extend that to recognize pkt_end >= rX
pattern and mark the other branch that is taken on success with the
appropriate pkt(id=Y,off=0,r=Z) type via find_good_pkt_pointers().
Since eBPF operates on BPF_JGT (>) and BPF_JGE (>=), these are the
only two practical options to test for from what LLVM could have
generated, since there's no such thing as BPF_JLT (<) or BPF_JLE (<=)
that we would need to take into account as well.

After the fix:

  [...]
   7: (61) r3 = *(u32 *)(r6 +80)
   8: (61) r9 = *(u32 *)(r6 +76)
   9: (bf) r2 = r9
  10: (07) r2 += 54
  11: (3d) if r3 >= r2 goto pc+12
   R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
   R9=pkt(id=0,off=0,r=0) R10=fp
  12: (18) r4 = 0xffffff7a
  14: (05) goto pc+430
  [...]

  from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=54) R3=pkt_end R4=inv
                 R6=ctx R9=pkt(id=0,off=0,r=54) R10=fp
  24: (7b) *(u64 *)(r10 -40) = r1
  25: (b7) r1 = 0
  26: (63) *(u32 *)(r6 +56) = r1
  27: (b7) r2 = 40
  28: (71) r8 = *(u8 *)(r9 +20)
  29: (bf) r1 = r8
  30: (25) if r8 > 0x3c goto pc+47
   R1=inv56 R2=imm40 R3=pkt_end R4=inv R6=ctx R8=inv56
   R9=pkt(id=0,off=0,r=54) R10=fp
  31: (b7) r1 = 1
  [...]

Verifier test cases are also added in this work, one that demonstrates
the mentioned example here and one that tries a bad packet access for
the current/fall-through branch (the one with types pkt(id=X,off=Y,r=0),
pkt(id=X,off=0,r=0)), then a case with good and bad accesses, and two
with both test variants (>, >=).

Fixes: 969bf05eb3ce ("bpf: direct packet access")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:28:37 -07:00
Ingo Molnar
950d8381d9 Merge branch 'linus' into timers/core, to refresh the branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 14:05:16 +02:00
Arnd Bergmann
f1e4ba5b6a perf, bpf: fix conditional call to bpf_overflow_handler
The newly added bpf_overflow_handler function is only built of both
CONFIG_EVENT_TRACING and CONFIG_BPF_SYSCALL are enabled, but the caller
only checks the latter:

kernel/events/core.c: In function 'perf_event_alloc':
kernel/events/core.c:9106:27: error: 'bpf_overflow_handler' undeclared (first use in this function)

This changes the caller so we also skip this call if CONFIG_EVENT_TRACING
is disabled entirely.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: aa6a5f3cb2b2 ("perf, bpf: add perf events core support for BPF_PROG_TYPE_PERF_EVENT programs")
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06 16:34:14 -07:00
Sebastian Andrzej Siewior
c4544dbc7a kernel/softirq: Convert to hotplug state machine
Install the callbacks via the state machine.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160818125731.27256-7-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06 18:30:22 +02:00
Sebastian Andrzej Siewior
6731d4f123 slab: Convert to hotplug state machine
Install the callbacks via the state machine.

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: linux-mm@kvack.org
Cc: rt@linutronix.de
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Link: http://lkml.kernel.org/r/20160823125319.abeapfjapf2kfezp@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06 18:30:20 +02:00
Richard Weinberger
e6d4989a9a relayfs: Convert to hotplug state machine
Install the callbacks via the state machine. They are installed at run time but
relay_prepare_cpu() does not need to be invoked by the boot CPU because
relay_open() was not yet invoked and there are no pools that need to be created.

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20160818125731.27256-3-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06 18:30:20 +02:00
Akash Goel
017c59c042 relay: Use per CPU constructs for the relay channel buffer pointers
relay essentially needs to maintain a per CPU array of channel buffer
pointers but it manually creates that array.  Instead its better to use
the per CPU constructs, provided by the kernel, to allocate & access the
array of pointer to channel buffers.

Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://lkml.kernel.org/r/1470909140-25919-1-git-send-email-akash.goel@intel.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06 18:30:19 +02:00
Thomas Gleixner
ee1e714b94 cpu/hotplug: Remove CPU_STARTING and CPU_DYING notifier
All users are converted to state machine, remove CPU_STARTING and the
corresponding CPU_DYING.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160818125731.27256-2-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06 18:30:19 +02:00