We do not initialize init_task.numa_preferred_nid,
but this value is inherited by userspace "init"
process:
rest_init()->kernel_thread(kernel_init)->do_fork(CLONE_VM);
__sched_fork()
{
if (clone_flags & CLONE_VM)
p->numa_preferred_nid = current->numa_preferred_nid;
else
p->numa_preferred_nid = -1;
}
kernel_init() becomes userspace "init" process.
So, we propagate garbage nid to userspace, and it may be used
during numa balancing.
Currently, we do not have reports about this brings a problem,
but it seem we should set it for sure.
Even if init_task.numa_preferred_nid is zero, we may meet a weird
configuration without nid#0. On sparc64, where processors are
numbered physically, I saw a machine without cpu#1, while cpu#2
existed. Possible, something similar may be with numa nodes.
So, let's initialize it and be sure we're safe.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Sergey Dyasly <dserrg@gmail.com>
Link: http://lkml.kernel.org/r/1415699189.15631.6.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While looking over the cpu-timer code I found that we appear to add
the delta for the calling task twice, through:
cpu_timer_sample_group()
thread_group_cputimer()
thread_group_cputime()
times->sum_exec_runtime += task_sched_runtime();
*sample = cputime.sum_exec_runtime + task_delta_exec();
Which would make the sample run ahead, making the sleep short.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20141112113737.GI10476@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch simplifies task_struct by removing the four numa_* pointers
in the same array and replacing them with the array pointer. By doing this,
on x86_64, the size of task_struct is reduced by 3 ulong pointers (24 bytes on
x86_64).
A new parameter is added to the task_faults_idx function so that it can return
an index to the correct offset, corresponding with the old precalculated
pointers.
All of the code in sched/ that depended on task_faults_idx and numa_* was
changed in order to match the new logic.
Signed-off-by: Iulia Manda <iulia.manda21@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: dave@stgolabs.net
Cc: riel@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141031001331.GA30662@winterfell
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is no user.. make it go away.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: oleg@redhat.com
Cc: Rafael Wysocki <rjw@rjwysocki.net>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Provide better implementations of wait_event_freezable() APIs.
The problem is with freezer_do_not_count(), it hides the thread from
the freezer, even though this thread might not actually freeze/sleep
at all.
Cc: oleg@redhat.com
Cc: Rafael Wysocki <rjw@rjwysocki.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Link: http://lkml.kernel.org/n/tip-d86fz1jmso9wjxa8jfpinp8o@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull CMA and DMA-mapping fixes from Marek Szyprowski:
"This contains important fixes for recently introduced highmem support
for default contiguous memory region used for dma-mapping subsystem"
* 'fixes-for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
mm, cma: make parameters order consistent in func declaration and definition
mm: cma: Use %pa to print physical addresses
mm: cma: Ensure that reservations never cross the low/high mem boundary
mm: cma: Always consider a 0 base address reservation as dynamic
mm: cma: Don't crash on allocation if CMA area can't be activated
* A regression from 3.16 which was noticed in 3.17. With the restructuring of
the m25p80.c driver and the SPI NOR library framework, we omitted proper
listing of the SPI device IDs. This means m25p80.c wouldn't auto-load
(modprobe) properly when built as a module. For now, we duplicate the device
IDs into both modules.
* The OMAP / ELM modules were depending on an implicit link ordering. Use
deferred probing so that the new link order (in 3.18-rc) can still allow for
successful probing.
* Fix suspend/resume support for LH28F640BF NOR flash
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUVq20AAoJEFySrpd9RFgt8XQQAI4oIygz4zGQ6n0y4HOqwOBy
F4ZPtOuzuCYA86x2zORFgj4A9JGVjDQwTfnMQnn1NG+XEEmZMJfG2IwqlUxsZd5A
KkAS5XUoi/Fvq95Qi95KQYXqm1dniXEGKsRFsHKXIsnnDmbqRK5fBn6Ve5PAwcau
uru5FwrZ2Ve0EwF9/Z/bxAatRirdAhwgMGlaXdXLmL7S13NQGmXP9QI7CbxSZ38R
GJ+A6PhiYs6Sml6Ou5bovNXyFGfx4J35pk6nTWoWe5MfZHRQk447OQwBPbsrM119
Boq8F/6diXyJfuXdxvF6JiDmDzaw/fBY+Xuq1O6p+JzLONN16x93KlpAPhzy4a15
PwFHCBzg5khY49if/dmrPJ+kLkU+9wIHUib8m6HSImCKBT5Bv/VJXoQ1g4s1IJ8/
Di3mz/8pWm/cscABIkuEqb9TwwUrSHzVXgGH/p4CY0eUo8DbQQA1zDsig8aRIX36
FlAReaHH8QivdnghkMX9Px7SIo7XoMZZEi+55k8FrIVqjqEHNGx+w+BKhxgtFggN
nAg0l7NrLdQHpigK1SZjLFGIYi7MmarvbatUjVPGagiRqoQ0mCSS7eKX1DEs4EAo
P2g64BSJickGAhUiAV9ZO1EBoaOU6olIPpc33J+uG/8qBU1cNClx3FJ1UPWX27JQ
+FBsD1mec4FuoZ2SoE7r
=IxjM
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20141102' of git://git.infradead.org/linux-mtd
Pull MTD fixes from Brian Norris:
"Three main MTD fixes for 3.18:
- A regression from 3.16 which was noticed in 3.17. With the
restructuring of the m25p80.c driver and the SPI NOR library
framework, we omitted proper listing of the SPI device IDs. This
means m25p80.c wouldn't auto-load (modprobe) properly when built as
a module. For now, we duplicate the device IDs into both modules.
- The OMAP / ELM modules were depending on an implicit link ordering.
Use deferred probing so that the new link order (in 3.18-rc) can
still allow for successful probing.
- Fix suspend/resume support for LH28F640BF NOR flash"
* tag 'for-linus-20141102' of git://git.infradead.org/linux-mtd:
mtd: cfi_cmdset_0001.c: fix resume for LH28F640BF chips
mtd: omap: fix mtd devices not showing up
mtd: m25p80,spi-nor: Fix module aliases for m25p80
mtd: spi-nor: make spi_nor_scan() take a chip type name, not spi_device_id
mtd: m25p80: get rid of spi_get_device_id
This is a set of six patches consisting of two MAINTAINER updates, two scsi-mq
fixs for the old parallel interface (not every request is tagged and we need
to set the right flags to populate the SPI tag message) and a fix for a memory
leak in scatterlist traversal caused by a preallocation update in 3.17) and an
ipv6 fix for cxgbi.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJUVqYyAAoJEDeqqVYsXL0MDwQIAI4aCdC3LUelt6IzywToiJ9u
GTUZU7eULzV1KuaExcBM8g9DyE3wtnhfS0WA5TqvfKf8ImMKntan0loK0juIjj8K
UBtbSDapwkKRRAIoRHvBzcHU+Jzc//naO95ojsFt2ELHB91jta368+DiPb8m+wQx
IKeYO6vFaSwZZCKVNaWwyCpxEKmMO+ib80zfuANBzWjau3IFxMiUofSvtPGIWb+E
Zxi9qvOyqjfFE8OLfL9ckH5NPiIYcwmarAXyuYcJstiw6VX2deGBaJalueSSC0JV
TAqEtT5k1AraFS2HM0TRxar+HK5iBj3s8f0pFBfgD2+oagUln6L/486nMh2Ktac=
=l6wS
-----END PGP SIGNATURE-----
Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of six patches consisting of:
- two MAINTAINER updates
- two scsi-mq fixs for the old parallel interface (not every request
is tagged and we need to set the right flags to populate the SPI
tag message)
- a fix for a memory leak in scatterlist traversal caused by a
preallocation update in 3.17
- an ipv6 fix for cxgbi"
[ The scatterlist fix also came in separately through the block layer tree ]
* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
MAINTAINERS: ufs - remove self
MAINTAINERS: change hpsa and cciss maintainer
libcxgbi : support ipv6 address host_param
scsi: set REQ_QUEUE for the blk-mq case
Revert "block: all blk-mq requests are tagged"
lib/scatterlist: fix memory leak with scsi-mq
Pull drm fixes from Dave Airlie:
"Nothing too astounding or major: radeon, i915, vmwgfx, armada and
exynos.
Biggest ones:
- vmwgfx has one big locking regression fix
- i915 has come displayport fixes
- radeon has some stability and a memory alloc failure
- armada and exynos have some vblank fixes"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (24 commits)
drm/exynos: correct connector->dpms field before resuming
drm/exynos: enable vblank after DPMS on
drm/exynos: init kms poll at the end of initialization
drm/exynos: propagate plane initialization errors
drm/exynos: vidi: fix build warning
drm/exynos: remove explicit encoder/connector de-initialization
drm/exynos: init vblank with real number of crtcs
drm/vmwgfx: Filter out modes those cannot be supported by the current VRAM size.
drm/vmwgfx: Fix hash key computation
drm/vmwgfx: fix lock breakage
drm/i915/dp: only use training pattern 3 on platforms that support it
drm/radeon: remove some buggy dead code
drm/i915: Ignore VBT backlight check on Macbook 2, 1
drm/radeon: remove invalid pci id
drm/radeon: dpm fixes for asrock systems
radeon: clean up coding style differences in radeon_get_bios()
drm/radeon: Use drm_malloc_ab instead of kmalloc_array
drm/radeon/dpm: disable ulv support on SI
drm/i915: Fix GMBUSFREQ on vlv/chv
drm/i915: Ignore long hpds on eDP ports
...
Pull VFS fixes from Al Viro:
"A bunch of assorted fixes, most of them followups to overlayfs merge"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ovl: initialize ->is_cursor
Return short read or 0 at end of a raw device, not EIO
isofs: don't bother with ->d_op for normal case
isofs_cmp(): we'll never see a dentry for . or ..
overlayfs: fix lockdep misannotation
ovl: fix check for cursor
overlayfs: barriers for opening upper-layer directory
rcu: Provide counterpart to rcu_dereference() for non-RCU situations
staging: android: logger: Fix log corruption regression
Pull networking fixes from David Miller:
"A bit has accumulated, but it's been a week or so since my last batch
of post-merge-window fixes, so...
1) Missing module license in netfilter reject module, from Pablo.
Lots of people ran into this.
2) Off by one in mac80211 baserate calculation, from Karl Beldan.
3) Fix incorrect return value from ax88179_178a driver's set_mac_addr
op, which broke use of it with bonding. From Ian Morgan.
4) Checking of skb_gso_segment()'s return value was not all
encompassing, it can return an SKB pointer, a pointer error, or
NULL. Fix from Florian Westphal.
This is crummy, and longer term will be fixed to just return error
pointers or a real SKB.
6) Encapsulation offloads not being handled by
skb_gso_transport_seglen(). From Florian Westphal.
7) Fix deadlock in TIPC stack, from Ying Xue.
8) Fix performance regression from using rhashtable for netlink
sockets. The problem was the synchronize_net() invoked for every
socket destroy. From Thomas Graf.
9) Fix bug in eBPF verifier, and remove the strong dependency of BPF
on NET. From Alexei Starovoitov.
10) In qdisc_create(), use the correct interface to allocate
->cpu_bstats, otherwise the u64_stats_sync member isn't
initialized properly. From Sabrina Dubroca.
11) Off by one in ip_set_nfnl_get_byindex(), from Dan Carpenter.
12) nf_tables_newchain() was erroneously expecting error pointers from
netdev_alloc_pcpu_stats(). It only returna a valid pointer or
NULL. From Sabrina Dubroca.
13) Fix use-after-free in _decode_session6(), from Li RongQing.
14) When we set the TX flow hash on a socket, we mistakenly do so
before we've nailed down the final source port. Move the setting
deeper to fix this. From Sathya Perla.
15) NAPI budget accounting in amd-xgbe driver was counting descriptors
instead of full packets, fix from Thomas Lendacky.
16) Fix total_data_buflen calculation in hyperv driver, from Haiyang
Zhang.
17) Fix bcma driver build with OF_ADDRESS disabled, from Hauke
Mehrtens.
18) Fix mis-use of per-cpu memory in TCP md5 code. The problem is
that something that ends up being vmalloc memory can't be passed
to the crypto hash routines via scatter-gather lists. From Eric
Dumazet.
19) Fix regression in promiscuous mode enabling in cdc-ether, from
Olivier Blin.
20) Bucket eviction and frag entry killing can race with eachother,
causing an unlink of the object from the wrong list. Fix from
Nikolay Aleksandrov.
21) Missing initialization of spinlock in cxgb4 driver, from Anish
Bhatt.
22) Do not cache ipv4 routing failures, otherwise if the sysctl for
forwarding is subsequently enabled this won't be seen. From
Nicolas Cavallari"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (131 commits)
drivers: net: cpsw: Support ALLMULTI and fix IFF_PROMISC in switch mode
drivers: net: cpsw: Fix broken loop condition in switch mode
net: ethtool: Return -EOPNOTSUPP if user space tries to read EEPROM with lengh 0
stmmac: pci: set default of the filter bins
net: smc91x: Fix gpios for device tree based booting
mpls: Allow mpls_gso to be built as module
mpls: Fix mpls_gso handler.
r8152: stop submitting intr for -EPROTO
netfilter: nft_reject_bridge: restrict reject to prerouting and input
netfilter: nft_reject_bridge: don't use IP stack to reject traffic
netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functions
netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functions
netfilter: nf_tables_bridge: update hook_mask to allow {pre,post}routing
drivers/net: macvtap and tun depend on INET
drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets
drivers/net: Disable UFO through virtio
net: skb_fclone_busy() needs to detect orphaned skb
gre: Use inner mac length when computing tunnel length
mlx4: Avoid leaking steering rules on flow creation error flow
net/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN
...
Pull scheduler fixes from Ingo Molnar:
"Various scheduler fixes all over the place: three SCHED_DL fixes,
three sched/numa fixes, two generic race fixes and a comment fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/dl: Fix preemption checks
sched: Update comments for CLONE_NEWNS
sched: stop the unbound recursion in preempt_schedule_context()
sched/fair: Fix division by zero sysctl_numa_balancing_scan_size
sched/fair: Care divide error in update_task_scan_period()
sched/numa: Fix unsafe get_task_struct() in task_numa_assign()
sched/deadline: Fix races between rt_mutex_setprio() and dl_task_timer()
sched/deadline: Don't replenish from a !SCHED_DEADLINE entity
sched: Fix race between task_group and sched_task_group
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, plus on the kernel side:
- a revert for a newly introduced PMU driver which isn't complete yet
and where we ran out of time with fixes (to be tried again in
v3.19) - this makes up for a large chunk of the diffstat.
- compilation warning fixes
- a printk message fix
- event_idx usage fixes/cleanups"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf probe: Trivial typo fix for --demangle
perf tools: Fix report -F dso_from for data without branch info
perf tools: Fix report -F dso_to for data without branch info
perf tools: Fix report -F symbol_from for data without branch info
perf tools: Fix report -F symbol_to for data without branch info
perf tools: Fix report -F mispredict for data without branch info
perf tools: Fix report -F in_tx for data without branch info
perf tools: Fix report -F abort for data without branch info
perf tools: Make CPUINFO_PROC an array to support different kernel versions
perf callchain: Use global caching provided by libunwind
perf/x86/intel: Revert incomplete and undocumented Broadwell client support
perf/x86: Fix compile warnings for intel_uncore
perf: Fix typos in sample code in the perf_event.h header
perf: Fix and clean up initialization of pmu::event_idx
perf: Fix bogus kernel printk
perf diff: Add missing hists__init() call at tool start
Pull core fixes from Ingo Molnar:
"The tree contains two RCU fixes and a compiler quirk comment fix"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rcu: Make rcu_barrier() understand about missing rcuo kthreads
compiler/gcc4+: Remove inaccurate comment about 'asm goto' miscompiles
rcu: More on deadlock between CPU hotplug and expedited grace periods
Pablo Neira Ayuso says:
====================
netfilter/ipvs fixes for net
The following patchset contains fixes for netfilter/ipvs. This round of
fixes is larger than usual at this stage, specifically because of the
nf_tables bridge reject fixes that I would like to see in 3.18. The
patches are:
1) Fix a null-pointer dereference that may occur when logging
errors. This problem was introduced by 4a4739d56b0 ("ipvs: Pull
out crosses_local_route_boundary logic") in v3.17-rc5.
2) Update hook mask in nft_reject_bridge so we can also filter out
packets from there. This fixes 36d2af5 ("netfilter: nf_tables: allow
to filter from prerouting and postrouting"), which needs this chunk
to work.
3) Two patches to refactor common code to forge the IPv4 and IPv6
reject packets from the bridge. These are required by the nf_tables
reject bridge fix.
4) Fix nft_reject_bridge by avoiding the use of the IP stack to reject
packets from the bridge. The idea is to forge the reject packets and
inject them to the original port via br_deliver() which is now
exported for that purpose.
5) Restrict nft_reject_bridge to bridge prerouting and input hooks.
the original skbuff may cloned after prerouting when the bridge stack
needs to flood it to several bridge ports, it is too late to reject
the traffic.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
That can be reused by the reject bridge expression to build the reject
packet. The new functions are:
* nf_reject_ip6_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_ip6hdr_put(): to build the IPv6 header.
* nf_reject_ip6_tcphdr_put(): to build the TCP header.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
That can be reused by the reject bridge expression to build the reject
packet. The new functions are:
* nf_reject_ip_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_iphdr_put(): to build the IPv4 header.
* nf_reject_ip_tcphdr_put(): to build the TCP header.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Author: David Jeffery <djeffery@redhat.com>
Changes to the basic direct I/O code have broken the raw driver when reading
to the end of a raw device. Instead of returning a short read for a read that
extends partially beyond the device's end or 0 when at the end of the device,
these reads now return EIO.
The raw driver needs the same end of device handling as was added for normal
block devices. Using blkdev_read_iter, which has the needed size checks,
prevents the EIO conditions at the end of the device.
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
UFO is now disabled on all drivers that work with virtio net headers,
but userland may try to send UFO/IPv6 packets anyway. Instead of
sending with ID=0, we should select identifiers on their behalf (as we
used to).
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
Signed-off-by: David S. Miller <davem@davemloft.net>
Some drivers are unable to perform TX completions in a bound time.
They instead call skb_orphan()
Problem is skb_fclone_busy() has to detect this case, otherwise
we block TCP retransmits and can freeze unlucky tcp sessions on
mostly idle hosts.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 1f3279ae0c13 ("tcp: avoid retransmits of TCP packets hanging in host queues")
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull two RCU fixes from Paul E. McKenney:
" - Complete the work of commit dd56af42bd82 (rcu: Eliminate deadlock
between CPU hotplug and expedited grace periods), which was
intended to allow synchronize_sched_expedited() to be safely
used when holding locks acquired by CPU-hotplug notifiers.
This commit makes the put_online_cpus() avoid the deadlock
instead of just handling the get_online_cpus().
- Complete the work of commit 35ce7f29a44a (rcu: Create rcuo
kthreads only for onlined CPUs), which was intended to allow
RCU to avoid allocating unneeded kthreads on systems where the
firmware says that there are more CPUs than are really present.
This commit makes rcu_barrier() aware of the mismatch, so that
it doesn't hang waiting for non-existent CPUs. "
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge misc fixes from Andrew Morton:
"21 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
mm/balloon_compaction: fix deflation when compaction is disabled
sh: fix sh770x SCIF memory regions
zram: avoid NULL pointer access in concurrent situation
mm/slab_common: don't check for duplicate cache names
ocfs2: fix d_splice_alias() return code checking
mm: rmap: split out page_remove_file_rmap()
mm: memcontrol: fix missed end-writeback page accounting
mm: page-writeback: inline account_page_dirtied() into single caller
lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()
drivers/rtc/rtc-bq32k.c: fix register value
memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node()
drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clock
kernel/kmod: fix use-after-free of the sub_info structure
drivers/rtc/rtc-pm8xxx.c: rework to support pm8941 rtc
mm, thp: fix collapsing of hugepages on madvise
drivers: of: add return value to of_reserved_mem_device_init()
mm: free compound page with correct order
gcov: add ARM64 to GCOV_PROFILE_ALL
fsnotify: next_i is freed during fsnotify_unmount_inodes.
mm/compaction.c: avoid premature range skip in isolate_migratepages_range
...
Commit 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") changed
page migration to uncharge the old page right away. The page is locked,
unmapped, truncated, and off the LRU, but it could race with writeback
ending, which then doesn't unaccount the page properly:
test_clear_page_writeback() migration
wait_on_page_writeback()
TestClearPageWriteback()
mem_cgroup_migrate()
clear PCG_USED
mem_cgroup_update_page_stat()
if (PageCgroupUsed(pc))
decrease memcg pages under writeback
release pc->mem_cgroup->move_lock
The per-page statistics interface is heavily optimized to avoid a
function call and a lookup_page_cgroup() in the file unmap fast path,
which means it doesn't verify whether a page is still charged before
clearing PageWriteback() and it has to do it in the stat update later.
Rework it so that it looks up the page's memcg once at the beginning of
the transaction and then uses it throughout. The charge will be
verified before clearing PageWriteback() and migration can't uncharge
the page as long as that is still set. The RCU lock will protect the
memcg past uncharge.
As far as losing the optimization goes, the following test results are
from a microbenchmark that maps, faults, and unmaps a 4GB sparse file
three times in a nested fashion, so that there are two negative passes
that don't account but still go through the new transaction overhead.
There is no actual difference:
old: 33.195102545 seconds time elapsed ( +- 0.01% )
new: 33.199231369 seconds time elapsed ( +- 0.03% )
The time spent in page_remove_rmap()'s callees still adds up to the
same, but the time spent in the function itself seems reduced:
# Children Self Command Shared Object Symbol
old: 0.12% 0.11% filemapstress [kernel.kallsyms] [k] page_remove_rmap
new: 0.12% 0.08% filemapstress [kernel.kallsyms] [k] page_remove_rmap
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: <stable@vger.kernel.org> [3.17.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A follow-up patch would have changed the call signature. To save the
trouble, just fold it instead.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: <stable@vger.kernel.org> [3.17.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If an anonymous mapping is not allowed to fault thp memory and then
madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never
collapse this memory into thp memory.
This occurs because the madvise(2) handler for thp, hugepage_madvise(),
clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags
until the final action of madvise_behavior(). This causes the
khugepaged_enter_vma_merge() to be a no-op in hugepage_madvise() when
the vma had previously had VM_NOHUGEPAGE set.
Fix this by passing the correct vma flags to the khugepaged mm slot
handler. There's no chance khugepaged can run on this vma until after
madvise_behavior() returns since we hold mm->mmap_sem.
It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags
in hugepage_advise(), but I didn't want to introduce special case
behavior into madvise_behavior(). I think it's best to just let it
always set vma->vm_flags itself.
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Suleiman Souhlal <suleiman@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Driver calling of_reserved_mem_device_init() might be interested if the
initialization has been successful or not, so add support for returning
error code.
This fixes a build warining caused by commit 7bfa5ab6fa1b ("drivers:
dma-coherent: add initialization from device tree"), which has been
merged without this change and without fixing function return value.
Fixes: 7bfa5ab6fa1b1 ("drivers: dma-coherent: add initialization from device tree")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Josh Cartwright <joshc@codeaurora.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull block layer fixes from Jens Axboe:
"A small collection of fixes for the current kernel. This contains:
- Two error handling fixes from Jan Kara. One for null_blk on
failure to add a device, and the other for the block/scsi_ioctl
SCSI_IOCTL_SEND_COMMAND fixing up the error jump point.
- A commit added in the merge window for the bio integrity bits
unfortunately disabled merging for all requests if
CONFIG_BLK_DEV_INTEGRITY wasn't set. Reverse the logic, so that
integrity checking wont disallow merges when not enabled.
- A fix from Ming Lei for merging and generating too many segments.
This caused a BUG in virtio_blk.
- Two error handling printk() fixups from Robert Elliott, improving
the information given when we rate limit.
- Error handling fixup on elevator_init() failure from Sudip
Mukherjee.
- A fix from Tony Battersby, fixing up a memory leak in the
scatterlist handling with scsi-mq"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: Fix merge logic when CONFIG_BLK_DEV_INTEGRITY is not defined
lib/scatterlist: fix memory leak with scsi-mq
block: fix wrong error return in elevator_init()
scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND
null_blk: Cleanup error recovery in null_add_dev()
blk-merge: recaculate segment if it isn't less than max segments
fs: clarify rate limit suppressed buffer I/O errors
fs: merge I/O error prints into one line
Pull HID fixes from Jiri Kosina:
- workarounds for a couple of misbehaving Elan Touchscreens, by Adel
Gadllah
- fix for TransducerSerialNumber field implementation, by Jason Gerecke
- a couple of new HID usages (added by HUT), by Olivier Gay
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: input: Fix TransducerSerialNumber implementation
HID: add keyboard input assist hid usages
HID: usbhid: enable always-poll quirk for Elan Touchscreen 016f
HID: usbhid: enable always-poll quirk for Elan Touchscreen 009b
Commit 4eaf99beadce switched to returning bool and as a result reversed
the logic of the integrity merge checks. However, the empty stubs used
when the block integrity code is compiled out were still returning
0. Make these stubs return "true".
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Tested-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
In an overlay directory that shadows an empty lower directory, say
/mnt/a/empty102, do:
touch /mnt/a/empty102/x
unlink /mnt/a/empty102/x
rmdir /mnt/a/empty102
It's actually harmless, but needs another level of nesting between
I_MUTEX_CHILD and I_MUTEX_NORMAL.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Although rcu_dereference() and friends can be used in situations where
object lifetimes are being managed by something other than RCU, the
resulting sparse and lockdep-RCU noise can be annoying. This commit
therefore supplies a lockless_dereference(), which provides the
protection for dereferences without the RCU-related debugging noise.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
To delegate promiscuous mode and multicast filtering to the subdriver.
Signed-off-by: Olivier Blin <olivier.blin@softathome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix kernel-doc warning in <linux/skbuff.h> by making both headers_start
and headers_end private fields.
Warning(..//include/linux/skbuff.h:654): No description found for parameter 'headers_end[0]'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs)
avoids creating rcuo kthreads for CPUs that never come online. This
fixes a bug in many instances of firmware: Instead of lying about their
age, these systems instead lie about the number of CPUs that they have.
Before commit 35ce7f29a44a, this could result in huge numbers of useless
rcuo kthreads being created.
It appears that experience indicates that I should have told the
people suffering from this problem to fix their broken firmware, but
I instead produced what turned out to be a partial fix. The missing
piece supplied by this commit makes sure that rcu_barrier() knows not to
post callbacks for no-CBs CPUs that have not yet come online, because
otherwise rcu_barrier() will hang on systems having firmware that lies
about the number of CPUs.
It is tempting to simply have rcu_barrier() refuse to post a callback on
any no-CBs CPU that does not have an rcuo kthread. This unfortunately
does not work because rcu_barrier() is required to wait for all pending
callbacks. It is therefore required to wait even for those callbacks
that cannot possibly be invoked. Even if doing so hangs the system.
Given that posting a callback to a no-CBs CPU that does not yet have an
rcuo kthread can hang rcu_barrier(), It is tempting to report an error
in this case. Unfortunately, this will result in false positives at
boot time, when it is perfectly legal to post callbacks to the boot CPU
before the scheduler has started, in other words, before it is legal
to invoke rcu_barrier().
So this commit instead has rcu_barrier() avoid posting callbacks to
CPUs having neither rcuo kthread nor pending callbacks, and has it
complain bitterly if it finds CPUs having no rcuo kthread but some
pending callbacks. And when rcu_barrier() does find CPUs having no rcuo
kthread but pending callbacks, as noted earlier, it has no choice but
to hang indefinitely.
Reported-by: Yanko Kaneti <yaneti@declera.com>
Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Eric B Munson <emunson@akamai.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Eric B Munson <emunson@akamai.com>
Tested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Tested-by: Yanko Kaneti <yaneti@declera.com>
Tested-by: Kevin Fenzi <kevin@scrye.com>
Tested-by: Meelis Roos <mroos@linux.ee>
0x4c6e is a secondary device id so should not be used
by the driver.
Noticed-by: Mark Kettenis <mark.kettenis@xs4all.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
The bug referenced by the comment in this commit was not
completely fixed in GCC 4.8.2, as I mentioned in a thread back
in February:
https://lkml.org/lkml/2014/2/12/797
The conclusion at that time was to make the quirk unconditional
until the bug could be found and fixed in GCC. Unfortunately,
when I submitted the patch (commit a9f18034) I left a comment
in that claimed the bug was fixed in GCC 4.8.2+.
This comment is inaccurate, and should be removed.
Signed-off-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1414274982-14040-1-git-send-email-steven@uplinklabs.net
Cc: Ingo Molnar <mingo@kernel.org>
cond_resched() is a preemption point, not strictly a blocking
primitive, so exclude it from the ->state test.
In particular, preemption preserves task_struct::state.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: oleg@redhat.com
Cc: Alex Elder <alex.elder@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Axel Lin <axel.lin@ingics.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140924082242.656559952@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Validate we call might_sleep() with TASK_RUNNING, which catches places
where we nest blocking primitives, eg. mutex usage in a wait loop.
Since all blocking is arranged through task_struct::state, nesting
this will cause the inner primitive to set TASK_RUNNING and the outer
will thus not block.
Another observed problem is calling a blocking function from
schedule()->sched_submit_work()->blk_schedule_flush_plug() which will
then destroy the task state for the actual __schedule() call that
comes after it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: oleg@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140924082242.591637616@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
WARNING: CPU: 1 PID: 1744 at kernel/sched/core.c:7104 __might_sleep+0x58/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff81070e10>] prepare_to_wait+0x50 /0xa0
[<ffffffff8105bc38>] __might_sleep+0x58/0x90
[<ffffffff8148c671>] lock_sock_nested+0x31/0xb0
[<ffffffff81498aaa>] sk_stream_wait_memory+0x18a/0x2d0
Which is a false positive because sk_wait_event() will already have
TASK_RUNNING at that point if it would've gone through
schedule_timeout().
So annotate with sched_annotate_sleep(); which goes away on !DEBUG builds.
Reported-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140924082242.524407432@infradead.org
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: netdev@vger.kernel.org
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: oleg@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
do_wait() is a big wait loop, but we set TASK_RUNNING too late; we end
up calling potential sleeps before we reset it.
Not strictly a bug since we're guaranteed to exit the loop and not
call schedule(); put in annotations to quiet might_sleep().
WARNING: CPU: 0 PID: 1 at ../kernel/sched/core.c:7123 __might_sleep+0x7e/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8109a788>] do_wait+0x88/0x270
Call Trace:
[<ffffffff81694991>] dump_stack+0x4e/0x7a
[<ffffffff8109877c>] warn_slowpath_common+0x8c/0xc0
[<ffffffff8109886c>] warn_slowpath_fmt+0x4c/0x50
[<ffffffff810bca6e>] __might_sleep+0x7e/0x90
[<ffffffff811a1c15>] might_fault+0x55/0xb0
[<ffffffff8109a3fb>] wait_consider_task+0x90b/0xc10
[<ffffffff8109a804>] do_wait+0x104/0x270
[<ffffffff8109b837>] SyS_wait4+0x77/0x100
[<ffffffff8169d692>] system_call_fastpath+0x16/0x1b
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: tglx@linutronix.de
Cc: umgwanakikbuti@gmail.com
Cc: ilya.dryomov@inktank.com
Cc: Alex Elder <alex.elder@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Axel Lin <axel.lin@ingics.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Guillaume Morin <guillaume@morinfr.org>
Cc: Ionut Alexa <ionut.m.alexa@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Michal Schmidt <mschmidt@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140924082242.186408915@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add more might_sleep() checks, suppose someone put a wait_event() like
thing in a wait loop..
Can't put might_sleep() in ___wait_event() because there's the locked
primitives which call ___wait_event() with locks held.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140924082242.119255706@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are a few places that call blocking primitives from wait loops,
provide infrastructure to support this without the typical
task_struct::state collision.
We record the wakeup in wait_queue_t::flags which leaves
task_struct::state free to be used by others.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140924082242.051202318@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
struct perf_event_mmap_page has members called "index" and
"cap_user_rdpmc". Spell them correctly in the examples.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/320ba26391a8123cc16e5f02d24d34bd404332fd.1412313343.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
How we deal with updates to exclusive cpusets is currently broken.
As an example, suppose we have an exclusive cpuset composed of
two cpus: A[cpu0,cpu1]. We can assign SCHED_DEADLINE task to it
up to the allowed bandwidth. If we want now to modify cpusetA's
cpumask, we have to check that removing a cpu's amount of
bandwidth doesn't break AC guarantees. This thing isn't checked
in the current code.
This patch fixes the problem above, denying an update if the
new cpumask won't have enough bandwidth for SCHED_DEADLINE tasks
that are currently active.
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Link: http://lkml.kernel.org/r/5433E6AF.5080105@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
affinity (performing what is commonly called clustered scheduling).
Unfortunately, such thing is currently broken for two reasons:
- No check is performed when the user tries to attach a task to
an exlusive cpuset (recall that exclusive cpusets have an
associated maximum allowed bandwidth).
- Bandwidths of source and destination cpusets are not correctly
updated after a task is migrated between them.
This patch fixes both things at once, as they are opposite faces
of the same coin.
The check is performed in cpuset_can_attach(), as there aren't any
points of failure after that function. The updated is split in two
halves. We first reserve bandwidth in the destination cpuset, after
we pass the check in cpuset_can_attach(). And we then release
bandwidth from the source cpuset when the task's affinity is
actually changed. Even if there can be time windows when sched_setattr()
may erroneously fail in the source cpuset, we are fine with it, as
we can't perfom an atomic update of both cpusets at once.
Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Reported-by: Vincent Legout <vincent@legout.info>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Michael Trimarchi <michael@amarulasolutions.com>
Cc: Fabio Checconi <fchecconi@gmail.com>
Cc: michael@amarulasolutions.com
Cc: luca.abeni@unitn.it
Cc: Li Zefan <lizefan@huawei.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: cgroups@vger.kernel.org
Link: http://lkml.kernel.org/r/1411118561-26323-3-git-send-email-juri.lelli@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
task_preempt_count() is pointless if preemption counter is per-cpu,
currently this is x86 only. It is only valid if the task is not
running, and even in this case the only info it can provide is the
state of PREEMPT_ACTIVE bit.
Change its single caller to check p->on_rq instead, this should be
the same if p->state != TASK_RUNNING, and kill this helper.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Alexander Graf <agraf@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/20141008183348.GC17495@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
task_preempt_count() has nothing to do with the actual preempt counter,
thread_info->saved_preempt_count is only valid right after switch_to().
__trace_sched_switch_state() can use preempt_count(), prev is still the
current task when trace_sched_switch() is called.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[ Added BUG_ON(). ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20141007195108.GB28002@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>