We currently cache socket policy bundles at xfrm_policy_sk_bundles.
These cached bundles are never used. Instead we create and cache
a new one whenever xfrm_lookup() is called on a socket policy.
Most protocols cache the used routes to the socket, so let's
remove the unused caching of socket policy bundles in xfrm.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The goal of this patch is to allow userland to dump only a part of SA by
specifying a filter during the dump.
The kernel is in charge to filter SA, this avoids to generate useless netlink
traffic (it save also some cpu cycles). This is particularly useful when there
is a big number of SA set on the system.
Note that I removed the union in struct xfrm_state_walk to fix a problem on arm.
struct netlink_callback->args is defined as a array of 6 long and the first long
is used in xfrm code to flag the cb as initialized. Hence, we must have:
sizeof(struct xfrm_state_walk) <= sizeof(long) * 5.
With the union, it was false on arm (sizeof(struct xfrm_state_walk) was
sizeof(long) * 7), due to the padding.
In fact, whatever the arch is, this union seems useless, there will be always
padding after it. Removing it will not increase the size of this struct (and
reduce it on arm).
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In the case when KMs have no listeners, km_query() will fail and
temporary SAs are garbage collected immediately after their allocation.
This causes strain on memory allocation, leading even to OOM since
temporary SA alloc/free cycle is performed for every packet
and garbage collection does not keep up the pace.
The sane thing to do is to make sure we have audience before
temporary SA allocation.
Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Inserting a entry into flowcache, or flushing flowcache should be based
on per net scope. The reason to do so is flushing operation from fat
netns crammed with flow entries will also making the slim netns with only
a few flow cache entries go away in original implementation.
Since flowcache is tightly coupled with IPsec, so it would be easier to
put flow cache global parameters into xfrm namespace part. And one last
thing needs to do is bumping flow cache genid, and flush flow cache should
also be made in per net style.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
As compared with skb_to_sgvec, skb_to_sgvec_nomark only map skb to given sglist
without mark the sg which contain last skb data as the end. So the caller can
mannipulate sg list as will when padding new data after the first call without
calling sg_unmark_end to expend sg list.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Pull networking updates from David Miller:
1) Fix flexcan build on big endian, from Arnd Bergmann
2) Correctly attach cpsw to GPIO bitbang MDIO drive, from Stefan Roese
3) udp_add_offload has to use GFP_ATOMIC since it can be invoked from
non-sleepable contexts. From Or Gerlitz
4) vxlan_gro_receive() does not iterate over all possible flows
properly, fix also from Or Gerlitz
5) CAN core doesn't use a proper SKB destructor when it hooks up
sockets to SKBs. Fix from Oliver Hartkopp
6) ip_tunnel_xmit() can use an uninitialized route pointer, fix from
Eric Dumazet
7) Fix address family assignment in IPVS, from Michal Kubecek
8) Fix ath9k build on ARM, from Sujith Manoharan
9) Make sure fail_over_mac only applies for the correct bonding modes,
from Ding Tianhong
10) The udp offload code doesn't use RCU correctly, from Shlomo Pongratz
11) Handle gigabit features properly in generic PHY code, from Florian
Fainelli
12) Don't blindly invoke link operations in
rtnl_link_get_slave_info_data_size, they are optional. Fix from
Fernando Luis Vazquez Cao
13) Add USB IDs for Netgear Aircard 340U, from Bjørn Mork
14) Handle netlink packet padding properly in openvswitch, from Thomas
Graf
15) Fix oops when deleting chains in nf_tables, from Patrick McHardy
16) Fix RX stalls in xen-netback driver, from Zoltan Kiss
17) Fix deadlock in mac80211 stack, from Emmanuel Grumbach
18) inet_nlmsg_size() forgets to consider ifa_cacheinfo, fix from Geert
Uytterhoeven
19) tg3_change_mtu() can deadlock, fix from Nithin Sujir
20) Fix regression in setting SCTP local source addresses on accepted
sockets, caused by some generic ipv6 socket changes. Fix from
Matija Glavinic Pecotic
21) IPPROTO_* must be pure defines, otherwise module aliases don't get
constructed properly. Fix from Jan Moskyto
22) IPV6 netconsole setup doesn't work properly unless an explicit
source address is specified, fix from Sabrina Dubroca
23) Use __GFP_NORETRY for high order skb page allocations in
sock_alloc_send_pskb and skb_page_frag_refill. From Eric Dumazet
24) Fix a regression added in netconsole over bridging, from Cong Wang
25) TCP uses an artificial offset of 1ms for SRTT, but this doesn't jive
well with TCP pacing which needs the SRTT to be accurate. Fix from
Eric Dumazet
26) Several cases of missing header file includes from Rashika Kheria
27) Add ZTE MF667 device ID to qmi_wwan driver, from Raymond Wanyoike
28) TCP Small Queues doesn't handle nonagle properly in some corner
cases, fix from Eric Dumazet
29) Remove extraneous read_unlock in bond_enslave, whoops. From Ding
Tianhong
30) Fix 9p trans_virtio handling of vmalloc buffers, from Richard Yao
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (136 commits)
6lowpan: fix lockdep splats
alx: add missing stats_lock spinlock init
9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers
bonding: remove unwanted bond lock for enslave processing
USB2NET : SR9800 : One chip USB2.0 USB2NET SR9800 Device Driver Support
tcp: tsq: fix nonagle handling
bridge: Prevent possible race condition in br_fdb_change_mac_address
bridge: Properly check if local fdb entry can be deleted when deleting vlan
bridge: Properly check if local fdb entry can be deleted in br_fdb_delete_by_port
bridge: Properly check if local fdb entry can be deleted in br_fdb_change_mac_address
bridge: Fix the way to check if a local fdb entry can be deleted
bridge: Change local fdb entries whenever mac address of bridge device changes
bridge: Fix the way to find old local fdb entries in br_fdb_change_mac_address
bridge: Fix the way to insert new local fdb entries in br_fdb_changeaddr
bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr
tcp: correct code comment stating 3 min timeout for FIN_WAIT2, we only do 1 min
net: vxge: Remove unused device pointer
net: qmi_wwan: add ZTE MF667
3c59x: Remove unused pointer in vortex_eisa_cleanup()
net: fix 'ip rule' iif/oif device rename
...
Use what we already do for arch_disable_smp_support() to fix these:
arch/x86/kernel/smpboot.c:1155:6: warning: symbol 'arch_enable_nonboot_cpus_begin' was not declared. Should it be static?
arch/x86/kernel/smpboot.c:1160:6: warning: symbol 'arch_enable_nonboot_cpus_end' was not declared. Should it be static?
kernel/cpu.c:512:13: warning: symbol 'arch_enable_nonboot_cpus_begin' was not declared. Should it be static?
kernel/cpu.c:516:13: warning: symbol 'arch_enable_nonboot_cpus_end' was not declared. Should it be static?
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs fixes from Al Viro:
"A couple of fixes, both -stable fodder. The O_SYNC bug is fairly
old..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix a kmap leak in virtio_console
fix O_SYNC|O_APPEND syncing the wrong range on write()
Move prototype declaration of function to header file
include/net/net_namespace.h from net/ipx/af_ipx.c because they are used
by more than one file.
This eliminates the following warning in net/ipx/sysctl_net_ipx.c:
net/ipx/sysctl_net_ipx.c:33:6: warning: no previous prototype for ‘ipx_register_sysctl’ [-Wmissing-prototypes]
net/ipx/sysctl_net_ipx.c:38:6: warning: no previous prototype for ‘ipx_unregister_sysctl’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move prototype declarations of function to header file
include/net/datalink.h from net/ipx/af_ipx.c because they are used by
more than one file.
This eliminates the following warning in net/ipx/pe2.c:
net/ipx/pe2.c:20:24: warning: no previous prototype for ‘make_EII_client’ [-Wmissing-prototypes]
net/ipx/pe2.c:32:6: warning: no previous prototype for ‘destroy_EII_client’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move prototype declaration of functions to header file include/net/ipx.h
from net/ipx/af_ipx.c because they are used by more than one file.
This eliminates the following warning in
net/ipx/ipx_route.c:33:19: warning: no previous prototype for ‘ipxrtr_lookup’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:52:5: warning: no previous prototype for ‘ipxrtr_add_route’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:94:6: warning: no previous prototype for ‘ipxrtr_del_routes’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:149:5: warning: no previous prototype for ‘ipxrtr_route_skb’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:171:5: warning: no previous prototype for ‘ipxrtr_route_packet’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:261:5: warning: no previous prototype for ‘ipxrtr_ioctl’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move prototype definition of function to header file include/net/ipx.h
from net/ipx/ipx_route.c because they are used by more than one file.
This eliminates the following warning from net/ipx/af_ipx.c:
net/ipx/af_ipx.c:193:23: warning: no previous prototype for ‘ipxitf_find_using_net’ [-Wmissing-prototypes]
net/ipx/af_ipx.c:577:5: warning: no previous prototype for ‘ipxitf_send’ [-Wmissing-prototypes]
net/ipx/af_ipx.c:1219:8: warning: no previous prototype for ‘ipx_cksum’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move prototype declaration of functions to header file include/net/dn.h
from net/decnet/af_decnet.c because they are used by more than one file.
This eliminates the following warning in net/decnet/af_decnet.c:
net/decnet/sysctl_net_decnet.c:354:6: warning: no previous prototype for ‘dn_register_sysctl’ [-Wmissing-prototypes]
net/decnet/sysctl_net_decnet.c:359:6: warning: no previous prototype for ‘dn_unregister_sysctl’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move prototype declaration of functions to header file include/net/dn_route.h
from net/decnet/af_decnet.c because it is used by more than one file.
This eliminates the following warning in net/decnet/dn_route.c:
net/decnet/dn_route.c:629:5: warning: no previous prototype for ‘dn_route_rcv’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/nftables/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes, mostly nftables
fixes, most relevantly they are:
* Fix a crash in the h323 conntrack NAT helper due to expectation list
corruption, from Alexey Dobriyan.
* A couple of RCU race fixes for conntrack, one manifests by hitting BUG_ON
in nf_nat_setup_info() and the destroy path, patches from Andrey Vagin and
me.
* Dump direction attribute in nft_ct only if it is set, from Arturo
Borrero.
* Fix IPVS bug in its own connection tracking system that may lead to
copying only 4 bytes of the IPv6 address when initializing the
ip_vs_conn object, from Michal Kubecek.
* Fix -EBUSY errors in nftables when deleting the rules, chain and tables
in a row due mixture of asynchronous and synchronous object releasing,
from me.
* Three fixes for the nf_tables set infrastructure when using intervals and
mappings, from me.
* Four patches to fixing the nf_tables log, reject and ct expressions from
the new inet table, from Patrick McHardy.
* Fix memory overrun in the map that is used to dynamically allocate names
from anonymous sets, also from Patrick.
* Fix a potential oops if you dump a set with NFPROTO_UNSPEC and a table
name, from Patrick McHardy.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
when sync_page_range() had been introduced; generic_file_write{,v}() correctly
synced
pos_after_write - written .. pos_after_write - 1
but generic_file_aio_write() synced
pos_before_write .. pos_before_write + written - 1
instead. Which is not the same thing with O_APPEND, obviously.
A couple of years later correct variant had been killed off when
everything switched to use of generic_file_aio_write().
All users of generic_file_aio_write() are affected, and the same bug
has been copied into other instances of ->aio_write().
The fix is trivial; the only subtle point is that generic_write_sync()
ought to be inlined to avoid calculations useless for the majority of
calls.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull x86 fixes from Peter Anvin:
"Quite a varied little collection of fixes. Most of them are
relatively small or isolated; the biggest one is Mel Gorman's fixes
for TLB range flushing.
A couple of AMD-related fixes (including not crashing when given an
invalid microcode image) and fix a crash when compiled with gcov"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, microcode, AMD: Unify valid container checks
x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y
x86/efi: Allow mapping BGRT on x86-32
x86: Fix the initialization of physnode_map
x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable()
x86/intel/mid: Fix X86_INTEL_MID dependencies
arch/x86/mm/srat: Skip NUMA_NO_NODE while parsing SLIT
mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge
x86: mm: change tlb_flushall_shift for IvyBridge
x86/mm: Eliminate redundant page table walk during TLB range flushing
x86/mm: Clean up inconsistencies when flushing TLB ranges
mm, x86: Account for TLB flushes only when debugging
x86/AMD/NB: Fix amd_set_subcaches() parameter type
x86/quirks: Add workaround for AMD F16h Erratum792
x86, doc, kconfig: Fix dud URL for Microcode data
Commit cfd280c91253 ("net: sync some IP headers with glibc") changed a set of
define's to an enum (with no explanation why) which introduced a bug
in module mip6 where aliases are generated using the IPPROTO_* defines;
mip6 doesn't load if require_module called with the aliases from
xfrm_get_type().
Reverting this change back to define's to fix the aliases.
modinfo mip6 (before this change)
alias: xfrm-type-10-IPPROTO_DSTOPTS
alias: xfrm-type-10-IPPROTO_ROUTING
modinfo mip6 (after this change)
alias: xfrm-type-10-43
alias: xfrm-type-10-60
Signed-off-by: Jan Moskyto Matejka <mq@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a patch to improve swap readahead algorithm. It's from Hugh and
I slightly changed it.
Hugh's original changelog:
swapin readahead does a blind readahead, whether or not the swapin is
sequential. This may be ok on harddisk, because large reads have
relatively small costs, and if the readahead pages are unneeded they can
be reclaimed easily - though, what if their allocation forced reclaim of
useful pages? But on SSD devices large reads are more expensive than
small ones: if the readahead pages are unneeded, reading them in caused
significant overhead.
This patch adds very simplistic random read detection. Stealing the
PageReadahead technique from Konstantin Khlebnikov's patch, avoiding the
vma/anon_vma sophistications of Shaohua Li's patch, swapin_nr_pages()
simply looks at readahead's current success rate, and narrows or widens
its readahead window accordingly. There is little science to its
heuristic: it's about as stupid as can be whilst remaining effective.
The table below shows elapsed times (in centiseconds) when running a
single repetitive swapping load across a 1000MB mapping in 900MB ram
with 1GB swap (the harddisk tests had taken painfully too long when I
used mem=500M, but SSD shows similar results for that).
Vanilla is the 3.6-rc7 kernel on which I started; Shaohua denotes his
Sep 3 patch in mmotm and linux-next; HughOld denotes my Oct 1 patch
which Shaohua showed to be defective; HughNew this Nov 14 patch, with
page_cluster as usual at default of 3 (8-page reads); HughPC4 this same
patch with page_cluster 4 (16-page reads); HughPC0 with page_cluster 0
(1-page reads: no readahead).
HDD for swapping to harddisk, SSD for swapping to VertexII SSD. Seq for
sequential access to the mapping, cycling five times around; Rand for
the same number of random touches. Anon for a MAP_PRIVATE anon mapping;
Shmem for a MAP_SHARED anon mapping, equivalent to tmpfs.
One weakness of Shaohua's vma/anon_vma approach was that it did not
optimize Shmem: seen below. Konstantin's approach was perhaps mistuned,
50% slower on Seq: did not compete and is not shown below.
HDD Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
Seq Anon 73921 76210 75611 76904 78191 121542
Seq Shmem 73601 73176 73855 72947 74543 118322
Rand Anon 895392 831243 871569 845197 846496 841680
Rand Shmem 1058375 1053486 827935 764955 764376 756489
SSD Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
Seq Anon 24634 24198 24673 25107 21614 70018
Seq Shmem 24959 24932 25052 25703 22030 69678
Rand Anon 43014 26146 28075 25989 26935 25901
Rand Shmem 45349 45215 28249 24268 24138 24332
These tests are, of course, two extremes of a very simple case: under
heavier mixed loads I've not yet observed any consistent improvement or
degradation, and wider testing would be welcome.
Shaohua Li:
Test shows Vanilla is slightly better in sequential workload than Hugh's
patch. I observed with Hugh's patch sometimes the readahead size is
shrinked too fast (from 8 to 1 immediately) in sequential workload if
there is no hit. And in such case, continuing doing readahead is good
actually.
I don't prepare a sophisticated algorithm for the sequential workload
because so far we can't guarantee sequential accessed pages are swap out
sequentially. So I slightly change Hugh's heuristic - don't shrink
readahead size too fast.
Here is my test result (unit second, 3 runs average):
Vanilla Hugh New
Seq 356 370 360
Random 4525 2447 2444
Attached graph is the swapin/swapout throughput I collected with 'vmstat
2'. The first part is running a random workload (till around 1200 of
the x-axis) and the second part is running a sequential workload.
swapin and swapout throughput are almost identical in steady state in
both workloads. These are expected behavior. while in Vanilla, swapin
is much bigger than swapout especially in random workload (because wrong
readahead).
Original patches by: Shaohua Li and Konstantin Khlebnikov.
[fengguang.wu@intel.com: swapin_nr_pages() can be static]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We may lost race if we flush the rule-set (which happens asynchronously
via call_rcu) and we try to remove the table (that userspace assumes
to be empty).
Fix this by recovering synchronous rule and chain deletion. This was
introduced time ago before we had no batch support, and synchronous
rule deletion performance was not good. Now that we have the batch
support, we can just postpone the purge of old rule in a second step
in the commit phase. All object deletions are synchronous after this
patch.
As a side effect, we save memory as we don't need rcu_head per rule
anymore.
Cc: Patrick McHardy <kaber@trash.net>
Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a reject module for NFPROTO_INET. It does nothing but dispatch
to the AF-specific modules based on the hook family.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently the nft_reject module depends on symbols from ipv6. This is
wrong since no generic module should force IPv6 support to be loaded.
Split up the module into AF-specific and a generic part.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
- Revert "xen/grant-table: Avoid m2p_override during mapping" as it broke Xen ARM build.
- Fix CR4 not being set on AP processors in Xen PVH mode.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJS8AyQAAoJEFjIrFwIi8fJbD4IAJssMuaLI5CRsSWBgDFHHDFt
srVJpDOYQiDr/TxkwFCVcL4sFy9Htb3KMArU4eIBl6uMqQbGa+3rHyXcHYI219YY
XH3D8RG+9JChwsxtaeUEzwx1C8ehcygD34vtdcoQXa7eBuEi4TL3HeLifR+HrXKO
UdFrTA34FmvpVFbSuRXkZh5sd6ca9et9xHuQHM8SIY6pVokY6xaEYOp17tfPZpwM
7A6LFjUjXeugHC2L3+/H8UOHA9nSZQvnMiZOWq2Cusc2Dt2V7emzgk2wcc2CHttf
EA6GbtiJzHqMPmt5EjubI9hHdSMB31HpY4hnQE38+ucl+BwiSdRE9z2Rm4TYClg=
=IX4M
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen fixes from Konrad Rzeszutek Wilk:
"Bug-fixes:
- Revert "xen/grant-table: Avoid m2p_override during mapping" as it
broke Xen ARM build.
- Fix CR4 not being set on AP processors in Xen PVH mode"
* tag 'stable/for-linus-3.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/pvh: set CR4 flags for APs
Revert "xen/grant-table: Avoid m2p_override during mapping"
Pull NVMe driver update from Matthew Wilcox:
"Looks like I missed the merge window ... but these are almost all
bugfixes anyway (the ones that aren't have been baking for months)"
* git://git.infradead.org/users/willy/linux-nvme:
NVMe: Namespace use after free on surprise removal
NVMe: Correct uses of INIT_WORK
NVMe: Include device and queue numbers in interrupt name
NVMe: Add a pci_driver shutdown method
NVMe: Disable admin queue on init failure
NVMe: Dynamically allocate partition numbers
NVMe: Async IO queue deletion
NVMe: Surprise removal handling
NVMe: Abort timed out commands
NVMe: Schedule reset for failed controllers
NVMe: Device resume error handling
NVMe: Cache dev->pci_dev in a local pointer
NVMe: Fix lockdep warnings
NVMe: compat SG_IO ioctl
NVMe: remove deprecated IRQF_DISABLED
NVMe: Avoid shift operation when writing cq head doorbell
For the reject module, we need to add AF-specific implementations to
get rid of incorrect module dependencies. Try to load an AF-specific
module first and fall back to generic modules.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This changes 'do_execve()' to get the executable name as a 'struct
filename', and to free it when it is done. This is what the normal
users want, and it simplifies and streamlines their error handling.
The controlled lifetime of the executable name also fixes a
use-after-free problem with the trace_sched_process_exec tracepoint: the
lifetime of the passed-in string for kernel users was not at all
obvious, and the user-mode helper code used UMH_WAIT_EXEC to serialize
the pathname allocation lifetime with the execve() having finished,
which in turn meant that the trace point that happened after
mm_release() of the old process VM ended up using already free'd memory.
To solve the kernel string lifetime issue, this simply introduces
"getname_kernel()" that works like the normal user-space getname()
function, except with the source coming from kernel memory.
As Oleg points out, this also means that we could drop the tcomm[] array
from 'struct linux_binprm', since the pathname lifetime now covers
setup_new_exec(). That would be a separate cleanup.
Reported-by: Igor Zhbanov <i.zhbanov@samsung.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With this patch, the conntrack refcount is initially set to zero and
it is bumped once it is added to any of the list, so we fulfill
Eric's golden rule which is that all released objects always have a
refcount that equals zero.
Andrey Vagin reports that nf_conntrack_free can't be called for a
conntrack with non-zero ref-counter, because it can race with
nf_conntrack_find_get().
A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
ref-counter says that this conntrack is used. So when we release
a conntrack with non-zero counter, we break this assumption.
CPU1 CPU2
____nf_conntrack_find()
nf_ct_put()
destroy_conntrack()
...
init_conntrack
__nf_conntrack_alloc (set use = 1)
atomic_inc_not_zero(&ct->use) (use = 2)
if (!l4proto->new(ct, skb, dataoff, timeouts))
nf_conntrack_free(ct); (use = 2 !!!)
...
__nf_conntrack_alloc (set use = 1)
if (!nf_ct_key_equal(h, tuple, zone))
nf_ct_put(ct); (use = 0)
destroy_conntrack()
/* continue to work with CT */
After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU
race in nf_conntrack_find_get" another bug was triggered in
destroy_conntrack():
<4>[67096.759334] ------------[ cut here ]------------
<2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
...
<4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G C --------------- 2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
<4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>] [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
<4>[67096.760255] Call Trace:
<4>[67096.760255] [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
<4>[67096.760255] [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
<4>[67096.760255] [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
<4>[67096.760255] [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
<4>[67096.760255] [<ffffffff81484419>] nf_iterate+0x69/0xb0
<4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255] [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
<4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255] [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
<4>[67096.760255] [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
<4>[67096.760255] [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
<4>[67096.760255] [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
<4>[67096.760255] [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
<4>[67096.760255] [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
<4>[67096.760255] [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
<4>[67096.760255] [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff814457c9>] sys_sendto+0x139/0x190
<4>[67096.760255] [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
<4>[67096.760255] [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
<4>[67096.760255] [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
<4>[67096.760255] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
I have reused the original title for the RFC patch that Andrey posted and
most of the original patch description.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Andrew Vagin <avagin@parallels.com>
Cc: Florian Westphal <fw@strlen.de>
Reported-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
MII management bus clock is derived from the MAC clock by dividing it by
MIIMODER register CLKDIV field value. This value may need to be set up
in case it is undefined or its default value is too high (and
communication with PHY is too slow) or too low (and communication with
PHY is impossible). The value of CLKDIV is not specified directly, but
is derived from the MAC clock for the default MII management bus frequency
of 2.5MHz. The MAC clock may be specified in the platform data, or in
the 'clocks' device tree attribute.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 08ece5bb2312b4510b161a6ef6682f37f4eac8a1.
As it breaks ARM builds and needs more attention
on the ARM side.
Acked-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Pull SLAB changes from Pekka Enberg:
"Random bug fixes that have accumulated in my inbox over the past few
months"
* 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
mm: Fix warning on make htmldocs caused by slab.c
mm: slub: work around unneeded lockdep warning
mm: sl[uo]b: fix misleading comments
slub: Fix possible format string bug.
slub: use lockdep_assert_held
slub: Fix calculation of cpu slabs
slab.h: remove duplicate kmalloc declaration and fix kernel-doc warnings
nfs41_wake_and_assign_slot() relies on the task->tk_msg.rpc_argp and
task->tk_msg.rpc_resp always pointing to the session sequence arguments.
nfs4_proc_open_confirm tries to pull a fast one by reusing the open
sequence structure, thus causing corruption of the NFSv4 slot table.
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Pull vfs fixes from Al Viro:
"Several obvious fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Fix mountpoint reference leakage in linkat
hfsplus: use xattr handlers for removexattr
Typo in compat_sys_lseek() declaration
fs/super.c: sync ro remount after blocking writers
vfs: unexport the getname() symbol
Highlights:
- Fix several races in nfs_revalidate_mapping
- NFSv4.1 slot leakage in the pNFS files driver
- Stable fix for a slot leak in nfs40_sequence_done
- Don't reject NFSv4 servers that support ACLs with only ALLOW aces
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJS7Bb+AAoJEGcL54qWCgDyDuQP/17nKR5e6MLhixcAbvlcH+pN
8CGolAM3HmRXDWUW/PkBH3UguG8Tzx1Ex26vIxipPeTSwZabf6194Twj6L97DEGZ
2SouD158BW1TkAbhEN/alKB/4ZCPos05iXjZkrL7MRff+8FD0UvWR2pBT1F2jQdY
ZftG76Q72qhZHfH07ZMxM/v4Oy2Ge98RDD35gfuuqMSjHpmN9tiB55PeheW33LVY
fu6I/JEwmlJpgy2qUcDv7v0V4mDpjC7XbcjjHpMHL8zp/C5Rx/rdgt9OQPlwmjdV
FD8MWNXLc5TWxIouLDFPVUv3WZPjyu449QHS9Wc95fSqsHcdl4j4SwLAoSvUIdHt
vDI5PtWhw3WAezbtiuCQnT0xcoIOn5bLjOVP13taDcV9vlZLcFlyOpZ5gVE4/Yju
zm4sCW2+imDc74ERGa4fukF6QhzzAVmD8RlCJwuNzwCfXiZ36+xSanMYiPoUiwLL
OVNgymrm0fe7GVFQKWN2D+Vr68OQEmARO+KfA3UzP5rQV+9CU8zSHjbcoRWZ59QG
VahOS5WDLQSrMp8W37yAHH9IiAWveAAKJJTHlOniRqH90QYPgyW18fTo7YcpW313
AQGFgr/1n4t27MWRLu5rdoN5v8+kwNi0UV6oboNIPoP1v15NkEMvc7HKFj5M883R
qEYfe5wqN/eRNj68NT/+
=B7f0
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Highlights:
- Fix several races in nfs_revalidate_mapping
- NFSv4.1 slot leakage in the pNFS files driver
- Stable fix for a slot leak in nfs40_sequence_done
- Don't reject NFSv4 servers that support ACLs with only ALLOW aces"
* tag 'nfs-for-3.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: initialize the ACL support bits to zero.
NFSv4.1: Cleanup
NFSv4.1: Clean up nfs41_sequence_done
NFSv4: Fix a slot leak in nfs40_sequence_done
NFSv4.1 free slot before resending I/O to MDS
nfs: add memory barriers around NFS_INO_INVALID_DATA and NFS_INO_INVALIDATING
NFS: Fix races in nfs_revalidate_mapping
sunrpc: turn warn_gssd() log message into a dprintk()
NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping
nfs: handle servers that support only ALLOW ACE type.
Pull SCSI target updates from Nicholas Bellinger:
"The highlights this round include:
- add support for SCSI Referrals (Hannes)
- add support for T10 DIF into target core (nab + mkp)
- add support for T10 DIF emulation in FILEIO + RAMDISK backends (Sagi + nab)
- add support for T10 DIF -> bio_integrity passthrough in IBLOCK backend (nab)
- prep changes to iser-target for >= v3.15 T10 DIF support (Sagi)
- add support for qla2xxx N_Port ID Virtualization - NPIV (Saurav + Quinn)
- allow percpu_ida_alloc() to receive task state bitmask (Kent)
- fix >= v3.12 iscsi-target session reset hung task regression (nab)
- fix >= v3.13 percpu_ref se_lun->lun_ref_active race (nab)
- fix a long-standing network portal creation race (Andy)"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
target: Fix percpu_ref_put race in transport_lun_remove_cmd
target/iscsi: Fix network portal creation race
target: Report bad sector in sense data for DIF errors
iscsi-target: Convert gfp_t parameter to task state bitmask
iscsi-target: Fix connection reset hang with percpu_ida_alloc
percpu_ida: Make percpu_ida_alloc + callers accept task state bitmask
iscsi-target: Pre-allocate more tags to avoid ack starvation
qla2xxx: Configure NPIV fc_vport via tcm_qla2xxx_npiv_make_lport
qla2xxx: Enhancements to enable NPIV support for QLOGIC ISPs with TCM/LIO.
qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure
IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine
IB/isert: Move fastreg descriptor creation to a function
IB/isert: Avoid frwr notation, user fastreg
IB/isert: seperate connection protection domains and dma MRs
tcm_loop: Enable DIF/DIX modes in SCSI host LLD
target/rd: Add DIF protection into rd_execute_rw
target/rd: Add support for protection SGL setup + release
target/rd: Refactor rd_build_device_space + rd_release_device_space
target/file: Add DIF protection support to fd_execute_rw
target/file: Add DIF protection init/format support
...
Pull media updates from Mauro Carvalho Chehab:
- a new jpeg codec driver for Samsung Exynos (jpeg-hw-exynos4)
- a new dvb frontend for ds2103 chipset (m88ds2103)
- a new sensor driver for Samsung S5K5BAF UXGA (s5k5baf)
- new drivers for R-Car VSP1
- a new radio driver: radio-raremono
- a new tuner driver for ts2022 chipset (m88ts2022)
- the analog part of em28xx is now a separate module that only
load/runs if the device is not a pure digital TV device
- added a staging driver for bcm2048 radio devices
- the omap 2 video driver (omap24xx) was moved to staging. This driver
is for an old hardware and uses a deprecated Kernel internal API. If
nobody cares enough to fix it, it would be removed on a couple Kernel
releases
- the sn9c102 driver was moved to staging. This driver was replaced by
gspca, and disabled on some distros, as almost all devices are known
to work properly with gspca. It should be removed from kernel on a
couple Kernel releases
- lots of driver fixes, improvements and cleanups
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (421 commits)
[media] media: v4l2-dev: fix video device index assignment
[media] rc-core: reuse device numbers
[media] em28xx-cards: properly initialize the device bitmap
[media] Staging: media: Fix line length exceeding 80 characters in as102_drv.c
[media] Staging: media: Fix line length exceeding 80 characters in as102_fe.c
[media] Staging: media: Fix quoted string split across line in as102_fe.c
[media] media: st-rc: Add reset support
[media] m2m-deinterlace: fix allocated struct type
[media] radio-usb-si4713: fix sparse non static symbol warnings
[media] em28xx-audio: remove needless check before usb_free_coherent()
[media] au0828: Fix sparse non static symbol warning
Revert "[media] go7007-usb: only use go->dev after allocated"
[media] em28xx-audio: provide an error code when URB submit fails
[media] em28xx: fix check for audio only usb interfaces when changing the usb alternate setting
[media] em28xx: fix usb alternate setting for analog and digital video endpoints > 0
[media] em28xx: make 'em28xx_ctrl_ops' static
em28xx-alsa: Fix error patch for init/fini
[media] em28xx-audio: flush work at .fini
[media] drxk: remove the option to load firmware asynchronously
[media] em28xx: adjust period size at runtime
...
- ACPI device hotplug fix preventing ACPI drivers from binding to device
objects that acpi_bus_trim() has been called for and the devices
represented by them may not be operational.
- Recent cpufreq changes related to the "boost" (turbo) feature broke
the acpi-cpufreq error code path causing a NULL pointer dereference
to occur on some systems. Fix from Konrad Rzeszutek Wilk.
- The log level of a CPU initialization error message added recently
needs to be reduced, because the particular BIOS issue indicated by
it turns out to be widespread and doesn't really matter for the
majority of systems having it. From Jiang Liu.
- The regulator API needs to be told to stay away from things on systems
with ACPI BIOSes or it may conflict with the BIOS' own handling of
voltage regulators. Fix from Mark Brown that works around a 3.13
regression in lm90 on PCs occuring if the regulator API is enabled.
- Prevent the Exynos4 devfreq driver from being built on multiplatform,
because it depends on things that aren't available during such builds.
From Sachin Kamat.
- Upstream ACPICA doesn't use the bool type as defined in the kernel,
so modify the kernel's ACPICA code to follow the upstream in that
respect (only one variable definition is affected) to reduce
divergences between the two. From Lv Zheng.
- Make the ACPI device PM code use ACPI_COMPANION() instead of its own
routine doing the same thing (and invokng ACPI_COMPANION() in the
process).
- Modify some routines in the ACPI processor driver to follow the
common convention and return negative integers on errors. From
Hanjun Guo.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJS6mQLAAoJEILEb/54YlRxc6sP/0tYmO8qOF0XwHhA9AQKlcVG
DZxui1pEbhXM3lftwqvVnma1hOKY1T014+JFxldCjvV9hyxdxTvcdvhKRxrj/eAk
DY7p2TK3N0eBScpMQLv/nKMjS20Z0bxz3QNxhC0S3eOYk5QPN3aUhsqG2iu6aFxJ
FzOOV5mQGhGT6a3LmXvjNROXjJN4dibMmGvl+bJ/8Y5zJUUvAOWmxphqjaIZm7WC
Lh4a8wUfxfmq8Mj1DDI/qk/aWP8QTBDYiG8C8XjEa1pj9oHjsrCHNogSrfRHRwa1
sDW/6EjSbLYnKuUDeoA61wXyVvlvR22uTTswDwbJ2mt18WTcmbvPP6Znh7m3oiF2
aUDoh/8KvYGRhDBZm+wf6TjD/KdcIQdOptNl6irCMsHkYNDK/n7f++Blkjy/xmea
UsF6l4XkIbqJCWWsTwO2VjB0p0QPRS49s/YtUxjETqCjdhNKnAQr2wfBH5/fcD6c
yJPkCGOlqcPQAPOrWyabAuStDfFUEUoFkZ/kbZov2xUIXyJFv5gfj1boZ3Ekqhmd
klcOmsvVP8QZLF82iFIhDdgWLNCRPESNfJpN0mLpg2sYMWQtK+r1OnSKpbVNW0Jn
6Ea23WSm0ZEd0QX8tYLjf/iUsIUjn48gJuXG+CDlTtkQtLNdr1VTQd5ZeKm2kcOg
+82EVRoeZi+sqx4YrEQi
=VBdN
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes and cleanups from Rafael Wysocki:
- ACPI device hotplug fix preventing ACPI drivers from binding to device
objects that acpi_bus_trim() has been called for and the devices
represented by them may not be operational.
- Recent cpufreq changes related to the "boost" (turbo) feature broke
the acpi-cpufreq error code path causing a NULL pointer dereference
to occur on some systems. Fix from Konrad Rzeszutek Wilk.
- The log level of a CPU initialization error message added recently
needs to be reduced, because the particular BIOS issue indicated by
it turns out to be widespread and doesn't really matter for the
majority of systems having it. From Jiang Liu.
- The regulator API needs to be told to stay away from things on systems
with ACPI BIOSes or it may conflict with the BIOS' own handling of
voltage regulators. Fix from Mark Brown that works around a 3.13
regression in lm90 on PCs occuring if the regulator API is enabled.
- Prevent the Exynos4 devfreq driver from being built on multiplatform,
because it depends on things that aren't available during such builds.
From Sachin Kamat.
- Upstream ACPICA doesn't use the bool type as defined in the kernel,
so modify the kernel's ACPICA code to follow the upstream in that
respect (only one variable definition is affected) to reduce
divergences between the two. From Lv Zheng.
- Make the ACPI device PM code use ACPI_COMPANION() instead of its own
routine doing the same thing (and invokng ACPI_COMPANION() in the
process).
- Modify some routines in the ACPI processor driver to follow the
common convention and return negative integers on errors. From
Hanjun Guo.
* tag 'pm+acpi-3.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / scan: Clear match_driver flag in acpi_bus_trim()
ACPI / init: Flag use of ACPI and ACPI idioms for power supplies to regulator API
acpi-cpufreq: De-register CPU notifier and free struct msr on error.
ACPICA: Remove bool usage from ACPICA.
PM / devfreq: Disable Exynos4 driver build on multiplatform
ACPI / PM: Use ACPI_COMPANION() to get ACPI companions of devices
ACPI / scan: reduce log level of "ACPI: \_PR_.CPU4: failed to get CPU APIC ID"
ACPI / processor: Return specific error value when mapping lapic id
Pull timer/dynticks updates from Ingo Molnar:
"This tree contains misc dynticks updates: a fix and three cleanups"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/nohz: Fix overflow error in scheduler_tick_max_deferment()
nohz_full: fix code style issue of tick_nohz_full_stop_tick
nohz: Get timekeeping max deferment outside jiffies_lock
tick: Rename tick_check_idle() to tick_irq_enter()
Pull core debug changes from Ingo Molnar:
"This contains mostly kernel debugging related updates:
- make hung_task detection more configurable to distros
- add final bits for x86 UV NMI debugging, with related KGDB changes
- update the mailing-list of MAINTAINERS entries I'm involved with"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hung_task: Display every hung task warning
sysctl: Add neg_one as a standard constraint
x86/uv/nmi, kgdb/kdb: Fix UV NMI handler when KDB not configured
x86/uv/nmi: Fix Sparse warnings
kgdb/kdb: Fix no KDB config problem
MAINTAINERS: Restore "L: linux-kernel@vger.kernel.org" entries
- Xen ARM couldn't use the new FIFO events
- Xen ARM couldn't use the SWIOTLB if compiled as 32-bit with 64-bit PCIe devices.
- Grant table were doing needless M2P operations.
- Ratchet down the self-balloon code so it won't OOM.
- Fix misplaced kfree in Xen PVH error code paths.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJS68IQAAoJEFjIrFwIi8fJAWgH/j4HStEey3rgGcqwIWSHkkap
+t55wsrT8Ylq6CzZjaUtCo3pB7HotW526x/0rA2pxVqHn/8oCN/1EtdrNtYm/umX
qOoda+db5NIjAEGVLWSLqGyokJQDrX/brXIWfYR300e9fnJi7yT/rFC4QHoZVUYl
5LME8XH/jE012vvYelNu6DbbodlRmVCT8hctJS+eB5ER2WmtD9Pkw4GybEXPVYJz
hE0Ts1DN91nKP2FGJb+mfB9UFT5X8i00akAK+Qc1R3sRnRh6eRoNV8dgyCnudKpO
UPEdiAZvgij+mzlgIYSz6nKH0U/VbvRsG3lc3i5Si3o+vR3CYPCkvzOGX2d0rjw=
=7cxW
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.14-rc0-late-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen bugfixes from Konrad Rzeszutek Wilk:
"Bug-fixes for the new features that were added during this cycle.
There are also two fixes for long-standing issues for which we have a
solution: grant-table operations extra work that was not needed
causing performance issues and the self balloon code was too
aggressive causing OOMs.
Details:
- Xen ARM couldn't use the new FIFO events
- Xen ARM couldn't use the SWIOTLB if compiled as 32-bit with 64-bit PCIe devices.
- Grant table were doing needless M2P operations.
- Ratchet down the self-balloon code so it won't OOM.
- Fix misplaced kfree in Xen PVH error code paths"
* tag 'stable/for-linus-3.14-rc0-late-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/pvh: Fix misplaced kfree from xlated_setup_gnttab_pages
drivers: xen: deaggressive selfballoon driver
xen/grant-table: Avoid m2p_override during mapping
xen/gnttab: Use phys_addr_t to describe the grant frame base address
xen: swiotlb: handle sizeof(dma_addr_t) != sizeof(phys_addr_t)
arm/xen: Initialize event channels earlier
The grant mapping API does m2p_override unnecessarily: only gntdev needs it,
for blkback and future netback patches it just cause a lock contention, as
those pages never go to userspace. Therefore this series does the following:
- the original functions were renamed to __gnttab_[un]map_refs, with a new
parameter m2p_override
- based on m2p_override either they follow the original behaviour, or just set
the private flag and call set_phys_to_machine
- gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
m2p_override false
- a new function gnttab_[un]map_refs_userspace provides the old behaviour
It also removes a stray space from page.h and change ret to 0 if
XENFEAT_auto_translated_physmap, as that is the only possible return value
there.
v2:
- move the storing of the old mfn in page->index to gnttab_map_refs
- move the function header update to a separate patch
v3:
- a new approach to retain old behaviour where it needed
- squash the patches into one
v4:
- move out the common bits from m2p* functions, and pass pfn/mfn as parameter
- clear page->private before doing anything with the page, so m2p_find_override
won't race with this
v5:
- change return value handling in __gnttab_[un]map_refs
- remove a stray space in page.h
- add detail why ret = 0 now at some places
v6:
- don't pass pfn to m2p* functions, just get it locally
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Suggested-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
On x86, SLUB creates and handles <=8192-byte allocations internally.
It passes larger ones up to the allocator. Saying "up to order 2" is,
at best, ambiguous. Is that order-1? Or (order-2 bytes)? Make
it more clear.
SLOB commits a similar sin. It *handles* page-size requests, but the
comment says that it passes up "all page size and larger requests".
SLOB also swaps around the order of the very-similarly-named
KMALLOC_SHIFT_HIGH and KMALLOC_SHIFT_MAX #defines. Make it
consistent with the order of the other two allocators.
Cc: Matt Mackall <mpm@selenic.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Pull btrfs updates from Chris Mason:
"This is a pretty big pull, and most of these changes have been
floating in btrfs-next for a long time. Filipe's properties work is a
cool building block for inheriting attributes like compression down on
a per inode basis.
Jeff Mahoney kicked in code to export filesystem info into sysfs.
Otherwise, lots of performance improvements, cleanups and bug fixes.
Looks like there are still a few other small pending incrementals, but
I wanted to get the bulk of this in first"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (149 commits)
Btrfs: fix spin_unlock in check_ref_cleanup
Btrfs: setup inode location during btrfs_init_inode_locked
Btrfs: don't use ram_bytes for uncompressed inline items
Btrfs: fix btrfs_search_slot_for_read backwards iteration
Btrfs: do not export ulist functions
Btrfs: rework ulist with list+rb_tree
Btrfs: fix memory leaks on walking backrefs failure
Btrfs: fix send file hole detection leading to data corruption
Btrfs: add a reschedule point in btrfs_find_all_roots()
Btrfs: make send's file extent item search more efficient
Btrfs: fix to catch all errors when resolving indirect ref
Btrfs: fix protection between walking backrefs and root deletion
btrfs: fix warning while merging two adjacent extents
Btrfs: fix infinite path build loops in incremental send
btrfs: undo sysfs when open_ctree() fails
Btrfs: fix snprintf usage by send's gen_unique_name
btrfs: fix defrag 32-bit integer overflow
btrfs: sysfs: list the NO_HOLES feature
btrfs: sysfs: don't show reserved incompat feature
btrfs: call permission checks earlier in ioctls and return EPERM
...
Merge misc fixes from Andrew Morton:
"A few hotfixes and various leftovers which were awaiting other merges.
Mainly movement of zram into mm/"
* emailed patches fron Andrew Morton <akpm@linux-foundation.org>: (25 commits)
memcg: fix mutex not unlocked on memcg_create_kmem_cache fail path
Documentation/filesystems/vfs.txt: update file_operations documentation
mm, oom: base root bonus on current usage
mm: don't lose the SOFT_DIRTY flag on mprotect
mm/slub.c: fix page->_count corruption (again)
mm/mempolicy.c: fix mempolicy printing in numa_maps
zram: remove zram->lock in read path and change it with mutex
zram: remove workqueue for freeing removed pending slot
zram: introduce zram->tb_lock
zram: use atomic operation for stat
zram: remove unnecessary free
zram: delay pending free request in read path
zram: fix race between reset and flushing pending work
zsmalloc: add maintainers
zram: add zram maintainers
zsmalloc: add copyright
zram: add copyright
zram: remove old private project comment
zram: promote zram from staging
zsmalloc: move it under mm
...
Pull MIPS updates from Ralf Baechle:
"The most notable new addition inside this pull request is the support
for MIPS's latest and greatest core called "inter/proAptiv". The
patch series describes this core as follows.
"The interAptiv is a power-efficient multi-core microprocessor
for use in system-on-chip (SoC) applications. The interAptiv combines
a multi-threading pipeline with a coherence manager to deliver improved
computational throughput and power efficiency. The interAptiv can
contain one to four MIPS32R3 interAptiv cores, system level
coherence manager with L2 cache, optional coherent I/O port,
and optional floating point unit."
The platform specific patches touch all 3 Broadcom families. It adds
support for the new Broadcom/Netlogix XLP9xx Soc, building a common
BCM63XX SMP kernel for all BCM63XX SoCs regardless of core type/count
and full gpio button/led descriptions for BCM47xx.
The rest of the series are cleanups and bug fixes that are MIPS
generic and consist largely of changes that Imgtec/MIPS had published
in their linux-mti-3.10.git stable tree. Random other cleanups and
patches preparing code to be merged in 3.15"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (139 commits)
mips: select ARCH_MIGHT_HAVE_PC_SERIO
mips: delete non-required instances of include <linux/init.h>
MIPS: KVM: remove shadow_tlb code
MIPS: KVM: use common EHINV aware UNIQUE_ENTRYHI
mips/ide: flush dcache also if icache does not snoop dcache
MIPS: BCM47XX: fix position of cpu_wait disabling
MIPS: BCM63XX: select correct MIPS_L1_CACHE_SHIFT value
MIPS: update MIPS_L1_CACHE_SHIFT based on MIPS_L1_CACHE_SHIFT_<N>
MIPS: introduce MIPS_L1_CACHE_SHIFT_<N>
MIPS: ZBOOT: gather string functions into string.c
arch/mips/pci: don't check resource with devm_ioremap_resource
arch/mips/lantiq/xway: don't check resource with devm_ioremap_resource
bcma: gpio: don't cast u32 to unsigned long
ssb: gpio: add own IRQ domain
MIPS: BCM47XX: fix sparse warnings in board.c
MIPS: BCM47XX: add board detection for Linksys WRT54GS V1
MIPS: BCM47XX: fix detection for some boards
MIPS: BCM47XX: Enable buttons support on SSB
MIPS: BCM47XX: Convert WNDR4500 to new syntax
MIPS: BCM47XX: Use "timer" trigger for status LEDs
...
Pull more powerpc bits from Ben Herrenschmidt:
"Here are a few more powerpc bits for this merge window. The bulk is
made of two pull requests from Scott and Anatolij that I had missed
previously (they arrived while I was away). Since both their branches
are in -next independently, and the content has been around for a
little while, they can still go in.
The rest is mostly bug and regression fixes, a small series of
cleanups to our pseries cpuidle code (including moving it to the right
place), and one new cpuidle bakend for the powernv platform. I also
wired up the new sched_attr syscalls"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (37 commits)
powerpc: Wire up sched_setattr and sched_getattr syscalls
powerpc/hugetlb: Replace __get_cpu_var with get_cpu_var
powerpc: Make sure "cache" directory is removed when offlining cpu
powerpc/mm: Fix mmap errno when MAP_FIXED is set and mapping exceeds the allowed address space
powerpc/powernv/cpuidle: Back-end cpuidle driver for powernv platform.
powerpc/pseries/cpuidle: smt-snooze-delay cleanup.
powerpc/pseries/cpuidle: Remove MAX_IDLE_STATE macro.
powerpc/pseries/cpuidle: Make cpuidle-pseries backend driver a non-module.
powerpc/pseries/cpuidle: Use cpuidle_register() for initialisation.
powerpc/pseries/cpuidle: Move processor_idle.c to drivers/cpuidle.
powerpc: Fix 32-bit frames for signals delivered when transactional
powerpc/iommu: Fix initialisation of DART iommu table
powerpc/numa: Fix decimal permissions
powerpc/mm: Fix compile error of pgtable-ppc64.h
powerpc: Fix hw breakpoints on !HAVE_HW_BREAKPOINT configurations
clk: corenet: Adds the clock binding
powerpc/booke64: Guard e6500 tlb handler with CONFIG_PPC_FSL_BOOK3E
powerpc/512x: dts: add MPC5125 clock specs
powerpc/512x: clk: support MPC5121/5123/5125 SoC variants
powerpc/512x: clk: enforce even SDHC divider values
...
Pull kbuild changes from Michal Marek:
- fix make -s detection with make-4.0
- fix for scripts/setlocalversion when the kernel repository is a
submodule
- do not hardcode ';' in macros that expand to assembler code, as some
architectures' assemblers use a different character for newline
- Fix passing --gdwarf-2 to the assembler
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
frv: Remove redundant debugging info flag
mn10300: Remove redundant debugging info flag
kbuild: Fix debugging info generation for .S files
arch: use ASM_NL instead of ';' for assembler new line character in the macro
kbuild: Fix silent builds with make-4
Fix detectition of kernel git repository in setlocalversion script [take #2]
Add my copyright to the zsmalloc source code which I maintain.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>