ipv6_opt_accepted() assumes IP6CB(skb) holds the struct inet6_skb_parm
that it needs. Lets not assume this, as TCP stack might use a different
place.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_options_echo() assumes struct ip_options is provided in &IPCB(skb)->opt
Lets break this assumption, but provide a helper to not change all call points.
ip_send_unicast_reply() gets a new struct ip_options pointer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6gre_tunnel_locate() should not return an existing tunnel if
create is true. Otherwise it is possible to add the same
tunnel multiple times without getting an error.
So return NULL if the tunnel that should be created already
exists.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vti6_locate() should not return an existing tunnel if
create is true. Otherwise it is possible to add the same
tunnel multiple times without getting an error.
So return NULL if the tunnel that should be created already
exists.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6_tnl_locate() should not return an existing tunnel if
create is true. Otherwise it is possible to add the same
tunnel multiple times without getting an error.
So return NULL if the tunnel that should be created already
exists.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net_dma was the only external user so this can become local to tcp.c
again.
Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that tcp_dma_try_early_copy() is gone nothing ever sets
copied_early.
Also reverts "53240c208776 tcp: Fix possible double-ack w/ user dma"
since it is no longer necessary.
Cc: Ali Saidi <saidi@engin.umich.edu>
Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Neal Cardwell <ncardwell@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Per commit "77873803363c net_dma: mark broken" net_dma is no longer used
and there is no plan to fix it.
This is the mechanical removal of bits in CONFIG_NET_DMA ifdef guards.
Reverting the remainder of the net_dma induced changes is deferred to
subsequent patches.
Marked for stable due to Roman's report of a memory leak in
dma_pin_iovec_pages():
https://lkml.org/lkml/2014/9/3/177
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: David Whipple <whipple@securedatainnovations.ch>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: <stable@vger.kernel.org>
Reported-by: Roman Gushchin <klamm@yandex-team.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
With this alias, we don't need to load manually the module before adding an
ip6gretap interface with iproute2.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cache skb_shinfo(skb) in a variable to avoid computing it multiple
times.
Reorganize the tests to remove one indentation level.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the duplicated comment
"/* The following definitions are for users of the vport subsytem: */"
in vport.h
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
nf pull request for net
This series contains netfilter fixes for net, they are:
1) Fix lockdep splat in nft_hash when releasing sets from the
rcu_callback context. We don't the mutex there anymore.
2) Remove unnecessary spinlock_bh in the destroy path of the nf_tables
rbtree set type from rcu_callback context.
3) Fix another lockdep splat in rhashtable. None of the callers hold
a mutex when calling rhashtable_destroy.
4) Fix duplicated error reporting from nfnetlink when aborting and
replaying a batch.
5) Fix a Kconfig issue reported by kbuild robot.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
csum_partial() is a generic function which is not optimised for small fixed
length calculations, and its use requires to store "from" and "to" values in
memory while we already have them available in registers. This also has impact,
especially on RISC processors. In the same spirit as the change done by
Eric Dumazet on csum_replace2(), this patch rewrites inet_proto_csum_replace4()
taking into account RFC1624.
I spotted during a NATted tcp transfert that csum_partial() is one of top 5
consuming functions (around 8%), and the second user of csum_partial() is
inet_proto_csum_replace4().
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While profiling TCP stack, I noticed one useless atomic operation
in tcp_sendmsg(), caused by skb_header_release().
It turns out all current skb_header_release() users have a fresh skb,
that no other user can see, so we can avoid one atomic operation.
Introduce __skb_header_release() to clearly document this.
This gave me a 1.5 % improvement on TCP_RR workload.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless-next 2014-09-22
Please pull this batch of updates intended for the 3.18 stream...
For the mac80211 bits, Johannes says:
"This time, I have some rate minstrel improvements, support for a very
small feature from CCX that Steinar reverse-engineered, dynamic ACK
timeout support, a number of changes for TDLS, early support for radio
resource measurement and many fixes. Also, I'm changing a number of
places to clear key memory when it's freed and Intel claims copyright
for code they developed."
For the bluetooth bits, Johan says:
"Here are some more patches intended for 3.18. Most of them are cleanups
or fixes for SMP. The only exception is a fix for BR/EDR L2CAP fixed
channels which should now work better together with the L2CAP
information request procedure."
For the iwlwifi bits, Emmanuel says:
"I fix here dvm which was broken by my last pull request. Arik
continues to work on TDLS and Luca solved a few issues in CT-Kill. Eyal
keeps digging into rate scaling code, more to come soon. Besides this,
nothing really special here."
Beyond that, there are the usual big batches of updates to ath9k, b43,
mwifiex, and wil6210 as well as a handful of other bits here and there.
Also, rtlwifi gets some btcoexist attention from Larry.
Please let me know if there are problems!
====================
Had to adjust the wil6210 code to comply with Joe Perches's recent
change in net-next to make the netdev_*() routines return void instead
of 'int'.
Signed-off-by: David S. Miller <davem@davemloft.net>
No caller or macro uses the return value so make all
the functions return void.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the NFC pull request for 3.18.
We've had major updates for TI and ST Microelectronics drivers:
For TI's trf7970a driver:
- Target mode support for trf7970a
- Suspend/resume support for trf7970a
- DT properties additions to handle different quirks
- A bunch of fixes for smartphone IOP related issues
For ST Microelectronics' ST21NFCA and ST21NFCB drivers:
- ISO15693 support for st21nfcb
- checkpatch and sparse related warning fixes
- Code cleanups and a few minor fixes
Finally, Marvell add ISO15693 support to the NCI stack, together with a
couple of NCI fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUIg4oAAoJEIqAPN1PVmxKg+kP/RPrgH6LA1tFubIwKR2+sGQ7
g/W2J3AE8QASZkErkpXRNCt2D/SPWEKBY/qscwz+BtcWg76taIIaGTvVUNtxSaW3
gS4hG6V1UlANWv3KFfaOKmzjEOO/SPNtkFAyI0cTOaGyUqG4o9BgBZpn1rYO16MD
ZkSC39MpjMoXB9BbsfQngoUEoWc3tZNMmRzk4IVTwE/wXuQvZxmFQXEAiZ+pnYle
NQfugaGMz0526rLG3QnrpkUakFb81iQwtONpbx6i8KW/Klkc6TN/ek6J9ecU8t5z
tdHOViZWRmA1VwMGBHwpq8F2o/ATH6GeivTgqrQjcjGNhCUUT1Ulzve2UxGEMWi6
ncjKY/GxUrYaMMtRvLv+/knrfbWtd+EnWOav07jgNrrA0tvgBNQvEKKHPoWykDVN
QKpxu3YoNxrsR/LJMS+Zjj0IIM1Y+9DTOkLXzxJ5Hvht8rOl5heYGh2DICOpWsbQ
ejrQicJOJvN5vqu+Sgcqq4msyTEdbs2LfRDrW1VC9A6ILI+KzYg2laTFGMnhZ5qn
TgsYIDdONS2iGUulFHylGHI7ANtUg/mhklLUccY1HQYyiAM1NQUtzq1tAz6yLoIH
l8iIiyzJSBWW57nWhyrULEbzHgPE+bHIjO4T+UUOxMgquYa4V11S1uP0OfWfZogR
xS24GlobS2oXHMQqh0fA
=d/C+
-----END PGP SIGNATURE-----
Merge tag 'nfc-next-3.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz <sameo@linux.intel.com> says:
"NFC: 3.18 pull request
This is the NFC pull request for 3.18.
We've had major updates for TI and ST Microelectronics drivers:
For TI's trf7970a driver:
- Target mode support for trf7970a
- Suspend/resume support for trf7970a
- DT properties additions to handle different quirks
- A bunch of fixes for smartphone IOP related issues
For ST Microelectronics' ST21NFCA and ST21NFCB drivers:
- ISO15693 support for st21nfcb
- checkpatch and sparse related warning fixes
- Code cleanups and a few minor fixes
Finally, Marvell add ISO15693 support to the NCI stack, together with a
couple of NCI fixes."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Jesper reported that br_netfilter always registers the hooks since
this is part of the bridge core. This harms performance for people that
don't need this.
This patch modularizes br_netfilter so it can be rmmod'ed, thus,
the hooks can be unregistered. I think the bridge netfilter should have
been a separated module since the beginning, Patrick agreed on that.
Note that this is breaking compatibility for users that expect that
bridge netfilter is going to be available after explicitly 'modprobe
bridge' or via automatic load through brctl.
However, the damage can be easily undone by modprobing br_netfilter.
The bridge core also spots a message to provide a clue to people that
didn't notice that this has been deprecated.
On top of that, the plan is that nftables will not rely on this software
layer, but integrate the connection tracking into the bridge layer to
enable stateful filtering and NAT, which is was bridge netfilter users
seem to require.
This patch still keeps the fake_dst_ops in the bridge core, since this
is required by when the bridge port is initialized. So we can safely
modprobe/rmmod br_netfilter anytime.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Florian Westphal <fw@strlen.de>
Move nf_bridge_copy_header() as static inline in netfilter_bridge.h
header file. This patch prepares the modularization of the br_netfilter
code.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reduce boilerplate code by using __seq_open_private() instead of seq_open()
in xt_match_open() and xt_target_open().
Signed-off-by: Rob Jones <rob.jones@codethink.co.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When we try to add an already existing tunnel, we don't return
an error. Instead we continue and call ip_tunnel_update().
This means that we can change existing tunnels by adding
the same tunnel multiple times. It is even possible to change
the tunnel endpoints of the fallback device.
We fix this by returning an error if we try to add an existing
tunnel.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While using a MQ + NETEM setup, I had confirmation that the default
timer migration ( /proc/sys/kernel/timer_migration ) is killing us.
Installing this on a receiver side of a TCP_STREAM test, (NIC has 8 TX
queues) :
EST="est 1sec 4sec"
for ETH in eth1
do
tc qd del dev $ETH root 2>/dev/null
tc qd add dev $ETH root handle 1: mq
tc qd add dev $ETH parent 1:1 $EST netem limit 70000 delay 6ms
tc qd add dev $ETH parent 1:2 $EST netem limit 70000 delay 8ms
tc qd add dev $ETH parent 1:3 $EST netem limit 70000 delay 10ms
tc qd add dev $ETH parent 1:4 $EST netem limit 70000 delay 12ms
tc qd add dev $ETH parent 1:5 $EST netem limit 70000 delay 14ms
tc qd add dev $ETH parent 1:6 $EST netem limit 70000 delay 16ms
tc qd add dev $ETH parent 1:7 $EST netem limit 80000 delay 18ms
tc qd add dev $ETH parent 1:8 $EST netem limit 90000 delay 20ms
done
We can see that timers get migrated into a single cpu, presumably idle
at the time timers are set up.
Then all qdisc dequeues run from this cpu and huge lock contention
happens. This single cpu is stuck in softirq mode and cannot dequeue
fast enough.
39.24% [kernel] [k] _raw_spin_lock
2.65% [kernel] [k] netem_enqueue
1.80% [kernel] [k] netem_dequeue
1.63% [kernel] [k] copy_user_enhanced_fast_string
1.45% [kernel] [k] _raw_spin_lock_bh
By pinning qdisc timers on the cpu running the qdisc, we respect proper
XPS setting and remove this lock contention.
5.84% [kernel] [k] netem_enqueue
4.83% [kernel] [k] _raw_spin_lock
2.92% [kernel] [k] copy_user_enhanced_fast_string
Current Qdiscs that benefit from this change are :
netem, cbq, fq, hfsc, tbf, htb.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The send_check logic was only interesting in cases of TCP offload and
UDP UFO where the checksum needed to be initialized to the pseudo
header checksum. Now we've moved that logic into the related
gso_segment functions so gso_send_check is no longer needed.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In udp[46]_ufo_send_check the UDP checksum initialized to the pseudo
header checksum. We can move this logic into udp[46]_ufo_fragment.
After this change udp[64]_ufo_send_check is a no-op.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In tcp_v[46]_gso_send_check the TCP checksum is initialized to the
pseudo header checksum using __tcp_v[46]_send_check. We can move this
logic into new tcp[46]_gso_segment functions to be done when
ip_summed != CHECKSUM_PARTIAL (ip_summed == CHECKSUM_PARTIAL should be
the common case, possibly always true when taking GSO path). After this
change tcp_v[46]_gso_send_check is no-op.
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The flag RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT was intended introduced in
order to allow NFSv4 clients to disable resend timeouts. Since those
cause the RPC layer to break the connection, they mess up the duplicate
reply caches that remain indexed on the port number in NFSv4..
This patch includes the code that was missing in the original to
set the appropriate flag in struct rpc_clnt, when the caller of
rpc_create() sets RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT.
Fixes: 8a19a0b6cb2e (SUNRPC: Add RPC task and client level options to...)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Now that nfs_release_page() doesn't block indefinitely, other deadlock
avoidance mechanisms aren't needed.
- it doesn't hurt for kswapd to block occasionally. If it doesn't
want to block it would clear __GFP_WAIT. The current_is_kswapd()
was only added to avoid deadlocks and we have a new approach for
that.
- memory allocation in the SUNRPC layer can very rarely try to
->releasepage() a page it is trying to handle. The deadlock
is removed as nfs_release_page() doesn't block indefinitely.
So we don't need to set PF_FSTRANS for sunrpc network operations any
more.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The sco_param_wideband table represents the eSCO parameters for
specifically mSBC encoding. This patch renames the table to the more
descriptive esco_param_msbc name.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If an iptables drop rule is added for an nfs server, the client can end up in
a softlockup. Because of the way that xs_sendpages() is structured, the -EPERM
is ignored since the prior bits of the packet may have been successfully queued
and thus xs_sendpages() returns a non-zero value. Then, xs_udp_send_request()
thinks that because some bits were queued it should return -EAGAIN. We then try
the request again and again, resulting in cpu spinning. Reproducer:
1) open a file on the nfs server '/nfs/foo' (mounted using udp)
2) iptables -A OUTPUT -d <nfs server ip> -j DROP
3) write to /nfs/foo
4) close /nfs/foo
5) iptables -D OUTPUT -d <nfs server ip> -j DROP
The softlockup occurs in step 4 above.
The previous patch, allows xs_sendpages() to return both a sent count and
any error values that may have occurred. Thus, if we get an -EPERM, return
that to the higher level code.
With this patch in place we can successfully abort the above sequence and
avoid the softlockup.
I also tried the above test case on an nfs mount on tcp and although the system
does not softlockup, I still ended up with the 'hung_task' firing after 120
seconds, due to the i/o being stuck. The tcp case appears a bit harder to fix,
since -EPERM appears to get ignored much lower down in the stack and does not
propogate up to xs_sendpages(). This case is not quite as insidious as the
softlockup and it is not addressed here.
Reported-by: Yigong Lou <ylou@akamai.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If an error is returned after the first bits of a packet have already been
successfully queued, xs_sendpages() will return a positive 'int' value
indicating success. Callers seem to treat this as -EAGAIN.
However, there are cases where its not a question of waiting for the write
queue to drain. For example, when there is an iptables rule dropping packets
to the destination, the lower level code can return -EPERM only after parts
of the packet have been successfully queued. In this case, we can end up
continuously retrying resulting in a kernel softlockup.
This patch is intended to make no changes in behavior but is in preparation for
subsequent patches that can make decisions based on both on the number of bytes
sent by xs_sendpages() and any errors that may have be returned.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
When aborting a connection to preserve source ports, don't wake the task in
xs_error_report. This allows tasks with RPC_TASK_SOFTCONN to succeed if the
connection needs to be re-established since it preserves the task's status
instead of setting it to the status of the aborting kernel_connect().
This may also avoid a potential conflict on the socket's lock.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
It is expected that new parameter combinations will have the
retransmission effort value different between some entries (mainly
because of the new S4 configuration added by HFP 1.7), so it makes sense
to move it into the table instead of having it hard coded based on the
selected SCO_AIRMODE_*.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
John W. Linville says:
====================
pull request: wireless 2014-09-23
Please consider pulling this one last batch of fixes intended for the 3.17 stream!
For the NFC bits, Samuel says:
"Hopefully not too late for a handful of NFC fixes:
- 2 potential build failures for ST21NFCA and ST21NFCB, triggered by a
depmod dependenyc cycle.
- One potential buffer overflow in the microread driver."
On top of that...
Emil Goode provides a fix for a brcmfmac off-by-one regression which
was introduced in the 3.17 cycle.
Loic Poulain fixes a polarity mismatch for a variable assignment
inside of rfkill-gpio.
Wojciech Dubowik prevents a NULL pointer dereference in ath9k.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The 6lowpan ipv6 header compression was causing problems for other interfaces
that expected a ipv6 header to still be in place, as we were replacing the
ipv6 header with a compressed version. This happened if you sent a packet to a
multicast address as the packet would be output on 802.15.4, ethernet, and also
be sent to the loopback interface. The skb data was shared between these
interfaces so all interfaces ended up with a compressed ipv6 header.
The solution is to ensure that before we do any header compression we are not
sharing the skb or skb data with any other interface. If we are then we must
take a copy of the skb and skb data before modifying the ipv6 header.
The only place we can copy the skb is inside the xmit function so we don't
leave dangling references to skb.
This patch moves all the header compression to inside the xmit function. Very
little code has been changed it has mostly been moved from lowpan_header_create
to lowpan_xmit. At the top of the xmit function we now check if the skb is
shared and if so copy it. In lowpan_header_create all we do now is store the
source and destination addresses for use later when we compress the header.
Signed-off-by: Simon Vincent <simon.vincent@xsilon.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The core specification defines valid values for the
HCI_Reject_Synchronous_Connection_Request command to be 0x0D-0x0F. So
far the code has been using HCI_ERROR_REMOTE_USER_TERM (0x13) which is
not a valid value and is therefore being rejected by some controllers:
> HCI Event: Connect Request (0x04) plen 10
bdaddr 40:6F:2A:6A:E5:E0 class 0x000000 type eSCO
< HCI Command: Reject Synchronous Connection (0x01|0x002a) plen 7
bdaddr 40:6F:2A:6A:E5:E0 reason 0x13
Reason: Remote User Terminated Connection
> HCI Event: Command Status (0x0f) plen 4
Reject Synchronous Connection (0x01|0x002a) status 0x12 ncmd 1
Error: Invalid HCI Command Parameters
This patch introduces a new define for a value from the valid range
(0x0d == Connection Rejected Due To Limited Resources) and uses it
instead for rejecting incoming connections.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
No caller or macro uses the return value so make all
the functions return void.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In NFC Forum NCI specification, some RF Protocol values are
reserved for proprietary use (from 0x80 to 0xfe).
Some CLF vendor may need to use one value within this range
for specific technology.
Furthermore, some CLF may not becompliant with NFC Froum NCI
specification 2.0 and therefore will not support RF Protocol
value 0x06 for PROTOCOL_T5T as mention in a draft specification
and in a recent push.
Adding get_rf_protocol handle to the nci_ops structure will
help to set the correct technology to target.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to make TCP more resilient in presence of reorders, we need
to allow coalescing to happen when skbs from out of order queue are
transferred into receive queue. LRO/GRO can be completely canceled
in some pathological cases, like per packet load balancing on aggregated
links.
I had to move tcp_try_coalesce() up in the file above tcp_ofo_queue()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current ICMP rate limiting uses inetpeer cache, which is an RBL tree
protected by a lock, meaning that hosts can be stuck hard if all cpus
want to check ICMP limits.
When say a DNS or NTP server process is restarted, inetpeer tree grows
quick and machine comes to its knees.
iptables can not help because the bottleneck happens before ICMP
messages are even cooked and sent.
This patch adds a new global limitation, using a token bucket filter,
controlled by two new sysctl :
icmp_msgs_per_sec - INTEGER
Limit maximal number of ICMP packets sent per second from this host.
Only messages whose type matches icmp_ratemask are
controlled by this limit.
Default: 1000
icmp_msgs_burst - INTEGER
icmp_msgs_per_sec controls number of ICMP packets sent per second,
while icmp_msgs_burst controls the burst size of these packets.
Default: 50
Note that if we really want to send millions of ICMP messages per
second, we might extend idea and infra added in commit 04ca6973f7c1a
("ip: make IP identifiers less predictable") :
add a token bucket in the ip_idents hash and no longer rely on inetpeer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
arch/mips/net/bpf_jit.c
drivers/net/can/flexcan.c
Both the flexcan and MIPS bpf_jit conflicts were cases of simple
overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
SCO connection cannot be setup to devices that do not support retransmission.
Patch based on http://permalink.gmane.org/gmane.linux.bluez.kernel/7779 and
adapted for this kernel version.
Code changed to check SCO/eSCO type before setting retransmission effort
and max. latency. The purpose of the patch is to support older devices not
capable of eSCO.
Tested on Blackberry 655+ headset which does not support retransmission.
Credits go to Alexander Sommerhuber.
Signed-off-by: Bernhard Thaler <bernhard.thaler@r-it.at>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Pull networking fixes from David Miller:
1) If the user gives us a msg_namelen of 0, don't try to interpret
anything pointed to by msg_name. From Ani Sinha.
2) Fix some bnx2i/bnx2fc randconfig compilation errors.
The gist of the issue is that we firstly have drivers that span both
SCSI and networking. And at the top of that chain of dependencies
we have things like SCSI_FC_ATTRS and SCSI_NETLINK which are
selected.
But since select is a sledgehammer and ignores dependencies,
everything to select's SCSI_FC_ATTRS and/or SCSI_NETLINK has to also
explicitly select their dependencies and so on and so forth.
Generally speaking 'select' is supposed to only be used for child
nodes, those which have no dependencies of their own. And this
whole chain of dependencies in the scsi layer violates that rather
strongly.
So just make SCSI_NETLINK depend upon it's dependencies, and so on
and so forth for the things selecting it (either directly or
indirectly).
From Anish Bhatt and Randy Dunlap.
3) Fix generation of blackhole routes in IPSEC, from Steffen Klassert.
4) Actually notice netdev feature changes in rtl_open() code, from
Hayes Wang.
5) Fix divide by zero in bond enslaving, from Nikolay Aleksandrov.
6) Missing memory barrier in sunvnet driver, from David Stevens.
7) Don't leave anycast addresses around when ipv6 interface is
destroyed, from Sabrina Dubroca.
8) Don't call efx_{arch}_filter_sync_rx_mode before addr_list_lock is
initialized in SFC driver, from Edward Cree.
9) Fix missing DMA error checking in 3c59x, from Neal Horman.
10) Openvswitch doesn't emit OVS_FLOW_CMD_NEW notifications accidently,
fix from Samuel Gauthier.
11) pch_gbe needs to select NET_PTP_CLASSIFY otherwise we can get a
build error.
12) Fix macvlan regression wherein we stopped emitting
broadcast/multicast frames over software devices. From Nicolas
Dichtel.
13) Fix infiniband bug due to unintended overflow of skb->cb[], from
Eric Dumazet. And add an assertion so this doesn't happen again.
14) dm9000_parse_dt() should return error pointers, not NULL. From
Tobias Klauser.
15) IP tunneling code uses this_cpu_ptr() in preemptible contexts, fix
from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
net: bcmgenet: call bcmgenet_dma_teardown in bcmgenet_fini_dma
net: bcmgenet: fix TX reclaim accounting for fragments
ipv4: do not use this_cpu_ptr() in preemptible context
dm9000: Return an ERR_PTR() in all error conditions of dm9000_parse_dt()
r8169: fix an if condition
r8152: disable ALDPS
ipoib: validate struct ipoib_cb size
net: sched: shrink struct qdisc_skb_cb to 28 bytes
tg3: Work around HW/FW limitations with vlan encapsulated frames
macvlan: allow to enqueue broadcast pkt on virtual device
pch_gbe: 'select' NET_PTP_CLASSIFY.
scsi: Use 'depends' with LIBFC instead of 'select'.
openvswitch: restore OVS_FLOW_CMD_NEW notifications
genetlink: add function genl_has_listeners()
lib: rhashtable: remove second linux/log2.h inclusion
net: allow macvlans to move to net namespace
3c59x: Fix bad offset spec in skb_frag_dma_map
3c59x: Add dma error checking and recovery
sparc: bpf_jit: fix support for ldx/stx mem and SKF_AD_VLAN_TAG
can: at91_can: add missing prepare and unprepare of the clock
...
this_cpu_ptr() in preemptible context is generally bad
Sep 22 05:05:55 br kernel: [ 94.608310] BUG: using smp_processor_id()
in
preemptible [00000000] code: ip/2261
Sep 22 05:05:55 br kernel: [ 94.608316] caller is
tunnel_dst_set.isra.28+0x20/0x60 [ip_tunnel]
Sep 22 05:05:55 br kernel: [ 94.608319] CPU: 3 PID: 2261 Comm: ip Not
tainted
3.17.0-rc5 #82
We can simply use raw_cpu_ptr(), as preemption is safe in these
contexts.
Should fix https://bugzilla.kernel.org/show_bug.cgi?id=84991
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Joe <joe9mail@gmail.com>
Fixes: 9a4aa9af447f ("ipv4: Use percpu Cache route in IP tunnels")
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
$ grep CONFIG_CLS_U32_MARK .config
# CONFIG_CLS_U32_MARK is not set
net/sched/cls_u32.c: In function 'u32_change':
net/sched/cls_u32.c:852:1: warning: label 'errout' defined but not used
[-Wunused-label]
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2014-09-22
We generate a blackhole or queueing route if a packet
matches an IPsec policy but a state can't be resolved.
Here we assume that dst_output() is called to kill
these packets. Unfortunately this assumption is not
true in all cases, so it is possible that these packets
leave the system without the necessary transformations.
This pull request contains two patches to fix this issue:
1) Fix for blackhole routed packets.
2) Fix for queue routed packets.
Both patches are serious stable candidates.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
icsk_rto is a 32bit field, and icsk_backoff can reach 15 by default,
or more if some sysctl (eg tcp_retries2) are changed.
Better use 64bit to perform icsk_rto << icsk_backoff operations
As Joe Perches suggested, add a helper for this.
Yuchung spotted the tcp_v4_err() case.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC2710 (MLDv1), section 3.7. says:
The length of a received MLD message is computed by taking the
IPv6 Payload Length value and subtracting the length of any IPv6
extension headers present between the IPv6 header and the MLD
message. If that length is greater than 24 octets, that indicates
that there are other fields present *beyond* the fields described
above, perhaps belonging to a *future backwards-compatible* version
of MLD. An implementation of the version of MLD specified in this
document *MUST NOT* send an MLD message longer than 24 octets and
MUST ignore anything past the first 24 octets of a received MLD
message.
RFC3810 (MLDv2), section 8.2.1. states for *listeners* regarding
presence of MLDv1 routers:
In order to be compatible with MLDv1 routers, MLDv2 hosts MUST
operate in version 1 compatibility mode. [...] When Host
Compatibility Mode is MLDv2, a host acts using the MLDv2 protocol
on that interface. When Host Compatibility Mode is MLDv1, a host
acts in MLDv1 compatibility mode, using *only* the MLDv1 protocol,
on that interface. [...]
While section 8.3.1. specifies *router* behaviour regarding presence
of MLDv1 routers:
MLDv2 routers may be placed on a network where there is at least
one MLDv1 router. The following requirements apply:
If an MLDv1 router is present on the link, the Querier MUST use
the *lowest* version of MLD present on the network. This must be
administratively assured. Routers that desire to be compatible
with MLDv1 MUST have a configuration option to act in MLDv1 mode;
if an MLDv1 router is present on the link, the system administrator
must explicitly configure all MLDv2 routers to act in MLDv1 mode.
When in MLDv1 mode, the Querier MUST send periodic General Queries
truncated at the Multicast Address field (i.e., 24 bytes long),
and SHOULD also warn about receiving an MLDv2 Query (such warnings
must be rate-limited). The Querier MUST also fill in the Maximum
Response Delay in the Maximum Response Code field, i.e., the
exponential algorithm described in section 5.1.3. is not used. [...]
That means that we should not get queries from different versions of
MLD. When there's a MLDv1 router present, MLDv2 enforces truncation
and MRC == MRD (both fields are overlapping within the 24 octet range).
Section 8.3.2. specifies behaviour in the presence of MLDv1 multicast
address *listeners*:
MLDv2 routers may be placed on a network where there are hosts
that have not yet been upgraded to MLDv2. In order to be compatible
with MLDv1 hosts, MLDv2 routers MUST operate in version 1 compatibility
mode. MLDv2 routers keep a compatibility mode per multicast address
record. The compatibility mode of a multicast address is determined
from the Multicast Address Compatibility Mode variable, which can be
in one of the two following states: MLDv1 or MLDv2.
The Multicast Address Compatibility Mode of a multicast address
record is set to MLDv1 whenever an MLDv1 Multicast Listener Report is
*received* for that multicast address. At the same time, the Older
Version Host Present timer for the multicast address is set to Older
Version Host Present Timeout seconds. The timer is re-set whenever a
new MLDv1 Report is received for that multicast address. If the Older
Version Host Present timer expires, the router switches back to
Multicast Address Compatibility Mode of MLDv2 for that multicast
address. [...]
That means, what can happen is the following scenario, that hosts can
act in MLDv1 compatibility mode when they previously have received an
MLDv1 query (or, simply operate in MLDv1 mode-only); and at the same
time, an MLDv2 router could start up and transmits MLDv2 startup query
messages while being unaware of the current operational mode.
Given RFC2710, section 3.7 we would need to answer to that with an MLDv1
listener report, so that the router according to RFC3810, section 8.3.2.
would receive that and internally switch to MLDv1 compatibility as well.
Right now, I believe since the initial implementation of MLDv2, Linux
hosts would just silently drop such MLDv2 queries instead of replying
with an MLDv1 listener report, which would prevent a MLDv2 router going
into fallback mode (until it receives other MLDv1 queries).
Since the mapping of MRC to MRD in exactly such cases can make use of
the exponential algorithm from 5.1.3, we cannot [strictly speaking] be
aware in MLDv1 of the encoding in MRC, it seems also not mentioned by
the RFC. Since encodings are the same up to 32767, assume in such a
situation this value as a hard upper limit we would clamp. We have asked
one of the RFC authors on that regard, and he mentioned that there seem
not to be any implementations that make use of that exponential algorithm
on startup messages. In any case, this patch fixes this MLD
interoperability issue.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clock is disabled when the device is blocked.
So, clock_enabled is the logical negation of "blocked".
Signed-off-by: Loic Poulain <loic.poulain@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>