17481 Commits

Author SHA1 Message Date
Christian Lamparter
2c31333a8f mac80211: ignore non-bcast mcast deauth/disassoc franes
This patch fixes an curious issue due to insufficient
rx frame filtering.

Saqeb Akhter reported frequent disconnects while streaming
videos over samba: <http://marc.info/?m=128600031109136>
> [ 1166.512087] wlan1: deauthenticated from 30:46:9a:10:49:f7 (Reason: 7)
> [ 1526.059997] wlan1: deauthenticated from 30:46:9a:10:49:f7 (Reason: 7)
> [ 2125.324356] wlan1: deauthenticated from 30:46:9a:10:49:f7 (Reason: 7)
> [...]

The reason is that the device generates frames with slightly
bogus SA/TA addresses.

e.g.:
 [ 2314.402316] Ignore 9f:1f:31:f8:64:ff
 [ 2314.402321] Ignore 9f:1f:31:f8:64:ff
 [ 2352.453804] Ignore 0d:1f:31:f8:64:ff
 [ 2352.453808] Ignore 0d:1f:31:f8:64:ff
 					   ^^ the group-address flag is set!
 (the correct SA/TA would be: 00:1f:31:f8:64:ff)

Since the AP does not know from where the frames come, it
generates a DEAUTH response for the (invalid) mcast address.
This mcast deauth frame then passes through all filters and
tricks the stack into thinking that the AP brutally kicked
us!

This patch fixes the problem by simply ignoring
non-broadcast, group-addressed deauth/disassoc frames.

Cc: Jouni Malinen <j@w1.fi>
Cc: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Saqeb Akhter <saqeb.akhter@gmail.com>
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-30 13:23:06 -05:00
Johannes Berg
dd318575ff mac80211: fix RX aggregation locking
The RX aggregation locking documentation was
wrong, which led Christian to also code the
timer timeout handling for it somewhat wrongly.

Fix the documentation, the two places that
need to hold the reorder lock across accesses
to the structure, and the debugfs code that
should just use RCU.

Also, remove acquiring the sta->lock across
reorder timeouts since it isn't necessary, and
change a few places to GFP_KERNEL because the
code path here doesn't need atomic allocations
as I noticed when reviewing all this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-29 15:30:30 -05:00
Johannes Berg
f30221e4ec mac80211: implement off-channel mgmt TX
This implements the new off-channel TX API
in mac80211 with a new work item type. The
operation doesn't add a new work item when
we're on the right channel and there's no
wait time so that for example p2p probe
responses will be transmitted without delay.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-29 15:24:35 -05:00
Johannes Berg
f7ca38dfe5 nl80211/cfg80211: extend mgmt-tx API for off-channel
With p2p, it is sometimes necessary to transmit
a frame (typically an action frame) on another
channel than the current channel. Enable this
through the CMD_FRAME API, and allow it to wait
for a response. A new command allows that wait
to be aborted.

However, allow userspace to specify whether or
not it wants to allow off-channel TX, it may
actually want to use the same channel only.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-29 15:24:35 -05:00
Jouni Malinen
7dff312553 mac80211: Fix frame injection using non-AP vif
In order for frame injection to work properly for some use cases
(e.g., finding the station entry and keys for encryption), mac80211
needs to find the correct sdata entry. This works when the main vif
is in AP mode, but commit a2c1e3dad516618cb0fbfb1a62c36d0b0744573a
broke this particular use case for station main vif. While this type of
injection is quite unusual operation, it has some uses and we should fix
it. Do this by changing the monitor vif sdata selection to allow station
vif to be selected instead of limiting it to just AP vifs. We still need
to skip some iftypes to avoid selecting unsuitable vif for injection.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-29 14:41:28 -05:00
David S. Miller
77148625e1 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-11-29 11:19:09 -08:00
Eric Dumazet
25888e3031 af_unix: limit recursion level
Its easy to eat all kernel memory and trigger NMI watchdog, using an
exploit program that queues unix sockets on top of others.

lkml ref : http://lkml.org/lkml/2010/11/25/8

This mechanism is used in applications, one choice we have is to have a
recursion limit.

Other limits might be needed as well (if we queue other types of files),
since the passfd mechanism is currently limited by socket receive queue
sizes only.

Add a recursion_level to unix socket, allowing up to 4 levels.

Each time we send an unix socket through sendfd mechanism, we copy its
recursion level (plus one) to receiver. This recursion level is cleared
when socket receive queue is emptied.

Reported-by: Марк Коренберг <socketpair@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-29 09:45:15 -08:00
Eric Dumazet
a417786948 xps: add __rcu annotations
Avoid sparse warnings : add __rcu annotations and use
rcu_dereference_protected() where necessary.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-29 09:43:13 -08:00
Eric Dumazet
b02038a17b xps: NUMA allocations for per cpu data
store_xps_map() allocates maps that are used by single cpu, it makes
sense to use NUMA allocations.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-29 09:43:13 -08:00
Tom Herbert
bf26414510 xps: Add CONFIG_XPS
This patch adds XPS_CONFIG option to enable and disable XPS.  This is
done in the same manner as RPS_CONFIG.  This is also fixes build
failure in XPS code when SMP is not enabled.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 18:24:14 -08:00
Nagendra Tomar
b4ff3c90e6 inet: Fix __inet_inherit_port() to correctly increment bsockets and num_owners
inet sockets corresponding to passive connections are added to the bind hash
using ___inet_inherit_port(). These sockets are later removed from the bind
hash using __inet_put_port(). These two functions are not exactly symmetrical.
__inet_put_port() decrements hashinfo->bsockets and tb->num_owners, whereas
___inet_inherit_port() does not increment them. This results in both of these
going to -ve values.

This patch fixes this by calling inet_bind_hash() from ___inet_inherit_port(),
which does the right thing.

'bsockets' and 'num_owners' were introduced by commit a9d8f9110d7e953c
(inet: Allowing more than 64k connections and heavily optimize bind(0))

Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 18:18:44 -08:00
Dan Rosenberg
3c6f27bf33 DECnet: don't leak uninitialized stack byte
A single uninitialized padding byte is leaked to userspace.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
CC: stable <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:32:30 -08:00
Gerrit Renker
0ac7887022 dccp: fix error in updating the GAR
This fixes a bug in updating the Greatest Acknowledgment number Received (GAR):
the current implementation does not track the greatest received value -
lower values in the range AWL..AWH (RFC 4340, 7.5.1) erase higher ones.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:29:27 -08:00
Jozsef Kadlecsik
82a39eb6b3 ipv6: Prepare the tree for un-inlined jhash.
jhash is widely used in the kernel and because the functions
are inlined, the cost in size is significant. Also, the new jhash
functions are slightly larger than the previous ones so better un-inline.
As a preparation step, the calls to the internal macros are replaced
with the plain jhash function calls.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:26:21 -08:00
andrew hendry
3f0a069a1d X25 remove bkl in call user data length ioctl
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:12:22 -08:00
andrew hendry
74a7e44080 X25 remove bkl from causediag ioctls
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:12:21 -08:00
andrew hendry
5b7958dfa5 X25 remove bkl from calluserdata ioctls
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:12:21 -08:00
andrew hendry
f90de66067 X25 remove bkl in facility ioctls
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:12:20 -08:00
andrew hendry
5595a1a599 X25 remove bkl in subscription ioctls
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:12:20 -08:00
John Fastabend
19eb5cc559 8021q: vlan device is lockless do not transfer real_num_{tx|rx}_queues
Now that the vlan device is lockless and single queue do not
transfer the real num queues. This is causing a BUG_ON to occur.

kernel BUG at net/8021q/vlan.c:345!
Call Trace:
[<ffffffff813fd6e8>] ? fib_rules_event+0x28/0x1b0
[<ffffffff814ad2b5>] notifier_call_chain+0x55/0x80
[<ffffffff81089156>] raw_notifier_call_chain+0x16/0x20
[<ffffffff813e5af7>] call_netdevice_notifiers+0x37/0x70
[<ffffffff813e6756>] netdev_features_change+0x16/0x20
[<ffffffffa02995be>] ixgbe_fcoe_enable+0xae/0x100 [ixgbe]
[<ffffffffa01da06a>] vlan_dev_fcoe_enable+0x2a/0x30 [8021q]
[<ffffffffa02d08c3>] fcoe_create+0x163/0x630 [fcoe]
[<ffffffff811244d5>] ? mmap_region+0x255/0x5a0
[<ffffffff81080ef0>] param_attr_store+0x50/0x80
[<ffffffff810809b6>] module_attr_store+0x26/0x30
[<ffffffff811b9db2>] sysfs_write_file+0xf2/0x180
[<ffffffff8114fc88>] vfs_write+0xc8/0x190
[<ffffffff81150621>] sys_write+0x51/0x90
[<ffffffff8100c0b2>] system_call_fastpath+0x16/0x1b

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 10:47:19 -08:00
Eric Dumazet
5a0d2268d2 net: add netif_tx_queue_frozen_or_stopped
When testing struct netdev_queue state against FROZEN bit, we also test
XOFF bit. We can test both bits at once and save some cycles.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 10:47:18 -08:00
Shan Wei
d3c15cab21 ipv6: kill two unused macro definition
1. IPV6_TLV_TEL_DST_SIZE
This has not been using for several years since created.

2. RT6_INFO_LEN
commit 33120b30 kill all RT6_INFO_LEN's references, but only this definition remained.

commit 33120b30cc3b8665204d4fcde7288638b0dd04d5
Author: Alexey Dobriyan <adobriyan@sw.ru>
Date:   Tue Nov 6 05:27:11 2007 -0800

    [IPV6]: Convert /proc/net/ipv6_route to seq_file interface

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 10:47:17 -08:00
Uwe Kleine-König
a40c9f88b5 net: add some KERN_CONT markers to continuation lines
Cc: netdev@vger.kernel.org
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 10:47:17 -08:00
Alexey Dobriyan
0147fc058d tcp: restrict net.ipv4.tcp_adv_win_scale (#20312)
tcp_win_from_space() does the following:

      if (sysctl_tcp_adv_win_scale <= 0)
              return space >> (-sysctl_tcp_adv_win_scale);
      else
              return space - (space >> sysctl_tcp_adv_win_scale);

"space" is int.

As per C99 6.5.7 (3) shifting int for 32 or more bits is
undefined behaviour.

Indeed, if sysctl_tcp_adv_win_scale is exactly 32,
space >> 32 equals space and function returns 0.

Which means we busyloop in tcp_fixup_rcvbuf().

Restrict net.ipv4.tcp_adv_win_scale to [-31, 31].

Fix https://bugzilla.kernel.org/show_bug.cgi?id=20312

Steps to reproduce:

      echo 32 >/proc/sys/net/ipv4/tcp_adv_win_scale
      wget www.kernel.org
      [softlockup]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 10:39:45 -08:00
Pavel Emelyanov
8475ef9fd1 netns: Don't leak others' openreq-s in proc
The /proc/net/tcp leaks openreq sockets from other namespaces.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-27 22:57:48 -08:00
Thomas Graf
cf7afbfeb8 rtnl: make link af-specific updates atomic
As David pointed out correctly, updates to af-specific attributes
are currently not atomic. If multiple changes are requested and
one of them fails, previous updates may have been applied already
leaving the link behind in a undefined state.

This patch splits the function parse_link_af() into two functions
validate_link_af() and set_link_at(). validate_link_af() is placed
to validate_linkmsg() check for errors as early as possible before
any changes to the link have been made. set_link_af() is called to
commit the changes later.

This method is not fail proof, while it is currently sufficient
to make set_link_af() inerrable and thus 100% atomic, the
validation function method will not be able to detect all error
scenarios in the future, there will likely always be errors
depending on states which are f.e. not protected by rtnl_mutex
and thus may change between validation and setting.

Also, instead of silently ignoring unknown address families and
config blocks for address families which did not register a set
function the errors EAFNOSUPPORT respectively EOPNOSUPPORT are
returned to avoid comitting 4 out of 5 update requests without
notifying the user.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-27 22:56:08 -08:00
Tracey Dent
4cb6a614ba Net: ceph: Makefile: Remove unnessary code
Remove the if and else conditional because the code is in mainline and there
is no need in it being there.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-27 17:39:29 -08:00
John W. Linville
51cce8a590 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-11-24 16:49:20 -05:00
Johannes Berg
99ba2a1428 mac80211: implement packet loss notification
For drivers that have accurate TX status reporting
we can report the number of consecutive lost packets
to userspace using the new cfg80211 CQM event. The
threshold is fixed right now, this may need to be
improved in the future.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:36 -05:00
Johannes Berg
c063dbf52b cfg80211: allow using CQM event to notify packet loss
This adds the ability for drivers to use CQM events
to notify about packet loss for specific stations
(which could be the AP for the managed mode case).
Since the threshold might be determined by the
driver (it isn't passed in right now) it will be
passed out of the driver to userspace in the event.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:36 -05:00
Luis R. Rodriguez
48124d1a91 mac80211: avoid aggregation for VO traffic
This should help with latency issues which can happen when
using aggregation.

Cc: Felix Fietkau <nbd@openwrt.org>
Cc: Matt Smith <matt.smith@atheros.com>
Cc: Senthil Balasubramanian <senthilkumar@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:36 -05:00
Felix Fietkau
72a8a3edd6 mac80211: reduce the number of retries for nullfunc probing
Since nullfunc frames are transmitted as unicast frames, they're more
reliable than the broadcast probe requests, so we need fewer retries
to figure out whether the AP is really gone.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:35 -05:00
Felix Fietkau
4e5ff37692 mac80211: use nullfunc instead of probe request for connection monitoring
nullfunc frames are better for connection monitoring, because probe requests
are answered even if the AP has already dropped the connection, whereas
nullfunc frames from an unassociated station will trigger a disassoc/deauth
frame from the AP (WLAN_REASON_CLASS3_FRAME_FROM_NONASSOC_STA), which allows
the station to reconnect immediately instead of waiting until it attempts to
transmit the next unicast frame.

This only works on hardware with reliable tx ACK reporting, any other hardware
needs to fall back to the probe request method.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:35 -05:00
Felix Fietkau
dd5b4cc71c cfg80211/mac80211: improve ad-hoc multicast rate handling
- store the multicast rate as an index instead of the rate value
  (reduces cpu overhead in a hotpath)
- validate the rate values (must match a bitrate in at least one sband)

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:35 -05:00
Felix Fietkau
46090979a5 mac80211: probe the AP when resuming
Check the connection by probing the AP (either using nullfunc or a
probe request). If nullfunc probing is supported and the assoc is no
longer valid, the AP will send a disassoc/deauth immediately.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:34 -05:00
Felix Fietkau
7ccc8bd759 mac80211: calculate beacon loss time accurately
Instead of using a fixed 2 second timeout, calculate beacon loss interval
from the advertised beacon interval and a frame count.  With this beacon
loss happens after N (default 7) consecutive frames are missed which
for a typical setup (100TU beacon interval) is ~700ms (or ~1/3 previous).

Signed-off-by: Sam Leffler <sleffler@chromium.org>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:34 -05:00
Felix Fietkau
c8a7972c3b mac80211: restart beacon miss timer on system resume from suspend
Signed-off-by: Paul Stewart <pstew@google.com>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:34 -05:00
Joe Perches
e9c0268f02 net/wireless: Use pr_<level> and netdev_<level>
No change in output for pr_<level> prefixes.
netdev_<level> output is different, arguably improved.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:33 -05:00
John W. Linville
d7a066c923 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-11-24 16:19:24 -05:00
John W. Linville
ccb1435401 Revert "nl80211/mac80211: Report signal average"
This reverts commit 86107fd170bc379869250eb7e1bd393a3a70e8ae.

This patch inadvertantly changed the userland ABI.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:18:36 -05:00
Phil Blundell
a27e13d370 econet: fix CVE-2010-3848
Don't declare variable sized array of iovecs on the stack since this
could cause stack overflow if msg->msgiovlen is large.  Instead, coalesce
the user-supplied data into a new buffer and use a single iovec for it.

Signed-off-by: Phil Blundell <philb@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:51:47 -08:00
Phil Blundell
16c41745c7 econet: fix CVE-2010-3850
Add missing check for capable(CAP_NET_ADMIN) in SIOCSIFADDR operation.

Signed-off-by: Phil Blundell <philb@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:49:53 -08:00
Phil Blundell
fa0e846494 econet: disallow NULL remote addr for sendmsg(), fixes CVE-2010-3849
Later parts of econet_sendmsg() rely on saddr != NULL, so return early
with EINVAL if NULL was passed otherwise an oops may occur.

Signed-off-by: Phil Blundell <philb@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:49:19 -08:00
David S. Miller
c39508d6f1 tcp: Make TCP_MAXSEG minimum more correct.
Use TCP_MIN_MSS instead of constant 64.

Reported-by: Min Zhang <mzhang@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:47:22 -08:00
Tom Herbert
1d24eb4815 xps: Transmit Packet Steering
This patch implements transmit packet steering (XPS) for multiqueue
devices.  XPS selects a transmit queue during packet transmission based
on configuration.  This is done by mapping the CPU transmitting the
packet to a queue.  This is the transmit side analogue to RPS-- where
RPS is selecting a CPU based on receive queue, XPS selects a queue
based on the CPU (previously there was an XPS patch from Eric
Dumazet, but that might more appropriately be called transmit completion
steering).

Each transmit queue can be associated with a number of CPUs which will
use the queue to send packets.  This is configured as a CPU mask on a
per queue basis in:

/sys/class/net/eth<n>/queues/tx-<n>/xps_cpus

The mappings are stored per device in an inverted data structure that
maps CPUs to queues.  In the netdevice structure this is an array of
num_possible_cpu structures where each structure holds and array of
queue_indexes for queues which that CPU can use.

The benefits of XPS are improved locality in the per queue data
structures.  Also, transmit completions are more likely to be done
nearer to the sending thread, so this should promote locality back
to the socket on free (e.g. UDP).  The benefits of XPS are dependent on
cache hierarchy, application load, and other factors.  XPS would
nominally be configured so that a queue would only be shared by CPUs
which are sharing a cache, the degenerative configuration woud be that
each CPU has it's own queue.

Below are some benchmark results which show the potential benfit of
this patch.  The netperf test has 500 instances of netperf TCP_RR test
with 1 byte req. and resp.

bnx2x on 16 core AMD
   XPS (16 queues, 1 TX queue per CPU)  1234K at 100% CPU
   No XPS (16 queues)                   996K at 100% CPU

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:44:20 -08:00
Tom Herbert
3853b5841c xps: Improvements in TX queue selection
In dev_pick_tx, don't do work in calculating queue
index or setting
the index in the sock unless the device has more than one queue.  This
allows the sock to be set only with a queue index of a multi-queue
device which is desirable if device are stacked like in a tunnel.

We also allow the mapping of a socket to queue to be changed.  To
maintain in order packet transmission a flag (ooo_okay) has been
added to the sk_buff structure.  If a transport layer sets this flag
on a packet, the transmit queue can be changed for the socket.
Presumably, the transport would set this if there was no possbility
of creating OOO packets (for instance, there are no packets in flight
for the socket).  This patch includes the modification in TCP output
for setting this flag.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:44:19 -08:00
Eric Dumazet
bba14de987 scm: lower SCM_MAX_FD
Lower SCM_MAX_FD from 255 to 253 so that allocations for scm_fp_list are
halved. (commit f8d570a4 added two pointers in this structure)

scm_fp_dup() should not copy whole structure (and trigger kmemcheck
warnings), but only the used part. While we are at it, only allocate
needed size.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:16:43 -08:00
Eric Dumazet
456b61bca8 ipv6: mcast: RCU conversion
ipv6_sk_mc_lock rwlock becomes a spinlock.

readers (inet6_mc_check()) now takes rcu_read_lock() instead of read
lock. Writers dont need to disable BH anymore.

struct ipv6_mc_socklist objects are reclaimed after one RCU grace
period.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:16:42 -08:00
Eric Dumazet
9915672d41 af_unix: limit unix_tot_inflight
Vegard Nossum found a unix socket OOM was possible, posting an exploit
program.

My analysis is we can eat all LOWMEM memory before unix_gc() being
called from unix_release_sock(). Moreover, the thread blocked in
unix_gc() can consume huge amount of time to perform cleanup because of
huge working set.

One way to handle this is to have a sensible limit on unix_tot_inflight,
tested from wait_for_unix_gc() and to force a call to unix_gc() if this
limit is hit.

This solves the OOM and also reduce overall latencies, and should not
slowdown normal workloads.

Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 09:15:27 -08:00
Helmut Schaa
18890d4b89 mac80211: Disable hw crypto for GTKs on AP VLAN interfaces
When using AP VLAN interfaces, each VLAN interface should be in its own
broadcast domain. Hostapd achieves this by assigning different GTKs to
different AP VLAN interfaces.

However, mac80211 drivers are not aware of AP VLAN interfaces and as
such mac80211 sends the GTK to the driver in the context of the base AP
mode interface. This causes problems when multiple AP VLAN interfaces
are used since the driver will use the same key slot for the different
GTKs (there's no way for the driver to distinguish the different GTKs
from different AP VLAN interfaces). Thus, only the clients associated
to one AP VLAN interface (the one that was created last) can actually
use broadcast traffic.

Fix this by not programming any GTKs for AP VLAN interfaces into the hw
but fall back to using software crypto. The GTK for the underlying AP
interface is still sent to the driver.

That means, broadcast traffic to stations associated to an AP VLAN
interface is encrypted in software whereas broadcast traffic to
stations associated to the non-VLAN AP interface is encrypted in
hardware.

Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-22 15:48:51 -05:00