linux-next

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git synced 2025-01-16 21:35:07 +00:00

Author	SHA1	Message	Date
David S. Miller	27f097177d	Here's a big pile of changes for this round. We have * a lot of regulatory code changes to deal with the way newer Intel devices handle this * a change to drop packets while disconnecting from an AP instead of trying to wait for them * a new attempt at improving the tailroom accounting to not kick in too much for performance reasons * improvements in wireless link statistics * many other small improvements and small fixes that didn't seem necessary for 3.19 (e.g. in hwsim which is testing only code) -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJUt7WEAAoJEDBSmw7B7bqrVBoP/2EViE62HMmXdqG1SZWz8q9o Iigq8STC/sT2WCx1pYm+tKuVW4LD2O3mCriGNP8A3RwzDZ6H7sKJYb1gV6QCPV6f 4+yT5VSAB3D3lHmp/bbyNsmKCBQ5uS4LVgDrokrkbGpacDu94PYS5Wv9t3x6PBVB 5Xjky6g6A/pSuxTIstSO9k5xkzNjaB1TxvVRz/gJrGcFQVkDFSlVbuTHUVxs8p+p k6mwY/2WYijZkswWZVQTJLQlF9vRI7PYkKs5m8gz4pjNU48oFJoyu4IP3Z1Xj/Sm zgT1C9rgp0Du74HYO2niGAvLWgKajAZuW5hIacDndUPjYQQBLgGs/bCJGSntM+x9 XoOdPixdFPT/58ijyYZlmHc8rxPOd2kHsVbwGplp8f195S4VO04D+ejfOaoAUFwX v/kMvO3XIFmEH1jjkDAC3OTcRMYVMuENyWl7pFzxHIzPeRiEpQUd9iSdM4yol0F2 ZyWvKud4U75Sh+aCiDIIBETtdfCRFe12hgKs4COYbI/UYkGPTPrNei/uisopdubT JC+7pZOYdSgoX12yVi6ds6DmKE/ZpIQyhIK4wTWgVoszbnfdb9Mw7mJEThwNRjeK JJPsbuty7u8HWjXzEqHLoTV3BFv1cgRSJc5Wt0zfME+LzD7iQpEpv+QBAguwwChD Osn55Z3FnKEmBdGcOIje =vaEW -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2015-01-15' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Here's a big pile of changes for this round. We have * a lot of regulatory code changes to deal with the way newer Intel devices handle this * a change to drop packets while disconnecting from an AP instead of trying to wait for them * a new attempt at improving the tailroom accounting to not kick in too much for performance reasons * improvements in wireless link statistics * many other small improvements and small fixes that didn't seem necessary for 3.19 (e.g. in hwsim which is testing only code) Conflicts: drivers/staging/rtl8723au/os_dep/ioctl_cfg80211.c Minor overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-15 19:16:56 -05:00
Eric Dumazet	5055c371bf	ipv4: per cpu uncached list RAW sockets with hdrinc suffer from contention on rt_uncached_lock spinlock. One solution is to use percpu lists, since most routes are destroyed by the cpu that created them. It is unclear why we even have to put these routes in uncached_list, as all outgoing packets should be freed when a device is dismantled. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: caacf05e5ad1 ("ipv4: Properly purge netdev references on uncached routes.") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-15 18:26:16 -05:00
David S. Miller	4e7a84b1a5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== netfilter updates for net-next The following patchset contains netfilter updates for net-next, just a bunch of cleanups and small enhancement to selectively flush conntracks in ctnetlink, more specifically the patches are: 1) Rise default number of buckets in conntrack from 16384 to 65536 in systems with >= 4GBytes, patch from Marcelo Leitner. 2) Small refactor to save one level on indentation in xt_osf, from Joe Perches. 3) Remove unnecessary sizeof(char) in nf_log, from Fabian Frederick. 4) Another small cleanup to remove redundant variable in nfnetlink, from Duan Jiong. 5) Fix compilation warning in nfnetlink_cthelper on parisc, from Chen Gang. 6) Fix wrong format in debugging for ctseqadj, from Gao feng. 7) Selective conntrack flushing through the mark for ctnetlink, patch from Kristian Evensen. 8) Remove nf_ct_conntrack_flush_report() exported symbol now that is not required anymore after the selective flushing patch, again from Kristian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-15 01:50:25 -05:00
Thomas Graf	1dd144cf5b	openvswitch: Support VXLAN Group Policy extension Introduces support for the group policy extension to the VXLAN virtual port. The extension is disabled by default and only enabled if the user has provided the respective configuration. ovs-vsctl add-port br0 vxlan0 -- \ set Interface vxlan0 type=vxlan options:exts=gbp The configuration interface to enable the extension is based on a new attribute OVS_VXLAN_EXT_GBP nested inside OVS_TUNNEL_ATTR_EXTENSION which can carry additional extensions as needed in the future. The group policy metadata is stored as binary blob (struct ovs_vxlan_opts) internally just like Geneve options but transported as nested Netlink attributes to user space. Renames the existing TUNNEL_OPTIONS_PRESENT to TUNNEL_GENEVE_OPT with the binary value kept intact, a new flag TUNNEL_VXLAN_OPT is introduced. The attributes OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS and existing OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS are implemented mutually exclusive. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-15 01:11:41 -05:00
Thomas Graf	ac5132d1a0	vxlan: Only bind to sockets with compatible flags enabled A VXLAN net_device looking for an appropriate socket may only consider a socket which has a matching set of flags/extensions enabled. If incompatible flags are enabled, return a conflict to have the caller create a distinct socket with distinct port. The OVS VXLAN port is kept unaware of extensions at this point. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-15 01:11:41 -05:00
Thomas Graf	3511494ce2	vxlan: Group Policy extension Implements supports for the Group Policy VXLAN extension [0] to provide a lightweight and simple security label mechanism across network peers based on VXLAN. The security context and associated metadata is mapped to/from skb->mark. This allows further mapping to a SELinux context using SECMARK, to implement ACLs directly with nftables, iptables, OVS, tc, etc. The group membership is defined by the lower 16 bits of skb->mark, the upper 16 bits are used for flags. SELinux allows to manage label to secure local resources. However, distributed applications require ACLs to implemented across hosts. This is typically achieved by matching on L2-L4 fields to identify the original sending host and process on the receiver. On top of that, netlabel and specifically CIPSO [1] allow to map security contexts to universal labels. However, netlabel and CIPSO are relatively complex. This patch provides a lightweight alternative for overlay network environments with a trusted underlay. No additional control protocol is required. Host 1: Host 2: Group A Group B Group B Group A +-----+ +-------------+ +-------+ +-----+ \| lxc \| \| SELinux CTX \| \| httpd \| \| VM \| +--+--+ +--+----------+ +---+---+ +--+--+ \---+---/ \----+---/ \| \| +---+---+ +---+---+ \| vxlan \| \| vxlan \| +---+---+ +---+---+ +------------------------------+ Backwards compatibility: A VXLAN-GBP socket can receive standard VXLAN frames and will assign the default group 0x0000 to such frames. A Linux VXLAN socket will drop VXLAN-GBP frames. The extension is therefore disabled by default and needs to be specifically enabled: ip link add [...] type vxlan [...] gbp In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket must run on a separate port number. Examples: iptables: host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200 host2# iptables -I INPUT -m mark --mark 0x200 -j DROP OVS: # ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL' # ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop' [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy [1] http://lwn.net/Articles/204905/ Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-15 01:11:41 -05:00
Tom Herbert	dfd8645ea1	vxlan: Remote checksum offload Add support for remote checksum offload in VXLAN. This uses a reserved bit to indicate that RCO is being done, and uses the low order reserved eight bits of the VNI to hold the start and offset values in a compressed manner. Start is encoded in the low order seven bits of VNI. This is start >> 1 so that the checksum start offset is 0-254 using even values only. Checksum offset (transport checksum field) is indicated in the high order bit in the low order byte of the VNI. If the bit is set, the checksum field is for UDP (so offset = start + 6), else checksum field is for TCP (so offset = start + 16). Only TCP and UDP are supported in this implementation. Remote checksum offload for VXLAN is described in: https://tools.ietf.org/html/draft-herbert-vxlan-rco-00 Tested by running 200 TCP_STREAM connections with VXLAN (over IPv4). With UDP checksums and Remote Checksum Offload IPv4 Client 11.84% CPU utilization Server 12.96% CPU utilization 9197 Mbps IPv6 Client 12.46% CPU utilization Server 14.48% CPU utilization 8963 Mbps With UDP checksums, no remote checksum offload IPv4 Client 15.67% CPU utilization Server 14.83% CPU utilization 9094 Mbps IPv6 Client 16.21% CPU utilization Server 14.32% CPU utilization 9058 Mbps No UDP checksums IPv4 Client 15.03% CPU utilization Server 23.09% CPU utilization 9089 Mbps IPv6 Client 16.18% CPU utilization Server 26.57% CPU utilization 8954 Mbps Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-14 15:20:04 -05:00
Arik Nemtsov	2c3e861c94	cfg80211: introduce sync regdom set API for self-managed A self-managed device will sometimes need to set its regdomain synchronously. Notably it should be set before usermode has a chance to query it. Expose a new API to accomplish this which requires the RTNL. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-14 09:43:44 +01:00
Jiri Pirko	df8a39defa	net: rename vlan_tx_* helpers since "tx" is misleading there The same macros are used for rx as well. So rename it. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-13 17:51:08 -05:00
Jiri Pirko	d8b9605d26	net: sched: fix skb->protocol use in case of accelerated vlan path tc code implicitly considers skb->protocol even in case of accelerated vlan paths and expects vlan protocol type here. However, on rx path, if the vlan header was already stripped, skb->protocol contains value of next header. Similar situation is on tx path. So for skbs that use skb->vlan_tci for tagging, use skb->vlan_proto instead. Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-13 17:51:08 -05:00
Tom Herbert	3bf3947526	vxlan: Improve support for header flags This patch cleans up the header flags of VXLAN in anticipation of defining some new ones: - Move header related definitions from vxlan.c to vxlan.h - Change VXLAN_FLAGS to be VXLAN_HF_VNI (only currently defined flag) - Move check for unknown flags to after we find vxlan_sock, this assumes that some flags may be processed based on tunnel configuration - Add a comment about why the stack treating unknown set flags as an error instead of ignoring them Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:05:01 -05:00
Johannes Berg	6de39808cf	nl80211: support per-TID station statistics The base for the current statistics is pretty mixed up, support exporting RX/TX statistics for MSDUs per TID. This (currently) covers received MSDUs, transmitted MSDUs and retries/failures thereof. Doing it per TID for MSDUs makes more sense than say only per AC because it's symmetric - we could export per-AC statistics for all frames (which AC we used for transmission can be determined also for management frames) but per TID is better and usually data frames are really the ones we care about. Also, on RX we can't determine the AC - but we do know the TID for any QoS MPDU we received. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-08 15:28:18 +01:00
Johannes Berg	8d791361a4	nl80211: clarify packet statistics descriptions The current statistics we keep aren't very clear, some are on MPDUs and some on MSDUs/MMPDUs. Clarify the descriptions based on the counters mac80211 keeps. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-08 15:28:16 +01:00
Johannes Berg	a76b1942a1	cfg80211: add nl80211 beacon-only statistics Add these two values: * BEACON_RX: number of beacons received from this peer * BEACON_SIGNAL_AVG: signal strength average for beacons only These can then be used for Android Lollipop's statistics request. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-08 15:28:13 +01:00
Johannes Berg	319090bf6c	cfg80211: remove enum station_info_flags This is really just duplicating the list of information that's already available in the nl80211 attribute, so remove the list. Two small changes are needed: * remove STATION_INFO_ASSOC_REQ_IES complete, but the length (assoc_req_ies_len) can be used instead * add NL80211_STA_INFO_RX_DROP_MISC which exists internally but not in nl80211 yet This gets rid of the duplicate maintenance of the two lists. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-08 15:28:10 +01:00
Johannes Berg	2b9a7e1bac	mac80211: allow drivers to provide most station statistics In many cases, drivers can filter things like beacons that will skew statistics reported by mac80211. To get correct statistics in these cases, call drivers to obtain statistics and let them override all values, filling values from mac80211 if the driver didn't provide them. Not all of them make sense for the driver to fill, so some are still always done by mac80211. Note that this doesn't currently allow a driver to say "I know this value is wrong, don't report it at all", or to sum it up with a mac80211 value (as could be useful for "dropped misc"), that can be added if it turns out to be needed. This also gets rid of the get_rssi() method as is can now be implemented using sta_statistics(). Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-08 15:28:06 +01:00
Johannes Berg	cf5ead822d	cfg80211: allow including station info in delete event When a station is removed, its statistics may be interesting to userspace, for example for further aggregation of statistics of all stations that ever connected to an AP. Introduce a new cfg80211_del_sta_sinfo() function (and make the cfg80211_del_sta() a static inline calling it) to allow passing a struct station_info along with this, and send the data in the nl80211 event message. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-08 15:28:00 +01:00
Johannes Berg	052536abfa	cfg80211: add scan time to survey data Add the time spent scanning to the survey data so it can be reported by drivers that collect such information. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-08 15:27:58 +01:00
Johannes Berg	11f78ac32b	cfg80211: allow survey data to return global data Not all devices are able to report survey data (particularly time spent for various operations) per channel. As all these statistics already exist in survey data, allow such devices to report them (if userspace requested it) Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-08 15:27:54 +01:00
Johannes Berg	4ed20bebf5	cfg80211: remove "channel" from survey names All of the survey data is (currently) per channel anyway, so having the word "channel" in the name does nothing. In the next patch I'll introduce global data to the survey, where the word "channel" is actually confusing. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-08 15:27:52 +01:00
Kristian Evensen	ae406bd057	netfilter: conntrack: Remove nf_ct_conntrack_flush_report The only user of nf_ct_conntrack_flush_report() was ctnetlink_del_conntrack(). After adding support for flushing connections with a given mark, this function is no longer called. Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-01-08 12:16:58 +01:00
Ido Yariv	db12847ca8	mac80211: Re-fix accounting of the tailroom-needed counter When hw acceleration is enabled, the GENERATE_IV or PUT_IV_SPACE flags only require headroom space. Therefore, the tailroom-needed counter can safely be decremented for most drivers. The older incarnation of this patch (ca34e3b5) assumed that the above holds true for all drivers. As reported by Christopher Chavez and researched by Christian Lamparter and Larry Finger, this isn't a valid assumption for p54 and cw1200. Drivers that still require tailroom for ICV/MIC even when HW encryption is enabled can use IEEE80211_KEY_FLAG_RESERVE_TAILROOM to indicate it. Signed-off-by: Ido Yariv <idox.yariv@intel.com> Cc: Christopher Chavez <chrischavez@gmx.us> Cc: Christian Lamparter <chunkeey@googlemail.com> Cc: Larry Finger <Larry.Finger@lwfinger.net> Cc: Solomon Peachy <pizza@shaftnet.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-07 14:39:32 +01:00
Johannes Berg	3a4b0c948d	Merge branch 'mac80211' into mac80211-next Merge mac80211.git to get some changes that would otherwise cause conflicts with new changes coming here. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-07 14:39:16 +01:00
David S. Miller	44d84d7272	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2015-01-06 22:29:20 -05:00
David S. Miller	15ecf7a063	Here's just a single fix - a revert of a patch that broke the p54 and cw2100 drivers (arguably due to bad assumptions there.) Since this affects kernels since 3.17, I decided to revert for now and we'll revisit this optimisation properly for -next. -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJUq9EvAAoJEDBSmw7B7bqr7mMQAJRbdepVVqK5IFH1BH0NC7vi LkBE2lp8ZAz1Crg+OAQNUdlZUHtGyfYoXSfzezmrMG51i5xjHyOYQQikW7aJ2SQ0 XsjJJ5TcqKe83NwoakXUrMpE7KmCt/LnbjKNXDsZIvLlUkqa7ksXaS7btK195aXy WlVrmUE+BqT9a16VjFLZ6wRjI43+3bGxhtFL+g1eXw6nZ4a2o4EbIXdc9SN+/bT4 tAhWJfdAQqQc34jhesWGbMIvkXWhzy2R6Js+9gMIBNsmlAiYbFa4QZ/9tI3nBI/O yHSiDc7JnPNjkkC+3wTJxMl7mEd6fEKnAS1ryZ5L4XhPrQpV39iZuWSPvPGw6LLW kB6+wXkIyQdCSoyrQZxY75ibqOUKYYxhhkSYfMePXRKTYY6MlHYqiH8wPWFpPoqO iumLqx8/CtRW1q1t2EBAG6rZLRF8HqmfqtB+ptT0DWcAP8E81q8BImPoPFr+P9S2 XfuuSw97xKCcilOcYJ0uYSBe4XNNhy1dtC/zJ8cA9nV4WNkofALga5Z/t8ARhDsM wvP1D2uIX3U9My17bXq+Xn/fSSS7yhpLZjEHj/JNRvpDCWGf/tQl6A3ydMy//Oqe lRSKfmiAGysqhXnmK12+YhfO+4ioTz8dA88tHs1AO8qasfQwx45eRsUPemWeExiL 9Lntb0U6MhYvgiTdWqt6 =CnbJ -----END PGP SIGNATURE----- Merge tag 'mac80211-for-davem-2015-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Here's just a single fix - a revert of a patch that broke the p54 and cw2100 drivers (arguably due to bad assumptions there.) Since this affects kernels since 3.17, I decided to revert for now and we'll revisit this optimisation properly for -next. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-06 13:29:27 -05:00
Gautam Kumar Shukla	d75bb06b61	cfg80211: add extensible feature flag attribute With the wiphy::features flag being used up this patch adds a new field wiphy::ext_features. Considering extensibility this new field is declared as a byte array. This extensible flag is exposed to user-space by NL80211_ATTR_EXT_FEATURES. Cc: Avinash Patil <patila@marvell.com> Signed-off-by: Gautam (Gautam Kumar) Shukla <gautams@broadcom.com> Signed-off-by: Arend van Spriel <arend@broadcom.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-06 12:10:24 +01:00
Daniel Borkmann	81164413ad	net: tcp: add per route congestion control This work adds the possibility to define a per route/destination congestion control algorithm. Generally, this opens up the possibility for a machine with different links to enforce specific congestion control algorithms with optimal strategies for each of them based on their network characteristics, even transparently for a single application listening on all links. For our specific use case, this additionally facilitates deployment of DCTCP, for example, applications can easily serve internal traffic/dsts in DCTCP and external one with CUBIC. Other scenarios would also allow for utilizing e.g. long living, low priority background flows for certain destinations/routes while still being able for normal traffic to utilize the default congestion control algorithm. We also thought about a per netns setting (where different defaults are possible), but given its actually a link specific property, we argue that a per route/destination setting is the most natural and flexible. The administrator can utilize this through ip-route(8) by appending "congctl [lock] <name>", where <name> denotes the name of a congestion control algorithm and the optional lock parameter allows to enforce the given algorithm so that applications in user space would not be allowed to overwrite that algorithm for that destination. The dst metric lookups are being done when a dst entry is already available in order to avoid a costly lookup and still before the algorithms are being initialized, thus overhead is very low when the feature is not being used. While the client side would need to drop the current reference on the module, on server side this can actually even be avoided as we just got a flat-copied socket clone. Joint work with Florian Westphal. Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:55:24 -05:00
Daniel Borkmann	ea69763999	net: tcp: add RTAX_CC_ALGO fib handling This patch adds the minimum necessary for the RTAX_CC_ALGO congestion control metric to be set up and dumped back to user space. While the internal representation of RTAX_CC_ALGO is handled as a u32 key, we avoided to expose this implementation detail to user space, thus instead, we chose the netlink attribute that is being exchanged between user space to be the actual congestion control algorithm name, similarly as in the setsockopt(2) API in order to allow for maximum flexibility, even for 3rd party modules. It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot as it should have been stored in RTAX_FEATURES instead, we first thought about reusing it for the congestion control key, but it brings more complications and/or confusion than worth it. Joint work with Florian Westphal. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:55:24 -05:00
Daniel Borkmann	c5c6a8ab45	net: tcp: add key management to congestion control This patch adds necessary infrastructure to the congestion control framework for later per route congestion control support. For a per route congestion control possibility, our aim is to store a unique u32 key identifier into dst metrics, which can then be mapped into a tcp_congestion_ops struct. We argue that having a RTAX key entry is the most simple, generic and easy way to manage, and also keeps the memory footprint of dst entries lower on 64 bit than with storing a pointer directly, for example. Having a unique key id also allows for decoupling actual TCP congestion control module management from the FIB layer, i.e. we don't have to care about expensive module refcounting inside the FIB at this point. We first thought of using an IDR store for the realization, which takes over dynamic assignment of unused key space and also performs the key to pointer mapping in RCU. While doing so, we stumbled upon the issue that due to the nature of dynamic key distribution, it just so happens, arguably in very rare occasions, that excessive module loads and unloads can lead to a possible reuse of previously used key space. Thus, previously stale keys in the dst metric are now being reassigned to a different congestion control algorithm, which might lead to unexpected behaviour. One way to resolve this would have been to walk FIBs on the actually rare occasion of a module unload and reset the metric keys for each FIB in each netns, but that's just very costly. Therefore, we argue a better solution is to reuse the unique congestion control algorithm name member and map that into u32 key space through jhash. For that, we split the flags attribute (as it currently uses 2 bits only anyway) into two u32 attributes, flags and key, so that we can keep the cacheline boundary of 2 cachelines on x86_64 and cache the precalculated key at registration time for the fast path. On average we might expect 2 - 4 modules being loaded worst case perhaps 15, so a key collision possibility is extremely low, and guaranteed collision-free on LE/BE for all in-tree modules. Overall this results in much simpler code, and all without the overhead of an IDR. Due to the deterministic nature, modules can now be unloaded, the congestion control algorithm for a specific but unloaded key will fall back to the default one, and on module reload time it will switch back to the expected algorithm transparently. Joint work with Florian Westphal. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:55:24 -05:00
Florian Westphal	e715b6d3a5	net: fib6: convert cfg metric to u32 outside of table write lock Do the nla validation earlier, outside the write lock. This is needed by followup patch which needs to be able to call request_module (which can sleep) if needed. Joint work with Daniel Borkmann. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:55:24 -05:00
Tom Herbert	ad6f939ab1	ip: Add offset parameter to ip_cmsg_recv Add ip_cmsg_recv_offset function which takes an offset argument that indicates the starting offset in skb where data is being received from. This will be useful in the case of UDP and provided checksum to user space. ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of zero. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:44:46 -05:00
Tom Herbert	5961de9f19	ip: Add offset parameter to ip_cmsg_recv Add ip_cmsg_recv_offset function which takes an offset argument that indicates the starting offset in skb where data is being received from. This will be useful in the case of UDP and provided checksum to user space. ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of zero. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:44:46 -05:00
Tom Herbert	c44d13d6f3	ip: IP cmsg cleanup Move the IP_CMSG_* constants from ip_sockglue.c to inet_sock.h so that they can be referenced in other source files. Restructure ip_cmsg_recv to not go through flags using shift, check for flags by 'and'. This eliminates both the shift and a conditional per flag check. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:44:46 -05:00
Tom Herbert	224d019c4f	ip: Move checksum convert defines to inet Move convert_csum from udp_sock to inet_sock. This allows the possibility that we can use convert checksum for different types of sockets and also allows convert checksum to be enabled from inet layer (what we'll want to do when enabling IP_CHECKSUM cmsg). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:44:46 -05:00
Thomas Graf	149118d893	netlink: Warn on unordered or illegal nla_nest_cancel() or nlmsg_cancel() Calling nla_nest_cancel() in a different order as the nesting was built up can lead to negative offsets being calculated which results in skb_trim() being called with an underflowed unsigned int. Warn if mark < skb->data as it's definitely a bug. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-05 22:38:22 -05:00
Johannes Berg	1e359a5de8	Revert "mac80211: Fix accounting of the tailroom-needed counter" This reverts commit ca34e3b5c808385b175650605faa29e71e91991b. It turns out that the p54 and cw2100 drivers assume that there's tailroom even when they don't say they really need it. However, there's currently no way for them to explicitly say they do need it, so for now revert this. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=90331. Cc: stable@vger.kernel.org Fixes: ca34e3b5c808 ("mac80211: Fix accounting of the tailroom-needed counter") Reported-by: Christopher Chavez <chrischavez@gmx.us> Bisected-by: Larry Finger <Larry.Finger@lwfinger.net> Debugged-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-05 10:33:46 +01:00
Jesse Gross	df5dba8e52	geneve: Remove socket hash table. The hash table for open Geneve ports is used only on creation and deletion time. It is not performance critical and is not likely to grow to a large number of items. Therefore, this can be changed to use a simple linked list. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-04 22:21:33 -05:00
Jesse Gross	829a3ada9c	geneve: Simplify locking. The existing Geneve locking scheme was pulled over directly from VXLAN. However, VXLAN has a number of built in mechanisms which make the locking more complex and are unlikely to be necessary with Geneve. This simplifies the locking to use a basic scheme of a mutex when doing updates plus RCU on receive. In addition to making the code easier to read, this also avoids the possibility of a race when creating or destroying sockets since UDP sockets and the list of Geneve sockets are protected by different locks. After this change, the entire operation is atomic. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-04 22:21:33 -05:00
Jesse Gross	61f3cade76	geneve: Remove workqueue. The work queue is used only to free the UDP socket upon destruction. This is not necessary with Geneve and generally makes the code more difficult to reason about. It also introduces nondeterministic behavior such as when a socket is rapidly deleted and recreated, which could fail as the the deletion happens asynchronously. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-04 22:21:33 -05:00
David S. Miller	6c032edc8a	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg say: ==================== pull request: bluetooth-next 2014-12-31 Here's the first batch of bluetooth patches for 3.20. - Cleanups & fixes to ieee802154 drivers - Fix synchronization of mgmt commands with respective HCI commands - Add self-tests for LE pairing crypto functionality - Remove 'BlueFritz!' specific handling from core using a new quirk flag - Public address configuration support for ath3012 - Refactor debugfs support into a dedicated file - Initial support for LE Data Length Extension feature from Bluetooth 4.2 Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-02 15:58:21 -05:00
Alexander Duyck	345e9b5426	fib_trie: Push rcu_read_lock/unlock to callers This change is to start cleaning up some of the rcu_read_lock/unlock handling. I realized while reviewing the code there are several spots that I don't believe are being handled correctly or are masking warnings by locally calling rcu_read_lock/unlock instead of calling them at the correct level. A common example is a call to fib_get_table followed by fib_table_lookup. The rcu_read_lock/unlock ought to wrap both but there are several spots where they were not wrapped. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-31 18:25:54 -05:00
Johannes Berg	023e2cfa36	netlink/genetlink: pass network namespace to bind/unbind Netlink families can exist in multiple namespaces, and for the most part multicast subscriptions are per network namespace. Thus it only makes sense to have bind/unbind notifications per network namespace. To achieve this, pass the network namespace of a given client socket to the bind/unbind functions. Also do this in generic netlink, and there also make sure that any bind for multicast groups that only exist in init_net is rejected. This isn't really a problem if it is accepted since a client in a different namespace will never receive any notifications from such a group, but it can confuse the family if not rejected (it's also possible to silently (without telling the family) accept it, but it would also have to be ignored on unbind so families that take any kind of action on bind/unbind won't do unnecessary work for invalid clients like that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-27 03:07:50 -05:00
Johannes Berg	c380d9a7af	genetlink: pass multicast bind/unbind to families In order to make the newly fixed multicast bind/unbind functionality in generic netlink, pass them down to the appropriate family. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-27 02:20:23 -05:00
Johannes Berg	f8403a2e47	genetlink: pass only network namespace to genl_has_listeners() There's no point to force the caller to know about the internal genl_sock to use inside struct net, just have them pass the network namespace. This doesn't really change code generation since it's an inline, but makes the caller less magic - there's never any reason to pass another socket. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-27 02:20:23 -05:00
Jesse Gross	5f35227ea3	net: Generalize ndo_gso_check to ndo_features_check GSO isn't the only offload feature with restrictions that potentially can't be expressed with the current features mechanism. Checksum is another although it's a general issue that could in theory apply to anything. Even if it may be possible to implement these restrictions in other ways, it can result in duplicate code or inefficient per-packet behavior. This generalizes ndo_gso_check so that drivers can remove any features that don't make sense for a given packet, similar to netif_skb_features(). It also converts existing driver restrictions to the new format, completing the work that was done to support tunnel protocols since the issues apply to checksums as well. By actually removing features from the set that are used to do offloading, it solves another problem with the existing interface. In these cases, GSO would run with the original set of features and not do anything because it appears that segmentation is not required. CC: Tom Herbert <therbert@google.com> CC: Joe Stringer <joestringer@nicira.com> CC: Eric Dumazet <edumazet@google.com> CC: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Tom Herbert <therbert@google.com> Fixes: 04ffcb255f22 ("net: Add ndo_gso_check") Tested-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-26 17:20:56 -05:00
Nicolas Dichtel	ef8f342b43	neigh: remove next ptr from struct neigh_table After commit d7480fd3b173 ("neigh: remove dynamic neigh table registration support"), this field is not used anymore. CC: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-26 17:07:08 -05:00
Marcel Holtmann	711ffa78f4	Bluetooth: Introduce HCI_QUIRK_BROKEN_LOCAL_COMMANDS constant Some controllers advertise support for Bluetooth 1.2 specification, but they do not support the HCI Read Local Supported Commands command. If that is the case, then the driver can quirk the behavior and force the core to skip this command. This will allow removing vendor specific checks out of the core. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-12-26 20:16:10 +02:00
Marcel Holtmann	72e4a6bd02	Bluetooth: Remove duplicate constant for RFCOMM PSM The RFCOMM_PSM constant is actually a duplicate. So remove it and use the L2CAP_PSM_RFCOMM constant instead. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-12-20 19:55:04 +02:00
Marcel Holtmann	23b9ceb74f	Bluetooth: Create debugfs directory for each connection handle For every internal representation of a Bluetooth connection which is identified by hci_conn, create a debugfs directory with the handle number as directory name. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-12-20 19:54:24 +02:00
Marcel Holtmann	a8e1bfaa55	Bluetooth: Store default and maximum LE data length settings When the controller supports the LE Data Length Extension feature, the default and maximum data length are read and now stored. For backwards compatibility all values are initialized to the data length values from Bluetooth 4.1 and earlier specifications. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-12-20 17:52:21 +02:00

1 2 3 4 5 ...

8035 Commits