linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-01-17 18:36:00 +00:00

Author	SHA1	Message	Date
Leon Romanovsky	04f00ab227	net/core: move gro function declarations to separate header Fir the following compilation warnings: 1031 \| INDIRECT_CALLABLE_SCOPE void udp_v6_early_demux(struct sk_buff skb) net/ipv6/ip6_offload.c:182:41: warning: no previous prototype for ‘ipv6_gro_receive’ [-Wmissing-prototypes] 182 \| INDIRECT_CALLABLE_SCOPE struct sk_buff ipv6_gro_receive(struct list_head head, \| ^~~~~~~~~~~~~~~~ net/ipv6/ip6_offload.c:320:29: warning: no previous prototype for ‘ipv6_gro_complete’ [-Wmissing-prototypes] 320 \| INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff skb, int nhoff) \| ^~~~~~~~~~~~~~~~~ net/ipv6/ip6_offload.c:182:41: warning: no previous prototype for ‘ipv6_gro_receive’ [-Wmissing-prototypes] 182 \| INDIRECT_CALLABLE_SCOPE struct sk_buff ipv6_gro_receive(struct list_head head, \| ^~~~~~~~~~~~~~~~ net/ipv6/ip6_offload.c:320:29: warning: no previous prototype for ‘ipv6_gro_complete’ [-Wmissing-prototypes] 320 \| INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff) Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:37:57 -08:00
Leon Romanovsky	f9a4719cc1	ipv6: move udp declarations to net/udp.h Fix the following compilation warning: net/ipv6/udp.c:1031:30: warning: no previous prototype for 'udp_v6_early_demux' [-Wmissing-prototypes] 1031 \| INDIRECT_CALLABLE_SCOPE void udp_v6_early_demux(struct sk_buff skb) \| ^~~~~~~~~~~~~~~~~~ net/ipv6/udp.c:1072:29: warning: no previous prototype for 'udpv6_rcv' [-Wmissing-prototypes] 1072 \| INDIRECT_CALLABLE_SCOPE int udpv6_rcv(struct sk_buff skb) \| ^~~~~~~~~ Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:37:57 -08:00
Leon Romanovsky	1faba27f11	ipv6: silence compilation warning for non-IPV6 builds The W=1 compilation of allmodconfig generates the following warning: net/ipv6/icmp.c:448:6: warning: no previous prototype for 'icmp6_send' [-Wmissing-prototypes] 448 \| void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, \| ^~~~~~~~~~ Fix it by providing function declaration for builds with ipv6 as a module. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:37:56 -08:00
Colin Ian King	8f8a42ff00	net: hns3: remove redundant null check of an array The null check of filp->f_path.dentry->d_iname is redundant because it is an array of DNAME_INLINE_LEN chars and cannot be a null. Fix this by removing the null check. Addresses-Coverity: ("Array compared against 0") Fixes: 04987ca1b9b6 ("net: hns3: add debugfs support for tm nodes, priority and qset info") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Link: https://lore.kernel.org/r/20210203131040.21656-1-colin.king@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:37:47 -08:00
wengjianfeng	d6adfd37e7	nfc: pn533: Fix typo issue change 'piority' to 'priority' change 'succesfult' to 'successful' Signed-off-by: wengjianfeng <wengjianfeng@yulong.com> Link: https://lore.kernel.org/r/20210203093842.11180-1-samirweng1979@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:37:40 -08:00
Jakub Kicinski	ccdafd2263	Merge branch 'net-enable-udp-v6-sockets-receiving-v4-packets-with-udp' Xin Long says: ==================== net: enable udp v6 sockets receiving v4 packets with UDP Currently, udp v6 socket can not process v4 packets with UDP GRO, as udp_encap_needed_key is not increased when udp_tunnel_encap_enable() is called for v6 socket. This patchset is to increase it and remove the unnecessary code in bareudp in Patch 1/2, and improve rxrpc encap_enable by calling udp_tunnel_encap_enable(). ==================== Link: https://lore.kernel.org/r/cover.1612342376.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:37:18 -08:00
Xin Long	5d30c626b6	rxrpc: call udp_tunnel_encap_enable in rxrpc_open_socket When doing encap_enable/increasing encap_needed_key, up->encap_enabled is not set in rxrpc_open_socket(), and it will cause encap_needed_key not being decreased in udpv6_destroy_sock(). This patch is to improve it by just calling udp_tunnel_encap_enable() where it increases both UDP and UDPv6 encap_needed_key and sets up->encap_enabled. v4->v5: - add the missing '#include <net/udp_tunnel.h>', as David Howells noticed. Acked-and-tested-by: David Howells <dhowells@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:37:15 -08:00
Xin Long	a4a600dd30	udp: call udp_encap_enable for v6 sockets when enabling encap When enabling encap for a ipv6 socket without udp_encap_needed_key increased, UDP GRO won't work for v4 mapped v6 address packets as sk will be NULL in udp4_gro_receive(). This patch is to enable it by increasing udp_encap_needed_key for v6 sockets in udp_tunnel_encap_enable(), and correspondingly decrease udp_encap_needed_key in udpv6_destroy_sock(). v1->v2: - add udp_encap_disable() and export it. v2->v3: - add the change for rxrpc and bareudp into one patch, as Alex suggested. v3->v4: - move rxrpc part to another patch. Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:37:14 -08:00
Jian Yang	c9dca822c7	net-loopback: set lo dev initial state to UP Traditionally loopback devices come up with initial state as DOWN for any new network-namespace. This would mean that anyone needing this device would have to bring this UP by issuing something like 'ip link set lo up'. This can be avoided if the initial state is set as UP. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Jian Yang <jianyang@google.com> Link: https://lore.kernel.org/r/20210201233445.2044327-1-jianyang.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:37:08 -08:00
Jakub Kicinski	e64ffa8875	Merge branch 'net-consolidate-page_is_pfmemalloc-usage' Alexander Lobakin says: ==================== net: consolidate page_is_pfmemalloc() usage page_is_pfmemalloc() is used mostly by networking drivers to test if a page can be considered for reusing/recycling. It doesn't write anything to the struct page itself, so its sole argument can be constified, as well as the first argument of skb_propagate_pfmemalloc(). In Page Pool core code, it can be simply inlined instead. Most of the callers from NIC drivers were just doppelgangers of the same condition tests. Derive them into a new common function do deduplicate the code. Resend of v3 [2]: - it missed Patchwork and Netdev archives, probably due to server-side issues. Since v2 [1]: - use more intuitive name for the new inline function since there's nothing "reserved" in remote pages (Jakub Kicinski, John Hubbard); - fold likely() inside the helper itself to make driver code a bit fancier (Jakub Kicinski); - split function introduction and using into two separate commits; - collect some more tags (Jesse Brandeburg, David Rientjes). Since v1 [0]: - new: reduce code duplication by introducing a new common function to test if a page can be reused/recycled (David Rientjes); - collect autographs for Page Pool bits (Jesper Dangaard Brouer, Ilias Apalodimas). [0] https://lore.kernel.org/netdev/20210125164612.243838-1-alobakin@pm.me [1] https://lore.kernel.org/netdev/20210127201031.98544-1-alobakin@pm.me [2] https://lore.kernel.org/lkml/20210131120844.7529-1-alobakin@pm.me ==================== Link: https://lore.kernel.org/r/20210202133030.5760-1-alobakin@pm.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:20:18 -08:00
Alexander Lobakin	05656132a8	net: page_pool: simplify page recycling condition tests pool_page_reusable() is a leftover from pre-NUMA-aware times. For now, this function is just a redundant wrapper over page_is_pfmemalloc(), so inline it into its sole call site. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:20:14 -08:00
Alexander Lobakin	a79afa78e6	net: use the new dev_page_is_reusable() instead of private versions Now we can remove a bunch of identical functions from the drivers and make them use common dev_page_is_reusable(). All {,un}likely() checks are omitted since it's already present in this helper. Also update some comments near the call sites. Suggested-by: David Rientjes <rientjes@google.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:20:14 -08:00
Alexander Lobakin	bc38f30f8d	net: introduce common dev_page_is_reusable() A bunch of drivers test the page before reusing/recycling for two common conditions: - if a page was allocated under memory pressure (pfmemalloc page); - if a page was allocated at a distant memory node (to exclude slowdowns). Introduce a new common inline for doing this, with likely() already folded inside to make driver code a bit simpler. Suggested-by: David Rientjes <rientjes@google.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:20:14 -08:00
Alexander Lobakin	48f971c9c8	skbuff: constify skb_propagate_pfmemalloc() "page" argument The function doesn't write anything to the page struct itself, so this argument can be const. Misc: align second argument to the brace while at it. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:20:13 -08:00
Alexander Lobakin	1d7bab6a94	mm: constify page_is_pfmemalloc() argument The function only tests for page->index, so its argument should be const. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:20:13 -08:00
Brian Vazquez	9c97921a51	net: fix building errors on powerpc when CONFIG_RETPOLINE is not set This commit fixes the errores reported when building for powerpc: ERROR: modpost: "ip6_dst_check" [vmlinux] is a static EXPORT_SYMBOL ERROR: modpost: "ipv4_dst_check" [vmlinux] is a static EXPORT_SYMBOL ERROR: modpost: "ipv4_mtu" [vmlinux] is a static EXPORT_SYMBOL ERROR: modpost: "ip6_mtu" [vmlinux] is a static EXPORT_SYMBOL Fixes: f67fbeaebdc0 ("net: use indirect call helpers for dst_mtu") Fixes: bbd807dfbf20 ("net: indirect call helpers for ipv4/ipv6 dst_check functions") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Brian Vazquez <brianvv@google.com> Link: https://lore.kernel.org/r/20210204181839.558951-2-brianvv@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:06:27 -08:00
Brian Vazquez	0053859496	net: add EXPORT_INDIRECT_CALLABLE wrapper When a static function is annotated with INDIRECT_CALLABLE_SCOPE and CONFIG_RETPOLINE is set, the static keyword is removed. Sometimes the function needs to be exported but EXPORT_SYMBOL can't be used because if CONFIG_RETPOLINE is not set, we will attempt to export a static symbol. This patch introduces a new indirect call wrapper: EXPORT_INDIRECT_CALLABLE. This basically does EXPORT_SYMBOL when CONFIG_RETPOLINE is set, but does nothing when it's not. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Brian Vazquez <brianvv@google.com> Link: https://lore.kernel.org/r/20210204181839.558951-1-brianvv@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:06:26 -08:00
Marcelo Ricardo Leitner	7e3ce05e7f	netlink: add tracepoint at NL_SET_ERR_MSG Often userspace won't request the extack information, or they don't log it because of log level or so, and even when they do, sometimes it's not enough to know exactly what caused the error. Netlink extack is the standard way of reporting erros with descriptive error messages. With a trace point on it, we then can know exactly where the error happened, regardless of userspace app. Also, we can even see if the err msg was overwritten. The wrapper do_trace_netlink_extack() is because trace points shouldn't be called from .h files, as trace points are not that small, and the function call to do_trace_netlink_extack() on the macros is not protected by tracepoint_enabled() because the macros are called from modules, and this would require exporting some trace structs. As this is error path, it's better to export just the wrapper instead. v2: removed leftover tracepoint declaration Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/4546b63e67b2989789d146498b13cc09e1fdc543.1612403190.git.marcelo.leitner@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:05:59 -08:00
Jiapeng Chong	e93fac3b51	drivers: net: xen-netfront: Simplify the calculation of variables Fix the following coccicheck warnings: ./drivers/net/xen-netfront.c:1816:52-54: WARNING !A \|\| A && B is equivalent to !A \|\| B. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/1612261069-13315-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 10:55:24 -08:00
Jakub Kicinski	493007c1fa	Merge branch 'gtp' Jonas Bonn says: ==================== There's ongoing work in this driver to provide support for IPv6, GRO, GSO, and "collect metadata" mode operation. In order to facilitate this work going forward, this short series accumulates already ACK:ed patches that are ready for the next merge window. All of these patches should be uncontroversial at this point, including the first one in the series that reverts a recently added change to introduce "collect metadata" mode. As that patch produces 'broken' packets when common GTP headers are in place, it seems better to revert it and rethink things a bit before inclusion. ==================== Link: https://lore.kernel.org/r/20210203070805.281321-1-jonas@norrbonn.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 09:30:00 -08:00
Jonas Bonn	9716178a3a	gtp: update rx_length_errors for abnormally short packets Based on work by Pravin Shelar. Update appropriate stats when packet transmission isn't possible. Signed-off-by: Jonas Bonn <jonas@norrbonn.se> Acked-by: Harald Welte <laforge@gnumonks.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 09:29:58 -08:00
Jonas Bonn	29f53b5c00	gtp: set device type Set the devtype to 'gtp' when setting up the link. Signed-off-by: Jonas Bonn <jonas@norrbonn.se> Acked-by: Harald Welte <laforge@gnumonks.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 09:29:58 -08:00
Jonas Bonn	70d1324629	gtp: drop unnecessary call to skb_dst_drop The call to skb_dst_drop() is already done as part of udp_tunnel_xmit(). Signed-off-by: Jonas Bonn <jonas@norrbonn.se> Acked-by: Harald Welte <laforge@gnumonks.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 09:29:58 -08:00
Jonas Bonn	a9c0df76d0	gtp: really check namespaces before xmit Blindly assuming that packet transmission crosses namespaces results in skb marks being lost in the single namespace case. Signed-off-by: Jonas Bonn <jonas@norrbonn.se> Acked-by: Harald Welte <laforge@gnumonks.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 09:29:58 -08:00
Jonas Bonn	e1b2914e64	gtp: include role in link info Querying link info for the GTP interface doesn't reveal in which "role" the device is set to operate. Include this information in the info query result. Signed-off-by: Jonas Bonn <jonas@norrbonn.se> Acked-by: Harald Welte <laforge@gnumonks.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 09:29:58 -08:00
Jonas Bonn	e21eb3a065	gtp: set initial MTU The GTP link is brought up with a default MTU of zero. This can lead to some rather unexpected behaviour for users who are more accustomed to interfaces coming online with reasonable defaults. This patch sets an initial MTU for the GTP link of 1500 less worst-case tunnel overhead. Signed-off-by: Jonas Bonn <jonas@norrbonn.se> Acked-by: Harald Welte <laforge@gnumonks.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 09:29:57 -08:00
Jonas Bonn	49ecc587dc	Revert "GTP: add support for flow based tunneling API" This reverts commit 9ab7e76aefc97a9aa664accb59d6e8dc5e52514a. This patch was committed without maintainer approval and despite a number of unaddressed concerns from review. There are several issues that impede the acceptance of this patch and that make a reversion of this particular instance of these changes the best way forward: i) the patch contains several logically separate changes that would be better served as smaller patches (for review purposes) ii) functionality like the handling of end markers has been introduced without further explanation iii) symmetry between the handling of GTPv0 and GTPv1 has been unnecessarily broken iv) the patchset produces 'broken' packets when extension headers are included v) there are no available userspace tools to allow for testing this functionality vi) there is an unaddressed Coverity report against the patch concering memory leakage vii) most importantly, the patch contains a large amount of superfluous churn that impedes other ongoing work with this driver This patch will be reworked into a series that aligns with other ongoing work and facilitates review. Signed-off-by: Jonas Bonn <jonas@norrbonn.se> Acked-by: Harald Welte <laforge@gnumonks.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 09:29:57 -08:00
Hariharan Ananthakrishnan	3dd344ea84	net: tracepoint: exposing sk_family in all tcp:tracepoints Similar to sock:inet_sock_set_state tracepoint, expose sk_family to distinguish AF_INET and AF_INET6 families. The following tcp tracepoints are updated: tcp:tcp_destroy_sock tcp:tcp_rcv_space_adjust tcp:tcp_retransmit_skb tcp:tcp_send_reset tcp:tcp_receive_reset tcp:tcp_retransmit_synack tcp:tcp_probe Signed-off-by: Hariharan Ananthakrishnan <hari@netflix.com> Signed-off-by: Brendan Gregg <bgregg@netflix.com> Link: https://lore.kernel.org/r/20210129001210.344438-1-hari@netflix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 09:25:36 -08:00
Wei Wang	f5a5589c72	tcp: use a smaller percpu_counter batch size for sk_alloc Currently, a percpu_counter with the default batch size (2nr_cpus) is used to record the total # of active sockets per protocol. This means sk_sockets_allocated_read_positive() could be off by +/-2(nr_cpus^2). This under/over-estimation could lead to wrong memory suppression conditions in __sk_raise_mem_allocated(). Fix this by using a more reasonable fixed batch size of 16. See related commit cf86a086a180 ("net/dst: use a smaller percpu_counter batch for dst entries accounting") that addresses a similar issue. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Link: https://lore.kernel.org/r/20210202193408.1171634-1-weiwan@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:43:17 -08:00
Jakub Kicinski	6fd5eeee1f	Merge branch 'support-setting-lanes-via-ethtool' Danielle Ratson says: ==================== Support setting lanes via ethtool Some speeds can be achieved with different number of lanes. For example, 100Gbps can be achieved using two lanes of 50Gbps or four lanes of 25Gbps. This patchset adds a new selector that allows ethtool to advertise link modes according to their number of lanes and also force a specific number of lanes when autonegotiation is off. Advertising all link modes with a speed of 100Gbps that use two lanes: $ ethtool -s swp1 speed 100000 lanes 2 autoneg on Forcing a speed of 100Gbps using four lanes: $ ethtool -s swp1 speed 100000 lanes 4 autoneg off ==================== Link: https://lore.kernel.org/r/20210202180612.325099-1-danieller@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:31 -08:00
Danielle Ratson	f72e2f48c7	net: selftests: Add lanes setting test Test that setting lanes parameter is working. Set max speed and max lanes in the list of advertised link modes, and then try to set max speed with the lanes below max lanes if exists in the list. And then, test that setting number of lanes larger than max lanes fails. Do the above for both autoneg on and off. $ ./ethtool_lanes.sh TEST: 4 lanes is autonegotiated [ OK ] TEST: Lanes number larger than max width is not set [ OK ] TEST: Autoneg off, 4 lanes detected during force mode [ OK ] TEST: Lanes number larger than max width is not set [ OK ] Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:29 -08:00
Danielle Ratson	25a96f057a	mlxsw: ethtool: Pass link mode in use to ethtool Currently, when user space queries the link's parameters, as speed and duplex, each parameter is passed from the driver to ethtool. Instead, pass the link mode bit in use. In Spectrum-1, simply pass the bit that is set to '1' from PTYS register. In Spectrum-2, pass the first link mode bit in the mask of the used link mode. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:29 -08:00
Danielle Ratson	763ece86f0	mlxsw: ethtool: Add support for setting lanes when autoneg is off Currently, when auto negotiation is set to off, the user can force a specific speed or both speed and duplex. The user cannot influence the number of lanes that will be forced. Add support for setting speed along with lanes so one would be able to choose how many lanes will be forced. When lanes parameter is passed from user space, choose the link mode that its actual width equals to it. Otherwise, the default link mode will be the one that supports the width of the port. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:29 -08:00
Danielle Ratson	5fc4053df3	mlxsw: ethtool: Remove max lanes filtering Currently, when a speed can be supported by different number of lanes, the supported link modes bitmask contains only link modes with a single number of lanes. This was done in order to prevent auto negotiation on number of lanes after 50G-1-lane and 100G-2-lanes link modes were introduced. For example, if a port's max width is 4, only link modes with 4 lanes will be presented as supported by that port, so 100G is always achieved by 4 lanes of 25G. After the previous patches that allow selection of the number of lanes, auto negotiation on number of lanes becomes practical. Remove that filtering of the maximum number of lanes supported link modes, so indeed all the supported and advertised link modes will be shown. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:29 -08:00
Danielle Ratson	7dc33f0914	ethtool: Expose the number of lanes in use Currently, ethtool does not expose how many lanes are used when the link is up. After adding a possibility to advertise or force a specific number of lanes, the lanes in use value can be either the maximum width of the port or below. Extend ethtool to expose the number of lanes currently in use for drivers that support it. For example: $ ethtool -s swp1 speed 100000 lanes 4 $ ethtool -s swp2 speed 100000 lanes 4 $ ip link set swp1 up $ ip link set swp2 up $ ethtool swp1 Settings for swp1: Supported ports: [ FIBRE Backplane ] Supported link modes: 1000baseT/Full 10000baseT/Full 1000baseKX/Full 10000baseKR/Full 10000baseR_FEC 40000baseKR4/Full 40000baseCR4/Full 40000baseSR4/Full 40000baseLR4/Full 25000baseCR/Full 25000baseKR/Full 25000baseSR/Full 50000baseCR2/Full 50000baseKR2/Full 100000baseKR4/Full 100000baseSR4/Full 100000baseCR4/Full 100000baseLR4_ER4/Full 50000baseSR2/Full 10000baseCR/Full 10000baseSR/Full 10000baseLR/Full 10000baseER/Full 50000baseKR/Full 50000baseSR/Full 50000baseCR/Full 50000baseLR_ER_FR/Full 50000baseDR/Full 100000baseKR2/Full 100000baseSR2/Full 100000baseCR2/Full 100000baseLR2_ER2_FR2/Full 100000baseDR2/Full 200000baseKR4/Full 200000baseSR4/Full 200000baseLR4_ER4_FR4/Full 200000baseDR4/Full 200000baseCR4/Full Supported pause frame use: Symmetric Receive-only Supports auto-negotiation: Yes Supported FEC modes: Not reported Advertised link modes: 1000baseT/Full 10000baseT/Full 1000baseKX/Full 1000baseKX/Full 10000baseKR/Full 10000baseR_FEC 40000baseKR4/Full 40000baseCR4/Full 40000baseSR4/Full 40000baseLR4/Full 25000baseCR/Full 25000baseKR/Full 25000baseSR/Full 50000baseCR2/Full 50000baseKR2/Full 100000baseKR4/Full 100000baseSR4/Full 100000baseCR4/Full 100000baseLR4_ER4/Full 50000baseSR2/Full 10000baseCR/Full 10000baseSR/Full 10000baseLR/Full 10000baseER/Full 200000baseKR4/Full 200000baseSR4/Full 200000baseLR4_ER4_FR4/Full 200000baseDR4/Full 200000baseCR4/Full Advertised pause frame use: No Advertised auto-negotiation: Yes Advertised FEC modes: Not reported Advertised link modes: 100000baseKR4/Full 100000baseSR4/Full 100000baseCR4/Full 100000baseLR4_ER4/Full Advertised pause frame use: No Advertised auto-negotiation: Yes Advertised FEC modes: Not reported Speed: 100000Mb/s Lanes: 4 Duplex: Full Auto-negotiation: on Port: Direct Attach Copper PHYAD: 0 Transceiver: internal Link detected: yes Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:29 -08:00
Danielle Ratson	c8907043c6	ethtool: Get link mode in use instead of speed and duplex parameters Currently, when user space queries the link's parameters, as speed and duplex, each parameter is passed from the driver to ethtool. Instead, get the link mode bit in use, and derive each of the parameters from it in ethtool. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:29 -08:00
Danielle Ratson	012ce4dd31	ethtool: Extend link modes settings uAPI with lanes Currently, when auto negotiation is on, the user can advertise all the linkmodes which correspond to a specific speed, but does not have a similar selector for the number of lanes. This is significant when a specific speed can be achieved using different number of lanes. For example, 2x50 or 4x25. Add 'ETHTOOL_A_LINKMODES_LANES' attribute and expand 'struct ethtool_link_settings' with lanes field in order to implement a new lanes-selector that will enable the user to advertise a specific number of lanes as well. When auto negotiation is off, lanes parameter can be forced only if the driver supports it. Add a capability bit in 'struct ethtool_ops' that allows ethtool know if the driver can handle the lanes parameter when auto negotiation is off, so if it does not, an error message will be returned when trying to set lanes. Example: $ ethtool -s swp1 lanes 4 $ ethtool swp1 Settings for swp1: Supported ports: [ FIBRE ] Supported link modes: 1000baseKX/Full 10000baseKR/Full 40000baseCR4/Full 40000baseSR4/Full 40000baseLR4/Full 25000baseCR/Full 25000baseSR/Full 50000baseCR2/Full 100000baseSR4/Full 100000baseCR4/Full Supported pause frame use: Symmetric Receive-only Supports auto-negotiation: Yes Supported FEC modes: Not reported Advertised link modes: 40000baseCR4/Full 40000baseSR4/Full 40000baseLR4/Full 100000baseSR4/Full 100000baseCR4/Full Advertised pause frame use: No Advertised auto-negotiation: Yes Advertised FEC modes: Not reported Speed: Unknown! Duplex: Unknown! (255) Auto-negotiation: on Port: Direct Attach Copper PHYAD: 0 Transceiver: internal Link detected: no Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:28 -08:00
Danielle Ratson	189e7a8d94	ethtool: Validate master slave configuration before rtnl_lock() Create a new function for input validations to be called before rtnl_lock() and move the master slave validation to that function. This would be a cleanup for next patch that would add another validation to the new function. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:28 -08:00
Vladimir Oltean	99b8202b17	net: dsa: fix SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING getting ignored The bridge emits VLAN filtering events and quite a few others via switchdev with orig_dev = br->dev. After the blamed commit, these events started getting ignored. The point of the patch was to not offload switchdev objects for ports that didn't go through dsa_port_bridge_join, because the configuration is unsupported: - ports that offload a bonding/team interface go through dsa_port_bridge_join when that bonding/team interface is later bridged with another switch port or LAG - ports that don't offload LAG don't get notified of the bridge that is on top of that LAG. Sadly, a check is missing, which is that the orig_dev is equal to the bridge device. This check is compatible with the original intention, because ports that don't offload bridging because they use a software LAG don't have dp->bridge_dev set. On a semi-related note, we should not offload switchdev objects or populate dp->bridge_dev if the driver doesn't implement .port_bridge_join either. However there is no regression associated with that, so it can be done separately. Fixes: 5696c8aedfcc ("net: dsa: Don't offload port attributes on standalone ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Tested-by: Tobias Waldekranz <tobias@waldekranz.com> Link: https://lore.kernel.org/r/20210202233109.1591466-1-olteanv@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:25:49 -08:00
Jakub Kicinski	75b8f78fb9	Merge branch 'chelsio-cxgb-use-threaded-interrupts-for-deferred-work' Sebastian Andrzej Siewior says: ==================== chelsio: cxgb: Use threaded interrupts for deferred work Patch #2 fixes an issue in which del_timer_sync() and tasklet_kill() is invoked from the interrupt handler. This is probably a rare error case since it disables interrupts / the card in that case. Patch #1 converts a worker to use a threaded interrupt which is then also used in patch #2 instead adding another worker for this task (and flush_work() to synchronise vs rmmod). ==================== Link: https://lore.kernel.org/r/20210202170104.1909200-1-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 17:41:04 -08:00
Sebastian Andrzej Siewior	82154580a7	chelsio: cxgb: Disable the card on error in threaded interrupt t1_fatal_err() is invoked from the interrupt handler. The bad part is that it invokes (via t1_sge_stop()) del_timer_sync() and tasklet_kill(). Both functions must not be called from an interrupt because it is possible that it will wait for the completion of the timer/tasklet it just interrupted. In case of a fatal error, use t1_interrupts_disable() to disable all interrupt sources and then wake the interrupt thread with F_PL_INTR_SGE_ERR as pending flag. The threaded-interrupt will stop the card via t1_sge_stop() and not re-enable the interrupts again. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 17:41:01 -08:00
Sebastian Andrzej Siewior	fec7fa0a75	chelsio: cxgb: Replace the workqueue with threaded interrupt The external interrupt (F_PL_INTR_EXT) needs to be handled in a process context and this is accomplished by utilizing a workqueue. The process context can also be provided by a threaded interrupt instead of a workqueue. The threaded interrupt can be used later for other interrupt related processing which require non-atomic context without using yet another workqueue. free_irq() also ensures that the thread is done which is currently missing (the worker could continue after the module has been removed). Save pending flags in pending_thread_intr. Use the same mechanism to disable F_PL_INTR_EXT as interrupt source like it is used before the worker is scheduled. Enable the interrupt again once t1_elmer0_ext_intr_handler() is done. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 17:41:01 -08:00
Jakub Kicinski	462e99a18b	Merge branch 'support-for-octeontx2-98xx-cpt-block' Srujana Challa says: ==================== Support for OcteonTX2 98xx CPT block. OcteonTX2 series of silicons have multiple variants, the 98xx variant has two crypto (CPT) blocks to double the crypto performance. This patchset adds support for new CPT block(CPT1). ==================== Link: https://lore.kernel.org/r/20210202152709.20450-1-schalla@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 17:31:36 -08:00
Srujana Challa	c57c58fd5c	octeontx2-af: Handle CPT function level reset When FLR is initiated for a VF (PCI function level reset), the parent PF gets a interrupt. PF then sends a message to admin function (AF), which then cleans up all resources attached to that VF. This patch adds support to handle CPT FLR. Signed-off-by: Narayana Prasad Raju Atherya <pathreya@marvell.com> Signed-off-by: Suheil Chandran <schandran@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 17:31:34 -08:00
Srujana Challa	b0f60fab78	octeontx2-af: Add support for CPT1 in debugfs Adds support to display block CPT1 stats at "/sys/kernel/debug/octeontx2/cpt1". Signed-off-by: Mahipal Challa <mchalla@marvell.com> Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 17:31:34 -08:00
Srujana Challa	de2854c87c	octeontx2-af: Mailbox changes for 98xx CPT block This patch changes CPT mailbox message format to support new block CPT1 in 98xx silicon. cpt_rd_wr_reg -> Modify cpt_rd_wr_reg mailbox and its handler to accommodate new block CPT1. cpt_lf_alloc -> Modify cpt_lf_alloc mailbox and its handler to configure LFs from a block address out of multiple blocks of same type. If a PF/VF needs to configure LFs from both the blocks then this mbox should be called twice. Signed-off-by: Mahipal Challa <mchalla@marvell.com> Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 17:31:34 -08:00
Mike Looijmans	e0183b974d	net: mdiobus: Prevent spike on MDIO bus reset signal The mdio_bus reset code first de-asserted the reset by allocating with GPIOD_OUT_LOW, then asserted and de-asserted again. In other words, if the reset signal defaulted to asserted, there'd be a short "spike" before the reset. Here is what happens depending on the pre-existing state of the reset signal: Reset (previously asserted): ~~~\|_\|~~~~\|_______ Reset (previously deasserted): _____\|~~~~\|_______ ^ ^ ^ A B C At point A, the low going transition is because the reset line is requested using GPIOD_OUT_LOW. If the line is successfully requested, the first thing we do is set it high _without_ any delay. This is point B. So, a glitch occurs between A and B. We then fsleep() and finally set the GPIO low at point C. Requesting the line using GPIOD_OUT_HIGH eliminates the A and B transitions. Instead we get: Reset (previously asserted) : ~~~~~~~~~~\|______ Reset (previously deasserted): ____\|~~~~~\|______ ^ ^ A C Where A and C are the points described above in the code. Point B has been eliminated. The issue was found when we pulled down the reset signal for the Marvell 88E1512P PHY (because it requires at least 50ms after POR with an active clock). Looking at the reset signal with a scope revealed a short spike, point B in the artwork above. Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20210202143239.10714-1-mike.looijmans@topic.nl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 17:28:05 -08:00
Dan Carpenter	4160d9ec5b	net: mscc: ocelot: fix error code in mscc_ocelot_probe() Probe should return an error code if platform_get_irq_byname() fails but it returns success instead. Fixes: 6c30384eb1de ("net: mscc: ocelot: register devlink ports") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/YBkXyFIl4V9hgxYM@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 16:18:10 -08:00
Dan Carpenter	e0c1623357	net: mscc: ocelot: fix error handling bugs in mscc_ocelot_init_ports() There are several error handling bugs in mscc_ocelot_init_ports(). I went through the code, and carefully audited it and made fixes and cleanups. 1) The ocelot_probe_port() function didn't have a mirror release function so it was hard to follow. I created the ocelot_release_port() function. 2) In the ocelot_probe_port() function, if the register_netdev() call failed, then it lead to a double free_netdev(dev) bug. Fix this by setting "ocelot->ports[port] = NULL" on the error path. 3) I was concerned that the "port" which comes from of_property_read_u32() might be out of bounds so I added a check for that. 4) In the original code if ocelot_regmap_init() failed then the driver tried to continue but I think that should be a fatal error. 5) If ocelot_probe_port() failed then the most recent devlink was leaked. The fix for mostly came Vladimir Oltean. Get rid of "registered_ports" and just set a bit in "devlink_ports_registered" to say when the devlink port has been registered (and needs to be unregistered on error). There are fewer than 32 ports so a u32 is large enough for this purpose. 6) The error handling if the final ocelot_port_devlink_init() failed had two problems. The "while (port-- >= 0)" loop should have been "--port" pre-op instead of a post-op to avoid a buffer underflow. The "if (!registered_ports[port])" condition was reversed leading to resource leaks and double frees. Fixes: 6c30384eb1de ("net: mscc: ocelot: register devlink ports") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/YBkXhqRxHtRGzSnJ@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 16:18:10 -08:00
Jakub Kicinski	2d912da016	Merge branch 'net-use-indirect_call-in-some-dst_ops' Brian Vazquez says: ==================== net: use INDIRECT_CALL in some dst_ops This patch series uses the INDIRECT_CALL wrappers in some dst_ops functions to mitigate retpoline costs. Benefits depend on the platform as described below. Background: The kernel rewrites the retpoline code at __x86_indirect_thunk_r11 depending on the CPU's requirements. The INDIRECT_CALL wrappers provide hints on possible targets and save the retpoline overhead using a direct call in case the target matches one of the hints. The retpoline overhead for the following three cases has been measured by Luigi Rizzo in microbenchmarks, using CPU performance counters, and cover reasonably well the range of possible retpoline overheads compared to a plain indirect call (in equal conditions, specifically with predicted branch, hot cache): - just "jmp (%r11)" on modern platforms like Intel Cascadelake. In this case the overhead is just 2 clock cycles: - "lfence; jmp (%r11)" on e.g. some recent AMD CPUs. In this case the lfence is blocked until pending reads complete, so the actual overhead depends on previous instructions. The best case we have measured 15 clock cycles of overhead. - worst case, e.g. skylake, the full retpoline is used __x86_indirect_thunk_r11: call set_u_target capture_speculation: pause lfence jmp capture_speculation .align 16 set_up_target: mov %r11, (%rsp) ret In this case the overhead has been measured in 35-40 clock cycles. The actual time saved hence depends on the platform and current clock speed (which varies heavily, especially when C-states are active). Also note that actual benefit might be lower than expected if the longer retpoline overlaps with some pending memory read. MEASUREMENTS: The INDIRECT_CALL wrappers in this patchset involve the processing of incoming SYN and generation of syncookies. Hence, the test has been run by configuring a receiving host with a single NIC rx queue, disabling RPS and RFS so that all processing occurs on the same core. An external source generates SYN fast enough to saturate the receiving CPU. We ran two sets of experiments, with and without the dst_output patch, comparing the number of syncookies generated over a 20s period in multiple runs. Assuming the CPU is saturated, the time per packet is t = number_of_packets/total_time and if the two datasets have statistically meaningful difference, the difference in times between the two cases gives an estimate of the benefits from one INDIRECT_CALL. Here are the experimental results: Skylake Syncookies over 20s (5 tests) --------------------------------------------------- indirect 9166325 9182023 9170093 9134014 9171082 retpoline 9099308 9126350 9154841 9056377 9122376 Computing the stats on the ns_pkt = 20e6/total_packets gives the following: $ ministat -c 95 -w 70 /tmp/sk-indirect /tmp/sk-retp x /tmp/sk-indirect + /tmp/sk-retp +----------------------------------------------------------------------+ \|x xx x + x + + + +\| \|\|______M__A_______\|_\|____________M_____A___________________\| \| +----------------------------------------------------------------------+ N Min Max Median Avg Stddev x 5 2.17817e-06 2.18962e-06 2.181e-06 2.182292e-06 4.3252133e-09 + 5 2.18464e-06 2.20839e-06 2.19241e-06 2.194974e-06 8.8695958e-09 Difference at 95.0% confidence 1.2682e-08 +/- 1.01766e-08 0.581132% +/- 0.466326% (Student's t, pooled s = 6.97772e-09) This suggests a difference of 13ns +/- 10ns Our expectation from microbenchmarks was 35-40 cycles per call, but part of the gains may be eaten by stalls from pending memory reads. For Cascadelake: Cascadelake Syncookies over 20s (5 tests) --------------------------------------------------------- indirect 10339797 10297547 10366826 10378891 10384854 retpoline 10332674 10366805 10320374 10334272 10374087 Computing the stats on the ns_pkt = 20e6/total_packets gives no meaningful difference even at just 80% (this was expected): $ ministat -c 80 -w 70 /tmp/cl-indirect /tmp/cl-retp x /tmp/cl-indirect + /tmp/cl-retp +----------------------------------------------------------------------+ \| x x + * x + + + x\| \|\|______________\|_M_________A_____A_______M________\|___\| \| +----------------------------------------------------------------------+ N Min Max Median Avg Stddev x 5 1.92588e-06 1.94221e-06 1.92923e-06 1.931716e-06 6.6936746e-09 + 5 1.92788e-06 1.93791e-06 1.93531e-06 1.933188e-06 4.3734106e-09 No difference proven at 80.0% confidence ==================== Link: https://lore.kernel.org/r/20210201174132.3534118-1-brianvv@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 14:51:41 -08:00

1 2 3 4 5 ...

984688 Commits