linux-next

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git synced 2024-12-29 17:22:07 +00:00

Author	SHA1	Message	Date
Maciej Żenczykowski	246ef40670	ipv6: eliminate ndisc_ops_is_useropt() as it doesn't seem to offer anything of value. There's only 1 trivial user: int lowpan_ndisc_is_useropt(u8 nd_opt_type) { return nd_opt_type == ND_OPT_6CO; } but there's no harm to always treating that as a useropt... Cc: David Ahern <dsahern@kernel.org> Cc: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> Signed-off-by: Maciej Żenczykowski <maze@google.com> Link: https://patch.msgid.link/20240730003010.156977-1-maze@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 17:23:57 -07:00
Jakub Kicinski	9a4615be65	Merge branch 'eth-fbnic-add-basic-stats' Jakub Kicinski says: ==================== eth: fbnic: add basic stats Add basic interface stats to fbnic. v1: https://lore.kernel.org/20240807022631.1664327-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20240810054322.2766421-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 15:44:34 -07:00
Stanislav Fomichev	8be1bd91db	eth: fbnic: add support for basic qstats Implement netdev_stat_ops and export the basic per-queue stats. This interface expects users to set the values that are used either to zero or to some other preserved value (they are 0xff by default). So here we export bytes/packets/drops from tx and rx_stats plus set some of the values that are exposed by queue stats to zero. $ cd tools/testing/selftests/drivers/net && ./stats.py [...] Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0 Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20240810054322.2766421-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 15:44:23 -07:00
Jakub Kicinski	45d84008cc	eth: fbnic: add basic rtnl stats Count packets, bytes and drop on the datapath, and report to the user. Since queues are completely freed when the device is down - accumulate the stats in the main netdev struct. This means that per-queue stats will only report values since last reset (per qstat recommendation). Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20240810054322.2766421-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-12 15:44:23 -07:00
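The accumulation scheme described in the entry above (queues are freed when the device goes down, so totals live in the main device struct) can be sketched in user space as follows. This is an illustration only, not the fbnic code: `struct counters`, `struct queue`, `struct device`, `device_up()`, `device_down()` and `device_get_stats()` are hypothetical names, and a single queue stands in for all of them.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical counters standing in for packet/byte/drop stats. */
struct counters {
	uint64_t packets;
	uint64_t bytes;
	uint64_t dropped;
};

struct queue {
	struct counters stats;	/* only exists while the queue exists */
};

struct device {
	struct counters total;	/* survives queue teardown */
	struct queue *queue;	/* NULL while the device is down */
};

static void counters_add(struct counters *dst, const struct counters *src)
{
	dst->packets += src->packets;
	dst->bytes += src->bytes;
	dst->dropped += src->dropped;
}

/* Bring the device up: a fresh queue starts counting from zero. */
static int device_up(struct device *dev)
{
	dev->queue = calloc(1, sizeof(*dev->queue));
	return dev->queue ? 0 : -1;
}

/* Bring the device down: fold the queue's counters into the total first. */
static void device_down(struct device *dev)
{
	if (!dev->queue)
		return;
	counters_add(&dev->total, &dev->queue->stats);
	free(dev->queue);
	dev->queue = NULL;
}

/* Interface-level view: accumulated history plus the live queue. */
static struct counters device_get_stats(const struct device *dev)
{
	struct counters sum = dev->total;

	if (dev->queue)
		counters_add(&sum, &dev->queue->stats);
	return sum;
}
```

In this sketch the per-queue counters restart from zero after every down/up cycle, which mirrors the "values since last reset" behaviour the message mentions, while `device_get_stats()` keeps the full history.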
David S. Miller	fe1f433555	Merge branch 'ethtool-rss-driver-tweaks' Jakub Kicinski says: ==================== ethtool: rss: driver tweaks and netlink context dumps This series is a semi-related collection of RSS patches. Main point is supporting dumping RSS contexts via ethtool netlink. At present additional RSS contexts can be queried one by one, and assuming the user knows the right IDs. This series uses the XArray added by Ed to provide netlink dump support for ETHTOOL_GET_RSS. Patch 1 is a trivial selftest debug patch. Patch 2 converts mvpp2 for no real reason other than that I had a grand plan of converting all drivers at some stage. Patch 3 removes a now moot check from mlx5 so that all tests can pass. Patch 4 and 5 make a bit used for context support optional, for easier grepping of drivers which need converting if nothing else. Patch 6 OTOH adds a new cap bit; some devices don't support using a different key per context and currently act in surprising ways. Patch 7 and 8 update the RSS netlink code to use XArray. Patch 9 and 10 add support for dumping contexts. Patch 11 and 12 are small adjustments to spec and a new test. I'm getting distracted with other work, so probably won't have the time soon to complete next steps, but things which are missing are (and some of these may be bad ideas): - better discovery Some sort of API to tell the user how many contexts the device can create. Upper bound, devices often share contexts between ports etc. so it's hard to tell exactly and upfront number of contexts for a netdev. But order of magnitude (4 vs 10s) may be enough for container management system to know whether to bother. - create/modify/delete via netlink The only question here is how to handle all the tricky IOCTL legacy. "No change" maps trivially to attribute not present. "reset" (indir_size = 0) probably needs to be a new NLA_FLAG? - better table size handling The current API assumes the LUT has fixed size, which isn't true for modern devices. We should have better APIs for the drivers to resize the tables, and in user facing API - the ability to specify pattern and min size rather than exact table expected (sort of like ethtool CLI already does). - refcounted / socket-bound contexts Support for contexts which get "cleaned up" when their parent netlink socket gets closed. The major catch is that ntuple filters (which we don't currently track) depend on the context, so we need auto-removal for both. v5: - fix build v4: https://lore.kernel.org/20240809031827.2373341-1-kuba@kernel.org - adjust to the meaning of max context from net v3: https://lore.kernel.org/20240806193317.1491822-1-kuba@kernel.org - quite a few code comments and commit message changes - mvpp2: fix interpretation of max_context_id (I'll take care of the net -> net-next merge as needed) - filter by ifindex in the selftest v2: https://lore.kernel.org/20240803042624.970352-1-kuba@kernel.org - fix bugs and build in mvpp2 v1: https://lore.kernel.org/20240802001801.565176-1-kuba@kernel.org ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:25 +01:00
Jakub Kicinski	c1ad8ef804	selftests: drv-net: rss_ctx: test dumping RSS contexts Add a test for dumping RSS contexts. Make sure indir table and key are sane when contexts are created with various combination of inputs. Test the dump filtering by ifname and start-context. Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:25 +01:00
Jakub Kicinski	8ad3be1352	netlink: specs: decode indirection table as u32 array Indirection table is dumped as a raw u32 array, decode it. It's tempting to decode hash key, too, but it is an actual bitstream, so leave it be for now. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	3d50c66c06	ethtool: rss: support skipping contexts during dump Applications may want to deal with dynamic RSS contexts only. So dumping context 0 will be counter-productive for them. Support starting the dump from a given context ID. Alternative would be to implement a dump flag to skip just context 0, not sure which is better... Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	f6122900f4	ethtool: rss: support dumping RSS contexts Now that we track RSS contexts in the core we can easily dump them. This is a major introspection improvement, as previously the only way to find all contexts would be to try all ids (of which there may be 2^32 - 1). Don't use the XArray iterators (like xa_for_each_start()) as they do not move the index past the end of the array once done, which caused multiple bugs in Netlink dumps in the past. Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
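The concern in the entry above is that a resumable dump needs its cursor to end up past the last entry, otherwise the next call would visit that entry again. A minimal user-space sketch of that idea follows. It is not the ethtool code: a sorted array stands in for the XArray, and `find_from()` and `dump_some()` are hypothetical helpers (`find_from()` only imitates the "find first entry at or after this index" style of lookup).

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Stand-in for an ordered lookup: move *idx to the first ID >= *idx.
 * Returns false when no such ID exists. ids[] must be sorted ascending.
 */
static bool find_from(const unsigned long *ids, size_t n, unsigned long *idx)
{
	for (size_t i = 0; i < n; i++) {
		if (ids[i] >= *idx) {
			*idx = ids[i];
			return true;
		}
	}
	return false;
}

/*
 * Emit up to `room` IDs starting at *cursor and return how many were
 * written. After an ID is emitted the cursor is bumped past it, so a
 * later call resumes with the next unsent ID and a finished dump
 * finds nothing more.
 */
static size_t dump_some(const unsigned long *ids, size_t n,
			unsigned long *cursor, unsigned long *out, size_t room)
{
	size_t used = 0;

	for (; find_from(ids, n, cursor); (*cursor)++) {
		if (used == room)
			break;	/* message full; cursor names the unsent ID */
		out[used++] = *cursor;
	}
	return used;
}
```

Starting the cursor at a non-zero value, for example 1, makes the same loop skip lower IDs such as context 0.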
Jakub Kicinski	bb87f2c796	ethtool: rss: report info about additional contexts from XArray IOCTL already uses the XArray when reporting info about additional contexts. Do the same thing in netlink code. Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	a7ddfd5d57	ethtool: rss: move the device op invocation out of rss_prepare_data() Factor calling device ops out of rss_prepare_data(). Next patch will add alternative path using xarray. No functional changes. Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	ec6e57beaf	ethtool: rss: don't report key if device doesn't support it marvell/otx2 and mvpp2 do not support setting different keys for different RSS contexts. Contexts have separate indirection tables but key is shared with all other contexts. This is likely fine, indirection table is the most important piece. Don't report the key-related parameters from such drivers. This prevents driver-errors, e.g. otx2 always writes the main key, even when user asks to change per-context key. The second reason is that without this change tracking the keys by the core gets complicated. Even if the driver correctly rejects setting key with rss_context != 0, change of the main key would have to be reflected in the XArray for all additional contexts. Since the additional contexts don't have their own keys not including the attributes (in Netlink speak) seems intuitive. ethtool CLI seems to deal with it just fine. Having to set the flag in majority of the drivers is a bit tedious but not reporting the key is a safer default. Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	fb770fe758	eth: remove .cap_rss_ctx_supported from updated drivers Remove .cap_rss_ctx_supported from drivers which moved to the new API. This makes it easy to grep for drivers which still need to be converted. Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	ce056504e2	ethtool: make ethtool_ops::cap_rss_ctx_supported optional cap_rss_ctx_supported was created because the API for creating and configuring additional contexts is mux'ed with the normal RSS API. Presence of ops does not imply driver can actually support rss_context != 0 (in fact drivers mostly ignore that field). cap_rss_ctx_supported lets core check that the driver is context-aware before calling it. Now that we have .create_rxfh_context, there is no such ambiguity. We can depend on presence of the op. Make setting the bit optional. Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
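The check described in the entry above, where either the legacy capability bit or the presence of the dedicated op signals context support, might look roughly like this. The field names `cap_rss_ctx_supported` and `create_rxfh_context` come from the message; `struct rss_ops`, the one-argument callback signature, `example_op()` and `rss_ctx_supported()` are simplifying assumptions and do not match the real ops table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the driver ops table; real ops take more arguments. */
struct rss_ops {
	bool cap_rss_ctx_supported;				/* legacy capability bit */
	int (*set_rxfh)(unsigned int rss_context);		/* legacy mux'ed op */
	int (*create_rxfh_context)(unsigned int rss_context);	/* dedicated op */
};

/* Placeholder driver callback, used only to populate a table. */
static int example_op(unsigned int rss_context)
{
	(void)rss_context;
	return 0;
}

/*
 * A driver handles rss_context != 0 if it either set the legacy bit or
 * provides the dedicated create op. Having only set_rxfh proves nothing,
 * because that op also serves the default context.
 */
static bool rss_ctx_supported(const struct rss_ops *ops)
{
	return ops->cap_rss_ctx_supported || ops->create_rxfh_context != NULL;
}
```

With this shape, a converted driver can drop the bit entirely, which is what makes unconverted drivers easy to find by searching for the field name.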
Jakub Kicinski	a7f6f56f60	eth: mlx5: allow disabling queues when RSS contexts exist Since commit `24ac7e5440` ("ethtool: use the rss context XArray in ring deactivation safety-check") core will prevent queues from being disabled while being used by additional RSS contexts. The safety check is no longer necessary, and core will do a more accurate job of only rejecting changes which can actually break things. Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	f203fd85e6	eth: mvpp2: implement new RSS context API Implement the separate create/modify/delete ops for RSS. No problems with IDs - even tho RSS tables are per device the driver already seems to allocate IDs linearly per port. There's a translation table from per-port context ID to device context ID. mvpp2 doesn't have a key for the hash, it defaults to an empty/previous indir table. Note that there is no key at all, so we don't have to be concerned with reporting the wrong one (which is addressed by a patch later in the series). Compile-tested only. Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	10fbe8c082	selftests: drv-net: rss_ctx: add identifier to traffic comments Include the "name" of the context in the comment for traffic checks. Makes it easier to reason about which context failed when we loop over 32 contexts (it may matter if we failed in first vs last, for example). Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:23 +01:00
Menglong Dong	6b8a024d25	net: vxlan: remove duplicated initialization in vxlan_xmit The variable "did_rsc" is initialized twice, which is unnecessary. Just remove one of them. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 13:37:43 +01:00
Rosen Penev	f547e956dd	net: sunvnet: use ethtool_sprintf/puts Simpler and allows avoiding manual pointer addition. Signed-off-by: Rosen Penev <rosenp@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 13:25:38 +01:00
Enguerrand de Ribaucourt	c4e82c025b	net: dsa: microchip: ksz9477: split half-duplex monitoring function In order to respect the 80 columns limit, split the half-duplex monitoring function in two. This is just a styling change, no functional change. Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 17:08:34 +01:00
David S. Miller	462a94ec9f	Merge branch 'phylib-fixed-speed-1G' Russell King says: ==================== net: phylib: fix fixed-speed >= 1G This is v2 of the patch (now patches) adding support for ethtool !autoneg while respecting the requirements of IEEE 802.3. v2 fixes the build errors in the previous patch by first constifying the "advertisement" argument to the linkmode functions that only read from this pointer. It also fixes the incorrectly named linkmode_set function. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 17:04:29 +01:00
Russell King (Oracle)	6ff3cddc36	net: phylib: do not disable autoneg for fixed speeds >= 1G We have an increasing number of drivers that are forcing auto-negotiation to be enabled for speeds of 1G or faster. It would appear that auto-negotiation is mandatory for speeds above 100M. In 802.3, Annex 40C's state diagrams seem to imply that mr_autoneg_enable (BMCR AN ENABLE) doesn't affect whether or not the AN state machines work for 1000base-T, and some PHY datasheets (e.g. Marvell Alaska) state that disabling mr_autoneg_enable leaves AN enabled but forced to 1G full duplex. Other PHY datasheets imply that BMCR AN ENABLE should not be cleared for >= 1G. Thus, this should be handled in phylib rather than in each driver. Rather than erroring out, arrange to implement the Marvell Alaska solution but in software for all PHYs: generate an appropriate single-speed advertisement for the requested speed, and keep AN enabled to the PHY driver. However, to avoid userspace API breakage, continue to report to userspace that we have AN disabled. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 17:04:29 +01:00
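The policy in the entry above (fixed speed at or above 1G keeps AN running with a single-speed advertisement, while userspace still sees AN as disabled) can be sketched as a small decision function. Everything here is hypothetical: the `ADV_*` bits, `struct link_request`, `struct phy_setup` and `resolve()` are illustrative names, only full-duplex modes are modelled, and phylib itself works on linkmode bitmaps rather than a `uint32_t`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical link-mode bits for the sketch. */
#define ADV_100_FULL	(1u << 0)
#define ADV_1000_FULL	(1u << 1)
#define ADV_10000_FULL	(1u << 2)

struct link_request {
	bool autoneg;		/* what userspace asked for */
	int speed_mbps;		/* only meaningful when autoneg is false */
};

struct phy_setup {
	bool an_to_phy;		/* AN state handed to the PHY driver */
	bool an_reported;	/* AN state reported back to userspace */
	uint32_t advertising;	/* modes offered to the link partner */
};

/* Map a fixed speed to the one mode that should be advertised. */
static uint32_t single_speed_adv(int speed_mbps)
{
	switch (speed_mbps) {
	case 1000:
		return ADV_1000_FULL;
	case 10000:
		return ADV_10000_FULL;
	default:
		return 0;
	}
}

static struct phy_setup resolve(struct link_request req, uint32_t supported)
{
	struct phy_setup s = { 0 };

	if (req.autoneg) {
		s.an_to_phy = true;
		s.an_reported = true;
		s.advertising = supported;
	} else if (req.speed_mbps >= 1000) {
		/* Fixed speed >= 1G: keep AN running, offer one speed only. */
		s.an_to_phy = true;
		s.an_reported = false;
		s.advertising = single_speed_adv(req.speed_mbps) & supported;
	}
	/* Fixed speed < 1G: AN really is off and nothing is advertised. */
	return s;
}
```

Keeping `an_to_phy` and `an_reported` as separate fields is the point of the sketch: the hardware-facing and user-facing views of AN are allowed to differ.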
Russell King (Oracle)	aa9fbc5dd9	net: mii: constify advertising mask Constify the advertising mask to linkmode functions that only read from the advertising mask. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 17:04:29 +01:00
David S. Miller	4efee05fef	Merge branch 'mvpp2-child-port-removal' Javier Carrasco says: ==================== net: mvpp2: rework child node/port removal handling These two patches used to be part of another series [1] that did not apply to the networking tree without conflicts. This is therefore just a partial resend with no code modifications, just rebased onto net/main. Link: https://lore.kernel.org/all/20240806181026.5fe7f777@kernel.org/ [1] ==================== Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 17:00:33 +01:00
Javier Carrasco	a7b3274447	net: mvpp2: use device_for_each_child_node() to access device child nodes The iterated nodes are direct children of the device node, and the `device_for_each_child_node()` macro accounts for child node availability. `fwnode_for_each_available_child_node()` is meant to access the child nodes of an fwnode, and therefore not direct child nodes of the device node. The child nodes within mvpp2_probe are not accessed outside the loops, and the scoped version of the macro can be used to automatically decrement the refcount on early exits. Use `device_for_each_child_node()` and its scoped variant to indicate device's direct child nodes. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 17:00:33 +01:00
Javier Carrasco	e81d00a6b3	net: mvpp2: use port_count to remove ports As discussed in [1], there is no need to iterate over child nodes to remove the list of ports. Instead, a loop up to `port_count` ports can be used, and is in fact more reliable in case the child node availability changes. The suggested approach removes the need for the `fwnode` and `port_fwnode` variables in mvpp2_remove() as well. Link: https://lore.kernel.org/all/ZqdRgDkK1PzoI2Pf@shell.armlinux.org.uk/ [1] Suggested-by: Russell King <linux@armlinux.org.uk> Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 17:00:33 +01:00
David S. Miller	80d021bc57	Merge branch 'bnxt_en-fix-queue-reset-when-queue-active' David Wei says: ==================== fix bnxt_en queue reset when queue is active The current bnxt_en queue API implementation is buggy when resetting a queue that has active traffic. The problem is that there is no FW involved to stop the flow of packets and relying on napi_disable() isn't enough. To fix this, call bnxt_hwrm_vnic_update() with MRU set to 0 for both the default and the ntuple vnic to stop the flow of packets. This works for any Rx queue and not only those that have ntuple rules since every Rx queue is either in the default or the ntuple vnic. For bnxt_hwrm_vnic_update() to work, proper flushing must be done by the FW. A FW flag is there to indicate support and queue_mgmt_ops is keyed behind this. The first three patches are from Michael Chan and add the prerequisite vnic functions and FW flags indicating that it will properly flush during vnic update. Tested on BCM957504 while iperf3 is active: 1. Reset a queue that has an ntuple rule steering flow into it 2. Reset all queues in order, one at a time In both cases the flow is not interrupted. Sending this to net-next as there is no in-tree kernel consumer of queue API just yet, and there is a patch that changes when the queue_mgmt_ops is registered. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> --- v3: - include patches from Michael Chan that add a FW flag for vnic flush capability - key support for queue_mgmt_ops behind this new flag v2: - split setting vnic->mru into a separate patch (Wojciech) - clarify why napi_enable()/disable() is removed ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 13:48:03 +01:00
David Wei	97cbf3d0ac	bnxt_en: only set dev->queue_mgmt_ops if supported by FW The queue API calls bnxt_hwrm_vnic_update() to stop/start the flow of packets, which can only properly flush the pipeline if FW indicates support. Add a macro BNXT_SUPPORTS_QUEUE_API that checks for the required flags and only set queue_mgmt_ops if true. Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 13:48:02 +01:00
David Wei	b9d2956e86	bnxt_en: stop packet flow during bnxt_queue_stop/start The current implementation when resetting a queue while packets are flowing puts the queue into an inconsistent state. There needs to be some synchronisation with the FW. Add calls to bnxt_hwrm_vnic_update() to set the MRU for both the default and ntuple vnic during queue start/stop. When the MRU is set to 0, flow is stopped. Each Rx queue belongs to either the default or the ntuple vnic. With calling bnxt_hwrm_vnic_update() the calls to napi_enable() and napi_disable() must be removed for reset to work on a queue that has active traffic flowing e.g. iperf3. Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 13:48:02 +01:00
David Wei	d41575f76a	bnxt_en: set vnic->mru in bnxt_hwrm_vnic_cfg() Set the newly added vnic->mru field in bnxt_hwrm_vnic_cfg(). Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 13:48:02 +01:00
Michael Chan	6e360862c0	bnxt_en: Check the FW's VNIC flush capability Check the HWRM_VNIC_QCAPS FW response for the receive engine flush capability. This capability indicates that we can reliably support RX ring restart when calling HWRM_VNIC_UPDATE with MRU set to 0. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 13:48:02 +01:00
Michael Chan	f2878cdeb7	bnxt_en: Add support to call FW to update a VNIC Add the function bnxt_hwrm_vnic_update() to call FW to update a VNIC. This call can be used when disabling and enabling a receive ring within a VNIC. The mru which is the maximum receive size of packets received by the VNIC can be updated. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 13:48:02 +01:00
Michael Chan	fbda8ee64b	bnxt_en: Update firmware interface to 1.10.3.68 The main changes are: 1. HWRM_VNIC_UPDATE used to safely disable and enable an RX ring within the VNIC. 2. New flag in HWRM_VNIC_QCAPS to indicate FW will do the proper flush during HWRM_VNIC_UPDATE. 3. New flag in HWRM_FUNC_QCAPS to indicate that reservations for some resources such as VNIC can be reduced. 4. New backing store memory types not used by the driver yet. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 13:48:02 +01:00
David S. Miller	969afb4347	Merge branch 'l2tp-misc-improvements' James Chapman says: ==================== l2tp: misc improvements This series makes several improvements to l2tp: * update documentation to be consistent with recent l2tp changes. * move l2tp_ip socket tables to per-net data. * fix handling of hash key collisions in l2tp_v3_session_get * implement and use get-next APIs for management and procfs/debugfs. * improve l2tp refcount helpers. * use per-cpu dev->tstats in l2tpeth devices. * fix a lockdep splat. * fix a race between l2tp_pre_exit_net and pppol2tp_release. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:50 +01:00
James Chapman	c1b2e36b87	l2tp: flush workqueue before draining it syzbot exposes a race where a net used by l2tp is removed while an existing pppol2tp socket is closed. In l2tp_pre_exit_net, l2tp queues TUNNEL_DELETE work items to close each tunnel in the net. When these are run, new SESSION_DELETE work items are queued to delete each session in the tunnel. This all happens in drain_workqueue. However, drain_workqueue allows only new work items if they are queued by other work items which are already in the queue. If pppol2tp_release runs after drain_workqueue has started, it may queue a SESSION_DELETE work item, which results in the warning below in drain_workqueue. Address this by flushing the workqueue before drain_workqueue such that all queued TUNNEL_DELETE work items run before drain_workqueue is started. This will queue SESSION_DELETE work items for each session in the tunnel, hence pppol2tp_release or other API requests won't queue SESSION_DELETE requests once drain_workqueue is started. 
WARNING: CPU: 1 PID: 5467 at kernel/workqueue.c:2259 __queue_work+0xcd3/0xf50 kernel/workqueue.c:2258 Modules linked in: CPU: 1 UID: 0 PID: 5467 Comm: syz.3.43 Not tainted 6.11.0-rc1-syzkaller-00247-g3608d6aca5e7 #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024 RIP: 0010:__queue_work+0xcd3/0xf50 kernel/workqueue.c:2258 Code: ff e8 11 84 36 00 90 0f 0b 90 e9 1e fd ff ff e8 03 84 36 00 eb 13 e8 fc 83 36 00 eb 0c e8 f5 83 36 00 eb 05 e8 ee 83 36 00 90 <0f> 0b 90 48 83 c4 60 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc RSP: 0018:ffffc90004607b48 EFLAGS: 00010093 RAX: ffffffff815ce274 RBX: ffff8880661fda00 RCX: ffff8880661fda00 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffffffff815cd6d4 R09: 0000000000000000 R10: ffffc90004607c20 R11: fffff520008c0f85 R12: ffff88802ac33800 R13: ffff88802ac339c0 R14: dffffc0000000000 R15: 0000000000000008 FS: 00005555713eb500(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000001eda6000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> queue_work_on+0x1c2/0x380 kernel/workqueue.c:2392 pppol2tp_release+0x163/0x230 net/l2tp/l2tp_ppp.c:445 __sock_release net/socket.c:659 [inline] sock_close+0xbc/0x240 net/socket.c:1421 __fput+0x24a/0x8a0 fs/file_table.c:422 task_work_run+0x24f/0x310 kernel/task_work.c:228 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop kernel/entry/common.c:114 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x168/0x370 kernel/entry/common.c:218 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f061e9779f9 Code: ff 
ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffff1c1fce8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4 RAX: 0000000000000000 RBX: 000000000001017d RCX: 00007f061e9779f9 RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003 RBP: 00007ffff1c1fdc0 R08: 0000000000000001 R09: 00007ffff1c1ffcf R10: 00007f061e800000 R11: 0000000000000246 R12: 0000000000000032 R13: 00007ffff1c1fde0 R14: 00007ffff1c1fe00 R15: ffffffffffffffff </TASK> Fixes: `fc7ec7f554` ("l2tp: delete sessions using work queue") Reported-by: syzbot+0e85b10481d2f5478053@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0e85b10481d2f5478053 Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:50 +01:00
James Chapman	dcc59d3e32	l2tp: l2tp_eth: use per-cpu counters from dev->tstats l2tp_eth uses old-style dev->stats for fastpath packet/byte counters. Convert it to use dev->tstats per-cpu counters. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:50 +01:00
James Chapman	abe7a1a7d0	l2tp: improve tunnel/session refcount helpers l2tp_tunnel_inc_refcount and l2tp_session_inc_refcount wrap refcount_inc. They add no value so just use the refcount APIs directly and drop l2tp's helpers. l2tp already uses refcount_inc_not_zero anyway. Rename l2tp_tunnel_dec_refcount and l2tp_session_dec_refcount to l2tp_tunnel_put and l2tp_session_put to better match their use pairing various _get getters. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:50 +01:00
James Chapman	1f4c3dce91	l2tp: use get_next APIs for management requests and procfs/debugfs l2tp netlink and procfs/debugfs iterate over tunnel and session lists to obtain data. They currently use very inefficient get_nth functions to do so. Replace these with get_next. For netlink, use nl cb->ctx[] for passing state instead of the obsolete cb->args[]. l2tp_tunnel_get_nth and l2tp_session_get_nth are no longer used so they can be removed. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:49 +01:00
James Chapman	aa92c1cec9	l2tp: add tunnel/session get_next helpers l2tp management APIs and procfs/debugfs iterate over l2tp tunnel and session lists. Since these lists are now implemented using IDR, we can use IDR get_next APIs to iterate them. Add tunnel/session get_next functions to do so. The session get_next functions get the next session in a given tunnel and need to account for l2tpv2 and l2tpv3 differences: * l2tpv2 sessions are keyed by tunnel ID / session ID. Iteration for a given tunnel ID, TID, can therefore start with a key given by TID/0 and finish when the next entry's tunnel ID is not TID. This is possible only because the tunnel ID part of the key is the upper 16 bits and the session ID part the lower 16 bits; when idr_next increments the key value, it therefore finds the next sessions of the current tunnel before those of the next tunnel. Entries with session ID 0 are always skipped because they are used internally by pppol2tp. * l2tpv3 sessions are keyed by session ID. Iteration starts at the first IDR entry and skips entries where the tunnel does not match. Iteration must also consider session ID collisions and walk the list of colliding sessions (if any) for one which matches the supplied tunnel. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:49 +01:00
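The l2tpv2 iteration described in the entry above relies on the key layout: tunnel ID in the upper 16 bits, session ID in the lower 16, so an ordered walk visits all sessions of one tunnel before reaching the next tunnel. A user-space sketch, assuming a sorted array in place of the IDR; `v2_key()`, `first_at_or_after()` and `v2_session_get_next()` are hypothetical names, not the l2tp functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Compose the 32-bit key: tunnel ID in the upper half, session ID in the lower. */
static uint32_t v2_key(uint16_t tunnel_id, uint16_t session_id)
{
	return ((uint32_t)tunnel_id << 16) | session_id;
}

/*
 * Stand-in for an ordered lookup: index of the first key >= start,
 * or n if there is none. keys[] must be sorted ascending.
 */
static size_t first_at_or_after(const uint32_t *keys, size_t n, uint32_t start)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (keys[i] >= start)
			break;
	return i;
}

/*
 * Return the next session ID in tunnel `tid` after `after_sid`, or 0
 * when the tunnel has no further sessions. Pass after_sid = 0 to start.
 */
static uint16_t v2_session_get_next(const uint32_t *keys, size_t n,
				    uint16_t tid, uint16_t after_sid)
{
	size_t i = first_at_or_after(keys, n, v2_key(tid, after_sid));

	for (; i < n; i++) {
		uint16_t sid = keys[i] & 0xffff;

		if ((keys[i] >> 16) != tid)
			break;		/* walked into the next tunnel */
		if (sid == 0 || sid == after_sid)
			continue;	/* reserved entry, or already returned */
		return sid;
	}
	return 0;
}
```

Because session ID 0 is always skipped, it doubles as the "no more sessions" return value in this sketch.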
James Chapman	b0a8deda06	l2tp: handle hash key collisions in l2tp_v3_session_get To handle colliding l2tpv3 session IDs, l2tp_v3_session_get searches a hashed list keyed by ID and sk. Although unlikely, if hash keys collide, it is possible that hash_for_each_possible loops over a session which doesn't have the ID that we are searching for. So check for session ID match when looping over possible hash key matches. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:49 +01:00
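The fix in the entry above comes down to this: two different keys can land in the same hash bucket, so walking a bucket must compare the stored key, not just trust the bucket. A small sketch with a deliberately weak hash to force a collision; `struct session`, `bucket_of()`, `session_add()` and `session_get()` are hypothetical, and an integer `sock` stands in for the socket pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 4

struct session {
	uint32_t id;
	uintptr_t sock;		/* stand-in for the owning socket */
	struct session *next;	/* bucket chain */
};

/* Deliberately weak hash so that distinct (id, sock) pairs collide. */
static unsigned int bucket_of(uint32_t id, uintptr_t sock)
{
	return (unsigned int)((id + sock) % NBUCKETS);
}

static void session_add(struct session **table, struct session *s)
{
	unsigned int b = bucket_of(s->id, s->sock);

	s->next = table[b];
	table[b] = s;
}

/*
 * Walk the bucket and compare the stored key. Without the s->id check,
 * a colliding entry with a different ID could be returned by mistake.
 */
static struct session *session_get(struct session **table, uint32_t id,
				   uintptr_t sock)
{
	struct session *s;

	for (s = table[bucket_of(id, sock)]; s; s = s->next)
		if (s->id == id && s->sock == sock)
			return s;
	return NULL;
}
```

Here `(id = 1, sock = 4)` and `(id = 5, sock = 0)` share a bucket, so a lookup for one of them only succeeds correctly because of the explicit comparison.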
James Chapman	ebed6606b9	l2tp: move l2tp_ip and l2tp_ip6 data to pernet l2tp_ip[6] have always used global socket tables. It is therefore not possible to create l2tpip sockets in different namespaces with the same socket address. To support this, move l2tpip socket tables to pernet data. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:49 +01:00
James Chapman	168464c19e	l2tp: remove inline from functions in c sources Update l2tp to remove the inline keyword from several functions in C sources, since this is now discouraged. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:49 +01:00
James Chapman	e2b1762cf3	documentation/networking: update l2tp docs l2tp no longer uses sk_user_data in tunnel sockets and now manages tunnel/session lifetimes slightly differently. Update docs to cover this. CC: linux-doc@vger.kernel.org CC: corbet@lwn.net Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:49 +01:00
Jakub Kicinski	bbfeba2603	Merge branch 'mlx5-misc-patches-2024-08-08' Tariq Toukan says: ==================== mlx5 misc patches 2024-08-08 This patchset contains multiple enhancements from the team to the mlx5 core and Eth drivers. Patch #1 by Chris bumps a defined value to permit more devices doing TC offloads. Patch #2 by Jianbo adds an IPsec fast-path optimization to replace the slow async handling. Patches #3 and #4 by Jianbo add TC offload support for complicated rules to overcome firmware limitation. Patch #5 by Gal unifies the access macro to advertised/supported link modes. Patches #6 to #9 by Gal add extack messages in ethtool ops to replace prints to the kernel log. Patch #10 by Cosmin switches to using 'update' verb instead of 'replace' to better reflect the operation. Patch #11 by Cosmin exposes an update connection tracking operation to replace the assumed delete+add implementation. ==================== Link: https://patch.msgid.link/20240808055927.2059700-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:17 -07:00
Cosmin Ratiu	6b5662b759	net/mlx5e: CT: Update connection tracking steering entries Previously, replacing a connection tracking steering entry was done by adding a new rule (with the same tag but possibly different mod hdr actions/labels) then removing the old rule. This approach doesn't work in hardware steering because two steering entries with the same tag cannot coexist in a hardware steering table. This commit prepares for that by adding a new ct_rule_update operation on the ct_fs_ops struct which is used instead of add+delete. Implementations for both dmfs (firmware steering) and smfs (software steering) are provided, which simply add the new rule and delete the old one. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-12-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:15 -07:00
Cosmin Ratiu	486aeb2db5	net/mlx5e: CT: 'update' rules instead of 'replace' Offloaded rules can be updated with a new modify header action containing a changed restore cookie. This was done using the verb 'replace', while in some configurations 'update' is a better fit. This commit renames the functions used to reflect that. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-11-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:15 -07:00
Gal Pressman	b5100b72da	net/mlx5e: Use extack in get module eeprom by page callback In case of errors in get module eeprom by page, reflect it through extack instead of a dmesg print. While at it, make the messages more human friendly. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-10-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:15 -07:00
Gal Pressman	9c4298b466	net/mlx5e: Use extack in set coalesce callback In case of errors in set coalesce, reflect it through extack instead of a dmesg print. While at it, make the messages more human friendly. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-9-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:15 -07:00
Gal Pressman	29a943d71d	net/mlx5e: Use extack in get coalesce callback In case of errors in get coalesce, reflect it through extack instead of a dmesg print. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-8-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:15 -07:00
Gal Pressman	ab666b5287	net/mlx5e: Use extack in set ringparams callback In case of errors in set ringparams, reflect it through extack instead of a dmesg print. While at it, make the messages more human friendly and remove two redundant checks that are already validated by the core. Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:15 -07:00

1295161 Commits