Preparing for udp 4-tuple hash (uhash4 for short).
To implement uhash4 without cache line missing when lookup, hslot2 is
used to record the number of hashed sockets in hslot4. Thus adding a new
struct udp_hslot_main with field hash4_cnt, which is used by hash2. The
new struct is used to avoid doubling the size of udp_hslot.
Before uhash4 lookup, firstly checking hash4_cnt to see if there are
hashed sks in hslot4. Because hslot2 is always used in lookup, there is
no cache line miss.
Related helpers are updated, and use the helpers as possible.
uhash4 is implemented in following patches.
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmc3A/gACgkQrB3Eaf9P
W7fNew//XCIhIvFYaQcP2x84T4EYB679NkGlwMATxXgn40+sp7muSwVweynEWNIu
FltfBAwYD/MxD7g519abVPMWXs/iYI5duw3vvqnxmkOoebWLLocg2VoqFIdVXlQw
/hj+1X/oNT4OKcaQAw/FAGRuYvkc90YB/rRG51RwAIR0tyBjRwfUsozMM8QX/zQI
I0cLCgGAf/kylQre+dhvUkMhXaLogMF5v0qzPxhyMBD02JaUpe6+5cdHQcmKOhqa
ksTpySYnIKIHZrLizeFGDZpinaDIph20vGaDvDXpqTYFuwvCQsZczJy02dF4otf2
2dZz6+2La+ZM+WsGIqpALqKCNhr8fOcQxCRH3eGLPBwoXXt5CFAMgJKob8hKuonW
FgJaYMBZOjYbgGah8WbEe/YsWq4y3uRs48pFtY+T5cn7AskNxIvUoLNjSS83Hlqu
PJbveiKsZygig966Q/zUFATYnvj3zEgjVEcSbK6LRyBXL79Njr8l+PZ0Zoz76tc4
bF1Xv0x+lRYmwa9rvOFaeqrP/GTe0xvlitFzuCN7HnXiN8URKnnDY2odkXYzo+Z7
MBbP8wR/CaoiAvdMw74116nAIFOW95LPtvdGJTvlS9jAOt1P7dWQ3/mFKEpItndv
cJjWzI7HKl0+85FcCDw+tmsDWWGbALUyPw96i8UgUcDGyqVKUgA=
=Ioo8
-----END PGP SIGNATURE-----
Merge tag 'ipsec-next-2024-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
ipsec-next-11-15
1) Add support for RFC 9611 per cpu xfrm state handling.
2) Add inbound and outbound xfrm state caches to speed up
state lookups.
3) Convert xfrm to dscp_t. From Guillaume Nault.
4) Fix error handling in build_aevent.
From Everest K.C.
5) Replace strncpy with strscpy_pad in copy_to_user_auth.
From Daniel Yang.
6) Fix an uninitialized symbol during acquire state insertion.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Xuan Zhuo says:
====================
virtio-net: support AF_XDP zero copy (tx)
XDP socket(AF_XDP) is an excellent bypass kernel network framework. The zero
copy feature of xsk (XDP socket) needs to be supported by the driver. The
performance of zero copy is very good. mlx5 and intel ixgbe already support
this feature, This patch set allows virtio-net to support xsk's zerocopy xmit
feature.
At present, we have completed some preparation:
1. vq-reset (virtio spec and kernel code)
2. virtio-core premapped dma
3. virtio-net xdp refactor
So it is time for Virtio-Net to complete the support for the XDP Socket
Zerocopy.
Virtio-net can not increase the queue num at will, so xsk shares the queue with
kernel.
This patch set includes some refactor to the virtio-net to let that to support
AF_XDP.
The current configuration sets the virtqueue (vq) to premapped mode,
implying that all buffers submitted to this queue must be mapped ahead
of time. This presents a challenge for the virtnet send queue (sq): the
virtnet driver would be required to keep track of dma information for vq
size * 17, which can be substantial. However, if the premapped mode were
applied on a per-buffer basis, the complexity would be greatly reduced.
With AF_XDP enabled, AF_XDP buffers would become premapped, while kernel
skb buffers could remain unmapped.
We can distinguish them by sg_page(sg), When sg_page(sg) is NULL, this
indicates that the driver has performed DMA mapping in advance, allowing
the Virtio core to directly utilize sg_dma_address(sg) without
conducting any internal DMA mapping. Additionally, DMA unmap operations
for this buffer will be bypassed.
ENV: Qemu with vhost-user(polling mode).
Host CPU: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 19531092064 RX-missed: 0 RX-bytes: 1093741155584
RX-errors: 0
RX-nombuf: 0
TX-packets: 5959955552 TX-errors: 0 TX-bytes: 371030645664
Throughput (since last show)
Rx-pps: 8861574 Rx-bps: 3969985208
Tx-pps: 8861493 Tx-bps: 3969962736
############################################################################
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 68152727 RX-missed: 0 RX-bytes: 3816552712
RX-errors: 0
RX-nombuf: 0
TX-packets: 68114967 TX-errors: 33216 TX-bytes: 3814438152
Throughput (since last show)
Rx-pps: 6333196 Rx-bps: 2837272088
Tx-pps: 6333227 Tx-bps: 2837285936
############################################################################
But AF_XDP consumes more CPU for tx and rx napi(100% and 86%).
====================
Link: https://patch.msgid.link/20241112012928.102478-1-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now, we support AF_XDP(xsk). Add NETDEV_XDP_ACT_XSK_ZEROCOPY to
xdp_features.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-14-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If send queue sent some packets, we update the tx timeout
record to prevent the tx timeout.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-13-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver's tx napi is very important for XSK. It is responsible for
obtaining data from the XSK queue and sending it out.
At the beginning, we need to trigger tx napi.
virtnet_free_old_xmit distinguishes three type ptr(skb, xdp frame, xsk
buffer) by the last bits of the pointer.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-12-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since xsk's TX queue is consumed by TX NAPI, if sq is bound to xsk, then
we must stop tx napi from being disabled.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-11-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Because the af-xdp will introduce a new xmit type, so I refactor the
xmit type mechanism first.
We know both xdp_frame and sk_buff are at least 4 bytes aligned.
For the xdp tx, we do not pass any pointer to virtio core as data,
we just need to pass the len of the packet. So we will push len
to the void pointer. We can make sure the pointer is 4 bytes aligned.
And the data structure of AF_XDP also is at least 4 bytes aligned.
So the last two bits of the pointers are free, we can't use these to
distinguish them.
00 for skb
01 for SKB_ORPHAN
10 for XDP
11 for AF-XDP tx
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-9-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Two APIs are introduced to submit premapped per-buffers.
int virtqueue_add_inbuf_premapped(struct virtqueue *vq,
struct scatterlist *sg, unsigned int num,
void *data,
void *ctx,
gfp_t gfp);
int virtqueue_add_outbuf_premapped(struct virtqueue *vq,
struct scatterlist *sg, unsigned int num,
void *data,
gfp_t gfp);
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-6-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current configuration sets the virtqueue (vq) to premapped mode,
implying that all buffers submitted to this queue must be mapped ahead
of time. This presents a challenge for the virtnet send queue (sq): the
virtnet driver would be required to keep track of dma information for vq
size * 17, which can be substantial. However, if the premapped mode were
applied on a per-buffer basis, the complexity would be greatly reduced.
With AF_XDP enabled, AF_XDP buffers would become premapped, while kernel
skb buffers could remain unmapped.
And consider that some sgs are not generated by the virtio driver,
that may be passed from the block stack. So we can not change the
sgs, new APIs are the better way.
So we pass the new argument 'premapped' to indicate the buffers
submitted to virtio are premapped in advance. Additionally,
DMA unmap operations for these buffers will be bypassed.
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-5-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The subsequent commit needs to know whether every indirect buffer is
premapped or not. So we need to introduce an extra struct for every
indirect buffer to record this info.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-4-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The subsequent commit needs to know whether every indirect buffer is
premapped or not. So we need to introduce an extra struct for every
indirect buffer to record this info.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-3-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To make the code readable, introduce vring_need_unmap_buffer() to
replace do_unmap.
use_dma_api premapped -> vring_need_unmap_buffer()
1. false false false
2. true false true
3. true true false
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-2-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-11-05 (ice, ixgbe, igc. igb, igbvf, e1000)
For ice:
Mateusz refactors and adds additional SerDes configuration values to be
output.
Przemek refactors processing of DDP and adds support for a flag field in
the DDP's signature segment header.
Joe Damato adds support for persistent NAPI config.
Brett adjusts setting of Tx promiscuous based on unicast/multicast
setting.
Jake moves setting of pf->supported_rxdids to occur directly after DDP
load and changes a small struct to use stack memory.
Frederic Weisbecker adds WQ_UNBOUND flag to the workqueue.
For ixgbe:
Diomidis Spinellis removes a circular dependency.
For igc:
Vitaly removes an unneeded autoneg parameter.
For igb:
Johnny Park fixes a couple of typos.
For igbvf:
Wander Lairson Costa removes an unused spinlock.
For e1000:
Joe Damato adds RTNL lock to some calls where it is expected to be held.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
e1000: Hold RTNL when e1000_down can be called
igbvf: remove unused spinlock
igb: Fix 2 typos in comments in igb_main.c
igc: remove autoneg parameter from igc_mac_info
ixgbe: Break include dependency cycle
ice: Unbind the workqueue
ice: use stack variable for virtchnl_supported_rxdids
ice: initialize pf->supported_rxdids immediately after loading DDP
ice: only allow Tx promiscuous for multicast
ice: Add support for persistent NAPI config
ice: support optional flags in signature segment header
ice: refactor "last" segment of DDP pkg
ice: extend dump serdes equalizer values feature
ice: rework of dump serdes equalizer values feature
====================
Link: https://patch.msgid.link/20241113185431.1289708-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata says:
====================
net: ndo_fdb_add/del: Have drivers report whether they notified
Currently when FDB entries are added to or deleted from a VXLAN netdevice,
the VXLAN driver emits one notification, including the VXLAN-specific
attributes. The core however always sends a notification as well, a generic
one. Thus two notifications are unnecessarily sent for these operations. A
similar situation comes up with bridge driver, which also emits
notifications on its own.
# ip link add name vx type vxlan id 1000 dstport 4789
# bridge monitor fdb &
[1] 1981693
# bridge fdb add de:ad:be:ef:13:37 dev vx self dst 192.0.2.1
de:ad:be:ef:13:37 dev vx dst 192.0.2.1 self permanent
de:ad:be:ef:13:37 dev vx self permanent
In order to prevent this duplicity, add a parameter, bool *notified, to
ndo_fdb_add and ndo_fdb_del. The flag is primed to false, and if the callee
sends a notification on its own, it sets the flag to true, thus informing
the core that it should not generate another notification.
Patches #1 to #2 are concerned with the above.
In the remaining patches, #3 to #7, add a selftest. This takes place across
several patches. Many of the helpers we would like to use for the test are
in forwarding/lib.sh, whereas net/ is a more suitable place for the test,
so the libraries need to be massaged a bit first.
====================
Link: https://patch.msgid.link/cover.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Check that only one notification is produced for various FDB edit
operations.
Regarding the ip_link_add() and ip_link_master() helpers. This pattern of
action plus corresponding defer is bound to come up often, and a dedicated
vocabulary to capture it will be handy. tunnel_create() and vlan_create()
from forwarding/lib.sh are somewhat opaque and perhaps too kitchen-sinky,
so I tried to go in the opposite direction with these ones, and wrapped
only the bare minimum to schedule a corresponding cleanup.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://patch.msgid.link/910c5880ae6d3b558d6889cbdba2be690c2615c6.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A number of selftests run processes in the background and need to kill them
afterwards. Instead for everyone to open-code the kill / wait / redirect
mantra, add a helper in net/lib.sh. Convert existing open-code sites.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Link: https://patch.msgid.link/a9db102067d741c118f0bd93b10c75e2a34665ea.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For logging to be useful, something has to set RET and retmsg by calling
ret_set_ksft_status(). There is a suite of functions to that end in
forwarding/lib: check_err, check_fail et.al. Move them to net/lib.sh so
that every net test can use them.
Existing lib.sh users might be using these same names for their functions.
However lib.sh is always sourced near the top of the file (checked), and
whatever new definitions will simply override the ones provided by lib.sh.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://patch.msgid.link/f488a00dc85b8e0c1f3c71476b32b21b5189a847.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It would be good to use the same mechanism for scheduling and dispatching
general net tests as the many forwarding tests already use. To that end,
move the logging helpers to net/lib.sh so that every net test can use them.
Existing lib.sh users might be using the name themselves. However lib.sh is
always sourced near the top of the file (checked), and whatever new
definition will simply override the one provided by lib.sh.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://patch.msgid.link/a6fc083486493425b2c61185c327845b6ce3233a.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Many net selftests invent their own logging helpers. These really should be
in a library sourced by these tests. Currently forwarding/lib.sh has a
suite of perfectly fine logging helpers, but sourcing a forwarding/ library
from a higher-level directory smells of layering violation. In this patch,
move the logging helpers to net/lib.sh so that every net test can use them.
Together with the logging helpers, it's also necessary to move
pause_on_fail(), and EXIT_STATUS and RET.
Existing lib.sh users might be using these same names for their functions
or variables. However lib.sh is always sourced near the top of the
file (checked), and whatever new definitions will simply override the ones
provided by lib.sh.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://patch.msgid.link/edd3785a3bd72ffbe1409300989e993ee50ae98b.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In a similar fashion to ndo_fdb_add, which was covered in the previous
patch, add the bool *notified argument to ndo_fdb_del. Callees that send a
notification on their own set the flag to true.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/06b1acf4953ef0a5ed153ef1f32d7292044f2be6.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently when FDB entries are added to or deleted from a VXLAN netdevice,
the VXLAN driver emits one notification, including the VXLAN-specific
attributes. The core however always sends a notification as well, a generic
one. Thus two notifications are unnecessarily sent for these operations. A
similar situation comes up with bridge driver, which also emits
notifications on its own:
# ip link add name vx type vxlan id 1000 dstport 4789
# bridge monitor fdb &
[1] 1981693
# bridge fdb add de:ad:be:ef:13:37 dev vx self dst 192.0.2.1
de:ad:be:ef:13:37 dev vx dst 192.0.2.1 self permanent
de:ad:be:ef:13:37 dev vx self permanent
In order to prevent this duplicity, add a paremeter to ndo_fdb_add,
bool *notified. The flag is primed to false, and if the callee sends a
notification on its own, it sets it to true, thus informing the core that
it should not generate another notification.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/cbf6ae8195e85cbf922f8058ce4eba770f3b71ed.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Justin Lai says:
====================
Modifying format and renaming goto labels
This patch set primarily involves modifying the enum rtase_registers
format and renaming the goto labels in rtase_init_one.
====================
Link: https://patch.msgid.link/20241114112549.376101-1-justinlai0215@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Breno Leitao says:
====================
net: netpoll: Improve SKB pool management
The netpoll subsystem pre-allocates 32 SKBs in a pool for emergency use
during out-of-memory conditions. However, the current implementation has
several inefficiencies:
* The SKB pool, once allocated, is never freed:
* Resources remain allocated even after netpoll users are removed
* Failed initialization can leave pool populated forever
* The global pool design makes resource tracking difficult
This series addresses these issues through three patches:
Patch 1 ("net: netpoll: Individualize the skb pool"):
- Replace global pool with per-user pools in netpoll struct
Patch 2 ("net: netpoll: flush skb pool during cleanup"):
- Properly free pool resources during netconsole cleanup
These changes improve resource management and make the code more
maintainable. As a side benefit, the improved structure would allow
netpoll to be modularized if desired in the future.
v2: https://lore.kernel.org/20241107-skb_buffers_v2-v2-0-288c6264ba4f@debian.org
v1: https://lore.kernel.org/20241025142025.3558051-1-leitao@debian.org
====================
Link: https://patch.msgid.link/20241114-skb_buffers_v2-v3-0-9be9f52a8b69@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The netpoll subsystem maintains a pool of 32 pre-allocated SKBs per
instance, but these SKBs are not freed when the netpoll user is brought
down. This leads to memory waste as these buffers remain allocated but
unused.
Add skb_pool_flush() to properly clean up these SKBs when netconsole is
terminated, improving memory efficiency.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20241114-skb_buffers_v2-v3-2-9be9f52a8b69@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current implementation of the netpoll system uses a global skb
pool, which can lead to inefficient memory usage and
waste when targets are disabled or no longer in use.
This can result in a significant amount of memory being unnecessarily
allocated and retained, potentially causing performance issues and
limiting the availability of resources for other system components.
Modify the netpoll system to assign a skb pool to each target instead of
using a global one.
This approach allows for more fine-grained control over memory
allocation and deallocation, ensuring that resources are only allocated
and retained as needed.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20241114-skb_buffers_v2-v3-1-9be9f52a8b69@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move the kdump check into enic_adjust_resources() so that everything
that modifies resources is in the same function.
Co-developed-by: John Daley <johndale@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20241113-remove_vic_resource_limits-v4-7-a34cf8570c67@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move the enic resource adjustments out of enic_set_intr_mode() and into
its own function, enic_adjust_resources().
Co-developed-by: John Daley <johndale@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20241113-remove_vic_resource_limits-v4-6-a34cf8570c67@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of failing to use MSI-X if resources aren't configured exactly
right, use the resources we do have. Since we could start using large
numbers of rq resources, we do limit the rq count to what
netif_get_num_default_rss_queues() recommends.
Co-developed-by: John Daley <johndale@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20241113-remove_vic_resource_limits-v4-5-a34cf8570c67@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allocate wq, rq, cq, intr, and napi arrays based on the number of
resources configured in the VIC.
Co-developed-by: John Daley <johndale@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20241113-remove_vic_resource_limits-v4-4-a34cf8570c67@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Save the resources counts for wq,rq,cq, and interrupts in *_avail variables
so that we don't lose the information when adjusting the counts we are
actually using.
Report the wq_avail and rq_avail as the channel maximums in 'ethtool -l'
output.
Co-developed-by: John Daley <johndale@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20241113-remove_vic_resource_limits-v4-3-a34cf8570c67@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The VIC hardware has a constraint that the MSIX interrupt used for errors
be specified as a 7 bit number. Before this patch, it was allocated after
the I/O interrupts, which would cause a problem if 128 or more I/O
interrupts are in use.
So make the required interrupts come before the I/O interrupts to
guarantee the error interrupt offset never exceeds 7 bits.
Co-developed-by: John Daley <johndale@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20241113-remove_vic_resource_limits-v4-2-a34cf8570c67@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bundling the wq/rq specific data into dedicated enic_wq/rq structures
cleans up the enic structure and simplifies future changes related to
wq/rq.
Co-developed-by: John Daley <johndale@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20241113-remove_vic_resource_limits-v4-1-a34cf8570c67@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Adds support for clause-45 PHY loopback for the Microchip LAN887x driver.
Signed-off-by: Tarun Alle <Tarun.Alle@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241114101951.382996-1-Tarun.Alle@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
"id" is not a documented property, so drop it.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20241113225642.1783485-2-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current implementation of gettimex64() makes at least 3 PCIe reads to
get current PHC time. It takes at least 2.2us to get this value back to
userspace. At the same time there is cached value of upper bits of PHC
available for packet timestamps already. This patch reuses cached value
to speed up reading of PHC time.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20241114114820.1411660-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- btusb: add Foxconn 0xe0fc for Qualcomm WCN785x
- btmtk: Fix ISO interface handling
- Add quirk for ATS2851
- btusb: Add RTL8852BE device 0489:e123
- ISO: Do not emit LE PA/BIG Create Sync if previous is pending
- btusb: Add USB HW IDs for MT7920/MT7925
- btintel_pcie: Add handshake between driver and firmware
- btintel_pcie: Add recovery mechanism
- hci_conn: Use disable_delayed_work_sync
- SCO: Use kref to track lifetime of sco_conn
- ISO: Use kref to track lifetime of iso_conn
- btnxpuart: Add GPIO support to power save feature
- btusb: Add 0x0489:0xe0f3 and 0x13d3:0x3623 for Qualcomm WCN785x
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmc2bBcZHGx1aXoudm9u
LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKfnGEACV1YylQ9kJxzyDwyxrZtYG
3T2I+8qQwIpokyGcYHpMC0kJFAzCbywGxjS3njpOrM0nTuvGpvLAD40Sn+pMHbKb
3XzNNSixuxgJvr3CyDM0KOua/2nFQZyPe0DXe4D9bzOproIHDyoQkWbVqntLCyXO
DDwwSF/CcgfZsIf5dfGir6erUBOXzYUUlCL7Q0ap2DNdeYAv6XLWwVSMu7okIFpH
H/vBcWMWXwNNyEtDIHPmRYut14qEFASTRWsSe7IiIoL2V5VG5BVUALk8rAmoVv10
IjE5kAmdLBHplhtnDIEg55CHIivlnbOp9d7WMhKkwY+vrPx571uFuwXeCBrg+5cd
SyekP701TtS5oscT2SazimZTdtS0YLmJgVlhxX8DAIeElO9pEPdJt9CNCafFKmEY
LMleGXDrH4bnTA1k6nMX2Ky4/oqlSonPbYXZ4GzL5ZMRg9biIkRI3YyRvKLM1plh
MoO14zXhS184Cf0vSaGSeZ2nqsv7Z0lPkGJxCpGyzkOoA1VnzORl9teik5C6eeCw
7wOoM+x+aJU8hxyD65DkyPNzLkvrEohRJWx7XMOKZEC1uFvBrJfEc/lb7TH5E+Zd
PbPG1+x5Y3CAwvOcQzbpVeF0ujL0+KvJF+Y1q7eJ3mB+KoDPBEVbcO1ypvL6kbfW
ETYrD/fuBiuxQE/XM0Xdlw==
=niGG
-----END PGP SIGNATURE-----
Merge tag 'for-net-next-2024-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
- btusb: add Foxconn 0xe0fc for Qualcomm WCN785x
- btmtk: Fix ISO interface handling
- Add quirk for ATS2851
- btusb: Add RTL8852BE device 0489:e123
- ISO: Do not emit LE PA/BIG Create Sync if previous is pending
- btusb: Add USB HW IDs for MT7920/MT7925
- btintel_pcie: Add handshake between driver and firmware
- btintel_pcie: Add recovery mechanism
- hci_conn: Use disable_delayed_work_sync
- SCO: Use kref to track lifetime of sco_conn
- ISO: Use kref to track lifetime of iso_conn
- btnxpuart: Add GPIO support to power save feature
- btusb: Add 0x0489:0xe0f3 and 0x13d3:0x3623 for Qualcomm WCN785x
* tag 'for-net-next-2024-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (51 commits)
Bluetooth: MGMT: Add initial implementation of MGMT_OP_HCI_CMD_SYNC
Bluetooth: fix use-after-free in device_for_each_child()
Bluetooth: btintel: Direct exception event to bluetooth stack
Bluetooth: hci_core: Fix calling mgmt_device_connected
Bluetooth: hci_bcm: Use the devm_clk_get_optional() helper
Bluetooth: ISO: Send BIG Create Sync via hci_sync
Bluetooth: hci_conn: Remove alloc from critical section
Bluetooth: ISO: Use kref to track lifetime of iso_conn
Bluetooth: SCO: Use kref to track lifetime of sco_conn
Bluetooth: HCI: Add IPC(11) bus type
Bluetooth: btusb: Add 3 HWIDs for MT7925
Bluetooth: btusb: Add new VID/PID 0489/e124 for MT7925
Bluetooth: ISO: Update hci_conn_hash_lookup_big for Broadcast slave
Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pending
Bluetooth: ISO: Fix matching parent socket for BIS slave
Bluetooth: ISO: Do not emit LE PA Create Sync if previous is pending
Bluetooth: btrtl: Decrease HCI_OP_RESET timeout from 10 s to 2 s
Bluetooth: btbcm: fix missing of_node_put() in btbcm_get_board_name()
Bluetooth: btusb: Add new VID/PID 0489/e111 for MT7925
Bluetooth: btmtk: adjust the position to init iso data anchor
...
====================
Link: https://patch.msgid.link/20241114214731.1994446-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEjF9xRqF1emXiQiqU1w0aZmrPKyEFAmc3S9AACgkQ1w0aZmrP
KyF7Sg/9GBfCiuuxUqrbigUitY8dJFuCTt+fKxMDfTb6sqU7FgQK/ylqwuW2zikz
MgyVRXTAMbgD1KU5U+v1VEf5kq8iCU/rpdCC1xMOK9GvbaYQ9l/0cw8PR1jGgmSZ
P1NWgmpv30IbZ/bQblU9/SbP8sFWg3DLC9lFrqYlLkJjijhfSDTflI6uVVWwt+rn
9jWqgzf6mUYKAKJ56gFfUW/09jYPkQ5OLYz9CLqvIZLhdYNPGy2GEgldzXkHaVPv
O65lMjrNojVYfITcinjkVfVVTlcLtQPNG9novclXrsf+qSsov5h/583n0c+7Xh3N
r+EY1NBzZEcxLloTowJ/iq7xtDHHDG6Rv3BGTMS2JWFxhUDOV3Ks2qj/bIZUkzh5
/Kl8n4NFbE+f1F3TGOoivZ0CFK1s3jcdIu3RTMXwwa41eiOAt8dPvhckfxTW20kT
GdIYMNpUC1UVw2a1bxPEw27omB2UF2VADK5vHm97WJ8FBjA1HwPA9afF+PmyNMZ6
cCOKT225DpXkt2WAMX+bgDyqQN150B05/JrBRdiT5hT5++xJn+heZLvx56L4mPA2
8Y8NnnXLsyx5pwtE6HKgBOZNXXno2xpE/OrafF5n2zHwiMnF5qeF1+Jwerm8SxUa
ZTuUS1mAi922IJzksnjRtiVggEA4X9Arq4NRwlIMWunTxybmHnE=
=Tp49
-----END PGP SIGNATURE-----
Merge tag 'nf-next-24-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Extended netlink error reporting if nfnetlink attribute parser fails,
from Donald Hunter.
2) Incorrect request_module() module, from Simon Horman.
3) A series of patches to reduce memory consumption for set element
transactions.
Florian Westphal says:
"When doing a flush on a set or mass adding/removing elements from a
set, each element needs to allocate 96 bytes to hold the transactional
state.
In such cases, virtually all the information in struct nft_trans_elem
is the same.
Change nft_trans_elem to a flex-array, i.e. a single nft_trans_elem
can hold multiple set element pointers.
The number of elements that can be stored in one nft_trans_elem is limited
by the slab allocator, this series limits the compaction to at most 62
elements as it caps the reallocation to 2048 bytes of memory."
4) A series of patches to prepare the transition to dscp_t in .flowi_tos.
From Guillaume Nault.
5) Support for bitwise operations with two source registers,
from Jeremy Sowden.
* tag 'nf-next-24-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: bitwise: add support for doing AND, OR and XOR directly
netfilter: bitwise: rename some boolean operation functions
netfilter: nf_dup4: Convert nf_dup_ipv4_route() to dscp_t.
netfilter: nft_fib: Convert nft_fib4_eval() to dscp_t.
netfilter: rpfilter: Convert rpfilter_mt() to dscp_t.
netfilter: flow_offload: Convert nft_flow_route() to dscp_t.
netfilter: ipv4: Convert ip_route_me_harder() to dscp_t.
netfilter: nf_tables: allocate element update information dynamically
netfilter: nf_tables: switch trans_elem to real flex array
netfilter: nf_tables: prepare nft audit for set element compaction
netfilter: nf_tables: prepare for multiple elements in nft_trans_elem structure
netfilter: nf_tables: add nft_trans_commit_list_add_elem helper
netfilter: bpf: Pass string literal as format argument of request_module()
netfilter: nfnetlink: Report extack policy errors for batched ops
====================
Link: https://patch.msgid.link/20241115133207.8907-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hitherto, these operations have been converted in user space to
mask-and-xor operations on one register and two immediate values, and it
is the latter which have been evaluated by the kernel. We add support
for evaluating these operations directly in kernel space on one register
and either an immediate value or a second register.
Pablo made a few changes to the original patch:
- EINVAL if NFTA_BITWISE_SREG2 is used with fast version.
- Allow _AND,_OR,_XOR with _DATA != sizeof(u32)
- Dump _SREG2 or _DATA with _AND,_OR,_XOR
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In the next patch we add support for doing AND, OR and XOR operations
directly in the kernel, so rename some functions and an enum constant
related to mask-and-xor boolean operations.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use ip4h_dscp() instead of reading iph->tos directly.
ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>