linux-next/net/core
Jakub Kicinski 8c48eea3ad page_pool: allow caching from safely localized NAPI
Recent patches to mlx5 mentioned a regression when moving from
driver local page pool to only using the generic page pool code.
Page pool has two recycling paths (1) direct one, which runs in
safe NAPI context (basically consumer context, so producing
can be lockless); and (2) via a ptr_ring, which takes a spin
lock because the freeing can happen from any CPU; producer
and consumer may run concurrently.

Since the page pool code was added, Eric introduced a revised version
of deferred skb freeing. TCP skbs are now usually returned to the CPU
which allocated them, and freed in softirq context. This places the
freeing (producing of pages back to the pool) enticingly close to
the allocation (consumer).

If we can prove that we're freeing in the same softirq context in which
the consumer NAPI will run - lockless use of the cache is perfectly fine,
no need for the lock.

Let drivers link the page pool to a NAPI instance. If the NAPI instance
is scheduled on the same CPU on which we're freeing - place the pages
in the direct cache.

With that and patched bnxt (XDP enabled to engage the page pool, sigh,
bnxt really needs page pool work :() I see a 2.6% perf boost with
a TCP stream test (app on a different physical core than softirq).

The CPU use of relevant functions decreases as expected:

  page_pool_refill_alloc_cache   1.17% -> 0%
  _raw_spin_lock                 2.41% -> 0.98%

Only consider lockless path to be safe when NAPI is scheduled
- in practice this should cover majority if not all of steady state
workloads. It's usually the NAPI kicking in that causes the skb flush.

The main case we'll miss out on is when application runs on the same
CPU as NAPI. In that case we don't use the deferred skb free path.

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-14 18:56:12 -07:00
..
bpf_sk_storage.c bpf: Teach verifier that certain helpers accept NULL pointer. 2023-04-04 16:57:16 -07:00
datagram.c net-zerocopy: Reduce compound page head access 2023-03-22 15:34:31 +01:00
dev_addr_lists_test.c kunit: Use KUNIT_EXPECT_MEMEQ macro 2022-10-27 02:40:14 -06:00
dev_addr_lists.c net: extract a few internals from netdevice.h 2022-04-07 20:32:09 -07:00
dev_ioctl.c net: dsa: replace NETDEV_PRE_CHANGE_HWTSTAMP notifier with a stub 2023-04-09 15:35:49 +01:00
dev.c page_pool: allow caching from safely localized NAPI 2023-04-14 18:56:12 -07:00
dev.h net-sysctl: factor-out rpm mask manipulation helpers 2023-02-09 17:45:55 -08:00
drop_monitor.c genetlink: introduce split op representation 2022-11-07 12:30:16 +00:00
dst_cache.c wireguard: device: reset peer src endpoint when netns exits 2021-11-29 19:50:45 -08:00
dst.c net: dst: Switch to rcuref_t reference counting 2023-03-28 18:52:28 -07:00
failover.c net: failover: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf 2022-12-12 15:18:25 -08:00
fib_notifier.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
fib_rules.c fib: expand fib_rule_policy 2021-12-16 07:18:35 -08:00
filter.c bpf-next-for-netdev 2023-04-13 16:43:38 -07:00
flow_dissector.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-11-29 13:04:52 -08:00
flow_offload.c net: flow_offload: add support for ARP frame matching 2022-11-14 11:24:16 +00:00
gen_estimator.c treewide: Convert del_timer*() to timer_shutdown*() 2022-12-25 13:38:09 -08:00
gen_stats.c net: Remove the obsolte u64_stats_fetch_*_irq() users (net). 2022-10-28 20:13:54 -07:00
gro_cells.c net: drop the weight argument from netif_napi_add 2022-09-28 18:57:14 -07:00
gro.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-02-02 14:49:55 -08:00
hwbm.c net: hwbm: Make the hwbm_pool lock a mutex 2019-06-09 19:40:10 -07:00
link_watch.c net: linkwatch: only report IF_OPER_LOWERLAYERDOWN if iflink is actually down 2022-11-16 09:45:00 +00:00
lwt_bpf.c bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook 2022-04-22 17:45:25 +02:00
lwtunnel.c xfrm: lwtunnel: squelch kernel warning in case XFRM encap type is not available 2022-10-12 10:45:51 +02:00
Makefile netdev-genl: create a simple family for netdev stuff 2023-02-02 20:48:23 -08:00
neighbour.c neighbour: switch to standard rcu, instead of rcu_bh 2023-03-21 21:32:18 -07:00
net_namespace.c net: initialize net->notrefcnt_tracker earlier 2023-02-09 22:49:25 -08:00
net-procfs.c net-sysfs: display two backlog queue len separately 2023-03-22 12:03:52 +01:00
net-sysfs.c net: make default_rps_mask a per netns attribute 2023-02-20 11:22:54 +00:00
net-sysfs.h net-sysfs: add netdev_change_owner() 2020-02-26 20:07:25 -08:00
net-traces.c net: bridge: Add a tracepoint for MDB overflows 2023-02-06 08:48:25 +00:00
netclassid_cgroup.c core: Variable type completion 2022-08-31 09:40:34 +01:00
netdev-genl-gen.c tools: ynl: skip the explicit op array size when not needed 2023-03-21 21:45:31 -07:00
netdev-genl-gen.h ynl: broaden the license even more 2023-03-16 21:20:32 -07:00
netdev-genl.c netdev-genl: create a simple family for netdev stuff 2023-02-02 20:48:23 -08:00
netevent.c net: core: Correct function name netevent_unregister_notifier() in the kerneldoc 2021-03-28 17:56:56 -07:00
netpoll.c net: don't let netpoll invoke NAPI if in xmit context 2023-04-02 13:26:21 +01:00
netprio_cgroup.c bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode 2021-09-13 16:35:58 -07:00
of_net.c of: net: export of_get_mac_address_nvmem() 2022-11-29 10:45:53 +01:00
page_pool.c page_pool: allow caching from safely localized NAPI 2023-04-14 18:56:12 -07:00
pktgen.c treewide: use get_random_u32_inclusive() when possible 2022-11-18 02:18:02 +01:00
ptp_classifier.c ptp: Add generic PTP is_sync() function 2022-03-07 11:31:34 +00:00
request_sock.c tcp: add rcu protection around tp->fastopen_rsk 2019-10-13 10:13:08 -07:00
rtnetlink.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-04-13 16:04:28 -07:00
scm.c net: Ensure ->msg_control_user is used for user buffers 2023-04-14 11:09:27 +01:00
secure_seq.c tcp: Fix data-races around sysctl knobs related to SYN option. 2022-07-20 10:14:49 +01:00
selftests.c net: core: constify mac addrs in selftests 2021-10-24 13:59:44 +01:00
skbuff.c page_pool: allow caching from safely localized NAPI 2023-04-14 18:56:12 -07:00
skmsg.c net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
sock_destructor.h skb_expand_head() adjust skb->truesize incorrectly 2021-10-22 12:35:51 -07:00
sock_diag.c net: fix __sock_gen_cookie() 2022-11-21 20:36:30 -08:00
sock_map.c bpf, sockmap: Revert buggy deadlock fix in the sockhash and sockmap 2023-04-13 20:36:32 +02:00
sock_reuseport.c soreuseport: Fix socket selection for SO_INCOMING_CPU. 2022-10-25 11:35:16 +02:00
sock.c net: make SO_BUSY_POLL available to all users 2023-04-07 20:07:07 -07:00
stream.c net: Remove WARN_ON_ONCE(sk->sk_forward_alloc) from sk_stream_kill_queues(). 2023-02-10 19:53:42 -08:00
sysctl_net_core.c net: make default_rps_mask a per netns attribute 2023-02-20 11:22:54 +00:00
timestamping.c net: Introduce a new MII time stamping interface. 2019-12-25 19:51:33 -08:00
tso.c net: tso: inline tso_count_descs() 2022-12-12 15:04:39 -08:00
utils.c net: core: inet[46]_pton strlen len types 2022-11-01 21:14:39 -07:00
xdp.c bpf-next-for-netdev 2023-04-13 16:43:38 -07:00