linux-stable/include
Wangyang Guo d288a162dd net: dst: Prevent false sharing vs. dst_entry:: __refcnt
dst_entry::__refcnt is highly contended in scenarios where many connections
happen from and to the same IP. The reference count is an atomic_t, so the
reference count operations have to take the cache-line exclusive.

Aside of the unavoidable reference count contention there is another
significant problem which is caused by that: False sharing.

perf top identified two affected read accesses. dst_entry::lwtstate and
rtable::rt_genid.

dst_entry:__refcnt is located at offset 64 of dst_entry, which puts it into
a seperate cacheline vs. the read mostly members located at the beginning
of the struct.

That prevents false sharing vs. the struct members in the first 64
bytes of the structure, but there is also

  dst_entry::lwtstate

which is located after the reference count and in the same cache line. This
member is read after a reference count has been acquired.

struct rtable embeds a struct dst_entry at offset 0. struct dst_entry has a
size of 112 bytes, which means that the struct members of rtable which
follow the dst member share the same cache line as dst_entry::__refcnt.
Especially

  rtable::rt_genid

is also read by the contexts which have a reference count acquired
already.

When dst_entry:__refcnt is incremented or decremented via an atomic
operation these read accesses stall. This was found when analysing the
memtier benchmark in 1:100 mode, which amplifies the problem extremly.

Move the rt[6i]_uncached[_list] members out of struct rtable and struct
rt6_info into struct dst_entry to provide padding and move the lwtstate
member after that so it ends up in the same cache line.

The resulting improvement depends on the micro-architecture and the number
of CPUs. It ranges from +20% to +120% with a localhost memtier/memcached
benchmark.

[ tglx: Rearrange struct ]

Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230323102800.042297517@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-28 18:52:22 -07:00
..
acpi ACPI: x86: Introduce an acpi_quirk_skip_gpio_event_handlers() helper 2023-03-07 14:15:10 +01:00
asm-generic Driver core changes for 6.3-rc1 2023-02-24 12:58:55 -08:00
clocksource
crypto crypto: api - Use data directly in completion function 2023-02-13 18:35:14 +08:00
drm drm/bridge: Fix returned array size name for atomic_get_input_bus_fmts kdoc 2023-03-15 10:06:06 +01:00
dt-bindings ARM: SoC drivers for 6.3 2023-02-27 10:04:49 -08:00
keys
kunit kunit: Expose 'static stub' API to redirect functions 2023-02-08 14:28:17 -07:00
kvm KVM: arm64: timers: Convert per-vcpu virtual offset to a global value 2023-03-11 02:00:40 -08:00
linux Merge branch 'locking/rcuref' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2023-03-28 18:49:35 -07:00
math-emu
media media updates for v6.3-rc1 2023-02-26 11:47:26 -08:00
memory
misc
net net: dst: Prevent false sharing vs. dst_entry:: __refcnt 2023-03-28 18:52:22 -07:00
pcmcia
ras
rdma RDMA/umem: Remove unused 'work' member from struct ib_umem 2023-02-12 20:25:25 +02:00
rv
scsi scsi: core: Add BLIST_NO_VPD_SIZE for some VDASD 2023-03-09 20:57:58 -05:00
soc net: mscc: ocelot: expose serdes configuration function 2023-03-20 09:08:48 +00:00
sound sound fixes for 6.3-rc1 2023-03-04 10:53:59 -08:00
target
trace inet: preserve const qualifier in inet_sk() 2023-03-17 08:56:37 +00:00
uapi ethtool: Add support for configuring tx_push_buf_len 2023-03-27 19:49:58 -07:00
ufs SCSI misc on 20230303 2023-03-03 14:41:50 -08:00
vdso vdso/bits.h: Add BIT_ULL() for the sake of consistency 2023-01-31 14:42:10 +01:00
video fbdev: remove w100fb driver 2023-02-01 17:23:38 +01:00
xen xen: branch for v6.3-rc3 2023-03-17 10:45:49 -07:00