linux-stable/arch
Jason Gunthorpe ead79118da arm64/io: Provide a WC friendly __iowriteXX_copy()
The kernel provides driver support for using write combining IO memory
through the __iowriteXX_copy() API which is commonly used as an optional
optimization to generate 16/32/64 byte MemWr TLPs in a PCIe environment.

iomap_copy.c provides a generic implementation as a simple 4/8 byte at a
time copy loop that has worked well with past ARM64 CPUs, giving a high
frequency of large TLPs being successfully formed.
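A minimal userspace sketch of the kind of generic loop described above, modeled loosely on the lib/iomap_copy.c approach. It issues one 64-bit store per iteration, and the count is given in 64-bit words. The names `raw_writeq_sketch` and `iowrite64_copy_generic` are hypothetical stand-ins, and a `volatile` pointer into ordinary memory substitutes for an `__iomem` mapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __raw_writeq(): a single 64-bit store that the
 * compiler may not merge or elide because the target is volatile. */
static inline void raw_writeq_sketch(uint64_t val, volatile uint64_t *addr)
{
	*addr = val;
}

/* Generic copy loop: count is in 64-bit words. Each iteration loads one
 * word from 'from' and stores it to 'to', so loads and stores interleave. */
void iowrite64_copy_generic(volatile void *to, const void *from, size_t count)
{
	volatile uint64_t *dst = to;
	const uint64_t *src = from;
	const uint64_t *end = src + count;

	while (src < end)
		raw_writeq_sketch(*src++, dst++);
}
```

The interleaved load/store pattern of this loop is what the following paragraphs identify as a problem on modern ARM64 cores.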

However, modern ARM64 CPUs are quite sensitive to how the write combining
CPU HW is operated, and a compiler-generated loop with intermixed
load/store is not sufficient to frequently generate a large TLP. The CPUs
want to see the entire TLP generated by consecutive store instructions
from registers. Compilers like gcc tend to intermix loads and stores and
generate poor code, in part because on ARM64 writeq() cannot codegen any
addressing mode other than "[xN]". However, even with that resolved,
compilers like clang still do not generate good code.

This means that on modern ARM64 CPUs the rate at which __iowriteXX_copy()
successfully generates large TLPs is very small (less than 1 in 10,000
tries), to the point that using WC is pointless.

Implement __iowrite32/64_copy() specifically for ARM64 and use inline
assembly to build consecutive blocks of STR instructions. Provide direct
support for 64/32/16 byte large TLP generation in this manner. Optimize
for common constant lengths so that the compiler can directly inline the
store blocks.
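The portable C sketch below mirrors the structure described above without the ARM64 inline assembly. For a block it first loads all source words into locals and then issues back-to-back stores, and a `__builtin_constant_p()` check (GCC/Clang) lets a compile-time-constant count of 8, 4, 2 or 1 words take the unrolled path. This is an illustration under stated assumptions, not the kernel's code: plain C does not guarantee that the locals stay in registers, that the loads are not reordered past the volatile stores, or that the stores become consecutive STR instructions, which is exactly why the real implementation uses inline assembly. All function names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n (at most 8) 64-bit words: a load phase, then a store phase, so the
 * stores to the WC mapping are not written interleaved with loads. Plain C
 * cannot force register allocation; the real code uses inline asm. */
static inline void const_copy64_block(volatile uint64_t *dst,
				      const uint64_t *src, size_t n)
{
	uint64_t v[8];
	size_t i;

	for (i = 0; i < n; i++)	/* load phase */
		v[i] = src[i];
	for (i = 0; i < n; i++)	/* store phase */
		dst[i] = v[i];
}

/* Arbitrary count: full 8-word (64 byte) blocks first, then a word tail. */
static void copy64_full(volatile uint64_t *dst, const uint64_t *src,
			size_t count)
{
	while (count >= 8) {
		const_copy64_block(dst, src, 8);
		dst += 8;
		src += 8;
		count -= 8;
	}
	while (count--)
		*dst++ = *src++;
}

/* Dispatch: a compile-time-constant count of 8/4/2/1 words (64/32/16/8
 * bytes) is inlined as one block; everything else takes the general path. */
static inline void wc_copy64(volatile void *to, const void *from, size_t count)
{
	if (__builtin_constant_p(count) &&
	    (count == 8 || count == 4 || count == 2 || count == 1))
		const_copy64_block(to, from, count);
	else
		copy64_full(to, from, count);
}
```

Keeping the constant-length case in an always-inlined header function is what lets a driver that always writes, say, a 64 byte descriptor get a single straight-line store block with no loop.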

This brings the frequency of large TLP generation up to a high level that
is comparable with older CPU generations.

As the __iowriteXX_copy() family of APIs is intended for use with WC,
incorporate the DGH hint directly into the function.
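A minimal sketch of issuing the DGH (Data Gathering Hint) at the end of the copy routine, so the CPU is told, roughly, that the current run of write-combined stores is complete and need not wait for more data to merge. DGH is encoded in the ARM64 HINT space as `hint #6`, which cores without the feature execute as a NOP; on other architectures this sketch falls back to a compiler barrier only. The names `dgh_sketch` and `wc_copy64_dgh` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* DGH is HINT #6 on ARM64; cores that do not implement it treat it as a
 * NOP. Elsewhere, fall back to a plain compiler barrier. */
#if defined(__aarch64__)
#define dgh_sketch() __asm__ volatile("hint #6" : : : "memory")
#else
#define dgh_sketch() __asm__ volatile("" : : : "memory")
#endif

/* Copy count 64-bit words, then signal that the gathered block is done. */
void wc_copy64_dgh(volatile void *to, const void *from, size_t count)
{
	volatile uint64_t *dst = to;
	const uint64_t *src = from;

	while (count--)
		*dst++ = *src++;
	dgh_sketch();
}
```

Placing the hint inside the copy function means callers of a WC-oriented API get it automatically instead of having to remember it after every call.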

Link: https://lore.kernel.org/r/4-v3-1893cd8b9369+1925-mlx5_arm_wc_jgg@nvidia.com
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22 17:11:20 -03:00
alpha Kbuild updates for v6.9 2024-03-21 14:41:00 -07:00
arc - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames 2024-03-14 17:43:30 -07:00
arm ARM: prctl: reject PR_SET_MDWE on pre-ARMv6 2024-03-26 11:07:22 -07:00
arm64 arm64/io: Provide a WC friendly __iowriteXX_copy() 2024-04-22 17:11:20 -03:00
csky - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames 2024-03-14 17:43:30 -07:00
hexagon hexagon: vmlinux.lds.S: handle attributes section 2024-03-26 11:07:23 -07:00
loongarch LoongArch changes for v6.9 2024-03-22 10:22:45 -07:00
m68k TTY/Serial driver update for 6.9-rc1 2024-03-21 12:44:10 -07:00
microblaze arch: define CONFIG_PAGE_SIZE_*KB on all architectures 2024-03-06 19:29:09 +01:00
mips MIPS: move unselectable FIT_IMAGE_FDT_EPM5 out of the "System type" choice 2024-03-27 01:58:34 +09:00
nios2 - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames 2024-03-14 17:43:30 -07:00
openrisc OpenRISC updates for 6.9 2024-03-14 15:53:10 -07:00
parisc prctl: generalize PR_SET_MDWE support check to be per-arch 2024-03-26 11:07:22 -07:00
powerpc powerpc updates for 6.9 #2 2024-03-23 09:21:26 -07:00
riscv Including fixes from bpf, WiFi and netfilter. 2024-03-28 13:09:37 -07:00
s390 s390: Stop using weak symbols for __iowrite64_copy() 2024-04-22 17:11:20 -03:00
sh sh updates for v6.9 2024-03-21 10:13:47 -07:00
sparc This includes the following changes related to sparc for v6.9: 2024-03-15 12:47:21 -07:00
um Devicetree updates for v6.9: 2024-03-15 12:37:59 -07:00
x86 x86: Stop using weak symbols for __iowrite32_copy() 2024-04-22 17:11:19 -03:00
xtensa - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames 2024-03-14 17:43:30 -07:00
.gitignore
Kconfig hardening fixes for v6.9-rc1 2024-03-23 08:43:21 -07:00