Xiao Wang a43fe27d65
riscv: Optimize crc32 with Zbc extension
As suggested by the B-ext spec, the Zbc (carry-less multiplication)
instructions can be used to accelerate CRC calculations. Currently, the
crc32 is the most widely used crc function inside kernel, so this patch
focuses on the optimization of just the crc32 APIs.

Compared with the current table-lookup based optimization, Zbc based
optimization can also achieve large stride during CRC calculation loop,
meantime, it avoids the memory access latency of the table-lookup based
implementation and it reduces memory footprint.

If Zbc feature is not supported in a runtime environment, then the
table-lookup based implementation would serve as fallback via alternative
mechanism.

By inspecting the vmlinux built by gcc v12.2.0 with default optimization
level (-O2), we can see below instruction count change for each 8-byte
stride in the CRC32 loop:

rv64: crc32_be (54->31), crc32_le (54->13), __crc32c_le (54->13)
rv32: crc32_be (50->32), crc32_le (50->16), __crc32c_le (50->16)

The compile target CPU is little endian, extra effort is needed for byte
swapping for the crc32_be API, thus, the instruction count change is not
as significant as that in the *_le cases.

This patch is tested on QEMU VM with the kernel CRC32 selftest for both
rv64 and rv32. Running the CRC32 selftest on a real hardware (SpacemiT K1)
with Zbc extension shows 65% and 125% performance improvement respectively
on crc32_test() and crc32c_test().

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20240621054707.1847548-1-xiao.w.wang@intel.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-10 13:19:50 -07:00

83 lines
3.0 KiB
C

/*
* crc32.h
* See linux/lib/crc32.c for license and changes
*/
#ifndef _LINUX_CRC32_H
#define _LINUX_CRC32_H
#include <linux/types.h>
#include <linux/bitrev.h>
u32 __pure crc32_le(u32 crc, unsigned char const *p, size_t len);
u32 __pure crc32_le_base(u32 crc, unsigned char const *p, size_t len);
u32 __pure crc32_be(u32 crc, unsigned char const *p, size_t len);
u32 __pure crc32_be_base(u32 crc, unsigned char const *p, size_t len);
/**
* crc32_le_combine - Combine two crc32 check values into one. For two
* sequences of bytes, seq1 and seq2 with lengths len1
* and len2, crc32_le() check values were calculated
* for each, crc1 and crc2.
*
* @crc1: crc32 of the first block
* @crc2: crc32 of the second block
* @len2: length of the second block
*
* Return: The crc32_le() check value of seq1 and seq2 concatenated,
* requiring only crc1, crc2, and len2. Note: If seq_full denotes
* the concatenated memory area of seq1 with seq2, and crc_full
* the crc32_le() value of seq_full, then crc_full ==
* crc32_le_combine(crc1, crc2, len2) when crc_full was seeded
* with the same initializer as crc1, and crc2 seed was 0. See
* also crc32_combine_test().
*/
u32 __attribute_const__ crc32_le_shift(u32 crc, size_t len);
static inline u32 crc32_le_combine(u32 crc1, u32 crc2, size_t len2)
{
return crc32_le_shift(crc1, len2) ^ crc2;
}
u32 __pure __crc32c_le(u32 crc, unsigned char const *p, size_t len);
u32 __pure __crc32c_le_base(u32 crc, unsigned char const *p, size_t len);
/**
* __crc32c_le_combine - Combine two crc32c check values into one. For two
* sequences of bytes, seq1 and seq2 with lengths len1
* and len2, __crc32c_le() check values were calculated
* for each, crc1 and crc2.
*
* @crc1: crc32c of the first block
* @crc2: crc32c of the second block
* @len2: length of the second block
*
* Return: The __crc32c_le() check value of seq1 and seq2 concatenated,
* requiring only crc1, crc2, and len2. Note: If seq_full denotes
* the concatenated memory area of seq1 with seq2, and crc_full
* the __crc32c_le() value of seq_full, then crc_full ==
* __crc32c_le_combine(crc1, crc2, len2) when crc_full was
* seeded with the same initializer as crc1, and crc2 seed
* was 0. See also crc32c_combine_test().
*/
u32 __attribute_const__ __crc32c_le_shift(u32 crc, size_t len);
static inline u32 __crc32c_le_combine(u32 crc1, u32 crc2, size_t len2)
{
return __crc32c_le_shift(crc1, len2) ^ crc2;
}
#define crc32(seed, data, length) crc32_le(seed, (unsigned char const *)(data), length)
/*
* Helpers for hash table generation of ethernet nics:
*
* Ethernet sends the least significant bit of a byte first, thus crc32_le
* is used. The output of crc32_le is bit reversed [most significant bit
* is in bit nr 0], thus it must be reversed before use. Except for
* nics that bit swap the result internally...
*/
#define ether_crc(length, data) bitrev32(crc32_le(~0, data, length))
#define ether_crc_le(length, data) crc32_le(~0, data, length)
#endif /* _LINUX_CRC32_H */