mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
synced 2025-01-01 10:42:11 +00:00
40eb0e915d
Backport upstream commit c7269ad [0] to improve zstd decoding speed.
Updating the kernel to zstd v1.5.5 earlier in this patch series
regressed zstd decoding speed. This turned out to be because gcc was not
unrolling the inner loops of the Huffman decoder which are executed a
constant number of times [1]. This really hurts performance, as we expect
this loop to be completely branch-free. This commit fixes the issue by
unrolling the loop manually [2].
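
To illustrate the kind of change described above, here is a minimal sketch of a constant-trip-count loop next to a manually unrolled equivalent. It is not the zstd code: `NUM_STREAMS`, `toy_step`, `TOY_STEP` and both `decode_round_*` functions are hypothetical names, and the per-stream step is a toy stand-in for a real Huffman table lookup.

```c
#include <stdint.h>

#define NUM_STREAMS 4  /* constant trip count, chosen for illustration */

/* Toy per-stream step (hypothetical): emit the low byte, then consume 8 bits. */
static inline uint8_t toy_step(uint64_t *bits)
{
    uint8_t sym = (uint8_t)(*bits & 0xFF);
    *bits >>= 8;
    return sym;
}

/* Rolled form: relies on the compiler to unroll the constant-count loop. */
static void decode_round_rolled(uint64_t bits[NUM_STREAMS],
                                uint8_t out[NUM_STREAMS])
{
    for (int s = 0; s < NUM_STREAMS; ++s)
        out[s] = toy_step(&bits[s]);
}

/* Manually unrolled form: the source contains no loop counter or loop
 * branch, so the result does not depend on the compiler's unrolling choice. */
#define TOY_STEP(s) do { out[(s)] = toy_step(&bits[(s)]); } while (0)
static void decode_round_unrolled(uint64_t bits[NUM_STREAMS],
                                  uint8_t out[NUM_STREAMS])
{
    TOY_STEP(0);
    TOY_STEP(1);
    TOY_STEP(2);
    TOY_STEP(3);
}
#undef TOY_STEP
```

Both functions compute the same result. The unrolled form only removes the dependence on the optimizer for a branch-free body.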
The commit also fixes one minor issue by masking a variable shift amount
with 0x3F. The shift was already guaranteed to be less than 64, but gcc
couldn't prove that, and emitted suboptimal code.
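
A minimal sketch of the masking idea follows. The function and parameter names are hypothetical and are not taken from the zstd sources. The point is only that `& 0x3F` states the "less than 64" range in the expression itself, so the compiler no longer has to prove it.

```c
#include <assert.h>
#include <stdint.h>

/* Unmasked: correct only when nb_bits < 64. Shifting a 64-bit value by 64
 * or more is undefined behaviour in C, so the compiler has to be careful. */
static inline uint64_t toy_consume_bits(uint64_t container, unsigned nb_bits)
{
    assert(nb_bits < 64);  /* the caller's guarantee, invisible to the optimizer */
    return container << nb_bits;
}

/* Masked: identical result for nb_bits < 64, with the range made explicit. */
static inline uint64_t toy_consume_bits_masked(uint64_t container,
                                               unsigned nb_bits)
{
    return container << (nb_bits & 0x3F);
}
```

On x86-64 the 64-bit shift instructions already use only the low six bits of the count, so the mask typically costs nothing there.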
Finally, the upstream commit added a build macro
`HUF_DISABLE_FAST_DECODE` which is not used in the kernel, but is
maintained to keep a clean import from upstream.
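
As a rough sketch of how a build macro of this kind usually gates a fast path: only `HUF_DISABLE_FAST_DECODE` comes from the text above. Every `TOY_*` and `toy_*` name is hypothetical, and the actual upstream wiring may differ.

```c
/* HUF_DISABLE_FAST_DECODE is the macro named in the commit message; all
 * TOY_* and toy_* names exist only for this sketch. */
#ifdef HUF_DISABLE_FAST_DECODE
#define TOY_ENABLE_FAST_DECODE 0
#else
#define TOY_ENABLE_FAST_DECODE 1
#endif

static int toy_decode_fast(void)     { return 1; }  /* stands in for the fast loop */
static int toy_decode_fallback(void) { return 0; }  /* stands in for the generic loop */

/* The condition is a compile-time constant, so the unused path can be
 * dropped by the compiler entirely. */
static int toy_decode(void)
{
    if (TOY_ENABLE_FAST_DECODE)
        return toy_decode_fast();
    return toy_decode_fallback();
}
```

Since the kernel build does not define the macro, a gate like this would always select the fast path there.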
This commit was generated from upstream signed tag v1.5.5-kernel [3] by:

    export ZSTD=/path/to/repo/zstd/
    export LINUX=/path/to/repo/linux/
    cd "$ZSTD/contrib/linux-kernel"
    git checkout v1.5.5-kernel
    make import LINUX="$LINUX"

I ran my benchmark & test suite before and after this commit to measure
the overall decompression speed benefit. It benchmarks zstd at several
compression levels. These benchmarks measure the total time it takes to
read data from the compressed filesystem.

    Component, Level, Read time delta
    Btrfs    ,     1,           -7.0%
    Btrfs    ,     3,           -3.9%
    Btrfs    ,     5,           -4.7%
    Btrfs    ,     7,           -5.5%
    Btrfs    ,     9,           -2.4%
    Squashfs ,     1,           -9.1%

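
For reading the table, the sketch below assumes that "Read time delta" means the relative change in read time, where a negative value is a faster read. That definition and the function name are assumptions. The commit message does not spell out the formula.

```c
/* Relative change in percent. Negative means the read got faster.
 * Assumed definition: (after - before) / before * 100. */
static double read_time_delta_pct(double before_s, double after_s)
{
    return (after_s - before_s) / before_s * 100.0;
}
```

With made-up timings of 10.0 s before and 9.3 s after, this yields a delta of about -7%.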
Link: