mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2024-12-28 16:53:49 +00:00
d5dc958361
Add the build support for using Clang's Propeller optimizer. Like
AutoFDO, Propeller uses hardware sampling to gather information
about the frequency of execution of different code paths within a
binary. This information is then used to guide the compiler's
optimization decisions, resulting in a more efficient binary.

The support requires a Clang compiler (LLVM 19 or later) and the
create_llvm_prof tool
(https://github.com/google/autofdo/releases/tag/v0.30.1).
This commit is limited to x86 platforms that support PMU features
like LBR on Intel machines and AMD Zen3 BRS.

Here is an example workflow for building an AutoFDO+Propeller
optimized kernel:

1) Build the kernel on the host machine, with AutoFDO and Propeller
   build config
      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y
   then
      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile>

   "<autofdo_profile>" is the profile collected when doing a
   non-Propeller AutoFDO build. This step builds a kernel that has
   the same optimization level as AutoFDO, plus a metadata section
   that records basic block information. This kernel image runs as
   fast as an AutoFDO optimized kernel.

2) Install the kernel on test/production machines.

3) Run the load tests. The '-c' option in perf specifies the sample
   event period. We suggest using a suitable prime number,
   like 500009, for this purpose.
   For Intel platforms:
      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \
        -o <perf_file> -- <loadtest>
   For AMD platforms:
      The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2
      # To see if Zen3 supports LBR:
      $ cat /proc/cpuinfo | grep " brs"
      # To see if Zen4 supports LBR:
      $ cat /proc/cpuinfo | grep amd_lbr_v2
      # If the result is yes, then collect the profile using:
      $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
        -N -b -c <count> -o <perf_file> -- <loadtest>

4) (Optional) Download the raw perf file to the host machine.

5) Generate Propeller profile:
   $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
     --format=propeller --propeller_output_module_name \
     --out=<propeller_profile_prefix>_cc_profile.txt \
     --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

   "create_llvm_prof" is the profile conversion tool, and a prebuilt
   binary for linux can be found on
   https://github.com/google/autofdo/releases/tag/v0.30.1 (can also build
   from source).

   "<propeller_profile_prefix>" can be something like
   "/home/user/dir/any_string".

   This command generates a pair of Propeller profiles:
   "<propeller_profile_prefix>_cc_profile.txt" and
   "<propeller_profile_prefix>_ld_profile.txt".

6) Rebuild the kernel using the AutoFDO and Propeller profile files.
      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y
   and
      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> \
        CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>

Co-developed-by: Han Shen <shenhan@google.com>
Signed-off-by: Han Shen <shenhan@google.com>
Signed-off-by: Rong Xu <xur@google.com>
Suggested-by: Sriraman Tallam <tmsriram@google.com>
Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Suggested-by: Stephane Eranian <eranian@google.com>
Tested-by: Yonghong Song <yonghong.song@linux.dev>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

163 lines
5.8 KiB
ReStructuredText
.. SPDX-License-Identifier: GPL-2.0

=====================================
Using Propeller with the Linux kernel
=====================================

This enables Propeller build support for the kernel when using the Clang
compiler. Propeller is a profile-guided optimization (PGO) method used
to optimize binary executables. Like AutoFDO, it utilizes hardware
sampling to gather information about the frequency of execution of
different code paths within a binary. Unlike AutoFDO, this information
is then used right before the linking phase to optimize (among other
things) block layout within and across functions.

A few important notes about adopting Propeller optimization:

#. Although it can be used as a standalone optimization step, it is
   strongly recommended to apply Propeller on top of AutoFDO,
   AutoFDO+ThinLTO or instrumented FDO (iFDO). The rest of this
   document assumes this paradigm.

#. Propeller uses another round of profiling on top of
   AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves
   "build-afdo - train-afdo - build-propeller - train-propeller -
   build-optimized".

#. Propeller requires the LLVM 19 release or later for Clang/Clang++
   and the linker (ld.lld).

#. In addition to the LLVM toolchain, Propeller requires a profile
   conversion tool: https://github.com/google/autofdo, release
   v0.30.1 or later:
   https://github.com/google/autofdo/releases/tag/v0.30.1.

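The toolchain requirement in the notes above can be checked before
starting a build. The following is a minimal sketch, not part of the
kernel tree: ``llvm_major`` is a hypothetical helper, and it assumes that
the first line of the tool's ``--version`` output contains a dotted
version number such as ``19.1.0``, which may not hold for every
distribution's packaging::

    # Hypothetical helper: print the major version found on the first
    # line of "--version" output passed in as $1 (e.g. "clang version
    # 19.1.0" or "LLD 19.1.0 (compatible with GNU linkers)").
    llvm_major() {
        printf '%s\n' "$1" |
            sed -n '1s/^[^0-9]*\([0-9][0-9]*\)\..*/\1/p'
    }

    # Possible use:
    #   [ "$(llvm_major "$(clang --version)")" -ge 19 ] || echo "Clang too old"
    #   [ "$(llvm_major "$(ld.lld --version)")" -ge 19 ] || echo "ld.lld too old"

The helper only parses text, so it can be tried on saved ``--version``
output from the build machine without invoking the tools again.
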
The Propeller optimization process involves the following steps:

#. Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as
   you would normally do, but with a set of compile-time / link-time
   flags, so that a special metadata section is created within the
   kernel binary. The special section is only intended to be used by
   the profiling tool; it is not part of the runtime image, nor does
   it change the kernel's runtime text sections.

#. Profiling: The above kernel is then run with a representative
   workload to gather execution frequency data. This data is collected
   using hardware sampling, via perf. Propeller is most effective on
   platforms supporting advanced PMU features like LBR on Intel
   machines. This step is the same as profiling the kernel for AutoFDO
   (the exact perf parameters can be different).

#. Propeller profile generation: The perf output file is converted to
   a pair of Propeller profiles via an offline tool.

#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized
   binary as you would normally do, but with a compile-time /
   link-time flag to pick up the Propeller compile-time and link-time
   profiles. This build step uses three profiles: the AutoFDO profile,
   the Propeller compile-time profile and the Propeller link-time
   profile.

#. Deployment: The optimized kernel binary is deployed and used
   in production environments, providing improved performance
   and reduced latency.

Preparation
===========

Configure the kernel with::

   CONFIG_AUTOFDO_CLANG=y
   CONFIG_PROPELLER_CLANG=y

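One way to set these options from the command line is the in-tree
``scripts/config`` helper followed by ``make olddefconfig``. Because
``olddefconfig`` drops options whose dependencies are not met, it can be
worth confirming that both options survived. The sketch below is
illustrative only; ``has_propeller_config`` is a hypothetical helper,
not part of the kernel tree::

    # Hypothetical helper: succeed only if both options are set to "y"
    # in the config file given as $1.
    has_propeller_config() {
        grep -qx 'CONFIG_AUTOFDO_CLANG=y' "$1" &&
            grep -qx 'CONFIG_PROPELLER_CLANG=y' "$1"
    }

    # Possible use from the top of a kernel source tree:
    #   scripts/config --enable AUTOFDO_CLANG --enable PROPELLER_CLANG
    #   make LLVM=1 olddefconfig
    #   has_propeller_config .config || echo "Propeller options not enabled"
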
Customization
=============

The default CONFIG_PROPELLER_CLANG setting covers kernel space objects
for Propeller builds. One can, however, enable or disable Propeller build
for individual files and directories by adding a line similar to the
following to the respective kernel Makefile:

- For enabling a single file (e.g. foo.o)::

   PROPELLER_PROFILE_foo.o := y

- For enabling all files in one directory::

   PROPELLER_PROFILE := y

- For disabling one file::

   PROPELLER_PROFILE_foo.o := n

- For disabling all files in one directory::

   PROPELLER_PROFILE := n

Workflow
========

Here is an example workflow for building an AutoFDO+Propeller kernel:

1) Assuming an AutoFDO profile is already collected following
   instructions in the AutoFDO document, build the kernel on the host
   machine, with AutoFDO and Propeller build configs ::

      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y

   and ::

      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name>

2) Install the kernel on the test machine.

3) Run the load tests. The '-c' option in perf specifies the sample
   event period. We suggest using a suitable prime number, like 500009,
   for this purpose.

   - For Intel platforms::

      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   - For AMD platforms::

      $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   Note you can repeat the above steps to collect multiple <perf_file>s.

4) (Optional) Download the raw perf file(s) to the host machine.

5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to
   generate the Propeller profile. ::

      $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
                         --format=propeller --propeller_output_module_name \
                         --out=<propeller_profile_prefix>_cc_profile.txt \
                         --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

   "<propeller_profile_prefix>" can be something like "/home/user/dir/any_string".

   This command generates a pair of Propeller profiles:
   "<propeller_profile_prefix>_cc_profile.txt" and
   "<propeller_profile_prefix>_ld_profile.txt".

   If more than one perf file was collected in the previous step,
   you can create a temporary list file "<perf_file_list>" with each
   line containing one perf file name and run::

      $ create_llvm_prof --binary=<vmlinux> --profile=@<perf_file_list> \
                         --format=propeller --propeller_output_module_name \
                         --out=<propeller_profile_prefix>_cc_profile.txt \
                         --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

6) Rebuild the kernel using the AutoFDO and Propeller
   profiles. ::

      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y

   and ::

      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
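
Parts of steps 3, 5 and 6 lend themselves to small shell helpers. The
following is a minimal sketch under stated assumptions, not part of the
kernel tree or of the autofdo tools. All three function names are
hypothetical. The AMD CPU flag names (``brs`` for Zen3, ``amd_lbr_v2``
for Zen4) are the ones the commit that added Propeller support checks
for in ``/proc/cpuinfo``, and whole-word matching is used so that
unrelated flags such as ``ibrs`` do not match::

    # Hypothetical helper: succeed if the cpuinfo-style file $1 lists an
    # AMD branch-sampling flag ("brs" or "amd_lbr_v2") as a whole word.
    amd_has_branch_sampling() {
        grep -Eqw 'brs|amd_lbr_v2' "$1"
    }

    # Hypothetical helper: write the perf file names given after $1 into
    # the list file $1, one name per line, for use with --profile=@<list>.
    write_perf_list() {
        list=$1
        shift
        printf '%s\n' "$@" > "$list"
    }

    # Hypothetical helper: succeed only if both Propeller profiles for
    # prefix $1 exist and are non-empty.
    check_propeller_profiles() {
        [ -s "${1}_cc_profile.txt" ] && [ -s "${1}_ld_profile.txt" ]
    }

    # Possible use (placeholders as in the steps above):
    #   amd_has_branch_sampling /proc/cpuinfo || echo "no brs/amd_lbr_v2 flag"
    #   write_perf_list <perf_file_list> <perf_file_1> <perf_file_2>
    #   check_propeller_profiles <propeller_profile_prefix> || exit 1

Checking for the profile pair before the final ``make`` is only a
convenience: it reports a missing or empty profile early rather than
part-way through the rebuild.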