mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-01-16 18:08:20 +00:00
SCHED_EXT EXAMPLE SCHEDULERS
============================

# Introduction

This directory contains a number of example sched_ext schedulers. These
schedulers are meant to provide examples of different types of schedulers
that can be built using sched_ext, and illustrate how various features of
sched_ext can be used.

Some of the examples are performant, production-ready schedulers. That is, for
the correct workload and with the correct tuning, they may be deployed in a
production environment with acceptable or possibly even improved performance.
Others are just examples that, in practice, would not provide acceptable
performance (though they could be improved to get there).

This README describes these example schedulers, including the types of
workloads or scenarios they're designed to accommodate, and whether or not
they're production ready. For more details on any of these schedulers, please
see the header comment in their .bpf.c file.


# Compiling the examples

There are a few toolchain dependencies for compiling the example schedulers.

## Toolchain dependencies

1. clang >= 16.0.0

The schedulers are BPF programs, and therefore must be compiled with clang. gcc
is actively working on adding a BPF backend compiler as well, but it is still
missing some features such as BTF type tags which are necessary for using
kptrs.

2. pahole >= 1.25

You may need pahole in order to generate BTF from DWARF.

3. rust >= 1.70.0

Rust schedulers use features present in the rust toolchain >= 1.70.0. You
should be able to use the stable build from rustup, but if that doesn't
work, try using the rustup nightly build.

There are other requirements as well, such as make, but these are the main /
non-trivial ones.

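The minimum versions listed above can be checked with a small shell helper.
The following is only a sketch: `version_ge` and `report_tool` are
hypothetical helper names that do not exist in this directory, and the
comparison assumes a `sort` that supports `-V` (version sort), as GNU
coreutils does.

```bash
# Return success if version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a tool meets a minimum version.
# $1 = tool name, $2 = minimum version, $3 = installed version.
report_tool() {
    if version_ge "$3" "$2"; then
        echo "$1 $3: ok"
    else
        echo "$1 $3: too old, need >= $2"
    fi
}

# Example invocations (commented out so the helpers can be sourced safely).
# The pahole line assumes its version output looks like "v1.25".
# report_tool clang  16.0.0 "$(clang -dumpversion)"
# report_tool pahole 1.25   "$(pahole --version | tr -d v)"
# report_tool rust   1.70.0 "$(rustc --version | cut -d' ' -f2)"
```

Version sort is used rather than a plain string comparison so that, for
example, 1.9 is treated as older than 1.25.
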
## Compiling the kernel

In order to run a sched_ext scheduler, you'll have to run a kernel compiled
with the patches in this repository, and with a minimum set of necessary
Kconfig options:

```
CONFIG_BPF=y
CONFIG_SCHED_CLASS_EXT=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT=y
CONFIG_DEBUG_INFO_BTF=y
```

It's also recommended that you include the following Kconfig options:

```
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_BPF_JIT_DEFAULT_ON=y
CONFIG_PAHOLE_HAS_SPLIT_BTF=y
CONFIG_PAHOLE_HAS_BTF_TAG=y
```

There is a `Kconfig` file in this directory whose contents you can append to
your local `.config` file, as long as there are no conflicts with any existing
options in the file.

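As a quick sanity check, the required options above can be verified against a
`.config` file before building. This is a minimal sketch:
`check_sched_ext_config` is a hypothetical helper name, and it only looks for
options set exactly to `=y`.

```bash
# Print each required option that is not set to "y" in the given .config
# file. Returns success only if all required options are enabled.
check_sched_ext_config() {
    config_file="$1"
    missing=0
    for opt in CONFIG_BPF CONFIG_SCHED_CLASS_EXT CONFIG_BPF_SYSCALL \
               CONFIG_BPF_JIT CONFIG_DEBUG_INFO_BTF; do
        if ! grep -q "^${opt}=y\$" "$config_file"; then
            echo "missing: ${opt}=y"
            missing=1
        fi
    done
    return "$missing"
}

# Example invocation (commented out; the path is a placeholder):
# check_sched_ext_config /path/to/.config
```
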
## Getting a vmlinux.h file

You may notice that most of the example schedulers include a "vmlinux.h" file.
This is a large, auto-generated header file that contains all of the types
defined in some vmlinux binary that was compiled with
[BTF](https://docs.kernel.org/bpf/btf.html) (i.e. with the BTF-related Kconfig
options specified above).

The header file is created using `bpftool`, by passing it a vmlinux binary
compiled with BTF as follows:

```bash
$ bpftool btf dump file /path/to/vmlinux format c > vmlinux.h
```

`bpftool` analyzes all of the BTF encodings in the binary, and produces a
header file that can be included by BPF programs to access those types. For
example, using vmlinux.h allows a scheduler to access fields defined directly
in vmlinux as follows:

```c
#include "vmlinux.h"
// vmlinux.h is also implicitly included by scx_common.bpf.h.
#include "scx_common.bpf.h"

/*
 * vmlinux.h provides definitions for struct task_struct and
 * struct scx_enable_args.
 */
void BPF_STRUCT_OPS(example_enable, struct task_struct *p,
		    struct scx_enable_args *args)
{
	bpf_printk("Task %s enabled in example scheduler", p->comm);
}

// vmlinux.h provides the definition for struct sched_ext_ops.
SEC(".struct_ops.link")
struct sched_ext_ops example_ops = {
	.enable	= (void *)example_enable,
	.name	= "example",
};
```

The scheduler build system will generate this vmlinux.h file as part of the
scheduler build pipeline. It looks for a vmlinux file in the following
order:

1. If the O= environment variable is defined, at `$O/vmlinux`
2. If the KBUILD_OUTPUT= environment variable is defined, at
   `$KBUILD_OUTPUT/vmlinux`
3. At `../../vmlinux` (i.e. at the root of the kernel tree where you're
   compiling the schedulers)
4. `/sys/kernel/btf/vmlinux`
5. `/boot/vmlinux-$(uname -r)`

In other words, if you have compiled a kernel in your local repo, its vmlinux
file will be used to generate vmlinux.h. Otherwise, it will be the vmlinux of
the kernel you're currently running on. This means that if you're running on a
kernel with sched_ext support, you may not need to compile a local kernel at
all.

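To see which vmlinux would be picked up on your system, the lookup order above
can be approximated in shell. This is a sketch of the documented order only,
not the build system's actual implementation; `find_vmlinux` and its directory
argument are hypothetical.

```bash
# Print the first existing vmlinux candidate, following the documented order.
# $1 is the directory the schedulers are built from (defaults to the current
# directory), so that ../../vmlinux can be resolved relative to it.
find_vmlinux() {
    build_dir="${1:-.}"
    for candidate in \
        ${O:+"$O/vmlinux"} \
        ${KBUILD_OUTPUT:+"$KBUILD_OUTPUT/vmlinux"} \
        "$build_dir/../../vmlinux" \
        /sys/kernel/btf/vmlinux \
        "/boot/vmlinux-$(uname -r)"; do
        if [ -e "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Example invocation (commented out):
# find_vmlinux . || echo "no vmlinux found"
```

The `${VAR:+...}` expansions drop the first two candidates entirely when `O`
or `KBUILD_OUTPUT` is unset or empty, which mirrors the "if defined" wording in
the list.
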
### Aside on CO-RE

One of the cooler features of BPF is that it supports
[CO-RE](https://nakryiko.com/posts/bpf-core-reference-guide/) (Compile Once Run
Everywhere). This feature allows you to reference fields inside of structs with
types defined internal to the kernel, and not have to recompile if you load the
BPF program on a different kernel with the field at a different offset. In our
example above, we print out a task name with `p->comm`. CO-RE would perform
relocations for that access when the program is loaded to ensure that it's
referencing the correct offset for the currently running kernel.

## Compiling the schedulers

Once you have your toolchain set up, and a vmlinux that can be used to generate
a full vmlinux.h file, you can compile the schedulers using `make`:

```bash
$ make -j$(nproc)
```

# Example schedulers

This directory contains the following example schedulers. These schedulers are
for testing and demonstrating different aspects of sched_ext. While some may be
useful in limited scenarios, they are not intended to be practical.

For more scheduler implementations, tools and documentation, visit
https://github.com/sched-ext/scx.

## scx_simple

A simple scheduler that provides an example of a minimal sched_ext scheduler.
scx_simple can be run in either global weighted vtime mode, or FIFO mode.

Though very simple, in limited scenarios, this scheduler can perform reasonably
well on single-socket systems with a unified L3 cache.

## scx_qmap

Another simple, yet slightly more complex scheduler that provides an example of
a basic weighted FIFO queuing policy. It also provides examples of some common
useful BPF features, such as sleepable per-task storage allocation in the
`ops.prep_enable()` callback, and using the `BPF_MAP_TYPE_QUEUE` map type to
enqueue tasks. It also illustrates how core-sched support could be implemented.

## scx_central

A "central" scheduler where scheduling decisions are made from a single CPU.
This scheduler illustrates how scheduling decisions can be dispatched from a
single CPU, allowing other cores to run with infinite slices, without timer
ticks, and without having to incur the overhead of making scheduling decisions.

The approach demonstrated by this scheduler may be useful for any workload that
benefits from minimizing scheduling overhead and timer ticks. An example of
where this could be particularly useful is running VMs, where running with
infinite slices and no timer ticks allows the VM to avoid unnecessary expensive
vmexits.


# Troubleshooting

There are a number of common issues that you may run into when building the
schedulers. We'll go over some of the common ones here.

## Build Failures

### Old version of clang

```
error: static assertion failed due to requirement 'SCX_DSQ_FLAG_BUILTIN': bpftool generated vmlinux.h is missing high bits for 64bit enums, upgrade clang and pahole
_Static_assert(SCX_DSQ_FLAG_BUILTIN,
^~~~~~~~~~~~~~~~~~~~
1 error generated.
```

This means you built the kernel or the schedulers with an older version of
clang than what's supported (i.e. older than 16.0.0). To remediate this:

1. Run `which clang` and `clang --version` to make sure you're using a
   sufficiently new version of clang.

2. Run `make fullclean` in the root path of the repository.

3. Rebuild the kernel, and then your example schedulers.

The schedulers are also cleaned if you invoke `make mrproper` in the root
directory of the tree.

### Stale kernel build / incomplete vmlinux.h file

As described above, you'll need a `vmlinux.h` file that was generated from a
vmlinux built with BTF, and with sched_ext support enabled. If you don't,
you'll see errors such as the following, which indicate that a type being
referenced in a scheduler is unknown:

```
/path/to/sched_ext/tools/sched_ext/user_exit_info.h:25:23: note: forward declaration of 'struct scx_exit_info'

const struct scx_exit_info *ei)

^
```

To resolve this, please follow the steps above in
[Getting a vmlinux.h file](#getting-a-vmlinuxh-file) to ensure your
schedulers are using a vmlinux.h file that includes the requisite types.

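One way to confirm that a given vmlinux.h includes the sched_ext types is to
search it for their definitions. This is a rough sketch: `has_sched_ext_types`
is a hypothetical helper, the two type names are the ones mentioned in this
README, and the pattern assumes that definitions in the generated header start
a line as `struct <name> {`.

```bash
# Return success if the given vmlinux.h defines the sched_ext types that the
# schedulers in this README reference.
has_sched_ext_types() {
    header="$1"
    for name in sched_ext_ops scx_exit_info; do
        grep -q "^struct ${name} {" "$header" || return 1
    done
}

# Example invocation (commented out; the path is a placeholder):
# has_sched_ext_types /path/to/vmlinux.h || echo "vmlinux.h lacks sched_ext types"
```
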
## Misc

### llvm: [OFF]

You may see the following output when building the schedulers:

```
Auto-detecting system features:
... clang-bpf-co-re: [ on ]
... llvm: [ OFF ]
... libcap: [ on ]
... libbfd: [ on ]
```

Seeing `llvm: [ OFF ]` here is not an issue. You can safely ignore it.