linux-next/io_uring
Jens Axboe aa00f67adc io_uring: add support for fixed wait regions
Generally applications have 1 or a few waits of waiting, yet they pass
in a struct io_uring_getevents_arg every time. This needs to get copied
and, in turn, the timeout value needs to get copied.

Rather than do this for every invocation, allow the application to
register a fixed set of wait regions that can simply be indexed when
asking the kernel to wait on events.

At ring setup time, the application can register a number of these wait
regions and initialize region/index 0 upfront:

	struct io_uring_reg_wait *reg;

	reg = io_uring_setup_reg_wait(ring, nr_regions, &ret);

	/* set timeout and mark as set, sigmask/sigmask_sz as needed */
	reg->ts.tv_sec = 0;
	reg->ts.tv_nsec = 100000;
	reg->flags = IORING_REG_WAIT_TS;

where nr_regions >= 1 && nr_regions <= PAGE_SIZE / sizeof(*reg). The
above initializes index 0, but 63 other regions can be initialized,
if needed. Now, instead of doing:

	struct __kernel_timespec timeout = { .tv_nsec = 100000, };

	io_uring_submit_and_wait_timeout(ring, &cqe, nr, &t, NULL);

to wait for events for each submit_and_wait, or just wait, operation, it
can just reference the above region at offset 0 and do:

	io_uring_submit_and_wait_reg(ring, &cqe, nr, 0);

to achieve the same goal of waiting 100usec without needing to copy
both struct io_uring_getevents_arg (24b) and struct __kernel_timeout
(16b) for each invocation. Struct io_uring_reg_wait looks as follows:

struct io_uring_reg_wait {
	struct __kernel_timespec	ts;
	__u32				min_wait_usec;
	__u32				flags;
	__u64				sigmask;
	__u32				sigmask_sz;
	__u32				pad[3];
	__u64				pad2[2];
};

embedding the timeout itself in the region, rather than passing it as
a pointer as well. Note that the signal mask is still passed as a
pointer, both for compatability reasons, but also because there doesn't
seem to be a lot of high frequency waits scenarios that involve setting
and resetting the signal mask for each wait.

The application is free to modify any region before a wait call, or it
can use keep multiple regions with different settings to avoid needing to
modify the same one for wait calls. Up to a page size of regions is mapped
by default, allowing PAGE_SIZE / 64 available regions for use.

The registered region must fit within a page. On a 4kb page size system,
that allows for 64 wait regions if a full page is used, as the size of
struct io_uring_reg_wait is 64b. The region registered must be aligned
to io_uring_reg_wait in size. It's valid to register less than 64
entries.

In network performance testing with zero-copy, this reduced the time
spent waiting on the TX side from 3.12% to 0.3% and the RX side from 4.4%
to 0.3%.

Wait regions are fixed for the lifetime of the ring - once registered,
they are persistent until the ring is torn down. The regions support
minimum wait timeout as well as the regular waits.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29 13:43:28 -06:00
..
advise.c io_uring/advise: support 64-bit lengths 2024-06-16 14:54:55 -06:00
advise.h io_uring: split out fadvise/madvise operations 2022-07-24 18:39:11 -06:00
alloc_cache.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
cancel.c io_uring/cancel: get rid of init_hash_table() helper 2024-10-29 13:43:27 -06:00
cancel.h io_uring/cancel: get rid of init_hash_table() helper 2024-10-29 13:43:27 -06:00
epoll.c io_uring: undeprecate epoll_ctl support 2023-05-26 20:22:41 -06:00
epoll.h io_uring: move epoll handler to its own file 2022-07-24 18:39:11 -06:00
eventfd.c io_uring/eventfd: move ctx->evfd_last_cq_tail into io_ev_fd 2024-10-29 13:43:26 -06:00
eventfd.h io_uring/eventfd: move eventfd handling to separate file 2024-06-16 14:54:55 -06:00
fdinfo.c io_uring/poll: get rid of unlocked cancel hash 2024-10-29 13:43:27 -06:00
fdinfo.h io_uring: move fdinfo helpers to its own file 2022-07-24 18:39:12 -06:00
filetable.c io_uring/filetable: don't unnecessarily clear/reset bitmap 2024-05-08 08:27:45 -06:00
filetable.h io_uring: expand main struct io_kiocb flags to 64-bits 2024-02-08 13:27:03 -07:00
fs.c io_uring/fs: consider link->flags when getting path for LINKAT 2023-11-20 09:01:42 -07:00
fs.h io_uring: split out filesystem related operations 2022-07-24 18:39:11 -06:00
futex.c io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
futex.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
io_uring.c io_uring: add support for fixed wait regions 2024-10-29 13:43:28 -06:00
io_uring.h io_uring: abstract out a bit of the ring filling logic 2024-10-29 13:43:27 -06:00
io-wq.c io_uring/io-wq: inherit cpuset of cgroup in io worker 2024-09-11 07:27:56 -06:00
io-wq.h io_uring/io-wq: make io_wq_work flags atomic 2024-06-16 14:54:55 -06:00
kbuf.c for-6.12/io_uring-20240913 2024-09-16 13:29:00 +02:00
kbuf.h io_uring/kbuf: add support for incremental buffer consumption 2024-08-29 08:44:58 -06:00
Makefile io_uring: add GCOV_PROFILE_URING Kconfig option 2024-08-30 10:52:02 -06:00
memmap.c io_uring/register: add IORING_REGISTER_RESIZE_RINGS 2024-10-29 13:43:27 -06:00
memmap.h io_uring: move mapping/allocation helpers to a separate file 2024-04-15 08:10:26 -06:00
msg_ring.c io_uring/msg_ring: add support for sending a sync message 2024-10-29 13:43:26 -06:00
msg_ring.h io_uring/msg_ring: add support for sending a sync message 2024-10-29 13:43:26 -06:00
napi.c io_uring: user registered clockid for wait timeouts 2024-08-25 08:27:01 -06:00
napi.h io_uring/napi: postpone napi timeout adjustment 2024-08-25 08:27:01 -06:00
net.c io_uring/net: clean up io_msg_copy_hdr 2024-10-29 13:43:27 -06:00
net.h io_uring: Introduce IORING_OP_LISTEN 2024-06-19 07:57:21 -06:00
nop.c io_uring: support to inject result for NOP 2024-05-10 06:09:45 -06:00
nop.h io_uring: move nop into its own file 2022-07-24 18:39:11 -06:00
notif.c io_uring/notif: disable LAZY_WAKE for linked notifs 2024-04-30 13:06:27 -06:00
notif.h io_uring/notif: implement notification stacking 2024-04-22 19:31:18 -06:00
opdef.c io_uring: Fix probe of disabled operations 2024-06-19 08:58:00 -06:00
opdef.h io_uring: Fix probe of disabled operations 2024-06-19 08:58:00 -06:00
openclose.c io_uring: enable audit and restrict cred override for IORING_OP_FIXED_FD_INSTALL 2024-01-23 15:25:14 -07:00
openclose.h io_uring/openclose: add support for IORING_OP_FIXED_FD_INSTALL 2023-12-12 07:42:57 -07:00
poll.c io_uring/poll: get rid of per-hashtable bucket locks 2024-10-29 13:43:27 -06:00
poll.h io_uring/poll: shrink alloc cache size to 32 2024-04-15 08:10:25 -06:00
refs.h io_uring: kill dead code in io_req_complete_post 2024-04-15 08:10:26 -06:00
register.c io_uring: add support for fixed wait regions 2024-10-29 13:43:28 -06:00
register.h io_uring: add support for fixed wait regions 2024-10-29 13:43:28 -06:00
rsrc.c io_uring/rsrc: don't assign bvec twice in io_import_fixed() 2024-10-29 13:43:27 -06:00
rsrc.h io_uring: remove 'issue_flags' argument for io_req_set_rsrc_node() 2024-10-29 13:43:27 -06:00
rw.c io_uring: remove 'issue_flags' argument for io_req_set_rsrc_node() 2024-10-29 13:43:27 -06:00
rw.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
slist.h io_uring: silence variable ‘prev’ set but not used warning 2023-03-09 10:10:58 -07:00
splice.c splice: return type ssize_t from all helpers 2023-12-12 16:19:59 +01:00
splice.h io_uring: split out splice related operations 2022-07-24 18:39:11 -06:00
sqpoll.c io_uring/sqpoll: wait on sqd->wait for thread parking 2024-10-29 13:43:27 -06:00
sqpoll.h io_uring/sqpoll: statistics of the true utilization of sq threads 2024-03-01 06:28:19 -07:00
statx.c vfs: retire user_path_at_empty and drop empty arg from getname_flags 2024-06-05 17:03:57 +02:00
statx.h io_uring: move statx handling to its own file 2022-07-24 18:39:11 -06:00
sync.c io_uring: for requests that require async, force it 2023-01-29 15:18:26 -07:00
sync.h io_uring: split out fs related sync/fallocate functions 2022-07-24 18:39:11 -06:00
tctx.c io_uring: Add io_uring_setup flag to pre-register ring fd and never install it 2023-05-16 08:06:00 -06:00
tctx.h io_uring: simplify __io_uring_add_tctx_node 2022-10-07 12:25:30 -06:00
timeout.c io_uring: fix io_match_task must_hold 2024-07-24 08:01:49 -06:00
timeout.h io_uring: remove unused return from io_disarm_next 2022-09-21 13:15:01 -06:00
truncate.c io_uring: add support for ftruncate 2024-02-09 09:04:39 -07:00
truncate.h io_uring: add support for ftruncate 2024-02-09 09:04:39 -07:00
uring_cmd.c io_uring: remove 'issue_flags' argument for io_req_set_rsrc_node() 2024-10-29 13:43:27 -06:00
uring_cmd.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
waitid.c io_uring: remove struct io_tw_state::locked 2024-04-15 08:10:24 -06:00
waitid.h io_uring: add IORING_OP_WAITID support 2023-09-21 12:04:45 -06:00
xattr.c vfs: retire user_path_at_empty and drop empty arg from getname_flags 2024-06-05 17:03:57 +02:00
xattr.h io_uring: move xattr related opcodes to its own file 2022-07-24 18:39:11 -06:00