linux-stable/io_uring
Jens Axboe 7029acd8a9 io_uring/rsrc: get rid of per-ring io_rsrc_node list
Work in progress, but get rid of the per-ring serialization of resource
nodes, like registered buffers and files. Main issue here is that one
node can otherwise hold up a bunch of other nodes from getting freed,
which is especially a problem for file resource nodes and networked
workloads where some descriptors may not see activity in a long time.

As an example, instantiate an io_uring ring fd and create a sparse
registered file table. Even 2 will do. Then create a socket and register
it as fixed file 0, F0. The number of open files in the app is now 5,
with 0/1/2 being the usual stdin/out/err, 3 being the ring fd, and 4
being the socket. Register this socket (eg "the listener") in slot 0 of
the registered file table. Now add an operation on the socket that uses
slot 0. Finally, loop N times, where each loop creates a new socket,
registers said socket as a file, then unregisters the socket, and
finally closes the socket. This is roughly similar to what a basic
accept loop would look like.

At the end of this loop, it's not unreasonable to expect that there
would still be 5 open files. Each socket created and registered in the
loop is also unregistered and closed. But since the listener socket
registered first still has references to its resource node due to still
being active, each subsequent socket unregistration is stuck behind it
for reclaim. Hence 5 + N files are still open at that point, where N is
awaiting the final put held up by the listener socket.

Rewrite the io_rsrc_node handling to NOT rely on serialization. Struct
io_kiocb now gets explicit resource nodes assigned, with each holding a
reference to the parent node. A parent node is either of type FILE or
BUFFER, which are the two types of nodes that exist. A request can have
two nodes assigned, if it's using both registered files and buffers.
Since request issue and task_work completion is both under the ring
private lock, no atomics are needed to handle these references. It's a
simple unlocked inc/dec. As before, the registered buffer or file table
each hold a reference as well to the registered nodes. Final put of the
node will remove the node and free the underlying resource, eg unmap the
buffer or put the file.

Outside of removing the stall in resource reclaim described above, it
has the following advantages:

1) It's a lot simpler than the previous scheme, and easier to follow.
   No need to specific quiesce handling anymore.

2) There are no resource node allocations in the fast path, all of that
   happens at resource registration time.

3) The structs related to resource handling can all get simplified
   quite a bit, like io_rsrc_node and io_rsrc_data. io_rsrc_put can
   go away completely.

4) Handling of resource tags is much simpler, and doesn't require
   persistent storage as it can simply get assigned up front at
   registration time. Just copy them in one-by-one at registration time
   and assign to the resource node.

The only real downside is that a request is now explicitly limited to
pinning 2 resources, one file and one buffer, where before just
assigning a resource node to a request would pin all of them. The upside
is that it's easier to follow now, as an individual resource is
explicitly referenced and assigned to the request.

With this in place, the above mentioned example will be using exactly 5
files at the end of the loop, not N.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-02 15:44:18 -06:00
..
advise.c io_uring/advise: support 64-bit lengths 2024-06-16 14:54:55 -06:00
advise.h io_uring: split out fadvise/madvise operations 2022-07-24 18:39:11 -06:00
alloc_cache.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
cancel.c io_uring/cancel: get rid of init_hash_table() helper 2024-10-29 13:43:27 -06:00
cancel.h io_uring/cancel: get rid of init_hash_table() helper 2024-10-29 13:43:27 -06:00
epoll.c io_uring: undeprecate epoll_ctl support 2023-05-26 20:22:41 -06:00
epoll.h io_uring: move epoll handler to its own file 2022-07-24 18:39:11 -06:00
eventfd.c io_uring/eventfd: move ctx->evfd_last_cq_tail into io_ev_fd 2024-10-29 13:43:26 -06:00
eventfd.h io_uring/eventfd: move eventfd handling to separate file 2024-06-16 14:54:55 -06:00
fdinfo.c io_uring/rsrc: get rid of per-ring io_rsrc_node list 2024-11-02 15:44:18 -06:00
fdinfo.h io_uring: move fdinfo helpers to its own file 2022-07-24 18:39:12 -06:00
filetable.c io_uring/rsrc: get rid of per-ring io_rsrc_node list 2024-11-02 15:44:18 -06:00
filetable.h io_uring/rsrc: get rid of per-ring io_rsrc_node list 2024-11-02 15:44:18 -06:00
fs.c io_uring/fs: consider link->flags when getting path for LINKAT 2023-11-20 09:01:42 -07:00
fs.h io_uring: split out filesystem related operations 2022-07-24 18:39:11 -06:00
futex.c io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
futex.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
io_uring.c io_uring/rsrc: get rid of per-ring io_rsrc_node list 2024-11-02 15:44:18 -06:00
io_uring.h io_uring: abstract out a bit of the ring filling logic 2024-10-29 13:43:27 -06:00
io-wq.c io_uring/io-wq: inherit cpuset of cgroup in io worker 2024-09-11 07:27:56 -06:00
io-wq.h io_uring/io-wq: make io_wq_work flags atomic 2024-06-16 14:54:55 -06:00
kbuf.c for-6.12/io_uring-20240913 2024-09-16 13:29:00 +02:00
kbuf.h io_uring/kbuf: add support for incremental buffer consumption 2024-08-29 08:44:58 -06:00
Makefile io_uring: add GCOV_PROFILE_URING Kconfig option 2024-08-30 10:52:02 -06:00
memmap.c io_uring/register: add IORING_REGISTER_RESIZE_RINGS 2024-10-29 13:43:27 -06:00
memmap.h io_uring: move mapping/allocation helpers to a separate file 2024-04-15 08:10:26 -06:00
msg_ring.c io_uring/msg_ring: add support for sending a sync message 2024-10-29 13:43:26 -06:00
msg_ring.h io_uring/msg_ring: add support for sending a sync message 2024-10-29 13:43:26 -06:00
napi.c io_uring: user registered clockid for wait timeouts 2024-08-25 08:27:01 -06:00
napi.h io_uring/napi: postpone napi timeout adjustment 2024-08-25 08:27:01 -06:00
net.c io_uring/rsrc: get rid of per-ring io_rsrc_node list 2024-11-02 15:44:18 -06:00
net.h io_uring: Introduce IORING_OP_LISTEN 2024-06-19 07:57:21 -06:00
nop.c io_uring/rsrc: get rid of per-ring io_rsrc_node list 2024-11-02 15:44:18 -06:00
nop.h io_uring: move nop into its own file 2022-07-24 18:39:11 -06:00
notif.c io_uring/rsrc: get rid of per-ring io_rsrc_node list 2024-11-02 15:44:18 -06:00
notif.h io_uring/notif: implement notification stacking 2024-04-22 19:31:18 -06:00
opdef.c io_uring/splice: open code 2nd direct file assignment 2024-10-29 13:43:28 -06:00
opdef.h io_uring: Fix probe of disabled operations 2024-06-19 08:58:00 -06:00
openclose.c io_uring: enable audit and restrict cred override for IORING_OP_FIXED_FD_INSTALL 2024-01-23 15:25:14 -07:00
openclose.h io_uring/openclose: add support for IORING_OP_FIXED_FD_INSTALL 2023-12-12 07:42:57 -07:00
poll.c io_uring/poll: get rid of per-hashtable bucket locks 2024-10-29 13:43:27 -06:00
poll.h io_uring/poll: shrink alloc cache size to 32 2024-04-15 08:10:25 -06:00
refs.h io_uring: kill dead code in io_req_complete_post 2024-04-15 08:10:26 -06:00
register.c io_uring: add support for fixed wait regions 2024-10-29 13:43:28 -06:00
register.h io_uring: add support for fixed wait regions 2024-10-29 13:43:28 -06:00
rsrc.c io_uring/rsrc: get rid of per-ring io_rsrc_node list 2024-11-02 15:44:18 -06:00
rsrc.h io_uring/rsrc: get rid of per-ring io_rsrc_node list 2024-11-02 15:44:18 -06:00
rw.c io_uring/rsrc: get rid of per-ring io_rsrc_node list 2024-11-02 15:44:18 -06:00
rw.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
slist.h io_uring: silence variable ‘prev’ set but not used warning 2023-03-09 10:10:58 -07:00
splice.c io_uring/rsrc: get rid of per-ring io_rsrc_node list 2024-11-02 15:44:18 -06:00
splice.h io_uring/splice: open code 2nd direct file assignment 2024-10-29 13:43:28 -06:00
sqpoll.c io_uring/sqpoll: wait on sqd->wait for thread parking 2024-10-29 13:43:27 -06:00
sqpoll.h io_uring/sqpoll: statistics of the true utilization of sq threads 2024-03-01 06:28:19 -07:00
statx.c vfs: retire user_path_at_empty and drop empty arg from getname_flags 2024-06-05 17:03:57 +02:00
statx.h io_uring: move statx handling to its own file 2022-07-24 18:39:11 -06:00
sync.c io_uring: for requests that require async, force it 2023-01-29 15:18:26 -07:00
sync.h io_uring: split out fs related sync/fallocate functions 2022-07-24 18:39:11 -06:00
tctx.c io_uring: Add io_uring_setup flag to pre-register ring fd and never install it 2023-05-16 08:06:00 -06:00
tctx.h io_uring: simplify __io_uring_add_tctx_node 2022-10-07 12:25:30 -06:00
timeout.c io_uring: fix io_match_task must_hold 2024-07-24 08:01:49 -06:00
timeout.h io_uring: remove unused return from io_disarm_next 2022-09-21 13:15:01 -06:00
truncate.c io_uring: add support for ftruncate 2024-02-09 09:04:39 -07:00
truncate.h io_uring: add support for ftruncate 2024-02-09 09:04:39 -07:00
uring_cmd.c io_uring/rsrc: get rid of per-ring io_rsrc_node list 2024-11-02 15:44:18 -06:00
uring_cmd.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
waitid.c io_uring: remove struct io_tw_state::locked 2024-04-15 08:10:24 -06:00
waitid.h io_uring: add IORING_OP_WAITID support 2023-09-21 12:04:45 -06:00
xattr.c vfs: retire user_path_at_empty and drop empty arg from getname_flags 2024-06-05 17:03:57 +02:00
xattr.h io_uring: move xattr related opcodes to its own file 2022-07-24 18:39:11 -06:00