linux-stable/io_uring
Linus Torvalds 9961a78594 for-6.10/io_uring-20240511
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmY/YdYQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpnmVEADBq8QT9Oa3HTIONHwxjmGMOalr7PSrBP89
 S6Inv/l+3xDlyolyLh1HIXUC84iS9Ihi2pNC3dZct4fNcpA99H0CFaHDGwZ5rVri
 MrFaubZAps1qSzeypqEq3zWGKVUoaYWaOKhuOjye5Ei2tKymbguhDKl1WiKibD21
 E9qOYbhSUFdub/xtx9Rv4BS05QW5bHZ2Y/tTFqB8MY4JUsdb9g/deVZkyGUQYRSd
 40mDallRldjQQTQ8iU4H6/ORdGIN/90aLPbmzMdFtQcymnmRyid3rOEwhwWYe4NO
 ljnI8m1SJQilZz1d5oHBXBB5QubVptY1JWxbk8GQCSmOU5wrCq+ARCJXUtBXwniJ
 K4VFsGm9MkZcc5vsIwIzvsrk8DODla6EVo/jyDy8iFceZcNWfVxdwa5NS67V/6QT
 macbF785XDsmA5E4UjslbZqU047w+A5N1yazcZWzMk0coJDeB8AtsA1/C2WZOm8p
 HVoiAzsqt81hvPItnjCyZluL/YW+BKeOTnq04QbpQKcJpZBzszO4ZLtuD+IXkE69
 8ZZPGFPnPS4ZMQojKkwsBr+Yo65S18oBDkib36mr2lsdnoWTpGq47C7ScUDBbqGm
 iI7U8tYMnVVkQQHVVmGI4KOr5/4lxxp8398kqCaxfW3D5BQhbtUOF/OBjBHj1ZSV
 9aZx87CyhA==
 =DwAV
 -----END PGP SIGNATURE-----

Merge tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux

Pull io_uring updates from Jens Axboe:

 - Greatly improve send zerocopy performance, by enabling coalescing of
   sent buffers.

   MSG_ZEROCOPY already does this with send(2) and sendmsg(2), but the
   io_uring side did not. In local testing, the crossover point for send
   zerocopy being faster is now around 3000 byte packets, and it
   performs better than the sync syscall variants as well.

   This feature relies on a shared branch with net-next, which was
   pulled into both branches.

 - Unification of how async preparation is done across opcodes.

   Previously, opcodes that required extra memory for async retry would
   allocate that as needed, using on-stack state until that was the
   case. If async retry was needed, the on-stack state was adjusted
   appropriately for a retry and then copied to the allocated memory.

   This led to some fragile and ugly code, particularly for read/write
   handling, and made storage retries more difficult than they needed to
   be. Allocate the memory upfront, as it's cheap from our pools, and
   use that state consistently both initially and also from the retry
   side.

 - Move away from using remap_pfn_range() for mapping the rings.

   This is really not the right interface to use and can cause lifetime
   issues or leaks. Additionally, it means the ring sq/cq arrays need to
   be physically contigious, which can cause problems in production with
   larger rings when services are restarted, as memory can be very
   fragmented at that point.

   Move to using vm_insert_page(s) for the ring sq/cq arrays, and apply
   the same treatment to mapped ring provided buffers. This also helps
   unify the code we have dealing with allocating and mapping memory.

   Hard to see in the diffstat as we're adding a few features as well,
   but this kills about ~400 lines of code from the codebase as well.

 - Add support for bundles for send/recv.

   When used with provided buffers, bundles support sending or receiving
   more than one buffer at the time, improving the efficiency by only
   needing to call into the networking stack once for multiple sends or
   receives.

 - Tweaks for our accept operations, supporting both a DONTWAIT flag for
   skipping poll arm and retry if we can, and a POLLFIRST flag that the
   application can use to skip the initial accept attempt and rely
   purely on poll for triggering the operation. Both of these have
   identical flags on the receive side already.

 - Make the task_work ctx locking unconditional.

   We had various code paths here that would do a mix of lock/trylock
   and set the task_work state to whether or not it was locked. All of
   that goes away, we lock it unconditionally and get rid of the state
   flag indicating whether it's locked or not.

   The state struct still exists as an empty type, can go away in the
   future.

 - Add support for specifying NOP completion values, allowing it to be
   used for error handling testing.

 - Use set/test bit for io-wq worker flags. Not strictly needed, but
   also doesn't hurt and helps silence a KCSAN warning.

 - Cleanups for io-wq locking and work assignments, closing a tiny race
   where cancelations would not be able to find the work item reliably.

 - Misc fixes, cleanups, and improvements

* tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux: (97 commits)
  io_uring: support to inject result for NOP
  io_uring: fail NOP if non-zero op flags is passed in
  io_uring/net: add IORING_ACCEPT_POLL_FIRST flag
  io_uring/net: add IORING_ACCEPT_DONTWAIT flag
  io_uring/filetable: don't unnecessarily clear/reset bitmap
  io_uring/io-wq: Use set_bit() and test_bit() at worker->flags
  io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring
  io_uring: Require zeroed sqe->len on provided-buffers send
  io_uring/notif: disable LAZY_WAKE for linked notifs
  io_uring/net: fix sendzc lazy wake polling
  io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it
  io_uring/rw: reinstate thread check for retries
  io_uring/notif: implement notification stacking
  io_uring/notif: simplify io_notif_flush()
  net: add callback for setting a ubuf_info to skb
  net: extend ubuf_info callback to ops structure
  io_uring/net: support bundles for recv
  io_uring/net: support bundles for send
  io_uring/kbuf: add helpers for getting/peeking multiple buffers
  io_uring/net: add provided buffer support for IORING_OP_SEND
  ...
2024-05-13 12:48:06 -07:00
..
advise.c io_uring: always go async for unsupported fadvise flags 2023-01-29 15:18:26 -07:00
advise.h io_uring: split out fadvise/madvise operations 2022-07-24 18:39:11 -06:00
alloc_cache.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
cancel.c io_uring: fix warnings on shadow variables 2024-04-15 08:10:26 -06:00
cancel.h io_uring/cancel: don't default to setting req->work.cancel_seq 2024-02-08 13:27:06 -07:00
epoll.c io_uring: undeprecate epoll_ctl support 2023-05-26 20:22:41 -06:00
epoll.h io_uring: move epoll handler to its own file 2022-07-24 18:39:11 -06:00
fdinfo.c io_uring: fix warnings on shadow variables 2024-04-15 08:10:26 -06:00
fdinfo.h io_uring: move fdinfo helpers to its own file 2022-07-24 18:39:12 -06:00
filetable.c io_uring/filetable: don't unnecessarily clear/reset bitmap 2024-05-08 08:27:45 -06:00
filetable.h io_uring: expand main struct io_kiocb flags to 64-bits 2024-02-08 13:27:03 -07:00
fs.c io_uring/fs: consider link->flags when getting path for LINKAT 2023-11-20 09:01:42 -07:00
fs.h io_uring: split out filesystem related operations 2022-07-24 18:39:11 -06:00
futex.c io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
futex.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
io_uring.c for-6.10/io_uring-20240511 2024-05-13 12:48:06 -07:00
io_uring.h io_uring/rw: reinstate thread check for retries 2024-04-25 09:04:32 -06:00
io-wq.c io_uring/io-wq: Use set_bit() and test_bit() at worker->flags 2024-05-07 13:17:19 -06:00
io-wq.h io_uring: break out of iowq iopoll on teardown 2023-09-07 09:02:27 -06:00
kbuf.c io_uring/kbuf: add helpers for getting/peeking multiple buffers 2024-04-22 11:26:01 -06:00
kbuf.h io_uring/kbuf: add helpers for getting/peeking multiple buffers 2024-04-22 11:26:01 -06:00
Makefile io_uring: move mapping/allocation helpers to a separate file 2024-04-15 08:10:26 -06:00
memmap.c io_uring: move mapping/allocation helpers to a separate file 2024-04-15 08:10:26 -06:00
memmap.h io_uring: move mapping/allocation helpers to a separate file 2024-04-15 08:10:26 -06:00
msg_ring.c io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring 2024-05-01 17:13:51 -06:00
msg_ring.h io_uring: get rid of double locking 2022-12-07 06:47:13 -07:00
napi.c io_uring/napi: enable even with a timeout of 0 2024-02-15 15:37:28 -07:00
napi.h io_uring: add register/unregister napi function 2024-02-09 11:54:32 -07:00
net.c io_uring/net: add IORING_ACCEPT_POLL_FIRST flag 2024-05-09 12:22:11 -06:00
net.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
nop.c io_uring: support to inject result for NOP 2024-05-10 06:09:45 -06:00
nop.h io_uring: move nop into its own file 2022-07-24 18:39:11 -06:00
notif.c io_uring/notif: disable LAZY_WAKE for linked notifs 2024-04-30 13:06:27 -06:00
notif.h io_uring/notif: implement notification stacking 2024-04-22 19:31:18 -06:00
opdef.c io_uring/net: add provided buffer support for IORING_OP_SEND 2024-04-22 11:25:56 -06:00
opdef.h io_uring: drop ->prep_async() 2024-04-15 08:10:25 -06:00
openclose.c io_uring: enable audit and restrict cred override for IORING_OP_FIXED_FD_INSTALL 2024-01-23 15:25:14 -07:00
openclose.h io_uring/openclose: add support for IORING_OP_FIXED_FD_INSTALL 2023-12-12 07:42:57 -07:00
poll.c io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
poll.h io_uring/poll: shrink alloc cache size to 32 2024-04-15 08:10:25 -06:00
refs.h io_uring: kill dead code in io_req_complete_post 2024-04-15 08:10:26 -06:00
register.c io_uring: fix warnings on shadow variables 2024-04-15 08:10:26 -06:00
register.h io_uring/register: move io_uring_register(2) related code to register.c 2023-12-19 08:54:20 -07:00
rsrc.c io_uring: move mapping/allocation helpers to a separate file 2024-04-15 08:10:26 -06:00
rsrc.h io_uring: remove io_req_put_rsrc_locked() 2024-04-15 08:10:26 -06:00
rw.c for-6.10/io_uring-20240511 2024-05-13 12:48:06 -07:00
rw.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
slist.h io_uring: silence variable ‘prev’ set but not used warning 2023-03-09 10:10:58 -07:00
splice.c splice: return type ssize_t from all helpers 2023-12-12 16:19:59 +01:00
splice.h io_uring: split out splice related operations 2022-07-24 18:39:11 -06:00
sqpoll.c io_uring/sqpoll: work around a potential audit memory leak 2024-04-15 13:06:19 -06:00
sqpoll.h io_uring/sqpoll: statistics of the true utilization of sq threads 2024-03-01 06:28:19 -07:00
statx.c io_uring: for requests that require async, force it 2023-01-29 15:18:26 -07:00
statx.h io_uring: move statx handling to its own file 2022-07-24 18:39:11 -06:00
sync.c io_uring: for requests that require async, force it 2023-01-29 15:18:26 -07:00
sync.h io_uring: split out fs related sync/fallocate functions 2022-07-24 18:39:11 -06:00
tctx.c io_uring: Add io_uring_setup flag to pre-register ring fd and never install it 2023-05-16 08:06:00 -06:00
tctx.h io_uring: simplify __io_uring_add_tctx_node 2022-10-07 12:25:30 -06:00
timeout.c io_uring/timeout: remove duplicate initialization of the io_timeout list. 2024-04-15 08:10:27 -06:00
timeout.h io_uring: remove unused return from io_disarm_next 2022-09-21 13:15:01 -06:00
truncate.c io_uring: add support for ftruncate 2024-02-09 09:04:39 -07:00
truncate.h io_uring: add support for ftruncate 2024-02-09 09:04:39 -07:00
uring_cmd.c io_uring: separate header for exported net bits 2024-04-15 08:10:26 -06:00
uring_cmd.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
waitid.c io_uring: remove struct io_tw_state::locked 2024-04-15 08:10:24 -06:00
waitid.h io_uring: add IORING_OP_WAITID support 2023-09-21 12:04:45 -06:00
xattr.c io_uring: use file_mnt_idmap helper 2024-02-06 19:55:14 -07:00
xattr.h io_uring: move xattr related opcodes to its own file 2022-07-24 18:39:11 -06:00