Commit Graph

184 Commits

Author SHA1 Message Date
Dylan Yudaken
aebb224fd4 io_uring: for requests that require async, force it
Some requests require being run async as they do not support
non-blocking. Instead of trying to issue these requests, getting -EAGAIN
and then queueing them for async issue, rather just force async upfront.

Add WARN_ON_ONCE to make sure surprising code paths do not come up,
however in those cases the bug would end up being a blocking
io_uring_enter(2) which should not be critical.

Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20230127135227.3646353-3-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:18:26 -07:00
Jens Axboe
b00c51ef8f io_uring/net: cache provided buffer group value for multishot receives
If we're using ring provided buffers with multishot receive, and we end
up doing an io-wq based issue at some points that also needs to select
a buffer, we'll lose the initially assigned buffer group as
io_ring_buffer_select() correctly clears the buffer group list as the
issue isn't serialized by the ctx uring_lock. This is fine for normal
receives as the request puts the buffer and finishes, but for multishot,
we will re-arm and do further receives. On the next trigger for this
multishot receive, the receive will try and pick from a buffer group
whose value is the same as the buffer ID of the las receive. That is
obviously incorrect, and will result in a premature -ENOUFS error for
the receive even if we had available buffers in the correct group.

Cache the buffer group value at prep time, so we can restore it for
future receives. This only needs doing for the above mentioned case, but
just do it by default to keep it easier to read.

Cc: stable@vger.kernel.org
Fixes: b3fdea6ecb ("io_uring: multishot recv")
Fixes: 9bb66906f2 ("io_uring: support multishot in recvmsg")
Cc: Dylan Yudaken <dylany@meta.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-23 07:08:08 -07:00
Jens Axboe
4b61152e10 io_uring: switch network send/recv to ITER_UBUF
This is more efficient than iter_iov.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
[merged to 6.2]
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-01-08 20:59:17 -07:00
Pavel Begunkov
6c3e8955d4 io_uring/net: fix cleanup after recycle
Don't access io_async_msghdr io_netmsg_recycle(), it may be reallocated.

Cc: stable@vger.kernel.org
Fixes: 9bb66906f2 ("io_uring: support multishot in recvmsg")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9e326f4ad4046ddadf15bf34bf3fa58c6372f6b5.1671461985.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-19 08:28:28 -07:00
Jens Axboe
990a4de57e io_uring/net: ensure compat import handlers clear free_iov
If we're not allocating the vectors because the count is below
UIO_FASTIOV, we still do need to properly clear ->free_iov to prevent
an erronous free of on-stack data.

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Fixes: 4c17a496a7 ("io_uring/net: fix cleanup double free free_iov init")
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-19 07:35:16 -07:00
Linus Torvalds
96f7e448b9 for-6.2/io_uring-next-2022-12-08
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmOSZJAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpp+OD/9M04tGsVFCdqtKty5lBlDs03OnA03f404c
 Ottp2pop2JrBx7+ycS/dl78MQjmghh8Ectel8kOiswDeRQc98TtzWY31DF1d56yS
 aGYCq2Ww2gx5ziYmJgiFU7RRLTFlpfa/vZUBMK4HW4MYm2ihxtfNc72Oa8H9KGDJ
 /RYk5+PSCau+UFwyWu91rORVNRXjLr1mFmgzRTmFhL2unYYuOO83mK4GpK2f8rHx
 qpT7Wn9IS9xiTpr8rHqs8y6rxV6+Tnv/HqR8kKoviHvQU/u6fzvKSNEu1NvK/Znm
 V0z8cI4JJZUelDyqJe5ITq8cIS59amzILEIneclYQd20NkqcFYPlS56K6qR9qL7J
 6eNHvgH7iKvnk9JlR2soKojC6KWEPtVni2BjPEXgXHrfUWdINMKrT6MwTNOjztWj
 h+EaqLBGQb5m/nRCCMeE9kfUK23Rg6Ev+H+aas0SgqD5Isg/hVG+aMtjWLmWquCU
 pKg2UqxqsR9ymKj92KJSoN7F8Z2U0JsoHBKzAammlnmfhxl294RjGqBMJjjI5eS9
 Zu+fTte4EuFY5s/TvE5FCBmQ0Oeg+ud2f+GKXDzF25equtct8QCfHIbguM6Yr3X2
 3ANYGtLwP7Cj96U1Y++RWgTpBTTKwGkWyzEyS9SN/+MRXhIjeP8JZeGdXBDWaXLC
 Vp7j39Avgg==
 =MzhS
 -----END PGP SIGNATURE-----

Merge tag 'for-6.2/io_uring-next-2022-12-08' of git://git.kernel.dk/linux

Pull io_uring updates part two from Jens Axboe:

 - Misc fixes (me, Lin)

 - Series from Pavel extending the single task exclusive ring mode,
   yielding nice improvements for the common case of having a single
   ring per thread (Pavel)

 - Cleanup for MSG_RING, removing our IOPOLL hack (Pavel)

 - Further poll cleanups and fixes (Pavel)

 - Misc cleanups and fixes (Pavel)

* tag 'for-6.2/io_uring-next-2022-12-08' of git://git.kernel.dk/linux: (22 commits)
  io_uring/msg_ring: flag target ring as having task_work, if needed
  io_uring: skip spinlocking for ->task_complete
  io_uring: do msg_ring in target task via tw
  io_uring: extract a io_msg_install_complete helper
  io_uring: get rid of double locking
  io_uring: never run tw and fallback in parallel
  io_uring: use tw for putting rsrc
  io_uring: force multishot CQEs into task context
  io_uring: complete all requests in task context
  io_uring: don't check overflow flush failures
  io_uring: skip overflow CQE posting for dying ring
  io_uring: improve io_double_lock_ctx fail handling
  io_uring: dont remove file from msg_ring reqs
  io_uring: reshuffle issue_flags
  io_uring: don't reinstall quiesce node for each tw
  io_uring: improve rsrc quiesce refs checks
  io_uring: don't raw spin unlock to match cq_lock
  io_uring: combine poll tw handlers
  io_uring: improve poll warning handling
  io_uring: remove ctx variable in io_poll_check_events
  ...
2022-12-13 10:40:31 -08:00
Linus Torvalds
54e60e505d for-6.2/io_uring-2022-12-08
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmOSYp4QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgWNEADTcHejQ5y8Ctc1BEsCiEnCbR5DeIwESBKN
 SfK7uK5N5GjKZgKeMtkdxNeckH2wSZ79VFpz2/b4fszM11H+P81r/PP+OwQojcra
 1YyPbp2jXlQZGEYijiXYDohnKr8NnJYj/nuwm4T73OPOD48ekbrY36/t3Hd75jKk
 M5L/HOpbKhA+fypPHYlm3XlwEM9/4wupsDeiabTuAeFpLh66V/h85ZLY91c3Bf7j
 yllzT2CCGQsh+fnqdovpsqUxM4Sh6sHfa2QhwkfJFfU9OLwDz0TLlcvSVwWWICD1
 DGeDE/MPBKZ4Z5zp8t94vTnZDAiExwkZbmCOnOkdYoPdMDqCDhp8a9WywV3IiJJi
 +hRN6Au6BWw8Rj3b3pbobfdXFJAvoHqXHM1SDAsWmEIMpuMhNalIpYqtna9JwRDe
 KV2ERMpGOUpmvQiRaJa5Wtz6hlFetmuav6Ij9DWb5f8+NXikiFAXk4g6TtD1Z7Cb
 15klwM8qW50zX7YUlzBLcjNdj7+HuIswLx0obZ3Uv1ogDwMQKXEvPaAHUwVN02q6
 cWP0P2Bc0EKvdwTSNpgNflhNsiV2DFJZpZfxkdPauN4t2EkJ38iEpHewffy0ecM7
 uyaXGQVQW7WFDuGno4cGbThQIG95MqkRPBhEyB4cOmcyS/aZ92ZFtV1iI8dVDA+v
 uuEIMc3OCA==
 =EgDc
 -----END PGP SIGNATURE-----

Merge tag 'for-6.2/io_uring-2022-12-08' of git://git.kernel.dk/linux

Pull io_uring updates from Jens Axboe:

 - Always ensure proper ordering in case of CQ ring overflow, which then
   means we can remove some work-arounds for that (Dylan)

 - Support completion batching for multishot, greatly increasing the
   efficiency for those (Dylan)

 - Flag epoll/eventfd wakeups done from io_uring, so that we can easily
   tell if we're recursing into io_uring again.

   Previously, this would have resulted in repeated multishot
   notifications if we had a dependency there. That could happen if an
   eventfd was registered as the ring eventfd, and we multishot polled
   for events on it. Or if an io_uring fd was added to epoll, and
   io_uring had a multishot request for the epoll fd.

   Test cases here:
	https://git.kernel.dk/cgit/liburing/commit/?id=919755a7d0096fda08fb6d65ac54ad8d0fe027cd

   Previously these got terminated when the CQ ring eventually
   overflowed, now it's handled gracefully (me).

 - Tightening of the IOPOLL based completions (Pavel)

 - Optimizations of the networking zero-copy paths (Pavel)

 - Various tweaks and fixes (Dylan, Pavel)

* tag 'for-6.2/io_uring-2022-12-08' of git://git.kernel.dk/linux: (41 commits)
  io_uring: keep unlock_post inlined in hot path
  io_uring: don't use complete_post in kbuf
  io_uring: spelling fix
  io_uring: remove io_req_complete_post_tw
  io_uring: allow multishot polled reqs to defer completion
  io_uring: remove overflow param from io_post_aux_cqe
  io_uring: add lockdep assertion in io_fill_cqe_aux
  io_uring: make io_fill_cqe_aux static
  io_uring: add io_aux_cqe which allows deferred completion
  io_uring: allow defer completion for aux posted cqes
  io_uring: defer all io_req_complete_failed
  io_uring: always lock in io_apoll_task_func
  io_uring: remove iopoll spinlock
  io_uring: iopoll protect complete_post
  io_uring: inline __io_req_complete_put()
  io_uring: remove io_req_tw_post_queue
  io_uring: use io_req_task_complete() in timeout
  io_uring: hold locks for io_req_complete_failed
  io_uring: add completion locking for iopoll
  io_uring: kill io_cqring_ev_posted() and __io_cq_unlock_post()
  ...
2022-12-13 10:33:08 -08:00
Pavel Begunkov
17add5cea2 io_uring: force multishot CQEs into task context
Multishot are posting CQEs outside of the normal request completion
path, which is usually done from within a task work handler. However, it
might be not the case when it's yet to be polled but has been punted to
io-wq. Make it abide ->task_complete and push it to the polling path
when executed by io-wq.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d7714aaff583096769a0f26e8e747759e556feb1.1670384893.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-07 06:47:13 -07:00
Al Viro
de4eda9de2 use less confusing names for iov_iter direction initializers
READ/WRITE proved to be actively confusing - the meanings are
"data destination, as used with read(2)" and "data source, as
used with write(2)", but people keep interpreting those as
"we read data from it" and "we write data to it", i.e. exactly
the wrong way.

Call them ITER_DEST and ITER_SOURCE - at least that is harder
to misinterpret...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-25 13:01:55 -05:00
Dylan Yudaken
9b8c54755a io_uring: add io_aux_cqe which allows deferred completion
Use the just introduced deferred post cqe completion state when possible
in io_aux_cqe. If not possible fallback to io_post_aux_cqe.

This introduces a complication because of allow_overflow. For deferred
completions we cannot know without locking the completion_lock if it will
overflow (and even if we locked it, another post could sneak in and cause
this cqe to be in overflow).
However since overflow protection is mostly a best effort defence in depth
to prevent infinite loops of CQEs for poll, just checking the overflow bit
is going to be good enough and will result in at most 16 (array size of
deferred cqes) overflows.

Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20221124093559.3780686-6-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25 06:10:04 -07:00
Dylan Yudaken
e2ad599d1e io_uring: allow multishot recv CQEs to overflow
With commit aa1df3a360 ("io_uring: fix CQE reordering"), there are
stronger guarantees for overflow ordering. Specifically ensuring that
userspace will not receive out of order receive CQEs. Therefore this is
not needed any more for recv/recvmsg.

Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20221107125236.260132-4-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-21 07:44:09 -07:00
Dylan Yudaken
515e269612 io_uring: revert "io_uring fix multishot accept ordering"
This is no longer needed after commit aa1df3a360 ("io_uring: fix CQE
reordering"), since all reordering is now taken care of.

This reverts commit cbd2574854 ("io_uring: fix multishot accept
ordering").

Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20221107125236.260132-2-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-21 07:43:42 -07:00
Xinghui Li
df730ec21f io_uring: fix two assignments in if conditions
Fixes two errors:

"ERROR: do not use assignment in if condition
130: FILE: io_uring/net.c:130:
+       if (!(issue_flags & IO_URING_F_UNLOCKED) &&

ERROR: do not use assignment in if condition
599: FILE: io_uring/poll.c:599:
+       } else if (!(issue_flags & IO_URING_F_UNLOCKED) &&"
reported by checkpatch.pl in net.c and poll.c .

Signed-off-by: Xinghui Li <korantli@tencent.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20221102082503.32236-1-korantwork@gmail.com
[axboe: style tweaks]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-21 07:38:31 -07:00
Pavel Begunkov
42385b02ba io_uring/net: move mm accounting to a slower path
We can also move mm accounting to the extended callbacks. It removes a
few cycles from the hot path including skipping one function call and
setting io_req_task_complete as a callback directly. For user backed I/O
it shouldn't make any difference taking into considering atomic mm
accounting and page pinning.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1062f270273ad11c1b7b45ec59a6a317533d5e64.1667557923.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-21 07:38:31 -07:00
Pavel Begunkov
40725d1b96 io_uring: move zc reporting from the hot path
Add custom tw and notif callbacks on top of usual bits also handling zc
reporting. That moves it from the hot path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/40de4a6409042478e1f35adc4912e23226cb1b5c.1667557923.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-21 07:38:31 -07:00
Stefan Metzmacher
e307e66981 io_uring/net: introduce IORING_SEND_ZC_REPORT_USAGE flag
It might be useful for applications to detect if a zero copy transfer with
SEND[MSG]_ZC was actually possible or not. The application can fallback to
plain SEND[MSG] in order to avoid the overhead of two cqes per request. Or
it can generate a log message that could indicate to an administrator that
no zero copy was possible and could explain degraded performance.

Cc: stable@vger.kernel.org # 6.1
Link: https://lore.kernel.org/io-uring/fb6a7599-8a9b-15e5-9b64-6cd9d01c6ff4@gmail.com/T/#m2b0d9df94ce43b0e69e6c089bdff0ce6babbdfaa
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8945b01756d902f5d5b0667f20b957ad3f742e5e.1666895626.git.metze@samba.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-21 07:38:31 -07:00
Pavel Begunkov
100d6b17c0 io_uring: fix multishot recv request leaks
Having REQ_F_POLLED set doesn't guarantee that the request is
executed as a multishot from the polling path. Fortunately for us, if
the code thinks it's multishot issue when it's not, it can only ask to
skip completion so leaking the request. Use issue_flags to mark
multipoll issues.

Cc: stable@vger.kernel.org
Fixes: 1300ebb20286b ("io_uring: multishot recv")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/37762040ba9c52b81b92a2f5ebfd4ee484088951.1668710222.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-17 12:33:33 -07:00
Pavel Begunkov
9148286476 io_uring: fix multishot accept request leaks
Having REQ_F_POLLED set doesn't guarantee that the request is
executed as a multishot from the polling path. Fortunately for us, if
the code thinks it's multishot issue when it's not, it can only ask to
skip completion so leaking the request. Use issue_flags to mark
multipoll issues.

Cc: stable@vger.kernel.org
Fixes: 390ed29b5e ("io_uring: add IORING_ACCEPT_MULTISHOT for accept")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7700ac57653f2823e30b34dc74da68678c0c5f13.1668710222.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-17 12:33:33 -07:00
Pavel Begunkov
cc767e7c69 io_uring/net: fail zc sendmsg when unsupported by socket
The previous patch fails zerocopy send requests for protocols that don't
support it, do the same for zerocopy sendmsg.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0854e7bb4c3d810a48ec8b5853e2f61af36a0467.1666346426.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-22 08:43:03 -06:00
Pavel Begunkov
edf8143879 io_uring/net: fail zc send when unsupported by socket
If a protocol doesn't support zerocopy it will silently fall back to
copying. This type of behaviour has always been a source of troubles
so it's better to fail such requests instead.

Cc: <stable@vger.kernel.org> # 6.0
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2db3c7f16bb6efab4b04569cd16e6242b40c5cb3.1666346426.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-22 08:43:03 -06:00
Jens Axboe
3fb1bd6881 io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT
We treat EINPROGRESS like EAGAIN, but if we're retrying post getting
EINPROGRESS, then we just need to check the socket for errors and
terminate the request.

This was exposed on a bluetooth connection request which ends up
taking a while and hitting EINPROGRESS, and yields a CQE result of
-EBADFD because we're retrying a connect on a socket that is now
connected.

Cc: stable@vger.kernel.org
Fixes: 87f80d623c ("io_uring: handle connect -EINPROGRESS like -EAGAIN")
Link: https://github.com/axboe/liburing/issues/671
Reported-by: Aidan Sun <aidansun05@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-12 16:30:56 -06:00
Pavel Begunkov
108893ddcc io_uring/net: fix notif cqe reordering
send zc is not restricted to !IO_URING_F_UNLOCKED anymore and so
we can't use task-tw ordering trick to order notification cqes
with requests completions. In this case leave it alone and let
io_send_zc_cleanup() flush it.

Cc: stable@vger.kernel.org
Fixes: 53bdc88aac ("io_uring/notif: order notif vs send CQEs")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0031f3a00d492e814a4a0935a2029a46d9c9ba06.1664486545.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-29 17:46:04 -06:00
Pavel Begunkov
6f10ae8a15 io_uring/net: don't update msg_name if not provided
io_sendmsg_copy_hdr() may clear msg->msg_name if the userspace didn't
provide it, we should retain NULL in this case.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/97d49f61b5ec76d0900df658cfde3aa59ff22121.1664486545.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-29 17:46:04 -06:00
Stefan Metzmacher
3e4cb6ebbb io_uring/net: fix fast_iov assignment in io_setup_async_msg()
I hit a very bad problem during my tests of SENDMSG_ZC.
BUG(); in first_iovec_segment() triggered very easily.
The problem was io_setup_async_msg() in the partial retry case,
which seems to happen more often with _ZC.

iov_iter_iovec_advance() may change i->iov in order to have i->iov_offset
being only relative to the first element.

Which means kmsg->msg.msg_iter.iov is no longer the
same as kmsg->fast_iov.

But this would rewind the copy to be the start of
async_msg->fast_iov, which means the internal
state of sync_msg->msg.msg_iter is inconsitent.

I tested with 5 vectors with length like this 4, 0, 64, 20, 8388608
and got a short writes with:
- ret=2675244 min_ret=8388692 => remaining 5713448 sr->done_io=2675244
- ret=-EAGAIN => io_uring_poll_arm
- ret=4911225 min_ret=5713448 => remaining 802223  sr->done_io=7586469
- ret=-EAGAIN => io_uring_poll_arm
- ret=802223  min_ret=802223  => res=8388692

While this was easily triggered with SENDMSG_ZC (queued for 6.1),
it was a potential problem starting with 7ba89d2af1
in 5.18 for IORING_OP_RECVMSG.
And also with 4c3c09439c in 5.19
for IORING_OP_SENDMSG.

However 257e84a537 introduced the critical
code into io_setup_async_msg() in 5.11.

Fixes: 7ba89d2af1 ("io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly")
Fixes: 257e84a537 ("io_uring: refactor sendmsg/recvmsg iov managing")
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b2e7be246e2fb173520862b0c7098e55767567a2.1664436949.git.metze@samba.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-29 07:08:21 -06:00
Pavel Begunkov
04360d3e05 io_uring/net: fix non-zc send with address
We're currently ignoring the dest address with non-zerocopy send because
even though we copy it from the userspace shortly after ->msg_name gets
zeroed. Move msghdr init earlier.

Fixes: 516e82f0e0 ("io_uring/net: support non-zerocopy sendto")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/176ced5e8568aa5d300ca899b7f05b303ebc49fd.1664409532.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-28 19:29:12 -06:00
Pavel Begunkov
6ae91ac9a6 io_uring/net: don't skip notifs for failed requests
We currently only add a notification CQE when the send succeded, i.e.
cqe.res >= 0. However, it'd be more robust to do buffer notifications
for failed requests as well in case drivers decide do something fanky.

Always return a buffer notification after initial prep, don't hide it.
This behaviour is better aligned with documentation and the patch also
helps the userspace to respect it.

Cc: stable@vger.kernel.org # 6.0
Suggested-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9c8bead87b2b980fcec441b8faef52188b4a6588.1664292100.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-28 07:53:15 -06:00
Pavel Begunkov
4c17a496a7 io_uring/net: fix cleanup double free free_iov init
Having ->async_data doesn't mean it's initialised and previously we vere
relying on setting F_CLEANUP at the right moment. With zc sendmsg
though, we set F_CLEANUP early in prep when we alloc a notif and so we
may allocate async_data, fail in copy_msg_hdr() leaving
struct io_async_msghdr not initialised correctly but with F_CLEANUP
set, which causes a ->free_iov double free and probably other nastiness.

Always initialise ->free_iov. Also, now it might point to fast_iov when
fails, so avoid freeing it during cleanups.

Reported-by: syzbot+edfd15cd4246a3fc615a@syzkaller.appspotmail.com
Fixes: 493108d95f ("io_uring/net: zerocopy sendmsg")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-26 08:36:50 -06:00
Pavel Begunkov
a75155faef io_uring/net: fix UAF in io_sendrecv_fail()
We should not assume anything about ->free_iov just from
REQ_F_ASYNC_DATA but rather rely on REQ_F_NEED_CLEANUP, as we may
allocate ->async_data but failed init would leave the field in not
consistent state. The easiest solution is to remove removing
REQ_F_NEED_CLEANUP and so ->async_data dealloc from io_sendrecv_fail()
and let io_send_zc_cleanup() do the job. The catch here is that we also
need to prevent double notif flushing, just test it for NULL and zero
where it's needed.

BUG: KASAN: use-after-free in io_sendrecv_fail+0x3b0/0x3e0 io_uring/net.c:1221
Write of size 8 at addr ffff8880771b4080 by task syz-executor.3/30199

CPU: 1 PID: 30199 Comm: syz-executor.3 Not tainted 6.0.0-rc6-next-20220923-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_address_description mm/kasan/report.c:284 [inline]
 print_report+0x15e/0x45d mm/kasan/report.c:395
 kasan_report+0xbb/0x1f0 mm/kasan/report.c:495
 io_sendrecv_fail+0x3b0/0x3e0 io_uring/net.c:1221
 io_req_complete_failed+0x155/0x1b0 io_uring/io_uring.c:873
 io_drain_req io_uring/io_uring.c:1648 [inline]
 io_queue_sqe_fallback.cold+0x29f/0x788 io_uring/io_uring.c:1931
 io_submit_sqe io_uring/io_uring.c:2160 [inline]
 io_submit_sqes+0x1180/0x1df0 io_uring/io_uring.c:2276
 __do_sys_io_uring_enter+0xac6/0x2410 io_uring/io_uring.c:3216
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: c4c0009e0b ("io_uring/net: combine fail handlers")
Reported-by: syzbot+4c597a574a3f5a251bda@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/23ab8346e407ea50b1198a172c8a97e1cf22915b.1663945875.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-23 14:57:38 -06:00
Pavel Begunkov
493108d95f io_uring/net: zerocopy sendmsg
Add a zerocopy version of sendmsg.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6aabc4bdfc0ec78df6ec9328137e394af9d4e7ef.1663668091.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21 13:15:02 -06:00
Pavel Begunkov
c4c0009e0b io_uring/net: combine fail handlers
Merge io_send_zc_fail() into io_sendrecv_fail(), saves a few lines of
code and some headache for following patch.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e0eba1d577413aef5602cd45f588b9230207082d.1663668091.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21 13:15:02 -06:00
Pavel Begunkov
b0e9b5517e io_uring/net: rename io_sendzc()
Simple renaming of io_sendzc*() functions in preparatio to adding
a zerocopy sendmsg variant.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/265af46829e6076dd220011b1858dc3151969226.1663668091.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21 13:15:02 -06:00
Pavel Begunkov
516e82f0e0 io_uring/net: support non-zerocopy sendto
We have normal sends, but what is missing is sendto-like requests. Add
sendto() capabilities to IORING_OP_SEND by passing in addr just as we do
for IORING_OP_SEND_ZC.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/69fbd8b2cb830e57d1bf9ec351e9bf95c5b77e3f.1663668091.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21 13:15:02 -06:00
Pavel Begunkov
6ae61b7aa2 io_uring/net: refactor io_setup_async_addr
Instead of passing the right address into io_setup_async_addr() only
specify local on-stack storage and let the function infer where to grab
it from. It optimises out one local variable we have to deal with.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6bfa9ab810d776853eb26ed59301e2536c3a5471.1663668091.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21 13:15:02 -06:00
Pavel Begunkov
5693bcce89 io_uring/net: don't lose partial send_zc on fail
Partial zc send may end up in io_req_complete_failed(), which not only
would return invalid result but also mask out the notification leading
to lifetime issues.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5673285b5e83e6ceca323727b4ddaa584b5cc91e.1663668091.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21 13:15:02 -06:00
Pavel Begunkov
7e6b638ed5 io_uring/net: don't lose partial send/recv on fail
Just as with rw, partial send/recv may end up in
io_req_complete_failed() and loose the result, make sure we return the
number of bytes processed.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a4ff95897b5419356fca9ea55db91ac15b2975f9.1663668091.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21 13:15:02 -06:00
Pavel Begunkov
ac9e5784bb io_uring/net: use io_sr_msg for sendzc
Reuse struct io_sr_msg for zerocopy sends, which is handy. There is
only one zerocopy specific field, namely .notif, and we have enough
space for it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/408c5b1b2d8869e1a12da5f5a78ed72cac112149.1662639236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21 10:30:43 -06:00
Pavel Begunkov
0b048557db io_uring/net: refactor io_sr_msg types
In preparation for using struct io_sr_msg for zerocopy sends, clean up
types. First, flags can be u16 as it's provided by the userspace in u16
ioprio, as well as addr_len. This saves us 4 bytes. Also use unsigned
for size and done_io, both are as well limited to u32.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/42c2639d6385b8b2181342d2af3a42d3b1c5bcd2.1662639236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21 10:30:43 -06:00
Pavel Begunkov
cd9021e88f io_uring/net: add non-bvec sg chunking callback
Add a sg_from_iter() for when we initiate non-bvec zerocopy sends, which
helps us to remove some extra steps from io_sg_from_iter(). The only
thing the new function has to do before giving control away to
__zerocopy_sg_from_iter() is to check if the skb has managed frags and
downgrade them if so.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cda3dea0d36f7931f63a70f350130f085ac3f3dd.1662639236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21 10:30:43 -06:00
Pavel Begunkov
6bf8ad25fc io_uring/net: io_async_msghdr caches for sendzc
We already keep io_async_msghdr caches for normal send/recv requests,
use them also for zerocopy send.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/42fa615b6e0be25f47a685c35d7b5e4f1b03d348.1662639236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21 10:30:43 -06:00
Pavel Begunkov
858c293e5d io_uring/net: use async caches for async prep
send/recv have async_data caches but there are only used from within
issue handlers. Extend their use also to ->prep_async, should be handy
with links and IOSQE_ASYNC.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b9a2264b807582a97ed606c5bfcdc2399384e8a5.1662639236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21 10:30:43 -06:00
Pavel Begunkov
95eafc74be io_uring/net: reshuffle error handling
We should prioritise send/recv retry cases over failures, they're more
important. Shuffle -ERESTARTSYS after we handled retries.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d9059691b30d0963b7269fa4a0c81ee7720555e6.1662639236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21 10:30:43 -06:00
Pavel Begunkov
e3366e0234 io_uring/net: fix zc fixed buf lifetime
Notifications usually outlive requests, so we need to pin buffers with
it by assigning a rsrc to it instead of the request.

Fixed: b48c312be0 ("io_uring/net: simplify zerocopy send user API")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dd6406ff8a90887f2b36ed6205dac9fda17c1f35.1663366886.git.asml.silence@gmail.com
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-18 05:07:51 -06:00
Pavel Begunkov
3c8400532d io_uring/net: copy addr for zc on POLL_FIRST
Every time we return from an issue handler and expect the request to be
retried we should also setup it for async exec ourselves. Do that when
we return on IORING_RECVSEND_POLL_FIRST in io_sendzc(), otherwise it'll
re-read the address, which might be a surprise for the userspace.

Fixes: 092aeedb75 ("io_uring: allow to pass addr into sendzc")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ab1d0657890d6721339c56d2e161a4bba06f85d0.1662642013.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-08 08:28:38 -06:00
Pavel Begunkov
b48c312be0 io_uring/net: simplify zerocopy send user API
Following user feedback, this patch simplifies zerocopy send API. One of
the main complaints is that the current API is difficult with the
userspace managing notification slots, and then send retries with error
handling make it even worse.

Instead of keeping notification slots change it to the per-request
notifications model, which posts both completion and notification CQEs
for each request when any data has been sent, and only one CQE if it
fails. All notification CQEs will have IORING_CQE_F_NOTIF set and
IORING_CQE_F_MORE in completion CQEs indicates whether to wait a
notification or not.

IOSQE_CQE_SKIP_SUCCESS is disallowed with zerocopy sends for now.

This is less flexible, but greatly simplifies the user API and also the
kernel implementation. We reuse notif helpers in this patch, but in the
future there won't be need for keeping two requests.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/95287640ab98fc9417370afb16e310677c63e6ce.1662027856.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-01 09:13:33 -06:00
Pavel Begunkov
57f332246a io_uring/notif: remove notif registration
We're going to remove the userspace exposed zerocopy notification API,
remove notification registration.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6ff00b97be99869c386958a990593c9c31cf105b.1662027856.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-01 09:13:33 -06:00
Pavel Begunkov
dfb58b1796 io_uring/net: fix overexcessive retries
Length parameter of io_sg_from_iter() can be smaller than the iterator's
size, as it's with TCP, so when we set from->count at the end of the
function we truncate the iterator forcing TCP to return preliminary with
a short send. It affects zerocopy sends with large payload sizes and
leads to retries and possible request failures.

Fixes: 3ff1a0d395 ("io_uring: enable managed frags with register buffers")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0bc0d5179c665b4ef5c328377c84c7a1f298467e.1661530037.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-26 10:31:42 -06:00
Pavel Begunkov
581711c466 io_uring/net: save address for sendzc async execution
We usually copy all bits that a request needs from the userspace for
async execution, so the userspace can keep them on the stack. However,
send zerocopy violates this pattern for addresses and may reloads it
e.g. from io-wq. Save the address if any in ->async_data as usual.

Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d7512d7aa9abcd36e9afe1a4d292a24cb2d157e5.1661342812.git.asml.silence@gmail.com
[axboe: fold in incremental fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-25 07:52:30 -06:00
Pavel Begunkov
986e263def io_uring/net: fix indentation
Fix up indentation before we get complaints from tooling.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bd5754e3764215ccd7fb04cd636ea9167aaa275d.1661342812.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-24 08:57:15 -06:00
Pavel Begunkov
5a848b7c9e io_uring/net: fix zc send link failing
Failed requests should be marked with req_set_fail(), so links and cqe
skipping work correctly, which is missing in io_sendzc(). Note,
io_sendzc() return IOU_OK on failure, so the core code won't do the
cleanup for us.

Fixes: 06a5464be8 ("io_uring: wire send zc request type")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e47d46fda9db30154ce66a549bb0d3380b780520.1661342812.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-24 08:57:00 -06:00
Pavel Begunkov
3f743e9bbb io_uring/net: use right helpers for async_data
There is another spot where we check ->async_data directly instead of
using req_has_async_data(), which is the way to do it, fix it up.

Fixes: 43e0bbbd0b ("io_uring: add netmsg cache")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/42f33b9a81dd6ae65dda92f0372b0ff82d548517.1660822636.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-18 07:27:20 -06:00
Pavel Begunkov
86dc8f23bb io_uring/net: improve zc addr import error handling
We may account memory to a memcg of a request that didn't even got to
the network layer. It's not a bug as it'll be routinely cleaned up on
flush, but it might be confusing for the userspace.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b8aae61f4c3ddc4da97c1da876bb73871f352d50.1660566179.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-15 21:34:00 -06:00
Pavel Begunkov
063604265f io_uring/net: use right helpers for async recycle
We have a helper that checks for whether a request contains anything in
->async_data or not, namely req_has_async_data(). It's better to use it
as it might have some extra considerations.

Fixes: 43e0bbbd0b ("io_uring: add netmsg cache")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b7414da4e7c3c32c31fc02dfd1355af4ccf4ca5f.1660566179.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-15 21:34:00 -06:00
Stefan Metzmacher
f2ccb5aed7 io_uring: make io_kiocb_to_cmd() typesafe
We need to make sure (at build time) that struct io_cmd_data is not
casted to a structure that's larger.

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Link: https://lore.kernel.org/r/c024cdf25ae19fc0319d4180e2298bade8ed17b8.1660201408.git.metze@samba.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-12 17:01:00 -06:00
Dylan Yudaken
d1f6222c49 io_uring: fix io_recvmsg_prep_multishot sparse warnings
Fix casts missing the __user parts. This seemed to only cause errors on
the alpha build, or if checked with sparse, but it was definitely an
oversight.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 9bb66906f2 ("io_uring: support multishot in recvmsg")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220805115450.3921352-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-05 08:41:18 -06:00
Pavel Begunkov
4a933e6208 io_uring/net: send retry for zerocopy
io_uring handles short sends/recvs for stream sockets when MSG_WAITALL
is set, however new zerocopy send is inconsistent in this regard, which
might be confusing. Handle short sends.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b876a4838597d9bba4f3215db60d72c33c448ad0.1659622472.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-04 08:35:16 -06:00
Pavel Begunkov
14b146b688 io_uring: notification completion optimisation
We want to use all optimisations that we have for io_uring requests like
completion batching, memory caching and more but for zc notifications.
Fortunately, notification perfectly fit the request model so we can
overlay them onto struct io_kiocb and use all the infratructure.

Most of the fields of struct io_notif natively fits into io_kiocb, so we
replace struct io_notif with struct io_kiocb carrying struct
io_notif_data in the cmd cache line. Then we adapt io_alloc_notif() to
use io_alloc_req()/io_alloc_req_refill(), and kill leftovers of hand
coded caching. __io_notif_complete_tw() is converted to use io_uring's
tw infra.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9e010125175e80baf51f0ca63bdc7cc6a4a9fa56.1658913593.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-27 08:50:50 -06:00
Pavel Begunkov
293402e564 io_uring/net: use unsigned for flags
Use unsigned int type for msg flags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5cfaed13d3191337b14b8664ca68b515d9e2d1b4.1658742118.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-25 09:48:25 -06:00
Pavel Begunkov
6a9ce66f4d io_uring/net: make page accounting more consistent
Make network page accounting more consistent with how buffer
registration is working, i.e. account all memory to ctx->user.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4aacfe64bbb81b27f9ecf5d5c219c69a07e5aa56.1658742118.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-25 09:48:25 -06:00
Pavel Begunkov
2e32ba5607 io_uring/net: checks errors of zc mem accounting
mm_account_pinned_pages() may fail, don't ignore the return value.

Fixes: e29e3bd4b9 ("io_uring: account locked pages for non-fixed zc")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dae0542ed8e6706071bb83ad3e7ad6a70d207fd9.1658742118.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-25 09:48:17 -06:00
Pavel Begunkov
3ff1a0d395 io_uring: enable managed frags with register buffers
io_uring's registered buffers infra has a good performant way of pinning
pages, so let's use SKBFL_MANAGED_FRAG_REFS when our requests are purely
register buffer backed.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/278731d3f20caf346cfc025fbee0b4c9ee4ed751.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:41:07 -06:00
Pavel Begunkov
63809137eb io_uring: flush notifiers after sendzc
Allow to flush notifiers as a part of sendzc request by setting
IORING_SENDZC_FLUSH flag. When the sendzc request succeedes it will
flush the used [active] notifier.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e0b4d9a6797e2fd6092824fe42953db7a519bbc8.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:41:07 -06:00
Pavel Begunkov
10c7d33ecd io_uring: sendzc with fixed buffers
Allow zerocopy sends to use fixed buffers. There is an optimisation for
this case, the network layer don't need to reference the pages, see
SKBFL_MANAGED_FRAG_REFS, so io_uring have to ensure validity of fixed
buffers until the notifier is released.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e1d8bd1b5934e541d90c1824eb4020ae3f5f43f3.1657643355.git.asml.silence@gmail.com
[axboe: fold in 32-bit pointer cast warning fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:41:07 -06:00
Pavel Begunkov
092aeedb75 io_uring: allow to pass addr into sendzc
Allow to specify an address to zerocopy sends making it more like
sendto(2).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/70417a8f7c5b51ab454690bae08adc0c187f89e8.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:41:07 -06:00
Pavel Begunkov
e29e3bd4b9 io_uring: account locked pages for non-fixed zc
Fixed buffers are RLIMIT_MEMLOCK accounted, however it doesn't cover iovec
based zerocopy sends. Do the accounting on the io_uring side.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/19b6e3975440f59f1f6199c7ee7acf977b4eecdc.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:41:07 -06:00
Pavel Begunkov
06a5464be8 io_uring: wire send zc request type
Add a new io_uring opcode IORING_OP_SENDZC. The main distinction from
IORING_OP_SEND is that the user should specify a notification slot
index in sqe::notification_idx and the buffers are safe to reuse only
when the used notification is flushed and completes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a80387c6a68ce9cf99b3b6ef6f71068468761fb7.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:41:07 -06:00
Pavel Begunkov
e02b665127 io_uring: initialise msghdr::msg_ubuf
Initialise newly added ->msg_ubuf in io_recv() and io_send().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b8f9f263875a4a36e7f26cc5d55ebe315308f57d.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:41:06 -06:00
Jens Axboe
4f6a94d337 net: fix compat pointer in get_compat_msghdr()
A previous change enabled external users to copy the data before
calling __get_compat_msghdr(), but didn't modify get_compat_msghdr() or
__io_compat_recvmsg_copy_hdr() to take that into account. They are both
stil passing in the __user pointer rather than the copied version.

Ensure we pass in the kernel struct, not the pointer to the user data.

Link: https://lore.kernel.org/all/46439555-644d-08a1-7d66-16f8f9a320f0@samsung.com/
Fixes: 1a3e4e94a1b9 ("net: copy from user before calling __get_compat_msghdr")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:18 -06:00
Dylan Yudaken
9b0fc3c054 io_uring: fix types in io_recvmsg_multishot_overflow
io_recvmsg_multishot_overflow had incorrect types on non x64 system.
But also it had an unnecessary INT_MAX check, which could just be done
by changing the type of the accumulator to int (also simplifying the
casts).

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: a8b38c4ce724 ("io_uring: support multishot in recvmsg")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220715130252.610639-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:18 -06:00
Dylan Yudaken
9bb66906f2 io_uring: support multishot in recvmsg
Similar to multishot recv, this will require provided buffers to be
used. However recvmsg is much more complex than recv as it has multiple
outputs. Specifically flags, name, and control messages.

Support this by introducing a new struct io_uring_recvmsg_out with 4
fields. namelen, controllen and flags match the similar out fields in
msghdr from standard recvmsg(2), payloadlen is the length of the payload
following the header.
This struct is placed at the start of the returned buffer. Based on what
the user specifies in struct msghdr, the next bytes of the buffer will be
name (the next msg_namelen bytes), and then control (the next
msg_controllen bytes). The payload will come at the end. The return value
in the CQE is the total used size of the provided buffer.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220714110258.1336200-4-dylany@fb.com
[axboe: style fixups, see link]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:18 -06:00
Dylan Yudaken
72c531f8ef net: copy from user before calling __get_compat_msghdr
this is in preparation for multishot receive from io_uring, where it needs
to have access to the original struct user_msghdr.

functionally this should be a no-op.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220714110258.1336200-3-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:17 -06:00
Dylan Yudaken
7fa875b8e5 net: copy from user before calling __copy_msghdr
this is in preparation for multishot receive from io_uring, where it needs
to have access to the original struct user_msghdr.

functionally this should be a no-op.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220714110258.1336200-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:17 -06:00
Dylan Yudaken
6d2f75a0cf io_uring: support 0 length iov in buffer select in compat
Match up work done in "io_uring: allow iov_len = 0 for recvmsg and buffer
select", but for compat code path.

Fixes: a68caad69ce5 ("io_uring: allow iov_len = 0 for recvmsg and buffer select")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220708181838.1495428-3-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:17 -06:00
Dylan Yudaken
e2df2ccb75 io_uring: fix multishot ending when not polled
If multishot is not actually polling then return IOU_OK rather than the
result.
If the result was > 0 this will confuse things further up the callstack
which expect a return <= 0.

Fixes: 1300ebb20286 ("io_uring: multishot recv")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220708181838.1495428-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:17 -06:00
Jens Axboe
43e0bbbd0b io_uring: add netmsg cache
For recvmsg/sendmsg, if they don't complete inline, we currently need
to allocate a struct io_async_msghdr for each request. This is a
somewhat large struct.

Hook up sendmsg/recvmsg to use the io_alloc_cache. This reduces the
alloc + free overhead considerably, yielding 4-5% of extra performance
running netbench.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:17 -06:00
Dylan Yudaken
cf0dd9527e io_uring: disable multishot recvmsg
recvmsg has semantics that do not make it trivial to extend to
multishot. Specifically it has user pointers and returns data in the
original parameter. In order to make this API useful these will need to be
somehow included with the provided buffers.

For now remove multishot for recvmsg as it is not useful.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220704140106.200167-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:17 -06:00
Dylan Yudaken
b3fdea6ecb io_uring: multishot recv
Support multishot receive for io_uring.
Typical server applications will run a loop where for each recv CQE it
requeues another recv/recvmsg.

This can be simplified by using the existing multishot functionality
combined with io_uring's provided buffers.
The API is to add the IORING_RECV_MULTISHOT flag to the SQE. CQEs will
then be posted (with IORING_CQE_F_MORE flag set) when data is available
and is read. Once an error occurs or the socket ends, the multishot will
be removed and a completion without IORING_CQE_F_MORE will be posted.

The benefit to this is that the recv is much more performant.
 * Subsequent receives are queued up straight away without requiring the
   application to finish a processing loop.
 * If there are more data in the socket (sat the provided buffer size is
   smaller than the socket buffer) then the data is immediately
   returned, improving batching.
 * Poll is only armed once and reused, saving CPU cycles

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-11-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:17 -06:00
Dylan Yudaken
cbd2574854 io_uring: fix multishot accept ordering
Similar to multishot poll, drop multishot accept when CQE overflow occurs.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-10-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:17 -06:00
Dylan Yudaken
52120f0fad io_uring: add allow_overflow to io_post_aux_cqe
Some use cases of io_post_aux_cqe would not want to overflow as is, but
might want to change the flags/result. For example multishot receive
requires in order CQE, and so if there is an overflow it would need to
stop receiving until the overflow is taken care of.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-8-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:17 -06:00
Dylan Yudaken
d4e097dae2 io_uring: recycle buffers on error
Rather than passing an error back to the user with a buffer attached,
recycle the buffer immediately.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-5-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:16 -06:00
Dylan Yudaken
5702196e7d io_uring: allow iov_len = 0 for recvmsg and buffer select
When using BUFFER_SELECT there is no technical requirement that the user
actually provides iov, and this removes one copy_from_user call.

So allow iov_len to be 0.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-4-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:16 -06:00
Pavel Begunkov
27a9d66fec io_uring: kill extra io_uring_types.h includes
io_uring/io_uring.h already includes io_uring_types.h, no need to
include it every time. Kill it in a bunch of places, it prepares us for
following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/94d8c943fbe0ef949981c508ddcee7fc1c18850f.1655384063.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
d245bca637 io_uring: don't expose io_fill_cqe_aux()
Deduplicate some code and add a helper for filling an aux CQE, locking
and notification.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b7c6557c8f9dc5c4cfb01292116c682a0ff61081.1655455613.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Jens Axboe
3b77495a97 io_uring: split provided buffers handling into its own file
Move both the opcodes related to it, and the internals code dealing with
it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
f9ead18c10 io_uring: split network related opcodes into its own file
While at it, convert the handlers to just use io_eopnotsupp_prep()
if CONFIG_NET isn't set.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00