Commit Graph

1154954 Commits

Author SHA1 Message Date
Dylan Yudaken
6bb3085556 io_uring: if a linked request has REQ_F_FORCE_ASYNC then run it async
REQ_F_FORCE_ASYNC was being ignored for re-queueing linked
requests. Instead obey that flag.

Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20230127135227.3646353-2-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:18:26 -07:00
Jens Axboe
f586800854 io_uring: add reschedule point to handle_tw_list()
If CONFIG_PREEMPT_NONE is set and the task_work chains are long, we
could be running into issues blocking others for too long. Add a
reschedule check in handle_tw_list(), and flush the ctx if we need to
reschedule.

Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Jens Axboe
fcc926bb85 io_uring: add a conditional reschedule to the IOPOLL cancelation loop
If the kernel is configured with CONFIG_PREEMPT_NONE, we could be
sitting in a tight loop reaping events but not giving them a chance to
finish. This results in a trace ala:

rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 	2-...!: (5249 ticks this GP) idle=935c/1/0x4000000000000000 softirq=4265/4274 fqs=1
	(t=5251 jiffies g=465 q=4135 ncpus=4)
rcu: rcu_sched kthread starved for 5249 jiffies! g465 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: 	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_sched       state:R  running task     stack:0     pid:12    ppid:2      flags:0x00000008
Call trace:
 __switch_to+0xb0/0xc8
 __schedule+0x43c/0x520
 schedule+0x4c/0x98
 schedule_timeout+0xbc/0xdc
 rcu_gp_fqs_loop+0x308/0x344
 rcu_gp_kthread+0xd8/0xf0
 kthread+0xb8/0xc8
 ret_from_fork+0x10/0x20
rcu: Stack dump where RCU GP kthread last ran:
Task dump for CPU 0:
task:kworker/u8:10   state:R  running task     stack:0     pid:89    ppid:2      flags:0x0000000a
Workqueue: events_unbound io_ring_exit_work
Call trace:
 __switch_to+0xb0/0xc8
 0xffff0000c8fefd28
CPU: 2 PID: 95 Comm: kworker/u8:13 Not tainted 6.2.0-rc5-00042-g40316e337c80-dirty #2759
Hardware name: linux,dummy-virt (DT)
Workqueue: events_unbound io_ring_exit_work
pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : io_do_iopoll+0x344/0x360
lr : io_do_iopoll+0xb8/0x360
sp : ffff800009bebc60
x29: ffff800009bebc60 x28: 0000000000000000 x27: 0000000000000000
x26: ffff0000c0f67d48 x25: ffff0000c0f67840 x24: ffff800008950024
x23: 0000000000000001 x22: 0000000000000000 x21: ffff0000c27d3200
x20: ffff0000c0f67840 x19: ffff0000c0f67800 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000001 x13: 0000000000000001 x12: 0000000000000000
x11: 0000000000000179 x10: 0000000000000870 x9 : ffff800009bebd60
x8 : ffff0000c27d3ad0 x7 : fefefefefefefeff x6 : 0000646e756f626e
x5 : ffff0000c0f67840 x4 : 0000000000000000 x3 : ffff0000c2398000
x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 io_do_iopoll+0x344/0x360
 io_uring_try_cancel_requests+0x21c/0x334
 io_ring_exit_work+0x90/0x40c
 process_one_work+0x1a4/0x254
 worker_thread+0x1ec/0x258
 kthread+0xb8/0xc8
 ret_from_fork+0x10/0x20

Add a cond_resched() in the cancelation IOPOLL loop to fix this.

Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Pavel Begunkov
50470fc572 io_uring: return normal tw run linking optimisation
io_submit_flush_completions() may produce new task_work items, so it's a
good idea to recheck the task_work list after flushing completions. The
optimisation is not new and was accidentially removed by
f88262e60b ("io_uring: lockless task list")

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a7ed5ede84de190832cc33ebbcdd6e91cd90f5b6.1674484266.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Pavel Begunkov
cb6bf7f285 io_uring: refactor tctx_task_work
Merge almost identical sections of tctx_task_work(), this will make code
modifications later easier and also inlines handle_tw_list().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d06592d91e3e7559e7a4dbb8907d110863008dc7.1674484266.git.asml.silence@gmail.com
[axboe: fold in setting count to zero patch from Tom Rix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Pavel Begunkov
5afa465071 io_uring: refactor io_put_task helpers
Add a helper for putting refs from the target task context, rename
__io_put_task() and add a couple of comments around. Use the remote
version for __io_req_complete_post(), the local is only needed for
__io_submit_flush_completions().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3bf92ebd594769d8a5d648472a8e335f2031d542.1674484266.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Pavel Begunkov
c8576f3e61 io_uring: refactor req allocation
Follow the io_get_sqe pattern returning the result via a pointer
and hide request cache refill inside io_alloc_req().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8c37c2e8a3cb5e4cd6a8ae3b91371227a92708a6.1674484266.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Pavel Begunkov
b5083dfa36 io_uring: improve io_get_sqe
Return an SQE from io_get_sqe() as a parameter and use the return value
to determine if it failed or not. This enables the compiler to compile out
the sqe NULL check when we know that the return SQE is valid.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9cceb11329240ea097dffef6bf0a675bca14cf42.1674484266.git.asml.silence@gmail.com
[axboe: remove bogus const modifier on return value]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Pavel Begunkov
b2aa66aff6 io_uring: kill outdated comment about overflow flush
__io_cqring_overflow_flush() doesn't return anything anymore, remove
outdate comment.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4ce2bcbb17eac80cdf883fd1459d5ee6586e238c.1674484266.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Pavel Begunkov
c10bb64684 io_uring: use user visible tail in io_uring_poll()
We return POLLIN from io_uring_poll() depending on whether there are
CQEs for the userspace, and so we should use the user visible tail
pointer instead of a transient cached value.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/228ffcbf30ba98856f66ffdb9a6a60ead1dd96c0.1674484266.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Jens Axboe
f499254474 io_uring: pass in io_issue_def to io_assign_file()
This generates better code for me, avoiding an extra load on arm64, and
both call sites already have this variable available for easy passing.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Breno Leitao
c1755c25a7 io_uring: Enable KASAN for request cache
Every io_uring request is represented by struct io_kiocb, which is
cached locally by io_uring (not SLAB/SLUB) in the list called
submit_state.freelist. This patch simply enabled KASAN for this free
list.

This list is initially created by KMEM_CACHE, but later, managed by
io_uring. This patch basically poisons the objects that are not used
(i.e., they are the free list), and unpoisons it when the object is
allocated/removed from the list.

Touching these poisoned objects while in the freelist will cause a KASAN
warning.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Jens Axboe
b5d3ae202f io_uring: handle TIF_NOTIFY_RESUME when checking for task_work
If TIF_NOTIFY_RESUME is set, then we need to call resume_user_mode_work()
for PF_IO_WORKER threads. They never return to usermode, hence never get
a chance to process any items that are marked by this flag. Most notably
this includes the final put of files, but also any throttling markers set
by block cgroups.

Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Jens Axboe
8572df941c io_uring/msg-ring: ensure flags passing works for task_work completions
If the target ring is using IORING_SETUP_SINGLE_ISSUER and we're posting
a message from a different thread, then we need to ensure that the
fallback task_work that posts the CQE knwos about the flags passing as
well. If not we'll always be posting 0 as the flags.

Fixes: 3563d7ed58a5 ("io_uring/msg_ring: Pass custom flags to the cqe")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Breno Leitao
f30bd4d038 io_uring: Split io_issue_def struct
This patch removes some "cold" fields from `struct io_issue_def`.

The plan is to keep only highly used fields into `struct io_issue_def`, so,
it may be hot in the cache. The hot fields are basically all the bitfields
and the callback functions for .issue and .prep.

The other less frequently used fields are now located in a secondary and
cold struct, called `io_cold_def`.

This is the size for the structs:

Before: io_issue_def = 56 bytes
After: io_issue_def = 24 bytes; io_cold_def = 40 bytes

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20230112144411.2624698-2-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Breno Leitao
a7dd27828b io_uring: Rename struct io_op_def
The current io_op_def struct is becoming huge and the name is a bit
generic.

The goal of this patch is to rename this struct to `io_issue_def`. This
struct will contain the hot functions associated with the issue code
path.

For now, this patch only renames the structure, and an upcoming patch
will break up the structure in two, moving the non-issue fields to a
secondary struct.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20230112144411.2624698-1-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Pavel Begunkov
68a2cc1bba io_uring: refactor __io_req_complete_post
Keep parts of __io_req_complete_post() relying on req->flags together so
the value can be cached.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2b4fbb42f404a0e75c4d9f0a5b16f314a839d0a9.1673887636.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Pavel Begunkov
632ffe0956 io_uring: optimise ctx flags layout
There may be different cost for reeading just one byte or more, so it's
benificial to keep ctx flag bits that we access together in a single
byte. That affected code generation of __io_cq_unlock_post_flush() and
removed one memory load.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bbe8ca4705704690319d65e45845f9fc9d35f420.1673887636.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Pavel Begunkov
31f084b7b0 io_uring: simplify fallback execution
Lock the ring with uring_lock in io_fallback_req_func(), which should
make it a bit safer and easier. With that we also don't need refs
pinning as io_ring_exit_work() will wait until uring_lock is freed.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/56170e6a0cbfc8edee2794c6613e8f6f1d76d276.1673887636.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Pavel Begunkov
89800a2dd5 io_uring: don't export io_put_task()
io_put_task() is only used in uring.c so enclose it there together with
__io_put_task().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/43c7f9227e2ab215f1a6069dadbc5382bed346fe.1673887636.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Pavel Begunkov
b0b7a7d24b io_uring: return back links tw run optimisation
io_submit_flush_completions() may queue new requests for tw execution,
especially true for linked requests. Recheck the tw list for emptiness
after flushing completions.

Note that this doesn't really fix the commit referenced below, but it
does reinstate an optimization that existed before that got merged.

Fixes: f88262e60b ("io_uring: lockless task list")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6328acdbb5e60efc762b18003382de077e6e1367.1673887636.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:41 -07:00
Quanfa Fu
88b80534f6 io_uring: make io_sqpoll_wait_sq return void
Change the return type to void since it always return 0, and no need
to do the checking in syscall io_uring_enter.

Signed-off-by: Quanfa Fu <quanfafu@gmail.com>
Link: https://lore.kernel.org/r/20230115071519.554282-1-quanfafu@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
c3f4d39ee4 io_uring: optimise deferred tw execution
We needed fake nodes in __io_run_local_work() and to avoid unecessary wake
ups while the task already running task_works, but we don't need them
anymore since wake ups are protected by cq_waiting, which is always
cleared by the time we're executing deferred task_work items.

Note that because of loose sync around cq_waiting clearing
io_req_local_work_add() may wake the task more than once, but that's
fine and should be rare to not hurt perf.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8839534891f0a2f1076e78554a31ea7e099f7de5.1673274244.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
d80c0f00d0 io_uring: add io_req_local_work_add wake fast path
Don't wake the master task after queueing a deferred tw unless it's
currently waiting in io_cqring_wait.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/717702d772825a6647e6c315b4690277ba84c3fc.1673274244.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
130bd686d9 io_uring: waitqueue-less cq waiting
With DEFER_TASKRUN only ctx->submitter_task might be waiting for CQEs,
we can use this to optimise io_cqring_wait(). Replace ->cq_wait
waitqueue with waking the task directly.

It works but misses an important optimisation covered by the following
patch, so this patch without follow ups might hurt performance.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/103d174d35d919d4cb0922d8a9c93a8f0c35f74a.1673274244.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
3181e22fb7 io_uring: wake up optimisations
Flush completions is done either from the submit syscall or by the
task_work, both are in the context of the submitter task, and when it
goes for a single threaded rings like implied by ->task_complete, there
won't be any waiters on ->cq_wait but the master task. That means that
there can be no tasks sleeping on cq_wait while we run
__io_submit_flush_completions() and so waking up can be skipped.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/60ad9768ec74435a0ddaa6eec0ffa7729474f69f.1673274244.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
bca39f3905 io_uring: add lazy poll_wq activation
Even though io_poll_wq_wake()'s waitqueue_active reuses a barrier we do
for another waitqueue, it's not going to be the case in the future and
so we want to have a fast path for it when the ring has never been
polled.

Move poll_wq wake ups into __io_commit_cqring_flush() using a new flag
called ->poll_activated. The idea behind the flag is to set it when the
ring was polled for the first time. This requires additional sync to not
miss events, which is done here by using task_work for ->task_complete
rings, and by default enabling the flag for all other types of rings.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/060785e8e9137a920b232c0c7f575b131af19cac.1673274244.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
7b235dd82a io_uring: separate wq for ring polling
Don't use ->cq_wait for ring polling but add a separate wait queue for
it. We need it for following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dea0be0bf990503443c5c6c337fc66824af7d590.1673274244.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
360173ab9e io_uring: move io_run_local_work_locked
io_run_local_work_locked() is only used in io_uring.c, move it there.
With that we can also make __io_run_local_work() static.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/91757bcb33e5774e49fed6f2b6e058630608119b.1673274244.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
3e5655552a io_uring: mark io_run_local_work static
io_run_local_work is enclosed in io_uring.c, we don't need to export it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b477fb81f5e77044f724a06fe245d5c078659364.1673274244.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
2f413956cc io_uring: don't set TASK_RUNNING in local tw runner
The CQ waiting loop sets TASK_RUNNING before trying to execute
task_work, no need to repeat it in io_run_local_work().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9d9422c429ef3f9457b4f4b8288bf4789564f33b.1673274244.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
bd550173ac io_uring: refactor io_wake_function
Remove a local variable ctx in io_wake_function(), we don't need it if
io_should_wake() triggers it to wake up.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e60eb1008aebe286aab7d34c772ed01c447bddb1.1673274244.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
dde40322ae io_uring: move submitter_task out of cold cacheline
->submitter_task is used somewhat more frequent now than before, i.e.
for local tw enqueue and run, let's move it from the end of ctx, which
is full of cold data, to the first cacheline with mostly constants.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/415ca91dc5ad1dec612b892e489cda98e1069542.1673274244.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Dmitrii Bundin
81594e7e7a io_uring: remove excessive unlikely on IS_ERR
The IS_ERR function uses the IS_ERR_VALUE macro under the hood which
already wraps the condition into unlikely.

Signed-off-by: Dmitrii Bundin <dmitrii.bundin.a@gmail.com>
Link: https://lore.kernel.org/r/20230109185854.25698-1-dmitrii.bundin.a@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Breno Leitao
cbeb47a7b5 io_uring/msg_ring: Pass custom flags to the cqe
This patch adds a new flag (IORING_MSG_RING_FLAGS_PASS) in the message
ring operations (IORING_OP_MSG_RING). This new flag enables the sender
to specify custom flags, which will be copied over to cqe->flags in the
receiving ring.  These custom flags should be specified using the
sqe->file_index field.

This mechanism provides additional flexibility when sending messages
between rings.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20230103160507.617416-1-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
d33a39e577 io_uring: keep timeout in io_wait_queue
Move waiting timeout into io_wait_queue

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e4b48a9e26a3b1cf97c80121e62d4b5ab873d28d.1672916894.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
46ae7eef44 io_uring: optimise non-timeout waiting
Unlike the jiffy scheduling version, schedule_hrtimeout() jumps a few
functions before getting into schedule() even if there is no actual
timeout needed. Some tests showed that it takes up to 1% of CPU cycles.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/89f880574eceee6f4899783377ead234df7b3d04.1672916894.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
326a9e482e io_uring: set TASK_RUNNING right after schedule
Instead of constantly watching that the state of the task is running
before executing tw or taking locks in io_cqring_wait(), switch it back
to TASK_RUNNING immediately.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/246dddee247d89fd52023f785ed17cc34962a008.1672916894.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
490c00eb4f io_uring: simplify io_has_work
->work_llist should never be non-empty for a non DEFER_TASKRUN ring, so
we can safely skip checking the flag.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/26af9f73c09a56c9a035f94db56127358688f3aa.1672916894.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
846072f16e io_uring: mimimise io_cqring_wait_schedule
io_cqring_wait_schedule() is called after we started waiting on the cq
wq and set the state to TASK_INTERRUPTIBLE, for that reason we have to
constantly worry whether we has returned the state back to running or
not. Leave only quick checks in io_cqring_wait_schedule() and move the
rest including running task work to the callers. Note, we run tw in the
loop after the sched checks because of the fast path in the beginning of
the function.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2814fabe75e2e019e7ca43ea07daa94564349805.1672916894.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:40 -07:00
Pavel Begunkov
3fcf19d592 io_uring: parse check_cq out of wq waiting
We already avoid flushing overflows in io_cqring_wait_schedule() but
only return an error for the outer loop to handle it. Minimise it even
further by moving all ->check_cq parsing there.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9dfcec3121013f98208dbf79368d636d74e1231a.1672916894.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:39 -07:00
Pavel Begunkov
140102ae9a io_uring: move defer tw task checks
Most places that want to run local tw explicitly and in advance check if
they are allowed to do so. Don't rely on a similar check in
__io_run_local_work(), leave it as a just-in-case warning and make sure
callers checks capabilities themselves.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/990fe0e8e70fd4d57e43625e5ce8fba584821d1a.1672916894.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:39 -07:00
Pavel Begunkov
1414d62985 io_uring: kill io_run_task_work_ctx
There is only one user of io_run_task_work_ctx(), inline it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/40953c65f7c88fb00cdc4d870ca5d5319fb3d7ea.1672916894.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:39 -07:00
Pavel Begunkov
f36ba6cf1a io_uring: don't iterate cq wait fast path
Task work runners keep running until all queues tw items are exhausted.
It's also rare for defer tw to queue normal tw and vise versa. Taking it
into account, there is only a dim chance that further iterating the
io_cqring_wait() fast path will get us anything and so we can remove
the loop there.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1f9565726661266abaa5d921e97433c831759ecf.1672916894.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:39 -07:00
Pavel Begunkov
0c4fe008c9 io_uring: rearrange defer list checks
There should be nothing in the ->work_llist for non DEFER_TASKRUN rings,
so we can skip flag checks and test the list emptiness directly. Also
move it out of io_run_local_work() for inlining.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/331d63fd15ca79b35b95c82a82d9246110686392.1672916894.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29 15:17:39 -07:00
Linus Torvalds
6d796c50f8 Linux 6.2-rc6 2023-01-29 13:59:43 -08:00
Linus Torvalds
ab072681ea - Cleanup the firmware node for the new IRQ MSI domain properly, to
avoid leaking memory
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPWd70ACgkQEsHwGGHe
 VUpTgxAAjKo7kjhyE9PjKA8tJ5pv6I/21tieu/JAmRyb8AVkhi21elXJaRoEPW4k
 2TuSd0WMxZeoRJ1N366HcGPZcsldKN45v3xXEv1e3XSL6ZTAIFsYvwni6adwVocS
 ztnZBrMuHf1KZW/6ES956qGmWsum0rYcU2sr0kvnS9ihPiXJsbeLfNF1VXPwaH2y
 DTnTxgl/QtuipSQYOUvLSk1XBduvE+Wx8kE1S6TrdlWE9qsD/Jn1Jn7btaLQeHfs
 27AXTNGYnVHeRZSVy6d38dIwU4Gh6LcLMFfTtNENIqTVCUfc/QcAb/gItDF0NvIr
 fgKbkdmI7jc0rIfLGlBUM8WZ0OaxopFwvchVywjcogDpCfLvJbgR7JplEKTfltgz
 62IjvTF6vXUCcjySwu/C48EE2G7CWL0FtIcJofZsI0yTeHESkwFf1r5eOjnTV2ao
 M2QYGeToRtk7z9vb+pa0ZttD55TKXQKZDBv+lBRKLTy8SeIzlExJswIAuGTpWuKR
 ntwrdtekyM2HNir/l9ad1X2mjVblivEEv1hLMpsxPsthigWXDr+PZtQIxHCao8wj
 aKY4bI0F1AXByuKhQAwX/RVHUmB+AGrzJ/M5Fdj+ealQ6Nk+xCrPyTX2qDmveiSF
 NlQZpGCxV+gI/8q9YU4coJm1AioT8TSFsMxaufGZCreqqaWBXl0=
 =qrIZ
 -----END PGP SIGNATURE-----

Merge tag 'irq_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Borislav Petkov:

 - Cleanup the firmware node for the new IRQ MSI domain properly, to
   avoid leaking memory

* tag 'irq_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/msi: Free the fwnode created by msi_create_device_irq_domain()
2023-01-29 11:26:49 -08:00
Linus Torvalds
bc6bc34b10 - Start checking for -mindirect-branch-cs-prefix clang support too now that LLVM
16 will support it
 
 - Fix a NULL ptr deref when suspending with Xen PV
 
 - Have a SEV-SNP guest check explicitly for features enabled by the hypervisor
   and fail gracefully if some are unsupported by the guest instead of failing in
   a non-obvious and hard-to-debug way
 
 - Fix a MSI descriptor leakage under Xen
 
 - Mark Xen's MSI domain as supporting MSI-X
 
 - Prevent legacy PIC interrupts from being resent in software by marking them
   level triggered, as they should be, which lead to a NULL ptr deref
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPWdaAACgkQEsHwGGHe
 VUrHUBAAvRh7cDVKvr2wPNJ+9RFjPFubcLq/+4yTe4Bu14wnYn8+gb2S7P2bPJWl
 TxbZhJGV2IeYyB+aJybjGwX96AzE5lWD6epQLRLI/BOVmqLA8Eyw9te0NEQDd9Wq
 hgY1/hhUmLdpXubq09YjIht5nlxVZ+JFfppouTR7LVtZl7MFvQbAOU8rWzpcbm7l
 UlA4GLakFeOYpbI5g+zzsZ1a1M0cSz8wQ43plD5aAtNy2wKbWiA4QDXJo9J7+ZQg
 1FmOHZ3ChPjEhf9k/N2uAZ6RUHVEDIJ7hDfsM3j/qsKGzCV4cho2Ify+0PFWo4pt
 FO2bRh1yDFqdz4m6ulhnmuMnoRnEuswwrTzrG+HYu1ntpUt72dULmmRBpJ0s6C7W
 BHpCThNWIlcfHbLNY+VAOJ2hiOqfU/8ld+8R0sn7xmomjgCRYHh9kDGOeuvh1hJl
 jfITDuL2gjcj3Ph6+xh6KvOga41ff3EcxkfvqolZ/emRllPiDWsKYXVgUIl22FHt
 23xH7gutPCp27MdoXM1EaWs5s/PQfctvm4LrW6XS8IjWURLo4RrkU6y7YD5VKVy8
 KCKcpF+JMdE1Ao1WpMFfjDG4ObyMYyqnD790nQVM1e5kIhOpeWaa7WzkGFRyIo6Q
 5BU3GGDyMT8SHy7NFhQL62YJe0P2ZctNIDjiTQlYDnWWhjmvk8I=
 =4QMN
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Start checking for -mindirect-branch-cs-prefix clang support too now
   that LLVM 16 will support it

 - Fix a NULL ptr deref when suspending with Xen PV

 - Have a SEV-SNP guest check explicitly for features enabled by the
   hypervisor and fail gracefully if some are unsupported by the guest
   instead of failing in a non-obvious and hard-to-debug way

 - Fix a MSI descriptor leakage under Xen

 - Mark Xen's MSI domain as supporting MSI-X

 - Prevent legacy PIC interrupts from being resent in software by
   marking them level triggered, as they should be, which lead to a NULL
   ptr deref

* tag 'x86_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Move '-mindirect-branch-cs-prefix' out of GCC-only block
  acpi: Fix suspend with Xen PV
  x86/sev: Add SEV-SNP guest feature negotiation support
  x86/pci/xen: Fixup fallout from the PCI/MSI overhaul
  x86/pci/xen: Set MSI_FLAG_PCI_MSIX support in Xen MSI domain
  x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL
2023-01-29 11:17:34 -08:00
Linus Torvalds
80826e9020 Input updates for v6.2-rc5
- touchpads on HP 15-* laptops switched back to PS/2 emulation mode
 
 - a quirk for Clevo PCX0DX/TUXEDO XP1511 to make sure keyboard is
   responding after resume
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQST2eWILY88ieB2DOtAj56VGEWXnAUCY9XAkgAKCRBAj56VGEWX
 nNVfAQCktxtfTvXVFHfeEzCsxggvkyqRGGsQSrnxoe4bQ25gnQEA99pSlrs3lbvP
 mvqPm1ufI2EYZJVfiFKK5tl3n9K8BgM=
 =HWA5
 -----END PGP SIGNATURE-----

Merge tag 'input-for-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

 - touchpads on HP 15-* laptops switched back to PS/2 emulation mode

 - a quirk for Clevo PCX0DX/TUXEDO XP1511 to make sure keyboard is
   responding after resume

* tag 'input-for-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: i8042 - add Clevo PCX0DX to i8042 quirk table
  Revert "Input: synaptics - switch touchpad on HP Laptop 15-da3001TU to RMI mode"
2023-01-29 11:06:47 -08:00
Linus Torvalds
803929285a cxl fixes for 6.2-rc6
- Fix a crash regression due to module load order of cxl_pmem.ko
 
 - Fix wrong register offset read in CXL AER handling path
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCY9W0XwAKCRDfioYZHlFs
 Z9y9AQDQXnlfFuVrSZ27lM53U+ZKQJWXV8SnPlcvbWbnVJb5owEA9J5p5ZCT0mHq
 sRFVgJoAlU73p39lcJoX8/ZP6OI6sgU=
 =bzei
 -----END PGP SIGNATURE-----

Merge tag 'cxl-fixes-for-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl fixes from Dan Williams:
 "A couple of fixes for bugs introduced during the merge window. One is
  a regression, the other was a bug in the CXL AER handler:

   - Fix a crash regression due to module load order of cxl_pmem.ko

   - Fix wrong register offset read in CXL AER handling path"

* tag 'cxl-fixes-for-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/pmem: Fix nvdimm unregistration when cxl_pmem driver is absent
  cxl: fix cxl_report_and_clear() RAS UE addr mis-assignment
2023-01-29 10:47:22 -08:00