linux-next/io_uring
Lin Ma 12ad3d2d6c io_uring/poll: fix poll_refs race with cancelation
There is an interesting race condition of poll_refs which could result
in a NULL pointer dereference. The crash trace is like:

KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 30781 Comm: syz-executor.2 Not tainted 6.0.0-g493ffd6605b2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:io_poll_remove_entry io_uring/poll.c:154 [inline]
RIP: 0010:io_poll_remove_entries+0x171/0x5b4 io_uring/poll.c:190
Code: ...
RSP: 0018:ffff88810dfefba0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000040000
RDX: ffffc900030c4000 RSI: 000000000003ffff RDI: 0000000000040000
RBP: 0000000000000008 R08: ffffffff9764d3dd R09: fffffbfff3836781
R10: fffffbfff3836781 R11: 0000000000000000 R12: 1ffff11003422d60
R13: ffff88801a116b04 R14: ffff88801a116ac0 R15: dffffc0000000000
FS:  00007f9c07497700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffb5c00ea98 CR3: 0000000105680005 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 io_apoll_task_func+0x3f/0xa0 io_uring/poll.c:299
 handle_tw_list io_uring/io_uring.c:1037 [inline]
 tctx_task_work+0x37e/0x4f0 io_uring/io_uring.c:1090
 task_work_run+0x13a/0x1b0 kernel/task_work.c:177
 get_signal+0x2402/0x25a0 kernel/signal.c:2635
 arch_do_signal_or_restart+0x3b/0x660 arch/x86/kernel/signal.c:869
 exit_to_user_mode_loop kernel/entry/common.c:166 [inline]
 exit_to_user_mode_prepare+0xc2/0x160 kernel/entry/common.c:201
 __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
 syscall_exit_to_user_mode+0x58/0x160 kernel/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

The root cause for this is a tiny overlooking in
io_poll_check_events() when cocurrently run with poll cancel routine
io_poll_cancel_req().

The interleaving to trigger use-after-free:

CPU0                                       |  CPU1
                                           |
io_apoll_task_func()                       |  io_poll_cancel_req()
 io_poll_check_events()                    |
  // do while first loop                   |
  v = atomic_read(...)                     |
  // v = poll_refs = 1                     |
  ...                                      |  io_poll_mark_cancelled()
                                           |   atomic_or()
                                           |   // poll_refs =
IO_POLL_CANCEL_FLAG | 1
                                           |
  atomic_sub_return(...)                   |
  // poll_refs = IO_POLL_CANCEL_FLAG       |
  // loop continue                         |
                                           |
                                           |  io_poll_execute()
                                           |   io_poll_get_ownership()
                                           |   // poll_refs =
IO_POLL_CANCEL_FLAG | 1
                                           |   // gets the ownership
  v = atomic_read(...)                     |
  // poll_refs not change                  |
                                           |
  if (v & IO_POLL_CANCEL_FLAG)             |
   return -ECANCELED;                      |
  // io_poll_check_events return           |
  // will go into                          |
  // io_req_complete_failed() free req     |
                                           |
                                           |  io_apoll_task_func()
                                           |  // also go into
io_req_complete_failed()

And the interleaving to trigger the kernel WARNING:

CPU0                                       |  CPU1
                                           |
io_apoll_task_func()                       |  io_poll_cancel_req()
 io_poll_check_events()                    |
  // do while first loop                   |
  v = atomic_read(...)                     |
  // v = poll_refs = 1                     |
  ...                                      |  io_poll_mark_cancelled()
                                           |   atomic_or()
                                           |   // poll_refs =
IO_POLL_CANCEL_FLAG | 1
                                           |
  atomic_sub_return(...)                   |
  // poll_refs = IO_POLL_CANCEL_FLAG       |
  // loop continue                         |
                                           |
  v = atomic_read(...)                     |
  // v = IO_POLL_CANCEL_FLAG               |
                                           |  io_poll_execute()
                                           |   io_poll_get_ownership()
                                           |   // poll_refs =
IO_POLL_CANCEL_FLAG | 1
                                           |   // gets the ownership
                                           |
  WARN_ON_ONCE(!(v & IO_POLL_REF_MASK)))   |
  // v & IO_POLL_REF_MASK = 0 WARN         |
                                           |
                                           |  io_apoll_task_func()
                                           |  // also go into
io_req_complete_failed()

By looking up the source code and communicating with Pavel, the
implementation of this atomic poll refs should continue the loop of
io_poll_check_events() just to avoid somewhere else to grab the
ownership. Therefore, this patch simply adds another AND operation to
make sure the loop will stop if it finds the poll_refs is exactly equal
to IO_POLL_CANCEL_FLAG. Since io_poll_cancel_req() grabs ownership and
will finally make its way to io_req_complete_failed(), the req will
be reclaimed as expected.

Fixes: aa43477b04 ("io_uring: poll rework")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: tweak description and code style]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25 07:17:33 -07:00
..
advise.c io_uring: make io_kiocb_to_cmd() typesafe 2022-08-12 17:01:00 -06:00
advise.h
alloc_cache.h
cancel.c io_uring: add IORING_SETUP_DEFER_TASKRUN 2022-09-21 10:30:42 -06:00
cancel.h
epoll.c io_uring: make io_kiocb_to_cmd() typesafe 2022-08-12 17:01:00 -06:00
epoll.h
fdinfo.c io_uring: fix fdinfo sqe offsets calculation 2022-10-12 16:30:56 -06:00
fdinfo.h
filetable.c io_uring/filetable: fix file reference underflow 2022-11-25 06:54:46 -07:00
filetable.h io_uring: kill hot path fixed file bitmap debug checks 2022-10-16 17:07:53 -06:00
fs.c io_uring: make io_kiocb_to_cmd() typesafe 2022-08-12 17:01:00 -06:00
fs.h
io_uring.c io_uring: fix multishot accept request leaks 2022-11-17 12:33:33 -07:00
io_uring.h io_uring: fix multishot accept request leaks 2022-11-17 12:33:33 -07:00
io-wq.c io-wq: Fix memory leak in worker creation 2022-10-20 05:48:59 -07:00
io-wq.h
kbuf.c io_uring: check for rollover of buffer ID when providing buffers 2022-11-10 11:07:41 -07:00
kbuf.h io_uring: allow buffer recycling in READV 2022-09-21 10:30:43 -06:00
Makefile
msg_ring.c io_uring/msg_ring: Fix NULL pointer dereference in io_msg_send_fd() 2022-10-19 12:33:33 -07:00
msg_ring.h
net.c io_uring: fix multishot recv request leaks 2022-11-17 12:33:33 -07:00
net.h io_uring/net: zerocopy sendmsg 2022-09-21 13:15:02 -06:00
nop.c
nop.h
notif.c io_uring/notif: Remove the unused function io_notif_complete() 2022-09-05 11:42:39 -06:00
notif.h io_uring/net: simplify zerocopy send user API 2022-09-01 09:13:33 -06:00
opdef.c io_uring/opdef: remove 'audit_skip' from SENDMSG_ZC 2022-10-12 16:30:56 -06:00
opdef.h io_uring: add custom opcode hooks on fail 2022-09-21 13:15:02 -06:00
openclose.c io_uring: make io_kiocb_to_cmd() typesafe 2022-08-12 17:01:00 -06:00
openclose.h
poll.c io_uring/poll: fix poll_refs race with cancelation 2022-11-25 07:17:33 -07:00
poll.h
refs.h
rsrc.c io_uring: remove FFS_SCM 2022-10-16 17:07:12 -06:00
rsrc.h io_uring: remove FFS_SCM 2022-10-16 17:07:12 -06:00
rw.c io_uring/rw: remove leftover debug statement 2022-10-16 17:24:10 -06:00
rw.h io_uring/rw: don't lose partial IO result on fail 2022-09-21 13:15:02 -06:00
slist.h
splice.c io_uring: make io_kiocb_to_cmd() typesafe 2022-08-12 17:01:00 -06:00
splice.h
sqpoll.c audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker() 2022-08-04 08:33:54 -06:00
sqpoll.h
statx.c io_uring: make io_kiocb_to_cmd() typesafe 2022-08-12 17:01:00 -06:00
statx.h
sync.c io_uring: make io_kiocb_to_cmd() typesafe 2022-08-12 17:01:00 -06:00
sync.h
tctx.c io_uring: remove io_register_submitter 2022-10-07 12:25:30 -06:00
tctx.h io_uring: simplify __io_uring_add_tctx_node 2022-10-07 12:25:30 -06:00
timeout.c io_uring: remove unused return from io_disarm_next 2022-09-21 13:15:01 -06:00
timeout.h io_uring: remove unused return from io_disarm_next 2022-09-21 13:15:01 -06:00
uring_cmd.c io_uring: introduce fixed buffer support for io_uring_cmd 2022-09-30 07:50:59 -06:00
uring_cmd.h
xattr.c __io_setxattr(): constify path 2022-09-01 17:39:05 -04:00
xattr.h