70299 Commits

Author SHA1 Message Date
Pavel Begunkov
3e9424989b io_uring: simplify io_rsrc_data refcounting
We don't take many references of struct io_rsrc_data, only one per each
io_rsrc_node, so using percpu refs is overkill. Use atomic ref instead,
which is much simpler.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1551d90f7c9b183cf2f0d7b5e5b923430acb03fa.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-12 09:32:47 -06:00
Miklos Szeredi
c4fe8aef2f ovl: remove unneeded ioctls
The FS_IOC_[GS]ETFLAGS/FS_IOC_FS[GS]ETXATTR ioctls are now handled via the
fileattr api.  The only unconverted filesystem remaining is CIFS and it is
not allowed to be overlayed due to case insensitive filenames.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 15:04:30 +02:00
Miklos Szeredi
72227eac17 fuse: convert to fileattr
Since fuse just passes ioctl args through to/from server, converting to the
fileattr API is more involved, than most other filesystems.

Both .fileattr_set() and .fileattr_get() need to obtain an open file to
operate on.  The simplest way is with the following sequence:

  FUSE_OPEN
  FUSE_IOCTL
  FUSE_RELEASE

If this turns out to be a performance problem, it could be optimized for
the case when there's already a file (any file) open for the inode.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 15:04:30 +02:00
Miklos Szeredi
b9d54c6f29 fuse: add internal open/release helpers
Clean out 'struct file' from internal helpers.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 15:04:30 +02:00
Miklos Szeredi
54d601cb67 fuse: unsigned open flags
Release helpers used signed int.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 15:04:30 +02:00
Miklos Szeredi
9ac29fd3f8 fuse: move ioctl to separate source file
Next patch will expand ioctl code and fuse/file.c is large enough as it is.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 15:04:30 +02:00
Miklos Szeredi
51db776a43 vfs: remove unused ioctl helpers
Remove vfs_ioc_setflags_prepare(), vfs_ioc_fssetxattr_check() and
simple_fill_fsxattr(), which are no longer used.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-04-12 15:04:30 +02:00
Miklos Szeredi
8871d84c8f ubifs: convert to fileattr
Use the fileattr API to let the VFS handle locking, permission checking and
conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Richard Weinberger <richard@nod.at>
2021-04-12 15:04:30 +02:00
Miklos Szeredi
03eb606613 reiserfs: convert to fileattr
Use the fileattr API to let the VFS handle locking, permission checking and
conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Jan Kara <jack@suse.cz>
2021-04-12 15:04:30 +02:00
Miklos Szeredi
2b5f52c562 ocfs2: convert to fileattr
Use the fileattr API to let the VFS handle locking, permission checking and
conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Joel Becker <jlbec@evilplan.org>
2021-04-12 15:04:30 +02:00
Miklos Szeredi
7c7c436e14 nilfs2: convert to fileattr
Use the fileattr API to let the VFS handle locking, permission checking and
conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
2021-04-12 15:04:30 +02:00
Miklos Szeredi
2ca58e30b1 jfs: convert to fileattr
Use the fileattr API to let the VFS handle locking, permission checking and
conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Dave Kleikamp <shaggy@kernel.org>
2021-04-12 15:04:29 +02:00
Miklos Szeredi
9cbae74838 hfsplus: convert to fileattr
Use the fileattr API to let the VFS handle locking, permission checking and
conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 15:04:29 +02:00
Miklos Szeredi
d701ea284c efivars: convert to fileattr
Use the fileattr API to let the VFS handle locking, permission checking and
conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
2021-04-12 15:04:29 +02:00
Miklos Szeredi
9fefd5db08 xfs: convert to fileattr
Use the fileattr API to let the VFS handle locking, permission checking and
conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Darrick J. Wong <djwong@kernel.org>
2021-04-12 15:04:29 +02:00
Miklos Szeredi
1f26b0627b orangefs: convert to fileattr
Use the fileattr API to let the VFS handle locking, permission checking and
conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Mike Marshall <hubcap@omnibond.com>
2021-04-12 15:04:29 +02:00
Miklos Szeredi
88b631cbfb gfs2: convert to fileattr
Use the fileattr API to let the VFS handle locking, permission checking and
conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
2021-04-12 15:04:29 +02:00
Miklos Szeredi
9b1bb01c8a f2fs: convert to fileattr
Use the fileattr API to let the VFS handle locking, permission checking and
conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
2021-04-12 15:04:29 +02:00
Miklos Szeredi
4db5c2e623 ext4: convert to fileattr
Use the fileattr API to let the VFS handle locking, permission checking and
conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
2021-04-12 15:04:29 +02:00
Miklos Szeredi
aba405e33e ext2: convert to fileattr
Use the fileattr API to let the VFS handle locking, permission checking and
conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Jan Kara <jack@suse.cz>
2021-04-12 15:04:29 +02:00
Miklos Szeredi
97fc297754 btrfs: convert to fileattr
Use the fileattr API to let the VFS handle locking, permission checking and
conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: David Sterba <dsterba@suse.com>
2021-04-12 15:04:29 +02:00
Miklos Szeredi
66dbfabf10 ovl: stack fileattr ops
Add stacking for the fileattr operations.

Add hack for calling security_file_ioctl() for now.  Probably better to
have a pair of specific hooks for these operations.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 15:04:29 +02:00
Miklos Szeredi
97e2dee975 ecryptfs: stack fileattr ops
Add stacking for the fileattr operations.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Tyler Hicks <code@tyhicks.com>
2021-04-12 15:04:29 +02:00
Miklos Szeredi
4c5b479975 vfs: add fileattr ops
There's a substantial amount of boilerplate in filesystems handling
FS_IOC_[GS]ETFLAGS/ FS_IOC_FS[GS]ETXATTR ioctls.

Also due to userspace buffers being involved in the ioctl API this is
difficult to stack, as shown by overlayfs issues related to these ioctls.

Introduce a new internal API named "fileattr" (fsxattr can be confused with
xattr, xflags is inappropriate, since this is more than just flags).

There's significant overlap between flags and xflags and this API handles
the conversions automatically, so filesystems may choose which one to use.

In ->fileattr_get() a hint is provided to the filesystem whether flags or
xattr are being requested by userspace, but in this series this hint is
ignored by all filesystems, since generating all the attributes is cheap.

If a filesystem doesn't implemement the fileattr API, just fall back to
f_op->ioctl().  When all filesystems are converted, the fallback can be
removed.

32bit compat ioctls are now handled by the generic code as well.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 15:04:23 +02:00
Christoph Hellwig
a8ed1a0607 block: remove the -ERESTARTSYS handling in blkdev_get_by_dev
Now that md has been cleaned up we can get rid of this hack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-12 06:55:31 -06:00
Christoph Hellwig
d173b65aa7 block: initialize ret in bdev_disk_changed
Avoid a potentially initialized variabe in the invalidate case.

Fixes: d3c4a43d9291 ("block: refactor blk_drop_partitions")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20210408194140.1816537-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-12 06:44:24 -06:00
Jens Axboe
a1ff1e3f0e io_uring: provide io_resubmit_prep() stub for !CONFIG_BLOCK
Randy reports the following error on CONFIG_BLOCK not being set:

../fs/io_uring.c: In function ‘kiocb_done’:
../fs/io_uring.c:2766:7: error: implicit declaration of function ‘io_resubmit_prep’; did you mean ‘io_put_req’? [-Werror=implicit-function-declaration]
   if (io_resubmit_prep(req)) {

Provide a dummy stub for io_resubmit_prep() like we do for
io_rw_should_reissue(), which also helps remove an ifdef sequence from
io_complete_rw_iopoll() as well.

Fixes: 8c130827f417 ("io_uring: don't alter iopoll reissue fail ret code")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-12 06:40:02 -06:00
Pavel Begunkov
8d13326e56 io_uring: optimise fill_event() by inlining
There are three cases where we much care about performance of
io_cqring_fill_event() -- flushing inline completions, iopoll and
io_req_complete_post(). Inline a hot part of fill_event() into them.

All others are not as important and we don't want to bloat binary for
them, so add a noinline version of the function for all other use
use cases.

nops test(batch=32): 16.932 vs 17.822 KIOPS

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a11d59424bf4417aca33f5ec21008bb3b0ebd11e.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:41 -06:00
Pavel Begunkov
ff64216423 io_uring: always pass cflags into fill_event()
A simple preparation patch inlining io_cqring_fill_event(), which only
role was to pass cflags=0 into an actual fill event. It helps to keep
number of related helpers sane in following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/704f9c85b7d9843e4ad50a9f057200c58f5adc6e.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:41 -06:00
Pavel Begunkov
44c769de6f io_uring: optimise non-eventfd post-event
Eventfd is not the canonical way of using io_uring, annotate
io_should_trigger_evfd() with likely so it improves code generation for
non-eventfd branch.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/42fdaa51c68d39479f02cef4fe5bcb24624d60fa.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:41 -06:00
Pavel Begunkov
4af3417a34 io_uring: refactor compat_msghdr import
Add an entry for user pointer to compat_msghdr into io_connect, so it's
explicit that we may use it as this, and removes annoying casts.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/73fd644dea1518f528d3648981cf777ce6e537e9.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:41 -06:00
Pavel Begunkov
0bdf3398b0 io_uring: enable inline completion for more cases
Take advantage of delayed/inline completion flushing and pass right
issue flags for completion of open, open2, fadvise and poll remove
opcodes. All others either already use it or always punted and never
executed inline.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0badc7512e82f7350b73bb09abbebbecbdd5dab8.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:41 -06:00
Pavel Begunkov
a1fde923e3 io_uring: refactor io_close
A small refactoring shrinking it and making easier to read.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/19b24eed7cd491a0243b50366dd2a23b558e2665.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:41 -06:00
Pavel Begunkov
3f48cf18f8 io_uring: unify files and task cancel
Now __io_uring_cancel() and __io_uring_files_cancel() are very similar
and mostly differ by how we count requests, merge them and allow
tctx_inflight() to handle counting.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1a5986a97df4dc1378f3fe0ca1eb483dbcf42112.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:41 -06:00
Pavel Begunkov
b303fe2e5a io_uring: track inflight requests through counter
Instead of keeping requests in a inflight_list, just track them with a
per tctx atomic counter. Apart from it being much easier and more
consistent with task cancel, it frees ->inflight_entry from being shared
between iopoll and cancel-track, so less headache for us.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3c2ee0863cd7eeefa605f3eaff4c1c461a6f1157.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:41 -06:00
Pavel Begunkov
368b208085 io_uring: unify task and files cancel loops
Move tracked inflight number check up the stack into
__io_uring_files_cancel() so it's similar to task cancel. Will be used
for further cleaning.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dca5a395efebd1e3e0f3bbc6b9640c5e8aa7e468.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:40 -06:00
Pavel Begunkov
0ea13b448e io_uring: simplify apoll hash removal
hash_del() works well with non-hashed nodes, there's no need to check
if it is hashed first.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:40 -06:00
Pavel Begunkov
e27414bef7 io_uring: refactor io_poll_complete()
Remove error parameter from io_poll_complete(), 0 is always passed,
and do a bit of cleaning on top.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:40 -06:00
Pavel Begunkov
f40b964a66 io_uring: clean up io_poll_task_func()
io_poll_complete() always fills an event (even an overflowed one), so we
always should do io_cqring_ev_posted() afterwards. And that's what is
currently happening, because second EPOLLONESHOT check is always true,
it can't return !done for oneshots.

Remove those branching, it's much easier to read.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:40 -06:00
Peter Zijlstra
e0051d7d18 io-wq: Fix io_wq_worker_affinity()
Do not include private headers and do not frob in internals.

On top of that, while the previous code restores the affinity, it
doesn't ensure the task actually moves there if it was running,
leading to the fun situation that it can be observed running outside
of its allowed mask for potentially significant time.

Use the proper API instead.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/YG7QkiUzlEbW85TU@hirez.programming.kicks-ass.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:40 -06:00
Jens Axboe
cb3b200e4f io_uring: don't attempt re-add of multishot poll request if racing
We currently allow racy updates to multishot requests, but we can end up
double adding the poll request if both completion and update does it.
Ensure that we skip re-add on the update side if someone else is
completing it.

Fixes: b69de288e913 ("io_uring: allow events and user_data update of running poll requests")
Reported-by: Joakim Hassila <joj@mac.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:35 -06:00
Hao Xu
417b5052be io-wq: simplify code in __io_worker_busy()
Leverage XOR to simplify the code in __io_worker_busy.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/1617678525-3129-1-git-send-email-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:35 -06:00
Pavel Begunkov
53a3126756 io_uring: kill outdated comment about splice punt
The splice/tee comment in io_prep_async_work() isn't relevant since the
section was moved, delete it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/892a549c89c3d422b679677b8e68ffd3fcb736b6.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:35 -06:00
Pavel Begunkov
a04b0ac0cb io_uring: encapsulate fixed files into struct
Add struct io_fixed_file representing a single registered file, first to
hide ugly struct file **, which may be misleading, and secondly to
retype it to unsigned long as conversions to it and back to file * for
handling and masking FFS_* flags are getting nasty.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/78669731a605a7614c577c3de552631cfaf0869a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:35 -06:00
Pavel Begunkov
846a4ef22b io_uring: refactor file tables alloc/free
Introduce a heler io_free_file_tables() doing all the cleaning, there
are several places where it's hand coded. Also move all allocations into
io_sqe_alloc_file_tables() and rename it, so all of it is in one place.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/502a84ebf41ff119b095e59661e678eacb752bf8.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:35 -06:00
Pavel Begunkov
f4f7d21ce4 io_uring: don't quiesce intial files register
There is no reason why we would want to fully quiesce ring on
IORING_REGISTER_FILES, if it's already registered we fail.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/563bb8060bb2d3efbc32fce6101678281c574d2a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:35 -06:00
Pavel Begunkov
9a321c9849 io_uring: set proper FFS* flags on reg file update
Set FFS_* flags (e.g. FFS_ASYNC_READ) not only in initial registration
but also on registered files update. Not a bug, but may miss getting
profit out of the feature.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/df29a841a2d3d3695b509cdffce5070777d9d942.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:35 -06:00
Pavel Begunkov
044118069a io_uring: deduplicate NOSIGNAL setting
Set MSG_NOSIGNAL and REQ_F_NOWAIT in send/recv prep routines and don't
duplicate it in all four send/recv handlers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e1133a3ed1c0e192975b7341ea4b0bf91f63b132.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:35 -06:00
Pavel Begunkov
df9727affa io_uring: put link timeout req consistently
Don't put linked timeout req in io_async_find_and_cancel() but do it in
io_link_timeout_fn(), so we have only one point for that and won't have
to do it differently as it's now (put vs put_deferred). Btw, improve a
bit io_async_find_and_cancel()'s locking.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d75b70957f245275ab7cba83e0ac9c1b86aae78a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:34 -06:00
Pavel Begunkov
c4ea060e85 io_uring: simplify overflow handling
Overflowed CQEs doesn't lock requests anymore, so we don't care so much
about cancelling them, so kill cq_overflow_flushed and simplify the
code.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5799867aeba9e713c32f49aef78e5e1aef9fbc43.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:30:34 -06:00