6435 Commits

Author SHA1 Message Date
Trond Myklebust
0bae835b63 NFS: Avoid writeback threads getting stuck in mempool_alloc()
In a low memory situation, allow the NFS writeback code to fail without
getting stuck in infinite loops in mempool_alloc().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22 15:52:56 -04:00
Trond Myklebust
515dcdcd48 NFS: nfsiod should not block forever in mempool_alloc()
The concern is that since nfsiod is sometimes required to kick off a
commit, it can get locked up waiting forever in mempool_alloc() instead
of failing gracefully and leaving the commit until later.

Try to allocate from the slab first, with GFP_KERNEL | __GFP_NORETRY,
then fall back to a non-blocking attempt to allocate from the memory
pool.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22 15:52:56 -04:00
Trond Myklebust
e47a62df29 NFS: Fix revalidation of empty readdir pages
If the page is empty, we need to check the array->last_cookie instead of
the first entry. Add a helper for the cases where we care.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22 15:52:55 -04:00
Linus Torvalds
14705fda8f New features:
- NFSv3 support in NFSD is now always built
 - Added NFSD support for the NFSv4 birth-time file attribute
 - Added support for storing and displaying sockaddrs in trace points
 - NFSD now recognizes RPC_AUTH_TLS probes
 
 Performance improvements:
 - Optimized the svc transport enqueuing mechanism
 - Added micro-optimizations for the duplicate reply cache
 
 Notable bug fixes:
 - Allocation of the NFSD file cache hash table is more reliable
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmI3XNkACgkQM2qzM29m
 f5dNERAAqJ/nzfVp2H5BKLszJ7p/s13wFqbW719Rzymzl6/tUUHOqIsVdBsJsa/b
 BdZQLLDwa6ZB5zOAnC6FosRKYu4lwixOOf94pC6a9ZDD/glYVKF8mZG+RZXPAy16
 g3JUOi/bcyHXv0ZUhbv7DqW+HHM+owPP4vlNJ9ChiiLr/Xdp8NBKj+4Qtn/wcAo+
 Xuvx7fU/5Mbemh+dd5mWker4afHvpt9V6U6s04m5LiTPPnHVnxmeyekJGUCOY0QO
 cm/6SPNDqyn/VEfM/SRxEnLE9QcHRhZo/4PKRGF4wYolcviIogbILE1M7Ig/r/Gv
 6Du2kcRAhyZ3zgWnu799Ivn3Q6IrVjxZwqmsi7YHURTwYKyZtxYsUk0MCBcpnxrE
 WyTS2onpElbMop3viKCnQdpIetbsHnUNg3udUV6ugbdCbnZuhUw5B/d6x0o8ZWDE
 C0f+jnX+GnBstn0vkcj2H0+VQTd5hUJtXMrooI42ODJoboQRZcmePwoXjqCmw3sy
 PXTxLZC/5+4zNHGUuz4Pq4V7FKr4nHhDzaW4ZDO3mILx4ahceotulY1B/yoBUu8o
 /LAhu2kJ6nFQkmpzdrGzPeOstgJYHm9CaitRvMzg+NJxEAJdebypdQDbX5iNpgfW
 MDXH4n8eIqroTlQ/mQYEV0EbC7BaTqSCL6rQdcrcFUPu9n4Fcno=
 =5nac
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "New features:

   - NFSv3 support in NFSD is now always built

   - Added NFSD support for the NFSv4 birth-time file attribute

   - Added support for storing and displaying sockaddrs in trace points

   - NFSD now recognizes RPC_AUTH_TLS probes

  Performance improvements:

   - Optimized the svc transport enqueuing mechanism

   - Added micro-optimizations for the duplicate reply cache

  Notable bug fixes:

   - Allocation of the NFSD file cache hash table is more reliable"

* tag 'nfsd-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (30 commits)
  nfsd: fix using the correct variable for sizeof()
  nfsd: use correct format characters
  NFSD: prevent integer overflow on 32 bit systems
  NFSD: prevent underflow in nfssvc_decode_writeargs()
  fs/lock: documentation cleanup. Replace inode->i_lock with flc_lock.
  NFSD: Fix nfsd_breaker_owns_lease() return values
  NFSD: Clean up _lm_ operation names
  arch: Remove references to CONFIG_NFSD_V3 in the default configs
  NFSD: Remove CONFIG_NFSD_V3
  nfsd: more robust allocation failure handling in nfsd_file_cache_init
  SUNRPC: Teach server to recognize RPC_AUTH_TLS
  NFSD: Move svc_serv_ops::svo_function into struct svc_serv
  NFSD: Remove svc_serv_ops::svo_module
  SUNRPC: Remove svc_shutdown_net()
  SUNRPC: Rename svc_close_xprt()
  SUNRPC: Rename svc_create_xprt()
  SUNRPC: Remove svo_shutdown method
  SUNRPC: Merge svc_do_enqueue_xprt() into svc_enqueue_xprt()
  SUNRPC: Remove the .svo_enqueue_xprt method
  SUNRPC: Record endpoint information in trace log
  ...
2022-03-22 10:29:51 -07:00
Trond Myklebust
648a4548d6 NFS: Don't deadlock when cookie hashes collide
In the very rare case where the readdir reply contains multiple cookies
that map to the same hash value, we can end up deadlocking waiting for a
page lock that we already hold. In this case we should fail the page
lock by using grab_cache_page_nowait().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22 09:14:39 -04:00
Linus Torvalds
616355cc81 for-5.18/block-2022-03-18
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI0+GcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprUpD/9aTJEnj7VCw7UouSsg098sdjtoy9ilslU3
 ew47K8CIXHbCB4CDqLnFyvCwAdG1XGgS+fUmFAxvTr29R9SZeS5d+bXL6sZzEo0C
 bwxsJy9MM2QRtMvB+giAt1myXbwB8cG+ketMBWXqwXXRHRzPbbQfMZia7FqWMnfY
 KQanH9IwYHp1oa5U/W6Qcjm4oCnLgBMRwqByzUCtiF3y9qgaLkK+3IgkNwjJQjLA
 DTeUJ/9CgxGQQbzA+LPktbw2xfTqiUfcKq0mWx6Zt4wwNXn1ClqUDUXX6QSM8/5u
 3OimbscSkEPPTIYZbVBPkhFnAlQb4JaJEgOrbXvYKVV2Dh+eZY81XwNeE/E8gdBY
 TnHOTOCjkN/4sR3hIrWazlJzPLdpPA0eOYrhguCraQsX9mcsYNxlJ9otRv/Ve99g
 uqL0RZg3+NoK84fm79FCGy/ZmPQJvJttlBT9CKVwylv/Lky42xWe7AdM3OipKluY
 2nh+zN5Ai7WxZdTKXQFRhCSWfWQ+1qW51tB3dcGW+BooZr/oox47qKQVcHsEWbq1
 RNR45F5a4AuPwYUHF/P36WviLnEuq9AvX7OTTyYOplyVQohKIoDXp9chVzLNzBiZ
 KBR00W6MLKKKN+8foalQWgNyb2i2PH7Ib4xRXvXj/22Vwxg5UmUoBmSDSas9SZUS
 +dMo7CtNgA==
 =DpgP
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - BFQ cleanups and fixes (Yu, Zhang, Yahu, Paolo)

 - blk-rq-qos completion fix (Tejun)

 - blk-cgroup merge fix (Tejun)

 - Add offline error return value to distinguish it from an IO error on
   the device (Song)

 - IO stats fixes (Zhang, Christoph)

 - blkcg refcount fixes (Ming, Yu)

 - Fix for indefinite dispatch loop softlockup (Shin'ichiro)

 - blk-mq hardware queue management improvements (Ming)

 - sbitmap dead code removal (Ming, John)

 - Plugging merge improvements (me)

 - Show blk-crypto capabilities in sysfs (Eric)

 - Multiple delayed queue run improvement (David)

 - Block throttling fixes (Ming)

 - Start deprecating auto module loading based on dev_t (Christoph)

 - bio allocation improvements (Christoph, Chaitanya)

 - Get rid of bio_devname (Christoph)

 - bio clone improvements (Christoph)

 - Block plugging improvements (Christoph)

 - Get rid of genhd.h header (Christoph)

 - Ensure drivers use appropriate flush helpers (Christoph)

 - Refcounting improvements (Christoph)

 - Queue initialization and teardown improvements (Ming, Christoph)

 - Misc fixes/improvements (Barry, Chaitanya, Colin, Dan, Jiapeng,
   Lukas, Nian, Yang, Eric, Chengming)

* tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block: (127 commits)
  block: cancel all throttled bios in del_gendisk()
  block: let blkcg_gq grab request queue's refcnt
  block: avoid use-after-free on throttle data
  block: limit request dispatch loop duration
  block/bfq-iosched: Fix spelling mistake "tenative" -> "tentative"
  sr: simplify the local variable initialization in sr_block_open()
  block: don't merge across cgroup boundaries if blkcg is enabled
  block: fix rq-qos breakage from skipping rq_qos_done_bio()
  block: flush plug based on hardware and software queue order
  block: ensure plug merging checks the correct queue at least once
  block: move rq_qos_exit() into disk_release()
  block: do more work in elevator_exit
  block: move blk_exit_queue into disk_release
  block: move q_usage_counter release into blk_queue_release
  block: don't remove hctx debugfs dir from blk_mq_exit_queue
  block: move blkcg initialization/destroy into disk allocation/release handler
  sr: implement ->free_disk to simplify refcounting
  sd: implement ->free_disk to simplify refcounting
  sd: delay calling free_opal_dev
  sd: call sd_zbc_release_disk before releasing the scsi_device reference
  ...
2022-03-21 16:48:55 -07:00
Olga Kornievskaia
a43bf60444 NFSv4.1 provide mount option to toggle trunking discovery
Introduce a new mount option -- trunkdiscovery,notrunkdiscovery -- to
toggle whether or not the client will engage in actively discovery
of trunking locations.

v2 make notrunkdiscovery default

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: 1976b2b31462 ("NFSv4.1 query for fs_location attr on a new file system")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-21 10:36:49 -04:00
Jeffle Xu
e9b57aaae6 fscache: export fscache_end_operation()
Export fscache_end_operation() to avoid code duplication.

Besides, considering the paired fscache_begin_read_operation() is
already exported, it shall make sense to also export
fscache_end_operation().

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com

Link: https://lore.kernel.org/r/20220302125134.131039-2-jefflexu@linux.alibaba.com/ # Jeffle's v4
Link: https://lore.kernel.org/r/164622971432.3564931.12184135678781328146.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/164678190346.1200972.7453733431978569479.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/164692888334.2099075.5166283293894267365.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/20220316131723.111553-2-jefflexu@linux.alibaba.com/ # v5
2022-03-18 09:23:34 +00:00
Matthew Wilcox (Oracle)
187c82cb03 fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio
These filesystems use __set_page_dirty_nobuffers() either directly or
with a very thin wrapper; convert them en masse.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-15 08:34:38 -04:00
Matthew Wilcox (Oracle)
15a30ab2b3 nfs: Convert from launder_page to launder_folio
We don't need to use page_file_mapping() here because launder_folio
is never called for swap cache pages.  We also don't need to
cast an loff_t in order to print it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-15 08:23:30 -04:00
Matthew Wilcox (Oracle)
6d740c76ea nfs: Convert from invalidatepage to invalidate_folio
Print the folio index instead of the pointer, since this is more
useful.  We also don't need to use page_file_mapping() as we do not
invalidate swapcache pages.  Since this is the only caller of
nfs_wb_page_cancel(), convert it to nfs_wb_folio_cancel().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-15 08:23:30 -04:00
NeilBrown
c265de257f NFS: swap-out must always use STABLE writes.
The commit handling code is not safe against memory-pressure deadlocks
when writing to swap.  In particular, nfs_commitdata_alloc() blocks
indefinitely waiting for memory, and this can consume all available
workqueue threads.

swap-out most likely uses STABLE writes anyway as COND_STABLE indicates
that a stable write should be used if the write fits in a single
request, and it normally does.  However if we ever swap with a small
wsize, or gather unusually large numbers of pages for a single write,
this might change.

For safety, make it explicit in the code that direct writes used for swap
must always use FLUSH_STABLE.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13 12:59:35 -04:00
NeilBrown
64158668ac NFS: swap IO handling is slightly different for O_DIRECT IO
1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding
   possible deadlocks with "fs_reclaim".  These deadlocks could, I believe,
   eventuate if a buffered read on the swapfile was attempted.

   We don't need coherence with the page cache for a swap file, and
   buffered writes are forbidden anyway.  There is no other need for
   i_rwsem during direct IO.  So never take it for swap_rw()

2/ generic_write_checks() explicitly forbids writes to swap, and
   performs checks that are not needed for swap.  So bypass it
   for swap_rw().

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13 12:59:35 -04:00
NeilBrown
4dc73c6791 NFSv4: keep state manager thread active if swap is enabled
If we are swapping over NFSv4, we may not be able to allocate memory to
start the state-manager thread at the time when we need it.
So keep it always running when swap is enabled, and just signal it to
start.

This requires updating and testing the cl_swapper count on the root
rpc_clnt after following all ->cl_parent links.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13 12:59:35 -04:00
NeilBrown
8db55a032a SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOC
rpc tasks can be marked as RPC_TASK_SWAPPER.  This causes GFP_MEMALLOC
to be used for some allocations.  This is needed in some cases, but not
in all where it is currently provided, and in some where it isn't
provided.

Currently *all* tasks associated with a rpc_client on which swap is
enabled get the flag and hence some GFP_MEMALLOC support.

GFP_MEMALLOC is provided for ->buf_alloc() but only swap-writes need it.
However xdr_alloc_bvec does not get GFP_MEMALLOC - though it often does
need it.

xdr_alloc_bvec is called while the XPRT_LOCK is held.  If this blocks,
then it blocks all other queued tasks.  So this allocation needs
GFP_MEMALLOC for *all* requests, not just writes, when the xprt is used
for any swap writes.

Similarly, if the transport is not connected, that will block all
requests including swap writes, so memory allocations should get
GFP_MEMALLOC if swap writes are possible.

So with this patch:
 1/ we ONLY set RPC_TASK_SWAPPER for swap writes.
 2/ __rpc_execute() sets PF_MEMALLOC while handling any task
    with RPC_TASK_SWAPPER set, or when handling any task that
    holds the XPRT_LOCKED lock on an xprt used for swap.
    This removes the need for the RPC_IS_SWAPPER() test
    in ->buf_alloc handlers.
 3/ xprt_prepare_transmit() sets PF_MEMALLOC after locking
    any task to a swapper xprt.  __rpc_execute() will clear it.
 3/ PF_MEMALLOC is set for all the connect workers.

Reviewed-by: Chuck Lever <chuck.lever@oracle.com> (for xprtrdma parts)
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13 12:59:35 -04:00
NeilBrown
89c2be8a95 NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS
NFS_RPC_SWAPFLAGS is only used for READ requests.
It sets RPC_TASK_SWAPPER which gives some memory-allocation priority to
requests.  This is not needed for swap READ - though it is for writes
where it is set via a different mechanism.

RPC_TASK_ROOTCREDS causes the 'machine' credential to be used.
This is not needed as the root credential is saved when the swap file is
opened, and this is used for all IO.

So NFS_RPC_SWAPFLAGS isn't needed, and as it is the only user of
RPC_TASK_ROOTCREDS, that isn't needed either.

Remove both.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13 12:59:35 -04:00
NeilBrown
944d95f766 NFS: remove IS_SWAPFILE hack
This code is pointless as IS_SWAPFILE is always defined.
So remove it.

Suggested-by: Mark Hemment <markhemm@googlemail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13 12:59:35 -04:00
Dave Wysochanski
b5fdf66f6e NFS: Remove remaining dfprintks related to fscache and remove NFSDBG_FSCACHE
The fscache cookie APIs including fscache_acquire_cookie() and
fscache_relinquish_cookie() now have very good tracing.  Thus,
there is no real need for dfprintks in the NFS fscache interface.

The NFS fscache interface has removed all dfprintks so remove the
NFSDBG_FSCACHE defines.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13 12:59:35 -04:00
Dave Wysochanski
e3f0a7fe69 NFS: Replace dfprintks with tracepoints in fscache read and write page functions
Most of fscache and other NFS IO paths are now using tracepoints.
Remove the dfprintks in the NFS fscache read/write page functions
and replace with tracepoints at the begin and end of the functions.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13 12:59:35 -04:00
Dave Wysochanski
fc1c5abfca NFS: Rename fscache read and write pages functions
Rename NFS fscache functions in a more consistent fashion
to better reflect when we read from and write to fscache.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13 12:59:35 -04:00
Dave Wysochanski
45f3a70ba6 NFS: Cleanup usage of nfs_inode in fscache interface
A number of places in the fscache interface used nfs_inode when inode could
be used, simplifying the code.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13 12:59:35 -04:00
Olga Kornievskaia
b4be2c598b NFSv4.1 restrict GETATTR fs_location query to the main transport
In the presence of trunking transports, it's helpful to make sure
that during the migration event, the GETATTR for fs_location attribute
happens on the main transport.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13 12:59:34 -04:00
Alexey Khoroshilov
cb8fac6d27 NFS: remove unneeded check in decode_devicenotify_args()
[You don't often get email from khoroshilov@ispras.ru. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.]

Overflow check in not needed anymore after we switch to kmalloc_array().

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Fixes: a4f743a6bb20 ("NFSv4.1: Convert open-coded array allocation calls to kmalloc_array()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13 12:59:34 -04:00
Trond Myklebust
612896ec5a NFS: Cache all entries in the readdirplus reply
Even if we're not able to cache all the entries in the readdir buffer,
let's ensure that we do prime the dcache.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:39 -05:00
Trond Myklebust
0adf85b445 NFS: Optimise away the previous cookie field
Replace the 'previous cookie' field in struct nfs_entry with the
array->last_cookie.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:39 -05:00
Trond Myklebust
b0365ccb07 NFS: Fix up forced readdirplus
Avoid clearing the entire readdir page cache if we're just doing forced
readdirplus for the 'ls -l' heuristic.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:39 -05:00
Trond Myklebust
f648022faa NFS: Convert readdir page cache to use a cookie based index
Instead of using a linear index to address the pages, use the cookie of
the first entry, since that is what we use to match the page anyway.

This allows us to avoid re-reading the entire cache on a seekdir() type
of operation. The latter is very common when re-exporting NFS, and is a
major performance drain.

The change does affect our duplicate cookie detection, since we can no
longer rely on the page index as a linear offset for detecting whether
we looped backwards. However since we no longer do a linear search
through all the pages on each call to nfs_readdir(), this is less of a
concern than it was previously.
The other downside is that invalidate_mapping_pages() no longer can use
the page index to avoid clearing pages that have been read. A subsequent
patch will restore the functionality this provides to the 'ls -l'
heuristic.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:39 -05:00
Trond Myklebust
9332cf14e2 NFS: Clean up page array initialisation/free
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:39 -05:00
Trond Myklebust
11d03d0a1e NFS: Trace effects of the readdirplus heuristic
Enable tracking of when the readdirplus heuristic causes a page cache
invalidation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:39 -05:00
Trond Myklebust
eace45a18c NFS: Trace effects of readdirplus on the dcache
Trace the effects of readdirplus on attribute and dentry revalidation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:39 -05:00
Trond Myklebust
310e318745 NFS: Add basic readdir tracing
Add tracing to track how often the client goes to the server for updated
readdir information.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:39 -05:00
Trond Myklebust
0b3cc71b5a NFS: Don't request readdirplus when revalidation was forced
If the revalidation was forced, due to the presence of a LOOKUP_EXCL or
a LOOKUP_REVAL flag, then readdirplus won't help. It also can't help
when we're doing a path component lookup.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:39 -05:00
Trond Myklebust
2c2c336506 NFS: Readdirplus can't help lookup for case insensitive filesystems
If the filesystem is case insensitive, then readdirplus can't help with
cache misses, since it won't return case folded variants of the filename.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:39 -05:00
Trond Myklebust
c49c68944f NFSv4: Ask for a full XDR buffer of readdir goodness
Instead of pretending that we know the ratio of directory info vs
readdirplus attribute info, just set the 'dircount' field to the same
value as the 'maxcount' field.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
ad1e109a41 NFS: Don't ask for readdirplus unless it can help nfs_getattr()
If attribute caching is turned off, then use of readdirplus is not going
to help stat() performance.
Readdirplus also doesn't help if a file is being written to, since we
will have to flush those writes in order to sync the mtime/ctime.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
230bc98f7a NFS: Improve heuristic for readdirplus
The heuristic for readdirplus is designed to try to detect 'ls -l' and
similar patterns. It does so by looking for cache hit/miss patterns in
both the attribute cache and in the dcache of the files in a given
directory, and then sets a flag for the readdirplus code to interpret.

The problem with this approach is that a single attribute or dcache miss
can cause the NFS code to force a refresh of the attributes for the
entire set of files contained in the directory.

To be able to make a more nuanced decision, let's sample the number of
hits and misses in the set of open directory descriptors. That allows us
to set thresholds at which we start preferring READDIRPLUS over regular
READDIR, or at which we start to force a re-read of the remaining
readdir cache using READDIRPLUS.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
9c3f4d988c NFS: Reduce use of uncached readdir
When reading a very large directory, we want to try to keep the page
cache up to date if doing so is inexpensive. With the change to allow
readdir to continue reading even when the cache is incomplete, we no
longer need to fall back to uncached readdir in order to scale to large
directories.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
9ff89c25d8 NFS: Simplify nfs_readdir_xdr_to_array()
Recent changes to readdir mean that we can cope with partially filled
page cache entries, so we no longer need to rely on looping in
nfs_readdir_xdr_to_array().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
6c34f05b75 NFS: If the cookie verifier changes, we must invalidate the page cache
Ensure that if the cookie verifier changes when we use the zero-valued
cookie, then we invalidate any cached pages.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
580f236737 NFS: Adjust the amount of readahead performed by NFS readdir
The current NFS readdir code will always try to maximise the amount of
readahead it performs on the assumption that we can cache anything that
isn't immediately read by the process.
There are several cases where this assumption breaks down, including
when the 'ls -l' heuristic kicks in to try to force use of readdirplus
as a batch replacement for lookup/getattr.

This patch therefore tries to tone down the amount of readahead we
perform, and adjust it to try to match the amount of data being
requested by user space.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
c8f0523ba3 NFS: Don't advance the page pointer unless the page is full
When we hit the end of the data in the readdir page, we don't want to
start filling a new page, unless this one is full.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
728dd0ab37 NFS: Don't re-read the entire page cache to find the next cookie
If the page cache entry that was last read gets invalidated for some
reason, then make sure we can re-create it on the next call to readdir.
This, combined with the cache page validation, allows us to reuse the
cached value of page-index on successive calls to nfs_readdir.

Credit is due to Benjamin Coddington for showing that the concept works,
and that it allows for improved cache sharing between processes even in
the case where pages are lost due to LRU or active invalidation.

Suggested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
d09e673f49 NFS: Store the change attribute in the directory page cache
Use the change attribute and the first cookie in a directory page cache
entry to validate that the page is up to date.

Suggested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:33 -05:00
Chuck Lever
37902c6313 NFSD: Move svc_serv_ops::svo_function into struct svc_serv
Hoist svo_function back into svc_serv and remove struct
svc_serv_ops, since the struct is now devoid of fields.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:40 -05:00
Chuck Lever
f49169c97f NFSD: Remove svc_serv_ops::svo_module
struct svc_serv_ops is about to be removed.

Neil Brown says:
> I suspect svo_module can go as well - I don't think the thread is
> ever the thing that primarily keeps a module active.

A random sample of kthread_create() callers shows sunrpc is the only
one that manages module reference count in this way.

Suggested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:40 -05:00
Chuck Lever
c7d7ec8f04 SUNRPC: Remove svc_shutdown_net()
Clean up: svc_shutdown_net() now does nothing but call
svc_close_net(). Replace all external call sites.

svc_close_net() is renamed to be the inverse of svc_xprt_create().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:40 -05:00
Chuck Lever
352ad31448 SUNRPC: Rename svc_create_xprt()
Clean up: Use the "svc_xprt_<task>" function naming convention as
is used for other external APIs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:39 -05:00
Chuck Lever
a9ff2e99e9 SUNRPC: Remove the .svo_enqueue_xprt method
We have never been able to track down and address the underlying
cause of the performance issues with workqueue-based service
support. svo_enqueue_xprt is called multiple times per RPC, so
it adds instruction path length, but always ends up at the same
function: svc_xprt_do_enqueue(). We do not anticipate needing
this flexibility for dynamic nfsd thread management support.

As a micro-optimization, remove .svo_enqueue_xprt because
Spectre/Meltdown makes virtual function calls more costly.

This change essentially reverts commit b9e13cdfac70 ("nfsd/sunrpc:
turn enqueueing a svc_xprt into a svc_serv operation").

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:39 -05:00
Trond Myklebust
0b2662b7e7 NFS: Calculate page offsets algorithmically
Instead of relying on counting the page offsets as we walk through the
page cache, switch to calculating them algorithmically.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-28 10:11:33 -05:00
Trond Myklebust
281f31b2e5 NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-28 10:11:32 -05:00