Commit Graph

94152 Commits

Author SHA1 Message Date
Linus Torvalds
ac34bb40f7 12 smb3 client fixes, and also an important netfs fix for cifs mtime write regression

Merge tag 'v6.12-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Most are from the recent SMB3.1.1 test event, and also an important
  netfs fix for a cifs mtime write regression

   - fix mode reported by stat of readonly directories and files

   - DFS (global namespace) related fixes

   - fixes for special file support via reparse points

   - mount improvement and reconnect fix

   - fix for noisy log message on umount

   - two netfs related fixes, one fixing a recent regression, and add
     new write tracepoint"

* tag 'v6.12-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  netfs, cifs: Fix mtime/ctime update for mmapped writes
  cifs: update internal version number
  smb: client: print failed session logoffs with FYI
  cifs: Fix reversion of the iter in cifs_readv_receive().
  smb3: fix incorrect mode displayed for read-only files
  smb: client: fix parsing of device numbers
  smb: client: set correct device number on nfs reparse points
  smb: client: propagate error from cifs_construct_tcon()
  smb: client: fix DFS failover in multiuser mounts
  cifs: Make the write_{enter,done,err} tracepoints display netfs info
  smb: client: fix DFS interlink failover
  smb: client: improve purging of cached referrals
  smb: client: avoid unnecessary reconnects when refreshing referrals
2024-09-26 09:20:19 -07:00
Linus Torvalds
0181f8c809 virtio: features, fixes, cleanups
Several new features here:
 
 	virtio-balloon supports new stats
 
 	vdpa supports setting mac address
 
 	vdpa/mlx5 suspend/resume as well as MKEY ops are now faster
 
 	virtio_fs supports new sysfs entries for queue info
 
 	virtio/vsock performance has been improved
 
 Fixes, cleanups all over the place.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "Several new features here:

   - virtio-balloon supports new stats

   - vdpa supports setting mac address

   - vdpa/mlx5 suspend/resume as well as MKEY ops are now faster

   - virtio_fs supports new sysfs entries for queue info

   - virtio/vsock performance has been improved

  And fixes, cleanups all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits)
  vsock/virtio: avoid queuing packets when intermediate queue is empty
  vsock/virtio: refactor virtio_transport_send_pkt_work
  fw_cfg: Constify struct kobj_type
  vdpa/mlx5: Postpone MR deletion
  vdpa/mlx5: Introduce init/destroy for MR resources
  vdpa/mlx5: Rename mr_mtx -> lock
  vdpa/mlx5: Extract mr members in own resource struct
  vdpa/mlx5: Rename function
  vdpa/mlx5: Delete direct MKEYs in parallel
  vdpa/mlx5: Create direct MKEYs in parallel
  MAINTAINERS: add virtio-vsock driver in the VIRTIO CORE section
  virtio_fs: add sysfs entries for queue information
  virtio_fs: introduce virtio_fs_put_locked helper
  vdpa: Remove unused declarations
  vdpa/mlx5: Parallelize VQ suspend/resume for CVQ MQ command
  vdpa/mlx5: Small improvement for change_num_qps()
  vdpa/mlx5: Keep notifiers during suspend but ignore
  vdpa/mlx5: Parallelize device resume
  vdpa/mlx5: Parallelize device suspend
  vdpa/mlx5: Use async API for vq modify commands
  ...
2024-09-26 08:43:17 -07:00
Max Gurtovoy
87cbdc396a virtio_fs: add sysfs entries for queue information
Introduce sysfs entries to provide visibility to the multiple queues
used by the Virtio FS device. This enhancement allows users to query
information about these queues.

Specifically, add two sysfs entries:
1. Queue name: Provides the name of each queue (e.g. hiprio/requests.8).
2. CPU list: Shows the list of CPUs that can process requests for each
queue.

The CPU list feature is inspired by similar functionality in the block
MQ layer, which provides analogous sysfs entries for block devices.

These new sysfs entries will improve observability and aid in debugging
and performance tuning of Virtio FS devices.

Reviewed-by: Idan Zach <izach@nvidia.com>
Reviewed-by: Shai Malin <smalin@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Message-Id: <20240825130716.9506-2-mgurtovoy@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-09-25 07:07:43 -04:00
Max Gurtovoy
4045b64298 virtio_fs: introduce virtio_fs_put_locked helper
Introduce a new helper function virtio_fs_put_locked to encapsulate the
common pattern of releasing a virtio_fs reference while holding a lock.
The existing virtio_fs_put helper will be used to release a virtio_fs
reference while not holding a lock.

Also add an assertion in case the lock is not taken when it should be.

Reviewed-by: Idan Zach <izach@nvidia.com>
Reviewed-by: Shai Malin <smalin@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Message-Id: <20240825130716.9506-1-mgurtovoy@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-09-25 07:07:43 -04:00
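The split between virtio_fs_put and virtio_fs_put_locked described in the commit above can be pictured with a small userspace model. This is only a sketch under stated assumptions: `demo_lock`, `demo_fs`, `registry_lock` and the `demo_*` functions are hypothetical names, the lock is a single-threaded flag rather than a real mutex, and none of it is taken from the virtio_fs source.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: a flag that records whether the lock is currently held. */
struct demo_lock { bool held; };

void demo_lock_acquire(struct demo_lock *l) { assert(!l->held); l->held = true; }
void demo_lock_release(struct demo_lock *l) { assert(l->held); l->held = false; }

/* Hypothetical lock protecting the registry of filesystem instances. */
static struct demo_lock registry_lock;

struct demo_fs {
    int refcount;
    bool released;   /* set once the last reference has been dropped */
};

/* Drop a reference; the caller must already hold registry_lock. */
void demo_fs_put_locked(struct demo_fs *fs)
{
    assert(registry_lock.held);   /* catch callers that forgot the lock */
    if (--fs->refcount == 0)
        fs->released = true;
}

/* Drop a reference from a context that does not hold registry_lock. */
void demo_fs_put(struct demo_fs *fs)
{
    demo_lock_acquire(&registry_lock);
    demo_fs_put_locked(fs);
    demo_lock_release(&registry_lock);
}
```

Keeping the locked variant separate lets code that already holds the lock drop a reference without re-acquiring it, while the assertion documents and checks the precondition.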
David Howells
665db14d07 netfs, cifs: Fix mtime/ctime update for mmapped writes
The cifs flag CIFS_INO_MODIFIED_ATTR, which indicates that the mtime and
ctime need to be written back on close, got taken over by netfs as
NETFS_ICTX_MODIFIED_ATTR to avoid the need to call a function pointer to
set it.

The flag gets set correctly on buffered writes, but doesn't get set by
netfs_page_mkwrite(), leading to occasional failures in generic/080 and
generic/215.

Fix this by setting the flag in netfs_page_mkwrite().

Fixes: 73425800ac ("netfs, cifs: Move CIFS_INO_MODIFIED_ATTR to netfs_inode")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202409161629.98887b2-oliver.sang@intel.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24 21:57:00 -05:00
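The gap described in the commit above (flag set on buffered writes but not on the mmap write-fault path) can be modelled in a few lines. This is a simplified sketch, not netfs code: `DEMO_MODIFIED_ATTR` stands in for NETFS_ICTX_MODIFIED_ATTR, and `demo_inode`, the `with_fix` switch and the `demo_*` functions are hypothetical.

```c
#include <stdbool.h>

#define DEMO_MODIFIED_ATTR (1u << 0)   /* stand-in for NETFS_ICTX_MODIFIED_ATTR */

struct demo_inode { unsigned int flags; };

/* Buffered write path: has always marked the timestamps as needing writeback. */
void demo_buffered_write(struct demo_inode *ino)
{
    ino->flags |= DEMO_MODIFIED_ATTR;
}

/* Write-fault path for mmapped pages: only marks the inode when the fix is applied. */
void demo_page_mkwrite(struct demo_inode *ino, bool with_fix)
{
    if (with_fix)
        ino->flags |= DEMO_MODIFIED_ATTR;
}

/* On close: report whether mtime/ctime would be written back, then clear the flag. */
bool demo_close_writes_times(struct demo_inode *ino)
{
    bool dirty = (ino->flags & DEMO_MODIFIED_ATTR) != 0;

    ino->flags &= ~DEMO_MODIFIED_ATTR;
    return dirty;
}
```

With `with_fix` false, a file modified only through a mapping closes without its times being written back, which is the kind of intermittent failure the commit attributes to generic/080 and generic/215.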
Steve French
387676fabf cifs: update internal version number
To 2.51

Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24 21:54:06 -05:00
Paulo Alcantara
6c7f1b994a smb: client: print failed session logoffs with FYI
Do not flood dmesg with failed session logoffs as kerberos tickets
getting expired or passwords being rotated is a very common scenario.

Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24 21:54:03 -05:00
David Howells
307f77e7f5 cifs: Fix reversion of the iter in cifs_readv_receive().
cifs_read_iter_from_socket() copies the iterator that's passed in for the
socket to modify as and if it will, and then advances the original iterator
by the amount sent.  However, both callers revert the advancement (although
receive_encrypted_read() zeros beyond the iterator first).  The problem is,
though, that cifs_readv_receive() reverts by the original length, not the
amount transmitted which can cause an oops in iov_iter_revert().

Fix this by:

 (1) Remove the iov_iter_advance() from cifs_read_iter_from_socket().

 (2) Remove the iov_iter_revert() from both callers.  This fixes the bug in
     cifs_readv_receive().

 (3) In receive_encrypted_read(), if we didn't get back as much data as the
     buffer will hold, copy the iterator, advance the copy and use the copy
     to drive iov_iter_zero().

As a bonus, this gets rid of some unnecessary work.

This was triggered by generic/074 with the "-o sign" mount option.

Fixes: 3ee1a1fc39 ("cifs: Cut over to using netfslib")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24 21:53:08 -05:00
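The mismatch between "advance by the amount received" and "revert by the original length" in the commit above can be illustrated with a toy iterator. This is a sketch under assumptions: `toy_iter` tracks only byte counts, the revert helper returns an error where the commit reports an oops, and `old_read`/`new_read` are hypothetical names that do not appear in the cifs code.

```c
#include <stddef.h>

/* Toy iterator: only tracks bytes left and bytes consumed so far. */
struct toy_iter {
    size_t remaining;
    size_t consumed;
};

void toy_iter_advance(struct toy_iter *it, size_t n)
{
    if (n > it->remaining)
        n = it->remaining;
    it->remaining -= n;
    it->consumed += n;
}

/* Returns 0 on success, -1 if asked to rewind past the start of the iterator. */
int toy_iter_revert(struct toy_iter *it, size_t n)
{
    if (n > it->consumed)
        return -1;
    it->consumed -= n;
    it->remaining += n;
    return 0;
}

/* Old scheme: the helper advances the caller's iterator by what arrived,
 * then the caller reverts by the length it originally asked for. */
int old_read(struct toy_iter *it, size_t want, size_t arrived)
{
    toy_iter_advance(it, arrived);
    return toy_iter_revert(it, want);   /* fails whenever arrived < want */
}

/* Fixed scheme: the helper works on a private copy, so the caller's
 * iterator never moves and there is nothing to revert. */
size_t new_read(const struct toy_iter *it, size_t arrived)
{
    struct toy_iter copy = *it;

    toy_iter_advance(&copy, arrived);
    return copy.consumed - it->consumed;
}
```

Asking for 100 bytes and receiving 40 makes `old_read` attempt a 100-byte rewind of an iterator that only moved 40 bytes, while `new_read` leaves the caller's iterator untouched.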
Steve French
2f3017e7cc smb3: fix incorrect mode displayed for read-only files
Commands like "chmod 0444" mark a file readonly via the attribute flag
(when mapping of mode bits into the ACL are not set, or POSIX extensions
are not negotiated), but they were not reported correctly for stat of
directories (they were reported ok for files and for "ls").  See example
below:

    root:~# ls /mnt2 -l
    total 12
    drwxr-xr-x 2 root root         0 Sep 21 18:03 normaldir
    -rwxr-xr-x 1 root root         0 Sep 21 23:24 normalfile
    dr-xr-xr-x 2 root root         0 Sep 21 17:55 readonly-dir
    -r-xr-xr-x 1 root root 209716224 Sep 21 18:15 readonly-file
    root:~# stat -c %a /mnt2/readonly-dir
    755
    root:~# stat -c %a /mnt2/readonly-file
    555

This fixes the stat of directories when ATTR_READONLY is set
(in cases where the mode can not be obtained other ways).

    root:~# stat -c %a /mnt2/readonly-dir
    555

Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24 21:51:48 -05:00
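The stat output in the commit above (755 before the fix, 555 after) is consistent with clearing the write bits of the default mode when the read-only attribute is set. The sketch below assumes exactly that mapping; `demo_mode_with_readonly` is a hypothetical helper, not a function from the SMB client.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>   /* S_IWUSR, S_IWGRP, S_IWOTH */

/* Derive the reported mode from a default mode and the read-only attribute.
 * Assumption: read-only simply removes write permission for user, group
 * and other, for files and directories alike. */
mode_t demo_mode_with_readonly(mode_t default_mode, bool attr_readonly)
{
    if (attr_readonly)
        default_mode &= ~(mode_t)(S_IWUSR | S_IWGRP | S_IWOTH);
    return default_mode;
}
```

Applied to the example in the commit message, a directory with default mode 0755 and the read-only attribute is reported as 0555, matching what "ls" already showed.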
Paulo Alcantara
663f295e35 smb: client: fix parsing of device numbers
Report correct major and minor numbers from special files created with
NFS reparse points.

Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24 21:51:48 -05:00
Paulo Alcantara
a9de67336a smb: client: set correct device number on nfs reparse points
Fix major and minor numbers set on special files created with NFS
reparse points.

Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24 21:51:48 -05:00
Paulo Alcantara
4e3ba580f5 smb: client: propagate error from cifs_construct_tcon()
Propagate error from cifs_construct_tcon() in cifs_sb_tlink() instead of
always returning -EACCES.

Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24 21:51:48 -05:00
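The difference between always returning -EACCES and propagating the underlying error, as in the commit above, can be sketched as follows. All names here (`demo_tcon`, `demo_construct_tcon`, `demo_lookup_old`, `demo_lookup_new`) are hypothetical, and the constructor reports failure through an out-parameter only to keep the example short.

```c
#include <errno.h>
#include <stddef.h>

struct demo_tcon { int id; };

/* Hypothetical constructor: on failure returns NULL and stores a negative
 * errno value in *err; simulated_err lets the caller inject a failure. */
struct demo_tcon *demo_construct_tcon(int simulated_err, int *err)
{
    static struct demo_tcon tcon = { 1 };

    if (simulated_err) {
        *err = simulated_err;
        return NULL;
    }
    *err = 0;
    return &tcon;
}

/* Old behaviour: every failure is reported as -EACCES. */
int demo_lookup_old(int simulated_err)
{
    int err;

    if (!demo_construct_tcon(simulated_err, &err))
        return -EACCES;   /* the real cause is lost here */
    return 0;
}

/* New behaviour: the constructor's error reaches the caller unchanged. */
int demo_lookup_new(int simulated_err)
{
    int err;

    if (!demo_construct_tcon(simulated_err, &err))
        return err;
    return 0;
}
```

With an injected -ENOMEM, the old variant reports a permission problem while the new one reports the actual failure.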
Paulo Alcantara
0826b134c0 smb: client: fix DFS failover in multiuser mounts
For sessions and tcons created on behalf of new users accessing a
multiuser mount, matching their sessions in tcon_super_cb() with
master tcon will always lead to false as every new user will have its
own session and tcon.

All multiuser sessions, however, will inherit ->dfs_root_ses from
master tcon, so match it instead.

Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24 21:51:48 -05:00
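Why matching on the session fails for multiuser mounts, while matching on the inherited ->dfs_root_ses works, can be shown with two pointer comparisons. This is a minimal sketch: `demo_ses` keeps only the one field named in the commit message, and both `demo_match_*` helpers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_ses {
    struct demo_ses *dfs_root_ses;   /* DFS root session this session inherited */
};

/* Old comparison: each user has its own session, so this is never true
 * for a session created on behalf of a new user. */
bool demo_match_by_session(const struct demo_ses *master,
                           const struct demo_ses *user)
{
    return master == user;
}

/* New comparison: all multiuser sessions share the master's DFS root session. */
bool demo_match_by_root(const struct demo_ses *master,
                        const struct demo_ses *user)
{
    return master->dfs_root_ses == user->dfs_root_ses;
}
```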
David Howells
85633c00ad cifs: Make the write_{enter,done,err} tracepoints display netfs info
Make the write RPC tracepoints use the same trace macro complexes as the
read tracepoints and display the netfs request and subrequest IDs where
available (see commit 519be98971 "cifs: Add a tracepoint to track credits
involved in R/W requests").

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <stfrench@microsoft.com>
cc: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24 21:51:48 -05:00
Paulo Alcantara
4f42a8b54b smb: client: fix DFS interlink failover
The DFS interlinks point to different DFS namespaces so make sure to
use the correct DFS root server to chase any DFS links under it by
storing the SMB session in dfs_ref_walk structure and then using it on
every referral walk.

Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24 21:51:48 -05:00
Paulo Alcantara
9190cc0c97 smb: client: improve purging of cached referrals
Purge cached referrals that have a single target when reaching maximum
of cache size as the client won't need them to failover.  Otherwise
remove oldest cache entry.

Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24 21:51:48 -05:00
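The eviction policy in the commit above (drop single-target referrals first, otherwise drop the oldest entry) might look roughly like this. It is one reading of the commit message, not the client's implementation: `demo_ref`, its fields, and the choice to drop every single-target entry in one pass are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_ref {
    bool in_use;
    int num_targets;         /* number of targets in the cached referral */
    unsigned long created;   /* smaller value means older entry */
};

/* Called when the cache is full. Returns the number of entries freed. */
size_t demo_purge(struct demo_ref *cache, size_t n)
{
    size_t freed = 0, oldest = n;

    for (size_t i = 0; i < n; i++) {
        if (!cache[i].in_use)
            continue;
        if (cache[i].num_targets == 1) {
            /* A single target offers nothing to fail over to. */
            cache[i].in_use = false;
            freed++;
        } else if (oldest == n || cache[i].created < cache[oldest].created) {
            oldest = i;
        }
    }
    /* No single-target entries found: fall back to evicting the oldest. */
    if (freed == 0 && oldest != n) {
        cache[oldest].in_use = false;
        freed = 1;
    }
    return freed;
}
```

Preferring single-target entries keeps the referrals that matter for failover in the cache for as long as possible.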
Paulo Alcantara
242d23efc9 smb: client: avoid unnecessary reconnects when refreshing referrals
Do not mark tcons for reconnect when current connection matches any of
the targets returned by new referral even when there is no cached
entry.

Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24 21:51:47 -05:00
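The check described in the commit above amounts to "reconnect only if the current connection is not among the new referral's targets". A minimal sketch, assuming targets can be compared as plain strings (the real client may compare resolved addresses instead); `demo_needs_reconnect` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns false when the current server appears in the refreshed target
 * list, so the existing connection can be kept. */
bool demo_needs_reconnect(const char *current,
                          const char *const *targets, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(current, targets[i]) == 0)
            return false;
    }
    return true;
}
```

Note that the decision depends only on the freshly returned targets, so it also works when there is no cached entry to compare against.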
Linus Torvalds
684a64bf32 NFS Client Updates for Linux 6.12
New Features:
   * Add a 'noalignwrite' mount option for lock-less 'lost writes' prevention
   * Add support for the LOCALIO protocol extension
 
 Bugfixes:
   * Fix memory leak in error path of nfs4_do_reclaim()
   * Simplify and guarantee lock owner uniqueness
   * Fix -Wformat-truncation warning
   * Fix folio refcounts by using folio_attach_private()
   * Fix failing the mount system call when the server is down
   * Fix detection of "Proxying of Times" server support
 
 Cleanups:
   * Annotate struct nfs_cache_array with __counted_by()
   * Remove unnecessary NULL checks before kfree()
   * Convert RPC_TASK_* constants to an enum
   * Remove obsolete or misleading comments and declarations

Merge tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Add a 'noalignwrite' mount option for lock-less 'lost writes' prevention
   - Add support for the LOCALIO protocol extension

  Bugfixes:
   - Fix memory leak in error path of nfs4_do_reclaim()
   - Simplify and guarantee lock owner uniqueness
   - Fix -Wformat-truncation warning
   - Fix folio refcounts by using folio_attach_private()
   - Fix failing the mount system call when the server is down
   - Fix detection of "Proxying of Times" server support

  Cleanups:
   - Annotate struct nfs_cache_array with __counted_by()
   - Remove unnecessary NULL checks before kfree()
   - Convert RPC_TASK_* constants to an enum
   - Remove obsolete or misleading comments and declarations"

* tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (41 commits)
  nfs: Fix `make htmldocs` warnings in the localio documentation
  nfs: add "NFS Client and Server Interlock" section to localio.rst
  nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst
  nfs: add Documentation/filesystems/nfs/localio.rst
  nfs: implement client support for NFS_LOCALIO_PROGRAM
  nfs/localio: use dedicated workqueues for filesystem read and write
  pnfs/flexfiles: enable localio support
  nfs: enable localio for non-pNFS IO
  nfs: add LOCALIO support
  nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commit
  nfsd: implement server support for NFS_LOCALIO_PROGRAM
  nfsd: add LOCALIO support
  nfs_common: prepare for the NFS client to use nfsd_file for LOCALIO
  nfs_common: add NFS LOCALIO auxiliary protocol enablement
  SUNRPC: replace program list with program array
  SUNRPC: add svcauth_map_clnt_to_svc_cred_local
  SUNRPC: remove call_allocate() BUG_ONs
  nfsd: add nfsd_serv_try_get and nfsd_serv_put
  nfsd: add nfsd_file_acquire_local()
  nfsd: factor out __fh_verify to allow NULL rqstp to be passed
  ...
2024-09-24 15:44:18 -07:00
Linus Torvalds
f7fccaa772 fuse update for 6.12

Merge tag 'fuse-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

 - Add support for idmapped fuse mounts (Alexander Mikhalitsyn)

 - Add optimization when checking for writeback (yangyun)

 - Add tracepoints (Josef Bacik)

 - Clean up writeback code (Joanne Koong)

 - Clean up request queuing (me)

 - Misc fixes

* tag 'fuse-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (32 commits)
  fuse: use exclusive lock when FUSE_I_CACHE_IO_MODE is set
  fuse: clear FR_PENDING if abort is detected when sending request
  fs/fuse: convert to use invalid_mnt_idmap
  fs/mnt_idmapping: introduce an invalid_mnt_idmap
  fs/fuse: introduce and use fuse_simple_idmap_request() helper
  fs/fuse: fix null-ptr-deref when checking SB_I_NOIDMAP flag
  fuse: allow O_PATH fd for FUSE_DEV_IOC_BACKING_OPEN
  virtio_fs: allow idmapped mounts
  fuse: allow idmapped mounts
  fuse: warn if fuse_access is called when idmapped mounts are allowed
  fuse: handle idmappings properly in ->write_iter()
  fuse: support idmapped ->rename op
  fuse: support idmapped ->set_acl
  fuse: drop idmap argument from __fuse_get_acl
  fuse: support idmapped ->setattr op
  fuse: support idmapped ->permission inode op
  fuse: support idmapped getattr inode op
  fuse: support idmap for mkdir/mknod/symlink/create/tmpfile
  fuse: support idmapped FUSE_EXT_GROUPS
  fuse: add an idmap argument to fuse_simple_request
  ...
2024-09-24 15:29:42 -07:00
Linus Torvalds
4165cee7ec Description for this pull request:
- Clean-up unnecessary codes as ->valid_size is supported.
 - buffered-IO fallback is no longer needed when using direct-IO.
 - Move ->valid_size extension from mmap to ->page_mkwrite.
   This improves the overhead caused by unnecessary zero-out during mmap.
 - Fix memleaks from exfat_load_bitmap() and exfat_create_upcase_table().
 - Add sops->shutdown and ioctl.
 - Add Yuezhang Mo as a reviewer.

Merge tag 'exfat-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat updates from Namjae Jeon:

 - Clean-up unnecessary codes as ->valid_size is supported

 - buffered-IO fallback is no longer needed when using direct-IO

 - Move ->valid_size extension from mmap to ->page_mkwrite. This
   improves the overhead caused by unnecessary zero-out during mmap.

 - Fix memleaks from exfat_load_bitmap() and exfat_create_upcase_table()

 - Add sops->shutdown and ioctl

 - Add Yuezhang Mo as a reviewer

* tag 'exfat-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  MAINTAINERS: exfat: add myself as reviewer
  exfat: resolve memory leak from exfat_create_upcase_table()
  exfat: move extend valid_size into ->page_mkwrite()
  exfat: fix memory leak in exfat_load_bitmap()
  exfat: Implement sops->shutdown and ioctl
  exfat: do not fallback to buffered write
  exfat: drop ->i_size_ondisk
2024-09-24 15:26:04 -07:00
Linus Torvalds
79952bdcbc f2fs-6.12-rc1
In this series, the main changes include 1) converting major IO paths to use
 folio, and 2) adding various knobs to control GC more flexibly for Zoned
 devices. In addition, there are several patches to address corner cases of
 atomic file operations and better support for file pinning on zoned device.
 
 Enhancement:
  - add knobs to tune foreground/background GCs for Zoned devices
  - convert IO paths to use folio
  - reduce expensive checkpoint trigger frequency
  - allow F2FS_IPU_NOCACHE for pinned file
  - forcibly migrate to secure space for zoned device file pinning
  - get rid of buffer_head use
  - add write priority option based on zone UFS
  - get rid of online repair on corrupted directory
 
 Bug fix:
  - fix to don't panic system for no free segment fault injection
  - fix to don't set SB_RDONLY in f2fs_handle_critical_error()
  - avoid unused block when dio write in LFS mode
  - compress: don't redirty sparse cluster during {,de}compress
  - check discard support for conventional zones
  - atomic: prevent atomic file from being dirtied before commit
  - atomic: fix to check atomic_file in f2fs ioctl interfaces
  - atomic: fix to forbid dio in atomic_file
  - atomic: fix to truncate pagecache before on-disk metadata truncation
  - atomic: create COW inode from parent dentry
  - atomic: fix to avoid racing w/ GC
  - atomic: require FMODE_WRITE for atomic write ioctls
  - fix to wait page writeback before setting gcing flag
  - fix to avoid racing in between read and OPU dio write, dio completion
  - fix several potential integer overflows in file offsets and dir_block_index
  - fix to avoid use-after-free in f2fs_stop_gc_thread()
 
 As usual, there are several code clean-ups and refactorings.

Merge tag 'f2fs-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "The main changes include converting major IO paths to use folio, and
  adding various knobs to control GC more flexibly for Zoned devices.

  In addition, there are several patches to address corner cases of
  atomic file operations and better support for file pinning on zoned
  device.

  Enhancement:
   - add knobs to tune foreground/background GCs for Zoned devices
   - convert IO paths to use folio
   - reduce expensive checkpoint trigger frequency
   - allow F2FS_IPU_NOCACHE for pinned file
   - forcibly migrate to secure space for zoned device file pinning
   - get rid of buffer_head use
   - add write priority option based on zone UFS
   - get rid of online repair on corrupted directory

  Bug fixes:
   - fix to don't panic system for no free segment fault injection
   - fix to don't set SB_RDONLY in f2fs_handle_critical_error()
   - avoid unused block when dio write in LFS mode
   - compress: don't redirty sparse cluster during {,de}compress
   - check discard support for conventional zones
   - atomic: prevent atomic file from being dirtied before commit
   - atomic: fix to check atomic_file in f2fs ioctl interfaces
   - atomic: fix to forbid dio in atomic_file
   - atomic: fix to truncate pagecache before on-disk metadata truncation
   - atomic: create COW inode from parent dentry
   - atomic: fix to avoid racing w/ GC
   - atomic: require FMODE_WRITE for atomic write ioctls
   - fix to wait page writeback before setting gcing flag
   - fix to avoid racing in between read and OPU dio write, dio completion
   - fix several potential integer overflows in file offsets and dir_block_index
   - fix to avoid use-after-free in f2fs_stop_gc_thread()

  As usual, there are several code clean-ups and refactorings"

* tag 'f2fs-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (60 commits)
  f2fs: allow F2FS_IPU_NOCACHE for pinned file
  f2fs: forcibly migrate to secure space for zoned device file pinning
  f2fs: remove unused parameters
  f2fs: fix to don't panic system for no free segment fault injection
  f2fs: fix to don't set SB_RDONLY in f2fs_handle_critical_error()
  f2fs: add valid block ratio not to do excessive GC for one time GC
  f2fs: create gc_no_zoned_gc_percent and gc_boost_zoned_gc_percent
  f2fs: do FG_GC when GC boosting is required for zoned devices
  f2fs: increase BG GC migration window granularity when boosted for zoned devices
  f2fs: add reserved_segments sysfs node
  f2fs: introduce migration_window_granularity
  f2fs: make BG GC more aggressive for zoned devices
  f2fs: avoid unused block when dio write in LFS mode
  f2fs: fix to check atomic_file in f2fs ioctl interfaces
  f2fs: get rid of online repaire on corrupted directory
  f2fs: prevent atomic file from being dirtied before commit
  f2fs: get rid of page->index
  f2fs: convert read_node_page() to use folio
  f2fs: convert __write_node_page() to use folio
  f2fs: convert f2fs_write_data_page() to use folio
  ...
2024-09-24 15:12:38 -07:00
Linus Torvalds
172d513936 Summary
* Bug fix: Avoid evaluating non-mount ctl_tables as a sysctl_mount_point by
   removing the unlikely (but possible) chance that the permanently empty
   ctl_table array shares its address with another ctl_table.
 * Update Joel Granados' contact info in MAINTAINERS.
 
 Testing
 
 * Bug fix merged to linux-next after 6.11-rc5

Merge tag 'sysctl-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl

Pull sysctl update from Joel Granados:

 - Avoid evaluating non-mount ctl_tables as a sysctl_mount_point by
   removing the unlikely (but possible) chance that the permanently
   empty ctl_table array shares its address with another ctl_table

 - Update Joel Granados' contact info in MAINTAINERS

* tag 'sysctl-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
  MAINTAINERS: update email for Joel Granados
  sysctl: avoid spurious permanent empty tables
2024-09-24 11:08:40 -07:00
yangyun
2f3d8ff457 fuse: use exclusive lock when FUSE_I_CACHE_IO_MODE is set
This may be a typo. The comment has said shared locks are
not allowed when this bit is set. If using shared lock, the
wait in `fuse_file_cached_io_open` may be forever.

Fixes: 205c1d8026 ("fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP")
CC: stable@vger.kernel.org # v6.9
Signed-off-by: yangyun <yangyun50@huawei.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-09-24 13:21:33 +02:00
Miklos Szeredi
fcd2d9e1fd fuse: clear FR_PENDING if abort is detected when sending request
The (!fiq->connected) check was moved into the queuing method resulting in
the following:

Fixes: 5de8acb41c ("fuse: cleanup request queuing towards virtiofs")
Reported-by: Lai, Yi <yi1.lai@linux.intel.com>
Closes: https://lore.kernel.org/all/ZvFEAM6JfrBKsOU0@ly-workstation/
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-09-24 10:56:09 +02:00
Mike Snitzer
56bcd0f07f nfs: implement client support for NFS_LOCALIO_PROGRAM
The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL"
RPC method that allows the Linux NFS client to verify the local Linux
NFS server can see the nonce (single-use UUID) the client generated and
made available in nfs_common for subsequent lookup and verification
by the NFS server.  If matched, the NFS server populates members in the
nfs_uuid_t struct.  The NFS client then transfers these nfs_uuid_t
struct member pointers to the nfs_client struct and cleans up the
nfs_uuid_t struct.  See: fs/nfs/localio.c:nfs_local_probe()

This protocol isn't part of an IETF standard, nor does it need to be
considering it is Linux-to-Linux auxiliary RPC protocol that amounts
to an implementation detail.

Localio is only supported when UNIX-style authentication (AUTH_UNIX, aka
AUTH_SYS) is used (enforced by fs/nfs/localio.c:nfs_local_probe()).

The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of
the fixed UUID_SIZE (16 bytes).  The fixed size opaque encode and decode
XDR methods are used instead of the less efficient variable sized
methods.

Having a nonce (single-use uuid) is better than using the same uuid
for the life of the server, and sending it proactively by client
rather than reactively by the server is also safer.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Co-developed-by: NeilBrown <neilb@suse.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
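The UUID_IS_LOCAL handshake described in the commit above can be approximated by a shared table that the client writes to and the server searches. This is a loose userspace model: `demo_table` stands in for the state kept in nfs_common, `DEMO_UUID_SIZE` mirrors the 16-byte UUID_SIZE from the commit message, and consuming the nonce on the first match is an assumption drawn from the "single-use" wording. All `demo_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_UUID_SIZE 16   /* fixed-size opaque value, as in the commit message */
#define DEMO_SLOTS 8

struct demo_slot {
    bool used;
    unsigned char uuid[DEMO_UUID_SIZE];
};

/* Stand-in for the table shared between client and server on one host. */
static struct demo_slot demo_table[DEMO_SLOTS];

/* Client side: publish a freshly generated nonce before sending the RPC. */
bool demo_client_publish(const unsigned char uuid[DEMO_UUID_SIZE])
{
    for (size_t i = 0; i < DEMO_SLOTS; i++) {
        if (!demo_table[i].used) {
            memcpy(demo_table[i].uuid, uuid, DEMO_UUID_SIZE);
            demo_table[i].used = true;
            return true;
        }
    }
    return false;   /* table full */
}

/* Server side: only a server that shares the table with the client
 * can find the nonce. A match consumes it, so it cannot be replayed. */
bool demo_server_uuid_is_local(const unsigned char uuid[DEMO_UUID_SIZE])
{
    for (size_t i = 0; i < DEMO_SLOTS; i++) {
        if (demo_table[i].used &&
            memcmp(demo_table[i].uuid, uuid, DEMO_UUID_SIZE) == 0) {
            demo_table[i].used = false;
            return true;
        }
    }
    return false;
}
```

A remote server has no access to the table and therefore cannot answer the probe positively, which is the property the nonce is meant to provide.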
Trond Myklebust
b9f5dd57f4 nfs/localio: use dedicated workqueues for filesystem read and write
For localio access, don't call filesystem read() and write() routines
directly.  This solves two problems:

1) localio writes need to use a normal (non-memreclaim) unbound
   workqueue.  This avoids imposing new requirements on how underlying
   filesystems process frontend IO, which would cause a large amount
   of work to update all filesystems.  Without this change, when XFS
   starts getting low on space, XFS flushes work on a non-memreclaim
   work queue, which causes a priority inversion problem:

00573 workqueue: WQ_MEM_RECLAIM writeback:wb_workfn is flushing !WQ_MEM_RECLAIM xfs-sync/vdc:xfs_flush_inodes_worker
00573 WARNING: CPU: 6 PID: 8525 at kernel/workqueue.c:3706 check_flush_dependency+0x2a4/0x328
00573 Modules linked in:
00573 CPU: 6 PID: 8525 Comm: kworker/u71:5 Not tainted 6.10.0-rc3-ktest-00032-g2b0a133403ab #18502
00573 Hardware name: linux,dummy-virt (DT)
00573 Workqueue: writeback wb_workfn (flush-0:33)
00573 pstate: 400010c5 (nZcv daIF -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00573 pc : check_flush_dependency+0x2a4/0x328
00573 lr : check_flush_dependency+0x2a4/0x328
00573 sp : ffff0000c5f06bb0
00573 x29: ffff0000c5f06bb0 x28: ffff0000c998a908 x27: 1fffe00019331521
00573 x26: ffff0000d0620900 x25: ffff0000c5f06ca0 x24: ffff8000828848c0
00573 x23: 1fffe00018be0d8e x22: ffff0000c1210000 x21: ffff0000c75fde00
00573 x20: ffff800080bfd258 x19: ffff0000cad63400 x18: ffff0000cd3a4810
00573 x17: 0000000000000000 x16: 0000000000000000 x15: ffff800080508d98
00573 x14: 0000000000000000 x13: 204d49414c434552 x12: 1fffe0001b6eeab2
00573 x11: ffff60001b6eeab2 x10: dfff800000000000 x9 : ffff60001b6eeab3
00573 x8 : 0000000000000001 x7 : 00009fffe491154e x6 : ffff0000db775593
00573 x5 : ffff0000db775590 x4 : ffff0000db775590 x3 : 0000000000000000
00573 x2 : 0000000000000027 x1 : ffff600018be0d62 x0 : dfff800000000000
00573 Call trace:
00573  check_flush_dependency+0x2a4/0x328
00573  __flush_work+0x184/0x5c8
00573  flush_work+0x18/0x28
00573  xfs_flush_inodes+0x68/0x88
00573  xfs_file_buffered_write+0x128/0x6f0
00573  xfs_file_write_iter+0x358/0x448
00573  nfs_local_doio+0x854/0x1568
00573  nfs_initiate_pgio+0x214/0x418
00573  nfs_generic_pg_pgios+0x304/0x480
00573  nfs_pageio_doio+0xe8/0x240
00573  nfs_pageio_complete+0x160/0x480
00573  nfs_writepages+0x300/0x4f0
00573  do_writepages+0x12c/0x4a0
00573  __writeback_single_inode+0xd4/0xa68
00573  writeback_sb_inodes+0x470/0xcb0
00573  __writeback_inodes_wb+0xb0/0x1d0
00573  wb_writeback+0x594/0x808
00573  wb_workfn+0x5e8/0x9e0
00573  process_scheduled_works+0x53c/0xd90
00573  worker_thread+0x370/0x8c8
00573  kthread+0x258/0x2e8
00573  ret_from_fork+0x10/0x20

2) Some filesystem writeback routines can end up taking up a lot of
   stack space (particularly XFS).  Instead of risking running over
   due to the extra overhead from the NFS stack, we should just call
   these routines from a workqueue job.  Since we need to do this to
   address 1) above we're able to avoid possibly blowing the stack
   "for free".

Use of dedicated workqueues improves performance over using the
system_unbound_wq.

Also, the creds used to open the file are used to override_creds() in
both nfs_local_call_read() and nfs_local_call_write() -- otherwise the
workqueue could have elevated capabilities (which the caller may not have).

Lastly, care is taken to set PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO in
nfs_do_local_write() to avoid writeback deadlocks.

The PF_LOCAL_THROTTLE flag prevents deadlocks in balance_dirty_pages()
by causing writes to only be throttled against other writes to the
same bdi (it keeps the throttling local).  Normally all writes to
bdi(s) are throttled equally (after throughput factors are allowed
for).

The PF_MEMALLOC_NOIO flag prevents the lower filesystem IO from
causing memory reclaim to re-enter filesystems or IO devices and so
prevents deadlocks from occurring where IO that cleans pages is
waiting on IO to complete.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Co-developed-by: NeilBrown <neilb@suse.de>
Signed-off-by: NeilBrown <neilb@suse.de> # eliminated wait_for_completion
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
Trond Myklebust
d488b9d01f pnfs/flexfiles: enable localio support
If the DS is local to this client use localio to write the data.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
Trond Myklebust
fa88a7d6ae nfs: enable localio for non-pNFS IO
Try a local open of the file being written to, and if it succeeds,
then use localio to issue IO.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
Weston Andros Adamson
70ba381e1a nfs: add LOCALIO support
Add client support for bypassing NFS for localhost reads, writes, and
commits. This is only useful when the client and the server are
running on the same host.

nfs_local_probe() is stubbed out; later commits will enable client and
server handshake via a Linux-only LOCALIO auxiliary RPC protocol.

This has dynamic binding with the nfsd module (via nfs_localio module
which is part of nfs_common). LOCALIO will only work if nfsd is
already loaded.

The "localio_enabled" nfs kernel module parameter can be used to
disable and enable the ability to use LOCALIO support.

CONFIG_NFS_LOCALIO enables NFS client support for LOCALIO.

Lastly, LOCALIO uses an nfsd_file to initiate all IO. To make proper
use of nfsd_file (and nfsd's filecache) its lifetime (duration before
nfsd_file_put is called) must extend until after commit, read and
write operations. So rather than immediately drop the nfsd_file
reference in nfs_local_open_fh(), that doesn't happen until
nfs_local_pgio_release() for read/write and not until
nfs_local_release_commit_data() for commit. The same applies to the
reference held on nfsd's nn->nfsd_serv. Both objects' lifetimes and
associated references are managed through calls to
nfs_to->nfsd_file_put_local().

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de> # nfs_open_local_fh
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
Mike Snitzer
df24c483e2 nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commit
The nfsd_file will be passed, in future commits, by callers
that enable LOCALIO support (for both regular NFS and pNFS IO).

[Derived from patch authored by Weston Andros Adamson, but switched
 from passing struct file to struct nfsd_file]

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
Mike Snitzer
946af9b3a0 nfsd: implement server support for NFS_LOCALIO_PROGRAM
The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL"
RPC method that allows the Linux NFS client to verify the local Linux
NFS server can see the nonce (single-use UUID) the client generated and
made available in nfs_common.  The server expects this protocol to use
the same transport as NFS and NFSACL for its RPCs.  This protocol
isn't part of an IETF standard, nor does it need to be considering it
is a Linux-to-Linux auxiliary RPC protocol that amounts to an
implementation detail.

The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of
the fixed UUID_SIZE (16 bytes).  The fixed size opaque encode and decode
XDR methods are used instead of the less efficient variable sized
methods.

The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned
by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ):
Linux Kernel Organization       400122  nfslocalio

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
[neilb: factored out and simplified single localio protocol]
Co-developed-by: NeilBrown <neilb@suse.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
Weston Andros Adamson
fa4983862e nfsd: add LOCALIO support
Add server support for bypassing NFS for localhost reads, writes, and
commits. This is only useful when both the client and server are
running on the same host.

If nfsd_open_local_fh() fails then the NFS client will both retry and
fall back to normal network-based read, write and commit operations if
localio is no longer supported.

Care is taken to ensure the same NFS security mechanisms are used
(authentication, etc) regardless of whether localio or regular NFS
access is used.  The auth_domain established as part of the traditional
NFS client access to the NFS server is also used for localio.  Store
auth_domain for localio in nfsd_uuid_t and transfer it to the client
if it is local to the server.

Relative to containers, localio gives the client access to the network
namespace the server has.  This is required to allow the client to
access the server's per-namespace nfsd_net struct.

This commit also introduces the use of NFSD's percpu_ref to interlock
nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is
not destroyed while in use by nfsd_open_local_fh and other LOCALIO
client code.

CONFIG_NFS_LOCALIO enables NFS server support for LOCALIO.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Co-developed-by: NeilBrown <neilb@suse.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
Mike Snitzer
a61e147e6b nfs_common: prepare for the NFS client to use nfsd_file for LOCALIO
The next commit will introduce nfsd_open_local_fh() which returns an
nfsd_file structure.  This commit exposes LOCALIO's required NFSD
symbols to the NFS client:

- Make nfsd_open_local_fh() symbol and other required NFSD symbols
  available to NFS in a global 'nfs_to' nfsd_localio_operations
  struct (global access suggested by Trond, nfsd_localio_operations
  suggested by NeilBrown).  The next commit will also introduce
  nfsd_localio_ops_init() that init_nfsd() will call to initialize
  'nfs_to'.

- Introduce nfsd_file_file() that provides access to nfsd_file's
  backing file.  Keeps nfsd_file structure opaque to NFS client (as
  suggested by Jeff Layton).

- Introduce nfsd_file_put_local() that will put the reference to the
  nfsd_file's associated nn->nfsd_serv and then put the reference to
  the nfsd_file (as suggested by NeilBrown).

Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> # nfs_to
Suggested-by: NeilBrown <neilb@suse.de> # nfsd_localio_operations
Suggested-by: Jeff Layton <jlayton@kernel.org> # nfsd_file_file
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
Mike Snitzer
2a33a85be4 nfs_common: add NFS LOCALIO auxiliary protocol enablement
fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS
client to generate a nonce (single-use UUID) and associated nfs_uuid_t
struct, and register it with nfs_common for subsequent lookup and
verification by the NFS server.  If matched, the NFS server populates
members in the nfs_uuid_t struct.
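
The register, look up, and populate flow can be sketched in userspace
C.  This is a simplified illustration, not the nfs_common code: struct
sketch_uuid, the pending[] table, and both sketch_* functions are
hypothetical, and dropping the entry on the first match is an assumption
made here to show the single-use property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UUID_SIZE 16
#define SKETCH_MAX_PENDING 8

/* Hypothetical stand-in for nfs_uuid_t; the real struct differs. */
struct sketch_uuid {
	unsigned char uuid[UUID_SIZE];
	const void *net;	/* filled in by the server on a match */
};

/* Stand-in for the shared list that both client and server can see. */
static struct sketch_uuid *pending[SKETCH_MAX_PENDING];

/* Client side: publish the nonce so the server can look it up. */
static bool sketch_client_register(struct sketch_uuid *u)
{
	assert(u != NULL);
	for (size_t i = 0; i < SKETCH_MAX_PENDING; i++) {
		if (!pending[i]) {
			u->net = NULL;
			pending[i] = u;
			return true;
		}
	}
	return false;	/* table full */
}

/*
 * Server side: handle a UUID_IS_LOCAL call.  If the nonce received over
 * RPC is in the shared table, the client shares this kernel, so the
 * server populates the entry and removes it from the table.
 */
static bool sketch_server_is_local(const unsigned char *uuid,
				   const void *server_net)
{
	for (size_t i = 0; i < SKETCH_MAX_PENDING; i++) {
		struct sketch_uuid *u = pending[i];

		if (u && memcmp(u->uuid, uuid, UUID_SIZE) == 0) {
			u->net = server_net;
			pending[i] = NULL;	/* single use */
			return true;
		}
	}
	return false;
}
```

A server that cannot see the table, because it runs on another host,
never finds the nonce, so the client keeps using the network path.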

nfs_common's nfs_uuids list is the basis for localio enablement.  Its
entries have members that point to nfsd memory for direct use by the
client (e.g. 'net' is the server's network namespace, through it the
client can access nn->nfsd_serv).

This commit also provides the base nfs_uuid_t interfaces to allow
proper net namespace refcounting for the LOCALIO use case.

CONFIG_NFS_LOCALIO controls the nfs_common, NFS server and NFS client
enablement for LOCALIO. If both NFS_FS=m and NFSD=m then
NFS_COMMON_LOCALIO_SUPPORT=m and nfs_localio.ko is built (and provides
nfs_common's LOCALIO support).

  # lsmod | grep nfs_localio
  nfs_localio            12288  2 nfsd,nfs
  sunrpc                745472  35 nfs_localio,nfsd,auth_rpcgss,lockd,nfsv3,nfs

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Co-developed-by: NeilBrown <neilb@suse.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
NeilBrown
86ab08beb3 SUNRPC: replace program list with program array
A service created with svc_create_pooled() can be given a linked list of
programs and all of these will be served.

Using a linked list makes it cumbersome when there are several programs
that can be optionally selected with CONFIG settings.

After this patch is applied, API consumers must use only
svc_create_pooled() when creating an RPC service that listens for more
than one RPC program.
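
Why an array is easier to combine with CONFIG settings can be sketched
as follows.  This is an illustration only: struct sketch_program and the
SKETCH_CONFIG_* macros are hypothetical stand-ins, not the SUNRPC types
or real Kconfig symbols.  The program numbers are the well-known NFS
(100003) and NFSACL (100227) values plus the 400122 LOCALIO number.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for Kconfig options; set one to 0 to drop its entry. */
#define SKETCH_CONFIG_ACL	1
#define SKETCH_CONFIG_LOCALIO	1

struct sketch_program {
	const char *name;
	unsigned int number;
};

/*
 * With an array, an optional program is one conditional entry.  With a
 * linked list, each entry's "next" pointer would depend on which of
 * the following programs happen to be configured in.
 */
static const struct sketch_program sketch_programs[] = {
	{ "nfs", 100003 },
#if SKETCH_CONFIG_ACL
	{ "nfsacl", 100227 },
#endif
#if SKETCH_CONFIG_LOCALIO
	{ "nfslocalio", 400122 },
#endif
};

#define SKETCH_NPROGS (sizeof(sketch_programs) / sizeof(sketch_programs[0]))

/* Dispatch: find the program an incoming RPC is addressed to. */
static const struct sketch_program *sketch_find_program(unsigned int number)
{
	for (size_t i = 0; i < SKETCH_NPROGS; i++) {
		if (sketch_programs[i].number == number)
			return &sketch_programs[i];
	}
	return NULL;
}
```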

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
Mike Snitzer
47e988147f nfsd: add nfsd_serv_try_get and nfsd_serv_put
Introduce nfsd_serv_try_get and nfsd_serv_put and update the nfsd code
to prevent nfsd_destroy_serv from destroying nn->nfsd_serv until any
caller of nfsd_serv_try_get releases their reference using nfsd_serv_put.

A percpu_ref is used to implement the interlock between
nfsd_destroy_serv and any caller of nfsd_serv_try_get.

This interlock is needed to properly wait for the completion of client
initiated localio calls to nfsd (that are _not_ in the context of nfsd).
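
The interlock can be sketched with C11 atomics.  This is a rough
userspace model, not the kernel's percpu_ref: a single atomic word
replaces the per-CPU counters, all sketch_* names are hypothetical, and
the destroy path reports whether users remain instead of sleeping until
they are gone.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_DYING (1u << 31)	/* top bit: teardown has started */

struct sketch_serv_ref {
	atomic_uint state;	/* low bits: reference count */
};

static void sketch_ref_init(struct sketch_serv_ref *r)
{
	atomic_init(&r->state, 1);	/* base reference held by the owner */
}

/* Take a reference unless teardown has already begun. */
static bool sketch_serv_try_get(struct sketch_serv_ref *r)
{
	unsigned int old = atomic_load(&r->state);

	do {
		if (old & SKETCH_DYING)
			return false;
	} while (!atomic_compare_exchange_weak(&r->state, &old, old + 1));
	return true;
}

/*
 * Drop a reference.  Returns true when this was the last reference of
 * a dying object, meaning teardown may now proceed.
 */
static bool sketch_serv_put(struct sketch_serv_ref *r)
{
	unsigned int now = atomic_fetch_sub(&r->state, 1) - 1;

	return now == SKETCH_DYING;
}

/*
 * Destroy path: refuse new users, then drop the base reference.
 * Returns true if no users remain; otherwise the final
 * sketch_serv_put() reports completion.
 */
static bool sketch_serv_kill(struct sketch_serv_ref *r)
{
	atomic_fetch_or(&r->state, SKETCH_DYING);
	return sketch_serv_put(r);
}
```

In this model a localio caller that already holds a reference delays
teardown until its put, while any caller arriving after the kill is
turned away.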

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
NeilBrown
c63f0e48fe nfsd: add nfsd_file_acquire_local()
nfsd_file_acquire_local() can be used to look up a file by filehandle
without having a struct svc_rqst.  This can be used by NFS LOCALIO to
allow the NFS client to bypass the NFS protocol to directly access a
file provided by the NFS server which is running in the same kernel.

In nfsd_file_do_acquire() care is taken to always use fh_verify() if
rqstp is not NULL (as is the case for non-LOCALIO callers).  Otherwise
the non-LOCALIO callers will not supply the correct and required
arguments to __fh_verify (e.g. gssclient isn't passed).

Introduce fh_verify_local() wrapper around __fh_verify to make it
clear that LOCALIO is the intended caller.

Also, use GC for nfsd_file returned by nfsd_file_acquire_local.  GC
offers performance improvements if/when a file is reopened before
launderette cleans it from the filecache's LRU.

Suggested-by: Jeff Layton <jlayton@kernel.org> # use filecache's GC
Signed-off-by: NeilBrown <neilb@suse.de>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
NeilBrown
5e66d2d92a nfsd: factor out __fh_verify to allow NULL rqstp to be passed
__fh_verify() offers an interface like fh_verify() but doesn't require
a struct svc_rqst *; instead it takes the specific parts as
explicit required arguments.  So it is safe to call __fh_verify() with
a NULL rqstp, but the net, cred, and client args must not be NULL.

__fh_verify() does not use SVC_NET(), nor do the functions it calls.

Rather than using rqstp->rq_client pass the client and gssclient
explicitly to __fh_verify and then to nfsd_set_fh_dentry().

Lastly, it should be noted that the previous commit prepared for 4
associated tracepoints to only be used if rqstp is not NULL (this is a
stop-gap that should be properly fixed so localio also benefits from
the utility these tracepoints provide when debugging fh_verify
issues).
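
The shape of this refactoring can be sketched in userspace C.  The
types and functions below are hypothetical, heavily reduced stand-ins
(the real __fh_verify() takes more arguments and does the actual
filehandle checks); the sketch shows only how the explicit arguments
and the rqstp != NULL tracepoint guard fit together.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types. */
struct sketch_net { int id; };
struct sketch_cred { unsigned int uid; };
struct sketch_client { const char *name; };
struct sketch_rqst {
	struct sketch_net *net;
	struct sketch_cred *cred;
	struct sketch_client *client;
};

static int sketch_traced;	/* counts simulated tracepoint hits */

/* Core check: rqstp may be NULL, the explicit arguments may not. */
static int sketch_fh_verify_core(struct sketch_rqst *rqstp,
				 struct sketch_net *net,
				 struct sketch_cred *cred,
				 struct sketch_client *client)
{
	if (!net || !cred || !client)
		return -1;
	/* Tracepoints would dereference rqstp, so skip them without one. */
	if (rqstp != NULL)
		sketch_traced++;
	return 0;
}

/* Network path: unpack the request and pass its parts explicitly. */
static int sketch_fh_verify(struct sketch_rqst *rqstp)
{
	return sketch_fh_verify_core(rqstp, rqstp->net, rqstp->cred,
				     rqstp->client);
}

/* LOCALIO path: there is no request, so the caller supplies the parts. */
static int sketch_fh_verify_local(struct sketch_net *net,
				  struct sketch_cred *cred,
				  struct sketch_client *client)
{
	return sketch_fh_verify_core(NULL, net, cred, client);
}
```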

Signed-off-by: NeilBrown <neilb@suse.de>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
Chuck Lever
71c61a0077 NFSD: Short-circuit fh_verify tracepoints for LOCALIO
LOCALIO will be able to call fh_verify() with a NULL rqstp. In this
case, the existing trace points need to be skipped because they
want to dereference the address fields in the passed-in rqstp.

Temporarily make these trace points conditional to avoid a seg
fault in this case. Putting the "rqstp != NULL" check in the trace
points themselves makes the check more efficient.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
Chuck Lever
7c0b07b49b NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry()
Currently, fh_verify() makes some daring assumptions about which
version of file handle the caller wants, based on the things it can
find in the passed-in rqstp. The about-to-be-introduced LOCALIO use
case sometimes has no svc_rqst context, so this logic won't work in
that case.

Instead, examine the passed-in file handle. Its .max_size field
should carry information to allow nfsd_set_fh_dentry() to initialize
the file handle appropriately.
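
The idea can be sketched as a lookup on the handle's maximum size.  This
is an assumption-laden illustration, not the code in
nfsd_set_fh_dentry(): sketch_fh_version() is a hypothetical helper, and
the sizes are the usual maximum file handle sizes for NFSv2 (32 bytes),
NFSv3 (64 bytes) and NFSv4 (128 bytes).

```c
#include <assert.h>

enum {
	SKETCH_NFS2_FHSIZE = 32,
	SKETCH_NFS3_FHSIZE = 64,
	SKETCH_NFS4_FHSIZE = 128,
};

/*
 * Infer which protocol version a file handle was set up for from its
 * maximum size alone, without consulting any svc_rqst.
 * Returns 0 when the size matches no known version.
 */
static int sketch_fh_version(unsigned int max_size)
{
	switch (max_size) {
	case SKETCH_NFS4_FHSIZE:
		return 4;
	case SKETCH_NFS3_FHSIZE:
		return 3;
	case SKETCH_NFS2_FHSIZE:
		return 2;
	default:
		return 0;
	}
}
```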

The file handle used by lockd and the one created by write_filehandle
never need any of the version-specific fields (which affect things
like write and getattr requests and pre/post attributes).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
NeilBrown
b0d87dbd8b NFSD: Refactor nfsd_setuser_and_check_port()
There are several places where __fh_verify unconditionally dereferences
rqstp to check that the connection is suitably secure.  They look at
rqstp->rq_xprt which is not meaningful in the target use case of
"localio" NFS in which the client talks directly to the local server.

Prepare these to always succeed when rqstp is NULL.

Signed-off-by: NeilBrown <neilb@suse.de>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
NeilBrown
0a183f24a7 NFSD: Handle @rqstp == NULL in check_nfsd_access()
LOCALIO-initiated open operations are not running in an nfsd thread
and thus do not have an associated svc_rqst context.

Signed-off-by: NeilBrown <neilb@suse.de>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:29 -04:00
Mike Snitzer
1545e488b1 nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h
Eliminates duplicate functions in various files to allow for
additional callers.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:29 -04:00
Mike Snitzer
1fcb16674e nfs_common: factor out nfs4_errtbl and nfs4_stat_to_errno
Common nfs4_stat_to_errno() is used by fs/nfs/nfs4xdr.c and will be
used by fs/nfs/localio.c

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:29 -04:00
Mike Snitzer
4806ded4c1 nfs_common: factor out nfs_errtbl and nfs_stat_to_errno
Common nfs_stat_to_errno() is used by both fs/nfs/nfs2xdr.c and
fs/nfs/nfs3xdr.c

Will also be used by fs/nfsd/localio.c

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:29 -04:00
Dan Aloni
dfb07e990a nfs: add 'noalignwrite' option for lock-less 'lost writes' prevention
There are some applications that write to predefined non-overlapping
file offsets from multiple clients and therefore don't need to rely on
file locking. However, if these applications want non-aligned offsets
and sizes they need to either use locks or risk data corruption, as the
NFS client defaults to extending writes to whole pages.

This commit adds a new mount option `noalignwrite`, which turns that
behavior off and avoids the need for locking, as long as these
applications don't overlap on offsets.
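
The hazard can be sketched with a little range arithmetic.  This is an
illustration under stated assumptions, not the NFS client's write path:
the 4096-byte page size and all sketch_* names are hypothetical, and the
extension rule is simplified to "always round out to whole pages".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

struct sketch_range {
	uint64_t start;
	uint64_t end;		/* exclusive */
};

/* Round a write out to the whole pages that contain it. */
static struct sketch_range sketch_extend_to_pages(uint64_t offset,
						  uint64_t len)
{
	struct sketch_range r;

	r.start = offset - offset % SKETCH_PAGE_SIZE;
	r.end = (offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE
		* SKETCH_PAGE_SIZE;
	return r;
}

/* True if two half-open byte ranges share at least one byte. */
static bool sketch_overlap(struct sketch_range a, struct sketch_range b)
{
	return a.start < b.end && b.start < a.end;
}
```

Two clients writing bytes 0-99 and 100-199 do not overlap as issued,
but both extended writes cover bytes 0-4095, so whichever reaches the
server last can overwrite the other's data with its own stale copy of
the page.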

Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Li Lingfeng
6d26c5e4d8 nfs: fix the comment of nfs_get_root
The comment for nfs_get_root() needs to be updated as it would also be
used by NFS4 as follows:
@x[
    nfs_get_root+1
    nfs_get_tree_common+1819
    nfs_get_tree+2594
    vfs_get_tree+73
    fc_mount+23
    do_nfs4_mount+498
    nfs4_try_get_tree+134
    nfs_get_tree+2562
    vfs_get_tree+73
    path_mount+2776
    do_mount+226
    __se_sys_mount+343
    __x64_sys_mount+106
    do_syscall_64+69
    entry_SYSCALL_64_after_hwframe+97
, mount.nfs4]: 1

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Roi Azarzar
615e693b14 NFSv4.2: Fix detection of "Proxying of Times" server support
According to draft-ietf-nfsv4-delstid-07:
   If a server informs the client via the fattr4_open_arguments
   attribute that it supports
   OPEN_ARGS_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS and it returns a valid
   delegation stateid for an OPEN operation which sets the
   OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS flag, then it MUST query the
   client via a CB_GETATTR for the fattr4_time_deleg_access (see
   Section 5.2) attribute and fattr4_time_deleg_modify attribute (see
   Section 5.2).

Thus, we should check that the server supports proxying of times via
OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS.

We want to be extra pedantic and continue to check that FATTR4_TIME_DELEG_ACCESS
and FATTR4_TIME_DELEG_MODIFY are set. The server needs to expose both for the
client to correctly detect "Proxying of Times" support.

Signed-off-by: Roi Azarzar <roi.azarzar@vastdata.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Fixes: dcb3c20f74 ("NFSv4: Add a capability for delegated attributes")
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Trond Myklebust
af94dca79b NFSv4: Fail mounts if the lease setup times out
If the server is down when the client is trying to mount, so that the
calls to exchange_id or create_session fail, then we should allow the
mount system call to fail rather than hang and block other mount/umount
calls.

Reported-by: Oleksandr Tymoshenko <ovt@google.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Zhaoyang Huang
03e02b9417 fs: nfs: fix missing refcnt by replacing folio_set_private by folio_attach_private
This patch is inspired by a code review of fs code that looked for
cases where a folio's extra refcnt could introduce unwanted behaviour
when the refcnt is judged, such as [1].  That is, the folio passed to
mapping_evict_folio carries the refcnts from find_lock_entries, the
page cache, any PTEs, and the folio's private data if it has any.
However, the current code doesn't take the refcnt for the folio's
private data, which could make mapping_evict_folio mistake a single PTE
reference for the private one and call filemap_release_folio wrongly.

[1]
long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
{
...
// The current code misjudges here when the folio has one PTE reference,
// which is deemed to be the folio's private reference.
        if (folio_ref_count(folio) >
                        folio_nr_pages(folio) + folio_has_private(folio) + 1)
                return 0;
        if (!filemap_release_folio(folio, 0))
                return 0;

        return remove_mapping(mapping, folio);
}
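
The miscount described above can be checked with plain arithmetic.  The
sketch below is a userspace model of the comparison in [1], following
the reference accounting given in this commit message;
sketch_evict_proceeds() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Mirror of the test in [1]: eviction proceeds only when the folio
 * holds no more references than the page cache (nr_pages), the private
 * data (0 or 1) and the caller (1) account for.
 */
static bool sketch_evict_proceeds(long ref_count, long nr_pages,
				  bool has_private)
{
	return !(ref_count > nr_pages + (has_private ? 1 : 0) + 1);
}
```

For a single-page folio with one PTE reference, setting the private
flag without taking a reference gives ref_count = 1 (page cache) + 1
(caller) + 1 (PTE) = 3 against a limit of 1 + 1 + 1 = 3, so eviction
wrongly proceeds.  Once attaching private data also takes a reference,
ref_count is 4 and the folio is correctly left alone.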

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00