added back in 2015 for the sake of vfs_clone_file_range(),
which is in linux/fs.h these days
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The action force umount(umount -f) will attempt to kill all rpc_task even
umount operation may ultimately fail if some files remain open.
Consequently, if an action attempts to open a file, it can potentially
send two rpc_task to nfs server.
NFS CLIENT
thread1 thread2
open("file")
...
nfs4_do_open
_nfs4_do_open
_nfs4_open_and_get_state
_nfs4_proc_open
nfs4_run_open_task
/* rpc_task1 */
rpc_run_task
rpc_wait_for_completion_task
umount -f
nfs_umount_begin
rpc_killall_tasks
rpc_signal_task
rpc_task1 been wakeup
and return -512
_nfs4_do_open // while loop
...
nfs4_run_open_task
/* rpc_task2 */
rpc_run_task
rpc_wait_for_completion_task
While processing an open request, nfsd will first attempt to find or
allocate an nfs4_openowner. If it finds an nfs4_openowner that is not
marked as NFS4_OO_CONFIRMED, this nfs4_openowner will released. Since
two rpc_task can attempt to open the same file simultaneously from the
client to server, and because two instances of nfsd can run
concurrently, this situation can lead to lots of memory leak.
Additionally, when we echo 0 to /proc/fs/nfsd/threads, warning will be
triggered.
NFS SERVER
nfsd1 nfsd2 echo 0 > /proc/fs/nfsd/threads
nfsd4_open
nfsd4_process_open1
find_or_alloc_open_stateowner
// alloc oo1, stateid1
nfsd4_open
nfsd4_process_open1
find_or_alloc_open_stateowner
// find oo1, without NFS4_OO_CONFIRMED
release_openowner
unhash_openowner_locked
list_del_init(&oo->oo_perclient)
// cannot find this oo
// from client, LEAK!!!
alloc_stateowner // alloc oo2
nfsd4_process_open2
init_open_stateid
// associate oo1
// with stateid1, stateid1 LEAK!!!
nfs4_get_vfs_file
// alloc nfsd_file1 and nfsd_file_mark1
// all LEAK!!!
nfsd4_process_open2
...
write_threads
...
nfsd_destroy_serv
nfsd_shutdown_net
nfs4_state_shutdown_net
nfs4_state_destroy_net
destroy_client
__destroy_client
// won't find oo1!!!
nfsd_shutdown_generic
nfsd_file_cache_shutdown
kmem_cache_destroy
for nfsd_file_slab
and nfsd_file_mark_slab
// bark since nfsd_file1
// and nfsd_file_mark1
// still alive
=======================================================================
BUG nfsd_file (Not tainted): Objects remaining in nfsd_file on
__kmem_cache_shutdown()
-----------------------------------------------------------------------
Slab 0xffd4000004438a80 objects=34 used=1 fp=0xff11000110e2ad28
flags=0x17ffffc0000240(workingset|head|node=0|zone=2|lastcpupid=0x1fffff)
CPU: 4 UID: 0 PID: 757 Comm: sh Not tainted 6.12.0-rc6+ #19
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.16.1-2.fc37 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x53/0x70
slab_err+0xb0/0xf0
__kmem_cache_shutdown+0x15c/0x310
kmem_cache_destroy+0x66/0x160
nfsd_file_cache_shutdown+0xac/0x210 [nfsd]
nfsd_destroy_serv+0x251/0x2a0 [nfsd]
nfsd_svc+0x125/0x1e0 [nfsd]
write_threads+0x16a/0x2a0 [nfsd]
nfsctl_transaction_write+0x74/0xa0 [nfsd]
vfs_write+0x1ae/0x6d0
ksys_write+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Disabling lock debugging due to kernel taint
Object 0xff11000110e2ac38 @offset=3128
Allocated in nfsd_file_do_acquire+0x20f/0xa30 [nfsd] age=1635 cpu=3
pid=800
nfsd_file_do_acquire+0x20f/0xa30 [nfsd]
nfsd_file_acquire_opened+0x5f/0x90 [nfsd]
nfs4_get_vfs_file+0x4c9/0x570 [nfsd]
nfsd4_process_open2+0x713/0x1070 [nfsd]
nfsd4_open+0x74b/0x8b0 [nfsd]
nfsd4_proc_compound+0x70b/0xc20 [nfsd]
nfsd_dispatch+0x1b4/0x3a0 [nfsd]
svc_process_common+0x5b8/0xc50 [sunrpc]
svc_process+0x2ab/0x3b0 [sunrpc]
svc_handle_xprt+0x681/0xa20 [sunrpc]
nfsd+0x183/0x220 [nfsd]
kthread+0x199/0x1e0
ret_from_fork+0x31/0x60
ret_from_fork_asm+0x1a/0x30
Add nfs4_openowner_unhashed to help found unhashed nfs4_openowner, and
break nfsd4_open process to fix this problem.
Cc: stable@vger.kernel.org # v5.4+
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Keep async copy state alive for a few lease cycles after the copy
completes so that OFFLOAD_STATUS returns something meaningful.
This means that NFSD's client shutdown processing needs to purge
any of this state that happens to be waiting to die.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
RFC 7862 Section 4.8 states:
> A copy offload stateid will be valid until either (A) the client
> or server restarts or (B) the client returns the resource by
> issuing an OFFLOAD_CANCEL operation or the client replies to a
> CB_OFFLOAD operation.
Instead of releasing async copy state when the CB_OFFLOAD callback
completes, now let it live until the next laundromat run after the
callback completes.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Currently __destroy_client() consults the nfs4_client's async_copies
list to determine whether there are ongoing async COPY operations.
However, NFSD now keeps copy state in that list even when the
async copy has completed, to enable OFFLOAD_STATUS to find the
COPY results for a while after the COPY has completed.
DESTROY_CLIENTID should not be blocked if the client's async_copies
list contains state for only completed copy operations.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
RFC 7862 permits callback services to respond to CB_OFFLOAD with
NFS4ERR_DELAY. Currently NFSD drops the CB_OFFLOAD in that case.
To improve the reliability of COPY offload, NFSD should rather send
another CB_OFFLOAD completion notification.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
RFC 7862 Section 4.8 states:
> A copy offload stateid will be valid until either (A) the client
> or server restarts or (B) the client returns the resource by
> issuing an OFFLOAD_CANCEL operation or the client replies to a
> CB_OFFLOAD operation.
Currently, NFSD purges the metadata for an async COPY operation as
soon as the CB_OFFLOAD callback has been sent. It does not wait even
for the client's CB_OFFLOAD response, as the paragraph above
suggests that it should.
This makes the OFFLOAD_STATUS operation ineffective during the
window between the completion of an asynchronous COPY and the
server's receipt of the corresponding CB_OFFLOAD response. This is
important if, for example, the client responds with NFS4ERR_DELAY,
or the transport is lost before the server receives the response. A
client might use OFFLOAD_STATUS to query the server about the still
pending asynchronous COPY, but NFSD will respond to OFFLOAD_STATUS
as if it had never heard of the presented copy stateid.
This patch starts to address this issue by extending the lifetime of
struct nfsd4_copy at least until the server has seen the client's
CB_OFFLOAD response, or the CB_OFFLOAD has timed out.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
nfsd4_shutdown_copy() is just this:
while ((copy = nfsd4_get_copy(clp)) != NULL)
nfsd4_stop_copy(copy);
nfsd4_get_copy() bumps @copy's reference count, preventing
nfsd4_stop_copy() from releasing @copy.
A while loop like this usually works by removing the first element
of the list, but neither nfsd4_get_copy() nor nfsd4_stop_copy()
alters the async_copies list.
Best I can tell, then, is that nfsd4_shutdown_copy() continues to
loop until other threads manage to remove all the items from this
list. The spinning loop blocks shutdown until these items are gone.
Possibly the reason we haven't seen this issue in the field is
because client_has_state() prevents __destroy_client() from calling
nfsd4_shutdown_copy() if there are any items on this list. In a
subsequent patch I plan to remove that restriction.
Fixes: e0639dc580 ("NFSD introduce async copy feature")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
While this holds the flags from the CREATE_SESSION request, nothing
ever consults them. The only flag used is NFS4_SESSION_DEAD. Make it a
simple bool instead.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This field is written and is never consulted again. Remove it.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
refcount_t, by design, checks for unwanted situations and provides
warnings. It is rarely useful to have explicit warnings with refcount
usage.
In this case we have an explicit warning if a refcount_t reaches zero
when decremented. Simply using refcount_dec() will provide a similar
warning and also mark the refcount_t as saturated to avoid any possible
use-after-free.
This patch drops the warning and uses refcount_dec() instead of
refcount_dec_and_test().
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Failing OP_SETCLIENTID or OP_EXCHANGE_ID should only happen if there is
memory allocation failure. Putting a hard limit on the number of
clients is not really helpful as it will either happen too early and
prevent clients that the server can easily handle, or too late and
allow clients when the server is swamped.
The calculated limit is still useful for expiring courtesy clients where
there are "too many" clients, but it shouldn't prevent the creation of
active clients.
Testing of lots of clients against small-mem servers reports repeated
NFS4ERR_DELAY responses which doesn't seem helpful. There may have been
reports of similar problems in production use.
Also remove an outdated comment - we do use a slab cache.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
There's issue as follows:
RPC: Registered rdma transport module.
RPC: Registered rdma backchannel transport module.
RPC: Unregistered rdma transport module.
RPC: Unregistered rdma backchannel transport module.
BUG: unable to handle page fault for address: fffffbfff80c609a
PGD 123fee067 P4D 123fee067 PUD 123fea067 PMD 10c624067 PTE 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
RIP: 0010:percpu_counter_destroy_many+0xf7/0x2a0
Call Trace:
<TASK>
__die+0x1f/0x70
page_fault_oops+0x2cd/0x860
spurious_kernel_fault+0x36/0x450
do_kern_addr_fault+0xca/0x100
exc_page_fault+0x128/0x150
asm_exc_page_fault+0x26/0x30
percpu_counter_destroy_many+0xf7/0x2a0
mmdrop+0x209/0x350
finish_task_switch.isra.0+0x481/0x840
schedule_tail+0xe/0xd0
ret_from_fork+0x23/0x80
ret_from_fork_asm+0x1a/0x30
</TASK>
If register_sysctl() return NULL, then svc_rdma_proc_cleanup() will not
destroy the percpu counters which init in svc_rdma_proc_init().
If CONFIG_HOTPLUG_CPU is enabled, residual nodes may be in the
'percpu_counters' list. The above issue may occur once the module is
removed. If the CONFIG_HOTPLUG_CPU configuration is not enabled, memory
leakage occurs.
To solve above issue just destroy all percpu counters when
register_sysctl() return NULL.
Fixes: 1e7e557316 ("svcrdma: Restore read and write stats")
Fixes: 22df5a2246 ("svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter")
Fixes: df971cd853 ("svcrdma: Convert rdma_stat_recv to a per-CPU counter")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Refactor: Translating an on-the-wire value to a local host errno is
architecturally a job for the proc function, not the XDR decoder.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
In particular, client-side source code needs the definition of
"struct rpc_procinfo" and does not want header files that pull
in "struct svc_rqst". Otherwise, the source does not compile.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Obviously, "nfs_ok" is defined only for NFS protocols. Other XDR
protocols won't know "nfs_ok" from Adam.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This tracepoint was a "note to self" and is not operational. It is
added only to client-side code, which so far we haven't needed. It
will cause immediate breakage once we start generating client code,
though, so remove it now.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The last reference for `cache_head` can be reduced to zero in `c_show`
and `e_show`(using `rcu_read_lock` and `rcu_read_unlock`). Consequently,
`svc_export_put` and `expkey_put` will be invoked, leading to two
issues:
1. The `svc_export_put` will directly free ex_uuid. However,
`e_show`/`c_show` will access `ex_uuid` after `cache_put`, which can
trigger a use-after-free issue, shown below.
==================================================================
BUG: KASAN: slab-use-after-free in svc_export_show+0x362/0x430 [nfsd]
Read of size 1 at addr ff11000010fdc120 by task cat/870
CPU: 1 UID: 0 PID: 870 Comm: cat Not tainted 6.12.0-rc3+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.16.1-2.fc37 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x53/0x70
print_address_description.constprop.0+0x2c/0x3a0
print_report+0xb9/0x280
kasan_report+0xae/0xe0
svc_export_show+0x362/0x430 [nfsd]
c_show+0x161/0x390 [sunrpc]
seq_read_iter+0x589/0x770
seq_read+0x1e5/0x270
proc_reg_read+0xe1/0x140
vfs_read+0x125/0x530
ksys_read+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Allocated by task 830:
kasan_save_stack+0x20/0x40
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x8f/0xa0
__kmalloc_node_track_caller_noprof+0x1bc/0x400
kmemdup_noprof+0x22/0x50
svc_export_parse+0x8a9/0xb80 [nfsd]
cache_do_downcall+0x71/0xa0 [sunrpc]
cache_write_procfs+0x8e/0xd0 [sunrpc]
proc_reg_write+0xe1/0x140
vfs_write+0x1a5/0x6d0
ksys_write+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 868:
kasan_save_stack+0x20/0x40
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x37/0x50
kfree+0xf3/0x3e0
svc_export_put+0x87/0xb0 [nfsd]
cache_purge+0x17f/0x1f0 [sunrpc]
nfsd_destroy_serv+0x226/0x2d0 [nfsd]
nfsd_svc+0x125/0x1e0 [nfsd]
write_threads+0x16a/0x2a0 [nfsd]
nfsctl_transaction_write+0x74/0xa0 [nfsd]
vfs_write+0x1a5/0x6d0
ksys_write+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
2. We cannot sleep while using `rcu_read_lock`/`rcu_read_unlock`.
However, `svc_export_put`/`expkey_put` will call path_put, which
subsequently triggers a sleeping operation due to the following
`dput`.
=============================
WARNING: suspicious RCU usage
5.10.0-dirty #141 Not tainted
-----------------------------
...
Call Trace:
dump_stack+0x9a/0xd0
___might_sleep+0x231/0x240
dput+0x39/0x600
path_put+0x1b/0x30
svc_export_put+0x17/0x80
e_show+0x1c9/0x200
seq_read_iter+0x63f/0x7c0
seq_read+0x226/0x2d0
vfs_read+0x113/0x2c0
ksys_read+0xc9/0x170
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x67/0xd1
Fix these issues by using `rcu_work` to help release
`svc_expkey`/`svc_export`. This approach allows for an asynchronous
context to invoke `path_put` and also facilitates the freeing of
`uuid/exp/key` after an RCU grace period.
Fixes: 9ceddd9da1 ("knfsd: Allow lockless lookups of the exports")
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The function `c_show` was called with protection from RCU. This only
ensures that `cp` will not be freed. Therefore, the reference count for
`cp` can drop to zero, which will trigger a refcount use-after-free
warning when `cache_get` is called. To resolve this issue, use
`cache_get_rcu` to ensure that `cp` remains active.
------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 7 PID: 822 at lib/refcount.c:25
refcount_warn_saturate+0xb1/0x120
CPU: 7 UID: 0 PID: 822 Comm: cat Not tainted 6.12.0-rc3+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.16.1-2.fc37 04/01/2014
RIP: 0010:refcount_warn_saturate+0xb1/0x120
Call Trace:
<TASK>
c_show+0x2fc/0x380 [sunrpc]
seq_read_iter+0x589/0x770
seq_read+0x1e5/0x270
proc_reg_read+0xe1/0x140
vfs_read+0x125/0x530
ksys_read+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The function `e_show` was called with protection from RCU. This only
ensures that `exp` will not be freed. Therefore, the reference count for
`exp` can drop to zero, which will trigger a refcount use-after-free
warning when `exp_get` is called. To resolve this issue, use
`cache_get_rcu` to ensure that `exp` remains active.
------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 3 PID: 819 at lib/refcount.c:25
refcount_warn_saturate+0xb1/0x120
CPU: 3 UID: 0 PID: 819 Comm: cat Not tainted 6.12.0-rc3+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.16.1-2.fc37 04/01/2014
RIP: 0010:refcount_warn_saturate+0xb1/0x120
...
Call Trace:
<TASK>
e_show+0x20b/0x230 [nfsd]
seq_read_iter+0x589/0x770
seq_read+0x1e5/0x270
vfs_read+0x125/0x530
ksys_read+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: bf18f163e8 ("NFSD: Using exp_get for export getting")
Cc: stable@vger.kernel.org # 4.20+
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Since commit 75c7940d2a ("lockd: set missing fl_flags field when
retrieving args"), nlmsvc_retrieve_args() initializes the flc_flags
field. svcxdr_decode_lock() no longer needs to do this.
This clean up removes one dependency on the nlm_lock:fl field. No
behavior change is expected.
Analysis:
svcxdr_decode_lock() is called by:
nlm4svc_decode_testargs()
nlm4svc_decode_lockargs()
nlm4svc_decode_cancargs()
nlm4svc_decode_unlockargs()
nlm4svc_decode_testargs() is used by:
- NLMPROC4_TEST and NLMPROC4_TEST_MSG, which call nlmsvc_retrieve_args()
- NLMPROC4_GRANTED and NLMPROC4_GRANTED_MSG, which don't pass the
lock's file_lock to the generic lock API
nlm4svc_decode_lockargs() is used by:
- NLMPROC4_LOCK and NLM4PROC4_LOCK_MSG, which call nlmsvc_retrieve_args()
- NLMPROC4_UNLOCK and NLM4PROC4_UNLOCK_MSG, which call nlmsvc_retrieve_args()
- NLMPROC4_NM_LOCK, which calls nlmsvc_retrieve_args()
nlm4svc_decode_cancargs() is used by:
- NLMPROC4_CANCEL and NLMPROC4_CANCEL_MSG, which call nlmsvc_retrieve_args()
nlm4svc_decode_unlockargs() is used by:
- NLMPROC4_UNLOCK and NLMPROC4_UNLOCK_MSG, which call nlmsvc_retrieve_args()
All callers except GRANTED/GRANTED_MSG eventually call
nlmsvc_retrieve_args() before using nlm_lock::fl.c.flc_flags. Thus
this change is safe.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The nlm_cookie parameter has been unused since commit 09802fd2a8
("lockd: rip out deferred lock handling from testlock codepath").
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Since commit 103cc1fafe ("SUNRPC: Parametrize how much of argsize
should be zeroed") (and possibly long before that, even) all of the
memory underlying rqstp->rq_argp is zeroed already. There's no need
for the memset() in nlm4svc_decode_shareargs().
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up: Looks like the last usage of this typedef was removed by
commit 026fec7e7c ("sunrpc: properly type pc_decode callbacks") in
2017.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
It's only current caller already length-checks the string, but let's
be safe.
Fixes: 0964a3d3f1 ("[PATCH] knfsd: nfsd4 reboot dirname fix")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up. The computed value of @p is saved each time through the
loop but is never used.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up. The result of "*p++" is saved, but is not used before it
is overwritten. The result of xdr_encode_opaque() is saved each
time through the loop but is never used.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
@ses is initialized to NULL. If __nfsd4_find_backchannel() finds no
available backchannel session, setup_callback_client() will try to
dereference @ses and segfault.
Fixes: dcbeaa68db ("nfsd4: allow backchannel recovery")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
fh_size is an unsigned int, thus it can never be less than 0.
Fixes: d8b26071e6 ("NFSD: simplify struct nfsfh")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up. AFAICT, there is no way to reach the out_free_conn label
with @old set to a non-NULL value, so the expire_client(old) call
is never reached and can be removed.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NFSD_MAY_LOCK means a few different things.
- it means that GSS is not required.
- it means that with NFSEXP_NOAUTHNLM, authentication is not required
- it means that OWNER_OVERRIDE is allowed.
None of these are specific to locking, they are specific to the NLM
protocol.
So:
- rename to NFSD_MAY_NLM
- set NFSD_MAY_OWNER_OVERRIDE and NFSD_MAY_BYPASS_GSS in nlm_fopen()
so that NFSD_MAY_NLM doesn't need to imply these.
- move the test on NFSEXP_NOAUTHNLM out of nfsd_permission() and
into fh_verify where other special-case tests on the MAY flags
happen. nfsd_permission() can be called from other places than
fh_verify(), but none of these will have NFSD_MAY_NLM.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NFSv4 LOCK operations should not avoid the set of authorization
checks that apply to all other NFSv4 operations. Also, the
"no_auth_nlm" export option should apply only to NLM LOCK requests.
It's not necessary or sensible to apply it to NFSv4 LOCK operations.
Instead, set no permission bits when calling fh_verify(). Subsequent
stateid processing handles authorization checks.
Reported-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Since SLOB was removed and since
commit 6c6c47b063 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"),
it is not necessary to use call_rcu when the callback only performs
kmem_cache_free. Use kfree_rcu() directly.
The changes were made using Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
For convenience, copy the XDR extraction script from RFC
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Currently NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT do not bypass
only GSS, but bypass any method. This is a problem specially for NFS3
AUTH_NULL-only exports.
The purpose of NFSD_MAY_BYPASS_GSS_ON_ROOT is described in RFC 2623,
section 2.3.2, to allow mounting NFS2/3 GSS-only export without
authentication. So few procedures which do not expose security risk used
during mount time can be called also with AUTH_NONE or AUTH_SYS, to allow
client mount operation to finish successfully.
The problem with current implementation is that for AUTH_NULL-only exports,
the NFSD_MAY_BYPASS_GSS_ON_ROOT is active also for NFS3 AUTH_UNIX mount
attempts which confuse NFS3 clients, and make them think that AUTH_UNIX is
enabled and is working. Linux NFS3 client never switches from AUTH_UNIX to
AUTH_NONE on active mount, which makes the mount inaccessible.
Fix the NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT implementation
and really allow to bypass only exports which have enabled some real
authentication (GSS, TLS, or any other).
The result would be: For AUTH_NULL-only export if client attempts to do
mount with AUTH_UNIX flavor then it will receive access errors, which
instruct client that AUTH_UNIX flavor is not usable and will either try
other auth flavor (AUTH_NULL if enabled) or fails mount procedure.
Similarly if client attempt to do mount with AUTH_NULL flavor and only
AUTH_UNIX flavor is enabled then the client will receive access error.
This should fix problems with AUTH_NULL-only or AUTH_UNIX-only exports if
client attempts to mount it with other auth flavor (e.g. with AUTH_NULL for
AUTH_UNIX-only export, or with AUTH_UNIX for AUTH_NULL-only export).
Signed-off-by: Pali Rohár <pali@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NFSv4.1 OP_EXCHANGE_ID response from server may contain server
implementation details (domain, name and build time) in optional
nfs_impl_id4 field. Currently nfsd does not fill this field.
Send these information in NFSv4.1 OP_EXCHANGE_ID response. Fill them with
the same values as what is Linux NFSv4.1 client doing. Domain is hardcoded
to "kernel.org", name is composed in the same way as "uname -srvm" output
and build time is hardcoded to zeros.
NFSv4.1 client and server implementation fields are useful for statistic
purposes or for identifying type of clients and servers.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NLMv2 is completely different protocol than NLMv1 and NLMv3, and in
original Sun implementation is used for RPC loopback callbacks from statd
to lockd services. Linux does not use nor does not implement NLMv2.
Hence, NLMv3 is not backward compatible with NLMv2. But NLMv3 is backward
compatible with NLMv1. Fix comment.
Signed-off-by: Pali Rohár <pali@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Turn nfsd_compound_encode_err tracepoint into a class and add a new
nfsd_compound_op_err tracepoint.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Currently we pass back the size and whether it has been modified, but
those just mirror values tracked inside the delegation. In a later
patch, we'll need to get at the timestamps in the delegation too, so
just pass back a reference to the write delegation, and use that to
properly override values in the iattr.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
We already have a slot for this in the kstat structure. Just overwrite
that instead of keeping a copy.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This is always the same value, and in a later patch we're going to need
to set bits in WORD2. We can simplify this code and save a little space
in the delegation too. Just hardcode the bitmap in the callback encode
function.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The inode that nfs4_open_delegation() passes to this function is
wrong, which throws off the result. The inode will end up getting a
directory-style change attr instead of a regular-file-style one.
Fix up nfs4_delegation_stat() to fetch STATX_MODE, and then drop the
inode parameter from nfsd4_change_attribute(), since it's no longer
needed.
Fixes: c5967721e1 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Introduce logic in the code generators to emit maxsize (XDR
width) definitions. In C, these are pre-processor macros.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Not yet complete.
The tool doesn't do any math yet. Thus, even though the maximum XDR
width of a union is the width of the union enumerator plus the width
of its largest arm, we're using the sum of all the elements of the
union for the moment.
This means that buffer size requirements are overestimated, and that
the generated maxsize macro cannot yet be used for determining data
element alignment in the XDR buffer.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The XDR width of a pointer type is the sum of the widths of each of
the struct's fields, except for the last field. The width of the
implicit boolean "value follows" field is added as well.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>